When stored in memory, the bytes, words, and doublewords of a packed data type occupy consecutive addresses: the least significant byte, word, or doubleword is stored at the lowest address, and the more significant bytes, words, or doublewords are stored at consecutively higher addresses. The ordering of…
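The layout described above can be illustrated with a small, portable sketch. It assumes a 64-bit packed operand holding four 16-bit words and x86 little-endian byte order within each word; the function names `store_packed_words` and `load_packed_word` are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write four 16-bit words into 8 bytes of "memory" the way a packed
   quadword is laid out: word 0 (least significant) at the lowest
   address, and within each word the low byte before the high byte. */
void store_packed_words(const uint16_t words[4], uint8_t mem[8])
{
    assert(words && mem);
    for (size_t i = 0; i < 4; i++) {
        mem[2 * i]     = (uint8_t)(words[i] & 0xFFu); /* low byte  */
        mem[2 * i + 1] = (uint8_t)(words[i] >> 8);    /* high byte */
    }
}

/* Read word i back from the same layout. */
uint16_t load_packed_word(const uint8_t mem[8], size_t i)
{
    assert(mem && i < 4);
    return (uint16_t)(mem[2 * i] | (mem[2 * i + 1] << 8));
}
```

Spelling out the byte placement by hand, rather than copying a 64-bit integer with `memcpy`, keeps the sketch independent of the byte order of the machine it runs on.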
-
Elements of packed data sometimes need to be repositioned within an operand, or the elements of two packed data operands need to be merged. In other cases, either the input or the desired output representation of the data may not be ideal…
-
These instructions generate a mask of ones or zeros for each element, which logical operations can then use to select elements within a register. This lets a developer implement a packed conditional move without any branch instructions.
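As a rough illustration of the mask-and-select idea, here is a scalar C model. It assumes the instructions in question behave like a packed signed "greater than" compare on four 16-bit words (in the style of PCMPGTW), followed by AND, AND-NOT, and OR. The function names are hypothetical, and the loops stand in for what the hardware does on all lanes at once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 4 /* four 16-bit words, as in one 64-bit packed operand */

/* Model of a packed signed compare: each mask lane becomes 0xFFFF
   where a > b and 0x0000 otherwise. On hardware this is a single
   instruction, not a per-lane test. */
void cmpgt_mask(const int16_t a[LANES], const int16_t b[LANES],
                uint16_t mask[LANES])
{
    assert(a && b && mask);
    for (size_t i = 0; i < LANES; i++)
        mask[i] = (a[i] > b[i]) ? 0xFFFFu : 0x0000u;
}

/* Select using only logical operations:
   out = (x AND mask) OR (y AND NOT mask). */
void select_by_mask(const uint16_t mask[LANES], const int16_t x[LANES],
                    const int16_t y[LANES], int16_t out[LANES])
{
    assert(mask && x && y && out);
    for (size_t i = 0; i < LANES; i++) {
        uint16_t keep_x = (uint16_t)((uint16_t)x[i] & mask[i]);
        uint16_t keep_y = (uint16_t)((uint16_t)y[i] & (uint16_t)~mask[i]);
        out[i] = (int16_t)(keep_x | keep_y);
    }
}

/* Packed maximum built from the two steps above. */
void packed_max(const int16_t a[LANES], const int16_t b[LANES],
                int16_t out[LANES])
{
    uint16_t mask[LANES];
    cmpgt_mask(a, b, mask);
    select_by_mask(mask, a, b, out);
}
```

Because every mask lane is either all ones or all zeros, the AND/AND-NOT/OR combination passes exactly one of the two candidates through in each lane, which is what makes the select a conditional move.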
-
The MMX technology supports both saturating and wraparound modes. In wraparound mode, results that overflow or underflow are truncated and only the lower (least significant) bits of the result are returned. In saturation mode, results of an operation that overflow or underflow are clipped (saturated) to a data-range limit for…
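The difference between the two modes can be sketched in scalar C. This is a model rather than MMX code: it assumes unsigned bytes with the range 0 to 255 and signed 16-bit words with the range -32768 to 32767, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Wraparound: the sum is truncated to its low 8 bits. */
uint8_t add_wrap_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}

/* Unsigned saturation: a sum above 255 is clipped to 255. */
uint8_t add_sat_u8(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + (unsigned)b;
    return (uint8_t)(sum > UINT8_MAX ? UINT8_MAX : sum);
}

/* Signed saturation on a 16-bit word: the sum is clipped to
   INT16_MAX on overflow and INT16_MIN on underflow. */
int16_t add_sat_s16(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b;
    if (sum > INT16_MAX) return INT16_MAX;
    if (sum < INT16_MIN) return INT16_MIN;
    return (int16_t)sum;
}

/* Apply the unsigned saturating add to all eight bytes of a
   64-bit packed operand, one lane at a time. */
void add_sat_packed_u8(const uint8_t a[8], const uint8_t b[8],
                       uint8_t out[8])
{
    assert(a && b && out);
    for (size_t i = 0; i < 8; i++)
        out[i] = add_sat_u8(a[i], b[i]);
}
```

For example, adding the bytes 200 and 100 gives 44 in wraparound mode (300 modulo 256) and 255 in unsigned saturation mode. The saturated result is usually the more useful one for pixel data, where wrapping a bright value around to a dark one is visibly wrong.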
-
Intel’s Streaming SIMD Extensions, better known as SSE, are often associated with 128-bit floating-point operations on XMM registers. That is the part of SSE most developers remember today: four single-precision floating-point values packed into one register and processed with one instruction. However, the original SSE instruction set did more than…
-
MMX was the first widely adopted SIMD instruction set on x86 processors. It was introduced by Intel in the Pentium MMX generation and later supported by AMD and other x86-compatible processors. At the time, it was a major step forward for multimedia and communications software because it allowed one instruction…
-
Any computer, whether sequential or parallel, operates by executing instructions on data. A stream of instructions (the algorithm) tells the computer what to do at each step. A stream of data (the input to the algorithm) is affected by these instructions. A widely used classification of parallel systems, due to…
-
SSE state management is the part of SIMD programming concerned with the processor state used by SSE floating-point instructions. Most SSE code does not need explicit state management. If you are writing ordinary code with intrinsics such as _mm_add_ps, _mm_mul_ps, _mm_loadu_ps, and _mm_storeu_ps, the compiler, operating system, and calling convention…
-
SHUFPS can place any of the four single-precision floating-point (SP FP) numbers from the first source operand in the lower two destination fields; the upper two destination fields are filled with any of the four SP FP numbers from the second source operand. By using the same register for both sources,…
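The selection rule can be made concrete with a scalar C model of the shuffle. This is a sketch, not the instruction itself: it assumes the usual 8-bit immediate encoding, in which each 2-bit field picks one of four source elements, and the names `shufps_model` and `broadcast_model` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of SHUFPS. a is the first source (also the destination
   on real hardware), b is the second source, imm8 is the selector:
     bits 1:0 -> out[0] from a      bits 5:4 -> out[2] from b
     bits 3:2 -> out[1] from a      bits 7:6 -> out[3] from b   */
void shufps_model(const float a[4], const float b[4], uint8_t imm8,
                  float out[4])
{
    float tmp[4];
    assert(a && b && out);
    tmp[0] = a[imm8 & 3];
    tmp[1] = a[(imm8 >> 2) & 3];
    tmp[2] = b[(imm8 >> 4) & 3];
    tmp[3] = b[(imm8 >> 6) & 3];
    /* Copy through a temporary so out may alias a or b. */
    for (int i = 0; i < 4; i++)
        out[i] = tmp[i];
}

/* With the same vector as both sources, one element can be copied
   to all four destination fields. */
void broadcast_model(const float v[4], unsigned idx, float out[4])
{
    uint8_t imm8;
    assert(idx < 4);
    imm8 = (uint8_t)(idx | (idx << 2) | (idx << 4) | (idx << 6));
    shufps_model(v, v, imm8, out);
}
```

In this model, calling `shufps_model(v, v, 0x1B, out)` reverses the four elements of `v`, because the selector 0x1B (binary 00 01 10 11) picks elements 3, 2, 1, 0 in that order.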
-
Basic building-block operations in geometry involve computing divisions and square roots. For instance, a transformation often divides each x, y, and z coordinate by the perspective coordinate W; normalization, another common geometry operation, requires computing the reciprocal square root (1/square-root). To optimize these cases, SSE introduces two…