SIMD on x64/x86 – Page 3 – Stefano Tommesani

SIMD on x64/x86

MMX Conversion Instructions: packing, unpacking, and reordering data

April 24, 2010

Elements of packed data often need to be repositioned within a register, or the elements of two packed operands need to be merged. There are also cases where either the input or the desired output representation of the data may not be ideal…
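
As a rough illustration of repositioning and merging packed elements, the sketch below widens eight unsigned bytes to 16-bit words by interleaving them with zeros, then narrows signed words back to bytes with unsigned saturation. It is an assumption-laden sketch: it uses the 128-bit SSE2 forms of the unpack and pack instructions (`_mm_unpacklo_epi8`, `_mm_packus_epi16`) rather than the 64-bit MMX forms the article covers, and the function names `widen_u8_to_u16` and `narrow_i16_to_u8` are hypothetical.

```c
#include <emmintrin.h>  /* SSE2 intrinsics (128-bit counterparts of MMX pack/unpack) */
#include <stdint.h>

/* Hypothetical helper: zero-extend 8 unsigned bytes to 8 unsigned words.
   Interleaving each data byte with a zero byte yields little-endian words
   whose low byte is the data and whose high byte is zero. */
void widen_u8_to_u16(const uint8_t *in, uint16_t *out)
{
    __m128i v    = _mm_loadl_epi64((const __m128i *)in); /* load 8 bytes, upper half zeroed */
    __m128i zero = _mm_setzero_si128();
    _mm_storeu_si128((__m128i *)out, _mm_unpacklo_epi8(v, zero));
}

/* Hypothetical helper: narrow 8 signed words to 8 unsigned bytes.
   Values below 0 clamp to 0 and values above 255 clamp to 255. */
void narrow_i16_to_u8(const int16_t *in, uint8_t *out)
{
    __m128i v      = _mm_loadu_si128((const __m128i *)in);
    __m128i packed = _mm_packus_epi16(v, v);             /* low 8 bytes hold the result */
    _mm_storel_epi64((__m128i *)out, packed);
}
```

Widening before arithmetic and narrowing afterwards is one common reason to reach for these instructions, since it leaves headroom for intermediate results.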

MMX Comparison Instructions: building masks for packed integers

April 24, 2010

The MMX comparison instructions generate, for each element, a mask of all ones or all zeros. Logical operations can then use that mask to select elements within a register, so a developer can implement a packed conditional move without branch instructions.
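
The mask-and-select idea can be sketched as a branch-free per-element maximum. This is only an illustration under stated assumptions: it uses the 128-bit SSE2 counterpart of the MMX compare (`_mm_cmpgt_epi16`) so that it builds on x86-64 compilers, and the function name `max_i16x8` is hypothetical.

```c
#include <emmintrin.h>  /* SSE2 intrinsics (128-bit counterpart of the MMX compare) */
#include <stdint.h>

/* Hypothetical helper: per-lane maximum of eight signed 16-bit values,
   computed without branches. The compare yields 0xFFFF in lanes where
   a > b and 0x0000 in all other lanes. */
void max_i16x8(const int16_t *a, const int16_t *b, int16_t *out)
{
    __m128i va     = _mm_loadu_si128((const __m128i *)a);
    __m128i vb     = _mm_loadu_si128((const __m128i *)b);
    __m128i mask   = _mm_cmpgt_epi16(va, vb);      /* a > b ? all ones : all zeros */
    __m128i from_a = _mm_and_si128(mask, va);      /* keep a where the mask is set */
    __m128i from_b = _mm_andnot_si128(mask, vb);   /* keep b where the mask is clear */
    _mm_storeu_si128((__m128i *)out, _mm_or_si128(from_a, from_b));
}
```

The same AND / AND NOT / OR pattern selects between any two packed sources once a mask exists, whatever comparison produced it.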

MMX Arithmetic Instructions: Wrapping, Saturation, and Packed Multiplication

April 24, 2010

The MMX technology supports both saturating and wraparound modes. In wraparound mode, results that overflow or underflow are truncated and only the lower (least significant) bits of the result are returned. In saturation mode, results of an operation that overflow or underflow are clipped (saturated) to a data-range limit for…
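
The difference between the two modes is easiest to see on unsigned bytes. The sketch below is an assumption-based illustration: it uses the 128-bit SSE2 forms of the packed add (`_mm_add_epi8` for wraparound, `_mm_adds_epu8` for unsigned saturation) instead of the 64-bit MMX forms, and the function names are hypothetical.

```c
#include <emmintrin.h>  /* SSE2 intrinsics (128-bit counterparts of the MMX adds) */
#include <stdint.h>

/* Hypothetical helper: add 16 unsigned bytes in wraparound mode.
   Only the low 8 bits of each sum are kept, so 200 + 100 gives 44. */
void add_wrap_u8x16(const uint8_t *a, const uint8_t *b, uint8_t *out)
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_add_epi8(va, vb));
}

/* Hypothetical helper: add 16 unsigned bytes in saturation mode.
   Sums above 255 are clipped to 255, so 200 + 100 gives 255. */
void add_sat_u8x16(const uint8_t *a, const uint8_t *b, uint8_t *out)
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_adds_epu8(va, vb));
}
```

For pixel data, saturation is usually the behavior wanted: a brightened pixel should stay white instead of wrapping around to a dark value.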

SSE Integer Instructions: the MMX extensions introduced with SSE

April 24, 2010

Intel’s Streaming SIMD Extensions, better known as SSE, are often associated with 128-bit floating-point operations on XMM registers. That is the part of SSE most developers remember today: four single-precision floating-point values packed into one register and processed with one instruction. However, the original SSE instruction set did more than…

MMX Primer: Packed Integer SIMD on Early x86 CPUs

April 24, 2010

MMX was the first widely adopted SIMD instruction set on x86 processors. It was introduced by Intel in the Pentium MMX generation and later supported by AMD and other x86-compatible processors. At the time, it was a major step forward for multimedia and communications software because it allowed one instruction…

Programming models

April 24, 2010

Any computer, whether sequential or parallel, operates by executing instructions on data. A stream of instructions (the algorithm) tells the computer what to do at each step. A stream of data (the input to the algorithm) is affected by these instructions. A widely used classification of parallel systems, due to…

SSE State Management: MXCSR, FXSAVE, FXRSTOR, and FP control

April 25, 2000

SSE state management is the part of SIMD programming concerned with the processor state used by SSE floating-point instructions. Most SSE code does not need explicit state management. If you are writing ordinary code with intrinsics such as _mm_add_ps, _mm_mul_ps, _mm_loadu_ps, and _mm_storeu_ps, the compiler, operating system, and calling convention…
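
For the less common case where code does touch the SSE control state, one possible pattern is to save MXCSR, change a control bit, and restore the saved value before returning. The sketch below assumes flush-to-zero is the setting of interest and uses the `_mm_getcsr`, `_mm_setcsr`, and `_MM_SET_FLUSH_ZERO_MODE` facilities from `xmmintrin.h`; the function name `with_flush_to_zero` and its callback parameter are hypothetical.

```c
#include <xmmintrin.h>  /* SSE intrinsics, including MXCSR access */

/* Hypothetical helper: run fn (if non-null) with flush-to-zero enabled,
   then restore the caller's MXCSR. Returns the MXCSR value that was in
   effect while fn ran, so the caller can inspect it. */
unsigned int with_flush_to_zero(void (*fn)(void))
{
    unsigned int saved = _mm_getcsr();              /* remember the caller's state */
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);     /* set the FTZ control bit */
    if (fn) {
        fn();
    }
    unsigned int during = _mm_getcsr();
    _mm_setcsr(saved);                              /* put everything back */
    return during;
}
```

Restoring the saved value keeps the change local, which matters because MXCSR is per-thread state that other code in the same thread also relies on.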

SSE Shuffle

April 25, 2000

SHUFPS can place any of the four single-precision floating-point values from the first source operand into each of the lower two destination fields; the upper two destination fields are filled the same way from any of the four values of the second source operand. By using the same register for both sources,…
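
Both uses of SHUFPS can be sketched with the `_mm_shuffle_ps` intrinsic and the `_MM_SHUFFLE` selector macro from `xmmintrin.h`. The function names below are hypothetical, and the selectors are just two examples out of the 256 possible immediates.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: reverse the four floats of one vector by passing
   the same register as both sources.
   _MM_SHUFFLE(f3, f2, f1, f0) picks: dst[0] = a[f0], dst[1] = a[f1],
   dst[2] = b[f2], dst[3] = b[f3]. */
void reverse_f32x4(const float *in, float *out)
{
    __m128 v = _mm_loadu_ps(in);
    _mm_storeu_ps(out, _mm_shuffle_ps(v, v, _MM_SHUFFLE(0, 1, 2, 3)));
}

/* Hypothetical helper: combine two different sources. The lower two
   results come from a (a[0], a[1]) and the upper two from b (b[0], b[1]). */
void low_pairs_f32x4(const float *a, const float *b, float *out)
{
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_shuffle_ps(va, vb, _MM_SHUFFLE(1, 0, 1, 0)));
}
```

The selector must be a compile-time constant because it is encoded as an immediate in the instruction.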

SSE Reciprocal

April 25, 2000

A basic building block operation in geometry involves computing divisions and square roots. For instance, transformation often involves dividing each x, y, z coordinate by the W perspective coordinate; normalization is another common geometry operation, which requires the computation of 1/square-root. In order to optimize these cases, SSE introduces two…
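
The excerpt is cut off before it names the instructions, so the following is an assumption: the sketch uses SSE's approximate reciprocal (RCPPS, `_mm_rcp_ps`) and approximate reciprocal square root (RSQRTPS, `_mm_rsqrt_ps`), which trade precision for speed. The function names are hypothetical.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: approximate 1/x for four floats. The result is
   accurate to roughly 12 bits, not to full single precision. */
void rcp_approx_f32x4(const float *in, float *out)
{
    _mm_storeu_ps(out, _mm_rcp_ps(_mm_loadu_ps(in)));
}

/* Hypothetical helper: approximate 1/sqrt(x) for four floats, the
   operation at the heart of vector normalization. Same reduced accuracy. */
void rsqrt_approx_f32x4(const float *in, float *out)
{
    _mm_storeu_ps(out, _mm_rsqrt_ps(_mm_loadu_ps(in)));
}
```

Where more precision is needed, a common follow-up is one Newton-Raphson refinement step on the approximate result; that step is not shown here.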

SSE Logical

April 25, 2000

ANDPS returns the bitwise AND of the two operands. ANDNPS returns the bitwise AND of the second operand with the complement of the first (destination) operand. ORPS returns the bitwise OR of the two operands. XORPS returns the bitwise XOR of the two operands.
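
A typical use of these instructions is manipulating the sign bit of packed floats. The sketch below is illustrative only, with hypothetical function names: it clears the sign bit with `_mm_andnot_ps` (ANDNPS) to take an absolute value and flips it with `_mm_xor_ps` (XORPS) to negate.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: absolute value of four floats.
   -0.0f has only the sign bit set, so (~sign) & v clears the sign bit of
   every lane. Note the operand order: the first argument of
   _mm_andnot_ps is the one that gets complemented. */
void abs_f32x4(const float *in, float *out)
{
    const __m128 sign = _mm_set1_ps(-0.0f);
    __m128 v = _mm_loadu_ps(in);
    _mm_storeu_ps(out, _mm_andnot_ps(sign, v));
}

/* Hypothetical helper: negate four floats by flipping each sign bit. */
void negate_f32x4(const float *in, float *out)
{
    const __m128 sign = _mm_set1_ps(-0.0f);
    __m128 v = _mm_loadu_ps(in);
    _mm_storeu_ps(out, _mm_xor_ps(sign, v));
}
```

Because these are pure bit operations, they work on the register contents without any floating-point rounding or exception behavior.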
