Stefano Tommesani

SIMD

MMX Conversion

April 24, 2010

There are several cases where elements of packed data may be required to be repositioned within the packed data, or the elements of two packed data operands may need to be merged. There are cases where either input or the desired output representation of a data may not be ideal…

Continue Reading
SIMD

MMX Comparison

April 24, 2010

These instructions generate a mask of ones or zeros which can be used by logical operations to select elements within a register: a developer can implement a packed conditional move operation without a set of branch instructions.

Continue Reading
SIMD

MMX Arithmetic

April 24, 2010

The MMX technology supports both saturating and wraparound modes. In wraparound mode, results that overflow or underflow are truncated and only the lower (least significant) bits of the result are returned. In saturation mode, results of an operation that overflow or underflow are clipped (saturated) to a data-range limit for…

Continue Reading
SIMD

SSE Primer

April 24, 2010

The Intel Streaming SIMD Extensions (SSE) comprise a set of extensions to the Intel x86 architecture that is designed to greatly enhance the performance of advanced media and communication applications. In this section the SSE integer instructions that extend the MMX instruction set will be closely examined. They may be…

Continue Reading
SIMD

MMX Primer

April 24, 2010

The MMX technology is designed to accelerate multimedia and communications applications by including new instructions and data types that allow applications to achieve a new level of performance. It exploits the parallelism inherent in many multimedia and communications algorithms, yet maintains full compatibility with existing operating systems and applications. A…

Continue Reading
SIMD

Programming models

April 24, 2010

Any computer, whether sequential or parallel, operates by executing instructions on data. A stream of instructions (the algorithm) tells the computer what to do at each step. A stream of data (the input to the algorithm) is affected by these instructions. A widely used classification of parallel systems, due to…

Continue Reading
SIMD

SSE State Management

April 25, 2000

LDMXCSR loads the SSE control and status register from memory, while STMXCSR stores it to memory. FXSAVE saves FP, MMX and SSE state to memory, while FXRSTOR loads it from memory.

Continue Reading
SIMD

SSE Shuffle

April 25, 2000

SHUFPS is able to shuffle any of the numbers from one source operand to the lower two destination fields; the upper two destination fields are generated from a shuffle of any of the four SP FP numbers from the second source operand. By using the same register for both sources,…

Continue Reading
SIMD

SSE Reciprocal

April 25, 2000

A basic building block operation in geometry involves computing divisions and square roots. For instance, transformation often involves dividing each x, y, z coordinate by the W perspective coordinate; normalization is another common geometry operation, which requires the computation of 1/square-root. In order to optimize these cases, SSE introduces two…

Continue Reading
SIMD

SSE Logical

April 25, 2000

ANDPS returns a bitwise AND between the two operands. ANDNPS returns a bitwise AND NOT between the two operands. ORPS returns a bitwise OR between the two operands. XORPS returns a bitwise XOR between the two operands.

Continue Reading