  • SIMD

    MMX Conversion

    Elements of packed data sometimes need to be repositioned within the packed data, or the elements of two packed operands need to be merged. There are also cases where either the input or the desired output representation of the data may not be ideal…

  • SIMD

    MMX Comparison

    The MMX comparison instructions generate, for each element, a mask of all ones or all zeros, which logical operations can then use to select elements within a register. This lets a developer implement a packed conditional move without any branch instructions.
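    The mask-and-select idea can be sketched with the MMX intrinsics from `<mmintrin.h>`. This is a minimal illustration, not code from the article: the helper name `packed_max_i16` is hypothetical, and it assumes an x86 compiler (GCC or Clang) that still exposes the MMX intrinsics.

```c
#include <mmintrin.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: per-element maximum of four signed 16-bit values,
   built from a compare mask instead of branches. */
static void packed_max_i16(const int16_t a[4], const int16_t b[4], int16_t out[4])
{
    __m64 va, vb;
    memcpy(&va, a, sizeof va);
    memcpy(&vb, b, sizeof vb);

    __m64 mask   = _mm_cmpgt_pi16(va, vb);      /* PCMPGTW: 0xFFFF where a > b, else 0 */
    __m64 from_a = _mm_and_si64(mask, va);      /* PAND:  keep a where the mask is set */
    __m64 from_b = _mm_andnot_si64(mask, vb);   /* PANDN: keep b where the mask is clear */
    __m64 result = _mm_or_si64(from_a, from_b); /* POR:   merge the two halves */

    memcpy(out, &result, sizeof result);
    _mm_empty();                                /* EMMS: release the MMX/x87 state */
}
```

    Calling `packed_max_i16` with `{1, -5, 300, 7}` and `{2, -9, 300, -7}` selects the larger value in each of the four lanes without a single conditional jump.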

  • SIMD

    MMX Arithmetic

    MMX arithmetic supports both saturating and wraparound modes. In wraparound mode, results that overflow or underflow are truncated, and only the lower (least significant) bits of the result are returned. In saturation mode, results that overflow or underflow are clipped (saturated) to a data-range limit for…
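    The difference between the two modes shows up as soon as a lane overflows. The sketch below compares PADDW (wraparound) with PADDSW (signed saturation) on 16-bit lanes; the helper names `add_wrap_i16` and `add_sat_i16` are hypothetical, and an x86 compiler providing `<mmintrin.h>` is assumed.

```c
#include <mmintrin.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: wraparound add of four signed 16-bit lanes (PADDW). */
static void add_wrap_i16(const int16_t a[4], const int16_t b[4], int16_t out[4])
{
    __m64 va, vb, r;
    memcpy(&va, a, sizeof va);
    memcpy(&vb, b, sizeof vb);
    r = _mm_add_pi16(va, vb);   /* overflowing lanes keep only the low 16 bits */
    memcpy(out, &r, sizeof r);
    _mm_empty();                /* EMMS */
}

/* Hypothetical helper: saturating add of four signed 16-bit lanes (PADDSW). */
static void add_sat_i16(const int16_t a[4], const int16_t b[4], int16_t out[4])
{
    __m64 va, vb, r;
    memcpy(&va, a, sizeof va);
    memcpy(&vb, b, sizeof vb);
    r = _mm_adds_pi16(va, vb);  /* overflowing lanes clip to -32768 or 32767 */
    memcpy(out, &r, sizeof r);
    _mm_empty();                /* EMMS */
}
```

    For the lane `32767 + 1`, the wraparound version yields `-32768` while the saturating version stays at `32767`; lanes that do not overflow give the same result in both modes.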

  • SIMD

    SSE Primer

    The Intel Streaming SIMD Extensions (SSE) are a set of extensions to the Intel x86 architecture designed to greatly enhance the performance of advanced media and communication applications. This section closely examines the SSE integer instructions that extend the MMX instruction set. They may be…

  • SIMD

    MMX Primer

    The MMX technology is designed to accelerate multimedia and communications applications by including new instructions and data types that allow applications to achieve a new level of performance. It exploits the parallelism inherent in many multimedia and communications algorithms, yet maintains full compatibility with existing operating systems and applications. A…

  • SIMD

    Programming models

    Any computer, whether sequential or parallel, operates by executing instructions on data. A stream of instructions (the algorithm) tells the computer what to do at each step. A stream of data (the input to the algorithm) is affected by these instructions. A widely used classification of parallel systems, due to…

  • SIMD

    SSE State Management

    LDMXCSR loads the SSE control and status register (MXCSR) from memory, while STMXCSR stores it to memory. FXSAVE saves the x87 FPU, MMX, and SSE state to memory, while FXRSTOR restores it from memory.
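    In C, the MXCSR load and store are usually reached through the `_mm_setcsr` and `_mm_getcsr` intrinsics in `<xmmintrin.h>`, which compile to LDMXCSR and STMXCSR. The sketch below temporarily enables flush-to-zero and then restores the saved value; the helper name `with_flush_to_zero` is hypothetical, and FXSAVE/FXRSTOR are not shown.

```c
#include <xmmintrin.h>

/* Hypothetical helper: enable flush-to-zero, report the MXCSR value that was
   active while it was enabled, then restore the caller's MXCSR. */
static unsigned int with_flush_to_zero(void)
{
    unsigned int saved = _mm_getcsr();       /* STMXCSR: read current MXCSR */
    _mm_setcsr(saved | _MM_FLUSH_ZERO_ON);   /* LDMXCSR: set the FTZ control bit */

    unsigned int active = _mm_getcsr();      /* value in effect inside the region */

    _mm_setcsr(saved);                       /* LDMXCSR: restore the original value */
    return active;
}
```

    Saving and restoring the register around the modified region keeps the change from leaking into code that expects the default rounding and denormal behaviour.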

  • SIMD

    SSE Shuffle

    SHUFPS can place any of the four single-precision floating-point (SP FP) numbers from the first source operand into the lower two destination fields; the upper two destination fields are filled with any of the four SP FP numbers from the second source operand. By using the same register for both sources,…
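    The two-source selection rule can be tried out with the `_mm_shuffle_ps` intrinsic from `<xmmintrin.h>`, which compiles to SHUFPS. In this sketch the helper name `shuffle_demo` and the chosen selector are illustrative assumptions; `_MM_SHUFFLE(z, y, x, w)` picks `a[w]` and `a[x]` for the lower two fields and `b[y]` and `b[z]` for the upper two.

```c
#include <xmmintrin.h>

/* Hypothetical helper: out = { a[3], a[2], b[1], b[0] }. */
static void shuffle_demo(const float a[4], const float b[4], float out[4])
{
    __m128 va = _mm_loadu_ps(a);   /* unaligned load of four floats */
    __m128 vb = _mm_loadu_ps(b);

    /* Lower two fields come from va, upper two from vb.
       The selector must be a compile-time constant. */
    __m128 r = _mm_shuffle_ps(va, vb, _MM_SHUFFLE(0, 1, 2, 3));

    _mm_storeu_ps(out, r);
}
```

    With `a = {1, 2, 3, 4}` and `b = {10, 20, 30, 40}`, the result is `{4, 3, 20, 10}`: the low half is drawn only from `a` and the high half only from `b`.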

  • SIMD

    SSE Reciprocal

    Divisions and square roots are basic building blocks of geometry computations. For instance, transformation often involves dividing each x, y, z coordinate by the W perspective coordinate; normalization, another common geometry operation, requires computing a reciprocal square root (1/sqrt). In order to optimize these cases, SSE introduces two…

  • SIMD

    SSE Logical

    ANDPS returns the bitwise AND of the two operands. ANDNPS inverts the first (destination) operand and returns its bitwise AND with the second. ORPS returns the bitwise OR of the two operands. XORPS returns their bitwise XOR.
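    A common use of these bitwise instructions is manipulating the sign bit of packed floats. The sketch below uses `_mm_andnot_ps` (ANDNPS) to clear the sign bit and `_mm_xor_ps` (XORPS) to flip it; the helper names `packed_abs` and `packed_negate` are hypothetical, and the sign mask built from `-0.0f` assumes default (non fast-math) floating-point compiler settings.

```c
#include <xmmintrin.h>

/* Hypothetical helper: absolute value of four floats by clearing the sign bit. */
static void packed_abs(const float in[4], float out[4])
{
    __m128 v    = _mm_loadu_ps(in);
    __m128 sign = _mm_set1_ps(-0.0f);      /* only the sign bit set in each lane */
    __m128 r    = _mm_andnot_ps(sign, v);  /* ANDNPS: (~sign) & v */
    _mm_storeu_ps(out, r);
}

/* Hypothetical helper: negate four floats by flipping the sign bit. */
static void packed_negate(const float in[4], float out[4])
{
    __m128 v    = _mm_loadu_ps(in);
    __m128 sign = _mm_set1_ps(-0.0f);
    __m128 r    = _mm_xor_ps(sign, v);     /* XORPS: toggles the sign bit */
    _mm_storeu_ps(out, r);
}
```

    Note the operand order of `_mm_andnot_ps`: the first argument is the one that gets inverted, matching the ANDNPS behaviour described above.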