The Intel Streaming SIMD Extensions (SSE) comprise a set of extensions to the Intel x86 architecture that is designed to greatly enhance the performance of advanced media and communication applications. In this section the SSE integer instructions that extend the…
The MMX technology is designed to accelerate multimedia and communications applications by including new instructions and data types that allow applications to achieve a new level of performance. It exploits the parallelism inherent in many multimedia and communications algorithms, yet…
Any computer, whether sequential or parallel, operates by executing instructions on data. A stream of instructions (the algorithm) tells the computer what to do at each step. A stream of data (the input to the algorithm) is affected by these…
LDMXCSR loads the SSE control and status register from memory, while STMXCSR stores it to memory. FXSAVE saves FP, MMX and SSE state to memory, while FXRSTOR loads it from memory.
SHUFPS is able to shuffle any of the numbers from one source operand to the lower two destination fields; the upper two destination fields are generated from a shuffle of any of the four SP FP numbers from the second…
A basic building block operation in geometry involves computing divisions and square roots. For instance, transformation often involves dividing each x, y, z coordinate by the W perspective coordinate; normalization is another common geometry operation, which requires the computation of…
ANDPS returns a bitwise AND between the two operands. ANDNPS returns a bitwise AND NOT between the two operands. ORPS returns a bitwise OR between the two operands. XORPS returns a bitwise XOR between the two operands.
MOVAPS transfers 128 bits of packed data from memory to SIMD floating-point registers and vice versa, or between SIMD floating-point registers, while MOVUPS makes no assumption for alignment. MOVHPS transfers 64 bits of packed data from memory to the upper…
These instructions support packed and scalar conversions between 128-bit SIMD floating-point registers and either 64-bit integer MMX registers or 32-bit integer x86 registers. CVTPI2PS converts two 32-bit signed integers in an MMX register to the two least significant numbers of…
The basic single precision FP comparison instruction is similar to existing MMX instruction variants: it produces a redundant mask per float of all 1?s or all 0?s, depending upon the result of the comparison. This approach allows the mask to…