  • SIMD

    SSE Data Movement

    MOVAPS transfers 128 bits of packed data from memory to SIMD floating-point registers and vice versa, or between SIMD floating-point registers; its memory operand must be 16-byte aligned, while MOVUPS makes no alignment assumption. MOVHPS transfers 64 bits of packed data from memory to the upper two fields of a SIMD floating-point register and vice versa,…
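    The alignment difference between MOVAPS and MOVUPS, and the half-register load performed by MOVHPS, can be sketched in C with the SSE intrinsics from <xmmintrin.h>, assuming GCC, Clang or MSVC on an x86 target. The helper names load4_aligned, load4_unaligned, load_high_pair and store4 are hypothetical; each wraps one intrinsic that normally compiles to the instruction named in its comment, although the compiler may choose an equivalent encoding.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* MOVAPS: the address must be 16-byte aligned, otherwise the instruction faults. */
static __m128 load4_aligned(const float *p)
{
    assert(((uintptr_t)p & 15) == 0);
    return _mm_load_ps(p);
}

/* MOVUPS: any address is accepted (typically slower on processors of this generation). */
static __m128 load4_unaligned(const float *p)
{
    return _mm_loadu_ps(p);
}

/* MOVHPS: replace the upper two floats of v with two floats read from memory;
   the lower two floats of v are kept. */
static __m128 load_high_pair(__m128 v, const float *pair)
{
    return _mm_loadh_pi(v, (const __m64 *)pair);
}

/* Copy the four lanes out to an ordinary array (unaligned store). */
static void store4(float out[4], __m128 v)
{
    _mm_storeu_ps(out, v);
}
```

    In this sketch, passing a misaligned pointer such as a + 1 to load4_aligned trips the assertion instead of letting MOVAPS raise a general-protection fault; code that cannot guarantee alignment would use the unaligned form.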

  • SIMD

    SSE Conversion

    These instructions support packed and scalar conversions between 128-bit SIMD floating-point registers and either 64-bit integer MMX registers or 32-bit integer x86 registers. CVTPI2PS converts two 32-bit signed integers in an MMX register into the two least significant single-precision numbers of the SSE destination register. The upper two significant numbers in…
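    The scalar conversions between SSE registers and 32-bit x86 integer registers can be illustrated with the intrinsics for CVTSI2SS, CVTSS2SI and CVTTSS2SI. This sketch deliberately avoids the MMX-based CVTPI2PS so that no EMMS bookkeeping is needed. The function names are hypothetical, and the rounding results assume the default MXCSR setting (round to nearest, ties to even).

```c
#include <assert.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* CVTSI2SS: 32-bit integer from a general-purpose register -> lowest float lane. */
static float int_to_float(int i)
{
    return _mm_cvtss_f32(_mm_cvtsi32_ss(_mm_setzero_ps(), i));
}

/* CVTSS2SI: lowest float lane -> 32-bit integer, rounded according to MXCSR
   (round to nearest, ties to even, unless the program changed it). */
static int float_to_int_round(float f)
{
    /* Outside this range (or for NaN) the instruction returns the
       "integer indefinite" value 0x80000000. */
    assert(f >= -2147483648.0f && f < 2147483648.0f);
    return _mm_cvtss_si32(_mm_set_ss(f));
}

/* CVTTSS2SI: same conversion, but always truncates toward zero like a C cast. */
static int float_to_int_trunc(float f)
{
    assert(f >= -2147483648.0f && f < 2147483648.0f);
    return _mm_cvttss_si32(_mm_set_ss(f));
}
```

    The distinction matters because a C cast truncates toward zero, so compilers typically translate (int)f with CVTTSS2SI, while CVTSS2SI honors the current rounding mode: float_to_int_round(2.5f) yields 2 and float_to_int_round(3.5f) yields 4.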

  • SIMD

    SSE Comparison

    The basic single-precision FP comparison instruction is similar to existing MMX instruction variants: it produces a redundant mask per float of all 1s or all 0s, depending upon the result of the comparison. This approach allows the mask to be used with subsequent logic operations (AND, ANDN, OR, XOR)…
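    The mask-and-select idiom can be sketched with CMPLTPS followed by ANDPS, ANDNPS and ORPS, which choose between two vectors lane by lane without branching. select_lt and clamp_negatives are hypothetical names; clamp_negatives assumes the element count is a multiple of four and omits the scalar tail loop a real routine would need.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Per-lane select: result[i] = (a[i] < b[i]) ? x[i] : y[i], with no branches.
   CMPLTPS writes all 1s to lanes where the comparison holds and all 0s elsewhere. */
static __m128 select_lt(__m128 a, __m128 b, __m128 x, __m128 y)
{
    __m128 mask = _mm_cmplt_ps(a, b);          /* CMPLTPS */
    return _mm_or_ps(_mm_and_ps(mask, x),      /* ANDPS:  x where the mask is set   */
                     _mm_andnot_ps(mask, y));  /* ANDNPS: y where the mask is clear */
}

/* Clamp negative values to zero in place, four floats per iteration.
   Assumes n is a multiple of 4 (no scalar tail loop). */
static void clamp_negatives(float *data, size_t n)
{
    assert(n % 4 == 0);
    const __m128 zero = _mm_setzero_ps();
    for (size_t i = 0; i < n; i += 4) {
        __m128 v = _mm_loadu_ps(data + i);
        _mm_storeu_ps(data + i, select_lt(v, zero, zero, v));  /* v < 0 ? 0 : v */
    }
}
```

    MAXPS could do this particular clamp in a single instruction; the mask form is shown because it generalizes to any per-lane choice between two values.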

  • SIMD

    SSE Cacheability Control

    Data referenced by a program can have temporal (data will be used again) or spatial (data will be in adjacent locations, such as the same cache line) locality, but some multimedia data types are referenced once and not reused in the immediate future (called non-temporal data). Thus, non-temporal data should…
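    A minimal sketch of cacheability control, assuming an x86 target and the <xmmintrin.h> intrinsics: MOVNTPS (_mm_stream_ps) writes data that will not be read again soon while minimizing cache pollution, PREFETCHNTA (_mm_prefetch with _MM_HINT_NTA) fetches source data with a non-temporal hint, and SFENCE (_mm_sfence) orders the weakly ordered streaming stores before any later stores. The function names and the prefetch distance of 16 floats (64 bytes) are illustrative assumptions, not tuned values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Fill an aligned buffer with MOVNTPS non-temporal stores.
   dst must be 16-byte aligned and n a multiple of 4 in this sketch. */
static void fill_nontemporal(float *dst, size_t n, float value)
{
    assert(((uintptr_t)dst & 15) == 0 && n % 4 == 0);
    const __m128 v = _mm_set1_ps(value);
    for (size_t i = 0; i < n; i += 4)
        _mm_stream_ps(dst + i, v);  /* MOVNTPS */
    _mm_sfence();                   /* SFENCE: order the streaming stores before later stores */
}

/* Copy src to an aligned dst, prefetching the source with a non-temporal hint
   and writing the destination with streaming stores. */
static void copy_nontemporal(float *dst, const float *src, size_t n)
{
    assert(((uintptr_t)dst & 15) == 0 && n % 4 == 0);
    for (size_t i = 0; i < n; i += 4) {
        if (i + 16 < n)  /* assumed prefetch distance: 16 floats = 64 bytes */
            _mm_prefetch((const char *)(src + i + 16), _MM_HINT_NTA);  /* PREFETCHNTA */
        _mm_stream_ps(dst + i, _mm_loadu_ps(src + i));                  /* MOVNTPS */
    }
    _mm_sfence();
}
```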

  • SIMD

    SSE2 64-bit FP instructions

    In the introduction we have outlined the applications that require 64-bit precision, scientific simulations and CAD/CAM being notable examples. However, the transition from normal scalar code to 64-bit floating-point SSE2 code is complex and it may require some major design changes. A more conservative approach would be moving to scalar…
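    The difference between scalar and packed SSE2 double-precision code can be sketched with two versions of a dot product, assuming the <emmintrin.h> intrinsics on an x86 target. dot_scalar issues MULSD/ADDSD on one double at a time, while dot_packed uses MULPD/ADDPD on two; both function names are hypothetical, and dot_packed assumes an even element count.

```c
#include <assert.h>
#include <stddef.h>
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Scalar SSE2: one double per instruction (MULSD / ADDSD). */
static double dot_scalar(const double *a, const double *b, size_t n)
{
    __m128d acc = _mm_setzero_pd();
    for (size_t i = 0; i < n; ++i)
        acc = _mm_add_sd(acc, _mm_mul_sd(_mm_load_sd(a + i), _mm_load_sd(b + i)));
    return _mm_cvtsd_f64(acc);  /* lowest lane holds the sum */
}

/* Packed SSE2: two doubles per instruction (MULPD / ADDPD); n must be even. */
static double dot_packed(const double *a, const double *b, size_t n)
{
    assert(n % 2 == 0);
    __m128d acc = _mm_setzero_pd();
    for (size_t i = 0; i < n; i += 2)
        acc = _mm_add_pd(acc, _mm_mul_pd(_mm_loadu_pd(a + i), _mm_loadu_pd(b + i)));
    double lanes[2];
    _mm_storeu_pd(lanes, acc);
    return lanes[0] + lanes[1];  /* combine the two partial sums */
}
```

    Because dot_packed keeps two partial sums and adds them at the end, it can round differently from dot_scalar for general inputs; for small integers such as a = {1, 2, 3, 4} and b = {5, 6, 7, 8}, both return exactly 70.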

  • SIMD

    SSE2 preview

    The forthcoming Intel Pentium 4 processor (code-named Willamette) will feature a new set of SIMD instructions that improve the capabilities of both the MMX and SSE instruction sets. The key benefits of SSE2 are that MMX-style integer instructions can operate on 128-bit data blocks in the XMM registers, and that SSE instructions now support 64-bit…
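    The wider integer registers can be sketched with the SSE2 form of PADDSW (saturating add of signed 16-bit values), which processes eight values per 128-bit XMM register where the MMX form handles four per 64-bit register. mix_saturate is a hypothetical helper that assumes <emmintrin.h>, an x86 target and a sample count that is a multiple of eight.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Mix two buffers of signed 16-bit samples with saturation,
   eight samples per PADDSW on a 128-bit XMM register. */
static void mix_saturate(int16_t *dst, const int16_t *a, const int16_t *b, size_t n)
{
    assert(n % 8 == 0);
    for (size_t i = 0; i < n; i += 8) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_adds_epi16(va, vb));  /* PADDSW */
    }
}
```

    For example, 30000 + 10000 saturates to 32767 and -30000 + -10000 to -32768 instead of wrapping around.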

  • SIMD

    MMX Performance on Intel Pentium 4

    The recent arrival of the Intel Pentium 4 processor has generated the usual flurry of benchmarks and comments, most of them emphasizing that current software does not fully exploit the power of this new architecture. However, until the Pentium 4…

  • SIMD

    MMX / iSSE latency

    The following table summarizes the latencies, in clock cycles, of MMX/iSSE instructions on the Intel Pentium III and Pentium 4 processors, and on the AMD Athlon processor:

    Instruction     Pentium III   Pentium 4   AMD Athlon
    MOVD mm,r32     1             2           3
    MOVD r32,mm     1             5           5
    MOVQ mm,mm      1             6           2
    PACKSSWB /…

  • SIMD

    Map of Instruction sets / CPU

    The following table lists the instruction sets supported by each processor, with columns for MMX, Extended MMX, SSE, SSE2 and 3DNow! and rows for the Intel Pentium, Intel Pentium MMX, Intel Pentium II, Intel Celeron, Intel Pentium III, …
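    Support for these instruction sets is normally detected at run time with the CPUID instruction. A minimal sketch for GCC or Clang, using the __get_cpuid helper from <cpuid.h>: leaf 1 reports MMX, SSE and SSE2 in EDX bits 23, 25 and 26, and AMD's extended leaf 0x80000001 reports its MMX extensions and 3DNow! in EDX bits 22 and 31. Treating the table's "Extended MMX" column as that AMD flag is an assumption; on Intel processors the corresponding integer instructions arrive with SSE. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <cpuid.h>  /* GCC/Clang only: __get_cpuid */

struct simd_support { int mmx, ext_mmx, sse, sse2, amd_3dnow; };

/* Return one EDX feature bit of a CPUID leaf, or 0 if the leaf is not supported. */
static int cpuid_edx_bit(unsigned leaf, unsigned bit)
{
    unsigned eax, ebx, ecx, edx;
    assert(bit < 32);
    if (!__get_cpuid(leaf, &eax, &ebx, &ecx, &edx))
        return 0;  /* leaf above the maximum the CPU reports */
    return (int)((edx >> bit) & 1u);
}

static struct simd_support detect_simd(void)
{
    struct simd_support s;
    s.mmx       = cpuid_edx_bit(1, 23);
    s.sse       = cpuid_edx_bit(1, 25);
    s.sse2      = cpuid_edx_bit(1, 26);
    s.ext_mmx   = cpuid_edx_bit(0x80000001u, 22);  /* AMD MMX extensions */
    s.amd_3dnow = cpuid_edx_bit(0x80000001u, 31);  /* AMD 3DNow! */
    return s;
}
```

    On any x86-64 processor the first three flags are always set, since SSE2 is part of the 64-bit baseline.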

  • SIMD

    Intel Pentium III

    The Intel P6 core, introduced with the Pentium Pro processor and used in all current Intel processors, features a RISC-like microarchitecture and an out-of-order execution unit, representing a radical shift from previous designs. The P6’s new dynamic execution microarchitecture removes the constraint of linear instruction sequencing between the traditional fetch…