Data referenced by a program can have temporal (data will be used again) or spatial (data will be in adjacent locations, such as the same cache line) locality, but some multimedia data types are referenced once and not reused in…
In the introduction we outlined the applications that require 64-bit precision, with scientific simulations and CAD/CAM as notable examples. However, the transition from ordinary scalar code to 64-bit floating-point SSE2 code is complex and may require some major design…
The forthcoming Intel Pentium 4 processor (code-named Willamette) will feature a new set of SIMD instructions that improve the capabilities of both the MMX and SSE instruction sets. The key benefits of SSE2 are that MMX instructions can work on…
The recent arrival of the Intel Pentium 4 processor has generated the usual flurry of benchmarks and comments, most of them emphasizing that current software does not fully exploit the power of this new architecture…
The following table summarizes the latencies of MMX/iSSE instructions on the Intel Pentium III and Pentium 4 processors, and on the AMD Athlon processor:

Instruction    Pentium III    Pentium 4    AMD Athlon
MOVD mm,r32    1              2            3
MOVD r32,mm…
The following table lists the Instruction Sets supported by each processor.

Processor    MMX    Extended MMX    SSE    SSE2    3DNow!
Intel Pentium
Intel Pentium MMX
Intel Pentium II …
The Intel P6 core, introduced with the Pentium Pro processor and used in all current Intel processors, features a RISC-like microarchitecture and an out-of-order execution unit, representing a radical shift from previous designs. The P6’s new dynamic execution microarchitecture removes…
The MMX instructions are supported by every x86 processor introduced in the market after the venerable Intel Pentium MMX, so it should be fairly safe to assume that the processor that your code is running on has MMX instructions. But…
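To illustrate what a runtime check for MMX can look like, here is a minimal sketch. It assumes GCC or Clang on an x86 target, where the `<cpuid.h>` header provides `__get_cpuid`. The helper names `edx_has` and `cpu_has_mmx` are hypothetical and chosen only for this example. CPUID leaf 1 reports MMX in EDX bit 23, SSE in bit 25, and SSE2 in bit 26.

```c
#include <cpuid.h>   /* GCC/Clang header providing __get_cpuid (assumed toolchain) */

/* Feature flags reported in EDX by CPUID leaf 1. */
#define CPUID_EDX_MMX  (1u << 23)
#define CPUID_EDX_SSE  (1u << 25)
#define CPUID_EDX_SSE2 (1u << 26)

/* Hypothetical helper: returns 1 if every bit of `mask` is set in `edx`. */
int edx_has(unsigned int edx, unsigned int mask)
{
    return (edx & mask) == mask;
}

/* Hypothetical helper: returns 1 if the running CPU reports MMX, else 0. */
int cpu_has_mmx(void)
{
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid returns 0 when the requested leaf is not available. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;

    return edx_has(edx, CPUID_EDX_MMX);
}
```

A program would typically call `cpu_has_mmx()` once at startup and select an MMX or plain scalar code path based on the result. The same pattern applies to the SSE and SSE2 masks defined above.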
The Streaming SIMD Extensions enhance the Intel x86 architecture in four ways: 8 new 128-bit SIMD floating-point registers that can be directly addressed; 50 new instructions that work on packed floating-point data; 8 new instructions designed to control cacheability of…
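As a small illustration of the packed floating-point instructions mentioned above, the following sketch adds four pairs of single-precision values with one SIMD operation. It assumes a compiler that ships the `<xmmintrin.h>` SSE intrinsics header and a target with SSE enabled. The function name `add4` is hypothetical.

```c
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps */

/* Hypothetical helper: out[i] = a[i] + b[i] for i = 0..3, using one packed add. */
void add4(const float *a, const float *b, float *out)
{
    __m128 va  = _mm_loadu_ps(a);      /* load 4 floats, no alignment requirement */
    __m128 vb  = _mm_loadu_ps(b);
    __m128 sum = _mm_add_ps(va, vb);   /* compiles to the ADDPS instruction */
    _mm_storeu_ps(out, sum);           /* store 4 results */
}
```

The unaligned load and store variants are used here so the sketch works with any `float` array. When the data is known to be 16-byte aligned, `_mm_load_ps` and `_mm_store_ps` can be used instead.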