SSE2 64-bit FP instructions

In the introduction we outlined the applications that require 64-bit precision, with scientific simulations and CAD/CAM as notable examples. However, the transition from conventional scalar code to 64-bit floating-point SSE2 code is complex and may require some major design…
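To make the scalar-to-SSE2 transition concrete, here is a minimal sketch in C. It assumes a GCC/Clang-style compiler that provides the standard `<emmintrin.h>` SSE2 intrinsics header, and the function names `add_scalar` and `add_sse2` are hypothetical. Each 128-bit XMM register holds two doubles, so the SSE2 loop handles two elements per iteration and still needs a scalar remainder loop for odd lengths. That is one small example of why vectorizing existing code is more than a mechanical rewrite. When `__SSE2__` is not defined, the code falls back to the scalar path.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics: __m128d, _mm_add_pd, ... */
#endif

/* Hypothetical scalar reference: one 64-bit addition per iteration. */
void add_scalar(const double *a, const double *b, double *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

/* Hypothetical SSE2 version: one _mm_add_pd (ADDPD) performs
   two 64-bit additions on a 128-bit XMM register. */
void add_sse2(const double *a, const double *b, double *out, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    for (; i + 2 <= n; i += 2) {
        __m128d va = _mm_loadu_pd(a + i); /* unaligned load of a[i], a[i+1] */
        __m128d vb = _mm_loadu_pd(b + i);
        _mm_storeu_pd(out + i, _mm_add_pd(va, vb));
    }
#endif
    /* Remainder for odd n, or the whole array when SSE2 is unavailable. */
    for (; i < n; i++)
        out[i] = a[i] + b[i];
}
```

Both functions produce identical results; only the number of elements handled per instruction differs.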

SSE2 preview

The forthcoming Intel Pentium 4 processor (code-named Willamette) will feature a new set of SIMD instructions that improve the capabilities of both the MMX and SSE instruction sets. The key benefits of SSE2 are that MMX instructions can work on…
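SSE2 provides 128-bit versions of the MMX-style integer operations, so a single instruction processes twice as many elements as its 64-bit MMX counterpart. The sketch below shows this with a saturating add of eight signed 16-bit values (PADDSW on an XMM register, where MMX handles four). It assumes a compiler with `<emmintrin.h>`; the function name `adds_i16x8` is hypothetical, and a scalar fallback with the same saturation rules is used when `__SSE2__` is not defined.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 integer intrinsics on __m128i */
#endif

/* Hypothetical helper: saturating add of eight signed 16-bit values. */
void adds_i16x8(const int16_t a[8], const int16_t b[8], int16_t out[8])
{
#if defined(__SSE2__)
    __m128i va = _mm_loadu_si128((const __m128i *)a); /* 8 x int16 in one XMM */
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_adds_epi16(va, vb)); /* PADDSW */
#else
    /* Scalar fallback: clamp each sum to the int16_t range. */
    for (int i = 0; i < 8; i++) {
        int32_t s = (int32_t)a[i] + b[i];
        out[i] = (int16_t)(s > INT16_MAX ? INT16_MAX : s < INT16_MIN ? INT16_MIN : s);
    }
#endif
}
```

Saturation means an overflowing sum such as 32000 + 1000 is clamped to 32767 instead of wrapping around.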

MMX / iSSE latency

The following table summarizes the latencies (in clock cycles) of MMX/iSSE instructions on the Intel Pentium III and Pentium 4 processors, and on the AMD Athlon processor:

Instruction    Pentium III  Pentium 4  AMD Athlon
MOVD mm,r32    1            2          3
MOVD r32,mm…

Intel Pentium III

The Intel P6 core, introduced with the Pentium Pro processor and used in all current Intel processors, features a RISC-like microarchitecture and an out-of-order execution unit, representing a radical shift from previous designs. The P6's new dynamic execution microarchitecture removes…

Detecting MMX and SSE

The MMX instructions are supported by every x86 processor introduced since the venerable Intel Pentium MMX, so it should be fairly safe to assume that the processor your code runs on supports MMX. But…
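A common way to check for these extensions at run time is the CPUID instruction: leaf 1 reports feature flags in EDX, where bit 23 is MMX, bit 25 is SSE and bit 26 is SSE2. The sketch below assumes GCC or Clang on x86, using the compiler-provided `<cpuid.h>` and its `__get_cpuid` helper; the struct `simd_features` and function `detect_simd` are hypothetical names. On other compilers or architectures it simply reports no support.

```c
#include <stdbool.h>
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h> /* GCC/Clang wrapper around the CPUID instruction */
#define HAVE_CPUID_H 1
#endif

/* Hypothetical result type for the three feature bits of interest. */
struct simd_features { bool mmx, sse, sse2; };

/* Query CPUID leaf 1; feature flags live in EDX:
   bit 23 = MMX, bit 25 = SSE, bit 26 = SSE2. */
struct simd_features detect_simd(void)
{
    struct simd_features f = { false, false, false };
#ifdef HAVE_CPUID_H
    unsigned int eax, ebx, ecx, edx;
    /* __get_cpuid returns 0 if leaf 1 is not supported. */
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        f.mmx  = (edx >> 23) & 1u;
        f.sse  = (edx >> 25) & 1u;
        f.sse2 = (edx >> 26) & 1u;
    }
#endif
    return f;
}
```

With Microsoft Visual C++, the equivalent query goes through the `__cpuid` intrinsic declared in `<intrin.h>`.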

3DNow!

The latest trend in PC games is 3D graphics: during the past few years, almost every kind of game has turned to 3D graphics, greatly increasing the demand for processors with strong floating-point performance, because the front end of a…

SSE Introduction

SSE Packed

The Streaming SIMD Extensions enhance the Intel x86 architecture in four ways: 8 new 128-bit SIMD floating-point registers that can be directly addressed; 50 new instructions that work on packed floating-point data; 8 new instructions designed to control cacheability of…
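Most SSE arithmetic instructions come in a packed and a scalar form: ADDPS adds all four single-precision values held in a 128-bit XMM register, while ADDSS adds only the lowest one and passes the upper three through from the first operand. A minimal sketch of the difference, assuming a compiler with the standard `<xmmintrin.h>` intrinsics header; the function name `packed_vs_scalar` is hypothetical, and a plain C fallback reproduces the same semantics when `__SSE__` is not defined.

```c
#if defined(__SSE__)
#include <xmmintrin.h> /* SSE intrinsics: __m128, _mm_add_ps, _mm_add_ss */
#endif

/* Hypothetical demo of packed (ADDPS) vs scalar (ADDSS) addition. */
void packed_vs_scalar(const float a[4], const float b[4],
                      float packed[4], float scalar[4])
{
#if defined(__SSE__)
    __m128 va = _mm_loadu_ps(a); /* a[0] lands in the lowest lane */
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(packed, _mm_add_ps(va, vb)); /* all four lanes added */
    _mm_storeu_ps(scalar, _mm_add_ss(va, vb)); /* lane 0 added; lanes 1-3 copied from a */
#else
    for (int i = 0; i < 4; i++) {
        packed[i] = a[i] + b[i];
        scalar[i] = a[i];
    }
    scalar[0] = a[0] + b[0];
#endif
}
```

With inputs {1, 2, 3, 4} and {10, 20, 30, 40}, the packed result is {11, 22, 33, 44}, while the scalar result is {11, 2, 3, 4}.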