The MMX technology is designed to accelerate multimedia and communications applications by including new instructions and data types that allow applications to achieve a new level of performance. It exploits the parallelism inherent in many multimedia and communications algorithms, yet maintains full compatibility with existing operating systems and applications.
Applications as varied as graphics, MPEG video, music synthesis, speech compression and recognition, image processing, games and video conferencing share several fundamental characteristics:
- small integer data types (for example: 8-bit pixels, 16-bit audio samples)
- small, highly repetitive loops
- frequent multiplies and accumulates
- compute-intensive algorithms
- highly parallel operations
The MMX technology is designed as a set of general purpose integer instructions that can be applied to the needs of the wide diversity of multimedia and communications applications. The highlights of the technology are:
- Single Instruction, Multiple Data (SIMD) technique
- 57 new instructions
- Eight 64-bit wide MMX registers, named mm0 through mm7
- 4 new data types
MMX technology introduces four new data types: three packed data types (bytes, words and doublewords, whose elements are 8, 16 and 32 bits wide respectively) and a new 64-bit entity. Each element within the packed data types is an independent fixed-point integer. The architecture does not specify the position of the binary point within the elements; the developer decides where it lies and keeps track of it throughout the calculation. This places a burden on the developer, but it also gives considerable flexibility to choose and change the precision of fixed-point numbers during the course of the application, and so to fully control the dynamic range of values.
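As a minimal sketch of what tracking the binary point means in practice, the C fragment below multiplies two 16-bit elements that the programmer has chosen to interpret as Q8.8 (8 integer bits, 8 fractional bits). The Q8.8 convention and the helper names are assumptions made for illustration; the hardware only ever sees plain integers.

```c
#include <stdint.h>

/* Hypothetical convention: a 16-bit element with 8 fractional bits (Q8.8). */
#define FRAC_BITS 8

/* Convert a small integer to its Q8.8 representation. */
static int16_t q8_8_from_int(int v)
{
    return (int16_t)(v * (1 << FRAC_BITS));
}

/* Multiply two Q8.8 values. The raw 32-bit product carries 16 fractional
 * bits, so it is shifted right by 8 to return to Q8.8. The sketch is meant
 * for non-negative inputs: right-shifting a negative value is
 * implementation-defined in C. */
static int16_t q8_8_mul(int16_t a, int16_t b)
{
    int32_t wide = (int32_t)a * (int32_t)b;   /* 16 fractional bits here */
    return (int16_t)(wide >> FRAC_BITS);      /* back to 8 fractional bits */
}
```

With this convention, 3.0 is stored as 0x0300 and 1.5 as 0x0180, and their product comes back as 0x0480, that is 4.5. Choosing a different number of fractional bits changes only the shift, which is the flexibility the text refers to.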
The four MMX technology data types are:
- Packed byte -- eight bytes packed into one 64-bit quantity
- Packed word -- four 16-bit words packed into one 64-bit quantity
- Packed doubleword -- two 32-bit doublewords packed into one 64-bit quantity
- Quadword -- one 64-bit quantity
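One way to picture these data types is as different views of the same 64 bits. The sketch below is an illustration in portable C, not an Intel-defined interface: the type name `mmx64` and the accessor names are hypothetical, and element 0 is taken to be the least significant one.

```c
#include <stdint.h>

/* Hypothetical helper type: one 64-bit, MMX-sized quantity. */
typedef uint64_t mmx64;

/* Element i of the packed-byte view (i = 0..7, 0 = least significant). */
static uint8_t get_byte(mmx64 v, int i)
{
    return (uint8_t)(v >> (8 * i));
}

/* Element i of the packed-word view (i = 0..3). */
static uint16_t get_word(mmx64 v, int i)
{
    return (uint16_t)(v >> (16 * i));
}

/* Element i of the packed-doubleword view (i = 0..1). */
static uint32_t get_dword(mmx64 v, int i)
{
    return (uint32_t)(v >> (32 * i));
}
```

Using shifts rather than a union keeps the sketch independent of the host byte order.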
As an example, graphics pixel data are generally represented in 8-bit integers, or bytes. With MMX technology, eight of these pixels are packed together in a 64-bit quantity and moved into an MMX register; when an MMX instruction executes, it takes all eight of the pixel values at once from the MMX register, performs the arithmetic or logical operation on all eight elements in parallel, and writes the result into an MMX register. The degree of parallelism that can be achieved with the MMX technology depends on the size of data, ranging from 8 when using 8-bit data to 1, i.e. no parallelism, when using 64-bit data.
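The eight-pixels-at-once behaviour can be modelled in scalar C. The function below is a sketch of what a packed-byte add with wraparound (PADDB) computes; the function name is hypothetical, and the loop only imitates the result, since the hardware processes all eight lanes in a single instruction.

```c
#include <stdint.h>

/* Scalar model of a packed-byte add with wraparound (PADDB semantics).
 * Each of the eight byte lanes is added independently; a carry out of one
 * lane never reaches the next lane. */
static uint64_t paddb_model(uint64_t dest, uint64_t src)
{
    uint64_t result = 0;
    for (int i = 0; i < 8; i++) {
        uint8_t a = (uint8_t)(dest >> (8 * i));
        uint8_t b = (uint8_t)(src >> (8 * i));
        uint8_t sum = (uint8_t)(a + b);          /* wraps modulo 256 */
        result |= (uint64_t)sum << (8 * i);
    }
    return result;
}
```

For example, adding 1 to every lane of 0x01020304050607FF gives 0x0203040506070800: the lowest lane wraps from 0xFF to 0x00 without disturbing its neighbour.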
The MMX technology is integrated into the Intel x86 architecture in a way that maintains full compatibility with existing operating systems. This is achieved by aliasing the MMX registers and state onto the x86 floating-point registers and state. No new registers or state are added to support MMX technology, so the operating system uses its standard mechanisms for handling the floating-point state to save and restore the MMX state: the floating-point instructions that save and restore the floating-point state also handle the MMX state (for example, during context switching).
Aliasing the MMX state onto the floating-point state does not prevent an application from using both MMX routines and floating-point routines, but the developer cannot freely interleave MMX and floating-point instructions: an EMMS instruction must be executed at the end of an MMX code sequence, before any floating-point code runs.
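The aliasing and the role of EMMS can be pictured with a deliberately simplified model of the shared state. All type, field and function names below are hypothetical; the sketch ignores the top-of-stack pointer and every other part of the floating-point environment, and only tracks the behaviour Intel's architecture manuals describe for the tag word and the upper register bits.

```c
#include <stdint.h>

/* Simplified, hypothetical model of the state shared by x87 and MMX code. */
typedef struct {
    uint64_t significand[8]; /* low 64 bits of each FP register = mm0..mm7 */
    uint16_t exponent[8];    /* bits 64..79 of each 80-bit FP register */
    uint16_t tag_word;       /* 2 bits per register: 00 valid, 11 empty */
} fpu_state;

/* State after floating-point initialisation: every register tagged empty. */
static void fpu_init(fpu_state *s)
{
    for (int i = 0; i < 8; i++) {
        s->significand[i] = 0;
        s->exponent[i] = 0;
    }
    s->tag_word = 0xFFFF;
}

/* Model of an MMX instruction writing register mm<reg>. */
static void mmx_write(fpu_state *s, int reg, uint64_t value)
{
    s->significand[reg] = value;
    s->exponent[reg] = 0xFFFF;   /* upper 16 bits forced to all ones */
    s->tag_word = 0x0000;        /* all eight registers now tagged valid */
}

/* Model of EMMS: mark every register empty, leave the contents alone. */
static void emms(fpu_state *s)
{
    s->tag_word = 0xFFFF;
}

/* Floating-point code expects to start from an empty register stack. */
static int x87_stack_is_empty(const fpu_state *s)
{
    return s->tag_word == 0xFFFF;
}
```

In this model, a single MMX write leaves all eight registers tagged valid, so floating-point code that follows would find a full stack; calling `emms` restores the empty tags without touching the register contents.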
The MMX instructions cover several functional areas including:
- basic arithmetic operations such as add, subtract, multiply, arithmetic shift and multiply-add
- comparison operations
- conversion instructions to convert between the new data types: pack data together, and unpack from small to larger data types
- logical operations such as AND, AND NOT, OR, and XOR
- shift operations
- data transfer instructions for MMX register-to-register transfers, or 64-bit and 32-bit load/store to memory
- state management instruction to handle MMX to floating point transitions
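Of the arithmetic operations listed above, multiply-add is the one that maps most directly onto the multiply-and-accumulate loops typical of multimedia code. The sketch below models the result of PMADDWD in scalar C: four signed 16-bit products are formed and adjacent pairs are summed into two 32-bit results. The function and helper names are hypothetical.

```c
#include <stdint.h>

/* Read word lane i (0..3) of v as a signed 16-bit value. */
static int32_t lane_s16(uint64_t v, int i)
{
    int32_t w = (int32_t)((v >> (16 * i)) & 0xFFFF);
    return (w > 32767) ? w - 65536 : w;      /* manual sign extension */
}

/* Scalar model of PMADDWD: multiply the four signed words of dest by the
 * matching words of src, then add adjacent products into two doublewords. */
static uint64_t pmaddwd_model(uint64_t dest, uint64_t src)
{
    uint64_t result = 0;
    for (int pair = 0; pair < 2; pair++) {
        uint32_t acc = 0;                    /* unsigned: sum wraps mod 2^32 */
        for (int k = 0; k < 2; k++) {
            int i = 2 * pair + k;
            acc += (uint32_t)(lane_s16(dest, i) * lane_s16(src, i));
        }
        result |= (uint64_t)acc << (32 * pair);
    }
    return result;
}
```

With words (1, 2, 3, 4) in the destination and (5, 6, 7, 8) in the source, the low doubleword is 1*5 + 2*6 = 17 and the high doubleword is 3*7 + 4*8 = 53, so one call yields two partial sums of a dot product.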
Arithmetic, comparison and shift instructions are designed to support the different packed integer data types: these instructions have a different opcode for each supported data type. As a result, the MMX technology instructions are implemented with 57 opcodes.
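To see why the same operation needs one opcode per data type, consider the equality comparison, which exists as PCMPEQB, PCMPEQW and PCMPEQD. The scalar model below takes the element width as a parameter, something the real instructions encode in the opcode instead; the function name and the `elem_bits` parameter are assumptions of this sketch.

```c
#include <stdint.h>

/* Scalar model of PCMPEQB / PCMPEQW / PCMPEQD.
 * elem_bits must be 8, 16 or 32. Each lane of the result is all ones when
 * the corresponding lanes of dest and src are equal, all zeros otherwise. */
static uint64_t pcmpeq_model(uint64_t dest, uint64_t src, int elem_bits)
{
    if (elem_bits != 8 && elem_bits != 16 && elem_bits != 32)
        return 0;                            /* unsupported width in this sketch */

    uint64_t lane_mask = (1ULL << elem_bits) - 1;
    uint64_t result = 0;
    for (int shift = 0; shift < 64; shift += elem_bits) {
        uint64_t a = (dest >> shift) & lane_mask;
        uint64_t b = (src >> shift) & lane_mask;
        if (a == b)
            result |= lane_mask << shift;
    }
    return result;
}
```

The same two inputs give different masks depending on the element width: a single differing byte clears one byte lane, one word lane, or one doubleword lane of the result.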
All MMX instructions, except the EMMS instruction, reference and operate on two operands: the source and the destination operand. The first operand is the destination and the second operand is the source. The instruction overwrites the destination operand with the result. For example, a two-operand instruction
OPERATION DEST, SRC
would be decoded as:
DEST = DEST OPERATION SRC
A typical MMX instruction mnemonic is built from the following parts:
- Prefix: P for Packed
- Instruction operation: for example ADD, CMP, or XOR
- US for unsigned saturation (optional)
- S for signed saturation (optional)
- B, W, D, Q for the data type: packed byte, packed word, packed doubleword, or quadword.
As an example, PADDSB is an MMX instruction (P) that adds (ADD) the eight bytes (B) of the source operand to the eight bytes of the destination operand, applying signed saturation to each result (S).
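The effect of the S suffix can be sketched in scalar C. The model below reproduces what PADDSB computes, with each byte lane treated as a signed value and clamped to the range -128 to +127 instead of wrapping; the function and helper names are hypothetical.

```c
#include <stdint.h>

/* Read byte lane i (0..7) of v as a signed 8-bit value. */
static int lane_s8(uint64_t v, int i)
{
    int b = (int)((v >> (8 * i)) & 0xFF);
    return (b > 127) ? b - 256 : b;          /* manual sign extension */
}

/* Clamp to the signed 8-bit range. */
static int saturate_s8(int v)
{
    if (v > 127)  return 127;
    if (v < -128) return -128;
    return v;
}

/* Scalar model of PADDSB: per-lane signed add with signed saturation. */
static uint64_t paddsb_model(uint64_t dest, uint64_t src)
{
    uint64_t result = 0;
    for (int i = 0; i < 8; i++) {
        int sum = saturate_s8(lane_s8(dest, i) + lane_s8(src, i));
        result |= (uint64_t)(uint8_t)sum << (8 * i);
    }
    return result;
}
```

In this model 127 + 1 stays at 127 (0x7F) and -128 + (-1) stays at -128 (0x80), whereas a wraparound add would flip the sign in both cases.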
Instructions whose input and output data elements differ in size have two data-type suffixes: a conversion instruction converts from one data type to another, so it carries one suffix for the original data type and a second for the converted data type. For example, PACKSSWB packs words (W) into bytes (B) with signed saturation (SS).
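The two-suffix naming can be illustrated with PACKSSWB, which converts words (W) to bytes (B) with signed saturation. The scalar model below is a sketch with hypothetical function names: the four words of the destination become the low four bytes of the result and the four words of the source become the high four bytes.

```c
#include <stdint.h>

/* Read word lane i (0..3) of v as a signed 16-bit value. */
static int lane_s16(uint64_t v, int i)
{
    int w = (int)((v >> (16 * i)) & 0xFFFF);
    return (w > 32767) ? w - 65536 : w;      /* manual sign extension */
}

/* Clamp to the signed 8-bit range. */
static int saturate_s8(int v)
{
    if (v > 127)  return 127;
    if (v < -128) return -128;
    return v;
}

/* Scalar model of PACKSSWB: eight signed words in, eight signed bytes out. */
static uint64_t packsswb_model(uint64_t dest, uint64_t src)
{
    uint64_t result = 0;
    for (int i = 0; i < 4; i++) {
        uint8_t lo = (uint8_t)saturate_s8(lane_s16(dest, i));
        uint8_t hi = (uint8_t)saturate_s8(lane_s16(src, i));
        result |= (uint64_t)lo << (8 * i);         /* bytes 0..3 from dest */
        result |= (uint64_t)hi << (8 * (i + 4));   /* bytes 4..7 from src */
    }
    return result;
}
```

Words that do not fit in a signed byte are clamped: 256 becomes 127 and -256 becomes -128, while values such as 1 or -1 pass through unchanged.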
The next pages describe the full set of MMX instructions in depth, grouped by functional area. The box on the right side shows the syntax of each instruction; the following symbols are used to represent operands in the instruction statements:
- imm8: an immediate byte value, that is, a signed number between -128 and +127 inclusive.
- r/m32: a doubleword register or memory operand used for instructions whose operand-size attribute is 32 bits.
- mm/m32: indicates the lowest 32 bits of an MMX register or a 32-bit memory location.
- mm/m64: indicates a 64-bit MMX register or a 64-bit memory location.
As an example,
OP mm, mm/m64
means that the destination operand of the OP instruction is an MMX register, while the source operand can either be an MMX register or a 64-bit memory operand.
The Intel MMX Application Notes offer a broad overview of the benefits achievable by using MMX instructions. All performance data is taken from the Application Notes and generally refers to the Pentium MMX microarchitecture.
Before starting to code in assembly for MMX, you should take a look at Quexal, the visual development environment for MMX and ISSE coding that will make your life a lot easier!
Here is a list of the currently available Application Notes, grouped by topic. The column on the right shows the speed-up obtained by moving from scalar C code to MMX code.