Stefano Tommesani

MMX Primer

The MMX technology is designed to accelerate multimedia and communications applications by including new instructions and data types that allow applications to achieve a new level of performance. It exploits the parallelism inherent in many multimedia and communications algorithms, yet maintains full compatibility with existing operating systems and applications. 
A wide range of software applications, including graphics, MPEG video, music synthesis, speech compression and recognition, image processing, games, video conferencing and more, share several fundamental characteristics: 

  • small integer data types (for example: 8-bit pixels, 16-bit audio samples) 
  • small, highly repetitive loops 
  • frequent multiplies and accumulates 
  • compute-intensive algorithms 
  • highly parallel operations 

The MMX technology is designed as a set of general purpose integer instructions that can be applied to the needs of the wide diversity of multimedia and communications applications. The highlights of the technology are:

  • Single Instruction, Multiple Data (SIMD) technique 
  • 57 new instructions 
  • eight 64-bit wide MMX registers, named mm0 through mm7
  • 4 new data types 

MMX technology introduces four new data types: three packed data types (bytes, words and doublewords, whose elements are 8, 16 and 32 bits wide respectively) and a new 64-bit entity. Each element within the packed data types is an independent fixed-point integer. The architecture does not specify the position of the fixed point within the elements: the developer is responsible for tracking it within each element throughout the calculation. This adds a burden on the developer, but it also leaves a great deal of flexibility to choose and change the precision of fixed-point numbers during the course of the application, in order to fully control the dynamic range of values.
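
As a minimal sketch of what tracking the fixed point involves, the following Python fragment stores values with 8 fractional bits and shows that their product carries 16 fractional bits. Python is used here only to illustrate the arithmetic; this is not MMX code, and the helper names `to_fixed` and `from_fixed` are made up for this example.

```python
def to_fixed(x, frac_bits):
    """Encode a real number as an integer with frac_bits fractional bits."""
    return int(round(x * (1 << frac_bits)))


def from_fixed(n, frac_bits):
    """Decode an integer that carries frac_bits fractional bits."""
    return n / (1 << frac_bits)


a = to_fixed(1.5, 8)     # 384
b = to_fixed(2.25, 8)    # 576

# Multiplying two values with 8 fractional bits yields 16 fractional bits.
product = a * b
print(from_fixed(product, 16))   # 3.375

# Shifting right by 8 returns to the original format (here without loss).
rescaled = product >> 8
print(from_fixed(rescaled, 8))   # 3.375
```

The hardware sees only the integers 384 and 576; where the point sits, and when to shift, is entirely the developer's bookkeeping.
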
The four MMX technology data types are: 

  • Packed byte -- eight bytes packed into one 64-bit quantity 
  • Packed word -- four 16-bit words packed into one 64-bit quantity 
  • Packed doubleword -- two 32-bit doublewords packed into one 64-bit quantity 
  • Quadword -- one 64-bit quantity 
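
The packed layouts listed above can be sketched in Python with the standard `struct` module. This is only an illustration of how the same 64 bits are divided into elements; the function names are hypothetical, and element 0 is assumed to occupy the lowest bits, matching little-endian x86 memory order.

```python
import struct


def pack_bytes(values):
    """Pack eight unsigned 8-bit values into one 64-bit integer."""
    return int.from_bytes(struct.pack("<8B", *values), "little")


def pack_words(values):
    """Pack four unsigned 16-bit values into one 64-bit integer."""
    return int.from_bytes(struct.pack("<4H", *values), "little")


def pack_dwords(values):
    """Pack two unsigned 32-bit values into one 64-bit integer."""
    return int.from_bytes(struct.pack("<2I", *values), "little")


print(hex(pack_bytes([1, 2, 3, 4, 5, 6, 7, 8])))   # 0x807060504030201
print(hex(pack_words([1, 2, 3, 4])))               # 0x4000300020001
print(hex(pack_dwords([1, 2])))                    # 0x200000001
```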

[Figure: SIMD addition]
As an example, graphics pixel data is generally represented as 8-bit integers, or bytes. With MMX technology, eight of these pixels are packed together in a 64-bit quantity and moved into an MMX register; when an MMX instruction executes, it takes all eight pixel values at once from the MMX register, performs the arithmetic or logical operation on all eight elements in parallel, and writes the result into an MMX register. The degree of parallelism that can be achieved with MMX technology depends on the size of the data elements, ranging from 8 with 8-bit data to 1 (that is, no parallelism) with 64-bit data.
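
The packed addition described above can be emulated in a few lines of Python. This is a sketch of the semantics only: real MMX hardware processes all lanes in a single instruction, and `split_lanes`, `join_lanes` and `packed_add` are names invented for this example.

```python
def split_lanes(value, width):
    """Split a 64-bit integer into 64 // width unsigned lanes, lowest lane first."""
    mask = (1 << width) - 1
    return [(value >> shift) & mask for shift in range(0, 64, width)]


def join_lanes(lanes, width):
    """Rebuild a 64-bit integer from lanes, truncating each lane to width bits."""
    mask = (1 << width) - 1
    value = 0
    for i, lane in enumerate(lanes):
        value |= (lane & mask) << (i * width)
    return value


def packed_add(dest, src, width):
    """Add corresponding lanes with wraparound; carries never cross lanes."""
    sums = [a + b for a, b in zip(split_lanes(dest, width), split_lanes(src, width))]
    return join_lanes(sums, width)


pixels = join_lanes([10, 20, 30, 40, 50, 60, 70, 80], 8)
offset = join_lanes([5] * 8, 8)
print(split_lanes(packed_add(pixels, offset, 8), 8))
# [15, 25, 35, 45, 55, 65, 75, 85]

for width in (8, 16, 32, 64):
    print(width, "bit elements ->", 64 // width, "operations per instruction")
```
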
[Figure: Aliasing of MMX over FP]
The MMX technology is integrated into the Intel x86 architecture in a way that maintains full compatibility with existing operating systems. This is achieved by aliasing the MMX registers and state onto the x86 floating-point registers and state. No new registers or state are added to support MMX technology, so the operating system uses its standard mechanisms for handling the floating-point state to save and restore the MMX state as well: floating-point instructions that save and restore the floating-point state also handle the MMX state (for example, during context switching).
Aliasing the MMX state onto the floating-point state does not prevent an application from using both MMX routines and floating-point routines, but the developer cannot freely interleave MMX and floating-point instructions: an EMMS instruction must be executed at the end of each MMX code sequence, before any floating-point code runs.
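
To see why EMMS matters, here is a deliberately simplified Python model of the shared register file. It assumes the behavior described in Intel's architecture documentation rather than in this article: any MMX instruction other than EMMS marks all eight x87 tags as valid and resets the top-of-stack pointer, while EMMS marks them all as empty. The class and method names are hypothetical, and a real processor signals a floating-point stack fault rather than raising a Python exception.

```python
class X87MMXState:
    """Toy model of the register file shared by the x87 stack and mm0..mm7."""

    def __init__(self):
        self.regs = [0] * 8           # physical registers used by both units
        self.tags = ["empty"] * 8     # x87 tag word, one tag per register
        self.top = 0                  # x87 top-of-stack pointer

    def mmx_write(self, index, value):
        """Model an MMX instruction writing mm<index>: every tag becomes valid."""
        self.regs[index] = value & 0xFFFFFFFFFFFFFFFF
        self.tags = ["valid"] * 8
        self.top = 0

    def emms(self):
        """Model EMMS: every tag becomes empty, so x87 code sees an empty stack."""
        self.tags = ["empty"] * 8

    def fld(self, value):
        """Model an x87 push; pushing onto a non-empty register overflows the stack."""
        new_top = (self.top - 1) % 8
        if self.tags[new_top] != "empty":
            raise RuntimeError("x87 stack overflow: EMMS was not executed after MMX code")
        self.top = new_top
        self.regs[new_top] = value
        self.tags[new_top] = "valid"


state = X87MMXState()
state.mmx_write(0, 0x1122334455667788)
try:
    state.fld(1.0)                    # fails: the stack looks full after MMX code
except RuntimeError as err:
    print(err)
state.emms()
state.fld(1.0)                        # succeeds once EMMS has emptied the tags
print(state.tags.count("valid"))      # 1
```
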
 

 

2. Instruction set

The MMX instructions cover several functional areas including: 

  • basic arithmetic operations such as add, subtract, multiply, arithmetic shift and multiply-add 
  • comparison operations 
  • conversion instructions to convert between the new data types: pack data together, and unpack from small to larger data types 
  • logical operations such as AND, AND NOT, OR, and XOR 
  • shift operations 
  • data transfer instructions for MMX register-to-register transfers, or 64-bit and 32-bit load/store to memory 
  • state management instruction to handle MMX to floating point transitions
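
Of the operations listed above, multiply-add is the one most closely tied to the "multiply and accumulate" pattern. The sketch below emulates in Python what the PMADDWD instruction computes: four signed 16-bit products, summed in adjacent pairs into two signed 32-bit results. It works on plain lists rather than 64-bit registers for readability, and the function names are chosen for this example only.

```python
def to_signed(value, width):
    """Interpret the low width bits of value as a two's complement number."""
    sign = 1 << (width - 1)
    return (value & (sign - 1)) - (value & sign)


def pmaddwd(dest_words, src_words):
    """Multiply four signed 16-bit pairs and add adjacent products into two 32-bit sums."""
    products = [a * b for a, b in zip(dest_words, src_words)]
    return [to_signed(products[0] + products[1], 32),
            to_signed(products[2] + products[3], 32)]


def dot_product(xs, ys):
    """Dot product of 16-bit vectors whose length is a multiple of four."""
    total = 0
    for i in range(0, len(xs), 4):
        low, high = pmaddwd(xs[i:i + 4], ys[i:i + 4])
        total += low + high
    return total


print(pmaddwd([1, 2, 3, 4], [8, 7, 6, 5]))                             # [22, 38]
print(dot_product([1, 2, 3, 4, 5, 6, 7, 8], [8, 7, 6, 5, 4, 3, 2, 1]))  # 120
```

Each emulated call stands for one instruction that performs four multiplications and two additions.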

Arithmetic, comparison and shift instructions are designed to support the different packed integer data types: these instructions have a different opcode for each supported data type. As a result, the MMX technology instructions are implemented with 57 opcodes.
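
The need for one opcode per data type becomes clear when the same 64-bit operands are compared at different element widths. The following Python sketch emulates the compare-for-equality instructions (PCMPEQB, PCMPEQW, PCMPEQD), which set every bit of an element to 1 when the two elements match and to 0 otherwise; the single parameterized function is a convenience of this example, not how the hardware is organized.

```python
def packed_compare_equal(dest, src, width):
    """Return a 64-bit mask with all-ones lanes where dest and src lanes match."""
    mask = (1 << width) - 1
    result = 0
    for shift in range(0, 64, width):
        if ((dest >> shift) & mask) == ((src >> shift) & mask):
            result |= mask << shift
    return result


a = 0x1111222233334444
b = 0x1111AAAA33FF4444

print(f"{packed_compare_equal(a, b, 8):016X}")    # FFFF0000FF00FFFF  (as PCMPEQB)
print(f"{packed_compare_equal(a, b, 16):016X}")   # FFFF00000000FFFF  (as PCMPEQW)
print(f"{packed_compare_equal(a, b, 32):016X}")   # 0000000000000000  (as PCMPEQD)
```

The operands never change, yet each width gives a different result, so the element size has to be part of the instruction itself.
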
All MMX instructions, except the EMMS instruction, reference and operate on two operands: the source and the destination operand. The first operand is the destination and the second operand is the source. The instruction overwrites the destination operand with the result. For example, a two-operand instruction 

OPERATION DEST, SRC

is interpreted as:

DEST = DEST OPERATION SRC
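
Because the destination is overwritten, a value that is still needed must be copied first. The Python sketch below models this with a dictionary standing in for the register file; `movq` and `pxor` mimic the MMX instructions of the same names, but the code is only an illustration of the two-operand convention.

```python
# A dictionary stands in for the eight MMX registers.
regs = {f"mm{i}": 0 for i in range(8)}


def movq(dest, src):
    """Copy the 64-bit contents of src into dest."""
    regs[dest] = regs[src]


def pxor(dest, src):
    """DEST = DEST XOR SRC: the previous contents of dest are lost."""
    regs[dest] = regs[dest] ^ regs[src]


regs["mm0"] = 0x00FF00FF00FF00FF
regs["mm1"] = 0x0F0F0F0F0F0F0F0F

movq("mm2", "mm0")    # keep a copy, because PXOR overwrites its destination
pxor("mm0", "mm1")

print(f"{regs['mm0']:016X}")   # 0FF00FF00FF00FF0  (result)
print(f"{regs['mm1']:016X}")   # 0F0F0F0F0F0F0F0F  (source is unchanged)
print(f"{regs['mm2']:016X}")   # 00FF00FF00FF00FF  (saved original)
```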

A typical MMX instruction has this syntax: 

  • Prefix: P for Packed 
  • Instruction operation: for example - ADD, CMP, or XOR 
  • Suffix
    • US for Unsigned Saturation 
    • S for Signed saturation 
    • B, W, D, Q for the data type: packed byte, packed word, packed doubleword, or quadword.

As an example, PADDSB is an MMX instruction that adds (ADD) the eight packed (P) bytes (B) of the source operand to those of the destination operand and saturates each signed result (S).
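
The difference between the suffixes is easiest to see side by side. This Python sketch emulates the byte additions PADDB (wraparound), PADDUSB (unsigned saturation) and PADDSB (signed saturation) on lists of eight values; it illustrates the arithmetic only and is not MMX code.

```python
def paddb(dest, src):
    """Wraparound add of unsigned bytes: the carry out of each byte is discarded."""
    return [(a + b) & 0xFF for a, b in zip(dest, src)]


def paddusb(dest, src):
    """Unsigned saturating add: results above 255 are clamped to 255."""
    return [min(a + b, 255) for a, b in zip(dest, src)]


def paddsb(dest, src):
    """Signed saturating add: results are clamped to the range -128..127."""
    return [max(-128, min(a + b, 127)) for a, b in zip(dest, src)]


pixels = [200, 250, 10, 128, 255, 0, 100, 60]
boost = [100, 10, 10, 128, 1, 0, 100, 60]
print(paddb(pixels, boost))     # [44, 4, 20, 0, 0, 0, 200, 120]
print(paddusb(pixels, boost))   # [255, 255, 20, 255, 255, 0, 200, 120]

samples = [100, -100, 127, -128, 1, 0, 50, -50]
deltas = [100, -100, 1, -1, 1, 0, 50, -50]
print(paddsb(samples, deltas))  # [127, -128, 127, -128, 2, 0, 100, -100]
```

With wraparound a bright pixel turns dark when it overflows; with saturation it stays at the maximum, which is usually what image and audio code wants.
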
Instructions whose input and output data elements differ have two data-type suffixes: for example, a conversion instruction converts from one data type to another, so it has one suffix for the original data type and a second for the converted data type.
The next pages describe in depth the full set of MMX instructions, grouped by functional area. The box on the right side represents the syntax of each instruction; the following symbols represent operands in the instruction statements: 
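
One instruction with two data-type suffixes is PACKSSWB, which packs words (W) into bytes (B) with signed saturation (SS). The Python sketch below emulates it on lists: the four words of the destination fill the low four bytes of the result and the four words of the source fill the high four bytes. The helper name `saturate_signed_byte` is made up for this example.

```python
def saturate_signed_byte(value):
    """Clamp a signed value to the range of a signed byte, -128..127."""
    return max(-128, min(value, 127))


def packsswb(dest_words, src_words):
    """Pack four + four signed 16-bit words into eight signed bytes, saturating each."""
    return [saturate_signed_byte(w) for w in list(dest_words) + list(src_words)]


print(packsswb([1, -1, 300, -300], [127, 128, -128, -129]))
# [1, -1, 127, -128, 127, 127, -128, -128]
```
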

  • imm8: an immediate byte value, a signed number between -128 and +127 inclusive.
  • r/m32: a doubleword register or memory operand used for instructions whose operand-size attribute is 32 bits. 
  • mm/m32: indicates the lowest 32 bits of an MMX register or a 32-bit memory location.
  • mm/m64: indicates a 64-bit MMX register or a 64-bit memory location.

As an example, 
OP mm, mm/m64
means that the destination operand of the OP instruction is an MMX register, while the source operand can either be an MMX register or a 64-bit memory operand.

 

3. Examples and benchmarks

The Intel MMX Application Notes offer a wide overview of the benefits achievable by using MMX instructions. All performance data was extracted from Application Notes, and it generally refers to the Pentium MMX microarchitecture.
Before starting to code in assembly for MMX, you should take a look at Quexal, the visual development environment for MMX and ISSE coding that will make your life a lot easier!
Here is a list of the currently available Application Notes, grouped by topic. The column on the right shows the speed-up obtained by moving from scalar C code to MMX code; a blank entry means that no speed-up figure is listed.

    Title                                       Speed-up

    Audio
      Audio Echo Effects                        5.9x
      MPEG1 Audio Kernels
      G.728 Code Book Search                    2.7x
      Levinson-Durbin Filter
      Schur-Weiner Filter

    Communications
      Passband Echo Canceller
      Baseband Echo Canceller
      1/3 T Equalizer
      2/3 T Spaced Equalizer

    DSP Kernels
      Efficient Vector/Matrix Multiply Routine  14.6x
      Matrix Transpose                          2x
      Real 16-bit FFT
      Dot Product - 16x16 -> 32                 5x
      Real FIR - 16 bit                         5x
      Vector Arithmetic and Logic Operations    6x
      High Precision Multiply
      Data Alignment

    Graphics (2D)
      Fractals with MMX Technology              1.5x
      Sprite Overlay

    Graphics (3D)
      Advanced Procedural Texturing             10x
      AGP and 3D Graphics Software
      MMX Technology for 3D Rendering
      3D Bilinear Texture Mapping               7x
      Gouraud Shading
      3D Transform                              3.1x

    Image Processing
      YUV12 to RGB Color Conversion
      2X 8-bit Image Scaling                    13.5x
      Bilinear Interpolation                    3.9x
      Median Filter                             3.8x
      Row Filter - 8 bit
      Column Filter
      Alpha Blending                            8x
      24 to 16 bit Conversion
      RGB -> YUV                                > 10x

    Speech Recognition
      Viterbi Decoding                          2x
      L1 Distance Measure                       3.3x
      L2 Norm Distance Measure                  7.3x

    Video
      IDCT 2D 8x8                               3.5x
      Motion Compensation
      Absolute Difference                       5x
      Haar Transform - 2x2                      2.2x
      Get Bits                                  2.4x
      Video Loop Filter                         1.9x

     

Saturday, 24 April 2010
