Stefano Tommesani


MMX Examples

This section describes example uses of the MMX instruction set to implement basic coding structures.

Conditional Select

Operating on multiple data operands with a single instruction raises an interesting issue: what happens when a computation should be performed only on the operands that pass some conditional check? For example, an absolute value calculation takes the 2’s complement of a number only if that number is negative:

for i = 1 to 100
     if a[i] < 0 
       then b[i] = - a[i] 
       else b[i] = a[i]

Several approaches are possible, and some are simpler than others. A branch-based approach does not work well for two reasons: it incurs the inherent branch misprediction penalty, and it requires converting the packed data types to scalars so that each element can be tested separately. Direct conditional execution support does not fit the x86 Intel Architecture well either, since it requires three independent operands (source, source/destination, and predicate vector), whereas MMX instructions take two.
The MMX technology adopts a simpler design: conditional execution is converted into conditional assignment. MMX compare operations produce a bit mask as wide as each operand element: for example, a compare on packed byte operands produces byte-wide masks, all 1’s where the comparison is true and all 0’s where it is false. These masks can then be combined with logical operations to achieve conditional assignment. Consider the following example:

If condition is true
    then Ra := Rb 
    else Ra := Rc

Assuming that register Rx contains all 1’s if the condition is true and all 0’s if the condition is false, Ra can be computed with the following logical expression:

Ra = (Rb AND Rx) OR (Rc ANDNOT Rx)
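
The expression above can be sketched in portable C on a 64-bit word, the width of one MMX register. This is only an illustrative model, not MMX code: the function name mask_select is hypothetical, and the comments name the MMX logical instructions that would play the same role.

```c
#include <stdint.h>

/* Hypothetical helper: bitwise select between rb and rc.
   rx must hold all 1s in every lane where the condition is true
   and all 0s in every lane where it is false. */
uint64_t mask_select(uint64_t rb, uint64_t rc, uint64_t rx)
{
    uint64_t taken     = rb & rx;    /* role of PAND:  keep rb where true   */
    uint64_t not_taken = rc & ~rx;   /* role of PANDN: keep rc where false  */
    return taken | not_taken;        /* role of POR:   merge the two halves */
}
```

With a mask of 0xFFFF0000FFFF0000, for instance, the first and third 16-bit lanes (counting from the most significant end) come from rb and the other two from rc.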

This approach works for operations with a register as the destination. Conditional assignment to memory can be implemented as a sequence of load, conditional assignment, and store. 
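As a rough sketch of the load, conditional assignment, and store sequence, the absolute value loop from the start of this section can be written without branches. The code below is scalar C that models one 16-bit lane per iteration; the function name abs_select is an assumption made for illustration, and an MMX version would process four such lanes per instruction.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: b[i] = |a[i]| without an if/else in the loop body. */
void abs_select(const int16_t *a, int16_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint16_t x    = (uint16_t)a[i];                       /* load                       */
        uint16_t mask = (uint16_t)(0u - (unsigned)(x >> 15)); /* 0xFFFF if negative, else 0 */
        uint16_t neg  = (uint16_t)(0u - x);                   /* 2's complement of x        */
        uint16_t r    = (uint16_t)((neg & mask) | (x & (uint16_t)~mask));
        b[i] = (int16_t)r;                                    /* store                      */
    }
}
```

As with any 2's complement absolute value, the input -32768 has no positive counterpart in 16 bits, so that one value is not handled meaningfully by this sketch.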
The Chroma Keying example demonstrates how conditional selection with the MMX instruction set removes branches while performing multiple selection operations in parallel. Text overlay on a picture or video background and sprite overlays in games are other operations that benefit from this technique.
Most people have seen the television weathercaster overlaid on the image of a weather map. In this example a blue screen is used to overlay the image of a weathercaster on a weather map.
PCMPEQ (packed compare for equality) is performed on the weathercaster and blue-screen images, yielding a bitmask that is all 1’s wherever the blue background shows and all 0’s on the weathercaster, so that it traces her outline.

This bitmask image is PANDNed (packed and not) with the weathercaster image, yielding the first intermediate image: now the weathercaster has no background behind her.

The same bitmask image is PANDed (packed and) with the weather map image, yielding the second intermediate image.

The two intermediate images are PORed (packed or) together, resulting in the final composite of the weathercaster over the weather map.
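
The four steps can be modeled in portable C as follows. This is a sketch under simplifying assumptions: pixels are single 8-bit values, the blue-screen color is passed as a single key value, and the function name chroma_key is hypothetical. Each loop iteration corresponds to one byte lane; MMX would handle eight such lanes per instruction.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: composite fg over bg. Wherever fg equals the key
   color the output takes the bg pixel, elsewhere it keeps the fg pixel. */
void chroma_key(const uint8_t *fg, const uint8_t *bg, uint8_t *out,
                size_t n, uint8_t key)
{
    for (size_t i = 0; i < n; i++) {
        /* compare for equality: 0xFF on the background, 0x00 on the subject */
        uint8_t mask    = (uint8_t)(0u - (unsigned)(fg[i] == key));
        uint8_t subject = (uint8_t)(~mask & fg[i]);  /* role of PANDN */
        uint8_t scenery = (uint8_t)(mask & bg[i]);   /* role of PAND  */
        out[i] = (uint8_t)(subject | scenery);       /* role of POR   */
    }
}
```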

 

Vector Dot Product

The vector dot product is one of the most basic algorithms used in signal processing of multimedia data such as images, audio, and video. The following example shows how the PMADD instruction (mnemonic PMADDWD) helps speed up algorithms that use vector dot products.
The PMADD instruction performs four multiplies and two additions at a time. It takes two packed operands of four signed 16-bit elements each, multiplies the corresponding elements to generate four 32-bit products, and then adds the two products on the left together for one result and the two products on the right together for the other, producing a packed result of two 32-bit elements. To complete a multiply-accumulate operation, the results are then added to another register that is used as the accumulator.
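The arithmetic of a single PMADD can be modeled with a few lines of scalar C. The function name pmadd_model is hypothetical, and the sketch does not model the one overflow case in which both products of a pair are -32768 times -32768.

```c
#include <stdint.h>

/* Hypothetical scalar model of one PMADD: a and b each hold four signed
   16-bit elements, out receives the two signed 32-bit pair sums. */
void pmadd_model(const int16_t a[4], const int16_t b[4], int32_t out[2])
{
    out[0] = (int32_t)a[0] * b[0] + (int32_t)a[1] * b[1];  /* first pair of products  */
    out[1] = (int32_t)a[2] * b[2] + (int32_t)a[3] * b[3];  /* second pair of products */
}
```

For a = {1, 2, 3, 4} and b = {5, 6, 7, 8} the two results are 1*5 + 2*6 = 17 and 3*7 + 4*8 = 53.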
Assuming that the precision supported by the PMADD instruction is sufficient, this dot-product example on eight-element vectors can be completed using eight MMX instructions: two PMADDs, two PADDs, two shifts (if needed to fix the precision after the multiply operation), and two memory moves to load one of the vectors (the other vector is loaded by the PMADD instruction, which can take one of its operands from memory).
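The data flow of the eight-element dot product can be sketched in scalar C as two multiply-add steps feeding a two-lane accumulator. The name dot8 and the exact mapping of lines to instructions are assumptions made for illustration; the optional precision shifts are left out, and products are assumed not to overflow 32 bits.

```c
#include <stdint.h>

/* Hypothetical helper: dot product of two eight-element vectors of
   signed 16-bit values, processed four elements at a time. */
int32_t dot8(const int16_t x[8], const int16_t y[8])
{
    int32_t acc[2] = {0, 0};                   /* models a two-lane 32-bit accumulator */
    for (int half = 0; half < 2; half++) {
        const int16_t *a = x + 4 * half;       /* one load of four elements            */
        const int16_t *b = y + 4 * half;       /* memory operand of the multiply-add   */
        int32_t lo = (int32_t)a[0] * b[0] + (int32_t)a[1] * b[1];
        int32_t hi = (int32_t)a[2] * b[2] + (int32_t)a[3] * b[3];
        acc[0] += lo;                          /* packed add into the accumulator      */
        acc[1] += hi;
    }
    return acc[0] + acc[1];                    /* combine the two lanes into one sum   */
}
```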

Comparing instruction counts for this operation shows that the MMX version needs only one third as many instructions as the version without MMX technology.

Saturday, 24 April 2010
