Stefano Tommesani


MMX Arithmetic

The MMX technology supports both saturating and wraparound modes. In wraparound mode, results that overflow or underflow are truncated and only the lower (least significant) bits of the result are returned. In saturation mode, results that overflow or underflow are clipped (saturated) to a data-range limit for the data type: a result that exceeds the range of the data type saturates to the maximum value of the range, while a result that falls below the range saturates to the minimum value of the range. This method of handling overflow and underflow is useful in many applications, such as color calculations.
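To make the two modes concrete, the following C sketch models them for a single unsigned byte. It is a scalar illustration of the arithmetic, not MMX code, and the function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: wraparound add, keeps only the low 8 bits of the true sum. */
uint8_t add_wrap_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);            /* operands promote to int; the cast drops bit 8 */
}

/* Hypothetical helper: saturating add, clips the true sum to [0, 255]. */
uint8_t add_sat_u8(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + b;     /* at most 510, so nothing is lost yet */
    return (uint8_t)(sum > UINT8_MAX ? UINT8_MAX : sum);
}
```

For an 8-bit color channel at 250, adding 10 yields 4 (almost black) with wraparound but 255 (full intensity) with saturation, which is why saturation suits color calculations.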
 

 

PADDB mm,mm/m64
PADDW mm,mm/m64
PADDD mm,mm/m64

The PADD (Packed Add) instructions add the data elements of the source operand to the data elements of the destination register and write the result to the destination register. If a result exceeds the data-range limit for the data type, it wraps around. PADD supports packed byte (PADDB), packed word (PADDW), and packed doubleword (PADDD) data types.

 

PADDB instruction with 64-bit operands:
DEST[7..0] ← DEST[7..0] + SRC[7..0];
* repeat add operation for 2nd through 7th byte *;
DEST[63..56] ← DEST[63..56] + SRC[63..56];

PADDW instruction with 64-bit operands:
DEST[15..0] ← DEST[15..0] + SRC[15..0];
* repeat add operation for 2nd and 3rd words *;
DEST[63..48] ← DEST[63..48] + SRC[63..48];

PADDD instruction with 64-bit operands:
DEST[31..0] ← DEST[31..0] + SRC[31..0];
DEST[63..32] ← DEST[63..32] + SRC[63..32];

PADDB __m64 _mm_add_pi8(__m64 m1, __m64 m2)

PADDW __m64 _mm_add_pi16(__m64 m1, __m64 m2)

PADDD __m64 _mm_add_pi32(__m64 m1, __m64 m2)
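A minimal scalar model of PADDB, PADDW and PADDD, assuming the 64-bit MMX register is represented as a uint64_t (the function name padd_model is hypothetical). Each lane wraps on its own, so a carry never spills into the neighboring element, unlike an ordinary 64-bit addition.

```c
#include <stdint.h>

/* Hypothetical model of PADDB/PADDW/PADDD: bits is the lane width (8, 16 or 32). */
uint64_t padd_model(uint64_t dest, uint64_t src, unsigned bits)
{
    const uint64_t mask = ((uint64_t)1 << bits) - 1;   /* all ones within one lane */
    uint64_t result = 0;
    for (unsigned shift = 0; shift < 64; shift += bits) {
        uint64_t d = (dest >> shift) & mask;
        uint64_t s = (src >> shift) & mask;
        /* Wrap within the lane: the carry out of the lane is discarded. */
        result |= ((d + s) & mask) << shift;
    }
    return result;
}
```

padd_model(0xFF, 0x01, 8) returns 0 because the byte wraps, while padd_model(0xFF, 0x01, 16) returns 0x100 because the word has room for the carry.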

 

PADDSB mm, mm/m64
PADDSW mm, mm/m64

The PADDS (Packed Add with Saturation) instructions add the packed signed data elements of the source operand to the packed signed data elements of the destination operand and saturate the result. PADDS supports packed byte (PADDSB) and packed word (PADDSW) data types.

 

PADDSB instruction with 64-bit operands:
DEST[7..0] ← SaturateToSignedByte(DEST[7..0] + SRC[7..0]);
* repeat add operation for 2nd through 7th bytes *;
DEST[63..56] ← SaturateToSignedByte(DEST[63..56] + SRC[63..56] );

PADDSW instruction with 64-bit operands:
DEST[15..0] ← SaturateToSignedWord(DEST[15..0] + SRC[15..0]);
* repeat add operation for 2nd and 3rd words *;
DEST[63..48] ← SaturateToSignedWord(DEST[63..48] + SRC[63..48]);

PADDSB __m64 _mm_adds_pi8(__m64 m1, __m64 m2)

PADDSW __m64 _mm_adds_pi16(__m64 m1, __m64 m2)
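The PADDSB pseudocode can be modeled in C as follows, assuming the eight signed bytes are held in an int8_t array with element 0 standing for bits 7..0. The sum is computed in int, so it cannot overflow before the clamp. The names are hypothetical stand-ins for the pseudocode's SaturateToSignedByte.

```c
#include <stdint.h>

/* Hypothetical helper: clamp a widened value to the signed byte range [-128, 127]. */
int8_t saturate_to_signed_byte(int value)
{
    if (value > INT8_MAX) return INT8_MAX;
    if (value < INT8_MIN) return INT8_MIN;
    return (int8_t)value;
}

/* Hypothetical scalar model of PADDSB: dest[i] = sat(dest[i] + src[i]) for 8 lanes. */
void paddsb_model(int8_t dest[8], const int8_t src[8])
{
    for (int i = 0; i < 8; i++)
        dest[i] = saturate_to_signed_byte(dest[i] + src[i]);  /* int sum, then clamp */
}
```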

 

PADDUSB mm, mm/m64
PADDUSW mm, mm/m64

The PADDUS (Packed Add Unsigned with Saturation) instructions add the packed unsigned data elements of the source operand to the packed unsigned data elements of the destination operand and saturate the results. PADDUS supports packed byte (PADDUSB) and packed word (PADDUSW) data types.

PADDUSB instruction with 64-bit operands:
DEST[7..0] ← SaturateToUnsignedByte(DEST[7..0] + SRC[7..0]);
* repeat add operation for 2nd through 7th bytes *;
DEST[63..56] ← SaturateToUnsignedByte(DEST[63..56] + SRC[63..56]);

PADDUSW instruction with 64-bit operands:
DEST[15..0] ← SaturateToUnsignedWord(DEST[15..0] + SRC[15..0]);
* repeat add operation for 2nd and 3rd words *;
DEST[63..48] ← SaturateToUnsignedWord(DEST[63..48] + SRC[63..48]);

PADDUSB __m64 _mm_adds_pu8(__m64 m1, __m64 m2)

PADDUSW __m64 _mm_adds_pu16(__m64 m1, __m64 m2)
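A corresponding sketch for PADDUSW, with the four words held in a uint16_t array (hypothetical names). Because both operands are unsigned, the addition can only overflow, never underflow, so only the upper limit has to be checked.

```c
#include <stdint.h>

/* Hypothetical helper: clamp a widened sum to the unsigned word range [0, 65535]. */
uint16_t saturate_to_unsigned_word(uint32_t value)
{
    return (uint16_t)(value > UINT16_MAX ? UINT16_MAX : value);
}

/* Hypothetical scalar model of PADDUSW on four unsigned words. */
void paddusw_model(uint16_t dest[4], const uint16_t src[4])
{
    for (int i = 0; i < 4; i++)
        dest[i] = saturate_to_unsigned_word((uint32_t)dest[i] + src[i]);
}
```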


 

PSUBB mm, mm/m64
PSUBW mm, mm/m64
PSUBD mm, mm/m64

The PSUB (Packed Subtract) instructions subtract the data elements of the source operand from the data elements of the destination operand. If a result overflows or underflows the data-range limit for the data type, it wraps around. PSUB supports packed byte (PSUBB), packed word (PSUBW), and packed doubleword (PSUBD) data types.

 

PSUBB instruction with 64-bit operands:
DEST[7..0] ← DEST[7..0] − SRC[7..0];
* repeat subtract operation for 2nd through 7th byte *;
DEST[63..56] ← DEST[63..56] − SRC[63..56];


PSUBW instruction with 64-bit operands:
DEST[15..0] ← DEST[15..0] − SRC[15..0];
* repeat subtract operation for 2nd and 3rd word *;
DEST[63..48] ← DEST[63..48] − SRC[63..48];


PSUBD instruction with 64-bit operands:
DEST[31..0] ← DEST[31..0] − SRC[31..0];
DEST[63..32] ← DEST[63..32] − SRC[63..32];

PSUBB __m64 _mm_sub_pi8(__m64 m1, __m64 m2)

PSUBW __m64 _mm_sub_pi16(__m64 m1, __m64 m2)

PSUBD __m64 _mm_sub_pi32(__m64 m1, __m64 m2)
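A scalar model of PSUBW, assuming the four words are held in a uint16_t array (hypothetical name). Wraparound subtraction produces the same bit pattern whether the words are interpreted as signed or unsigned, which is why a single instruction serves both cases.

```c
#include <stdint.h>

/* Hypothetical scalar model of PSUBW: dest[i] = dest[i] - src[i], modulo 2^16. */
void psubw_model(uint16_t dest[4], const uint16_t src[4])
{
    for (int i = 0; i < 4; i++)
        dest[i] = (uint16_t)(dest[i] - src[i]);  /* keep the low 16 bits: wraparound */
}
```

For example, 0 - 1 gives 0xFFFF (which is -1 when read as a signed word), and 0x8000 - 1 gives 0x7FFF.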

 

PSUBSB mm, mm/m64
PSUBSW mm, mm/m64

The PSUBS (Packed Subtract with Saturation) instructions subtract the signed data elements of the source operand from the signed data elements of the destination operand, saturate the results to the limits of a signed data element, and write them to the destination operand. PSUBS supports packed byte (PSUBSB) and packed word (PSUBSW) data types.

 

PSUBSB instruction with 64-bit operands:
DEST[7..0] ← SaturateToSignedByte(DEST[7..0] − SRC[7..0]);
* repeat subtract operation for 2nd through 7th bytes *;
DEST[63..56] ← SaturateToSignedByte(DEST[63..56] − SRC[63..56]);

PSUBSW instruction with 64-bit operands:
DEST[15..0] ← SaturateToSignedWord(DEST[15..0] − SRC[15..0]);
* repeat subtract operation for 2nd and 3rd words *;
DEST[63..48] ← SaturateToSignedWord(DEST[63..48] − SRC[63..48] );

PSUBSB __m64 _mm_subs_pi8(__m64 m1, __m64 m2)

PSUBSW __m64 _mm_subs_pi16(__m64 m1, __m64 m2)
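A sketch of PSUBSW with the four signed words held in an int16_t array: the difference is formed in 32 bits and then clamped, mirroring SaturateToSignedWord in the pseudocode. The names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: clamp a 32-bit value to the signed word range [-32768, 32767]. */
int16_t saturate_to_signed_word(int32_t value)
{
    if (value > INT16_MAX) return INT16_MAX;
    if (value < INT16_MIN) return INT16_MIN;
    return (int16_t)value;
}

/* Hypothetical scalar model of PSUBSW on four signed words. */
void psubsw_model(int16_t dest[4], const int16_t src[4])
{
    for (int i = 0; i < 4; i++)
        dest[i] = saturate_to_signed_word((int32_t)dest[i] - src[i]);
}
```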

 

PSUBUSB mm, mm/m64
PSUBUSW mm, mm/m64

The PSUBUS (Packed Subtract Unsigned with Saturation) instructions subtract the unsigned data elements of the source operand from the unsigned data elements of the destination operand, saturate the results to the limits of an unsigned data element, and write them to the destination operand. PSUBUS supports packed byte (PSUBUSB) and packed word (PSUBUSW) data types.

PSUBUSB instruction with 64-bit operands:
DEST[7..0] ← SaturateToUnsignedByte(DEST[7..0] − SRC[7..0]);
* repeat subtract operation for 2nd through 7th bytes *;
DEST[63..56] ← SaturateToUnsignedByte(DEST[63..56] − SRC[63..56]);

PSUBUSW instruction with 64-bit operands:
DEST[15..0] ← SaturateToUnsignedWord(DEST[15..0] − SRC[15..0]);
* repeat subtract operation for 2nd and 3rd words *;
DEST[63..48] ← SaturateToUnsignedWord(DEST[63..48] − SRC[63..48]);

PSUBUSB __m64 _mm_subs_pu8(__m64 m1, __m64 m2)

PSUBUSW __m64 _mm_subs_pu16(__m64 m1, __m64 m2)
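PSUBUSB can be modeled as follows (hypothetical name, eight bytes in a uint8_t array). A negative difference saturates to 0, and since the difference of two unsigned bytes never exceeds 255, the upper limit never comes into play.

```c
#include <stdint.h>

/* Hypothetical scalar model of PSUBUSB: dest[i] = max(dest[i] - src[i], 0). */
void psubusb_model(uint8_t dest[8], const uint8_t src[8])
{
    for (int i = 0; i < 8; i++)
        dest[i] = dest[i] > src[i] ? (uint8_t)(dest[i] - src[i]) : 0;  /* clamp at 0 */
}
```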

As an example of saturated arithmetic, consider the absolute difference of two arrays of bytes. MMX has no IF statements that act on individual elements, yet the following algorithm has to be implemented:

if (a > b)
 then c = a – b
 else c = b – a

This algorithm can be coded using saturated subtractions. Computing both a – b and b – a with unsigned saturation yields, for each byte, zero in one result and the desired absolute difference in the other. Since it is not known in advance which is which, the two results are ORed together; ORing with zero leaves the absolute difference unchanged:

c = (a – b) OR (b – a)

Assuming that the MMX registers MM0 and MM1 hold the source vectors, the following code computes the absolute difference and stores it in MM0 (MM1 is overwritten in the process):

MOVQ    MM2, MM0   ; make a copy of MM0
PSUBUSB MM0, MM1   ; compute difference one way
PSUBUSB MM1, MM2   ; compute difference the other way
POR     MM0, MM1   ; OR them together
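The four-instruction sequence can be checked with a scalar C model, one loop iteration per byte lane. The helper names are hypothetical, and the comments map each step to the instruction it mirrors; the ternary in sub_sat_u8 stands in for the saturation that PSUBUSB performs in hardware without branching.

```c
#include <stdint.h>

/* Hypothetical helper: PSUBUSB on one byte, max(x - y, 0). */
uint8_t sub_sat_u8(uint8_t x, uint8_t y)
{
    return x > y ? (uint8_t)(x - y) : 0;
}

/* Hypothetical model of MOVQ / PSUBUSB / PSUBUSB / POR over eight byte lanes. */
void abs_diff_u8x8(uint8_t c[8], const uint8_t a[8], const uint8_t b[8])
{
    for (int i = 0; i < 8; i++) {
        uint8_t a_minus_b = sub_sat_u8(a[i], b[i]);  /* PSUBUSB MM0, MM1 */
        uint8_t b_minus_a = sub_sat_u8(b[i], a[i]);  /* PSUBUSB MM1, MM2 */
        c[i] = (uint8_t)(a_minus_b | b_minus_a);     /* POR: one term is always 0 */
    }
}
```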
 
 

PMULHW mm, mm/m64
PMULLW mm, mm/m64

The PMULHW (Packed Multiply High) and PMULLW (Packed Multiply Low) instructions multiply the four signed words of the source and destination operands and write the high-order or low-order 16 bits of the 32-bit intermediate results to the destination operand.

 

PMULHW instruction with 64-bit operands:
TEMP0[31-0] ← DEST[15-0] * SRC[15-0]; * Signed multiplication *
TEMP1[31-0] ← DEST[31-16] * SRC[31-16];
TEMP2[31-0] ← DEST[47-32] * SRC[47-32];
TEMP3[31-0] ← DEST[63-48] * SRC[63-48];
DEST[15-0] ← TEMP0[31-16];
DEST[31-16] ← TEMP1[31-16];
DEST[47-32] ← TEMP2[31-16];
DEST[63-48] ← TEMP3[31-16];

PMULLW instruction with 64-bit operands:
TEMP0[31-0] ← DEST[15-0] * SRC[15-0]; * Signed multiplication *
TEMP1[31-0] ← DEST[31-16] * SRC[31-16];
TEMP2[31-0] ← DEST[47-32] * SRC[47-32];
TEMP3[31-0] ← DEST[63-48] * SRC[63-48];
DEST[15-0] ← TEMP0[15-0];
DEST[31-16] ← TEMP1[15-0];
DEST[47-32] ← TEMP2[15-0];
DEST[63-48] ← TEMP3[15-0];

PMULHW __m64 _mm_mulhi_pi16 (__m64 m1, __m64 m2)

PMULLW __m64 _mm_mullo_pi16(__m64 m1, __m64 m2)
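The following C sketch models one word lane of PMULHW and PMULLW (hypothetical names). Both start from the same 32-bit signed product and differ only in which half they keep, so the two results together carry the full product.

```c
#include <stdint.h>

/* Hypothetical model of one PMULHW lane: TEMP[31..16] of the signed product. */
int16_t mulhi_s16(int16_t a, int16_t b)
{
    int32_t product = (int32_t)a * b;                     /* fits: |a*b| <= 2^30 */
    return (int16_t)(uint16_t)((uint32_t)product >> 16);  /* logical shift of the bits, then reinterpret */
}

/* Hypothetical model of one PMULLW lane: TEMP[15..0] of the signed product. */
int16_t mullo_s16(int16_t a, int16_t b)
{
    int32_t product = (int32_t)a * b;
    return (int16_t)(uint16_t)(uint32_t)product;          /* keep the low 16 bits */
}
```

For 1000 × 1000 = 1000000 (0x000F4240), the high half is 15 and the low half is 16960 (0x4240); 15 × 65536 + 16960 gives back 1000000.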

 

PMADDWD mm, mm/m64

The PMADDWD (Packed Multiply and Add) instruction multiplies the four signed words of the destination operand by the four signed words of the source operand, producing four 32-bit products. The two high-order products are summed and stored in the upper doubleword of the destination operand, and the two low-order products are summed and stored in the lower doubleword of the destination operand.

 

PMADDWD instruction with 64-bit operands:
DEST[31..0] ← (DEST[15..0] * SRC[15..0]) + (DEST[31..16] * SRC[31..16]);
DEST[63..32] ← (DEST[47..32] * SRC[47..32]) + (DEST[63..48] * SRC[63..48]);

PMADDWD __m64 _mm_madd_pi16(__m64 m1, __m64 m2)
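A scalar model of PMADDWD, assuming the words are stored lowest first in int16_t arrays (hypothetical names). The sums are formed in 64 bits and then truncated to 32 bits: the only pair that does not fit is (-32768 × -32768) + (-32768 × -32768) = 2^31, which wraps to 0x80000000.

```c
#include <stdint.h>

/* Hypothetical scalar model of PMADDWD: out[0] = DEST[31..0], out[1] = DEST[63..32]. */
void pmaddwd_model(int32_t out[2], const int16_t dest[4], const int16_t src[4])
{
    int64_t lo = (int64_t)dest[0] * src[0] + (int64_t)dest[1] * src[1];
    int64_t hi = (int64_t)dest[2] * src[2] + (int64_t)dest[3] * src[3];
    out[0] = (int32_t)(uint32_t)lo;   /* keep the low 32 bits, as the hardware does */
    out[1] = (int32_t)(uint32_t)hi;
}
```

This makes PMADDWD a natural building block for 16-bit dot products: multiplying {1, 2, 3, 4} by {5, 6, 7, 8} gives the partial sums 17 and 53, which a final addition of the two doublewords combines into 70.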

Complex multiplication requires four multiplications and two additions (one of which is a subtraction), which leads naturally to the use of the PMADDWD instruction. To use this instruction, the data must be formatted as four 16-bit values, each holding a real or imaginary component; the constant vector is laid out as [Re –Im Im Re].
The following code fragment multiplies the complex number stored in the MMX register MM0 by the complex constant held in register MM1, using the layout explained above. The real component of the complex product is given by
Re(Data)*Re(Const) – Im(Data)*Im(Const) 
and the imaginary component of the complex product by 
Re(Data)*Im(Const) + Im(Data)*Re(Const).

PUNPCKLDQ MM0, MM0   ; convert the data to the [Re Im Re Im] format
PMADDWD   MM0, MM1   ; perform the complex multiply

Note that the output consists of two packed doublewords (the real and imaginary parts as 32-bit values), so a pack instruction such as PACKSSDW may be used to convert the result back to 16-bit words, matching the format of the input.
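The complex multiply can be verified with a scalar model of PUNPCKLDQ followed by PMADDWD. The article does not state the lane order of its vectors; this sketch assumes they are written lowest word first, so the real part lands in the low doubleword and the imaginary part in the high one (reading them highest word first also works, with the two parts swapping places). The function name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of PUNPCKLDQ MM0, MM0 + PMADDWD MM0, MM1.
   Assumes no component equals -32768, so -im_c fits in 16 bits and
   neither 32-bit sum overflows. Lanes are listed lowest word first. */
void complex_mul_model(int32_t *re_out, int32_t *im_out,
                       int16_t re, int16_t im, int16_t re_c, int16_t im_c)
{
    const int16_t data[4]  = { re, im, re, im };                   /* [Re Im Re Im] */
    const int16_t konst[4] = { re_c, (int16_t)-im_c, im_c, re_c }; /* [Re -Im Im Re] */
    /* PMADDWD: sum adjacent products into two doublewords */
    *re_out = (int32_t)data[0] * konst[0] + (int32_t)data[1] * konst[1];  /* Re*ReC - Im*ImC */
    *im_out = (int32_t)data[2] * konst[2] + (int32_t)data[3] * konst[3];  /* Re*ImC + Im*ReC */
}
```

For (3 + 4i) × (2 + 5i), the model returns -14 for the real part and 23 for the imaginary part.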
 
 

Saturday, 24 April 2010
