Stefano Tommesani


MMX Arithmetic


The MMX technology supports both saturating and wraparound modes. In wraparound mode, results that overflow or underflow are truncated and only the lower (least significant) bits of the result are returned. In saturation mode, results of an operation that overflow or underflow are clipped (saturated) to a data-range limit for the data type. The result of an operation that exceeds the range of a data type saturates to the maximum value of the range, while a result that is less than the range of a data type saturates to the minimum value of the range. This method of handling overflow and underflow is useful in many applications, such as color calculations.
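
The difference between the two modes can be sketched in plain C on a single unsigned byte. This is only a scalar illustration, not MMX code, and the function names add_wrap_u8 and add_sat_u8 are made up for the example:

```c
#include <stdint.h>

/* Wraparound: only the low 8 bits of the sum are kept. */
uint8_t add_wrap_u8(uint8_t a, uint8_t b) {
    return (uint8_t)(a + b);
}

/* Saturation: a sum above the unsigned byte range is clipped to 255. */
uint8_t add_sat_u8(uint8_t a, uint8_t b) {
    unsigned sum = (unsigned)a + (unsigned)b;
    return (uint8_t)(sum > 255u ? 255u : sum);
}
```

With a = 200 and b = 100, the wraparound version returns 44 (300 modulo 256), while the saturating version returns 255. For a color channel the second result is usually the desired one, since a bright pixel stays bright instead of turning dark.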
 

 

PADDB mm,mm/m64
PADDW mm,mm/m64
PADDD mm,mm/m64

The PADD (Packed Add) instructions add the data elements of the source operand to the data elements of the destination register, and the result is written to the destination register. If the result exceeds the data-range limit for the data type, it wraps around. The PADD instructions support packed byte (PADDB), packed word (PADDW), and packed doubleword (PADDD) data types.

 

PADDB instruction with 64-bit operands:
DEST[7..0] ← DEST[7..0] + SRC[7..0];
* repeat add operation for 2nd through 7th byte *;
DEST[63..56] ← DEST[63..56] + SRC[63..56];

PADDW instruction with 64-bit operands:
DEST[15..0] ← DEST[15..0] + SRC[15..0];
* repeat add operation for 2nd and 3rd word *;
DEST[63..48] ← DEST[63..48] + SRC[63..48];

PADDD instruction with 64-bit operands:
DEST[31..0] ← DEST[31..0] + SRC[31..0];
DEST[63..32] ← DEST[63..32] + SRC[63..32];
PADDB __m64 _mm_add_pi8(__m64 m1, __m64 m2)

PADDW __m64 _mm_add_pi16(__m64 m1, __m64 m2)

PADDD __m64 _mm_add_pi32(__m64 m1, __m64 m2)
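
As a rough illustration of the PADDB pseudocode above, the following portable C sketch models the instruction on a 64-bit integer that stands in for an MMX register. The function name paddb_model is hypothetical, and the loop is a scalar model rather than real MMX code:

```c
#include <stdint.h>

/* Hypothetical scalar model of PADDB: eight independent byte additions.
   The carry out of each byte lane is discarded, never passed to the next lane. */
uint64_t paddb_model(uint64_t dest, uint64_t src) {
    uint64_t result = 0;
    for (int lane = 0; lane < 8; lane++) {
        uint8_t d = (uint8_t)(dest >> (8 * lane));
        uint8_t s = (uint8_t)(src >> (8 * lane));
        uint8_t sum = (uint8_t)(d + s);          /* wraps around modulo 256 */
        result |= (uint64_t)sum << (8 * lane);
    }
    return result;
}
```

Adding 0x80FF and 0x8001 with this model gives 0, because both byte lanes wrap independently, whereas an ordinary 64-bit addition would give 0x10100.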

 

PADDSB mm, mm/m64
PADDSW mm, mm/m64
The PADDS (Packed Add with Saturation) instructions add the packed signed data elements of the source operand to the packed signed data elements of the destination operand and saturate the result. The PADDS instructions support packed byte (PADDSB) and packed word (PADDSW) data types.

 

PADDSB instruction with 64-bit operands:
DEST[7..0] ← SaturateToSignedByte(DEST[7..0] + SRC[7..0]);
* repeat add operation for 2nd through 7th bytes *;
DEST[63..56] ← SaturateToSignedByte(DEST[63..56] + SRC[63..56] );

PADDSW instruction with 64-bit operands:
DEST[15..0] ← SaturateToSignedWord(DEST[15..0] + SRC[15..0] );
* repeat add operation for 2nd and 3rd words *;
DEST[63..48] ← SaturateToSignedWord(DEST[63..48] + SRC[63..48] );

PADDSB __m64 _mm_adds_pi8(__m64 m1, __m64 m2)

PADDSW __m64 _mm_adds_pi16(__m64 m1, __m64 m2)
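
The SaturateToSignedWord step used by PADDSW can be sketched in portable C as follows. This is a scalar model under the assumption that the four word lanes are held in a small array; the names saturate_to_signed_word and paddsw_model are made up for the example:

```c
#include <stdint.h>

/* Clip a wider intermediate result to the signed word range [-32768, 32767]. */
int16_t saturate_to_signed_word(int32_t value) {
    if (value > INT16_MAX) return INT16_MAX;
    if (value < INT16_MIN) return INT16_MIN;
    return (int16_t)value;
}

/* Hypothetical scalar model of PADDSW: four signed word additions with saturation. */
void paddsw_model(int16_t dest[4], const int16_t src[4]) {
    for (int lane = 0; lane < 4; lane++) {
        int32_t sum = (int32_t)dest[lane] + (int32_t)src[lane];
        dest[lane] = saturate_to_signed_word(sum);
    }
}
```

For instance, 30000 + 10000 saturates to 32767 and -30000 + (-10000) saturates to -32768, while sums that fit in a word are returned unchanged.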

 

PADDUSB mm, mm/m64
PADDUSW mm, mm/m64

The PADDUS (Packed Add Unsigned with Saturation) instructions add the packed unsigned data elements of the source operand to the packed unsigned data elements of the destination operand and saturate the results. The PADDUS instructions support packed byte (PADDUSB) and packed word (PADDUSW) data types.

PADDUSB instruction with 64-bit operands:
DEST[7..0] ← SaturateToUnsignedByte(DEST[7..0] + SRC[7..0] );
* repeat add operation for 2nd through 7th bytes *;
DEST[63..56] ← SaturateToUnsignedByte(DEST[63..56] + SRC[63..56] );

PADDUSW instruction with 64-bit operands:
DEST[15..0] ← SaturateToUnsignedWord(DEST[15..0] + SRC[15..0] );
* repeat add operation for 2nd and 3rd words *;
DEST[63..48] ← SaturateToUnsignedWord(DEST[63..48] + SRC[63..48] );

PADDUSB __m64 _mm_adds_pu8(__m64 m1, __m64 m2)

PADDUSW __m64 _mm_adds_pu16(__m64 m1, __m64 m2)
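
A typical use of PADDUSB is brightening eight 8-bit color samples at once by adding a constant to each of them. The sketch below models that in portable C, with a 64-bit integer standing in for the MMX register; the name paddusb_model is hypothetical and the loop is a scalar model, not MMX code:

```c
#include <stdint.h>

/* Hypothetical scalar model of PADDUSB: eight unsigned byte additions,
   each clipped to 255 instead of wrapping around. */
uint64_t paddusb_model(uint64_t dest, uint64_t src) {
    uint64_t result = 0;
    for (int lane = 0; lane < 8; lane++) {
        unsigned d = (unsigned)((dest >> (8 * lane)) & 0xFFu);
        unsigned s = (unsigned)((src >> (8 * lane)) & 0xFFu);
        unsigned sum = d + s;
        if (sum > 255u) {
            sum = 255u;                          /* SaturateToUnsignedByte */
        }
        result |= (uint64_t)sum << (8 * lane);
    }
    return result;
}
```

Adding 0x40 to every sample of 0x00107F80F0FF20C8 yields 0x4050BFC0FFFF60FF: the samples that would exceed 255 stay at 0xFF instead of wrapping around to small values.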


 

PSUBB mm, mm/m64
PSUBW mm, mm/m64
PSUBD mm, mm/m64

The PSUB (Packed Subtract) instructions subtract the data elements of the source operand from the data elements of the destination operand. If the result is larger or smaller than the data-range limit for the data type, it wraps around. The PSUB instructions support packed byte (PSUBB), packed word (PSUBW), and packed doubleword (PSUBD) data types.

 

PSUBB instruction with 64-bit operands:
DEST[7..0] ← DEST[7..0] − SRC[7..0];
* repeat subtract operation for 2nd through 7th byte *;
DEST[63..56] ← DEST[63..56] − SRC[63..56];


PSUBW instruction with 64-bit operands:
DEST[15..0] ← DEST[15..0] − SRC[15..0];
* repeat subtract operation for 2nd and 3rd word *;
DEST[63..48] ← DEST[63..48] − SRC[63..48];


PSUBD instruction with 64-bit operands:
DEST[31..0] ← DEST[31..0] − SRC[31..0];
DEST[63..32] ← DEST[63..32] − SRC[63..32];

PSUBB __m64 _mm_sub_pi8(__m64 m1, __m64 m2)

PSUBW __m64 _mm_sub_pi16(__m64 m1, __m64 m2)

PSUBD __m64 _mm_sub_pi32(__m64 m1, __m64 m2)

 

PSUBSB mm, mm/m64
PSUBSW mm, mm/m64

The PSUBS (Packed Subtract with Saturation) instructions subtract the signed data elements of the source operand from the signed data elements of the destination operand, then the results are saturated to the limits of a signed data element and written to the destination operand. The PSUBS instructions support packed byte (PSUBSB) and packed word (PSUBSW) data types.

 

PSUBSB instruction with 64-bit operands:
DEST[7..0] ← SaturateToSignedByte(DEST[7..0] − SRC[7..0]);
* repeat subtract operation for 2nd through 7th bytes *;
DEST[63..56] ← SaturateToSignedByte(DEST[63..56] − SRC[63..56] );

PSUBSW instruction with 64-bit operands:
DEST[15..0] ← SaturateToSignedWord(DEST[15..0] − SRC[15..0] );
* repeat subtract operation for 2nd and 3rd words *;
DEST[63..48] ← SaturateToSignedWord(DEST[63..48] − SRC[63..48] );

PSUBSB __m64 _mm_subs_pi8(__m64 m1, __m64 m2)

PSUBSW __m64 _mm_subs_pi16(__m64 m1, __m64 m2)

 

PSUBUSB mm, mm/m64
PSUBUSW mm, mm/m64
The PSUBUS (Packed Subtract Unsigned with Saturation) instructions subtract the unsigned data elements of the source operand from the unsigned data elements of the destination register, then the results are saturated to the limits of an unsigned data element and written to the destination operand. The PSUBUS instructions support packed byte (PSUBUSB) and packed word (PSUBUSW) data types.

PSUBUSB instruction with 64-bit operands:
DEST[7..0] ← SaturateToUnsignedByte(DEST[7..0] − SRC[7..0] );
* repeat subtract operation for 2nd through 7th bytes *;
DEST[63..56] ← SaturateToUnsignedByte(DEST[63..56] − SRC[63..56] );

PSUBUSW instruction with 64-bit operands:
DEST[15..0] ← SaturateToUnsignedWord(DEST[15..0] − SRC[15..0] );
* repeat subtract operation for 2nd and 3rd words *;
DEST[63..48] ← SaturateToUnsignedWord(DEST[63..48] − SRC[63..48] );

PSUBUSB __m64 _mm_subs_pu8(__m64 m1, __m64 m2)

PSUBUSW __m64 _mm_subs_pu16(__m64 m1, __m64 m2)

As an example of saturated arithmetic, let us consider the absolute difference of two arrays of bytes. There are no IF statements in MMX, yet the following algorithm has to be applied to every pair of elements:

if (a > b)
 then c = a - b
 else c = b - a

This algorithm can be coded using unsigned saturated subtractions: subtracting b from a and a from b yields the desired absolute difference in one case and zero in the other, because the negative difference saturates to zero. Since it is impossible to know in advance which is which, the final result is obtained by ORing the two together:

c = (a - b) OR (b - a)

Assuming that the MMX registers named MM0 and MM1 hold the source vectors, the following code will compute the absolute difference and store it into MM0:

MOVQ    MM2, MM0    ; make a copy of MM0
PSUBUSB MM0, MM1    ; compute difference one way
PSUBUSB MM1, MM2    ; compute difference the other way
POR     MM0, MM1    ; OR them together
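
The same trick can be checked with a portable C sketch that models the eight byte lanes one at a time. The function names subus_u8 and abs_diff_u8x8 are hypothetical, and the scalar helper uses a comparison only to imitate what PSUBUSB does in hardware without any branch:

```c
#include <stdint.h>

/* One lane of PSUBUSB: unsigned subtraction that saturates to zero. */
uint8_t subus_u8(uint8_t a, uint8_t b) {
    return (uint8_t)(a > b ? a - b : 0);
}

/* Absolute difference of eight byte lanes: (a - b) OR (b - a),
   where both subtractions are unsigned and saturating. */
void abs_diff_u8x8(const uint8_t a[8], const uint8_t b[8], uint8_t c[8]) {
    for (int lane = 0; lane < 8; lane++) {
        uint8_t one_way = subus_u8(a[lane], b[lane]);    /* PSUBUSB MM0, MM1 */
        uint8_t other_way = subus_u8(b[lane], a[lane]);  /* PSUBUSB MM1, MM2 */
        c[lane] = (uint8_t)(one_way | other_way);        /* POR MM0, MM1 */
    }
}
```

At least one of the two saturated differences is always zero, so the OR simply selects the non-zero one (or zero when the two inputs are equal).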
 
 

PMULHW mm, mm/m64
PMULLW mm, mm/m64

The PMULHW (Packed Multiply High) and PMULLW (Packed Multiply Low) instructions multiply the four signed words of the source and destination operands and write the high-order or low-order 16 bits of the 32-bit intermediate results to the destination operand.

 

PMULHW instruction with 64-bit operands:
TEMP0[31-0] ← DEST[15-0] * SRC[15-0]; * Signed multiplication *
TEMP1[31-0] ← DEST[31-16] * SRC[31-16];
TEMP2[31-0] ← DEST[47-32] * SRC[47-32];
TEMP3[31-0] ← DEST[63-48] * SRC[63-48];
DEST[15-0] ← TEMP0[31-16];
DEST[31-16] ← TEMP1[31-16];
DEST[47-32] ← TEMP2[31-16];
DEST[63-48] ← TEMP3[31-16];

PMULLW instruction with 64-bit operands:
TEMP0[31-0] ← DEST[15-0] * SRC[15-0]; * Signed multiplication *
TEMP1[31-0] ← DEST[31-16] * SRC[31-16];
TEMP2[31-0] ← DEST[47-32] * SRC[47-32];
TEMP3[31-0] ← DEST[63-48] * SRC[63-48];
DEST[15-0] ← TEMP0[15-0];
DEST[31-16] ← TEMP1[15-0];
DEST[47-32] ← TEMP2[15-0];
DEST[63-48] ← TEMP3[15-0];

PMULHW __m64 _mm_mulhi_pi16 (__m64 m1, __m64 m2)

PMULLW __m64 _mm_mullo_pi16(__m64 m1, __m64 m2)
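
To see how the two instructions complement each other, the following portable C sketch models one word lane of each and then recombines the two halves into the full 32-bit product. The function names are hypothetical, and the halves are returned as raw bit patterns (uint16_t) to keep the C arithmetic well defined:

```c
#include <stdint.h>

/* Low 16 bits of the signed 16x16 product, as one PMULLW lane returns them. */
uint16_t pmullw_lane(int16_t a, int16_t b) {
    uint32_t product = (uint32_t)((int32_t)a * (int32_t)b);
    return (uint16_t)(product & 0xFFFFu);
}

/* High 16 bits of the signed 16x16 product, as one PMULHW lane returns them. */
uint16_t pmulhw_lane(int16_t a, int16_t b) {
    uint32_t product = (uint32_t)((int32_t)a * (int32_t)b);
    return (uint16_t)(product >> 16);
}

/* Recombine both halves into the bit pattern of the full 32-bit product. */
uint32_t full_product_bits(int16_t a, int16_t b) {
    return ((uint32_t)pmulhw_lane(a, b) << 16) | pmullw_lane(a, b);
}
```

For 1000 * 1000 = 1000000 = 0x000F4240, the high half is 0x000F and the low half is 0x4240. In MMX code the two result registers are usually interleaved with PUNPCKLWD and PUNPCKHWD to obtain the full doubleword products.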

 

PMADDWD mm, mm/m64

The PMADDWD (Packed Multiply and Add) instruction multiplies the four signed words of the destination operand by the four signed words of the source operand, producing four 32-bit products. The two high-order products are summed and stored in the upper doubleword of the destination operand, and the two low-order products are summed and stored in the lower doubleword of the destination operand.

 

PMADDWD instruction with 64-bit operands:
DEST[31..0] ← (DEST[15..0] * SRC[15..0]) + (DEST[31..16] * SRC[31..16]);
DEST[63..32] ← (DEST[47..32] * SRC[47..32]) + (DEST[63..48] * SRC[63..48]);
PMADDWD __m64 _mm_madd_pi16(__m64 m1, __m64 m2)
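
The pseudocode above can be modelled in portable C as shown below. This is a scalar sketch with a hypothetical function name, pmaddwd_model, that takes the four word lanes as arrays (index 0 being the least significant word):

```c
#include <stdint.h>

/* Hypothetical scalar model of PMADDWD: four signed word products,
   summed in adjacent pairs into two doublewords.
   Assumption: the single overflow case (all four words of a pair of
   products equal to -32768) is not modelled here. */
void pmaddwd_model(const int16_t dest[4], const int16_t src[4], int32_t out[2]) {
    out[0] = (int32_t)dest[0] * src[0] + (int32_t)dest[1] * src[1];
    out[1] = (int32_t)dest[2] * src[2] + (int32_t)dest[3] * src[3];
}
```

With dest = {1, 2, 3, 4} and src = {5, 6, 7, 8}, the lower doubleword is 1*5 + 2*6 = 17 and the upper doubleword is 3*7 + 4*8 = 53, which is why the instruction fits dot products and similar multiply-accumulate loops.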

Complex multiplication is an operation which requires four multiplications and two additions, leading naturally to the use of the PMADDWD instruction. In order to use this instruction it is necessary to format the data into four 16-bit values, each holding a real or imaginary component: the constant vector can be outlined as [Re -Im Im Re].
The following code fragment multiplies the complex number stored in the MMX register MM0 by the complex constant held in register MM1 with the pattern explained above. The real component of the complex product is given by 
Re(Data)*Re(Const) – Im(Data)*Im(Const) 
and the imaginary component of the complex product by 
Re(Data)*Im(Const) + Im(Data)*Re(Const).

PUNPCKLDQ MM0, MM0    ; convert the data to the [Re Im Re Im] format
PMADDWD   MM0, MM1    ; perform the complex multiply

Note that the output is a pair of packed doublewords, so a pack instruction may be used to convert the result back to 16-bit words, matching the format of the input.
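
The data layout can be verified with a small portable C model of the two instructions. The struct types complex16 and complex32 and the function name complex_mul_pmaddwd are hypothetical, and the sketch assumes that the words are listed from the least significant lane upwards:

```c
#include <stdint.h>

typedef struct { int16_t re, im; } complex16;   /* hypothetical input type */
typedef struct { int32_t re, im; } complex32;   /* hypothetical result type */

/* Scalar model of PUNPCKLDQ + PMADDWD for one complex product.
   Assumption: coef.im is greater than -32768, so that it can be negated. */
complex32 complex_mul_pmaddwd(complex16 data, complex16 coef) {
    /* PUNPCKLDQ MM0, MM0: duplicate the [Re Im] pair into both doublewords. */
    const int16_t d[4] = { data.re, data.im, data.re, data.im };
    /* Constant register laid out as [Re -Im Im Re]. */
    const int16_t c[4] = { coef.re, (int16_t)-coef.im, coef.im, coef.re };
    complex32 out;
    /* PMADDWD: multiply lane by lane, then add adjacent products. */
    out.re = (int32_t)d[0] * c[0] + (int32_t)d[1] * c[1];
    out.im = (int32_t)d[2] * c[2] + (int32_t)d[3] * c[3];
    return out;
}
```

For example, (1 + 2i) * (3 + 4i) gives a real part of 1*3 - 2*4 = -5 and an imaginary part of 1*4 + 2*3 = 10, both as 32-bit values.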
 
 

Saturday, 24 April 2010

Powered by QuoteThis © 2008
 

Latest Articles

A software to stand out 27 January 2018, 14.35 Web
A software to stand out
Standing out of the pack starts by being visible, and being noticed by the right group of professionals. No matter how good your profile is, it is lost in a sea of similar profiles, so you need to show up and start attracting
Web page scraping, the easy way 07 January 2018, 00.46 Web
Web page scraping, the easy way
There are many ways to extract data elements from web pages, almost all of them prettier and cooler than the method proposed here, but as we are in an hurry, let's get that data quickly, ok? Suppose we have to extract the
Scraping dynamic page content 06 January 2018, 23.57 Web
Scraping dynamic page content
One of the most common roadblocks when scraping the content of web sites is getting the full contents of the page, including JS-generated data elements (probably, the ones you are looking for). So, when using CEFSharp to scrape
Unit-testing file I/O 26 November 2017, 12.09 Testing
Unit-testing file I/O
Two good news: file I/O is unit-testable, and it is surprisingly easy to do. Let's see how it works! A software no-one asked for First, we need a piece of software that deals with files and that has to be unit-tested. The
Fixing Git pull errors in SourceTree 10 April 2017, 01.44 Software
Fixing Git pull errors in SourceTree
If you encounter the following error when pulling a repository in SourceTree: VirtualAlloc pointer is null, Win32 error 487 it is due to to the Cygwin system failing to allocate a 5 MB large chunk of memory for its heap at
View Stefano Tommesani's profile on LinkedIn