Stefano Tommesani


MMX Conversion

Elements of packed data often need to be repositioned within an operand, or the elements of two packed operands need to be merged. The input format, or the required output format, may not be the one that maximizes computation throughput. A common case is performing intermediate computations in a wider format (for example, packed words) while the result must be delivered in packed byte format.
In all of these cases, some elements of a packed data type must be extracted and written to a different position in the packed result. One general solution is an instruction that takes two packed data operands and merges their bytes in any arbitrary order into the destination packed data operand. However, such a general solution is expensive to implement, because it requires a full crossbar connection.
MMX technology instead defines instructions that require only a relatively simple swizzle network, yet allow efficient repositioning and combining of elements from packed data operands in most cases.
SSE technology adds a shuffle-words instruction (PSHUFW) that is a more general solution, at the expense of compatibility with processors that support only MMX.
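The widen, compute, narrow pattern described above can be illustrated without any SIMD at all. The following is a minimal scalar sketch in portable C, not MMX code. The function name `add_offset_u8` and the offset-adding scenario are assumptions chosen purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: add a signed offset to n unsigned 8-bit samples.
 * The sum is formed in 16 bits so that it cannot wrap around, then it is
 * narrowed back to a byte with unsigned saturation. */
void add_offset_u8(uint8_t *samples, size_t n, int16_t offset) {
    /* Assumption of this sketch: the offset fits the byte range in magnitude. */
    assert(offset >= -255 && offset <= 255);
    for (size_t i = 0; i < n; i++) {
        int16_t wide = (int16_t)(samples[i] + offset); /* widened intermediate  */
        if (wide > 255) wide = 255;                    /* saturate on overflow  */
        if (wide < 0)   wide = 0;                      /* saturate on underflow */
        samples[i] = (uint8_t)wide;                    /* narrow back to a byte */
    }
}
```

In an MMX version of this loop, the widening step would typically be done with the unpack instructions and the narrowing step with a saturating pack, both described in the rest of this article.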
 

 

PACKSSWB mm, mm/m64
PACKSSDW mm, mm/m64

The PACKSS (Pack with Signed Saturation) instructions pack and saturate the signed data elements from the source and the destination operands and write the signed results to the destination operand.
PACKSSWB packs four signed words from the source operand and four signed words from the destination operand into eight signed bytes in the destination register. If the signed value of a word lies outside the range of a signed byte, the value is saturated (to 0x7F on overflow and to 0x80 on underflow).

PACKSSDW packs two signed doublewords from the source operand and two signed doublewords from the destination operand into four signed words in the destination register. If the signed value of a doubleword lies outside the range of a signed word, the value is saturated (to 0x7FFF on overflow and to 0x8000 on underflow).

PACKSSWB instruction with 64-bit operands:
DEST[7..0] ← SaturateSignedWordToSignedByte DEST[15..0];
DEST[15..8] ← SaturateSignedWordToSignedByte DEST[31..16];
DEST[23..16] ← SaturateSignedWordToSignedByte DEST[47..32];
DEST[31..24] ← SaturateSignedWordToSignedByte DEST[63..48];
DEST[39..32] ← SaturateSignedWordToSignedByte SRC[15..0];
DEST[47..40] ← SaturateSignedWordToSignedByte SRC[31..16];
DEST[55..48] ← SaturateSignedWordToSignedByte SRC[47..32];
DEST[63..56] ← SaturateSignedWordToSignedByte SRC[63..48];

PACKSSDW instruction with 64-bit operands:
DEST[15..0] ← SaturateSignedDoublewordToSignedWord DEST[31..0];
DEST[31..16] ← SaturateSignedDoublewordToSignedWord DEST[63..32];
DEST[47..32] ← SaturateSignedDoublewordToSignedWord SRC[31..0];
DEST[63..48] ← SaturateSignedDoublewordToSignedWord SRC[63..32];

PACKSSWB __m64 _mm_packs_pi16(__m64 m1, __m64 m2)

PACKSSDW __m64 _mm_packs_pi32 (__m64 m1, __m64 m2)
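The pseudocode above can be modelled in portable C to check the saturation behaviour by hand. This is a scalar sketch only, not the intrinsics themselves: a 64-bit MMX register is assumed to be represented as a `uint64_t`, and the names `packsswb_model`, `packssdw_model` and `packss_example` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp a signed word to the signed byte range. */
static int8_t sat_s16_to_s8(int16_t w) {
    if (w > INT8_MAX) return INT8_MAX;   /* overflow  -> 0x7F */
    if (w < INT8_MIN) return INT8_MIN;   /* underflow -> 0x80 */
    return (int8_t)w;
}

/* Clamp a signed doubleword to the signed word range. */
static int16_t sat_s32_to_s16(int32_t d) {
    if (d > INT16_MAX) return INT16_MAX; /* overflow  -> 0x7FFF */
    if (d < INT16_MIN) return INT16_MIN; /* underflow -> 0x8000 */
    return (int16_t)d;
}

/* Scalar model of PACKSSWB: the four dest words become bytes 0..3,
 * the four src words become bytes 4..7. */
uint64_t packsswb_model(uint64_t dest, uint64_t src) {
    uint64_t out = 0;
    for (int i = 0; i < 4; i++) {
        int16_t d = (int16_t)(uint16_t)(dest >> (16 * i));
        int16_t s = (int16_t)(uint16_t)(src >> (16 * i));
        out |= (uint64_t)(uint8_t)sat_s16_to_s8(d) << (8 * i);
        out |= (uint64_t)(uint8_t)sat_s16_to_s8(s) << (8 * (i + 4));
    }
    return out;
}

/* Scalar model of PACKSSDW: the two dest doublewords become words 0..1,
 * the two src doublewords become words 2..3. */
uint64_t packssdw_model(uint64_t dest, uint64_t src) {
    uint64_t out = 0;
    for (int i = 0; i < 2; i++) {
        int32_t d = (int32_t)(uint32_t)(dest >> (32 * i));
        int32_t s = (int32_t)(uint32_t)(src >> (32 * i));
        out |= (uint64_t)(uint16_t)sat_s32_to_s16(d) << (16 * i);
        out |= (uint64_t)(uint16_t)sat_s32_to_s16(s) << (16 * (i + 2));
    }
    return out;
}

/* Usage example with values chosen to hit both saturation limits. */
void packss_example(void) {
    /* dest words, low to high: 1, 32767, -32768, -1
     * src  words, low to high: 128, -129, 127, -128 */
    assert(packsswb_model(UINT64_C(0xFFFF80007FFF0001),
                          UINT64_C(0xFF80007FFF7F0080))
           == UINT64_C(0x807F807FFF807F01));
    /* dest doublewords, low to high: 32768, -32769
     * src  doublewords, low to high: 0x1234, -1 */
    assert(packssdw_model(UINT64_C(0xFFFF7FFF00008000),
                          UINT64_C(0xFFFFFFFF00001234))
           == UINT64_C(0xFFFF123480007FFF));
}
```

The model mirrors the operand order of the instruction: elements of the destination land in the low half of the result and elements of the source in the high half.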


 

PACKUSWB mm, mm/m64

The PACKUSWB (Pack with Unsigned Saturation) instruction packs and saturates four signed words of the source operand and four signed words of the destination operand into eight unsigned bytes stored in the destination operand. If the signed value of a word lies outside the range of an unsigned byte, the value is saturated (to 0xFF on overflow and to 0x00 on underflow).

PACKUSWB instruction with 64-bit operands:
DEST[7..0] ← SaturateSignedWordToUnsignedByte DEST[15..0];
DEST[15..8] ← SaturateSignedWordToUnsignedByte DEST[31..16];
DEST[23..16] ← SaturateSignedWordToUnsignedByte DEST[47..32];
DEST[31..24] ← SaturateSignedWordToUnsignedByte DEST[63..48];
DEST[39..32] ← SaturateSignedWordToUnsignedByte SRC[15..0];
DEST[47..40] ← SaturateSignedWordToUnsignedByte SRC[31..16];
DEST[55..48] ← SaturateSignedWordToUnsignedByte SRC[47..32];
DEST[63..56] ← SaturateSignedWordToUnsignedByte SRC[63..48];

PACKUSWB __m64 _mm_packs_pu16(__m64 m1, __m64 m2)
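The key point of PACKUSWB is that the inputs are treated as signed words while the outputs are unsigned bytes, so negative inputs clamp to zero. A portable scalar sketch of that behaviour follows. It assumes a 64-bit register held in a `uint64_t`, and the names `packuswb_model` and `packuswb_example` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp a signed word to the unsigned byte range. */
static uint8_t sat_s16_to_u8(int16_t w) {
    if (w > UINT8_MAX) return UINT8_MAX; /* overflow  -> 0xFF */
    if (w < 0) return 0;                 /* underflow -> 0x00 */
    return (uint8_t)w;
}

/* Scalar model of PACKUSWB: the four dest words become bytes 0..3,
 * the four src words become bytes 4..7. */
uint64_t packuswb_model(uint64_t dest, uint64_t src) {
    uint64_t out = 0;
    for (int i = 0; i < 4; i++) {
        int16_t d = (int16_t)(uint16_t)(dest >> (16 * i));
        int16_t s = (int16_t)(uint16_t)(src >> (16 * i));
        out |= (uint64_t)sat_s16_to_u8(d) << (8 * i);
        out |= (uint64_t)sat_s16_to_u8(s) << (8 * (i + 4));
    }
    return out;
}

/* Usage example covering in-range, overflowing and negative words. */
void packuswb_example(void) {
    /* dest words, low to high: 0, 255, 256, -1
     * src  words, low to high: 0x12, -32768, 32767, 128 */
    assert(packuswb_model(UINT64_C(0xFFFF010000FF0000),
                          UINT64_C(0x00807FFF80000012))
           == UINT64_C(0x80FF001200FFFF00));
}
```

Note that the word 128 packs to 0x80 here, whereas a signed pack of the same word would saturate it to 0x7F.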


 

PUNPCKHBW mm, mm/m64
PUNPCKHWD mm, mm/m64
PUNPCKHDQ mm, mm/m64

The PUNPCKH (Unpack High Packed Data) instructions unpack and interleave the high-order data elements of the destination and source operands into the destination operand, ignoring the low-order data elements. If the source operand is all zeros, the result is a zero extension of the high-order elements of the destination operand.
PUNPCKH supports packed byte (PUNPCKHBW), packed word (PUNPCKHWD) and packed doubleword (PUNPCKHDQ) source data types.

PUNPCKHBW instruction with 64-bit operands:
DEST[7..0] ← DEST[39..32];
DEST[15..8] ← SRC[39..32];
DEST[23..16] ← DEST[47..40];
DEST[31..24] ← SRC[47..40];
DEST[39..32] ← DEST[55..48];
DEST[47..40] ← SRC[55..48];
DEST[55..48] ← DEST[63..56];
DEST[63..56] ← SRC[63..56];

PUNPCKHWD instruction with 64-bit operands:
DEST[15..0] ← DEST[47..32];
DEST[31..16] ← SRC[47..32];
DEST[47..32] ← DEST[63..48];
DEST[63..48] ← SRC[63..48];

PUNPCKHDQ instruction with 64-bit operands:
DEST[31..0] ← DEST[63..32];
DEST[63..32] ← SRC[63..32];

PUNPCKHBW __m64 _mm_unpackhi_pi8(__m64 m1, __m64 m2)

PUNPCKHWD __m64 _mm_unpackhi_pi16(__m64 m1,__m64 m2)

PUNPCKHDQ __m64 _mm_unpackhi_pi32(__m64 m1, __m64 m2)
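The three PUNPCKH variants differ only in element width, so they can be modelled by one portable C function. This is a scalar sketch under the assumption that a 64-bit register is held in a `uint64_t`. The name `punpckh_model` and its `bits` parameter (8 for PUNPCKHBW, 16 for PUNPCKHWD, 32 for PUNPCKHDQ) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of PUNPCKHBW/WD/DQ. Interleaves the elements of the
 * high halves of dest and src: each dest element goes to an even result
 * slot and the matching src element to the odd slot above it. */
uint64_t punpckh_model(uint64_t dest, uint64_t src, unsigned bits) {
    assert(bits == 8 || bits == 16 || bits == 32);
    const unsigned half = 64 / bits / 2;              /* elements per half */
    const uint64_t mask = (UINT64_C(1) << bits) - 1;  /* one element       */
    uint64_t out = 0;
    for (unsigned i = 0; i < half; i++) {
        uint64_t d = (dest >> (bits * (half + i))) & mask; /* high-half element */
        uint64_t s = (src  >> (bits * (half + i))) & mask;
        out |= d << (bits * (2 * i));
        out |= s << (bits * (2 * i + 1));
    }
    return out;
}
```

With `dest = 0x0807060504030201` and `src = 0x1817161514131211`, the byte variant gives `0x1808170716061505`. Passing `src = 0` gives `0x0008000700060005`, which is the zero extension of the four high bytes of `dest` to words.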


 

PUNPCKLBW mm, mm/m32
PUNPCKLWD mm, mm/m32
PUNPCKLDQ mm, mm/m32

The PUNPCKL (Unpack Low Packed Data) instructions unpack and interleave the low-order data elements of the destination and source operands into the destination operand. When unpacking from a memory operand, only 32 bits are accessed. If the source operand is all zeros, the result is a zero extension of the low-order elements of the destination operand. PUNPCKL supports packed byte (PUNPCKLBW), packed word (PUNPCKLWD) and packed doubleword (PUNPCKLDQ) source data types.

PUNPCKLBW instruction with 64-bit operands:
DEST[63..56] ← SRC[31..24];
DEST[55..48] ← DEST[31..24];
DEST[47..40] ← SRC[23..16];
DEST[39..32] ← DEST[23..16];
DEST[31..24] ← SRC[15..8];
DEST[23..16] ← DEST[15..8];
DEST[15..8] ← SRC[7..0];
DEST[7..0] ← DEST[7..0];

PUNPCKLWD instruction with 64-bit operands:
DEST[63..48] ← SRC[31..16];
DEST[47..32] ← DEST[31..16];
DEST[31..16] ← SRC[15..0];
DEST[15..0] ← DEST[15..0];

PUNPCKLDQ instruction with 64-bit operands:
DEST[63..32] ← SRC[31..0];
DEST[31..0] ← DEST[31..0];

PUNPCKLBW __m64 _mm_unpacklo_pi8 (__m64 m1, __m64 m2)

PUNPCKLWD __m64 _mm_unpacklo_pi16 (__m64 m1, __m64 m2)

PUNPCKLDQ __m64 _mm_unpacklo_pi32 (__m64 m1, __m64 m2)
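As with the high variants, the PUNPCKL pseudocode can be checked with a small portable C model. This is a scalar sketch, assuming a 64-bit register held in a `uint64_t`. The names `punpckl_model` and `zero_extend_low_bytes`, and the `bits` parameter (8, 16 or 32), are hypothetical. The model accepts a full 64-bit source and simply ignores its upper half, which matches the fact that only the low 32 bits of the source are used.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of PUNPCKLBW/WD/DQ. Interleaves the elements of the
 * low halves of dest and src: each dest element goes to an even result
 * slot and the matching src element to the odd slot above it. */
uint64_t punpckl_model(uint64_t dest, uint64_t src, unsigned bits) {
    assert(bits == 8 || bits == 16 || bits == 32);
    const unsigned half = 64 / bits / 2;              /* elements per half */
    const uint64_t mask = (UINT64_C(1) << bits) - 1;  /* one element       */
    uint64_t out = 0;
    for (unsigned i = 0; i < half; i++) {
        uint64_t d = (dest >> (bits * i)) & mask;     /* low-half element  */
        uint64_t s = (src  >> (bits * i)) & mask;
        out |= d << (bits * (2 * i));
        out |= s << (bits * (2 * i + 1));
    }
    return out;
}

/* Zero-extend the four low bytes of v to four words by unpacking
 * against an all-zero source operand. */
uint64_t zero_extend_low_bytes(uint64_t v) {
    return punpckl_model(v, 0, 8);
}
```

With `dest = 0x0807060504030201` and `src = 0x1817161514131211`, the byte variant gives `0x1404130312021101`, and `zero_extend_low_bytes(0x0807060504030201)` gives `0x0004000300020001`.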

Saturday, 24 April 2010
