SIMD

MMX Conversion

Elements of packed data often need to be repositioned within a packed operand, or the elements of two packed operands need to be merged. In some cases the input or the desired output representation of the data is not ideal for maximizing computation throughput. In others, intermediate computations must be performed in a wider format (for example, packed words) while the result is presented in packed byte format.
In all of these cases, some elements of a packed data type must be extracted and written to a different position in the packed result. One general solution is an instruction that takes two packed data operands and merges their bytes in any arbitrary order into the destination operand. However, such a general solution is expensive to implement because it requires a full crossbar connection.
MMX technology instead defines instructions that require only a relatively simple swizzle network, yet still allow efficient repositioning and combining of elements from packed data operands in most cases.
SSE technology adds a shuffle words instruction (PSHUFW) that is a more general solution, at the expense of backward compatibility.
 

 

PACKSSWB mm, mm/m64
PACKSSDW mm, mm/m64

The PACKSS (Pack with Signed Saturation) instructions pack and saturate the signed data elements from the source and destination operands and write the signed results to the destination operand.
PACKSSWB packs four signed words from the source operand and four signed words from the destination operand into eight signed bytes in the destination register. If the signed value of a word lies outside the range of a signed byte, the value is saturated: to 0x7F on overflow and to 0x80 on underflow.

PACKSSDW packs two signed doublewords from the source operand and two signed doublewords from the destination operand into four signed words in the destination register. If the signed value of a doubleword lies outside the range of a signed word, the value is saturated: to 0x7FFF on overflow and to 0x8000 on underflow.

PACKSSWB instruction with 64-bit operands:
DEST[7..0] ← SaturateSignedWordToSignedByte(DEST[15..0]);
DEST[15..8] ← SaturateSignedWordToSignedByte(DEST[31..16]);
DEST[23..16] ← SaturateSignedWordToSignedByte(DEST[47..32]);
DEST[31..24] ← SaturateSignedWordToSignedByte(DEST[63..48]);
DEST[39..32] ← SaturateSignedWordToSignedByte(SRC[15..0]);
DEST[47..40] ← SaturateSignedWordToSignedByte(SRC[31..16]);
DEST[55..48] ← SaturateSignedWordToSignedByte(SRC[47..32]);
DEST[63..56] ← SaturateSignedWordToSignedByte(SRC[63..48]);

PACKSSDW instruction with 64-bit operands:
DEST[15..0] ← SaturateSignedDoublewordToSignedWord(DEST[31..0]);
DEST[31..16] ← SaturateSignedDoublewordToSignedWord(DEST[63..32]);
DEST[47..32] ← SaturateSignedDoublewordToSignedWord(SRC[31..0]);
DEST[63..48] ← SaturateSignedDoublewordToSignedWord(SRC[63..32]);

PACKSSWB __m64 _mm_packs_pi16(__m64 m1, __m64 m2)

PACKSSDW __m64 _mm_packs_pi32 (__m64 m1, __m64 m2)
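
The signed packing rules above can be sketched as a scalar reference model in portable C. This is only an illustration of the semantics, not the intrinsics themselves: the names sat_s16_to_s8, sat_s32_to_s16, packsswb_model and packssdw_model are hypothetical, and the arrays assume element 0 holds the least significant element of the 64-bit operand.

```c
#include <stdint.h>

/* Clamp a signed word to the signed byte range [-128, 127]. */
int8_t sat_s16_to_s8(int16_t v)
{
    if (v > INT8_MAX) return INT8_MAX;   /* overflow saturates to 0x7F  */
    if (v < INT8_MIN) return INT8_MIN;   /* underflow saturates to 0x80 */
    return (int8_t)v;
}

/* Clamp a signed doubleword to the signed word range [-32768, 32767]. */
int16_t sat_s32_to_s16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX; /* overflow saturates to 0x7FFF  */
    if (v < INT16_MIN) return INT16_MIN; /* underflow saturates to 0x8000 */
    return (int16_t)v;
}

/* Hypothetical scalar model of PACKSSWB.
   out[0..3] come from the destination words, out[4..7] from the source. */
void packsswb_model(const int16_t dst[4], const int16_t src[4], int8_t out[8])
{
    for (int i = 0; i < 4; i++) {
        out[i]     = sat_s16_to_s8(dst[i]);
        out[i + 4] = sat_s16_to_s8(src[i]);
    }
}

/* Hypothetical scalar model of PACKSSDW.
   out[0..1] come from the destination doublewords, out[2..3] from the source. */
void packssdw_model(const int32_t dst[2], const int32_t src[2], int16_t out[4])
{
    for (int i = 0; i < 2; i++) {
        out[i]     = sat_s32_to_s16(dst[i]);
        out[i + 2] = sat_s32_to_s16(src[i]);
    }
}
```

In terms of the intrinsics listed above, dst plays the role of m1 and src the role of m2.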


 

PACKUSWB mm, mm/m64

The PACKUSWB (Pack with Unsigned Saturation) instruction packs and saturates four signed words of the source operand and four signed words of the destination operand into eight unsigned bytes stored in the destination operand. If the signed value of a word lies outside the range of an unsigned byte, the value is saturated: to 0xFF on overflow and to 0x00 on underflow.

PACKUSWB instruction with 64-bit operands:
DEST[7..0] ← SaturateSignedWordToUnsignedByte(DEST[15..0]);
DEST[15..8] ← SaturateSignedWordToUnsignedByte(DEST[31..16]);
DEST[23..16] ← SaturateSignedWordToUnsignedByte(DEST[47..32]);
DEST[31..24] ← SaturateSignedWordToUnsignedByte(DEST[63..48]);
DEST[39..32] ← SaturateSignedWordToUnsignedByte(SRC[15..0]);
DEST[47..40] ← SaturateSignedWordToUnsignedByte(SRC[31..16]);
DEST[55..48] ← SaturateSignedWordToUnsignedByte(SRC[47..32]);
DEST[63..56] ← SaturateSignedWordToUnsignedByte(SRC[63..48]);

PACKUSWB __m64 _mm_packs_pu16(__m64 m1, __m64 m2)
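
The difference from the signed variant is easiest to see in a small scalar model. The sketch below is a portable C illustration of the rule just described; sat_s16_to_u8 and packuswb_model are hypothetical names, and element 0 is assumed to be the least significant element of each operand.

```c
#include <stdint.h>

/* Clamp a signed word to the unsigned byte range [0, 255]. */
uint8_t sat_s16_to_u8(int16_t v)
{
    if (v > UINT8_MAX) return UINT8_MAX; /* overflow saturates to 0xFF   */
    if (v < 0)         return 0;         /* negatives saturate to 0x00   */
    return (uint8_t)v;
}

/* Hypothetical scalar model of PACKUSWB.
   The inputs are signed words; the outputs are unsigned bytes.
   out[0..3] come from the destination words, out[4..7] from the source. */
void packuswb_model(const int16_t dst[4], const int16_t src[4], uint8_t out[8])
{
    for (int i = 0; i < 4; i++) {
        out[i]     = sat_s16_to_u8(dst[i]);
        out[i + 4] = sat_s16_to_u8(src[i]);
    }
}
```

Note that every negative input word, not only values below -128, maps to 0x00.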


 

PUNPCKHBW mm, mm/m64
PUNPCKHWD mm, mm/m64
PUNPCKHDQ mm, mm/m64

The PUNPCKH (Unpack High Packed Data) instructions unpack and interleave the high-order data elements of the destination and source operands into the destination operand, ignoring the low-order data elements. If the source operand is all zeros, the result is a zero extension of the high-order elements of the destination operand.
PUNPCKH supports packed byte (PUNPCKHBW), packed word (PUNPCKHWD) and packed doubleword (PUNPCKHDQ) source data types.

PUNPCKHBW instruction with 64-bit operands:
DEST[7..0] ← DEST[39..32];
DEST[15..8] ← SRC[39..32];
DEST[23..16] ← DEST[47..40];
DEST[31..24] ← SRC[47..40];
DEST[39..32] ← DEST[55..48];
DEST[47..40] ← SRC[55..48];
DEST[55..48] ← DEST[63..56];
DEST[63..56] ← SRC[63..56];

PUNPCKHWD instruction with 64-bit operands:
DEST[15..0] ← DEST[47..32];
DEST[31..16] ← SRC[47..32];
DEST[47..32] ← DEST[63..48];
DEST[63..48] ← SRC[63..48];

PUNPCKHDQ instruction with 64-bit operands:
DEST[31..0] ← DEST[63..32];
DEST[63..32] ← SRC[63..32];

PUNPCKHBW __m64 _mm_unpackhi_pi8(__m64 m1, __m64 m2)

PUNPCKHWD __m64 _mm_unpackhi_pi16(__m64 m1, __m64 m2)

PUNPCKHDQ __m64 _mm_unpackhi_pi32(__m64 m1, __m64 m2)
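
The interleaving pattern of the three PUNPCKH variants can be sketched in portable C as follows. This is a scalar illustration only; the *_model function names are hypothetical, index 0 is assumed to be the least significant element, and the output array must not alias either input.

```c
#include <stdint.h>

/* Hypothetical scalar model of PUNPCKHBW: interleave the high four bytes.
   Even result bytes come from the destination, odd ones from the source. */
void punpckhbw_model(const uint8_t dst[8], const uint8_t src[8], uint8_t out[8])
{
    for (int i = 0; i < 4; i++) {
        out[2 * i]     = dst[4 + i];
        out[2 * i + 1] = src[4 + i];
    }
}

/* Hypothetical scalar model of PUNPCKHWD: interleave the high two words. */
void punpckhwd_model(const uint16_t dst[4], const uint16_t src[4], uint16_t out[4])
{
    for (int i = 0; i < 2; i++) {
        out[2 * i]     = dst[2 + i];
        out[2 * i + 1] = src[2 + i];
    }
}

/* Hypothetical scalar model of PUNPCKHDQ: combine the high doublewords. */
void punpckhdq_model(const uint32_t dst[2], const uint32_t src[2], uint32_t out[2])
{
    out[0] = dst[1];
    out[1] = src[1];
}
```

With an all-zero source, each pair of result bytes from punpckhbw_model forms a 16-bit word whose upper byte is zero, which is the zero extension mentioned above.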


 

PUNPCKLBW mm, mm/m32
PUNPCKLWD mm, mm/m32
PUNPCKLDQ mm, mm/m32

The PUNPCKL (Unpack Low Packed Data) instructions unpack and interleave the low-order data elements of the destination and source operands into the destination operand. When the source is a memory operand, only 32 bits are accessed. If the source operand is all zeros, the result is a zero extension of the low-order elements of the destination operand. PUNPCKL supports packed byte (PUNPCKLBW), packed word (PUNPCKLWD) and packed doubleword (PUNPCKLDQ) source data types.

PUNPCKLBW instruction with 64-bit operands:
DEST[63..56] ← SRC[31..24];
DEST[55..48] ← DEST[31..24];
DEST[47..40] ← SRC[23..16];
DEST[39..32] ← DEST[23..16];
DEST[31..24] ← SRC[15..8];
DEST[23..16] ← DEST[15..8];
DEST[15..8] ← SRC[7..0];
DEST[7..0] ← DEST[7..0];

PUNPCKLWD instruction with 64-bit operands:
DEST[63..48] ← SRC[31..16];
DEST[47..32] ← DEST[31..16];
DEST[31..16] ← SRC[15..0];
DEST[15..0] ← DEST[15..0];

PUNPCKLDQ instruction with 64-bit operands:
DEST[63..32] ← SRC[31..0];
DEST[31..0] ← DEST[31..0];

PUNPCKLBW __m64 _mm_unpacklo_pi8 (__m64 m1, __m64 m2)

PUNPCKLWD __m64 _mm_unpacklo_pi16 (__m64 m1, __m64 m2)

PUNPCKLDQ __m64 _mm_unpacklo_pi32 (__m64 m1, __m64 m2)
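
As a rough illustration of computing in a wider format and then narrowing the result, the sketch below models PUNPCKLBW in portable C and uses it with an all-zero source to zero-extend the low four bytes to words, adds a bias in word format, and clamps back to unsigned bytes in the way PACKUSWB would. All function names are hypothetical, index 0 is the least significant element, and the bias is assumed small enough (for example within -255..255) that the word addition cannot overflow.

```c
#include <stdint.h>

/* Hypothetical scalar model of PUNPCKLBW: interleave the low four bytes.
   Even result bytes come from the destination, odd ones from the source.
   out must not alias dst or src. */
void punpcklbw_model(const uint8_t dst[8], const uint8_t src[8], uint8_t out[8])
{
    for (int i = 0; i < 4; i++) {
        out[2 * i]     = dst[i];
        out[2 * i + 1] = src[i];
    }
}

/* Read word i (bits 16*i+15 .. 16*i) of a packed value stored as bytes. */
uint16_t word_at(const uint8_t packed[8], int i)
{
    return (uint16_t)(packed[2 * i] | (packed[2 * i + 1] << 8));
}

/* Clamp a signed word to [0, 255], the per-element rule of PACKUSWB. */
uint8_t clamp_to_u8(int16_t v)
{
    if (v > 255) return 255;
    if (v < 0)   return 0;
    return (uint8_t)v;
}

/* Hypothetical workflow: add a signed bias to the low four bytes of pixels.
   Assumes the bias is small enough that the 16-bit sum cannot overflow. */
void add_bias_low4(const uint8_t pixels[8], int16_t bias, uint8_t result[4])
{
    const uint8_t zero[8] = {0};
    uint8_t widened[8];

    punpcklbw_model(pixels, zero, widened);  /* bytes -> zero-extended words */
    for (int i = 0; i < 4; i++) {
        int16_t sum = (int16_t)(word_at(widened, i) + bias); /* word-format add */
        result[i] = clamp_to_u8(sum);        /* narrow with unsigned saturation */
    }
}
```

The high four bytes would be handled the same way using the PUNPCKH form of the unpack.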
