Packed Arithmetic Intrinsics
Intrinsic | Instruction | Operation | R0 | R1 | R2 | R3 |
---|---|---|---|---|---|---|
_mm_add_ss | ADDSS | Adds | a0 + b0 | a1 | a2 | a3 |
_mm_add_ps | ADDPS | Adds | a0 + b0 | a1 + b1 | a2 + b2 | a3 + b3 |
_mm_sub_ss | SUBSS | Subtracts | a0 - b0 | a1 | a2 | a3 |
_mm_sub_ps | SUBPS | Subtracts | a0 - b0 | a1 - b1 | a2 - b2 | a3 - b3 |
_mm_mul_ss | MULSS | Multiplies | a0 * b0 | a1 | a2 | a3 |
_mm_mul_ps | MULPS | Multiplies | a0 * b0 | a1 * b1 | a2 * b2 | a3 * b3 |
_mm_div_ss | DIVSS | Divides | a0 / b0 | a1 | a2 | a3 |
_mm_div_ps | DIVPS | Divides | a0 / b0 | a1 / b1 | a2 / b2 | a3 / b3 |
_mm_sqrt_ss | SQRTSS | Computes square root | sqrt(a0) | a1 | a2 | a3 |
_mm_sqrt_ps | SQRTPS | Computes square root | sqrt(a0) | sqrt(a1) | sqrt(a2) | sqrt(a3) |
_mm_rcp_ss | RCPSS | Computes approximate reciprocal | 1/a0 | a1 | a2 | a3 |
_mm_rcp_ps | RCPPS | Computes approximate reciprocal | 1/a0 | 1/a1 | 1/a2 | 1/a3 |
_mm_rsqrt_ss | RSQRTSS | Computes approximate reciprocal square root | 1/sqrt(a0) | a1 | a2 | a3 |
_mm_rsqrt_ps | RSQRTPS | Computes approximate reciprocal square root | 1/sqrt(a0) | 1/sqrt(a1) | 1/sqrt(a2) | 1/sqrt(a3) |
_mm_min_ss | MINSS | Computes minimum | min(a0, b0) | a1 | a2 | a3 |
_mm_min_ps | MINPS | Computes minimum | min(a0, b0) | min(a1, b1) | min(a2, b2) | min(a3, b3) |
_mm_max_ss | MAXSS | Computes maximum | max(a0, b0) | a1 | a2 | a3 |
_mm_max_ps | MAXPS | Computes maximum | max(a0, b0) | max(a1, b1) | max(a2, b2) | max(a3, b3) |
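The difference between the scalar (_ss) and packed (_ps) forms in the table above is easiest to see side by side. The following is a minimal sketch. It assumes GCC or Clang on x86/x86-64 with SSE enabled, which is the default on x86-64. The helpers get_lane, add_packed, add_scalar and approx_reciprocal are hypothetical and exist only for illustration. The scalar form combines lane 0 only and copies lanes 1 to 3 from a. RCPPS returns an approximation rather than the exact quotient; Intel documents its relative error as at most 1.5 * 2^-12.

```c
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_add_ps, _mm_rcp_ps, ... */

/* Hypothetical helper: spill a vector to memory and return element i (0 = R0). */
static float get_lane(__m128 v, int i)
{
    float tmp[4];
    _mm_storeu_ps(tmp, v);
    return tmp[i];
}

/* Packed form: every lane is combined, R0..R3 = a0+b0 .. a3+b3. */
static __m128 add_packed(__m128 a, __m128 b)
{
    return _mm_add_ps(a, b);
}

/* Scalar form: only lane 0 is combined; R1..R3 are copied from a. */
static __m128 add_scalar(__m128 a, __m128 b)
{
    return _mm_add_ss(a, b);
}

/* RCPPS is an approximation, so compare its result with a tolerance. */
static float approx_reciprocal(float x)
{
    return get_lane(_mm_rcp_ps(_mm_set1_ps(x)), 0);
}
```

For example, with a = [1, 2, 3, 4] and b = [10, 20, 30, 40], add_packed yields [11, 22, 33, 44], while add_scalar yields [11, 2, 3, 4].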
Logical Intrinsics
Intrinsic name | Operation | Corresponding instruction |
---|---|---|
_mm_and_ps | Bitwise AND | ANDPS |
_mm_andnot_ps | Bitwise AND NOT: (NOT a) AND b | ANDNPS |
_mm_or_ps | Bitwise OR | ORPS |
_mm_xor_ps | Bitwise Exclusive OR | XORPS |
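The logical intrinsics operate on the raw bits of the four floats. This makes them useful for sign manipulation and for branch-free selection with a comparison mask. The sketch below is minimal and assumes GCC or Clang with SSE. The names get_lane, abs_ps and select_ps are hypothetical helpers, not library functions.

```c
#include <xmmintrin.h>

/* Hypothetical helper: return element i (0 = R0) of v. */
static float get_lane(__m128 v, int i)
{
    float tmp[4];
    _mm_storeu_ps(tmp, v);
    return tmp[i];
}

/* Absolute value: -0.0f has only the sign bit set, so ANDNPS computes
   (~signbit) & x and clears the sign of every lane. */
static __m128 abs_ps(__m128 x)
{
    return _mm_andnot_ps(_mm_set1_ps(-0.0f), x);
}

/* Branch-free select: (mask & a) | (~mask & b). Lanes where mask is all
   ones take a; lanes where mask is all zeros take b. */
static __m128 select_ps(__m128 mask, __m128 a, __m128 b)
{
    return _mm_or_ps(_mm_and_ps(mask, a), _mm_andnot_ps(mask, b));
}
```

Combining select_ps with the mask from _mm_cmplt_ps(a, b) reproduces an element-wise minimum. This shows how comparison results feed the logical operations.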
Compare Intrinsics
Intrinsic name | Comparison | Corresponding instruction |
---|---|---|
_mm_cmpeq_ss | Equal | CMPEQSS |
_mm_cmpeq_ps | Equal | CMPEQPS |
_mm_cmplt_ss | Less than | CMPLTSS |
_mm_cmplt_ps | Less than | CMPLTPS |
_mm_cmple_ss | Less than or equal | CMPLESS |
_mm_cmple_ps | Less than or equal | CMPLEPS |
_mm_cmpgt_ss | Greater than | CMPLTSS (operands swapped) |
_mm_cmpgt_ps | Greater than | CMPLTPS (operands swapped) |
_mm_cmpge_ss | Greater than or equal | CMPLESS (operands swapped) |
_mm_cmpge_ps | Greater than or equal | CMPLEPS (operands swapped) |
_mm_cmpneq_ss | Not equal | CMPNEQSS |
_mm_cmpneq_ps | Not equal | CMPNEQPS |
_mm_cmpnlt_ss | Not less than | CMPNLTSS |
_mm_cmpnlt_ps | Not less than | CMPNLTPS |
_mm_cmpnle_ss | Not less than or equal | CMPNLESS |
_mm_cmpnle_ps | Not less than or equal | CMPNLEPS |
_mm_cmpngt_ss | Not greater than | CMPNLTSS (operands swapped) |
_mm_cmpngt_ps | Not greater than | CMPNLTPS (operands swapped) |
_mm_cmpnge_ss | Not greater than or equal | CMPNLESS (operands swapped) |
_mm_cmpnge_ps | Not greater than or equal | CMPNLEPS (operands swapped) |
_mm_cmpord_ss | Ordered | CMPORDSS |
_mm_cmpord_ps | Ordered | CMPORDPS |
_mm_cmpunord_ss | Unordered | CMPUNORDSS |
_mm_cmpunord_ps | Unordered | CMPUNORDPS |
_mm_comieq_ss | Equal | COMISS |
_mm_comilt_ss | Less than | COMISS |
_mm_comile_ss | Less than or equal | COMISS |
_mm_comigt_ss | Greater than | COMISS |
_mm_comige_ss | Greater than or equal | COMISS |
_mm_comineq_ss | Not equal | COMISS |
_mm_ucomieq_ss | Equal | UCOMISS |
_mm_ucomilt_ss | Less than | UCOMISS |
_mm_ucomile_ss | Less than or equal | UCOMISS |
_mm_ucomigt_ss | Greater than | UCOMISS |
_mm_ucomige_ss | Greater than or equal | UCOMISS |
_mm_ucomineq_ss | Not equal | UCOMISS |
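The CMPxxPS intrinsics return a vector mask, with each lane all ones or all zeros. The COMISS and UCOMISS intrinsics instead return an int for lane 0 only. The minimal sketch below assumes GCC or Clang with SSE; lt_mask, nlt_mask and scalar_less are hypothetical names. It also shows a NaN subtlety: "not less than" is the complement of "less than", so it is true for unordered lanes.

```c
#include <math.h>       /* NAN */
#include <xmmintrin.h>

/* CMPLTPS sets each lane to all ones or all zeros; MOVMSKPS packs the
   four sign bits into bits 0..3 of an int. */
static int lt_mask(__m128 a, __m128 b)
{
    return _mm_movemask_ps(_mm_cmplt_ps(a, b));
}

/* CMPNLTPS is the logical complement of CMPLTPS, so it is true when
   either operand is NaN (unlike _mm_cmpge_ps). */
static int nlt_mask(__m128 a, __m128 b)
{
    return _mm_movemask_ps(_mm_cmpnlt_ps(a, b));
}

/* COMISS-based intrinsics compare lane 0 only and return 0 or 1. */
static int scalar_less(float x, float y)
{
    return _mm_comilt_ss(_mm_set_ss(x), _mm_set_ss(y));
}
```

With a = [1, 5, 3, NaN] and b = [2, 4, 4, 0], lt_mask returns 0x5 (lanes 0 and 2) and nlt_mask returns 0xA (lanes 1 and 3, including the NaN lane).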
Conversion Operations
Intrinsic name | Corresponding instruction |
---|---|
_mm_cvtss_si32 | CVTSS2SI |
_mm_cvtps_pi32 | CVTPS2PI |
_mm_cvttss_si32 | CVTTSS2SI |
_mm_cvttps_pi32 | CVTTPS2PI |
_mm_cvtsi32_ss | CVTSI2SS |
_mm_cvtpi32_ps | CVTPI2PS |
_mm_cvtpi16_ps | Composite |
_mm_cvtpu16_ps | Composite |
_mm_cvtpi8_ps | Composite |
_mm_cvtpu8_ps | Composite |
_mm_cvtpi32x2_ps | Composite |
_mm_cvtps_pi16 | Composite |
_mm_cvtps_pi8 | Composite |
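The rounding and truncating conversions in the table above differ only in how fractional values are handled. The sketch below is minimal and assumes GCC or Clang with SSE and the default MXCSR state, in which the rounding mode is round-to-nearest-even. It uses _mm_cvtss_f32, an SSE intrinsic not listed in this table, to read lane 0 back. The wrapper names are hypothetical.

```c
#include <xmmintrin.h>

/* CVTSS2SI rounds using the current MXCSR rounding mode
   (round-to-nearest-even unless the program has changed it). */
static int convert_rounded(float x)
{
    return _mm_cvtss_si32(_mm_set_ss(x));
}

/* CVTTSS2SI always truncates toward zero, like a C (int) cast. */
static int convert_truncated(float x)
{
    return _mm_cvttss_si32(_mm_set_ss(x));
}

/* CVTSI2SS replaces lane 0 with (float)i and keeps lanes 1..3 of its
   first argument; _mm_cvtss_f32 extracts lane 0 as a float. */
static float int_to_lane0(int i)
{
    return _mm_cvtss_f32(_mm_cvtsi32_ss(_mm_setzero_ps(), i));
}
```

Under the default mode, 2.5f rounds to 2 and 3.5f rounds to 4 (ties to even). Truncation turns 2.7f into 2 and -2.7f into -2.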
Miscellaneous Intrinsics
Intrinsic name | Operation | Corresponding instruction |
---|---|---|
_mm_shuffle_ps | Shuffles | SHUFPS |
_mm_shuffle_pi16 | Shuffles | PSHUFW |
_mm_unpackhi_ps | Unpacks high | UNPCKHPS |
_mm_unpacklo_ps | Unpacks low | UNPCKLPS |
_mm_loadh_pi | Loads high | MOVHPS reg, mem |
_mm_storeh_pi | Stores high | MOVHPS mem, reg |
_mm_movehl_ps | Moves high to low | MOVHLPS |
_mm_movelh_ps | Moves low to high | MOVLHPS |
_mm_loadl_pi | Loads low | MOVLPS reg, mem |
_mm_storel_pi | Stores low | MOVLPS mem, reg |
_mm_movemask_ps | Creates four-bit mask | MOVMSKPS |
_mm_getcsr | Returns the contents of the MXCSR control/status register | STMXCSR |
_mm_setcsr | Sets the MXCSR control/status register | LDMXCSR |
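SHUFPS together with the _MM_SHUFFLE macro covers many lane rearrangements. Combined with MOVHLPS, it also gives a common horizontal-sum idiom. This is a minimal sketch assuming GCC or Clang with SSE. The names get_lane, reverse_ps and hsum_ps are hypothetical, and this is one of several possible horizontal-sum sequences. _MM_SHUFFLE(z, y, x, w) selects R0 = a[w], R1 = a[x], R2 = b[y], R3 = b[z].

```c
#include <xmmintrin.h>

/* Hypothetical helper: return element i (0 = R0) of v. */
static float get_lane(__m128 v, int i)
{
    float tmp[4];
    _mm_storeu_ps(tmp, v);
    return tmp[i];
}

/* Passing the same vector twice with (0, 1, 2, 3) reverses the lanes. */
static __m128 reverse_ps(__m128 v)
{
    return _mm_shuffle_ps(v, v, _MM_SHUFFLE(0, 1, 2, 3));
}

/* Sum of all four lanes using SHUFPS, MOVHLPS and ADDSS. */
static float hsum_ps(__m128 v)
{
    __m128 shuf = _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 3, 0, 1)); /* [v1, v0, v3, v2] */
    __m128 sums = _mm_add_ps(v, shuf);  /* [v0+v1, v0+v1, v2+v3, v2+v3] */
    shuf = _mm_movehl_ps(shuf, sums);   /* low two lanes = [v2+v3, v2+v3] */
    sums = _mm_add_ss(sums, shuf);      /* lane 0 = (v0+v1) + (v2+v3) */
    return _mm_cvtss_f32(sums);
}
```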
Memory and Initialization Load Operations
Intrinsic name | Operation | Corresponding instruction |
---|---|---|
_mm_load_ss | Loads the low value and clears the three high values | MOVSS |
_mm_load1_ps | Loads one value into all four elements | MOVSS + Shuffling |
_mm_load_ps | Loads four values, address aligned | MOVAPS |
_mm_loadu_ps | Loads four values, address unaligned | MOVUPS |
_mm_loadr_ps | Loads four values in reverse order, address aligned | MOVAPS + Shuffling |
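The load variants differ in alignment requirements and in element order. _mm_load_ps and _mm_loadr_ps need a 16-byte-aligned address, and MOVAPS faults otherwise, whereas _mm_loadu_ps accepts any address. The sketch below is minimal and assumes a C11 compiler (for _Alignas) with SSE. The array data and the wrapper functions are hypothetical.

```c
#include <xmmintrin.h>

/* Hypothetical input: 16-byte aligned so data and data + 4 suit MOVAPS. */
_Alignas(16) static const float data[8] = {0, 1, 2, 3, 4, 5, 6, 7};

/* Hypothetical helper: return element i (0 = R0) of v. */
static float get_lane(__m128 v, int i)
{
    float tmp[4];
    _mm_storeu_ps(tmp, v);
    return tmp[i];
}

static __m128 load_aligned(void)   { return _mm_load_ps(data); }      /* [0, 1, 2, 3] */
static __m128 load_unaligned(void) { return _mm_loadu_ps(data + 1); } /* [1, 2, 3, 4]; data + 1 is only 4-byte aligned */
static __m128 load_broadcast(void) { return _mm_load1_ps(data + 5); } /* [5, 5, 5, 5] */
static __m128 load_reversed(void)  { return _mm_loadr_ps(data + 4); } /* [7, 6, 5, 4]; data + 4 is 16-byte aligned */
```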
Memory and Initialization Set Operations
Intrinsic name | Operation | Corresponding instruction |
---|---|---|
_mm_set_ss | Sets the low value and clears the three high values | Composite |
_mm_set1_ps | Sets all four elements to the same value | Composite |
_mm_set_ps | Sets four values; the first argument becomes the highest element (R3) | Composite |
_mm_setr_ps | Sets four values in reverse order; the first argument becomes the lowest element (R0) | Composite |
_mm_setzero_ps | Clears all four values | Composite |
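Argument order is the usual pitfall with the set intrinsics. _mm_set_ps lists elements from R3 down to R0, while _mm_setr_ps lists them from R0 up to R3. The minimal sketch below assumes GCC or Clang with SSE; get_lane and set_matches_setr are hypothetical helpers.

```c
#include <xmmintrin.h>

/* Hypothetical helper: return element i (0 = R0) of v. */
static float get_lane(__m128 v, int i)
{
    float tmp[4];
    _mm_storeu_ps(tmp, v);
    return tmp[i];
}

/* Both calls build the vector [R0, R1, R2, R3] = [0, 1, 2, 3]. */
static int set_matches_setr(void)
{
    __m128 a = _mm_set_ps(3.0f, 2.0f, 1.0f, 0.0f);
    __m128 b = _mm_setr_ps(0.0f, 1.0f, 2.0f, 3.0f);
    /* All four lanes equal -> all four mask bits set (0xF). */
    return _mm_movemask_ps(_mm_cmpeq_ps(a, b)) == 0xF;
}
```

Note also that _mm_set_ss(x) yields [x, 0, 0, 0], whereas _mm_set1_ps(x) yields [x, x, x, x].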
Memory and Initialization Store Operations
Intrinsic name | Operation | Corresponding instruction |
---|---|---|
_mm_store_ss | Stores the low value | MOVSS |
_mm_store1_ps | Stores the low value into all four elements, address aligned | MOVSS + Shuffling |
_mm_store_ps | Stores four values, address aligned | MOVAPS |
_mm_storeu_ps | Stores four values, address unaligned | MOVUPS |
_mm_storer_ps | Stores four values in reverse order, address aligned | MOVAPS + Shuffling |
_mm_move_ss | Sets the low element from b and passes through the three high elements of a | MOVSS |
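To compare the store variants, one approach is to write the same vector to different offsets of a single buffer. The minimal sketch below assumes a C11 compiler with SSE. The buffer buf, its layout, and the functions store_demo and replace_low are hypothetical, chosen so that the aligned stores land on 16-byte boundaries.

```c
#include <xmmintrin.h>

/* Hypothetical scratch buffer; 16-byte aligned for the MOVAPS-based stores. */
_Alignas(16) static float buf[20];

static const float *store_demo(__m128 v)
{
    _mm_store_ps(buf, v);         /* buf[0..3]   = R0..R3 (aligned) */
    _mm_storer_ps(buf + 4, v);    /* buf[4..7]   = R3..R0 (aligned) */
    _mm_storeu_ps(buf + 9, v);    /* buf[9..12]  = R0..R3 (unaligned address) */
    _mm_store_ss(buf + 13, v);    /* buf[13]     = R0 only */
    _mm_store1_ps(buf + 16, v);   /* buf[16..19] = R0 repeated (aligned) */
    return buf;
}

/* MOVSS between registers: lane 0 from b, lanes 1..3 from a. */
static __m128 replace_low(__m128 a, __m128 b)
{
    return _mm_move_ss(a, b);
}
```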
Integer Intrinsics
Intrinsic name | Operation | Corresponding instruction |
---|---|---|
_mm_extract_pi16 | Extracts one of four words | PEXTRW |
_mm_insert_pi16 | Inserts a word | PINSRW |
_mm_max_pi16 | Computes the maximum | PMAXSW |
_mm_max_pu8 | Computes the maximum, unsigned | PMAXUB |
_mm_min_pi16 | Computes the minimum | PMINSW |
_mm_min_pu8 | Computes the minimum, unsigned | PMINUB |
_mm_movemask_pi8 | Creates an 8-bit mask | PMOVMSKB |
_mm_mulhi_pu16 | Multiplies, returning high bits | PMULHUW |
_mm_shuffle_pi16 | Returns a combination of four words | PSHUFW |
_mm_maskmove_si64 | Conditionally stores the bytes selected by a mask | MASKMOVQ |
_mm_avg_pu8 | Computes rounded average | PAVGB |
_mm_avg_pu16 | Computes rounded average | PAVGW |
_mm_sad_pu8 | Computes sum of absolute differences | PSADBW |
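The integer intrinsics in this table operate on 64-bit __m64 (MMX) values. Code that uses them should therefore clear the MMX state with _mm_empty (EMMS) before any x87 floating-point code runs. The minimal sketch below assumes GCC or Clang on x86/x86-64 (MSVC does not support __m64 intrinsics on x64) and a little-endian layout. The functions sad_u8x8 and avg_u8x8 are hypothetical wrappers.

```c
#include <stdint.h>
#include <string.h>     /* memcpy */
#include <xmmintrin.h>  /* also provides __m64 and _mm_empty */

/* Sum of absolute differences of eight unsigned bytes (PSADBW).
   The 16-bit sum lands in word 0 and is read with PEXTRW. */
static int sad_u8x8(const uint8_t *a, const uint8_t *b)
{
    __m64 va, vb;
    memcpy(&va, a, 8);
    memcpy(&vb, b, 8);
    int sad = _mm_extract_pi16(_mm_sad_pu8(va, vb), 0);
    _mm_empty();  /* EMMS: leave the MMX state clean */
    return sad;
}

/* Rounded byte average (a + b + 1) >> 1 (PAVGB), returned as the
   eight result bytes packed into a uint64_t (byte 0 = lowest). */
static uint64_t avg_u8x8(const uint8_t *a, const uint8_t *b)
{
    __m64 va, vb, avg;
    uint64_t out;
    memcpy(&va, a, 8);
    memcpy(&vb, b, 8);
    avg = _mm_avg_pu8(va, vb);
    memcpy(&out, &avg, 8);
    _mm_empty();
    return out;
}
```

For a = {1, ..., 8} and b = {8, ..., 1}, the absolute differences are 7, 5, 3, 1, 1, 3, 5, 7, so the SAD is 32. Every byte pair sums to 9, so each rounded average is (9 + 1) >> 1 = 5.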
Cache support
void _mm_prefetch(char * p, int i); PREFETCH
Loads one cache line of data from address p to a location closer to the processor. The value i specifies the type of prefetch operation and should be one of the constants _MM_HINT_T0, _MM_HINT_T1, _MM_HINT_T2, or _MM_HINT_NTA, each corresponding to a type of prefetch instruction.
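A typical use is to issue a prefetch some distance ahead of a sequential loop. The minimal sketch below assumes GCC or Clang with SSE and a 64-byte cache line. PREFETCH_DISTANCE and sum_with_prefetch are hypothetical; the best distance depends on the processor and the loop body and should be measured. The pointer is cast to const char * so the call matches both the char * form documented here and compilers that declare a const void * parameter.

```c
#include <stddef.h>
#include <xmmintrin.h>

/* Hypothetical tuning constant: prefetch this many floats ahead. */
#define PREFETCH_DISTANCE 64

static float sum_with_prefetch(const float *p, size_t n)
{
    float total = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        /* One hint per 16 floats, i.e. once per (assumed) 64-byte line;
           the bound check keeps the pointer inside the array. */
        if (i % 16 == 0 && i + PREFETCH_DISTANCE < n)
            _mm_prefetch((const char *)(p + i + PREFETCH_DISTANCE), _MM_HINT_T0);
        total += p[i];
    }
    return total;
}
```

The prefetch is only a hint: it does not change the result, and removing it leaves the loop functionally identical.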
void _mm_stream_pi(__m64 * p, __m64 a); MOVNTQ
Stores the data in a to the address p without polluting the caches. Because this intrinsic uses an MMX register, you must empty the MMX state (with _mm_empty, which issues EMMS) before executing x87 floating-point code. See the Understanding the EMMS Instruction section.
void _mm_stream_ps(float * p, __m128 a); MOVNTPS
Stores the data in a to the address p without polluting the caches. The address must be 16-byte aligned.
void _mm_sfence(void); SFENCE
Guarantees that every preceding store is globally visible before any subsequent store.
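Non-temporal stores are weakly ordered, so a run of _mm_stream_ps calls is normally followed by _mm_sfence before other code relies on the data, for example before setting a flag that another thread polls. The minimal sketch below assumes GCC or Clang with SSE, a 16-byte-aligned destination, and an element count that is a multiple of 4. The function fill_streaming is hypothetical.

```c
#include <stddef.h>
#include <xmmintrin.h>

/* Fills dst[0..n) with value using MOVNTPS.
   Assumptions: dst is 16-byte aligned and n is a multiple of 4. */
static void fill_streaming(float *dst, size_t n, float value)
{
    __m128 v = _mm_set1_ps(value);
    for (size_t i = 0; i < n; i += 4)
        _mm_stream_ps(dst + i, v);  /* write-combining store that avoids the caches */
    /* Make the streamed stores globally visible before any later store. */
    _mm_sfence();
}
```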