# Stefano Tommesani

## Packed Arithmetic Intrinsics

R0 to R3 are the four elements of the result, with R0 the lowest; `a` and `b` are the operands, and `[op]` stands for the operation named in the Operation column.

| Intrinsic | Instruction | Operation | R0 | R1 | R2 | R3 |
| --- | --- | --- | --- | --- | --- | --- |
| `_mm_add_ss` | ADDSS | Adds | `a0 [op] b0` | `a1` | `a2` | `a3` |
| `_mm_add_ps` | ADDPS | Adds | `a0 [op] b0` | `a1 [op] b1` | `a2 [op] b2` | `a3 [op] b3` |
| `_mm_sub_ss` | SUBSS | Subtracts | `a0 [op] b0` | `a1` | `a2` | `a3` |
| `_mm_sub_ps` | SUBPS | Subtracts | `a0 [op] b0` | `a1 [op] b1` | `a2 [op] b2` | `a3 [op] b3` |
| `_mm_mul_ss` | MULSS | Multiplies | `a0 [op] b0` | `a1` | `a2` | `a3` |
| `_mm_mul_ps` | MULPS | Multiplies | `a0 [op] b0` | `a1 [op] b1` | `a2 [op] b2` | `a3 [op] b3` |
| `_mm_div_ss` | DIVSS | Divides | `a0 [op] b0` | `a1` | `a2` | `a3` |
| `_mm_div_ps` | DIVPS | Divides | `a0 [op] b0` | `a1 [op] b1` | `a2 [op] b2` | `a3 [op] b3` |
| `_mm_sqrt_ss` | SQRTSS | Computes square root | `[op] a0` | `a1` | `a2` | `a3` |
| `_mm_sqrt_ps` | SQRTPS | Computes square root | `[op] a0` | `[op] a1` | `[op] a2` | `[op] a3` |
| `_mm_rcp_ss` | RCPSS | Computes reciprocal | `[op] a0` | `a1` | `a2` | `a3` |
| `_mm_rcp_ps` | RCPPS | Computes reciprocal | `[op] a0` | `[op] a1` | `[op] a2` | `[op] a3` |
| `_mm_rsqrt_ss` | RSQRTSS | Computes reciprocal square root | `[op] a0` | `a1` | `a2` | `a3` |
| `_mm_rsqrt_ps` | RSQRTPS | Computes reciprocal square root | `[op] a0` | `[op] a1` | `[op] a2` | `[op] a3` |
| `_mm_min_ss` | MINSS | Computes minimum | `[op](a0, b0)` | `a1` | `a2` | `a3` |
| `_mm_min_ps` | MINPS | Computes minimum | `[op](a0, b0)` | `[op](a1, b1)` | `[op](a2, b2)` | `[op](a3, b3)` |
| `_mm_max_ss` | MAXSS | Computes maximum | `[op](a0, b0)` | `a1` | `a2` | `a3` |
| `_mm_max_ps` | MAXPS | Computes maximum | `[op](a0, b0)` | `[op](a1, b1)` | `[op](a2, b2)` | `[op](a3, b3)` |
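
The difference between the scalar (`_ss`) and packed (`_ps`) forms can be illustrated with a minimal sketch. The function names `add_scalar` and `add_packed` are hypothetical, and `_mm_loadu_ps` and `_mm_storeu_ps` are used only to move data between plain arrays and SSE registers.

```c
#include <xmmintrin.h>

/* Adds only the lowest elements; elements 1..3 of the result are copied from a. */
void add_scalar(const float a[4], const float b[4], float out[4])
{
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ss(va, vb));
}

/* Adds all four pairs of elements. */
void add_packed(const float a[4], const float b[4], float out[4])
{
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));
}
```

The other `_ss`/`_ps` pairs of arithmetic intrinsics follow the same pattern, so swapping in `_mm_mul_ss` and `_mm_mul_ps` should behave analogously.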

## Logical Intrinsics

| Intrinsic name | Operation | Corresponding instruction |
| --- | --- | --- |
| `_mm_and_ps` | Bitwise AND | ANDPS |
| `_mm_andnot_ps` | Bitwise AND NOT: `(~a) & b` | ANDNPS |
| `_mm_or_ps` | Bitwise OR | ORPS |
| `_mm_xor_ps` | Bitwise Exclusive OR | XORPS |
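
A common use of the logical intrinsics is manipulating the sign bit of floating-point values. The sketch below, with the hypothetical name `abs4`, computes four absolute values by clearing the sign bits with `_mm_andnot_ps`; it assumes IEEE 754 single-precision floats, where `-0.0f` has only the sign bit set.

```c
#include <xmmintrin.h>

/* Clears the sign bit of each of the four floats: |x| = (~sign_mask) & x. */
void abs4(const float in[4], float out[4])
{
    const __m128 sign_mask = _mm_set1_ps(-0.0f); /* only the sign bit set in each element */
    __m128 v = _mm_loadu_ps(in);
    _mm_storeu_ps(out, _mm_andnot_ps(sign_mask, v)); /* first operand is the one inverted */
}
```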

## Compare Intrinsics

| Intrinsic name | Comparison | Corresponding instruction |
| --- | --- | --- |
| `_mm_cmpeq_ss` | Equal | CMPEQSS |
| `_mm_cmpeq_ps` | Equal | CMPEQPS |
| `_mm_cmplt_ss` | Less than | CMPLTSS |
| `_mm_cmplt_ps` | Less than | CMPLTPS |
| `_mm_cmple_ss` | Less than or equal | CMPLESS |
| `_mm_cmple_ps` | Less than or equal | CMPLEPS |
| `_mm_cmpgt_ss` | Greater than | CMPLTSS (operands swapped) |
| `_mm_cmpgt_ps` | Greater than | CMPLTPS (operands swapped) |
| `_mm_cmpge_ss` | Greater than or equal | CMPLESS (operands swapped) |
| `_mm_cmpge_ps` | Greater than or equal | CMPLEPS (operands swapped) |
| `_mm_cmpneq_ss` | Not equal | CMPNEQSS |
| `_mm_cmpneq_ps` | Not equal | CMPNEQPS |
| `_mm_cmpnlt_ss` | Not less than | CMPNLTSS |
| `_mm_cmpnlt_ps` | Not less than | CMPNLTPS |
| `_mm_cmpnle_ss` | Not less than or equal | CMPNLESS |
| `_mm_cmpnle_ps` | Not less than or equal | CMPNLEPS |
| `_mm_cmpngt_ss` | Not greater than | CMPNLTSS (operands swapped) |
| `_mm_cmpngt_ps` | Not greater than | CMPNLTPS (operands swapped) |
| `_mm_cmpnge_ss` | Not greater than or equal | CMPNLESS (operands swapped) |
| `_mm_cmpnge_ps` | Not greater than or equal | CMPNLEPS (operands swapped) |
| `_mm_cmpord_ss` | Ordered | CMPORDSS |
| `_mm_cmpord_ps` | Ordered | CMPORDPS |
| `_mm_cmpunord_ss` | Unordered | CMPUNORDSS |
| `_mm_cmpunord_ps` | Unordered | CMPUNORDPS |
| `_mm_comieq_ss` | Equal | COMISS |
| `_mm_comilt_ss` | Less than | COMISS |
| `_mm_comile_ss` | Less than or equal | COMISS |
| `_mm_comigt_ss` | Greater than | COMISS |
| `_mm_comige_ss` | Greater than or equal | COMISS |
| `_mm_comineq_ss` | Not equal | COMISS |
| `_mm_ucomieq_ss` | Equal | UCOMISS |
| `_mm_ucomilt_ss` | Less than | UCOMISS |
| `_mm_ucomile_ss` | Less than or equal | UCOMISS |
| `_mm_ucomigt_ss` | Greater than | UCOMISS |
| `_mm_ucomige_ss` | Greater than or equal | UCOMISS |
| `_mm_ucomineq_ss` | Not equal | UCOMISS |
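
The `_mm_cmp*` intrinsics return a mask (all ones or all zeros per element) rather than a boolean, while the `_mm_comi*` and `_mm_ucomi*` intrinsics return an `int`. The following sketch shows one way to use both; the function names `select_smaller` and `lowest_is_less` are hypothetical, and the mask-based selection is only an illustration of the technique (in practice `_mm_min_ps` would do the same job).

```c
#include <xmmintrin.h>

/* Branch-free selection with a compare mask: out[i] = a[i] < b[i] ? a[i] : b[i]. */
void select_smaller(const float a[4], const float b[4], float out[4])
{
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    __m128 mask = _mm_cmplt_ps(va, vb);            /* all ones where a < b, else all zeros */
    __m128 r = _mm_or_ps(_mm_and_ps(mask, va),     /* keep a where the mask is set */
                         _mm_andnot_ps(mask, vb)); /* keep b everywhere else */
    _mm_storeu_ps(out, r);
}

/* Compares only the lowest elements and returns 1 or 0. */
int lowest_is_less(const float a[4], const float b[4])
{
    return _mm_comilt_ss(_mm_loadu_ps(a), _mm_loadu_ps(b));
}
```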

## Conversion Operations

| Intrinsic name | Corresponding instruction |
| --- | --- |
| `_mm_cvtss_si32` | CVTSS2SI |
| `_mm_cvtps_pi32` | CVTPS2PI |
| `_mm_cvttss_si32` | CVTTSS2SI |
| `_mm_cvttps_pi32` | CVTTPS2PI |
| `_mm_cvtsi32_ss` | CVTSI2SS |
| `_mm_cvtpi32_ps` | CVTPI2PS |
| `_mm_cvtpi16_ps` | Composite |
| `_mm_cvtpu16_ps` | Composite |
| `_mm_cvtpi8_ps` | Composite |
| `_mm_cvtpu8_ps` | Composite |
| `_mm_cvtpi32x2_ps` | Composite |
| `_mm_cvtps_pi16` | Composite |
| `_mm_cvtps_pi8` | Composite |
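
The `cvt` and `cvtt` variants differ in how they round. As a small sketch (the function names are hypothetical), `_mm_cvtss_si32` follows the rounding mode currently set in the MXCSR register, which is assumed here to be the default round-to-nearest-even, while `_mm_cvttss_si32` always truncates toward zero.

```c
#include <xmmintrin.h>

/* Converts using the current MXCSR rounding mode (round-to-nearest-even by default). */
int to_int_rounded(float x)
{
    return _mm_cvtss_si32(_mm_set_ss(x));
}

/* Converts by truncating toward zero, regardless of the rounding mode. */
int to_int_truncated(float x)
{
    return _mm_cvttss_si32(_mm_set_ss(x));
}
```

With the default rounding mode, a value such as 2.7 should convert to 3 with the first function and to 2 with the second.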

## Miscellaneous Intrinsics

| Intrinsic name | Operation | Corresponding instruction |
| --- | --- | --- |
| `_mm_shuffle_ps` | Shuffles | SHUFPS |
| `_mm_shuffle_pi16` | Shuffles | PSHUFW |
| `_mm_unpackhi_ps` | Unpacks high | UNPCKHPS |
| `_mm_unpacklo_ps` | Unpacks low | UNPCKLPS |
| `_mm_storeh_pi` | Stores high | MOVHPS mem, reg |
| `_mm_movehl_ps` | Moves high to low | MOVHLPS |
| `_mm_movelh_ps` | Moves low to high | MOVLHPS |
| `_mm_storel_pi` | Stores low | MOVLPS mem, reg |
| `_mm_getcsr` | Returns register contents | STMXCSR |
| `_mm_setcsr` | Sets control register | LDMXCSR |
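
As an illustration of the shuffle and move intrinsics, here is a minimal sketch with two hypothetical helpers: `reverse4` reverses the element order with `_mm_shuffle_ps`, and `sum4` adds the four elements together using `_mm_movehl_ps`. The `_MM_SHUFFLE` macro comes from `xmmintrin.h`; its arguments select the source element for result elements 3, 2, 1 and 0, in that order.

```c
#include <xmmintrin.h>

/* Reverses the four elements: out = {in[3], in[2], in[1], in[0]}. */
void reverse4(const float in[4], float out[4])
{
    __m128 v = _mm_loadu_ps(in);
    /* result[3] = v[0], result[2] = v[1], result[1] = v[2], result[0] = v[3] */
    _mm_storeu_ps(out, _mm_shuffle_ps(v, v, _MM_SHUFFLE(0, 1, 2, 3)));
}

/* Horizontal sum of the four elements. */
float sum4(const float in[4])
{
    __m128 v  = _mm_loadu_ps(in);                 /* {v0, v1, v2, v3} */
    __m128 hi = _mm_movehl_ps(v, v);              /* {v2, v3, v2, v3} */
    __m128 s  = _mm_add_ps(v, hi);                /* {v0+v2, v1+v3, ...} */
    __m128 s1 = _mm_shuffle_ps(s, s, _MM_SHUFFLE(1, 1, 1, 1)); /* broadcast element 1 */
    float result;
    _mm_store_ss(&result, _mm_add_ss(s, s1));     /* (v0+v2) + (v1+v3) */
    return result;
}
```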

## Memory and Initialization Load Operations

| Intrinsic name | Operation | Corresponding instruction |
| --- | --- | --- |
| `_mm_load_ss` | Loads the low value and clears the three high values | MOVSS |

## Memory and Initialization Set Operations

| Intrinsic name | Operation | Corresponding instruction |
| --- | --- | --- |
| `_mm_set_ss` | Sets the low value and clears the three high values | Composite |
| `_mm_set1_ps` | Sets all four values to the same value | Composite |
| `_mm_set_ps` | Sets four values, with arguments given from the highest element to the lowest | Composite |
| `_mm_setr_ps` | Sets four values, in reverse order (lowest element first) | Composite |
| `_mm_setzero_ps` | Clears all four values | Composite |
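
The argument order of `_mm_set_ps` and `_mm_setr_ps` is easy to confuse, so a short sketch may help. The `demo_*` function names are hypothetical; each one writes the register to memory with `_mm_storeu_ps`, so `out[0]` holds the lowest element.

```c
#include <xmmintrin.h>

/* First argument becomes the highest element: memory order is {1, 2, 3, 4}. */
void demo_set(float out[4])
{
    _mm_storeu_ps(out, _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f));
}

/* First argument becomes the lowest element: memory order is {4, 3, 2, 1}. */
void demo_setr(float out[4])
{
    _mm_storeu_ps(out, _mm_setr_ps(4.0f, 3.0f, 2.0f, 1.0f));
}

/* All four elements receive the same value. */
void demo_set1(float out[4])
{
    _mm_storeu_ps(out, _mm_set1_ps(7.0f));
}

/* Only the lowest element is set; the three high elements are zero. */
void demo_set_ss(float out[4])
{
    _mm_storeu_ps(out, _mm_set_ss(7.0f));
}
```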

## Memory and Initialization Store Operations

| Intrinsic name | Operation | Corresponding instruction |
| --- | --- | --- |
| `_mm_store_ss` | Stores the low value | MOVSS |
| `_mm_store1_ps` | Stores the low value across all four words | MOVSS + shuffling |
| `_mm_store_ps` | Stores four values, address aligned | MOVAPS |
| `_mm_storeu_ps` | Stores four values, address unaligned | MOVUPS |
| `_mm_storer_ps` | Stores four values, in reverse order | MOVAPS + shuffling |
| `_mm_move_ss` | Sets the low word, and passes in three high values | MOVSS |
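
Two of these operations can be sketched as follows. The helper names `store_lowest` and `replace_lowest` are hypothetical, and unaligned loads and stores are used so that the example makes no assumption about the alignment of the caller's arrays.

```c
#include <xmmintrin.h>

/* Writes only the lowest element of src to *dst; dst[1..3] are left untouched. */
void store_lowest(float *dst, const float src[4])
{
    _mm_store_ss(dst, _mm_loadu_ps(src));
}

/* Result takes its lowest element from b and its three high elements from a. */
void replace_lowest(const float a[4], const float b[4], float out[4])
{
    _mm_storeu_ps(out, _mm_move_ss(_mm_loadu_ps(a), _mm_loadu_ps(b)));
}
```

Note that `_mm_store_ps`, `_mm_store1_ps` and `_mm_storer_ps` would additionally require the destination to be 16-byte aligned.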

## Integer Intrinsics

| Intrinsic name | Operation | Corresponding instruction |
| --- | --- | --- |
| `_mm_extract_pi16` | Extracts one of four words | PEXTRW |
| `_mm_insert_pi16` | Inserts a word | PINSRW |
| `_mm_max_pi16` | Computes the maximum | PMAXSW |
| `_mm_max_pu8` | Computes the maximum, unsigned | PMAXUB |
| `_mm_min_pi16` | Computes the minimum | PMINSW |
| `_mm_min_pu8` | Computes the minimum, unsigned | PMINUB |
| `_mm_mulhi_pu16` | Multiplies, returning high bits | PMULHUW |
| `_mm_shuffle_pi16` | Returns a combination of four words | PSHUFW |
| `_mm_avg_pu8` | Computes rounded average | PAVGB |
| `_mm_avg_pu16` | Computes rounded average | PAVGW |
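
The integer intrinsics operate on the 64-bit MMX type `__m64`. The sketch below is an illustration under some assumptions: the helper names `max4_i16` and `avg_u8` are hypothetical, the MMX helpers `_mm_set_pi16`, `_mm_set1_pi8`, `_mm_cvtsi64_si32` and `_mm_empty` come from `mmintrin.h` (pulled in by `xmmintrin.h`), and the toolchain is assumed to still provide `__m64`, which may not hold for every compiler on 64-bit targets.

```c
#include <xmmintrin.h>

/* Element-wise signed maximum of two groups of four 16-bit words. */
void max4_i16(const short a[4], const short b[4], short out[4])
{
    __m64 va = _mm_set_pi16(a[3], a[2], a[1], a[0]); /* highest word first */
    __m64 vb = _mm_set_pi16(b[3], b[2], b[1], b[0]);
    __m64 r  = _mm_max_pi16(va, vb);
    /* The index passed to _mm_extract_pi16 must be a compile-time constant. */
    out[0] = (short)_mm_extract_pi16(r, 0);
    out[1] = (short)_mm_extract_pi16(r, 1);
    out[2] = (short)_mm_extract_pi16(r, 2);
    out[3] = (short)_mm_extract_pi16(r, 3);
    _mm_empty(); /* leave the MMX state before any x87 floating-point code runs */
}

/* Rounded average of two unsigned bytes: (x + y + 1) >> 1. */
unsigned char avg_u8(unsigned char x, unsigned char y)
{
    __m64 r = _mm_avg_pu8(_mm_set1_pi8((char)x), _mm_set1_pi8((char)y));
    int low = _mm_cvtsi64_si32(r) & 0xFF; /* lowest byte of the result */
    _mm_empty();
    return (unsigned char)low;
}
```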

## Cache support

```c
void _mm_prefetch(char * p, int i);
```

Corresponding instruction: `PREFETCH`

Loads one cache line of data from address `p` to a location closer to the processor. The value `i` specifies the type of prefetch operation. Use one of the constants `_MM_HINT_T0`, `_MM_HINT_T1`, `_MM_HINT_T2`, or `_MM_HINT_NTA`, each of which corresponds to one type of prefetch instruction.
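
A typical pattern is to prefetch data a fixed distance ahead of the element currently being processed. The following is only a sketch: the function name `sum_with_prefetch`, the assumed 64-byte cache line and the prefetch distance are all illustrative choices, and whether prefetching helps at all has to be measured on the target processor.

```c
#include <stddef.h>
#include <xmmintrin.h>

/* Sums n floats, hinting the cache about data a fixed distance ahead. */
float sum_with_prefetch(const float *data, size_t n)
{
    const size_t line  = 16;       /* floats per cache line, assuming 64-byte lines */
    const size_t ahead = 4 * line; /* hypothetical prefetch distance, in elements */
    float total = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        /* Issue one prefetch per cache line, and only for addresses inside the array. */
        if (i % line == 0 && i + ahead < n)
            _mm_prefetch((const char *)(data + i + ahead), _MM_HINT_T0);
        total += data[i];
    }
    return total;
}
```

A prefetch is only a hint and does not change the result of the computation, so the function returns the same sum with or without it.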

```c
void _mm_stream_pi(__m64 * p, __m64 a);
```

Corresponding instruction: `MOVNTQ`

Stores the data in `a` to the address `p` without polluting the caches. This intrinsic uses an MMX register, so you must empty the multimedia state afterwards; see the section "Understanding the EMMS Instruction".

```c
void _mm_stream_ps(float * p, __m128 a);
```

Corresponding instruction: `MOVNTPS`

Stores the data in `a` to the address `p` without polluting the caches. The address must be 16-byte aligned.

```c
void _mm_sfence(void);
```

Corresponding instruction: `SFENCE`

Guarantees that every preceding store is globally visible before any subsequent store.
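
Streaming stores and the store fence are often used together when writing a large buffer that will not be read again soon. As a minimal sketch, the hypothetical function `fill_streaming` below assumes that `dst` is 16-byte aligned and that `n` is a multiple of 4.

```c
#include <stddef.h>
#include <xmmintrin.h>

/* Fills dst[0..n) with value using non-temporal stores.
   Assumes dst is 16-byte aligned and n is a multiple of 4. */
void fill_streaming(float *dst, size_t n, float value)
{
    const __m128 v = _mm_set1_ps(value);
    for (size_t i = 0; i < n; i += 4)
        _mm_stream_ps(dst + i, v); /* bypasses the caches */
    _mm_sfence(); /* make the streamed stores globally visible before later stores */
}
```

An aligned buffer can be obtained, for example, with `_mm_malloc(size, 16)` and released with `_mm_free`.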

SSE Intrinsics
Thursday, 27 May 2010

Last Updated on Monday, 27 May 2013 15:09