# Stefano Tommesani

## Arithmetic Operation Intrinsics

| Intrinsic name | Corresponding instruction | Operation | R0 value | R1 value |
|---|---|---|---|---|
| | | | `a0 [op] b0` | `a1` |
| | | | `a0 [op] b0` | `a1 [op] b1` |
| `_mm_div_sd` | `DIVSD` | Divides | `a0 [op] b0` | `a1` |
| `_mm_div_pd` | `DIVPD` | Divides | `a0 [op] b0` | `a1 [op] b1` |
| `_mm_max_sd` | `MAXSD` | Computes maximum | `a0 [op] b0` | `a1` |
| `_mm_max_pd` | `MAXPD` | Computes maximum | `a0 [op] b0` | `a1 [op] b1` |
| `_mm_min_sd` | `MINSD` | Computes minimum | `a0 [op] b0` | `a1` |
| `_mm_min_pd` | `MINPD` | Computes minimum | `a0 [op] b0` | `a1 [op] b1` |
| `_mm_mul_sd` | `MULSD` | Multiplies | `a0 [op] b0` | `a1` |
| `_mm_mul_pd` | `MULPD` | Multiplies | `a0 [op] b0` | `a1 [op] b1` |
| `_mm_sqrt_sd` | `SQRTSD` | Computes square root | `sqrt(b0)` | `a1` |
| `_mm_sqrt_pd` | `SQRTPD` | Computes square root | `sqrt(a0)` | `sqrt(a1)` |
| `_mm_sub_sd` | `SUBSD` | Subtracts | `a0 [op] b0` | `a1` |
| `_mm_sub_pd` | `SUBPD` | Subtracts | `a0 [op] b0` | `a1 [op] b1` |
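The difference between the scalar (`_sd`) and packed (`_pd`) forms is easiest to see side by side. The following is a minimal sketch, not part of the original reference: the helper names `sub_scalar` and `sub_packed` are hypothetical, and `_mm_loadu_pd` / `_mm_storeu_pd` are SSE2 load/store intrinsics that are not covered in this section.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Scalar form: only element 0 is computed; element 1 is copied from a. */
static void sub_scalar(const double a[2], const double b[2], double r[2])
{
    __m128d va = _mm_loadu_pd(a);
    __m128d vb = _mm_loadu_pd(b);
    _mm_storeu_pd(r, _mm_sub_sd(va, vb));   /* r = { a0 - b0, a1 } */
}

/* Packed form: both elements are computed. */
static void sub_packed(const double a[2], const double b[2], double r[2])
{
    __m128d va = _mm_loadu_pd(a);
    __m128d vb = _mm_loadu_pd(b);
    _mm_storeu_pd(r, _mm_sub_pd(va, vb));   /* r = { a0 - b0, a1 - b1 } */
}
```

With `a = {5.0, 7.0}` and `b = {1.0, 2.0}`, the scalar helper leaves the upper element at `7.0`, while the packed helper subtracts in both positions.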

## Logical Operations

```
__m128d _mm_andnot_pd (__m128d a, __m128d b);
ANDNPD
```

Computes the bitwise `AND` of the 128-bit value in `b` and the bitwise `NOT` of the 128-bit value in `a`.

```
r0 := (~a0) & b0
r1 := (~a1) & b1
```

```
__m128d _mm_and_pd (__m128d a, __m128d b);
ANDPD
```

Computes the bitwise `AND` of the two double-precision, floating-point values of `a` and `b`.

```
r0 := a0 & b0
r1 := a1 & b1
```

```
__m128d _mm_or_pd (__m128d a, __m128d b);
ORPD
```

Computes the bitwise `OR` of the two double-precision, floating-point values of `a` and `b`.

```
r0 := a0 | b0
r1 := a1 | b1
```

```
__m128d _mm_xor_pd (__m128d a, __m128d b);
XORPD
```

Computes the bitwise `XOR` of the two double-precision, floating-point values of `a` and `b`.

```
r0 := a0 ^ b0
r1 := a1 ^ b1
```
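A common use of these bitwise operations is sign manipulation, since the sign of an IEEE 754 double is a single bit. The sketch below is an illustration only: the helper names `abs_pd` and `negate_pd` are hypothetical, and `_mm_set1_pd` is an SSE2 set intrinsic not described in this section. It relies on `-0.0` having only the sign bit set.

```c
#include <emmintrin.h>

/* Absolute value: (~mask) & v clears the sign bit of both elements. */
static __m128d abs_pd(__m128d v)
{
    const __m128d sign_mask = _mm_set1_pd(-0.0);  /* only the sign bits are set */
    return _mm_andnot_pd(sign_mask, v);
}

/* Negation: XOR with the sign mask flips the sign bit of both elements. */
static __m128d negate_pd(__m128d v)
{
    const __m128d sign_mask = _mm_set1_pd(-0.0);
    return _mm_xor_pd(v, sign_mask);
}
```

Note the operand order of `_mm_andnot_pd`: the first argument is the one that gets inverted, so the mask goes first.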

## Comparison Intrinsics

| Intrinsic name | Corresponding instruction | Compare for |
|---|---|---|
| `_mm_cmpeq_pd` | `CMPEQPD` | Equality |
| `_mm_cmplt_pd` | `CMPLTPD` | Less than |
| `_mm_cmple_pd` | `CMPLEPD` | Less than or equal |
| `_mm_cmpgt_pd` | `CMPLTPDr` | Greater than |
| `_mm_cmpge_pd` | `CMPLEPDr` | Greater than or equal |
| `_mm_cmpord_pd` | `CMPORDPD` | Ordered |
| `_mm_cmpunord_pd` | `CMPUNORDPD` | Unordered |
| `_mm_cmpneq_pd` | `CMPNEQPD` | Inequality |
| `_mm_cmpnlt_pd` | `CMPNLTPD` | Not less than |
| `_mm_cmpnle_pd` | `CMPNLEPD` | Not less than or equal |
| `_mm_cmpngt_pd` | `CMPNLTPDr` | Not greater than |
| `_mm_cmpnge_pd` | `CMPNLEPDr` | Not greater than or equal |
| `_mm_cmpeq_sd` | `CMPEQSD` | Equality |
| `_mm_cmplt_sd` | `CMPLTSD` | Less than |
| `_mm_cmple_sd` | `CMPLESD` | Less than or equal |
| `_mm_cmpgt_sd` | `CMPLTSDr` | Greater than |
| `_mm_cmpge_sd` | `CMPLESDr` | Greater than or equal |
| `_mm_cmpord_sd` | `CMPORDSD` | Ordered |
| `_mm_cmpunord_sd` | `CMPUNORDSD` | Unordered |
| `_mm_cmpneq_sd` | `CMPNEQSD` | Inequality |
| `_mm_cmpnlt_sd` | `CMPNLTSD` | Not less than |
| `_mm_cmpnle_sd` | `CMPNLESD` | Not less than or equal |
| `_mm_cmpngt_sd` | `CMPNLTSDr` | Not greater than |
| `_mm_cmpnge_sd` | `CMPNLESDr` | Not greater than or equal |
| `_mm_comieq_sd` | `COMISD` | Equality |
| `_mm_comilt_sd` | `COMISD` | Less than |
| `_mm_comile_sd` | `COMISD` | Less than or equal |
| `_mm_comigt_sd` | `COMISD` | Greater than |
| `_mm_comige_sd` | `COMISD` | Greater than or equal |
| `_mm_comineq_sd` | `COMISD` | Not equal |
| `_mm_ucomieq_sd` | `UCOMISD` | Equality |
| `_mm_ucomilt_sd` | `UCOMISD` | Less than |
| `_mm_ucomile_sd` | `UCOMISD` | Less than or equal |
| `_mm_ucomigt_sd` | `UCOMISD` | Greater than |
| `_mm_ucomige_sd` | `UCOMISD` | Greater than or equal |
| `_mm_ucomineq_sd` | `UCOMISD` | Not equal |

An `r` suffix on the instruction name indicates that the intrinsic is implemented by the instruction with its operands reversed.
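The `_mm_cmp*` intrinsics return a mask (all ones where the comparison holds, all zeros elsewhere), while the `_mm_comi*` and `_mm_ucomi*` intrinsics return an `int` for the lower elements. A minimal sketch of both styles follows; the helper names `select_smaller_pd` and `lower_is_less` are hypothetical, and the mask-and-merge idiom is one possible way to consume the mask, not the only one.

```c
#include <emmintrin.h>

/* Per-element select: take a where a < b, otherwise take b. */
static __m128d select_smaller_pd(__m128d a, __m128d b)
{
    __m128d lt = _mm_cmplt_pd(a, b);           /* all ones where a < b, else zeros */
    return _mm_or_pd(_mm_and_pd(lt, a),        /* keep a where the mask is set */
                     _mm_andnot_pd(lt, b));    /* keep b everywhere else */
}

/* COMISD-based comparison: returns 1 if a0 < b0, otherwise 0. */
static int lower_is_less(__m128d a, __m128d b)
{
    return _mm_comilt_sd(a, b);
}
```

The mask form stays in SIMD registers and avoids branches; the `int` form is convenient when the result drives ordinary control flow.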

## Conversion Operations

| Intrinsic name | Corresponding instruction | Return type | Parameters |
|---|---|---|---|
| `_mm_cvtpd_ps` | `CVTPD2PS` | `__m128` | `(__m128d a)` |
| `_mm_cvtps_pd` | `CVTPS2PD` | `__m128d` | `(__m128 a)` |
| `_mm_cvtepi32_pd` | `CVTDQ2PD` | `__m128d` | `(__m128i a)` |
| `_mm_cvtpd_epi32` | `CVTPD2DQ` | `__m128i` | `(__m128d a)` |
| `_mm_cvtsd_si32` | `CVTSD2SI` | `int` | `(__m128d a)` |
| `_mm_cvtsd_ss` | `CVTSD2SS` | `__m128` | `(__m128 a, __m128d b)` |
| `_mm_cvtsi32_sd` | `CVTSI2SD` | `__m128d` | `(__m128d a, int b)` |
| `_mm_cvtss_sd` | `CVTSS2SD` | `__m128d` | `(__m128d a, __m128 b)` |
| `_mm_cvttpd_epi32` | `CVTTPD2DQ` | `__m128i` | `(__m128d a)` |
| `_mm_cvttsd_si32` | `CVTTSD2SI` | `int` | `(__m128d a)` |
| `_mm_cvtepi32_ps` | `CVTDQ2PS` | `__m128` | `(__m128i a)` |
| `_mm_cvtps_epi32` | `CVTPS2DQ` | `__m128i` | `(__m128 a)` |
| `_mm_cvttps_epi32` | `CVTTPS2DQ` | `__m128i` | `(__m128 a)` |
| `_mm_cvtpd_pi32` | `CVTPD2PI` | `__m64` | `(__m128d a)` |
| `_mm_cvttpd_pi32` | `CVTTPD2PI` | `__m64` | `(__m128d a)` |
| `_mm_cvtpi32_pd` | `CVTPI2PD` | `__m128d` | `(__m64 a)` |
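The `cvt` and `cvtt` variants differ in rounding: `cvt` uses the current rounding mode, while `cvtt` always truncates toward zero. The sketch below illustrates this under the assumption that the MXCSR rounding mode is the default (round to nearest, ties to even). The helper names are hypothetical, and `_mm_loadu_pd` / `_mm_storel_epi64` are used only to move data in and out.

```c
#include <emmintrin.h>

/* Converts two doubles to 32-bit ints using the current rounding mode. */
static void to_int_rounded(const double in[2], int out[2])
{
    __m128i r = _mm_cvtpd_epi32(_mm_loadu_pd(in));
    _mm_storel_epi64((__m128i *)out, r);   /* the two results sit in the low 64 bits */
}

/* Converts two doubles to 32-bit ints, truncating toward zero. */
static void to_int_truncated(const double in[2], int out[2])
{
    __m128i r = _mm_cvttpd_epi32(_mm_loadu_pd(in));
    _mm_storel_epi64((__m128i *)out, r);
}
```

For the input `{2.7, -1.5}` and the default rounding mode, the rounded conversion yields `{3, -2}` and the truncating one yields `{2, -1}`.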

## Miscellaneous Operations

```
__m128d _mm_unpackhi_pd (__m128d a, __m128d b);
UNPCKHPD
```

Interleaves the upper double-precision, floating-point values of `a` and `b`.

```
r0 := a1
r1 := b1
```

```
__m128d _mm_unpacklo_pd (__m128d a, __m128d b);
UNPCKLPD
```

Interleaves the lower double-precision, floating-point values of `a` and `b`.

```
r0 := a0
r1 := b0
```

```
MOVMSKPD
```

Creates a two-bit mask from the sign bits of the two double-precision, floating-point values of `a`.

```
r := sign(a1) << 1 | sign(a0)
```

```
__m128d _mm_shuffle_pd (__m128d a, __m128d b, int i);
SHUFPD
```

Selects two specific double-precision, floating-point values from `a` and `b`, based on the mask `i`. The mask must be an immediate. See the Shuffle Function Macro section for a description of the shuffle semantics.
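Two small idioms built on these operations are sketched below as an illustration. The helper names are hypothetical; `_mm_movemask_pd` is the SSE2 intrinsic that maps to `MOVMSKPD`, and `_mm_add_sd` and `_mm_cvtsd_f64` are SSE2 intrinsics used here only to finish the computation.

```c
#include <emmintrin.h>

/* Horizontal sum: bring the upper element down with UNPCKHPD,
   then add the two lower elements. */
static double hsum_pd(__m128d v)
{
    __m128d hi = _mm_unpackhi_pd(v, v);        /* { v1, v1 } */
    return _mm_cvtsd_f64(_mm_add_sd(v, hi));   /* v0 + v1 */
}

/* Bit 0 is the sign of element 0, bit 1 is the sign of element 1. */
static int sign_bits(__m128d v)
{
    return _mm_movemask_pd(v);
}
```

A result of `0` from `sign_bits` means both elements have a clear sign bit, which makes it a cheap way to test a comparison mask for "no element matched".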

## Integer Arithmetic Operations

| Intrinsic | Instruction | Operation |
|---|---|---|
| `_mm_avg_epu8` | `PAVGB` | Computes average |
| `_mm_avg_epu16` | `PAVGW` | Computes average |
| `_mm_max_epi16` | `PMAXSW` | Computes maxima |
| `_mm_max_epu8` | `PMAXUB` | Computes maxima |
| `_mm_min_epi16` | `PMINSW` | Computes minima |
| `_mm_min_epu8` | `PMINUB` | Computes minima |
| `_mm_mulhi_epi16` | `PMULHW` | Multiplication |
| `_mm_mulhi_epu16` | `PMULHUW` | Multiplication |
| `_mm_mullo_epi16` | `PMULLW` | Multiplication |
| `_mm_mul_su32` | `PMULUDQ` | Multiplication |
| `_mm_mul_epu32` | `PMULUDQ` | Multiplication |
| `_mm_sub_epi8` | `PSUBB` | Subtraction |
| `_mm_sub_epi16` | `PSUBW` | Subtraction |
| `_mm_sub_epi32` | `PSUBD` | Subtraction |
| `_mm_sub_si64` | `PSUBQ` | Subtraction |
| `_mm_sub_epi64` | `PSUBQ` | Subtraction |
| `_mm_subs_epi8` | `PSUBSB` | Subtraction |
| `_mm_subs_epi16` | `PSUBSW` | Subtraction |
| `_mm_subs_epu8` | `PSUBUSB` | Subtraction |
| `_mm_subs_epu16` | `PSUBUSW` | Subtraction |
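The table lists several subtraction variants that differ in overflow behaviour: the plain forms wrap around, while the `subs` forms saturate. The following sketch contrasts them on unsigned bytes and also shows the rounding of `_mm_avg_epu8`. The helper names are hypothetical; each helper broadcasts its scalar arguments with `_mm_set1_epi8` and reads back lane 0 with `_mm_cvtsi128_si32`, which is done only to keep the example small.

```c
#include <emmintrin.h>

/* Returns the lowest byte lane of v. */
static unsigned char low_byte(__m128i v)
{
    return (unsigned char)_mm_cvtsi128_si32(v);
}

/* Wrapping subtraction: the result is taken modulo 256. */
static unsigned char sub_wrap_u8(unsigned char a, unsigned char b)
{
    return low_byte(_mm_sub_epi8(_mm_set1_epi8((char)a), _mm_set1_epi8((char)b)));
}

/* Unsigned saturating subtraction: results below 0 are clamped to 0. */
static unsigned char sub_sat_u8(unsigned char a, unsigned char b)
{
    return low_byte(_mm_subs_epu8(_mm_set1_epi8((char)a), _mm_set1_epi8((char)b)));
}

/* Rounded average: (a + b + 1) >> 1, computed without overflow. */
static unsigned char avg_u8(unsigned char a, unsigned char b)
{
    return low_byte(_mm_avg_epu8(_mm_set1_epi8((char)a), _mm_set1_epi8((char)b)));
}
```

For example, `10 - 20` wraps to `246` with `_mm_sub_epi8` but clamps to `0` with `_mm_subs_epu8`.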

## Logical Operations Intrinsics

For an explanation of the syntax used in code samples in this topic, see Floating-Point Intrinsics Using Streaming SIMD Extensions.

```
__m128i _mm_and_si128 (__m128i a, __m128i b);
PAND
```

Computes the bitwise `AND` of the 128-bit value in `a` and the 128-bit value in `b`.

```
r := a & b
```

```
__m128i _mm_andnot_si128 (__m128i a, __m128i b);
PANDN
```

Computes the bitwise `AND` of the 128-bit value in `b` and the bitwise `NOT` of the 128-bit value in `a`.

```
r := (~a) & b
```

```
__m128i _mm_or_si128 (__m128i a, __m128i b);
POR
```

Computes the bitwise `OR` of the 128-bit value in `a` and the 128-bit value in `b`.

```
r := a | b
```

```
__m128i _mm_xor_si128 (__m128i a, __m128i b);
PXOR
```

Computes the bitwise `XOR` of the 128-bit value in `a` and the 128-bit value in `b`.

```
r := a ^ b
```
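Combining `AND`, `ANDNOT` and `OR` gives a branch-free bit select, which is a frequent building block in SSE2 code. This is a minimal sketch; the helper name `blend_si128` is hypothetical.

```c
#include <emmintrin.h>

/* For every bit: take the bit from a where mask is 1, from b where mask is 0. */
static __m128i blend_si128(__m128i mask, __m128i a, __m128i b)
{
    return _mm_or_si128(_mm_and_si128(mask, a),       /* mask & a    */
                        _mm_andnot_si128(mask, b));   /* (~mask) & b */
}
```

The mask typically comes from one of the integer comparison intrinsics, which produce all ones or all zeros per element.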

## Shift Operation Intrinsics

| Intrinsic | Shift direction | Shift type | Corresponding instruction |
|---|---|---|---|
| `_mm_slli_si128` | Left | Logical | `PSLLDQ` |
| `_mm_slli_epi16` | Left | Logical | `PSLLW` |
| `_mm_sll_epi16` | Left | Logical | `PSLLW` |
| `_mm_slli_epi32` | Left | Logical | `PSLLD` |
| `_mm_sll_epi32` | Left | Logical | `PSLLD` |
| `_mm_slli_epi64` | Left | Logical | `PSLLQ` |
| `_mm_sll_epi64` | Left | Logical | `PSLLQ` |
| `_mm_srai_epi16` | Right | Arithmetic | `PSRAW` |
| `_mm_sra_epi16` | Right | Arithmetic | `PSRAW` |
| `_mm_srli_si128` | Right | Logical | `PSRLDQ` |
| `_mm_srli_epi16` | Right | Logical | `PSRLW` |
| `_mm_srl_epi16` | Right | Logical | `PSRLW` |
| `_mm_srli_epi32` | Right | Logical | `PSRLD` |
| `_mm_srl_epi32` | Right | Logical | `PSRLD` |
| `_mm_srli_epi64` | Right | Logical | `PSRLQ` |
| `_mm_srl_epi64` | Right | Logical | `PSRLQ` |
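Arithmetic right shifts replicate the sign bit, logical right shifts fill with zeros, and the `si128` forms shift the whole register by a number of bytes rather than bits. The sketch below illustrates all three with fixed immediate shift counts; the helper names are hypothetical.

```c
#include <emmintrin.h>

/* Arithmetic shift right by 2: the sign bit is replicated. */
static short sra2_epi16(short x)
{
    return (short)_mm_cvtsi128_si32(_mm_srai_epi16(_mm_set1_epi16(x), 2));
}

/* Logical shift right by 2: zeros are shifted in from the left. */
static unsigned short srl2_epi16(short x)
{
    return (unsigned short)_mm_cvtsi128_si32(_mm_srli_epi16(_mm_set1_epi16(x), 2));
}

/* Whole-register shift left by 4 bytes (one 32-bit element). */
static __m128i shift_left_4_bytes(__m128i v)
{
    return _mm_slli_si128(v, 4);   /* the count is in bytes, not bits */
}
```

For `x = -16` (bit pattern `0xFFF0`), the arithmetic shift gives `-4` while the logical shift gives `0x3FFC`.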

## Conversion Intrinsics

```
__m128i _mm_cvtsi32_si128 (int a);
MOVD
```

Moves 32-bit integer `a` to the least significant 32 bits of an `__m128i` object, zero-extending the upper bits.

```
r0 := a
r1 := 0x0 ; r2 := 0x0 ; r3 := 0x0
```

```
int _mm_cvtsi128_si32 (__m128i a);
MOVD
```

Moves the least significant 32 bits of `a` to a 32-bit integer.

```
r := a0
```

## Comparison Intrinsics

| Intrinsic name | Instruction | Comparison | Elements | Size of elements (bits) |
|---|---|---|---|---|
| `_mm_cmpeq_epi8` | `PCMPEQB` | Equality | 16 | 8 |
| `_mm_cmpeq_epi16` | `PCMPEQW` | Equality | 8 | 16 |
| `_mm_cmpeq_epi32` | `PCMPEQD` | Equality | 4 | 32 |
| `_mm_cmpgt_epi8` | `PCMPGTB` | Greater than | 16 | 8 |
| `_mm_cmpgt_epi16` | `PCMPGTW` | Greater than | 8 | 16 |
| `_mm_cmpgt_epi32` | `PCMPGTD` | Greater than | 4 | 32 |
| `_mm_cmplt_epi8` | `PCMPGTBr` | Less than | 16 | 8 |
| `_mm_cmplt_epi16` | `PCMPGTWr` | Less than | 8 | 16 |
| `_mm_cmplt_epi32` | `PCMPGTDr` | Less than | 4 | 32 |
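Each integer comparison writes all ones into the elements where the condition holds and all zeros elsewhere. As an illustration, the sketch below counts how many of 16 bytes equal a given value. The helper name is hypothetical, and `_mm_loadu_si128` and `_mm_movemask_epi8` are SSE2 intrinsics that do not appear in the table above.

```c
#include <emmintrin.h>

/* Counts how many of the 16 bytes at p are equal to value. */
static int count_equal_bytes16(const unsigned char *p, unsigned char value)
{
    __m128i data = _mm_loadu_si128((const __m128i *)p);
    __m128i eq   = _mm_cmpeq_epi8(data, _mm_set1_epi8((char)value)); /* 0xFF where equal */
    int mask     = _mm_movemask_epi8(eq);   /* one bit per byte element */
    int count    = 0;
    while (mask) {          /* clear the lowest set bit until none remain */
        mask &= mask - 1;
        ++count;
    }
    return count;
}
```

The same pattern, with the bit index of the lowest set bit instead of a count, is a common starting point for SIMD byte searches.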

## Miscellaneous Operations Intrinsics

| Intrinsic | Corresponding instruction | Operation |
|---|---|---|
| `_mm_packs_epi16` | `PACKSSWB` | Packed saturation |
| `_mm_packs_epi32` | `PACKSSDW` | Packed saturation |
| `_mm_packus_epi16` | `PACKUSWB` | Packed saturation |
| `_mm_extract_epi16` | `PEXTRW` | Extraction |
| `_mm_insert_epi16` | `PINSRW` | Insertion |
| `_mm_shuffle_epi32` | `PSHUFD` | Shuffle |
| `_mm_shufflehi_epi16` | `PSHUFHW` | Shuffle |
| `_mm_shufflelo_epi16` | `PSHUFLW` | Shuffle |
| `_mm_unpackhi_epi8` | `PUNPCKHBW` | Interleave |
| `_mm_unpackhi_epi16` | `PUNPCKHWD` | Interleave |
| `_mm_unpackhi_epi32` | `PUNPCKHDQ` | Interleave |
| `_mm_unpackhi_epi64` | `PUNPCKHQDQ` | Interleave |
| `_mm_unpacklo_epi8` | `PUNPCKLBW` | Interleave |
| `_mm_unpacklo_epi16` | `PUNPCKLWD` | Interleave |
| `_mm_unpacklo_epi32` | `PUNPCKLDQ` | Interleave |
| `_mm_unpacklo_epi64` | `PUNPCKLQDQ` | Interleave |
| `_mm_movepi64_pi64` | `MOVDQ2Q` | Move |
| `_mm_movpi64_epi64` | `MOVQ2DQ` | Move |
| `_mm_move_epi64` | `MOVQ` | Move |
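Interleaving with zero and packing with saturation are often used as a pair: widen bytes to 16 bits, do arithmetic with headroom, then narrow back with clamping. The sketch below shows that round trip under the assumption that the input is 8 unsigned bytes; the helper name is hypothetical, and `_mm_loadl_epi64`, `_mm_storel_epi64`, `_mm_add_epi16`, `_mm_set1_epi16` and `_mm_setzero_si128` are SSE2 intrinsics used as plumbing.

```c
#include <emmintrin.h>

/* Adds bias to 8 unsigned bytes, clamping each result to 0..255. */
static void add_bias_u8x8(const unsigned char in[8], short bias, unsigned char out[8])
{
    __m128i zero   = _mm_setzero_si128();
    __m128i bytes  = _mm_loadl_epi64((const __m128i *)in);   /* 8 bytes in the low half */
    __m128i wide   = _mm_unpacklo_epi8(bytes, zero);         /* zero-extend to 8 x 16 bits */
    __m128i sum    = _mm_add_epi16(wide, _mm_set1_epi16(bias));
    __m128i packed = _mm_packus_epi16(sum, zero);            /* signed 16 -> unsigned 8, saturated */
    _mm_storel_epi64((__m128i *)out, packed);
}
```

Interleaving with a zero register is the SSE2 way to zero-extend, since the instruction set of that generation has no dedicated widening conversion for integers.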

## Cache Support Intrinsics

```
void _mm_stream_pd (double *p, __m128d a);
MOVNTPD
```

Stores the data in `a` to the address `p` without polluting the caches. The address `p` must be 16-byte aligned. If the cache line containing address `p` is already in the cache, the cache will be updated.

```
p[0] := a0
p[1] := a1
```

```
__m128i _mm_load_si128 (__m128i *p);
MOVDQA
```

Loads a 128-bit value. Address `p` must be 16-byte aligned.

```
r := *p
```

```
MOVDQU
```

Loads a 128-bit value. Address `p` does not need to be 16-byte aligned.

```
r := *p
```

```
MOVQ
```

Loads the lower 64 bits of the value pointed to by `p` into the lower 64 bits of the result, zeroing the upper 64 bits of the result.

```
r0 := *p[63:0]
r1 := 0x0
```

## Integer Set Operation Intrinsics

| Intrinsic | Corresponding instruction |
|---|---|
| `_mm_set_epi64` | Composite |
| `_mm_set_epi32` | Composite |
| `_mm_set_epi16` | Composite |
| `_mm_set_epi8` | Composite |
| `_mm_set1_epi64` | Composite |
| `_mm_set1_epi32` | Composite |
| `_mm_set1_epi16` | Composite |
| `_mm_set1_epi8` | Composite |
| `_mm_setr_epi64` | Composite |
| `_mm_setr_epi32` | Composite |
| `_mm_setr_epi16` | Composite |
| `_mm_setr_epi8` | Composite |
| `_mm_setzero_si128` | `PXOR` |

## Integer Store Operation Intrinsics

```
void _mm_store_si128 (__m128i *p, __m128i a);
MOVDQA
```

Stores a 128-bit value. Address `p` must be 16-byte aligned.

```
*p := a
```

```
void _mm_storeu_si128 (__m128i *p, __m128i a);
MOVDQU
```

Stores a 128-bit value. Address `p` does not need to be 16-byte aligned.

```
*p := a
```

```
void _mm_maskmoveu_si128(__m128i d, __m128i n, char *p);
```

Conditionally stores byte elements of `d` to address `p`. The high bit of each byte in the selector `n` determines whether the corresponding byte in `d` will be stored. Address `p` does not need to be 16-byte aligned.

```
if (n0[7]) p[0] := d0
if (n1[7]) p[1] := d1
...
if (n15[7]) p[15] := d15
```

```
void _mm_storel_epi64(__m128i *p, __m128i a);
MOVQ
```

Stores the lower 64 bits of `a` to the address `p`.

```
*p[63:0] := a0
```
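The unaligned and partial stores can be sketched as follows. This is an illustration only: the helper names are hypothetical, and `_mm_loadu_si128` is the unaligned 128-bit load that pairs with `_mm_storeu_si128`.

```c
#include <emmintrin.h>

/* Copies 16 bytes between arbitrarily aligned addresses (MOVDQU load and store). */
static void copy16_unaligned(void *dst, const void *src)
{
    _mm_storeu_si128((__m128i *)dst, _mm_loadu_si128((const __m128i *)src));
}

/* Writes only the low 8 bytes of v; memory after dst[7] is left untouched. */
static void store_low8(void *dst, __m128i v)
{
    _mm_storel_epi64((__m128i *)dst, v);
}
```

Using `_mm_store_si128` on an address that is not 16-byte aligned typically raises a fault, so the unaligned form is the safer default when alignment is not guaranteed.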

## Cache Support

```
void _mm_stream_si128(__m128i *p, __m128i a)
```

Stores the data in `a` to the address `p` without polluting the caches. If the cache line containing address `p` is already in the cache, the cache will be updated. Address `p` must be 16-byte aligned.

```
*p := a
```

```
void _mm_stream_si32(int *p, int a)
```

Stores the data in `a` to the address `p` without polluting the caches. If the cache line containing address `p` is already in the cache, the cache will be updated.

```
*p := a
```

```
void _mm_clflush(void const*p)
```

Cache line containing `p` is flushed and invalidated from all caches in the coherency domain.

```
void _mm_lfence(void)
```

Guarantees that every load instruction that precedes, in program order, the load fence instruction is globally visible before any load instruction that follows the fence in program order.

```
void _mm_mfence(void)
```

Guarantees that every memory access that precedes, in program order, the memory fence instruction is globally visible before any memory instruction that follows the fence in program order.

```
void _mm_pause(void)
```

The execution of the next instruction is delayed by an implementation-specific amount of time. The instruction does not modify the architectural state.
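Streaming stores are typically used to fill or copy large buffers that will not be read again soon, followed by a fence before other code depends on the data. The sketch below is one possible arrangement, not a tuned implementation: the helper name `fill_streaming` is hypothetical, and using `_mm_mfence` as the closing fence is a conservative choice made for this example.

```c
#include <emmintrin.h>
#include <stddef.h>

/* Fills count 16-byte blocks starting at dst with value, bypassing the caches.
   dst must be 16-byte aligned. */
static void fill_streaming(__m128i *dst, size_t count, int value)
{
    __m128i v = _mm_set1_epi32(value);
    for (size_t i = 0; i < count; ++i)
        _mm_stream_si128(dst + i, v);   /* non-temporal 128-bit store */
    _mm_mfence();   /* order the streaming stores before later memory accesses */
}
```

An array declared as `__m128i buf[N]` is naturally 16-byte aligned, which is a simple way to satisfy the alignment requirement in small tests.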

## Shuffle Function Macro

```
_MM_SHUFFLE2(x, y)
/* expands to the value of */
(x<<1) | y
```

You can view the two integers as selectors: `y` chooses which of the two values of the first input operand is placed in the lower half of the result, and `x` chooses which of the two values of the second input operand is placed in the upper half.
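A minimal sketch of the macro in use with `_mm_shuffle_pd` follows. The helper names are hypothetical; the selector behaviour assumed here is that bit 0 of the mask picks the element of the first operand for `r0` and bit 1 picks the element of the second operand for `r1`.

```c
#include <emmintrin.h>

/* Swaps the two elements of v: r0 := v1, r1 := v0. */
static __m128d swap_halves(__m128d v)
{
    /* y = 1 picks element 1 of the first operand for r0,
       x = 0 picks element 0 of the second operand for r1. */
    return _mm_shuffle_pd(v, v, _MM_SHUFFLE2(0, 1));
}

/* Combines the low element of a with the high element of b: r0 := a0, r1 := b1. */
static __m128d low_of_a_high_of_b(__m128d a, __m128d b)
{
    return _mm_shuffle_pd(a, b, _MM_SHUFFLE2(1, 0));
}
```

Because the mask must be an immediate, the arguments of `_MM_SHUFFLE2` have to be compile-time constants.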

Figure: View of Original and Result Words with Shuffle Function Macro

Last Updated on Monday, 27 May 2013 15:35
