Stefano Tommesani

SSE Arithmetic

ADDPS (parallel) and ADDSS (scalar) add the pair of operands.
SUBPS (parallel) and SUBSS (scalar) subtract the pair of operands.
MULPS (parallel) and MULSS (scalar) multiply the pair of operands.
DIVPS (parallel) and DIVSS (scalar) divide the pair of operands.
SQRTPS (parallel) and SQRTSS (scalar) return the square root of the source operand to the destination register.
MAXPS (parallel) and MAXSS (scalar) return the maximum of the pair of operands: DestReg[i] = Max(DestReg[i], SrcReg[i]) 
MINPS (parallel) and MINSS (scalar) return the minimum of the pair of operands: DestReg[i] = Min(DestReg[i], SrcReg[i])

ADDSS xmm1, xmm2/m32

Adds the low single-precision floating-point values from the source operand (second operand) and the destination operand (first operand), and stores the single-precision floating-point result in the destination operand. The source operand can be an XMM register or a 32-bit memory location. The destination operand is an XMM register. The three high-order doublewords of the destination operand remain unchanged.

DEST[31-0] ← DEST[31-0] + SRC[31-0];
ADDSS __m128 _mm_add_ss(__m128 a, __m128 b)
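
A minimal sketch of the scalar behaviour described above, assuming a compiler that provides `<xmmintrin.h>`. The helper name `add_low_lane` is hypothetical and used only for illustration. It shows that only the lowest lane is summed, while the three upper lanes of the result come from the first operand.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: adds only lane 0 of a and b.
   Lanes 1..3 of out are copied unchanged from a. */
void add_low_lane(const float a[4], const float b[4], float out[4])
{
    __m128 va = _mm_loadu_ps(a);      /* unaligned load of four floats */
    __m128 vb = _mm_loadu_ps(b);
    __m128 vr = _mm_add_ss(va, vb);   /* usually emitted as ADDSS */
    _mm_storeu_ps(out, vr);
}
```

With a = {1, 2, 3, 4} and b = {10, 20, 30, 40}, the result holds 11 in the first element and 2, 3, 4 in the remaining ones.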

ADDPS xmm1, xmm2/m128

Performs a SIMD add of the four packed single-precision floating-point values from the source operand (second operand) and the destination operand (first operand), and stores the packed single-precision floating-point results in the destination operand. The source operand can be an XMM register or a 128-bit memory location. The destination operand is an XMM register.

DEST[31-0] ← DEST[31-0] + SRC[31-0];
DEST[63-32] ← DEST[63-32] + SRC[63-32];
DEST[95-64] ← DEST[95-64] + SRC[95-64];
DEST[127-96] ← DEST[127-96] + SRC[127-96];
ADDPS __m128 _mm_add_ps(__m128 a, __m128 b)
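
As an illustration of the packed form, the following sketch adds two float arrays four elements at a time. The function name `add_arrays` is an assumption made for this example, and unaligned loads and stores are used so that no alignment requirement is placed on the caller. A scalar loop handles the elements left over when the length is not a multiple of four.

```c
#include <stddef.h>
#include <xmmintrin.h>

/* Hypothetical helper: dst[i] = a[i] + b[i] for i in [0, n). */
void add_arrays(const float *a, const float *b, float *dst, size_t n)
{
    size_t i = 0;

    /* Main loop: one packed add per four elements. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(dst + i, _mm_add_ps(va, vb));
    }

    /* Tail: remaining 0..3 elements, one at a time. */
    for (; i < n; ++i)
        dst[i] = a[i] + b[i];
}
```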

SUBSS xmm1, xmm2/m32

Subtracts the low single-precision floating-point value in the source operand (second operand) from the low single-precision floating-point value in the destination operand (first operand), and stores the single-precision floating-point result in the destination operand. The source operand can be an XMM register or a 32-bit memory location. The destination operand is an XMM register. The three high-order doublewords of the destination operand remain unchanged.

DEST[31-0] ← DEST[31-0] - SRC[31-0];
SUBSS __m128 _mm_sub_ss(__m128 a, __m128 b)

SUBPS xmm1, xmm2/m128

Performs a SIMD subtract of the four packed single-precision floating-point values in the source operand (second operand) from the four packed single-precision floating-point values in the destination operand (first operand), and stores the packed single-precision floating-point results in the destination operand. The source operand can be an XMM register or a 128-bit memory location. The destination operand is an XMM register.

DEST[31-0] ← DEST[31-0] − SRC[31-0];
DEST[63-32] ← DEST[63-32] − SRC[63-32];
DEST[95-64] ← DEST[95-64] − SRC[95-64];
DEST[127-96] ← DEST[127-96] − SRC[127-96];
SUBPS __m128 _mm_sub_ps(__m128 a, __m128 b)
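
Because subtraction is not commutative, the operand order matters: `_mm_sub_ps(a, b)` computes a minus b in every lane. A small sketch, with the hypothetical helper name `delta4` chosen for this example:

```c
#include <xmmintrin.h>

/* Hypothetical helper: delta[i] = cur[i] - prev[i] for four lanes. */
void delta4(const float cur[4], const float prev[4], float delta[4])
{
    __m128 vcur  = _mm_loadu_ps(cur);
    __m128 vprev = _mm_loadu_ps(prev);
    /* First argument is the minuend, second the subtrahend. */
    _mm_storeu_ps(delta, _mm_sub_ps(vcur, vprev));
}
```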

MULSS xmm1, xmm2/m32

Multiplies the low single-precision floating-point value from the source operand (second operand) by the low single-precision floating-point value in the destination operand (first operand), and stores the single-precision floating-point result in the destination operand. The source operand can be an XMM register or a 32-bit memory location. The destination operand is an XMM register. The three high-order doublewords of the destination operand remain unchanged.

DEST[31-0] ← DEST[31-0] * SRC[31-0];
MULSS __m128 _mm_mul_ss(__m128 a, __m128 b)

MULPS xmm1, xmm2/m128

Performs a SIMD multiply of the four packed single-precision floating-point values from the source operand (second operand) and the destination operand (first operand), and stores the packed single-precision floating-point results in the destination operand. The source operand can be an XMM register or a 128-bit memory location. The destination operand is an XMM register.

DEST[31-0] ← DEST[31-0] * SRC[31-0];
DEST[63-32] ← DEST[63-32] * SRC[63-32];
DEST[95-64] ← DEST[95-64] * SRC[95-64];
DEST[127-96] ← DEST[127-96] * SRC[127-96];
MULPS __m128 _mm_mul_ps(__m128 a, __m128 b)
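
A common use of the packed multiply is scaling four values by one factor. The sketch below assumes the factor is first broadcast to all four lanes with `_mm_set1_ps`. The helper name `scale4` is hypothetical.

```c
#include <xmmintrin.h>

/* Hypothetical helper: multiplies each of the four floats in v by factor,
   in place. */
void scale4(float v[4], float factor)
{
    __m128 vf = _mm_set1_ps(factor);          /* factor in all four lanes */
    __m128 vv = _mm_loadu_ps(v);
    _mm_storeu_ps(v, _mm_mul_ps(vv, vf));     /* lane-wise product */
}
```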

DIVSS xmm1, xmm2/m32

Divides the low single-precision floating-point value in the destination operand (first operand) by the low single-precision floating-point value in the source operand (second operand), and stores the single-precision floating-point result in the destination operand. The source operand can be an XMM register or a 32-bit memory location. The destination operand is an XMM register. The three high-order doublewords of the destination operand remain unchanged.

DEST[31-0] ← DEST[31-0] / SRC[31-0];
DIVSS __m128 _mm_div_ss(__m128 a, __m128 b)

DIVPS xmm1, xmm2/m128

Performs a SIMD divide of the four packed single-precision floating-point values in the destination operand (first operand) by the four packed single-precision floating-point values in the source operand (second operand), and stores the packed single-precision floating-point results in the destination operand. The source operand can be an XMM register or a 128-bit memory location. The destination operand is an XMM register.

DEST[31-0] ← DEST[31-0] / (SRC[31-0]);
DEST[63-32] ← DEST[63-32] / (SRC[63-32]);
DEST[95-64] ← DEST[95-64] / (SRC[95-64]);
DEST[127-96] ← DEST[127-96] / (SRC[127-96]);
DIVPS __m128 _mm_div_ps(__m128 a, __m128 b)
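
The following sketch divides four numerators by four denominators in one packed operation. As with subtraction, the first argument of `_mm_div_ps` is the dividend and the second is the divisor. The helper name `div4` is an assumption for this example.

```c
#include <xmmintrin.h>

/* Hypothetical helper: out[i] = num[i] / den[i] for four lanes. */
void div4(const float num[4], const float den[4], float out[4])
{
    __m128 vnum = _mm_loadu_ps(num);
    __m128 vden = _mm_loadu_ps(den);
    _mm_storeu_ps(out, _mm_div_ps(vnum, vden));
}
```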

SQRTSS xmm1, xmm2/m32

Computes the square root of the low single-precision floating-point value in the source operand (second operand) and stores the single-precision floating-point result in the destination operand. The source operand can be an XMM register or a 32-bit memory location. The destination operand is an XMM register. The three high-order doublewords of the destination operand remain unchanged.

DEST[31-0] ← SQRT (SRC[31-0]);
SQRTSS __m128 _mm_sqrt_ss(__m128 a)
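
Note that the intrinsic takes a single argument, so in C the three upper lanes of the result are taken from that same argument. A short sketch, with the hypothetical helper name `sqrt_low_lane`:

```c
#include <xmmintrin.h>

/* Hypothetical helper: out[0] = sqrt(a[0]); out[1..3] = a[1..3]. */
void sqrt_low_lane(const float a[4], float out[4])
{
    __m128 va = _mm_loadu_ps(a);
    _mm_storeu_ps(out, _mm_sqrt_ss(va));  /* only lane 0 is transformed */
}
```

For the input {16, 2, 3, 4} the result is {4, 2, 3, 4}.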

SQRTPS xmm1, xmm2/m128

Performs a SIMD computation of the square roots of the four packed single-precision floating-point values in the source operand (second operand) and stores the packed single-precision floating-point results in the destination operand. The source operand can be an XMM register or a 128-bit memory location. The destination operand is an XMM register.

DEST[31-0] ← SQRT(SRC[31-0]);
DEST[63-32] ← SQRT(SRC[63-32]);
DEST[95-64] ← SQRT(SRC[95-64]);
DEST[127-96] ← SQRT(SRC[127-96]);
SQRTPS __m128 _mm_sqrt_ps(__m128 a)
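
As an example that combines the packed multiply, add and square root, the sketch below computes the lengths of four 2D vectors at once. It assumes the coordinates are stored as separate x and y arrays, which is a layout chosen for this illustration. The helper name `length4` is hypothetical.

```c
#include <xmmintrin.h>

/* Hypothetical helper: len[i] = sqrt(x[i]*x[i] + y[i]*y[i]) for four
   vectors whose coordinates are stored in separate x and y arrays. */
void length4(const float x[4], const float y[4], float len[4])
{
    __m128 vx = _mm_loadu_ps(x);
    __m128 vy = _mm_loadu_ps(y);
    __m128 sum = _mm_add_ps(_mm_mul_ps(vx, vx), _mm_mul_ps(vy, vy));
    _mm_storeu_ps(len, _mm_sqrt_ps(sum));
}
```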

MAXSS xmm1, xmm2/m32

Compares the low single-precision floating-point values in the destination operand (first operand) and the source operand (second operand), and returns the maximum value to the low doubleword of the destination operand. The source operand can be an XMM register or a 32-bit memory location. The destination operand is an XMM register. The three high-order doublewords of the destination operand remain unchanged.

DEST[31-0] ← IF ((DEST[31-0] = 0.0) AND (SRC[31-0] = 0.0)) THEN SRC[31-0]
ELSE IF (DEST[31-0] = SNaN) THEN SRC[31-0]
ELSE IF (SRC[31-0] = SNaN) THEN SRC[31-0]
ELSE IF (DEST[31-0] > SRC[31-0])
THEN DEST[31-0]
ELSE SRC[31-0];
MAXSS __m128 _mm_max_ss(__m128 a, __m128 b)
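
The operation is not symmetric when one input is a NaN: the second (source) operand is returned in that case. The sketch below wraps `_mm_max_ss` in a hypothetical scalar helper, `max_ss_scalar`, to make this visible. It assumes the compiler is not run with fast-math style options that allow it to ignore NaNs.

```c
#include <xmmintrin.h>

/* Hypothetical helper: returns lane 0 of _mm_max_ss(a, b). */
float max_ss_scalar(float a, float b)
{
    __m128 va = _mm_set_ss(a);   /* a in lane 0, upper lanes zeroed */
    __m128 vb = _mm_set_ss(b);
    return _mm_cvtss_f32(_mm_max_ss(va, vb));
}
```

Calling `max_ss_scalar(NaN, 3.0f)` yields 3, while `max_ss_scalar(3.0f, NaN)` yields NaN, because in both cases the second argument is passed through.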

MAXPS xmm1, xmm2/m128

Performs a SIMD compare of the packed single-precision floating-point values in the destination operand (first operand) and the source operand (second operand), and returns the maximum value for each pair of values to the destination operand. The source operand can be an XMM register or a 128-bit memory location. The destination operand is an XMM register.

DEST[31-0] ← IF ((DEST[31-0] = 0.0) AND (SRC[31-0] = 0.0)) THEN SRC[31-0]
ELSE IF (DEST[31-0] = SNaN) THEN SRC[31-0]
ELSE IF (SRC[31-0] = SNaN) THEN SRC[31-0]
ELSE IF (DEST[31-0] > SRC[31-0])
THEN DEST[31-0]
ELSE SRC[31-0];
repeat operation for 2nd and 3rd doublewords
DEST[127-96] ← IF ((DEST[127-96] = 0.0) AND (SRC[127-96] = 0.0))
THEN SRC[127-96]
ELSE IF (DEST[127-96] = SNaN) THEN SRC[127-96]
ELSE IF (SRC[127-96] = SNaN) THEN SRC[127-96]
ELSE IF (DEST[127-96] > SRC[127-96])
THEN DEST[127-96]
ELSE SRC[127-96];
MAXPS __m128 _mm_max_ps(__m128 a, __m128 b)
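
One simple application of the packed maximum is replacing negative values with zero without branching. The following is a sketch only, and the helper name `clamp_negative_to_zero4` is hypothetical.

```c
#include <xmmintrin.h>

/* Hypothetical helper: v[i] = max(v[i], 0.0f) for four lanes, in place. */
void clamp_negative_to_zero4(float v[4])
{
    __m128 vv   = _mm_loadu_ps(v);
    __m128 zero = _mm_setzero_ps();          /* all four lanes set to 0.0f */
    _mm_storeu_ps(v, _mm_max_ps(vv, zero));
}
```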

MINSS xmm1, xmm2/m32

Compares the low single-precision floating-point values in the destination operand (first operand) and the source operand (second operand), and returns the minimum value to the low doubleword of the destination operand. The source operand can be an XMM register or a 32-bit memory location. The destination operand is an XMM register. The three high-order doublewords of the destination operand remain unchanged.

DEST[31-0] ← IF ((DEST[31-0] = 0.0) AND (SRC[31-0] = 0.0)) THEN SRC[31-0]
ELSE IF (DEST[31-0] = SNaN) THEN SRC[31-0]
ELSE IF (SRC[31-0] = SNaN) THEN SRC[31-0]
ELSE IF (DEST[31-0] < SRC[31-0])
THEN DEST[31-0]
ELSE SRC[31-0];
MINSS __m128 _mm_min_ss(__m128 a, __m128 b)

MINPS xmm1, xmm2/m128

Performs a SIMD compare of the packed single-precision floating-point values in the destination operand (first operand) and the source operand (second operand), and returns the minimum value for each pair of values to the destination operand. The source operand can be an XMM register or a 128-bit memory location. The destination operand is an XMM register.

DEST[31-0] ← IF ((DEST[31-0] = 0.0) AND (SRC[31-0] = 0.0)) THEN SRC[31-0]
ELSE IF (DEST[31-0] = SNaN) THEN SRC[31-0]
ELSE IF (SRC[31-0] = SNaN) THEN SRC[31-0]
ELSE IF (DEST[31-0] < SRC[31-0])
THEN DEST[31-0]
ELSE SRC[31-0];
repeat operation for 2nd and 3rd doublewords
DEST[127-96] ← IF ((DEST[127-96] = 0.0) AND (SRC[127-96] = 0.0))
THEN SRC[127-96]
ELSE IF (DEST[127-96] = SNaN) THEN SRC[127-96]
ELSE IF (SRC[127-96] = SNaN) THEN SRC[127-96]
ELSE IF (DEST[127-96] < SRC[127-96])
THEN DEST[127-96]
ELSE SRC[127-96];
MINPS __m128 _mm_min_ps(__m128 a, __m128 b)
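
Packed minimum and maximum are often combined to clamp values to a range. The sketch below assumes lo is not greater than hi and that no input is a NaN. The helper name `clamp4` is hypothetical.

```c
#include <xmmintrin.h>

/* Hypothetical helper: clamps each of the four floats in v to [lo, hi],
   in place. Assumes lo <= hi and no NaN inputs. */
void clamp4(float v[4], float lo, float hi)
{
    __m128 vv  = _mm_loadu_ps(v);
    __m128 vlo = _mm_set1_ps(lo);
    __m128 vhi = _mm_set1_ps(hi);
    vv = _mm_max_ps(vv, vlo);   /* raise values below lo */
    vv = _mm_min_ps(vv, vhi);   /* lower values above hi */
    _mm_storeu_ps(v, vv);
}
```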
Saturday, 24 April 2010

Last Updated on Thursday, 25 April 2013 23:55  