SSE introduced 128-bit XMM registers and a new set of SIMD instructions for single-precision floating-point arithmetic. Alongside arithmetic, comparison, shuffle, and logical operations, SSE also added several important conversion instructions.
These conversion instructions move data between two worlds:
- packed or scalar single-precision floating-point values in XMM registers;
- signed integer values in MMX registers, general-purpose integer registers, or memory.
The original SSE conversion instructions are:
CVTPI2PS
CVTSI2SS
CVTPS2PI
CVTTPS2PI
CVTSS2SI
CVTTSS2SI
They are easy to overlook, but they are essential when SIMD code must interact with scalar integer code, legacy MMX code, or older multimedia processing pipelines.
This article explains what each instruction does, how the corresponding C/C++ intrinsics work, and which details are most important when writing correct SSE code.
Why Conversion Instructions Matter
SIMD code often needs to move between integer and floating-point representations.
For example:
- image data may be stored as bytes or integers but processed as floating-point values;
- audio samples may be converted to floats for filtering;
- geometry data may begin as integer screen coordinates and become floating-point vectors;
- floating-point results may need to be rounded or truncated back to integer pixels;
- scalar code may need to extract one value from a SIMD computation.
The conversion instructions exist to make these transitions efficient.
Without them, you would need to store SIMD data to memory, convert values one by one using scalar instructions, then load the results back into SIMD registers. SSE conversion instructions avoid that unnecessary round trip in many common cases.
The Original SSE Conversion Instructions
The original SSE conversion instructions operate on single-precision floating-point values and signed integer values.
| Instruction | Direction | Description |
|---|---|---|
CVTPI2PS | integer to float | Converts two signed 32-bit integers from MMX or memory to two single-precision floats in an XMM register |
CVTSI2SS | integer to float | Converts one signed 32-bit integer from a general-purpose register or memory to one scalar single-precision float |
CVTPS2PI | float to integer | Converts two single-precision floats to two signed 32-bit integers in an MMX register, using the current rounding mode |
CVTTPS2PI | float to integer | Converts two single-precision floats to two signed 32-bit integers in an MMX register, using truncation |
CVTSS2SI | float to integer | Converts one scalar single-precision float to one signed 32-bit integer, using the current rounding mode |
CVTTSS2SI | float to integer | Converts one scalar single-precision float to one signed 32-bit integer, using truncation |
The instruction names follow a useful pattern:
CVT = convert
TT = truncate
PI = packed integers in MMX registers
PS = packed single-precision floats
SS = scalar single-precision float
SI = scalar integer
The most important distinction is between CVT and CVTT.
CVT...instructions use the current MXCSR rounding mode.CVTT...instructions truncate toward zero.
That distinction is critical for correctness.
Packed Integer to Packed Float: CVTPI2PS
CVTPI2PS converts two signed 32-bit integers into two single-precision floating-point values.
The source is either:
- an MMX register containing two signed 32-bit integers;
- or a 64-bit memory operand containing two signed 32-bit integers.
The destination is an XMM register.
Only the lower two floating-point elements of the destination XMM register are written. The upper two floating-point elements are preserved.
Conceptually:
dst[0] = (float)src_int32[0]
dst[1] = (float)src_int32[1]
dst[2] = old dst[2]
dst[3] = old dst[3]
The corresponding intrinsic is:
__m128 _mm_cvtpi32_ps(__m128 a, __m64 b);
The first argument provides the original XMM value whose upper two floats are preserved. The second argument provides the two 32-bit integer values to convert.
Example:
#include <xmmintrin.h>
#include <mmintrin.h>
void convert_two_ints_to_floats(float* output)
{
__m64 integers = _mm_set_pi32(20, 10);
// Upper two lanes come from this value.
__m128 base = _mm_set_ps(4.0f, 3.0f, 0.0f, 0.0f);
// Converts the two 32-bit integers into the lower two float lanes.
__m128 result = _mm_cvtpi32_ps(base, integers);
_mm_storeu_ps(output, result);
// Required after using MMX registers.
_mm_empty();
}
After the conversion, the lower two floats contain the converted integer values. The upper two floats remain whatever they were in base.
This “upper lanes are preserved” behavior is important. If you expected all four lanes to be overwritten, the result may surprise you.
Scalar Integer to Scalar Float: CVTSI2SS
CVTSI2SS converts one signed integer into one scalar single-precision floating-point value.
The source integer is a general-purpose integer register or a memory operand. In the intrinsic form, it is simply passed as an int.
The destination is an XMM register, but only the lowest floating-point lane is overwritten.
Conceptually:
dst[0] = (float)integer
dst[1] = old dst[1]
dst[2] = old dst[2]
dst[3] = old dst[3]
The corresponding intrinsic is:
__m128 _mm_cvtsi32_ss(__m128 a, int b);
Example:
#include <xmmintrin.h>
void convert_scalar_int_to_float(float* output)
{
__m128 base = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f);
// Converts 42 to a float and places it in the lowest lane.
__m128 result = _mm_cvtsi32_ss(base, 42);
_mm_storeu_ps(output, result);
}
The result is:
result[0] = 42.0f
result[1] = 2.0f
result[2] = 3.0f
result[3] = 4.0f
Notice again that only the lowest lane is modified.
This instruction is useful when scalar integer code needs to feed one value into an SSE floating-point calculation.
Packed Float to Packed Integer: CVTPS2PI
CVTPS2PI converts two single-precision floating-point values into two signed 32-bit integers.
The source is the lower two floats of an XMM register, or a 64-bit memory operand containing two floats.
The destination is an MMX register.
The conversion uses the current rounding mode in the MXCSR register.
Conceptually:
dst_int32[0] = convert_using_mxcsr_rounding(src_float[0])
dst_int32[1] = convert_using_mxcsr_rounding(src_float[1])
The corresponding intrinsic is:
__m64 _mm_cvtps_pi32(__m128 a);
Example:
#include <xmmintrin.h>
#include <mmintrin.h>
void convert_two_floats_to_ints(void)
{
__m128 values = _mm_set_ps(0.0f, 0.0f, 2.9f, 1.2f);
// Converts the lower two floats using the current MXCSR rounding mode.
__m64 integers = _mm_cvtps_pi32(values);
_mm_empty();
}
With the default round-to-nearest mode, 1.2f becomes 1, and 2.9f becomes 3.
However, if the MXCSR rounding mode has been changed, the result may be different.
That is why CVTPS2PI is not the same as a C cast from float to int.
Packed Float to Packed Integer With Truncation: CVTTPS2PI
CVTTPS2PI is the truncating version of CVTPS2PI.
It converts the lower two single-precision floating-point values into two signed 32-bit integers, but it always truncates toward zero.
The corresponding intrinsic is:
__m64 _mm_cvttps_pi32(__m128 a);
Example:
#include <xmmintrin.h>
#include <mmintrin.h>
void truncate_two_floats_to_ints(void)
{
__m128 values = _mm_set_ps(0.0f, 0.0f, -2.9f, 1.9f);
// Truncates toward zero.
__m64 integers = _mm_cvttps_pi32(values);
_mm_empty();
}
The conceptual result is:
1.9f -> 1
-2.9f -> -2
This behavior matches the usual C conversion from floating-point to integer more closely than the rounded conversion.
Use the truncating form when you explicitly want fractional parts discarded.
Scalar Float to Scalar Integer: CVTSS2SI
CVTSS2SI converts the lowest single-precision float in an XMM register into a signed integer in a general-purpose register.
The conversion uses the current MXCSR rounding mode.
The corresponding intrinsic is:
int _mm_cvtss_si32(__m128 a);
Example:
#include <xmmintrin.h>
int convert_scalar_float_to_int(void)
{
__m128 value = _mm_set_ss(3.7f);
// Uses the current MXCSR rounding mode.
int result = _mm_cvtss_si32(value);
return result;
}
With the default round-to-nearest mode, 3.7f becomes 4.
But if the MXCSR rounding mode is changed, the result may be different.
Scalar Float to Scalar Integer With Truncation: CVTTSS2SI
CVTTSS2SI is the truncating version of CVTSS2SI.
It converts the lowest single-precision float into a signed integer by truncating toward zero.
The corresponding intrinsic is:
int _mm_cvttss_si32(__m128 a);
Example:
#include <xmmintrin.h>
int truncate_scalar_float_to_int(void)
{
__m128 value = _mm_set_ss(3.7f);
// Truncates toward zero.
int result = _mm_cvttss_si32(value);
return result;
}
The result is:
3
For negative values:
-3.7f -> -3
This is usually the version you want when implementing behavior similar to a C cast:
int x = (int)some_float;
Rounding vs Truncation
The distinction between rounded and truncating conversion is one of the most important parts of SSE conversion programming.
| Instruction | Intrinsic | Rounding behavior |
|---|---|---|
CVTPS2PI | _mm_cvtps_pi32 | Uses MXCSR rounding mode |
CVTTPS2PI | _mm_cvttps_pi32 | Truncates toward zero |
CVTSS2SI | _mm_cvtss_si32 | Uses MXCSR rounding mode |
CVTTSS2SI | _mm_cvttss_si32 | Truncates toward zero |
The default MXCSR rounding mode is normally round-to-nearest.
That means this:
_mm_cvtss_si32(_mm_set_ss(2.9f))
normally returns:
3
But this:
_mm_cvttss_si32(_mm_set_ss(2.9f))
returns:
2
For negative numbers:
rounding: -2.9f -> -3
truncation: -2.9f -> -2
Choose deliberately.
Do not use the rounded version when you expect truncation.
MXCSR and the Current Rounding Mode
SSE floating-point behavior is controlled by the MXCSR register.
Among other things, MXCSR contains the rounding-control bits used by the non-truncating conversion instructions.
The relevant conversion instructions are:
CVTPS2PI
CVTSS2SI
Their results depend on the current MXCSR rounding mode.
The truncating instructions ignore the MXCSR rounding mode for the purpose of rounding direction:
CVTTPS2PI
CVTTSS2SI
They always truncate toward zero.
This distinction matters in libraries. A function should not accidentally depend on a caller’s rounding mode unless that is part of the intended behavior.
MMX Destination Means MMX Cleanup
Several original SSE conversion instructions use MMX registers:
CVTPI2PS
CVTPS2PI
CVTTPS2PI
The corresponding intrinsics use the __m64 type:
__m128 _mm_cvtpi32_ps(__m128 a, __m64 b);
__m64 _mm_cvtps_pi32(__m128 a);
__m64 _mm_cvttps_pi32(__m128 a);
Because these instructions touch MMX state, you should call _mm_empty() before returning to code that may use x87 floating-point instructions.
Example:
#include <xmmintrin.h>
#include <mmintrin.h>
void mmx_conversion_example(void)
{
__m128 values = _mm_set_ps(0.0f, 0.0f, 20.0f, 10.0f);
__m64 ints = _mm_cvtps_pi32(values);
// Clean up MMX/x87 state after using __m64.
_mm_empty();
}
This is not required for pure XMM-only code such as _mm_cvtsi32_ss, _mm_cvtss_si32, or _mm_cvttss_si32.
A useful practical rule is:
If your SSE code uses __m64, remember _mm_empty().
Which Lanes Are Converted?
Another common source of confusion is that the original SSE conversion instructions do not always process all four lanes of an XMM register.
| Instruction | Source elements used | Destination elements written |
|---|---|---|
CVTPI2PS | two 32-bit integers | lower two float lanes |
CVTSI2SS | one scalar integer | lowest float lane |
CVTPS2PI | lower two float lanes | two 32-bit integers in MMX |
CVTTPS2PI | lower two float lanes | two 32-bit integers in MMX |
CVTSS2SI | lowest float lane | one scalar integer |
CVTTSS2SI | lowest float lane | one scalar integer |
This is why the first argument of _mm_cvtsi32_ss and _mm_cvtpi32_ps matters: part of it may be preserved.
Example:
#include <xmmintrin.h>
void lane_preservation_example(float* output)
{
__m128 base = _mm_set_ps(40.0f, 30.0f, 20.0f, 10.0f);
__m128 result = _mm_cvtsi32_ss(base, 99);
_mm_storeu_ps(output, result);
}
The lowest lane becomes 99.0f, but the upper lanes are copied from base.
Convenience Intrinsics and Composite Conversions
Some SSE conversion intrinsics are convenience functions that may expand to multiple instructions rather than mapping to one native SSE instruction.
Examples include:
__m128 _mm_cvtpi16_ps(__m64 a);
__m128 _mm_cvtpu16_ps(__m64 a);
__m128 _mm_cvtpi8_ps(__m64 a);
__m128 _mm_cvtpu8_ps(__m64 a);
__m128 _mm_cvtpi32x2_ps(__m64 a, __m64 b);
__m64 _mm_cvtps_pi16(__m128 a);
__m64 _mm_cvtps_pi8(__m128 a);
These are useful when converting smaller integer types such as 8-bit or 16-bit values to floats.
For example:
#include <xmmintrin.h>
#include <mmintrin.h>
void convert_four_16bit_ints_to_floats(float* output)
{
__m64 samples = _mm_set_pi16(400, 300, 200, 100);
__m128 result = _mm_cvtpi16_ps(samples);
_mm_storeu_ps(output, result);
_mm_empty();
}
These intrinsics are convenient, but because they use __m64, they still belong to the MMX/SSE transition world. Clean up with _mm_empty() after using them.
Original SSE vs SSE2 Conversions
The original SSE conversion instructions are limited.
They mainly convert between:
- single-precision floats;
- signed 32-bit integers;
- MMX registers;
- scalar general-purpose integer registers.
SSE2 later added a much richer conversion set, including:
- packed 32-bit integer to packed single-precision float;
- packed single-precision float to packed 32-bit integer in XMM registers;
- double-precision floating-point conversions;
- conversions between
__m128,__m128i, and__m128d.
For example, with SSE2 you can convert four packed 32-bit integers to four floats using XMM registers, without going through MMX:
#include <emmintrin.h>
void sse2_integer_to_float(float* output)
{
__m128i integers = _mm_set_epi32(40, 30, 20, 10);
__m128 floats = _mm_cvtepi32_ps(integers);
_mm_storeu_ps(output, floats);
}
No _mm_empty() is needed here because this code uses XMM registers only.
For modern code, SSE2 and later are usually preferable to the original MMX-based SSE conversion path.
Practical Examples
Convert One Integer to One Float
#include <xmmintrin.h>
float int_to_float_sse(int value)
{
__m128 base = _mm_setzero_ps();
__m128 converted = _mm_cvtsi32_ss(base, value);
return _mm_cvtss_f32(converted);
}
This converts one scalar integer into one scalar float.
Convert One Float to Integer With Rounding
#include <xmmintrin.h>
int float_to_int_rounded(float value)
{
__m128 v = _mm_set_ss(value);
return _mm_cvtss_si32(v);
}
This uses the current MXCSR rounding mode.
Convert One Float to Integer With Truncation
#include <xmmintrin.h>
int float_to_int_truncated(float value)
{
__m128 v = _mm_set_ss(value);
return _mm_cvttss_si32(v);
}
This truncates toward zero.
For most C-style casts from float to int, this is the closer match.
Convert Two Floats to Two Integers
#include <xmmintrin.h>
#include <mmintrin.h>
void convert_two_floats_to_two_ints(void)
{
__m128 values = _mm_set_ps(0.0f, 0.0f, 7.9f, 3.2f);
__m64 rounded = _mm_cvtps_pi32(values);
__m64 truncated = _mm_cvttps_pi32(values);
_mm_empty();
}
The rounded version uses MXCSR rounding. The truncating version discards the fractional part.
Common Mistakes
Mistake 1: Confusing Rounded and Truncating Conversions
This is probably the most common mistake.
int a = _mm_cvtss_si32(_mm_set_ss(2.9f)); // usually 3
int b = _mm_cvttss_si32(_mm_set_ss(2.9f)); // 2
If you expected both to return 2, the wrong instruction was used.
Use the TT version when you want truncation.
Mistake 2: Forgetting That Some Instructions Use MMX
These intrinsics use __m64:
_mm_cvtpi32_ps
_mm_cvtps_pi32
_mm_cvttps_pi32
They touch MMX state.
Call _mm_empty() after using them if the program may later use x87 floating-point code.
Mistake 3: Expecting All Four XMM Lanes to Be Overwritten
CVTSI2SS only modifies the lowest float lane.
CVTPI2PS only modifies the lower two float lanes.
The remaining lanes are preserved from the input XMM register.
Always initialize the input register deliberately.
Mistake 4: Using Original SSE Conversions in New Code Without Considering SSE2
For modern x86 code, SSE2 usually gives a cleaner solution because it supports XMM integer registers through __m128i.
Instead of converting through MMX, prefer XMM-only conversion paths when SSE2 is available.
Mistake 5: Assuming Floating-Point to Integer Conversion Always Succeeds
If a floating-point value is too large to fit in the destination integer type, the result is not a normal saturated integer conversion.
SSE conversion instructions have defined exception behavior and special invalid results depending on the instruction and exception masking.
If your input may be outside the integer range, clamp it before conversion.
Best Practices
Use the original SSE conversion instructions when:
- maintaining old SSE/MMX code;
- converting between MMX integer data and XMM floating-point data;
- writing educational code that demonstrates the original SSE programming model;
- working with legacy image, audio, or video processing routines.
Prefer SSE2 or later when:
- writing new x86-64 code;
- converting packed integers to packed floats entirely within XMM registers;
- avoiding MMX state cleanup;
- processing four or more 32-bit values at a time;
- building portable modern SIMD code.
Use truncating conversions when:
- you want C-style float-to-int behavior;
- you want to discard the fractional part;
- the current MXCSR rounding mode should not affect the result.
Use rounded conversions when:
- you explicitly want MXCSR-controlled rounding;
- the program deliberately manages the SSE rounding mode;
- numerical behavior requires round-to-nearest, floor, ceiling, or another MXCSR rounding mode.
Summary
SSE conversion instructions are the bridge between integer and floating-point SIMD code in the original SSE instruction set.
The key instructions are:
CVTPI2PS two packed 32-bit integers to two floats
CVTSI2SS one scalar integer to one scalar float
CVTPS2PI two floats to two packed 32-bit integers, rounded
CVTTPS2PI two floats to two packed 32-bit integers, truncated
CVTSS2SI one scalar float to one scalar integer, rounded
CVTTSS2SI one scalar float to one scalar integer, truncated
The three most important rules are:
- Use
CVTT...instructions when you need truncation toward zero. - Remember that some original SSE conversions use MMX registers and therefore require
_mm_empty(). - Prefer SSE2 or later for new code when you can keep integer and floating-point SIMD values entirely in XMM registers.
The original SSE conversion instructions are not the best target for most new code, but they are still important for understanding legacy SIMD routines and the transition from MMX to the modern SSE programming model.


