Stefano Tommesani


SSE Introduction


The Streaming SIMD Extensions enhance the Intel x86 architecture in four ways:

  1. 8 new 128-bit SIMD floating-point registers that can be directly addressed;
  2. 50 new instructions that work on packed floating-point data;
  3. 8 new instructions designed to control cacheability of all MMX and 32-bit x86 data types, including the ability to stream data to memory without polluting the caches, and to prefetch data before it is actually used;
  4. 12 new instructions that extend the MMX instruction set.
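
As a rough illustration of the cacheability instructions in item 3, the sketch below uses the C intrinsics from `<xmmintrin.h>` rather than assembly. The helper names `fill_stream` and `sum_prefetch`, the 64-element prefetch distance, and the alignment and length requirements are assumptions made for this example; they are not part of the SSE definition.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: fills dst[0..n) with value using non-temporal stores.
   Assumes dst is 16-byte aligned and n is a multiple of 4. */
void fill_stream(float *dst, size_t n, float value)
{
    __m128 v = _mm_set1_ps(value);      /* same value in all four elements */
    for (size_t i = 0; i < n; i += 4)
        _mm_stream_ps(dst + i, v);      /* MOVNTPS: store without polluting the caches */
    _mm_sfence();                       /* order the streamed stores before later stores */
}

/* Hypothetical helper: sums src[0..n) while prefetching data ahead of use.
   Assumes src is 16-byte aligned and n is a multiple of 4. */
float sum_prefetch(const float *src, size_t n)
{
    __m128 acc = _mm_setzero_ps();
    for (size_t i = 0; i < n; i += 4) {
        if (i + 64 < n)                 /* assumed prefetch distance: 64 floats */
            _mm_prefetch((const char *)(src + i + 64), _MM_HINT_T0);
        acc = _mm_add_ps(acc, _mm_load_ps(src + i));
    }
    float lanes[4];
    _mm_storeu_ps(lanes, acc);          /* four partial sums, one per element */
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}
```

A suitable prefetch distance depends on the processor and the memory system, so the value used here should be treated as a placeholder to be tuned.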

This set enables the programmer to develop algorithms that mix packed single-precision floating-point data and packed integer data, using SSE and MMX instructions respectively.
This approach was chosen because most media processing applications have the following characteristics:

  • inherently parallel
  • wide dynamic range, hence floating-point based
  • regular memory access patterns
  • data independent control flow.

Intel SSE provides eight 128-bit registers, each of which can be directly addressed using the register names XMM0 to XMM7. Each register holds four 32-bit single-precision floating-point numbers, numbered 0 through 3. MMX registers are mapped onto the x87 floating-point registers, so the EMMS instruction is required when passing from MMX code to x87 floating-point code; the SSE registers, by contrast, form a separate register file, so MMX or x87 floating-point instructions can be mixed with SSE instructions without executing a special instruction such as EMMS. On the downside, the new registers require support from the operating system, since they must be saved and restored when switching tasks.
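
The numbering of the four elements can be seen from C with the `<xmmintrin.h>` intrinsics. This is only a small sketch; the helper names `make_0123` and `element0` are made up for the example.

```c
#include <xmmintrin.h>  /* SSE intrinsics; __m128 maps to an XMM register */

/* Hypothetical helper: builds a value whose element i holds the number i. */
__m128 make_0123(void)
{
    /* _mm_set_ps lists the elements from the most significant (3)
       down to the least significant (0) */
    return _mm_set_ps(3.0f, 2.0f, 1.0f, 0.0f);
}

/* Hypothetical helper: reads element 0, the least significant one. */
float element0(__m128 v)
{
    return _mm_cvtss_f32(v);
}
```

Storing the value with `_mm_storeu_ps` places element 0 at the lowest address, so the array index matches the element number.
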
A new control/status register, MXCSR, is used to mask or unmask numerical exceptions, to set the rounding mode, to enable flush-to-zero mode, and to view status flags.
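
As an example of MXCSR at work, the sketch below changes the rounding mode around a float-to-integer conversion, using the intrinsics and macros from `<xmmintrin.h>`. The helper name `round_with_mode` is an assumption for this example, and the `volatile` qualifiers are a precaution against compiler optimizations, not something SSE itself requires.

```c
#include <xmmintrin.h>  /* SSE intrinsics, including MXCSR access */

/* Hypothetical helper: converts x to an integer under the given MXCSR
   rounding mode (_MM_ROUND_NEAREST, _MM_ROUND_DOWN, _MM_ROUND_UP or
   _MM_ROUND_TOWARD_ZERO), then restores the previous MXCSR contents. */
int round_with_mode(float x, unsigned int mode)
{
    /* volatile is used here to discourage the compiler from evaluating the
       conversion at compile time or moving it outside the MXCSR changes */
    volatile float input = x;
    volatile int result;

    unsigned int saved = _mm_getcsr();           /* read MXCSR */
    _MM_SET_ROUNDING_MODE(mode);                 /* change only the rounding-control bits */
    result = _mm_cvtss_si32(_mm_set_ss(input));  /* CVTSS2SI rounds as MXCSR dictates */
    _mm_setcsr(saved);                           /* write the saved value back */
    return result;
}
```

With the default round-to-nearest mode 2.5 converts to 2, while `_MM_ROUND_UP` gives 3. Flush-to-zero can be switched in the same way with `_MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON)`.
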
SSE instructions operate either on all pairs of elements of the packed operands in parallel, or on the least significant pair only. The packed instructions (with PS suffix) operate on all four pairs of elements, while scalar instructions (with SS suffix) always operate on the least significant pair of the two operands; for scalar operations, the three upper elements of the first operand are passed through to the destination.

[Figures: SSE packed operation; SSE scalar operation]
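
The difference between the packed and scalar forms can also be checked in code. The sketch below compares ADDPS and ADDSS through their C intrinsics; the wrapper names `add_packed` and `add_scalar` are made up for the example, and unaligned loads and stores are used only to keep it short.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical wrapper around ADDPS: adds all four pairs of elements. */
void add_packed(const float a[4], const float b[4], float out[4])
{
    _mm_storeu_ps(out, _mm_add_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
}

/* Hypothetical wrapper around ADDSS: adds only element 0; elements 1 to 3
   of the result are copied from the first operand. */
void add_scalar(const float a[4], const float b[4], float out[4])
{
    _mm_storeu_ps(out, _mm_add_ss(_mm_loadu_ps(a), _mm_loadu_ps(b)));
}
```

With a = {1, 2, 3, 4} and b = {10, 20, 30, 40}, the packed add yields {11, 22, 33, 44}, while the scalar add yields {11, 2, 3, 4}.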

The SSE set consists of 70 instructions: the following sections give a brief overview of each group of instructions in the SSE set and the instructions within each group.

Last Updated on Friday, 26 April 2013 00:07  
