Stefano Tommesani

SSE2 preview

The forthcoming Intel Pentium 4 processor (code-named Willamette) will feature a new set of SIMD instructions, SSE2, that extends the capabilities of both the MMX and SSE instruction sets. The key benefits of SSE2 are that MMX-style integer instructions can work on 128-bit data blocks, and that SSE-style floating-point instructions now support 64-bit values.
Extending the width of MMX parallel computations puts Intel’s integer SIMD processing capabilities on a par with Motorola’s AltiVec, used in the Macintosh G4 series. In the next section we will analyze the performance benefits of doubling the data block size and the effort required to turn old MMX code into shiny new SSE2 code.
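As a rough illustration of what the wider integer registers mean in practice, here is a minimal sketch in C using compiler intrinsics. It assumes a compiler that ships `<emmintrin.h>`; the helper name `add_saturate_u8` and the multiple-of-16 restriction are choices made for this example only. The MMX form of the same saturating add handles 8 bytes per instruction, while the SSE2 form below handles 16.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: dst[i] = min(255, a[i] + b[i]) for n bytes.
   To keep the sketch short, n must be a multiple of 16. */
void add_saturate_u8(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n)
{
    assert(n % 16 == 0);
    for (size_t i = 0; i < n; i += 16) {
        /* Unaligned loads: 16 unsigned bytes per 128-bit register. */
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        /* PADDUSB on XMM registers: 16 saturating byte adds at once. */
        __m128i vr = _mm_adds_epu8(va, vb);
        _mm_storeu_si128((__m128i *)(dst + i), vr);
    }
}
```

Saturating byte arithmetic of this kind is typical of image processing code, for instance when brightening pixels without letting values wrap around past 255.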
The original SSE instruction set worked on 32-bit floating-point data elements, processing 4 of them in parallel (4x32 = 128 bits). This approach is finely tailored to 3D game engines, which perform lots of matrix-by-vector multiplies: the SSE multiplier can multiply a 4-element vector by a row of a 4x4 matrix with a single instruction, for a theoretical 4x speed-up on the multiplications. The benefits of SSE-accelerated geometry setup are likely to fade in the near future, thanks to the new generation of graphics boards that feature hardware-assisted triangle setup and lighting, but there is a long list of multimedia and scientific applications that could be greatly enhanced by parallel floating-point computations. Current RISC processors, such as the Digital Alpha, still offer better FP performance than x86 CPUs, even Athlons at 1 GHz, and are therefore the ideal platform for scientific simulations. As this kind of software often performs computations on large data sets in a regular order, we can reasonably state that SSE instructions could be successfully applied to close the performance gap between x86 and RISC processors.
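The row-by-vector multiply described above can be sketched with SSE intrinsics as follows. This is an illustration under stated assumptions rather than code from any particular engine: the function name `mat4_mul_vec4` and the row-major matrix layout are chosen for the example. Note that only the four multiplications per row are done in a single instruction; the sum of the four products is computed here in scalar code.

```c
#include <assert.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: out = M * v, with M a row-major 4x4 matrix. */
void mat4_mul_vec4(float out[4], const float m[16], const float v[4])
{
    assert(out && m && v);
    __m128 vv = _mm_loadu_ps(v);  /* the 4-element vector, loaded once */
    for (int row = 0; row < 4; row++) {
        /* MULPS: four 32-bit products (row[i] * v[i]) in one instruction. */
        __m128 prod = _mm_mul_ps(_mm_loadu_ps(m + 4 * row), vv);
        float p[4];
        _mm_storeu_ps(p, prod);
        /* Horizontal sum of the four products, done in scalar code. */
        out[row] = (p[0] + p[1]) + (p[2] + p[3]);
    }
}
```

Optimized code usually rearranges the data (for example by working on transposed matrices or on several vectors at once) so that the additions are also performed in parallel and the scalar horizontal sum disappears.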
Unfortunately, some of these applications require the extra precision of 64-bit floating-point values, which current SSE instructions do not support. The lack of 64-bit support should not be blamed on Intel designers: the main target for SSE is mainstream multimedia software, especially 3D games, where the precision difference between 32-bit and 64-bit FP computations would be hardly noticeable. However, Intel has always shown great interest in the scientific field: as an example, consider the Pentium processor, whose FP unit was much more powerful than its integer unit, making it a strong contender for several applications, such as CAD.
SSE2 is designed to fix this problem: it supports both 32-bit and 64-bit floating-point values, but keeping the data block size fixed at 128 bits means that SSE2 instructions can only process two 64-bit data values in parallel. Even if the potential speed-up halves from four down to two, it is still compelling, as it enables a level of performance that normal FP code cannot match until 3+ GHz processors come around. What’s more, peeking at the Pentium 4 microarchitecture reveals that the performance gain achieved by using SSE2 could actually be much greater than 2x, as the scalar FP unit suffers latencies that are much longer than on the P6 core, while the SSE2 unit is streamlined to offer blazing speed. The conclusion is that developers may be forced to use SSE2 instructions to effectively harness the FP power of the Pentium 4, and that the speed of current FP-intensive applications is likely to be disappointing, considering the 2.0+ GHz core frequency.
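To make the "two 64-bit values in parallel" point concrete, here is a minimal sketch of a typical scientific kernel, y[i] = y[i] + a * x[i], written with SSE2 double-precision intrinsics. The function name `scale_add_f64` and the even-length restriction are assumptions made for this example; it also assumes a compiler that provides `<emmintrin.h>`.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>

/* Hypothetical helper: y[i] += a * x[i] for n doubles.
   To keep the sketch short, n must be even. */
void scale_add_f64(double *y, const double *x, double a, size_t n)
{
    assert(n % 2 == 0);
    __m128d va = _mm_set1_pd(a);  /* a replicated in both 64-bit lanes */
    for (size_t i = 0; i < n; i += 2) {
        __m128d vx = _mm_loadu_pd(x + i);  /* two doubles from x */
        __m128d vy = _mm_loadu_pd(y + i);  /* two doubles from y */
        /* MULPD then ADDPD: two multiplies and two adds per iteration. */
        vy = _mm_add_pd(vy, _mm_mul_pd(va, vx));
        _mm_storeu_pd(y + i, vy);
    }
}
```

Each loop iteration handles two elements, which is where the nominal 2x figure comes from; the actual gain depends on memory bandwidth and on the instruction latencies of the processor.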

Tuesday, 25 April 2000
