Stefano Tommesani


SSE Cacheability Control

Data referenced by a program can have temporal locality (the data will be used again) or spatial locality (adjacent locations, such as the same cache line, will be used), but some multimedia data types are referenced once and not reused in the immediate future; this is called non-temporal data. Non-temporal data should not evict the application's cached code and data, so the cacheability control instructions let the programmer control caching in a way that minimizes the cache pollution caused by non-temporal accesses.
In addition, the execution engine must be fed so that it does not stall waiting for data. SSE allows the programmer to prefetch data long before its final use to minimize memory latency. Prior to SSE, read miss latency, execution, and the subsequent store miss latency added up serially to form the total execution time. SSE lets read miss latency overlap execution through prefetching, and it allows store miss latency to be reduced and to overlap execution through streaming stores.
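As a rough illustration of overlapping read latency with execution, the sketch below sums an array while hinting the hardware to fetch data that will be needed a little later. It uses the `_mm_prefetch` intrinsic from `<xmmintrin.h>`; the function name `sum_with_prefetch`, the prefetch distance and the 64-byte cache line size are assumptions made for this example, and the best values depend on the processor.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics: _mm_prefetch, _MM_HINT_NTA */

/* Assumed tuning values for this sketch, not taken from the article. */
#define FLOATS_PER_LINE 16   /* assumes a 64-byte cache line */
#define PREFETCH_AHEAD  64   /* how many floats ahead to prefetch */

/* Sums n floats. Once per (assumed) cache line it issues a non-temporal
 * prefetch hint for data that will be read a few iterations later, so
 * the memory access can overlap with the additions. */
float sum_with_prefetch(const float *data, size_t n)
{
    assert(data != NULL || n == 0);

    float total = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        if (i % FLOATS_PER_LINE == 0 && i + PREFETCH_AHEAD < n) {
            /* Only a hint: it does not change the result of the sum. */
            _mm_prefetch((const char *)(data + i + PREFETCH_AHEAD),
                         _MM_HINT_NTA);
        }
        total += data[i];
    }
    return total;
}
```

The result is the same with or without the prefetch hint; only the timing may change, so any benefit should be confirmed by measurement on the target machine.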

Cacheability Control

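The first three instructions below (MASKMOVQ, MOVNTQ and MOVNTPS) provide programmatic control for minimizing cache pollution when writing data to memory from either MMX or SSE registers. PREFETCH and SFENCE complement them by controlling when data is brought into the cache and when stores become globally visible.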
MASKMOVQ stores data from an MMX register to the location specified by the EDI register. The most significant bit in each byte of the second MMX mask register is used to selectively write the data of the first register on a per-byte basis. This instruction does not write-allocate (i.e., the processor will not fetch the corresponding cache line into the cache hierarchy, prior to performing the store), and so minimizes cache pollution.
MOVNTQ stores data from an MMX register to memory; this instruction is implicitly weakly-ordered, does not write-allocate, and minimizes cache pollution.
MOVNTPS stores data from a SIMD floating-point register to memory. The memory address must be aligned to a 16-byte boundary; if it is not aligned, a general protection exception will occur. The instruction is implicitly weakly ordered, does not write-allocate, and minimizes cache pollution.
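PREFETCH brings data closer to the processor, into the cache level selected by the instruction form: PREFETCHT0, PREFETCHT1 and PREFETCHT2 for temporal data, and PREFETCHNTA for non-temporal data. As this instruction merely provides a hint to the hardware, it will not generate exceptions or faults.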
SFENCE guarantees that every store instruction that precedes the store fence in program order is globally visible before any store instruction that follows the fence. It provides an efficient way of ensuring ordering between routines that produce weakly-ordered results and routines that consume this data. Weakly-ordered memory types can be important under certain data sharing relationships, such as a producer-consumer relationship: they can make the assembling of data more efficient, but care must be taken to ensure that the consumer obtains the data that the producer intended it to see.
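The streaming store and the store fence described above can be sketched with the C intrinsics from `<xmmintrin.h>`: `_mm_stream_ps` corresponds to MOVNTPS and `_mm_sfence` to SFENCE. The function name `stream_fill` and its preconditions (a 16-byte aligned destination and a length that is a multiple of four floats) are assumptions of this example, not something taken from the article.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE intrinsics: _mm_set1_ps, _mm_stream_ps, _mm_sfence */

/* Fills dst[0..n) with value using non-temporal stores.
 * Assumptions of this sketch: dst is 16-byte aligned and n is a
 * multiple of 4, so every store writes one full SSE register. */
void stream_fill(float *dst, size_t n, float value)
{
    assert(((uintptr_t)dst & 15u) == 0);  /* MOVNTPS requires 16-byte alignment */
    assert(n % 4 == 0);

    const __m128 v = _mm_set1_ps(value);  /* value replicated in all four lanes */
    for (size_t i = 0; i < n; i += 4) {
        _mm_stream_ps(dst + i, v);        /* MOVNTPS: weakly-ordered, no write-allocate */
    }

    /* SFENCE: make the streaming stores globally visible before any
     * later store, for example a "data ready" flag read by a consumer. */
    _mm_sfence();
}
```

A caller might obtain a suitably aligned buffer with `_mm_malloc(n * sizeof(float), 16)` and release it with `_mm_free`. Streaming stores tend to pay off only for buffers that will not be read again soon; for data that is reused shortly afterwards, ordinary stores are usually the better choice.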

Tuesday, 25 April 2000

Powered by QuoteThis © 2008
View Stefano Tommesani's profile on LinkedIn

Latest Articles

Unit-testing file I/O 26 November 2017, 12.09 Testing
Unit-testing file I/O
Two good news: file I/O is unit-testable, and it is surprisingly easy to do. Let's see how it works! A software no-one asked for First, we need a piece of software that deals with files and that has to be unit-tested. The
Fixing Git pull errors in SourceTree 10 April 2017, 01.44 Software
Fixing Git pull errors in SourceTree
If you encounter the following error when pulling a repository in SourceTree: VirtualAlloc pointer is null, Win32 error 487 it is due to to the Cygwin system failing to allocate a 5 MB large chunk of memory for its heap at
Castle on the hill of crappy audio quality 19 March 2017, 01.53 Audio
Castle on the hill of crappy audio quality
As the yearly dynamic range day is close (March 31st), let's have a look at one of the biggest audio massacres of the year, Ed Sheeran's "Castle on the hill". First time I heard the song, I thought my headphones just got
Necessary evil: testing private methods 29 January 2017, 21.41 Testing
Necessary evil: testing private methods
Some might say that testing private methods should be avoided because it means not testing the contract, that is the interface implemented by the class, but the internal implementation of the class itself. Still, not all
I am right and you are wrong 28 December 2016, 14.23 Web
I am right and you are wrong
Have you ever convinced anyone that disagreed with you about a deeply held belief? Better yet, have you changed your mind lately on an important topic after discussing with someone else that did not share your point of