Stefano Tommesani

  • Home
  • Programming
    • SIMD on x64/x86
    • Multi-thread
    • C# and .NET
    • Testing
  • Software
  • Video
  • Marketing
  • SIMD on x64/x86

    SSE Logical

    April 25, 2000

    ANDPS returns the bitwise AND of the two operands. ANDNPS inverts the destination operand and then ANDs it with the source operand. ORPS returns the bitwise OR of the two operands. XORPS returns the bitwise XOR of the two operands.
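A common use of these logical instructions is sign-bit manipulation on packed floats. The following is a minimal sketch using the C intrinsics that map to ANDPS, ANDNPS, ORPS, and XORPS; the helper names `abs_ps`, `neg_ps`, and `copysign_ps` are hypothetical and are not taken from the article.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Absolute value of four floats: clear the sign bit with ANDNPS.
   _mm_andnot_ps(a, b) computes (~a) & b. */
static inline __m128 abs_ps(__m128 v) {
    const __m128 sign = _mm_set1_ps(-0.0f);  /* only the sign bit set in each lane */
    return _mm_andnot_ps(sign, v);
}

/* Negate four floats: flip the sign bit with XORPS. */
static inline __m128 neg_ps(__m128 v) {
    return _mm_xor_ps(v, _mm_set1_ps(-0.0f));
}

/* Magnitude of mag combined with the sign of sgn, using ANDPS, ANDNPS, and ORPS. */
static inline __m128 copysign_ps(__m128 mag, __m128 sgn) {
    const __m128 sign = _mm_set1_ps(-0.0f);
    return _mm_or_ps(_mm_and_ps(sign, sgn),      /* sign bit taken from sgn */
                     _mm_andnot_ps(sign, mag));  /* remaining bits taken from mag */
}
```

The argument order of `_mm_andnot_ps` matters: the first operand is the one that gets inverted.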

  • SIMD on x64/x86

    SSE Data Movement

    April 25, 2000

    MOVAPS transfers 128 bits of packed data between memory and a SIMD floating-point register, or between two SIMD floating-point registers, and requires the memory operand to be 16-byte aligned; MOVUPS performs the same transfer with no alignment requirement. MOVHPS transfers 64 bits of packed data from memory to the upper two fields of a SIMD floating-point register and vice versa,…
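As a rough illustration of the difference between these moves, here is a small sketch using the intrinsics that compilers typically lower to MOVAPS, MOVUPS, and MOVHPS. The function names are hypothetical, chosen only for this example.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Copy four floats; both pointers must be 16-byte aligned (MOVAPS). */
static inline void copy4_aligned(float *dst, const float *src) {
    _mm_store_ps(dst, _mm_load_ps(src));
}

/* Copy four floats with no alignment requirement (MOVUPS). */
static inline void copy4_unaligned(float *dst, const float *src) {
    _mm_storeu_ps(dst, _mm_loadu_ps(src));
}

/* Replace the upper two lanes of v with two floats read from memory (MOVHPS).
   The lower two lanes of v are left unchanged. */
static inline __m128 load_high_pair(__m128 v, const float *pair) {
    return _mm_loadh_pi(v, (const __m64 *)pair);
}
```

Passing a misaligned pointer to `copy4_aligned` typically raises a general-protection fault, so the unaligned variant is the safer default when alignment is not known.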

  • SIMD on x64/x86

    SSE Conversion Instructions: Converting Between Floats, Integers, MMX, and XMM Registers

    April 25, 2000

    SSE introduced 128-bit XMM registers and a new set of SIMD instructions for single-precision floating-point arithmetic. Alongside arithmetic, comparison, shuffle, and logical operations, SSE also added several important conversion instructions. These conversion instructions move data between two worlds: The original SSE conversion instructions are: They are easy to overlook, but…

  • SIMD on x64/x86

    SSE Comparison Instructions: Floating-Point Masks and Conditional SIMD Logic

    April 25, 2000

    The basic single-precision FP comparison instruction is similar to existing MMX instruction variants: it produces a redundant mask per float of all 1's or all 0's, depending upon the result of the comparison. This approach allows the mask to be used with subsequent logic operations (AND, ANDN, OR, XOR)…
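The mask-then-combine pattern described here can be sketched as a branchless per-lane select. This is an illustrative example under the assumption of a less-than comparison (CMPLTPS through `_mm_cmplt_ps`); the helper name `select_lt` is hypothetical.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Per lane: result = (a < b) ? x : y, computed without branches. */
static inline __m128 select_lt(__m128 a, __m128 b, __m128 x, __m128 y) {
    __m128 mask = _mm_cmplt_ps(a, b);          /* all 1s where a < b, all 0s elsewhere */
    return _mm_or_ps(_mm_and_ps(mask, x),      /* keep x in lanes where the mask is set */
                     _mm_andnot_ps(mask, y));  /* keep y in the remaining lanes */
}
```

Because each lane of the mask is either all ones or all zeros, the AND and ANDN terms never overlap, so the OR simply merges the two selections.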

  • SIMD on x64/x86

    SSE Cacheability Control: Prefetching, Streaming Stores, and Non-Temporal memory access

    April 25, 2000

    Data referenced by a program can have temporal (data will be used again) or spatial (data will be in adjacent locations, such as the same cache line) locality, but some multimedia data types are referenced once and not reused in the immediate future (called non-temporal data). Thus, non-temporal data should…
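To make the idea of non-temporal access concrete, here is a hedged sketch of a loop that reads its input with a non-temporal prefetch hint and writes its output with streaming stores. The function name `scale_stream` and the prefetch distance of 64 floats are assumptions made for illustration, not values from the article; a suitable distance depends on the processor and the workload.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Multiply n floats by k and write them to dst with streaming stores.
   Assumes dst is 16-byte aligned and n is a multiple of 4. */
static inline void scale_stream(float *dst, const float *src, size_t n, float k) {
    const __m128 vk = _mm_set1_ps(k);
    for (size_t i = 0; i < n; i += 4) {
        if (i + 64 < n) {
            /* PREFETCHNTA: hint that the data will be used once (distance is an assumption) */
            _mm_prefetch((const char *)(src + i + 64), _MM_HINT_NTA);
        }
        /* MOVNTPS: store that bypasses the cache hierarchy where the hardware supports it */
        _mm_stream_ps(dst + i, _mm_mul_ps(_mm_loadu_ps(src + i), vk));
    }
    _mm_sfence();  /* make the streaming stores globally visible before returning */
}
```

Streaming stores tend to help only when the output is large and will not be read again soon; for small buffers that are reused immediately, ordinary stores are usually the better choice.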

  • SIMD on x64/x86

    SSE2 64-bit Floating-Point Instructions: Scalar and Packed Double Precision

    April 25, 2000

    In the introduction we have outlined the applications that require 64-bit precision, scientific simulations and CAD/CAM being notable examples. However, the transition from normal scalar code to 64-bit floating-point SSE2 code is complex and it may require some major design changes. A more conservative approach would be moving to scalar…

  • SIMD on x64/x86

    SSE2 Overview: the SIMD extension that made XMM registers essential

    April 25, 2000

    SSE2, short for Streaming SIMD Extensions 2, was one of the most important instruction-set extensions in the history of x86 processors. It was introduced by Intel with the Pentium 4 processor, code-named Willamette, as the successor to the original SSE instruction set. At the time, SSE2 was presented as an…

  • SIMD on x64/x86

    MMX Performance on Intel Pentium 4

    April 25, 2000

    The recent arrival of the Intel Pentium 4 processor has generated the usual flurry of benchmarks and comments, most of them emphasizing that current software does not fully exploit the power of this new architecture (see the overview of the SSE2 instruction set). However, until the Pentium 4…

  • SIMD on x64/x86

    SIMD Instruction Latency Map

    April 25, 2000

    Instruction latency is one of the most important details to understand when optimizing SIMD code. A SIMD instruction may look simple at the source-code level, but the number of cycles required before its result can be used depends heavily on the exact instruction, operand type, vector width, instruction encoding, and…

  • SIMD on x64/x86

    Map of SIMD Instruction Sets and CPUs

    April 25, 2000

    The original version of this article was written in 2000, when the practical SIMD landscape on x86 processors was still small enough to fit in a compact table. At that time, the important questions were simple: That map was useful because the market was transitioning from scalar x86 code to…

© 2026 Stefano Tommesani. All rights reserved.