The Intel P6 core, introduced with the Pentium Pro processor and used in all current Intel processors, features a RISC-like microarchitecture and an out-of-order execution unit, a radical shift from the earlier in-order x86 designs.
The P6’s dynamic execution microarchitecture removes the constraint of linear instruction sequencing between the traditional fetch and execute phases. An instruction buffer opens a wide window onto the instructions that have not yet been executed, giving the execute phase far more visibility into the instruction stream so that a better scheduling policy can be adopted. To schedule well, the single execute phase is replaced by decoupled dispatch/execute and retire phases: instructions may start in any order that satisfies their data dependencies, but they must complete, and therefore retire, in the original program order. This approach increases performance because it keeps more of the processor core’s resources busy.
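The decoupling of dispatch/execute from retire can be illustrated with a small simulation. This is only a toy sketch, not a model of the real P6 pipeline: the `Instr` type, the `simulate` function, the four-instruction program, its latencies, and the two-unit dispatch limit are all assumptions made for illustration, and the whole program is treated as the instruction window.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    name: str
    deps: tuple    # indices of EARLIER instructions whose results are needed
    latency: int   # cycles spent executing (hypothetical values)


def simulate(program, units=2):
    """Return (start_order, retire_order) as lists of instruction names.

    Dispatch is out of order: any instruction whose inputs are ready may start.
    Retire is in order: only the oldest unretired instruction may retire.
    """
    n = len(program)
    started = [False] * n
    finish = [None] * n          # cycle at which each result becomes available
    start_order, retire_order = [], []
    head = 0                     # oldest instruction not yet retired
    cycle = 0
    while head < n:
        # Retire phase: strictly in program order, completed instructions only.
        while head < n and finish[head] is not None and finish[head] <= cycle:
            retire_order.append(program[head].name)
            head += 1
        # Dispatch phase: scan the window for ready instructions, oldest first.
        issued = 0
        for i, ins in enumerate(program):
            if issued == units:
                break
            ready = all(finish[d] is not None and finish[d] <= cycle
                        for d in ins.deps)
            if not started[i] and ready:
                started[i] = True
                finish[i] = cycle + ins.latency
                start_order.append(ins.name)
                issued += 1
        cycle += 1
    return start_order, retire_order


# Hypothetical program: "add" waits on a slow "load"; "mul" and "sub" do not.
PROGRAM = [
    Instr("load", deps=(), latency=3),
    Instr("add", deps=(0,), latency=1),
    Instr("mul", deps=(), latency=1),
    Instr("sub", deps=(2,), latency=1),
]

if __name__ == "__main__":
    starts, retires = simulate(PROGRAM)
    print("start order: ", starts)
    print("retire order:", retires)
```

With these assumed latencies, `mul` and `sub` start while `add` is still waiting for `load`, so the start order is load, mul, sub, add, while the retire order remains the program order load, add, mul, sub.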
The P6 core executes x86 instructions by breaking them into simpler micro-instructions called micro-ops. This task is performed by three parallel decoders in the D1 stage of the pipeline: the first decoder is capable of decoding one x86 instruction of four or fewer