Intel 64 and IA-32 Architectures Software Developers Manual Volume 1, Basic Architecture
INTEL® 64 AND IA-32 ARCHITECTURES
• Instruction queue provides caching of short loops to improve efficiency.
• Stack pointer tracker improves efficiency of executing procedure/function entries
and exits.
• Branch prediction unit employs dedicated hardware to handle different types of
branches for improved branch prediction.
• Advanced branch prediction algorithm directs instruction fetch unit to fetch
instructions likely in the architectural code path for decoding.
2.2.3.2 Execution Core
The execution core of the Intel Core microarchitecture is superscalar and can process
instructions out of order to increase the overall rate of instructions executed per
cycle (IPC). The execution core employs the following features to improve execution
throughput and efficiency:
• Up to six micro-ops can be dispatched to execute per cycle
• Up to four instructions can be retired per cycle
• Three full arithmetic logical units
• SIMD instructions can be dispatched through three issue ports
• Most SIMD instructions have 1-cycle throughput (including 128-bit SIMD
instructions)
• Up to eight floating-point operations per cycle
• Many long-latency computation operations are pipelined in hardware to increase
overall throughput
• Reduced exposure to data access delays using Intel Smart Memory Access
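The dispatch and retirement widths in the list above imply simple lower bounds on execution time. The following C sketch is a back-of-the-envelope illustration only: the helper names are hypothetical, and the constants are the peak figures quoted above, which real code will rarely sustain.

```c
#include <assert.h>
#include <stdint.h>

/* Peak widths taken from the list above; sustained rates are usually lower. */
enum {
    MAX_UOPS_DISPATCHED_PER_CYCLE = 6,
    MAX_INSTRS_RETIRED_PER_CYCLE  = 4
};

/* Hypothetical helper: ceiling of count / per_cycle, written so that it
   cannot overflow for large counts. */
static uint64_t min_cycles(uint64_t count, uint64_t per_cycle)
{
    assert(per_cycle > 0);
    return count / per_cycle + (count % per_cycle != 0);
}

/* Fewest cycles in which instr_count instructions could retire. */
static uint64_t min_retire_cycles(uint64_t instr_count)
{
    return min_cycles(instr_count, MAX_INSTRS_RETIRED_PER_CYCLE);
}

/* Fewest cycles in which uop_count micro-ops could be dispatched. */
static uint64_t min_dispatch_cycles(uint64_t uop_count)
{
    return min_cycles(uop_count, MAX_UOPS_DISPATCHED_PER_CYCLE);
}
```

For example, a loop body of 10 instructions cannot retire in fewer than 3 cycles under these assumptions, whatever the dispatch rate.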
2.2.4 SIMD Instructions
Beginning with the Pentium II and Pentium with Intel MMX technology processor
families, five extensions have been introduced into the Intel 64 and IA-32
architectures to perform single-instruction multiple-data (SIMD) operations. These
extensions are the MMX technology, SSE extensions, SSE2 extensions, SSE3
extensions, and Supplemental Streaming SIMD Extensions 3 (SSSE3). Each of these
extensions provides a group of instructions that perform SIMD operations on packed
integer and/or packed floating-point data elements.
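To make "one instruction, multiple data elements" concrete, the following portable C sketch models a packed add on eight 16-bit integers. It is a plain scalar loop that imitates the result of a SIMD operation, not actual SIMD code. The type and function names are hypothetical, and wraparound (modulo 2^16) arithmetic is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES_16 8  /* 128 bits / 16 bits per element */

/* Hypothetical model of a 128-bit register holding eight packed
   16-bit unsigned integers. */
typedef struct {
    uint16_t lane[LANES_16];
} packed_u16x8;

/* The model is only meaningful if the struct really is 128 bits wide. */
static_assert(sizeof(packed_u16x8) == 16, "models a 128-bit register");

/* Adds corresponding elements of a and b. A SIMD instruction would
   produce all eight sums at once; this loop computes them one by one. */
static packed_u16x8 packed_add_u16(packed_u16x8 a, packed_u16x8 b)
{
    packed_u16x8 r;
    for (size_t i = 0; i < LANES_16; ++i) {
        /* The cast makes the wraparound modulo 2^16 explicit. */
        r.lane[i] = (uint16_t)(a.lane[i] + b.lane[i]);
    }
    return r;
}
```

Each element is processed independently: a carry out of one lane never propagates into its neighbor, which is what distinguishes a packed add from a single wide integer add.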
SIMD integer operations can use the 64-bit MMX or the 128-bit XMM registers. SIMD
floating-point operations use 128-bit XMM registers. Figure 2-4 shows a summary of
the various SIMD extensions (MMX technology, SSE, SSE2, SSE3, and SSSE3), the
data types they operate on, and how the data types are packed into MMX and XMM
registers.
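The number of data elements that fit in a register follows directly from the register and element widths. The short C sketch below works through that arithmetic for the 64-bit MMX and 128-bit XMM registers. The constant and function names are hypothetical, chosen for illustration only.

```c
#include <assert.h>

/* Register widths stated in the text above. */
enum { MMX_REGISTER_BITS = 64, XMM_REGISTER_BITS = 128 };

/* Hypothetical helper: how many elements of element_bits each can be
   packed into a register of register_bits. */
static unsigned elements_per_register(unsigned register_bits,
                                      unsigned element_bits)
{
    /* Elements must tile the register exactly. */
    assert(element_bits > 0 && register_bits % element_bits == 0);
    return register_bits / element_bits;
}
```

Under this arithmetic, an MMX register holds eight bytes or four 16-bit words, while an XMM register holds sixteen bytes, four 32-bit elements, or two 64-bit elements.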
The Intel MMX technology was introduced in the Pentium II and Pentium with MMX
technology processor families. MMX instructions perform SIMD operations on packed