Execution Unit Resources
22007E/0, November 1999. AMD Athlon Processor x86 Code Optimization
Table 7. Sample 1: Integer Register Operations

Instruction                  Decode  Decode  Clocks
Number   Instruction         Pipe    Type    1  2  3  4  5  6  7  8
1        IMUL EAX, ECX       0       VP      D  I  M  M  M  M
2        INC ESI             0       DP         D  I  E
3        MOV EDI, 0x07F4     1       DP         D  I  E
4        ADD EDI, EBX        2       DP         D  I     E
5        SHL EAX, 8          0       DP            D  I        E
6        OR EAX, 0x0F        1       DP            D  I           E
7        INC EBX             2       DP            D  I     E
8        ADD ESI, EDX        0       DP               D  I  E

Comments for Each Instruction Number
1. The IMUL is a VectorPath instruction. It cannot be decoded or paired with other operations and, therefore,
dispatches alone in pipe 0. The multiply latency is four cycles.
2. The simple INC operation is paired with instructions 3 and 4. The INC executes in IEU0 in cycle 4.
3. The MOV executes in IEU1 in cycle 4.
4. The ADD operation depends on instruction 3. It executes in IEU2 in cycle 5.
5. The SHL operation depends on the multiply result (instruction 1). The MacroOP waits in a reservation
station and is eventually scheduled to execute in cycle 7 after the multiply result is available.
6. The OR operation depends on the SHL result (instruction 5). It executes in cycle 8 in IEU1.
7. This simple operation contends for IEU2 in cycle 5, when instruction 4 executes there. Therefore, the
operation does not execute until cycle 6.
8. The ADD operation executes immediately in IEU0 after dispatching.
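
The execute cycles given in the comments above can be reproduced with a small toy model. The Python sketch below is only an illustration, not a description of the actual scheduler hardware. It assumes one decode group per clock (instruction 1 alone, then 2-4, then 5-7, then 8), one clock each for decode and issue, a one-cycle latency for the simple operations, and a separate multiplier resource labeled "MUL". The names Op, OPS, and schedule are hypothetical, and flag dependencies are ignored.

```python
from dataclasses import dataclass


@dataclass
class Op:
    num: int        # instruction number from the table
    text: str       # assembly text, for reference only
    decode: int     # clock in which the op is decoded (assumed)
    unit: str       # execution resource ("MUL" is a made-up label)
    latency: int    # clocks from start of execution to result
    reads: tuple    # registers read
    writes: tuple   # registers written


OPS = [
    Op(1, "IMUL EAX, ECX",   1, "MUL",  4, ("EAX", "ECX"), ("EAX",)),
    Op(2, "INC ESI",         2, "IEU0", 1, ("ESI",),       ("ESI",)),
    Op(3, "MOV EDI, 0x07F4", 2, "IEU1", 1, (),             ("EDI",)),
    Op(4, "ADD EDI, EBX",    2, "IEU2", 1, ("EDI", "EBX"), ("EDI",)),
    Op(5, "SHL EAX, 8",      3, "IEU0", 1, ("EAX",),       ("EAX",)),
    Op(6, "OR EAX, 0x0F",    3, "IEU1", 1, ("EAX",),       ("EAX",)),
    Op(7, "INC EBX",         3, "IEU2", 1, ("EBX",),       ("EBX",)),
    Op(8, "ADD ESI, EDX",    4, "IEU0", 1, ("ESI", "EDX"), ("ESI",)),
]


def schedule(ops):
    """Return {instruction number: clock in which execution starts}."""
    ready_at = {}   # register -> first clock its newest value is usable
    busy = set()    # (unit, clock) slots already claimed
    start = {}
    for op in ops:  # program order, so older ops claim a unit first
        # Decode, then issue, then the earliest possible execute clock.
        clock = op.decode + 2
        # Wait for every source register (true dependencies only).
        for reg in op.reads:
            clock = max(clock, ready_at.get(reg, 0))
        # Wait for a free slot on the unit. Only the start clock is
        # reserved, which is enough here because there is one multiply.
        while (op.unit, clock) in busy:
            clock += 1
        busy.add((op.unit, clock))
        start[op.num] = clock
        for reg in op.writes:
            ready_at[reg] = clock + op.latency
    return start


if __name__ == "__main__":
    for num, clock in schedule(OPS).items():
        print(f"instruction {num} starts executing in clock {clock}")
```

Under these assumptions the model places instruction 5 in clock 7 (waiting on the multiply), instruction 6 in clock 8, and instruction 7 in clock 6 (IEU2 is taken in clock 5), in agreement with the comments. Changing the reads, writes, or unit fields shows how a different register choice or decode position would move an operation.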