
Execution Unit Resources

22007E/0—November 1999 AMD Athlon™ Processor x86 Code Optimization

Table 7. Sample 1 – Integer Register Operations

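| Instruction Number | Instruction | Decode Pipe | Decode Type | Clock 1 | Clock 2 | Clock 3 | Clock 4 | Clock 5 | Clock 6 | Clock 7 | Clock 8 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | IMUL EAX, ECX | 0 | VP | D | I | M | M | M | M | | |
| 2 | INC ESI | 0 | DP | | D | I | E | | | | |
| 3 | MOV EDI, 0x07F4 | 1 | DP | | D | I | E | | | | |
| 4 | ADD EDI, EBX | 2 | DP | | D | I | | E | | | |
| 5 | SHL EAX, 8 | 0 | DP | | | D | I | | | E | |
| 6 | OR EAX, 0x0F | 1 | DP | | | D | I | | | | E |
| 7 | INC EBX | 2 | DP | | | D | I | | E | | |
| 8 | ADD ESI, EDX | 0 | DP | | | | D | I | E | | |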

Comments for Each Instruction Number

1. The IMUL is a VectorPath instruction. It cannot be decoded or paired with other operations and, therefore, dispatches alone in pipe 0. The multiply latency is four cycles.

2. The simple INC operation is paired with instructions 3 and 4. The INC executes in IEU0 in cycle 4.

3. The MOV executes in IEU1 in cycle 4.

4. The ADD operation depends on instruction 3. It executes in IEU2 in cycle 5.

5. The SHL operation depends on the multiply result (instruction 1). The MacroOP waits in a reservation station and is eventually scheduled to execute in cycle 7 after the multiply result is available.

6. The OR operation depends on the SHL result (instruction 5). It executes in cycle 8 in IEU1.

7. This simple operation contends with instruction 4 for execution in IEU2 in cycle 5. Therefore, the operation does not execute until cycle 6.

8. The ADD operation executes immediately in IEU0 after dispatching.
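
The reasoning in the comments above can be checked with a small scheduling model. The following Python sketch is illustrative only and is not part of the processor documentation: the `Op` class, the `schedule` function, and the register read/write sets are hypothetical names introduced here. It assumes that a VectorPath instruction decodes alone, that up to three DirectPath instructions decode per clock into pipes 0 to 2, that an operation can execute no earlier than two clocks after decode (D, I, then E), that the multiply occupies a separate multiplier resource for its four-cycle latency, and that older operations get first claim on an execution unit.

```python
from dataclasses import dataclass


@dataclass
class Op:
    text: str                  # assembly text, for display only
    reads: tuple               # registers read by the operation
    writes: tuple              # registers written by the operation
    vector_path: bool = False  # VectorPath operations decode alone
    latency: int = 1           # execute cycles (4 for the multiply)


PROGRAM = [
    Op("IMUL EAX, ECX", ("EAX", "ECX"), ("EAX",), vector_path=True, latency=4),
    Op("INC ESI", ("ESI",), ("ESI",)),
    Op("MOV EDI, 0x07F4", (), ("EDI",)),
    Op("ADD EDI, EBX", ("EDI", "EBX"), ("EDI",)),
    Op("SHL EAX, 8", ("EAX",), ("EAX",)),
    Op("OR EAX, 0x0F", ("EAX",), ("EAX",)),
    Op("INC EBX", ("EBX",), ("EBX",)),
    Op("ADD ESI, EDX", ("ESI", "EDX"), ("ESI",)),
]


def schedule(program):
    """Return one dict per instruction with its decode pipe and clock numbers."""
    ready = {}    # register -> first clock in which its newest value is usable
    busy = set()  # (unit, clock) execution slots already claimed
    rows = []
    clock, slot = 0, 3  # slot 3 means "decode group full": open a new one
    for number, op in enumerate(program, start=1):
        # Open a new decode group for a VectorPath op or when the group is full.
        if op.vector_path or slot == 3:
            clock, slot = clock + 1, 0
        pipe = slot
        slot = 3 if op.vector_path else slot + 1
        # Assumption: multi-cycle ops use a separate multiplier resource;
        # single-cycle ops execute in the IEU matching their decode pipe.
        unit = "MUL" if op.latency > 1 else f"IEU{pipe}"
        # Earliest execute clock: after D and I, and after all sources are ready.
        start = max([clock + 2] + [ready.get(reg, 0) for reg in op.reads])
        # Resource contention: slide to the first clock where the unit is free.
        while any((unit, start + k) in busy for k in range(op.latency)):
            start += 1
        busy.update((unit, start + k) for k in range(op.latency))
        for reg in op.writes:
            ready[reg] = start + op.latency
        rows.append({"number": number, "text": op.text, "pipe": pipe,
                     "unit": unit, "decode": clock, "issue": clock + 1,
                     "execute": start, "done": start + op.latency - 1})
    return rows


if __name__ == "__main__":
    for row in schedule(PROGRAM):
        print(f'{row["number"]}  {row["text"]:<16} pipe {row["pipe"]}  '
              f'{row["unit"]:<4} executes in clock {row["execute"]}')
```

Under these assumptions the model places the first execute clock of instructions 1 through 8 at 3, 4, 4, 5, 7, 8, 6, and 6, which agrees with the comments: the SHL waits for the multiply, the OR waits for the SHL, and INC EBX slips from clock 5 to clock 6 because ADD EDI, EBX already holds IEU2.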