User's Manual

Intel® IXP42X Product Line of Network Processors and IXC1100 Control Plane Processor
September 2006 DM
Order Number: 252480-006US
Intel XScale® Processor: Intel® IXP42X product line and IXC1100 control plane processors
3.10.2.5 Multiply/Multiply Accumulate (MAC) Pipeline
The Multiply-Accumulate (MAC) unit executes the multiply and multiply-accumulate
instructions supported by the IXP42X product line and IXC1100 control plane
processors. The MAC implements the processors' 40-bit accumulator register, acc0,
and handles the instructions that transfer its value to and from general-purpose
ARM registers.
The following are important characteristics of the MAC:
• The MAC is not truly pipelined, as the processing of a single instruction may require
  use of the same data path resources for several cycles before a new instruction can
  be accepted. The type of instruction and its source arguments determine the number
  of cycles required.
• No more than two instructions can occupy the MAC pipeline concurrently.
• When the MAC is processing an instruction, another instruction may not enter M1
  unless the original instruction completes in the next cycle.
• The MAC unit can operate on 16-bit packed signed data. This reduces register
  pressure and memory traffic. Two 16-bit data items can be loaded into a
  register with one LDR.
• The MAC can achieve a throughput of one multiply per cycle when performing a
  16-by-32-bit multiply.
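The packed 16-bit operation described above can be illustrated with a small portable C model. This is a sketch of the arithmetic only, not of any instruction encoding or of the exact hardware behavior. The helper names (pack16, sext16, wrap40, mac_packed16) are hypothetical, and wrapping the accumulator at 40 bits (rather than saturating) is an assumption made for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: pack two signed 16-bit samples into one 32-bit word,
 * the way a single 32-bit load would fetch two adjacent halfwords. */
uint32_t pack16(int16_t lo, int16_t hi)
{
    return (uint32_t)(uint16_t)lo | ((uint32_t)(uint16_t)hi << 16);
}

/* Sign-extend the low 16 bits of x to a 32-bit signed value. */
int32_t sext16(uint32_t x)
{
    x &= 0xFFFFu;
    return (int32_t)(x ^ 0x8000u) - 0x8000;
}

/* Assumed accumulator model: keep 40 bits and sign-extend from bit 39. */
int64_t wrap40(int64_t v)
{
    uint64_t u = (uint64_t)v & 0xFFFFFFFFFFULL;
    return (int64_t)(u ^ 0x8000000000ULL) - (int64_t)0x8000000000LL;
}

/* Dual 16x16 multiply-accumulate: both halfword products are added to acc. */
int64_t mac_packed16(int64_t acc, uint32_t a, uint32_t b)
{
    int32_t lo = sext16(a) * sext16(b);             /* low halfwords  */
    int32_t hi = sext16(a >> 16) * sext16(b >> 16); /* high halfwords */
    return wrap40(acc + lo + hi);                   /* sum is formed in 64 bits */
}
```

For example, mac_packed16(0, pack16(3, -2), pack16(4, 5)) computes 3*4 + (-2)*5. Each packed word stands in for one register, so a dot product over 16-bit samples needs half as many loads and registers as it would with one sample per word.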
3.10.2.5.1 Behavioral Description
Execution in the MAC unit starts at the beginning of the M1 pipe stage, where it
receives two 32-bit source operands. Results are completed N cycles later (where N
depends on the operand size) and returned to the register file. For more information
on MAC instruction latencies, refer to “Instruction Latencies” on page 160.
An instruction that occupies the M1 or M2 pipe stage also occupies the X1 or X2
pipe stage, respectively. Each cycle, a MAC operation progresses from M1 toward M5. A
MAC operation may complete in any stage from M2 through M5. If a MAC operation enters
M3 through M5, it is considered committed because it will modify architectural state
regardless of subsequent events.
3.10.3 Basic Optimizations
This section outlines optimizations specific to the ARM architecture. Where needed,
these optimizations have been modified to suit the IXP42X product line and IXC1100
control plane processors.
3.10.3.1 Conditional Instructions
The architecture of the IXP42X product line and IXC1100 control plane processors
provides the ability to execute instructions conditionally. This feature, combined
with the ability of the processors' instructions to modify the condition codes,
makes a wide array of optimizations possible.
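As a minimal illustration of conditional execution, the C functions below have short conditional bodies that an ARM compiler can translate without a branch. The function names are hypothetical, and the assembly in the comments is one plausible translation rather than the guaranteed output of any particular compiler.

```c
#include <stdint.h>

/* Absolute value (x must not be INT32_MIN).
 * One possible ARM translation, with x in r0:
 *     cmp   r0, #0
 *     rsblt r0, r0, #0    ; executed only if r0 < 0
 * The negate is predicated on the LT condition, so no branch is needed. */
int32_t abs_i32(int32_t x)
{
    if (x < 0)
        x = -x;
    return x;
}

/* Clamp to an upper limit.
 * One possible ARM translation, with x in r0 and limit in r1:
 *     cmp   r0, r1
 *     movgt r0, r1        ; executed only if r0 > r1 */
int32_t clamp_max(int32_t x, int32_t limit)
{
    if (x > limit)
        x = limit;
    return x;
}
```

In both cases the conditional instruction replaces a branch around a single instruction, which avoids the cost of a taken or mispredicted branch.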
3.10.3.1.1 Optimizing Condition Checks
Instructions on the IXP42X product line and IXC1100 control plane processors can
selectively modify the state of the condition codes. When generating code for if-else
and loop conditions, it is often beneficial to use this feature to set the condition
codes, thereby eliminating the need for a subsequent compare instruction.
Consider the C code segment: