User's Manual

Intel® IXP42X product line and IXC1100 control plane processors: Intel XScale® Processor
Intel® IXP42X Product Line of Network Processors and IXC1100 Control Plane Processor
DM September 2006
186 Order Number: 252480-006US
3.10.4.4.4 Bandwidth Limitations
Overuse of prefetches can usurp resources and degrade performance, because the
processor stalls once bus traffic requests exceed the system resource capacity.
The data transfer resources of the IXP42X product line and IXC1100 control plane
processors are:
Four fill buffers
Four pending buffers
Eight half-cache-line write buffers
SDRAM resources are typically:
Four memory banks
One page buffer per bank, each referencing a 4-Kbyte address range
Four transfer request buffers
Consider how these resources work together. A fill buffer is allocated for each cache
read miss. If the memory space is write-allocate, a fill buffer and a pending buffer are
also allocated for each cache write miss. A subsequent read to the same cache line
does not require a new fill buffer but does require a pending buffer, and a subsequent
write also requires a new pending buffer. A fill buffer is also allocated for each read
to non-cached memory, and a write buffer is needed for each non-coalescing write to
non-cached memory. Consequently, an STM instruction listing eight registers and
referencing non-cached memory uses eight write buffers if the writes do not coalesce
and two write buffers if they do. A cache eviction requires a write buffer for each dirty
bit set in the cache line. The prefetch instruction requires a fill buffer for each cache
line and 0, 1, or 2 write buffers for an eviction.
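The STM arithmetic above can be sketched in C. This is only an illustrative estimate, not a model of the actual hardware: the function name stm_write_buffers and its parameters are hypothetical, and the coalescing case assumes the store's base address is aligned to a 16-byte (half-cache-line) boundary.

```c
#include <assert.h>

/* Sizes taken from the text: 4-byte registers and 16-byte
 * (half-cache-line) write buffers, of which there are eight. */
enum { WORD_BYTES = 4, WRITE_BUFFER_BYTES = 16, NUM_WRITE_BUFFERS = 8 };

/*
 * Estimate how many write buffers an STM of n_regs registers to
 * non-cached memory demands.
 * ASSUMPTION (not from the manual): when coalescing, the base address
 * is 16-byte aligned; an unaligned base can touch one more buffer.
 */
unsigned stm_write_buffers(unsigned n_regs, int coalescing)
{
    assert(n_regs >= 1 && n_regs <= 16);   /* an STM lists 1..16 registers */

    if (!coalescing)
        return n_regs;                     /* one buffer per word stored */

    unsigned bytes = n_regs * WORD_BYTES;
    /* Round up to whole half-cache-line buffers. */
    return (bytes + WRITE_BUFFER_BYTES - 1) / WRITE_BUFFER_BYTES;
}
```

For eight registers this yields 8 buffers without coalescing and 2 with coalescing, matching the figures in the text. Comparing the result against NUM_WRITE_BUFFERS gives a rough indication of when a single store sequence could occupy every write buffer.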
When adding prefetch instructions, take care to ensure that the combination of
prefetch and instruction bus requests does not exceed the system resource capacity
described above; otherwise performance is degraded instead of improved. The
important points are to spread prefetch operations over calculations, so that bus
traffic can flow freely, and to minimize the number of necessary prefetches.
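As a minimal sketch of these two points, the loop below issues one prefetch per 32-byte cache line rather than one per element, and places it ahead of the work on the current line so the fill can overlap with computation. The PREFETCH macro and the sum_words function are hypothetical names introduced for illustration; the macro assumes a GCC-compatible compiler, where __builtin_prefetch is available, and the one-prefetch-per-line pattern assumes the array starts on a cache line boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_BYTES 32u   /* line size given in the text */

/* Hypothetical wrapper: uses the GCC/Clang builtin when available,
 * otherwise expands to a no-op so the code still compiles. */
#if defined(__GNUC__)
#define PREFETCH(p) __builtin_prefetch((p))
#else
#define PREFETCH(p) ((void)(p))
#endif

/* Sum n words, prefetching one cache line ahead of the line in use. */
uint32_t sum_words(const uint32_t *data, size_t n)
{
    const size_t words_per_line = CACHE_LINE_BYTES / sizeof(uint32_t);
    uint32_t sum = 0;

    assert(data != NULL || n == 0);

    for (size_t i = 0; i < n; i += words_per_line) {
        /* One prefetch per cache line, and none past the end of the
         * array, to keep the number of bus requests to a minimum. */
        if (i + words_per_line < n)
            PREFETCH(&data[i + words_per_line]);

        size_t end = (i + words_per_line < n) ? i + words_per_line : n;
        for (size_t j = i; j < end; j++)
            sum += data[j];   /* work that overlaps with the line fill */
    }
    return sum;
}
```

How far ahead to prefetch is a tuning choice that depends on memory latency and the amount of work per line; one line ahead is used here only to keep the sketch short.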
3.10.4.4.5 Cache Memory Considerations
Stride, the way data structures are walked through, can affect the temporal quality of
the data and reduce or increase cache conflicts. The data cache and mini-data cache of
the IXP42X product line and IXC1100 control plane processors each have 32 sets of
32-byte lines. This means that the cache lines in a given set lie on addresses that are
congruent modulo 1 Kbyte (32 sets × 32 bytes). Take care to choose data structure
sizes and stride requirements that do not overwhelm a given set, causing conflicts and
increased register pressure. Register pressure can increase because additional
registers are required to track prefetch addresses. These effects can be reduced by
rearranging data structure components to use more parallel access to search and
compare elements. Similarly, rearranging sections of data structures so that sections
that are often written fit in the same half cache line (16 bytes for the IXP42X product
line and IXC1100 control plane processors) can reduce cache eviction write-backs. On
a global scale, techniques such as array merging can enhance the spatial locality of
the data.
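The relationship between stride and set conflicts can be illustrated with a short C sketch. It assumes, based on the 32-set, 32-byte-line geometry described above, that the set is selected by address bits [9:5]; the function names cache_set_index and sets_touched are hypothetical helpers, not part of any Intel API.

```c
#include <assert.h>
#include <stdint.h>

/* Geometry from the text: 32 sets of 32-byte lines, so addresses that
 * differ by a multiple of 1024 bytes fall into the same set. */
enum { LINE_BYTES = 32, NUM_SETS = 32, SET_SPAN_BYTES = LINE_BYTES * NUM_SETS };

/* ASSUMPTION: the set index is taken from address bits [9:5]. */
unsigned cache_set_index(uint32_t addr)
{
    return (addr / LINE_BYTES) % NUM_SETS;
}

/* Count the distinct sets touched by `count` accesses that start at
 * `base` and advance by `stride` bytes each time. */
unsigned sets_touched(uint32_t base, uint32_t stride, unsigned count)
{
    uint32_t seen = 0;        /* one bit per set; 32 sets fit exactly */
    unsigned distinct = 0;

    assert(NUM_SETS == 32);

    for (unsigned i = 0; i < count; i++) {
        uint32_t bit = (uint32_t)1 << cache_set_index(base + i * stride);
        if (!(seen & bit)) {
            seen |= bit;
            distinct++;
        }
    }
    return distinct;
}
```

Under these assumptions, a walk with a 1024-byte stride lands every access in a single set, so all of the data competes for that set's ways, whereas a 32-byte stride spreads 32 consecutive accesses across all 32 sets.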
As an example of array merging, consider the following code: