User's Manual

Intel IXP42X Product Line of Network Processors and IXC1100 Control Plane Processor: Intel XScale Processor

DM, September 2006. Order Number: 252480-006US

3.10.4.4.4 Bandwidth Limitations

Overuse of prefetches can usurp resources and degrade performance. This happens because the processor stalls once bus traffic requests exceed the system resource capacity. The data transfer resources of the IXP42X product line and IXC1100 control plane processors are:

• Four fill buffers

• Four pending buffers

• Eight half-cache-line write buffers

SDRAM resources are typically:

• Four memory banks

• One page buffer per bank referencing a 4K address range

• Four transfer request buffers

Consider how these resources work together. A fill buffer is allocated for each cache read miss. If the memory space is write-allocate, each cache write miss also allocates a fill buffer, along with a pending buffer. A subsequent read to the same cache line does not require a new fill buffer but does require a pending buffer, and a subsequent write also requires a new pending buffer. A fill buffer is also allocated for each read from non-cached memory, and a write buffer is needed for each non-coalescing write to non-cached memory. Consequently, an STM instruction that lists eight registers and references non-cached memory uses eight write buffers if the writes do not coalesce, and two write buffers if they do. A cache eviction requires a write buffer for each dirty bit set in the cache line. The prefetch instruction requires a fill buffer for each cache line and zero, one, or two write buffers for an eviction.
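The STM arithmetic above can be sketched in C as a rough estimate. The helper name `stm_write_buffers` is hypothetical, and the sketch assumes 4-byte registers and a base address aligned to a 16-byte half cache line, so that coalesced stores pack fully into each write buffer.

```c
#include <assert.h>

/* Sizes from the text: one write buffer holds a half cache line (16 bytes).
 * WORD_BYTES assumes 32-bit registers. */
enum { WORD_BYTES = 4, HALF_LINE_BYTES = 16 };

/*
 * Hypothetical helper: estimate the write buffers consumed by a
 * store-multiple of reg_count registers to non-cached memory.
 * Assumes the base address is 16-byte aligned in the coalescing case.
 */
static unsigned stm_write_buffers(unsigned reg_count, int coalescing)
{
    assert(reg_count >= 1 && reg_count <= 16); /* STM lists 1 to 16 registers */

    if (!coalescing)
        return reg_count; /* one write buffer per word stored */

    /* Coalesced stores fill 16-byte buffers; round up to whole buffers. */
    unsigned bytes = reg_count * WORD_BYTES;
    return (bytes + HALF_LINE_BYTES - 1) / HALF_LINE_BYTES;
}
```

With eight registers this gives eight buffers without coalescing and two with coalescing, which in the non-coalescing case already matches the full set of eight write buffers listed above.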

When adding prefetch instructions, take care that the combined prefetch and instruction bus requests do not exceed the system resource capacity described above; otherwise performance will be degraded instead of improved. The important points are to spread prefetch operations over calculations, so that bus traffic can flow freely, and to minimize the number of necessary prefetches.
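As a minimal sketch of both points, the loop below issues at most one prefetch per 32 bytes of data rather than one per element, and interleaves the prefetches with the summation. It assumes a GCC-compatible compiler that provides `__builtin_prefetch`; the function name and the prefetch distance `PREFETCH_AHEAD_WORDS` are hypothetical and would need tuning on real hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-byte cache line (from the text) holds eight 32-bit words. */
enum { WORDS_PER_LINE = 8 };

/* Hypothetical tuning value: how far ahead of the current element to prefetch. */
enum { PREFETCH_AHEAD_WORDS = 2 * WORDS_PER_LINE };

static uint32_t sum_with_prefetch(const uint32_t *data, size_t n)
{
    assert(data != NULL || n == 0);

    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        /* One prefetch per eight words, never past the end of the array.
         * If data is 32-byte aligned, this is one prefetch per cache line. */
        if (i % WORDS_PER_LINE == 0 && i + PREFETCH_AHEAD_WORDS < n)
            __builtin_prefetch(&data[i + PREFETCH_AHEAD_WORDS]);

        sum += data[i]; /* computation that overlaps the outstanding fill */
    }
    return sum;
}
```

Prefetching every element instead would request the same line eight times, spending fill and pending buffers without bringing in any new data.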

3.10.4.4.5 Cache Memory Considerations

Stride, the way data structures are walked through, can affect the temporal quality of the data and reduce or increase cache conflicts. The data cache and mini-data cache of the IXP42X product line and IXC1100 control plane processors each have 32 sets with 32-byte cache lines. This means that the cache lines in a given set lie on modulo-1-KB address boundaries: addresses that differ by a multiple of 1 KB map to the same set. Choose data structure sizes and strides that do not overwhelm a given set, which would cause conflicts and increase register pressure. Register pressure can increase because additional registers are required to track prefetch addresses. These effects can be reduced by rearranging data structure components to use more parallel access to search and compare elements. Similarly, rearranging sections of data structures so that frequently written sections fit in the same half cache line (16 bytes for the IXP42X product line and IXC1100 control plane processors) can reduce cache eviction write-backs. On a global scale, techniques such as array merging can enhance the spatial locality of the data.
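The set mapping described above can be checked with a small C sketch. The helper names are hypothetical, and the sketch assumes the set index is simply the line number modulo the number of sets, which follows from 32 sets of 32-byte lines covering 1 KB.

```c
#include <assert.h>
#include <stdint.h>

/* Cache geometry from the text: 32 sets, 32-byte lines. */
enum { LINE_BYTES = 32, NUM_SETS = 32 };

/* The sets together span 1 KB, so addresses 1 KB apart share a set. */
static_assert(LINE_BYTES * NUM_SETS == 1024, "sets should span 1 KB");

/* Hypothetical helper: set selected by an address (assumed line number mod sets). */
static unsigned cache_set_index(uintptr_t addr)
{
    return (unsigned)((addr / LINE_BYTES) % NUM_SETS);
}

/* Nonzero when two addresses compete for the same set. */
static int same_set(uintptr_t a, uintptr_t b)
{
    return cache_set_index(a) == cache_set_index(b);
}
```

A loop that walks an array with a 1024-byte stride therefore lands in a single set on every access, while a 32-byte stride visits all 32 sets before it returns to the first.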

As an example of array merging, consider the following code: