User's Manual
Intel® IXP42X Product Line of Network Processors and IXC1100 Control Plane Processor
September 2006 DM
Order Number: 252480-006US 141
Intel XScale® Processor - Intel® IXP42X product line and IXC1100 control plane processors
3.7.4.4 Data/Bus Request Buffer Full Mode
The Data Cache has buffers available to service cache misses or uncacheable accesses. 
For every memory request that the Data Cache receives from the processor core, a 
buffer is speculatively allocated in case an external memory request is required or 
temporary storage is needed for an unaligned access. If no buffers are available, the 
Data Cache stalls the processor core. How often the Data Cache stalls depends on the 
performance of the external bus and on the memory access latency for Data Cache miss 
requests to external memory. If that latency is high, possibly because the IXP42X 
product line and IXC1100 control plane processors are being starved of the bus, these 
Data Cache buffers become full. This performance monitoring mode shows whether the 
processors are being starved of the external bus, which affects the performance of 
the application running on them. 
PMN0 accumulates the number of clock cycles the processor is stalled due to this 
condition, and PMN1 counts the number of times this condition occurs. 
Statistics derived from these two events:
• The average number of cycles the processor stalled on a data-cache access that 
may overflow the data-cache buffers. This is calculated by dividing PMN0 by PMN1. 
This statistic shows whether the stall cycles are spread across many requests or 
are attributable to just a few. If the average is high, the IXP42X product line 
and IXC1100 control plane processors may be starved of the external bus. 
• The percentage of total execution cycles the processor stalled because a Data 
Cache request buffer was not available. This is calculated by dividing PMN0 by 
CCNT, which was used to measure total execution time. 
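The two statistics above can be sketched in C as follows. This is a minimal illustration, not code from the manual: the struct and function names are hypothetical, and it assumes the PMN0, PMN1, and CCNT values have already been read after the measured run, that none of the counters overflowed, and that CCNT counted in the same clock-cycle units as PMN0.

```c
#include <stdint.h>

/* Hypothetical container for one measurement run in buffer-full mode.
 * Field names are illustrative; values are assumed to have been read
 * from the counters after the run, with no counter overflow. */
struct buffer_full_sample {
    uint32_t pmn0;  /* cycles stalled because no Data Cache buffer was free */
    uint32_t pmn1;  /* number of times the buffer-full condition occurred  */
    uint32_t ccnt;  /* total execution cycles                               */
};

/* Average stall cycles per buffer-full occurrence: PMN0 / PMN1.
 * Returns 0.0 when the condition never occurred. */
double avg_stall_cycles(const struct buffer_full_sample *s)
{
    return s->pmn1 ? (double)s->pmn0 / (double)s->pmn1 : 0.0;
}

/* Percentage of execution cycles spent stalled: 100 * PMN0 / CCNT.
 * Returns 0.0 when no cycles were counted. */
double stall_percent(const struct buffer_full_sample *s)
{
    return s->ccnt ? 100.0 * (double)s->pmn0 / (double)s->ccnt : 0.0;
}
```

For example, a run with PMN0 = 1000, PMN1 = 10, and CCNT = 4000 gives an average of 100 stall cycles per occurrence and 25 percent of execution time spent stalled. A high average with a low occurrence count would point toward a few long stalls rather than many short ones.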
3.7.4.5 Stall/Write-Back Statistics
When an instruction requires the result of a previous instruction and that result is not 
yet available, the IXP42X product line and IXC1100 control plane processors stall in 
order to preserve the correct data dependencies. PMN0 counts the number of stall 
cycles due to data dependencies. Not all data dependencies cause a stall; only the 
following dependencies cause such a stall penalty:
• Load-use penalty: attempting to use the result of a load before the load completes. 
To avoid the penalty, software should delay using the result of a load until it’s 
available. This penalty shows the latency effect of data-cache access.
• Multiply/Accumulate-use penalty: attempting to use the result of a multiply or 
multiply-accumulate operation before the operation completes. Again, to avoid the 
penalty, software should delay using the result until it’s available.
• ALU-use penalty: in a few isolated cases, back-to-back ALU operations may result 
in a one-cycle delay in execution. These cases are defined in 
Table 3.9, “Performance Considerations” on page 159.
PMN1 counts the number of write-back operations emitted by the data cache. These 
write-backs occur when the data cache evicts a dirty line of data to make room for a 
newly requested line or as the result of a clean operation (CP15, register 7). 
Statistics derived from these two events:
• The percentage of total execution cycles the processor stalled because of a data 
dependency. This is calculated by dividing PMN0 by CCNT, which was used to 
measure total execution time. Often a compiler can reschedule code to avoid these 
penalties when given the right optimization switches.
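As a rough sketch of the data-dependency statistic, the C function below divides the stall-cycle count by the total cycle count. The function name and the use of 64-bit parameters are assumptions made for illustration (for example, software that accumulates counter values across several runs); the manual itself only specifies dividing PMN0 by CCNT.

```c
#include <stdint.h>

/* Hypothetical helper: percentage of execution cycles lost to
 * data-dependency stalls, computed as 100 * PMN0 / CCNT.
 * stall_cycles: value of PMN0 in stall/write-back mode
 * total_cycles: value of CCNT over the same interval
 * Returns 0.0 when total_cycles is zero to avoid dividing by zero. */
double dependency_stall_percent(uint64_t stall_cycles, uint64_t total_cycles)
{
    if (total_cycles == 0) {
        return 0.0;
    }
    return 100.0 * (double)stall_cycles / (double)total_cycles;
}
```

Comparing this figure before and after changing compiler optimization settings is one way to check whether instruction rescheduling actually reduced the stall penalty.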










