Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3B: System Programming Guide, Part 2

PERFORMANCE-MONITORING EVENTS
Event Num 03H, Umask Value 20H: LOAD_BLOCK.L1D
Definition: Loads blocked by the L1 data cache.
Description and Comment: This event indicates that loads are blocked for one or more reasons. Some triggers for this event are:
• The number of L1 data cache misses exceeds the maximum number of outstanding misses supported by the processor. This includes misses generated as a result of demand fetches, software prefetches, or hardware prefetches.
• Cache line split loads.
• Partial reads, such as reads to uncacheable memory, I/O instructions, and more.
• A locked load operation is in progress.
The number of events is greater than or equal to the number of load operations that were blocked.
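
As a rough illustration of how the Event Num and Umask Value columns are used, the sketch below packs them into a value for an IA32_PERFEVTSELx register. It assumes the architectural layout of that register (event select in bits 7:0, unit mask in bits 15:8, USR in bit 16, OS in bit 17, EN in bit 22). The function name and its defaults are illustrative, not taken from this table, and writing the MSR itself (which requires privileged code) is not shown.

```python
# Illustrative helper, not part of the manual: builds an IA32_PERFEVTSELx
# value from an event number and unit mask taken from the table.
USR_BIT = 1 << 16  # count while running at privilege levels 1-3
OS_BIT = 1 << 17   # count while running at privilege level 0
EN_BIT = 1 << 22   # enable the counter


def perfevtsel_value(event_num, umask, usr=True, os_mode=True, enable=True):
    """Return the event-select value for the given event number and umask."""
    if not 0 <= event_num <= 0xFF or not 0 <= umask <= 0xFF:
        raise ValueError("event number and umask must each fit in 8 bits")
    value = event_num | (umask << 8)
    if usr:
        value |= USR_BIT
    if os_mode:
        value |= OS_BIT
    if enable:
        value |= EN_BIT
    return value


# LOAD_BLOCK.L1D: Event Num 03H, Umask Value 20H
print(hex(perfevtsel_value(0x03, 0x20)))  # 0x432003
```

Edge detect, invert, and counter-mask fields are left at zero here; a real configuration may need them depending on the measurement.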
Event Num 04H, Umask Value 01H: SB_DRAIN_CYCLES
Definition: Cycles while stores are blocked due to store buffer drain.
Description and Comment: This event counts every cycle during which the store buffer is draining. This includes:
• Serializing operations such as CPUID
• Synchronizing operations such as XCHG
• Interrupt acknowledgment
• Other conditions, such as cache flushing
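
Because SB_DRAIN_CYCLES counts cycles rather than occurrences, it is usually most informative as a share of total execution time. The sketch below shows that arithmetic. It assumes a second counter supplies the total unhalted core cycles for the same interval; the function name and the sample counts are hypothetical, not measurements.

```python
def drain_fraction(sb_drain_cycles, total_cycles):
    """Fraction of cycles during which the store buffer was draining."""
    if total_cycles <= 0:
        raise ValueError("total_cycles must be positive")
    if sb_drain_cycles < 0:
        raise ValueError("sb_drain_cycles must be non-negative")
    return sb_drain_cycles / total_cycles


# Hypothetical counter readings over one measurement interval.
fraction = drain_fraction(sb_drain_cycles=15_000_000, total_cycles=60_000_000)
print(f"{fraction:.1%} of cycles spent draining the store buffer")
```

A high fraction suggests looking for frequent serializing or locked operations in the measured code, since those are among the listed causes of a drain.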
Event Num 04H, Umask Value 02H: STORE_BLOCK.ORDER
Definition: Cycles while a store is waiting for a preceding store to be globally observed.
Description and Comment: This event counts the total duration, in number of cycles, during which stores are waiting for a preceding stored cache line to be observed by other cores.
This situation happens as a result of the strong store ordering behavior, as defined in “Memory Ordering,” Chapter 7, Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A.
The stall may become noticeable when many stores either miss the L1 data cache or hit a cache line in the Shared state. If the store requires a bus transaction to read the cache line, the stall ends when the snoop response for the bus transaction arrives.
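
The three events listed above can also be requested through a profiling tool instead of programming the counters directly. The sketch below assumes the Linux perf raw-event syntax, in which `r` is followed by the hexadecimal value (umask << 8) | event number. The dictionary and function names are illustrative, and the tool itself is not part of this manual.

```python
# Event Num and Umask Value pairs copied from the table rows above.
EVENTS = {
    "LOAD_BLOCK.L1D": (0x03, 0x20),
    "SB_DRAIN_CYCLES": (0x04, 0x01),
    "STORE_BLOCK.ORDER": (0x04, 0x02),
}


def perf_raw_descriptor(event_num, umask):
    """Format a raw event descriptor: 'r' + hex of (umask << 8) | event."""
    if not 0 <= event_num <= 0xFF or not 0 <= umask <= 0xFF:
        raise ValueError("event number and umask must each fit in 8 bits")
    return f"r{(umask << 8) | event_num:04x}"


for name, (event_num, umask) in EVENTS.items():
    print(name, perf_raw_descriptor(event_num, umask))
```

These events are non-architectural, so the descriptors are only meaningful on processors based on Intel Core microarchitecture; on other processors the same codes may select different events or none at all.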
Table A-6. Non-Architectural Performance Events in Processors Based on Intel Core Microarchitecture (Contd.)
Columns: Event Num, Umask Value, Event Name, Definition, Description and Comment