Intel 64 and IA-32 Architectures Software Developers Manual Volume 3B, System Programming Guide Part 2

DEBUGGING AND PERFORMANCE MONITORING
threshold interrupt is generated after the PEBS assist completes, followed by the
counter overflow interrupt (two separate interrupts are generated).
Uncore counters may be programmed to interrupt one or more processor cores (see
Section 18.17.2). It is possible for interrupts posted from the uncore facility to occur
coincident with counter overflow interrupts from the processor core. Software must
check core and uncore status registers to determine the exact origin of counter
overflow interrupts.
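
The check described above can be sketched as follows. This is a minimal illustration, not a handler implementation: the function name, the two status values, and the two overflow masks are all hypothetical inputs that software would obtain from the core and uncore status registers of the specific processor. The sketch only shows that both domains must be examined, because both may have an overflow pending at the same time.

```python
def overflow_origins(core_status, uncore_status, core_ovf_mask, uncore_ovf_mask):
    """Return which domains report a pending counter overflow.

    All four arguments are assumptions for illustration: the status values
    stand for the contents of the core and uncore status registers, and the
    masks select the overflow bits defined for the processor in use.
    """
    origins = []
    if core_status & core_ovf_mask:
        origins.append("core")
    if uncore_status & uncore_ovf_mask:
        origins.append("uncore")
    return origins
```

Because the result is a list rather than a single value, a handler built on this idea would service every reported origin before returning, instead of assuming one interrupt has one cause.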
18.17.1.2 Load Latency Performance Monitoring Facility
The load latency facility gives software a means to characterize the average load
latency to different levels of the cache/memory hierarchy. This facility requires a
processor that supports the enhanced PEBS record format in the PEBS buffer (see
Table 18-21). The facility measures latency from micro-operation (uop) dispatch to
when the data is globally observable (GO).
To use this feature, software must ensure that:
• One of the IA32_PERFEVTSELx MSRs is programmed to specify the event unit
  MEM_INST_RETIRED with the LATENCY_ABOVE_THRESHOLD event mask
  (IA32_PerfEvtSelX[15:0] = 0x100B). The corresponding counter IA32_PMCx
  accumulates event counts for architecturally visible loads that exceed the
  programmed latency threshold, which is specified in a separate MSR. Stores
  are ignored when this event is programmed. The CMASK and INV fields of the
  IA32_PerfEvtSelX register used for counting load latency must both be 0.
  Writing other values results in undefined behavior.
• The MSR_PEBS_LD_LAT_THRESHOLD MSR is programmed with the desired
  latency threshold in core clock cycles. Loads with latencies greater than this
  value are eligible for counting and latency data reporting. The minimum value
  that may be programmed in this register is 3 (the minimum detectable load
  latency is 4 core clock cycles).
• The PEBS enable bit in the IA32_PEBS_ENABLE register is set for the
  corresponding IA32_PMCx counter register. This means that both the
  PEBS_EN_CTRX and LL_EN_CTRX bits must be set for the counter(s) of interest.
  For example, to enable load latency on counter IA32_PMC0, the
  IA32_PEBS_ENABLE register must be programmed with the 64-bit value
  0x00000001.00000001 (bits 32 and 0 set).
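
The three programming requirements above can be expressed as a small set of value computations. The sketch below is illustrative only: it computes and checks register values and does not access any MSR. All function and constant names are hypothetical. The split of 0x100B into event select 0x0B and unit mask 0x10, and the INV (bit 23) and CMASK (bits 31:24) positions, follow the standard IA32_PERFEVTSELx layout. The rule that counter x uses bit x and bit 32+x of IA32_PEBS_ENABLE is an assumption generalized from the IA32_PMC0 example in the text.

```python
MEM_INST_RETIRED_EVENT = 0x0B          # event select, bits 7:0 of 0x100B
LATENCY_ABOVE_THRESHOLD_UMASK = 0x10   # unit mask, bits 15:8 of 0x100B
INV_BIT = 1 << 23                      # INV flag in IA32_PERFEVTSELx
CMASK_FIELD = 0xFF << 24               # CMASK field in IA32_PERFEVTSELx
MIN_THRESHOLD = 3                      # smallest programmable threshold


def evtsel_low16():
    """Bits 15:0 of IA32_PERFEVTSELx for the load latency event."""
    return (LATENCY_ABOVE_THRESHOLD_UMASK << 8) | MEM_INST_RETIRED_EVENT


def evtsel_is_valid(evtsel):
    """True if the event/umask match and the CMASK and INV fields are 0."""
    event_ok = (evtsel & 0xFFFF) == evtsel_low16()
    fields_clear = (evtsel & (INV_BIT | CMASK_FIELD)) == 0
    return event_ok and fields_clear


def check_threshold(cycles):
    """Validate a value intended for MSR_PEBS_LD_LAT_THRESHOLD."""
    if cycles < MIN_THRESHOLD:
        raise ValueError("latency threshold must be at least 3 core clock cycles")
    return cycles


def load_is_eligible(latency, threshold):
    """A load is eligible only if its latency is greater than the threshold."""
    return latency > threshold


def pebs_enable_value(counter):
    """IA32_PEBS_ENABLE value with PEBS_EN_CTRx and LL_EN_CTRx set.

    Assumption: PEBS_EN_CTRx is bit x and LL_EN_CTRx is bit 32 + x,
    generalized from the IA32_PMC0 example in the text.
    """
    return (1 << (32 + counter)) | (1 << counter)
```

For counter 0, `pebs_enable_value(0)` yields 0x0000000100000001, matching the 0x00000001.00000001 example. With the minimum threshold of 3, `load_is_eligible(4, 3)` is true and `load_is_eligible(3, 3)` is false, which is consistent with a minimum detectable latency of 4 cycles.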
When the load-latency facility is enabled, load operations are randomly selected by
hardware and tagged to carry information related to data source locality and latency.
Latency and data source information of tagged loads are updated internally.
When a PEBS assist occurs, the most recent update of latency and data source
information is captured by the assist and written as part of the PEBS record. The PEBS sample
after value (SAV), specified in PEBS CounterX Reset, operates orthogonally to the
tagging mechanism. Loads are randomly tagged to collect latency data. The SAV
controls the number of tagged loads with latency information that will be written into
the PEBS record field by the PEBS assists. The load latency data written to the PEBS