Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3B: System Programming Guide, Part 2

PERFORMANCE-MONITORING EVENTS
A.5 PERFORMANCE MONITORING EVENTS FOR INTEL® ATOM PROCESSORS
Processors based on the Intel Atom microarchitecture support the architectural and
non-architectural performance-monitoring events listed in Table A-1 and Table A-6.
In addition, they support the non-architectural performance-monitoring events
listed in Table A-7.
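
Each event in Table A-7 is identified by an event number and a unit mask (umask). As a minimal sketch of how these two values are typically combined into a raw event-select value, the C fragment below assumes the architectural IA32_PERFEVTSELx layout (event select in bits 7:0, unit mask in bits 15:8, USR in bit 16, OS in bit 17, EN in bit 22). The helper name make_evtsel and the macro names are hypothetical and are not defined by this section; writing the value to the MSR requires privileged code that is not shown.

```c
#include <stdint.h>

/* Flag bits, assumed from the architectural IA32_PERFEVTSELx layout. */
#define PERFEVTSEL_USR (1u << 16) /* count while CPL > 0 */
#define PERFEVTSEL_OS  (1u << 17) /* count while CPL = 0 */
#define PERFEVTSEL_EN  (1u << 22) /* enable the counter  */

/* Hypothetical helper: pack an event number and umask from the table
 * into the low 16 bits and OR in the requested control flags. */
uint32_t make_evtsel(uint8_t event_num, uint8_t umask, uint32_t flags)
{
    return (uint32_t)event_num | ((uint32_t)umask << 8) | flags;
}
```

For example, make_evtsel(0x02, 0x81, PERFEVTSEL_USR | PERFEVTSEL_OS | PERFEVTSEL_EN) packs the event number 02H and umask 81H given for STORE_FORWARDS.GOOD and yields 0x00438102.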
Table A-7. Non-Architectural Performance Events for Intel Atom Processors
Event Num. | Umask Value | Event Name | Definition | Description and Comment
02H | 81H | STORE_FORWARDS.GOOD | Good store forwards | This event counts the number of times store data was forwarded directly to a load.
06H | 00H | SEGMENT_REG_LOADS.ANY | Number of segment register loads | This event counts the number of segment register load operations. Instructions that load new values into segment registers incur a penalty. This event indicates performance issues in 16-bit code. If this event occurs frequently, it may be useful to calculate the number of instructions retired per segment register load. If the result is low (on average, only a small number of instructions are executed between segment register loads), the code's segment register usage should be optimized. Because of branch misprediction, this event is speculative and may include segment register loads that do not actually occur. However, most segment register loads are internally serialized, so such speculative effects are minimized.
07H | 01H | PREFETCH.PREFETCHT0 | Streaming SIMD Extensions (SSE) PrefetchT0 instructions executed | This event counts the number of times the SSE instruction prefetchT0 is executed. This instruction prefetches the data to the L1 data cache and L2 cache.
07H | 06H | PREFETCH.SW_L2 | Streaming SIMD Extensions (SSE) PrefetchT1 and PrefetchT2 instructions executed | This event counts the number of times the SSE instructions prefetchT1 and prefetchT2 are executed. These instructions prefetch the data to the L2 cache.
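
The description of SEGMENT_REG_LOADS.ANY suggests calculating the number of instructions retired per segment register load. A minimal sketch of that calculation in C follows. The function name insts_per_seg_load is hypothetical, both counts are assumed to have been read over the same measurement interval, and the table gives no numeric threshold for what counts as "low", so none is applied here.

```c
#include <stdint.h>

/* Hypothetical helper: average number of instructions retired per
 * segment register load. Returns -1.0 when no segment register loads
 * were counted, since the ratio is undefined in that case. */
double insts_per_seg_load(uint64_t inst_retired, uint64_t seg_reg_loads)
{
    if (seg_reg_loads == 0)
        return -1.0;
    return (double)inst_retired / (double)seg_reg_loads;
}
```

For example, 1,000,000 retired instructions and 50,000 counted segment register loads give a ratio of 20 instructions per load. Because the event is speculative, the load count may slightly overstate the loads that actually occurred.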