Intel® Xeon® Processor E5 v2 and E7 v2 Product Families
Uncore Performance Monitoring Reference Manual

Reference Number: 329468-002
February 2014
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT.
CHAPTER 1 INTRODUCTION

1.1 INTRODUCTION

The uncore subsystems of the Intel® Xeon® processor E7-8800 v2, E5-2600 v2, and E5-1600 v2 Product Families are shown in Figure 1-1, Figure 1-2 and Figure 1-3. The uncore subsystem consists of a variety of components, ranging from the CBox caching agent to the power controller unit (PCU), integrated memory controller (iMC) and home agent (HA), to name a few. Most of these components provide similar performance monitoring capabilities.

Figure 1-1.
Figure 1-2. Intel Xeon Processor E5-2600 v2 Product Family Block Diagram

NOTE: This diagram represents one possible EP configuration. Not all SKUs support all features.
Figure 1-3. Intel Xeon Processor E5-1600 v2 Product Family Block Diagram

NOTE: This diagram represents one possible EN configuration. Not all SKUs support all features.

1.2 UNCORE PMON OVERVIEW

The uncore performance monitoring facilities are organized into per-component performance monitoring (or ‘PMON’) units. A PMON unit within an uncore component may contain one or more sets of counter registers.
switches and thread migration performed by the OS, it is recommended that the monitoring software agent establish a fixed affinity binding to prevent cross-talk of event counts from different uncore PMUs.

The programming interface of the counter registers and control registers falls into two address spaces:

• Accessed by MSR are PMON registers within the Cbo units, PCU, and U-Box; see Table 1-2.
1.4 UNCORE PMON - TYPICAL CONTROL/COUNTER LOGIC

Following is a diagram of the standard perfmon counter block illustrating how event information is routed and stored within each counter and how its paired control register helps to select and filter the incoming information. Details of how control bits affect event information are presented in each of the box subsections of Chapter 2, with some summary information below.

NOTE: The PCU uses an adaptation of this block (refer to Section 2.7.
Notification after X events: Instead of manually stopping the counters at intervals (often wall clock time) predetermined by software, hardware can be set to notify monitoring software when a set number of events has occurred. The Overflow Enable bit is provided for just that purpose. See Section 2.1.1, “Counter Overflow” for more information on how to use this mechanism.

Applying a Threshold to Incoming Events:
Table 1-2.

Table 1-3.
• e.g. with:Cn_MSR_PMON_BOX_FILTER.{opc,nid}={0x182,my_node}

Requires reading a fixed data register
• For the case where the metric requires the information contained in a fixed data register, the mnemonic for the register will be included in the equation. Software will be responsible for configuring the data register and setting it to start counting with the other events used by the metric.
• e.g. POWER_THROTTLE_CYCLES.
CHAPTER 2 UNCORE PERFORMANCE MONITORING

2.1 UNCORE PER-SOCKET PERFORMANCE MONITORING CONTROL

Counter registers are distributed across many units. To manage them and collect event data efficiently, this section describes the hierarchical technique to start/stop/restart event counting that a software agent may need to perform during a monitoring session.

2.1.
e.g. Set C0_MSR_PMON_CTL2.en to 1

NOTE: Recommended: set the .en bit for all counters in each box a user intends to monitor, and leave it alone for the duration of the monitoring session.

NOTE: For cases where there is no sharing of these counters among software agents independently sampling the counters, software could set the enable bits for all counters it intends to use during the setup phase.
Monitoring:

e) Select how to gather data. If polling, skip to f. If sampling:

To set up a sample interval, software can pre-program the data register with a value of [2^(register bit width - up to 48) - sample interval length]. Doing so allows software, through use of the PMI mechanism, to be notified when the number of events in the sample has been captured.
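The preload arithmetic above can be sketched as follows. This is a minimal illustration, not part of the manual: the function name `preload_value` and the default width of 48 bits are assumptions, and the real width depends on the box being programmed (the UBox and CBo counters, for example, are described elsewhere in this document as 44 bits wide).

```python
def preload_value(sample_interval: int, counter_width: int = 48) -> int:
    """Return the initial counter value that overflows after sample_interval events.

    counter_width is the data register width in bits (up to 48 per the text).
    The default of 48 is an assumption, not a value taken from a specific box.
    """
    if not 0 < counter_width <= 48:
        raise ValueError("counter_width must be in 1..48")
    if not 0 < sample_interval <= (1 << counter_width):
        raise ValueError("sample_interval does not fit in the counter")
    mask = (1 << counter_width) - 1
    # 2^width - interval: the counter carries out after exactly `interval` increments.
    return ((1 << counter_width) - sample_interval) & mask


# Example: arrange for a 44-bit counter to overflow after one million events.
value = preload_value(1_000_000, counter_width=44)
print(hex(value))
```

Writing the returned value to the data register before unfreezing means the overflow, and hence the PMI if overflow notification is enabled, arrives after the requested number of events.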
2.1.4 Enabling a New Sample Interval from Frozen Counters

a) Clear all uncore counters: For each box in which counting occurred, set *_PMON_BOX_CTL.rst_ctrs to 1.

b) Clear all overflow bits. This includes clearing U_MSR_PMON_GLOBAL_STATUS.ov_* as well as any *_BOX_STATUS registers that have their overflow bits set. e.g.
Table 2-2. U_MSR_PMON_GLOBAL_CTL Register – Field Definitions

Field      Bits   Attr  HW Reset Val  Description
frz_all    31     WO    0             Freeze all uncore performance monitors.
wk_on_pmi  30     RW    0             If PMI event requested to send to core... 0 - Send event to cores already woken. 1 - Wake any sleeping core and send PMI to all cores.
unfrz_all  29     WO    0             Unfreeze all uncore performance monitors.
rsv        28:27  RV    0             Reserved.
Table 2-3. U_MSR_PMON_GLOBAL_STATUS Register – Field Definitions

Field   Bits   Attr  HW Reset Val  Description
rsv     31:27  RV    0             Reserved
ov_rp   26     RW1C  0             Set if overflow is detected from an R2PCIe PMON register. NOTE: Write of ‘1’ will clear the bit.
ov_rq1  25     RW1C  0             Set if overflow is detected from an R3QPI1 PMON register. NOTE: Write of ‘1’ will clear the bit.
• The master for reading and writing physically distributed registers across the physical processor using the Message Channel.
• The UBox is the intermediary for interrupt traffic, receiving interrupts from the system and dispatching interrupts to the appropriate core.
• The UBox serves as the system lock master used when quiescing the platform (e.g., Intel® QPI bus lock).

2.2.
2.2.3.1 UBox Box Level PMON State

The following registers represent the state governing all box-level PMUs in the UBox.

If an overflow is detected from one of the UBox PMON registers, the corresponding bit in the U_MSR_PMON_BOX_STATUS.ov field will be set. To reset these overflow bits, a user must write a value of ‘1’ to them (which will clear the bits).

Table 2-5. U_MSR_PMON_BOX_STATUS Register – Field Definitions

Field rsv ov 2.2.3.
Field   Bits  Attr  HW Reset Val  Description
umask   15:8  RW    0             Select subevents to be counted within the selected event.
ev_sel  7:0   RW    0             Select event to be counted.

The UBox performance monitor data registers are 44 bits wide. A counter overflow occurs when a carry out from bit 43 is detected.
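To make the 44-bit width and the two control fields above concrete, here is a small sketch. The helper names `counter_delta` and `encode_event` are hypothetical, and the sketch makes two assumptions: at most one wrap occurs between reads, and every control bit other than umask and ev_sel is left at 0 (the remaining control fields are not covered here).

```python
UBOX_COUNTER_BITS = 44                      # width stated in the text
COUNTER_MASK = (1 << UBOX_COUNTER_BITS) - 1


def counter_delta(previous: int, current: int) -> int:
    """Events counted between two raw reads, assuming at most one wrap in between."""
    # Subtract modulo 2^44 so a wrap (carry out of bit 43) still yields a positive delta.
    return ((current & COUNTER_MASK) - (previous & COUNTER_MASK)) & COUNTER_MASK


def encode_event(ev_sel: int, umask: int) -> int:
    """Pack ev_sel into bits 7:0 and umask into bits 15:8; all other bits stay 0."""
    if not 0 <= ev_sel <= 0xFF or not 0 <= umask <= 0xFF:
        raise ValueError("ev_sel and umask are 8-bit fields")
    return (umask << 8) | ev_sel


# A read taken 5 counts before wrap, followed by a read 10 counts after wrap.
print(counter_delta((1 << 44) - 5, 10))
# Event code 0x46 with unit mask bit 0 set.
print(hex(encode_event(0x46, 0x01)))
```

The modular subtraction is only valid while software samples more often than the counter can wrap; otherwise the overflow status bits have to be consulted as well.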
2.2.4 UBOX Box Events Ordered By Code

The following table summarizes the directly measured UBOX Box events.

Symbol Name
EVENT_MSG

2.2.
Table 2-11. Unit Masks for PHOLD_CYCLES

Extension      umask [15:8]  Description
ASSERT_TO_ACK  bxxxxxxx1     Assert to ACK

RACU_REQUESTS
• Title: RACU Request
• Category: RACU Events
• Event Code: 0x46
• Max. Inc/Cyc: 1, Register Restrictions: 0-1
• Definition:
• NOTE: This will be dropped because PHOLD is not implemented this way.

2.3 CACHING AGENT (CBO) PERFORMANCE MONITORING

2.3.
2.3.2 CBo Performance Monitoring Overview

Each of the CBos in the uncore supports event monitoring through four 44-bit wide counters (Cn_MSR_PMON_CTR{3:0}). Event programming in the CBo is restricted such that each event can only be measured in certain counters within the CBo. For example, counter 0 is dedicated to occupancy events. No other counter may be used to capture occupancy events.
MSR Name                 MSR Address  Size (bits)  Description
C1_MSR_PMON_CTR1         0x0D37       64           CBo 1 PMON Counter 1
C1_MSR_PMON_CTR0         0x0D36       64           CBo 1 PMON Counter 0
C1_MSR_PMON_BOX_FILTER   0x0D34       32           CBo 1 PMON Filter
C1_MSR_PMON_BOX_FILTER1  0x0D3A       32           CBo 1 PMON Filter1
C1_MSR_PMON_CTL3         0x0D33       32           CBo 1 PMON Control for Counter 3
C1_MSR_PMON_CTL2         0x0D32       32           CBo 1 PMON Control for Counter 2
C1_MSR_PMON_CTL1         0x0D31       32           CBo 1 PMON Con
MSR Name                 MSR Address  Size (bits)  Description
C3_MSR_PMON_CTL1         0x0D71       32           CBo 3 PMON Control for Counter 1
C3_MSR_PMON_CTL0         0x0D70       32           CBo 3 PMON Control for Counter 0
                         0x0D64       32           CBo 3 PMON Box-Wide Control
C4_MSR_PMON_CTR3         0x0D99       64           CBo 4 PMON Counter 3
C4_MSR_PMON_CTR2         0x0D98       64           CBo 4 PMON Counter 2
C4_MSR_PMON_CTR1         0x0D97       64           CBo 4 PMON Counter 1
C4_MSR_PMON_CTR0         0x0D96       64           CBo 4 PMON Counter 0
C4_MSR_PMON
MSR Name                 MSR Address  Size (bits)  Description
C6_MSR_PMON_CTR2         0x0DD8       64           CBo 6 PMON Counter 2
C6_MSR_PMON_CTR1         0x0DD7       64           CBo 6 PMON Counter 1
C6_MSR_PMON_CTR0         0x0DD6       64           CBo 6 PMON Counter 0
C6_MSR_PMON_BOX_FILTER   0x0DD4       32           CBo 6 PMON Filter
C6_MSR_PMON_BOX_FILTER1  0x0DDA       32           CBo 6 PMON Filter1
C6_MSR_PMON_CTL3         0x0DD3       32           CBo 6 PMON Control for Counter 3
C6_MSR_PMON_CTL2         0x0DD2       32           CBo 6 PMON Control for Cou
MSR Name                 MSR Address  Size (bits)  Description
C8_MSR_PMON_CTL2         0x0E12       32           CBo 8 PMON Control for Counter 2
C8_MSR_PMON_CTL1         0x0E11       32           CBo 8 PMON Control for Counter 1
C8_MSR_PMON_CTL0         0x0E10       32           CBo 8 PMON Control for Counter 0
                         0x0E04       32           CBo 8 PMON Box-Wide Control
C9_MSR_PMON_CTR3         0x0E39       64           CBo 9 PMON Counter 3
C9_MSR_PMON_CTR2         0x0E38       64           CBo 9 PMON Counter 2
C9_MSR_PMON_CTR1         0x0E37       64           CBo 9 PMON Counter 1
MSR Name                  MSR Address  Size (bits)  Description
C11_MSR_PMON_CTR3         0x0E79       64           CBo 11 PMON Counter 3
C11_MSR_PMON_CTR2         0x0E78       64           CBo 11 PMON Counter 2
C11_MSR_PMON_CTR1         0x0E77       64           CBo 11 PMON Counter 1
C11_MSR_PMON_CTR0         0x0E76       64           CBo 11 PMON Counter 0
C11_MSR_PMON_BOX_FILTER   0x0E74       32           CBo 11 PMON Filter
C11_MSR_PMON_BOX_FILTER1  0x0E7A       32           CBo 11 PMON Filter1
C11_MSR_PMON_CTL3         0x0E73       32           CBo 11 PMON Control for C
MSR Name                  MSR Address  Size (bits)  Description
C13_MSR_PMON_CTL3         0x0EB3       32           CBo 13 PMON Control for Counter 3
C13_MSR_PMON_CTL2         0x0EB2       32           CBo 13 PMON Control for Counter 2
C13_MSR_PMON_CTL1         0x0EB1       32           CBo 13 PMON Control for Counter 1
C13_MSR_PMON_CTL0         0x0EB0       32           CBo 13 PMON Control for Counter 0
                          0x0EA4       32           CBo 13 PMON Box-Wide Control
C14_MSR_PMON_CTR3         0x0ED9       64           CBo 14 PMON Counter 3
C14_MSR_PMON_CTR2         0x0ED8       6
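The CBo MSR addresses listed above appear to follow a regular layout: each CBo occupies a block 0x20 apart, with fixed offsets for each register inside the block. The sketch below encodes that pattern. It is inferred from the table rows shown here rather than stated by the manual, so the base address 0x0D04 for CBo 0, the offsets, and the index limit of 14 are all assumptions that should be checked against the complete table before use. The function name `cbo_msr_address` is hypothetical.

```python
# Assumed layout, extrapolated from the listed rows (e.g. CBo 3 box control at 0x0D64).
CBO_BASE = 0x0D04        # box-wide control of CBo 0 (extrapolated, not listed above)
CBO_STRIDE = 0x20        # distance between consecutive CBo register blocks
CBO_MAX_INDEX = 14       # highest CBo index that appears in the rows above

CBO_REG_OFFSET = {
    "BOX_CTL": 0x00,
    "CTL0": 0x0C, "CTL1": 0x0D, "CTL2": 0x0E, "CTL3": 0x0F,
    "BOX_FILTER": 0x10,
    "CTR0": 0x12, "CTR1": 0x13, "CTR2": 0x14, "CTR3": 0x15,
    "BOX_FILTER1": 0x16,
}


def cbo_msr_address(cbo: int, register: str) -> int:
    """Return the MSR address of `register` in CBo `cbo` under the assumed layout."""
    if not 0 <= cbo <= CBO_MAX_INDEX:
        raise ValueError("CBo index out of range")
    return CBO_BASE + cbo * CBO_STRIDE + CBO_REG_OFFSET[register]


# Spot checks against rows that do appear in the table.
print(hex(cbo_msr_address(1, "CTR0")))      # listed as 0x0D36
print(hex(cbo_msr_address(14, "CTR3")))     # listed as 0x0ED9
```

A lookup of this kind keeps per-CBo loops short, but a hard-coded table copied from the manual is the safer choice when the full register list is available.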
Table 2-13. Cn_MSR_PMON_BOX_CTL Register – Field Definitions

Field     Bits   Attr  HW Reset Val  Description
rsv       31:18  RV    0             Reserved
rsv       17:16  RV    0             Reserved; SW must write to 1 else behavior is undefined.
rsv       15:9   RV    0             Reserved
frz       8      WO    0             Freeze. If set to 1 the counters in this box will be frozen.
rsv       7:2    RV    0             Reserved
rst_ctrs  1      WO    0             Reset Counters.

2.3.3.2
message to the UBox (refer to Section 2.1.1, “Counter Overflow”). During the interval of time between overflow and global disable, the counter value will wrap and continue to collect events.

If accessible, software can continuously read the data registers without disabling event collection.

Table 2-15. Cn_MSR_PMON_CTR{3-0} Register – Field Definitions

Field rsv event_count 2.3.3.
Table 2-17. Cn_MSR_PMON_BOX_FILTER1 Register – Field Definitions

Field  Bits   Attr  HW Reset Val  Description
isoc   31     RW
nc     30     RW    0             Match on Non-Coherent Requests
rsv    29     RV    0             Reserved. SW must write 0 else behavior is undefined.
       28:20  RW    0             Match on Opcode (see Table 2-18, “Opcode Match by IDI Packet Type for Cn_MSR_PMON_BOX_FILTER.
opc Value  Opcode       Defn
0x19D      PCIWrUpdate  PCIe Write Update (prior generation uncore in Intel® Xeon® processor E5-2600 Product Family) - see PCIRMW, except does not return data back to IIO from ownership read request.
0x19E      PCIRdCur     PCIe read current - Read Current requests from IIO. Used to read data without changing state.
2.3.4.2 Acronyms frequently used in CBo Events:

The Rings:
AD (Address) Ring - Core Read/Write Requests and Intel QPI Snoops. Carries Intel QPI requests and snoop responses from C to QPI.
BL (Block or Data) Ring - Data == 2 transfers for 1 cache line
AK (Acknowledge) Ring - Acknowledges QPI to CBo and CBo to Core. Carries snoop responses from Core to CBo.
Symbol Name

2.3.
Symbol Name: Definition                                             Equation
CYC_USED_DNEVEN: Cycles Used in the Down direction, Even polarity   RING_BL_USED.DN_EVEN / SAMPLE_INTERVAL
CYC_USED_DNODD: Cycles Used in the Down direction, Odd polarity     RING_BL_USED.DN_ODD / SAMPLE_INTERVAL
CYC_USED_UPEVEN: Cycles Used in the Up direction, Even polarity     RING_BL_USED.UP_EVEN / SAMPLE_INTERVAL
CYC_USED_UPODD: Cycles Used in the Up direction, Odd polarity       RING_BL_USED.
Symbol Name: Definition                                             Equation
MEM_WB_BYTES: Data written back to memory in Number of Bytes        LLC_VICTIMS.M_STATE * 64
PARTIAL_PCI_READS: Number of partial PCI reads                      TOR_INSERTS.OPCODE with:Cn_MSR_PMON_BOX_FILTER1.opc=0x195
PARTIAL_PCI_WRITES: Number of partial PCI writes                    TOR_INSERTS.OPCODE with:Cn_MSR_PMON_BOX_FILTER1.opc=0x1E5
PCIE_DATA_BYTES: Data from PCIe in Number of Bytes                  (TOR_INSERTS.

2.3.7
LLC_LOOKUP
• Title: Cache Lookups
• Category: CACHE Events
• Event Code: 0x34
• Max. Inc/Cyc: 1, Register Restrictions: 0-1
• Definition: Counts the number of times the LLC was accessed - this includes code, data, prefetches and hints coming from L2. This has numerous filters available. Note the non-standard filtering equation. This event will count requests that look up the cache multiple times with multiple increments.
Table 2-20. Unit Masks for LLC_VICTIMS

Extension  umask [15:8]  Filter Dep        Description
MISS       bxxxx1xxx
NID        bx1xxxxxx     CBoFilter1[15:0]  Victimized Lines that Match NID: Qualify one of the other subevents by the Target NID. The NID is programmed in Cn_MSR_PMON_BOX_FILTER.nid. In conjunction with STATE = I, it is possible to monitor misses to specific NIDs in the system.
Table 2-22. Unit Masks for RING_AD_USED

Extension      umask [15:8]  Description
UP_VR0_EVEN    bxxxxxxx1     Up and Even on Vring 0: Filters for the Up and Even ring polarity on Virtual Ring 0.
UP_VR0_ODD     bxxxxxx1x     Up and Odd on Vring 0: Filters for the Up and Odd ring polarity on Virtual Ring 0.
DOWN_VR0_EVEN  bxxxxx1xx     Down and Even on Vring 0: Filters for the Down and Even ring polarity on Virtual Ring 0.
Table 2-23. Unit Masks for RING_AK_USED

Extension      umask [15:8]  Description
UP_VR1_ODD     bxx1xxxxx     Up and Odd on VRing 1: Filters for the Up and Odd ring polarity on Virtual Ring 1.
UP             b00110011     Up
DOWN_VR1_EVEN  bx1xxxxxx     Down and Even on VRing 1: Filters for the Down and Even ring polarity on Virtual Ring 1.
DOWN_VR1_ODD   b1xxxxxxx     Down and Odd on VRing 1: Filters for the Down and Odd ring polarity on Virtual Ring 1.
RING_BOUNCES
• Title: Number of LLC responses that bounced on the Ring.
• Category: RING Events
• Event Code: 0x05
• Max. Inc/Cyc: 1, Register Restrictions: 0-1
• Definition:

Table 2-25. Unit Masks for RING_BOUNCES

Extension  umask [15:8]  Description
AD_IRQ     bxxxxxx1x
AK         bxxxxx1xx     Acknowledgements to core
BL         bxxxx1xxx     Data Responses to core
IV         bxxx1xxxx     Snoops of processor's cache.
Table 2-27. Unit Masks for RxR_EXT_STARVED

Extension  umask [15:8]  Description
IRQ        bxxxxxxx1     IPQ IRQ is externally starved and therefore we are blocking the IPQ.
IPQ        bxxxxxx1x     IRQ IPQ is externally starved and therefore we are blocking the IRQ.
PRQ        bxxxxx1xx     IRQ is blocking the ingress queue and causing the starvation.
ISMQ_BIDS  bxxxx1xxx     ISMQ_BID Number of times that the ISMQ Bid.
Table 2-29. Unit Masks for RxR_IPQ_RETRY

Extension  umask [15:8]  Description
ANY        bxxxxxxx1     Any Reject: Counts the number of times that a request from the IPQ was retried because of a TOR reject. TOR rejects from the IPQ can be caused by the Egress being full or Address Conflicts.
Table 2-30. Unit Masks for RxR_IRQ_RETRY

Extension  umask [15:8]  Description
RTID       bxxxx1xxx     No RTIDs: Counts the number of times that requests from the IRQ were retried because there were no RTIDs available. RTIDs are required after a request misses the LLC and needs to send snoops and/or requests to memory. If there are no RTIDs available, requests will queue up in the IRQ and retry until one becomes available.
Table 2-31. Unit Masks for RxR_ISMQ_RETRY

Extension    umask [15:8]  Description
IIO_CREDITS  bxx1xxxxx     No IIO Credits: Number of times a request attempted to acquire the NCS/NCB credit for sending messages on BL to the IIO. There is a single credit in each CBo that is shared between the NCS and NCB message classes for sending transactions on the BL ring (such as read data) to the IIO.
Table 2-33. Unit Masks for TOR_INSERTS

Extension    umask [15:8]  Filter Dep         Description
OPCODE       b00000001     CBoFilter1[28:20]  Opcode Match: Transactions inserted into the TOR that match an opcode (matched by Cn_MSR_PMON_BOX_FILTER.opc)
MISS_OPCODE  b00000011     CBoFilter1[28:20]  Miss Opcode Match: Miss transactions inserted into the TOR that match an opcode.
Table 2-33. Unit Masks for TOR_INSERTS

Extension  umask [15:8]  Filter Dep        Description
NID_ALL    b01001000     CBoFilter1[15:0]  NID Matched: All NID matched (matches an RTID destination) transactions inserted into the TOR. The NID is programmed in Cn_MSR_PMON_BOX_FILTER.nid. In conjunction with STATE = I, it is possible to monitor misses to specific NIDs in the system.
Table 2-34. Unit Masks for TOR_OCCUPANCY

Extension  umask [15:8]  Filter Dep  Description
ALL        b00001000                 Any: All valid TOR entries. This includes requests that reside in the TOR for a short time, such as LLC Hits that do not need to snoop cores or requests that get rejected and have to be retried through one of the ingress queues.
Table 2-34. Unit Masks for TOR_OCCUPANCY

Extension           umask [15:8]  Filter Dep         Description
MISS_REMOTE_OPCODE  b10000011     CBoFilter1[28:20]  Misses to Remote Memory - Opcode Matched: Number of outstanding Miss transactions, satisfied by an opcode, in the TOR that are satisfied by remote caches or remote memory.
REMOTE              b10001000
MISS_REMOTE         b10001010

TxR_ADS_USED
• Title:
• Category: EGRESS Events
• Event Code: 0x04
• Max.
Table 2-36. Unit Masks for TxR_INSERTS

Extension  umask [15:8]  Description
AK_CORE    bxx1xxxxx     AK - Corebo: Ring transactions from the Corebo destined for the AK ring. This is commonly used for snoop responses coming from the core and destined for a Cachebo.
BL_CORE    bx1xxxxxx     BL - Corebo: Ring transactions from the Corebo destined for the BL ring. This is commonly used for transferring writeback data to the cache.

2.
For information on how to set up a monitoring session, refer to Section 2.1, “Uncore Per-Socket Performance Monitoring Control”.

2.4.2.1 HA PMON Registers - On Overflow and the Consequences (PMI/Freeze)

If an overflow is detected from an HA performance counter enabled to communicate its overflow (HAn_PCI_PMON_CTL.ov_en is set to 1), the overflow bit is set at the box level (HAn_PCI_PMON_BOX_STATUS.
In the case of the HA, the HA_PCI_PMON_BOX_CTL register provides the ability to manually freeze the counters in the box (.frz) and reset the generic state (.rst_ctrs and .rst_ctrl).

If an overflow is detected from one of the HA PMON registers, the corresponding bit in the HA_PCI_PMON_BOX_STATUS.ov field will be set. To reset these overflow bits, a user must write a value of ‘1’ to them (which will clear the bits).

Table 2-38.
Table 2-40. HA_PCI_PMON_CTL{3-0} Register – Field Definitions

Field   Bits   Attr  HW Reset Val  Description
thresh  31:24  RW-V  0             Threshold used in counter comparison.
rsv     23     RV    0             Reserved. SW must write to 0 else behavior is undefined.
en      22     RW-V  0             Local Counter Enable.
rsv     21     RV    0             Reserved. SW must write to 0 else behavior is undefined.
In addition to generic event counting, each HA provides a pair of Address Match registers and an Opcode Match register that allow a user to filter incoming packet traffic according to the packet Opcode, Message Class and Physical Address. The ADDR_OPC_MATCH.FILT event is provided to capture the filter match as an event.
• iMC RPQ/WPQ Events
Determine cycles the HA is stuck without credits into the iMC's read/write queues.

2.4.3.1 On the Major HA Structures:

The 128-entry TF (Tracker File) holds all transactions that arrive in the HA from the time they arrive until they are completed and leave the HA. Transactions could stay in this structure much longer than they are needed.
Symbol Name

2.4.
ADDR_OPC_MATCH
• Title: QPI Address/Opcode Match
• Category: ADDR_OPCODE_MATCH Events
• Event Code: 0x20
• Max. Inc/Cyc: 1, Register Restrictions: 0-3
• Definition:

Table 2-44.
released after the snoop response and data return (or post in the case of a write) and the response is returned on the ring.

Table 2-45.
Table 2-47. Unit Masks for CONFLICT_CYCLES

Extension  umask [15:8]  Description
CONFLICT   bxxxxxx1x     Conflict Detected: Counts the number of cycles that we are handling conflicts.
LAST       bxxxxx1xx     Last in conflict chain: Count every last conflict in conflict chain. Can be used to compute the average conflict chain length as (#Ackcnflts/#LastConflictor)+1.
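The chain-length formula quoted for the LAST subevent can be written out as a small helper. This is only an illustration: the function and parameter names are hypothetical, `last_conflictors` is assumed to be the CONFLICT_CYCLES.LAST count, and the source of the #Ackcnflts count is not identified in this table, so the caller has to supply it from whichever event provides it.

```python
def average_conflict_chain_length(ack_conflicts: int, last_conflictors: int) -> float:
    """Average conflict chain length, computed as (#Ackcnflts / #LastConflictor) + 1."""
    if ack_conflicts < 0:
        raise ValueError("ack_conflicts must be non-negative")
    if last_conflictors <= 0:
        # No completed conflict chain was observed in the sample, so the ratio is undefined.
        raise ValueError("last_conflictors must be positive")
    return ack_conflicts / last_conflictors + 1


# Example with made-up counts: 30 conflict acknowledgements over 10 completed chains.
print(average_conflict_chain_length(30, 10))
```

Both counts should come from the same sample interval; mixing intervals makes the ratio meaningless.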
DIRECTORY_LOOKUP
• Title: Directory Lookups
• Category: DIRECTORY Events
• Event Code: 0x0c
• Max. Inc/Cyc: 1, Register Restrictions: 0-3
• Definition: Counts the number of transactions that looked up the directory. Can be filtered by requests that had to snoop and those that did not have to.
• NOTE: Only valid for parts that implement the Directory.

Table 2-48.
• Definition: Accumulates the number of credits available to the QPI Link 2 BL Ingress buffer.

IGR_NO_CREDIT_CYCLES
• Title: Cycles without QPI Ingress Credits
• Category: QPI_IGR_CREDITS Events
• Event Code: 0x22
• Max. Inc/Cyc: 1, Register Restrictions: 0-3
• Definition: Counts the number of cycles when the HA does not have credits to send messages to the QPI Agent.
Table 2-52. Unit Masks for IMC_WRITES

Extension      umask [15:8]  Description
FULL           bxxxxxxx1     Full Line Non-ISOCH
PARTIAL        bxxxxxx1x     Partial Non-ISOCH
FULL_ISOCH     bxxxxx1xx     ISOCH Full Line
PARTIAL_ISOCH  bxxxx1xxx     ISOCH Partial
ALL            b00001111     All Writes

IODC_CONFLICTS
• Title: IODC Conflicts
• Category: IODC Events
• Event Code: 0x57
• Max. Inc/Cyc: 1, Register Restrictions: 0-3
• Definition:

Table 2-53.
Table 2-54. Unit Masks for OSB

Extension      umask [15:8]  Description
READS_LOCAL    bxxxxxx1x     Local Reads
INVITOE_LOCAL  bxxxxx1xx     Local InvItoE
REMOTE         bxxxx1xxx     Remote

OSB_EDR
• Title: OSB Early Data Return
• Category: OSB (Opportunistic Snoop Broadcast) Events
• Event Code: 0x54
• Max. Inc/Cyc:
Table 2-56. Unit Masks for REQUESTS

Extension       umask [15:8]  Description
WRITES          b00001100     Writes: Incoming write requests.
INVITOE_LOCAL   bxxx1xxxx     Local InvItoEs: This filter includes only InvItoEs coming from the local socket.
INVITOE_REMOTE  bxx1xxxxx     Remote InvItoEs: This filter includes only InvItoEs coming from remote sockets.

RING_AD_USED
• Title: HA AD Ring in Use
• Category: RING Events
• Event Code: 0x3e
• Max. Inc/Cyc:
• Definition: Counts the number of cycles that the AK ring is being used at this ring stop. This includes when packets are passing by and when packets are being sunk, but does not include when packets are being sent from the ring stop.
• NOTE: On a 2 column implementation (e.g. 10C) CW_EVEN is actually CW_VR0_EVEN+CW_VR1_EVEN (similarly for CCW/ODD).
Table 2-59. Unit Masks for RING_BL_USED

Extension    umask [15:8]  Description
CCW_VR0_ODD  bxxxx1xxx     Counterclockwise and Odd on VRing 0: Filters for the Counterclockwise and Odd ring polarity on Virtual Ring 0.
CW_VR1_EVEN  bxxx1xxxx     Clockwise and Even on VRing 1: Filters for the Clockwise and Even ring polarity on Virtual Ring 1.
Table 2-61. Unit Masks for SNOOP_RESP

Extension  umask [15:8]  Description
RSPI       bxxxxxxx1     RspI: Filters for snoop responses of RspI. RspI is returned when the remote cache does not have the data, or when the remote cache silently evicts data (such as when an RFO hits non-modified data).
RSPS       bxxxxxx1x     RspS: Filters for snoop responses of RspS. RspS is returned when a remote cache has data but is not forwarding it.
Table 2-62. Unit Masks for SNP_RESP_RECV_LOCAL

Extension  umask [15:8]  Description
RSPI       bxxxxxxx1     RspI: Filters for snoop responses of RspI. RspI is returned when the remote cache does not have the data, or when the remote cache silently evicts data (such as when an RFO hits non-modified data).
RSPS       bxxxxxx1x     RspS: Filters for snoop responses of RspS. RspS is returned when a remote cache has data but is not forwarding it.
Table 2-63.
Table 2-65. Unit Masks for TxR_AD_CYCLES_FULL

Extension  umask [15:8]  Description
SCHED0     bxxxxxxx1     Scheduler 0: Filter for cycles full from scheduler bank 0
SCHED1     bxxxxxx1x     Scheduler 1: Filter for cycles full from scheduler bank 1
ALL        bxxxxxx11     All: Cycles full from both schedulers

TxR_AK
• Title: Outbound Ring Transactions on AK
• Category: OUTBOUND_TX Events
• Event Code: 0x0e
• Max. Inc/Cyc:
Table 2-67. Unit Masks for TxR_BL
Extension | umask [15:8] | Description
DRS_CORE | bxxxxxx1x | Data to Core. Filter for data being sent directly to the requesting core.
DRS_QPI | bxxxxx1xx | Data to QPI. Filter for data being sent to a remote socket over QPI.

TxR_BL_CYCLES_FULL
• Title: BL Egress Full
• Category: EGRESS Events
• Event Code: 0x36
• Max. Inc/Cyc: 1, Register Restrictions: 0-3
• Definition: BL Egress Full

Table 2-68.
Uncore Performance Monitoring Memory Controller (iMC) Performance Monitoring
regular and special buffers at the same time. One can filter based on the memory controller channel. One or more channels can be tracked at a given time.

Table 2-70. Unit Masks for WPQ_CYCLES_NO_REG_CREDITS
Extension | umask [15:8] | Description
CHN0 | b00000001 | Channel 0. Filter for memory controller channel 0 only.
CHN1 | b00000010 | Channel 1. Filter for memory controller channel 1 only.
• Eight independent banks per rank
• Support for DDR3 frequencies of 800, 1067, 1333, and 1600 MT/s, dependent on the number of DIMMs per channel.
2.5.4 iMC Performance Monitors
Table 2-71.
Table 2-72. MC_CHy_PCI_PMON_BOX_CTL Register – Field Definitions
Field | Bits | Attr | HW Reset Val | Description
ig | 31:18 | RV | 0 | Ignored
rsv | 17:16 | RV | 0 | Reserved; SW must write to 1 else behavior is undefined.
ig | 15:9 | RV | 0 | Ignored
frz | 8 | WO | 0 | Freeze. If set to 1 the counters in this box will be frozen.
ig | 7:2 | RV | 0 | Ignored
rst_ctrs | 1 | WO | 0 | Reset Counters. When set to 1, the Counter Registers will be reset to 0.
Table 2-74. MC_CHy_PCI_PMON_CTL{3-0} Register – Field Definitions
Field | Bits | Attr | HW Reset Val | Description
thresh | 31:24 | RW-V | 0 | Threshold used in counter comparison.
rsv | 23 | RV | 0 | Reserved. SW must write to 0 else behavior is undefined.
en | 22 | RW-V | 0 | Local Counter Enable.
rsv | 21 | RV | 0 | Reserved. SW must write to 0 else behavior is undefined.
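As an illustration of how the fields above combine into a single 32-bit control value, the following is a minimal sketch rather than a definitive programming sequence. The `thresh` and `en` positions come from Table 2-74, and the unit mask position comes from the "umask [15:8]" column of the unit mask tables. Placing the event select in bits 7:0 is an assumption here, because those rows of the table are not shown above. The function name `pmon_ctl` is hypothetical.

```python
def pmon_ctl(ev_sel: int, umask: int = 0, thresh: int = 0, enable: bool = True) -> int:
    """Compose a 32-bit PMON control value from its fields.

    Assumed layout (hypothetical helper, not an Intel API):
      bits 7:0   event select (assumption for this register)
      bits 15:8  unit mask
      bit  22    local counter enable (.en)
      bits 31:24 threshold (.thresh)
    Reserved bits 23 and 21 are left at 0.
    """
    for name, field in (("ev_sel", ev_sel), ("umask", umask), ("thresh", thresh)):
        if not 0 <= field <= 0xFF:
            raise ValueError(f"{name} must fit in 8 bits")
    value = ev_sel | (umask << 8) | (thresh << 24)
    if enable:
        value |= 1 << 22
    return value


# CAS_COUNT (event code 0x04) with the RD_REG unit mask (bit 0), counter enabled.
cas_rd_ctl = pmon_ctl(0x04, umask=0x01)
```

Several unit mask bits can be OR-ed together before the call, for example `0x01 | 0x02` to select BANK0 and BANK1 in one of the per-rank CAS events.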
Table 2-75. MC_CHy_PCI_PMON_FIXED_CTL Register – Field Definitions
Field | Bits | Attr | HW Reset Val | Description
ig | 31:24 | RV | 0 | Ignored
rsv | 23 | RV | 0 | Reserved. SW must write to 0 else behavior is undefined.
en | 22 | RW-V | 0 | Local Counter Enable.
rsv | 21 | RV | 0 | Reserved. SW must write to 0 else behavior is undefined.
Symbol Name | Event Code | Ctrs | Max Inc/Cyc | Description
DCLOCKTICKS | 0x00 | 0-3 | 1 | DRAM Clockticks
ACT_COUNT | 0x01 | 0-3 | 1 | DRAM Activate Count
PRE_COUNT | 0x02 | 0-3 | 1 | DRAM Precharge commands.
CAS_COUNT | 0x04 | 0-3 | 1 | DRAM RD_CAS and WR_CAS Commands.
Symbol Name | Event Code | Ctrs | Max Inc/Cyc | Description
WR_CAS_RANK6 | 0xbe | 0-3 | 1 | WR_CAS Access to Rank 6
WR_CAS_RANK7 | 0xbf | 0-3 | 1 | WR_CAS Access to Rank 7
WMM_TO_RMM | 0xc0 | 0-3 | 1 | Transition from WMM to RMM because of low threshold
WRONG_MM | 0xc1 | 0-3 | 1 | Not getting the requested Major Mode

2.5.7 iMC Box Common Metrics (Derived Events)
The following table summarizes metrics commonly calculated from iMC Box events.
2.5.8 iMC Box Performance Monitor Event List
This section enumerates performance monitoring events for the iMC Box.

ACT_COUNT
• Title: DRAM Activate Count
• Category: ACT Events
• Event Code: 0x01
• Max. Inc/Cyc: 1, Register Restrictions: 0-3
• Definition: Counts the number of DRAM Activate commands sent on this channel.
Table 2-79. Unit Masks for CAS_COUNT
Extension | umask [15:8] | Description
RD_REG | bxxxxxxx1 | All DRAM RD_CAS (w/ and w/out auto-pre). Counts the total number of DRAM Read CAS commands issued on this channel. This includes both regular RD CAS commands as well as those with implicit Precharge. AutoPre is only used in systems that are using closed page policy.
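CAS counts are commonly turned into a memory bandwidth estimate. The sketch below assumes that each CAS command moves one 64-byte cache line (burst length 8 on a 64-bit channel); that figure is an assumption of this example and is not stated in the table above. The function name and the sample numbers are hypothetical.

```python
CACHE_LINE_BYTES = 64  # assumption: one CAS command transfers one 64-byte line


def dram_bandwidth_bytes_per_sec(cas_count: int, interval_s: float) -> float:
    """Estimate channel bandwidth from a CAS_COUNT delta over a sample interval."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return cas_count * CACHE_LINE_BYTES / interval_s


# Hypothetical sample: 1,000,000 read CAS commands observed in 0.5 seconds.
read_bw = dram_bandwidth_bytes_per_sec(1_000_000, 0.5)
```

The count passed in should be the difference between two counter reads taken at the start and end of the interval, per channel.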
Table 2-80. Unit Masks for DRAM_REFRESH
Extension | umask [15:8] | Description
PANIC | bxxxxxx1x |
HIGH | bxxxxx1xx |

ECC_CORRECTABLE_ERRORS
• Title: ECC Correctable Errors
• Category: ECC Events
• Event Code: 0x09
• Max. Inc/Cyc: 1, Register Restrictions: 0-3
• Definition: Counts the number of ECC errors detected and corrected by the iMC on this channel. This counter is only useful with ECC DRAM devices.
POWER_CHANNEL_PPD
• Title: Channel PPD Cycles
• Category: POWER Events
• Event Code: 0x85
• Max. Inc/Cyc: 4, Register Restrictions: 0-3
• Definition: Number of cycles when all the ranks in the channel are in PPD mode. If IBT=off is enabled, then this can be used to count those cycles. If it is not enabled, then this can count the number of cycles when that could have been taken advantage of.
POWER_SELF_REFRESH
• Title: Clock-Enabled Self-Refresh
• Category: POWER Events
• Event Code: 0x43
• Max. Inc/Cyc: 0, Register Restrictions: 0-3
• Definition: Counts the number of cycles when the iMC is in self-refresh and the iMC still has a clock. This happens in some package C-states. For example, the PCU may ask the iMC to enter self-refresh even though some of the cores are still processing.
Table 2-84. Unit Masks for PREEMPTION
Extension | umask [15:8] | Description
RD_PREEMPT_RD | bxxxxxxx1 | Read over Read Preemption. Filter for when a read preempts another read.
RD_PREEMPT_WR | bxxxxxx1x | Read over Write Preemption. Filter for when a read preempts a write.

PRE_COUNT
• Title: DRAM Precharge commands.
• Category: PRE Events
• Event Code: 0x02
• Max. Inc/Cyc:
Uncore Performance Monitoring Memory Controller (iMC) Performance Monitoring RD_CAS_RANK0 • Title: RD_CAS Access to Rank 0 • Category: CAS Events • Event Code: 0xb0 • Max. Inc/Cyc:. 1, Register Restrictions: 0-3 • Definition: Table 2-87.
Table 2-89. Unit Masks for RD_CAS_RANK2
Extension | umask [15:8] | Description
BANK0 | bxxxxxxx1 | Bank 0
BANK1 | bxxxxxx1x | Bank 1
BANK2 | bxxxxx1xx | Bank 2
BANK3 | bxxxx1xxx | Bank 3
BANK4 | bxxx1xxxx | Bank 4
BANK5 | bxx1xxxxx | Bank 5
BANK6 | bx1xxxxxx | Bank 6
BANK7 | b1xxxxxxx | Bank 7

RD_CAS_RANK3
• Title: RD_CAS Access to Rank 3
• Category: CAS Events
• Event Code: 0xb3
• Max. Inc/Cyc:
Uncore Performance Monitoring Memory Controller (iMC) Performance Monitoring Table 2-91. Unit Masks for RD_CAS_RANK4 Extension umask [15:8] Description BANK4 bxxx1xxxx Bank 4 BANK5 bxx1xxxxx Bank 5 BANK6 bx1xxxxxx Bank 6 BANK7 b1xxxxxxx Bank 7 RD_CAS_RANK5 • Title: RD_CAS Access to Rank 5 • Category: CAS Events • Event Code: 0xb5 • Max. Inc/Cyc:. 1, Register Restrictions: 0-3 • Definition: Table 2-92.
Uncore Performance Monitoring Memory Controller (iMC) Performance Monitoring RD_CAS_RANK7 • Title: RD_CAS Access to Rank 7 • Category: CAS Events • Event Code: 0xb7 • Max. Inc/Cyc:. 1, Register Restrictions: 0-3 • Definition: Table 2-94.
Uncore Performance Monitoring Memory Controller (iMC) Performance Monitoring VMSE_WR_PUSH • Title: VMSE WR PUSH issued • Category: VMSE Events • Event Code: 0x90 • Max. Inc/Cyc:. 1, Register Restrictions: 0-3 • Definition: Table 2-95.
WPQ_INSERTS
• Title: Write Pending Queue Allocations
• Category: WPQ Events
• Event Code: 0x20
• Max. Inc/Cyc: 1, Register Restrictions: 0-3
• Definition: Counts the number of allocations into the Write Pending Queue. This can then be used to calculate the average queuing latency (in conjunction with the WPQ occupancy count). The WPQ is used to schedule writes out to the memory controller and to track the writes.
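The average queuing latency mentioned above follows from dividing the accumulated occupancy by the number of allocations over the same interval (Little's law). The following sketch shows that arithmetic under the assumption that both counts were sampled over the same interval; the function name and the sample values are hypothetical.

```python
def avg_queue_latency_cycles(occupancy_sum: int, inserts: int) -> float:
    """Average cycles an entry spends in a queue.

    occupancy_sum: accumulated per-cycle occupancy (the WPQ occupancy count)
    inserts:       number of allocations (WPQ_INSERTS) over the same interval
    Returns 0.0 when nothing was inserted, to avoid dividing by zero.
    """
    if inserts == 0:
        return 0.0
    return occupancy_sum / inserts


# Hypothetical sample: 5000 entry-cycles accumulated across 250 allocations.
latency = avg_queue_latency_cycles(5000, 250)
```

The result is expressed in cycles of the clock that drives the occupancy counter; converting to time requires that clock's frequency.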
Table 2-97. Unit Masks for WR_CAS_RANK0
Extension | umask [15:8] | Description
BANK0 | bxxxxxxx1 | Bank 0
BANK1 | bxxxxxx1x | Bank 1
BANK2 | bxxxxx1xx | Bank 2
BANK3 | bxxxx1xxx | Bank 3
BANK4 | bxxx1xxxx | Bank 4
BANK5 | bxx1xxxxx | Bank 5
BANK6 | bx1xxxxxx | Bank 6
BANK7 | b1xxxxxxx | Bank 7

WR_CAS_RANK1
• Title: WR_CAS Access to Rank 1
• Category: CAS Events
• Event Code: 0xb9
• Max. Inc/Cyc:
Uncore Performance Monitoring Memory Controller (iMC) Performance Monitoring Table 2-99. Unit Masks for WR_CAS_RANK2 Extension umask [15:8] Description BANK4 bxxx1xxxx Bank 4 BANK5 bxx1xxxxx Bank 5 BANK6 bx1xxxxxx Bank 6 BANK7 b1xxxxxxx Bank 7 WR_CAS_RANK3 • Title: WR_CAS Access to Rank 3 • Category: CAS Events • Event Code: 0xbb • Max. Inc/Cyc:. 1, Register Restrictions: 0-3 • Definition: Table 2-100.
Uncore Performance Monitoring Memory Controller (iMC) Performance Monitoring WR_CAS_RANK5 • Title: WR_CAS Access to Rank 5 • Category: CAS Events • Event Code: 0xbd • Max. Inc/Cyc:. 1, Register Restrictions: 0-3 • Definition: Table 2-102.
Table 2-104. Unit Masks for WR_CAS_RANK7
Extension | umask [15:8] | Description
BANK0 | bxxxxxxx1 | Bank 0
BANK1 | bxxxxxx1x | Bank 1
BANK2 | bxxxxx1xx | Bank 2
BANK3 | bxxxx1xxx | Bank 3
BANK4 | bxxx1xxxx | Bank 4
BANK5 | bxx1xxxxx | Bank 5
BANK6 | bx1xxxxxx | Bank 6
BANK7 | b1xxxxxxx | Bank 7

2.6 IRP PERFORMANCE MONITORING
2.6.1 Overview of the IRP Box
IRP is responsible for maintaining coherency for IIO traffic that needs to be coherent (e.g.
Register Name | PCICFG Address | Size (bits) | Description
Generic Counters
IRP1_PCI_PMON_CTR1 | C0 | 64 | IRP 1 PMON Counter 1
IRP1_PCI_PMON_CTR0 | B8 | 64 | IRP 1 PMON Counter 0
IRP0_PCI_PMON_CTR1 | B0 | 64 | IRP 0 PMON Counter 1
IRP0_PCI_PMON_CTR0 | A0 | 64 | IRP 0 PMON Counter 0

2.6.3.1 IRP Box Level PMON State
The following registers represent the state governing all box-level PMUs in the IRP Box.
Table 2-108. IRP_PCI_PMON_CTL{3-0} Register – Field Definitions
Field | Bits | Attr | HW Reset Val | Description
thresh | 31:24 | RW-V | 0 | Threshold used in counter comparison.
rsv | 23 | RV | 0 | Reserved. SW must write to 0 else behavior is undefined.
en | 22 | RW-V | 0 | Local Counter Enable.
rsv | 21:20 | RV | 0 | Reserved. SW must write to 0 else behavior is undefined.
Symbol Name | Event Code | Ctrs | Max Inc/Cyc | Description
CLOCKTICKS | 0x00 | 0-1 | 1 | Clocks in the IRP
RxR_BL_DRS_INSERTS | 0x01 | 0-1 | 1 | BL Ingress Occupancy - DRS
RxR_BL_NCB_INSERTS | 0x02 | 0-1 | 1 | BL Ingress Occupancy - NCB
RxR_BL_NCS_INSERTS | 0x03 | 0-1 | 1 | BL Ingress Occupancy - NCS

2.6.
Table 2-110. Unit Masks for ADDRESS_MATCH
Extension | umask [15:8] | Description
STALL_COUNT | bxxxxxxx1 | Conflict Stalls. When it is not possible to merge two conflicting requests, a stall event occurs. This is bad for performance.
MERGE_COUNT | bxxxxxx1x | Conflict Merges. When two requests to the same address from the same source are received back to back, it is possible to merge the two of them together.
Uncore Performance Monitoring IRP Performance Monitoring CACHE_READ_OCCUPANCY • Title: Outstanding Read Occupancy • Category: WRITE_CACHE Events • Event Code: 0x10 • Max. Inc/Cyc:. 128, Register Restrictions: 0-1 • Definition: Accumulates the number of reads that are outstanding in the uncore in each cycle. This can be used with the read transaction count to calculate the average read latency in the uncore. The occupancy increments when a read request is issued, and decrements when the data is returned.
Table 2-115. Unit Masks for CACHE_WRITE_OCCUPANCY
Extension | umask [15:8] | Description
ANY | b00000001 | Any Source. Tracks all requests from any source port.
SOURCE | b00000010 | Select Source. Tracks only those requests that come from the port specified in the IRP_PmonFilter.OrderingQ register. This register allows one to select one specific queue. It is not possible to monitor multiple queues at a time.
Uncore Performance Monitoring IRP Performance Monitoring RxR_BL_DRS_INSERTS • Title: BL Ingress Occupancy - DRS • Category: BL_INGRESS_DRS Events • Event Code: 0x01 • Max. Inc/Cyc:. 1, Register Restrictions: 0-1 • Definition: Counts the number of allocations into the BL Ingress. This queue is where the IRP receives data from R2PCIe (the ring). It is used for data returns from read requests as well as outbound MMIO writes.
Uncore Performance Monitoring IRP Performance Monitoring • Definition: Counts the number of cycles when the BL Ingress is full. This queue is where the IRP receives data from R2PCIe (the ring). It is used for data returns from read requests as well as outbound MMIO writes. RxR_BL_NCS_INSERTS • Title: BL Ingress Occupancy - NCS • Category: BL_INGRESS_NCS Events • Event Code: 0x03 • Max. Inc/Cyc:. 1, Register Restrictions: 0-1 • Definition: Counts the number of allocations into the BL Ingress.
Uncore Performance Monitoring IRP Performance Monitoring Table 2-117. Unit Masks for TRANSACTIONS umask [15:8] Extension Filter Dep Description READS bxxxxxxx1 Reads Tracks only read requests (not including read prefetches). WRITES bxxxxxx1x Writes Tracks only write requests. Each write request should have a prefetch, so there is no need to explicitly track these requests. For writes that are tickled and have to retry, the counter will be incremented for each retry.
Uncore Performance Monitoring Power Control (PCU) Performance Monitoring TxR_REQUEST_OCCUPANCY • Title: Outbound Request Queue Occupancy • Category: OUTBOUND_REQUESTS Events • Event Code: 0x0d • Max. Inc/Cyc:. 1, Register Restrictions: 0-1 • Definition: Accumulates the number of outstanding outbound requests from the IRP to the switch (towards the devices). This can be used in conjunction with the allocations event in order to calculate average latency of outbound requests.
(PCU_MSR_PMON_CTL{3:0}) to monitor any PCU event. The PCU counters can increment by a maximum of 4b per cycle. Two extra 64-bit counters are also provided by the PCU to track C-State Residency. Although documented in this manual for reference, these counters exist outside of the PMON infrastructure. For information on how to set up a monitoring session, refer to Section 2.1, “Uncore Per-Socket Performance Monitoring Control”.

2.7.2.
2.7.3.1 PCU Box Level PMON State
The following registers represent the state governing all box-level PMUs in the PCU. In the case of the PCU, the PCU_MSR_PMON_BOX_CTL register provides the ability to manually freeze the counters in the box (.frz) and reset the generic state (.rst_ctrs and .rst_ctrl).
- .occ_invert - Changes the .thresh test condition to ‘<’ for the occupancy events (when .ev_sel[7] is set to 1)
- .occ_edge_det - Rather than accumulating the raw count each cycle (for events that can increment by 1 per cycle), the register can capture transitions from no event to an event incoming for the PCU’s occupancy events (when .ev_sel[7] is set to 1).

Table 2-121.
Field | Bits | Attr | HW Reset Val | Description
occ_sel | 15:14 | RW-V | 0 | Select which of three occupancy counters to use. 01 - Cores in C0; 10 - Cores in C3; 11 - Cores in C6
rsv | 13:8 | RV | 0 | Reserved
ev_sel | 7:0 | RW-V | 0 | Select event to be counted. NOTE: Bit 7 denotes whether the event requires the use of an occupancy subcounter.

The PCU performance monitor data registers are 48 bits wide.
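Because the data registers are 48 bits wide, software that samples them periodically has to account for wraparound when computing the difference between two reads. The following is a minimal sketch of that calculation; it assumes the counter wraps at most once between samples, and the function name is hypothetical.

```python
COUNTER_BITS = 48
COUNTER_MASK = (1 << COUNTER_BITS) - 1


def counter_delta(prev: int, curr: int) -> int:
    """Events counted between two reads of a 48-bit counter.

    Masking the subtraction handles a single wrap past 2**48 - 1.
    More than one wrap between reads cannot be detected this way.
    """
    return (curr - prev) & COUNTER_MASK


# The counter was 16 counts below the wrap point, then read back as 0x10.
delta = counter_delta(0xFFFFFFFFFFF0, 0x10)
```

Sampling often enough that a counter cannot wrap twice between reads keeps this calculation valid.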
Uncore Performance Monitoring Power Control (PCU) Performance Monitoring Table 2-123.
NOTE: Given the nature of many of the PCU events, a great deal of additional information can be measured by setting the .edge_det bit. By doing so, an event such as “Cycles Changing Frequency” becomes “Number of Frequency Transitions.
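The difference between counting cycles and counting transitions can be modeled in software. The sketch below is only a behavioral illustration of edge detection over a per-cycle event signal, not a description of the hardware implementation; the function names and the sample trace are hypothetical.

```python
from typing import Iterable


def count_cycles(signal: Iterable[int]) -> int:
    """Count cycles in which the event is asserted (no edge detect)."""
    return sum(1 for s in signal if s)


def count_rising_edges(signal: Iterable[int]) -> int:
    """Count transitions from 'no event' to 'event' (edge detect)."""
    edges = 0
    prev = 0
    for s in signal:
        cur = 1 if s else 0
        if cur and not prev:
            edges += 1
        prev = cur
    return edges


# Hypothetical per-cycle trace of "changing frequency": two separate transitions.
trace = [0, 1, 1, 0, 0, 1, 1, 1, 0]
cycles = count_cycles(trace)
transitions = count_rising_edges(trace)
```

Dividing the cycle count by the transition count then gives the average duration of one occurrence, in cycles.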
Symbol Name | Event Code | Ctrs | Extra Select Bit | Max Inc/Cyc | Description
VOLT_TRANS_CYCLES_INCREASE | 0x01 | 0-3 | 0 | 1 | Cycles Increasing Voltage
VOLT_TRANS_CYCLES_DECREASE | 0x02 | 0-3 | 0 | 1 | Cycles Decreasing Voltage
VOLT_TRANS_CYCLES_CHANGE | 0x03 | 0-3 | 0 | 1 | Cycles Changing Voltage
FREQ_MAX_LIMIT_THERMAL_CYCLES | 0x04 | 0-3 | 0 | 1 | Thermal Strongest Upper Limit Cycles
FREQ_MAX_POWER_CYCLES | 0x05 | 0-3 | 0 | 1 | Power Strongest Upper Limit Cycles
FREQ_MAX
Symbol Name | Event Code | Ctrs | Extra Select Bit | Max Inc/Cyc | Description
CORE4_TRANSITION_CYCLES | 0x74 | 0-3 | 0 | 1 | Core 4 C State Transition Cycles
CORE5_TRANSITION_CYCLES | 0x75 | 0-3 | 0 | 1 | Core 5 C State Transition Cycles
CORE6_TRANSITION_CYCLES | 0x76 | 0-3 | 0 | 1 | Core 6 C State Transition Cycles
CORE7_TRANSITION_CYCLES | 0x77 | 0-3 | 0 | 1 | Core 7 C State Transition Cycles
CORE8_TRANSITION_CYCLES | 0x78 | 0-3 | 0 | 1 | Core 8 C State Transition Cyc
Uncore Performance Monitoring Power Control (PCU) Performance Monitoring Symbol Name: Definition 2.7.
Uncore Performance Monitoring Power Control (PCU) Performance Monitoring • Definition: Number of cycles spent performing core C state transitions. There is one event per core. CORE12_TRANSITION_CYCLES • Title: Core 12 C State Transition Cycles • Category: CORE_C_STATE_TRANSITION Events • Event Code: 0x7c • Max. Inc/Cyc:. 1, Register Restrictions: 0-3 • Definition: Number of cycles spent performing core C state transitions. There is one event per core.
Uncore Performance Monitoring Power Control (PCU) Performance Monitoring CORE4_TRANSITION_CYCLES • Title: Core 4 C State Transition Cycles • Category: CORE_C_STATE_TRANSITION Events • Event Code: 0x74 • Max. Inc/Cyc:. 1, Register Restrictions: 0-3 • Definition: Number of cycles spent performing core C state transitions. There is one event per core. CORE5_TRANSITION_CYCLES • Title: Core 5 C State Transition Cycles • Category: CORE_C_STATE_TRANSITION Events • Event Code: 0x75 • Max. Inc/Cyc:.
Uncore Performance Monitoring Power Control (PCU) Performance Monitoring DELAYED_C_STATE_ABORT_CORE0 • Title: Deep C State Rejection - Core 0 • Category: Delayed C-State Events • Event Code: 0x17 • Extra Select Bit: Y • Max. Inc/Cyc:. 1, Register Restrictions: 0-3 • Definition: Number of times that a deep C state was requested, but the delayed C state algorithm “rejected” the deep sleep state. In other words, a wake event occurred before the timer expired that causes a transition into the deeper C state.
Uncore Performance Monitoring Power Control (PCU) Performance Monitoring DELAYED_C_STATE_ABORT_CORE13 • Title: Deep C State Rejection - Core 13 • Category: Delayed C-State Events • Event Code: 0x24 • Extra Select Bit: Y • Max. Inc/Cyc:. 1, Register Restrictions: 0-3 • Definition: Number of times that a deep C state was requested, but the delayed C state algorithm “rejected” the deep sleep state. In other words, a wake event occurred before the timer expired that causes a transition into the deeper C state.
Uncore Performance Monitoring Power Control (PCU) Performance Monitoring DELAYED_C_STATE_ABORT_CORE5 • Title: Deep C State Rejection - Core 5 • Category: Delayed C-State Events • Event Code: 0x1c • Extra Select Bit: Y • Max. Inc/Cyc:. 1, Register Restrictions: 0-3 • Definition: Number of times that a deep C state was requested, but the delayed C state algorithm “rejected” the deep sleep state. In other words, a wake event occurred before the timer expired that causes a transition into the deeper C state.
DEMOTIONS_CORE0
• Title: Core 0 C State Demotions
• Category: CORE_C_STATE_TRANSITION Events
• Event Code: 0x1e
• Max. Inc/Cyc: 1, Register Restrictions: 0-3
• Filter Dependency: PCUFilter[7:0]
• Definition: Counts the number of times that a configurable core had a C-state demotion.

DEMOTIONS_CORE1
• Title: Core 1 C State Demotions
• Category: CORE_C_STATE_TRANSITION Events
• Event Code: 0x1f
• Max. Inc/Cyc:
DEMOTIONS_CORE14
• Title: Core 14 C State Demotions
• Category: CORE_C_STATE_TRANSITION Events
• Event Code: 0x46
• Max. Inc/Cyc: 1, Register Restrictions: 0-3
• Filter Dependency: PCUFilter[7:0]
• Definition: Counts the number of times that a configurable core had a C-state demotion.

DEMOTIONS_CORE2
• Title: Core 2 C State Demotions
• Category: CORE_C_STATE_TRANSITION Events
• Event Code: 0x20
• Max. Inc/Cyc:
DEMOTIONS_CORE7
• Title: Core 7 C State Demotions
• Category: CORE_C_STATE_TRANSITION Events
• Event Code: 0x25
• Max. Inc/Cyc: 1, Register Restrictions: 0-3
• Filter Dependency: PCUFilter[7:0]
• Definition: Counts the number of times that a configurable core had a C-state demotion.

DEMOTIONS_CORE8
• Title: Core 8 C State Demotions
• Category: CORE_C_STATE_TRANSITION Events
• Event Code: 0x40
• Max. Inc/Cyc:
Uncore Performance Monitoring Power Control (PCU) Performance Monitoring with this event to track the number of times that we transitioned into a frequency greater than or equal to the configurable frequency. One can also use inversion to track cycles when we were less than the configured frequency. • NOTE: The PMON control registers in the PCU only update on a frequency transition. Changing the measuring threshold during a sample interval may introduce errors in the counts.
Uncore Performance Monitoring Power Control (PCU) Performance Monitoring FREQ_MAX_LIMIT_THERMAL_CYCLES • Title: Thermal Strongest Upper Limit Cycles • Category: FREQ_MAX_LIMIT Events • Event Code: 0x04 • Max. Inc/Cyc:. 1, Register Restrictions: 0-3 • Definition: Counts the number of cycles when thermal conditions are the upper limit on frequency. This is related to the THERMAL_THROTTLE CYCLES_ABOVE_TEMP event, which always counts cycles when we are above the thermal temperature.
Uncore Performance Monitoring Power Control (PCU) Performance Monitoring FREQ_TRANS_CYCLES • Title: Cycles spent changing Frequency • Category: FREQ_TRANS Events • Event Code: 0x60 • Max. Inc/Cyc:. 1, Register Restrictions: 0-3 • Definition: Counts the number of cycles when the system is changing frequency. This can not be filtered by thread ID.
Uncore Performance Monitoring Power Control (PCU) Performance Monitoring • Definition: Counts the number of cycles that we are in external PROCHOT mode. This mode is triggered when a sensor off the die determines that something off-die (like DRAM) is too hot and must throttle to avoid damaging the chip. PROCHOT_INTERNAL_CYCLES • Title: Internal Prochot • Category: PROCHOT Events • Event Code: 0x09 • Max. Inc/Cyc:.
VR_HOT_CYCLES
• Title: VR Hot
• Category: VR_HOT Events
• Event Code: 0x32
• Max. Inc/Cyc: 1, Register Restrictions: 0-3
• Definition:

2.8 INTEL® QPI LINK LAYER PERFORMANCE MONITORING
2.8.1 Overview of the Intel® QPI Box
The Intel® QPI Link Layer is responsible for packetizing requests from the caching agent on the way out to the system interface.
U_MSR_PMON_GLOBAL_STATUS.ov_q. Assuming all the counters have been locally enabled (.en bit in control registers meant to monitor events) and the overflow bit(s) has been cleared, the QPI Port is prepared for a new sample interval. Once the global controls have been re-enabled (Section 2.1.4, “Enabling a New Sample Interval from Frozen Counters”), counting will resume.

2.8.3 Intel® QPI Performance Monitors
Table 2-128.
2.8.3.1 Intel® QPI Box Level PMON State
The following registers represent the state governing all box-level PMUs in each Port of the Intel® QPI Box. In the case of the Intel® QPI Ports, the Q_Py_PCI_PMON_BOX_CTL register provides the ability to manually freeze the counters in the box (.frz) and reset the generic state (.rst_ctrs and .rst_ctrl).
Table 2-131. Q_Py_PCI_PMON_CTL{3-0} Register – Field Definitions
Field | Bits | Attr | HW Reset Val | Description
thresh | 31:24 | RW-V | 0 | Threshold used in counter comparison.
rsv | 23 | RV | 0 | Reserved. SW must write to 0 else behavior is undefined.
en | 22 | RW-V | 0 | Local Counter Enable.
ev_sel_ext | 21 | RW-V | 0 | Extension bit to the Event Select field.
a) Program the match/mask regs (see Table 2-133, “Q_Py_PCI_PMON_PKT_MATCH1 Registers” through Table 2-136, “Q_Py_PCI_PMON_PKT_MASK0 Registers”).
b) Set the counter’s control register event select to 0x38 (CTO_COUNT) to capture the mask/match as a performance event.

The following table contains the packet traffic that can be monitored if one of the mask/match registers was chosen to select the event.

Table 2-133.
Table 2-134. Q_Py_PCI_PMON_PKT_MATCH0 Registers (Sheet 2 of 2)
Field | Bits | HW Reset Val | Description
OPC | 8:5 | 0x0 | Opcode. DRS, NCB: [8] Packet Size, 0 == 9 flits, 1 == 11 flits. NCS: [8] Packet Size, 0 == 1 or 2 flits, 1 == 3 flits. See Section 2.11, “Packet Matching Reference” for a listing of opcodes that may be filtered per message class.
Table 2-136. Q_Py_PCI_PMON_PKT_MASK0 Registers (Sheet 2 of 2)
Field | Bits | HW Reset Val | Description
OPC | 8:5 | 0x0 | Opcode. See Section 2.11, “Packet Matching Reference” for a listing of opcodes that may be filtered per message class.
VNW | 4:3 | 0x0 | Virtual Network
--- | 2:0 | 0x0 | Reserved; Must write to 0 else behavior is undefined.
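The match/mask pairing can be read as a conventional masked comparison: only the bits set in the mask take part, and a packet qualifies when those bits equal the corresponding match bits. The sketch below illustrates that interpretation using the Match [12:0] and Mask [12:0] values listed for DRS.DataC_F_FrcAckCnflt in Table 2-137. The comparison rule itself is an assumption of this example, the function name is hypothetical, and the additional Match1/Mask1 [19:16] condition for that message is not modeled.

```python
def packet_matches(header_bits: int, match: int, mask: int) -> bool:
    """Masked comparison: bits cleared in `mask` are treated as don't-care."""
    return (header_bits & mask) == (match & mask)


# Match0/Mask0 [12:0] values for DRS.DataC_F_FrcAckCnflt from Table 2-137.
MATCH0 = 0x1C20
MASK0 = 0x1FE0

# Hypothetical header values: the first differs only in unmasked low bits,
# the second differs in a masked bit of the opcode field.
hit = packet_matches(0x1C3F, MATCH0, MASK0)
miss = packet_matches(0x1C40, MATCH0, MASK0)
```

With this mask, bits 12:5 are compared and bits 4:0 are ignored, so any value in the low five bits still qualifies.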
Table 2-137. Message Events Derived from the Match/Mask filters
Message | Match [12:0] | Mask [12:0] | Description
DRS.DataC_F_FrcAckCnflt | 0x1C20 && Match1 [19:16] 0x1 | 0x1FE0 && Mask1 [19:16] 0xF | Force Acknowledge Data Response message of a cache line in F state that is a response to a core request. The DRS.DataC_F messages are only sent to Intel® QPI.
DRS.
Field | Bits | Attr | HW Reset Val | Description
qpi_rate | 2:0 | RO-V | 11b | QPI Rate. This reflects the current QPI rate setting into the PLL:
  010 - 5.6 GT/s
  011 - 6.4 GT/s
  100 - 7.2 GT/s
  101 - 8 GT/s
  110 - 8.8 GT/s
  111 - 9.6 GT/s
  other - Reserved

2.8.4 Intel® QPI LL Performance Monitoring Events
2.8.4.
Symbol Name | Event Code | Ctrs | Extra Select Bit | Max Inc/Cyc | Description
RxL0_POWER_CYCLES | 0x0f | 0-3 | 0 | 1 | Cycles in L0
RxL0P_POWER_CYCLES | 0x10 | 0-3 | 0 | 1 | Cycles in L0p
L1_POWER_CYCLES | 0x12 | 0-3 | 0 | 1 | Cycles in L1
DIRECT2CORE | 0x13 | 0-3 | 0 | 1 | Direct 2 Core Spawning
CLOCKTICKS | 0x14 | 0-3 | 0 | 1 | Number of qfclks
TxL_FLITS_G1 | 0x00 | 0-3 | 1 | 2 | Flits Transferred - Group 1
TxL_FLITS_G2 | 0x01 | 0-3 | 1 | 2 | Flits Transfe
Symbol Name | Event Code | Ctrs | Extra Select Bit | Max Inc/Cyc | Description
TxR_AK_NDR_CREDIT_ACQUIRED | 0x29 | 0-3 | 1 | 1 | R3QPI Egress Credit Occupancy - AK NDR
TxR_BL_DRS_CREDIT_ACQUIRED | 0x2a | 0-3 | 1 | 1 | R3QPI Egress Credit Occupancy - DRS
TxR_BL_NCB_CREDIT_ACQUIRED | 0x2b | 0-3 | 1 | 1 | R3QPI Egress Credit Occupancy - NCB
TxR_BL_NCS_CREDIT_ACQUIRED | 0x2c | 0-3 | 1 | 1 | R3QPI Egress Credit Occupancy - NCS
CTO_COUNT | 0x38 | 0-3 | 1 | 2 | Count of CTO Events
RxL_CR
Symbol Name: Definition | Equation
DRS_F_OR_E_FROM_QPI: DRS response in F or E states received from QPI in bytes. To calculate the total data response for each cache line state, it's necessary to add the contribution from three flavors {DataC, DataC_FrcAckCnflt, DataC_Cmp} of data response packets for each cache line state.
Uncore Performance Monitoring Intel® QPI Link Layer Performance Monitoring Symbol Name: Definition 2.8.7 Equation NCB_DATA_MSGS_FROM_QPI: NCB Data Messages From QPI in bytes (RxL_FLITS_G2.
Uncore Performance Monitoring Intel® QPI Link Layer Performance Monitoring DIRECT2CORE • Title: Direct 2 Core Spawning • Category: DIRECT2CORE Events • Event Code: 0x13 • Max. Inc/Cyc:. 1, Register Restrictions: 0-3 • Definition: Counts the number of DRS packets that we attempted to do direct2core on. There are 4 mutually exclusive filters. Filter [0] can be used to get successful spawns, while [1:3] provide the different failure cases.
Uncore Performance Monitoring Intel® QPI Link Layer Performance Monitoring RxL0P_POWER_CYCLES • Title: Cycles in L0p • Category: POWER_RX Events • Event Code: 0x10 • Max. Inc/Cyc:. 1, Register Restrictions: 0-3 • Definition: Number of QPI qfclk cycles spent in L0p power mode. L0p is a mode where we disable 1/2 of the QPI lanes, decreasing our bandwidth in order to save power. It increases snoop and data transfer latencies and decreases overall bandwidth.
Uncore Performance Monitoring Intel® QPI Link Layer Performance Monitoring Table 2-140. Unit Masks for RxL_CREDITS_CONSUMED_VN0 umask [15:8] Extension Description NCS bxxxxx1xx NCS VN0 credit for the NCS message class. HOM bxxxx1xxx HOM VN0 credit for the HOM message class. SNP bxxx1xxxx SNP VN0 credit for the SNP message class. NDR bxx1xxxxx NDR VN0 credit for the NDR message class.
Uncore Performance Monitoring Intel® QPI Link Layer Performance Monitoring RxL_CYCLES_NE • Title: RxQ Cycles Not Empty • Category: RXQ Events • Event Code: 0x0a • Max. Inc/Cyc:. 1, Register Restrictions: 0-3 • Definition: Counts the number of cycles that the QPI RxQ was not empty. Generally, when data is transmitted across QPI, it will bypass the RxQ and pass directly to the ring interface.
Uncore Performance Monitoring Intel® QPI Link Layer Performance Monitoring RxL_FLITS_G1 • Title: Flits Received - Group 1 • Category: FLITS_RX Events • Event Code: 0x02 • Extra Select Bit: Y • Max. Inc/Cyc:. 2, Register Restrictions: 0-3 • Definition: Counts the number of flits received from the QPI Link. This is one of three “groups” that allow us to track flits. It includes filters for SNP, HOM, and DRS message classes. Each “flit” is made up of 80 bits of information (in addition to some ECC data).
Uncore Performance Monitoring Intel® QPI Link Layer Performance Monitoring RxL_FLITS_G2 • Title: Flits Received - Group 2 • Category: FLITS_RX Events • Event Code: 0x03 • Extra Select Bit: Y • Max. Inc/Cyc:. 2, Register Restrictions: 0-3 • Definition: Counts the number of flits received from the QPI Link. This is one of three “groups” that allow us to track flits. It includes filters for NDR, NCB, and NCS message classes. Each “flit” is made up of 80 bits of information (in addition to some ECC data).
Uncore Performance Monitoring Intel® QPI Link Layer Performance Monitoring RxL_INSERTS_DRS • Title: Rx Flit Buffer Allocations - DRS • Category: RXQ Events • Event Code: 0x09 • Extra Select Bit: Y • Max. Inc/Cyc:. 1, Register Restrictions: 0-3 • Definition: Number of allocations into the QPI Rx Flit Buffer. Generally, when data is transmitted across QPI, it will bypass the RxQ and pass directly to the ring interface.
Uncore Performance Monitoring Intel® QPI Link Layer Performance Monitoring Table 2-147. Unit Masks for RxL_INSERTS_NCB Extension umask [15:8] Description VN0 bxxxxxxx1 for VN0 VN1 bxxxxxx1x for VN1 RxL_INSERTS_NCS • Title: Rx Flit Buffer Allocations - NCS • Category: RXQ Events • Event Code: 0x0b • Extra Select Bit: Y • Max. Inc/Cyc:. 1, Register Restrictions: 0-3 • Definition: Number of allocations into the QPI Rx Flit Buffer.
Uncore Performance Monitoring Intel® QPI Link Layer Performance Monitoring • Definition: Number of allocations into the QPI Rx Flit Buffer. Generally, when data is transmitted across QPI, it will bypass the RxQ and pass directly to the ring interface. If things back up getting transmitted onto the ring, however, it may need to allocate into this buffer, thus increasing the latency.
event to calculate average occupancy, or with the Flit Buffer Allocations event to track average lifetime. This monitors HOM flits only.

Table 2-152. Unit Masks for RxL_OCCUPANCY_HOM

Extension   umask [15:8]   Description
VN0         bxxxxxxx1      for VN0
VN1         bxxxxxx1x      for VN1

RxL_OCCUPANCY_NCB
• Title: RxQ Occupancy - NCB
• Category: RXQ Events
• Event Code: 0x16
• Extra Select Bit: Y
• Max. Inc/Cyc:.
RxL_OCCUPANCY_NDR
• Title: RxQ Occupancy - NDR
• Category: RXQ Events
• Event Code: 0x1a
• Extra Select Bit: Y
• Max. Inc/Cyc: 128, Register Restrictions: 0-3
• Definition: Accumulates the number of elements in the QPI RxQ in each cycle. Generally, when data is transmitted across QPI, it will bypass the RxQ and pass directly to the ring interface.
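Because an occupancy event accumulates the number of queued elements every cycle, dividing the accumulated value by a cycle count gives an average occupancy, and dividing it by the matching allocation count gives an average lifetime in cycles. The following is a minimal sketch of that arithmetic. The function names and sample readings are hypothetical, and it assumes all counts were collected over the same sample interval.

```python
def average_occupancy(occupancy_sum, cycles):
    """Mean number of RxQ entries per cycle over the sample interval."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    return occupancy_sum / cycles


def average_lifetime(occupancy_sum, allocations):
    """Mean number of cycles an entry spent in the RxQ.

    Returns 0.0 when nothing was allocated during the interval.
    """
    if allocations == 0:
        return 0.0
    return occupancy_sum / allocations


# Hypothetical counter readings, for illustration only.
occupancy_sum = 12_000   # accumulated RxL_OCCUPANCY_* value
cycles = 4_000           # cycle count over the same interval
allocations = 1_500      # matching RxL_INSERTS_* value

print(average_occupancy(occupancy_sum, cycles))      # 3.0
print(average_lifetime(occupancy_sum, allocations))  # 8.0
```

Which cycle count to divide by (all cycles, or only cycles in which the queue was not empty) changes the meaning of the average, so the choice should be stated alongside the result.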
TxL0_POWER_CYCLES
• Title: Cycles in L0
• Category: POWER_TX Events
• Event Code: 0x0c
• Max. Inc/Cyc: 1, Register Restrictions: 0-3
• Definition: Number of QPI qfclk cycles spent in L0 power mode in the Link Layer. L0 is the default mode which provides the highest performance with the most power. Use edge detect to count the number of instances that the link entered L0.
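The difference between counting cycles and counting entries can be illustrated with a small software model. This is only a conceptual sketch: the per-cycle trace and both function names are hypothetical, and it does not describe how the hardware implements edge detect.

```python
def level_count(samples):
    """Count cycles in which the condition is true (plain event counting)."""
    return sum(1 for s in samples if s)


def rising_edge_count(samples, initial=False):
    """Count transitions from false to true (what edge detect reports)."""
    edges = 0
    previous = initial
    for s in samples:
        if s and not previous:
            edges += 1
        previous = bool(s)
    return edges


# Hypothetical per-cycle trace: 1 means the link is in L0 during that cycle.
in_l0 = [1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1]

print(level_count(in_l0))        # 9 cycles spent in L0
print(rising_edge_count(in_l0))  # 3 separate entries into L0
```

Dividing the cycle count by the entry count (9 / 3 here) gives the average residency per entry, which is often the more useful figure for power analysis.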
Table 2-157. Unit Masks for TxL_FLITS_G0

Extension   umask [15:8]   Description
DATA        b00000010      Data Tx Flits
                           Number of data flits transmitted over QPI. Each flit contains 64b of data. This includes both DRS and NCB data flits (coherent and noncoherent). This can be used to calculate the data bandwidth of the QPI link.
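The bandwidth calculation mentioned for the DATA unit mask can be sketched as follows. It uses the 64 bits (8 bytes) of data per flit stated above. The function name, the sample flit count, and the measurement of the interval in wall-clock seconds are assumptions made for illustration.

```python
DATA_BYTES_PER_FLIT = 8  # 64 bits of data per data flit, per the DATA umask description


def qpi_data_bandwidth(data_flits, interval_seconds):
    """Return transmitted data bandwidth in bytes per second."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return data_flits * DATA_BYTES_PER_FLIT / interval_seconds


# Hypothetical reading: 250 million data flits counted over a 1 second interval.
print(qpi_data_bandwidth(250_000_000, 1.0))  # 2000000000.0 bytes/s
```

This covers only the transmit direction of one link and only data payload; header and other non-data flits are excluded by the unit mask.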
Table 2-158. Unit Masks for TxL_FLITS_G1

Extension   umask [15:8]   Description
DRS_DATA    b00001000      DRS Data Flits
                           Counts the total number of data flits transmitted over QPI on the DRS (Data Response) channel. DRS flits are used to transmit data with coherency. This does not count data flits transmitted over the NCB channel which transmits non-coherent data. This includes only the data flits (not the header).
Table 2-159. Unit Masks for TxL_FLITS_G2

Extension     umask [15:8]   Description
NCB_NONDATA   b00001000      Non-Coherent non-data Tx Flits
                             Number of Non-Coherent Bypass non-data flits. These packets are generally used to transmit non-coherent data across QPI, and the flits counted here are for headers and other non-data flits. This includes extended headers.
TxR_AD_HOM_CREDIT_OCCUPANCY
• Title: R3QPI Egress Credit Occupancy - AD HOM
• Category: R3QPI_EGRESS_CREDITS Events
• Event Code: 0x22
• Extra Select Bit: Y
• Max. Inc/Cyc: 28, Register Restrictions: 0-3
• Definition: Occupancy event that tracks the number of link layer credits into the R3 (for transactions across the BGF) available in each cycle. Flow Control FIFO for HOM messages on AD.

Table 2-161.
TxR_AD_SNP_CREDIT_ACQUIRED
• Title: R3QPI Egress Credit Occupancy - SNP
• Category: R3QPI_EGRESS_CREDITS Events
• Event Code: 0x27
• Extra Select Bit: Y
• Max. Inc/Cyc: 1, Register Restrictions: 0-3
• Definition: Number of link layer credits into the R3 (for transactions across the BGF) acquired each cycle. Flow Control FIFO for Snoop messages on AD.

Table 2-164.
TxR_BL_DRS_CREDIT_ACQUIRED
• Title: R3QPI Egress Credit Occupancy - DRS
• Category: R3QPI_EGRESS_CREDITS Events
• Event Code: 0x2a
• Extra Select Bit: Y
• Max. Inc/Cyc: 1, Register Restrictions: 0-3
• Definition: Number of credits into the R3 (for transactions across the BGF) acquired each cycle. DRS message class to BL Egress.

Table 2-166.
TxR_BL_NCB_CREDIT_OCCUPANCY
• Title: R3QPI Egress Credit Occupancy - BL NCB
• Category: R3QPI_EGRESS_CREDITS Events
• Event Code: 0x20
• Extra Select Bit: Y
• Max. Inc/Cyc: 2, Register Restrictions: 0-3
• Definition: Occupancy event that tracks the number of credits into the R3 (for transactions across the BGF) available in each cycle. NCB message class to BL Egress.

Table 2-169.
Uncore Performance Monitoring R2PCIe Performance Monitoring

VNA_CREDIT_RETURNS
• Title: VNA Credits Returned
• Category: VNA_CREDIT_RETURN Events
• Event Code: 0x1c
• Extra Select Bit: Y
• Max. Inc/Cyc: 1, Register Restrictions: 0-3
• Definition: Number of VNA credits returned.

VNA_CREDIT_RETURN_OCCUPANCY
• Title: VNA Credits Pending Return - Occupancy
• Category: VNA_CREDIT_RETURN Events
• Event Code: 0x1b
• Extra Select Bit: Y
• Max. Inc/Cyc:.
2.9.3 R2PCIe Performance Monitors

Table 2-172.
Table 2-174. R2_PCI_PMON_BOX_STATUS Register – Field Definitions

Field   Bits   Attr   HW Reset Val   Description
ig      31:4   RV     0              Ignored
ov      3:0    RW1C   0              If an overflow is detected from the corresponding R2_PCI_PMON_CTR register, its overflow bit will be set. NOTE: Write of ‘1’ will clear the bit.

2.9.3.2 R2PCIe PMON state - Counter/Control Pairs

The following table defines the layout of the R2PCIe performance monitor control registers.
Table 2-176. R2_PCI_PMON_CTR{3-0} Register – Field Definitions

Field         Bits    Attr   HW Reset Val   Description
ig            63:44   RV     0              Ignored
event_count   43:0    RW-V   0              44-bit performance event counter

2.9.4 R2PCIe Performance Monitoring Events

2.9.4.1 An Overview

R2PCIe provides events to track information related to all the traffic passing through its boundaries.
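Since Table 2-176 defines event_count as a 44-bit field, software that computes event counts from two successive reads needs to allow for the counter wrapping past its maximum value. A minimal sketch of that calculation follows. The function name is hypothetical, and it assumes the counter wraps at most once between the two reads.

```python
COUNTER_BITS = 44                       # width of event_count in R2_PCI_PMON_CTR
COUNTER_MASK = (1 << COUNTER_BITS) - 1  # 0xFFFFFFFFFFF


def counter_delta(previous, current):
    """Events counted between two reads, tolerating a single wraparound."""
    return (current - previous) & COUNTER_MASK


# No wrap: a plain difference.
print(counter_delta(1_000, 1_250))           # 250

# Wrap: the counter passed its maximum and restarted from zero.
print(counter_delta(COUNTER_MASK - 4, 10))   # 15
```

Masking the raw register value with COUNTER_MASK before use also discards the ignored upper bits 63:44.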
Symbol Name: Definition                                                                   Equation
CYC_USED_UPEVEN: Cycles Used in the Up direction, Even polarity                           RING_BL_USED.CW_EVEN / SAMPLE_INTERVAL
CYC_USED_UPODD: Cycles Used in the Up direction, Odd polarity                             RING_BL_USED.CW_ODD / SAMPLE_INTERVAL
RING_THRU_DNEVEN_BYTES: Ring throughput in the Down direction, Even polarity in Bytes     RING_BL_USED.

2.9.7
Table 2-177. Unit Masks for RING_AD_USED

Extension      umask [15:8]   Description
CCW_VR0_EVEN   bxxxxx1xx      Counterclockwise and Even on VRing 0
                              Filters for the Counterclockwise and Even ring polarity on Virtual Ring 0.
CCW_VR0_ODD    bxxxx1xxx      Counterclockwise and Odd on VRing 0
                              Filters for the Counterclockwise and Odd ring polarity on Virtual Ring 0.
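The unit mask tables write each mask as a "b" followed by eight positions, where "1" marks the bit that selects the filter and "x" marks a bit that belongs to other filters. The following sketch converts that notation to a numeric value and combines two filters. The helper name is hypothetical; the two patterns are the ones listed in Table 2-177, and the shift by 8 reflects the umask [15:8] column heading.

```python
def umask_from_pattern(pattern):
    """Convert a pattern such as 'bxxxxx1xx' to an integer.

    Only positions marked '1' are set; 'x' and '0' positions stay clear.
    """
    if len(pattern) != 9 or pattern[0] != "b":
        raise ValueError("expected 'b' followed by 8 characters")
    value = 0
    for ch in pattern[1:]:
        if ch not in "01x":
            raise ValueError("unexpected character: " + ch)
        value = (value << 1) | (1 if ch == "1" else 0)
    return value


ccw_vr0_even = umask_from_pattern("bxxxxx1xx")  # bit 2 -> 0x04
ccw_vr0_odd = umask_from_pattern("bxxxx1xxx")   # bit 3 -> 0x08

# Setting both bits counts either polarity on Virtual Ring 0.
combined = ccw_vr0_even | ccw_vr0_odd
print(hex(combined))       # 0xc
print(hex(combined << 8))  # 0xc00, the value as positioned in bits 15:8
```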
Table 2-178. Unit Masks for RING_AK_USED

Extension     umask [15:8]   Description
CCW_VR1_ODD   b1xxxxxxx      Counterclockwise and Odd on VRing 1
                             Filters for the Counterclockwise and Odd ring polarity on Virtual Ring 1.
CCW           b11001100      Counterclockwise

RING_BL_USED
• Title: R2 BL Ring in Use
• Category: RING Events
• Event Code: 0x09
• Max. Inc/Cyc:.
used is dependent on the system programming. Therefore, one should generally set both the UP and DN bits for a given polarity (or both) at a given time.

Table 2-180.
Occupancy Accumulator event in order to calculate average queue latency. Multiple ingress buffers can be tracked at a given time using multiple counters.

Table 2-183. Unit Masks for RxR_INSERTS

Extension   umask [15:8]   Description
NCB         bxxx1xxxx      NCB
                           NCB Ingress Queue
NCS         bxx1xxxxx      NCS
                           NCS Ingress Queue

RxR_OCCUPANCY
• Title: Ingress Occupancy Accumulator
• Category: INGRESS Events
• Event Code: 0x13
• Max. Inc/Cyc:.
a single Egress queue can be tracked at any given time. It is not possible to filter based on direction or polarity.

Table 2-186. Unit Masks for TxR_CYCLES_NE

Extension   umask [15:8]   Description
AD          bxxxxxxx1      AD
                           AD Egress Queue
AK          bxxxxxx1x      AK
                           AK Egress Queue
BL          bxxxxx1xx      BL
                           BL Egress Queue

TxR_NACK_CCW
• Title: Egress CCW NACK
• Category: EGRESS Events
• Event Code: 0x28
• Max. Inc/Cyc:.
Uncore Performance Monitoring R3QPI Performance Monitoring

2.10 R3QPI PERFORMANCE MONITORING

2.10.1 Overview of the R3QPI Box

R3QPI is the interface between the Ring and the Intel® QPI Link Layer, which packetizes requests. It is responsible for translating between ring protocol packets and flits that are used for transmitting data across the Intel® QPI interface.
Once a freeze has occurred, the overflow field responsible for the freeze must be cleared before a new freeze can be seen. This is done by setting the corresponding bit in R3_Ly_PCI_PMON_BOX_STATUS.ov and U_MSR_PMON_GLOBAL_STATUS.ov_rq. Assuming all the counters have been locally enabled (.en bit in the control registers meant to monitor events) and the overflow bit(s) have been cleared, the R3QPI Link is prepared for a new sample interval.
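The clearing step can be sketched in software as two small helpers: one that lists which counters have overflowed, and one that builds the value to write back. This is a sketch under stated assumptions, not a definitive procedure. It assumes one overflow bit per counter starting at bit 0 with write-1-to-clear behavior (as Table 2-174 shows for the R2PCIe box status register) and three counters per R3QPI link. Both function names are hypothetical, and the actual PCI configuration space access is not shown.

```python
def overflowed_counters(box_status, num_counters=3):
    """Return the indices of counters whose overflow bit is set."""
    return [i for i in range(num_counters) if box_status & (1 << i)]


def overflow_clear_value(counters):
    """Build the value to write so that only the given overflow bits are cleared.

    Assumes write-1-to-clear semantics: bits written as 0 are left unchanged.
    """
    value = 0
    for index in counters:
        value |= 1 << index
    return value


# Hypothetical status read: counters 0 and 2 have overflowed.
status = 0b101
pending = overflowed_counters(status)
print(pending)                             # [0, 2]
print(bin(overflow_clear_value(pending)))  # 0b101
```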
Table 2-190. R3_Ly_PCI_PMON_BOX_CTL Register – Field Definitions

Field      Bits   Attr   HW Reset Val   Description
ig         31:9   RV     0              Ignored
frz        8      WO     0              Freeze. If set to 1 the counters in this box will be frozen.
ig         7:2    RV     0              Ignored
rst_ctrs   1      WO     0              Reset Counters. When set to 1, the Counter Registers will be reset to 0.
rst_ctrl   0      WO     0              Reset Control. When set to 1, the Counter Control Registers will be reset to 0.

Table 2-191.
Table 2-192. R3_Ly_PCI_PMON_CTL{2-0} Register – Field Definitions

Field    Bits    Attr   HW Reset Val   Description
thresh   31:24   RW-V   0              Threshold used in counter comparison.
rsv      23      RV     0              Reserved. SW must write to 0 else behavior is undefined.
en       22      RW-V   0              Local Counter Enable.
rsv      21      RV     0              Reserved. SW must write to 0 else behavior is undefined.
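The fields listed so far can be packed into a control register value as sketched below. The positions of thresh (31:24) and en (22) come from Table 2-192, and the umask position (15:8) comes from the unit mask table headings. The event select position in bits 7:0 is an assumption, since that row is not shown here, and the function name is hypothetical. Reserved bits 23 and 21 are always left at 0.

```python
def encode_pmon_ctl(event, umask, enable=True, thresh=0):
    """Pack a counter control value.

    event  -> bits 7:0   (assumed position, not shown in the table above)
    umask  -> bits 15:8
    enable -> bit 22     (en, Local Counter Enable)
    thresh -> bits 31:24
    """
    for name, field in (("event", event), ("umask", umask), ("thresh", thresh)):
        if not 0 <= field <= 0xFF:
            raise ValueError(name + " must fit in 8 bits")
    value = (thresh << 24) | (umask << 8) | event
    if enable:
        value |= 1 << 22
    return value


# Example: RING_BL_USED (event code 0x09) with the CCW unit mask b11001100.
ctl = encode_pmon_ctl(0x09, 0b11001100)
print(hex(ctl))  # 0x40cc09
```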
2.10.5 R3QPI Box Events Ordered By Code

The following table summarizes the directly measured R3QPI Box events.

Symbol Name
CLOCKTICKS

2.10.
CLOCKTICKS
• Title: Number of uclks in domain
• Category: UCLK Events
• Event Code: 0x01
• Max. Inc/Cyc: 0, Register Restrictions: 0-2
• Definition: Counts the number of uclks in the QPI uclk domain. This could be slightly different than the count in the Ubox because of enable/freeze delays. However, because the QPI Agent is close to the Ubox, they generally should not diverge by more than a handful of cycles.
HA_R2_BL_CREDITS_EMPTY
• Title: HA/R2 AD Credits Empty
• Category: EGRESS Credit Events
• Event Code: 0x2f
• Max. Inc/Cyc: 1, Register Restrictions: 0-1
• Definition: No credits available to send to either HA or R2 on the BL Ring
• NOTE: Counter 0 counts lack of credits to the lesser numbered Cboxes (0-8). Counter 1 counts lack of credits to the higher numbered Cboxes (8-13,15+17,16+18).

Table 2-196.
Table 2-198. Unit Masks for QPI0_BL_CREDITS_EMPTY

Extension   umask [15:8]   Description
VN0_SNP     bxxxxx1xx      VN0 SNP Messages
VN0_NDR     bxxxx1xxx      VN0 NDR Messages
VN1_HOM     bxxx1xxxx      VN1 HOM Messages
VN1_SNP     bxx1xxxxx      VN1 SNP Messages
VN1_NDR     bx1xxxxxx      VN1 NDR Messages

QPI1_AD_CREDITS_EMPTY
• Title: QPI1 AD Credits Empty
• Category: EGRESS Credit Events
• Event Code: 0x2a
• Max. Inc/Cyc:.
RING_AD_USED
• Title: R3 AD Ring in Use
• Category: RING Events
• Event Code: 0x07
• Max. Inc/Cyc: 1, Register Restrictions: 0-2
• Definition: Counts the number of cycles that the AD ring is being used at this ring stop. This includes when packets are passing by and when packets are being sunk, but does not include when packets are being sent from the ring stop.
• NOTE: On a 2 column implementation (e.g.
Table 2-202. Unit Masks for RING_AK_USED

Extension   umask [15:8]   Description
CW          b00110011      Clockwise
CCW         b11001100      Counterclockwise

RING_BL_USED
• Title: R3 BL Ring in Use
• Category: RING Events
• Event Code: 0x09
• Max. Inc/Cyc: 1, Register Restrictions: 0-2
• Definition: Counts the number of cycles that the BL ring is being used at this ring stop.
Table 2-204. Unit Masks for RING_IV_USED

Extension   umask [15:8]   Description
CCW         b11001100      Counterclockwise
                           Filters for Counterclockwise polarity
ANY         b11111111      Any
                           Filters any polarity

RxR_AD_BYPASSED
• Title: AD Ingress Bypassed
• Category: INGRESS Events
• Event Code: 0x12
• Max. Inc/Cyc:.
Table 2-206. Unit Masks for RxR_INSERTS

Extension   umask [15:8]   Description
HOM         bxxxxxxx1      HOM
                           HOM Ingress Queue
SNP         bxxxxxx1x      SNP
                           SNP Ingress Queue
NDR         bxxxxx1xx      NDR
                           NDR Ingress Queue
DRS         bxxxx1xxx      DRS
                           DRS Ingress Queue
NCB         bxxx1xxxx      NCB
                           NCB Ingress Queue
NCS         bxx1xxxxx      NCS
                           NCS Ingress Queue

RxR_OCCUPANCY
• Title: Ingress Occupancy Accumulator
• Category: INGRESS Events
• Event Code: 0x13
• Max. Inc/Cyc:.
TxR_CYCLES_NE
• Title: Egress Cycles Not Empty
• Category: EGRESS Events
• Event Code: 0x23
• Max. Inc/Cyc: 1, Register Restrictions: 0-1
• Definition: Counts the number of cycles when the QPI Egress is not empty. This tracks one of the three rings that are used by the QPI agent. This can be used in conjunction with the QPI Egress Occupancy Accumulator event in order to calculate average queue occupancy.
• Definition: Number of times a request failed to acquire a DRS VN0 credit. In order for a request to be transferred across QPI, it must be guaranteed to have a flit buffer on the remote socket to sink into. There are two credit pools, VNA and VN0. VNA is a shared pool used to achieve high performance. The VN0 pool has reserved entries for each message class and is used to prevent deadlock.
Table 2-211. Unit Masks for VN0_CREDITS_USED

Extension   umask [15:8]   Description
DRS         bxxxx1xxx      DRS Message Class
                           Filter for Data Response (DRS). DRS is generally used to transmit data with coherency. For example, remote reads and writes, or cache to cache transfers will transmit their data using DRS.
NCB         bxxx1xxxx      NCB Message Class
                           Filter for Non-Coherent Bypass (NCB). NCB is generally used to transmit data without coherency.
VN1_CREDITS_USED
• Title: VN1 Credit Used
• Category: LINK_VN1_CREDITS Events
• Event Code: 0x38
• Max. Inc/Cyc: 1, Register Restrictions: 0-1
• Definition: Number of times a VN1 credit was used on the DRS message channel. In order for a request to be transferred across QPI, it must be guaranteed to have a flit buffer on the remote socket to sink into. There are two credit pools, VNA and VN1. VNA is a shared pool used to achieve high performance.
Table 2-214. Unit Masks for VNA_CREDITS_ACQUIRED

Extension   umask [15:8]   Description
AD          bxxxxxxx1      HOM Message Class
                           Filter for the Home (HOM) message class. HOM is generally used to send requests, request responses, and snoop responses.
BL          bxxxxx1xx      HOM Message Class
                           Filter for the Home (HOM) message class. HOM is generally used to send requests, request responses, and snoop responses.
Uncore Performance Monitoring Packet Matching Reference be transmitted, as those holding VN0 credits will still (potentially) be able to transmit. Generally it is the goal of the uncore that VNA credits should not run out, as this can substantially throttle back useful QPI bandwidth. VNA_CREDIT_CYCLES_USED • Title: Cycles with 1 or more VNA credits in use • Category: LINK_VNA_CREDITS Events • Event Code: 0x32 • Max. Inc/Cyc:.
Opc    HOM0                      HOM1        NDR                             SNP
1001   AckCnfltWbI               RspFwdI     FrcAckCnflt (only from xNCs)    ---
1010   RdDataMigratory           RspFwdS     Cmp_FwdCode                     SnpDataMigratory (only for xNCs)
1011   ---                       RspFwdIWb   Cmp_FwdInvOwn (only for xNCs)   ---
1100   WbMtoI                    RspFwdSWb   Cmp_FwdInvItoE                  ---
1101   WbMtoE                    RspIWb      ---                             ---
1110   WbMtoS (only from xNCs)   RspSWb      ---                             ---
1111   AckCnflt                  ---         ---                             PrefetchHint

Opc    NCS    NCB    DRS
0000   NcRd   NcWr   DataC_(FEIMS)
0001   In
Name             Opc    MC    Gen By?      Desc
Cmp_FwdCode      1010   NDR   Ci,Ho        Complete request, forward the line in F (or S) state to the requestor specified, invalidate local copy or leave it in S state.
Cmp_FwdInvItoE   1100   NDR   Ci,Ho        Complete request, invalidate local copy
Cmp_FwdInvOwn    1011   NDR   Ci
DataC_(FEIMS)    0000   DRS   Ci, Co,Ho    Data Response in (FEIMS) state
                                           NOTE: Set RDS field to specify which state is to be measured.
Name     Opc    MC    Gen By?   Desc
NcP2PB   1110   NCB   Ui,Uoi    Peer-to-peer transaction between I/O entities (noncoherent bypass channel)
NcP2PS   1101   NCS             Peer-to-peer transaction between I/O entities.
Name     Opc    MC     Gen By?      Desc
RspIWb   1101   HOM1   Co,Hi, Ho    Peer has evicted the data with an in-flight WbIData[Ptl] message to the home and has not sent any message to the requestor.
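Because the same 4-bit opcode is reused across message classes, a packet can only be identified by the pair of message class (MC) and opcode (Opc). The following sketch shows one way to hold that mapping in software. The dictionary contains only entries that appear in the rows above, and the structure and function name are illustrative assumptions, not part of the hardware interface.

```python
# (message class, opcode) -> packet name, taken from the rows listed above.
PACKET_NAMES = {
    ("NDR", 0b1010): "Cmp_FwdCode",
    ("NDR", 0b1100): "Cmp_FwdInvItoE",
    ("NDR", 0b1011): "Cmp_FwdInvOwn",
    ("DRS", 0b0000): "DataC_(FEIMS)",
    ("NCB", 0b1110): "NcP2PB",
    ("NCS", 0b1101): "NcP2PS",
    ("HOM1", 0b1101): "RspIWb",
}


def packet_name(message_class, opcode):
    """Look up a packet name; returns 'unknown' for pairs not in the table."""
    return PACKET_NAMES.get((message_class, opcode), "unknown")


# Opcode 1101 names different packets depending on the message class.
print(packet_name("NCS", 0b1101))   # NcP2PS
print(packet_name("HOM1", 0b1101))  # RspIWb
```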