GPA Manual for G-series RVUs


Guardian Performance Analyzer (GPA) Manual – (544977-001) Page 1 of 106

Guardian Performance Analyzer (GPA) Manual for G-series RVUs

Abstract

GPA consolidates and analyzes system performance data collected by Measure, the HP NonStop™ system performance measuring product. On the basis of its analysis, GPA indicates how well or how poorly the system is performing and makes recommendations for improving the overall performance of the system.

Product Version

D40ABG

Supported Release Version Updates (RVUs)

This publication supports G06.29 and all subsequent G-series RVUs until otherwise indicated by its replacement publication.

Part Number: 544977-001

Published: January 2008

Summary of content (106 pages)

PAGE 2
Document History
Part Number   Product Version    Published
135081        GPA D00 and G00    January 1998
089503        C30.
PAGE 3
Table of Contents
Guardian Performance Analyzer (GPA) Manual ..... 1
Abstract ..... 1
Product Version ..... 1
Supported Release Version Updates (RVUs) .....
PAGE 4
Processor Load Balance and Performance Charts ..... 41
Disk Volume Performance Analysis ..... 44
Cache Performance Analysis ..... 46
Disk Subprocess Analysis and Recommendations ..... 47
Processor/Disk Configuration Diagram .....
PAGE 5
List of Figures and Tables
Figure 1-1. How GPA Works ..... 17
Figure 2-1. GPA Analysis and Tuning Procedure ..... 21
Figure 2-2. Code Characters in a GPA Text Statement Paragraph ..... 24
Example 3-1. Node Characteristics ..... 34
Example 3-2.
PAGE 6
Example 4-21. Disk Volume Performance Analysis for \NODEB After Primary Changes ..... 77
Example 4-22. Processor/Disk Configuration Diagram for \NODEC Before Primary Changes ..... 77
Example 4-23. Processor/Disk Configuration Diagram for \NODEC After Primary Changes .....
PAGE 7
What’s New in This Manual
New and Changed Information
The G-series version of GPA will cease active development with this release. There are no plans to support the new ZMS Measure data format on the G-series; GPA will support only the legacy format on G-series.
PAGE 8
About This Manual This manual describes the Guardian Performance Analyzer (GPA) and tells you how to install and use it on a NonStop system. The manual is intended mainly for system performance analysts, system managers, and others responsible for the performance of NonStop systems. We assume in this manual that you are familiar with the Guardian operating system and with Measure, the NonStop system performance measuring product.
PAGE 9
Notation Conventions Hypertext Links Blue underline is used to indicate a hypertext link within text. By clicking a passage of text with a blue underline, you are taken to the location described. For example: This requirement is described under Backup DAM Volumes and Physical Disk Drives on page 3-2. General Syntax Notation This list summarizes the notation conventions for syntax presentation in this manual. UPPERCASE LETTERS Uppercase letters indicate keywords and reserved words.
PAGE 10
TERM [\system-name.]$terminal-name
INT[ERRUPTS]
A group of items enclosed in brackets is a list from which you can choose one item or none. The items in the list can be arranged either vertically, with aligned brackets on each side of the list, or horizontally, enclosed in a pair of brackets and separated by vertical lines. For example:
FC [ num ] [ -num ] [ text ]
K [ X | D ] address
{ } Braces
A group of items enclosed in braces is a list from which you are required to choose one item.
PAGE 11
Punctuation Parentheses, commas, semicolons, and other symbols not previously described must be typed as shown. For example: error := NEXTFILENAME ( file-name ) ; LISTOPENS SU $process-name.#su-name Quotation marks around a symbol such as a bracket or brace indicate the symbol is a required character that you must type as shown. For example: "[" repetition-constant-list "]" Item Spacing Spaces shown between items are required unless one of the items is a punctuation symbol such as a parenthesis or a comma.
PAGE 12
error := COMPRESSEDIT ( filenum ) ; !i,o !i:i In procedure calls, the !i:i notation follows an input string parameter that has a corresponding parameter specifying the length of the string in bytes. For example: error := FILENAME_COMPARE_ ( filename1:length , filename2:length ) ; !i:i !i:i !o:i In procedure calls, the !o:i notation follows an output buffer parameter that has a corresponding input parameter specifying the maximum length of the output buffer in bytes.
PAGE 13
[ ] Brackets Brackets enclose items that are sometimes, but not always, displayed. For example: Event number = number [ Subject = first-subject-value ] A group of items enclosed in brackets is a list of all possible items that can be displayed, of which one or none might actually be displayed. The items in the list can be arranged either vertically, with aligned brackets on each side of the list, or horizontally, enclosed in a pair of brackets and separated by vertical lines.
PAGE 14
Notation for Management Programming Interfaces This list summarizes the notation conventions used in the boxed descriptions of programmatic commands, event messages, and error lists in this manual. UPPERCASE LETTERS Uppercase letters indicate names from definition files. Type these names exactly as shown. For example: ZCOM-TKN-SUBJ-SERV lowercase letters Words in lowercase letters are words that are part of the notation, including Data Definition Language (DDL) keywords.
PAGE 15
Section 1: Introducing GPA The Guardian Performance Analyzer (GPA) is a software tool designed specifically for use by system performance analysts, system operations managers, or other persons responsible for the proper performance of a NonStop system. GPA consolidates and analyzes system performance data collected by Measure, the NonStop system performance measuring product.
PAGE 16
• Makes recommendations for tuning the system.
• Provides PUP or SCF input (PUPIN/SCFIN) control statements for implementing tuning recommendations and a PUP or SCF backout file (PUPBAK/SCFBAK) to restore the system to its previous state, depending on the operating system.
• Enables you to choose the optional reports that meet your specific requirements.
• Analyzes any NonStop system node with up to 16 processors of any type.
PAGE 17
Figure 1-1. How GPA Works The GPA module is essentially a model of a NonStop system node. In performing its analysis, GPA breaks the system down into the following subsystems: CPU, memory, disk volume, and disk cache. GPA then assesses the performance of the system as a whole and also separately analyzes each subsystem with respect to a number of relevant performance parameters such as CPU utilization, disk volume queuing, cache hit percentage, and so on.
PAGE 18
among the CPUs in the system in order to help balance the load on the CPUs. The SETCACHE commands are used to redistribute the cache allocations for the volumes to improve the operating efficiency of the system. PUPBAK is a PUP INPUT file that can be run after PUPIN, if necessary, to restore the system to its original measured state. SCFIN and SCFBAK have the same function as PUPIN and PUPBAK. The PUP utility is used on systems prior to G02; SCF is used on G02 and later operating systems.
PAGE 19
Optional Report Sections
In the GPA Optional Report, you specify the level of additional detailed information to be provided. You choose the type of output you want for each report section. Three output types are available:
• Noprint: Skip this optional report section.
• Summary: A breakdown of the number of processes in each CPU for each of the following reports (each of which is a class of process).
PAGE 20
GPA Requirements
GPA does not have to be run on the system being analyzed; the Measure data can be moved to another NonStop system. GPA requires the use of the following NonStop system products:
Item                     NonStop system Product Number
Measure*                 T9086
SORT** or FASTSORT**     (part of Guardian)
* Must be installed on the system being analyzed.
** Must be installed on the system on which GPA is run.
PAGE 21
Section 2: Running GPA
This section begins with an overview of the procedure for running GPA, followed by detailed information about the procedure. We strongly recommend that you read through the overview section before you attempt to install or run GPA for the first time. Note that using GPA also involves the use of the Measure performance tool. The GPA procedure in this section includes the required steps for using Measure in conjunction with GPA.
PAGE 22
Establishing the Measurement Period
The purpose of this step is to establish the length of the following two time periods for running Measure:
• The measurement window: This is the total duration of a single Measure run that collects performance data for the system.
• The data collection interval: This is the length of each time period within the measurement window for which Measure collects and reports the data.
PAGE 23
Running Measure In this step, you run Measure to collect performance data for the system. When you run Measure, you specify the measurement window and data collection interval that you established in the previous step. (Refer to the Measure User’s Guide for more information on using this product.) Determining the Analysis Period The object of this step is to examine the data collected with Measure in the preceding step to determine the most appropriate period of system use for GPA to analyze.
PAGE 24
the text as shown in Figure 2-2.
• Each statement paragraph is also preceded by a two-digit decimal number that indicates the total number of lines in the paragraph, including blank lines (Figure 2-2). If you change the number of lines in any paragraph, be sure to change the corresponding number.
Figure 2-2. Code Characters in a GPA Text Statement Paragraph
Running GPA
GPA does not require installation; it is configured to run from the installation subvolume $SYSTEM.ZGPA.
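The two-digit line count that precedes each statement paragraph is easy to get wrong after an edit. The following Python sketch shows one way to recompute it. The function name line_count_prefix and the sample paragraph text are hypothetical, and the sketch computes only the count itself; where the count and the code characters are placed in the file is defined by Figure 2-2, not by this example.

```python
def line_count_prefix(paragraph_lines):
    """Return the two-digit decimal line count for a statement paragraph.

    Blank lines are counted, as the manual requires.
    """
    count = len(paragraph_lines)
    if not 0 <= count <= 99:
        raise ValueError("a two-digit count can only express 0 to 99 lines")
    return f"{count:02d}"


# Hypothetical paragraph text, used only to exercise the helper.
paragraph = [
    "THE NODE SWAP RATE IS TOO HIGH.",
    "",
    "CONSIDER ADDING MEMORY.",
]
print(line_count_prefix(paragraph))
```

If you add or delete a line in a paragraph, rerun the count and update the number in front of that paragraph to match.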
PAGE 25
Caution
Note that any changes in the disk primary CPU invoked by running GPA’s PUPIN or SCFIN file are not permanent and will not be in effect after a cold load.
The GPA Procedure
To run GPA, follow the steps outlined below.
1. Establish the measurement window and data collection interval for running Measure on the system, as described earlier.
2. Use the STATUS command to see if the Measure subsystem is currently running. If it is, go directly to Step 3.
PAGE 27
where:
measfile is the fully qualified file name you give to the measurement data file.
xx:xx is the time in hours and minutes (on the 24-hour clock) at which you want the measurement to begin.
yy:yy is the time at which you want the measurement to end.
h is the length in hours of the measurement interval.
Example
The following command begins the measurement at 9:00 AM, ends the measurement at 5:00 PM, specifies a one-hour data collection interval, and collects the data in the file $PERF.DATA.
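As a quick check on the values you choose for xx:xx, yy:yy, and h, you can work out how many data collection intervals the measurement window will contain. The Python sketch below is an illustration only; the function name interval_count is hypothetical, and it assumes that the window starts and ends on the same day.

```python
from datetime import datetime


def interval_count(start, end, interval_hours):
    """Number of whole collection intervals between two 24-hour clock times."""
    fmt = "%H:%M"
    window = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    window_hours = window.total_seconds() / 3600
    if window_hours <= 0 or interval_hours <= 0:
        raise ValueError("end must be later than start, and interval positive")
    return int(window_hours // interval_hours)


# The example in the text: 9:00 AM to 5:00 PM with a one-hour interval.
print(interval_count("09:00", "17:00", 1))
```

For the example in the text, the window is eight hours long, so a one-hour interval yields eight intervals.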
PAGE 28
+ ADD MEASUREMENT file-name
+ LIST CPU *
+ ADD PLOT CPU-BUSY-TIME
+ LIST PLOT
The plot that Measure displays shows the busy time for each CPU on the node at each of the specified measurement intervals over the entire measurement period. (For a detailed discussion of Measure plots, see the Measure User’s Guide.)
b. From the Measure plot, determine the busiest one- to three-hour period for the system. This is the period for which you should run the GPA analysis.
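If you copy the average CPU busy figure for each interval out of the Measure plot, the busiest period can also be found by a simple sliding-window search. The Python sketch below is one possible approach, not part of GPA or Measure. The function name busiest_window and the sample figures are hypothetical, and the sketch assumes one-hour intervals so that a window length of 1 to 3 corresponds to one to three hours.

```python
def busiest_window(avg_busy_by_interval, window_len):
    """Return (start_index, mean_busy) of the busiest run of window_len intervals."""
    if not 1 <= window_len <= len(avg_busy_by_interval):
        raise ValueError("window_len must fit inside the measurement")
    best_start, best_total = 0, float("-inf")
    for start in range(len(avg_busy_by_interval) - window_len + 1):
        total = sum(avg_busy_by_interval[start:start + window_len])
        if total > best_total:  # the earliest window wins a tie
            best_start, best_total = start, total
    return best_start, best_total / window_len


# Hypothetical average CPU busy percentages for the hours starting 09:00 to 16:00.
hourly_busy = [22.0, 35.5, 61.0, 58.5, 40.0, 63.0, 30.0, 18.0]
start, mean_busy = busiest_window(hourly_busy, 2)
print(f"busiest 2-hour period starts at {9 + start}:00, mean busy {mean_busy:.1f}%")
```

The window length is passed in rather than searched for, because the mean of a longer window can never exceed the busiest single interval; you decide how long a period you want GPA to analyze.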
PAGE 29
structured data files differs from that used to collect the system data. In that case, you must obtain the correct version of the MEASFH file (see Step 5) and specify it in the command as indicated. measfh is a fully qualified filename that includes the volume and subvolume. In the LIST command lines, xx:xx is the time at the beginning of the system busy period determined in Step 4b and yy:yy is the time at the end of that period. There should be only one list statement per entity (CPU, PROCESS, and DISC).
PAGE 30
included. For example, to run a detail section for the SYSTEM class and a summary for each of the others, set the following six PARAMS:
8> PARAM SYSTEM DETAIL
9> PARAM SUBSYSTEM SUMMARY
10> PARAM PATHWAY SUMMARY
11> PARAM SERVER SUMMARY
12> PARAM TRANSIENT SUMMARY
13> PARAM OTHER SUMMARY
The default section-type is DETAIL. To refrain from printing the optional report, specify:
14> PARAM classname NOPRINT
To suppress printing of the textual report only, enter:
15> PARAM TEXT NOPRINT
9.
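If you prepare the PARAM commands in a script, a small generator helps keep the class names and section-types consistent. The Python sketch below only builds the command text shown above for the six process classes; it does not run TACL. The function name param_commands is hypothetical, and the sketch does not cover the other optional reports or PARAM TEXT.

```python
# Process classes and section-types named in this manual.
CLASSES = ("SYSTEM", "SUBSYSTEM", "PATHWAY", "SERVER", "TRANSIENT", "OTHER")
SECTION_TYPES = ("NOPRINT", "SUMMARY", "DETAIL")


def param_commands(choices, default="DETAIL"):
    """Build one PARAM command per process class.

    choices maps a class name to a section-type; classes left out get the
    default, which the manual gives as DETAIL.
    """
    unknown = set(choices) - set(CLASSES)
    if unknown:
        raise ValueError(f"unknown class name(s): {sorted(unknown)}")
    commands = []
    for name in CLASSES:
        section = choices.get(name, default)
        if section not in SECTION_TYPES:
            raise ValueError(f"invalid section-type for {name}: {section}")
        commands.append(f"PARAM {name} {section}")
    return commands


choices = {name: "SUMMARY" for name in CLASSES}
choices["SYSTEM"] = "DETAIL"
for line in param_commands(choices):
    print(line)
```

The generated lines match the six PARAM commands in the example, without the TACL prompt numbers.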
PAGE 31
If a problem persists, you should contact your NonStop system Representative to report the problem. When you do so, please supply copies of the following: • All files in the GPA installation subvolume, by default $SYSTEM.ZGPA.
PAGE 32
Section 3: Description of GPA Reports Overview of GPA Reports GPA reports include standard reports and optional reports. These reports are listed briefly below and then described in detail.
PAGE 33
Note This applies only to the process class optional reports: SYSTEM, SERVER, SUBSYSTEM, TRANSIENT, PATHWAY, and OTHER. The rest of the optional reports are described in Section 4: Using GPA Information. When you specify the summary option, you receive the Process Distribution Analysis for the class. When you specify the detail option, you receive all five sections. The optional report enables you to perform several more detailed analyses than the standard report does.
PAGE 34
Node Characteristics This subsection of the report (Example 3-1) gives general performance figures for the node GPA has analyzed. For each line item in this subsection, the report shows a measured value and, where it applies, the maximum value the GPA model expects. Example 3-1. Node Characteristics I.
PAGE 35
(6) DYNAMIC PCBS: The number of dynamic process control blocks on the node. This figure represents processes that were started and stopped during the measurement period analyzed by GPA. (These processes are also known as transients.)
(7) OUT OF BALANCE CPU COUNT: The number of processors on the node that GPA has found to be relatively over- or underutilized, hence out of balance.
(8) MOST BUSY CPU: The most utilized processor on the node, that is, the one with the highest percentage busy time.
PAGE 36
Example 3-2. Disk Volume Performance
II. DISK VOLUME PERFORMANCE
                              (9)       (10)    (11)            (12)
                              VOLUME    BUSY%   REQUEST QTIME   CPU(s)
(1) LOW BUSY DISK VOL       : $X42        .0       0            2: 3
(2) HI BUSY DISK VOL        : $SYSTEM   18.9      25            0: 1
(3) LOW Q-TIME VOL          : $ClO       4.2      19            0: 1
(4) HI Q-TIME VOL           : $PROJEC     .0     114            2: 3
(5) *LOW BUSY DISK PROC     : $B40        .1       0            0: 1
(6) *HI BUSY DISK PROC      : $SYSTEM    7.2      25            0: 1
(7) *LOW CACHE PERF.        : $XPRESS    1.7      51            2: 3
(8) *HI CACHE PERF.         : $NSMS      6.3      24            2: 3
(* relates to the disk process.
PAGE 37
Example 3-3. Global Performance Indicators
III. GLOBAL PERFORMANCE INDICATORS
1  EXCESSIVE DISPATCHING  : NO         2  INDEX LEVELS > 2      : NO
3  PROCESSOR LOAD BALANCE : POOR       4  OVER UTILIZED DISK    : NO
5  OVER UTILIZED NODE     : NO         6  CACHE FAULT DETECTED  : NO
7  OVER UTILIZED CPU      : NO         8  TRANSIENT PROCESSING  : NO
9  DISK VOLUME QUEUING    : YES        10 BLOCKED REQUESTS      : NO
11 AVERAGE CACHE HIT %    : 83.20%     12 TOTAL CACHE CALL RATE : 25.00 /SEC
(1) EXCESSIVE DISPATCHING: No excessive dispatching.
PAGE 38
(9) DISK VOLUME QUEUING: Excessive disk volume queuing. The request queue time for one or more disk volumes on the node is too high.
(10) BLOCKED REQUESTS: No blocked requests. A blocked request is one that cannot be processed because another application has blocked access to a record, row, table, or file.
(11) AVERAGE CACHE HIT %: Average cache hit percentage of 83.20. The average cache hit percentage is an average of the cache hit percentages of all the volumes on the system.
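The AVERAGE CACHE HIT % figure can be illustrated with a short calculation. The Python sketch below assumes a simple, unweighted arithmetic mean over the volumes; the text does not say whether GPA weights the volumes in any way, so treat that as an assumption. The function name and the per-volume figures are hypothetical.

```python
def average_cache_hit(hit_pct_by_volume):
    """Unweighted mean of the per-volume cache hit percentages (an assumption)."""
    if not hit_pct_by_volume:
        raise ValueError("at least one volume is required")
    return sum(hit_pct_by_volume.values()) / len(hit_pct_by_volume)


# Hypothetical per-volume cache hit percentages.
hits = {"$SYSTEM": 90.0, "$DATA": 80.0, "$AUDIT": 70.0}
print(f"AVERAGE CACHE HIT %: {average_cache_hit(hits):.2f}")
```

A single volume with a low hit percentage pulls the node average down, which is why the per-volume figures in the Disk Volume Performance Analysis are worth checking when this indicator looks poor.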
PAGE 39
Example 3-4. System Performance Score
IV. SYSTEM PERFORMANCE SCORE
 1. CPU SUBSYSTEM                     1%
 2. MEMORY SUBSYSTEM                  1%
 3. DISK CACHE SUBSYSTEM             83%
 4. DISK VOLUME SUBSYSTEM             9%
 5. SYSTEM RECOVERY PERFORMANCE      52%
*6. WEIGHTED ANALYSIS SYSTEM SCORE    1%
* Average Score rated: BEST = 100, WORST = 0.
PAGE 40
(1) Process Distribution Analysis - Process Counts: The Process Distribution Analysis shows the number of processes in different classes that were running in each CPU during the measurement period.
(2) Busy Distribution Analysis - Percent Cpu Busy: The Busy Distribution Analysis shows the percentage of each CPU’s time used by each process class.
(3) SYSTEM: The SYSTEM class consists of processes that are part of the OSIMAGE or have an execution priority greater than 199.
PAGE 41
Processor Load Balance and Performance Charts This section of the report consists of two charts: the Processor Load Balance Chart and the Processor Performance Chart. Processor Load Balance Chart This chart (Example 3-6) shows graphically how the system load is distributed among the node’s processors. Each CPU’s percentage busy is shown in the area between the two horizontal scale lines. The plus (+) signs indicate graphically the proportion of the CPU busy time spent doing interrupt processing.
PAGE 42
6 7 8 9 10 11 12 13 14 DISK CHIT MSG DISP RATE RATE RATE RATE : : : : 11.9 16.9 197 456.9 SWAP RATE : .01 MMGR PAGES : 1715 PCB COUNT : 84 TRANSIENTS : 5 HALT IMPACT: MEMORY 15 CPU Count: PCB Count: 18 93 240.6 13.3 11.7 77 242.1 .02 650 87 3 NONE .02 .04 625 1087 77 84 3 8 NONE NONE 76 212.7 16 4 332 Avg CPU Busy (^): Node SWAP Rate : 19 17 38.1 .
PAGE 43
Processor Performance Chart
This chart (Example 3-6) gives the following physical and performance data for each processor on the node analyzed:
(1) CPU NUM: The processor’s identification number.
(2) CPU TYPE: The processor’s type designation. All of the processors in the example are TXPs.
(3) MB MEMORY: The processor’s total memory in megabytes. Each processor on this node has 8 megabytes of memory.
(4) PCT BUSY: Percentage of time the processor is busy.
PAGE 44
(14) HALT IMPACT: How system performance would be affected if the processor failed. Notice in the example that the failure of CPU 0 would result in a significant shortage of available memory on the other processors.
The Processor Performance Chart also provides the following summary data for the entire node:
(15) CPU Count: The total number of processors.
(16) Avg CPU Busy (^): The average of the percentage busy time for all processors.
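The node summary figures can be derived from the per-processor PCT BUSY values. The Python sketch below shows the arithmetic for the most busy CPU, the least busy CPU, and Avg CPU Busy. The function name and the busy figures are hypothetical; the sketch does not attempt to reproduce how GPA decides that a processor is out of balance.

```python
def summarize_cpu_busy(busy_by_cpu):
    """Return (most_busy_cpu, least_busy_cpu, average_busy) for a node."""
    if not busy_by_cpu:
        raise ValueError("at least one processor is required")
    most_busy = max(busy_by_cpu, key=busy_by_cpu.get)
    least_busy = min(busy_by_cpu, key=busy_by_cpu.get)
    average = sum(busy_by_cpu.values()) / len(busy_by_cpu)
    return most_busy, least_busy, average


# Hypothetical PCT BUSY values keyed by CPU number.
pct_busy = {0: 50.0, 1: 42.5, 2: 40.0, 3: 20.0}
most, least, avg = summarize_cpu_busy(pct_busy)
print(f"CPU Count: {len(pct_busy)}  MOST BUSY CPU: {most}  "
      f"LEAST BUSY CPU: {least}  Avg CPU Busy: {avg:.1f}")
```

A large gap between the most and least busy processors, compared with the average, is the kind of spread the Processor Load Balance Chart makes visible.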
PAGE 45
Example 3-7. Disk Volume Performance Analysis DISK 2 1 VOLUME NAME 3 VOLUME 4 UNT CPU(S) NOS P:M PC:BC -------- --- ----$SYSTEM 0:1 0: 1 $NSMS 0:1 2: 3 $ClO 2:3 0: 1 $PROJEC 2: 2: 3 $DRIVER 4: 0: 1 $XPRESS 3: 2: 3 $MEAS 4:5 2: 3 $B40 5: 0: 1 $X42 6: 2: 3 CTL NUM --%01 %01 %01 %01 %01 %01 %01 %01 %01 5 PERFORMANCE 6 REQUEST QUEUE TIME ------25.11 24.26 19.19 114.23 26.49 51.38 56.45 0.00 0.00 ANALYSIS 7 AVG PROCESS DEVICE BUSY% BUSY% ----- ------18.91 7.25 0.00 6.36 4.22 2.62 0.00 5.63 1.93 0.
PAGE 46
these two old items (unit number and controller number) are obsolete on operating system G02 and later. Example 3-8. Disk Volume Performance Analysis: ServerNet Systems DISK 1 2 VOLUME NAME CPUS 3 VOLUME 4 5 GRP MOD SLT NUM NUM NUM PC:BC ------- ----- --- --- --$SYSTEM 0: 1 1 1 15 $NSMS 2: 3 2 1 15 $PROJEC 2: 3 2 1 7 $DRIVER 0: 1 3 4 11 $XPRESS 2: 3 4 1 15 $MEAS 2: 3 4 1 3 $G02 0: 1 3 1 11 $X42 2: 3 1 1 12 PERFORMANCE 6 REQUEST QUEUE TIME ------25.11 24.26 114.23 26.49 51.38 56.45 0.00 0.
PAGE 47
Example 3-9.
PAGE 48
Up to eight PINs can be configured for each logical disk volume.
(4) NET CHNG: The recommended change to the number of disk subprocesses for the logical volume. This change requires a SYSGEN.
(5) NEW PINS: The total number of PINs after the net change is applied.
This analysis considers CPU cycle availability, PCB availability, memory availability, and request traffic. Based on all of these factors, it recommends adding or deleting disk subprocesses.
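The relationship between NET CHNG and NEW PINS is simple addition, bounded by the eight-PIN limit. The Python sketch below shows that arithmetic only. The function name is hypothetical, and the lower bound of one PIN per volume is an assumption made for the illustration, not a figure taken from this manual.

```python
MAX_PINS = 8  # "Up to eight PINs can be configured for each logical disk volume."
MIN_PINS = 1  # Assumption for this sketch: a volume keeps at least one PIN.


def new_pin_count(current_pins, net_change):
    """Apply a recommended net change and check the result against the limits."""
    new_pins = current_pins + net_change
    if not MIN_PINS <= new_pins <= MAX_PINS:
        raise ValueError(f"{new_pins} PINs is outside {MIN_PINS} to {MAX_PINS}")
    return new_pins


print(new_pin_count(2, 1))   # one subprocess added
print(new_pin_count(4, -2))  # two subprocesses deleted
```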
PAGE 49
FREE: Free pages of memory owned by the memory manager.
(4) The information below is data for each disk volume for which the processor is the primary CPU, including:
(5) The identification number (in octal) of the volume’s controller with an arrowhead pointing toward the number of the backup processor.
(6) The name of the volume.
PAGE 50
00 00 01 02 03 --- 02 0 0 0.00 0.00 0 0 0.00 0.00 03 0 0 0.00 0.00 0 0 0.00 0.00 00 43 644184 568.00 568.00 757 775316 764.00 100.00 00 0 0 0.00 0.00 0 0 0.00 0.00 00 0 0 0.00 0.00 0 0 0.00 0.00 -- ------- ------------ ------- ------- ------- ------------ ------- -------- ------- ------------ ------- ------- ------- ------------ ------- ------43 644184 568.00 568.00 1677 1069996 1609.00 805.00 The line items are: (1) NODE: The logical name used in the path to the physical device.
PAGE 51
The following processes are exhibiting a $RECEIVE queue length that may indicate an application/application-configuration problem:
0,433  $X0BS6  $SYSTEM  SYS02  LOGIN  0.00%  MSG/s: 0  Q-LEN: 1.91
0,514  $X0BTA  $SYSTEM  SYS02  LOGIN  0.00%  MSG/s: 0  Q-LEN: 1.84
0,694  $X0BSP  $SYSTEM  SYS02  LOGIN  0.00%  MSG/s: 0  Q-LEN: 1.92
0,743  $X0BST  $SYSTEM  SYS02  LOGIN  0.00%  MSG/s: 0  Q-LEN: 1.
PAGE 52
(5) The CPU to which the process is to be moved. (6) The percentage busy time for the process. (This figure may be adjusted by the appropriate factor if the CPU to which the process is moved is of a different type than the original one.) (7) Number of pages in memory occupied by the process. (8) Whether the move refers to a process pair (designated by P or B) or a single process (designated by *).
PAGE 53
Example 3-15. Expected System Performance After Tuning ESTIMATED PERFORMANCE PROFILE AFTER TUNING CHANGES.
PAGE 54
statement mentions possible problems found by the analysis and makes recommendations for correcting them. This might include a hardware reconfiguration. • A performance analysis for each processor in the system with regard to its utilization, memory capacity, and failure impact on the remaining processors. This part of the statement section also points out potential problems and suggests ways of improving performance.
PAGE 55
(2) Parameter Value: The parameter value corresponds to the section-type that you specified when you set the PARAM. This entry will be either DETAIL or NOPRINT. The reports that are marked with an asterisk (*) can have a parameter value of SUMMARY, DETAIL, or NOPRINT.
Summary Section
Choosing the summary option produces a table describing the distribution of programs or process names within the class by count.
PAGE 56
name are standard Guardian system processes. (3) User-configured processes in the SYSTEM class are collapsed into a common name space to help make the section more readable. The question mark (?) character means any single character may replace it. For example, the user-configured process $TAP?? could include the process names $TAP01, $TAP02, $TAP23, and $TAPZZ. (4) The operating system image number that was assigned at the last SYSGEN is also shown in this section. In this case, its location is $SYSTEM.
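The question-mark convention for collapsed process names can be expressed as a position-by-position comparison. The Python sketch below is an illustration of that rule only; the function name is hypothetical and the code is not taken from GPA.

```python
def matches_collapsed_name(process_name, pattern):
    """True if process_name fits pattern, where ? stands for any single character."""
    if len(process_name) != len(pattern):
        return False
    return all(p == "?" or p == c for p, c in zip(pattern, process_name))


# The names from the example in the text.
for name in ("$TAP01", "$TAP02", "$TAP23", "$TAPZZ"):
    print(name, matches_collapsed_name(name, "$TAP??"))
```

Because each question mark stands for exactly one character, a name such as $TAP1 or $TAP001 does not fall under $TAP??.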
PAGE 57
Example 3-18.
PAGE 58
the class. (3) Receive Distribution Analysis: The Receive Distribution Analysis is similar to the Send Distribution Analysis except that it reports messages received by the class. (4) Queue Distribution Analysis: The Queue Distribution Analysis is a breakdown of a class by process name showing the relative length of time messages spent waiting in queue. (See the Glossary for a definition of a process’s queue length.
PAGE 59
Section 4: Using GPA Information Interpreting the Standard Report A GPA analysis can detect and point to a number of causes of poor or inefficient system performance. For some of the problems, such as a memory shortage or a load imbalance, GPA can make appropriate tuning recommendations and even, in some cases, provide the means for automatically implementing the recommendations.
PAGE 60
Node Characteristics The Node Characteristics subsection of the GPA report for \NODEA (Example 4-1) shows the following: • The node has a total of 4 processors, 32 megabytes of memory, and 9 disk volumes. • There were 264 static (steady-state) processes and 11 dynamic (transient) processes on the node during the measurement period. • The most utilized processor is CPU 1, the least utilized processor is CPU 3, and the average processor busy time is 36.2 percent.
PAGE 61
Example 4-2. Node Characteristics for \NODEB
I. NODE CHARACTERISTICS
                              MEASURED VALUE
CPU TYPE                    : TXP
CPU COUNT                   : 4
TOTAL MEMORY                : 32
VOLUME COUNT                : 9
STATIC PCBS                 : 332
DYNAMIC PCBS                : 19
OUT OF BALANCE CPU COUNT    : 1
MOST BUSY CPU               : 0
LEAST BUSY CPU              : 3
AVG CPU BUSY                : 38.1
AVG DISK VOLUME BUSY        : 3.4
AVG DISK PROCESS BUSY       : 3.9
% - RECOMMENDED RESOURCES   : 54.5
NODE SWAP RATE              : .0
MAX VALUE EXPECTED 16 84 512 800 0 0 70.00 24 % 24.
PAGE 62
shortage problem or that the disk processes are not properly distributed among the CPUs on the node. Example 4-1 shows that \NODEA has a swap rate of 2.3 pages/second due to a shortage of memory. Although the swap rate does not exceed 1 swap/second for \NODEC, GPA has analyzed the swap rate and shows that a lack of memory is the cause for the value of 0.5 swaps/second. • How many processors, if any, are out of balance. That is, how many processors are being relatively over- or underutilized.
PAGE 63
Figure 4-1. Response Time as a Function of CPU Utilization A rundown of the global performance indicators for \NODEA (Example 4-4) shows that the processor load balance on the system is poor and that there is excessive disk queuing. The cache performance, with an average cache hit percentage of 83.2, is also deficient. However, the overriding problem on this node is a serious shortage of memory, as indicated by the node swap rate and the memory subsystem performance score (discussed later). Example 4-4.
PAGE 64
The global performance indicators for \NODEB (Example 4-5) show that the health of the system is generally good except for processor load balance, which GPA considers only fair, and disk volume queuing, which GPA found excessive in some cases. The disk cache performance is acceptable, although it shows potential for improvement. Example 4-5. Global Performance Indicators for \NODEB III.
PAGE 65
Example 4-6. Global Performance Indicators for \NODEC
III. GLOBAL PERFORMANCE INDICATORS
EXCESSIVE DISPATCHING  : NO         INDEX LEVELS > 2      : NO
PROCESSOR LOAD BALANCE : POOR       OVER UTILIZED DISK    : NO
OVER UTILIZED NODE     : NO         CACHE FAULT DETECTED  : NO
OVER UTILIZED CPU      : NO         TRANSIENT PROCESSING  : NO
DISK VOLUME QUEUING    : NO         BLOCKED REQUESTS      : NO
AVERAGE CACHE HIT %    : 88.93%     TOTAL CACHE CALL RATE : 461.
PAGE 66
rates the relative performance of the node’s subsystems with scores ranging from 0 to 100 percent. The performance scores for \NODEA (Example 4-8) clearly show how poorly some of the subsystems are performing, mainly because of the node’s severe memory shortage. Example 4-8. System Performance Indicators for \NODEA IV. SYSTEM PERFORMANCE SCORE 1. 2. 3. 4. 5.
PAGE 67
* Average Score rated: BEST = 100, WORST = 0. Process and Busy Distribution Analyses As you look at the Process and Busy Distribution Analyses for \NODEC (Example 4-11), you can see that the SYSTEM processes are fairly evenly distributed among the four processors in the system but that the distribution could be improved. You can also see a relatively large spread in the Busy Distribution for the SYSTEM class, especially between CPU 1 and CPU 2.
PAGE 68
Processor Load Balance and Performance Charts The next major section of the GPA report (Example 4-12 and Example 4-13) contains two charts that give you a more detailed picture of the system based on the performance of the system’s processors. The first chart shows you graphically how the system load is distributed among the processors. Abnormal conditions such as a memory shortage or excessive transient processing are also indicated here (by flags following the CPU numbers).
PAGE 69
From Example 4-13 you can see that on \NODEB, the least utilized processor, CPU 3, is out of balance. Since the percentage busy times for the other three processors fall within the target utilization for the system (discussed later in this section), GPA considers these processors to be in balance. The next chart in this section of the report identifies each of the processors on the node with respect to number and type.
PAGE 70
Example 4-13.
PAGE 71
Disk Volume Performance Analysis To track the performance of the disk volume subsystem, you look at the Disk Volume Performance section (Example 4-14). Here you can see how the volumes are configured with respect to primary and backup CPUs as well as primary disk controllers. You can also see how the volumes compare with regard to a number of performance parameters.
PAGE 72
Example 4-15.
PAGE 73
The Disk Subprocess Analysis also gives you insight into the type of disk activity that is occurring on each volume. For example, you can see that the first PIN on $DATA has a very high net message rate, yet its second PIN is only moderately high. This shows that the first PIN is able to handle many I/O requests, and that these requests, on average, are relatively inexpensive. $AUDIT contrasts with $DATA in that its second PIN is handling more messages than its first. Example 4-16.
PAGE 74
$SYSTEM 7.2/ 824/ 13 $NSMS 14.7/ 1225/ 5 $ClO 2.6/ 227/ 4 $PROJEC 6.3/ 257/ 1 $DRIVER 0.8/ 188/ 1 $XPRESS 1.7/ 295/ 1 $B40 0.1/ 84/ 0 $MEAS 0.5/ 209/ 0 $X42 0.5/ 211/ 0 You can also readily tell from the diagram what portion of any given processor’s total activity is accounted for by each volume on the processor.
PAGE 75
“After Primary Changes” Sections When GPA analyzes a system, it considers whether making changes in the location of the primary disk process for each logical volume will help the overall performance of the system. If GPA determines that the performance of the system will be improved, and that there are sufficient resources available on the CPU with the backup disk process, GPA recommends changing the primary disk process to the location of the backup.
PAGE 76
Q - Moderate Memory Shortage
! - Processor MISSING From Measurement
R - Moderate Transient Processing
S - Moderate Transients/Mem. Short
Example 4-20. Processor Load Balance and Performance Charts for \NODEB After Primary Changes
ESTIMATED PERFORMANCE PROFILE AFTER PRIMARY CHANGES.
PAGE 77
Example 4-21.
PAGE 78
BUSY FREE 62.2 19576 PG 44.6 9641 PG 69.4 43764 PG 58.3 14381 PG GMS110.03.003>01 $DATA 5.4/ 1098/ 15 GMS111.03.003>02 $XL80E3 1.8/ 3962/ 16 GMS120.03.003>03 $XL80A1 3.7/ 2355/ 33 0201 $AUDIT 4.0/ 126/ 0 $DATA1 0.0/ 227/ 0 $V80A2 0.4/ 173/ 4 GMS120.03.004>03 $XL80A3 1.9/ 1848/ 13 00
PAGE 79
GMS110.03.005>03 $XL80C3 1.9/ 5124/ 17 GMS120.03.003>03 $V80A3 3.7/ 1549/ 24 0003 $XL80D1 0.0/ 85/ 0 $XL80D2 0.0/ 85/ 0 $XL80D3 0.0/ 90/ 0 01
PAGE 80
Totals 2 processes 9.04 601 To correct a processor utilization imbalance on \NODEB (Example 4-25), GPA has recommended that ten processes running on CPU 1 be moved and distributed as shown among the other three processors in the system.
PAGE 81
Example 4-25. Process Move Recommendations for \NODEB
Process Move Recommendations
Primary Issue: CPU OUT OF BALANCE
CPU: 1
OVER BUSY BY:
Action  Fm Cpu  Pin   Process Name
------  ------  ----  ------------
MOVE    1       51    $PlAX
MOVE    1       59    $PlCl
MOVE    1       61    $PlS8
MOVE    1       92    $PlSA
MOVE    1       62    $PlFl
MOVE    1       52    $PlS7
MOVE    1       65    $PPSl
MOVE    1       58    $PlS4
Totals  8
23.
PAGE 82
Expected System Performance After Tuning Changes
The next section of the report (Example 4-27 and Example 4-28) shows two charts that indicate what effect implementing the GPA tuning recommendations would have on system performance. By comparing the two sections of the report, you can see exactly how the system load would be rebalanced and how other performance parameter values would change.
PAGE 83
! - Processor MISSING From Measurement
S - Moderate Transients/Mem. Short
Example 4-28.
PAGE 84
Example 4-29 is a cover page to the optional report for \NODEC that shows the user’s choice of sections. Each class chosen is listed with an entry specifying the type of section. In the example, detailed sections are chosen for every class. Example 4-29.
PAGE 85
$ZA000 -------TOTALS 1 1 --- --- --- --70 70 52 52 2 ---244
PAGE 86
Example 4-31 shows the additional information you receive when you specify the detail option for the SYSTEM class. From the Busy Distribution Analysis in Example 4-31, you can find what percentage of the system resources is being used by the processes in the class. Note that in the example, the percentages add up to 100%, so the data can be described as “normalized” within the class. For example, the primary disk process (DISK-P) in CPU 0 consumed 17% of the CPU cycles used by the SYSTEM class.
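A small sketch can make the "normalized within the class" idea concrete. The function name and the input layout (raw busy values per process and per CPU) are assumptions for illustration. GPA's own input format is not shown here.

```python
def normalize_within_class(busy):
    """Convert raw busy values to percentages of the class total.

    busy maps a process name to a list of raw busy values, one per CPU
    (hypothetical layout). The returned values sum to 100 across the whole
    class, which is what "normalized within the class" means here.
    """
    total = sum(sum(row) for row in busy.values())
    if total == 0:
        # Nothing ran in this class; report zeros instead of dividing by zero.
        return {name: [0.0] * len(row) for name, row in busy.items()}
    return {
        name: [100.0 * value / total for value in row]
        for name, row in busy.items()
    }
```

Each resulting cell answers the question posed in the text: of all the CPU cycles used by the class, what share was consumed by this process on this CPU.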
PAGE 87
-------DISK-P MONITOR DISK-B -------TOTALS --- --- --- --39 1 23 4 9 6 6 6 SYSTEM Queue Distribution Analysis Process Name -------DISK-P MONITOR VIRTUAL TMP Z0 MSENGER UnNamed -------TOTALS Cpu Cpu Cpu Cpu #00 #01 #02 #03 --- --- --- --29 2 21 3 9 6 6 6 4 4 1 1 1 --- --- --- --45 11 32 10 ---69 29 1 ---99 --- --- --- --49 8 29 11 - Descending QueueLen Class % Cls Pct ---56 29 4 4 1 1 1 ---99 Optional Report Section for the SERVER Class A summary section for the SERVER class is shown in Example 4-32
PAGE 88
program while the other CPUs have two or three. The busy distribution analysis shows that this program is using only 3% of the class’s cycles in CPU 2. To fine-tune this system, you might consider moving one copy of this program from CPU 1 to CPU 2. Example 4-33.
PAGE 89
-------TOTALS --- --- --- --30 21 25 21 SERVER Queue Distribution Analysis Program Name -------VISAOT RCAPS510 EMULD RCAPS550 RCAPS640 RCAPS610 -------TOTALS Cpu Cpu Cpu Cpu #00 #01 #02 #03 --- --- --- --10 10 10 10 5 3 2 4 4 4 3 4 3 5 1 3 2 2 1 --- --- --- --27 26 19 26 ---99 - Descending QueueLen Class % Cls Pct ---42 16 16 14 6 2 ---99 Implementing GPA Tuning Recommendations On the basis of its analysis, GPA makes the following kinds of explicit tuning recommendations: • Moving primary disk proc
PAGE 90
• Check applications for possible bottlenecks.
• Examine the data communications environment for sources of trouble.
• Analyze the system hardware to determine if it provides an optimum configuration.
For system tuning, you need to keep in mind that GPA is meant to serve mainly as a guide and to provide a quick and convenient performance analysis. For the best long-term solutions to system performance problems, you need to have properly trained personnel do a careful and detailed analysis.
PAGE 91
Example 4-35.
PAGE 92
Example 4-36. Negative Contributing Factors to CPU Score DETAIL 1Negative Contributing Factors to CPU Score 2 3 3 3 4 5 6 Negative Factors --------Hi Swap Memory Mem/Tr. Trans. Disp. OofBal Overbsy --------- NEGATIVE Hi Swap Mem/Tr. Disp.
PAGE 93
Negative Contributing Factors to Memory Subsystem Score
Example 4-37 is a detailed report displaying the negative factors that contribute to the memory subsystem score. Example 4-37.
PAGE 94
$PROD7     + 91
$PROD8     + 93
$SYSTEM    + 50
$TEST1     + 81
$TEST2     + 78
---------  -------
Disk Cache Score: 823 / 11 = 75%
Notes on Disk Cache Performance Score:
Good = 90% thru 100%
Fair = 75% thru 89%
Poor = 2% thru 74%
Negative Contributing Factors to Disk Volume Score
This subsection of the score report clarifies the subsection Disk Volume Subsystem Performance Score of the System Performance Summary. The disk volume subsystem score is based on the performance and queue time of all disks.
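The disk cache score line on this page (823 / 11 = 75%) reads as a total of per-volume scores divided by the number of volumes and then mapped onto the Good/Fair/Poor bands. The following sketch reproduces that arithmetic. The function names are hypothetical, and rounding to the nearest whole percent is an assumption inferred from 823 / 11 (about 74.8) being reported as 75%.

```python
def disk_cache_score(total_of_volume_scores, volume_count):
    """Average the per-volume scores and round to a whole percent.

    Rounding to nearest is an assumption; the report only shows the result.
    """
    if volume_count <= 0:
        raise ValueError("volume_count must be positive")
    return round(total_of_volume_scores / volume_count)


def rating(score_pct):
    """Map a score onto the bands listed in the report notes.

    Good = 90 thru 100, Fair = 75 thru 89, and anything lower is
    treated as Poor in this sketch.
    """
    if score_pct >= 90:
        return "Good"
    if score_pct >= 75:
        return "Fair"
    return "Poor"
```

With the figures from the example, `disk_cache_score(823, 11)` gives 75, which falls at the bottom of the Fair band.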
PAGE 95
higher than the limit is one that has a blocked request problem.
(2) Hiproc: A disk with a request rate per second higher than the request rate limit and a high queue time is a disk with a high request rate.
(3) Indexd: An index problem is detected when a disk has a high queue time and cache calls per request greater than the cache call ratio limit.
(4) Memory: This problem is detected when a disk has a high queue time and a shortage of memory.
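The rules for factors (2) through (4) can be sketched as a simple classifier. The `DiskSample` type, its field names, and the limit parameters are assumptions for illustration; the actual limit values and the test GPA uses for "high queue time" are not given in this excerpt. The blocked request factor is left out because its definition is cut off above.

```python
from dataclasses import dataclass


@dataclass
class DiskSample:
    """Hypothetical per-disk measurements (not a GPA or Measure record)."""
    name: str
    queue_time_high: bool           # assumed to be decided elsewhere
    request_rate: float             # requests per second
    cache_calls_per_request: float
    memory_short: bool


def negative_factors(disk, request_rate_limit, cache_call_ratio_limit):
    """Return the negative factors (Hiproc, Indexd, Memory) that apply."""
    factors = []
    # All three rules in the text require a high queue time.
    if not disk.queue_time_high:
        return factors
    if disk.request_rate > request_rate_limit:
        factors.append("Hiproc")
    if disk.cache_calls_per_request > cache_call_ratio_limit:
        factors.append("Indexd")
    if disk.memory_short:
        factors.append("Memory")
    return factors
```

A disk can carry more than one factor at once in this sketch, for example a high request rate together with an index problem.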
PAGE 96
The negative factors that lower the system recovery performance score are the following:
(1) CPU: A processor that can create a CPU cycle shortage on the system; when the CPU fails, it adds a negative score to the total score.
(2) Memory: A processor that can introduce a memory shortage on the system; when the CPU fails, it adds a negative score to the total score.
CPU Failure Simulation
Example 4-41 is a detailed report displaying CPU failure simulation.
PAGE 97
Example 4-42.
PAGE 98
Example 4-43 is a detailed report displaying the Server Process Analysis for CPU #04. Example 4-43.
PAGE 99
Example 4-44. Dynamic Server Process Analysis
Dynamic Server Process Detail Analysis
<----------------------- A V G ---------------------
1 2 3 4 5 6 7 8
Program File Name                      Mem    Cpu    Disp   Msg    R/S
Volume   Subvol   File      Cnt  Pri   Pages  Pct    Rate   Rate   Ratio
------------------------    ---  ---   -----  -----  -----  -----  -----
$SYSTEM  SPAN     LOGGER    1    140   13     0.12   1.0    0.2    0.0
$SYSTEM  SPAN     PASSTHRU  1    132   10     0.00   0.0    0.0    0.8
$SYSTEM  SPANNET  NETWORK   2    150   20     0.06   0.3    0.0    1.0
$SYSTEM  SYSTEM   USERTCP   1    131   164    0.03   0.0    0.0    13.
PAGE 100
$PROD8 $SYSTEM $TEST1 $TEST2 $TEST3 3 1 0 0 1 4 1 5 5 5 1 1 1 1 1 3 11 5 1 11 6 2 1 1 ------TOTAL 62 166 87 80 80 80 ----1649 214 114 84 89 104 ----1903 48 27 4 9 24 -----254
The disk cache change analysis factors are the following:
(1) VOLUME Name: The name of the disk.
(2) CPU NUM: The identification number of the primary processor for the volume after primary change (if there is a primary change).
(3) GRP NUM: The group number of the physical location address.
PAGE 101
(2) Old Discs: The original disk.
(3) Old Reqs: The original disk requests.
(4) Old Cache: The original cache pages.
(5) New Discs: The new disk currently handled.
(6) New Reqs: The new disk requests.
(7) New Cache: The new cache pages.
(8) Delta Cache: The difference between the new and old cache pages.
(9) Delta Discs: The difference between the number of the original disk and the number of the new disk currently being handled.
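The two delta columns follow directly from the old and new values. The sketch below assumes the sign convention new minus old, which appears consistent with the totals in the preceding example excerpt (1649 old cache pages, 1903 new, delta 254). The `CacheChangeRow` type and its field names are hypothetical, not part of GPA.

```python
from dataclasses import dataclass


@dataclass
class CacheChangeRow:
    """Hypothetical row holding factors (2) through (7) of the report."""
    old_discs: int
    old_reqs: int
    old_cache: int
    new_discs: int
    new_reqs: int
    new_cache: int

    @property
    def delta_cache(self):
        # Factor (8): assumed to be new cache pages minus old cache pages.
        return self.new_cache - self.old_cache

    @property
    def delta_discs(self):
        # Factor (9): assumed to be new disk count minus old disk count.
        return self.new_discs - self.old_discs
```

A positive `delta_cache` then means the row would hold more cache pages after the change, and a negative value means fewer.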
PAGE 102
Example 4-47. Error Report
****************************************************************
GPA
****************************************************************
GPA0B0800 CPU data contains information from multiple nodes.
GPA0B0800 This situation indicates that multiple measurements
GPA0B0800 have been loaded into the same CPU file prior to
GPA0B0800 the execution of the compiled queries. Perform a
GPA0B0800 FUP purgedata CPU, reload the CPU file and rerun
GPA0B0800 the OBEYTUNE command file.
PAGE 103
Glossary
#OUT
The GPA output file.
blocked request
A request that cannot be processed because another application has blocked access to a record or file.
cache
A portion of memory used to store frequently-accessed information in order to save the time otherwise required for disk I/O operations.
cache call
A request for disk data expected to be found in cache.
cache fault
An event that occurs when a disk process expects to find a data block in cache and discovers that the memory manager has removed it.
PAGE 104
GPA
The program module that performs the system analysis.
index level
In a key-sequenced file, the B-tree structure for minimizing access time consists of one or more index levels.
MEASCOM
The command interface used to access Measure.
MEASFH
A Measure file handler process that builds counter records from the data in the measurement data file.
Measure
The NonStop system performance monitoring product.
message
Information sent by one process to another process. This may be a request for service or data.
PAGE 105
PIN
A Process Identification Number is the numeric value that identifies a process running on a CPU.
PUPBAK
A PUP command file that restores a system to its original measured state.
PUPIN
A PUP command file that contains PUP PRIMARY and PUP SETCACHE commands for implementing GPA tuning recommendations with regard to disk and cache.
PUP PRIMARY command
A command used to assign the primary CPU for a disk process.
PUP SETCACHE command
A command used to allocate or change cache blocks for a disk volume.
PAGE 106
SCFIN
An SCF command file that contains SCF PRIMARY and SCF ALTER DISK, CACHE commands for implementing GPA tuning recommendations with regard to disk and cache.
static process control block
A Process Control Block (PCB) dedicated to a single process for the duration of the measurement period analyzed by GPA.
system recovery
The condition whereby a system continues to function normally when a processor failure occurs.