HP Diagnostics Guide V2500 Server First Edition A5075-96006 HP Diagnostics Guide: V2500 Server Customer Order Number: A5075-90006 December 1998 Printed in: USA
Revision History Edition: First Document Number: A5075-90006 Remarks: Initial release. December, 1998. Notice Copyright Hewlett-Packard Company 1998. All Rights Reserved. Reproduction, adaptation, or translation without prior written permission is prohibited, except as allowed under the copyright laws. The information contained in this document is subject to change without notice.
Contents
Preface
Notational conventions
1 Introduction
Utilities board
FPGA configuration and status
Board over-temperature
Fan sensing
Power failure
MidPlane Interface Board (MIB) power failure
48-Volt maintenance
LCD messages
Node status line
Processor status line
Message display line
Console messages
Main menu
Test Configuration menu
Example of running diagnostics from Test Controller command line
Configuration
Selecting classes and subtests
Starting tests
Teststation setup
pdcfl commands
7 cpu3000
cpu3000 classes and subtests
cpu3000 classes
Error messages
Type one error format
Type two errors
Type three errors
Notes on mem3000
10 Scan test
address decode
AutoRaid recovery map (arrm)
Starting arrm
Failure to open and recovery
consolebar
ver
Event processing
event_logger
log_event
Miscellaneous tools
Figures
Figure 1 Location of the Utilities board
Figure 2 Utilities board
Figure 3 System displays
Figure 4 Front panel LCD
Figure 39 V2500 DIMM locations
Figure 40 Format of parameter 6
Figure 41 Format of parameter 7
Figure 42 Type one error message format
Tables
Table 1 Environmental conditions monitored by the SMUC and power-on circuit
Table 2 Processor initialization steps
Table 41 io3000 Class 16 subtests
Table 85 kill_by_name options
Table 86 sppdsh parameters
Table 87 Valid COP IDs
Table 88 System rings to alternate names
Table 89 List of diagnostics
Preface
This document describes the offline diagnostics for V2500 servers. It is not intended as a tutorial or troubleshooting guide; it is a reference guide that describes all utilities and scripts used to troubleshoot these systems.

Notational conventions
This section describes the notational conventions used in this book.

bold monospace
In command examples, bold monospace identifies input that must be typed exactly as shown.
Brackets ( [ ] )
In command examples, square brackets designate optional entries.

Curly brackets ({}), Pipe (|)
In command syntax diagrams, text surrounded by curly brackets indicates a choice. The choices available are shown inside the curly brackets and separated by the pipe sign (|). The following command example indicates that you can enter either a or b:

command {a | b}

Keycap
Keycap indicates the keyboard keys you must press to execute the command example.
1 Introduction This chapter presents an overview of the diagnostic mechanism for V2500 servers.
Utilities board
The diagnostic mechanism in the V2500 servers is centered on the Stingray Core Utilities board (SCUB). The SCUB is mounted under the MidPlane Interconnect board (MIB) toward the front of the system. See Figure 1.
Figure 1 Location of the Utilities board (callouts: Power board, MidPlane, Utilities board)
The following devices connect to the Utilities board:
• Core logic bus
• Environmental sensors
• Test points
• Liquid crystal display (LCD)
• Attention lightbar
• Teststation

The teststation connects to the system via the ethernet and RS232 connections. It is used to configure and run diagnostics on the system. A system will boot and operate without a teststation, and failure of the teststation will not cause interruption of the system.
The microprocessor-controlled JTAG interface captures incoming command packets and sends out scan information packets across the ethernet connection to the teststation. Through the teststation connection, one can read and write every CSR in the system.
Core logic
The core logic contains initialization and booting firmware and is described in the following sections.

Flash memory
The core logic contains four MBytes of electrically erasable programmable read-only memory (EEPROM) that stores the Processor-Dependent Code (PDC). PDC consists of Power-On Self Test (POST) and Open Boot PROM (OBP). The V2500 server uses these two components plus additional firmware called spp_pdc, which is layered over OBP and interfaces OBP to HP-UX.
Console ethernet
The ethernet I/O port provides a connection to the teststation over LAN1.

Attention lightbar and LCD
The attention lightbar displays environmental information, such as the source of an environmental error that caused the Utilities board to power down the node. The liquid crystal display provides basic system information. The core logic drives the LCD through the parallel port on the DUART.
SMUC environmental monitoring
The following environmental conditions are monitored:
• ASIC installation error sensing
• FPGA configuration and status
• Thermal sensing
• Fan sensing
• Power failure sensing
• 48-V failure
• 48-V maintenance
• Ambient air temperature sensing
Condition             Type                    Action
Ambient air warm      Environmental warning   LED indication, interrupt
48-Volt maintenance   Environmental warning   LED indication, interrupt
Hard error            Hard error              LED indication, interrupt

Environmental condition detected by power-on function
The power-on function detects environmental errors (such as ASIC Not Installed OK or FPGA Not OK). It does not turn on power to the node until the conditions are corrected.
The environmental error interrupt and the 1.2-second delay give the system adequate time to read CSRs to determine the cause of the error, log the condition in NVRAM, and display the condition on the attention lightbar. After the system is powered down, the Utilities board remains powered up, but all of its outputs are disconnected from the system.

Environmental control
The Utilities board performs the following functions to control the node environment.
Teststation interface
The teststation can be a PA-RISC-based workstation. The interface to the teststation is an ethernet AUI port for flexibility in connecting to many workstations. It is also easily expandable.

DC test of a node
To perform the DC test, the Test Bus Controller (TBC) first scans data to all boards in a node. Then each JTAG device performs a capture step that completes the movement of the test data from the driver to the receiver.
System displays
The V2500 server provides two means of reporting status and errors: an LCD and an Attention light bar.

Figure 3 System displays (callouts: DC OFF, CONSOLE ENABLE, CONSOLE SECURE, DC ON, TOC, LCD display, Attention light bar)

Front panel LCD
The front panel LCD is a 20-character by 4-line liquid crystal display, as shown in Figure 4.
Figure 4 Front panel LCD

    0 (0,0)
    MIII IIII IIII IIII
    IIII IIII IIII IIII
    abcedfghijklr

When the node key switch is turned on, the LCD powers up but is initially blank. Power-On Self Test (POST) then starts displaying output on the LCD. The output shown in Figure 4 is described below.

Node status line
The node status line shows the node ID in both decimal and X, Y topology formats.
Step   Description
6      Processor internal register final initialization.
7      Processor basic instruction set testing. (optional)
8      Processor basic instruction cache testing. (optional)
9      Processor basic data cache testing. (optional)
a      Processor basic TLB testing. (optional)
b      Processor post-selftest internal register cleanup. (optional)

Table 3 Processor run-time status codes
Status   Description
R        RUN: Performing system initialization operations.
Table 4 Message display line
Message display code   Description
a                      Utilities board (SCUB) hardware initialization.
b                      Processor initialization/selftest rendezvous.
c                      Utilities board (SCUB) SRAM test. (optional)
d                      Utilities board (SCUB) SRAM initialization.
e                      Reading Node ID and serial number.
f                      Verifying non-volatile RAM (NVRAM) data structures.
g                      Probing system hardware (ASICs).
h                      Initializing system hardware (ASICs).
i                      Probing processors.
Attention light bar
The Attention light bar is located at the top left corner on the front of the HP 9000 V2500 server, as shown in Figure 3. This light bar displays system status in three ways:
• Off: system powered down
• Steady on: system powered up
• Flashing: error condition

The SMUC prioritizes system environmental errors and warnings and passes the information to the power-on circuit.
ATTN bit   Attention light bar
1          26-2F    48-V error, NPSLR failure, PWRUP=0-9
1          30-39    48-V error, no supply failure, PWRUP=0-9
1          3A       48-V yo-yo error
1          3B       MIB power failure (PB)
1          3C       Clock failure
1          3D-3F    Not used (3)
1          40-47    MB0-MB7 power failure
1          48-4F    PB0L, PB1R, PB2L, PB3R, PB4L, PB5R, PB6L, PB7R power failure
1          50-57    PB0R, PB1L, PB2R, PB3L, PB4R, PB5L, PB6R, PB7L power failure (possibly switch R and L)
1          58-5B    IOB (LF,LR,RF,RR) power failure
1
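As an illustration only, the rows above can be turned into a small lookup. This is a sketch under stated assumptions, not an HP tool: the function name is hypothetical, and it assumes that codes within a range map to the listed items in ascending order (for example, 26 to PWRUP=0 and 40 to MB0), which the table implies but does not state. Codes outside the rows shown are reported as not covered.

```python
# Hypothetical helper: decode attention light bar codes from the rows shown above.
# Assumption: within each range, codes map to the listed items in ascending order.

PB_48_4F = ["PB0L", "PB1R", "PB2L", "PB3R", "PB4L", "PB5R", "PB6L", "PB7R"]
# The source notes that R and L may be switched for the 50-57 range.
PB_50_57 = ["PB0R", "PB1L", "PB2R", "PB3L", "PB4R", "PB5L", "PB6R", "PB7L"]
IOB_58_5B = ["LF", "LR", "RF", "RR"]


def decode_attention_code(code: int) -> str:
    """Return the description for a light bar code (given as an integer)."""
    if 0x26 <= code <= 0x2F:
        return f"48-V error, NPSLR failure, PWRUP={code - 0x26}"
    if 0x30 <= code <= 0x39:
        return f"48-V error, no supply failure, PWRUP={code - 0x30}"
    if code == 0x3A:
        return "48-V yo-yo error"
    if code == 0x3B:
        return "MIB power failure (PB)"
    if code == 0x3C:
        return "Clock failure"
    if 0x3D <= code <= 0x3F:
        return "Not used"
    if 0x40 <= code <= 0x47:
        return f"MB{code - 0x40} power failure"
    if 0x48 <= code <= 0x4F:
        return f"{PB_48_4F[code - 0x48]} power failure"
    if 0x50 <= code <= 0x57:
        return f"{PB_50_57[code - 0x50]} power failure"
    if 0x58 <= code <= 0x5B:
        return f"IOB {IOB_58_5B[code - 0x58]} power failure"
    return "code not covered by the rows shown"


if __name__ == "__main__":
    for value in (0x2F, 0x3A, 0x41, 0x49, 0x5B):
        print(f"{value:02X}: {decode_attention_code(value)}")
```

The table remains the authority; the lookup only restates it.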
SCUB 3.3-Volt error
This error indicates that the SCUB 3.3-Volt power supply has failed, but the 5-Volt supply has not.

ASIC installation error
Each ASIC in the node has ASIC Install lines to prevent power-up if an ASIC is installed incorrectly (such as a SPAC installed in an ERAC position). If an ASIC is improperly installed, the Utilities board does not power up the system. This condition is not monitored after power-up.
FPGA configuration and status
The SMUC is programmed by a serial data transfer from EEPROM when the Utilities board powers up. If the transfer does not complete properly, the SMUC cannot configure itself and many environmental conditions cannot be monitored. The power-on circuit monitors both the SMUC and SPUC and does not power up the system if they are not configured correctly.
the SMUC, which reports the environmental warning to the processors. The power-on circuit displays the “highest priority” 48-Volt supply that failed.

Ambient air sensors
The ambient air sensors detect when the input air stream to the Utilities board (and therefore to the entire node) is too warm or too hot. Ambient air too warm is an environmental warning; ambient air too hot is an environmental error that powers down the system.
2 Configuration management
The teststation allows the user to configure the node using the ts_config utility. ts_config configures the teststation to communicate with the node. The teststation daemon, ccmd, monitors the node and reports back configuration information, error information, and general status. ts_config must be run before using ccmd. Two additional utilities, sppdsh and xconfig, allow configuration information to be read and changed. OBP can also be used to modify the configuration.
Teststation
The teststation is used for configuring, monitoring, testing, and error logging. It is not required for normal operation of a node. The teststation communicates with the JTAG interface in the nodes. The JTAG port remains idle if no teststation is connected to it. When a teststation is connected, the port receives communications packets, interprets requests, and generates responses to them.
ts_config

ts_config [-display display name]

V2500 nodes added to the teststation must be configured by ts_config to enable diagnostic and scan capabilities, environmental and hard-error monitoring, and console access. Once the configuration for each node is set, it is retained when new teststation software is installed.
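The synopsis above can be sketched as a small argument builder. This is only an illustration: the function name and the display value "myws:0.0" are hypothetical, and the sketch builds the argument list without running anything, since ts_config exists only on a teststation.

```python
# Hypothetical helper: build the ts_config command line from the synopsis
#   ts_config [-display display name]
from typing import List, Optional


def ts_config_argv(display: Optional[str] = None) -> List[str]:
    """Return the argument list; -display is added only when a display is given."""
    argv = ["ts_config"]
    if display is not None:
        argv += ["-display", display]
    return argv


if __name__ == "__main__":
    print(ts_config_argv())            # no option: rely on the DISPLAY variable
    print(ts_config_argv("myws:0.0"))  # "myws:0.0" is an example display name
```

On a teststation, the resulting list could be handed to `subprocess.run`.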
NOTE For shells that are run from the teststation desktop, the DISPLAY variable is set (at shell start-up) to the local teststation display.

ts_config operation
The ts_config utility displays an active list of nodes that are powered up and connected to the teststation diagnostic LAN. The operator selects a node and configures the selected node. A sample display is shown below.
The ts_config window title includes, in parentheses, the name of the effective user ID running ts_config, either root or sppuser. The ts_config display shows the configuration status of the nodes. Table 6 shows the possible status values.
Configuration Status: Active
Description: The node is configured and answering requests on the Diagnostic LAN.
Action Required: None required. This is the desired status.

Configuration Status: Inactive
Description: The teststation node configuration file contains information about the specified node, but the node is not responding to requests on the Diagnostic LAN. This status is also shown if a node was configured and then removed from the teststation LAN without being deconfigured.
Upgrade JTAG firmware
Step 1. Select the node from the list in the display panel. For example, clicking on node 0 in the list highlights that line as shown in Figure 6.

Figure 6 ts_config show node 0 highlighted

Notice that after the node has been highlighted, ts_config displays information concerning the node. In this step, it tells the user what action to take next, “This node’s JTAG firmware must be upgraded.
Figure 7 ts_config “Upgrade JTAG firmware” selection

Step 3. A message panel like the one shown in Figure 8 appears. Read the message. If this is the desired action, click “Yes” to begin the upgrade.

Figure 8 Upgrade JTAG firmware confirmation panel

Step 4. After the firmware is loaded, a panel like the one shown in Figure 9 appears. Click “OK” and then power-cycle the node to activate the new firmware.
Figure 9 ts_config power-cycle panel

When the node is powered up, the “Configuration Status” should change to “Not Configured.”

Configure a Node
Step 1. Select the desired node from the list of available nodes. When the node is selected, the appropriate line is highlighted as shown in Figure 10. Notice that the bottom of the display indicates that Node 0 is not configured and provides the steps necessary to configure the node.
Figure 11 ts_config “Configure Node” selection

After invoking ts_config to configure the node, a node configuration panel like the one in Figure 12 appears.

Figure 12 ts_config node configuration panel

Step 3. Enter a name for the V2500 System. The teststation uses this name as the “Complex Name” and to generate the IP hostnames of the Diagnostic and OBP LAN interfaces.
Step 4. Select an appropriate serial connection for the V2500 console from the pop-down option menu in the node configuration panel. ts_config automatically assigns the first unused serial port. If the terminal mux has been configured, the terminal mux ports are included in the list of available serial connections. The IP address information for the Diagnostic interface is provided.
Figure 14 ts_config indicating Node 0 is configured

Step 7. Restart the Workspace Manager: Click the right mouse button on the desktop background to activate the root menu. Select the “Restart” or “Restart Workspace Manager” option, then “OK” to activate the new desktop menu.

NOTE If adding multiple nodes to the teststation, wait until the final node is added before restarting the Workspace Manager.

Configure the “scub_ip” address
Step 1.
Figure 15 ts_config “Configure ‘scub_ip’ address” selection

ts_config checks the scub_ip address stored in NVRAM in the node. If the scub_ip address is correct, no action is required. If the node is not detected and scanned by ccmd, ts_config may ask you to try again later. The ccmd detection scan process should take less than a minute.

Step 3. If prompted by ts_config (as indicated by the panel in Figure 16), click “Yes” to correctly set the scub_ip address.
Figure 17 ts_config scub_ip address set confirmation panel

Initiate a node reset to activate the new scub_ip address.

Reset the Node
Step 1. Select the desired node from the list of available nodes.
Step 2. Select “Actions,” then “Reset Node.” This is indicated in Figure 18.

Figure 18 ts_config “Reset Node” selection

A panel like the one shown in Figure 19 appears.
Figure 19 ts_config node reset panel

Step 3. In the Node Reset panel, select the desired “Reset Level” and “Boot Options,” then click “Reset.”

Deconfigure a Node
Deconfiguring a node removes the selected node from the teststation configuration. The teststation will no longer monitor the environmental and hard-error status of this node. Console access to the node is also disabled.
Step 1. Select the desired node from the list of available nodes.
Step 2.
Figure 20 ts_config “Add/Configure Terminal Mux” selection

A panel like the one shown in Figure 21 appears. This panel requires the terminal mux IP address.

Figure 21 ts_config terminal mux IP address panel

Step 3. Connect a serial cable from serial port 2 on the teststation to port 1 on the terminal mux.
Step 4. Enter the desired “Terminal Mux IP Address” and click “Configure,” as indicated in Figure 22.
Figure 22 Terminal mux IP address entered into panel

Remove terminal mux
ts_config does not remove the terminal mux if any node consoles are assigned to terminal mux ports.
Step 1. Select “Actions,” then “Configure Terminal Mux.”
Step 2. Select “Remove Terminal Mux,” then click “Yes.
Teststation-to-system communications
This section describes how the teststation communicates with the system using the utilities presented in Chapter 11, “Utilities.” Figure 23 depicts the V-Class server to teststation communications using HP-UX.
The hardware components located on the SCUB are shown in the diagram on the left side of the node or system. They include three ethernet ports and one DUART. A layer of firmware between HP-UX and OBP called spp_pdc allows the HP-UX kernel to communicate with OBP. spp_pdc is platform-dependent code that runs on top of OBP, providing access to the devices and OBP configuration properties.
ccmd
ccmd builds a configuration information database on the teststation. The board names and revisions, the device names and revisions, and the start-up information generated by POST are all read and stored in memory for use by other diagnostic tools. ccmd is typically run automatically from /etc/inittab on the teststation. Entering init on the teststation starts ccmd. init monitors ccmd and respawns it if it ever stops.
If ccmd detects a hard error, it starts the hard_logger script to extract additional information from the node through the JTAG interface. After the hard_logger runs, ccmd resets the node or complex that failed. This behavior can be stopped with autoreset. ccmd sends output to the console. If running under X-windows as sppuser, it sends its output to the teststation console message output window. The -d debug option generates a substantial amount of console output.
xconfig
xconfig is the graphical tool that can also modify the parameters initialized by POST to reconfigure a node. The graphical interface allows the user to see the configuration state. The names shown are also consistent with the hardware names, since individual configuration parameters are hidden from the user. The drawback of xconfig is that it cannot be used as part of script-based tests, nor can it be used for remote debug. xconfig is started from a shell.
Figure 24 xconfig window: physical location names
Figure 25 xconfig window: logical names

As buttons are clicked, the item selected changes state and color. There is a legend on the screen to explain the color and status. The change is recorded in the teststation’s image of the node. When the user is satisfied with the new configuration, it should be copied back into the node, and the node should be reset to enable the changes.
The main xconfig window has three sections:
• Menu bar: Provides additional capability and functions.
• Node configuration map: Provides the status of the node.
• Node control panel: Provides the capability to select a node and control the way data flows to it.

Menu bar
The menu bar appears at the top of the xconfig main window. It has four menus that provide additional features:
• File menu: Displays the file and exit options.
Node configuration map
The node configuration map is a representation of the left and right side views of a node as shown in Figure 27.
The button boxes are positioned to represent the actual boards as viewed from the left and right sides. Each of the configurable components of the node is in the display. The buttons are used as follows:
• Green button: Indicates that the component is present and enabled.
• Red button: Indicates that the component is software disabled in the system.
Figure 28 xconfig window node control panel

The node number is shown in the node box. A new number can be selected by clicking on the node box and selecting the node from the pull-down menu. A new complex can be selected by clicking on the complex box and selecting it from the pull-down menu. A node IP address is displayed along with the node number and complex.
When a new node is selected and available, its data is automatically read and the node configuration map updated. The data image is kept on the teststation until it is rebuilt on the node using the Replace button. This is similar to the replace command on sppdsh. Even though data can be rebuilt on a node, it does not become active until POST runs again and reconfigures the system. The Reset or Reset All buttons can be used to restart POST on one or all nodes of a system.
Configuration utilities
V2500 diagnostics provides utilities that assist the user with configuration management.

autoreset
autoreset allows the user to specify whether ccmd should automatically reset a complex after a hard error and after the hard logger error analysis software has run. autoreset occurs if a .
NOTE If there is a node_#.pwr file that is older than the node_#.cfg file, existing node configuration files do not need to be updated.

est_config also generates a complex_uts.cfg file that can be compared against a complex.cfg file for accuracy and consistency.

xsecure
xsecure is an application that helps make a V2500 class teststation secure from external sources.
3 Power-On Self Test
POST is the Power-On Self Test firmware for the V-Class platform. POST initializes the processors and system hardware, and provides a basic processor selftest and a Utilities board SRAM pattern test. This chapter describes how POST initializes a node and handles power-up errors.
Overview
Upon power-up, all processors and hardware must be initialized before the node proceeds with booting. POST begins executing and brings up the node from an indeterminate state and then calls OBP. None of the POST modules can be directly controlled via a user interface. Program control is provided by a set of configuration parameters (processing flags and variable definitions) stored in NVRAM by OBP, do_rest, or xconfig.
• Hard reset: If a client had execution control before the hard reset, it invokes POST to initialize the hardware. POST restarts execution and reinitializes all hardware.
• Soft reset: If a soft reset condition has occurred while POST was executing, POST restarts execution but does not initialize main memory. It invokes its interactive prompt.
POST modules
POST executes the modules listed below in chronological order:
• Processor Initialization and Selftest: Each processor initializes itself on power-up or reset, in parallel with the other processors.
• Page Deallocation Table Support: POST supports reading the page deallocation table (PDT) and remapping memory if it detects a bad page in the HP-UX good-memory region. If remapping occurs, it updates all entries to reflect the new memory layout. It also clears the PDT if a memory hardware change is detected.
• Client Boot: POST cleans up any residual state from POST execution and boots the client specified in boot_module.
Interactive mode
POST for the V2500 provides a command line interface for configuration and debugging. The command line interface is invoked if boot_module is set to “interactive,” or by a soft reset or a TOC during POST execution.

Interactive mode commands
POST supports the following commands at the line prompt:
• help: Displays a list of supported commands and their usage.
• banner: Displays the POST version and build information.
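The three conditions that bring up the command line interface can be summarized in a few lines. This is a minimal sketch of the rule as stated above, not POST firmware logic; the function and argument names are hypothetical.

```python
# Hypothetical model of the rule above: POST enters interactive mode if
# boot_module is "interactive", or if a soft reset or a TOC occurred while
# POST was executing.


def post_enters_interactive(boot_module: str,
                            soft_reset_during_post: bool = False,
                            toc_during_post: bool = False) -> bool:
    """Return True when any of the three documented conditions holds."""
    return (boot_module == "interactive"
            or soft_reset_during_post
            or toc_during_post)


if __name__ == "__main__":
    print(post_enters_interactive("interactive"))
    # "OBP" is used here only as an example of a non-interactive client.
    print(post_enters_interactive("OBP", soft_reset_during_post=True))
    print(post_enters_interactive("OBP"))
```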
Configuration parameters
The following parameters control the runtime operation of POST:
• ts_ip: Specifies the teststation IP address for LAN messaging. The value should be set to the IP address of the diagnostics LAN port on the teststation. [default: 15.99.111.
Table 9 Name of CTI cache size for listed utilities
Utility   Parameter name
OBP       cti-cache-size
POST      cti_cache_size
sppdsh    cti_cache_size

• boot_module: Specifies which client to turn execution control over to at the completion of POST execution.
Table 12 Name of scuba test enable for listed utilities
Utility   Parameter name
OBP       scubatest?
POST      scuba_test_enable
sppdsh    scuba_test_enable

• master_error_enable: Determines whether POST enables errors. This is used in conjunction with use_error_overrides to determine how errors are enabled.
Table 15 Name of force monarch for listed utilities
Utility  Parameter name
OBP      force-monarch?
POST     force_monarch
sppdsh   force_monarch
• monarch_number—Specifies the monarch processor when force_monarch is enabled.
Messages
POST has three types of messages: LCD, console, and error. This section discusses each type.
LCD messages
Each node has an LCD display. Figure 29 shows the display and indicates what each line on the display means.
Table 17 Processor initialization steps
Step  Description
0     Processor internal diagnostic register initialization.
1     Processor early data cache initialization.
2     Processor stack SRAM test (optional).
3     Processor stack SRAM initialization.
4     Processor BIST-based instruction cache initialization.
5     Processor BIST-based data cache initialization.
6     Processor internal register final initialization.
7     Processor basic instruction set testing.
Status  Description
d       DECONFIG: processor has been deconfigured by POST or the user.
-       EMPTY: Empty processor slot.
?       UNKNOWN: processor slot status is unknown.
Message display line
The message display line shows the POST initialization progress. This is updated by the monarch processor. The system console also shows detail for some of these steps. Table 19 shows the code definitions.
Console messages
POST provides several messages that are displayed on the teststation console. This section describes these console messages.
Type-of-boot
This message reports the type of boot for the current POST execution, and the node ID and monarch processor. For example:
POST Hard Boot on [0:PB1R_A]
Version and build
This message reports the version and build information for POST. For example:
HP9000/V2500 POST Release 1.
Main memory initialization
This message reports that main memory initialization has started. For example:
Starting main memory initialization.
Memory probe
This message reports the status of the memory boards as they are detected and probed for DIMMs. For example:
Probing memory: MB0L MB1L MB2R MB3R MB4L MB5L MB6R MB7R
Installed memory
This message reports the total memory installed and available, in megabytes.
Each character indicates the physical location of the DIMM and the logical size of the DIMM. The memory information is encoded as follows:
Value  Memory Type
.      32MB
:      64MB
|      128MB
_      Empty
#      Hardware deconfigured
$      Software (user) deconfigured
For example:
            r0         r1         r2         r3
PB0L_A MB0L [.... ....][.... ....][____ ____][____ ____]
PB1R_A MB1L [.... ....][.... ....][____ ____][____ ____]
PB2L_A MB2R [.... ....][.... ....][____ ____][____ ____]
PB3R_A MB3R [.... ....][.... .
Booting Boombox
Interactive boot
This message indicates that POST is entering its interactive mode. POST provides a console interface for system configuration and debug. For example:
Booting Interactive
Interactive prompt
The following is the POST interactive prompt and is only seen if boot_module is set to interactive.
the checksum and was rebuilt to the default structure. For example:
Test Station Parameters checksum FAILED, rebuilding...
This node may be forced with the sppdsh reboot default command
Configuration map failure
This message indicates that the configuration map structure failed the checksum and was rebuilt to defaults. Any user deconfigured hardware state is lost. For example:
Configuration Map checksum FAILED, rebuilding...
Memory board deconfiguration
This message indicates that the specified memory board is deconfigured. This can be due to a memory board being found on one side of memory without a corresponding pair, since boards must be used in pairs of even/odd boards. This can also occur when a memory board has no usable memory. For example:
Deconfiguring: MB5L
Illegal memory board configuration
This message indicates that there is a memory board configuration that is not allowed.
PB0L_B failed to go idle after memory init
Unable to force CPU PB2L_A into idle loop
Monarch completing memory initialization
This message indicates that the monarch processor is completing the memory initialization assigned to the specified processor. For example:
Using Monarch to initialize memory assigned to PB2L_A
PDT checksum failure
This message indicates that the page deallocation table structure failed the checksum and was rebuilt to defaults.
Contiguous memory block not found
This message indicates that POST could not find a block of contiguous memory to place at address zero to achieve good memory. POST will report no main memory to the OBP for this failure. For example:
HP/UX good memory region could not be achieved.
Processor not reported
This message indicates that a processor failed to mark itself in the system report register.
For example:
cpu PB1R_A deconfigured due to PB1R_B shutdown.
New monarch processor selected
This message indicates that the previous monarch processor was deconfigured and a new one was selected.
4 Test Controller
The Test Controller is an EEPROM-based utility that provides the environment for executing the offline diagnostic tests. It is controlled through parameters stored in the NVRAM on the Utilities board. The Test Controller reads these parameters to determine its execution mode, the number of processors to test, which SMACs to include in the testing, which subtests to run, and other diagnostic test-specific information.
Test Controller modes
There are three basic operational modes for this utility:
• Stand-alone mode
• Interactive mode
• I/O Utility mode
In stand-alone mode, cxtest invokes the Test Controller. The Test Controller reads test parameters from NVRAM (these parameters are written into NVRAM by cxtest before it invokes the Test Controller), executes the test and subtests specified in NVRAM, and sets a completion bit in NVRAM when the test and subtests are finished.
User interface
The Test Controller provides for the control of offline diagnostic test execution. It uses a set of parameters to control its operation.
• Read and write the 128 words of test specific information
• Select the hardware to test
• Display the current parameter selections
Main menu
Test Controller Main Menu
MAIN Menu commands
0=Quit Test Controller
1=Begin Test Controller Execution
2=Halt Test Controller Execution
3=Resume Test Controller Execution
4=Switch CPU
5=POST Boot Selection
6=Execution Mode Selection
7=Global Parameter Display
8=CPU Summary Display
9=Display CPU Errors
A=Test Selection Menu
B=Test Config
• 3=Resume Test Controller Execution—Continues execution from the point of interruption.
• 4=Switch CPU—Allows the user to start the Test Controller on the specified processor. The previously used processor starts executing the command wait loop code.
• 8=CPU Summary display—Displays a summary of the current processor and testing information.
Example CPU summary display
MAIN Menu - CPU Summary Display
Total Failures = 0
Configuration Map
=================
CPUs  : 0 1 2 3* 4 5 6 7 8 9 10 11 12 13 14 15
CPUs  : 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
SPACs : 0* 1* 2* 3* 4* 5* 6* 7*
SMACs : 0* 1* 2 3 4 5 6 7
STACs : 0 1 2 3 4 5 6 7
SAGAs : 0* 1 2 3 4* 5 6 7
                    FAIL
CPU  STATE          COUNT  SUBTEST  TEST NAME
===  =====          =====  =======  =========
0    Not Available  n/a    n/a      n/a
1    Not Available  n/a    n/a      n/a
2    Not Available  n/a
The possible states in the CPU Summary Display are described in Table 20.
Table 20 Processor States
CPU State       Description
Not Available   Denotes processor is not available for testing.
Running         Denotes a test is currently running on this processor.
Idle            Denotes that no test is running on this processor.
Ready           Denotes last subtest completed and ready for next subtest.
Test Completed  Denotes test completed execution on this processor.
Example Test Parameters display.
Test Selection display
MAIN Menu - Test Selection Display
0=Return to Main Menu
1=*Memory test
2=not available
3=not available
4=not available
5=I/O test
6=CPU selftests
7=not available
8=not available
9=not available
A=not available
Please enter number of test:
• B=Test Configuration Menu—Switches the user to the Configuration menu shown below for the specified test.
• Selection 1 queries for the 40-bit address to read as follows:
Enter 40-bit address:
• Selection 2 queries for the 40-bit address and then for the 32 bits of data to write:
Enter 32-bit data:
• Selection 3 queries for the 40-bit address to read.
Test Configuration menu
The Test Configuration menu is shown below:
Test Configuration menu
Test Configuration Menu
0=Return to Main Menu         A=Hardware Selection Menu
1=Display Classes             B=Loop Enable
2=Display Subtests            C=Loop Count
3=Select Classes              D=Test Error Count
4=Select Subtests             E=Pause at Test Start
5=Read All Test Parameters    F=Pause at Test End
6=Read One Test Parameter     G=Pause at Subtest Start
7=Write Test Parameter        H=Pause at Subtest End
8=Reset Parameters            I=Pause On Failure
9=
Test Configuration menu - Subtest display
Test Configuration Menu - Subtest Display
Subtest  Description
0        subtest 0 description
1        subtest 1 description
.        .
.        .
n*       subtest n description
An asterisk following the subtest number denotes that it is selected for execution. For example, see the “n subtest n description” line.
• 3=Select Classes—Allows the user to specify which classes of subtests to execute. These selections are in addition to any subtests selected.
• 5=Read All Test Parameters—Reads all 128 words that make up the test parameter set and displays this information. These test parameters reside in NVRAM and are defined by the particular test.
Table 21 Parameter Defaults
Parameter               Default value
Loop Enable             0
Loop Count              0
Test Error Count        1
Pause At Test Start     0
Pause At Test End       0
Pause At Subtest Start  0
Pause At Subtest End    0
• 9=Display Test Configuration—Displays the current values of the processor parameters. An example of the display is shown in the example below. An asterisk denotes the current selections. For example, processor 0 is selected.
Test Configuration menu - Test Parameters display
Test Configuration Menu - Test Parameters Display
CPUs:  ( 1) 0 1 2 3* 4 5 6 7 8 9 A B C D E F
            10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
SPACs: ( 1) 0* 1* 2* 3* 4* 5* 6* 7*
SMACs: ( 0) 0* 1* 2 3 4 5 6 7
STACs: ( 0) 0 1 2 3 4 5 6 7
SAGAs: ( 0) 0* 1 2 3 4* 5 6 7
Nodes: ( 1)
Loop Enable: ON OFF*
Loop Count: 00
Test Error Count: 01
Pause Test Start: ON OFF*
Pause Test End: ON OFF*
Pause Subtest Start: ON OFF*
Pause Subtest End
• Multiple hardware component numbers separated by commas or spaces, for example 1,+2,-3. The format 2 or +2 selects this hardware component for testing. The format -2 excludes this hardware component from testing. The 1 and +1 formats are equivalent, and leaving a hardware component out of the list is equivalent to the -n format.
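The list format described above can be illustrated with a small parser. This is only a sketch of the selection rules as stated in the text, not the Test Controller's own code: the function name parse_hardware_selection and the component_count argument are hypothetical, and the numbers are assumed to be decimal.

```python
import re


def parse_hardware_selection(text, component_count):
    """Return the set of component numbers selected by a list such as "1,+2,-3".

    Assumptions (not from the manual): numbers are decimal, and
    component_count is the number of components of this type.
    """
    selected = set()
    for token in re.split(r"[,\s]+", text.strip()):
        if not token:
            continue
        match = re.fullmatch(r"([+-]?)(\d+)", token)
        if match is None:
            raise ValueError(f"unrecognized entry: {token!r}")
        sign, number = match.group(1), int(match.group(2))
        if number >= component_count:
            raise ValueError(f"component {number} is out of range")
        if sign == "-":
            selected.discard(number)  # -n: do not use this component
        else:
            selected.add(number)      # n and +n are equivalent
    return selected


print(sorted(parse_hardware_selection("1,+2,-3", 8)))
```

Components that never appear in the list are simply absent from the returned set, which matches the rule that omitting a component is equivalent to the -n format.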
Pause at Test Start (0=disabled, 1=enabled):
• F=Pause at Test End—Allows the user to modify the pause at test end flag. This flag results in the Test Controller pausing the testing on this processor after the last subtest has completed execution and all cleanup is complete. The user is prompted for the new value as follows:
Pause at Test End (0=disabled, 1=enabled):
• G=Pause at Subtest Start—Allows the user to modify the pause at subtest start flag.
Example of running diagnostics from Test Controller command line
This example shows how to run mem3000 from the Test Controller command line within the following scenario:
• Configure mem3000 to run on a system with four memory boards installed.
• Set the classes and subtests to be executed.
• Run the tests.
• View the results.
Step 2. From the Test Selection menu shown below, select Memory test, option 1.
Test Controller Test Selection menu
MAIN Menu - Test Selection Display
0=Return to Main Menu
1=Memory test
2=not available
3=not available
4=not available
5=I/O test
6=CPU Selftests
7=not available
8=not available
9=not available
A=not available
Please enter number of test:
Step 3. Select option 0 to return to the Main Menu.
Step 4.
Step 5. From the menu, select Memory test, option 1. This opens the Test Configuration menu shown below:
Test menu
1=*Memory test
2= not available
3= not available
4= not available
5= I/O test
6= CPU Selftests
7= not available
8= not available
9= not available
A= not available
Please enter number of test:
Step 6. From the Test Configuration menu shown below, select the Hardware Selection menu, option A.
Step 7. From the Hardware Selection menu shown below, select CPUs, option 1.
Selecting CPUs from Hardware Selection menu
Test Configuration Menu - Hardware Selection Display
0=Return to Test Configuration Menu
1=CPU Selection
2=SPAC Selection
3=SMAC Selection
4=STAC Selection
5=SAGA Selection
6=Node Selection
Step 8. At the following prompt:
Select CPUs: 0 2
Select the number of processors (CPUs).
Step 3. From the Test Configuration menu, select Display Subtests, option 2.
Step 4. Select all appropriate subtests. Table 22 lists the test patterns for subtests 230 through 238.
Test Controller Example of running diagnostics from Test Controller command line Starting tests To run the tests selected from the Test Controller main menu, select Begin Test Controller Execution, option 1. The output is shown in the example below: Example of mem3000 execution % Enter command: 1 Execution Starting. .............................................................................. .............................................................................. ...................................
5 cxtest The cxtest program is a graphical front end and a command line interpreter for the test controller. It is a standalone program that runs independently of any diagnostic tests loaded in the EEPROM on the Utilities board.
Overview
The cxtest program runs on the teststation and communicates with the test controller via the NVRAM configuration parameters on the Utilities board. Depending on the command line, cxtest either starts the graphics display or runs as a command line interpreter. The GUI provides an easy and flexible way to select and run tests. The main screen has six drop-down menus: File Menu, Test Menu, Global Parm Menu, Command Menu, System Configuration Menu, and Help Menu.
• Retrieving error information from the test controller
The test controller operates in the standalone mode when running in conjunction with cxtest. This is true whether one is using the command line version of cxtest or the graphics interface.
Graphics interface
To start the cxtest graphics interface, specify the -d option on the command line as follows:
% cxtest -d
This causes cxtest to open a window on the display. The display on which the window appears is set by the environment variable $DISPLAY. This cannot be changed on the command line. The window has two areas of importance:
• Menu selections
• Test information display
Menus
There are six main menus in cxtest. Figure 30 shows the cxtest menu bar.
File menu
The File menu has the following options:
• Save Selections
• Restore Selections
• Log to File/Close Log File
• Clear Log
• Exit
Save Selections
The Save Selections option saves specific tests or configurations.
Restore Selections
With the Restore Selections option, the user runs specific tests without having to click on many buttons.
Clear Display
This option clears the browser of all text. It does not clear the log file.
The selections presented are based on whether the Test Controller has built a Subtest table and Class table in its tc_test_info_struct structure.
Class menus
Selecting a test opens a window that displays all classes for the test. See Figure 31. Down the left-hand side of the window is a column of round buttons, and down the right-hand side of the window are two columns of buttons.
The Defaults button installs test default values into all the parameters. If a class of tests has no parameters associated with it, the rightmost button (the square one) is not shown.
Global Test Parameters menu
cxtest provides the ability to loop on a number of tests by setting the Loop Enable count. The looping parameter is applied on a per-test basis and is applied to all the tests.
Command menu
The Command menu is used to perform actions on the node or complex being tested. These actions include:
• Go
• Reset Machine
• Read Boot Config Map
The Go selection starts the subtests. The subtests are sent to the test controller one at a time so that the application can detect the completion of each subtest. While the subtests are running, an Abort button appears at the bottom of the screen.
Figure 33 System configuration window
Help menu
The Help menu has two entries: About and Contents. The About selection displays the version number of cxtest running and the Contents selection opens a browser that can scroll through the help file.
Display area
The display area shows the output of the tests. This output consists of messages that indicate when the tests start, the amount of time that the test has been running, and any error information.
Powering down the system
When using cxtest in a troubleshooting environment, it is not necessary to exit and enter cxtest each time the power is cycled. To remove power to the system (for example, to move a board), power the system down leaving cxtest running. Make sure that no tests are actively running. Once power is restored, POST returns control to the test controller in the stand-alone mode. The user must also wait for the ccmd routine to regenerate the database.
Command line interface
cxtest is a utility that allows the user to run tests loaded into the Test Controller. Tests can be specified on the command line, or a graphical user interface (GUI) can be started to simplify test selection. cxtest allows use of the Test Controller without being at the system console.
NOTE The -d option must be used on the command line to start the GUI for cxtest.
By default, cxtest tries to load the test information needed from a file.
Command line test selections
The command line interface recognizes the following switches to select tests.
• -mem—Memory diagnostic.
• -io—I/O diagnostic.
• -cpu—Processor diagnostic.
All the arguments between two test selections apply only to the first test specified, as in the following example:
Example cxtest command line
cxtest -mem -lt 3 -c 4 -io -c 2
The looping specification applies only to the memory test, which runs the class-4 tests three times.
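The scoping rule above (arguments apply to the test selection that precedes them) can be sketched as follows. This is an illustration only, not cxtest's parser: the function name group_by_test is hypothetical, and the sketch does not interpret the individual options.

```python
# Test selection switches named in the text above.
TEST_SWITCHES = {"-mem", "-io", "-cpu"}


def group_by_test(argv):
    """Split an argument list into (leading_args, [(test_switch, args), ...]).

    Arguments that follow a test switch are attached to that test until
    the next test switch appears. Arguments before the first test switch
    are returned separately (how cxtest treats them is not covered here).
    """
    leading = []
    groups = []
    for arg in argv:
        if arg in TEST_SWITCHES:
            groups.append((arg, []))
        elif groups:
            groups[-1][1].append(arg)
        else:
            leading.append(arg)
    return leading, groups


leading, groups = group_by_test("-mem -lt 3 -c 4 -io -c 2".split())
for switch, args in groups:
    print(switch, args)
```

For the example command line, the -lt 3 and -c 4 arguments end up in the -mem group and only -c 2 in the -io group, which is why the looping specification does not affect the I/O test.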
cxtest Command line interface Looping and Pause Controls Description -lt Execute of loops of all the tests that follow -t Switches the test controller to run on processor . (range 0-15) -mt Allows specification of error count To set the number of times a test is looped on use the -lt option.
To specify a list of subtests, place a comma between the numbers. As an example, -s 100,150,140, runs subtest 100, then subtest 150, and finally subtest 140.
Command line parameter specifications
To specify the value of a parameter for a test, use the -pa# option.
Example of running diagnostics from cxtest window
The following example procedure shows the user how to use mem3000 from cxtest. It assumes that the node configuration has been set up using the main cxtest window.
Step 1. From the cxtest main menu Tests option, select MEM3000 - EEPROM based memory tests. This opens the class selection window shown in Figure 34.
Figure 34 mem3000 Test Class Selection window
Step 2.
Figure 35 mem3000 Class 1 Subtest Selections window
Step 4. In the Subtest Selections window for each class, click the button for each subtest to be executed. Any combination of subtests may be executed.
Step 5. To set the parameters for each class of test, click the appropriate Show Parameters button in the Test Class Selections window. This opens the Class Parameters window. Figure 36 shows the mem3000 Class 1 Test Parameters window.
Step 6. To start the selected tests and subtests, click the Go option in the Command menu in the cxtest main window.
Step 7. View the results in the lower window pane of the cxtest main window.
6 Processor-dependent code firmware loader
The processor-dependent code firmware loader (pdcfl) is a firmware module that can load other firmware modules into FLASH. It is intended to speed up download of POST and OBP on newly manufactured or malfunctioning utility boards. If the target system can successfully boot OBP, OBP should be used to download firmware in preference to pdcfl. pdcfl can be loaded into FLASH using load_eprom as a stand-alone, portable module.
pdcfl loading, booting, and setup
NOTE This step should not be necessary under normal circumstances. pdcfl is loaded on all Utility boards at the factory.
If the utility board FLASH contents have been erased, pdcfl may be loaded into the Utility board using load_eprom. load_eprom supports a -f option for loading pdcfl to the appropriate sector in FLASH memory.
This requires making these entries to the following files:
To /etc/services make the following entry:
tftp 69/udp Trivial File Transfer Protocol
To /etc/inetd.conf make the following entry:
tftp dgram udp wait root /usr/lbin/tftpd tftpd -R 15
Also send a HUP to inetd.
pdcfl commands
From the pdcfl prompt, the following commands are supported:
• printenv [variable]—Prints configuration variables from NVRAM.
• setenv variable value—Allows setting configuration variables in NVRAM.
• lifls—Prints a listing of the LIF volume in the FLASH EEPROMs.
An example of the fload command
PDCFL> fload post.fw POST
TFTP server : 15.99.103.191
CUB IP : 15.99.111.150
Reading : post.fw
Writing : POST
(each '.' represents 4K copied)
Sector erased 0xF0020000
.......................................
Sector erased 0xF0040000
............
148384 bytes transferred
• reset [post]—Resets the node, optionally changing the boot vector to point to the POST module.
7 cpu3000
This chapter describes the cpu3000 processor test. cpu3000 runs via the test controller and provides a basic test of the functionality of the PA8500. cpu3000 requires a minimum of one processor with its associated SPAC and two EWMBs. Included in the testing are most of the instruction set, the ALU, the general, space, and control registers, external interrupts, RDRs, TLB RAM, the instruction cache, and the data cache.
cpu3000 classes and subtests
cpu3000 consists of a series of tests grouped together in classes beginning with verification of the most basic functionality and progressing toward more complex functionality. Each class has subtests which target specific functionality. When a failure is encountered, the chassis code is available through the test controller along with the progress value.
cpu3000 classes
cpu3000 has five classes of tests shown in Table 25.
Table 26 cpu3000 Class 1 subtests
Subtest  Name              Description
100      Processor basic   Verifies the majority of registers and a basic set of instructions. Chassis code: 0x41020.
101      Processor-ALU     Verifies the processor and arithmetic logic unit (ALU) functionality. Chassis code: 0x41021.
102      Processor branch  Verifies the branch instructions. Chassis code: 0x41022.
cpu3000 cpu3000 classes and subtests Subtest Table 27 Diagnostic register Verifies the local Diagnose Registers. Chassis code: 0x4102a. 141 Remote diagnostics registers Verifies the remote Diagnose Registers. Chassis code: 0x4102b. 150 Register bypass Verifies the register bypass functionality of the processor. It tests three different types of bypassing that can occur between the two integer queues. Chassis code: 0x4102c.
cpu3000 cpu3000 classes and subtests Table 30 cpu3000 Class 5 subtests Subtest Name Description 500 Late-early self test (LST-EST) Runs subtests 100, 101, 102, 103, 104, 105, 120, 130, and 150, first in main memory and then in the Icache.
cpu3000 cpu3000 classes and subtests Subtest Name Description 540 Dcache miss Verifies that data can be encached from coherent memory. Chassis code: 0x44060. 560 TLB transfer Verifies TLB hits and misses, as well as access rights and protection ID validation. Chassis code: 0x410b2. 570 Floating point unit Verifies the floating point unit. It consists of several groups of tests that include testing of the FPU registers, instruction tests, trap handling, and access rights and ID validation.
cpu3000 errors
When a failure occurs, the chassis code is available through the test controller, along with the progress value. The progress value indicates what portion of the subtest encountered the error.
8 io3000 The I/O diagnostic supports Symbios 875 HVD SCSI controllers, Symbios 895 LVD SCSI controllers, and Tachyon Fibre Channel controllers. io3000 requires a node with a minimum of one processor, one SIOB with associated SPACs, and two EWMBs with associated SMACs. To exercise peripherals, either a Symbios SCSI or a Tachyon Fibre Channel card is required.
io3000 classes and subtests
io3000 consists of a series of tests grouped together in classes beginning with verification of the most basic functionality and progressing toward more complex functionality. Each class is broken down into subtests which target specific functionality. The following sections describe the classes and individual subtests.
io3000 classes
io3000 has 10 classes of tests shown in Table 31.
Class  Name                           Description
11     SAGA SCSI Tape Interface Test  Verifies the ability to successfully issue SCSI commands to every selected tape drive.
12     Symbios Test                   Verifies the basic functionality of the Symbios SCSI controller.
15     CDROM SCSI Access Test         Verifies basic SCSI bus access.
16     Tachyon SAGA PCI Access Test   Verifies the SAGA PCI interface to all selected Tachyon controllers.
io3000 subtests
The io3000 subtests are listed in Table 32 through Table 41.
Table 33 io3000 Class 2 subtests
Subtest  Name                                Description
200      Context/shared memory read/write    Writes to the first 64-bit location of each context SRAM and reads them to verify that they can be uniquely accessed.
205      Context/shared memory access width  Verifies that all supported access widths of context SRAM function properly by writing and reading the first 64-bit location.
Subtest  Name                      Description
235      Prefetch memory march C-  Verifies writes and reads to all of prefetch memory using a bitwise march C- algorithm. The default option does a shortened version of the march C- algorithm by using a limited pattern set. The march C- complete enable can be set to do a full march C- test. The test time increases by a factor of approximately four. The fault coverage for full march C- increases from approximately 99% to 100% of targeted faults.
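For readers unfamiliar with march C-, the following is a minimal sketch of the textbook algorithm on single-bit cells. It is not the io3000 implementation: the manual does not describe the subtest's pattern set or its shortened variant, and the function name march_c_minus and the failure tuple layout are hypothetical.

```python
def march_c_minus(memory):
    """Run one march C- pass over `memory`, a mutable sequence of 0/1 cells.

    Returns a list of (element, address, expected, actual) tuples, one per
    miscompare. The six elements are the textbook sequence:
      M0: any order, write 0
      M1: ascending, read 0 then write 1
      M2: ascending, read 1 then write 0
      M3: descending, read 0 then write 1
      M4: descending, read 1 then write 0
      M5: any order, read 0
    """
    size = len(memory)
    failures = []

    def element(name, addresses, expect, write):
        for addr in addresses:
            if expect is not None:
                actual = memory[addr]
                if actual != expect:
                    failures.append((name, addr, expect, actual))
            if write is not None:
                memory[addr] = write

    def ascending():
        return range(size)

    def descending():
        return range(size - 1, -1, -1)

    element("M0", ascending(), None, 0)
    element("M1", ascending(), 0, 1)
    element("M2", ascending(), 1, 0)
    element("M3", descending(), 0, 1)
    element("M4", descending(), 1, 0)
    element("M5", ascending(), 0, None)
    return failures


print(march_c_minus([1, 0, 1, 1, 0, 0, 1, 0]))
```

A fault-free memory produces an empty failure list regardless of its initial contents, because the first element overwrites every cell before anything is read.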
Table 34 io3000 Class 5 subtests
Subtest  Name                       Description
500      SCSI disk test unit ready  A SCSI test unit ready command is issued to all selected devices at least twice. The first time, it should return with a SCSI check condition (not reported to the user) since the SCSI bus has been reset. The command is retried after approximately one second. If the second test unit ready fails, an error is reported.
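The retry behavior described for subtest 500 can be sketched as below. This is a simplified illustration, not the diagnostic's code: send_test_unit_ready is a hypothetical callable standing in for whatever issues the SCSI command, and the sleep argument exists only so the delay can be replaced in a dry run.

```python
import time


def check_unit_ready(send_test_unit_ready, retry_delay_s=1.0, sleep=time.sleep):
    """Issue TEST UNIT READY twice, tolerating a failure on the first attempt.

    send_test_unit_ready is a hypothetical callable that returns True when
    the device reports ready. The first result is not treated as an error,
    since a check condition is expected after a bus reset. An exception is
    raised only if the retry also fails.
    """
    first_ok = send_test_unit_ready()  # expected check condition, not reported
    sleep(retry_delay_s)               # approximately one second in the manual
    if not send_test_unit_ready():
        raise RuntimeError("TEST UNIT READY failed on retry")
    return first_ok


# Dry run with a fake device: check condition first, then ready.
responses = iter([False, True])
print(check_unit_ready(lambda: next(responses), sleep=lambda seconds: None))
```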
Table 35 io3000 Class 6 subtests
Subtest  Name
600      Channel init, ATPR = 0x0
605      Channel build, ATPR = 0x0
610      Channel init, data prefetch, ATPR = 0x2
615      Channel build, ATPR = 0x2
620      Channel init, write tlb, ATPR = 0x8
625      Channel init, write tlb, data prefetch, ATPR = 0xa
630      Channel init, tlb prefetch, ATPR = 0xc
635      Channel build, ATPR = 0xc
640      Channel init, tlb & data prefetch, ATPR = 0xe
645      Channel build, ATPR = 0xe
650      Channel context access
Description: Verifies selected SAGA channels in virtual mode. Subtests 600-645 create channels by writing to the SAGA channel builder CSR. The method of channel creation and the specific mode (ATPR setting) is specified in the subtest’s one line description. Each test will write data to the disk and read it back and verify it.
io3000 io3000 classes and subtests Subtest Name Description 725 Jump outside of a page (TLB not encached) Verifies a DMA jump outside of a page. The TLB for the destination page is not encached in context SRAM. This means that SAGA must fetch a new TLB before the transfer can continue. This is done for both writes and reads. 730 Jump outside of a channel Verifies a DMA jump outside of the current channel. This is done for both writes and reads.
io3000 io3000 classes and subtests Table 37 io3000 Class 8 subtests Subtest Name Description 800 Multidisk nonmixed traffic Issues all selected devices simultaneous SCSI writes and then SCSI reads. The channels are programmed in virtual mode, with data and TLB prefetch turned on. 805 Multidisk mixed traffic, ATPR = 0xe All selected devices transfer data simultaneously.
Table 38 io3000 Class 11 subtests
Subtest  Name                       Description
1100     SCSI tape test unit ready  Issues a SCSI test unit ready command to all selected devices at least three times. The first time, the SCSI bus will have been reset. This is normal. The command is retried after approximately one second. The command is issued again to allow for a check condition due to the medium being changed.
io3000 io3000 classes and subtests Table 39 io3000 Class 12 subtests Subtest Name Description 1200 Symbios PCI configuration space test Verifies the ability of the SAGA to access the Symbios SCSI controller by way of the PCI configuration space. Verifies the PCI vendor ID and device ID fields to be 0x1000 and 0x000f, respectively. Also verifies the base address registers to be writable and readable.
Subtest  Name                           Description
1230     Symbios SCSI Scripts RAM test  Performs a simple data-equals-address pattern test of the SCRIPTS RAM.
1240     Symbios SCSI Interrupt test    Copies a simple SCRIPTS instruction to SCRIPTS RAM on the Symbios controller. The SCRIPTS instruction is a simple INT opcode which, when executed by the Symbios chip, should cause a DMA interrupt to be logged.
Table 40 io3000 Class 15 subtests Subtest Name Description 1500 SCSI CDROM test unit ready Issues a SCSI test unit ready command to all selected devices at least twice. The first command should return a SCSI “check condition” (not reported to the user) since the SCSI bus will have been reset. After approximately one second, the command is sent again. If the second test unit ready fails, an error is reported.
io3000 io3000 classes and subtests Table 41 io3000 Class 16 subtests Subtest Name Description 1600 Tachyon PCI configuration space test Verifies the ability of the SAGA to access the Tachyon Fibre Channel controller by way of the PCI configuration space. Verifies the PCI vendor ID and device ID fields to be 0x107e and 0x0004, respectively. Also verifies the base address registers to be writable and readable.
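The vendor ID and device ID checks in subtests 1200 and 1600 can be sketched as follows. In PCI configuration space, the first 32-bit register holds the vendor ID in its low 16 bits and the device ID in its high 16 bits. The expected values come from the subtest descriptions; the function names, the dictionary, and the idea of passing in an already-read register value are illustrative assumptions, not the io3000 code.

```python
# Sketch: compare a PCI configuration-space ID register against the IDs
# that subtests 1200 (Symbios) and 1600 (Tachyon) expect.
# Names below are hypothetical; only the ID values come from the manual.

EXPECTED_IDS = {
    "symbios": (0x1000, 0x000F),  # (vendor ID, device ID), subtest 1200
    "tachyon": (0x107E, 0x0004),  # (vendor ID, device ID), subtest 1600
}


def split_pci_id(dword0):
    """Split config register 0 into (vendor_id, device_id)."""
    vendor_id = dword0 & 0xFFFF
    device_id = (dword0 >> 16) & 0xFFFF
    return vendor_id, device_id


def check_pci_id(controller, dword0):
    """Return True if the register value matches the expected IDs."""
    return split_pci_id(dword0) == EXPECTED_IDS[controller]
```

A value of 0xFFFFFFFF is what a PCI configuration read conventionally returns when no device responds, so it fails both checks.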
Table 42 io3000 test parameters Words Description 0 See Table 43. 1 Device write enable mask—Each bit in the mask corresponds to a device. Bit 0 (the MSB, or leftmost bit, of the parameter word) corresponds to device 0, and bit 29 corresponds to the last device (device 29). Device 0 is the first device parameter location in user parameter word 8 (see Words 8-19, Device specification, below). A binary '0' in a device's bit field means that SCSI writes (to that disk) are not enabled.
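Because bit 0 is the most significant bit, the write enable mask is easy to get backwards. The following sketch builds the mask from a list of device numbers. It assumes a 32-bit parameter word and 30 device positions (0 through 29); the function name is hypothetical.

```python
# Sketch: build the device write enable mask (user parameter word 1).
# Assumptions: 32-bit parameter word, devices numbered 0-29, and
# bit 0 is the MSB as described in Table 42.

WORD_BITS = 32
MAX_DEVICES = 30


def write_enable_mask(devices):
    """Return the mask with a 1 bit for each device that may be written."""
    mask = 0
    for device in devices:
        if not 0 <= device < MAX_DEVICES:
            raise ValueError(f"device number out of range: {device}")
        # Manual bit n (counted from the MSB) is bit 31 - n in the
        # usual LSB-first numbering.
        mask |= 1 << (WORD_BITS - 1 - device)
    return mask
```

For example, enabling writes only on device 0 gives 0x80000000, and enabling devices 0, 1, and 29 gives 0xc0000004.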
io3000 io3000 classes and subtests Table 43 io3000 user test parameter word 0 bit definition Bit Description 0-23 Unused 24 Force code copy enable—Setting this bit causes all subtests that use encached routines to copy the code segment from flash into main memory. The copy will be performed even if the previous subtest already performed the copy. This feature should not be needed unless the code in main memory is being corrupted in a manner that cannot be easily detected.
Device specification Due to Core Logic SRAM space limitations, only 20 devices per SAGA can be tested at a time. Up to 24 SCSI devices can be specified using parameter words 8-19. Each of these parameter words contains two device specifications, as shown in Figure 37. Word 8 contains device specifications 0 and 1. Word 9 contains 2 and 3, and so on. Up to six Fibre Channel devices can be specified in parameter words 20-37.
Table 44 io3000 bit definition for direct SCSI device specification (words 8-19)
Bit      Definition
0-3      SAGA
4-7      Slot
8-11     SCSI target
12-15    SCSI lun
16-19    SAGA
20-23    Slot
24-27    SCSI target
28-31    SCSI lun
Figure 38 io3000 test parameter device specification for Fibre Channel attached SCSI targets (words 20-37)
Words 20-22    FC device 0 saga/slot/alpa    FC device 0 lun hi    FC device 0 lun lo
Words 23-25    FC device 1 saga/slot/alpa    FC device 1 lun hi    FC devi
Table 45 io3000 bit definition for Fibre Channel attached SCSI device specification (words 20-37) Location Bit Definition Word n 0-3 SAGA Word n 4-7 Slot Word n 8-31 AL_PA (or D_ID) Word n+1 0-31 FC lun hi Word n+2 0-31 FC lun lo Devices are numbered according to their position in the parameter list. A device can be specified in any of the device specification locations in user parameter space.
Table 46 io3000 SAGA name to number correlation
SAGA name    SAGA number
IOLF_A       4
IOLF_B       0
IOLR_A       5
IOLR_B       1
IORR_A       6
IORR_B       2
IORF_A       7
IORF_B       3
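Tables 44 and 46 together determine how a direct SCSI device specification word is built. The sketch below packs two specifications into one 32-bit word. It assumes the same MSB-first bit numbering that Table 42 describes (bit 0 is the leftmost bit), so the first specification lands in the upper 16 bits. The function names are hypothetical.

```python
# Sketch: pack two direct SCSI device specifications into one parameter
# word (words 8-19), using the fields of Table 44 and the SAGA numbers
# of Table 46. Assumes bit 0 is the MSB of a 32-bit word.

SAGA_NUMBER = {
    "IOLF_A": 4, "IOLF_B": 0,
    "IOLR_A": 5, "IOLR_B": 1,
    "IORR_A": 6, "IORR_B": 2,
    "IORF_A": 7, "IORF_B": 3,
}


def pack_device(saga_name, slot, target, lun):
    """Return the 16-bit specification: SAGA, slot, target, lun nibbles."""
    half_word = 0
    for value in (SAGA_NUMBER[saga_name], slot, target, lun):
        if not 0 <= value <= 0xF:
            raise ValueError(f"field does not fit in 4 bits: {value}")
        half_word = (half_word << 4) | value
    return half_word


def pack_word(first, second):
    """Combine two (saga_name, slot, target, lun) tuples into one word."""
    return (pack_device(*first) << 16) | pack_device(*second)


word8 = pack_word(("IOLF_A", 0, 15, 0), ("IOLF_B", 1, 2, 0))
```

Under these assumptions, word8 is 0x40f00120: SAGA 4, slot 0, target 0xf, lun 0 in the upper half, and SAGA 0, slot 1, target 2, lun 0 in the lower half.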
io3000 io3000 error codes io3000 error codes When a failure is encountered, an event code is set along with an error message. The least significant 12 bits of the event code contain the error code. Table 47 lists the io3000 error codes. io3000 general errors io3000 general error codes post no error messages. Table 47 shows each io3000 general error code. Table 47 io3000 general error codes Code Description 0x1 Core logic SRAM allocation failure.
io3000 device specification errors io3000 device specification errors post the following error message: SAGA_name/ctlr_num/tgt_num/lun_num Example of io3000 device specification error message: IOLF_A/ct0/idf/lu0 Table 48 shows each io3000 device specification error code. Table 48 io3000 device specification error codes Code Description 0x8 Duplicate device specification. The same device was specified multiple times in the user parameters. 0x9 Invalid SAGA number.
Table 49 io3000 SAGA general errors Code Description 0x10 A SAGA specified in the user parameters was not available. 0x11 Unable to reset SAGA. io3000 was unsuccessful in setting or resetting the SAGA online bit on its associated SPAC. 0x12 Data prefetch timeout. The prefetch valid bits in the channel context never became valid, or did so too slowly.
io3000 io3000 error codes io3000 SAGA ErrorInfo CSR error The io3000 ErrorInfo CSR error code posts the following error message: SAGA_name/cause_bit/address/act_val Example of io3000 SAGA ErrorInfo CSR error: IOLF_A/5/fc210098/10e0000f0c000000 Table 51 shows the io3000 SAGA ErrorInfo CSR error code. Table 51 io3000 SAGA ErrorInfo CSR error Code 0x50 Description SAGA ErrorInfo CSR failure.
io3000 io3000 error codes Table 52 io3000 SAGA ErrorCause CSR errors Code Description 0x54 SAGA ErrorCause CSR failure. 0x55 SRAM parity error expected. This error occurs when the cci_rdperr bit in the SAGA ErrorCause does not get set when SRAM parity errors are forced. 0x58 PCIx status failure.
io3000 io3000 error codes io3000 controller general errors io3000 Controller general error codes post the following error message: SAGA_name/ctlr_num Example of io3000 controller general error message: IOLF_B/ct0 Table 54 shows each io3000 general controller error code. Table 54 io3000 Controller general errors Code Description 0x80 The controller was not detected as present per the SAGA’s PcixStatCSR PCI card present bits. 0x81 SCSI flash read error.
Table 55 io3000 PCI errors Code Description 0x90 PCI vendor id failure. io3000 was unable to successfully read the controller’s PCI vendor id. 0x91 PCI device id failure. io3000 was unable to successfully read the controller’s PCI device id. 0x92 PCI io base address register failure. io3000 was unable to successfully read and write the controller’s PCI io base address register. 0x93 PCI memory base address register failure.
io3000 DMA error The io3000 DMA error code posts the following error message: SAGA_name/ctlr_num/tgt_num/lun_num/address/act_val/exp_val Example of io3000 DMA error message: IOLF_A/ct0/idf/lu0/0004148200/a5a5a5a4/a5a5a5a5 Table 57 shows the io3000 DMA error code. Table 57 io3000 DMA error Code Description 0xd0 Data miscompare on DMA. Data in the destination buffer does not match data in the source buffer.
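When reading a DMA error message, it helps to split the fields and compare the actual and expected values bit by bit. The sketch below does this for the example message above. It assumes that the address, act_val, and exp_val fields are hexadecimal, as the example suggests; the class and function names are hypothetical.

```python
# Sketch: split an io3000 DMA error message into its seven fields and
# show which data bits differ. Field names follow the message format
# SAGA_name/ctlr_num/tgt_num/lun_num/address/act_val/exp_val.
from typing import NamedTuple


class DmaError(NamedTuple):
    saga_name: str
    ctlr_num: str
    tgt_num: str
    lun_num: str
    address: int
    act_val: int
    exp_val: int


def parse_dma_error(message):
    """Parse one DMA error message; numeric fields are read as hex."""
    parts = message.strip().split("/")
    if len(parts) != 7:
        raise ValueError(f"expected 7 fields, found {len(parts)}")
    saga, ctlr, tgt, lun, address, act_val, exp_val = parts
    return DmaError(saga, ctlr, tgt, lun,
                    int(address, 16), int(act_val, 16), int(exp_val, 16))


def differing_bits(error):
    """Return a mask with a 1 in every bit position that miscompared."""
    return error.act_val ^ error.exp_val


error = parse_dma_error("IOLF_A/ct0/idf/lu0/0004148200/a5a5a5a4/a5a5a5a5")
```

For the example message, differing_bits(error) is 0x1: only the least significant data bit miscompared.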
io3000 io3000 error codes Example of io3000 Symbios controller specific error message: IOLF_B/ct1/f804000010/ffffff01/00000001 Table 59 shows each io3000 Symbios controller specific error code. Table 59 io3000 Symbios controller specific errors Code Description 0x110 General failure detected on Symbios controller. 0x113 Error detected during SCRIPTS RAM pattern testing. 0x114 Interrupt test failed. The address is the address of the interrupt register.
io3000 DIODC driver errors io3000 Diagnostic I/O Dependent Code (DIODC) driver error codes post the following error message: SAGA_name/ctlr_num/tgt_num/lun_num/ctlr_status/dev_status Example of io3000 DIODC driver error message: IOLF_A/ct1/ct0/idf/lu0/81/0 Table 61 shows each io3000 DIODC controller specific error code. Table 61 io3000 DIODC controller specific errors Code Description 0x120 General controller error.
Notes on io3000 io3000 dumps trace data into Core Logic SRAM to aid in troubleshooting failures. A script called io_tr, provided with io3000 in the scripts directory (/spp/scripts at the time of this writing), displays this trace data. io_tr prints the version of io3000 from which it was built. If the versions do not match, there is no guarantee that the information presented will be correct.
9 mem3000 This chapter describes mem3000, a memory test for V2500 systems. mem3000 is a core logic flash-based memory diagnostic that verifies the functionality of the memory subsystem. mem3000 requires a node with a minimum of one processor and two memory boards. Excalibur W Memory Boards (EWMBs) must be installed in pairs in order for the test to properly execute.
mem3000 classes and subtests mem3000 verifies the V2500 memory subsystem using the Test Controller. mem3000 requires one node with a minimum of one processor with associated SPAC and two EWMBs with associated SMACs. mem3000 consists of a series of tests grouped together in classes, beginning with verification of the most basic functionality and progressing toward more complex functionality. Each class has several subtests that target specific functionality.
mem3000 mem3000 classes and subtests mem3000 subtests The mem3000 subtests are listed in Table 64 through Table 69.
mem3000 mem3000 classes and subtests Table 66 mem3000 class 3 subtests Subtest Table 67 Description 300 Verifies the memory lines on each DIMM can be written and read using coherent operations 310 Verifies the data portion of a memory line using an addressing pattern with coherent operations 311 Verifies the data portion of a memory line using a byte uniqueness pattern with coherent operations 330-338 Verifies the data portion of a memory line using the MarchC algorithm and different patterns wit
mem3000 mem3000 classes and subtests Subtest Table 69 Description 510 Verifies ECC double bit data errors are detected and logged using coherent operations 520 Verifies ECC double bit data errors are detected and logged using non-coherent operations 530 Verifies that ECC errors are ignored when disabled mem3000 class 6 subtests Subtest Description 600 Verifies the memory system detects and reports accesses to all illegal and/or invalid memory space 610 Verifies the memory system detects and r
mem3000 V2500 memory configurations V2500 memory configurations In the V2500 server, Excalibur Pluggable Memory Boards (EPMBs) are installed in 16 DIMM connectors on the EWMBs. A V2500 memory board is organized by quadrants, rows, and buses. Each memory board has four quadrants, four rows and eight buses. The following terms are used to describe a V2500 memory board, as shown in Figure 39: Slot The physical location into which DIMMs are installed.
mem3000 V2500 memory configurations Table 70 DIMM row/bus table Rows 0 Buses 0 1 2 3 4 5 6 7 Q0B0 Q0B1 Q0B2 Q0B3 Q1B4 Q1B5 Q1B6 Q1B7 Q2B0 Q2B1 Q2B2 Q2B3 Q3B4 Q3B5 Q3B6 Q3B7 1 2 3 V2500 DIMM quadrant designations Memory boards can be populated in increments of four DIMMs called quadrants.
mem3000 V2500 memory configurations Figure 39 V2500 DIMM locations Example: Q2B3: Quadrant 2, Bank 3 V2500 DIMM configuration rules Use the following rules to plan the memory board DIMM configuration: • All memory boards must be populated identically. • Single node memory boards may be populated in 1/4, 1/2, 3/4, or full increments. • Multi node memory boards may be populated in only 1/4, 1/2, or full increments. • All DIMMs within a quadrant must be of the same size: 32 Mbyte, 128 Mbyte or 256 Mbyte.
• DIMMs in quadrant 1 can be of a different size than DIMMs in quadrant 2 or 3 without degrading performance. • DIMMs in quadrants 0 and 1 should be the same size for maximum performance. • DIMMs in quadrants 2 and 3 should be the same size for maximum performance. • DIMMs in quadrant 0 can be of a different size than DIMMs in quadrant 1. To allow this memory to be fully utilized, the bus interleave span will be reduced to 4-way bus interleaving.
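Some of the configuration rules above can be checked mechanically when planning a board. The sketch below covers only the rules on DIMM size within a quadrant and on how many quadrants may be populated. It does not model which quadrants must be filled first or the performance recommendations, and the data layout (a list of four quadrants, each a list of DIMM sizes in Mbytes) and all names are assumptions made for illustration.

```python
# Sketch: check one memory board plan against a subset of the V2500
# DIMM configuration rules. Layout and names are hypothetical.

ALLOWED_SIZES_MB = {32, 128, 256}
# Populated quadrants allowed: 1/4, 1/2, 3/4, full (single node);
# 1/4, 1/2, full (multi node).
ALLOWED_QUADRANT_COUNTS = {"single": {1, 2, 3, 4}, "multi": {1, 2, 4}}
DIMMS_PER_QUADRANT = 4


def check_board(quadrants, node_type):
    """Return a list of rule violations; an empty list means none found."""
    problems = []
    populated = 0
    for number, dimms in enumerate(quadrants):
        if not dimms:
            continue  # empty quadrant
        populated += 1
        if len(dimms) != DIMMS_PER_QUADRANT:
            problems.append(f"quadrant {number}: expected "
                            f"{DIMMS_PER_QUADRANT} DIMMs, found {len(dimms)}")
        if len(set(dimms)) > 1:
            problems.append(f"quadrant {number}: mixed DIMM sizes "
                            f"{sorted(set(dimms))}")
        unsupported = sorted(set(dimms) - ALLOWED_SIZES_MB)
        if unsupported:
            problems.append(f"quadrant {number}: unsupported sizes "
                            f"{unsupported}")
    if populated not in ALLOWED_QUADRANT_COUNTS[node_type]:
        problems.append(f"{populated}/4 population is not allowed "
                        f"on a {node_type}-node board")
    return problems
```

Because all memory boards must be populated identically, a plan that passes for one board can be compared for equality against the plans for the other boards.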
mem3000 User parameters User parameters The Test Controller allows mem3000 20 user parameters. Table 73 defines these parameters: Table 73 User parameter definitions Words Usage 0/1 64-bit user pattern 0 used in subtests 238 and 338 (defaults=0xa5a5a5a5/0xa5a5a5a5) 2/3 64-bit user pattern 1 used in subtests 238 and 338 (defaults=0x5a5a5a5a/0x5a5a5a5a) 4 Denotes 88-bit DIMMs are installed (default=2) 5 Denotes test is to run with errors disabled (default=0) 6/7 Octant mask.
Figure 40 Format of parameter 6
0x  XX       XX       XX       XX
    Board 0  Board 1  Board 2  Board 3
Parameter 7 contains the masks for boards 4-7 in the order shown in Figure 41.
Figure 41 Format of parameter 7
0x  XX       XX       XX       XX
    Board 4  Board 5  Board 6  Board 7
As an example, the Octant Mask for board 0 is encoded in the first two digits of Parameter 6. Subtests 100, 101, 150, and 310-338 DO NOT use the Octant Mask. Subtests 100 and 101 test CSRs on all enabled SMACs.
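The byte layout of parameters 6 and 7 can be expressed as a small helper that returns the Octant Mask for a given board. This is a sketch that assumes 32-bit parameter words; the function name is hypothetical, and the meaning of the individual bits within each mask is not modeled.

```python
# Sketch: extract one board's Octant Mask byte from parameters 6 and 7.
# Board 0 occupies the first two hex digits (most significant byte) of
# parameter 6; boards 4-7 are laid out the same way in parameter 7.

def octant_mask(board, parameter6, parameter7):
    """Return the 8-bit Octant Mask for board 0-7."""
    if not 0 <= board <= 7:
        raise ValueError(f"board number out of range: {board}")
    word = parameter6 if board < 4 else parameter7
    shift = 8 * (3 - board % 4)  # board 0 or 4 is the leftmost byte
    return (word >> shift) & 0xFF
```

For example, with parameter 6 set to 0xff0f0000, board 0 has mask 0xff, board 1 has mask 0x0f, and boards 2 and 3 have mask 0x00.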
mem3000 mem3000 error codes mem3000 error codes When a failure is encountered, an event code is set along with an error message. The least significant 12 bits of the event code contain the error code. Table 74 lists the mem3000 error codes.
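Extracting the error code from an event code is a single mask operation, sketched below. The function name and the sample event code value are made up for illustration; only the rule that the error code occupies the least significant 12 bits comes from the text above. The same rule is stated for io3000 event codes.

```python
# Sketch: recover the 12-bit error code from an event code.

ERROR_CODE_MASK = 0xFFF  # least significant 12 bits


def error_code(event_code):
    """Return the error code carried in the low 12 bits of event_code."""
    return event_code & ERROR_CODE_MASK


# Hypothetical event code whose low 12 bits are 0x033.
sample = error_code(0x12345033)
```

Here sample is 0x033, which would be looked up in the error code table for the diagnostic that posted the event.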
mem3000 mem3000 error codes Code Meaning 033 SMAC did not log the occurrence of a single bit ECC failure 035 SMAC did not log the occurrence of a double bit ECC failure 040 Data miscompare error occurred in sequence #1 of MarchC test (upper 32-bits) 041 Data miscompare error occurred in sequence #1 of MarchC test (lower 32-bits) 042 Data miscompare error occurred in sequence #2 of MarchC test (upper 32-bits) 043 Data miscompare error occurred in sequence #2 of MarchC test (lower 32-bits) 044
mem3000 mem3000 error codes Code Meaning 0d0* Tag state did not equal INVALID as it should have 0e0* An unexpected error was detected in the SMAC error CSRs 100* Uninstalled Memory 110* Invalid CSR 120* Network Cache 130* Unprotected Memory 140* Alternate Interleave 150 An HPMC was detected on access to the specified address 200 Denotes the EWMB contains all 80-bit DIMMs 201 Denotes the EWMB contains all 88-bit DIMMs 202 Denotes the EWMB contains a mixture of 80-bit and 88-bit DIMMs
mem3000 mem3000 error codes Code Table 76 Meaning code+6 Error address CSR miscompare error (upper 32-bits) code+7 Error address CSR miscompare error (lower 32-bits) code+8 Error info CSR syndrome code miscompare error Patterns used in specified subtests Subtest Pattern 230/330 0x7f7f7f7f7f7f7f7f and 0x8080808080808080 231/331 0xbfbfbfbfbfbfbfbf and 0x4040404040404040 232/332 0xdfdfdfdfdfdfdfdf and 0x2020202020202020 233/333 0xefefefefefefefef and 0x1010101010101010 234/334 0xf7f7f7f7f7
mem3000 mem3000 error codes Figure 42 Type one error message format MBxx_M/BxSx/xxxxxxxxxx/xxxxxxxx/xxxxxxxx/xxxxxxxx Field 1 Field 2 Field 3 Field 4 Field 5 Field 6 There are six fields separated by / symbols.
mem3000 mem3000 error codes The two fields of the type two error are as follows: • Field 1—Specifies the EWMB to which the information pertains • Field 2—Specifies the type of DIMM detected as follows: • x—Non-existent DIMM • 0—80-bit DIMM • 1—88-bit DIMM The correspondence of these values to the actual DIMM locations is shown in Figure 44.
mem3000 Notes on mem3000 Notes on mem3000 There is a dependency upon POST to initialize the memory system. This test uses many of the CSR values from POST and does not reconfigure the system. There are some exceptions in which CSR values need to be changed in order for the test to run. In these cases, CSR values should be returned to their previous value upon successful completion of the subtest.
10 Scan test The Exemplar scan test (est) is a diagnostic utility that uses the system scan hardware making it possible to perform connectivity tests and to test gate array internal registers. The est utility runs on the teststation and sends scan instructions to a given node by way of the Ethernet.
Scan test est utility test environment est utility test environment est is started on the teststation and is located in /spp/bin/est. The user has the option of either starting up a user interface or having the est utility run a script. est works on one node at a time by sending scan instructions and data and receiving the results over the diagnostic ethernet connection. Since est has to communicate closely with the Utilities board, no other diagnostic can be run at the same time.
To perform ID and ring checks in the utility system, the user should turn off the power control feature either through the command line argument -p or through a runtime option command (power_control). The latter should seldom occur, because est automatically runs these tests on the utility scan path at start up and reports any errors found. est exit and reset To quit, est calls a script called est_exit.
Scan test Running the est GUI Running the est GUI The est GUI may be started at the command prompt. The following is the est command usage: /spp/bin/est [-option] node_number As an example to bring up the GUI and test node 0, enter the following command: % /spp/bin/est -x 0 Table 77 on page 200 provides a complete list of options. Figure 46 shows the est main window. Figure 46 est main window The main window has two sections. The upper section has two rows of buttons.
Scan test Running the est GUI The lower set of buttons allows the user to quickly and easily run the scan tests in a wholesale fashion. The test can be modified to run fewer patterns, to loop continuously or for a finite number of times, to test nondefault limits, etc. Each button is explained in the following sections. System Test button Clicking the System Test button runs each set of tests in the following order: ring tests, dc connectivity tests, ac connectivity tests, and gate array tests.
Files button Clicking the Files button opens a pop-up menu with three selections: • Execute Scripts—Runs a file containing est commands. • Reset Log File—Clears the log file. • Exit—Closes the est main window and exits the program. Options button Clicking the Options button opens a pop-up menu with seven selections: • Log_File—Generates a log file and stores it in /spp/data/est.log. • Stop On Error—Causes the test(s) to halt whenever an error is detected.
Clocks button Clicking the Clocks button opens a pop-up menu with four selections: • Upper—Sets the upper limit of the system clocks. • Nominal—Sets the system clocks to their nominal values. • External—Selects an external clock from the ECUB. • Status—Displays the current settings of the power supply voltages (upper, normal, or lower). When this option is invoked, it displays both the clock and power supply settings.
• Command Menu—Opens the command line window which allows the user to enter est commands directly from the GUI system. • Scan Debug Menu—Opens the debug window. • Connectivity Test Menu—Opens the connectivity test window. • Gate Array Test Menu—Opens the gate array test window. Gate array tests use test vectors that have been generated for certain arrays (each array has multiple files associated with it). • Sci Test Menu—Opens the SCI test window.
Scan test Running the est GUI Figure 48 est connectivity window To select a connectivity test, click on either the dc or ac button in the Connectivity Test panel. In the Pattern panel, clicking the All button runs each test pattern. est creates the patterns on the fly based on the number of testable wires in the system. The user can also select the starting and ending patterns by clicking the button next to the start field. Enter the appropriate data in the Start and End fields.
Scan test Running the est GUI Gate array test window The gate array test window provides a means to test all gate arrays in the Exemplar system. The window is simple to use. Figure 49 shows the est gate array test window. Figure 49 est gate array test window In the top panel, enter the following data in the appropriate fields: • Board—Sets the location of the gate array. • Type—Sets the type of gate array. • Refdes—Sets the reference designation of the gate array.
Scan test Running the est GUI The next lower panel determines which and how many patterns are used in the gate array test. The test normally uses all patterns, but, for troubleshooting, you may set the starting and ending patterns, set the maximum number of patterns (a range of patterns), or set a single, custom pattern. Enter the following test pattern information in the appropriate fields: • Start—Sets the starting pattern. • End—Sets the ending pattern. • Pattern—Sets a custom pattern.
Scan window The scan window provides a means of testing the system scan rings. Figure 50 shows the est scan window. NOTE For more information on scan rings and modes, see the IEEE 1149.1 JTAG specification. Figure 50 est scan window The window has three panels: Ring, Scan, and Pattern. Clicking the buttons in the Ring panel has the following effect: • All—Tests all available rings in the system.
Clicking the buttons in the Scan panel sets the scan paths. All scan modes can be selected or the test can be set up to test the individual pathways as follows: • All—Tests all scan modes. • Bypass—Tests the bypass ring. • ID—Tests the JTAG identification ring. • Boundary—Tests the ring boundary. • Internal—Tests the internal ring. In the Pattern panel, clicking the All button causes the test to use all available patterns.
SCI cable test window The SCI cable test window provides a means to test the cables that connect the scalable coherent interfaces between nodes. All cables are tested by default, but an individual cable can be tested using this window. Figure 51 shows the est SCI cable test window. Figure 51 est SCI cable test window In the top panel are two rows of fields and buttons that determine the source port (Driver) of the cable and the destination (Receiver).
Help Clicking the Help button opens a pop-up menu with five topic selections: • Overview • Commands • GUI • Input Files • Options Clicking on one of these options opens the Help window shown in Figure 52. This window is initially blank. To open the topic of interest, click the Browser button. This opens the Help browser window shown in Figure 53. Double-click on a topic listed in the browser.
Figure 52 est Help window
Figure 53 est Help browser window
Running est from command line The following is the command line usage for est: est [-options] For example, to test node 0, enter: % est 0 est reads configuration information from files stored in /spp/data (e.g., node_0.cfg). These configuration files are automatically generated by ccmd each time the system is powered up. While ccmd is running, it prints its status to the console window.
Option Description -P Do not let est handle the MIB power control -U UTS support option -Y Force est_config to be run -Z Force est_config not to be run
Some examples of est usage are:
est -v
est -l -f my_script 0
est -o ./my_log_file 0
The est utility uses certain data and vector files located in the /spp/est directory. Unless disabled or redirected, the est utility will generate a log file, est.log, and store it in /spp/data/est.log.
Scan test Running est from command line Example of output when est is started: % est 0 Excalibur Scan Test 1.0.0.2 ......................... ..... General EST c ... r ... d ... a ... g [options] 1998/11/25 10:32:58 Steven Terry Tests: compare id’s to config file scan ring test board level dc tests board level ac tests [file] ... gate array tests Special Scan Tests: b ... bypass/id test i ... print id’s found in design EST Options: F ... set option & debug flags q ... quit nicely, ask first qq ...
Scan test Running est from command line Example output when using the est -h option: % est -h Excalibur Scan Test 1.0.0.2 1998/11/25 10:32:58 Steven Terry usage: est [-options] [server] node [-cp port] [-sp port] options: -h ... print this help message -v ... print the version of the program and exit -l ... turn OFF log file for this session -f ... get commands from -o ... redirect log file to -x -y -C -V ... ... ... ...
Table 78 AC Connectivity test options Option Description -s Step mode (for debug purposes). -p Run pattern number only. Bypass test The Bypass test format is: b The Bypass test places the scan ring hardware into bypass mode. DC Connectivity test The DC Connectivity test format is: d [-s -p #] Table 79 shows the options for this test. Table 79 DC Connectivity test options Option Description -s Step mode (for debug purposes).
Scan test Running est from command line Table 80 Gate Array test options Option Description -r Test arrays with matching reference designator value. -b Test arrays on given board. may either be a number or a name. -j Test arrays matching a jtag_id. -t Test an array type (For example, ERAC). -s Start with a given pattern number. -e End on a certain pattern number. -m Run a maximum of patterns per file.
Scan test Running est from command line When an error occurs, parallel scans into the scan hardware may result in bus conflicts on TDO pins. Therefore, est automatically stops using parallel scans when errors happen.
Scan test Running est from command line SCI test The sci utility tests the Coherent Toroidal Interface (CTI) cables between nodes. The term SCI (Scalable Coherent Interface) is often used in place of the term CTI; the terms are interchangeable. The usage of sci is as follows: sci [driver] [receiver] ring test where: [driver] Refers to the node and memory board to which the CTI cable is connected and from which the test data originates.
Scan test Running est from command line SCI_all test The sci_all utility tests all SCI cables in a complex. The usage of sci_all is as follows: sci_all [test] where: test Refers to the specific test: dc, dc_clk, ac. With the dc test, the clock from the receiver node is used. The dc_clk test derives its clock from the cable.
• -c high—Displays the upper clock limit. • -p 1 nom—Sets the supply 1 margin to nominal. There are four power supplies, 1 through 4. Table 81 shows the valid values for clock and power.
Table 81 Valid values for clock and power supplies
Clock         Power
up or high    up or high
nom           nom
ext           low
est miscellaneous commands This section describes useful commands that are entered at the est prompt: • ms—Puts all the scan hardware into a safe state.
Scan test Running est from command line Table 82 est runtime option commands Command Description Default argument log_file Turn on/off writing to the log file. On stop_on_error Stops the test when an error is detected. On limit_patterns Runs a limited set of patterns when testing arrays. This runs faster, but reduces coverage. Off limit_errors Limits to 10 the max number of errors that will get printed. The total error count is still printed.
est command flags and options There are a number of flags or options that operate on and enhance the est commands. Some of these flags and options perform the same functions as the runtime option commands. To set these options, enter F at the est prompt. This invokes the flags submenu. To exit, press return at the flags prompt. This returns to the main est prompt.
An example file might contain the following lines:
# check the rings
r
# show pattern pass/fail steps
FP
#limit dc testing to 3 patterns
FD3
#do dc testing
d
q
11 Utilities This chapter details most of the diagnostic utilities, which include: • address_decode • arrm • consolebar • dcm • dfdutil • dump_rdrs • fwcp • fw_init • get_node_info • hard_logger • lcd • load_eprom • pim_dumper • set_complex • soft_decode • sppconsole • tc_init • tc_ioutil • tc_show_struct • Version utilities • Event processing • Miscellaneous tools
address_decode address_decode decodes a 40-bit virtual address into the physical node, smac, row, bus, and bank. It has the following format: address_decode <40-bit address in hex> In order to determine the current memory configuration, address_decode invokes some sppdsh commands to read certain CSR values so that it can take into account the board mapping, row mapping, interleave values, and DIMM sizes present in the system.
AutoRaid recovery map (arrm) The arrm utility is used only with an AR-12H (C5447A) disk array that displays the status "No address table" on the front panel rather than the usual status of "Ready." It is only intended for use by trained service personnel in this specific situation. Starting arrm To run arrm, enter the following command: tc_ioutil 0 arrm.fw This script downloads and executes arrm.
Utilities AutoRaid recovery map (arrm) 0/1/0.5.0 If the EPIC number is outside of the range 0 to 7, the slot number is outside of the range 0 to 2, or the target number is outside of the range 0 to 15, an error message is displayed and the operator prompted to reenter the address. The program then tries to open the path to the array and perform checks of its internal state.
Example of unsuccessful recovery message Utility Compatibility Check Unsuccessful. The Product firmware may not support RECOVER! Do you want to attempt recover anyway ([y]/n)? In all cases of this type, respond with a y, Y, n, or N followed by ENTER or just ENTER. The default is the choice enclosed in the square brackets (i.e., [y]), and just pressing ENTER is equivalent to entering the letter enclosed in the square brackets followed by ENTER.
Utilities AutoRaid recovery map (arrm) where xx is a number between 0 and 100. This message indicates the percentage of the volume set that has been recovered and is updated approximately once per second. The recovery operation can take several minutes depending on the amount of data in the volume set. To exit the recovery process, press the ENTER key. NOTE Do not exit the recovery process unless the progress indication hangs and does not increment within one or two minutes.
Utilities consolebar consolebar The consolebar utility is an X application that provides a simple interface capable of starting console windows to all V2500 nodes configured on the teststation. It has the following format: consolebar [-display displayname] consolebar retrieves the list of configured nodes and displays the node IDs, grouped by complex. When the push-button for a node is pressed, an xterm is started and the sppconsole program is run against the specified node.
dcm dcm dumps the boot configuration map information for the specified node. There are two main reporting modes: one for general hardware configuration and one for the DIMM type. The general hardware mode reports processors, ASICs, and memory size information. The DIMM type mode provides pass/fail tests for specific DIMM types, and a general DIMM type report option. dcm uses the following format: dcm [-d <80|88|all>] ... -d 80 checks to see if only 80-bit DIMMs are installed.
Utilities dcm Output table using dcm Acquiring Boot Configuration Map... Stingray Configuration Map Dump: Node: 0 (hw2a-0000) ============================================================= VERSION: 1.0 compiled: 1998/12/16 18:35:00 CheckSum:0xf407a073 Boot Config Map Size:164 words POST Revision:1.0 CPUs (Rev, ICache, DCache Size in MegaBytes) ============================================ PB0L_A PASS (2.0, 0.50, 1.00) PB0L_B EMPTY PB0R_A EMPTY PB0R_B EMPTY PB1R_A PASS (2.0, 0.50, 1.
Utilities dcm MB5L_T - EMPTY MB6R_T - EMPTY MB7R_T - EMPTY Memory: ======= Physical: L=128MB, M=64MB, S=16MB Logical: l=128MB, m=64MB, s=16MB (If logical memory not specified, then it matches physical memory size) * = Software Deconfigured EWMB0: ====== EWMB0: EWMB0: EWMB0: EWMB0: EWMB1: ====== EWMB1: EWMB1: EWMB1: EWMB1: - = Not In Use Q0B0 Q0B1 Q0B2 Q0B3 S/S S/S S/S S/S Q1B4 Q1B5 Q1B6 Q1B7 -/-/-/-/- Q2B0 Q2B1 Q2B2 Q2B3 -/-/-/-/- Q3B4 Q3B5 Q3B6 Q3B7 -/-/-/-/- Q0B0 Q0B1 Q0B2 Q0B3 S/S S/S S/S S/S
Utilities dcm Output table using dcm -d all Stingray Configuration Map DIMM Info: Node: 0(hw2b-0000) ============================================================= VERSION: 0.8.0.1 compiled: 1998/10/23 14:34:01 Memory Type: ============ Physical: 88=Multi node 88-bit DIMM, 80=Single node 80-bit DIMM (Only physical DIMM type is reported.
Utilities dfdutil dfdutil dfdutil is a standalone offline utility that downloads firmware to SCSI devices including disks, arrays, and fibrechannel devices such as SCSI MUX and fibrechannel arrays. The firmware image(s) are contained in a Logical Interchange Format (LIF) volume on the teststation at /spp/firmware/DFDUTIL.LIF. The raw (usually binary) firmware image of one or more devices is contained in the LIF filesystem.
Utilities dfdutil Example of dfdutil output when loading Loading file dfdutil.fw ................................... ............................................ .......................................................... .......................................................... ............................. dfdutil.fw copied successfully, booting ****************************************************************************** *** DFDUTIL *** *** DFDUTIL *** *** *** *** (C) Copyright Hewlett-Packard Co.
Utilities dfdutil Example of dfdutil output (continued//0 Indx ---0 1 1.0 2 2.0 2.1 2.2 2.3 2.4 2.5 2.6 | 3 Path ------------------5/0.8.0.255.7.12.0 5/0.8.0.124.0.14.0 ^array^ 5/0.8.0.124.1.5.0 ^array^ ^array^ ^array^ ^array^ ^array^ ^array^ ^array^ 4/2:0.3.
Utilities dfdutil • b—slot number • c—path level (always 0) • d—always 8 for FC storage • e—upper 4 bits of loop address • f—lower 4 bits of loop address • g—LUN number If the device is attached to an FC MUX, the path is formatted as a/b.c.d.e.f.g.h.
dfdutil LIF file table
The descriptions of the fields in the LIF file table are as follows:
• Filename—Specifies the name of the file in the LIF volume. The operator specifies this name when issuing download commands to the devices.
• Intended Product ID—Specifies the vendor name and device product name. These fields are set up when the raw firmware is packaged for distribution. They may or may not exactly match the product and vendor IDs reported by the device INQUIRY data.
Utilities dfdutil DOWNLOAD command Use the DOWNLOAD command to download firmware to a particular device. DOWNLOAD transfers the contents of a particular firmware file to a device. It prompts the user for any arguments that were not specified on the command line. NOTE Once the download begins, do not interrupt the process, or the devices to which the firmware is being loaded could be rendered useless.
Utilities dfdutil DISPMAP The user may enter the index number of a single device; using no index number causes DISPMAP to list all devices. This command will display the bootable device table displayed when dfdutil is started. If the optional argument [index] is specified, then only the information for the given index number will be displayed, not the entire table. This display may not reflect any downloads that may have been done since the program was started.
Utilities dfdutil DISPFILES command The DISPFILES command displays a list of all available firmware files found on a LIF device. The command displays: • File name • Intended product identification • New revision number • Size of firmware (not file size) The syntax for this command is: DISPFILES The user may enter the index number of a single device; using no index number causes DISPFILES to list all devices. LS command The LS command displays information about the LIF volume.
Utilities dfdutil Entering HELP without a command name displays a list of all available dfdutil commands. Entering the specific command name after HELP outputs specific information about the command. Notes and cautions about dfdutil This section presents some limitations and cautions concerning dfdutil. Backup before downloads Some firmware downloads may affect formatting resulting in the loss of some or all the data on the disk. CAUTION Back up all disks before loading firmware onto them.
Shared SCSI buses
If dfdutil is running on a system that shares any of its SCSI buses with another system or systems, the other system or systems must be halted while this program is running. This program cannot determine that a bus is shared, so the operator must determine whether any bus is shared and halt the other computer(s).
Utilities dump_rdrs dump_rdrs The dump_rdrs utility automatically resets the specified node and directs it to boot the RDR dumper firmware module. Once it detects that the RDR dumper firmware has completed, it scans out the results and places a formatted RDR dump of each processor in /spp/data// nodeX.cpuY.rdrs. X is the node number specified and Y is a processor number from 0 - 31.
Utilities fwcp fwcp fwcp is an OBP command that upgrades system firmware. A single firmware package may be loaded by the following command: % fwcp To load all system firmware packages, use the following master download script: source /core@f0,f0000000/ lan@0,d30000;15.99.111.99:/spp/scripts/dl-diags The master download script output is shown below: v-c-t:/spp/firmware$ cat /spp/scripts/dl-diags fwcp 15.99.111.99:/spp/firmware/pdcfl.fw PDCFL fwcp 15.99.111.99:/spp/firmware/post.fw POST fwcp 15.
fw_init
fw_init provides an automatic means for downloading firmware to each node and initializing certain data structures in NVRAM. Using this script prevents problems that could occur when executing this procedure manually. The format of fw_init is as follows:
fw_init [-c complex name]
-c complex name specifies the complex to update. For example, fw_init updates all nodes in the current complex, and fw_init -c hw2a updates all nodes in the complex hw2a.
Utilities fw_init fw_init message example 3 Loading Diagnostic LIF header on "hw2a-0000". fw_init message example 4 Loading JTAG firmware on "hw2a-0000". fw_init message example 5 The "hw2a" complex will now be reset to OBP. Please wait fw_init message example 6 Saving NVRAM contents and beginning firmware download via OBP. fw_init message example 7 Now clearing NVRAM and resetting the system again. Please wait. fw_init message example 8 Now restoring NVRAM. Please wait.
get_node_info
The get_node_info utility provides a mechanism for scripts or programs to access the teststation configuration information generated by the ts_config configuration tool. It has the following format:
get_node_info [node_info] [OPTIONS]
When a V2500 node is configured by ts_config, an entry is added to a node configuration file.
[OPTIONS] include the following:
• -a—Display all fields (default)
• -A—Display all configured nodes
(The selected fields are printed in the order below.)
• -c—Display the Complex name
• -n—Display the Node id
• -m—Display the Diagnostic IP hostname
• -o—Display the OBP IP hostname
• -t—Display the Test Station Diagnostic hostname
• -s—Display the console name
The following are examples of the get_node_info utility:
Example showing the return of all information about Node Id 0: jok
hard_logger
hard_logger is a script that invokes the interrogators and extractors to log all error information on a node. The usage of the script is:
hard_logger [node number]
[node number] is a hex number. hard_logger resides in /spp/scripts/hard_logger and is automatically invoked by ccmd when a hard error occurs. The hard_logger script performs the following tasks:
• Parses the command line arguments to determine on which node it should run.
To interrogate the controllers, hard_logger calls the ASIC-specific interrogator located in /spp/scripts/. For example, the SMAC interrogator is located in /spp/scripts/smac. The interrogator returns a list of extractors to run on that ASIC in /spp/data//hl/inter_n$node.
• Runs each extractor returned by the interrogator.
• Sends the COP, PCE, interrogator, and extractor output to event_logger.
lcd
lcd prints the current contents of the liquid crystal display for node 0 of the current complex. It has the following format:
lcd
The complex can be changed by using the set_complex utility. The output is sent to stdout.
Utilities load_eprom load_eprom The load_eprom utility resides on the teststation. It downloads the core firmware products into the EEPROM on the Utilities board through the scan interface. It can also update the JTAG scan interface controller firmware. If, during a download, it detects any errors, it automatically retries the download. The load_eprom utility uses subroutines that perform the following functions: • It reads a raw binary file on the teststation.
Table 83 load_eprom options
Option       Description
-Q           Quiet (no output) mode.
-R           Read and verify data only; no writing.
-P number    SPAC to use for scan operations, where number is 0-7; 8 is UBUS.
-V           Verify data after a write.
-j           Load binary into JTAG flash.
-c           Load binary into JTAG_CORE flash.
-e           Load binary into PDC Entry section.
-p           Load binary into PDC POST section.
-o           Load binary into PDC OBP section.
Utilities load_eprom Example output of load_eprom -n hw2a-0000 -p entry.pdc command Reading file “entry.pdc”: 4253 (0x109d) bytes read. Using default SPAC (P0L). Erasing sector 0 (0xf0000000) OK Writing sector 0 (0xf0000000) .. OK Example output of load_eprom -n hw2a-0000 -p post.fw command Reading file “post.fw”: 92820 (0x16a94) bytes read. Using default SPAC (P0L). Erasing sector 4 (0xf0020000) OK Writing sector 4 (0xf0020000) ....................... OK Example output of load_eprom -n hw2a-0000 -o obp.
pim_dumper
pim_dumper is a utility used to display Processor Internal Memory (PIM) information after a TOC, LPMC, or HPMC. The PIM dump information includes the processor registers and various ASIC registers. It has the following format:
pim_dumper [-c CPU#] [-n NODE_PARM] [-t][-l][-h] [-e][-help]
Example of pim_dumper use:
pim_dumper -h -c 2
This example displays HPMC information for Processor 2 on Node 0.
The TOC/LPMC/HPMC options are mutually exclusive: specify at most one of them. If none is specified, the default mode dumps all TOC/LPMC/HPMC data. If pim_dumper is able to accomplish the desired action, it returns zero. If for any reason the requested operation cannot be completed, it returns a nonzero exit code.
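The option rules above can be illustrated with a small Python sketch. This is not the pim_dumper implementation; it only models the documented behavior (one dump type at most, all three by default, nonzero exit on a bad request). The mapping of -t to TOC and -l to LPMC is an assumption based on the option letters; only -h for HPMC is confirmed by the example in this guide. The function name parse_pim_args is hypothetical, and the -n, -e, and -help options are omitted.

```python
import argparse


def parse_pim_args(argv):
    """Model the documented pim_dumper option rules (illustrative only)."""
    # add_help=False frees -h so it can select HPMC, as in the guide's example.
    parser = argparse.ArgumentParser(prog="pim_dumper_sketch", add_help=False)
    parser.add_argument("-c", dest="cpu", type=int, default=None)

    # At most one dump type may be given on the command line.
    group = parser.add_mutually_exclusive_group()
    group.add_argument("-t", dest="mode", action="store_const", const="TOC")   # assumed mapping
    group.add_argument("-l", dest="mode", action="store_const", const="LPMC")  # assumed mapping
    group.add_argument("-h", dest="mode", action="store_const", const="HPMC")

    args = parser.parse_args(argv)  # exits with a nonzero status on conflicting options

    # With no dump type selected, the default is to dump all three.
    modes = [args.mode] if args.mode else ["TOC", "LPMC", "HPMC"]
    return args.cpu, modes


if __name__ == "__main__":
    print(parse_pim_args(["-h", "-c", "2"]))
    print(parse_pim_args([]))
```

Passing two dump-type options together makes argparse raise SystemExit with a nonzero code, which mirrors the nonzero exit status described above.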
set_complex
The set_complex utility sets the default V2500 Complex Name in the current shell environment.
set_complex [COMPLEX_NAME]
Once set, teststation diagnostic or console utilities that are run from within the shell operate on the specified complex. If multiple complexes are configured on a single teststation, individual shells can each be set to a specific default complex using set_complex.
Utilities set_complex set_complex can be invoked anytime the user wants to change the shell default complex. If the user enters an invalid COMPLEX_NAME, the default complex becomes unset and the prompt string indicates this condition. If the user does not enter a COMPLEX_NAME, the complex name remains set (assuming it is still a valid complex). set_complex does not work from within a shell script.
soft_decode
soft_decode is a Perl script that decodes single-bit ECC error data. It prompts for syndrome, row, and address information that is parsed, decoded, and displayed in an easy-to-read format that can be cut-and-pasted into quasar. To exit, enter q.
sppconsole
sppconsole connects the user to the console for a specified node. sppconsole has the following format:
% sppconsole node [opt1, ..., optN]
There are several ways to initiate the sppconsole interface.
• Run the sppconsole command in a shell on the teststation.
• Select from the teststation root menu the desired V2500 complex, then select “Console” and the desired node.
• Use the consolebar utility to select the desired node.
Utilities sppconsole Example of sppconsole boot output joker-t(hw2b)% sppconsole [enter `^Ec?' for help] [no, sppuser@joker-t is attached] [replay] POST Hard Boot on [0:PB0L_A] HP9000/V2500 POST Revision 1.0.0.1, compiled 1998/12/03 09:50:10 (#0039) Probing CPUs: PB0L_A PB1R_A PB2L_A PB3R_A PB4L_A PB5R_A PB6L_A PB7R_A Completing core logic SRAM initialization. Starting main memory initialization.
Utilities sppconsole Example of OBP output while booting OBP Power-On Boot on [0:0] ------------------------------------------------------------------------------PDC Firmware Version Information PDC_ENTRY version 4.1.0.9 POST Revision: 1.0.0.1 OBP Fieldtest Release 4.1.0.9, compiled 98/10/30 14:11:20 (3) SPP_PDC Fieldtest 1.4.0.
The following message appears in the console window:
[0:1] ok [read-only -- use `^Ecf' to attach, `^Ec?' for help]
Attach to the node by entering Ctrl-e c f. Press the Ctrl and e keys simultaneously, then release them and press c followed by f; do not hold the Ctrl key while pressing c and f. All information and error messages are logged into the /usr/adm/syslog system error log file.
tc_init
tc_init determines the node ID, ethernet address, and IP address for all nodes in the complex. This information is then stored in the NVRAM of all nodes as one 12-byte entry per node. Each 12-byte entry has the format shown in Figure 54:
Figure 54 tc_init NVRAM entry
  7-bit node ID
  Upper 16 bits of ethernet address
  Lower 32 bits of ethernet address
  32-bit IP address
In addition, tc_init updates the ARP entries on the teststation by executing as root.
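As a rough illustration of the 12-byte entry in Figure 54, the following Python sketch packs and unpacks one entry. The guide does not give the exact bit layout, so several details here are assumptions: the 7-bit node ID is assumed to occupy a 16-bit field (which brings the total to 12 bytes), the byte order is assumed to be big-endian, and the function names, ethernet address, and IP address are hypothetical.

```python
import ipaddress
import struct

# Assumed layout: 16-bit field holding the 7-bit node ID, upper 16 bits of the
# ethernet address, lower 32 bits of the ethernet address, 32-bit IP address.
ENTRY_FORMAT = ">HHII"  # big-endian is an assumption; 2 + 2 + 4 + 4 = 12 bytes


def pack_entry(node_id, mac, ip):
    """Build one 12-byte entry from a node ID, a MAC string, and an IPv4 string."""
    if not 0 <= node_id < 128:
        raise ValueError("node ID must fit in 7 bits")
    mac_int = int(mac.replace(":", ""), 16)
    if mac_int >= 1 << 48:
        raise ValueError("ethernet address must fit in 48 bits")
    upper16 = mac_int >> 32
    lower32 = mac_int & 0xFFFFFFFF
    return struct.pack(ENTRY_FORMAT, node_id, upper16, lower32,
                       int(ipaddress.IPv4Address(ip)))


def unpack_entry(raw):
    """Split a 12-byte entry back into (node ID, MAC string, IPv4 string)."""
    node_id, upper16, lower32, ip_int = struct.unpack(ENTRY_FORMAT, raw)
    mac_int = (upper16 << 32) | lower32
    mac = ":".join(f"{(mac_int >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))
    return node_id & 0x7F, mac, str(ipaddress.IPv4Address(ip_int))


if __name__ == "__main__":
    entry = pack_entry(0, "08:00:09:12:34:56", "15.99.111.116")  # made-up values
    print(len(entry), unpack_entry(entry))
```

Splitting the 48-bit ethernet address into a 16-bit and a 32-bit part follows the figure directly; everything else about the encoding should be checked against the actual NVRAM contents.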
Utilities tc_init Execute tc_init after the node has been configured by jfnode_ip_set and xconfig. ccmd must finish the scan database generation. Once ccmd executes, the changes become effective the next time test_controller is running. If ccmd is running when tc_init is executed then test_controller must be restarted. tc_init only needs to be executed once. The following are the only reasons for having to rerun this utility: • NVRAM is corrupted.
tc_ioutil
tc_ioutil resets the node and requests that the Test Controller load (via tftp) and boot the specified file.
Utilities tc_show_struct tc_show_struct The tc_show_struct tool examines certain structures that the test controller uses to set up and run tests.
Utilities tc_show_struct The tc_cpu_info_struct structure displays the status or state of each processor and the current subtest. The tc_show_struct tool takes two arguments: the first is the test of interest, the second is the node of interest.
Utilities tc_show_struct 104) 0x00000000 105) 0x00000000 106) 0x00000000 107) 0x00000000 108) 0x00000000 109) 0x00000000 110) 0x00000000 111) 0x00000000 112) 0x00000000 113) 0x00000000 114) 0x00000000 115) 0x00000000 116) 0x00000000 117) 0x00000000 118) 0x00000000 119) 0x00000000 120) 0x00000000 121) 0x00000000 122) 0x00000000 123) 0x00000000 124) 0x00000000 125) 0x00000000 126) 0x00000000 127) 0x00000000 ----------------------------------------------------------------------CPU Mask = 0x0000 SPAC Mask = 0x0
Utilities Version utilities Version utilities This section describes the three version utilities. diag_version The diag_version utility displays the product name and the version of the current teststation software. For example: $ diag_version HP9000/V2500 Diagnostics, Version 1.0.0.0 flash_info flash_info reads the known entry points for the various products that are stored in flash EEPROM.
Utilities Version utilities ver ver is a teststation version retriever utility. It is used to read and display the version information built into each diagnostic product. Its usage is: ver ver searches the specified file for a version string previously compiled or inserted into the file and extracts and displays a version and date stamp. This works for most teststation utilities and diagnostics firmware.
Utilities Event processing Event processing This section discusses three event processing utilities: • event_logger • log_event event_logger The event_logger utility is the teststation Event Logger and has a format as follows: event_logger [-d] event_logger receives messages from diagnostic utilities through rpc calls and writes them to the event log for later review or processing. The -d option keeps event_logger from running as a daemon which is useful for debugging.
Utilities Event processing event_logger should never terminate, but must be killed. If a second copy of event_logger is started it attempts to kill the existing copy of the event_logger. There should only be one copy of event_logger running at any one time. The following return code indicates a fatal error occurred. -1 unknown option log_event log_event logs its STDIN to the event log as a single event.
Utilities Event processing The -c option displays event information output on the console as well. If the event severity is high enough, this happens automatically. event_logger displays any events that have a severity greater than the warning level. The following two examples show how log_event can be used: cat data_file | log_event 0x86340001 -n 0 This example puts an event in the event log with the event code of 0x86340001. The data will be the information contained in the file data_file.
Utilities Miscellaneous tools Miscellaneous tools The following miscellaneous tools are described in this section: • kill_by_name • fix_boot_sector kill_by_name The kill_by_name script kills processes by name rather than by process identification. The following is the usage of this script: kill_by_name Table 85 describes the options in kill_by_name. Table 85 kill_by_name options Option Description file name Process name to kill.
12 Scan tools
This chapter details most of the scan tools, which include:
• sppdsh
• do_reset
• jf-node_info
• jf-ccmd_info
• jf-reserve_info
sppdsh
sppdsh is an enhanced version of the Korn Shell (ksh) with all of the functionality of ksh, as well as new commands that are suited to a diagnostic environment. sppdsh resides on the teststation in /spp/bin/sppdsh. The diagnostic shell runs on a teststation that is totally independent of the system itself. The shell requires information about the complexes and nodes attached to the teststation.
Definitions
The following definitions will help the user with the operation of sppdsh:
• node id—An identification (ID) that can be either the node IP name or a node number. To distinguish between one node number and another, the environment variable COMPLEX_NAME indicates the complex. Node numbers must be unique within a complex.
• complex name—Identifies a grouping of nodes. The ts_config utility groups nodes into complexes where each node shares the same OS and memory space.
Table 86 sppdsh parameters
Parameter                             Value
Unknown                               0xff
Reserved                              0x00
Pass                                  0x01
Fail                                  0x10
Deconfigured by POST                  0x20
Empty                                 0x30
Deconfigured by software              0x40 a
16MB deconfigured                     0x04
16MB 88-bit deconfigured to 80        0x24
16MB 88-bit deconfigured              0x34
16MB SW deconfigured                  0x44
16MB 88-bit SW deconfigured to 80     0x64
16MB 88-bit SW deconfigured           0x74
64MB deconfigured                     0x08
64MB 88-bit deconfigured to 80        0x28
64MB 88-bit deconfigured              0x38
64MB SW deconfigured                  0x
128MB 88-bit SW deconfigured to 80                0x6c
128MB 88-bit SW deconfigured                      0x7c
64MB deconfigured to 16MB                         0x89
64MB deconfigured to 16MB (88-bit to 80)          0xa9
64MB deconfigured to 16MB (88-bit)                0xb9
SW deconfigured 64MB to 16MB                      0xc9
SW deconfigured 64MB to 16MB (88-bit to 80)       0xe9
SW deconfigured 64MB to 16MB (88-bit)             0xf9
128MB deconfigured to 16MB                        0x8d
128MB deconfigured to 16MB (88-bit to 80)         0xad
128MB deconfigured to 16MB (88-bit)               0xbd
SW deconfigured
Scan tools sppdsh • backplane_serial_number—Identifies a specific board on the diagnostic network. This number may be read with the COP command. It is used to assign new node numbers or complex serial numbers. • complex_serial_number—Identifies all the nodes in a complex. Software licensing is often based on the complex serial number. • key—A 32-bit hexadecimal number used as an encryption code for complex serial numbers. • cop_id—A name associated with a board in a node. Table 87 lists valid cop IDs.
ID      Description
pb7l    A processor board on the left side of the cabinet
pb7r    A processor board on the right side of the cabinet
mb0l    A memory board on the left side of the cabinet
mb1l    A memory board on the left side of the cabinet
mb2r    A memory board on the right side of the cabinet
mb3r    A memory board on the right side of the cabinet
mb4l    A memory board on the left side of the cabinet
mb5l    A memory board on the left side of the cabinet
mb6r    A memory board on the right s
Scan tools sppdsh • memory size—An argument used to deconfigure larger amounts of memory across all memory boards on a node. • net cache size—Refers to the memory shared between nodes in each node’s network cache. The network cache should be the same across all nodes in a complex. Miscellaneous commands sppdsh miscellaneous commands are described below: • assert —Assert reset on node_id; a deassert must follow. • assert_soft —Asserts a soft reset on node id.
• power supply[1..4] [low|nom|up]—Changes the power margin on the supply indicated across all nodes in contact with the test station.
• pswitch —Identifies whether N or N+1 fans have been enabled for the system. This switch is located on the SCUB board of a node.
NOTE For clarity, a 0x0 style notation is returned by the shell rather than the 16#0 notation of ksh. The 16#0 notation is acceptable for data that can be expressed in 32 bits or less.
• list | node id>::::—Lists the possible paths, parts or fields that match the argument. Common wildcard symbols are supported by this command to help identify field names.
Scan tools sppdsh • bput [-q] : —Inserts data into the locked scan ring image. When the -q option is used, the results are displayed without the scan field name. • bunlock n::—Writes the scan ring image and unlocks it. • packet [-q] [NR | R=number] [P=number] [6=number] node8_0 —Requests input to a xbar device from SPAC 0 on node 8. The request waits for a response and returns it. The -q option suppresses some output.
• ecc_cpy [size]—Copies the data into the ECC associated with the cache line of address and repeats for size cache lines.
Data conversion commands
Data conversion commands manipulate, evaluate or interpret data within the diagnostic shell. They support a variety of logical, arithmetic and string based operations.
l_sub —Left subtract two data arguments. For example: abc=`l_sub 0x55 0x1`
l_mod —Left modulo two data arguments. For example: abc=`l_mod 0x55 0x1`
l_mult —Left multiply two data arguments. For example: abc=`l_mult 0x55 0x1`
b2h —Convert a binary number to hex (abc = 0xb). This command is limited to 32-bit data types. For example: abc=`b2h 1011`
h2b —Convert a hex number to binary (abc = 1011).
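The binary and hex conversions described for b2h and h2b can be mirrored in a few lines of Python, which may help when checking values away from the teststation. This is only a sketch of the arithmetic, not the sppdsh implementation. The function names are reused here for readability, and raising an error on values wider than 32 bits is an assumption about how the stated 32-bit limit on b2h would surface.

```python
MAX_32_BIT = 0xFFFFFFFF


def b2h(bits):
    """Convert a binary string such as '1011' to a 0x-style hex string."""
    value = int(bits, 2)
    if value > MAX_32_BIT:
        # Assumed behavior: the guide only says b2h is limited to 32-bit data.
        raise ValueError("b2h is limited to 32-bit data")
    return hex(value)


def h2b(hex_text):
    """Convert a hex string such as '0xb' to a binary string without a prefix."""
    return bin(int(hex_text, 16))[2:]


if __name__ == "__main__":
    print(b2h("1011"))  # matches the guide's example result, 0xb
    print(h2b("0xb"))   # matches the guide's example result, 1011
```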
node —Set default node to be node_number in the current complex.
fi_node—Find all available nodes in the current complex.
fi_cpu [-v] [-q] —Find all available processors of node_number in the current complex.
fi_emb [-v] [-q] —Find all available EMBs of node_number in the complex.
fi_sci [-v] [-q] —Find all available SCIs of node_number in the current complex.
I/O buffering commands
This section presents a list of the sppdsh I/O buffering commands. For these commands, four default buffers are created: buf1 - buf4.
buf_cmp buf1 buf2—Compares two buffers. Null is returned if they are the same. If they are different, the index and data of the first conflict are returned.
buf_cpy buf1 buf2—Copy buf1 to buf2.
buf_clear buf—Clear buf.
seed [get|set 0xseed_value]—Set or get a seed value.
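The comparison rule described for buf_cmp (nothing returned when the buffers match, otherwise the index and data of the first conflict) can be sketched in Python as follows. This is an illustration of the rule only. Returning the data from both buffers and rejecting buffers of different lengths are assumptions, since the guide does not specify either detail.

```python
def buf_cmp(buf1, buf2):
    """Return None if the buffers match, else (index, value1, value2) of the first mismatch."""
    if len(buf1) != len(buf2):
        # Assumed behavior: this sketch only compares equally sized buffers.
        raise ValueError("buffers must be the same length")
    for index, (left, right) in enumerate(zip(buf1, buf2)):
        if left != right:
            return index, left, right
    return None


if __name__ == "__main__":
    print(buf_cmp([0x1, 0x2, 0x3], [0x1, 0x2, 0x3]))
    print(buf_cmp([0x1, 0x2, 0x3], [0x1, 0x9, 0x3]))
```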
mem_cmp addr1 addr2 size—Compares the memory at addr1 to (addr1+size) to that at addr2.
mem_cmp addr1 buf1 size—Compares the memory at addr1 to (addr1+size) to that at buf1.
mem_dump addr [size]—Dumps the memory starting at addr.
mem_cpy addr1 buf1 [size]—Copies the memory from addr1 to buf1 up to size or four Kbytes.
mem_cpy buf1 addr1 [size]—Copies the memory from buf1 to addr1 up to size or four Kbytes.
tag_dump [size]—Dump the tags associated with the cache line of .
Ring   Parts / Alternate names
6      pb6l, p6l, pb6r [pcxu], spac6, [pcxu]
7      pb7r, p7l, pb7l [pcxu], spac7, [pcxu]
8      mb0l_m, mb0l_t smac0, [stac0]
9      mb1l_m, mb1l_t smac1, [stac1]
10     mb2r_m, mb2r_t smac2, [stac2]
11     mb3r_m, mb3r_t smac3, [stac3]
12     mb4l_m, mb4l_t smac4, [stac4]
13     mb5l_m, mb5l_t smac5, [stac5]
14     mb6r_m, mb6r_t smac6, [stac6]
15     mb7r_m, mb7r_t smac7, [stac7]
16     iolf_b, iolf_a saga0, saga4
17     iolr_b, iolr_a saga1, saga5
18     iorr_b, iorr_a sag
Scan tools do_reset do_reset do_reset performs one of four levels of reset on a node or complex. The first argument is either a node ID, complex, or the keyword, all, which resets all nodes. If no nodes are specified, the default is to reset all nodes in contact with the teststation. If a node number is specified, the level argument must be specified as well. The second argument specified is the level of reset. All levels of reset are expressed as numbers.
Scan tools jf-node_info jf-node_info jf-node_info displays the IP address, UDP port and JTAG firmware version string for each node in a complex. The -e option adds the ethernet address to the display. The -c option adds the core version to the display.
Scan tools jf-ccmd_info jf-ccmd_info jf-ccmd_info displays information about active V2500 nodes connected to the diagnostic LAN. It has the following format: jf-ccmd_info The display includes the Ethernet address, IP address, Complex Serial number, Node number, environmental LED status, and the Diagnostic node name of each active V2500 node. jf-ccmd_info sends a broadcast packet to all nodes on the diagnostic LAN requesting this information.
Scan tools jf-reserve_info jf-reserve_info Before using the JTAG scan interface on the Utilities board, teststation utilities must reserve the JTAG hardware on a time-sharing basis. It has the following format: jf-reserve_info jf-reserve_info sends a broadcast packet to all nodes on the diagnostic LAN requesting the latest JTAG reservation information.
A List of diagnostics This appendix provides a list of all utilities and diagnostics in this book and where they are located.
List of diagnostics Name Locations hard_logger Page 240 io3000 Chapter 8, page 133 io_tr Page 164 jf-ccmd_info Page 286 jf-node_info Page 285 jf-reserve_info Page 287 kill_by_name Page 266 lcd Page 242 load_eprom Page 243 log_event Page 264 mem3000 Chapter 9, page 165 pdcfl Chapter 6, page 119 pim_dumper Page 246 POST Chapter 3, page 53 rdr_dumper.
Name         Locations
ts_config    Page 23
ver          Page 262
xconfig      Page 42
xsecure      Page 51
Index A AC Connectivity test, 203 AC test of a node, 11 address IP, 40 address decode, 213, 214, 216, 217, 218 arrm, 213, 215 Attention lightbar, 4, 7 B Boot Configuration map, 110 bootable device table, 226 buses memory, discussed, 170 Bypass test, 204 C ccmd, 21, 22, 40, 200 how to run, 40 IP address request, 40 request for JTAG ports, 40 requests for JTAG ports, 40 CERS, 40 clock margining, 10 console ethernet, 7 consolebar, 213, 219, 251 controllers SMUC, 2, 4, 7, 9, 16, 18, 19, 20 SPAC, 4 SPUC, 4, 6,
io3000 SAGA ErrorCause CSR error, 157 io3000 SAGA general errors, 155 io3000 SAGA SRAM errors, 158 io3000 SCSI inquiry error, 161 mem3000 error codes, 176 mem3000 extended error codes, 178 midplane power failure, 19 power failure, 19 power-on detected, 16 est, 183, 184, 200 command line AC Connectivity test, 203 Bypass test, 204 DC Connectivity test, 204 DC Connectivity test options, 204 Gate Array test, 204 Gate Array test options, 205 JTAG Identification test, 208 margin commands, 208 miscellaneous comman
processor init steps, table, 13 processor run-time status,table, 14 Processor status line, 13 LEDs attention light bar, 12 LIF file table, 228 Liquid crystal display (LCD), 4, 6, 7, 12, 13, 213, 242 List of diagnostics and utilities, 289 load_eprom, 213, 243, 244, 245 log_event, 263, 264, 265 M margin commands, 208 mem3000, 165 classes, 166 command line, 93, 115 configuration, 93 cxtest, 93, 115 error codes, 176 extended error codes, 178 selecting classes and subtests, 96 starting, 99 subtests, 167 Subtests
sppdsh, 7, 266, 268 configuration commands, 280 data conversion commands, 278 data transfer commands, 275 I/O buffering commands, 281 map of alternate names, 282 memory transfer commands, 281 miscellaneous commands, 274 system information commands, 279 Stingray Monitor Utilties controller (SMUC), 4, 7, 9, 16, 18, 19, 20 Stingray Processor Agent controller (SPAC), 4 Stingray Processor Utilities controller (SPUC), 4 Stingray Processor Utilties controller (SPUC), 4, 6, 7, 9, 18, 19 Stop-on-hard button, 49 Stri