Operator’s Guide HP 9000 V2500/V2600 SCA Server First Edition A5845-96001 Customer Order Number: A5845-90001 July 1999 Printed in: USA
Revision History

Edition: First
Document Number: A5845-96001

Notice

Copyright Hewlett-Packard Company 1999. All Rights Reserved. Reproduction, adaptation, or translation without prior written permission is prohibited, except as allowed under the copyright laws. The information contained in this document is subject to change without notice.
Contents Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii Notational conventions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiv Safety and regulatory information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvi Safety in material handling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvi USA radio frequency interference FCC Notice . . . . . . . . .
3 DVD-ROM drive . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Disk loading slot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Busy indicator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Eject button . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Optional DAT drive. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . LEDs . . . . .
/spp/bin . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .55 /spp/scripts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .55 /spp/data/complex_name. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .55 /spp/firmware . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .56 /spp/est . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Node configuration map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106 Node control panel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107 Configuration utilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . autoreset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . est_config . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Logical Volume Manager (LVM) related problem. . . . . . . . . . . . . . . .145 Recovery from other situations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .145 Rebooting the system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .146 Monitoring the system after a system panic. . . . . . . . . . . . . . . . . . . .146 Abnormal system shutdowns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .147 Fast dump . . . . . . . . . . . . . . . . . . . . . . . .
Figures

Figure 1 Japanese radio frequency notice
Figure 41 ts_config “Add/Configure Terminal Mux” selection . . . 84
Figure 42 Terminal mux IP address panel . . . 85
Figure 43 “Start Console Session” selection
Tables

Table 1 Valid CTI cache sizes . . . 12
Table 2 Indicator LED operation . . . 26
Table 3 Processor initialization steps
Preface The Operator’s Guide HP 9000 V2500/V2600 Server documents the information necessary to operate and monitor HP V-Class servers. This book is intended to be a reference for system administrators, system operators, and system managers.
Preface Notational conventions This section describes notational conventions used in this book. bold monospace In command examples, bold monospace identifies input that must be typed exactly as shown. monospace In paragraph text, monospace identifies command names, system calls, and data structures and types. In command examples, monospace identifies command output, including error messages. italic In paragraph text, italic identifies titles of documents.
Preface Horizontal ellipses (...) In command examples, horizontal ellipses show repetition of the preceding items. Vertical ellipses Vertical ellipses show that lines of code have been left out of an example. Keycap Keycap indicates the keyboard keys you must press to execute the command example. NOTE A note highlights important supplemental information. CAUTION Cautions highlight procedures or information necessary to avoid damage to equipment or loss of data.
Preface Safety and regulatory information For your protection, this product has been tested to various national and international regulations and standards. The scope of this regulatory testing includes electrical/mechanical safety, radio frequency interference, ergonomics, acoustics, and hazardous materials. Where required, approvals obtained from third-party test agencies are shown on the product label. Safety in material handling CAUTION Do not lift the node manually.
Japanese radio frequency interference VCCI

Figure 1 Japanese radio frequency notice

This equipment is a Class A category product (Information Technology Equipment to be used in commercial and/or industrial areas) and conforms to the standards set by the Voluntary Control Council for Interference by Information Technology Equipment, aimed at preventing radio interference in commercial and/or industrial areas.
Figure 2 BCIQ (Taiwan) 3862H354

Acoustics (Germany)

Lärmangabe (Schalldruckpegel LpA) gemessen am fiktiven Arbeitsplatz bei normalem Betrieb nach DIN 45635, Teil 19: LpA = 65.3 dB.

Acoustic Noise (A-weighted Sound Pressure Level LpA) measured at the bystander position, normal operation, to ISO 7779: LpA = 65.3 dB.

IT power system

This product has not been evaluated for connection to an IT power system (an AC distribution system having no direct connection to earth according to IEC 950).
Preface Installation conditions (U.S.) See installation instructions before connecting to the supply. Voir la notice d’installation avant de raccorder au réseau.
Preface Associated documents Associated documents include: • HP Diagnostic Guide: V2500/V2600 Servers, (A5824-96002) • HP-UX SCA Programming and Process Management White Paper – Available in /usr/share/doc for HP-UX 11.10 • HP-UX 11.0 Configurable Kernel Parameters – Available online at: http://docs.hp.com/hpux/os • HP-UX 11.10 Installation and Configuration Notes HP V2500 Servers, (A5532-90005) • HP V-Class Server HP-UX Configuration Notes (for 11.
Preface Technical assistance If you have questions that are not answered in this book, contact the Hewlett-Packard Response Center at the following locations: • Within the continental U.S., call 1 (800) 633-3600. • All others, contact your local Hewlett-Packard Response Center or sales office for assistance.
Preface Reader feedback This document was produced by the System Supportability Lab Field Engineering Support organization (SSL/FES). If you have editorial suggestions or recommended improvements for this document, please write to us. Please report any technical inaccuracies immediately. You can reach us through email at: fes_feedback@rsn.hp.
1 Overview This chapter introduces Hewlett-Packard V-Class system components and includes a brief overview of V2500/V2600 server hardware resources. Some basic details about HP-UX use are also provided. For details on the external cabinet controls and displays, see Chapter 2. The V2500/V2600 model of V-Class server can have up to 128 processors, 128 Gbytes of memory, and 112 PCI I/O cards.
V-Class System Components

Each V-Class system includes two main components: a V-Class server and a Service Support Processor (SSP workstation) dedicated to supporting the server, as shown below in Figure 3.

Figure 3 V-Class Server Components: Cabinet and Service Support Processor

The V-Class cabinet contains all V-Class server resources, such as processors, memory, disks, power, and so forth.
Overview V-Class System Components Figure 4 shows a four-cabinet V2500/V2600 server and the Service Support Processor that is used for console, diagnostic, and other support work. The V2500/V2600 cabinets are tightly interconnected by Coherent Toroidal Interconnect (CTI) cables, as described in “Multiple-Cabinet Server Connections” on page 15. Connections among the Service Support Processor and V2500/V2600 cabinets are covered in “Server Console and Diagnostic Connections” on page 4.
monitoring the server hardware, as well as diagnostics operations. You must also use the Service Support Processor when installing or upgrading V-Class firmware. The Service Support Processor runs HP-UX V10.20. In addition to HP-UX software, the Service Support Processor includes files and utility software for managing and monitoring the V2500/V2600 server.
Overview V-Class System Components Figure 5 Console and Diagnostic Connections for a Four-Cabinet V2500/V2600 Server 2 6 Util. Util. 4 0 Util. Util. (diagnostic LAN) 2 1 0 Term. Server SSP Workstation (console) The console port on cabinet ID 0’s utilities board connects to the Service Support Processor, and console ports on cabinet IDs 2, 4, and 6 connect to the terminal server (port numbers 2, 3, and 4, respectively).
Overview V-Class Server Architecture V-Class Server Architecture The V2500/V2600 server has a powerful set of interconnecting hardware components that allow the server’s processors, memory, and I/O components to operate with minimal interruptions or contentions for resources. The processor agents serve as a bus connection for a subset of the system’s processors. Memory controllers provide cache-coherent access to a large, shared memory. PCI controllers are the connections for PCI I/O cards.
Figure 6 Functional Diagram of a Single-Cabinet V2500/V2600 Server
Overview V-Class Server Architecture Figure 7 V2500/V2600 HyperPlane Crossbar Connections Each ERAC has 16 ports, 4 send and 4 receive on each side, which may operate simultaneously.
Overview V-Class Server Architecture Core Utilities Board The utilities board provides boot, diagnostics, and console connections from the V-Class cabinet to the Service Support Processor, as well as system clock, system LCD, and other functionality. It also stores the boot firmware and boot-time variable settings in non-volatile memory. For details on firmware use and configuration, refer to Chapter 4.
Three DIMM sizes are supported for use in V2500/V2600 servers: 32 MByte, 128 MByte, and 256 MByte. Only specified mixed DIMM size configurations are supported. If planning for a multiple-cabinet server configuration, you must use 88-bit DIMMs and configure your V2500/V2600 server to be one-fourth, one-half, or fully populated with DIMMs. Single-cabinet servers can instead use 80-bit DIMMs and may also be filled to three-fourths memory capacity (3 DIMMs in every quadrant).
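The population rules above can be expressed as a small check. This sketch assumes the per-quadrant framing used later in this chapter (each memory-board quadrant holds up to 4 DIMMs, so 1 through 4 DIMMs per quadrant correspond to one-fourth through fully populated); it checks population level only, not the 80-bit versus 88-bit DIMM requirement, and it is an illustration rather than an HP-supplied tool.

```shell
# Sketch: report whether a DIMM population level is supported.
# 1, 2, or 4 DIMMs per quadrant (one-fourth, one-half, full) are
# supported in any configuration; 3 per quadrant (three-fourths)
# is supported only on single-cabinet servers, per the text above.
supported_population() {
    dimms_per_quadrant=$1   # 1..4
    cabinets=$2             # 1 for single-cabinet, >1 for multi-cabinet
    case "$dimms_per_quadrant" in
        1|2|4) echo supported ;;
        3) if [ "$cabinets" -gt 1 ]; then
               echo unsupported     # three-fourths not allowed multi-cabinet
           else
               echo supported      # three-fourths allowed single-cabinet
           fi ;;
        *) echo invalid ;;
    esac
}

supported_population 3 1    # -> supported
```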
Overview V-Class Server Architecture Memory Interleaving Through the memory access controllers, each memory board provides separate read and write access to the memory DIMMs. Up to 16 DIMMs may be installed per board, providing up to 256-way memory interleaving per cabinet when all memory boards are fully populated. Slots for DIMMs on each memory board are conceptually grouped in four quadrants. Each quadrant, as Figure 8 on page 10 shows, has a separate connection to the memory controller.
Overview V-Class Server Architecture With small CTI cache sizes, additional aliasing between memory locations may occur, reducing the cache hit rate and increasing the latency for remote accesses. The bold entries in Table 1 show the minimal CTI cache sizes needed to avoid excessive aliasing.
Overview V-Class Server Architecture Each V2500/V2600 I/O port is capable of direct memory access (DMA), which eliminates processor involvement during data transfers and streamlines data transfer for large disk blocks and high-speed network connections. The PCI bus controllers are numbered based on the V2500/V2600 cabinet in which they reside. The first component of the hardware path (such as reported by the HP-UX ioscan utility) indicates which cabinet a hardware component resides upon.
Overview V-Class Server Architecture For multiple-cabinet servers, the PCI bus numbering is as shown in Figure 10. The PCI bus number also serves as the first field of the associated devices’ hardware path, so I/O devices on cabinet ID 0 are numbered with the first field of the hardware path of 0 to 7. For cabinet ID 2, the PCI bus numbers are from 64 to 71. PCI busses on cabinet ID 4 are from 128 to 135, and cabinet ID 6 devices are numbered from 192 to 199.
Overview V-Class Server Architecture For an example of listing I/O devices on various cabinets and details of listing other V-Class server hardware configuration details, see “Listing the Server Hardware Configuration” on page 118. Multiple-Cabinet Server Connections All cabinets in a multiple-cabinet V2500/V2600 server are tightly connected using HP’s Coherent Toroidal Interconnect (CTI) technology. CTI is an extension of the Scalable Coherent Interface standard defined by the IEEE.
Each CTI controller connects to a corresponding CTI controller on a remote cabinet by cables that provide both send (local-to-remote) and receive (remote-to-local) connections among the cabinets.

Figure 11 Four-Cabinet V2500/V2600 Server CTI Cable Connections

Two dimensions of CTI connections are possible. Y-dimension cables connect between cabinets 0 and 2, and between cabinets 6 and 4. X-dimension cables connect cabinets 0 and 4, and cabinets 6 and 2.
Overview V-Class Server Architecture For X-dimension connections, CTI cables connect to the opposite controller on the remote cabinet. This means—for X-dimension CTI connections—memory boards connect in the following pairs: 0 and 2, 1 and 3, 4 and 6, and 5 and 7. For details on CTI cable connections refer to qualified HP service personnel.
Overview V2500/V2600 Cabinet Configurations V2500/V2600 Cabinet Configurations This section shows two sample V2500/V2600 server configurations: a single-cabinet system and a three-cabinet system, filled to one-half processor capacity and to one-half and full memory capacity, respectively. Each V2500/V2600 cabinet can contain up to 32 processors, 32 Gbytes of memory, and 28 PCI cards, with up to four cabinets (up to 128 processors, 128 Gbytes of memory, and 112 I/O cards) comprising a V2500/V2600 server.
Figure 12 Sample V2500/V2600 Cabinet Configurations

A single-cabinet V2500/V2600 server with 16 processors and 16 Gbytes memory, using 256-MByte DIMMs, and a three-cabinet V2500/V2600 server with 48 processors and 96 Gbytes memory, using 256-MByte DIMMs.
2 Indicators, switches, and displays This section describes indicators, switches, and displays of the HP 9000 V2500 server.
Indicators, switches, and displays Operator panel Operator panel The operator panel is located on the top left side of the server and contains the key switch panel, DVD-ROM drive, optional DAT tape drive, and the LCD display. Figure 13 shows the location of the operator panel and its components.
Key switch panel

The key switch panel is located on the left of the operator panel, as shown in Figure 13 on page 22. The key switch panel contains a two-position key switch, a DC ON LED, and a TOC (Transfer Of Control) button, as shown in Figure 14.

Figure 14 Key switch panel

Key switch

The key switch has two positions:

• DC OFF: DC power is not applied to the system.
TOC

The TOC (Transfer Of Control) button is a recessed switch that resets the system.

DVD-ROM drive

The DVD-ROM drive is located on the left of the operator panel, as shown in Figure 13 on page 22. Figure 15 shows the DVD-ROM drive front panel in detail.

Figure 15 DVD-ROM drive (callouts: disk loading slot, headphone jack, busy indicator, volume control, eject button)

Disk loading slot

Place the disk into the slot with the label side up.
Indicators, switches, and displays Operator panel Busy indicator The busy indicator LED flashes to indicate that a read operation is occurring. CAUTION Do not push the eject button while this LED is flashing. If you do, the operation in progress is aborted, and the DVD-ROM is ejected, possibly causing a loss of data. Eject button Push the eject button to eject DVD-ROMs from the drive. Optional DAT drive The DAT drive is located on the right of the operator panel, as shown in Figure 13 on page 22.
Table 2 Indicator LED operation

Tape (Activity) LED (green)   Clean (Attention) LED (amber)   Meaning
Flashing slowly               Off                             A load or unload of a cartridge is in progress.
Flashing rapidly              Off                             A cartridge is loaded and a read or write is in progress.
On                            Off                             A cartridge is loaded.
Any                           Flashing slowly                 Media caution signal. Indicates that a cartridge is near the end of its life or that the heads need cleaning.
Indicators, switches, and displays System Displays System Displays The V-Class servers provide two means of displaying status and error reporting: an LCD and an Attention light bar.
Indicators, switches, and displays System Displays LCD (Liquid Crystal Display) The LCD display is located on the right of the operator panel, as shown in Figure 17 on page 27. The LCD is a 20-character by 4-line liquid crystal display. Figure 18 shows the display and indicates what each line on the display means.
Table 3 Processor initialization steps

Step  Description
0     Processor internal diagnostic register initialization.
1     Processor early data cache initialization.
2     Processor stack SRAM test (optional).
3     Processor stack SRAM initialization.
4     Processor BIST-based instruction cache initialization.
5     Processor BIST-based data cache initialization.
6     Processor internal register final initialization.
7     Processor basic instruction set testing.
Status  Description
d       DECONFIG: processor has been deconfigured by POST or the user.
-       EMPTY: empty processor slot.
?       UNKNOWN: processor slot status is unknown.

Message display line

The message display line shows the POST initialization progress. This line is updated by the monarch processor. The system console also shows detail for some of these steps. Table 5 shows the code definitions.
Indicators, switches, and displays System Displays Message display code Description p Multi-node hardware verification q Multi-node initialization ending synchronization r Enabling system error hardware. Attention light bar The Attention light bar is located at the top left corner on the front of the V2500/V2600 server as shown in Figure 17 on page 27. The light bar displays system status in three ways: • OFF—dc power is turned off.
Environmental errors

Environmental errors are detected by two basic systems in the V2500/V2600 server: Power-On and the Environmental Monitor Utility Chip (MUC). Power-On detected errors, such as ASIC install or ASIC not OK, are detected immediately and will not allow dc power to turn on until the condition is resolved. MUC detected errors, such as Ambient Air Hot, allow the dc power to turn on for approximately 1.2 seconds before the dc power is turned off.
Identifying a node with the blink command

The blink command is used to physically identify a node. This command forces the node attention light bar to blink, or turns off blinking, provided an error does not exist on the node.

Step 1. Bring up the sppdsh prompt in an sppuser window by entering:

$ sppdsh

Step 2. Use the blink command to cause the attention light bar to blink on a specific node by entering the blink command followed by the node number.
3 SSP operation

This chapter describes the operation of the SSP in conjunction with a V-Class server and includes:

• SSP log-on
• Using the CDE (Common Desktop Environment) Workspace menu
• Using the console
• SSP file system
• System log pathnames
SSP and the V-Class system

The Service Support Processor (SSP) is either a Hewlett-Packard B180L or 712 workstation that performs the following functions for the V-Class system:

• Running diagnostics
• Updating CUB (Core Utility Board) firmware
• Logging environmental and system level events
• Configuring hardware and boot parameters
• Booting the operating system
• Accessing the V-Class console

The SSP is closely interfaced with the Core Utility Board (CUB), loca
SSP operation SSP log-on SSP log-on Two UNIX user accounts are created on the SSP during the HP-UX 10.20 operating system installation process. sppuser This user is the normal log-on for the SSP during system operation, verification, and troubleshooting. Default password: spp user Please note the space between spp and user. root This user has the ability to modify and configure every parameter on the SSP.
SSP operation SSP log-on Figure 19 SSP user windows for V2500/V2600 servers with one node 38 Chapter 3
SSP operation SSP log-on Figure 20 SSP user windows for V2500/V2600 servers with more than two nodes Chapter 3 39
Message window

The message window displays status from the ccmd daemon running on the SSP approximately 60 seconds after power on. The hard error logger also displays status in this window. This is a display-only window and does not accept input.

Console window (sppconsole - complex console)

The complex console window is the main console window for the V-Class server complex. It displays all POST (Power-On Self-Test) status for node 0.
Using the CDE (Common Desktop Environment) Workspace menu

The SSP uses the CDE Workspace Manager to control the windows on the screen. The Workspace menu is the Workspace Manager's main menu. It is used to create new windows, initiate diagnostic tools, and perform other tasks.

CDE Workspace menu

The following section describes how to use the CDE Workspace menu on V2500/V2600 servers:

Step 1.
SSP operation Using the CDE (Common Desktop Environment) Workspace menu Figure 21 SSP Workspace submenus for V2500/V2600 Figure 22 SSP Workspace submenus for V2500/V2600 42 Chapter 3
SSP operation Using the CDE (Common Desktop Environment) Workspace menu V2500/V2600 Workspace menu options include: • V-Class Complex: name—Opens this submenu for the node/complex. If more than one node/complex has been configured, multiple V-Class Complexes are available by name. • Console—Creates a new console window for a list of available node/complexes. • Shells—Selects a shell: sppdsh, ksh, tcsh, csh, and sh shells. • Diagnostic Tools—Performs a do_reset or invokes cxtest, est, or xconfig.
SSP operation Using the CDE (Common Desktop Environment) Workspace menu • Restart Workspace Manager—Stops and restarts the Workspace Manager. • logout—Closes all open windows and stops Workspace Manager.
SSP operation Using the console Using the console The console serves as the communication device for the V-Class server. Virtual consoles are also used to monitor specific operations, like a system software crash dump. Creating new console windows Console windows can also be created using the sppconsole and xterm commands from the SSP; see Table 6 for details.
SSP operation Using the console Starting the console from the Workspace menu To start the console using the Workspace menu, complete the following steps: Step 1. Move the pointer over the CDE workspace backdrop. Step 2. Press and hold down any mouse button. The Workspace (root) menu appears. Step 3. Drag the mouse pointer to the “V-Class Complex: complex_name” option. Step 4. Release the mouse button to select the option. Step 5. Drag the mouse pointer to the Console option. Step 6.
SSP operation Using the console For example: COMPLEX_NAME = [Select from colossus, guardian] colossus Step 3. Start the console. Enter: sppconsole NOTE Running sppconsole without any additional parameters defaults to Node 0 in the current complex. sppconsole 2 would start a console on Node 2. The new sppconsole window appears.
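The default-node behavior described in the note amounts to a one-line argument default. The sketch below illustrates only that argument handling; it is not the actual sppconsole implementation.

```shell
# Sketch of sppconsole's node selection as described in the note:
# with no argument the target is node 0 in the current complex;
# "sppconsole 2" targets node 2. Illustrative only.
sppconsole_target() {
    echo "node ${1:-0}"
}

sppconsole_target 2    # -> node 2
```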
SSP operation Using the console Step 5. Enter the root password. Refer to “Starting ts_config” on page 92 for information on starting ts_config from a local or remote shell. Step 6. Select the desired node(s) from the list in the display panel. For example, clicking on node 0 in the list highlights that line in the window. Step 7. Start the console session by doing one of the following: • Select “Actions” to drop the pop-down menu and then click “Start Console Session.
SSP operation Using the console Starting the console by logging back on This method of starting the console works from the SSP or after logging on from another system. To start the console by logging out of the SSP and logging back on again, complete the following steps: Step 1. Move the pointer over the CDE workspace backdrop. Step 2. Press and hold down any mouse button. The Workspace (root) menu appears. Step 3. Drag the mouse pointer to the logout menu option. Step 4.
Example: Performing a ^E command

To execute the ^Ecf command, complete the following steps:

1. Press the Ctrl key and the e key simultaneously.
2. Release the Ctrl key and the e key.
3. Press the c key.
4. Press the f key.

Watching the console

Any user can display the console via a remote login to the SSP, so it is possible to have many different processes watching the console at the same time. This is sometimes referred to as “spy mode.”
SSP operation Using the console CTRL-Ec. The period is part of the command. Assuming control of the console System maintenance or diagnostics can be performed remotely by assuming control of the console from a remote terminal. Upon gaining control of the console, the user has write access to that window. Only one window can be active at a time. To assume control of the console, complete the following steps: Step 1.
SSP operation Using the console Changing a console connection Once the console is started as a watch or a control connection, the connection type can be changed with escape characters. To change a watch window to an active console window, enter: CTRL-Ecf To change an active console window to watch window, enter: CTRL-Ecs Accessing system logs Monitor system status via two logs, event_log and consolelogX (where X is the node_id), located in /spp/data/complex_name on the SSP.
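Given the locations above, the log pathnames can be assembled mechanically from the complex name and node ID. The helper below only builds pathname strings; the complex name (for example, colossus from the earlier prompt) is whatever was assigned via ts_config, and this is an illustration rather than an HP-supplied command.

```shell
# Sketch: build the pathnames of the two system logs described above,
# both located in /spp/data/complex_name on the SSP.
console_log_path() {
    complex=$1; node=$2
    echo "/spp/data/$complex/consolelog$node"
}
event_log_path() {
    echo "/spp/data/$1/event_log"
}

console_log_path colossus 0    # -> /spp/data/colossus/consolelog0
```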
SSP operation Using the console prompting the user if only one complex is configured). This utility accesses the desired node based on node ID. However, the single node must still be configured by ts_config and assigned a complex name before it can be accessed. Targeting commands to nodes Use the jf-ccmd_info command to determine what names or IP addresses the JTAG interfaces have been set to on an SPP.
SSP file system

The /spp and /users/sppuser directories contain most of the SSP-specific files. Other files in various directories are also modified. However, this section restricts its discussion to the /spp directory. Figure 23 shows the SSP file system structure for V2500/V2600 servers.
SSP operation SSP file system conserver The console-server that directs RS-232 console traffic from the Utility Board to the various sppconsole sessions. /spp/bin In the /spp/bin directory are specific commands and daemons that manage a V-Class node. Some of these are: est The command (Exemplar Scan Test) to initiate scan testing. do_reset The command executed on the SSP to reset the V-Class node remotely.
consolelogX
A file containing all the console activity on the system, where X is the node ID.

est.log
The scan testing log.

hard_hist
Log of all hard failure information. Logs the output of all suspected ASICs (Application Specific Integrated Circuits). This file may be useful in troubleshooting intermittent ASIC failures.

event_log
Log of all event information. A read-only file that captures information generated by the ccmd daemon.
SSP operation SSP file system Device files Table 8 shows the differences in the device files between the HP B180L and HP 712 SSPs.
SSP operation System log pathnames System log pathnames To separate the configuration and log files for each complex, several files have been moved to complex-specific directories. In Table 9, complex denotes specific complex names. These are assigned by the operator using ts_config. The /spp/data/complex directories are created by ts_config during the “Configure Node” process. Configuration and log files are then created by the various daemon and utility programs as necessary.
4 Firmware (OBP and PDC) This chapter discusses the boot sequence and the commands available from the boot menu.
Boot sequence

OpenBoot PROM (OBP) and SPP Processor Dependent Code (SPP_PDC) make up the firmware on HP V-Class servers that makes it possible to boot HP-UX. Once a machine powers on, the firmware controls the system until the operating system (OS) executes. If the system encounters an error at any time during the boot process, it stops processing and goes to the HP mode boot menu. See “HP mode boot menu” on page 64 for more information.
Figure 24 Boot process

(If Autoboot is enabled, the firmware prompts: “Processor is starting the autoboot process. To discontinue, press any key within 10 seconds.” Otherwise, the boot menu displays.)
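The autoboot decision in Figure 24 can be sketched in shell. This is an illustration of the flow only; the real logic lives in OBP firmware, and the keypress poll below is a stand-in for the firmware's 10-second prompt.

```shell
# Sketch of the Figure 24 decision: with Autoboot ON, the operator
# gets 10 seconds to interrupt; with it OFF, the boot menu displays.
autoboot_step() {
    autoboot=$1    # ON or OFF, mirroring the AUto BOot flag
    if [ "$autoboot" = ON ]; then
        echo "Processor is starting the autoboot process."
        echo "To discontinue, press any key within 10 seconds."
        # In firmware a keypress here aborts to the boot menu; this
        # sketch simply polls stdin once with a timeout.
        if read -r -t 10 _; then
            echo "boot menu"
        else
            echo "booting primary path"
        fi
    else
        echo "boot menu"
    fi
}

autoboot_step OFF    # -> boot menu
```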
Firmware (OBP and PDC) Boot process output Boot process output The following output illustrates what typically displays on the console as the system starts up: POST Hard Boot on [0:PB4L_A] HP9000/V2500 POST Revision 1.0.0.2, compiled 1999/04/12 11:51:10 Probing CPUs: PB4L_A Completing core logic SRAM initialization. Starting main memory initialization. Probing memory: MB0L MB1L MB2R MB3R Installed memory: 2048 MBs, available memory: 2048 MBs. Initializing main memory.
Firmware (OBP and PDC) Boot process output ------------------------------------------------------------------------------PDC Firmware Version Information PDC_ENTRY version 4.2.0.4 POST Revision: 1.0.0.2 OBP Release 4.2.0, compiled 99/01/06 14:00:18 (3) SPP_PDC 2.0.
Firmware (OBP and PDC) HP mode boot menu HP mode boot menu In some instances, the boot menu displays; otherwise the operating system boots and the system is ready for use.
Firmware (OBP and PDC) HP mode boot menu Table 10 lists the commands available from the Command: prompt. Table 10 Boot menu commands Command Description AUto [BOot|SEArch|Force ON|OFF] Displays or sets the Autoboot or Search flag. If Autoboot is on, the system boots automatically after reset. If AutoSearch is on, the system searches for and displays all I/O devices that the system can boot from.
Firmware (OBP and PDC) HP mode boot menu Command Description PAth [PRI|ALT|CON] [path] Displays or sets primary, alternate, console, and keyboard hardware paths. Keyboard path cannot be modified. PDT [CLEAR|DEBUG] Displays or clears Page Deallocation Table (PDT) information. For use by service personnel only. PIM_info [cpu#] [HPMC|TOC|LPMC] Displays Processor Internal Memory (PIM) information for current or any CPU.
Firmware (OBP and PDC) Enabling Autoboot Enabling Autoboot AUto displays or sets the Autoboot or Search flag, which determines how a system behaves after powering on. If Autoboot is ON, the system boots automatically after reset. If AutoSearch is ON and Autoboot is OFF, the system searches for and displays all I/O devices from which the system can boot. Changes to a flag take effect after a system reset or power-on. The default value for both Autoboot and Autosearch is OFF.
Firmware (OBP and PDC) Enabling Autoboot Examples au This command displays the status of the Autoboot and Autosearch flags. Autoboot:ON Autosearch:ON au bo This command displays the current setting of the Autoboot flag. Autoboot:ON au bo on This command sets the Autoboot flag ON.
Firmware (OBP and PDC) HElp command HElp command The help command displays help information for the specified command or redisplays the boot menu. Syntax HElp [command] Used alone, HElp displays the boot menu. Specifying command displays the syntax and description of the named command. Examples The following example illustrates use of this command: help au This command displays information for the auto command.
5 Configuration utilities This chapter describes server configuration management and includes:
• ts_config
• ccmd
• xconfig
• Configuration utilities
Two utilities, sppdsh and xconfig, allow reading or writing configuration information. OBP can also be used to modify the configuration. The SSP allows the user to configure the node using the ts_config utility. This is the preferred method for V2500/V2600 servers. ts_config configures the SSP to communicate with the node.
Configuration utilities ts_config ts_config ts_config [-display display name] Any V2500/V2600 nodes added to the SSP must be configured by ts_config to enable diagnostic and scan capabilities, environmental and hard-error monitoring, and console access. Once the configuration for each node is set, it is retained when new SSP software is installed.
Configuration utilities ts_config For example:
$ DISPLAY=myws:0; export DISPLAY (sh/ksh/sppdsh)
% setenv DISPLAY myws:0 (csh/tcsh)
Also, the -display start-up option may be used as shown below:
# /spp/bin/ts_config -display myws:0
NOTE: For shells that are run from the SSP desktop, the DISPLAY variable is set (at shell start-up) to the local SSP display.
ts_config operation The ts_config utility displays an active list of nodes that are powered up and connected to the SSP diagnostic LAN.
Configuration utilities ts_config ts_config automatically updates the display when it detects either a change in the configuration status of any node or a newly detected node. The node display is not updated while an Action is being processed or while the user is entering information into an Action dialog. The upper right corner of the ts_config window indicates whether a node has been selected.
Configuration utilities ts_config Configuration Status Description Action Required Active The node is configured and answering requests on the Diagnostic LAN. None required. This is the desired status. Inactive The SSP node configuration file contains information about the specified node, but the node is not responding to requests on the Diagnostic LAN. This status is also shown if a node was configured and then removed from the SSP LAN without being deconfigured.
Configuration utilities ts_config Figure 26 ts_config showing node 0 highlighted Notice that after the node has been highlighted, ts_config displays information concerning the node. In this step, it tells the user what action to take next: “This node’s JTAG firmware must be upgraded. Select “Actions,” “Upgrade JTAG firmware” and “Yes” to upgrade.” Step 2. Select “Actions” to drop the pop-down menu and then click “Upgrade JTAG firmware,” as shown in Figure 27.
Configuration utilities ts_config Figure 28 Upgrade JTAG firmware confirmation panel Step 4. After the firmware is loaded, a panel like the one shown in Figure 29 appears. Click “OK” and then power-cycle the node to activate the new firmware. Figure 29 ts_config power-cycle panel When the node is powered up, the “Configuration Status” should change to “Not Configured.” Configure a Node Step 1. Select the desired node from the list of available nodes.
Configuration utilities ts_config Figure 30 ts_config indicating Node 0 as not configured Step 2. Select “Actions” and then click “Configure Node,” as shown in Figure 31. Figure 31 ts_config “Configure Node” selection. After invoking ts_config to configure the node, a node configuration panel like the one in Figure 32 appears.
Configuration utilities ts_config Figure 32 ts_config node configuration panel Step 3. Enter a name for the V2500/V2600 System. The SSP uses this name as the “Complex Name” and to generate the IP host names of the Diagnostic and OBP LAN interfaces. Select a short name that SSP users can easily relate to the associated system (for example: hw2a, swtest, etc.). Step 4. Select an appropriate serial connection for the V2500/V2600 console from the pop-down option menu in the node configuration panel.
Configuration utilities ts_config Figure 33 ts_config restart workspace manager panel. Step 6. Read the panel and click “OK.” When the configuration process is complete, the “Configuration Status” of the node changes to “Active,” as shown in Figure 34. Figure 34 ts_config indicating Node 0 is configured Step 7. Restart the Workspace Manager: Click the right-mouse button on the desktop background to activate the root menu.
Configuration utilities ts_config Configure the scub_ip address Step 1. Select the desired node from the list of available nodes. Step 2. In the ts_config display panel, select “Actions” and then “Configure ‘scub_ip’ address,” as shown in Figure 35. Figure 35 ts_config “Configure ‘scub_ip’ address” selection ts_config checks the scub_ip address stored in NVRAM on the SCUB in the node. This would initially be the default address set at the factory.
Configuration utilities ts_config Figure 37 ts_config scub_ip address configuration confirmation Step 4. A panel like the one shown in Figure 38 appears, confirming that the scub_ip address is set. Click OK. Figure 38 ts_config scub_ip address set confirmation panel Initiate a node reset to activate the new scub_ip address. Reset the Node Step 1. Select the desired node from the list of available nodes. Step 2. Select “Actions,” then “Reset Node.” This is indicated in Figure 39.
Configuration utilities ts_config Figure 39 ts_config “Reset Node” selection A panel like the one shown in Figure 40 appears. Figure 40 ts_config node reset panel Step 3. In the Node Reset panel, select the desired “Reset Level” and “Boot Options,” then click Reset.
Configuration utilities ts_config Deconfigure a Node Deconfiguring a node removes the selected node from the SSP configuration. The SSP will no longer monitor the environmental and hard-error status of this node. Console access to the node is also disabled. Step 1. Select the desired node from the list of available nodes. Step 2. Select “Actions,” then “Deconfigure Node,” then click “Yes.” Add/Configure the Terminal Mux To add or reconfigure the terminal mux, perform the following procedure. Step 1.
Configuration utilities ts_config Figure 42 Terminal mux IP address panel Remove terminal mux ts_config does not remove the terminal mux if any node consoles are assigned to terminal mux ports. Step 1. Select “Actions,” then “Configure Terminal Mux.” Step 2. Select “Remove Terminal Mux,” then click “Yes.” Console sessions ts_config may also start console sessions by selecting the desired node(s) and then selecting the “Start Console Session” action as shown in Figure 43.
Configuration utilities ts_config Figure 43 “Start Console Session” selection Figure 44 Started console sessions
Configuration utilities ts_config V2500/V2600 SCA (multinode) configuration ts_config can also configure a V2500/V2600 SCA system. The following example describes how. The example assumes that there are two active single-node complexes. After the system has rebooted to OBP, node 0 becomes the console for the SCA complex. To configure the two-node system in the example, start ts_config as described in “Starting ts_config” on page 72.
Configuration utilities ts_config Figure 46 ts_config Configure Multinode complex selection Step 3. When “Configure Multinode complex” is selected, a configuration dialog appears as shown in Figure 47.
Configuration utilities ts_config Step 4. Enter the required fields into the Configure Multinode Complex dialog window. • V-Class Complex Name—Current complex name of either node or a new complex name. • Complex Serial Number—Unique serial number of the complex. This is not required if the nodes have the same serial numbers. • Complex Key—Number required to enter the Complex Serial Number. NOTE If all of the SCA system complex serial numbers are the same, no complex key is required. Step 5.
Configuration utilities ts_config Figure 48 Configure Multinode Complex dialog window with appropriate values Step 10. Click the “Configure” button to start the configuration. A message box appears indicating that the configuration has started. Figure 49 Configuration started information box The following activities occur during the configuration process: • SSP files are updated based on the new complex and node names.
Configuration utilities ts_config This information includes:
• Node ID
• Complex serial number (if it has been modified)
• Requested or auto-generated software identifier
• Configuration Manager Daemon, ccmd, is notified of the new configuration.
• The shared-memory database of node information is updated.
• Multinode configuration parameters are written to NVRAM in each node.
Configuration utilities ts_config Figure 50 ts_config showing newly configured complexes When remotely running ts_config, the Restart Workspace Manager step cannot be performed, because it is the SSP Workspace Manager that needs to be restarted. The Workspace Manager can be restarted at any time by clicking on the desktop background and selecting Restart Workspace Manager, then OK.
Configuration utilities ts_config Figure 51 ts_config Split Multinode complex operation Figure 52 ts_config Split Multinode complex panel Step 3. Enter the complex names for each node. New complex serial numbers may be assigned. Each node becomes node 0 in a new complex. Figure 53 shows the Split Multinode panel filled in. Click the Split Complex button to initiate the configuration process.
Configuration utilities ts_config Figure 53 ts_config Split Multinode complex panel filled in The message shown in Figure 54 appears indicating the configuration is taking place. Figure 54 Split Multinode confirmation panel Figure 55 shows the main ts_config display after the split multinode operation has completed. It shows the resulting configuration: two single node complexes (two node 0s) with names assigned in the prior step.
Configuration utilities ts_config ts_config files ts_config either reads or maintains the following SSP configuration files:
/etc/hosts The standard system hosts file; includes entries for the cabinet-related IP addresses.
/etc/services Service definitions for the console interface.
/etc/inetd.conf Contains entries for starting console-related processes.
/spp/data/nodes.conf Contains entries that define the complexes (either single-cabinet or multi-cabinet) managed by the SSP.
Configuration utilities ts_config NODE Complex Node ID JTAG-hostname OBP-hostname SSP-hostname Console-port The variables of the entry are defined as follows: NODE—Keyword designating a cabinet (node) entry. Complex—Name to which the node (cabinet) is associated. In a multi-cabinet complex all the cabinets comprise a single system (complex) and are managed by a single console (the console on cabinet 0).
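For illustration only, a hypothetical nodes.conf entry in this format might look like the line below; the complex name follows the hw2a example used elsewhere in this chapter, but all host names and the console port are invented values, not output from a real SSP:

```
NODE hw2a 0 hw2a-j hw2a-o hw2a-s /dev/tty0p0
```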
Configuration utilities SSP-to-system communications SSP-to-system communications Figure 56 depicts the V-Class server to SSP communications using HP-UX.
Configuration utilities SSP-to-system communications LAN communications There are two Ethernet ports located on the SCUB, as shown in the diagram in the upper-left side of the node (dotted line) in Figure 56 on page 97. These comprise the “private” or diagnostic LAN.
Configuration utilities SSP-to-system communications Serial communications The DUART port on the SCUB provides an RS232 serial link to the SSP. Through this port, HP-UX, OBP, POST (Power-On Self Test), and the Test Controller send console messages. The SSP processes these messages using the sppconsole and ttylink utilities and the consolelogX log file. POST and OBP also send system status to the LCD connected to the DUART.
Configuration utilities ccmd ccmd ccmd (Complex Configuration Management Daemon) is a daemon that maintains a database of information about the V2500/V2600 hardware. ccmd also monitors the system and reports any significant changes in system status. It supports multiple nodes, multiple complexes and nodes that have the same node number. There are two types of related information in the database: node information (node numbers, IP addresses and scan data) and configuration data which is initialized by POST.
Configuration utilities ccmd If started with no options, ccmd disassociates itself from the terminal or window where it was started. It instead reports to the console window and the file /spp/data/ccmd_log. If ccmd is sent a SIGHUP, it regenerates the database. All scan-based operations require ccmd. If POST is unable to run, then ccmd is not able to read configuration data and some system information is not accessible.
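The regenerate-on-SIGHUP behavior described above can be sketched with a stand-in process; the database file and idle loop below are placeholders for illustration, since on a real SSP you would simply send SIGHUP to the running ccmd process itself:

```shell
# Stand-in daemon illustrating "regenerate the database on SIGHUP".
# On an SSP the target would be ccmd, not this subshell.
tmpdir=$(mktemp -d)
(
  trap 'echo regenerated > "$tmpdir/database"' HUP
  while :; do sleep 1; done   # idle loop standing in for the daemon's main loop
) &
daemon_pid=$!
sleep 1                       # let the trap install
kill -HUP "$daemon_pid"       # ask the stand-in daemon to rebuild its database
sleep 2                       # give the trap time to fire
cat "$tmpdir/database"        # prints "regenerated"
kill "$daemon_pid"
```

The same pattern applied to the real daemon would be a single command such as `kill -HUP <ccmd-pid>`, with the process ID obtained from ps on the SSP.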
Configuration utilities xconfig xconfig xconfig is the graphical tool that can also modify the parameters initialized by POST to reconfigure a node. The graphical interface allows the user to see the configuration state. Also, the names are consistent with the hardware names, since individual configuration parameters are hidden from the user. The drawback of xconfig is that it cannot be used as a part of script-based tests, nor can it be used for remote debug. xconfig is started from a shell.
Configuration utilities xconfig Figure 57 xconfig window—physical location names
Configuration utilities xconfig Figure 58 xconfig window—logical names As buttons are clicked, the item selected changes state and color. There is a legend on the screen to explain the color and status. The change is recorded in the SSP’s image of the node. When the user is satisfied with the new configuration, it should be copied back into the node, and the node should be reset to enable the changes.
Configuration utilities xconfig The main xconfig window has three sections:
• Menu bar—Provides additional capability and functions.
• Node configuration map—Provides the status of the node.
• Node control panel—Provides the capability to select a node and control the way data flows to it.
Menu bar The menu bar appears at the top of the xconfig main window. It has four menus that provide additional features:
• File menu—Displays the file and exit options.
Configuration utilities xconfig Node configuration map The node configuration map is a representation of the left and right side views of a node as shown in Figure 60.
Configuration utilities xconfig The button boxes are positioned to represent the actual boards as viewed from the left and right sides. Each of the configurable components of the node is in the display. The buttons are used as follows: • Green button—Indicates that the component is present and enabled. • Red button—Indicates that the component is software disabled in the system.
Configuration utilities xconfig Figure 61 xconfig window node control panel The node number is shown in the node box. A new number can be selected by clicking on the node box and selecting the node from the pulldown menu. A new complex can be selected by clicking on the complex box and selecting it from the pull-down. A node IP address is displayed along with the node number and complex.
Configuration utilities xconfig When a new node is selected and available, its data is automatically read and the node configuration map updated. The data image is kept on the SSP until it is rebuilt on the node using the Replace button. This is similar to the replace command on sppdsh. Even though data can be rebuilt on a node, it does not become active until POST runs again and reconfigures the system. The Reset or Reset All buttons can be used to restart POST on one or all nodes of a system.
Configuration utilities Configuration utilities Configuration utilities V2500/V2600 diagnostics provides utilities that assist the user with configuration management. autoreset autoreset allows the user to specify whether ccmd should automatically reset a complex after a hard error and after the hard logger error analysis software has run.
Configuration utilities Configuration utilities NOTE If there is a node_#.pwr file that is older than the node_#.cfg file, existing node configuration files do not need to be updated. report_cfg This utility generates a report summarizing the configuration of all nodes/complexes specified on the command line. The format of report_cfg is as follows: report_cfg [node_id [node_id ...]] node_id may be a node number, IP name, or “all.”
Configuration utilities Configuration utilities Effects of hardware and software deconfiguration report_cfg counts all processors, STACs, SMACs, SAGAs and ERACs if POST has not marked them as empty. This results in ASICs and processors being included in the summary count even though they may have failed or have been deconfigured by software. This is necessary because POST deconfigures STACs in a single node configuration.
Configuration utilities Configuration utilities report_cfg ASIC report To obtain a report on the ASICs in a complex, use the -A option.
Configuration utilities Configuration utilities report_cfg memory report To obtain a report on the memory in a complex, use the -m option. The following is a sample memory report by report_cfg: report_cfg -m Complex |Node#| MIB COP | SCUB COP ====================+=====+=======================+======================= hw2a 0 A5074-60002 00 a 3845 A5074-60003 00 b 3830 hw2a 2 A5074-60002 00 a 3840 A5074-60003 00 b 00XA | 80-bit | 88-bit | | | | 1 | 2 | | 1 | 2 |Mem.
Configuration utilities Configuration utilities report_cfg processor report To obtain a report on the processor in a complex, use the -p option.
Configuration utilities Configuration utilities If the command line [-on | -off | -check] options are used, xsecure does not use the GUI interface. These options allow the user to turn the secure mode on, off or allow the user to check the secure mode status. A simple button with a red or green secure mode indicator provides the user with secure mode status information. The red indicator shows that the secure mode process has begun. The label near the red button will inform the user when the SSP is secure.
6 HP-UX Operating System Different versions of the HP-UX operating system run on a V-Class server and its Service Support Processor. This section covers issues related to using HP-UX V11.0 and HP-UX V11.10 on V-Class servers. Multiple-cabinet server configurations and HP-UX SCA features require that HP-UX V11.10 be installed.
HP-UX Operating System HP-UX on the V2500/V2600 HP-UX on the V2500/V2600 In general HP-UX administration tasks are performed on V-Class servers as they are on other HP servers. One difference is that V-Class servers run the HP-UX kernel only in 64-bit mode. This facilitates addressing the larger memory capacity available on the V2500/V2600. However, both 32-bit and 64-bit applications may be run simultaneously on the server.
HP-UX Operating System HP-UX on the V2500/V2600 On multiple-cabinet V2500/V2600 servers, the first component of the hardware path indicates which cabinet a hardware component resides upon. Hardware on cabinet ID 0 is listed with a first hardware path field starting at 0, hardware on cabinet ID 2 is listed starting at 64, cabinet ID 4 starts at 128, and cabinet ID 6 starts at 192. For example, a disk on cabinet ID 0 could have a hardware path of: 1/2/0.9.
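The cabinet-to-path mapping above works out to 32 path values per cabinet-ID unit; as a sketch (this helper function is illustrative, not an HP-UX command), the first field block for a cabinet can be derived as follows:

```shell
# Illustrative helper: first hardware-path field block for a cabinet.
# Cabinet IDs on multiple-cabinet servers are even (0, 2, 4, 6), and
# each ID unit corresponds to a block of 32 in the first path field.
cabinet_base() {
  echo $(( $1 * 32 ))
}
cabinet_base 0   # 0   (cabinet ID 0)
cabinet_base 2   # 64  (cabinet ID 2)
cabinet_base 4   # 128 (cabinet ID 4)
cabinet_base 6   # 192 (cabinet ID 6)
```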
HP-UX Operating System HP-UX on the V2500/V2600 V2500/V2600 Cabinet ID | First Field of Hardware Path | Description of Hardware Component
Cabinet ID 4:
128–135 PCI I/O bus bridges (card cages)
136 Memory
143 Core utilities board
144–175 Processors (PA-RISC CPUs)
Cabinet ID 6:
192–199 PCI I/O bus bridges (card cages)
200 Memory
207 Core utilities board
208–239 Processors (PA-RISC CPUs)
Configuring HP-UX for V-Class Servers HP-UX V11.
HP-UX Operating System HP-UX on the V2500/V2600 • Dedicated commercial data processing use—Servers whose use is restricted for online transaction processing (OLTP), running Oracle, and running other data processing workloads. These systems provide limited, if any, interactive user access. The “OLTP/Database Server System” tuned parameter set provides a good HP-UX configuration for using HP V-Class servers for dedicated commercial data processing.
HP-UX Operating System HP-UX on the V2500/V2600 /usr/sam/lib/kc/tuned Refer to the SAM online help for examples and details on using kernel parameters. Process and Thread “Gang Scheduling” HP-UX V11.0 includes support for kernel threads and provides a “gang scheduling” feature for managing how threads belonging to the same process or application are executed. The HP-UX gang scheduler permits a set of MPI processes, or multiple threads from a single process, to be scheduled concurrently as a group.
HP-UX Operating System HP-UX on the V2500/V2600 extensions also provide system inquiry features for retrieving information about the current hardware topology, as well as thread and process inquiry features. Both traditional system architectures and SCA systems are supported by the HP-UX 11.10 enhancements. HP-UX SCA Features HP-UX V11.10 SCA programming and launch features provide the following capabilities.
HP-UX Operating System HP-UX on the V2500/V2600 • Fill First—Fill a locality first, then spill over to another locality, as needed. Once all localities are filled, start over as needed. • Packed—Place all threads or processes in the same locality; do not spill over. • Least Loaded—Place each thread or process in the locality that is least-loaded at the time of its creation. Gang scheduling of threads and processes is supported through the mpsched -g option and the MP_GANG HP-UX environment variable.
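As a sketch of the environment-variable path mentioned above, gang scheduling could be requested for a subsequent run by exporting MP_GANG before launching the application; the application name below is hypothetical, and the exact mpsched option syntax should be checked against the mpsched man page on the system:

```shell
# Request gang scheduling via the MP_GANG environment variable
# (variable name per the text; value semantics assumed to be ON/OFF).
MP_GANG=ON
export MP_GANG
echo "MP_GANG=$MP_GANG"
# Then launch the MPI application, for example (hypothetical invocation):
#   ./my_mpi_app
```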
HP-UX Operating System Starting HP-UX Starting HP-UX Bringing the V-Class server to a usable state involves two systems and their hardware and software. This section provides a brief overview of the process; for complete instructions, see Managing Systems and Workgroups. Additional information is contained in the V2500/V2600 SCA HP-UX System Guide.
HP-UX Operating System Starting HP-UX Start up, or boot, HP-UX after the operating system has been completely shut down or partially shut down to perform system administration tasks. The boot procedure differs according to the value of the Autoboot flag. See “Enabling Autoboot” on page 67 for information on how to set Autoboot. After you power up your V-Class server, if Autoboot is set to: • ON, OBP automatically starts HP-UX.
HP-UX Operating System Starting HP-UX Step 4. Issue the OBP menu’s BOOT command to boot HP-UX on the V-Class server. You can set the server to automatically boot HP-UX if you have also set a primary boot device (PRI). The OBP menu provides the AUTO BOOT option, which causes the server to automatically boot HP-UX from the primary boot device when AUTO BOOT is set to ON. If a cabinet is already powered on before the SSP booted, the cabinet can be reset from the SSP after it boots, using the do_reset command.
HP-UX Operating System Starting HP-UX Table 14 Boot variables
AUto BOot [ON|OFF] If set to ON, the server automatically boots HP-UX from the primary (PRI) device during system startup or reset. When set to OFF, the server boots to the OBP menu interface.
AUto SEArch [ON|OFF] If set to ON, the server searches for and lists all bootable I/O devices.
AUto Force [ON|OFF] If set to ON, OBP allows HP-UX to boot even if one or more cabinets does not complete power-on self test.
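Putting these variables together with the PAth command described earlier, a session at the boot menu’s Command: prompt might look like the following; the device path shown is invented for illustration, and a system reset or power-on is then required for the new flag value to take effect:

```
Command: path pri 1/2/0.9.0
Command: auto boot on
```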
HP-UX Operating System Starting HP-UX The start-up process is interrupted: /usr/sbin/fsclean:/dev/dsk/0s0 not ok run fsck FILE SYSTEM(S) NOT PROPERLY SHUTDOWN, BEGINNING FILE SYSTEM REPAIR. At this point, the system runs /usr/sbin/fsck in a mode that corrects certain inconsistencies in the file systems without your intervention and without removing data.
HP-UX Operating System Stopping HP-UX Stopping HP-UX This section provides a brief overview of the process; for complete instructions, see Managing Systems and Workgroups. Additional information is contained in the V2500/V2600 SCA HP-UX System Guide. Typically, the system is shut down to: • Put it in single-user state so that the system can be updated or to check file systems. • Turn it off in order to perform a task such as installing a new disk drive.
HP-UX Operating System Stopping HP-UX See the shutdown man page for a complete description of the shutdown process and available options.
HP-UX Operating System Stopping HP-UX Rebooting the system To shutdown HP-UX and reboot the V-Class server, perform the following steps: Step 1. If the server is running HP-UX, log in to the server as root. Step 2. Check activity on the server and warn users of the impending server reboot. If HP-UX is hung, to reboot HP-UX you may need to reset the server. See “Resetting the V2500/V2600 server hardware” on page 134. Step 3. Change to the root directory. Enter: cd / Step 4.
HP-UX Operating System Stopping HP-UX Shutting down the system To shut down the V-Class server, perform the following steps: Step 1. Login to the server as root. Step 2. Check activity on the server and warn users of the impending server shutdown. Step 3. Change to the root directory. Enter: cd / Step 4. Shut down the system using the shutdown or reboot command. Enter: shutdown Progress messages detailing system shutdown activities print to the terminal.
HP-UX Operating System Stopping HP-UX Resetting the V2500/V2600 server hardware The /spp/bin/do_reset command resets the V-Class hardware. The do_reset command is run from the Service Support Processor, causes OBP to reboot, and halts all activity on the V-Class server cabinets involved. For details see the do_reset man page on the Service Support Processor.
HP-UX Operating System Stopping HP-UX Performs a level 4 reset of all cabinets. This causes a Transfer of Control (TOC) that initiates a crash dump of the operating system, if crash dump is configured. See the savecrash(1M) man page for crash dump details. To reset the V-Class server hardware, perform the following steps: Step 1. Shut down HP-UX on the V2500/V2600 server. This involves logging in to the server and issuing the shutdown -h or reboot -h command.
7 Recovering from failures This chapter provides detailed information on recovering from HP-UX system interruptions. Usually, the first indication of a problem is that the system does not respond to user input. This lack of response indicates either a performance problem or a system interruption. Performance problems are generally characterized by: • The system responds to some programs or users but not all, or responds sluggishly to others.
Recovering from failures Collecting information Collecting information Providing the Response Center with a complete and accurate symptom description is important in solving any problem. The V-Class server’s SSP automatically records information on environmental and system level events in several log files. See “SSP file system” on page 54 for more information about these files. Use the following procedure to collect troubleshooting information: Step 1.
Recovering from failures Performance problems Performance problems Performance problems are generally perceived as: • Sluggish response at the operating system prompt • Slow program execution • Some users/programs unable to get a response Use the following procedure to troubleshoot a performance problem: Step 1.
Recovering from failures System hangs System hangs System hangs are characterized by users unable to access the system, although the LCD display and attention light may not indicate a problem exists. The system console may or may not be hung. Use the following procedure to troubleshoot a system hang: Step 1. Press Enter at a terminal several times and wait for a response. Step 2. Press Ctrl-C at a terminal to abort an executing command. Step 3.
Recovering from failures System panics System panics A system panic is the result of HP-UX encountering a condition that it is unable to respond to and halting execution. System panics are rare and are not always the result of a catastrophe. They may occur on bootup, if the system was previously shut down improperly. Sometimes they occur as a result of hardware failure. Recovering from a system panic can be as simple as rebooting the system.
Recovering from failures System panics Step 2. Record the panic message displayed on the system console. Look for text on the console that contains terms like: • System Panic • HPMC • Privilege Violation • Data Segmentation Fault • Instruction Segmentation Fault Step 3. Categorize the panic message. The panic message describes why HP-UX panicked. Sometimes panic messages refer to internal structures of HP-UX (or its file systems), and the cause might not be obvious.
Recovering from failures System panics 2. Take the device offline. 3. Power down the device. 4. If it is a disk drive, wait for the disk to stop spinning. 5. Power up the device. 6. Place the device back online. Step 3. Check to ensure the device address or ID is correct. Step 4. Check cable and terminator connections. Step 5. If the system does not reboot by itself, reboot the computer by issuing the reset command in the console window or do_reset command at the ksh-shell window.
Recovering from failures System panics Step 4. If the system does not reboot by itself, reboot the computer by issuing the reset command in the console window or do_reset command at the ksh-shell window. For more information about rebooting the system see “Rebooting the system” on page 146. If the problem reappears, it might be necessary to have the problem fixed by Hewlett-Packard service personnel.
Recovering from failures System panics Logical Volume Manager (LVM) related problem If the size of a logical volume that contains a file system is reduced such that the logical volume is smaller than the file system within it, the file system will be corrupted. When an attempt is made to access a part of the truncated file system that is beyond the new boundary of the logical volume, a system panic will often result. The problem might not show up immediately.
Rebooting the system

Once a problem has been corrected, reset and reboot the system.
Step 1. Reset the V-Class server. See “Resetting the V2500/V2600 server hardware” on page 134.
Step 2. If the system panicked due to a corrupted file system, fsck will report the errors and any corrections it makes. If fsck terminates and requests to be run manually, refer to Managing Systems and Workgroups for further instructions.
Abnormal system shutdowns

Abnormal system shutdowns (often referred to as system crashes) can occur for many reasons. In some cases, the cause of the crash can be easily determined. In some extreme cases, however, it may be necessary to analyze a snapshot (called a core dump or simply a dump) of the computer’s memory in order to determine the cause of the crash. This may require the services of the Hewlett-Packard Response Center.
The on-disk and file system formats of a crash dump changed with HP-UX 11.0. libcrash(3) is a new library that provides programmatic access to a crash dump; it supports all past and current crash dump formats. By using libcrash(3), under certain configurations crash dumps no longer need to be copied into the file system before they can be debugged. See the libcrash(3) man page for more information.
IMPORTANT Crash dump must be configured to dump on cabinet zero disks only.

It is important to have sufficient space to capture the part of memory that contains the instruction or data that caused the crash. More than one dump device can be defined, so that if the first one fills up, the next one continues dumping until the dump is complete or no more defined space is available.
To calculate an appropriate size for a V2500/V2600 SCA crash dump volume, estimate that you will need at most the following amount of space: the total amount of physical memory in the system, plus space to allow for dump headers and tables, minus the amount of memory dedicated as CTI cache, and minus the amount of kernel text that is replicated across cabinets.
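As a rough sketch, the sizing rule above can be worked through with shell arithmetic. The variable names and sample figures below are illustrative assumptions, not values from this guide:

```shell
# Hypothetical crash dump volume sizing (all values in MB).
phys_mem=16384      # total physical memory in the system
hdr_tables=64       # allowance for dump headers and tables (assumed)
cti_cache=2048      # memory dedicated as CTI cache
repl_ktext=32       # kernel text replicated across cabinets

dump_mb=$((phys_mem + hdr_tables - cti_cache - repl_ktext))
echo "Estimated dump volume size: ${dump_mb} MB"
```

With these sample figures the estimate comes out below total physical memory, which is why the guide describes this as an upper bound.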
The fewer pages dumped to disk (and, on reboot, copied to the HP-UX file system area), the faster the system can be back up and running. Therefore, avoid using the full dump option. When defining dump devices, whether in a kernel build or at run time, the operator can list which classes of memory must always be dumped and which classes of memory should not be dumped.
/etc/rc.config.d/savecrash) reduces system recovery time. After the system recovery, run savecrash manually to copy the memory image from the dump area to the HP-UX file system area.

Partial save

If a memory dump resides partially on dedicated dump devices and partially on devices that are also used for paging, only those pages that are endangered by paging activity need to be saved. Pages residing on the dedicated dump devices can remain there.
Dump definitions built into the kernel vs. defined at runtime

There are three places to define which devices are to be used as dump devices:
1. During kernel configuration
2. At boot time (entries defined in the /etc/fstab file)
3. At run time (using the /sbin/crashconf command)
Definitions at each of these places add to or replace any previous definitions from other sources.
paging from being enabled to the device by creating the file /etc/savecore.LCK. swapon does not enable the device for paging if the device is locked in /etc/savecore.LCK. Systems configured with small amounts of memory and using only the primary swap device as a dump device might not be able to preserve the dump (copy it to the HP-UX file system area) before paging activity destroys the data in the dump area.
NOTE With HP-UX 11.0, it is possible to analyze a crash dump directly from dump devices using a debugger that supports this feature. If, however, there is a need to save it to tape or send it to someone, copy the memory image to the HP-UX file system area first.
CLASS     PAGES   INCLUDED IN DUMP   DESCRIPTION
------   ------   ----------------   ---------------------
UNUSED     2036   no,  by default    unused pages
USERPG     6984   no,  by default    user process pages
BCACHE    15884   no,  by default    buffer cache pages
KCODE      1656   no,  by default    kernel code pages
USTACK      153   yes, by default    user process stacks
FSDATA      133   yes, by default    file system metadata
KDDATA     2860   yes, by default    kernel dynamic data
KSDATA     3062   yes, by default    kernel static data

Total pages on system: Total page
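The per-class page counts in the sample output above can be cross-checked with quick shell arithmetic. The 4 KB page size used for the megabyte conversion is an assumption, not a figure stated in this guide:

```shell
# Sum the per-class page counts from the sample crashconf output above.
included=$((153 + 133 + 2860 + 3062))      # USTACK, FSDATA, KDDATA, KSDATA
excluded=$((2036 + 6984 + 15884 + 1656))   # UNUSED, USERPG, BCACHE, KCODE
total=$((included + excluded))

# Convert to megabytes, assuming 4 KB pages.
echo "included: ${included} of ${total} pages"
echo "selective dump: ~$((included * 4 / 1024)) of $((total * 4 / 1024)) MB"
```

This illustrates why a selective dump is so much faster to save than a full dump: only the kernel data and user stack classes are included by default.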
Step 3. Use the SAM action menu to add, remove, or modify devices or logical volumes.

NOTE The order of the devices in the list is important. Devices are used in reverse order from the way they appear in the list. The last device in the list is used as the first dump device.

Step 4. Follow the SAM procedure for building a new kernel.
Step 5. Boot the system from the new kernel file to activate the new dump device definitions.
• The logical volume cannot be used for file system storage, because the whole logical volume is used.
The /etc/fstab file

Define entries in the fstab file to activate dump devices during the HP-UX initialization (boot) process or when crashconf reads the file. The format of a dump entry for /etc/fstab looks like the following:
devicefile_name / dump defaults 0 0
Examples:
/dev/dsk/c0t3d0 / dump defaults 0 0
/dev/vg00/lvol2 / dump defaults 0 0
/dev/vg01/lvol1 / dump defaults 0 0
Define one entry for each device or logical volume to be used as a dump device.
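One quick way to see which dump entries a file in this format defines is to match on the third (type) field. This is a generic sketch; the sample file path below is illustrative, not a real system file:

```shell
# Build a small sample file in the fstab dump-entry format shown above.
cat > /tmp/fstab.sample <<'EOF'
/dev/dsk/c0t3d0 / hfs defaults 0 1
/dev/vg00/lvol2 / dump defaults 0 0
/dev/vg01/lvol1 / dump defaults 0 0
EOF

# Print the device file of every entry whose type field is "dump".
awk '$3 == "dump" { print $1 }' /tmp/fstab.sample
```

Against the sample file this prints /dev/vg00/lvol2 and /dev/vg01/lvol1, one per line.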
To have crashconf add the devices represented by the block device files /dev/dsk/c0t1d0 and /dev/dsk/c1t4d0 to the dump device list, enter the following:
/sbin/crashconf /dev/dsk/c0t1d0 /dev/dsk/c1t4d0
To have crashconf replace any existing dump device definitions with the logical volume /dev/vg00/lvol3 and the device represented by block device file /dev/dsk/c0t1d0, enter the following:
/sbin/crashconf -r /dev/vg00/lvol3 /dev/dsk/c0t1d0

Dump order

The or
Operator override options

When the system crashes, the system console displays a panic message similar to the following:
*** A system crash has occurred. (See the above messages for details.)
*** The system is now preparing to dump physical memory to disk, for use
*** in debugging the crash.
*** The dump will be a SELECTIVE dump: 21 of 128 megabytes.
*** To change this dump type, press any key within 10 seconds.
Following the dump, the system attempts to reboot.

The reboot

When dumping of physical memory pages is complete, the system attempts to reboot (if the Autoboot flag is set). For information on the Autoboot flag, see “Enabling Autoboot” on page 67.

savecrash processing

During the boot process, a process called savecrash can be used to copy (and optionally compress) the memory image stored on the dump devices to the HP-UX file system area.
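As a sketch of the kind of settings involved, boot-time savecrash behavior is controlled from /etc/rc.config.d/savecrash. The variable names below are assumptions and should be verified against the savecrash(1M) man page on your system:

```shell
# Hypothetical excerpt of /etc/rc.config.d/savecrash -- variable
# names are assumptions; verify against savecrash(1M) on your system.
SAVECRASH=1                    # 1 = run savecrash at boot, 0 = skip it
SAVECRASH_DIR=/var/adm/crash   # directory receiving the saved memory image
```

If boot-time processing is deferred this way, remember (as noted above) to run savecrash manually after recovery.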
Using crashutil to complete the saving of a dump

If devices are being used for both paging (swapping) and dumping, it is very important not to disable savecrash processing at boot time. If this is done, there is a chance that the memory image in the dump area will be overwritten by normal paging activity.
destination Designates the pathname where the converted file will be written. If no destination is specified, the source will be overwritten. See the crashutil(1M) manpage for more information.

Analyzing crash dumps

Analyzing crash dumps is not a trivial task. It requires intimate knowledge of HP-UX internals and the use of debuggers. Covering the actual analysis process is beyond the scope of this document.
A  LED codes

This appendix describes core utilities board (CUB) LED errors. The Attention LED on the core utilities board (CUB) turns on, and the Attention light bar on the front of the node flashes, to indicate the presence of an error code listed in Table 15. Only the highest-priority error is displayed. Once remedied, an error that is cleared may expose a lower-priority error.
Power on detected errors

This section describes core utilities board (CUB) LED errors, from highest to lowest priority, detected at power on. The Attention LED on the CUB turns on, and the Attention light bar on the front of the node flashes, to indicate the presence of an error code listed in Table 15. Only the highest-priority error is displayed; once remedied, an error that is cleared may expose a lower-priority error.
LED  Fault  Symptoms  Corrective action

03  FPGA not OK
Symptoms:
1. Core Utilities Board (CUB) monitoring utilities chip (MUC) problem.
2. MUC cannot get correct program transfer from EEPROM on power up.
Corrective action:
• Cycle the node power using the Key switch.
• Call the Response Center.

04  dc OK error (Upper Left)
Symptoms:
1. Power supply is reporting failure (dc OK) after keyswitch is turned on, but prior to CUB power on sequence.
2. This is the first of two or more supplies reporting failure.
LED  Fault  Symptoms

08-11  48V error NPSUL failure PWRUP=0-9
1. Error occurs when 48 volt distribution falls below 42 volts during the powerup state displayed. Powerup state indicates which loads are being turned on.
2. Excessive load on 48 volts due to an inadequate number of functioning 48 volt supplies or overload condition on 48V bus.
3. Possible node power supply (NPS) upper left failure. Call the Response Center.

12-1B  48V error NPSUR failure PWRUP=0-9
1.
LED  Fault  Symptoms

1C-25  48V error NPSLL failure PWRUP=0-9
1. Error occurs when 48 volt distribution falls below 42 volts during the powerup state displayed. Powerup state indicates which loads are being turned on.
2. Excessive load on 48 volts due to an inadequate number of functioning 48 volt supplies or overload condition on 48V bus.
3. Possible node power supply (NPS) lower left failure. Call the Response Center.

26-2F  48V error NPSLR failure PWRUP=0-9
1.
LED  Fault  Symptoms

30-39  48V error (maintenance), no supply failure reported, PWRUP=0-9
1. Error occurs when 48 volt distribution falls below 42 volts during the powerup state displayed. Powerup state indicates which loads are being turned on.
2. Excessive load on 48 volts due to an inadequate number of functioning 48 volt supplies or overload condition on 48V bus.
3. Possible node power supply (NPS) failure. Call the Response Center.

3A  48V Yo Yo error
1.
CUB detected memory power fail

This section covers memory errors detected by the monitoring utilities chip (MUC) on the core utilities board after power-on.

Table 16  CUB detects memory power fail

LED  Fault
40  MB0L Power Fail
41  MB1L Power Fail
42  MB2R Power Fail
43  MB3R Power Fail
44  MB4L Power Fail
45  MB5L Power Fail
46  MB6R Power Fail
47  MB7R Power Fail

Symptoms
1. 3.3V dropped below acceptable level.
2.
CUB detected processor error

This section describes processor errors detected by the monitoring utilities chip (MUC) on the core utilities board after power-on.
CUB detected I/O error

This section describes I/O errors detected by the monitoring utilities chip (MUC) on the core utilities board after power-on.

Table 18  CUB detects I/O (IOB) power fail

LED  Fault
58  Left Front I/O Board failure
59  Left Rear I/O Board failure
5A  Right Front I/O Board failure
5B  Right Rear I/O Board failure

Symptoms
1. 3.3V or 5V dropped below acceptable level (+12V and -12V not monitored).
2.
CUB detected fan error

This section describes fan errors detected by the monitoring utilities chip (MUC) on the core utilities board after power-on.

NOTE Fan positions are referred to as viewed from the rear of the server.
CUB detected ambient air errors

This section describes air errors detected by the monitoring utilities chip (MUC) on the core utilities board after power-on.

Table 20  CUB detects ambient air error

LED  Fault  Symptoms  Corrective action

62  Ambient hot
Symptoms:
1. Ambient air too hot.
2. Core utilities board (CUB) powers down system.
3. Should have received “ambient air too warm” error 69 prior to this error.
Corrective action:
• Check site temperature.
• Call the Response Center.
CUB detected hard error

This section describes hard errors detected by the monitoring utilities chip (MUC) on the core utilities board after power-on.

Table 21  Hard error

LED  Fault  Symptoms  Corrective action

68  Hard error (RAC) (PAC) (MAC) (TAC) (SAGA)
Symptoms:
1. Hard error lines to core utilities board (CUB) reported an ASIC problem.
2. Bit and hard error bus determine which ASIC to check.
Corrective action:
• Read /spp/data/hard_list.
• Call the Response Center.
CUB detected intake ambient air error

This section describes air intake errors detected by the monitoring utilities chip (MUC) on the core utilities board after power-on.

Table 22  Ambient air (intake) error

LED  Fault
69  Ambient air too warm is an environmental warning

Symptoms
Intake air through CUB too warm.

Corrective action
• Check site temperature and correct.
• If the fault reoccurs when room temperature is within spec.
CUB detected dc error

This section describes dc errors detected by the monitoring utilities chip (MUC) on the core utilities board after power-on.

Table 23  dc error

LED  Fault
70  NPSUL failure (warning)
71  NPSUR failure (warning)
72  NPSLL failure (warning)
73  NPSLR failure (warning)

Symptoms
1. Node power supply (viewed from node front) failure reported.
2. Low-priority error for redundant power configurations.
Index Symbols /spp directory, 4 /spp/bin, 55 /spp/data, 55 /spp/est, 56 /spp/etc, 54 /spp/firmware, 56 /spp/man, 56 /spp/scripts, 55 ^E key sequence, 49 Numerics 10/100 Base T Ethernet, 12 1000 Base SX Gigabit Ethernet, 12 32-bit applications, 118 64-bit applications, 118 712 workstation, 36, 98 A Abaqus, 120 abnormal system shutdown, 147 accessing I/O, 13 accounts, 37 acoustics, xviii add terminal mux, 84, 85 applications 32-bit, 118 64-bit, 118 associated documents, xx attention light bar, 31 auto comman
detected memory power fail, 171 detected processor error, 172 core utility board (CUB), 36 CPU.
HyperPlane Crossbar, 6 I I/O controllers, 13 listing, 118 multiple-cabinet numbering, 14 numbering, 119 physical access, 13 supported cards, 12 indicator LEDs DAT, 25 DVD-ROM, 24 indicators, 21 DAT drive LEDs, 25 dc on LED, 23 light bar, 31 installation conditions, xix interconnecting hardware, 6 interleaving of memory, 11 ioscan command, 118 IP address, 53, 98 IT power system, xviii J jf-ccmd_info, 53 JTAG, 53, 98 K kernel configuration, 120 threads, 122 key switch panel, 23 L LAN 712/B180L, 57 launch poli
LVM (Logical Volume Manager), problems, 145 M material handling safety, xvi media DAT, 25 tape, 25 memory 80-bit DIMMs, 10 88-bit DIMMs, 10 board, 11 controllers, 6 CTI cache memory, 11, 12 interleaving, 11 latency, 10 numbering, 119 population, 10 supported DIMM sizes, 10 memory power fail, 171 MIB, 36 midplane, 36 migration of threads and processes, 122 model command, 118 modem, 57 MPI scheduling, 122 mpsched command, 123 mu-000X, 98 MUC detected errors, 171 multiple-cabinet cabinet IDs, 2 configurations,
power, 23 powering down the system, 130 Power-On Self Test (POST), 28 private ethernet, 36 private LAN, 57 private LAN see diagnostic LAN processor binding, 123 numbering, 119 PA-8200, 9 PA-8500, 9 PA-8600, 9 processor agents, 6 Processor Dependent Code (PDC), 60 processor status line, 28 programming extensions SCA features, 122 prompt command, 64 pthread SCA features, 123 scheduling, 122 R radio frequency interference, xvi Reader feedback, xxii reboot, 132 rebooting, 146 recovering from failures, 137 remot
Stop-on-hard button, 109 supported I/O cards, 12 switches, 21 Symbios, 98 Symmetric Multi-Processing (SMP), 10 system displays, 27 hangs, 140 logs, 52 panics, 141 file system problem, 144 interface card problem, 143 lan problem, 144 logical volume manager problem, 145 monitoring the system, 146 peripheral problem, 142 reboot procedure, 132, 146 reset, 24 reset procedure, 134 shutdown procedure, 133 shutdown, abnormal, 147 shutting down, 130 startup, 60 status, 99 T Tachyon Fibre Channel, 12 tape, 25 tcsh, 3