HP Integrity NonStop Operations Guide for H-Series and J-Series RVUs HP Part Number: 529869-023 Published: February 2014 Edition: J06.03 and subsequent J-series RVUs and H06.
© Copyright 2014 Hewlett-Packard Development Company, L.P. Legal Notice Confidential computer software. Valid license from HP required for possession, use or copying. Consistent with FAR 12.211 and 12.212, Commercial Computer Software, Computer Software Documentation, and Technical Data for Commercial Items are licensed to the U.S. Government under vendor’s standard commercial license. The information contained herein is subject to change without notice.
Contents
About This Document ........ 13
Supported Release Version Updates (RVUs) ........ 13
Intended Audience ........ 13
New and Changed Information in 529869-023 Edition
NonStop BladeSystems NB50000c and NB50000c-cg ........ 34
NonStop BladeSystems NB54000c and NB54000c-cg ........ 34
NonStop BladeSystems NB56000c and NB56000c-cg ........ 34
NonStop NS-Series Systems ........ 34
NonStop NS-Series Modular Hardware Components
Determining Device States ........ 60
SCF Object States ........ 62
Using Onboard Administrator ........ 63
Automating Routine System Monitoring Procedures
Monitoring Status for a SWAN Concentrator ........ 91
Monitoring Status for a Data Communications Device ........ 91
Monitoring WAN Processes ........ 92
Monitoring CLIPs ........ 92
Monitoring the NonStop TCP/IP Subsystem
11 Processors and Components: Monitoring and Recovery ........ 120
When to Use This Chapter ........ 120
Overview of Processors ........ 120
Processors in NonStop Systems Running J-Series RVUs
Monitoring the Use of Space on a Disk Volume ........ 149
Monitoring the Size of Database Files ........ 149
Example ........ 149
Monitoring Disk Configuration and Performance ........ 150
Identifying Disk Drive Problems
ESS Cabinets ........ 172
Networking CLIMs ........ 172
Storage CLIMs ........ 172
SAS Disk Drive Enclosures
Getting a Corrupt System Configuration File Analyzed ........ 195
Recovering From a Reload Failure ........ 195
Exiting the OSM Low-Level Link ........ 196
Opening Startup Event Stream and Startup TACL Windows
Checking Physical Security ........ 218
Maintaining Order and Cleanliness ........ 218
Checking Fire-Protection Systems ........ 219
Cleaning System Components
Example ........ 233
Hexadecimal to Decimal ........ 234
Example ........ 235
Decimal to Binary
About This Document This guide describes how to perform routine system hardware operations for HP Integrity NonStop™ NS-series systems and HP Integrity NonStop BladeSystems. It covers H-series release version updates and J-series release version updates.
• Updated the chapter “Overview of Monitoring and Recovery” (page 51) to document other HP manageability tools and update information, as needed, for previously-documented tools. • Added the chapter “Monitoring the Performance of NonStop Subsystems” (page 71) to document performance-monitoring tools. • Updated the section “CLuster I/O Modules (CLIMs)” (page 87) to describe the OSM Service Connection System CLIM operations that generate CLIM diagnostic data.
New and Changed Information in 529869–020 Edition • Changed note about OSM Notification Director and made other changes in “System Consoles” (page 28). • Removed OSM Notification Director from lists and made other changes in “Overview of OSM Applications” (page 29) and “Launching OSM Applications” (page 30). • Added core licensing to “NonStop BladeSystem Options” (page 33). • Added link to “NonStop Software Essentials” (page 57) at the beginning of “Overview of Monitoring and Recovery” (page 51).
• In Chapter 2 under “NonStop NS16000 Series Systems” (page 37), indicated that NS16000 series servers running H06.20 or later RVUs support IP, Telco, and Storage CLIMs. NonStop NS16000 and NS16200 servers running H06.23 or later RVUs support IB CLIMs. • In Chapter 2 under “NonStop NS2000 Series Systems” (page 36), added Telco to the list of CLIMs supported. • In Chapter 2 under “NonStop NS1000 and NS1200 Systems” (page 38), added Telco to the list of CLIMs not supported.
• In Chapter 7 under “ServerNet and System I/O Connectivity” (page 100): ◦ Added IB CLIM and Telco CLIM to NonStop BladeSystem ◦ Added IB CLIM and Telco CLIM to NonStop NS16000 series systems ◦ Added Telco CLIM to NonStop NS2000 series system ◦ Added references to the new NS2200 system • In Chapter 7 under “When to Use This Chapter” (page 100), added references to the new NS2200 system.
• In Chapter 16, added references to the new NS2200 system under: ◦ “Powering On the System From a Low-Power State” (page 176) ◦ “Powering On the System From a No Power State” (page 177) ◦ “Starting a System” (page 182) ◦ “System Load Disks” (page 183) ◦ “Powering Off a System” (page 192) • Changed and added steps for both NS2000 and NS2200 systems under “Power Cycling a NonStop NS2000 Series, NS2100, NS2200 Series, NS2300, or NS2400 Series Processor” (page 181).
• Changed reference under “Overview of Tape Drives” (page 155), “Related Reading for Disk Drives” (page 154), and “Related Reading for Tapes and Tape Drives” (page 161) in Chapter 12.
• Added note pointing to reference on configuration settings for NonStop BladeSystems that shipped before J06.10 under “When to Use This Chapter” (page 170) in Chapter 15.
• Added new text to “Starting a System” (page 182) in Chapter 16.
• Added “NonStop Software Essentials” (page 57). • Removed “HP Storage Essentials”. • Added OSM Service Connection online help to Table 12 (page 70). • Changed “Monitoring the CLIMs with HP SIM” (page 112). • Added references for monitoring and managing CLIMs to “Related Reading for CLIMs” (page 119). • Changed Page 173 for NonStop BladeSystems to configure OSM power fail support for only Enclosure 100. • Added “NonStop Software Essentials” (page 224).
• “Starting and Stopping the System” (page 175)
• “Creating Startup and Shutdown Files” (page 199)
• “Preventive Maintenance” (page 218)
• “Operational Differences Between Systems Running G-Series, H-Series, and J-Series RVUs” (page 221)
• “Tools and Utilities for Operations” (page 222)
• “Related Reading for Tools and Utilities” (page 227)
• “Converting Numbers” (page 232)
• “Safety and Compliance” (page 238)
Notation Conventions
General Syntax Notation
This list summarizes the notation co
TERM [\system-name.]$terminal-name INT[ERRUPTS] A group of items enclosed in brackets is a list from which you can choose one item or none. The items in the list can be arranged either vertically, with aligned brackets on each side of the list, or horizontally, enclosed in a pair of brackets and separated by vertical lines. For example: FC [ num ] [ -num ] [ text ] K [ X | D ] address { } Braces A group of items enclosed in braces is a list from which you are required to choose one item.
CALL STEPMOM ( process-id ) ; If there is no space between two items, spaces are not permitted. In this example, no spaces are permitted between the period and any other items: $process-name.#su-name Line Spacing If the syntax of a command is too long to fit on a single line, each continuation line is indented three spaces and is separated from the preceding line by a blank line. This spacing distinguishes items in a continuation line from items in a vertical list of selections.
• ServerNet Cluster Supplement for NonStop NS-Series Servers or the ServerNet Cluster Supplement for NonStop BladeSystems • The Add Node to ServerNet Cluster guided procedure online help For the earlier ServerNet cluster star topologies, using 6770 switches, see: • ServerNet Cluster Manual • ServerNet Cluster Supplement for NonStop NS-Series Servers or the ServerNet Cluster Supplement for NonStop BladeSystems • The Add Node to ServerNet Cluster guided procedure online help For information about CL
docsfeedback@hp.com Include the document title, part number, and any comment, error found, or suggestion for improvement you have concerning this document.
1 Introduction to NonStop Operations
• “When to Use This Chapter” (page 26)
• “Understanding the Operational Environment” (page 26)
• “What Are the Operator Tasks?” (page 26)
• “Logging On to a NonStop System” (page 28)
  ◦ “System Consoles” (page 28)
  ◦ “Opening a TACL Window” (page 29)
  ◦ “Overview of OSM Applications” (page 29)
  ◦ “Launching OSM Applications” (page 30)
• “Service Procedures” (page 31)
  ◦ “Support and Service Collection” (page 31)
When to Use This Chapter
This chapter identi
Starting the NonStop System and Loading the NonStop Operating System For information about starting a NonStop system, see “Starting a System” (page 182). For information about loading the NonStop operating system, see “Loading the System” (page 183) and the Software Installation and Upgrade Guide for your RVU. Updating Firmware For information about installing compatible versions of firmware, see the Software Installation and Upgrade Guide for your RVU and the NonStop Firmware Matrices.
Recovery operations for a system console are not discussed in this guide. For recovery procedures for a system console and the applications installed on the system console, see the planning guide for your NonStop system. Preparing for and Recovering from Power Failures You can minimize unplanned outage time by having procedures to prepare and recover quickly from power failures, as described in Chapter 16 (page 170).
NOTE: For information about configuring and enabling a remote desktop for dial-in services for use by a service provider or authorized employee, see the NonStop System Console Installer Guide.
The OSM Low-Level Link and OSM Console Tools components reside on the system console, along with other required HP and third-party software. OSM Service Connection and OSM Event Viewer software resides on your system, and connectivity is established from the console through Internet Explorer browser sessions.
• The OSM Event Viewer is used for “Monitoring EMS Event Messages” (page 77). • OSM Console Tools: ◦ NonStop Maintenance LAN DHCP DNS Configuration Wizard — used to configure DHCP, DNS, and BOOTP servers required for certain NonStop Systems. ◦ Down System CLIM Firmware Update Tool — used for updating firmware/BIOS for CLIM components during planned system down time.
can select the system of your choice from the list of bookmarks displayed in the left column of the page (available bookmarks include those that were user-created during previous sessions and those converted automatically from an existing OSM system list). If no bookmarks are available, the web page also contains instructions on how to access these applications by entering a system URL as an Internet Explorer address.
2 Determining Your System Configuration
• “When to Use This Chapter” (page 32)
• “NonStop BladeSystems” (page 32)
  ◦ “NonStop BladeSystems NB50000c, NB50000c-cg, NB54000c, NB54000c-cg, NB56000c, and NB56000c-cg Modular Hardware Components” (page 32)
  ◦ “NonStop BladeSystem Options” (page 33)
• “NonStop NS-Series Systems” (page 34)
  ◦ “NonStop NS-Series Modular Hardware Components” (page 34)
  ◦ “Differences Between NonStop NS-Series Systems” (page 36)
• “Terms Used to Describe System Hardware Comp
  ◦ Interconnect Ethernet switch for the NB50000c, NB54000c, or NB56000c; Interconnect Ethernet switch CG for the NB50000c-cg, NB54000c-cg, or NB56000c-cg
  ◦ Onboard Administrator (OA) module
  ◦ 1-phase or 3-phase AC input power module for the NB50000c, NB54000c, or NB56000c; DC input module for the NB50000c-cg, NB54000c-cg, or NB56000c-cg
• I/O Adapter Module (IOAM) Enclosure, including subcomponent I/O Adapters:
  ◦ Fibre Channel ServerNet adapter (FCSA)
  ◦ Gigabit Ethernet 4-port ServerNet adapte
Differences Between NonStop BladeSystems NonStop BladeSystems NB50000c and NB50000c-cg NonStop BladeSystems NB50000c and NB50000c-cg differ from the other BladeSystems in these respects: • Run on J06.04 and later J-series RVUs • Use from 8 GB to 48 GB main memory per logical processor For more information, see the NonStop BladeSystems Planning Guide and NonStop BladeSystems Hardware Installation Manual.
• I/O Adapter Module (IOAM) Enclosure — applies to NonStop NS16000 series, NS14000, and NS1000 systems only IOAMs include these subcomponent I/O Adapters: ◦ Fibre Channel ServerNet adapter (FCSA) ◦ Gigabit Ethernet 4-port ServerNet adapter (G4SA) ◦ 4-Port ServerNet Extenders (4PSEs) (NonStop NS14000 and NS1000 systems only) • VIO Enclosure (displayed by OSM as a VIO Module object) — For more information, see “NonStop NS14000 Series Systems” (page 37), “NonStop NS2000 Series Systems” (page 36), “Non
NonStop NS-Series System Options
NonStop NS-series systems offer a variety of architecture and configuration options to suit different customer needs. For more information, see the appropriate planning guide for your NonStop system. For information about supported CLIMs or CLIM-attached storage (SAS disk drive enclosures), see the planning guide for your NonStop system.
Differences Between NonStop NS-Series Systems
NonStop NS2400 Series Systems
NonStop NS2400 series systems are first released with J06.
blade elements. For more information on NonStop NS2000 series systems, refer to the NonStop NS2000 Series Planning Guide. NonStop NS2000 series systems support connections to CLIMs and SAS disk drive enclosures. NonStop NS2000 series systems do not support FCDMs, and they do not support connections to NonStop S-series I/O enclosures. The method for booting is different from the method for NonStop NS2200 series systems or NonStop NS2100 systems. See “Starting a System” for more information.
NonStop NS1000 and NS1200 Systems
NonStop NS1000 and NS1200 systems have no processor switches or LSUs. As with NonStop NS14000 systems, there are now two types of NS1000 systems: those consisting of a single IOAM enclosure and those consisting of one VIO enclosure for each fabric (two VIO enclosures). NS1200 systems consist of VIO enclosures only. ServerNet connectivity for each type is accomplished as described for “NonStop NS14000 Series Systems”, except for the absence of the LSUs.
NS1200 server, or NS1000 server. Or it can refer to a NonStop server blade in a NonStop BladeSystem, NonStop NS2200 series system, NS2100 system, or NS2000 series system. Blade The term “blade” is used differently in different NonStop systems: • In NonStop BladeSystems, NonStop server blades house the microprocessors and are mounted inside c7000 enclosures.
SCF accepts commands from a workstation, a disk file, or an application process. It sends display output to a workstation, a file, a process, or a printer. Some SCF commands are available only to some subsystems. For complete information, see the SCF Reference Manual for J-Series and H-Series RVUs. Subsystem-specific information appears in a separate manual for each subsystem. For a partial list of these manuals, refer to “Related Reading for Tools and Utilities” (page 227).
Using SCF to Display Subsystem Configuration Information SCF enables you to display, in varying levels of detail, the configuration of objects in each subsystem supported by SCF. For example, you can use the LISTDEV command to list all the devices on your system or to list the objects within a given subsystem. Then you can use the INFO command with a logical device name or device type to obtain information about a specific device or class of devices.
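The two-step pattern described above (list the devices, then ask for detail on one of them) can be scripted when the same queries are run often. The following Python sketch only assembles SCF command text; it does not communicate with a NonStop system. The helper names `listdev_command` and `info_command` are hypothetical, and `$DATA1` is an illustrative disk name. The command forms (`LISTDEV`, `LISTDEV TYPE n`, `INFO object $name, DETAIL`) are the ones this guide lists for the storage subsystem.

```python
def listdev_command(device_type=None):
    """Build an SCF LISTDEV command, optionally limited to one device type."""
    if device_type is None:
        return "LISTDEV"
    return f"LISTDEV TYPE {int(device_type)}"


def info_command(object_type, name="$*", detail=False):
    """Build an SCF INFO command for one object, or for all objects ($*)."""
    command = f"INFO {object_type.upper()} {name}"
    if detail:
        command += ", DETAIL"
    return command


# Step 1: list every disk (device type 3 in this guide's storage table).
# Step 2: request detailed information for one disk chosen from that list.
commands = [listdev_command(3), info_command("disk", "$DATA1", detail=True)]
for line in commands:
    print(line)
```

The generated lines could be pasted into an SCF session or saved to a command file; how they are delivered to SCF is left open here.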
Example 1 SCF LISTDEV Command Output
$SYSTEM STARTUP 1> SCF LISTDEV
LDev  Name
0     $0
1     $NCP
3     $YMIOP
5     $Z0
6     $SYSTEM
7     $ZOPR
63    $ZZKRN
64    $ZZWAN
65    $ZZSTO
66    $ZZSMN
67    $ZZSCL
68    $ZZLAN
86    $ZSNET
87    $ZSLM2
91    $ZNET
104   $ZM03
105   $ZM02
106   $ZM01
107   $ZM00
108   $ZLOG
104   $ZM03
105   $ZM02
106   $ZM01
107   $ZM00
108   $ZLOG
121   $ZIM03
122   $ZIM02
123   $ZIM01
124   $ZIM00
126   $ZEXP
128   $SC26
129   $SC25
131   $DATA6
132   $DATA5
133   $DATA4
134   $DATA3
135   $DATA2
136   $DATA1
137   $DATA
145   $ZOLHD
167   $ZTC0
168   $ZTNT
200   $ZPMON
PPID
Table 1 gives the names of some subsystems that are common to most NonStop NS-series systems and are routinely monitored by operations. These subsystems appear in the LISTDEV output in Example 1 (page 42).
TCP/IP Subsystem
These examples are based on a TCP/IP process named $ZTC0.
Table 4 Displaying Information for the Storage Subsystem ($ZZSTO) (continued)
To Display Information About These Configured Objects    Enter This Command
All disk drives (list)                                   LISTDEV TYPE 3
All disk drives (summary information)                    INFO DISK $*
A specific disk drive (detailed information)             INFO DISK $name, DETAIL
All tape drives (list)                                   LISTDEV TYPE 4
All tape drives (summary information)                    INFO TAPE $*
A specific tape drive (detailed information)             INFO TAPE $name, DETAIL
When displaying configuration f
To get detailed configuration information in command format for all disks on the system, issue this command:
-> INFO DISK $*,OBEYFORM
To get detailed configuration information in command format for all tape drives on the system, issue this command:
-> INFO TAPE $*,OBEYFORM
ServerNet LAN Systems Access (SLSA) Subsystem
Before using commands listed in Table 5, type this command to make the SLSA subsystem the default object:
> SCF ASSUME PROCESS $ZZLAN
The SLSA subsystem provides access to parallel LAN and W
The WAN subsystem has responsibility for all WAN connections.
Table 7 Subsystem Objects Controlled by SCF (continued) Subsystem Acronym Description Device Type Device Subtype OSICMIP Open Systems Interconnection/Common Management Information Protocol 55 24 OSIFTAM Open Systems Interconnection/File Transfer, Access, and Management 55 21 or 25 OSIMHS Open Systems Interconnection/Message Handling System 55 11 or 12 OSITS Open Systems Interconnection/Transport Services 55 55, 4 OSS Open System Services 24 0 PAM Port Access Method QIO Queued I/O
Example 3 SCF INFO PROCESS Command Output 32-> INFO PROCESS $ZZKRN.#* NONSTOP KERNEL - Info PROCESS \DRP09.$ZZKRN Symbolic Name CLCI-TACL OSM-APPSRVR OSM-CIMOM OSM-CONFLH-RD OSM-OEV QATRAK QIOMON ROUT SCP SP-EVENT TFDSHLP ZEXP ZHOME ZLOG ZSLM2 ZZKRN ZZLAN ZZSTO ZZWAN *Name *Autorestart *Program $CLCI 10 $SYSTEM.SYSTEM.TACL $ZOSM 10 $SYSTEM.SYSTEM.APPSRVR $ZCMOM 5 $SYSTEM.SYSTEM.CIMOM $ZOLHI 0 $SYSTEM.SYSTEM.TACL $ZOEV 10 $SYSTEM.SYSTEM.EVTMGR $TRAK 10 $SYSTEM.SYSTOOLS.QATRACK $ZMnn 10 $SYSTEM.SYSTEM.
Example 5 SCF INFO PROCESS $ZZWAN Command Output -> INFO PROCESS $ZZWAN.* WAN MANAGER Detailed Info Process \DRP09.$ZZWAN.#ZTXAE RecSize........... Preferred Cpu..... HOSTIP Address.... *IOPOBJECT........ TCPIP Name........ 0 *Type............. ( 0,49) 0 Alternate Cpu..... 1 172.031.145.090 $SYSTEM.SYS00.SNMPTMUX $ZTC02 WAN MANAGER Detailed Info Process \DRP09.$ZZWAN.#0 RecSize........... 0 *Type............. (50,00) Preferred Cpu..... 0 Alternate Cpu..... N/A *IOPOBJECT........ $SYSTEM.SYS00.
3 Overview of Monitoring and Recovery
• “When to Use This Chapter” (page 51)
• “HP Tools for Monitoring System Resources” (page 52)
  ◦ “What the HP Tools Monitor” (page 52)
  ◦ “Monitored NonStop Hardware Resources” (page 52)
• “Using HP SIM” (page 55)
• “Using the HP SIM Plug-Ins” (page 57)
  ◦ “NonStop Software Essentials” (page 57)
  ◦ “NonStop Cluster Essentials” (page 58)
  ◦ “NonStop Cluster Performance Essentials” (page 58)
  ◦ “Insight Contro
HP Tools for Monitoring System Resources A number of HP tools, several of them automated and GUI-based, are available from which you can view, and in some cases manage, the status of NonStop hardware resources.
For information about the manageability tools you use to monitor NonStop subsystem performance, see “Monitoring the Performance of NonStop Subsystems” (page 71). Table 8 Summary of Monitored NonStop Hardware Resources Resource Monitored using these tools Adapters for communications OSM Service Connection subsystems: G4SA SCF interface to various subsystems For more information, see..
Figure 1 HP SIM Tree View HP SIM includes these general features for all platforms: • Installs on Windows, HP-UX, and Linux CMS (Central Management Service) • Inventory, fault, and configuration management • Security features: role-based authorizations, OS-based authentication, SSL/SSH support • Distributed task facility to remotely run commands, scripts, and batch files on managed systems NOTE: Only Windows CMS is currently supported to manage NonStop systems.
the HP SIM online help, or manuals listed for HP Systems Insight Manager under Network and Systems Management on docs.hp.com.
NonStop Cluster Essentials
The NonStop Cluster Essentials plug-in acts as a central integration point for cluster management. It provides integrated cluster management for homogeneous clusters of NonStop systems or Unix systems and for heterogeneous clusters of NonStop systems and Unix systems. It supports clusters consisting of NonStop BladeSystems, NonStop NS-series systems, NonStop S-series servers, and, optionally, ProLiant systems running Linux.
HP Insight Remote Support Advanced
HP Insight Remote Support Advanced is the go-forward remote support solution for all NonStop systems, replacing the OSM Notification Director in both modem-based and HP Instant Support Enterprise Edition (ISEE) remote support solutions. For more information about Insight Remote Support Advanced, refer to the HP Integrity NonStop Service information collection of NTL.
Generating Diagnostic Data
The OSM Service Connection provides several options for setting up and generating diagnostic information about the system resources it monitors. Among them is the System action Collect Diagnostic Data, which generates Diagnostic Data files for the system.
Example 7 SCF STATUS TAPE Command
1-> STATUS TAPE $*
STORAGE - Status TAPE \COMM.$TAPE0
LDev   State     Primary   Backup   DeviceStatus
                 PID       PID
156    STOPPED   2,268     3,288
STORAGE - Status TAPE \COMM.$DLT20
LDev   State     Primary   Backup   DeviceStatus
                 PID       PID
394    STARTED   2,267     3,295    NOT READY
STORAGE - Status TAPE \COMM.$DLT21
LDev   State     Primary   Backup   DeviceStatus
                 PID       PID
393    STARTED   1,289     0,299    NOT READY
STORAGE - Status TAPE \COMM.
where:
subsystem     The reporting subsystem name
object-type   The object, or device, type
object-name   The fully qualified name of the object
State         One of the valid object states: ABORTING, DEFINED, DIAGNOSING, INITIALIZED, SERVICING, STARTED, STARTING, STOPPED, STOPPING, SUSPENDED, SUSPENDING, and UNKNOWN
PPID          The primary processor number and process identification number (PIN) of the object
BPID          The backup processor number and PIN of the object
attrn         The name of an attribute of the object
valn          T
Table 9 SCF Object States (continued) State Substate Explanation INACCESSIBLE The object is inaccessible to user processes. PREMATURE-TAKEOVER The backup input/output (I/O) process was asked to take over for the primary I/O process before it had the proper information. RESOURCE-UNAVAILABLE The input/output (I/O) process could not obtain a necessary resource. UNKNOWN-REASON The input/output (I/O) process is down for an unknown reason. STOPPING The object is in transition to the STOPPED state.
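When many objects are checked at once, it can help to reduce STATUS output to a short list of objects that are not in the STARTED state. The Python sketch below shows one way to do that for text captured from an SCF session. It is an illustration under stated assumptions, not an HP tool: the function names `find_states` and `needs_attention` are hypothetical, the report layout is modeled on the STATUS TAPE example in this chapter, and only the first state reported for each object is examined. The set of state names is the list of valid object states given in this guide.

```python
import re

# Object states listed in this guide's description of SCF STATUS output.
VALID_STATES = {
    "ABORTING", "DEFINED", "DIAGNOSING", "INITIALIZED", "SERVICING",
    "STARTED", "STARTING", "STOPPED", "STOPPING", "SUSPENDED",
    "SUSPENDING", "UNKNOWN",
}

# Assumed header layout, for example: STORAGE - Status TAPE \COMM.$TAPE0
HEADER = re.compile(r"^(?P<subsystem>.+?) - Status (?P<type>\S+) (?P<name>\S+)$")


def find_states(report):
    """Return (object_name, state) pairs, one per object header in the report."""
    results = []
    current = None
    for raw in report.splitlines():
        line = raw.strip()
        match = HEADER.match(line)
        if match:
            current = match.group("name")
            continue
        if current is None:
            continue
        for token in line.split():
            word = token.lstrip("*")  # a leading * marks the path in use
            if word in VALID_STATES:
                results.append((current, word))
                current = None  # keep only the first state for this object
                break
    return results


def needs_attention(report):
    """Return the objects whose first reported state is not STARTED."""
    return [(name, state) for name, state in find_states(report)
            if state != "STARTED"]


SAMPLE = r"""STORAGE - Status TAPE \COMM.$TAPE0
LDev State Primary Backup DeviceStatus
156 STOPPED 2,268 3,288
STORAGE - Status TAPE \COMM.$DLT20
LDev State Primary Backup DeviceStatus
394 STARTED 2,267 3,295 NOT READY
"""

for name, state in needs_attention(SAMPLE):
    print(name, state)
```

Matching on whole state words, rather than on column positions, keeps the sketch tolerant of spacing differences between subsystems; a production check would also need to consider substates and device status text such as NOT READY.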
Using Automated HP Manageability Tools You should use the automated, GUI-based HP manageability tools listed in Table 10 (page 64) to create automated system monitoring procedures: Table 10 Management Applications for Automating System Monitoring Procedures Management application For more information, see..
1. To create a command file named SYSCHK that will automate system monitoring, type the text shown in Example 8 into an EDIT file.
Example 9 System Monitoring Output File
COMMENT THIS IS THE FILE SYSCHK
COMMENT THIS CHECKS ALL DISKS:
SCF STATUS DISK $*
STORAGE - Status DISK \SHARK.$DATA12
LDev   Primary    Backup    Mirror
52     *STARTED   STARTED   *STARTED
STORAGE - Status DISK \SHARK.$DATA01
LDev   Primary    Backup    Mirror
63     *STARTED   STARTED   *STARTED
STORAGE - Status DISK \SHARK.$DATA04
LDev   Primary    Backup    Mirror
60     *STARTED   STARTED   *STARTED
STORAGE - Status DISK \SHARK.
SLSA Status ADAPTER
Name            State
$ZZLAN.MIOE0    STARTED
$ZZLAN.E4SA0    STARTED
$ZZLAN.MIOE1    STARTED
$ZZLAN.E4SA2    STARTED
COMMENT THIS CHECKS ALL LIFS
SCF STATUS LIF $*
SLSA Status LIF
Name           State     Access State
$ZZLAN.LAN0    STARTED   UP
$ZZLAN.LAN3    STARTED   DOWN
COMMENT THIS CHECKS ALL PIFS
SCF STATUS PIF $*
SLSA Status PIF
Name                State
$ZZLAN.E4SA0.0.A    STARTED
$ZZLAN.E4SA0.0.B    STARTED
$ZZLAN.E4SA0.1.A    STOPPED
$ZZLAN.E4SA0.1.
COMMENT THIS CHECKS THE STATUS OF PATHWAY: PATHCOM $ZVPT;STATUS PATHWAY;STATUS PATHMON PATHWAY -- STATE=RUNNING RUNNING EXTERNALTCPS 0 LINKMONS 0 PATHCOMS 1 SPI 0 SERVERCLASSES RUNNING 17 STOPPED 0 THAWED 17 SERVERPROCESSES TCPS RUNNING 17 1 STOPPED 35 0 PENDING 0 0 RUNNING STOPPED PENDING TERMS 0 0 0 PATHMON \COMM.$ZVPT -- STATE=RUNNING PATHCTL (OPEN) $OPER.VIEWPT.
Table 11 Status LEDs and Their Functions (continued) Location LED Name Color Function Power Middle Green Flashes when EMU is operational and performing locate. On when EMU is operational. An EMU or an enclosure fault might still exist. Off when power has just been applied to an enclosure, or when an enclosure fault exists. Enclosure Status Amber Flashes when EMU is operational and performing locate. On when EMU is operational, but an enclosure fault exists.
Table 11 Status LEDs and Their Functions (continued) Location LED Name Color Locator Flashing Blue Lights when the system locator is activated. P-switch PICs Power-on Green Lights when power is on with PIC available for normal operation. Amber Lights when a fault exists. Green Lights when a ServerNet link is functional.
4 Monitoring the Performance of NonStop Subsystems This chapter provides an overview of the HP manageability tools you can use to monitor the performance of NonStop entities.
HP Manageability Tools That Monitor NonStop Performance You can use these HP manageability tools to obtain performance information, both current and historical, for NonStop subsystems: • “NonStop Cluster Performance Essentials” (page 72) • “NonStop Availability Statistics and Performance (ASAP)” (page 72) • “Web ViewPoint NonStop Storage Analyzer Plug-In” (page 74) • “HP Operations Agent for NonStop (OVNM)” (page 75) • “HP Performance Agent for NonStop (OVNPM)” (page 75) NonStop Cluster Performanc
CLIM, Processor, Expand File, Process, RDF and other entity information and report it to a local-node ASAP database server. • The ASAP Server collects and normalizes network-wide application and system availability data and stores this information in the ASAP database. The ASAP local-node server includes command interpreters and database servers which act on behalf of ASAP workstation clients.
Web ViewPoint NonStop Storage Analyzer Plug-In NonStop Storage Analyzer, a Web ViewPoint plug-in, is an automated, web-based software tool you use to analyze the performance of and manage NonStop disk storage resources. It automatically scans an entire NonStop system to provide detailed GUI snapshots of all existing disks and files. This information is organized into several GUI views of both physical and logical disk storage.
The Central Display Panel on the right uses grids, lists, and details to display the current selection on the tree. It contains: • In the top left center, three selection tabs: Utilization, Volumes, and Summary. When the Volumes or Summary tab is selected, you can choose from a series of Alert icons with checkable boxes to select to display only disks which have the selected Alerts on them. • Above the tree, a User Filter to control the scope of content displayed.
OVNPM installation creates a default configuration for out-of-the-box use. It scans your NonStop system, automatically builds a set of entities (instances) for the CPU, DISK, GROUP, LINE, NETLINE, NODE, SERVERNET, TMF, and USER domain types, and saves them in the USERCFG configuration file.
5 Monitoring EMS Event Messages
• “When to Use This Chapter” (page 77)
• “What Is the Event Management Service (EMS)?” (page 77)
• “Tools for Monitoring EMS Event Messages” (page 77)
• “Related Reading for EMS Event Messages” (page 78)
When to Use This Chapter
Use this chapter for a brief description of the Event Management Service (EMS) and the tools used to monitor EMS event messages.
Web ViewPoint
Web ViewPoint, a browser-based product, accesses the Event Viewer, Object Manager, and Performance Monitor subsystems. It displays event messages about current or past events occurring anywhere in the network on a set of block-mode events screens. The messages can be errors, failures, warnings, and requests for operator actions. The events screens allow operators to monitor significant occurrences or problems in the network as they occur.
6 Processes: Monitoring and Recovery
• “When to Use This Chapter” (page 79)
• “Types of Processes” (page 79)
◦ “System Processes” (page 79)
◦ “I/O Processes (IOPs)” (page 79)
◦ “CIP Processes” (page 80)
◦ “Generic Processes” (page 80)
• “Monitoring Processes” (page 80)
◦ “Monitoring System Processes” (page 81)
◦ “Monitoring IOPs” (page 82)
◦ “Monitoring CIP Processes” (page 82)
◦ “Monitoring Generic Processes” (page 82)
• “Recovery Operations for Processes” (page 84)
• “Related Read
ServerNet wide area network (SWAN) concentrator. Examples of IOPs include, but are not limited to, line-handler processes for Expand and other communications subsystems. CIP Processes Cluster I/O Protocols (CIP) processes provide configuration and management interfaces for I/O between CLIMs and NonStop server blades in NonStop BladeSystems. For information about CIP processes, refer to the Cluster I/O Protocols (CIP) Configuration and Management Manual.
Automated HP Tools That Monitor Processes NonStop Availability Statistics and Performance (ASAP), HP Operations Agent for NonStop (OVNM), and HP Performance Agent for NonStop (OVNPM) can monitor processes and alert you to exception conditions or other performance problems. See Table 15 (page 81). Table 15 Automated HP Tools That Monitor Processes HP tools Monitored objects For more information, see..
$ZZKRN $Z000 $ZLM00 $IXPOHO $ZTXAE $ZWBAF $ZZW00 $DSMSCM $DATA2 $ZLOG $ZTH00 $DSMSCM $Z1RM $ZPP01 0,298 0,299 0,300 0,301 0,330 0,333 0,334 0,335 0,336 0,340 0,343 0,344 1,80 1,280 180 180 200 199 145 179 199 220 220 150 148 220 148 160 P P P P 011 011 015 355 015 015 215 317 317 011 005 317 005 015 P P P P P P P 255,255 255,255 255,255 255,255 255,255 255,255 255,255 255,255 255,255 255,255 255,255 255,255 255,255 255,255 $SYSTEM.SYS14.OZKRN $SYSTEM.SYS14.TZSTOSRV $SYSTEM.SYS14.LANMON $SYSTEM.SYS14.
NONSTOP KERNEL - Status PROCESS \DRP25.$ZZKRN.
Recovery Operations for Processes For recovery operations on generic processes, use the SCF interface to the Kernel subsystem and specify the PROCESS object. These SCF commands are available for controlling generic processes: ABORT Terminates operation of a generic process. This command is not supported for the subsystem manager processes. START Initiates the operation of a generic process. Generic processes that are configured to be persistent usually do not require operator intervention for recovery.
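One possible recovery sequence for a generic process that does need manual intervention is to abort it and then start it again. The following Python fragment is a minimal sketch, not an HP-supplied tool: it only builds the command text to enter at an SCF prompt. It assumes the generic process is addressed as $ZZKRN.#name (the Kernel subsystem manager that appears in the listings in this chapter); the helper name and the process name in the usage note are hypothetical.

```python
def recovery_commands(generic_process, abort_first=False):
    """Build SCF command lines for recovering one generic process.

    Assumption: generic processes are named $ZZKRN.#<name>.
    The caller must not pass a subsystem manager process with
    abort_first=True, because ABORT is not supported for those.
    """
    name = generic_process.lstrip("#")
    if not name:
        raise ValueError("generic process name is empty")
    target = "$ZZKRN.#" + name
    commands = []
    if abort_first:
        # Terminate the running (or hung) instance before restarting it.
        commands.append("ABORT PROCESS " + target)
    commands.append("START PROCESS " + target)
    return commands


if __name__ == "__main__":
    # "MYPROC" is a made-up process name used only for illustration.
    for line in recovery_commands("MYPROC", abort_first=True):
        print(line)
```

Generating the text rather than issuing the commands keeps the operator in control of when, and whether, the ABORT is actually entered.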
7 Communications Subsystems: Monitoring and Recovery
• “When to Use This Chapter” (page 85)
• “Communications Subsystems” (page 85)
◦ “Local Area Networks (LANs) and Wide Area Networks (WANs)” (page 85)
◦ “CLuster I/O Modules (CLIMs)” (page 87)
• “Monitoring Communications Subsystems and Their Objects” (page 88)
• “Monitoring the SLSA Subsystem” (page 88)
• “Monitoring the WAN Subsystem” (page 90)
• “Monitoring the NonStop TCP/IP Subsystem” (page 93)
• “Monitoring the CIP Subsystem” (page 94)
devices through various LAN protocols. SLSA also communicates with the appropriate adapter type over the ServerNet fabrics. Adapters supported on NonStop systems include: • Gigabit Ethernet 4-port adapter (G4SA) • Fibre Channel ServerNet adapter (FCSA) (for the Storage subsystem) The Storage CLIMs and IP CLIMs can take the place of G4SA, FCSA, and IOAM on NonStop BladeSystems. A Telco CLIM supports the Message Transfer Part Level 3 User Adaptation layer (M3UA) protocol.
The WAN subsystem is used to control access to the SWAN concentrator.
The OSM Service Connection provides several options for obtaining diagnostic data about CLIMs. The System action Collect Diagnostic Data generates Diagnostic Data files for the system, allowing you to locate and review CLIM diagnostic data. The CLIM action Reboot performs a CLIM reboot; you can select the “Yes” default value to generate a dump of CLIM diagnostic data before the CLIM reboots. For detailed information about these options and others, see the OSM Service Connection User’s Guide.
Monitoring the Status of an Adapter and Its Components
1. To monitor the status of an adapter:
> SCF STATUS ADAPTER adapter-name
A listing similar to this example is sent to your home terminal:
->STATUS ADAPTER $ZZLAN.G11123
SLSA Status ADAPTER
Name            State
$ZZLAN.G11123   STARTED
This example shows the listing displayed when checking all adapters on $ZZLAN:
> SCF STATUS ADAPTER $ZZLAN.*
1->STATUS ADAPTER $ZZLAN.*
SLSA Status ADAPTER
Name
$ZZLAN.G11121
$ZZLAN.G11122
$ZZLAN.G11123
$ZZLAN.G11124
$ZZLAN.
> SCF STATUS PIF pif-name
A listing similar to this example is sent to your home terminal:
->STATUS PIF $ZZLAN.G11123.0
SLSA Status PIF
Name                State     Trace Status
$ZZLAN.G11123.0.A   STARTED   ON
This example shows a listing of the status of all PIFs on $ZZLAN.G11123:
> SCF STATUS PIF $ZZLAN.G11123.*
->STATUS PIF $ZZLAN.G11123.*
SLSA Status PIF
Name
$ZZLAN.G11123.0.A
$ZZLAN.G11123.0.B
$ZZLAN.G11123.0.C
$ZZLAN.G11123.0.D
4.
Monitoring Status for a SWAN Concentrator
To display the current status for a SWAN concentrator:
> SCF STATUS ADAPTER $ZZWAN.#concentrator-name
The system displays a listing similar to:
-> status adapter $zzwan.#s01
WAN Manager STATUS ADAPTER for ADAPTER \TAHITI.$ZZWAN.#S01
State........... STARTED
Number of clips. 3
Clip 1 status : CONFIGURED
Clip 2 status : CONFIGURED
Clip 3 status : CONFIGURED
To display the status for all SWAN concentrators configured for your system:
> SCF STATUS ADAPTER $ZZWAN.
Monitoring WAN Processes
To display the status of all WAN subsystem processes (configuration managers, TCP/IP processes, and WANBoot processes):
> SCF STATUS PROCESS $ZZWAN.*
The system displays a listing similar to:
-> STATUS PROCESS $ZZWAN.*
WAN Manager STATUS PROCESS for PROCESS \COMM.$ZZWAN.#5
State :......... STARTED
LDEV Number..... 66
PPIN............ 5 ,264
Process traced.. NO
WAN Manager STATUS PROCESS for PROCESS \COMM.$ZZWAN.
State :......... STARTED
LDEV Number..... 67
PPIN............ 4
> SCF STATUS SERVER $ZZWAN.#concentrator-name.clip-num
Values for the CLIP number are 1, 2, or 3. The system displays a listing similar to:
-> status server $zzwan.#s01.1
WAN Manager STATUS SERVER for CLIP \COWBOY.$ZZWAN.#S01.1
STATE :..........STARTED
PATH A...........: CONFIGURED
PATH B...........: CONFIGURED
NUMBER of lines. 2
Line...............0
Line...............
TCPIP Status ROUTE \SYSA.$ZTC0.*
Name     Status    RefCnt
#ROU11   STARTED   0
#ROU9    STARTED   0
#ROU12   STARTED   0
#ROU8    STARTED   1
#ROU3    STOPPED   0
Monitoring NonStop TCP/IP Subnets
To obtain the status of all NonStop TCP/IP subnets:
> SCF STATUS SUBNET $ZTC0.*
The system displays a listing similar to:
1-> STATUS SUBNET $ZTC0.*
TCPIP Status SUBNET \SYSA.$ZTC0.
Example 10 STATS CLIM -> STATS CLIM $ZZCIP.CLIM2 CIP Stats CLIM \COCOA.$ZZCIP.CLIM2 Sample Time ... 11 Jun 2006, 23:51:49.000 Reset Time .... 09 Jun 2006, 2:28:39.000 CLIMMON STATS Event Log Entries......... 0 CLIMAGT Failures.......... 0 Restarts.................. 1 CIPSSRV0 Failures......... 0 CLIMAGT STATS Event Log Entries........... Buffer denials.............. IT-API errors............... Last IT-API error code...... Linux errors................ Last Linux errno............ Current bfr bytes in use.
Example 11 STATS MON -> STATS MON $ZZCIP.ZCM01 CIP Stats MON \COCOA.$ZZCIP.ZCM00 Sample Time ... 11 Jun 2006, 23:55:55.300 Reset Time .... 07 Jun 2006, 16:15:13.781 SOCKET STATS Total Recv Socket Reqs...... Total Recv Errors........... Total Send Socket Reqs...... Total Send Errors........... Data Bytes Sent............. Data Bytes Received......... Total Connections Out....... Current TCP Listen Sockets.. Current UDP Sockets......... Current TCP Connections.....
To check the status of a line-handler process on your system:
> SCF STATUS LINE $line
A listing similar to this example is sent to your home terminal:
1-> STATUS LINE $LHCS6S
EXPAND Status LINE
Name      State     PPID    BPID   ConMgr-LDEV
$LHCS6S   STARTED   1, 20   2,25   49
This listing shows that the Expand line-handler process being monitored is up and functioning normally.
1 \CYCLONE (206) 363 200K ( 0, 287) 2 \SNAX (118) 353 200K ( 5, 333) 3 \TESS (194) 554 200K ( 8, 279) 4 \TSII (099) 556 200K ( 2, 265) 5 \ESP (163) 365 200K ( 1, 274) 6 \SVLDEV (077) 538 200K ( 7, 265) 1 363 READY 1 353 READY 1 554 READY 1 556 READY 1 365 READY 1 538 READY 1 183 READY 1 294) NPT 1 ( 8, 280) 1 ( 8, 264) 1 -- ----1 -- ----1 ( 5, 293) NPT 1 -- ----1 677 READY 276 READY 165 READY 295 READY . . .
Related Reading for Communications Subsystems For more information about monitoring and performing recovery operations for communications subsystems, see the manuals listed in Table 17 (page 99). The appropriate manual to use depends on how your system is configured.
8 ServerNet Resources: Monitoring and Recovery
• “When to Use This Chapter” (page 100)
• “ServerNet Connectivity” (page 100)
• “ServerNet Communications Network” (page 101)
• “Monitoring the Status of the ServerNet Fabrics” (page 102)
◦ “Monitoring the ServerNet Fabrics Using OSM” (page 102)
◦ “Monitoring the ServerNet Fabrics Using SCF” (page 103)
• “Related Reading for ServerNet Resources” (page 105)
When to Use This Chapter
Use this chapter to learn about the following:
• Monitoring and pe
Table 18 ServerNet and System I/O Connectivity (continued) System ServerNet Connectivity System I/O Connectivity Reference Guides ServerNet clusters, 6780 ServerNet clusters, BladeClusters IB CLIM, Telco CLIM, high-speed Ethernet, Fibre Channel, ESS, FCDM, FC Tape, FC SCSI Tape, additional IOAMEs BladeCluster Solution Manual LSUs, p-switches, IOAMs, , 6770 ServerNet clusters, 6780 ServerNet clusters, BladeClusters IP CLIM, Storage CLIM with SAS disk drive enclosures, IB CLIM, Telco CLIM, high-speed
In the ServerNet architecture, each processor maintains two independent paths to other processors and I/O devices. These dual paths can be used simultaneously to improve performance, and to ensure that no single failure disrupts communications among the remaining system components.
3. Check these objects for: a. If an object icon is covered by a red or yellow triangular symbol, check the Attributes tab in the details pane for degraded attribute values. The Service State attribute is only displayed in the Attributes tab if it has a value other than OK. If a degraded Service State is indicated, there will be an associated alarm to provide more information about the cause of the problem. b.
03 04 05 06 07 08 09 10 11 12 13 14 15 <<<<<<<<<<<<<- DOWN DOWN DOWN DOWN DOWN DOWN DOWN DOWN DOWN DOWN DOWN DOWN DOWN In the preceding example of a 2-processor system: • All ServerNet connections between processors 0 and 1 are up. • Processors 2 through 15 do not exist on this system. As a result: ◦ The status from processors 0 and 1 to processors 2 through 15 is displayed as unavailable (UNA) in both fabrics. ◦ The status from processors 2 through 15 is displayed as down.
The processor in the FROM row is down or nonexistent. For a processor that does exist on your system, this status is abnormal. • ERROR nnn (for an entire row) The processor in the FROM row unexpectedly returned a file-system error to that ServerNet fabric. • UNA (unavailable) The path from the processor in the FROM row to the processor in the TO column is down because the processor in the TO column is down. For a processor that does exist on your system, this status is abnormal.
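The rules for reading the FROM/TO status matrix can be sketched in a few lines of Python. This is an illustrative sketch only, written under stated assumptions: it covers just the UP, DOWN, UNA, and ERROR values discussed in this section, it takes the set of processors that actually exist on the system as an input, and the function names are hypothetical rather than part of SCF or OSM.

```python
UP, DOWN, UNA = "UP", "DOWN", "UNA"


def expected_status(from_cpu, to_cpu, existing):
    """Status a healthy system is expected to show in one matrix cell."""
    if from_cpu not in existing:
        return DOWN   # a nonexistent FROM processor reports DOWN for its row
    if to_cpu not in existing:
        return UNA    # the TO processor is absent, so the path is unavailable
    return UP


def is_abnormal(status, from_cpu, to_cpu, existing):
    """True when a displayed status differs from what the configuration predicts."""
    if status.startswith("ERROR"):
        return True   # ERROR nnn is always an unexpected file-system error
    return status != expected_status(from_cpu, to_cpu, existing)


if __name__ == "__main__":
    existing = {0, 1}   # the 2-processor system from the example
    print(is_abnormal("UNA", 0, 5, existing))    # processor 5 does not exist
    print(is_abnormal("DOWN", 1, 0, existing))   # processor 1 exists
```

The design choice here is to compare each cell against the expected value for the configuration, so that DOWN and UNA are flagged only for processors that do exist.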
9 I/O Adapters and Modules: Monitoring and Recovery
• “When to Use This Chapter” (page 106)
• “I/O Adapters and Modules” (page 106)
◦ “Fibre Channel ServerNet Adapter (FCSA)” (page 107)
◦ “Gigabit Ethernet 4-Port Adapter (G4SA)” (page 107)
◦ “4-Port ServerNet Extender (4PSE)” (page 108)
• “Monitoring I/O Adapters and Modules” (page 108)
◦ “Monitoring the FCSAs” (page 108)
◦ “Monitoring the G4SAs” (page 109)
◦ “Monitoring the 4PSEs” (page 110)
• “Recovery Operations for I/O Adapters and Mo
Fibre Channel ServerNet Adapter (FCSA) The FCSA provides fibre channel connectivity to certain external devices, such as an Enterprise Storage System (ESS) and disk drives contained in a Fibre Channel Disk Module (FCDM) enclosure that supports fibre channel disks.
4-Port ServerNet Extender (4PSE) A component in NonStop NS14000 and NS1000 systems only, 4PSEs provide ServerNet connectivity between processors and the IOAM enclosure (functionality provided by p-switches in a NonStop NS16000 series system). 4PSEs are located in slot 1 (and optionally slot 2) of each IOAM. They are connected to the processors through LSUs in NonStop NS14000 systems, and directly to the processors (with no LSUs) in NonStop NS1000 systems.
Table 19 Service, Flash Firmware, Flash Boot Firmware, Device, and Enabled States for the FCSA (continued)
State                          Description
Device State: Not Configured   The component is not configured.
Device State: Started          The component is running.
Device State: Starting         Processing is starting up.
Device State: Stopped          Processing has been terminated.
Device State: Stopping         Processing is being terminated.
Device State: Unknown          Component is not responding.
Device State: OK               Component is accessible.
Table 20 Service, Device, and Enabled States for the G4SA (continued)
State                          Description
Device State: Aborting         Processing is terminating.
Device State: Defined          State is defined by the NonStop OS.
Device State: Degraded         Performance is degraded.
Device State: Diagnose         A diagnostic test is running on the component.
Device State: Initializing     Processing is starting up.
Device State: Not Configured   The component is not configured.
Device State: Started          The component is running.
commands described in the appropriate manual listed in “Related Reading for I/O Adapters and Modules” (page 111). If you are unable to start a required process or object, contact your service provider. Related Reading for I/O Adapters and Modules For more information about monitoring and performing recovery operations for the I/O adapters and the SLSA and Storage subsystems, see the manuals in the following table. The appropriate manual to use depends on how your system is configured.
10 CLuster I/O Modules (CLIMs): Monitoring and Recovery • “When To Use This Section” • “CLuster I/O Modules (CLIMs)” • “Monitoring the CLIMs with HP SIM” (page 112) • “Monitoring the CLIMs with SCF” (page 112) ◦ “CLIM Device States” (page 114) ◦ “Using SCF to Monitor a CLIM with $ZZCIP” (page 115) • “Recovery Operations for CLIMs” (page 118) • “Related Reading for CLIMs” (page 119) When To Use This Section Use this section to monitor CLuster I/O Modules and to perform recovery operations.
Example 12 STATUS CLIM Summary
-> STATUS CLIM $ZZCIP.*
CIP Status CLIM \MYSYS.$ZZCIP.*
Name    Present   State     Trace
CLIM1   Yes       STARTED   OFF
CLIM2   Yes       STARTED   1, 2
CLIM3   Yes       STARTED   2
“Monitoring the CIP Subsystem” (page 94) shows examples of other SCF commands. “Monitoring CLIM Status” (page 96) shows the use of scripts that run on the CLIM itself to monitor the operation of a CLIM and its IP protocol behavior.
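If the summary listing from Example 12 is captured to a text file, a short script can flag CLIMs that need attention. The Python below is a sketch under assumptions, not an HP utility: it assumes one CLIM per output line with the columns Name, Present, State, and Trace in that order, it assumes that the Present column reads No for an absent CLIM, and the function names are hypothetical.

```python
SAMPLE = r"""-> STATUS CLIM $ZZCIP.*
CIP Status CLIM \MYSYS.$ZZCIP.*
Name    Present   State     Trace
CLIM1   Yes       STARTED   OFF
CLIM2   Yes       STARTED   1, 2
CLIM3   Yes       STARTED   2
"""


def parse_clim_summary(listing):
    """Return one dict per CLIM row of a captured STATUS CLIM summary."""
    rows = []
    for raw in listing.splitlines():
        # Split at most three times: the Trace column may contain spaces ("1, 2").
        fields = raw.split(None, 3)
        if len(fields) < 3 or fields[1] not in ("Yes", "No"):
            continue  # skip the command echo, title line, and column header
        rows.append({
            "name": fields[0],
            "present": fields[1] == "Yes",
            "state": fields[2],
            "trace": fields[3].strip() if len(fields) > 3 else "",
        })
    return rows


def clims_needing_attention(listing):
    """Names of CLIMs that are absent or not in the STARTED state."""
    return [row["name"] for row in parse_clim_summary(listing)
            if not row["present"] or row["state"] != "STARTED"]


if __name__ == "__main__":
    print(clims_needing_attention(SAMPLE))
```

Rows are recognized by the Yes or No value in the second column, which avoids hard-coding CLIM names or column positions.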
Figure 3 CLIM Objects in OSM CLIM Device States When monitoring CLIMs using the OSM Service Connection, the state of a CLIM should indicate normal operation. Table 22 lists the possible states for a CLIM. Table 22 CLIM Device States State Substate Explanation STARTED The object is logically accessible to user processes. STARTING The object is being initialized and is in transition to the STARTED state. STOPPED The object is configured improperly.
Using SCF to Monitor a CLIM with $ZZCIP This is an example of using SCF to monitor the detailed status of an IP CLIM on a NonStop BladeSystem.
Example 13 STATUS CLIM Detailed for IP CLIM on NonStop BladeSystem -> status clim $zzcip.C1002583, detail CIP Detailed Status CLIM \BLITUG.$ZZCIP.C1002583 Mode...................... CLIM HW Connection Status. State..................... ConnPts................... X1 Location............... Expected Y1 Location...... X2 Location............... Expected Y2 Location...... X1 Connection Status...... Y1 Connection Status...... X2 Connection Status...... Y2 Connection Status...... Trace Status..............
C1002583.eth5 C1002583.eth4 C1002583.eth3 C1002583.eth2 C1002583.eth1 ------ C1002583.eth5 C1002583.eth4 C1002583.eth3 C1002583.eth2 C1002583.eth1 0x0000000003 0x0000000004 0x0000000005 0x0000000006 0x0000000007 This is an example of using SCF to monitor the detailed status of a Storage CLIM on a NonStop NS-series system.
Example 14 STATUS CLIM Detailed for Storage CLIM on NonStop NS-Series System -> STATUS CLIM $ZZCIP.* , DETAIL CIP Detailed Status CLIM \YOSQA14.$ZZCIP.C100261 Mode...................... CLIM HW Connection Status. State..................... ConnPts................... X1 Location............... Expected Y1 Location...... X2 Location............... Expected Y2 Location...... X1 Connection Status...... Y1 Connection Status...... X2 Connection Status...... Y2 Connection Status...... Trace Status..............
Related Reading for CLIMs For more information about monitoring and managing CLIMs, see the following:
11 Processors and Components: Monitoring and Recovery
• “When to Use This Chapter” (page 120)
• “Overview of Processors” (page 120)
• “Monitoring and Maintaining Processors” (page 124)
◦ “Monitoring Processors Automatically Using TFDS” (page 125)
◦ “Monitoring Processor Status Using the OSM Low-Level Link” (page 125)
◦ “Monitoring Processor Status Using the OSM Service Connection” (page 125)
◦ “Monitoring Processor Performance Using ViewSys” (page 128)
• “Identifying Processor Problems”
Processors in NonStop Systems Running J-Series RVUs NonStop BladeSystem server blades and NonStop NS2400 series, NS2300, NS2200 series, NS2100, and NS2000 series blade elements utilize multicore microprocessors (see Table 23 (page 121)). The set of cores on a NonStop server blade or blade element is considered to be one CPU and is configured as one logical processor. Applications do not see the individual cores.
Table 23 Terms, Systems, and Planning Guides (continued) Term, System, Planning Guide Description NonStop blade element Blade element that houses one dual-core microprocessor. NonStop NS2400 series blade element See the NonStop NS2400 Planning Guide Two or four NonStop blade elements each utilize a two-core microprocessor and are installed in modular cabinets.
the system continues to run. A single NonStop system can have up to four NonStop Blade Complexes for a total of 16 processors. Processors communicate with each other and with the system I/O over dual ServerNet fabrics. A ServerNet fabric is a complex web of links that provide a large number of possible paths from one point to another. Two communications fabrics, the X and Y ServerNet fabrics, provide redundant, fault-tolerant communications pathways.
Systems and Planning Guides Term their associated LSUs. A system includes up to four Blade Complexes. NonStop NS14000 series systems (2 to 8 processors per system) NonStop NS14000 Series Planning Guide Description Blade Element Consists of a chassis, processor board containing two or four PEs (one representing each logical processor in the Blade Complex), memory, I/O interface board, midplane, optics adapters, fans, and power supplies. Blade Elements are mounted in a 19-inch computer equipment rack.
Monitoring Processors Automatically Using TFDS HP Tandem Failure Data System (TFDS) should be used to proactively monitor processors and manage processor halts. Configured and running before a halt occurs, TFDS can help determine the type of recovery operation needed and: • If TFDS determines that the entire processor should be dumped before reloading, it automatically dumps, then reloads the processor.
Figure 6 OSM Representation of Processor Blade For NonStop NB54000c, NB54000c-cg, NB56000c, and NB56000c-cg BladeSystem server blades, the OSM Service Connection displays the Physical Cores attribute and the Number of Enabled Cores attribute of the Logical Processor object. To increase the number of cores enabled in a processor, obtain a new core license for the new number of cores, and install and enable the new core license according to the Install Core License guided procedure on the System object.
• A NonStop NS14000 series system can have up to three Blade Complex objects with a total of 8 processors. When you expand a Blade Complex object (see Figure 7), you should see up to three Blade Element objects and either two or four Logical Processor objects. • A NonStop NS1200 or NS1000 system can have up to eight Blade Complex objects. When you expand a Blade Complex object (see Figure 7), you should see one Blade Element object and its associated Logical Processor object.
HP SIM for NonStop Manageability for these procedures and the supported Power Regulator modes. Monitoring Processor Performance Using ViewSys Use the ViewSys product to view system resources online and to see information on system performance. ViewSys provides information about processor activity. Using ViewSys, you can list the processors on your system and determine their status. For more information, refer to “ViewSys” (page 226).
freeze-enabled. Two types of processor halts display a processor halt code in the Processor Status dialog box: • A halt instruction results in a processor halt. When the operating system detects a millicode or software error that it cannot correct, it can execute a halt instruction to suspend all application and system processes running in the associated processor. The status of the halted processor becomes: Halt code = %nnnnnn Unlike a freeze instruction, a halt instruction affects only one processor.
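When halt codes are recorded for a service provider, it can help to extract the code from the status text mechanically. The following Python sketch assumes, following the usual NonStop convention, that the leading % marks an octal value; the function name is hypothetical and the status strings in the usage note are made up for illustration.

```python
import re

# Matches "Halt code = %nnnnnn"; only octal digits are accepted after the %.
HALT_RE = re.compile(r"Halt code\s*=\s*%([0-7]+)")


def parse_halt_code(status_text):
    """Return the halt code as an integer, or None if none is present."""
    match = HALT_RE.search(status_text)
    if match is None:
        return None
    return int(match.group(1), 8)   # assumption: % denotes octal


if __name__ == "__main__":
    code = parse_halt_code("Halt code = %000100")
    # Print the code back in the %-prefixed form used by the dialog box.
    print("%" + format(code, "06o"))
```

Keeping the value in its %-prefixed form when reporting it avoids confusion, because that is the form shown in the Processor Status dialog box.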
• “Reloading a Single Processor on a Running system” (page 130) • “Recovery Operations for a System Hang” (page 133) • “Enabling/Disabling Processor and System Freeze” (page 134) • “Freezing the System and Freeze-Enabled Processors” (page 134) • “Dumping a Processor to Disk” (page 134) • “Backing Up a Processor Dump to Tape” (page 137) • “Replacing Processor Memory” (page 137) • “Replacing the Processor Board and Processor Entity” (page 137) • “Submitting Information to Your Service Provide
After you have determined that a processor is not operating, check that the processor is halted. If it needs to be halted, see “Halting One or More Processors” (page 130). Collect information about the reason for the halt (as described in “Identifying Processor Problems” (page 128)) to send to your service provider along with the dump file. In the Low-Level Link Processor Status dialog box, write down the halt code and status message for the processor.
cpu-range is one of these: cpu cpu-cpu cpu is the processor number, an integer from 0 through 15. cpu-cpu is two processor numbers separated by a hyphen, specifying a range of processors. In a range specification, the first processor number must be less than the second. option is one of these: NOSWITCH [PRIME|NOPRIME] fabric OMITSLICE [A|B|C] $volume [.sysnn.
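The cpu-range rules described above (a single processor number from 0 through 15, or two numbers separated by a hyphen with the first less than the second) can be checked before a command is entered. This Python sketch is an illustration only; the function name is hypothetical, and it validates the cpu-range text without issuing any command.

```python
def parse_cpu_range(text):
    """Return the processor numbers named by a cpu-range string.

    Accepts "cpu" or "cpu-cpu"; raises ValueError for anything else.
    """
    parts = text.strip().split("-")
    if len(parts) not in (1, 2) or not all(p.isdigit() for p in parts):
        raise ValueError("cpu-range must be cpu or cpu-cpu: %r" % text)
    numbers = [int(p) for p in parts]
    if any(n > 15 for n in numbers):
        raise ValueError("processor numbers run from 0 through 15")
    if len(numbers) == 1:
        return numbers
    first, last = numbers
    if first >= last:
        raise ValueError("the first processor number must be less than the second")
    # A range includes both end points.
    return list(range(first, last + 1))


if __name__ == "__main__":
    print(parse_cpu_range("3"))
    print(parse_cpu_range("2-5"))
```

Rejecting a reversed or out-of-range specification up front is cheaper than discovering the mistake after the command has been submitted.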
2. Right-click and select Actions.
3. Select Reload, click Perform action.
4. Click OK to dismiss the confirmation dialog box.
5. In the Logical Processor Reload Parameters dialog box, select the appropriate options. See OSM online help for information about the options.
6. Click OK.
• To reload multiple processors, use the Multi-Resource Actions dialog box (available from the Display menu of the OSM Service Connection).
1.
2. 3. 4. “Freezing the System and Freeze-Enabled Processors” (page 134). Start the system by loading Processor 0 or 1, as described in “Performing a System Load From a Specific Processor” (page 187). You can omit one Blade Element from the load operation, to dump after the system is running. You can also dump the remaining processors as needed—dump the entire processor before reloading, or reload and omit Blade Element to dump later. For more information, see “Dumping a Processor to Disk” (page 134).
If you did not have TFDS configured to take the processor dump, you can use the RCVDUMP utility to perform the dump as described in “Using RCVDUMP to Dump a Processor to Disk” (page 135). Loading a Down System to Perform a Processor Dump to Disk To perform a processor dump to disk from a down system, load the system from CPU 0 or CPU 1 with the CIIN file disabled, thereby preventing other processors configured to reload from doing so and allowing you to dump their memory to disk.
SLICE bladeId is the identification of the Blade Element from which the processor element is to be dumped. Valid values are A or B or C or ALL. Note that ALL may not be used with the parallel method of dumping. START n... is the byte address where the dump will start. The default value is 0. END n... is the byte address where the dump will stop. Using a value of -1 is the same as specifying the end of memory. The default value is -1.
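The START and END defaults can be read as "dump everything": START defaults to 0, and an END of -1 stands for the end of memory. The Python sketch below illustrates that reading only. It is not part of RCVDUMP; memory_size is a hypothetical input standing for the end-of-memory address, which the utility determines for itself, and the function name is invented.

```python
def resolve_dump_range(memory_size, start=0, end=-1):
    """Return the (start, end) byte addresses a dump request would cover.

    memory_size is an assumed, caller-supplied end-of-memory address.
    """
    if memory_size <= 0:
        raise ValueError("memory_size must be positive")
    # END -1 is shorthand for the end of memory.
    resolved_end = memory_size if end == -1 else end
    if not 0 <= start <= resolved_end <= memory_size:
        raise ValueError("need 0 <= START <= END <= end of memory")
    return start, resolved_end


if __name__ == "__main__":
    print(resolve_dump_range(1024))                  # defaults: whole memory
    print(resolve_dump_range(1024, start=256, end=512))
```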
See “Using RCVDUMP to Dump a Processor to Disk” (page 135). • If your service provider determines that a processor halt is not divergence-related, you might be directed to reload the processor while excluding the PE for one Blade Element, which is then dumped before being reintegrated.
Submitting Information to Your Service Provider To help with the analysis of a processor dump, submit a backup tape of other system configuration and operations files and some additional information. • “Submitting Tapes of Processor Dumps” (page 138) • “Web ViewPoint” (page 78) • “Additional Information Required by Your Service Provider” (page 138) Submitting Tapes of Processor Dumps Use a separate tape for each processor dump.
Table 25 Additional Processor Dump Information for Your Service Provider (continued) The date that the processor dump was done __________________________________________ The RVU you are using __________________________________________ You should also provide: • A list of any software product revisions (SPRs) you have installed since installing the RVU. • A list of any customer-written privileged programs running on your system and explanations of what they do. • The reason for the processor dump.
12 Disk Drives: Monitoring and Recovery
• “When to Use This Chapter” (page 140)
• “Overview of Disk Drives” (page 140)
◦ “Internal SCSI Disk Drives” (page 141)
◦ “M8xxx Fibre Channel Disk Drives” (page 141)
◦ “Enterprise Storage System (ESS) Disks” (page 142)
◦ “Serial Attached SCSI (SAS) Disks and Solid State Drives” (page 142)
• “Monitoring Disk Drives” (page 143)
◦ “Monitoring Disk Drives With OSM” (page 143)
◦ “Monitoring Drives With SCF” (page 145)
◦ “Monitoring the State of Disk D
Internal SCSI Disk Drives NOTE: Only NS16000 series systems support connections to internal SCSI disk drives installed in S-series I/O enclosures. Internal SCSI disk drives are installed in NonStop S-series I/O enclosures. These disk drives are Class-1 CRUs. Any physical action on a CRU, including installing and replacing disks, can be performed by customers. However, depending on the class of CRU, training in replacement techniques might be recommended.
Fibre Channel disk drives are field-replaceable units (FRUs). Any physical action on a FRU, including installation and replacement, must be performed only by a qualified HP service provider.
For information about                      See
M8xxx Fibre Channel disk specifications    The planning guide for your NonStop system
M8xxx disk commands                        SCF Reference Manual for the Storage Subsystem
Enterprise Storage System (ESS) Disks
The Enterprise Storage System (ESS) is any of several models of HP storage disk arrays.
displays partitioning information for HDDs and SSDs. See the SCF Reference Manual for the Storage Subsystem for more information.
Figure 9 Logical Disks in OSM Task See Monitor the status of disk drives • OSM Service Connection • OSM Event Viewer Inventory the entire system, including disk drives OSM Inventory View. You can save this view as an Excel file. Use: OSM Online Help • OSM Service Connection • OSM Event Viewer • OSM Inventory View Inventory multiple systems, including disk drives OSM System Inventory Tool. You can save this inventory as an Excel file.
Task See • Primary path access state • Backup path access state For SAS disks, check both the physical disk (under SAS “Using the OSM Service Connection” (page 59) Disk Enclosures) and the logical disk (under CLIM Attached Disks). For physical disk, determine the Device State. For logical disk, determine: • Service state • Primary path state • Backup path state • Primary path access state • Backup path access state Check for alarms and degraded attribute values.
2. Get information about a disk with SCF STATUS DISK, DETAIL. For example: -> STATUS DISK $DATA09, DETAIL The output from this example shows that $DATA09 is in the STOPPED state, HARDDOWN substate. STORAGE - Detailed Status DISK \SHARK.
LDev   State     Primary PID   Backup PID   Type   Subtype
147    STARTED   9,22          8,53         3      36
STORAGE - Status VIRTUAL DISK \COMM.$WANA
LDev   State     Primary PID   Backup PID   Type   Subtype
145    STARTED   8,77          9,56         3      36
STORAGE - Status VIRTUAL DISK \COMM.$WEB
LDev   State     Primary PID   Backup PID   Type   Subtype
144    STARTED   9,29          8,48         3      36
STORAGE - Status VIRTUAL DISK \COMM.$WEBVPT
LDev   State     Primary PID   Backup PID   Type   Subtype
142    STARTED   9,26          8,47         3      36
STORAGE - Status VIRTUAL DISK \COMM.
Hardware Information:
Path      Location (group,module,slot)   Power   Physical Status
PRIMARY   EXTERNAL                       DUAL    PRESENT
MIRROR    EXTERNAL                       DUAL    PRESENT
To display status of all paths for $DATA00:
-> STATUS DISK $DATA00-*
STORAGE - Status DISK \ALM171.
Table 27 Primary and Backup Path States for Disk Drives (continued)
Path State   Description
Revive       A mirrored disk is being updated.
Special      Only maintenance-type I/O tasks can be performed on the disk.
Unknown      The path state is unknown. The disk might not be responding.
Up           The disk volume or disk path is logically accessible.
Monitoring the Use of Space on a Disk Volume
The Disk Space Analysis Program (DSAP) provides information on disk capacity, free-space fragments, and page allocation.
TYPE U CODE 101 EXT ( 2 PAGES, 2 PAGES ) ODDUNSTR MAXEXTENTS 16 BUFFERSIZE 4096 OWNER 8,255 SECURITY (RWEP): NUNU DATA MODIF: 12 Jul 1994, 14:04 CREATION DATE: 12 Jan 1994, 14:04 LAST OPEN: 12 Jul 1994, 14:04 EOF 567022 (88.2% USED) FILE LABEL: 775 (31.
M8xxx Fibre Channel Disk Drives The most common disk problems on a NonStop NS-series system are intm-errors-exceeded and slow-IOs-threshold-exceeded errors on the Fibre Channel loop. Such errors are often normal. However, if they cause problems on a Fibre Channel loop, power the affected disk down and up again. This procedure can solve the problem temporarily. Unless you are a qualified service provider, you cannot perform any physical actions on disk drives.
Table 29 Common Recovery Operations for Disk Drives (continued) Problem Recovery Defective sectors If you are authorized, use the SCF CONTROL DISK, SPARE command to spare defective sectors. For information on reinitializing the disk drive, see the SCF Reference Manual for the Storage Subsystem. Disks come formatted from HP. No disk format utility is available. Return any disk that requires formatting to HP.
1. If a path is down due to a ServerNet fabric failure, determine the affected paths. From an SCF prompt:
-> STATUS DISK $*-*, SUB MAGNETIC
The output indicates:
• $DATA06-M and $DATA06-MB are stopped in the DOWN substate.
• $WD8-M and $WD8-MB are stopped in the HARDDOWN substate.
• $DATA00-P and $DATA00-B are stopped in the HARDDOWN substate.
STORAGE - Status DISK \ALPHA12.
- ALTER MEMOS, MAXEXTENTS 20 - INFO MEMOS, DETAIL A report such as this one is sent to your home terminal: $DATA.DATA1.MEMOS 12 Jul 1993, 14:05 ENSCRIBE TYPE U CODE 101 EXT ( 2 PAGES, 2 PAGES ) ODDUNSTR MAXEXTENTS 20 BUFFERSIZE 4096 OWNER 8,255 SECURITY (RWEP): NUNU DATA MODIF: 12 Jul 1993, 14:04 CREATION DATE: 12 Jan 1993, 14:04 LAST OPEN: 12 Jul 1993, 14:24 EOF 567022 (78.5% USED) FILE LABEL: 649 (22.
13 Tape Drives: Monitoring and Recovery
• “When to Use This Chapter” (page 155)
• “Overview of Tape Drives” (page 155)
• “Monitoring Tape Drives” (page 155)
◦ “Monitoring Tape Drive Status With OSM” (page 156)
◦ “Monitoring Tape Drive Status With SCF” (page 157)
◦ “Monitoring Tape Drive Status With MEDIACOM” (page 158)
◦ “Monitoring the Status of Labeled-Tape Operations” (page 159)
• “Identifying Tape Drive Problems” (page 159)
• “Recovery Operations for Tape Drives” (page 160)
◦ “Rec
Monitoring Tape Drive Status With OSM To check the status of all tape drives on your system: 1. Log on to the OSM Service Connection. 2. In the tree pane, expand the system object and check the Tape Collection object. A yellow arrow displayed over the Tape Collection object (see Figure 10) indicates that a problem exists with one or more of the tape drives connected to the system. 3.
Figure 11 OSM: Monitoring Tape Drives Connected to an IOMF2 NOTE: All tape drives connected to a system appear under the Tape Collection object. When an IOMF2-connected tape drive uses storage routers, those objects appear under that tape drive object in the OSM tree pane hierarchy; however, Fibre Channel routers appear under the Monitored Service LAN Devices object (after being configured in OSM).
Primary PID The primary processor number and process identification number (PIN) of the specified device Backup PID The backup processor number and PIN of the specified device DeviceStatus The status of the device path For more information: • “SCF Object States” (page 62) describes the possible SCF states of tape drives and other devices. • The Guardian User’s Guide provides additional information about tape operations and the tasks you can perform.
A listing such as this one is sent to your home terminal:
MEDIACOM - T6028D42 (18DEC98)
Tape        Drive   Tape   Tape    Label    Open    Process
Drive       Status  Name   Status  Type     Mode    Name
----------  ------  -----  ------  -------  ------  -----------------
$TAPE0      FREE
1 tape drive returned.
Monitoring the Status of Labeled-Tape Operations
Use the MEDIACOM STATUS TAPEDRIVE and STATUS TAPEMOUNT commands to determine the current status of labeled-tape operations on your system.
Recovery Operations for Tape Drives You can perform recovery operations on tape drives using either the SCF interface to the storage subsystem or the OSM Service Connection. Recovery Operations Using the OSM Service Connection If the recovery operation calls for an OSM Service Connection action, you can perform an action on one or more tape drive objects. Performing an OSM Action on a Tape Drive 1. From the OSM Service Connection tree pane (the left-hand pane shown in Figure 10 (page 156)): a.
Related Reading for Tapes and Tape Drives For more information about tapes and tape drives, refer to the documentation listed in Table 31. Table 31 Related Reading for Tapes and Tape Drives For information about.. Refer to.. Tape drives The planning guide for your NonStop system BACKUP, RESTORE, and BACKCOPY utilities Guardian Disk and Tape Utilities Reference Manual (for Enscribe and SQL/MP files) BRCOM utility Backup and Restore 2.
14 Printers and Terminals: Monitoring and Recovery
• “When to Use This Chapter” (page 162)
• “Overview of Printers and Terminals” (page 162)
• “Monitoring Printer and Collector Process Status” (page 162)
  ◦ “Monitoring Printer Status” (page 162)
  ◦ “Monitoring Collector Process Status” (page 163)
• “Recovery Operations for Printers and Terminals” (page 163)
  ◦ “Recovery Operations for a Full Collector Process” (page 163)
• “Related Reading for Printers” (page 163)
When to Use This Chapter
This
DEVICE $LASER STATE WAITING FLAGS H PROC $SPLP FORM The output shows that the printer $LASER is up and available to print user jobs. Monitoring Collector Process Status Check that the collector processes on your spooler subsystem do not become more than about 90 percent full.
For information about the spooler and SPOOLCOM: • Guardian User’s Guide • Spooler Utilities Reference Manual
15 Applications: Monitoring and Recovery
• “When to Use This Chapter” (page 165)
• “Monitoring TMF” (page 165)
  ◦ “Monitoring the Status of TMF” (page 165)
  ◦ “Monitoring Data Volumes” (page 166)
  ◦ “TMF States” (page 167)
• “Monitoring the Status of Pathway” (page 167)
  ◦ “PATHMON States” (page 168)
• “Related Reading for Pathway” (page 169)
When to Use This Chapter
This chapter explains how to monitor the status of the HP NonStop Transaction Management Facility (TMF) and Pathway transaction pro
2. At the TMFCOM prompt: ~ STATUS TMF NOTE: The STATUS TMF command presents status information about the audit dump, audit trail, and catalog processes. Thus, in addition to the general TMF information, the STATUS TMF command combines information from the STATUS AUDITDUMP, STATUS AUDITTRAIL, and STATUS CATALOG commands. However, information from the other STATUS commands (STATUS DATAVOLS, STATUS OPERATIONS, STATUS SERVER, and STATUS TRANSACTION) does not appear in the STATUS TMF display.
TMF States The TMF subsystem can be in any of the states listed in Table 32. Table 32 TMF States State Meaning Configuring New Audit Trails The TMF subsystem has not yet been started with this configuration. Deleting The TMF subsystem is purging its current configuration, audit trails, and volume and file recovery information for the database in response to a DELETE TMF command.
For example, to check the status of the PATHMON process for the Pathway environment on your system: > PATHCOM $ZVPT $Y290: PATHCOM - T9153D20 - (01JUN93) COPYRIGHT TANDEM COMPUTERS INCORPORATED 1980 - 1985, 1987 - 1992 = STATUS PATHWAY PATHCOM responds with output such as: EXTERNALTCPS LINKMONS PATHCOMS SPI RUNNING 0 0 1 1 SERVERCLASSES RUNNING 13 STOPPED 5 THAWED 18 SERVERPROCESSES TCPS RUNNING 13 1 STOPPED 40 0 PENDING 0 0 TERMS RUNNING 1 STOPPED 0 PENDING 0 FROZEN 0 FREEZE PENDING 0 SUS
• The REQNUM column contains the PATHMON internal identifiers of application requesters that are currently running in this environment. • The FILE column identifies the type of requester. • The WAIT column indicates whether the process is waiting, which can be caused by one of these conditions: IO The request is waiting for an I/O operation to finish. LOCK The request is waiting for an object that has been locked by another requester. PROG-DONE The request is waiting for a RUN PROGRAM to finish.
16 Power Failures: Preparation and Recovery • “When to Use This Chapter” (page 170) • “System Response to Power Failures ” (page 170) • • • ◦ “NonStop Cabinets (Modular Cabinets) ” (page 171) ◦ “External Devices ” (page 171) ◦ “ESS Cabinets” (page 172) ◦ “Networking CLIMs” (page 172) ◦ “Storage CLIMs” (page 172) ◦ “SAS Disk Drive Enclosures” (page 172) ◦ “Air Conditioning” (page 172) “Preparing for Power Failure” (page 172) ◦ “Set Ride-Through Time” (page 173) ◦ “Monitor Power Suppl
controllers before the ServerNet was shut down in order to achieve a relatively clean shutdown. This makes TMF recovery less time-consuming and difficult than if all power failed, the NonStop OS crashed, and disk writes did not complete. NOTE: OSM power fail support works as described only after it has been properly configured, as described in “Configure OSM Power Fail Support” (page 173).
During a power failure, a ServerNet/DA remains operational during the power-fail delay time, but the external modular disk and tape subsystems attached to it do not. This type of situation could result in data-integrity problems if the system software continues processing data from an external disk drive or tape drive during a short power outage. If a power failure occurs and the processors resume operations but one or more external devices fail, data integrity problems can occur.
Set Ride-Through Time Ensure that the system is set for the proper ride-through time. The default powerfail delay time for NonStop NS-series systems that are configured with rack-mounted UPSs is 30 seconds. Contact HP Expert Services for the optimum ride-through time for your system. NOTE: For NonStop BladeSystems, the ride-through time must be actively set to at least 3 minutes to allow for possible loss of SNMP traps.
Power Failure Recovery After a power failure, if AC power is restored to a NonStop system while the batteries are still holding up the system, it will not be necessary to restart the system. Depending on the configuration of UPS resources, power failure can last long enough to leave the system with some processors down because the batteries were drained to the point where the processors can no longer operate.
17 Starting and Stopping the System • “When to Use This Chapter” (page 176) • “Powering On a System ” (page 176) • • ◦ “Powering On the System From a Low-Power State” (page 176) ◦ “Powering On the System From a No Power State” (page 177) ◦ “Power Cycling a NonStop BladeSystem Processor” (page 179) ◦ “Power Cycling a NonStop NS2000 Series, NS2100, NS2200 Series, NS2300, or NS2400 Series Processor” (page 181) “Starting a System” (page 182) ◦ “Loading the System” (page 183) ◦ “Starting Other
• ◦ “Exiting the OSM Low-Level Link” (page 196) ◦ “Opening Startup Event Stream and Startup TACL Windows” (page 196) “Related Reading for Starting and Stopping a System” (page 197) When to Use This Chapter Normally, you leave a system running. However, some procedures or recovery actions require you to start the system (perform a system load) or stop or power off the system. • • Stop and then power off a system before: ◦ An extended planned power outage for your building or computer room.
4. For NonStop NS-series systems only, if your maintenance LAN is not configured with the Domain Name System (DNS) or does not have reverse look-up, you must perform a hard reset of the maintenance entities (MEs) in each p-switch or IOAM enclosure, or the integrated maintenance entities (IMEs) in each VIO enclosure*: a. From the Log On to HP OSM Low-Level Link dialog box, select Logon with Host Name or IP Address. b. Enter the IP address of: c. d. e. f. g.
3. To physically monitor power-on activity: a. Check fan activity for all of the following that are present in your NonStop system: c7000 enclosure, NonStop server blades, processor switches, processor Blade Elements, CLIMs, IOAM or VIO enclosures, and FCDM enclosures. Check that the fans are turning and that you can feel air circulate through the components. b. After the POSTs finish, check that only green power-on LEDs are lit in the system components before you start the system.
Figure 12 Processor Status Dialog Box c. d. In the OSM Low-Level Link Processor Status dialog box, verify that all processors appear in the Processor Status list. Restart the OSM Low-Level Link application and then log on to the system. NOTE: An already-running instance of the OSM Low-Level Link cannot be used here because it may show incorrect processor status. Instead, restart the OSM Low-Level Link application and log onto the system to start a new instance. e. f. Click Processor Status.
1. 2. 3. 4. Log into Onboard Administrator using the domain name of the OA of the HP NonStop BladeSystem c7000 enclosure that contains the processor that did not come up. The table below shows how the NonStop logical processors correspond to the slots in the enclosure.
5. Wait four minutes, and then use the OSM Low-Level Link application to verify the state of the affected processor, as follows: a. Restart the OSM Low-Level Link application and then log on to the system. NOTE: An already-running instance of the OSM Low-Level Link cannot be used here because it may show incorrect processor status. Instead, restart the OSM Low-Level Link application and log onto the system to start a new instance. b. c. 6. Click Processor Status.
1. 2. 3. From the system console, use Internet Explorer to connect to the iLO IP address associated with the processor to be power cycled. Power cycle the processor as follows: a. Click Power and Reset in the left pane. b. Click on Momentary Press. c. Click OK to dismiss the dialog box asking if you wish to continue. d. Wait until the Momentary Press button reappears for Power On. e. Click OK to dismiss the dialog box asking if you wish to continue.
Loading the System For information about performing a system load to load the NonStop operating system, see the instructions for loading an RVU in the Software Installation and Upgrade Guide for your RVU. Alerts • All processors in the system must be in a halted state before you perform a system load. • To perform processor dumps during a system load, see the considerations in “System Load to a Specific Processor” (page 183).
You can choose these system load disks:
• An FCDM-Load attempts to load the system from a system disk in the FCDM enclosure connected to IOAM enclosure group 110:
                 IOAM Enclosure     FCSA          FCDM
Path             Group    Module    Slot   SAC    Shelf   Bay
Primary          110      2         1      1      1       1
Backup           110      3         1      1      1       1
Mirror           110      3         1      2      1       1
Mirror Backup    110      2         1      2      1       1
NOTE: For NonStop NS14000 series, NS1200, and NS1000 systems, Fibre Channel disks are connected to IOAMs or VIO enclosures located in group 100.
Table 33 System Load Paths in Order of Use (continued)
Data Travels
8    Mirror backup   $SYSTEM-M   0   Y
9    Primary         $SYSTEM-P   1   X
10   Primary         $SYSTEM-P   1   Y
11   Backup          $SYSTEM-P   1   X
12   Backup          $SYSTEM-P   1   Y
13   Mirror          $SYSTEM-M   1   X
14   Mirror          $SYSTEM-M   1   Y
15   Mirror backup   $SYSTEM-M   1   X
16   Mirror backup   $SYSTEM-M   1   Y
Configuration File
Normally, you select Current (CONFIG), the default system configuration file.
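The ordering of the load paths in Table 33 can be sketched in a few lines of Python. This is an illustration only, not part of OSM. It assumes, based on rows 8 through 16 shown above, that the fourth column is the destination processor and the fifth is the ServerNet fabric, and that rows 1 through 7 follow the same pattern. The names `load_paths`, `PROCESSORS`, `DISK_PATHS`, and `FABRICS` are hypothetical.

```python
from itertools import product

# Assumed pattern, inferred from rows 8-16 of the table: the processor
# changes slowest, then the disk path, then the ServerNet fabric.
PROCESSORS = (0, 1)
DISK_PATHS = (
    ("Primary", "$SYSTEM-P"),
    ("Backup", "$SYSTEM-P"),
    ("Mirror", "$SYSTEM-M"),
    ("Mirror backup", "$SYSTEM-M"),
)
FABRICS = ("X", "Y")


def load_paths():
    """Return (number, description, disk, processor, fabric) tuples."""
    rows = []
    combos = product(PROCESSORS, DISK_PATHS, FABRICS)
    for number, (cpu, (description, disk), fabric) in enumerate(combos, start=1):
        rows.append((number, description, disk, cpu, fabric))
    return rows


if __name__ == "__main__":
    for row in load_paths():
        print("{:>2}  {:<13}  {:<9}  {}  {}".format(*row))
```

Under these assumptions the sketch yields 16 paths: every disk path is tried over both fabrics to processor 0 before any path to processor 1 is tried.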
For more information about configuring generic processes to start automatically, refer to the documentation in “Related Reading for Starting and Stopping a System” (page 197). • You can include commands in startup command files that you invoke from a TACL prompt or another startup file. For some techniques to make startup command files run as efficiently as possible, refer to “Writing Efficient Startup and Shutdown Command Files” (page 206).
3. 4. 5. 6. 7. 8. 9. In the Configuration File box, select a system configuration file. In most cases, you should select the Current (CONFIG) file. Select or clear the CIIN disabled check box. For a normal system load, check that the CIIN disabled check box is cleared so that the commands in the CIIN file execute. To make changes to the load paths, double-click on a row in the Path window. Click Start system.
Figure 14 System Load Configuration Dialog Box 8. 9. Click Load. Check for messages in the System Load dialog box. After the “System Startup Complete” message, close the dialog box. 10. In the Processor Status dialog box, check the status of all processors. At least one processor must be running. Determine whether you need to reload any remaining processors. 11. Dump processor memory, if needed. For more information about dumping processor memory, refer to Chapter 11 (page 120). 12.
Reloading Processors Using OSM The OSM Service Connection provides a Reload action on the Logical Processor object. You can perform the action on one or more processors. For NonStop NS-series systems, the OSM action lets you reload an entire processor or omit a Blade Element from the reload action so you can dump the PE for that Blade Element before reintegrating it into the running processor. To reload a single processor, see Chapter 11 (page 120).
Anticipating and Planning for Change Anticipating and planning for change is a key requirement for maintaining an enterprise-level, 24 x 7 operation. To avoid taking a NonStop NS-series system down unnecessarily: • Evaluate system performance and growth—Track system usage and anticipate system capacity and performance requirements as new applications are introduced.
= SHUTDOWN2, MODE ORDERLY 3. Stop Distributed Systems Management/Software Configuration Manager (DSM/SCM) if it is running. At a TACL prompt: a. Type this VOLUME command: > VOLUME $DSMSCM.ZDSMSCM b. Stop DSM/SCM: > RUN STOPSCM 4. 5. Stop communications lines, such as Expand lines. Identify and stop any remaining processes that should be stopped individually: a. Use the TACL PPD and STATUS commands to help you identify running processes. b. Use the TACL STOP command to stop running processes. 6.
4. In the Processor Status dialog box, select all processors to be halted. To select multiple processors, use the Shift key, but the processors must be in numerical order. For example, you can select processors 2, 3, and 4, but not 2 and 4. 5. 6. 7. From the Processors Actions menu, select Halt. Click Perform action. A message box asks whether you are sure you want to perform a halt on the selected processors. Click OK.
Do you still wish to execute this action? 6. Select Yes. System Shutdown Using SCF To power off the system using SCF, log on to an available TACL command interpreter as the super ID (255,255) and issue the SCF power-off command: > SCF CONTROL SUBSYS $ZZKRN, SHUTDOWN Emergency Power-Off Procedure If possible, HP recommends that the system be in a low-power state before you remove power to the system. However, in emergency situations, you might need to quickly remove AC power from a system.
10 minutes for large configurations). If the system is still not powered on after this time and you cannot determine the cause of the problem: • Check your site’s circuit breakers. • Plug another device into the PDU that powers the LSU to check the power for that PDU. Green LED Is Not Lit After POSTs Finish It can take several minutes for the green LEDs on all system components to light: 1. Wait for the POSTs to finish. It might take as long as 10 minutes for all system components. 2.
7. If you continue to have problems, load the system from each disk path for both the primary and mirror $SYSTEM drives. 8. If you cannot load the system using the current configuration file, load the system using a saved version of the system configuration file. See “Configuration File” (page 185). 9. If you still cannot load the system or if a CONFxxyy is not available, load the system from an alternate system disk if one is available. 10.
3. 4. 5. 6. 7. Check for any event messages. Look up event messages in the EMS logs ($0 and $ZLOG) and refer to the OSM Event Viewer or the Operator Messages Manual for further information about the cause, effect, and recovery for any event message. Perform a processor dump, if needed, as described in “Dumping a Processor to Disk” (page 134). Try a soft reset of the processor. Reload the processor or processors as described in Chapter 11 (page 120).
2. From the File menu, select Start Terminal Emulator > For Startup TACL. Figure 16 Opening a Startup TACL Window To open startup event stream windows and startup TACL windows using comForte MR-Win6530: 1. Select Start > All Programs > MR-Win6530 > MR-Win6530. 2. Refer to the Win6530 Help or to the Win6530 User Guide, both available from Start > All Programs > MR-Win6530 > MR-Win6530.
Table 34 Related Reading for Starting and Stopping a System (continued) For Information About Refer to Informing OSM of the location of an alternate system disk. OSM Service Connection online help See Saving (a disk-level action) or deleting (a system-level action) alternate system load volumes.
18 Creating Startup and Shutdown Files
This section describes command files that automatically start and shut down a NonStop system.
• “Automating System Startup and Shutdown” (page 200)
  ◦ “NonStop Cluster Boot Application” (page 200)
  ◦ “Managed Configuration Services (MCS)” (page 200)
  ◦ “Startup” (page 201)
  ◦ “Shutdown” (page 201)
  ◦ “For More Information” (page 201)
• “Processes That Represent the System Console” (page 201)
  ◦ “$YMIOP.#CLCI” (page 201)
  ◦ “$YMIOP.
◦ “Expand-Over-IP Line Startup File” (page 213) ◦ “Expand Direct-Connect Line Startup File” (page 214) • “Tips for Shutdown Files” (page 214) • “Shutdown File Examples” (page 214) ◦ “System Shutdown File” (page 214) ◦ “CP6100 Lines Shutdown File” (page 215) ◦ “ATP6100 Lines Shutdown File” (page 215) ◦ “X.
Startup You can use startup command files to automate the starting of devices and processes on the system, which minimizes the possibility of operator errors caused by forgotten or mistyped commands. The system is shipped with a basic startup file named CIIN, located on the $SYSTEM.SYS00 subvolume. The CIIN file must be specified in a particular way. See “CIIN File” (page 203) for more information.
$ZHOME The $ZHOME process is a process pair that provides a reliable home terminal to which processes can perform write operations. The $ZHOME process can be used by processes that must write to the system console but do not require a response. $ZHOME is preconfigured on your system by the CONFBASE file. $ZHOME is a generic process that is part of the SCF Kernel subsystem. Note the following about the configuration of $ZHOME: • The $ZHOME process is configured with $YMIOP.
• The IP addresses used in this section are examples only. If you use the example files described in this section on your system, you must change the IP addresses in these examples to IP addresses that are appropriate for your LAN environment. • The configuration track-ID for the SWAN concentrator used in the example files, X001XX, is also an example.
system startup, even if you enable that file. You cannot simply copy a startup file to the SYSnn subvolume and name it CIIN. Modifying a CIIN File After the CIIN file is established on $SYSTEM.SYSnn (as part of running DSM/SCM), you can modify the contents of SYSnn.CIIN with a text editor such as TEDIT. You need not run DSM/SCM again to make these changes effective.
Comment -- This file is used to reload the remaining processors and
Comment -- start a TACL process pair for the system console.
Comment -- Reload the remaining processors.
RELOAD /TERM $ZHOME, OUT $ZHOME/ *
Comment -- Start a TACL process pair for the system console TACL window.
Comment -- Use the OSM Low-Level Link to start a TTE session for the
Comment -- startup TACL before issuing this command (see the Start
Comment -- Terminal Emulator command under the File menu).
Writing Efficient Startup and Shutdown Command Files TACL and many subsystems support command files. Command files for startup or shutdown contain a series of commands that automatically execute when the file is executed. To automate and reduce the time required to start and stop your applications, devices, and processes: • Include commands in one or more command files that you invoke from either a TACL prompt or another file. • Write efficient startup and shutdown command files.
communications lines. The files START0, START1, START2, and START3 contain the actual commands that start the communications lines. This command file uses a special technique intended to ensure that each process gets started even if a given processor is out of service. The technique is to start each process in two processors. If the first processor is down, the command file continues to the next processor.
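The two-processor technique described above can be pictured with a short Python sketch. It is not TACL and does not start anything; it only models the decision of which processor a process would end up in. The function names, the process names, and the set of running processors are all hypothetical.

```python
def choose_cpu(candidates, cpus_up):
    """Return the first candidate processor that is up, or None."""
    for cpu in candidates:
        if cpu in cpus_up:
            return cpu
    return None


def plan_startup(processes, cpus_up):
    """Map each process name to the processor it would be started in.

    `processes` maps a process name to its (first, second) processor
    choice. Names and processor numbers are illustrative only.
    """
    return {name: choose_cpu(candidates, cpus_up)
            for name, candidates in processes.items()}


if __name__ == "__main__":
    # Hypothetical layout: processor 0 is down, processors 1-3 are up.
    processes = {"$PROCA": (0, 1), "$PROCB": (2, 3)}
    for name, cpu in plan_startup(processes, cpus_up={1, 2, 3}).items():
        print(name, "-> processor", cpu)
```

In this model, $PROCA falls through to processor 1 because processor 0 is down, while $PROCB starts in its first choice. A process whose two processors are both down maps to None, which is the case the technique cannot cover.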
The sequence in which you invoke startup files can be important. Some processes require other processes to be running before they can be started. Be sure to indicate the order in which your startup files are to be run. Because the TCP/IP configurations are not stored in the configuration database, they are not preserved after system loads. Therefore, TCP/IP stacks must be configured as well as started each time the system is started. This is only true for conventional TCP/IP.
Comment -- command to start $ZEXP, unless you load the system from a
Comment -- different CONFIG file.
Comment -- If you have not configured $ZEXP as a persistent generic
Comment -- process, remove the commenting from the following SCP command
Comment -- and start $ZEXP as a nonpersistent process pair.
Comment -- OZEXP / NAME $ZEXP, NOWAIT, PRI 180, OUT $ZHOME, CPU 0/1
comment -- Warm start the spooler subsystem using the SPOOLCOM command
comment -- file SPLWARM
OBEY $SYSTEM.STARTUP.
comment -- check to see that the spooler started successfully SPOOLCOM; SPOOLER, STATUS TMF Warm-Start File This example command file warm starts the TMF subsystem. This file can be invoked automatically from the STRTSYS file, or you can invoke it by using the following TACL command: > TMFCOM / IN $SYSTEM.STARTUP.TMFSTART, OUT $ZHOME / -- This is $SYSTEM.STARTUP.TMFSTART -- This file warm starts the Transaction Management Facility (TMF) subsystem -- and checks to see if TMF started successfully.
#SET TCP^CPU2 1 [#IF NOT [#PROCESSEXISTS $ZNET] |THEN| #OUTPUT #OUTPUT Starting SCP... SCP /NAME $ZNET, NOWAIT, CPU 0, PRI 165, TERM [CON^NAME]/ 1; AUTOSTOP -1 ] [#IF [#PROCESSEXISTS [LST^NAME]] |THEN| STOP [LST^NAME] ] #OUTPUT #OUTPUT Stopping existing TCP/IP processes...
+ ALTER SUBNET #SN1, SUBNETMASK %%hFFFFFF00 + ALTER SUBNET #LOOP0, IPADDRESS 127.1 + START SUBNET * + ADD ROUTE #GW, DESTINATION 0, GATEWAY [GW^ADDR], DESTTYPE BROADCAST + START ROUTE * + EXIT POP #INLINEPREFIX #OUTPUT #OUTPUT Starting Listner: [LST^NAME] LISTNER /NAME [LST^NAME], CPU [TCP^CPU1], PRI 160, NOWAIT, TERM [CON^NAME], HIGHPIN OFF/ $SYSTEM.ZTCPIP.
START LINE $ATP* X.25 Lines Startup File This example shows an SCF command file that starts the X.25 lines associated with the SWAN concentrator $ZZWAN.#S01 (configuration track-ID X001XX). This file can be invoked automatically from the STRTSYS file, or you can invoke it by using the following TACL command: > SCF / IN $SYSTEM.STARTUP.STRTX25, OUT $ZHOME / == == == This is $SYSTEM.STARTUP.STRTX25 Starts the X.25 lines associated with the SWAN concentrator $ZZWAN.
Expand Direct-Connect Line Startup File This example shows an SCF command file that starts an Expand direct-connect line on a SWAN concentrator. This file can be invoked automatically from the STRTSYS file, or you can invoke it by using the following TACL command: > SCF / IN $SYSTEM.STARTUP.STRTLH, OUT $ZHOME / == This is $SYSTEM.STARTUP.
comment -- Shut down the ATP6100 lines associated with the SWAN concentrator SCF/ IN $SYSTEM.SHUTDOWN.SDNATP, OUT $ZHOME / comment -- Shut down the X.25 lines associated with the SWAN concentrator SCF/ IN $SYSTEM.SHUTDOWN.SDNX25, OUT $ZHOME / comment -- Shut down the printer lines associated with the SWAN concentrator SCF/ IN $SYSTEM.SHUTDOWN.SDNLP, OUT $ZHOME / comment -- Shut down the Expand-over-IP line to \Case2 SCF/ IN $SYSTEM.SHUTDOWN.
ALLOW 20 ERRORS ABORT LINE $ATP* X.25 Lines Shutdown File This example shows an SCF command file that stops the X.25 lines associated with the SWAN concentrator $ZZWAN.#S01 (configuration track-ID X001XX). This file can be invoked automatically from the STOPSYS file, or you can invoke it by using the following TACL command: > SCF/ IN $SYSTEM.SHUTDOWN.SDNX25, OUT $ZHOME / == This is $SYSTEM.SHUTDOWN.SDNX25 == == This shuts down the X.25 lines associated with the SWAN concentrator $ZZWAN.
This file can be invoked automatically from the STOPSYS file, or you can invoke it by using the following TACL command: > SCF/ IN $SYSTEM.SHUTDOWN.STOPLH, OUT $ZHOME / == This is $SYSTEM.SHUTDOWN.STOPLH == This shuts down the direct-connect line ALLOW 20 ERRORS ABORT LINE $Case2elh Spooler Shutdown File This example shows a TACL command file that drains the spooler. This file can be invoked automatically from the STOPSYS file, or you can invoke it by using the following TACL command: > OBEY $SYSTEM.
19 Preventive Maintenance • “When to Use This Chapter” (page 218) • “Monitoring Physical Facilities” (page 218) • • ◦ “Checking Air Temperature and Humidity” (page 218) ◦ “Checking Physical Security” (page 218) ◦ “Maintaining Order and Cleanliness” (page 218) ◦ “Checking Fire-Protection Systems” (page 219) “Cleaning System Components” (page 219) ◦ “Cleaning an Enclosure” (page 219) ◦ “Cleaning and Maintaining Printers” (page 219) ◦ “Cleaning Tape Drives” (page 219) “Handling and Storing
Checking Fire-Protection Systems You might also be asked to check the fire alarms and fire extinguisher systems in your facility. Cleaning System Components This section contains basic information about cleaning enclosures, printers, and tape drives. Many companies have service-level agreements with HP that include regular preventive maintenance (PM) of their hardware components.
NOTE: These precautions are extremely important to prevent damage: • Do not use cleaner solutions that contain lubricants. Lubricants deposit a film on the tape head and impair performance. • Do not use aerosol cleaners, even if they contain isopropyl alcohol. The spray is difficult to control and often contains metallic particles that damage the tape head. • Do not use soap and water on a tape path. Soap leaves a thick film, and water can damage electronic parts. • Do not use facial tissues.
A Operational Differences Between Systems Running G-Series, H-Series, and J-Series RVUs Users familiar with systems running G-series RVUs will find several major differences in the operational environment of systems running H-series and J-series RVUs. Although many of the operations to be performed remain the same, the tools you use to execute these operations might differ significantly.
B Tools and Utilities for Operations When to Use This Appendix This appendix briefly describes the tools and utilities that might be available on your system to assist you in performing the operations tasks for a NonStop system. The use of some of these tools and utilities is discussed throughout this guide. For a list of other documentation that provides detailed information about these tools and utilities, see “Related Reading for Tools and Utilities” (page 227).
File Utility Program (FUP) The File Utility Program (FUP) is a component of the standard software package for the NonStop Kernel operating system. FUP software is designed to help you manage disk files, nondisk devices (printers, terminals, and tape drives), and processes (running programs) on the NonStop system. You can use FUP to create, display, and duplicate files; load data into files; alter file characteristics; and purge files.
NonStop Maintenance LAN DHCP DNS Configuration Wizard The NonStop Maintenance LAN DHCP DNS Configuration Wizard can be used to configure DHCP, DNS, and BOOTP servers. DHCP and DNS servers are required for NonStop systems and the CLIMs connected to these systems. BOOTP servers are only for the J-series NonStop BladeSystems, NonStop NS2400 series, NS2300, NS2200 series, and NS2100 systems and make the Halted State Services (HSS) files available for processors to boot from.
RESTORE Use the RESTORE utility to copy files from magnetic tape to disk.
TMFCOM TMFCOM allows you to enter commands that initiate communication with TMF, request various TMF operations, and terminate communication with TMF. Web ViewPoint Use Web ViewPoint, a browser-based product, to access the Event Viewer, Object Manager, and Performance Monitor subsystems.
C Related Reading for Tools and Utilities For more information about tools and utilities used for system operations, refer to the documentation listed in Table 35. Table 35 Related Reading for Tools and Utilities Tool Documentation Description BACKCOPY Guardian Disk and Tape Utilities Reference Manual This manual describes these disk and tape utilities: BACKCOPY, BACKUP, DCOM, DSAP, and RESTORE. This manual supports D-series, G-series, H-series, and J-series RVUs.
Table 35 Related Reading for Tools and Utilities (continued) Tool Documentation Description DSM/Tape Catalog Operator Interface (MEDIACOM) Manual This manual explains how to run a MEDIACOM session and describes the purpose and the syntax of the MEDIACOM commands. Guardian User’s Guide This guide contains information explaining how to perform routine operations relating to the tapes and tape drives on your system. NonStop Cluster Essentials
Table 35 Related Reading for Tools and Utilities (continued) Tool Documentation Description Online help Online help is also available from within each of these OSM applications: • NonStop Maintenance LAN DHCP DNS Configuration Wizard • Down System CLIM Firmware Update Tool • OSM Low-Level Link • OSM Notification Director • OSM Event Viewer • OSM System Inventory Tool • OSM Certificate Tool • OSM Emulator File Converter • OSM guided procedures (within the OSM Service Connection) PATHCOM TS/MP System Ma
Table 35 Related Reading for Tools and Utilities (continued) Tool Documentation Description Subsystem Control Facility (SCF) SCF Reference Manual for H and J Series RVUs This manual describes the operation of SCF on H and J Series RVUs and how it is used to configure, control, and monitor subsystems supported by an SCF interface.
• Monitors and graphs performance attributes and trends
• Investigates and displays most active system processes
• Offers simple navigation and a point-and-click command interface
Tool: ViewPoint
Documentation: ViewPoint Manual
Description: This manual describes ViewPoint, a multifunction operations console application that allows the management of a network of systems.
D Converting Numbers
When to Use This Appendix
Refer to this appendix if you need to convert numbers from one numbering system to another.
Overview of Numbering Systems
Internally, a computer stores data as a series of off and on values represented symbolically by the binary digits, or bits, 0 and 1, respectively. Because numbers represented as strings of binary 0s and 1s are difficult to read, binary numbers are generally converted into octal, decimal, or hexadecimal form.
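As a minimal sketch of the point above, the following Python fragment writes one stored value in each of the four forms. Python is used here only for illustration and is not one of the NonStop tools described in this guide; the function name show_forms is hypothetical.

```python
def show_forms(value):
    """Return the binary, octal, decimal, and hexadecimal spellings of value."""
    return {
        "binary": format(value, "b"),       # base 2
        "octal": format(value, "o"),        # base 8
        "decimal": format(value, "d"),      # base 10
        "hexadecimal": format(value, "X"),  # base 16, uppercase letters
    }


# The same quantity, written four ways.
for name, digits in show_forms(27).items():
    print(f"{name:12} {digits}")
```

For the value 27 this prints 11011, 33, 27, and 1B: one quantity, four notations.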
Figure 17 Binary to Decimal Conversion
1. Take the rightmost binary digit and multiply it by the rightmost placeholder value.
2. Moving to the left, take the next binary digit and multiply it by the next placeholder value. Continue to do this until the binary number has been exhausted.
3. Add the multiplied values together.
The result is:
Binary Value: %B11011
Decimal Value: 27
Octal to Decimal
To convert an octal number to a decimal number:
1.
Figure 18 Octal to Decimal Conversion
1. Take the rightmost octal digit and multiply it by the rightmost placeholder value.
2. Moving to the left, take the next octal digit and multiply it by the next placeholder value. Continue to do this until the octal number has been exhausted.
3. Add the multiplied values together.
The result is:
Octal Value: %1375
Decimal Value: 765
Hexadecimal to Decimal
To convert a hexadecimal number to a decimal number:
1.
Example
Convert the hexadecimal value BA10 to its decimal equivalent. (In this example, the symbol “*” indicates multiplication.) Refer to Figure 19 (page 235).
Figure 19 Hexadecimal to Decimal Conversion
1. Take the rightmost hexadecimal digit and multiply it by the rightmost placeholder value.
2. Moving to the left, take the next hexadecimal digit and multiply it by the next placeholder value. Continue to do this until the hexadecimal number has been exhausted.
3. Add the multiplied values together.
3. 88/2 = 44, remainder 0
4. 44/2 = 22, remainder 0
5. 22/2 = 11, remainder 0
6. 11/2 = 5, remainder 1
7. 5/2 = 2, remainder 1
8. 2/2 = 1, remainder 0
9. 1/2 = 0, remainder 1 (this last remainder is the most significant, or leftmost, digit)
The result is:
Decimal Value: 354
Binary Value: %B101100010
Decimal to Octal
To convert a decimal number to an octal number:
1. Divide the decimal number by 8. The remainder of this first division becomes the least significant (rightmost) digit of the octal value.
2.
1. Divide the decimal number by 16. The remainder of this first division becomes the least significant (rightmost) digit of the hexadecimal value.
2. If the remainder exceeds 9, convert the 2-digit remainder to its hexadecimal letter equivalent. Use this table for conversion.
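The two procedures in this appendix, multiplying digits by placeholder values and dividing repeatedly while collecting remainders, can be sketched in Python as follows. This is an illustrative sketch only, not an HP-supplied utility: the names DIGITS, to_decimal, and from_decimal are hypothetical, and the letter mapping for remainders 10 through 15 (A through F) is the standard hexadecimal convention.

```python
DIGITS = "0123456789ABCDEF"  # a remainder of 10..15 maps to A..F


def to_decimal(digits, base):
    """Multiply each digit by its placeholder value, rightmost first, and add."""
    total = 0
    placeholder = 1  # base**0 for the rightmost digit
    for ch in reversed(digits.upper()):
        digit = DIGITS.index(ch)  # raises ValueError for an unknown character
        if digit >= base:
            raise ValueError(f"digit {ch!r} is not valid in base {base}")
        total += digit * placeholder
        placeholder *= base  # next placeholder value to the left
    return total


def from_decimal(value, base):
    """Divide repeatedly by base; the remainders, read last to first, are the digits."""
    if value < 0:
        raise ValueError("value must be non-negative")
    if value == 0:
        return "0"
    remainders = []
    while value > 0:
        value, remainder = divmod(value, base)
        remainders.append(DIGITS[remainder])  # first remainder = rightmost digit
    return "".join(reversed(remainders))


print(to_decimal("11011", 2))   # 27
print(to_decimal("1375", 8))    # 765
print(to_decimal("BA10", 16))   # 47632
print(from_decimal(354, 2))     # 101100010
```

The first, second, and fourth results match the worked examples in this appendix (%B11011, %1375, and 354).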
Safety and Compliance
This section contains three types of required safety and compliance statements:
• Regulatory compliance
• Waste Electrical and Electronic Equipment (WEEE)
• Safety
Regulatory Compliance Statements
The following regulatory compliance statements apply to the products documented by this manual.
FCC Compliance
This equipment has been tested and found to comply with the limits for a Class A digital device, pursuant to part 15 of the FCC Rules.
Taiwan (BSMI) Compliance
Japan (VCCI) Compliance
This is a Class A product. In a domestic environment this product may cause radio interference in which case the user may be required to take corrective actions.
European Union Notice
Products with the CE Marking comply with both the EMC Directive (2004/108/EC) and the Low Voltage Directive (2006/95/EC) issued by the Commission of the European Community.
Ukraine Addendum to User Documentation Україна Додаток до документації користувача: Обладнання відповідає вимогам Технічного регламенту щодо обмеження використання деяких небезпечних речовин в електричному та електронному обладнанні, затвердженого постановою Кабінету Міністрів України від 3 грудня 2008 № 1057 Збережіть цей документ разом із документацією користувача для цього виробу Laser Compliance This product may be provided with an optical storage device (that is, CD or DVD drive) and/or fiber optic tr
IMPORTANT: TOUS LES RECIPIENTS SONT DESTINES UNIQUEMENT A UN USAGE INTERNE. VORSICHT: ALLE STECKDOSEN DIENEN NUR DEM INTERNEN GEBRAUCH. HIGH LEAKAGE CURRENT To reduce the risk of electric shock due to high leakage currents, a reliable grounded (earthed) connection should be checked before servicing the power distribution unit (PDU).
Index
Symbols
$SYSTEM, recovery operations for, 194
$YMIOP.#CLCI, 201, 203
$YMIOP.
E
EMS Analyzer (EMSA), 222
EMS event messages
  monitored by HP SIM, 77
  monitored by OSM Event Viewer, 77
  monitored by Web ViewPoint, 78
  tools for monitoring, 77
EMS event messages, monitoring, 77
EMSA, 222
EMSDIST
  description of, 222
  using to monitor EMS event messages, 78
EMSLOG file, 138
Enclosures
  cleaning, 219
Encryption, volume level, 33, 37
Enterprise Storage System, 107
  see also ESS
ESS, 107
Event Management Service (EMS), 77
Examples
  checking file size, 149
  checking status of PATHMON process, 169
  ch
L
LEDs
  status, 68, 176
LEDs, status, 68, 176
LIFs, 86
Logical interfaces (LIFs), 86
M
Maintenance LAN, 86
Measure program, 223
MEDIACOM
  description of, 223
  interface, 161, 223
  STATUS TAPEDRIVE command, 158
Monitoring
  communications subsystems, 98
  disk drives, 143
  EMS event messages, 77
  G4SA, 109
  NonStop system resources, 52
  overview, 51
  performance of NonStop subsystems, 71
  printers, 162
  problem incident reports, 59
  processes, 79
  processors, 124
  ServerNet fabrics, 102
  tape drives, 155
N
NonStop Avai
POSTs, 176
  see also Power-on self-tests (POSTs)
Power failure
  how external devices respond to, 171
  preparing for
    maintaining batteries, 173
    monitor batteries, 173
    monitor power supplies, 173
    ride-through time, 173
  recovery operations, 174
  response
    ESS cabinets, 172
    external devices, 171
    IB CLIMs, 172
    IP CLIMs, 172
    NonStop cabinets, 171
    NonStop S-series enclosures, 171
    SAS disk drive enclosures, 172
    Storage CLIMs, 172
    systems, 171
    Telco CLIMs, 172
Powering off the system, 192
Powering on external syste
CIIN, 201
configuration database, 208
CP6100, 212
direct-connect, 214
Expand-over-IP, 213
invoking, 201
security, 207
sequence, 208
spooler warm start, 209
system startup file, 208
TCP/IP stacks, 208
TMF warm start, 210
X.