NonStop NS-Series Operations Guide (H06.03+)

Table Of Contents

What’s New in This Manual
About This Guide
1 Introduction to Integrity NonStop NS-Series Operations
- When to Use This Section
- Understanding the Operational Environment
- What Are the Operator Tasks?
- Determining the Cause of a Problem: A Systematic Approach
- Logging On to an Integrity NonStop Server
- Service Procedures
  - CSSI Web
2 Determining Your System Configuration
- When to Use This Section
- Modular Hardware Components
  - Terms Used to Describe System Hardware Components
- Recording Your System Configuration
- Using SCF to Determine Your System Configuration
3 Overview of Monitoring and Recovery
- When to Use This Section
- Functions of Monitoring
- Monitoring Tasks
- Monitoring and Resolving Problems—An Approach
- Using OSM to Monitor the System
- Using SCF to Monitor the System
  - Determining Device States
- Automating Routine System Monitoring
- Using the Status LEDs to Monitor the System
- Related Reading
4 Monitoring EMS Event Messages
- When to Use This Section
- What Is the Event Management Service (EMS)?
- Tools for Monitoring EMS Event Messages
- Related Reading
5 Processes: Monitoring and Recovery
- When to Use This Section
- Types of Processes
- Monitoring Processes
- Recovery Operations for Processes
- Related Reading
6 Communications Subsystems: Monitoring and Recovery
- When to Use This Section
- Communications Subsystems
  - Local Area Networks (LANs) and Wide Area Networks (WANs)
- Monitoring Communications Subsystems and Their Objects
- Recovery Operations for Communications Subsystems
- Related Reading
7 ServerNet Resources: Monitoring and Recovery
- When to Use This Section
- ServerNet Communications Network
- System I/O ServerNet Connections
- Monitoring the Status of the ServerNet Fabrics
  - Monitoring the ServerNet Fabrics Using OSM
  - Monitoring the ServerNet Fabrics Using SCF
- Related Reading
8 I/O Adapters and Modules: Monitoring and Recovery
- When to Use This Section
- I/O Adapters and Modules
  - Fibre Channel ServerNet Adapter (FCSA)
  - Gigabit Ethernet 4-Port Adapter (G4SA)
- Monitoring I/O Adapters and Modules
  - Monitoring the FCSAs
  - Monitoring the G4SAs
- Recovery Operations for I/O Adapters and Modules
- Related Reading
9 Processors and Components: Monitoring and Recovery
- When to Use This Section
- Overview of the NonStop Blade Complex
- Monitoring and Maintaining Processors
- Identifying Processor Problems
- Recovery Operations for Processors
- Related Reading
10 Disk Drives: Monitoring and Recovery
- When to Use This Section
- Overview of Disk Drives
- Monitoring Disk Drives
- Identifying Disk Drive Problems
  - Internal SCSI Disk Drives
  - M8xxx Fibre-Channel Disk Drives
- Recovery Operations for Disk Drives
  - Recovery Operations for a Down Disk or Down Disk Path
  - Recovery Operations for a Nearly Full Database File
- Related Reading
11 Tape Drives: Monitoring and Recovery
- When to Use This Section
- Overview of Tape Drives
- Monitoring Tape Drives
- Identifying Tape Drive Problems
- Recovery Operations for Tape Drives
  - Recovery Operations Using the OSM Service Connection
  - Recovery Operations Using SCF
- Related Reading
12 Printers and Terminals: Monitoring and Recovery
- When to Use This Section
- Overview of Printers and Terminals
- Monitoring Printer and Collector Process Status
  - Monitoring Printer Status
  - Monitoring Collector Process Status
- Recovery Operations for Printers and Terminals
  - Recovery Operations for a Full Collector Process
- Related Reading
13 Applications: Monitoring and Recovery
- When to Use This Section
- Monitoring TMF
- Monitoring the Status of Pathway
  - PATHMON States
- Related Reading
14 Power Failures: Preparation and Recovery
- When to Use This Section
- System Response to Power Failures
- Preparing for Power Failure
- Power Failure Recovery
  - Procedure to Recover From a Power Failure
  - Setting System Time
- Related Reading
15 Starting and Stopping the System
- When to Use This Section
- Powering On a System
  - Powering On the System From a Low Power State
  - Powering On the System From a No Power State
- Starting a System
- Minimizing the Frequency of Planned Outages
  - Anticipating and Planning for Change
- Stopping Applications, Devices, and Processes
- Stopping the System
  - Alerts
  - Halting All Processors Using OSM
- Powering Off a System
- Troubleshooting and Recovery Operations
- Related Reading
16 Creating Startup and Shutdown Files
- Automating System Startup and Shutdown
- Processes That Represent the System Console
- Example Command Files
- CIIN File
- Writing Efficient Startup and Shutdown Command Files
- How Process Persistence Affects Configuration and Startup
- Tips for Startup Files
- Startup File Examples
- Tips for Shutdown Files
- Shutdown File Examples
17 Preventive Maintenance
- When to Use This Section
- Monitoring Physical Facilities
- Cleaning System Components
- Handling and Storing Cartridge Tapes
A Operational Differences Between Systems Running G-Series and H-Series RVUs
B Tools and Utilities for Operations
- When to Use This Appendix
- BACKCOPY
- BACKUP
- Disk Compression Program (DCOM)
- Disk Space Analysis Program (DSAP)
- EMSDIST
- Event Management Service Analyzer (EMSA)
- File Utility Program (FUP)
- Measure
- MEDIACOM
- NonStop NET/MASTER
- NSKCOM and the Kernel-Managed Swap Facility (KMSF)
- OSM Package
- PATHCOM
- PEEK
- RESTORE
- SPOOLCOM
- Subsystem Control Facility (SCF)
- HP Tandem Advanced Command Language (TACL)
- TMFCOM
- Web ViewPoint
- ViewPoint
- ViewSys
C Related Reading
D Converting Numbers
- When to Use This Appendix
- Overview of Numbering Systems
- Binary to Decimal
- Octal to Decimal
- Hexadecimal to Decimal
- Decimal to Binary
- Decimal to Octal
- Decimal to Hexadecimal
Safety and Compliance
Index

Processors and Components: Monitoring and Recovery

HP Integrity NonStop NS-Series Operations Guide—529869-001

When to Use This Section

Use this section to monitor processors and to perform recovery operations such as
processor dumps.

Overview of the NonStop Blade Complex

The basic building block of the modular NonStop advanced architecture (NSAA)
compute engine is the NonStop Blade Complex, which consists of two or three
processor modules called NonStop Blade Elements. Each Blade Element houses two
or four microprocessors called processor elements (PEs). A logical processor consists
of one processor element from each Blade Element. Although a logical processor
physically consists of multiple processor elements, it is convenient to think of a logical
processor as a single entity within the system. Each logical processor has its own
memory and its own copy of the operating system, and it processes a single instruction
stream. NSAA logical processors are usually referred to simply as “processors.”
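
The grouping of processor elements into logical processors can be pictured with a small model. This is only an illustrative sketch: the Blade Element labels ("A", "B", "C"), the `slot` field, and the rule that the PE in the same slot of each Blade Element forms one logical processor are assumptions made for the example, not details given in this guide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessorElement:
    blade_element: str  # hypothetical label such as "A", "B", or "C"
    slot: int           # assumed position of the PE inside its Blade Element


def logical_processors(blade_elements, pes_per_element):
    """Build logical processors, each with one PE from every Blade Element."""
    if len(blade_elements) not in (2, 3):
        raise ValueError("a Blade Complex has two or three Blade Elements")
    if pes_per_element not in (2, 4):
        raise ValueError("a Blade Element houses two or four PEs")
    # Assumption: the PE in slot n of each Blade Element belongs to
    # logical processor n.
    return [
        tuple(ProcessorElement(be, slot) for be in blade_elements)
        for slot in range(pes_per_element)
    ]


triplex = logical_processors(["A", "B", "C"], 4)
print(len(triplex))     # number of logical processors in the complex
print(len(triplex[0]))  # processor elements behind one logical processor
```

In this model a triplex complex with four PEs per Blade Element yields four logical processors, each backed by three processor elements.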

All input and output to and from each NonStop Blade Element goes through a logical
synchronization unit (LSU). The LSU interfaces with the ServerNet fabrics and contains
logic that compares all output operations of a logical processor, ensuring that all
NonStop Blade Elements agree on the result before the data is passed to the
ServerNet fabrics.
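
The comparison step can be sketched as a simple agreement check. The function below is a conceptual model only: the name `lsu_compare`, the dictionary of outputs keyed by Blade Element label, and the use of an exception to signal a mismatch are all assumptions for illustration and do not describe the actual LSU hardware logic.

```python
def lsu_compare(outputs):
    """Return the common output if every Blade Element produced the same value.

    `outputs` maps a hypothetical Blade Element label to the data that
    element's processor element wants to send. A mismatch raises an error
    instead of letting data through.
    """
    if not outputs:
        raise ValueError("no outputs to compare")
    distinct = set(outputs.values())
    if len(distinct) != 1:
        raise RuntimeError(f"output mismatch across Blade Elements: {outputs}")
    return distinct.pop()


# All elements agree, so the data may be passed on.
data = lsu_compare({"A": b"\x01\x02", "B": b"\x01\x02"})
```

The point of the sketch is the ordering: data is released only after the outputs from all participating Blade Elements have been compared and found identical.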

A NonStop Blade Complex with two NonStop Blade Elements is a dual modular redundant
(DMR) NonStop Blade Complex, which is also referred to as a duplex system. This
duplex system provides data integrity and system availability that are comparable to
those of NonStop S-series systems, but at considerably faster processing speeds.

Three NonStop Blade Elements plus their associated LSUs make up the triple modular
redundant (TMR) NonStop Blade Complex, which is referred to as a triplex system.
The triplex system provides the same processing speeds as the duplex system while
also enabling hardware fault recovery that is transparent to all but the lowest level of
the NonStop operating system (OS).
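
This guide does not describe how the triplex configuration achieves transparent recovery. A technique commonly used in triple modular redundant designs in general is majority voting, where two agreeing outputs outvote a third. The sketch below illustrates that generic technique under that assumption; the function name and the Blade Element labels are hypothetical.

```python
from collections import Counter


def tmr_vote(outputs):
    """Majority-vote over three outputs.

    Returns (agreed_value, dissenting_labels). Generic TMR illustration,
    not the documented NonStop mechanism.
    """
    if len(outputs) != 3:
        raise ValueError("TMR voting needs exactly three outputs")
    value, count = Counter(outputs.values()).most_common(1)[0]
    if count < 2:
        raise RuntimeError("no majority: all three outputs differ")
    dissenters = sorted(label for label, v in outputs.items() if v != value)
    return value, dissenters


value, dissenters = tmr_vote({"A": b"ok", "B": b"ok", "C": b"bad"})
```

With three participants, a single faulty element can be both outvoted and identified, which is why a third element allows work to continue without the fault being visible to higher software layers.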

In the event of a processor fault in either a duplex or triplex system, the failed
component within a NonStop Blade Element (processor element, power supply, and so
forth) can be replaced while the system continues to run. A single Integrity NonStop
system can have up to four NonStop Blade Complexes for a total of 16 processors.
Processors communicate with each other and with the system I/O over dual ServerNet
fabrics.
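
The 16-processor maximum follows from the figures given earlier in this section: because each logical processor takes one processor element from every Blade Element, a complex has as many logical processors as each Blade Element has PEs. A short worked check, with constant and function names chosen only for this example:

```python
MAX_COMPLEXES = 4  # from the text: up to four NonStop Blade Complexes


def processor_count(complexes, pes_per_blade_element):
    """Logical processors in a system whose complexes are all the same size."""
    if not 1 <= complexes <= MAX_COMPLEXES:
        raise ValueError("a system has between one and four Blade Complexes")
    if pes_per_blade_element not in (2, 4):
        raise ValueError("a Blade Element houses two or four PEs")
    # One logical processor per PE slot, regardless of duplex or triplex.
    return complexes * pes_per_blade_element


print(processor_count(4, 4))  # 16
```

Note that the count does not depend on whether a complex is duplex or triplex; adding a third Blade Element adds redundancy, not processors.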

A ServerNet fabric is a complex web of links that provide a large number of possible
paths from one point to another. Two communications fabrics, the X and Y ServerNet
fabrics, provide redundant, fault-tolerant communications pathways. If a hardware fault
occurs on one of the ServerNet fabrics, communication continues on the other, with
hardware fault recovery transparent to all but the lowest level of the OS.
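
The dual-fabric behavior can be modeled as choosing whichever fabric is still healthy. This is a conceptual sketch only: the `pick_fabric` function, the `preferred` parameter, and the health dictionary are hypothetical, and the real fabric selection happens in hardware and low-level system software.

```python
def pick_fabric(fabric_up, preferred="X"):
    """Choose a ServerNet fabric to use, falling back to the other on a fault.

    `fabric_up` maps the fabric names "X" and "Y" to True (healthy) or
    False (faulted).
    """
    if set(fabric_up) != {"X", "Y"}:
        raise ValueError("expected health entries for fabrics X and Y")
    other = "Y" if preferred == "X" else "X"
    for name in (preferred, other):
        if fabric_up[name]:
            return name
    raise RuntimeError("both ServerNet fabrics are down")


# A fault on the X fabric leaves traffic flowing on Y.
fabric = pick_fabric({"X": False, "Y": True})
```

Communication is lost only when both fabrics are down at once, which is the case the redundant X and Y design is meant to make unlikely.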

Figure 9-1 is an overview of the modular NSAA and shows one NonStop Blade
Complex with four processors, the I/O hardware, and the ServerNet fabrics.