NonStop NS-Series Operations Guide (H06.03+)
Table Of Contents
- What’s New in This Manual
- About This Guide
- 1 Introduction to Integrity NonStop NS-Series Operations
- When to Use This Section
- Understanding the Operational Environment
- What Are the Operator Tasks?
- Monitoring the System and Performing Recovery Operations
- Preparing for and Recovering from Power Failures
- Stopping and Powering Off the System
- Powering On and Starting the System
- Creating Startup and Shutdown Files
- Performing Preventive Maintenance
- Operating Disk Drives and Tape Drives
- Responding to Spooler Problems
- Updating Firmware
- Determining the Cause of a Problem: A Systematic Approach
- Logging On to an Integrity NonStop Server
- Service Procedures
- 2 Determining Your System Configuration
- 3 Overview of Monitoring and Recovery
- 4 Monitoring EMS Event Messages
- 5 Processes: Monitoring and Recovery
- 6 Communications Subsystems: Monitoring and Recovery
- 7 ServerNet Resources: Monitoring and Recovery
- 8 I/O Adapters and Modules: Monitoring and Recovery
- 9 Processors and Components: Monitoring and Recovery
- When to Use This Section
- Overview of the NonStop Blade Complex
- Monitoring and Maintaining Processors
- Identifying Processor Problems
- Recovery Operations for Processors
- Recovery Operations for a Processor Halt
- Halting One or More Processors
- Reloading a Single Processor on a Running Server
- Recovery Operations for a System Hang
- Enabling/Disabling Processor and System Freeze
- Freezing the System and Freeze-Enabled Processors
- Dumping a Processor to Disk
- Backing Up a Processor Dump to Tape
- Replacing Processor Memory
- Replacing the Processor Board and Processor Entity
- Submitting Information to Your Service Provider
- Related Reading
- 10 Disk Drives: Monitoring and Recovery
- 11 Tape Drives: Monitoring and Recovery
- 12 Printers and Terminals: Monitoring and Recovery
- 13 Applications: Monitoring and Recovery
- 14 Power Failures: Preparation and Recovery
- 15 Starting and Stopping the System
- When to Use This Section
- Powering On a System
- Starting a System
- Minimizing the Frequency of Planned Outages
- Stopping Applications, Devices, and Processes
- Stopping the System
- Powering Off a System
- Troubleshooting and Recovery Operations
- Fans Are Not Turning
- System Does Not Appear to Be Powered On
- Green LED Is Not Lit After POSTs Finish
- Amber LED on a Component Remains Lit After the POST Finishes
- Components Fail When Testing the Power
- Recovering From a System Load Failure
- Getting a Corrupt System Configuration File Analyzed
- Recovering From a Reload Failure
- Exiting the OSM Low-Level Link
- Opening Startup Event Stream and Startup TACL Windows
- Related Reading
- 16 Creating Startup and Shutdown Files
- Automating System Startup and Shutdown
- Processes That Represent the System Console
- Example Command Files
- CIIN File
- Writing Efficient Startup and Shutdown Command Files
- How Process Persistence Affects Configuration and Startup
- Tips for Startup Files
- Startup File Examples
- Tips for Shutdown Files
- Shutdown File Examples
- 17 Preventive Maintenance
- A Operational Differences Between Systems Running G-Series and H-Series RVUs
- B Tools and Utilities for Operations
- When to Use This Appendix
- BACKCOPY
- BACKUP
- Disk Compression Program (DCOM)
- Disk Space Analysis Program (DSAP)
- EMSDIST
- Event Management Service Analyzer (EMSA)
- File Utility Program (FUP)
- Measure
- MEDIACOM
- NonStop NET/MASTER
- NSKCOM and the Kernel-Managed Swap Facility (KMSF)
- OSM Package
- PATHCOM
- PEEK
- RESTORE
- SPOOLCOM
- Subsystem Control Facility (SCF)
- HP Tandem Advanced Command Language (TACL)
- TMFCOM
- Web ViewPoint
- ViewPoint
- ViewSys
- C Related Reading
- D Converting Numbers
- Safety and Compliance
- Index

Overview of Monitoring and Recovery
HP Integrity NonStop NS-Series Operations Guide—529869-001
Suppressing Problems and Alarms
In certain cases, you might want to acknowledge or suppress a known problem so that
it does not propagate all the way up to the system level. That makes it easier to
identify other problems that might occur. For more information on OSM problem
management features, such as deleting or suppressing alarms and suppressing problem
attributes, see the OSM User’s Guide (also available as online help within the OSM
Service Connection).
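The effect of suppression can be pictured with a small model. The following Python sketch is illustrative only: it is not OSM code and does not use any OSM interface. The Resource class, its fields, and the resource names are hypothetical, and the model assumes that a problem is visible at a given level whenever an unsuppressed problem exists on that resource or on anything below it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Resource:
    """A node in a hypothetical resource tree (system -> enclosure -> component)."""
    name: str
    has_problem: bool = False   # a problem is reported on this resource itself
    suppressed: bool = False    # the operator has suppressed that problem
    children: List["Resource"] = field(default_factory=list)

    def shows_problem(self) -> bool:
        """Return True if an unsuppressed problem exists here or anywhere below."""
        own = self.has_problem and not self.suppressed
        return own or any(child.shows_problem() for child in self.children)


# Hypothetical resources: one disk with a known problem, one healthy fan.
disk = Resource("disk-1", has_problem=True)
fan = Resource("fan-1")
enclosure = Resource("enclosure-1", children=[disk, fan])
system = Resource("system", children=[enclosure])

before = system.shows_problem()          # known disk problem reaches the system level

disk.suppressed = True
after_suppress = system.shows_problem()  # system level no longer shows the known problem

fan.has_problem = True
after_new = system.shows_problem()       # a new, unsuppressed problem stands out again
```

In this model, suppressing the disk problem clears the indication at the system level, so the later fan problem is the only thing that raises it again. That is the practical reason for suppressing a problem that is already known.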
Recovery Operations for Problems Detected by OSM
Recovery operations depend on the particular problem. Methods of determining the
appropriate recovery action include:
• Alarm Details, available for each alarm displayed in OSM, provide suggested repair
actions.
• The values displayed by problem attributes in OSM often provide clues to recovery.
• EMS events, retrieved and viewed in the OSM Event Viewer, include cause, effect,
and recovery information in the event details.
• The section in this guide that covers the system resource (for example,
Section 11, Tape Drives: Monitoring and Recovery) provides information on using
SCF and other tools to determine the cause of a problem. After determining the
cause, follow the directions in the Recovery Operations subsection of that section.
Replacing a system component that has malfunctioned is beyond the scope of this
guide. For more information, contact your service provider, or refer to the CSSI Web.
Monitoring Problem Incident Reports
The OSM Notification Director generates problem incident reports when changes occur
that could directly affect the availability of resources on your Integrity NonStop server.
The Incident Report List tab on the Notification Director dialog box allows you to view,
sort, authorize, and reject incident reports. The Notification Director allows you to
forward notifications to your service provider if your system is configured for remote
dial-out.
Using SCF to Monitor the System
Use the Subsystem Control Facility (SCF) to display information and current status for
all the devices on your system known to SCF. Some SCF commands are available
only for certain subsystems. The objects that each command affects and the attributes of
those objects are subsystem specific. This subsystem-specific information appears in a
separate manual for each subsystem. A partial list of these manuals appears in
Appendix C, Related Reading.










