Availability Guide for Problem Management
Auditing Systems for Fault Tolerance
Availability Guide for Problem Management–125509
loads the environment stored prior to power off, and processing can continue
automatically.
The system automatically resumes operations within a few minutes after power is
restored. After bringing disks and tapes back to full operating speed, the system recovers
any files protected by the NonStop Transaction Manager/MP (TM/MP) that might have
been compromised, and resumes processing transactions against these files. In some
cases, recovery of files is automatic, while in other cases, operator intervention is
required.
Typical power line disturbances include the following:
• Sags: electrical power dips, often caused by routine circuit-switching at power
utility substations
• Surges: increases in power, sometimes caused by lightning hits
• Spikes: sporadic, rapid, short-lived variations in electrical power
• Outages: interruptions to electrical power, resulting from ac power failure,
accidental power-off, or failure in the hardware power system module
What You Can Do to Enhance Powerfail Protection
You can take several steps to enhance powerfail protection in your system:
• Put your applications through power-failure tests (before installing them, if
possible), to observe performance and to identify which operator tasks, if any, are
necessary to restore the applications.
• Monitor the state of your backup batteries.
• Purchase and install an independent power source (generator).
Monitoring Backup Batteries
The length of time the battery can maintain memory depends on the state of the battery
and the size of the memory to be maintained. It is important to monitor the state of
batteries regularly.
You can monitor the batteries using the TSM physical view.
Configuring Processors for Stress Periods
If a processor fails and you have not sized your system’s memory requirements
correctly, system performance can degrade to the point where processes have
difficulty executing.
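The sizing concern can be illustrated with a rough headroom check. The following Python sketch is not part of any Tandem tool. The function name, the utilization figures, and the assumption that a failed processor's load is spread evenly over the surviving processors are all hypothetical, chosen only to show the reasoning. In a real system, the work moves to whichever processors hold the backup processes, so the actual distribution can be less favorable.

```python
def survives_single_failure(loads, capacity=1.0):
    """For each processor, report whether the others could absorb its load.

    loads maps a processor number to its utilization of some resource
    (memory or processor time) as a fraction of capacity.
    ASSUMPTION: the failed processor's load is split evenly among the
    survivors. This is a simplification made for illustration only.
    """
    results = {}
    for failed, failed_load in loads.items():
        survivors = [load for cpu, load in loads.items() if cpu != failed]
        if not survivors:
            # A single-processor system has nowhere to move the load.
            results[failed] = False
            continue
        share = failed_load / len(survivors)
        results[failed] = all(load + share <= capacity for load in survivors)
    return results


# Hypothetical utilization figures for a four-processor system.
moderate = {0: 0.60, 1: 0.55, 2: 0.70, 3: 0.50}
# Hypothetical two-processor system in which both processors are very busy.
busy = {0: 0.90, 1: 0.90}

print(survives_single_failure(moderate))
print(survives_single_failure(busy))
```

Under these assumptions, the moderately loaded system has enough headroom for any single processor failure, while the busy two-processor system does not, because the surviving processor would need 180 percent of its capacity.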
While the Tandem NonStop Kernel operating system can survive a single processor
failure, a highly available application might not. This can happen if the
processors are so busy that they cannot take on the additional load caused by the
failure, or if the application is not properly coded. For example, suppose all of
the processors in your system are