NonStop Systems Introduction
NonStop Server Architecture
NonStop Systems Introduction—527825-001
Multiple Power Sources and Online Repair
One threat to continuous system operation lies outside the system itself: the danger of 
a power failure. No system is immune to a total power failure, but the NonStop server 
contains a number of mechanisms to minimize the effects of power failures. These 
mechanisms have the added advantage of enabling the operations staff or HP service 
personnel to take individual hardware components out of operation for repair without 
shutting down the whole system.
Each processor has its own power supply and can be brought up and shut down 
independently of all the other processors so that individual repairs can be performed. 
The ability to remove and repair an individual component while the rest of the system 
continues to operate is known as online repair.
As in the case of processors, the I/O board containing the logic for ServerNet 
addressable controllers can be individually powered up or down to allow it to be 
replaced while the system continues to operate.
The power supply for each processor includes a battery backup system. When AC power 
is lost, the battery provides a ride-through feature in addition to the commonly 
implemented power-fail interrupt and memory maintenance function. 
The ride-through feature (or power-fail delay) permits the processor to continue 
operating for about 20 to 30 seconds without AC power. If the power outage lasts 
longer than the ride-through time, the usual power-fail interrupt occurs to protect 
the contents of memory. The battery can maintain the contents of main memory for up 
to several hours, depending on the size of memory.
In the case of a full shutdown following a power failure, assuming that power is 
restored while the batteries are still maintaining the memory contents, the system 
automatically resumes operation within minutes following restoration of power. After 
bringing disks and tapes back to full operating speed, the system recovers any files 
that might have been compromised and resumes processing transactions against 
these files. If the power outage lasts longer than the batteries can maintain 
the contents of memory, operator intervention is required, possibly with an 
alternate AC power source.
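The three outcomes described above (ride-through, automatic resumption, and operator intervention) can be summarized in a short sketch. This is an illustration only, not NonStop code: the function name, the enum, and the default thresholds are assumptions. The 20-second default is the low end of the "about 20 to 30 seconds" figure, the two-hour memory hold time is a hypothetical stand-in for "up to several hours," and the hold time is assumed to be measured from the start of the outage.

```python
from enum import Enum


class Outcome(Enum):
    RIDE_THROUGH = "processor keeps running on battery"
    AUTO_RESUME = "power-fail interrupt; memory preserved; automatic resume"
    OPERATOR_INTERVENTION = "memory contents lost; operator must intervene"


def classify_outage(outage_s, ride_through_s=20.0, memory_hold_s=2 * 3600.0):
    """Classify an AC power outage by its duration in seconds.

    ride_through_s and memory_hold_s are hypothetical defaults; the real
    values depend on the hardware and on the size of memory.
    """
    if outage_s < 0:
        raise ValueError("outage duration cannot be negative")
    if outage_s <= ride_through_s:
        # Short outage: the processor never stops operating.
        return Outcome.RIDE_THROUGH
    if outage_s <= memory_hold_s:
        # Power-fail interrupt occurred, but the battery kept memory intact,
        # so the system can resume on its own once power returns.
        return Outcome.AUTO_RESUME
    # The battery could not preserve memory for the whole outage.
    return Outcome.OPERATOR_INTERVENTION
```

With these assumed defaults, a 5-second outage is classified as RIDE_THROUGH, a 10-minute outage as AUTO_RESUME, and a 10-hour outage as OPERATOR_INTERVENTION.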
Detection and Correction of Hardware Errors
As explained in Processor Checking on page 6-12, the operating system running in 
each processor in the NonStop server checks the status of all other processors in the 
system by sending periodic messages, called “I’m alive” messages, to each processor. 
In addition, the processors themselves perform extensive self-checking. When an 
error occurs, the processor either reports it to the operating system or takes itself out of 
service.
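The idea behind "I'm alive" messages can be sketched as a simple timeout check. The class below is a hypothetical illustration, not the operating system's implementation: the class name, the method names, and the timeout value are assumptions, and this section does not state the actual message interval or what happens when a message is missed.

```python
class AliveMonitor:
    """Tracks when each peer processor last sent an "I'm alive" message."""

    def __init__(self, peers, timeout_s):
        # timeout_s is a hypothetical threshold chosen for illustration.
        self.timeout_s = timeout_s
        # Every peer starts with a last-seen time of 0.0.
        self.last_seen = {peer: 0.0 for peer in peers}

    def record_alive(self, peer, now):
        """Note that a message from `peer` arrived at time `now` (seconds)."""
        self.last_seen[peer] = now

    def suspects(self, now):
        """Return the peers whose last message is older than the timeout."""
        return sorted(
            peer
            for peer, seen in self.last_seen.items()
            if now - seen > self.timeout_s
        )


monitor = AliveMonitor(peers=[1, 2, 3], timeout_s=2.0)
monitor.record_alive(1, now=1.0)
monitor.record_alive(2, now=1.5)
monitor.record_alive(1, now=3.0)
print(monitor.suspects(now=4.0))  # prints [2, 3]
```

In this run, processor 1 reported recently enough to be considered healthy, while processors 2 and 3 have been silent for longer than the assumed timeout.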
In some instances, processors are able to correct errors and continue running rather 
than halt. For example, if an error occurs in main memory, the processor detects and, 
if possible, corrects the error using an error-correcting code (ECC). Whenever a word 
of main memory gets a correctable error, the processor detects it, uses the ECC 
information to derive the correct data, and rewrites the word.
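To show how ECC information can locate and repair a single flipped bit, here is a minimal sketch using the textbook Hamming(7,4) code. This is an assumption made for illustration: the source does not say which code NonStop memory uses, and real memory systems generally protect much wider words. The function names are hypothetical.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits as a 7-bit codeword laid out p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # parity over codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_correct(word):
    """Return (corrected_word, error_position); position 0 means no error."""
    w = list(word)
    s1 = w[0] ^ w[2] ^ w[4] ^ w[6]
    s2 = w[1] ^ w[2] ^ w[5] ^ w[6]
    s3 = w[3] ^ w[4] ^ w[5] ^ w[6]
    # The syndrome, read as a binary number, is the 1-based error position.
    position = s1 + 2 * s2 + 4 * s3
    if position:
        w[position - 1] ^= 1  # flip the faulty bit back ("rewrite the word")
    return w, position


def hamming74_data(word):
    """Extract the 4 data bits from a 7-bit codeword."""
    return [word[2], word[4], word[5], word[6]]


stored = hamming74_encode([1, 0, 1, 1])
damaged = list(stored)
damaged[4] ^= 1  # simulate a single-bit memory error at position 5
repaired, position = hamming74_correct(damaged)
print(position, hamming74_data(repaired))  # prints 5 [1, 0, 1, 1]
```

The sketch corrects any single-bit error in the 7-bit word. An error affecting two bits would be miscorrected by this particular code, which is one reason an error is described above as corrected only "if possible."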










