NonStop Systems Introduction for H-Series RVUs (540083-001)

Integrity NonStop NS-Series Server Architecture
Multiple Power Sources and Online Repair
One threat to continuous system operation lies outside the system itself: the danger of
a power failure. No system is immune to a total power failure, but the Integrity NonStop
server contains a number of mechanisms to minimize the effects of power failures.
These mechanisms have the added advantage of enabling the operations staff or HP
service personnel to take individual hardware components out of operation for repair
without shutting down the whole system.
Each cabinet is supplied by dual redundant AC power feeds, and both feeds are distributed to
every enclosure in the system. If one AC power feed fails, the system continues to
operate on the other feed.
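The following Python sketch illustrates the dual-feed behavior described above. It is not HP code. The `Enclosure` class, the feed labels "A" and "B", and the enclosure names are hypothetical. The probability helper also assumes that the two feeds fail independently, which holds only if they come from genuinely separate sources.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Enclosure:
    """Hypothetical model: an enclosure fed by both AC feeds of its cabinet."""
    name: str
    feeds: tuple[str, str] = ("A", "B")

    def has_power(self, live_feeds: set[str]) -> bool:
        # The enclosure stays up as long as at least one of its feeds is live.
        return any(feed in live_feeds for feed in self.feeds)


def both_feeds_down_probability(q_a: float, q_b: float) -> float:
    """Chance that both feeds are down at once, ASSUMING independent failures."""
    return q_a * q_b


enclosures = [Enclosure("blade-element-1"), Enclosure("ioam-1")]  # hypothetical names
print(all(e.has_power({"B"}) for e in enclosures))   # True: feed A failed, B carries the load
print(any(e.has_power(set()) for e in enclosures))   # False: total power failure
print(f"{both_feeds_down_probability(0.001, 0.001):.0e}")  # 1e-06
```

With a hypothetical 0.1% unavailability per feed, independent feeds would both be down only about one millionth of the time. This is why the text treats a *total* power failure as the remaining threat.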
Each NonStop Blade Element has its own redundant power supplies and can be
brought up and shut down independently of all the other processors so that individual
repairs can be performed. The ability to remove and repair an individual component
while the rest of the system continues to operate is known as online repair.
As with the processors, the IOAM and ServerNet hardware is fully redundant, so only the
failed component is powered down for replacement while the rest of the system continues
to operate.
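Here is a minimal Python sketch of the online-repair rule in the paragraphs above: a component can be powered down for repair only while a redundant partner keeps its function alive. It is an illustration, not NonStop software. `RedundantGroup`, `system_operational`, and the member names (the "X"/"Y" fabric labels and the IOAM part names) are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RedundantGroup:
    """Hypothetical model: interchangeable components that back each other up."""
    name: str
    members: set[str]
    powered_down: set[str] = field(default_factory=set)

    def live_members(self) -> set[str]:
        return self.members - self.powered_down

    def power_down_for_repair(self, member: str) -> None:
        # Online repair: refuse to power down the last live member of a group,
        # because that would stop the function instead of isolating one part.
        if member not in self.members:
            raise KeyError(member)
        if not (self.live_members() - {member}):
            raise RuntimeError(f"powering down {member} would take {self.name} offline")
        self.powered_down.add(member)

    def return_to_service(self, member: str) -> None:
        # After replacement, the repaired component rejoins the group.
        self.powered_down.discard(member)


def system_operational(groups: list[RedundantGroup]) -> bool:
    # The system keeps running while every redundant group has a live member.
    return all(group.live_members() for group in groups)


fabrics = RedundantGroup("ServerNet fabrics", {"X", "Y"})
ioam_parts = RedundantGroup("IOAM component pair", {"part-1", "part-2"})
groups = [fabrics, ioam_parts]

fabrics.power_down_for_repair("X")   # replace a failed X-fabric component
print(system_operational(groups))    # True: the Y fabric is still live
try:
    fabrics.power_down_for_repair("Y")
except RuntimeError as err:
    print(err)  # refused: Y is the last live fabric
fabrics.return_to_service("X")
```

The guard in `power_down_for_repair` captures the point of the text: redundancy is what makes it safe to power down one component while the system keeps running.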
A site uninterruptible power supply (UPS) or motor generator of sufficient capacity can
ensure continuous system uptime during power failures. If a site does not have a
UPS, an optional UPS and extended runtime module (ERM) can be installed with each
cabinet in the system to provide a ride-through power backup when a momentary
loss of AC power occurs. If the power outage lasts longer than the ride-through time
and only a UPS is available (the site does not have a motor generator), an orderly
shutdown of the system might be necessary.
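The power-outage guidance above reduces to a small decision rule, sketched here in Python under stated assumptions. `outage_action` and `PowerAction` are hypothetical names, and a UPS (site or cabinet, with any ERMs) is assumed to be present. The ride-through time is a hypothetical input, because the real value depends on the UPS and ERM configuration and on the load.

```python
from enum import Enum


class PowerAction(Enum):
    CONTINUE = "continue operating"
    ORDERLY_SHUTDOWN = "orderly shutdown might be necessary"


def outage_action(outage_s: float, ride_through_s: float,
                  has_motor_generator: bool) -> PowerAction:
    """Hypothetical rule paraphrasing the UPS/ERM guidance; assumes a UPS is installed."""
    if has_motor_generator:
        # A motor generator of sufficient capacity keeps the system up.
        return PowerAction.CONTINUE
    if outage_s <= ride_through_s:
        # A UPS alone covers outages no longer than its ride-through time.
        return PowerAction.CONTINUE
    # Only a UPS is available and the outage outlasts it.
    return PowerAction.ORDERLY_SHUTDOWN


# Hypothetical numbers: a 300-second ride-through time.
print(outage_action(2, 300, False).value)     # continue operating
print(outage_action(1800, 300, False).value)  # orderly shutdown might be necessary
print(outage_action(1800, 300, True).value)   # continue operating
```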
Detection and Correction of Hardware Errors
Processes called synchronization and rendezvous, carried out at the LSUs, perform two
main functions:
• To keep the individual processor elements (PEs) in a logical processor in loose
  lockstep through a technique called rendezvous. Rendezvous occurs to:
  ° Periodically synchronize the PEs so that they execute the same instruction at the
    same time. Synchronization accommodates the slightly different clock speeds of
    the individual PEs.
  ° Allow each PE to respond individually and deterministically to asynchronous
    incoming interrupts, and then to respond together as a logical processor.
  ° Exchange software state information when performing operations that are
    distributed across PEs; for example, memory reintegration, error handling, and
    memory scrubbing.
• To compare the output from each PE. If the outputs are identical, the output is
  transmitted over the ServerNet fabrics. If the PE outputs are not the same,
  appropriate actions occur to