NonStop NS14000 Planning Guide (H06.10+)

Memory Reintegration
Memory reintegration restarts processing in a PE that was stopped because its NonStop
Blade Element diverged or was replaced. Reintegration requires that all of the memory and
processor state be copied from a functioning PE to the target PE. Once the memory and
processor state is copied, rendezvous is used to complete the reintegration.
This entire reintegration operation is invisible to the running applications.
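The copy-then-rendezvous sequence described above can be illustrated with a minimal Python sketch. It is only a conceptual model, not the NonStop implementation: the ProcessorElement class, its fields, and the reintegrate function are hypothetical names introduced here, and rendezvous is modeled simply as a check that both PEs hold identical state before the target resumes. The sketch also ignores the fact that the source PE keeps running applications during the copy.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessorElement:
    """Hypothetical model of a PE: memory contents plus processor state."""
    name: str
    memory: dict = field(default_factory=dict)
    registers: dict = field(default_factory=dict)
    running: bool = False


def reintegrate(source: ProcessorElement, target: ProcessorElement) -> None:
    """Copy memory and processor state from a functioning PE, then rendezvous."""
    if not source.running:
        raise ValueError("source PE must be functioning")
    # Step 1: copy all memory and processor state to the stopped target PE.
    target.memory = dict(source.memory)
    target.registers = dict(source.registers)
    # Step 2: rendezvous, modeled here (an assumption) as confirming that both
    # PEs hold identical state before the target resumes processing.
    if (target.memory, target.registers) != (source.memory, source.registers):
        raise RuntimeError("state mismatch; rendezvous not reached")
    target.running = True
```

In this model, a stopped PE passed as the target ends up running with the same memory and register contents as the source.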
Failure Recovery for Duplex Processor
Duplex processors have no single point of failure. Any single element of a duplex processor
might fail, but alternative paths remain available to keep user applications running. Failure of a complete
NonStop Blade Element reduces the system to operation on the running NonStop Blade Element.
The failure of an LSU might take down the associated logical processor, but in this event, the
operating system activates the backup processes in other logical processors. The system remains
available to the applications as if no failure occurred.
The errant processor is reset and is then synchronized with the running one. If the failure rate
exceeds a predetermined threshold value within a period of time, the failing processor is reset
and held for repair action.
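The reset-or-hold decision described above can be sketched as a sliding-window failure counter. This is an illustration under stated assumptions: the guide does not give the threshold value or the length of the time period, so max_failures and window_seconds are hypothetical parameters, and the FailureMonitor class and its return strings are invented for this example.

```python
from collections import deque


class FailureMonitor:
    """Tracks recent failures of one processor (hypothetical model)."""

    def __init__(self, max_failures: int, window_seconds: float):
        # Both values are assumptions; the guide only says "predetermined".
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def record_failure(self, now: float) -> str:
        """Record a failure at time `now` and return the recovery action."""
        self._timestamps.append(now)
        # Discard failures that fall outside the observation window.
        while self._timestamps and now - self._timestamps[0] > self.window_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) > self.max_failures:
            # Failure rate exceeded the threshold: reset and hold for repair.
            return "reset_and_hold"
        # Otherwise reset the processor and synchronize it with the running one.
        return "reset_and_resync"
```

With a limit of two failures per 60 seconds, a third failure inside the window returns "reset_and_hold", while failures spaced further apart than the window each return "reset_and_resync".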
Failure Recovery for Triplex Processor
In triplex processors, each LSU has inputs from the three processor elements within a logical
processor. As with the duplex processor, the LSU keeps the three PEs in loose lockstep. The LSU
also checks the outputs from the three PEs. If the output from one of the PEs differs from that of the
other two, the errant result is ignored, and the result from the other two PEs is sent to the
ServerNet fabrics. Reintegration works the same as in the duplex processor. The number of PEs
in a reintegration depends on the conditions of the failure and the configuration of the hardware.
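The output check that the LSU performs in a triplex processor is a two-out-of-three majority vote. The sketch below shows that idea in Python; it is a conceptual model only, since the LSU does this in hardware. The vote function, its return shape, and the error raised when no two outputs agree are assumptions made for illustration (the guide does not describe the case where all three outputs differ).

```python
from collections import Counter
from typing import Hashable, Optional, Sequence, Tuple


def vote(outputs: Sequence[Hashable]) -> Tuple[Hashable, Optional[int]]:
    """Return (agreed_output, index of the errant PE or None) for three PEs."""
    if len(outputs) != 3:
        raise ValueError("a triplex vote needs exactly three outputs")
    # Find the most common output and how many PEs produced it.
    value, count = Counter(outputs).most_common(1)[0]
    if count < 2:
        # Hypothetical handling: the guide does not cover a three-way split.
        raise RuntimeError("no two PEs agree")
    # The errant PE, if any, is the one whose output differs from the majority.
    errant = next((i for i, out in enumerate(outputs) if out != value), None)
    return value, errant
```

The agreed value is what would be forwarded to the ServerNet fabrics, and the errant index identifies the PE that becomes a candidate for reset and reintegration.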
The failure of a NonStop Blade Element in a triplex processor reduces processor operation to
duplex. When the failing unit is replaced, the reintegration function restores the system to triplex
operation. If failure of an LSU takes down its associated logical processor, the operating system
activates the backup processes in other logical processors. The system runs user applications as
if no failure occurred.
As with a duplex processor, the errant processor is reset, and is then synchronized with the
running processors. If the failure rate exceeds a predetermined threshold value within a period
of time, the failing processor is reset and held for repair action.
ServerNet Fabric I/O
This subsection provides information about the ServerNet network in an Integrity NonStop
NS14000 system and covers these topics:
“Overview of the ServerNet Fabric” (page 41)
“Simplified ServerNet System Diagram” (page 41)
“ServerNet Pathways in the VIO Enclosure” (page 42)
“Example of ServerNet Pathways” (page 43)
For further information on the ServerNet network, protocols, IP addresses, and naming
conventions, see the Introduction to Networking for Integrity NonStop NS-Series Servers.