NonStop NS-Series Planning Guide (H06.03+)

Introduction to Integrity NonStop NS-Series Systems
HP Integrity NonStop NS-Series Planning Guide 529567-004
Processor Synchronization and Rendezvous
Synchronization and rendezvous at the LSUs perform two main functions:
• Keep the individual PEs in a logical processor in loose lock-step through a
  technique called rendezvous. Rendezvous occurs to:
  ° Periodically synchronize the PEs so they execute the same instruction at the
    same time. Synchronization accommodates the slightly different clock speeds
    of the individual PEs.
  ° Allow each PE to individually and deterministically respond to asynchronous
    incoming interrupts and then to respond collectively as a single logical
    processor.
  ° Exchange software state information when performing operations that are
    distributed across PEs; for example, memory reintegration, error handling, and
    memory scrubbing.
• Compare the output from each PE. If the outputs are identical, the output is
  transmitted over the ServerNet fabrics. If the PE outputs differ, appropriate
  actions occur to identify the errant PE and to recover from the failure. Under
  some failure conditions, it can be necessary to stop normal operations of the
  erring PE.
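The output comparison described above can be illustrated with a generic majority-vote sketch. This is not the LSU's actual logic, which this guide does not specify. The function name compare_outputs, the use of byte strings for PE output, and the decision to forward the majority value are all assumptions made for illustration.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def compare_outputs(outputs: Sequence[bytes]) -> Tuple[Optional[bytes], List[int]]:
    """Compare the output of each PE (hypothetical helper).

    Returns (data_to_transmit, suspect_pe_indexes). data_to_transmit is
    None when comparison alone cannot decide which output is correct.
    """
    if len(outputs) < 2:
        raise ValueError("comparison needs output from at least two PEs")

    value, votes = Counter(outputs).most_common(1)[0]

    if votes == len(outputs):
        # All PEs agree: the output can be transmitted.
        return value, []

    if votes > len(outputs) / 2:
        # Assumption: a strict majority identifies the errant PE(s).
        suspects = [i for i, out in enumerate(outputs) if out != value]
        return value, suspects

    # No majority (for example, two PEs that disagree): voting cannot
    # pick a winner, so every PE is a suspect until other checks decide.
    return None, list(range(len(outputs)))
```

With two PEs, a mismatch only shows that one of them is wrong. Identifying which one requires the additional recovery actions that the text mentions, so the sketch returns no data in that case.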
Memory Reintegration
Memory reintegration initiates processing in a PE whose operation has been stopped
because the slice diverged or has been replaced. This reintegration requires that all of
the memory and processor state be copied from a functioning PE to the target PE.
Once the memory and processor state have been copied, rendezvous is used to
complete the reintegration. The entire reintegration operation is invisible to the
running applications.
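The reintegration sequence (copy state from a functioning PE, then rendezvous) can be sketched as follows. The ProcessorElement class, its fields, and the reintegrate function are hypothetical names invented for this illustration. The sketch copies a static snapshot and does not model how the real system handles memory that changes during the copy.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ProcessorElement:
    """Toy model of a PE: a name, some memory, and register state."""
    name: str
    memory: bytearray = field(default_factory=lambda: bytearray(16))
    registers: Dict[str, int] = field(default_factory=dict)
    running: bool = True


def reintegrate(source: ProcessorElement, target: ProcessorElement) -> None:
    """Bring a stopped or replaced PE back into operation (illustrative only)."""
    if not source.running:
        raise RuntimeError("source PE must be functioning")

    target.running = False

    # Step 1: copy all memory and processor state from the functioning PE.
    target.memory = bytearray(source.memory)
    target.registers = dict(source.registers)

    # Step 2: stand-in for rendezvous. Confirm both PEs hold identical
    # state before the target resumes processing.
    if target.memory != source.memory or target.registers != source.registers:
        raise RuntimeError("state mismatch after copy")

    target.running = True


good = ProcessorElement("PE-A", bytearray(b"\x01\x02\x03"), {"pc": 100})
replaced = ProcessorElement("PE-B", running=False)
reintegrate(good, replaced)
```

After the call, the target holds its own copy of the source state and is marked as running again.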
Failure Recovery for Duplex Processor
Duplex processors have no single points of failure. Any single element of a duplex
processor might fail, but alternative paths exist for operation of user applications.
Failure of a complete slice reduces the system to operation on the running slice. The
failure of an LSU might take down the associated logical processor, but in this event,
the operating system activates the backup processes in other logical processors. The
system remains available to the applications as if no failure occurred.
The errant processor is reset and then synchronized with the running one. If the
failure rate exceeds a predetermined threshold value within a given period of time,
the failing processor is reset and held for repair action.
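The threshold policy in the last paragraph can be pictured as a sliding-window failure counter. The following is a minimal sketch under stated assumptions: the class name FailureMonitor, the parameters max_failures and window_seconds, and the returned action strings are hypothetical, and the guide does not give the actual threshold or time period.

```python
from collections import deque
from typing import Deque


class FailureMonitor:
    """Counts failures in a sliding time window (illustrative only)."""

    def __init__(self, max_failures: int, window_seconds: float) -> None:
        self.max_failures = max_failures      # assumed threshold value
        self.window_seconds = window_seconds  # assumed period of time
        self._failures: Deque[float] = deque()

    def record_failure(self, now: float) -> str:
        """Record a failure at time `now` and return the action to take."""
        self._failures.append(now)

        # Forget failures that fall outside the window.
        while self._failures and now - self._failures[0] > self.window_seconds:
            self._failures.popleft()

        if len(self._failures) > self.max_failures:
            # Failure rate exceeded the threshold: reset and hold for repair.
            return "hold_for_repair"

        # Otherwise: reset and synchronize with the running one.
        return "reset_and_resync"


monitor = FailureMonitor(max_failures=2, window_seconds=60.0)
actions = [monitor.record_failure(t) for t in (0.0, 10.0, 20.0)]
```

In this example, the third failure within 60 seconds exceeds the assumed threshold of two, so the last action is to hold the processor for repair.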