NonStop NS16000 Planning Guide (H06.08+)
Integrity NonStop NS16000 System Description
HP Integrity NonStop NS16000 Planning Guide—529567-009
Processor Synchronization and Rendezvous
Synchronization and rendezvous at the LSUs perform two main functions:
• Keep the individual PEs in a logical processor in loose lock-step through a
  technique called rendezvous. Rendezvous occurs to:
  ° Periodically synchronize the PEs so they execute the same instruction at the
    same time. Synchronization accommodates the slightly different clock speeds
    of the individual PEs.
  ° Allow each PE to individually and deterministically respond to asynchronous
    incoming interrupts and then to respond collectively as a single logical
    processor.
  ° Exchange software state information when performing operations that are
    distributed across PEs; for example, memory reintegration, error handling, and
    memory scrubbing.
• Compare output from each PE. If the outputs are identical, the output is
  transmitted over the ServerNet fabrics. If the PE outputs are not the same,
  appropriate actions occur to identify the errant PE and to recover from the
  failure. Under some failure conditions, it can be necessary to stop normal
  operations of the erring PE.
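The output comparison described above can be pictured with a small sketch. This is an illustration only, not the LSU logic: the function name compare_outputs, the PE names, and the strict-majority rule are assumptions made for the example. The guide does not say how the errant PE is identified, so the sketch simply reports that no decision is possible when two PEs disagree.

```python
from collections import Counter


def compare_outputs(outputs):
    """Compare the pending output of each PE (hypothetical helper).

    outputs maps a PE name to the bytes that PE wants to transmit.
    Returns (agreed, suspects): agreed is the payload to transmit, or None
    when no strict majority exists; suspects lists the PEs that disagree.
    """
    if not outputs:
        raise ValueError("at least one PE output is required")
    counts = Counter(outputs.values())
    payload, votes = counts.most_common(1)[0]
    if votes == len(outputs):
        # All PEs agree: the output can be transmitted.
        return payload, []
    if votes * 2 > len(outputs):
        # Assumption: a strict majority outvotes the minority PEs.
        suspects = sorted(pe for pe, out in outputs.items() if out != payload)
        return payload, suspects
    # No majority (for example, two PEs that disagree): every PE is a suspect
    # and further diagnosis would be needed before anything is transmitted.
    return None, sorted(outputs)
```

With two PEs, a mismatch yields no agreed payload, which is why the text says additional actions are needed to identify the errant one.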
Memory Reintegration
Memory reintegration restarts processing in a PE whose operation was stopped because
the NonStop Blade Element diverged or was replaced. Reintegration requires that all
memory contents and processor state be copied from a functioning PE to the target PE.
Once the memory and processor state data has been copied, rendezvous is used to
complete the reintegration. The entire operation is invisible to the running
applications.
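As a rough model of the sequence above (copy memory and processor state, then complete through rendezvous), consider the following sketch. The PE class, its fields, and the reintegrate function are hypothetical names introduced for illustration. The final equality check is only a stand-in for rendezvous and does not represent the real mechanism.

```python
from dataclasses import dataclass, field


@dataclass
class PE:
    """Toy model of a processor element (hypothetical, for illustration)."""
    memory: bytearray
    state: dict = field(default_factory=dict)
    running: bool = False


def reintegrate(source, target):
    """Copy memory and processor state from source to target, then resume."""
    if not source.running:
        raise ValueError("source PE must be functioning")
    # Copy rather than alias, so each PE keeps its own memory and state.
    target.memory = bytearray(source.memory)
    target.state = dict(source.state)
    # Stand-in for rendezvous: the target resumes only if it matches the source.
    if target.memory != source.memory or target.state != source.state:
        return False
    target.running = True
    return True
```

The sketch copies everything in one step; it makes no attempt to model a source PE that keeps modifying memory while the copy is in progress.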
Failure Recovery for Duplex Processor
Duplex processors have no single point of failure. Any single element of a duplex
processor might fail, but alternative paths exist so that user applications continue
to run. Failure of a complete NonStop Blade Element reduces the system to operation on
the remaining NonStop Blade Element. The failure of an LSU might take down the
associated logical processor, but in this event the operating system activates the
backup processes in other logical processors. The system remains available to the
applications as if no failure had occurred.
The errant processor is reset and then synchronized with the running one. If the
failure rate exceeds a predetermined threshold within a given period of time, the
failing processor is reset and held for repair action.
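The threshold rule in the last paragraph can be sketched as a sliding-window counter. The class name FailureMonitor, the returned action strings, and the sample threshold and window values are assumptions for illustration; the guide does not state the actual values or how the system tracks failures.

```python
from collections import deque


class FailureMonitor:
    """Count failures inside a sliding time window (hypothetical sketch)."""

    def __init__(self, max_failures, window_seconds):
        self.max_failures = max_failures      # assumed threshold value
        self.window_seconds = window_seconds  # assumed period of time
        self._times = deque()

    def record_failure(self, now):
        """Record a failure at time now (seconds) and return the action."""
        self._times.append(now)
        # Discard failures that are older than the window.
        while self._times and now - self._times[0] > self.window_seconds:
            self._times.popleft()
        if len(self._times) > self.max_failures:
            return "hold_for_repair"
        return "reset_and_resync"
```

For example, with max_failures=2 and window_seconds=60, failures at 0, 10, and 20 seconds return "reset_and_resync" twice and then "hold_for_repair".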