NonStop Systems Introduction for H-Series RVUs (540083-001)

Integrity NonStop NS-Series Server Architecture
recover from the failure. Under some failure conditions, it can be necessary to stop
normal operations of a PE.
Reintegration is the process used to restart processing in a PE that has been stopped
because of either a failure or a service action. Reintegration requires that all of the
memory and processor state be copied from a functioning NonStop Blade Element to the
target NonStop Blade Element. Once the memory and processor state have been copied,
rendezvous is used to complete the reintegration. The entire reintegration operation is
transparent to running applications.
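The copy-then-rendezvous sequence can be sketched as a toy Python model. Everything in it is hypothetical: `BladeElement`, `reintegrate`, and `rendezvous` are illustrative names, state is reduced to two dictionaries, the copy happens in one step (ignoring how the real system keeps applications running while state is copied), and rendezvous is modeled only as a check that every Blade Element holds identical state before the PE resumes.

```python
from dataclasses import dataclass, field


@dataclass
class BladeElement:
    """Hypothetical stand-in for one NonStop Blade Element's part of a PE."""
    name: str
    memory: dict[int, int] = field(default_factory=dict)     # address -> value
    registers: dict[str, int] = field(default_factory=dict)  # processor state
    running: bool = True


def rendezvous(elements: list[BladeElement]) -> None:
    """Toy rendezvous: resume the PE only if every element holds identical state."""
    ref = elements[0]
    for e in elements[1:]:
        if e.memory != ref.memory or e.registers != ref.registers:
            raise RuntimeError(f"{e.name} diverges from {ref.name}; cannot resume")
    for e in elements:
        e.running = True


def reintegrate(source: BladeElement, target: BladeElement,
                elements: list[BladeElement]) -> None:
    """Copy all memory and processor state from a functioning element, then rendezvous."""
    if not source.running:
        raise ValueError("source Blade Element must be functioning")
    target.running = False                     # target stays stopped during the copy
    target.memory = dict(source.memory)        # copy memory state
    target.registers = dict(source.registers)  # copy processor state
    rendezvous(elements)                       # complete the reintegration
```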
As explained in Processor Checking on page 6-14, the operating system running in
each logical processor in the Integrity NonStop server monitors the status of all other
processors in the system. Each processor periodically sends messages, called “I’m
alive” messages, to every other processor. In addition, the processors perform
extensive self-checking. When a processor detects an error, it either reports the error
to the operating system or takes itself out of service.
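A minimal Python sketch of the “I’m alive” check follows, under stated assumptions: `AliveMonitor` and `broadcast_alive` are hypothetical names, time is passed in explicitly rather than read from a clock, and the `timeout` threshold is an illustrative value, not the interval the operating system actually uses. Each processor keeps its own monitor and declares a peer down once that peer has been silent for longer than the timeout.

```python
class AliveMonitor:
    """One processor's view of whether its peers still send "I'm alive" messages."""

    def __init__(self, me: int, peers: set[int], timeout: float, now: float) -> None:
        self.me = me
        self.timeout = timeout  # hypothetical threshold, in seconds
        self.last_heard = {p: now for p in peers if p != me}
        self.down: set[int] = set()

    def on_alive(self, sender: int, now: float) -> None:
        """Record an "I'm alive" message from a peer not already declared down."""
        if sender in self.last_heard and sender not in self.down:
            self.last_heard[sender] = now

    def check(self, now: float) -> set[int]:
        """Declare down every peer silent longer than timeout; return the newly down ones."""
        newly_down = {
            p for p, t in self.last_heard.items()
            if p not in self.down and now - t > self.timeout
        }
        self.down |= newly_down
        return newly_down


def broadcast_alive(monitors: dict[int, AliveMonitor], sender: int, now: float) -> None:
    """Deliver one round of "I'm alive" messages from sender to every other processor."""
    for cpu, mon in monitors.items():
        if cpu != sender:
            mon.on_alive(sender, now)
```

Passing `now` explicitly keeps the sketch deterministic: a test can replay a sequence of message rounds and then ask each monitor which peers it considers down.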
In some instances, processors can correct errors and continue running rather than
halt. For example, main memory is protected by an error-correcting code (ECC).
Whenever a word of main memory has a correctable error, the processor detects it,
uses the ECC information to derive the correct data, and rewrites the corrected word
to memory.
However, if a word of main memory has an uncorrectable error, the operating system
immediately halts the processor. The remaining processors notice the absence of its
“I’m alive” messages, declare the processor down, and take over its work.
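The difference between correctable and uncorrectable memory errors can be illustrated with a small single-error-correcting, double-error-detecting (SEC-DED) code. The sketch below uses an extended Hamming(8,4) code over 4-bit words purely as an illustration; the actual ECC scheme and word width used by Integrity NonStop memory are not described here. A single flipped bit is corrected and the word is rewritten; two flipped bits are detected but cannot be corrected, and the hypothetical `UncorrectableMemoryError` stands in for the processor halt.

```python
class UncorrectableMemoryError(Exception):
    """Stands in for the halt that an uncorrectable memory error triggers."""


def encode(nibble: int) -> list[int]:
    """Encode 4 data bits as 8: index 0 is overall parity, 1..7 are Hamming positions."""
    code = [0] * 8
    code[3], code[5], code[6], code[7] = [(nibble >> i) & 1 for i in range(4)]
    code[1] = code[3] ^ code[5] ^ code[7]  # covers positions with bit 0 set
    code[2] = code[3] ^ code[6] ^ code[7]  # covers positions with bit 1 set
    code[4] = code[5] ^ code[6] ^ code[7]  # covers positions with bit 2 set
    code[0] = sum(code[1:]) % 2            # overall parity over positions 1..7
    return code


def data_bits(code: list[int]) -> int:
    """Extract the 4 data bits from a codeword."""
    return code[3] | code[5] << 1 | code[6] << 2 | code[7] << 3


def read_word(memory: list[list[int]], addr: int) -> int:
    """Read one word, correcting and rewriting single-bit errors in place."""
    code = list(memory[addr])
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos  # XOR of set-bit positions is 0 for a valid codeword
    parity_error = sum(code) % 2 == 1
    if syndrome and not parity_error:
        # Two bits flipped: the error is detected but cannot be corrected.
        raise UncorrectableMemoryError(f"uncorrectable error at word {addr}")
    if parity_error:
        code[syndrome] ^= 1   # syndrome 0 means the overall parity bit itself flipped
        memory[addr] = code   # rewrite the corrected word
    return data_bits(code)
```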
ServerNet Clustering for Availability and Performance
You have seen how the processors of an Integrity NonStop server communicate with
each other over dual ServerNet fabrics. The fabrics provide a fast, efficient, and
reliable way for the processors to exchange messages.
ServerNet technology can also be used to connect servers in groups called ServerNet
clusters. A ServerNet cluster extends the ServerNet X and Y fabrics beyond the system
boundary so that ServerNet can also carry messages between systems. A ServerNet
cluster consists of individual servers, each containing internal X and Y fabrics, that are
connected to one another through fiber-optic cables and NonStop cluster switches.
Together, the fiber-optic cables and NonStop cluster switches form the external
ServerNet X and Y fabrics. Figure 7-8 on page 7-14 shows how a ServerNet cluster
extends the X and Y fabrics to multiple systems.
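The way a cluster extends each fabric can be sketched as a reachability check. This is a simplified, hypothetical model: `Server`, `ExternalFabric`, and `usable_fabrics` are illustrative names, each external fabric is reduced to a single cluster switch plus a set of working fiber-optic links, and the model assumes, as with the internal fabrics, that a message between two servers can travel on either the X or the Y fabric. A path on a fabric exists only if both servers' internal fabric of that name, their fiber links, and the cluster switch are all working.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """A cluster member with its own internal X and Y fabrics."""
    name: str
    internal_up: dict[str, bool] = field(default_factory=lambda: {"X": True, "Y": True})


@dataclass
class ExternalFabric:
    """External X or Y fabric, simplified to one cluster switch plus fiber links."""
    name: str  # "X" or "Y"
    switch_up: bool = True
    linked: set[str] = field(default_factory=set)  # servers with a working fiber link

    def carries(self, a: Server, b: Server) -> bool:
        """True if a's internal fabric -> cluster switch -> b's internal fabric all work."""
        return (
            self.switch_up
            and a.name in self.linked
            and b.name in self.linked
            and a.internal_up[self.name]
            and b.internal_up[self.name]
        )


def usable_fabrics(fabrics: list[ExternalFabric], a: Server, b: Server) -> list[str]:
    """Names of the fabrics that currently connect server a to server b."""
    return [f.name for f in fabrics if f.carries(a, b)]
```

With both fabrics healthy, `usable_fabrics` reports `["X", "Y"]`; after the X cluster switch fails, the servers can still exchange messages over Y alone.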