LAN Configuration and Management Manual
Introduction to the SLSA Subsystem
LAN Configuration and Management Manual—520469-012
2-16
Fault Tolerance of the SLSA Subsystem
The SLSA subsystem continues to operate after the loss of one of its components. This
section describes how the SLSA subsystem maintains data paths to the LAN subsystem when
a processor, SAC, media, or adapter is unavailable.
Loss of a ServerNet Fabric
When access through a ServerNet fabric is lost, all traffic goes through the remaining
fabric. When the failed fabric is restored, traffic again goes over both fabrics. If both
fabrics fail, traffic moves as described in Loss of Access of One Processor to a SAC on
page 2-16 or Loss of Access of All Processors to a SAC on page 2-17.
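The fabric behavior above can be illustrated with a minimal sketch. It is not SLSA code: the Fabric enum, usable_fabrics, and describe_traffic are hypothetical names, and the per-fabric up/down map is an assumed input rather than anything the subsystem exposes.

```python
from enum import Enum


class Fabric(Enum):
    """The two ServerNet fabrics named in the text."""
    X = "X"
    Y = "Y"


def usable_fabrics(fabric_up):
    """Return the fabrics that can carry traffic.

    fabric_up is a hypothetical {Fabric: bool} status map; a missing
    entry is treated as "down".
    """
    return [f for f in Fabric if fabric_up.get(f, False)]


def describe_traffic(fabric_up):
    """Summarize where traffic goes for a given fabric status."""
    up = usable_fabrics(fabric_up)
    if len(up) == 2:
        return "both fabrics"
    if len(up) == 1:
        # One fabric lost: all traffic goes through the remaining fabric.
        return f"{up[0].value} fabric only"
    # Both fabrics lost: handled as loss of access to the SAC.
    return "no fabric: SAC access is lost"
```

For example, describe_traffic({Fabric.X: False, Fabric.Y: True}) reports that only the Y fabric carries traffic, and restoring X returns the result to both fabrics.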
When access through a ServerNet fabric is lost, SLSA events are generated and
stored in the Event Management Service (EMS) log. Transient-fault event messages
that show a transient-fault number of 3044 (Path switched to X fabric) or 3045 (Path
switched to Y fabric) may indicate that access across a fabric was lost.
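As a rough illustration of how an operator might pick these events out of exported log data, the sketch below filters records by transient-fault number. The record layout (a dict with a "fault" key) and the function name fabric_switch_events are assumptions made for the example; they do not reflect the real EMS record format or any EMS tool.

```python
# Transient-fault numbers and descriptions, as given in the text above.
FABRIC_SWITCH_FAULTS = {
    3044: "Path switched to X fabric",
    3045: "Path switched to Y fabric",
}


def fabric_switch_events(events):
    """Return the events whose fault number may indicate a lost fabric.

    events is an iterable of dicts with a hypothetical "fault" key.
    Records without that key are ignored.
    """
    return [e for e in events if e.get("fault") in FABRIC_SWITCH_FAULTS]


# Hypothetical sample records, not real EMS output.
sample = [
    {"fault": 3044, "source": "LANMON-0"},
    {"fault": 1234, "source": "LANMON-1"},
    {"fault": 3045, "source": "LANMON-1"},
]
matches = fabric_switch_events(sample)
```

A match only suggests that access across a fabric was lost; as the text notes, these events may indicate a fabric loss rather than prove one.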
Loss of Access of One Processor to a SAC
If a SAC on an adapter becomes unavailable to a LANMON that has a data path to the
SAC, all clients (such as TCP/IP, PAM, and IPX/SPX) on that processor lose access,
and the SLSA subsystem performs the following steps:
1. The LANMON detects that it cannot communicate with the SAC because one of the
following occurred:
a. A command that the LANMON issued to the SAC timed out.
b. The LANMON received an error.
c. The SAC did not respond to 3 consecutive pings sent 10 seconds apart.
2. The LANMON tries to recover access to the SAC by trying the alternate ServerNet
fabric. If the LANMON does not succeed:
a. The LANMON informs the LANMAN process about lost access to the SAC.
b. If the LANMON that lost access was the owning LANMON, the LANMAN
process assigns ownership of the SAC to another processor, but only if the
LANMON in that processor can access the SAC.
c. LANMAN checks whether any other processor has access to the SAC. If none
does, LANMAN proceeds to the behavior described in Loss of Access of All
Processors to a SAC on page 2-17. If other processors have access to the SAC,
LANMAN repeats Step 2b (above).
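The ping condition in Step 1c can be sketched as a simple counter. This is an illustrative model only: the PingMonitor class is hypothetical, and resetting the count on any response is an assumption drawn from the word "consecutive", not a documented LANMON detail.

```python
PING_INTERVAL_SECONDS = 10  # spacing between pings, from Step 1c
MAX_MISSED_PINGS = 3        # consecutive misses before giving up, from Step 1c


class PingMonitor:
    """Track consecutive unanswered pings to one SAC (hypothetical model)."""

    def __init__(self):
        self.missed = 0

    def record(self, responded):
        """Record one ping result.

        Returns True when the SAC should be treated as unreachable.
        """
        if responded:
            # Assumption: any response resets the consecutive-miss count.
            self.missed = 0
        else:
            self.missed += 1
        return self.missed >= MAX_MISSED_PINGS


def worst_case_detection_seconds():
    """Time from the first unanswered ping to the third, ignoring timeouts."""
    return (MAX_MISSED_PINGS - 1) * PING_INTERVAL_SECONDS
```

Under these assumptions, two missed pings followed by a response do not trigger recovery; only three misses in a row do.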
If access is not reestablished, the clients must wait until the SAC comes back up
automatically.
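The decisions in Step 2 can be summarized in a short sketch. The function handle_lost_access and its parameters are hypothetical, processors are represented by plain integers, and choosing the lowest-numbered processor as the new owner is an assumption; the text says only that ownership goes to another processor whose LANMON can access the SAC.

```python
def handle_lost_access(sac_owner, failed_cpu, alternate_fabric_ok, cpus_with_access):
    """Model the Step 2 decisions for one LANMON that lost access to a SAC.

    sac_owner: processor number of the owning LANMON.
    failed_cpu: processor number of the LANMON that lost access.
    alternate_fabric_ok: True if retrying on the other fabric succeeded.
    cpus_with_access: set of processor numbers that can still reach the SAC.
    Returns (owner, outcome).
    """
    if alternate_fabric_ok:
        # Step 2: recovery over the alternate ServerNet fabric succeeded.
        return sac_owner, "recovered on alternate fabric"

    others = sorted(c for c in cpus_with_access if c != failed_cpu)
    if not others:
        # Step 2c: no processor has access. What happens next is described
        # under Loss of Access of All Processors to a SAC and is not modeled.
        return sac_owner, "all processors lost access"

    if sac_owner == failed_cpu:
        # Step 2b: the owner lost access, so ownership moves to a processor
        # that can reach the SAC. Picking the lowest number is an assumption.
        return others[0], "ownership reassigned"

    return sac_owner, "owner unchanged"
```

For instance, handle_lost_access(0, 0, False, {1, 2}) returns (1, "ownership reassigned"), while the same call with an empty set reports that all processors lost access.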