Optimizing Serviceguard Failover Time, Version A.11.19 and later, April 2009

Figure 4. Steps in a failover caused by package failure
Note: Diagram is not to scale.
Serviceguard with Serviceguard Extension for RAC implementation
With RAC, the two application-dependent steps of failover differ from the corresponding steps in a
Serviceguard implementation. They are:
Group membership reconfiguration: If there is a change in membership, RAC will start
reconfiguration.
RAC reconfiguration and database recovery: After a cluster membership change, RAC reassigns
the database locks that were on failed nodes and restarts the databases.
Serviceguard implementation: resource failure detection
Serviceguard monitors the configured package services and networks.
A Serviceguard package configuration can include Event Monitoring Service (EMS) resources. EMS
monitors hardware such as storage by polling an EMS resource monitor for the current status of the
resource. The polling interval is set in the package configuration file. When EMS finds a resource
failure, it immediately notifies Serviceguard.
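Because EMS learns of a failure only when it next polls, the polling interval bounds how long a resource failure can go unnoticed. The sketch below is a simplified model of that effect, not EMS code: the function name is hypothetical, and it assumes polls occur at exact multiples of the interval starting at time zero.

```python
import math


def detection_delay(failure_time: int, polling_interval: int) -> int:
    """Seconds between a resource failing and the next poll that sees it.

    Assumption (not from the paper): polls happen at t = 0, interval,
    2 * interval, ... and a poll at the instant of failure detects it.
    """
    if polling_interval <= 0:
        raise ValueError("polling_interval must be positive")
    if failure_time < 0:
        raise ValueError("failure_time must not be negative")
    # Time of the first poll at or after the failure.
    next_poll = math.ceil(failure_time / polling_interval) * polling_interval
    return next_poll - failure_time


if __name__ == "__main__":
    # A failure one second after a poll waits almost a full interval.
    print(detection_delay(failure_time=61, polling_interval=60))
```

In this model the delay is always less than one polling interval, so a shorter interval reduces the worst-case detection time at the cost of more frequent polling.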
Generally, when a resource fails, the package will fail over to another node. If the package is
configured with NODE_FAIL_FAST_ENABLED set to YES, Serviceguard will instead cause the node to
fail. In that case, the process will start at the first step described in the previous section, “The
process when failover is caused by a node failure.”
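The two outcomes described above can be written as a small decision function. This is only an illustrative sketch: the function name and the returned labels are hypothetical, and it models nothing beyond the NODE_FAIL_FAST_ENABLED setting discussed in this section.

```python
def action_on_resource_failure(node_fail_fast_enabled: str) -> str:
    """Return the response to a failed package resource.

    node_fail_fast_enabled is the setting's value as text ("yes" or "no",
    in any letter case). The returned labels are hypothetical names used
    only for this sketch.
    """
    value = node_fail_fast_enabled.strip().lower()
    if value not in ("yes", "no"):
        raise ValueError(f"expected 'yes' or 'no', got {node_fail_fast_enabled!r}")
    if value == "yes":
        # The node itself is failed, so failover proceeds as for a node failure.
        return "fail_node"
    # Default case: only the package fails over to another node.
    return "fail_over_package"


if __name__ == "__main__":
    print(action_on_resource_failure("YES"))
    print(action_on_resource_failure("no"))
```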
Serviceguard implementation: package determination
When a package fails, Serviceguard automatically tries to restart it if AUTO_RUN is set to YES
in the package configuration file. In that case, Serviceguard next determines where to start the
package. It creates an ordered list of nodes, prioritized according to the node list and the
settings of the failover and failback policies in the package’s configuration file.
The user cannot directly change the time needed for package determination.
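A minimal sketch of how such an ordered candidate list might be built is shown below. It is not Serviceguard's actual algorithm: the function and parameter names are hypothetical, the two policy names (configured_node and min_package_node) are assumed here because this section does not list the policy values, and the failback policy is left out entirely.

```python
from typing import Dict, List, Optional, Set


def candidate_nodes(
    node_list: List[str],
    running_nodes: Set[str],
    auto_run: bool = True,
    failover_policy: str = "configured_node",
    package_counts: Optional[Dict[str, int]] = None,
) -> List[str]:
    """Return the nodes on which to try starting a failed package, in order.

    Assumptions for this sketch: node_list is the package's configured node
    order, running_nodes are the current cluster members, and package_counts
    maps each node to the number of packages it already runs.
    """
    if not auto_run:
        # Without AUTO_RUN, the package is not restarted automatically.
        return []
    # Only nodes that are both configured for the package and running qualify.
    eligible = [node for node in node_list if node in running_nodes]
    if failover_policy == "configured_node":
        return eligible
    if failover_policy == "min_package_node":
        counts = package_counts or {}
        # sorted() is stable, so ties keep the configured node order.
        return sorted(eligible, key=lambda node: counts.get(node, 0))
    raise ValueError(f"unknown failover policy: {failover_policy!r}")


if __name__ == "__main__":
    nodes = ["node1", "node2", "node3"]
    up = {"node2", "node3"}
    print(candidate_nodes(nodes, up))
    print(candidate_nodes(nodes, up, failover_policy="min_package_node",
                          package_counts={"node2": 3, "node3": 1}))
```

The sketch keeps the configured node order as the tie-breaker so that the result stays deterministic when several nodes run the same number of packages.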
Serviceguard implementation: resource recovery
Serviceguard starts each package to begin the application-dependent part of failover. During failover,
package resources need to be made available before applications can be started. The package
resources include IP addresses, file systems, volume groups, and disk groups. Some resources may
require recovery before they can be used.
[Figure labels: Group membership reconfiguration; RAC reconfiguration and database recovery; Application failover time]