Optimizing Serviceguard Failover Time, Version A.11.19 and later, April 2009

Cluster component recovery
In this step, Serviceguard performs miscellaneous tasks, such as cluster information synchronization and
package determination. If Serviceguard Extension for RAC is configured, Serviceguard provides the
new membership to it. If packages are down due to a failure, Serviceguard determines on which
node(s), if any, they should be restarted. (See “Serviceguard implementation: Package Determination”
in the next section.)
The time needed for cluster component recovery depends mainly on how many packages need to be
restarted. At the end of this recovery phase, Serviceguard starts the packages.
The user cannot directly change the time needed for cluster component recovery; in general, it is a
short step (typically less than one second).
Environments using Serviceguard with Veritas Cluster Volume Manager (CVM) from Symantec or
Serviceguard Storage Management Suite with Veritas Cluster File System (CFS) from Symantec require
additional time during cluster component recovery to synchronize cluster memberships between
Serviceguard and Veritas cluster components prior to package determination. The time required to
synchronize memberships largely depends on the type of failure. There are three types of failures to
consider. In the case of system panic, machine check, or power failure, cluster component recovery
requires an additional 4 seconds. Alternatively, in the case of a node or service failfast type failure,
an additional 8 seconds is required. Finally, in failures where the cluster monitor is unable to run or is
killed, as in a kernel hang or a reboot, it can take up to an additional cluster reformation time to
synchronize the memberships.
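The three failure types above can be summarized as a small lookup. The following Python sketch is illustrative only: the names FailureType and extra_sync_seconds are hypothetical, and the cluster reformation time is not something the sketch can know, so the caller must supply it as an estimate for their own cluster.

```python
from enum import Enum


class FailureType(Enum):
    """Failure categories described in the text (names are hypothetical)."""
    ABRUPT_NODE_LOSS = "system panic, machine check, or power failure"
    FAILFAST = "node or service failfast"
    CLUSTER_MONITOR_STOPPED = "cluster monitor unable to run or killed"


def extra_sync_seconds(failure: FailureType, reformation_seconds: float) -> float:
    """Return the additional membership synchronization time, in seconds.

    For CLUSTER_MONITOR_STOPPED the result is an upper bound: the text says
    synchronization can take up to one additional cluster reformation time.
    reformation_seconds is an estimate supplied by the caller (assumption).
    """
    if failure is FailureType.ABRUPT_NODE_LOSS:
        return 4.0
    if failure is FailureType.FAILFAST:
        return 8.0
    return float(reformation_seconds)
```

For example, with an assumed reformation time of 30 seconds, a kernel hang could add up to 30 seconds to cluster component recovery, while a failfast would add 8 seconds regardless of that value.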
Users with Serviceguard and CVM or CFS configurations can minimize cluster component recovery
time by always using cmhaltnode(1M) prior to issuing shutdown(1M) or reboot(1M) when restarting a
node in the cluster.
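The recommended ordering (halt the node in the cluster first, then restart it) might be scripted along the following lines. This is a minimal sketch, not a vendor-supplied procedure: the function names are hypothetical, and the options shown (-f for cmhaltnode, -r -y 0 for shutdown) are assumptions that should be checked against the cmhaltnode(1M) and shutdown(1M) manpages on the target system. The sketch defaults to a dry run that only prints the commands.

```python
import subprocess
from typing import List


def restart_node_commands(node_name: str) -> List[List[str]]:
    """Build the command sequence: leave the cluster cleanly, then reboot.

    The option flags are assumptions; verify them against the manpages.
    """
    return [
        ["cmhaltnode", "-f", node_name],   # halt Serviceguard on this node first
        ["shutdown", "-r", "-y", "0"],     # only then restart the system
    ]


def restart_node(node_name: str, dry_run: bool = True) -> List[List[str]]:
    """Print (dry run) or execute the commands in order.

    With dry_run=False, check=True stops the sequence if cmhaltnode fails,
    so the node is never rebooted while still a cluster member.
    """
    commands = restart_node_commands(node_name)
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)
    return commands
```

Calling restart_node("node1") prints the two commands without running them; passing dry_run=False would execute them and requires appropriate privileges on the node being restarted.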
Serviceguard implementation: resource recovery
When Serviceguard starts a package, the application-dependent part of failover begins. Package
resources are made available, ready for the package’s applications to start. Package resources
include IP addresses, file systems, volume groups, and disk groups needed by the package. Some
resources may require other recovery steps before they can be used.
The time to complete resource recovery is determined by the package resources.
Serviceguard implementation: applications recovery
The commands defined in the package configuration complete the application-dependent part of
failover. This step includes package application recovery and restart. The amount of time it takes
depends on the applications and how they are configured.
Serviceguard with Serviceguard Extension for RAC: group membership reconfiguration
When Serviceguard Extension for RAC communicates the group membership to Oracle RAC, the
application-dependent part of a RAC failover starts. If there is a change in membership, RAC will start
reconfiguration. RAC needs to know which nodes are in the re-formed cluster; if the node holding the
database lock leaves the cluster, another node needs to claim the lock.
The time needed for group membership reconfiguration is determined by RAC, and the user cannot
directly change it.
Serviceguard with Serviceguard Extension for RAC: RAC reconfiguration
After Oracle RAC is notified of a cluster membership change, it starts its own reconfiguration to claim
the database locks that were on failed nodes. RAC reconfiguration and recovery occur in the RAC
instances running on the other nodes in the cluster.
The time needed for this step is determined by RAC, and the user cannot directly change it.