VERITAS Volume Manager 3.5 Administrator's Guide (September 2002)

Chapter 10, Administering Cluster Functionality
Cluster Initialization and Configuration
Clean node shutdown must be used after, or in conjunction with, a procedure to halt all
cluster applications. Depending on the characteristics of the clustered application and its
shutdown procedure, a successful shutdown can require a lot of time (minutes to hours).
For instance, many applications have the concept of draining, where they accept no new
work, but complete any work in progress before exiting. This process can take a long time
if, for example, a long-running transaction is active.
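The draining behavior described above can be illustrated with a minimal sketch. This is not VxVM or cluster software code; the DrainingWorker class and its submit and drain methods are hypothetical names used only to show an application that refuses new work while finishing work already accepted.

```python
import queue
import threading


class DrainingWorker:
    """Hypothetical application worker that supports draining."""

    def __init__(self):
        self._jobs = queue.Queue()
        self._accepting = True
        self._lock = threading.Lock()
        self.completed = []
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def submit(self, job):
        """Queue a job; return False if the worker is draining."""
        with self._lock:
            if not self._accepting:
                return False
            self._jobs.put(job)
            return True

    def _run(self):
        # Process jobs in order until the end-of-work sentinel arrives.
        while True:
            job = self._jobs.get()
            if job is None:
                break
            self.completed.append(job())

    def drain(self):
        """Stop accepting work, then block until queued work finishes."""
        with self._lock:
            self._accepting = False
            # The sentinel is queued under the lock, so no job can follow it.
            self._jobs.put(None)
        # No timeout: a long-running job delays the return of drain().
        self._thread.join()
```

In this sketch, the time taken by drain() is bounded only by the jobs already accepted, which is why a single long-running transaction can hold up a clean node shutdown.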
When the VxVM shutdown procedure is invoked, it checks all volumes in all shared disk
groups on the node that is being shut down. The procedure then either continues with the
shutdown, or fails for one of the following reasons:
If all volumes in shared disk groups are closed, VxVM makes them unavailable to
applications. Because all nodes are informed that these volumes are closed on the
leaving node, no resynchronization is performed.
If any volume in a shared disk group is open, the shutdown operation in the kernel
waits until the volume is closed. There is no timeout checking in this operation.
Note Once shutdown succeeds, the node has left the cluster. It is not possible to access the
shared volumes until the node joins the cluster again.
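The shutdown check described above can be modeled as a wait on per-volume open counts. The following is a toy model under stated assumptions, not the VxVM kernel implementation: the SharedVolumeModel class, its methods, and the volume names are hypothetical and exist only to show the wait-without-timeout behavior and the loss of access after shutdown.

```python
import threading


class SharedVolumeModel:
    """Toy model of open/close tracking for shared volumes on one node."""

    def __init__(self, names):
        self._open_counts = {name: 0 for name in names}
        self._cond = threading.Condition()
        self.available = True

    def open(self, name):
        with self._cond:
            if not self.available:
                # After shutdown the node has left the cluster.
                raise OSError("shared volume unavailable on this node")
            self._open_counts[name] += 1

    def close(self, name):
        with self._cond:
            self._open_counts[name] -= 1
            self._cond.notify_all()

    def shutdown(self):
        """Block until every volume is closed, then make volumes unavailable."""
        with self._cond:
            # Deliberately no timeout, mirroring the behavior described above.
            self._cond.wait_for(
                lambda: all(count == 0 for count in self._open_counts.values())
            )
            self.available = False
```

If no volume is open, shutdown() returns immediately. If any volume is open, it blocks until the last close() call, however long that takes.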
Since shutdown can be a lengthy process, other reconfiguration can take place while
shutdown is in progress. Normally, the shutdown attempt is suspended until the other
reconfiguration completes. However, if it is already too far advanced, the shutdown may
complete first.
Note The MC/ServiceGuard cmhaltnode command first attempts to halt all packages
that are using shared disks before attempting to shut down VxVM. If an application
running outside of a defined package performs I/O to a shared volume, it can delay
shutdown of VxVM, resulting in an MC/ServiceGuard timeout.
Node Abort
If a node does not leave a cluster cleanly, this is because it crashed or because some cluster
component made the node leave on an emergency basis. The ensuing cluster
reconfiguration calls the VxVM abort function. This procedure immediately attempts to
halt all access to shared volumes, although it does wait until pending I/O from or to the
disk completes.
I/O operations that have not yet been started are failed, and the shared volumes are
removed. Applications that were accessing the shared volumes therefore fail with errors.
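The way the abort function treats I/O can be summarized in a short sketch. This is an illustrative model only; the IORequest type, its fields, and the abort_node function are hypothetical and do not correspond to any VxVM interface.

```python
from dataclasses import dataclass


@dataclass
class IORequest:
    """Hypothetical record of one I/O request against a shared volume."""
    ident: int
    started: bool          # True if the request has already been issued to disk
    status: str = "pending"


def abort_node(requests):
    """Apply the abort rule and return the ids of the failed requests."""
    failed = []
    for req in requests:
        if req.started:
            # I/O already in flight is allowed to finish.
            req.status = "completed"
        else:
            # I/O that has not yet started is failed back to the application.
            req.status = "failed"
            failed.append(req.ident)
    return failed
```

Applications that own any of the failed requests see I/O errors, which matches the behavior described above once the shared volumes are removed.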
After a node abort or crash, shared volumes must be recovered, either by a surviving node
or by a subsequent cluster restart, because it is very likely that there are unsynchronized
mirrors.