Using NFS as a file system type with HP Serviceguard A.11.20 on HP-UX and Linux

Configuring the cluster parameter CONFIGURED_IO_TIMEOUT_EXTENSION

In a Serviceguard cluster in which NFS-imported file systems are used, an unlikely but possible scenario exists in which data corruption could occur. The scenario is as follows:

1. A Serviceguard package using an NFS file system (“NFSPkg”) is running on cluster node client-1.

2. Node client-1 issues an NFS write request immediately before NFSPkg moves to another cluster node.

3. NFSPkg is started on the adoptive node client-2.

4. Adoptive node client-2 begins sending NFS write requests to the same file and offset as the write request previously sent by client-1 just before the package was moved.

5. If the original NFS write request from client-1 arrives on the NFS server after the new write requests from client-2, the server would overwrite the data sent from client-2, thus resulting in data corruption.
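
The ordering problem in steps 1 through 5 can be sketched with a toy model. This is only an illustration, not Serviceguard or NFS code: the function `apply_writes`, the timestamps, and the data values are all hypothetical, and the "server" simply keeps whichever write to a given offset arrives last.

```python
# Toy model of the race described above; not Serviceguard or NFS code.
def apply_writes(arrivals):
    """Apply (arrival_time, client, data) writes to one file offset in arrival order."""
    stored = None
    for _, client, data in sorted(arrivals):
        stored = (client, data)  # the server keeps whatever arrives last
    return stored

# client-1 sends its write first, but the network delays it.
stale_write = (5.0, "client-1", "old")  # issued before failover, arrives at t=5
fresh_write = (2.0, "client-2", "new")  # issued after failover, arrives at t=2

final = apply_writes([stale_write, fresh_write])
print(final)  # the stale write from client-1 wins
```

In this sketch the delayed write from client-1 lands after the write from client-2, so the stored data is the stale value. Bounding how long a write can remain in flight is what rules this ordering out.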

To prevent this, you must determine the maximum delay between the time a write is issued from any Serviceguard node and the time it can arrive at the NFS server. The following typical scenarios and illustrations provide some guidance.

The NFS write may pass through network switches before it reaches the NFS server. Each switch drops the packet after a specific time has elapsed. The IEEE bridge specification, 802.1D, refers to this value as the maximum bridge transit delay (MBTD).

Important: All switches and routers that are configured between the NFS server and Serviceguard nodes must support MBTD.

You can calculate the lifetime of an NFS client’s write packet by adding the MBTD values of all the switches and routers that are configured between the NFS server and the Serviceguard nodes.

You must set the Serviceguard cluster parameter CONFIGURED_IO_TIMEOUT_EXTENSION for any cluster in which packages use NFS mounts. See the section on cluster configuration parameters in the Managing Serviceguard Manual for more information about CONFIGURED_IO_TIMEOUT_EXTENSION.

To set the value of CONFIGURED_IO_TIMEOUT_EXTENSION, first determine the MBTD for each switch and router. The value should be in the vendors’ documentation. Set CONFIGURED_IO_TIMEOUT_EXTENSION to the sum of the values for the switches and routers. If there is more than one possible path between the NFS server and the cluster nodes, add up the values along each path separately and use the largest sum.
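
As a rough sketch of the calculation just described, the following Python computes the largest per-path sum of MBTD values. The function name `io_timeout_extension` and the MBTD figures are hypothetical; real values come from the vendors’ documentation and may need converting to the parameter’s unit (microseconds is assumed here).

```python
def io_timeout_extension(paths):
    """Return the largest per-path sum of MBTD values (assumed microseconds)."""
    if not paths:
        raise ValueError("at least one path is required")
    return max(sum(mbtds) for mbtds in paths)

# Hypothetical MBTD values in microseconds, one list per possible path
# between the NFS server and the cluster nodes.
paths = [
    [1_000_000, 1_000_000],           # path A: two switches
    [1_000_000, 500_000, 1_000_000],  # path B: switch, router, switch
]

extension_us = io_timeout_extension(paths)
print(extension_us)  # 2500000
```

Here path B has the larger total, so its sum is the one to use.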

The CONFIGURED_IO_TIMEOUT_EXTENSION value increases with the number of routers and switches between the NFS server and the Serviceguard nodes. Because the cluster reformation time is increased by the CONFIGURED_IO_TIMEOUT_EXTENSION value, keep this value as low as possible by routing appropriately between the NFS server and the Serviceguard nodes, or by using hardware that supports smaller MBTD values.

The CONFIGURED_IO_TIMEOUT_EXTENSION parameter must also be set in some cases in an extended-distance cluster (EDC). See the discussion of this parameter in the Managing Serviceguard Manual for details. If packages use NFS imports in an EDC, calculate the settings for each case separately (that is, the value required for the EDC configuration, and the value required for NFS) and use the greater of the two values.

For example, if the EDC configuration requires CONFIGURED_IO_TIMEOUT_EXTENSION to be 1000000 (microseconds) and the NFS configuration requires it to be 2000000, set it to 2000000.
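
The same rule can be written as a short Python sketch using the figures from this example. The helper `combined_extension` is hypothetical, and the printed keyword-and-value line is an assumption about how the parameter appears in the cluster configuration file; check the Managing Serviceguard Manual for the exact syntax.

```python
def combined_extension(edc_us, nfs_us):
    """Return the greater of the EDC and NFS requirements, in microseconds."""
    return max(edc_us, nfs_us)

value = combined_extension(1_000_000, 2_000_000)

# Assumed layout: parameter name followed by its value.
line = f"CONFIGURED_IO_TIMEOUT_EXTENSION {value}"
print(line)  # CONFIGURED_IO_TIMEOUT_EXTENSION 2000000
```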