Serviceguard NFS Toolkit for Linux Version A.03.00 Release Notes, March 2009

NOTE: toolkit.sh, hanfs.sh, nfs.mon, nfs.flm, and hanfs.conf must be in the
same directory. For legacy packages, the pkg.cntl script must also be in the same directory.
Do not rename the following files, because their names are hard-coded in the control scripts:
toolkit.sh, hanfs.sh, hanfs.conf, nfs.flm, nfs.mon, tkit_module.sh, tkit_gen.sh,
lockmigration.sh, nfs.1 and its softlink nfs.
Known Problems and Workarounds
The following describes known problems with the NFS Toolkit and workarounds for them.
This information is subject to change without notice. For the most current information, contact
your HP support representative.
More recent information on known problems and workarounds may be available on the
Hewlett-Packard IT Resource Center:
http://itrc.hp.com (Americas and Asia Pacific)
http://europe.itrc.hp.com (Europe)
JAGaf57739: HA NFS and ‘Stale NFS Handle’
What is the problem?
Serviceguard relies on LVM (Logical Volume Manager) to manage its shared storage, which
contains the data and files of applications managed by Serviceguard. As with other applications,
an NFS instance managed by Serviceguard is “active” on only one node at a time, with all of its
resources available to that node only (including the volume groups configured as resources for
the NFS package).
If the package goes down on node 1, its resources are released so that the second node can
“claim” them. The package is then brought up, and the instance is “active” on node 2.
NFS clients continue to connect to the server, unaware that the server has migrated from one
node to another.
The behavior changed in the Linux 2.6 kernel: the kernel uses LVM2 with the device mapper
to virtualize devices instead of using the actual physical device names.
In this implementation, the device node for a logical volume is created dynamically when the
volume group is activated, so a volume group that starts out on node 1 and is failed over to
node 2 can easily end up with a different minor number after failover. Because the volume
groups are created with dynamic minor numbers, NFS clients that connected before the failover
receive a “stale NFS handle” error when the NFS package is migrated to another node.
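For illustration, the device number of a logical volume can be checked after volume group
activation on each node. The volume group and logical volume names below are hypothetical,
and the output is only an example of how the minor number can differ between nodes:

    # On node 1, after the volume group is activated (hypothetical names and output)
    ls -lL /dev/vgnfs/lvol1
    brw-------  1 root root 253, 3 Mar  2 10:15 /dev/vgnfs/lvol1

    # On node 2, after failover, the same logical volume may receive a different minor number
    ls -lL /dev/vgnfs/lvol1
    brw-------  1 root root 253, 7 Mar  2 10:32 /dev/vgnfs/lvol1

The NFS file handle encodes the device number of the exported filesystem, so a change in the
minor number invalidates handles held by clients that mounted before the failover.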
What is the workaround?
There are two ways to address this issue (see the examples below):
• Create the logical volumes with persistent minor numbers.
• Export the filesystem with an assigned filesystem identification (fsid).
NOTE: Refer to the README file for more detailed information.
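The following sketch shows both approaches. All volume group, logical volume, and export
path names are hypothetical, and the exact values depend on your configuration; refer to the
README file for the supported procedure:

    # Workaround 1: give the logical volume a persistent (fixed) major/minor number
    # so that its device number is identical on every node in the cluster.
    lvcreate -L 10G --persistent y --major 253 --minor 100 -n lvol1 vgnfs

    # For an existing logical volume (deactivate it first, then reactivate it afterwards):
    lvchange --persistent y --major 253 --minor 100 /dev/vgnfs/lvol1

    # Workaround 2: export the filesystem with a fixed fsid in /etc/exports so the
    # NFS file handle no longer depends on the underlying device number.
    /export/data  *(rw,sync,fsid=1)

Either approach keeps the NFS file handles seen by clients valid across a package failover.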
JAGag06739: Serviceguard/LX NFS server returns ESTALE when package is brought
down
What is the problem?
When the NFS package is brought down, client processes receive ESTALE from the server when
they should not. There is a delay before the network interface is brought down, during which
client requests can still reach the NFS server layer.