124 IBM Certification Study Guide AIX HACMP
Each time an error is logged in the system error log, the error notification
daemon determines if the error log entry matches the selection criteria. If it
does, an executable is run. This executable, called a notify method, can
range from a simple command to a complex program. For example, the notify
method might send a mail message to the system administrator or run a
command to shut down the cluster.
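
The matching step described above can be sketched in a few lines of Python. This is only a model of the idea, not the AIX implementation: the class names, the field names (label, resource, err_class), and the sample values are assumptions made for illustration, and the notify method here returns a string instead of performing a real action.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ErrorEntry:
    """One logged error (fields are illustrative, not the AIX log format)."""
    label: str
    resource: str
    err_class: str  # sample convention: "H" hardware, "S" software


@dataclass
class NotifyRule:
    """Selection criteria plus the notify method to run on a match.

    A criterion left as None matches any value.
    """
    name: str
    method: Callable[[ErrorEntry], str]
    label: Optional[str] = None
    resource: Optional[str] = None
    err_class: Optional[str] = None

    def matches(self, entry: ErrorEntry) -> bool:
        pairs = [
            (self.label, entry.label),
            (self.resource, entry.resource),
            (self.err_class, entry.err_class),
        ]
        return all(want is None or want == got for want, got in pairs)


def dispatch(entry: ErrorEntry, rules: List[NotifyRule]) -> List[str]:
    """Run the notify method of every rule whose criteria match the entry."""
    return [rule.method(entry) for rule in rules if rule.matches(entry)]


def mail_admin(entry: ErrorEntry) -> str:
    # Stand-in for a simple notify method such as sending mail.
    return f"mail root: {entry.label} on {entry.resource}"


rules = [
    NotifyRule(name="hw_errors", method=mail_admin, err_class="H"),
    NotifyRule(name="scsi0_only", method=mail_admin,
               resource="scsi0", label="SCSI_ERR1"),
]

actions = dispatch(ErrorEntry("SCSI_ERR1", "scsi0", "H"), rules)
ignored = dispatch(ErrorEntry("SOME_SW_ERR", "sysplanar0", "S"), rules)
```

In this sketch the hardware error on scsi0 satisfies both rules, so two notify methods run, while the software error matches neither rule and nothing is run.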
Using the Error Notification facility adds another layer of high availability
to the HACMP for AIX software. Although the combination of the HACMP for
AIX software and the inherent high availability features built into the AIX
operating system keeps single points of failure to a minimum, failures still
exist that, although detected, are not handled in a useful way.
Take the example of a cluster where an owner node and a takeover node
share an SCSI disk. The owner node is using the disk. If the SCSI adapter on
the owner node fails, an error may be logged, but neither the HACMP for AIX
software nor the AIX Logical Volume Manager responds to the error. If the
error has been defined to the Error Notification facility, however, an
executable that shuts down the node with the failed adapter could be run,
allowing the surviving node to take over the disk.
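
A notify method for this scenario might look roughly like the following Python sketch. It is a dry run under stated assumptions: the stop command path is a hypothetical placeholder, the adapter name scsi0 is a sample value, and the way the failing resource name reaches the method is assumed rather than taken from the AIX documentation. The sketch only prints what it would do.

```python
from typing import List, Sequence

# Hypothetical placeholder: substitute the command your site uses to stop
# cluster services on the local node so that the peer node takes over.
STOP_NODE_COMMAND = ["/usr/local/bin/stop_cluster_node"]


def plan_notify_actions(resource: str,
                        watched_adapters: Sequence[str]) -> List[List[str]]:
    """Return the commands to run for an error logged against `resource`.

    Only errors on a watched adapter lead to a node shutdown; any other
    resource yields an empty plan.
    """
    if resource not in watched_adapters:
        return []
    return [
        ["logger", f"notify method: {resource} failed, stopping node"],
        STOP_NODE_COMMAND,
    ]


# Example: the owner node's SCSI adapter (sample name) has logged an error.
plan = plan_notify_actions("scsi0", watched_adapters=["scsi0"])
for command in plan:
    # Dry run: print instead of executing, so the sketch is safe to try.
    print("would run:", " ".join(command))
```

Separating the decision (which commands to run) from the execution keeps the method easy to test before it is trusted with shutting down a production node.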
5.3 Network Modules/Topology Services and Group Services
The HACMP for AIX SMIT interface allows you to add, remove, or change an
HACMP for AIX network module. You rarely need to add or remove any of
these; however, you may want to change the failure detection rate of a
network module.
There are three values to choose from: Fast, Normal, and Slow. The normal
heartbeat rate is usually optimal. Speeding up or slowing down failure
detection is an area where you can adjust cluster failover behavior.
If you decide to change the failure detection rate of a network module, keep
the following considerations in mind:
• Failure detection is dependent on the fastest network linking two nodes.
• Faster heartbeat rates may lead to false failure detections, particularly on
  busy networks. For example, bursts of high network traffic may delay
  heartbeats and this may result in nodes being falsely ejected from the
  cluster. Faster heartbeat rates also place a greater load on networks.
• If your networks are very busy and you experience false failure detections,
  you can try changing the failure detection speed on the network modules
  to Slow to avoid this problem.
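
The trade-off in these considerations can be illustrated with a small Python model. The numbers below are assumptions chosen for the example, not HACMP defaults: each rate is modeled as a heartbeat interval and a count of missed heartbeats tolerated before a peer is declared failed, and the function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical settings: (seconds between heartbeats, missed heartbeats
# tolerated). Real values depend on the network type and HACMP release.
RATES: Dict[str, Tuple[float, int]] = {
    "fast": (0.5, 8),
    "normal": (1.0, 10),
    "slow": (2.0, 12),
}


def detection_time(rate: str) -> float:
    """Silence, in seconds, after which a peer is declared failed."""
    interval, missed_allowed = RATES[rate]
    return interval * missed_allowed


def first_false_detection(arrival_times: List[float],
                          rate: str) -> Optional[float]:
    """Return the time at which a live peer would be wrongly declared failed.

    `arrival_times` are the moments heartbeats from a healthy node arrive.
    Returns None if no gap between arrivals exceeds the detection time.
    """
    limit = detection_time(rate)
    last_seen = 0.0
    for arrival in arrival_times:
        if arrival - last_seen > limit:
            return last_seen + limit
        last_seen = arrival
    return None


# A healthy node whose heartbeats are held up for 6 seconds by a traffic burst.
arrivals = [1.0, 2.0, 3.0, 9.0, 10.0]

for rate in ("fast", "normal", "slow"):
    print(rate, detection_time(rate), first_false_detection(arrivals, rate))
```

With these sample numbers, the fast setting gives up after 4 seconds of silence and ejects the healthy node during the burst, while the normal and slow settings ride it out at the cost of slower detection of real failures.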