6.5 HP StoreAll 8200/9300 Storage Administrator Guide

File serving nodes can be in one of three operational states: Normal, Alert, or Error. These states
are further broken down into categories describing the failover status of the node and the status
of monitored NICs and HBAs.
State    Description
Normal   Up: Operational.
Alert    Up-Alert: Server has encountered a condition that has been logged. An event will appear in the Status tab of the GUI, and an email notification may be sent.
         Up-InFailover: Server is powered on and visible to the Fusion Manager, and the Fusion Manager is failing over the server's segments to a standby server.
         Up-FailedOver: Server is powered on and visible to the Fusion Manager, and failover is complete.
Error    Down-InFailover: Server is powered down or inaccessible to the Fusion Manager, and the Fusion Manager is failing over the server's segments to a standby server.
         Down-FailedOver: Server is powered down or inaccessible to the Fusion Manager, and failover is complete.
         Down: Server is powered down or inaccessible to the Fusion Manager, and no standby server is providing access to the server's segments.
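If you process node states in your own monitoring scripts, the grouping of states into the Normal, Alert, and Error categories can be expressed as a simple lookup table. The following is a minimal sketch that assumes you already have the base state string (for example, Up-Alert). The names Category, categorize, and failover_status are hypothetical and are not part of any StoreAll API.

```python
from enum import Enum


class Category(Enum):
    """Operational categories for file serving nodes."""
    NORMAL = "Normal"
    ALERT = "Alert"
    ERROR = "Error"


# Base node states from the state table, grouped by operational category.
STATE_CATEGORY = {
    "Up": Category.NORMAL,
    "Up-Alert": Category.ALERT,
    "Up-InFailover": Category.ALERT,
    "Up-FailedOver": Category.ALERT,
    "Down-InFailover": Category.ERROR,
    "Down-FailedOver": Category.ERROR,
    "Down": Category.ERROR,
}


def categorize(state: str) -> Category:
    """Return the operational category for a base state such as 'Up-Alert'."""
    try:
        return STATE_CATEGORY[state]
    except KeyError:
        raise ValueError(f"unknown node state: {state!r}") from None


def failover_status(state: str) -> str:
    """Describe the failover status encoded in the state name (hypothetical helper)."""
    if state.endswith("-InFailover"):
        return "in progress"
    if state.endswith("-FailedOver"):
        return "complete"
    return "none"
```

Raising an error for unrecognized states, rather than defaulting to a category, makes it obvious when a script encounters a state that the table does not cover.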
The STATE field also reports the status of:
Monitored NICs and HBAs. If you have multiple HBAs and NICs and some of them are down,
the state is reported as Up, HBAsDown or Up, NicsDown.
Uptime of the node. If the number of consecutive days that the node has been up surpasses
the threshold (set by the serverUptimeEventThreshold parameter of the ibrix_fm_tune
command), the state is reported as Up, UptimeOverThreshold. The default (and recommended)
threshold is 400 days. If you see the state reported as UptimeOverThreshold, reboot the node
as soon as possible to prevent the file systems from eventually becoming unresponsive. To
reboot the node, see “Powering nodes on or off” (page 96).
NOTE: You can reboot the node at any time. The uptime monitoring described above is intended to
ensure that the uptime of a node does not exceed 400 days, which prevents file system
performance issues.
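A script that reads the STATE field can split it into the base state and any qualifiers, and then check for UptimeOverThreshold. The following is a minimal sketch that assumes the field uses the comma-separated form shown in the examples above (such as Up, HBAsDown). This guide does not say whether several qualifiers can appear at once, and it does not specify the exact comparison the Fusion Manager uses; the sketch treats "surpasses" as strictly greater than. The function names are hypothetical, and the sketch does not call ibrix_fm_tune.

```python
from typing import List, Tuple

# Default (and recommended) value of serverUptimeEventThreshold, in days.
DEFAULT_UPTIME_THRESHOLD_DAYS = 400


def parse_state_field(field: str) -> Tuple[str, List[str]]:
    """Split a STATE value such as 'Up, HBAsDown' into (base state, qualifiers).

    Assumes the base state comes first and qualifiers follow, separated by commas.
    """
    parts = [p.strip() for p in field.split(",") if p.strip()]
    if not parts:
        raise ValueError("empty STATE field")
    return parts[0], parts[1:]


def uptime_over_threshold(uptime_days: int,
                          threshold_days: int = DEFAULT_UPTIME_THRESHOLD_DAYS) -> bool:
    """Flag a node whose consecutive days of uptime surpass the threshold."""
    return uptime_days > threshold_days


def needs_reboot(field: str) -> bool:
    """True when the STATE field carries the UptimeOverThreshold qualifier."""
    _, qualifiers = parse_state_field(field)
    return "UptimeOverThreshold" in qualifiers
```

For example, needs_reboot("Up, UptimeOverThreshold") returns True, while needs_reboot("Up, NicsDown") returns False.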
Monitoring cluster events
StoreAll software events are assigned to one of the following categories, based on the level of
severity:
Alerts. A disruptive event that can result in loss of access to file system data. For example, a
segment is unavailable or a server is unreachable.
Warnings. A potentially disruptive condition where file system access is not lost, but if the
situation is not addressed, it can escalate to an alert condition. Examples include very high
server CPU utilization or nearing a quota limit.
Information. An event that changes the cluster (such as creating a segment or mounting a file
system) but occurs under normal or nonthreatening conditions.
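Because the three categories are ordered by severity, a script that consumes events can represent them as an ordered enumeration and filter on a minimum level. The following is a minimal sketch. The Severity and Event types and the at_least function are hypothetical and do not reflect how StoreAll stores events internally.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Iterable, List


class Severity(IntEnum):
    """Event categories, ordered so that higher values are more severe."""
    INFORMATION = 1
    WARNING = 2
    ALERT = 3


@dataclass
class Event:
    """A hypothetical event record: a severity level plus a message."""
    severity: Severity
    message: str


def at_least(events: Iterable[Event], minimum: Severity) -> List[Event]:
    """Keep only the events at or above the given severity, preserving order."""
    return [e for e in events if e.severity >= minimum]
```

Filtering with at_least(events, Severity.WARNING), for example, drops Information events such as a segment being created and keeps warnings and alerts.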
Events are written to an events table in the configuration database as they are generated. To
keep the table from growing too large, HP recommends that you periodically remove the oldest events. See
“Removing events from the events database table” (page 87).
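The idea of removing the oldest events while keeping the newest ones can be illustrated with a retention query. The following is a minimal sketch against a hypothetical SQLite table named events with id and created_at columns. It is not the StoreAll configuration database schema, and it is not a substitute for the procedure in the referenced section.

```python
import sqlite3


def prune_oldest_events(conn: sqlite3.Connection, keep: int) -> int:
    """Delete all but the `keep` newest rows from a hypothetical events table.

    Rows are ordered by created_at (newest first), with id as a tiebreaker.
    Returns the number of rows removed.
    """
    if keep < 0:
        raise ValueError("keep must be non-negative")
    cur = conn.execute(
        """
        DELETE FROM events
        WHERE id NOT IN (
            SELECT id FROM events
            ORDER BY created_at DESC, id DESC
            LIMIT ?
        )
        """,
        (keep,),
    )
    conn.commit()
    return cur.rowcount
```

Keeping a fixed number of the newest rows, instead of deleting by age, bounds the table size even when events arrive in bursts.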
You can set up event notifications through email (see “Viewing email notification of cluster events”
(page 59)) or SNMP traps (see “Using SNMP notifications” (page 60)).
86 Monitoring cluster operations