6.2.2 HP IBRIX 9000 Storage Release Notes (AW549-96061, January 2013)

In some situations, ibrix_collect successfully collects information after a system crash but
fails to report a completed collection. The information is available in the
/local/ibrixcollect/archive directory on one of the file serving nodes.
The ibrix_collect command supports a maximum collection size of 4 GB. If the size of the
final collection exceeds 4 GB, the collection fails. You must either delete the excess older logs
from each node to reduce the size, or manually collect the individual collections, which are
stored on each node in the following format:
/local/ibrixcollect/<node_name>_<collection_name>_<time>.tgz
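Before rerunning a collection, it can help to estimate how much data the per-node archives hold. The following is a minimal sketch, not part of the product: the function names are hypothetical, it assumes that 4 GB means 4 * 1024^3 bytes, and it treats the sum of the .tgz files in one directory as a rough proxy for the final collection size.

```shell
#!/bin/sh
# Hypothetical helpers (not ibrix commands) for estimating collection size.

LIMIT_BYTES=4294967296   # assumption: the 4 GB limit means 4 * 1024^3 bytes

# collection_size_bytes DIR
# Prints the total size in bytes of all *.tgz files directly under DIR.
collection_size_bytes() {
    dir=$1
    total=0
    for f in "$dir"/*.tgz; do
        [ -f "$f" ] || continue      # skip when the glob matches nothing
        size=$(wc -c < "$f")         # byte count of one archive
        total=$((total + size))
    done
    echo "$total"
}

# collection_fits DIR
# Returns 0 if the archives under DIR total no more than LIMIT_BYTES.
collection_fits() {
    [ "$(collection_size_bytes "$1")" -le "$LIMIT_BYTES" ]
}
```

For example, running collection_fits /local/ibrixcollect on a node returns a nonzero status when the archives there already exceed the assumed limit, which suggests deleting older logs first.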
Cluster component states
Changes in file serving node status do not appear on the management console until 6 minutes
after an event. During this time, the node status may appear to be UP when it is actually DOWN
or UNKNOWN. Be sure to allow enough time for the management console to be updated before
verifying node status.
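A script that verifies node status after an event can apply the same waiting period. This is a minimal sketch with a hypothetical function name; it only encodes the 6-minute delay described above and does not query the management console.

```shell
#!/bin/sh
STATUS_DELAY_SECONDS=360   # 6 minutes, per the note above

# status_settled EVENT_EPOCH [NOW_EPOCH]
# Returns 0 once at least 6 minutes have passed since the event, so that
# the status shown on the management console can be trusted.
# NOW_EPOCH defaults to the current time in seconds since the epoch.
status_settled() {
    event=$1
    now=${2:-$(date +%s)}
    [ $((now - event)) -ge "$STATUS_DELAY_SECONDS" ]
}
```

A caller would record the event time with date +%s and check node status only after status_settled returns 0.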
Generally, when a vendorstorage component is marked Stale, the component has failed
and is not responding to monitoring. However, if all components are marked Stale, this implies
a failure of the monitoring subsystem. Temporary failures of this system can cause all monitored
components to toggle from Up to Stale and back to Up. Common causes of failures in the
monitoring system include:
Reboot of a file serving node
Network connectivity issues between the management console and a file serving node
Resource exhaustion on a file serving node (CPU, RAM, I/O or network bandwidth)
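The rule for interpreting Stale states can be sketched as a small classifier. The function name and output labels below are hypothetical, and the sketch assumes the component states have already been obtained by some other means and are passed in as arguments.

```shell
#!/bin/sh
# classify_states STATE...
# Prints one of:
#   monitoring-suspect : every component is Stale (monitoring subsystem failure likely)
#   component-suspect  : some, but not all, components are Stale
#   ok                 : no component is Stale
classify_states() {
    total=0
    stale=0
    for state in "$@"; do
        total=$((total + 1))
        if [ "$state" = "Stale" ]; then
            stale=$((stale + 1))
        fi
    done
    if [ "$total" -gt 0 ] && [ "$stale" -eq "$total" ]; then
        echo "monitoring-suspect"
    elif [ "$stale" -gt 0 ]; then
        echo "component-suspect"
    else
        echo "ok"
    fi
}
```

For example, classify_states Up Stale Up prints component-suspect, while classify_states Stale Stale Stale prints monitoring-suspect.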
While network connectivity and resource exhaustion issues should be investigated, they can occur
normally due to heavy workloads. In these cases, you can reduce the frequency at which
vendorstorage components are monitored by using the following command:
ibrix_fm_tune -S -o vendorStorageHardwareStaleInterval=1800
The default value of vendorStorageHardwareStaleInterval is 900; the value is in seconds. A higher
value reduces the probability of all components toggling from Up to Stale and back to Up because
of the conditions listed above, but increases the time before an actual component failure is reported.
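When scripting this change, it may be useful to validate the interval and review the command before running it. The wrapper below is a hypothetical sketch: it only prints the ibrix_fm_tune command line shown above and does not execute it.

```shell
#!/bin/sh
# stale_interval_cmd SECONDS
# Prints the ibrix_fm_tune command line for the given interval.
# Rejects anything that is not a whole number of seconds.
stale_interval_cmd() {
    case $1 in
        ''|*[!0-9]*)
            echo "interval must be a whole number of seconds" >&2
            return 1
            ;;
    esac
    echo "ibrix_fm_tune -S -o vendorStorageHardwareStaleInterval=$1"
}
```

For example, stale_interval_cmd 900 prints the command that restores the default interval once the heavy workload has passed.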