6.2.2 HP IBRIX 9000 Storage Release Notes (AW549-96061, January 2013)

• In some situations, ibrix_collect successfully collects information after a system crash but fails to report a completed collection. The information is available in the /local/ibrixcollect/archive directory on one of the file serving nodes.

• The ibrix_collect command supports a maximum collection size of 4 GB. If the final collection exceeds 4 GB, the collection fails. You must do one of the following:

◦ Delete excess older logs from each node to reduce the size.

◦ Manually retrieve the individual collections, which are stored on each node in the following format:

/local/ibrixcollect/<node_name>_<collection_name>_<time>.tgz
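As a rough pre-check before retrying a collection, the size of the per-node archives can be totalled and compared with the 4 GB limit. The following is a minimal sketch, not part of the product: the helper names total_tgz_bytes and check_limit are hypothetical, 4 GB is assumed to mean 4 * 1024^3 bytes (the notes do not say), and the sum of the per-node archives is only an approximation of the final collection size. It assumes the archives for one collection have been gathered into a single directory.

```shell
#!/bin/sh
# Assumption: "4 GB" is treated as 4 * 1024^3 bytes.
LIMIT_BYTES=4294967296

# Hypothetical helper: print the total size in bytes of the archives in
# directory $1 whose names match <node_name>_<collection_name>_<time>.tgz
# for the collection name given in $2.
total_tgz_bytes() {
    total=0
    for f in "$1"/*_"$2"_*.tgz; do
        [ -f "$f" ] || continue      # skip when the pattern matches nothing
        size=$(wc -c < "$f")         # file size in bytes
        total=$((total + size))
    done
    echo "$total"
}

# Hypothetical helper: print "over" if byte count $1 exceeds the limit,
# otherwise print "ok".
check_limit() {
    if [ "$1" -gt "$LIMIT_BYTES" ]; then
        echo "over"
    else
        echo "ok"
    fi
}
```

For example, check_limit "$(total_tgz_bytes /local/ibrixcollect mycollection)" reports whether the archives for a collection named mycollection (a placeholder name) already exceed the assumed limit.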

Cluster component states

• Changes in file serving node status do not appear on the management console until 6 minutes after an event. During this time, the node status may appear to be UP when it is actually DOWN or UNKNOWN. Be sure to allow enough time for the management console to be updated before verifying node status.

• Generally, when a vendorstorage component is marked Stale, the component has failed and is not responding to monitoring. However, if all components are marked Stale, this implies a failure of the monitoring subsystem. Temporary failures of this subsystem can cause all monitored components to toggle from Up to Stale and back to Up. Common causes of failures in the monitoring subsystem include:

◦ Reboot of a file serving node

◦ Network connectivity issues between the management console and a file serving node

◦ Resource exhaustion on a file serving node (CPU, RAM, I/O, or network bandwidth)

Network connectivity and resource exhaustion issues should be investigated, but they can also occur during normal operation under heavy workloads. In these cases, you can reduce the frequency at which vendorstorage components are monitored by using the following command:

ibrix_fm_tune -S -o vendorStorageHardwareStaleInterval=1800

The default value of vendorStorageHardwareStaleInterval is 900; the value is in seconds. A higher value reduces the probability of all components toggling from Up to Stale and back to Up because of the conditions listed above, but it also increases the time before an actual component failure is reported.
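The two timing values in this section (the 6-minute console update delay and the stale interval in seconds) can be captured in a small script. This is an illustrative sketch only: the helper names status_is_settled and stale_interval_cmd are hypothetical, and the second helper prints the tuning command shown above rather than running it, so the command can be reviewed first.

```shell
#!/bin/sh
# 6 minutes, expressed in seconds, per the note on node status updates.
CONSOLE_UPDATE_DELAY=360

# Hypothetical helper: succeed (exit status 0) if at least 6 minutes have
# passed between the event time $1 and the current time $2, both given as
# seconds since the epoch.
status_is_settled() {
    [ $(( $2 - $1 )) -ge "$CONSOLE_UPDATE_DELAY" ]
}

# Hypothetical helper: print (do not run) the tuning command for the
# interval $1, a whole number of seconds. Rejects empty or non-numeric input.
stale_interval_cmd() {
    case "$1" in
        ''|*[!0-9]*)
            echo "interval must be a whole number of seconds" >&2
            return 1
            ;;
    esac
    echo "ibrix_fm_tune -S -o vendorStorageHardwareStaleInterval=$1"
}
```

For example, stale_interval_cmd 1800 prints the command given above, and status_is_settled "$event_time" "$(date +%s)" can gate a node status check until the console has had time to update (event_time is a placeholder variable).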
