HP XC System Software Administration Guide Version 3.2

Typically, this entry reports the number of new records processed in the
/hptc_cluster/adm/logs/consolidated.log file.
A warning or critical message occurs when there is not enough time to process a large volume of
messages before the Nagios service_check_timeout period expires.
Nagios examines the recent incoming consolidated log messages and issues a warning or critical
message if the incoming rate since the last interval exceeds a configured number of records. The
default thresholds are 2 records for a warning and 20 records for a critical message. See
/opt/hptc/nagios/libexec/check_syslogalerts for details.
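The threshold comparison described above can be sketched as follows. This is a minimal Python illustration, not the code in check_syslogalerts. It assumes that "exceeds" means strictly greater than the threshold, and it uses the standard Nagios plug-in return codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The names classify_rate and NagiosState are hypothetical.

```python
from enum import IntEnum


class NagiosState(IntEnum):
    """Standard Nagios plug-in return codes."""
    OK = 0
    WARNING = 1
    CRITICAL = 2
    UNKNOWN = 3


# Defaults quoted in this guide. The real plug-in may read its thresholds
# from its own configuration (assumption).
DEFAULT_WARNING = 2
DEFAULT_CRITICAL = 20


def classify_rate(new_records: int,
                  warning: int = DEFAULT_WARNING,
                  critical: int = DEFAULT_CRITICAL) -> NagiosState:
    """Map the number of records received since the last interval to a state.

    A count above the critical threshold is CRITICAL; a count above only the
    warning threshold is WARNING; any other non-negative count is OK.
    """
    if new_records < 0:
        return NagiosState.UNKNOWN  # a negative count indicates bad input
    if new_records > critical:
        return NagiosState.CRITICAL
    if new_records > warning:
        return NagiosState.WARNING
    return NagiosState.OK


if __name__ == "__main__":
    for count in (0, 2, 3, 20, 21):
        print(count, classify_rate(count).name)
```

With the default thresholds and the strict comparison assumed here, 3 to 20 new records yield WARNING and anything above 20 yields CRITICAL.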
No specific action is required unless the service times out. A timeout means that more syslog
messages were collected across the system than the plug-in can process within the
service_check_timeout period. See the /opt/hptc/nagios/etc/nagios.cfg file for the
value of the service_check_timeout parameter. To solve the problem, run the following command
on the node reporting the error:
# /opt/hptc/nagios/libexec/check_syslogalerts domain node:nagios_monitor nsca
Otherwise, wait for the nightly log to roll over.
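To confirm the configured timeout, the sketch below extracts service_check_timeout from the text of nagios.cfg. It assumes the usual Nagios main-configuration syntax of one directive=value pair per line, with comment lines beginning with #. The function name read_service_check_timeout is hypothetical.

```python
from pathlib import Path
from typing import Optional

NAGIOS_CFG = "/opt/hptc/nagios/etc/nagios.cfg"


def read_service_check_timeout(text: str) -> Optional[int]:
    """Return service_check_timeout (in seconds) from nagios.cfg text, or None if absent."""
    value: Optional[int] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        key, sep, val = line.partition("=")
        if sep and key.strip() == "service_check_timeout":
            value = int(val.strip())  # this sketch keeps the last occurrence
    return value


if __name__ == "__main__":
    path = Path(NAGIOS_CFG)
    if path.exists():
        print(read_service_check_timeout(path.read_text()))
    else:
        print(f"{NAGIOS_CFG} not found on this host")
```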
Service: Syslog Alerts
Status Information: Node Syslog alerts information
Typically, this entry reports the number of alerts in a specified period of time and allows you to access
the most recent log.
A warning or critical message indicates that one or more rules defined in the
/opt/hptc/nagios/etc/syslogAlertRules file match the specified node's consolidated log
file.
Take the appropriate action based on the message.
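The rule matching described above amounts to testing each consolidated log line against a set of patterns. The following Python sketch assumes that each rule is a regular expression. The actual syntax of the syslogAlertRules file is not documented in this section, so the rule names and patterns in EXAMPLE_RULES, and the function match_rules, are hypothetical.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Hypothetical rules as (rule name, regular expression) pairs. The real
# format of /opt/hptc/nagios/etc/syslogAlertRules may differ.
EXAMPLE_RULES: List[Tuple[str, str]] = [
    ("kernel_oops", r"kernel: .*Oops"),
    ("disk_error", r"I/O error"),
]


def match_rules(lines: Iterable[str], rules: List[Tuple[str, str]]) -> Dict[str, int]:
    """Count how many log lines match each rule; rules with no matches are omitted."""
    compiled = [(name, re.compile(pattern)) for name, pattern in rules]
    hits: Dict[str, int] = {}
    for line in lines:
        for name, regex in compiled:
            if regex.search(line):
                hits[name] = hits.get(name, 0) + 1
    return hits


if __name__ == "__main__":
    sample = [
        "n3 kernel: Oops: 0002 [1]",
        "n3 sshd[123]: Accepted publickey",
        "n4 kernel: end_request: I/O error, dev sda",
    ]
    print(match_rules(sample, EXAMPLE_RULES))
```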
Service: System Event Log
Status Information: Node Syslog alerts information
A warning or critical message indicates that one or more rules defined in the
/opt/hptc/nagios/etc/selRules file match the specified node's firmware System Event Log.
Take the appropriate action based on the System Event Log message.
Service: System Free Space
Status Information: Node / and /var free space
This entry typically displays the status of the /, /var, and /hptc_cluster file systems on the node.
A warning or critical message indicates that the thresholds for the specific node were exceeded.
Free disk space on the affected file system.
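To see which file system triggered the alert, you can compare each file system's usage with the thresholds. The sketch below uses Python's shutil.disk_usage on /, /var, and /hptc_cluster. The 80% and 90% thresholds and the helper names are hypothetical placeholders, because the per-node thresholds used by the HP XC plug-in are not given here.

```python
import os
import shutil

# Hypothetical thresholds (percent of capacity used). The HP XC plug-in's
# actual per-node thresholds are configured elsewhere and may differ.
WARN_PCT = 80.0
CRIT_PCT = 90.0

FILE_SYSTEMS = ("/", "/var", "/hptc_cluster")


def percent_used(total: int, used: int) -> float:
    """Return used space as a percentage of total capacity (0.0 if total is 0)."""
    return 100.0 * used / total if total else 0.0


def classify_usage(pct: float, warn: float = WARN_PCT, crit: float = CRIT_PCT) -> str:
    """Return a Nagios-style state name for a usage percentage."""
    if pct >= crit:
        return "CRITICAL"
    if pct >= warn:
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    for mount in FILE_SYSTEMS:
        if not os.path.isdir(mount):
            print(f"{mount}: not present on this node")
            continue
        usage = shutil.disk_usage(mount)  # named tuple: total, used, free (bytes)
        pct = percent_used(usage.total, usage.used)
        print(f"{mount}: {pct:.1f}% used -> {classify_usage(pct)}")
```

Because shutil.disk_usage counts blocks reserved for root in the total, its percentage can differ slightly from the Use% column reported by df.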
21.4 System Interconnect Troubleshooting
This section describes the troubleshooting steps for the following supported system interconnects:
“Myrinet System Interconnect Troubleshooting” (page 252)
“Quadrics System Interconnect Troubleshooting” (page 253)
“InfiniBand System Interconnect Troubleshooting” (page 255)
“OFED Troubleshooting Procedures” (page 257)
21.4.1 Myrinet System Interconnect Troubleshooting
The following troubleshooting information applies to the Myrinet system interconnect. To
determine whether your HP XC system is configured properly, perform these steps on any node
where you suspect a problem. If these tests pass but you still experience difficulty, see Chapter 20:
Using Diagnostic Tools (page 231).
1. Run the gm_board_info test:
# /opt/gm/bin/gm_board_info
This command displays all the nodes in the HP XC system.