HP Insight Control for Linux 6.2 User Guide

Service: Environment
Status Information: Node sensor status
A warning or critical message indicates that one or more monitored sensors reported that a threshold
was exceeded.
Correct the condition.
Service: Load Average
Status Information: Node Load Ave: x/y/z QueLen: n
A warning or critical message indicates that load average thresholds for the specific managed system
were exceeded.
Thresholds can be set on a per-managed-system, per-class, or per-system basis in the
nagios_vars.ini file. These values are specific to the site and depend on site load.
If the load average thresholds are reasonable, monitor for excessive activity on the managed system.
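To check for excessive activity, the current load averages can be read directly on the managed system. The following is a minimal sketch, assuming a Linux system that exposes /proc/loadavg. The function name load_exceeds and the threshold value 4.0 are hypothetical illustrations and are not taken from nagios_vars.ini.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) when load $1 is greater than threshold $2.
# awk is used because POSIX sh cannot compare floating-point numbers.
load_exceeds() {
    awk -v load="$1" -v max="$2" 'BEGIN { exit !(load + 0 > max + 0) }'
}

# Read the 1-, 5-, and 15-minute load averages (Linux only).
if [ -r /proc/loadavg ]; then
    read -r one five fifteen rest < /proc/loadavg
    echo "Load averages: $one/$five/$fifteen"
    # 4.0 is an illustrative threshold only; use a value suited to the site.
    if load_exceeds "$one" 4.0; then
        echo "1-minute load average exceeds 4.0"
    fi
fi
```

If the reported values stay well above what the site considers normal, look for runaway or unexpected processes before raising the thresholds.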
Service: Nagios Monitor
Status Information: Nagios status information
Typically reports the status of Nagios, the number of Nagios services located, and the last time the Nagios
status log was updated.
A warning or critical message indicates that one or more of the Nagios monitor processes either failed
or reported error conditions that can degrade monitoring.
Ensure that the managed system can communicate with the CMS.
Service: Nodeinfo
Status Information: Node process status total/user/zombie, uptime
Displays the total number of processes, the number of user processes, the number of zombie processes,
and the uptime for the Nagios host.
A warning or critical message indicates that thresholds for the specific managed system were exceeded.
Thresholds can be set on a per-managed-system, per-class, or per-system basis in the
nagios_vars.ini file. These values are specific to the site and depend on site load.
If thresholds are reasonable, monitor for excessive activity on the managed system.
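The process counts reported by this service can be cross-checked by hand on the managed system. The sketch below assumes the procps version of ps found on Linux, which accepts the stat output field. The helper name count_zombies is hypothetical, and the way Nagios itself computes these figures may differ.

```shell
#!/bin/sh
# Hypothetical helper: reads process state codes on stdin (one per line)
# and prints how many of them begin with Z (zombie).
count_zombies() {
    awk '$1 ~ /^Z/ { n++ } END { print n + 0 }'
}

if command -v ps >/dev/null 2>&1; then
    # "stat=" prints only the state column and suppresses the header line.
    total=$(ps -e -o stat= | wc -l)
    zombies=$(ps -e -o stat= | count_zombies)
    echo "total=$total zombie=$zombies"
fi
```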
Service: Supermon Metrics Monitor
Status Information: Supermon node metrics retrieval status
Reports the status of the Supermon service and the number of systems from which it collected metrics
data.
A warning or critical message indicates that one or more systems were not accessible during metrics
collection or that the Nagios service_check_timeout interval expired.
These messages can occur if metrics collection cannot be completed in a reasonable time; examine the
/opt/hptc/nagios/etc/nagios.cfg file for the value of the service_check_timeout
parameter.
The default value works best for configurations with fewer than 256 managed systems.
For configurations with more managed systems, increase the value of the service_check_timeout
parameter.
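The current timeout can be read from the configuration file before changing it. This is a minimal sketch, assuming the standard Nagios name=value syntax in nagios.cfg, where the value is given in seconds. The helper name get_check_timeout is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: prints the value of service_check_timeout from the
# Nagios main configuration file given as $1 (name=value syntax assumed).
# Commented-out lines do not match because their first field starts with "#".
get_check_timeout() {
    awk -F= '$1 == "service_check_timeout" { print $2 }' "$1"
}

cfg=/opt/hptc/nagios/etc/nagios.cfg
if [ -r "$cfg" ]; then
    echo "service_check_timeout is $(get_check_timeout "$cfg") seconds"
fi
```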
Also, run the following command to verify that the supermond service is running on the CMS:
# /etc/init.d/supermond status
Loss of this service, or time-outs in it, can cause per-managed-system warnings for Nodeinfo, Load
Average, and system free space.
A non-timeout warning or critical message indicates that some monitored managed systems are not
responding; this is normal if the managed systems are down or otherwise disabled.
Service: Syslog Alert Monitor
Status Information: Status of consolidated.log syslog monitoring