HP Insight Control for Linux 6.2 User Guide

Service: Environment

Status Information: Node sensor status

A warning or critical message indicates that one or more monitored sensors reported that a threshold was exceeded.

Correct the condition.

Service: Load Average

Status Information: Node Load Ave: x/y/z QueLen: n

A warning or critical message indicates that the load average thresholds for the specific managed system were exceeded.

Thresholds can be set on a per-managed-system, per-class, or per-system basis in the nagios_vars.ini file. These values are specific to the site and depend on site load.

If the load average thresholds are reasonable, monitor for excessive activity on the managed system.
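
The threshold comparison described above can be sketched in shell. This is only an illustration of the idea, not how the Nagios plug-in is implemented: the classify_load function name and the 4.0 and 8.0 thresholds are hypothetical, not values taken from nagios_vars.ini.

```shell
# Classify a load average against warning and critical thresholds.
# classify_load and the thresholds passed to it are hypothetical;
# they are not defaults from nagios_vars.ini.
classify_load() {
    # $1 = load average, $2 = warning threshold, $3 = critical threshold
    awk -v l="$1" -v w="$2" -v c="$3" 'BEGIN {
        if (l + 0 >= c + 0)      print "CRITICAL"
        else if (l + 0 >= w + 0) print "WARNING"
        else                     print "OK"
    }'
}

# Read the 1-, 5-, and 15-minute load averages on a Linux system.
if [ -r /proc/loadavg ]; then
    read -r one five fifteen rest < /proc/loadavg
    echo "Load Ave: $one/$five/$fifteen -> $(classify_load "$one" 4.0 8.0)"
fi
```

Running this directly on a managed system gives a quick view of whether the load is sustained (all three values high) or a short spike (only the 1-minute value high).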

Service: Nagios Monitor

Status Information: Nagios status information

Typically shows the status of Nagios, the number of Nagios services located, and the last time the Nagios status log was updated.

A warning or critical message indicates that one or more of the Nagios monitor processes either failed or reported error conditions that can degrade monitoring.

Ensure that the managed system can communicate with the CMS.

Service: Nodeinfo

Status Information: Node process status total/user/zombie, uptime

Displays the total number of processes, the number of user processes, the number of zombie processes, and the uptime for the Nagios host.

A warning or critical message indicates that thresholds for the specific managed system were exceeded.

Thresholds can be set on a per-managed-system, per-class, or per-system basis in the nagios_vars.ini file. These values are specific to the site and depend on site load.

If thresholds are reasonable, monitor for excessive activity on the managed system.
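
To look for excessive activity by hand, the process counts can be approximated on the managed system with ps. This is a rough sketch that assumes a Linux procps-style ps; the count_states function is hypothetical, and it reports only the total and zombie counts, not the user process count that the Nodeinfo service shows.

```shell
# Count total and zombie processes from a list of process state codes,
# one per line on standard input. count_states is a hypothetical helper.
count_states() {
    awk '{ total++ }
         $1 ~ /^Z/ { zombie++ }
         END { printf "%d/%d\n", total, zombie }'
}

# "ps -e -o stat=" prints one state code per process with no header;
# a code that starts with Z marks a zombie process.
if command -v ps >/dev/null 2>&1; then
    echo "total/zombie: $(ps -e -o stat= | count_states)"
fi
```

A zombie count that keeps growing usually points at a parent process that is not reaping its children, which is worth checking before changing any thresholds.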

Service: Supermon Metrics Monitor

Status Information: Supermon node metrics retrieval status

Reports the status of the Supermon service and the number of systems from which it collected metrics data.

A warning or critical message indicates that one or more systems were not accessible during metrics collection or that the Nagios service_check_timeout interval expired.

These messages can occur if metrics collection cannot be completed within the time-out interval; examine the /opt/hptc/nagios/etc/nagios.cfg file for the value of the service_check_timeout parameter.

The default works best for configurations with fewer than 256 managed systems.

For configurations with more managed systems, increase the value of the service_check_timeout parameter to solve the problem.
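
One way to examine the current setting is to extract it from the configuration file. The following is a minimal sketch, assuming the parameter appears in the usual Nagios name=value form; the get_nagios_param function is a hypothetical helper, not a command supplied with the product.

```shell
# Print the value of a name=value parameter from a Nagios configuration file.
# get_nagios_param is a hypothetical helper. It assumes the value itself
# contains no "=" character, which holds for numeric settings.
get_nagios_param() {
    # $1 = configuration file, $2 = parameter name
    awk -F= -v key="$2" '{
        k = $1
        gsub(/^[ \t]+|[ \t]+$/, "", k)      # trim spaces around the name
        if (k == key) {
            v = $2
            gsub(/^[ \t]+|[ \t]+$/, "", v)  # trim spaces around the value
            print v
        }
    }' "$1"
}

cfg=/opt/hptc/nagios/etc/nagios.cfg
if [ -r "$cfg" ]; then
    echo "service_check_timeout=$(get_nagios_param "$cfg" service_check_timeout)"
fi
```

Comment lines are skipped implicitly because their first field never matches the parameter name exactly.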

Also, run the following command to verify that the supermond service is running on the CMS:

# /etc/init.d/supermond status

Loss or time-outs of this service can cause per-managed-system warnings for nodeinfo, load average, and system free space.

A non-timeout warning or critical message indicates that some monitored managed systems are not responding; this is normal if the managed systems are down or otherwise disabled.

Service: Syslog Alert Monitor

Status Information: Status of consolidated.log syslog monitoring

25.14 Nagios Troubleshooting 241