HP Insight Management Agents 9.10 Managing ProLiant Servers with Linux HOW TO Whitepaper

If the normal operating range is exceeded for any of these sensors, the Health Monitor does the
following:
Displays a message on the console stating the problem
Makes an entry in the system health log and the operating system log
Additionally, on some servers, the fans gradually increase to full speed as the external environment
temperature rises, in an attempt to cool the server. If the server temperature exceeds the normal
operating range and the server does not cool down within 60 seconds, in most cases the operating
system is shut down so that the file systems are closed.
TIP: On servers that do not have variable speed fans, the server is shut down unless the
ROM-Based Setup Utility (RBSU) Thermal Shutdown feature is disabled. This feature is enabled by
default. Use RBSU to control the shutdown option.
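The thermal response described above behaves like a small state machine. On the first out-of-range reading, the Health Monitor warns and logs. If the temperature is still out of range after the 60-second grace period, it shuts the system down. The following Python sketch models that logic under stated assumptions. The class name, the `max_temp_c` threshold, and the action strings are hypothetical. The RBSU Thermal Shutdown setting is reduced to a single on/off flag, and the gradual fan speed-up is not modeled. This is not the Health Monitor's actual implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional

GRACE_PERIOD_S = 60  # whitepaper: shut down if not cooled within 60 seconds


@dataclass
class ThermalMonitor:
    """Hypothetical model of the thermal response; not HP's implementation."""
    max_temp_c: float                      # assumed upper limit of the normal range
    thermal_shutdown_enabled: bool = True  # mirrors RBSU Thermal Shutdown (default: on)
    actions: List[str] = field(default_factory=list, init=False)
    _over_since: Optional[float] = field(default=None, init=False)
    _shut_down: bool = field(default=False, init=False)

    def sample(self, now_s: float, temp_c: float) -> None:
        """Process one temperature reading taken at time now_s (seconds)."""
        if self._shut_down:
            return
        if temp_c <= self.max_temp_c:
            self._over_since = None  # back in range: cancel the pending shutdown
            return
        if self._over_since is None:
            # First out-of-range reading: warn on the console and log it.
            self._over_since = now_s
            self.actions.append("console: temperature outside normal range")
            self.actions.append("log: system health log and OS log")
        elif (self.thermal_shutdown_enabled
              and now_s - self._over_since >= GRACE_PERIOD_S):
            # Still too hot after the grace period: shut down to close file systems.
            self.actions.append("shutdown: operating system halted")
            self._shut_down = True


monitor = ThermalMonitor(max_temp_c=70.0)
for t, temp in [(0, 65.0), (10, 72.0), (40, 74.0), (75, 73.0)]:
    monitor.sample(t, temp)
print(monitor.actions)
```

The reading at 75 seconds is 65 seconds after the first out-of-range reading, so it triggers the shutdown. A reading back inside the range resets the timer.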
System fan monitoring
A ProLiant server can contain fan sensors. On ProLiant servers with intelligent fan sensors, check
the status of the fans by running hplog -f.
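To collect the same status from a script, you can call `hplog` through Python's `subprocess` module. This minimal sketch assumes that `hplog` is on the PATH; it may also require root privileges. The function returns the raw text output without parsing it, because this document does not describe the output format. The `hplog_status` function and its `runner` parameter are hypothetical. The `runner` parameter exists only so the function can be exercised without the tool installed.

```python
import subprocess
from typing import Callable

# The two hplog options named in this whitepaper.
HPLOG_OPTIONS = {"fans": "-f", "power": "-p"}


def hplog_status(component: str,
                 runner: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> str:
    """Run `hplog -f` or `hplog -p` and return its unparsed standard output."""
    option = HPLOG_OPTIONS[component]  # raises KeyError for unknown components
    # check=True raises CalledProcessError if hplog exits with a nonzero status.
    result = runner(["hplog", option], capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `hplog_status("fans")` runs `hplog -f`, and `hplog_status("power")` runs `hplog -p`.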
If a cooling fan fails and there is no secondary or redundant fan, the Health Monitor does the following:
Displays a message on the console stating the problem
Makes an entry in the system health log and the operating system log
Optionally shuts down the system to avoid hardware damage. Use RBSU to control the
shutdown option.
If a secondary or redundant fan is present when a fan fails, the Health Monitor does the following:
Activates the redundant fan if it is not already running
Displays a message on the console stating the problem
Makes an entry in the system health log and the operating system log
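The two fan-failure cases differ only in whether a backup fan can take over. The following minimal Python sketch shows that decision. The function and parameter names are hypothetical, and strings stand in for the console message, the log entries, and the shutdown. The `shutdown_enabled` flag represents the RBSU shutdown option.

```python
from typing import List


def on_fan_failure(has_redundant_fan: bool,
                   redundant_fan_running: bool = False,
                   shutdown_enabled: bool = True) -> List[str]:
    """Hypothetical sketch of the Health Monitor's response to a failed fan."""
    actions: List[str] = []
    if has_redundant_fan and not redundant_fan_running:
        actions.append("activate redundant fan")
    actions.append("console: fan failure")
    actions.append("log: system health log and OS log")
    if not has_redundant_fan and shutdown_enabled:
        # No backup cooling: optional shutdown, controlled through RBSU.
        actions.append("shutdown: avoid hardware damage")
    return actions
```

With a redundant fan present, the system keeps running and only reports the failure. Without one, the shutdown depends on the RBSU setting.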
Monitoring the system fault-tolerant power supply
If the server contains redundant power supplies, the power load is shared equally between them. If a
primary power supply fails, the server automatically switches over to a backup power supply. Check
the status of the power supplies by running hplog -p. For power supplies, the Health Monitor does
the following:
Monitors the system for power failure and for the physical presence of power supplies
Reports changes in the load shared between the power supplies
Displays a message on the console stating the problem
Makes an entry in the system health log and the operating system log
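Equal load sharing and failover reduce to simple arithmetic. With two working supplies, each carries half the load. When one fails, the survivor carries all of it, which is the kind of shared-load change that the Health Monitor reports. The following Python sketch is hypothetical: the supply names, the wattage, and the report text are invented for illustration.

```python
from typing import Dict, List


def share_load(total_watts: float, supplies: Dict[str, bool]) -> Dict[str, float]:
    """Split total_watts equally across the supplies marked as working (True)."""
    working = [name for name, ok in supplies.items() if ok]
    if not working:
        raise RuntimeError("no working power supply")
    share = total_watts / len(working)
    return {name: (share if ok else 0.0) for name, ok in supplies.items()}


def load_changes(before: Dict[str, float], after: Dict[str, float]) -> List[str]:
    """Describe every supply whose share of the load changed."""
    return [f"{name}: {before[name]:.0f} W -> {after[name]:.0f} W"
            for name in before if before[name] != after[name]]


before = share_load(500.0, {"PS1": True, "PS2": True})   # 250 W each
after = share_load(500.0, {"PS1": False, "PS2": True})   # PS1 failed: PS2 carries 500 W
print(load_changes(before, after))  # ['PS1: 250 W -> 0 W', 'PS2: 250 W -> 500 W']
```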
ECC memory monitoring and advanced memory protection
If a correctable ECC memory error occurs, the Health Monitor logs the error in the health log,
including the memory address that caused the error. If too many errors occur at the same memory
location, the driver disables ECC error interrupts to avoid flooding the console with warnings. The
hardware continues to correct the ECC errors automatically.
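The throttling behavior can be modeled as a per-address counter that disables further error reporting once any address reaches a limit. This Python sketch is a hypothetical illustration. The whitepaper does not state the actual limit, so `max_per_address=3` is an assumption, and the class and message names are invented.

```python
from collections import Counter
from typing import List


class EccErrorLog:
    """Hypothetical model of correctable-ECC logging with interrupt throttling."""

    def __init__(self, max_per_address: int = 3):  # threshold is an assumption
        self.max_per_address = max_per_address
        self.counts: Counter = Counter()
        self.entries: List[str] = []
        self.interrupts_enabled = True

    def on_correctable_error(self, address: int) -> None:
        if not self.interrupts_enabled:
            return  # interrupts off: nothing is logged; hardware still corrects the error
        self.counts[address] += 1
        self.entries.append(f"correctable ECC error at {address:#x}")
        if self.counts[address] >= self.max_per_address:
            # Too many errors at one location: stop ECC error interrupts so the
            # console is not flooded with warnings.
            self.interrupts_enabled = False
            self.entries.append("ECC error interrupts disabled")


log = EccErrorLog(max_per_address=3)
for addr in [0x1000, 0x2000, 0x1000, 0x1000, 0x1000]:
    log.on_correctable_error(addr)
print(log.entries)
```

The third error at address 0x1000 disables the interrupts. The fourth error at that address is still corrected by the hardware but is no longer logged.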
On servers with Advanced Memory Protection (AMP), the driver attempts to log an error if a memory
board is inserted, removed, or incorrectly configured. Optionally, it also logs an error if an
Online Spare Switchover or Mirrored Memory engaged event occurs.
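The AMP logging rule separates events that are always logged from events that are logged only optionally. The following Python sketch expresses that split. The enum names are hypothetical labels for the events listed above. This document does not describe how the optional logging is configured, so `log_optional` is a stand-in flag.

```python
from enum import Enum, auto
from typing import Iterable, List


class AmpEvent(Enum):
    BOARD_INSERTED = auto()
    BOARD_REMOVED = auto()
    BOARD_MISCONFIGURED = auto()
    ONLINE_SPARE_SWITCHOVER = auto()
    MIRRORED_MEMORY_ENGAGED = auto()


# Board changes are always logged; the two protection events only optionally.
ALWAYS_LOGGED = {AmpEvent.BOARD_INSERTED, AmpEvent.BOARD_REMOVED,
                 AmpEvent.BOARD_MISCONFIGURED}
OPTIONALLY_LOGGED = {AmpEvent.ONLINE_SPARE_SWITCHOVER,
                     AmpEvent.MIRRORED_MEMORY_ENGAGED}


def events_to_log(events: Iterable[AmpEvent], log_optional: bool) -> List[AmpEvent]:
    """Return, in order, the events the driver would attempt to log."""
    return [e for e in events
            if e in ALWAYS_LOGGED or (log_optional and e in OPTIONALLY_LOGGED)]
```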