HP XC System Software Administration Guide Version 3.2

7 Monitoring the System

System monitoring can identify situations before they become problems. This chapter addresses the following topics:

• “Monitoring Tools” (page 87)

• “Monitoring Strategy” (page 88)

• “Displaying System Environment Data” (page 89)

• “Monitoring Disks” (page 90)

• “Displaying System Statistics” (page 90)

• “Logging Node Events” (page 92)

• “The collectl Utility” (page 94)

• “HP Graph” (page 97)

• “The resmon Utility” (page 101)

• “The netdump and crash Utilities” (page 102)

7.1 Monitoring Tools

Tools for monitoring the HP XC System Software include the following:

• Standard Linux monitoring commands:

— ps

— sar

— top

— uptime

— vmstat

— w

You can use these administrative commands from any node to determine the health of an individual node. Each command is documented in its corresponding manpage.

• Utilities developed by HP:

— The collectl utility. See “The collectl Utility” (page 94) for more information.

— The HP XC shownode metrics command, which can be issued from any node in the HP XC system, monitors the status of all the nodes in the system. The following arguments to the shownode metrics command report node status:

◦ shownode metrics cpus

◦ shownode metrics cputotals

◦ shownode metrics load

◦ shownode metrics mem

◦ shownode metrics paging

◦ shownode metrics sensors

◦ shownode metrics swap

For more information, see “Displaying System Statistics” (page 90) and shownode(8).

• Externally developed software:

— The Nagios Web-based utility displays a series of windows that provide system statistics.

Chapter 8 (page 105) discusses Nagios.

— Supermon is a highly scalable, high-speed cluster monitoring system. Supermon provides all required node statistics to the Nagios subsystem. System statistics are tiered, aggregated, and stored in the configuration and management database (CMDB).
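The standard Linux commands and the shownode metrics arguments listed above can be combined into a quick manual check. The following is a minimal sketch, not an HP-supplied script: the function names run_if_present, node_snapshot, and cluster_metrics are hypothetical, and the options passed to vmstat and ps assume the procps versions of those commands found on typical Linux systems. The sketch assumes shownode is on the PATH of an HP XC node and prints its output unmodified.

```shell
#!/bin/sh
# Sketch only: run_if_present, node_snapshot, and cluster_metrics are
# hypothetical helper names, not part of the HP XC software.

# Run a command if its executable is on PATH; otherwise report it as missing.
run_if_present() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "== $* =="
        "$@" || echo "($1 exited with status $?)"
    else
        echo "== $1: not installed =="
    fi
}

# Health of the local node, using standard Linux monitoring commands.
node_snapshot() {
    run_if_present uptime
    run_if_present w
    run_if_present vmstat 1 2      # two samples, one second apart
    # Sort by CPU use (procps option) and keep only the top of the list.
    run_if_present ps -eo pid,user,pcpu,pmem,comm --sort=-pcpu | head -n 6
}

# Status of all nodes, one shownode metrics argument at a time.
cluster_metrics() {
    if ! command -v shownode >/dev/null 2>&1; then
        echo "shownode not found; run this on an HP XC node" >&2
        return 1
    fi
    for metric in cpus cputotals load mem paging sensors swap; do
        echo "== shownode metrics $metric =="
        shownode metrics "$metric"
    done
}
```

After sourcing this file, node_snapshot reports on the node where it runs, and cluster_metrics steps through each shownode metrics argument in turn. Guarding each command with command -v keeps the sketch usable on nodes where a tool such as vmstat is not installed.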
