HP XC System Software Administration Guide Version 3.2

By default, the collectl service gathers information on the following subsystems:

• CPU

• Disk

• Inode and file system

• Lustre file system

• Memory

• Networks

• Sockets

• TCP

• Interconnect

The collectl(1) manpage discusses running the collectl utility as a service.

7.7.3 Running the collectl Utility in a Batch Job Submission

You can run the collectl utility as one job in a batch job submission. In this case, the purpose of the collectl utility is to monitor the nodes while the batch job runs. Modify the job submission script as follows:

1. Determine the node or nodes on which to run the collectl utility.

2. Decide which options you need. Typically, you specify the following:

• The output file, specified with the -f option.

• The subsystem data to collect, specified with the -s option. The subsystems include the following:

— CPU

— Disk

— Inode and File System

— Interconnect

— Memory

— Networks

— NFS V3 data

— TCP

• The number of seconds in the sampling interval, specified with the -i option.

3. Start the collectl utility on each node with the ssh utility.

Be sure to run the collectl utility in the background so that the script does not hang while waiting for the collectl utility to complete.

Record the process ID of the collectl utility on each node.

Allow 5 to 10 seconds for the collectl utility to start and quiesce.

4. Start the batch job and allow it to complete.

5. When the batch job completes, stop the collectl process on each node by killing its process ID. The collectl process traps the SIGINT signal and shuts down cleanly.

6. Copy the files that the collectl process created on each node's disk to a separate location for later review.

After the copy succeeds, delete the original files that the collectl process created on the node.
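The steps above can be sketched as a job script. This is a minimal illustration, not a supported procedure: the names NODES, OUTDIR, PIDFILE, ARCHIVE, and JOB, the node names, and the srun command line are hypothetical, and the subsystem letters passed to -s are assumptions to verify against the collectl(1) manpage. The sketch also assumes that ssh and scp work without a password between nodes. It prints the commands by default and runs them only when DRY_RUN is set to 0.

```shell
#!/bin/sh
# Sketch of a job script that wraps a batch job with collectl monitoring.
# Hypothetical names (not from this guide): NODES, OUTDIR, PIDFILE, ARCHIVE, JOB.
# By default the commands are only printed; set DRY_RUN=0 to execute them.

NODES=${NODES:-"n1 n2"}                   # step 1: nodes to monitor
OUTDIR=${OUTDIR:-/tmp/collectl}           # node-local directory for output files
PIDFILE=${PIDFILE:-/tmp/collectl.pid}     # file in which each node records the PID
ARCHIVE=${ARCHIVE:-$HOME/collectl-data}   # separate location for later review
JOB=${JOB:-"srun ./my_app"}               # the batch job itself
DRY_RUN=${DRY_RUN:-1}

# Step 2: output file (-f), subsystems (-s), sampling interval in seconds (-i).
# Assumption: the letters c, d, m, n select CPU, disk, memory, and network.
COLLECTL="collectl -f $OUTDIR/ -s cdmn -i 10"

# Print a command in dry-run mode; otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Step 3: start collectl in the background on each node and record its PID.
start_collectl() {
    for node in $NODES; do
        run ssh "$node" "mkdir -p $OUTDIR; nohup $COLLECTL >/dev/null 2>&1 & echo \$! > $PIDFILE"
    done
    run sleep 10    # allow collectl time to start and quiesce
}

# Step 5: send SIGINT to the recorded PID on each node.
stop_collectl() {
    for node in $NODES; do
        run ssh "$node" "kill -INT \$(cat $PIDFILE)"
    done
}

# Step 6: copy the output files to the archive, then delete them on the node.
save_data() {
    for node in $NODES; do
        run mkdir -p "$ARCHIVE/$node"
        run scp "$node:$OUTDIR/*" "$ARCHIVE/$node/"
        run ssh "$node" "rm -f $OUTDIR/* $PIDFILE"
    done
}

start_collectl
run $JOB            # step 4: run the batch job and wait for it to complete
stop_collectl
save_data
```

Recording the process ID in a file on each node is one design choice; capturing it in the job script from the output of the ssh command would work equally well. Review the printed commands before running the script with DRY_RUN=0.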

Alternatively, you can log in to one of the compute nodes used by the application and run the collectl utility from the command line.
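As an illustration of the interactive approach, the following sketch defines a small helper to run after logging in to the compute node. The function name monitor_node and its default values are hypothetical, and the subsystem letters are assumptions to check against the collectl(1) manpage.

```shell
#!/bin/sh
# Hypothetical helper for interactive monitoring on a compute node.
# Arguments: subsystem letters for -s (default cmn, assumed to mean CPU,
# memory, and network) and the sampling interval in seconds for -i (default 5).
monitor_node() {
    if ! command -v collectl >/dev/null 2>&1; then
        echo "collectl is not installed on this node" >&2
        return 1
    fi
    # Without the -f option, collectl writes its samples to the terminal.
    collectl -s "${1:-cmn}" -i "${2:-5}"
}
```

For example, monitor_node cd 2 would sample the CPU and disk subsystems every 2 seconds under these assumptions. Press Ctrl-C to stop the collection.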
