HP Insight Cluster Management Utility v7.1 User Guide

Abstract

This guide describes how to install, configure, and use HP Insight Cluster Management Utility (CMU) v7.1 on HP systems. HP Insight CMU is software dedicated to the administration of HPC and large Linux clusters. This guide is intended primarily for administrators who install and manage a large collection of systems.
© Copyright 2013 Hewlett-Packard Development Company, L.P. Confidential computer software. Valid license from HP required for possession, use or copying. Consistent with FAR 12.211 and 12.212, Commercial Computer Software, Computer Software Documentation, and Technical Data for Commercial Items are licensed to the U.S. Government under vendor's standard commercial license. The information contained herein is subject to change without notice.
Contents

1 Overview .... 11
1.1 Features .... 11
1.1.1 Compute node monitoring .... 11
1.1.2 HP Insight CMU configuration .... 11
1.1.
2.5.5 Installing the HP Insight CMU v7.1 package .... 31
2.5.6 Restoring the HP Insight CMU configuration .... 31
2.5.7 Starting HP Insight CMU .... 31
2.5.8 Deploying the monitoring client .... 31
2.5.9 Deploying cmugui.
4.6 Rescan MAC .... 53
4.7 HP Insight CMU image editor .... 54
4.7.1 Expanding an image .... 54
4.7.2 Modifying an image .... 55
4.
5.5.2 Actions .... 78
5.5.3 Alerts .... 79
5.5.4 Alert reactions .... 79
5.5.5 Modifying the sensors, alerts, and alert reactions monitored by HP Insight CMU .... 80
5.5.
.2.2 Delete diskless image .... 115
7.2.3 Configure diskless node .... 116
7.2.4 Unconfigure diskless node .... 116
7.2.5 Boot diskless node .... 116
7.2.
cmu_add_network_entity(8) .... 148
cmu_add_logical_group(8) .... 149
cmu_add_to_logical_group_candidates(8) .... 150
cmu_add_user_group(8)
Figures

1 Typical HPC cluster .... 13
2 iLO server power controls .... 16
3 NIC2 on the SL2x170z G6 Server
53 User group management .... 99
54 Certificate error .... 129
55 Java control panel .... 129
56 HP Insight CMU GUI
1 Overview

HP Insight Cluster Management Utility (CMU) is a collection of tools that manage and monitor a large group of computer nodes, specifically HPC and large Linux clusters. You can use HP Insight CMU to lower the total cost of ownership (TCO) of this architecture. HP Insight CMU helps manage, install, and monitor the compute nodes of your cluster from a single interface. You can access this utility through a GUI or a CLI. 1.
• Managing the system images stored by HP Insight CMU
• Configuring actions performed when a node status changes, such as displaying a warning, executing a command, or sending an email
• Exporting the HP Insight CMU node list to a simple text file for reuse by other applications
• Importing nodes from a simple text file into the HP Insight CMU database

1.1.3 Compute node administration

The HP Insight CMU GUI and CLI enable you to perform actions on any number of selected compute nodes.
2 Installing and upgrading HP Insight CMU

2.1 Installing HP Insight CMU

A typical HP Insight CMU cluster contains three kinds of nodes. Figure 1 (page 13) shows a typical HPC cluster.

• The management node is the central point that connects all the compute nodes and the GUI clients. Installation, management, and monitoring are performed from the management node. The package cmu-v7.1-1.x86_64.rpm must be installed on the management node. All HP Insight CMU files are installed under the /opt/cmu directory.
2.1.2 Planning for compute node installation Two IP addresses are required for each compute node. • Determine the IP address for the management card (iLO) on the management network. • Determine the IP address for the NIC on the administration network. HP recommends assigning contiguous ranges of static addresses for nodes located in the same rack. This method eases the discovery of the nodes and makes the cluster management more convenient.
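The addressing recommendation above can be planned with a short script. The following is a minimal sketch, not an HP Insight CMU tool: the helper name ip_range, the cn node-name prefix, and the 192.168.1 addresses are assumptions chosen for illustration. It prints one node name and one address per line for a rack whose nodes occupy a contiguous block of a /24 network.

```shell
#!/bin/bash
# Sketch: print a contiguous block of addresses for the nodes of one rack.
# Hypothetical helper; the "cn" prefix and the addresses are illustrative only.

# ip_range PREFIX FIRST_HOST COUNT
#   PREFIX      first three octets, for example 192.168.1
#   FIRST_HOST  last octet assigned to the first node
#   COUNT       number of nodes in the rack
ip_range() {
    local prefix="$1" first="$2" count="$3" i=0 last
    last=$((first + count - 1))
    # Refuse ranges that leave the usable host part of a /24 network.
    if [ "$first" -lt 1 ] || [ "$last" -gt 254 ]; then
        echo "range $first-$last does not fit in $prefix.0/24" >&2
        return 1
    fi
    while [ "$i" -lt "$count" ]; do
        printf 'cn%03d %s.%d\n' "$((i + 1))" "$prefix" "$((first + i))"
        i=$((i + 1))
    done
}

# Example: four nodes starting at 192.168.1.11.
ip_range 192.168.1 11 4
```

The same helper can be run once for the administration network and once for the management (iLO) network, so that both address ranges of a rack stay aligned.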
NOTE: On Blade servers, to configure the IP addresses on the iLO cards, you can use the EBIPA on the OA. For instructions, see “Configuring iLO cards from the OA: Blades only” (page 16). NOTE: Blade servers do not use the Single Sign-On capability. You must configure each Blade individually and create the same username and password. For instructions, see “Disabling server automatic power on: Blades only” (page 16). 2.1.
2.1.7.1.2 Configuring iLO cards from the OA: Blades only Use the EBIPA to assign consecutive addresses to the iLO: • 16 addresses on the c7000 Enclosure • 8 addresses on the c3000 Enclosure To configure the iLO cards: 1. Open a browser to the OA. 2. In the right window, select Device Bays. 3. Select Bay 1. 4. In the left window, select the Enclosure Setting tab and then Enclosure Bay IP Addressing. 5. Enter the IP address of the first iLO card. 6. Click Auto Fill or the red arrow.
NOTE: These IDE settings only apply to the DL160 G5 Server.

• IPMI
  ◦ Serial Port assigned to: System
  ◦ Serial Port Switching: Disabled
  ◦ Serial Port Connection Mode: Direct
• LAN
  ◦ Share NIC mode: Disabled
  ◦ DHCP: Disabled
• Remote Access
  ◦ Remote access: Enabled
  ◦ Redirection: Always
  ◦ Terminal: VT100
• Boot Configuration
  ◦ Boot Order:
    1. Embedded NIC
    2. Disk or smart array
  ◦ Embedded NIC1: Enabled

2.1.7.
2.1.7.4 SL2x170z G6 and DL170h G6 Servers BIOS setting IMPORTANT: To enable BIOS updates, you must restart the server. You can restart the server with Ctrl+Alt+Delete immediately after leaving the BIOS, or you can physically restart the server by using the power switch on the server.
Otherwise, if your node is wired with a dedicated management port for LO100i:

  ◦ BMC NIC Allocation: Dedicated
  ◦ LAN protocol: HTTP, telnet, ping Enabled
• Remote Access
  ◦ BIOS Serial console: Enabled
  ◦ EMS console support: Enabled
  ◦ Flow control: None
  ◦ Redirection after BIOS POST: Enabled
  ◦ Serial port: 9600 8,n,1
• Boot device priority
  ◦ Network ( 0500 )
  ◦ Removable device
  ◦ Hard Disk
• Enable PXE for the NIC that is connected to the administration network.

2.
2.2.3 Operating system support

HP Insight CMU software is generally supported on Red Hat Enterprise Linux (RHEL) 5 and 6, and SUSE Linux Enterprise Server (SLES) 10 and 11. The HP Insight CMU diskless environment is supported on RHEL5, RHEL6, SLES10, and SLES11. Ubuntu is supported on the compute nodes only, on HP Ubuntu certified servers. Contact HP for support. Debian is supported on the compute nodes only, but requires active approval and verification from HP. Contact HP for support.
Table 1 Directory structure (continued)

Subdirectory    Contents
Documentation   Documentation and release notes
Licenses        Contains the following licenses: Apache_LICENSE-2_0.txt, gluegen_LICENSE.txt, jogl_LICENSE.txt. Also contains system-config-netboot-legalnotice.html

2.2.5 HP Insight CMU installation checklist

The following list summarizes the steps needed to install HP Insight CMU on your HPC cluster:

Preparing the management node:

1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16.
2.3 Installation procedures

1. Perform a full installation of your base OS on the management node.
2. HP Insight CMU depends on Oracle Java version 1.6 update 26 or later. Upgrade the Java JVMs to version 1.6u26 or later on both the management node and the clients running the GUI to avoid security problems with the remote file browser (used by the cmu_pdcp and autoinstall GUI dialogs). Only the Java Runtime Environment (JRE) is required. To download a supported JRE, go to: http://www.oracle.
9. Install HP Insight CMU on the GUI client workstation. For details, see “Installing HP Insight CMU on the GUI client workstation” (page 135).

2.4 Installing HP Insight CMU with high availability

If you are not using HP Insight CMU with high availability (HA), skip this section and go to the instructions on configuring the cluster in “Defining a cluster with HP Insight CMU” (page 32). A “classic” HP Insight CMU cluster has a single management server.
The next figure shows a “classic” HP Insight CMU cluster with one HP Insight CMU management server and compute nodes connected directly to the site network. A unique IP address IP0 is used for compute node management and site network access. The next figure shows the corresponding configuration with two HP Insight CMU management servers running HP Insight CMU software in active or standby mode under control of HA software. The address IP0 is attached to the server running the HP Insight CMU software.
2.4.1 HA hardware requirements The hardware requirements for HP Insight CMU under HA control are: • Two or more management servers. • One shared storage accessed by both servers. 2.4.2 Software prerequisites In addition to the prerequisites described in “Preparing for installation” (page 19), you must install and configure the HA software of your choice. 2.4.3 Installing HP Insight CMU under HA 2.4.3.
2.4.3.2 HP Insight CMU HA service requirements When you configure the HA software layer, configure the HP Insight CMU HA service with the following resources: • A shared file system. The mount point of this file system must be /opt/cmu-store and must be created on all HP Insight CMU management servers. • A shared IP address. • If your HP Insight CMU cluster uses separate site and compute networks, an additional IP address resource must be configured and assigned to your HP Insight CMU HA service.
* it must support locking via flock() *
* it must be mounted only by one (active) cmu mgt node at a time *
* it must be NFS exportable (for kickstart/diskless/backup/cloning) *
* *
* 2] (at least) one alias IP address: *
* *
* this is the address used by the compute nodes to contact the mgt *
* service, set CMU_CLUSTER_IP into /opt/cmu/etc/cmuserver.
cmu ha:cmu service needs (re)start

This command does not actually start HP Insight CMU. It only clears the audit mode to enable HP Insight CMU to be started by the HA tool.

7. Run the appropriate command for your HA software to start HP Insight CMU.
8. To verify that HP Insight CMU is still running correctly, review the /var/log/cmuservice_hostname.log file for errors.
9. Install and configure HP Insight CMU on additional management cluster members.
cmuadmin1 cmuadmin2

e. Unset the audit mode on the new member:
# /etc/init.d/cmu unset_audit
cmu ha:cmu service needs (re)start
f. Start HP Insight CMU under HA control.
g. Use your HA tool to migrate the HP Insight CMU HA service to the new member.

2.4.
12. Restore the cluster-wide configuration on server 1. 13. Unset the audit mode on server 1. 14. Using the appropriate command for your HA software, restart the HP Insight CMU HA service. 2.5 Upgrading HP Insight CMU Complete the steps in this section if you are upgrading an existing HP Insight CMU system from a previous HP Insight CMU version. 2.5.1 Dependencies 2.5.1.1 64-bit versions on management node As of V7.0, HP Insight CMU is an x86 64-bit kit only and can no longer run on x86 32-bit hardware.
2.5.5 Installing the HP Insight CMU v7.1 package

For more information about installing the HP Insight CMU v7.1 package, see “Installation procedures” (page 22).

2.5.6 Restoring the HP Insight CMU configuration

If you have a pre-existing HP Insight CMU installation, you must restore your HP Insight CMU cluster configuration:

# /opt/cmu/tools/restoreConfig -f /opt/cmu/etc/savedConfig/cmuconf##.sav

HP Insight CMU v7.1 provides new features in the monitoring file /opt/cmu/etc/ActionAndAlertFile.txt.
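For the restore command in this section, the file name contains a ## placeholder. The following minimal sketch assumes that the file to restore is the most recently modified cmuconf*.sav file in the saved configuration directory; that selection rule and the helper name latest_saved_config are assumptions, not documented behavior. The script only prints the restoreConfig command so that the file name can be checked before anything is run.

```shell
#!/bin/bash
# Sketch: print the restoreConfig command for the newest saved configuration.
# Assumption: the file to restore is the most recently modified cmuconf*.sav.

# latest_saved_config DIR -> prints the path of the newest cmuconf*.sav in DIR
latest_saved_config() {
    local dir="$1" f newest=""
    for f in "$dir"/cmuconf*.sav; do
        [ -e "$f" ] || continue          # the glob matched nothing
        if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
            newest="$f"
        fi
    done
    [ -n "$newest" ] || return 1
    printf '%s\n' "$newest"
}

saved_dir="${SAVED_DIR:-/opt/cmu/etc/savedConfig}"
if sav=$(latest_saved_config "$saved_dir"); then
    # Only print the command; run it by hand after checking the file name.
    echo "/opt/cmu/tools/restoreConfig -f $sav"
else
    echo "no saved configuration found in $saved_dir" >&2
fi
```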
3 Defining a cluster with HP Insight CMU 3.1 HP Insight CMU service status Obtain the status of all HP Insight CMU service components with the following command on the management node: # /etc/init.d/cmu status HP Insight CMU must be properly configured before using the GUI. Ensure that the core and java services report configured. 3.2 Launching the HP Insight CMU GUI The HP Insight CMU GUI can be used from any workstation connected through the network to the cluster management node.
Figure 4 (page 32) contains four main areas: • The top bar allows you to perform configuration commands. • The left frame lists resources such as Network Entities, Logical Groups, Nodes Definitions, etc. The '+' expands a resource. If HP Insight CMU cluster configuration commands have not yet been entered, most resources are empty. • A filter allows you to show specific resources. • The central frame displays the global cluster view.
NOTE: If the Display Number field is empty, verify that you started your X server and that your firewall allows X traffic. 3.3 High-level checklist for building an HP Insight CMU cluster After HP Insight CMU is installed and running on the management node, the rest of the cluster can be configured as follows: 1. Start HP Insight CMU on the management node. 2. Start the GUI client on the GUI workstation. 3. Scan the compute nodes. 4. Create the network entities.
3.4.1 Node management Figure 7 Node management window In Figure 7 (page 35), the node list of the cluster will appear as the node database is populated by adding, scanning, or importing nodes.
3.4.1.1 Scanning nodes Cluster Administration→Node Management→Scan Node The HP Insight CMU Node Management component provides the capability to scan new nodes into the HP Insight CMU database. You can also manually add node information. Use this interface to scan nodes in the HP Insight CMU database to retrieve hardware addresses and configure IP addresses. The HP Insight CMU database is updated with the new nodes. Enter parameters in the initial Scan Node dialog box.
NOTE: This is necessary only for the first scan operation. For subsequent scans, the Management card password window will not be displayed.

Figure 9 Management card password window

4. The Scan Node Result window appears. Figure 10 (page 37)
5. Choose whether to add or replace the scanned nodes.

Figure 10 Scan node result

3.4.1.2 Adding nodes

Cluster Administration→Node Management→Add Node

Use this interface to add a new node to the HP Insight CMU database. 3.
Figure 11 Add node dialog

At the Node Dialog box:
1. Click OK. A dialog box confirms that the node was added successfully.
2. Click OK. A dialog box asks if you want to add another node.

NOTE: When you add a node, include it in a network entity using the Network Entity Management utility.

The newly added nodes appear in the node list.

Figure 12 Populated database node management window

3.4.1.
To modify the attributes of a node, select the node in the Node Management list, and then select Modify Node. The same interface as Add Node appears. NOTE: The node name cannot be changed. 3.4.1.4 Importing nodes Cluster Administration→Node Management→Import Node To import nodes from a flat text file, select an existing text file and then click Open to import all the nodes from this file into the HP Insight CMU database. The following is a sample import/export file: cn001 cn002 cn003 cn004 cn005 16.16.
You can use the Network Entity Management window to add and delete network entities. To perform tasks by using the Network Entity Management option, click Cluster Administration and then select Network Entity Management. 3.4.2.1 Adding network entities NOTE: The cloning process does not clone nodes that are not assigned to a network entity. Figure 13 Network entity management 1. Specify the name of the network entity to create. The length is limited to 15 characters.
4 Provisioning a cluster with HP Insight CMU 4.1 Logical group management A logical group in HP Insight CMU represents a disk image that has been captured (backed up). Each logical group is associated with a single backup image. The logical group must contain the nodes with good hardware configurations that can be cloned with this image. The Logical Group Management window is used to add, modify, delete, or rename logical groups.
• For the first smart array logical drive on ProLiant servers, use cciss/c0d0.

IMPORTANT: For RHEL6, the smart array device name depends on the smart array controller. For additional information, see “HP Smart Array warning with RHEL6 and future Linux releases” (page 20).

4. Click OK.
5. To add nodes to the logical group, on the top bar click Cluster Administration→Logical Group Management→Manage logical group. The following window appears.

Figure 16 Logical group management

6.
4.2 Autoinstall The HP Insight CMU kickstart functionality is renamed autoinstall. HP Insight CMU autoinstall provides the following improvements: • Adds support for SLES and Debian • Enables automated compute node installations from software distribution repositories available on the HP Insight CMU administration node 4.2.1 Autoinstall requirements • Autoinstall repository—The operating system distribution repository must be copied to the HP Insight CMU management node and NFS exported.
4.2.4 Using autoinstall from GUI 4.2.4.1 Enabling autoinstall By default, the HP Insight CMU GUI does not display the autoinstall buttons. To enable this functionality: 1. In /opt/cmu/etc/cmuserver.conf, change the line: CMU_KS=false to CMU_KS=true 2. Restart cmuserver: # /etc/init.d/cmu restart When the HP Insight CMU GUI is launched, click Cluster Administration→Logical Group Management to view the additional option Create an Auto Install Logical Group. Figure 17 Logical group management autoinstall 4.
Figure 18 New autoinstall logical group After the autoinstall logical group is created, the HP Insight CMU image directory contains a new directory with the name of the logical group. This directory contains: • autoinst.tmpl.orig—An exact copy of the autoinstall file. • repository—A logical link to the autoinstall repository. For example: # ls -l /opt/cmu/image/rh5u5_autoinstall/ total 4 -rw-r--r-- 1 root root 1313 Oct 11 13:48 autoinst.
NOTE: Autoinstall files and pxelinux files are created only if they do not already exist. This enables parameters to be customized for a node or group of nodes. For example, if you modified pxelinux-node1, successive launches of autoinstall will not modify your settings in that file. After creating the files previously described, HP Insight CMU network boots the requested compute node(s) and autoinstall functions as a normal Red Hat kickstart, SLES autoyast, or Debian preseed operation.
cmu> add_to_logical_group node1 to rh5u5_autoinst
selected nodes: node1
processing 1 node ...
cmu>

Or:

# /opt/cmu/bin/cmu_add_to_logical_group_candidates -t rh5u5_autoinstall node1 node2
processing 2 nodes...

4.2.5.3 Autoinstall compute nodes

To autoinstall a node, enter the following command at the cmucli prompt:

cmu> autoinstall "image" node1

For example:

cmu> autoinstall "rh5u5_autoinst" node1

or

/opt/cmu/bin/cmu_autoinstall_node -1 rh5u5_autoinst -f nodes.txt

Where nodes.
4.2.7 Restrictions This implementation contains the following restrictions: • The repository must be on the local storage of the management node. • The repository must be exported by NFS only. Do not use HTTP, Samba, or FTP. • Updates must be applied through autoinstall post installation scripts. • Only qualified distributions and updates are supported by HP. 4.
IMPORTANT: If partitions to be backed up are less than 50% empty, you must configure HP Insight CMU to use the tmpfs file system for cloning partitions. To make this functionality work, two conditions must be satisfied:
• The size of the largest partition to back up and clone must be smaller than or equal to the compute node memory size.
• Cloning using tmpfs must be enabled by setting CMU_CLONING_USE_TMPFS to yes in /opt/cmu/etc/cmuserver.conf and then restarting HP Insight CMU.
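The first condition is a simple size comparison. The following minimal sketch assumes that all sizes are expressed in kB and that the MemTotal value in /proc/meminfo is an acceptable stand-in for the compute node memory size; the helper name fits_in_memory and the partition sizes in the example are hypothetical. Run it on a compute node, not on the management node.

```shell
#!/bin/bash
# Sketch: check the first tmpfs cloning condition
# (largest partition size <= compute node memory size).
# All sizes are in kB; the function name is hypothetical.

# fits_in_memory MEM_KB PART_KB...
# Succeeds if the largest listed partition is no larger than MEM_KB.
fits_in_memory() {
    local mem_kb="$1" largest=0 size
    shift
    for size in "$@"; do
        if [ "$size" -gt "$largest" ]; then
            largest="$size"
        fi
    done
    [ "$largest" -le "$mem_kb" ]
}

# Memory size of the node this script runs on, in kB.
mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo 2>/dev/null)

# Illustrative partition sizes (kB); replace them with your own values.
if fits_in_memory "${mem_kb:-0}" 4194304 1048576; then
    echo "largest partition fits in memory: tmpfs cloning is possible"
else
    echo "largest partition exceeds memory: tmpfs cloning is not possible"
fi
```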
4.4 Cloning The HP Insight CMU cloning operation copies the complete contents of the golden image to other nodes. The copied image is the same except for two changes: • HP Insight CMU updates the hostname of the node. • HP Insight CMU updates the IP address of the network used for cloning. All other configurations remain the same. Node-specific configuration changes can be made with the HP Insight CMU reconf.sh script.
Figure 23 Cloning status When cloning is complete, a popup window displays the results. The correctly cloned compute nodes appear in the chosen logical group. The compute nodes that failed remain in the default logical group. The cloning feature duplicates the software installation configuration from an installed Linux system to systems with similar hardware configurations. This function eliminates the time-consuming task of system installation and configuration for each node in the cluster.
The default content of pre_reconf.sh is:

#!/bin/bash
#keep this version tag here
CMU_PRE_RECONF_VERSION=1
#starting from cmu version 4.2 this script is dedicated to custom code
#it is running at cloning time after netboot is done and before the
#filesystems or even the partitioning is created.
exit 0

4.4.2 Reconfiguration

During cloning, automatic reconfiguration is performed on each node.
# CMU_RCFG_IP = mgt network ip of this compute node
# CMU_RCFG_NTMSK = net mask
exit 0

4.5 Node static info

To collect static information such as system model, BIOS version, CPU model, speed, and memory size, from the contextual menu click Update→Get Node Static Info. Upon completion, static info is available by clicking on the Details tab.

Figure 24 Node static info

4.6 Rescan MAC

Use this command only if you must replace a failing node.
Figure 25 Rescan MAC 4.7 HP Insight CMU image editor An existing HP Insight CMU cloning image can be modified directly on the HP Insight CMU management node, without making the modifications on a golden node and backing up the system. Image editing involves three steps: 1. Use the cmu_image_open command to expand the image. 2. Make changes. 3. Use the cmu_image_commit command to save the image. 4.7.1 Expanding an image An HP Insight CMU cloning image is stored in /opt/cmu/image.
4.7.2 Modifying an image Modifications can consist of simple manual commands such as adding, removing, or modifying files. However, complex operations using chroot commands on the expanded image directory are also possible, such as installing a new rpm. IMPORTANT: When using chroot, HP recommends performing chroot mount /proc or chroot mount /sys in the image directory before executing other chroot commands.
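The recommendation above can be sketched as follows. The guide only names chroot mount /proc and chroot mount /sys, so the exact mount invocations, the unmount step, the helper name chroot_mount_cmds, and the image path are assumptions. The script prints the commands instead of running them, so that they can be reviewed first.

```shell
#!/bin/bash
# Sketch: print the commands that mount /proc and /sys inside an expanded image.
# Assumption: the argument is the expanded image directory (hypothetical path).

# chroot_mount_cmds IMAGE_DIR -> prints one command per line, executes nothing
chroot_mount_cmds() {
    local image_dir="$1"
    printf 'chroot %s mount -t proc proc /proc\n' "$image_dir"
    printf 'chroot %s mount -t sysfs sysfs /sys\n' "$image_dir"
    # Unmounting afterwards is a common precaution, not a step from this guide.
    printf 'chroot %s umount /sys\n' "$image_dir"
    printf 'chroot %s umount /proc\n' "$image_dir"
}

chroot_mount_cmds /path/to/expanded/image
```

Run the two mount commands before other chroot commands, such as an rpm installation, and the two umount commands when the modifications are complete.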
In the HP Insight CMU implementation, the compute nodes share the operating system on the HP Insight CMU management node. Each compute node has its own read/write directory hosted on the administration server. Each time a compute node starts, it mounts most of the operating system through NFS as read-only and its own directory through NFS as read/write. Each client has its own read/write directory so that one client cannot affect another.
user = root
server = /usr/sbin/in.tftpd
server_args = /tftpboot /opt/cmu/ntbt/tftp -v
per_source = 11
cps = 100 2
flags = IPv4
}

4. Restart xinetd to reload the TFTP configuration.
# /etc/init.d/xinetd restart

4.8.5 Activating the diskless feature

1. Edit /opt/cmu/etc/cmuserver.conf to activate CMU_DISKLESS:
#cmu diskless feature true/false
CMU_DISKLESS=true
2. Restart the HP Insight CMU server:
# /etc/init.d/cmu restart

4.8.
Figure 26 Adding a new logical group

3. Select the Diskless option to the right of the group name.

NOTE: If you cannot see the Diskless option, the diskless feature is not activated properly. To correct the error, see “Activating the diskless feature” (page 57).

4. Enter the new logical group name.

Figure 27 Naming a logical group

5. Enter the IP address of the golden node.
6. Click Get Kernel List. An ssh session is launched to your golden node to retrieve the list of kernels available on this node.
7. Select one of these kernels, and then click OK. The diskless image building process launches. This operation might last several minutes while files copy from all file systems of the golden node to build the diskless image. From the CLI 1. Start the HP Insight CMU CLI: # /opt/cmu/cmucli 2. To create the diskless group, you must know the IP address and the kernel name of the golden node used by the diskless nodes. To get the kernel name, use the probe_kernel command: cmu> probe_kernel 16.16.185.192 2.
4.8.10 Booting the compute nodes

From the GUI

1. Select the compute nodes you added to the diskless logical group.
2. Right-click to launch a boot command on these nodes.
3. Select network. The list of all the diskless images registered in HP Insight CMU appears. The cmu network image is also listed. The HP Insight CMU classic network boot image is used for cloning and backup.

Figure 29 Booting the compute nodes

4. In the list box, select your diskless image name, and then click OK.
4.8.12.2 Using reconf-diskless-image.sh The reconf-diskless-image.sh script is executed at the end of the image building process. This script contains any modifications to be applied in the read-only part of the image mounted by the nodes. To this script, add all the commands that you want to execute before the creation of the snapshot directories, such as the personalized read/write directory for each compute node. For example, you can customize the list of files to be copied into the snapshot directory.
#!/bin/bash
#cmu_begin_interface
#do not change anything in this section
#add custom code after this section
CMU_RECONF_DISKLESS_SNAPSHOT_VERSION=1
# starting with cmu version 4.
◦ The snapshot directories are not synchronized. The registration process copies the listed files into files and files.custom in the snapshot directory of each node. When modifying the root directory directly, you might change one of these files. Because the snapshot directory is not updated, the change does not affect the compute nodes.
◦ The golden node is not updated.
On SLES # chkconfig nfsserver on 3. Ensure that enough NFS daemons and threads are configured to handle the anticipated volume of NFS traffic. On Red Hat Set RPCNFSDCOUNT in the /etc/sysconfig/nfs file to the requested number of NFS daemons. By default, RPCNFSDCOUNT=8. On SLES Set USE_KERNEL_NFSD_NUMBER in the /etc/sysconfig/nfs file, which defaults to 4.
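Changing either setting means editing one KEY=VALUE line in /etc/sysconfig/nfs. The following minimal sketch shows one way to do that; the helper name set_conf_value and the value 16 are assumptions for illustration, and the example works on a scratch file instead of the real configuration file. Values containing the characters | or & are not handled.

```shell
#!/bin/bash
# Sketch: set KEY=VALUE in a sysconfig-style file, replacing an existing line
# (commented out or not) or appending a new one. Hypothetical helper.

# set_conf_value FILE KEY VALUE
set_conf_value() {
    local file="$1" key="$2" value="$3"
    if grep -Eq "^#?[[:space:]]*${key}=" "$file"; then
        # Rewrite every matching line in place; FILE.bak keeps the original.
        sed -i.bak -E "s|^#?[[:space:]]*${key}=.*|${key}=${value}|" "$file"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Example on a scratch file; on a real server the file is /etc/sysconfig/nfs.
conf=$(mktemp)
printf '#RPCNFSDCOUNT=8\n' > "$conf"
set_conf_value "$conf" RPCNFSDCOUNT 16
cat "$conf"
```

On SLES, the same helper would be called with USE_KERNEL_NFSD_NUMBER as the key.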
When a node is added to the diskless logical group
• A copy of the snapshot directory for this node is sent to the NFS server.
• A PXE-boot file is created in the TFTP pxelinux.cfg directory that instructs the kernel to obtain its root file system from the assigned NFS server.

IMPORTANT: When booting the compute nodes in a large-scale diskless cluster, only one DHCP server and one TFTP server are available for the cluster. HP recommends booting no more than 256 nodes at a time to avoid DHCP and TFTP timeouts.
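The 256-node limit can be respected by splitting the node list into batches before booting. The following is a minimal sketch under stated assumptions: the helper name batch_nodes and the cn node names are hypothetical, and the boot command itself is deliberately left out, so each batch is only printed.

```shell
#!/bin/bash
# Sketch: split a node list into batches of at most SIZE nodes.
# The boot command is left out; each batch is only printed.

# batch_nodes SIZE < node-list   (one node name per line on stdin)
# Prints one batch per line, with node names separated by spaces.
batch_nodes() {
    local size="$1" node batch="" n=0
    while IFS= read -r node || [ -n "$node" ]; do
        [ -n "$node" ] || continue           # skip empty lines
        batch="${batch:+$batch }$node"
        n=$((n + 1))
        if [ "$n" -eq "$size" ]; then
            printf '%s\n' "$batch"
            batch=""
            n=0
        fi
    done
    # Flush the last, possibly shorter, batch.
    if [ -n "$batch" ]; then
        printf '%s\n' "$batch"
    fi
}

# Example: 600 hypothetical node names, reported as the size of each batch.
seq -f 'cn%03g' 1 600 | batch_nodes 256 | awk '{print NF " nodes"}'
```

Each output line can then be passed to the boot operation in turn, waiting for one batch to finish booting before starting the next.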
5 Monitoring a cluster with HP Insight CMU 5.1 Installing the HP Insight CMU monitoring client You must install the HP Insight CMU monitoring client to properly monitor your cluster. 1. Select the compute nodes that need the rpm installation, and right-click to access the contextual menu. 2. Select Update. This displays a submenu. 3. On the submenu, click Install CMU monitoring client. 4. An X Window appears with the status of the installation. A summary of the installation is provided. 5.
5.3 Monitoring the cluster Launch the HP Insight CMU GUI. Figure 31 Main window In Figure 31 (page 67), the left frame lists the resources, such as Network Entities, Logical Groups, Nodes Definitions, etc. The '+' sign expands a resource. Compute nodes can be displayed: • By network entity • By logical group • By user group • By nodes definition For example, to see nodes belonging to a logical group, expand the Logical Group resource list, then expand the desired logical group.
Figure 32 Node status The status of this node is okay. Node values are correctly reported to the main monitoring daemon. The node is pinging properly, and the monitoring is working properly, but an alert is currently reported for this node. One of the thresholds defined by you has been exceeded. Click the node in the tree to view the detail of this alert. The status of this node is "No Ping". This node is not pinging at all. User action is required to identify the problem.
In the central frame, the following tabs are available: • Instant View • Table View • Time View • Details • Alerts For a single node view, the following tabs are available: • Monitoring • Details • Alerts 5.3.3 Global cluster view in the central frame By default, the central frame displays the monitoring values of the whole cluster. You can return to this view at any time by clicking CMU Cluster at the root of the node tree.
5.3.4 Resource view in the central frame Monitoring values can be visualized by: • Global cluster • A specific logical group • A specific network entity • A specific user group Click the desired resource in the left-frame tree and the title of the central frame displays the name of the selected resource. NOTE: Resource or node specific monitoring metrics and alerts can be displayed in CLI mode using /opt/cmu/bin/cmu_monstat. For details, see the cmu_monstat manpage. 5.3.4.
5.3.4.2 Detail mode in resource view To display a table with sensor values, select the Instant View tab in the central frame. • The cell is green when the value is below 33% of the maximum value. • The cell is orange when the value is between 33% and 66% of the maximum value. • The cell is red when the value is above 66% of the maximum value. Figure 36 Resource view details 5.3.5 Gauge widget The middle of the pie shows average values for a sensor.
• Details — Shows static data for the node. Some of the values are filled during the initial node discovery (scan node). Other values are filled by right-clicking on the node in the tree to get the contextual menu. Then select Update→Get Node Static Info. • Alerts — Contains the alerts currently raised for this node. Figure 38 Node details The central frame title displays the name of the node. The title is colored according to the state of the node.
5.3.7.1 Getting started To launch HP Insight CMU with Time View:
• From the web:
◦ Go to http://yourcluster. Click the first link Launch Insight Cluster Management Utility GUI (Time View ON).
• With Java:
◦ Download required libraries:
– Copy cmugui.jar to a chosen folder (/tmp in this example): scp root@yourcluster:/opt/cmu/bin/cmugui.jar /tmp
– Copy required jars to the same folder: scp root@yourcluster:/opt/cmu/www/jnlp/jogl-2.0-b44-20111202/jar/*.
Figure 39 Time view 5.3.7.4 Bindings and options 5.3.7.4.1 Mouse control • Left-click on a node – Mark the node from a set of four predefined colors • Right-click on a node – Open the interactive menu for this node • Right-click elsewhere – Open the metrics selection menu NOTE: Time View cannot display more than 10 metrics. For details, see “Technical dependencies” (page 75).
5.3.7.4.3 Custom cameras To save a custom camera position, press Ctrl+1 through Ctrl+5. Restore it later by pressing the matching key, 1 through 5. (Custom camera position 1 ... 5 options.) • e – Set perspective view • z – Set history view • s – Set front view 5.3.7.4.4 Options The following options are also available in Options→Properties: • Anti-aliasing level – Set the smoothness of the line rendering. Higher levels render smoother lines, but not all graphics cards support them, and they can reduce performance.
Some GPUs may not support anti-aliasing levels set to 8. Symptoms are black strips on the left and right of Time View, or cylinders above the rings making the visualization inoperable. If this occurs, set anti-aliasing to a lower value such as 4. 5.3.8 Archiving user groups Monitoring data for deleted user groups can be archived and visualized later as “history data”. To archive monitoring data for deleted user groups: 1. Delete a user group 2.
5.3.8.2 Limitations To display an archived user group, the following conditions must be satisfied: • Time must not exceed 24 hours. • The number of nodes must not exceed 4096. • The number of metrics must not exceed 100. • The product of the three parameters above must not exceed 409600. Table 2 (page 77) displays examples of valid combinations of these three parameters.
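The conditions listed in 5.3.8.2 can be checked before trying to display an archived user group. The following Python sketch is illustrative only. The function name can_display_archive is hypothetical, and it assumes that the time parameter in the product is expressed in hours, which is consistent with 1 hour x 4096 nodes x 100 metrics = 409600.

```python
MAX_HOURS = 24
MAX_NODES = 4096
MAX_METRICS = 100
MAX_PRODUCT = 409600


def can_display_archive(hours, nodes, metrics):
    """Return True when all four documented limits are satisfied.

    Assumption: the product limit uses the time expressed in hours.
    """
    return (
        hours <= MAX_HOURS
        and nodes <= MAX_NODES
        and metrics <= MAX_METRICS
        and hours * nodes * metrics <= MAX_PRODUCT
    )


if __name__ == "__main__":
    print(can_display_archive(24, 4096, 4))   # product is 393216
    print(can_display_archive(24, 4096, 5))   # product is 491520
```

Each individual limit can be met while the product limit is still exceeded, so the product must be checked as well.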
#
#
# ALERTS
#
#
#cpu_freq_alert "CPU frequency is not nominal" 1 24 100 < % sh -c "b=`cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq`;a=`cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq`;echo 100 \* \$b / \$a |bc"
login_alert "Someone is connected" 3 24 0 > login(s) w -h | wc -l
root_fs_used "The / filesystem is above 90% full" 4 24 90 > % df / | awk '{ if ($6=="/") print $5}' | cut -f 1 -d %
#reboot_alert "Node rebooted" 4 24 5 < rebooted awk '{printf "%.
• MeanOverTime returns the difference between the current value and the previous value divided by the time interval. For example, if the sensors return 1, 100, 50, 100 at 4 continuous time steps of 5 seconds: • HP Insight CMU Monitoring with the Instantaneous option returns 1, 100, 50, 100. • HP Insight CMU Monitoring with the MeanOverTime option returns N/A, 19.8, -10, 10. Max value Used by the interface to create the pies at the beginning.
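The MeanOverTime example above (sensor values 1, 100, 50, 100 at 5-second steps) can be reproduced with a short Python sketch. This is an illustration of the arithmetic, not the monitoring daemon's code. The function name mean_over_time is hypothetical, and None stands in for the N/A reported for the first sample.

```python
def mean_over_time(samples, interval_seconds):
    """Return (current - previous) / interval for each sample.

    The first sample has no previous value, so None is used for N/A.
    """
    if not samples:
        return []
    results = [None]
    for previous, current in zip(samples, samples[1:]):
        results.append((current - previous) / interval_seconds)
    return results


if __name__ == "__main__":
    print(mean_over_time([1, 100, 50, 100], 5))
```

With the Instantaneous option, the values would be reported unchanged.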
Condition The reaction is performed under this condition. • ReactOnRaise — Execute the reaction whenever the alert shows as raised and the previous state of the alert was lowered. • ReactAlways — Execute the reaction whenever the alert shows as raised, subject to the alert’s time multiple. For example, if the monitoring has a default timer of 5 seconds and the alert’s time multiple is 6, the reaction will trigger every 5x6=30 seconds as long as the alert is raised. Command The command to be executed.
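The two reaction conditions described above can be compared with a small Python sketch. It is illustrative only: the function names are hypothetical, and it assumes the alert state is evaluated once per alert period (monitoring timer multiplied by the alert's time multiple).

```python
def reaction_interval_seconds(monitoring_timer_seconds, time_multiple):
    """Seconds between two evaluations of an alert."""
    return monitoring_timer_seconds * time_multiple


def should_react(condition, raised_now, raised_before):
    """Decide whether the reaction runs at this evaluation.

    ReactOnRaise: only on a lowered-to-raised transition.
    ReactAlways: at every evaluation where the alert is raised.
    """
    if condition == "ReactOnRaise":
        return raised_now and not raised_before
    if condition == "ReactAlways":
        return raised_now
    raise ValueError("unknown condition: " + condition)


if __name__ == "__main__":
    print(reaction_interval_seconds(5, 6))
    print(should_react("ReactOnRaise", True, True))
    print(should_react("ReactAlways", True, True))
```

With a 5-second timer and a time multiple of 6, a ReactAlways reaction runs every 30 seconds while the alert stays raised, whereas a ReactOnRaise reaction runs once when the alert is first raised.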
• Add your own sensors, alerts, or alert reactions by adding a line to the ACTIONS, ALERTS, or ALERT_REACTIONS section. Modifications in the ActionAndAlertsFile.txt file are only taken into consideration when the monitoring daemons are restarted. To restart the monitoring daemons: 1. Change the ActionAndAlertsFile.txt file on the management node. 2. Stop the Java GUI. 3. Stop the daemons. # /etc/init.d/cmu stop 4. Restart the daemons. # /etc/init.d/cmu restart 5. Start the Java interface. 5.5.
#- Native
#cpuload "% cpu load (raw)" 1 numerical MeanOverTime 100 % awk '/cpu / {printf"%d\n",$2+$3+$4}' /proc/stat
#- Collectl
cpuload "% cpu load (normalized)" 1 numerical Instantaneous 100 % COLLECTL (cputotals.user) + (cputotals.nice) + (cputotals.sys)
The command field must start with the string “COLLECTL” in capital letters. The line continues with a series of collectl variables enclosed in parentheses and connected with arithmetic operators.
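Before adding a collectl-based sensor, it can help to check that the command field follows the format described above. The following Python sketch is a hypothetical helper, not part of HP Insight CMU: it verifies the COLLECTL prefix and lists the collectl variables found in parentheses. It does not validate the operators or confirm that collectl actually exports those variables.

```python
import re

# A collectl variable is whatever sits between one pair of parentheses.
_VARIABLE = re.compile(r"\(([^()]+)\)")


def parse_collectl_command(command):
    """Return the collectl variable names used in a sensor command field."""
    if not command.startswith("COLLECTL"):
        raise ValueError('command field must start with "COLLECTL"')
    expression = command[len("COLLECTL"):].strip()
    return _VARIABLE.findall(expression)


if __name__ == "__main__":
    field = "COLLECTL (cputotals.user) + (cputotals.nice) + (cputotals.sys)"
    print(parse_collectl_command(field))
```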
For more information about using and fine-tuning collectl, see http://collectl.sourceforge.net/. 5.5.6.3 Installing and configuring colplot for plotting collectl data IMPORTANT: Do not use this option for HP Insight CMU diskless configurations. 1. On the HP Insight CMU administration server, create a directory to store collectl data from compute nodes, and export it over NFS: # mkdir /var/log/collectl # vi /etc/exports 2. Add the following line: /var/log/collectl 3.
9. Import the common directory created on the administration server for collectl. # mkdir /var/log/collectl # vi /etc/fstab X.X.X.X:/var/log/collectl /var/log/collectl nfs defaults 0 0 where X.X.X.X is the address of your HP Insight CMU administration server. 10. Modify the collectl configuration file to save data to be plotted in the common directory: # vi /etc/collectl.conf DaemonCommands = -s+dcmnNE --import misc --export lexpr -A server -i5 -f /var/log/collectl -P -oz -r 00:01,7 11.
Select plotting options, then click Generate Plot. Figure 43 ColPlot results 5.5.7 Monitoring GPUs and coprocessors 5.5.7.1 Monitoring NVIDIA GPUs If your client nodes contain NVIDIA GPUs and are running version 270.xx.xx or newer of the NVIDIA GPU driver, you can monitor your GPUs with HP Insight CMU. If you haven’t done so already, install the NVIDIA GPU driver version 270.xx.xx or newer on your client nodes. This can be done two ways: 1.
Running /opt/cmu/bin/cmu_config_nvidia adds a list of predefined GPU metrics to ActionAndAlertsFile.txt. To monitor these metrics using the GUI, select the desired metrics from the Monitoring sensors list as described in Figure 33 (page 69). NOTE: Not all metrics are supported by all NVIDIA GPUs, and some less frequently used metrics may be commented out within ActionAndAlertsFile.txt.
5.5.7.3 Monitoring Intel coprocessors If your client nodes contain Intel coprocessors, you can monitor the coprocessors with HP Insight CMU. Install the desired coprocessor drivers on your client nodes and verify the coprocessors are working. Use one of the following processes to install the drivers: Install manually 1. Install the coprocessor driver manually on one of the client nodes. 2. Back up the client image. 3. Clone the remaining clients with this new image.
k. Review the results and verify no errors are reported. l. With the coprocessors working, enable coprocessor monitoring by updating the /opt/cmu/etc/ActionAndAlertsFile.txt file with metric entries for coprocessor monitoring. Do this by running the script /opt/cmu/bin/cmu_config_intel. This script takes the number of coprocessors on each client as an argument. The following example updates ActionAndAlertsFile.txt to monitor clients that have 3 coprocessors each.
keywords such as CMU_ALERT_NODES can be used to convey the names of the nodes that raised the alert through the SNMP trap. Figure 44 HP Insight CMU alert converted to SIM event To create a complete model for conveying HP Insight CMU alerts to HP SIM, you may choose to create your own SNMP Management Information Base (MIB) to handle the alerts you define. For information on how to configure SNMP with HP SIM, or how to compile and customize MIBs with HP SIM, see the "HP Systems Insight Manager User Guide".
data is received after this time interval expires, the GUI marks the extended metric data "invalid". Data Type A description of the format of the extended metric data. This is either numerical or string. Measurement Method This is either Instantaneous or MeanOverTime. Instantaneous means display the latest value. MeanOverTime displays the difference between the current value and the previous value divided by the time interval. Max Value This is used by the GUI to initialize the metric pies.
6 Managing a cluster with HP Insight CMU Cluster management tasks can be performed on one or more nodes with HP Insight CMU. The available tasks depend on your privileges and the number of selected nodes. 6.1 Unprivileged user menu When the HP Insight CMU GUI is in normal mode, you can only monitor node status and visualize static data. Other actions on the cluster nodes are disabled in this mode because they are potentially destructive. 6.
To select a terminal emulator other than the default: 1. Edit /opt/cmu/etc/cmuserver.conf. 2. Six blocks of variable names begin with CMU_REMOTE_TERMINAL. Uncomment the full block of variables for the preferred terminal emulator. 3. Verify all variables for other terminal emulators are commented out. 4. Restart cmuserver: # /etc/init.d/cmu restart 6.4 Management card connection This menu is only available when one node is selected.
Figure 47 Power off dialog box 6.8 Boot When one or more nodes are selected, this task enables you to boot a collection of nodes from their own local disk or over the network. You must select the nodes to boot before running this command. The boot procedure uses the management card of each node. The password for the management card must be entered. All nodes to be booted must have the same management card password. IMPORTANT: If the nodes are already booted, the boot procedure attempts a proper shutdown.
6.11 Multiple windows broadcast This task is available when one or more nodes are selected. The following connections are available for multiple windows broadcast: • A secure shell connection through the network, when the network is up on selected nodes. • Connection through the management card, if selected nodes have a management card. The multiple windows broadcast command launches a master console window and concurrent mirrored secure shell sessions embedded in an xterm on all selected nodes.
Figure 51 pdsh window You can toggle the two filters on and off using dshbak or cmudiff. These two filters are mutually exclusive, so you can: • Filter with cmudiff • Filter with dshbak • Use no filter 6.12.1 cmudiff examples Example 1 date command The cmudiff output is two fields separated by dotted lines. The header displays: • The number of responses, 4 in this example (This amount means a response has been received from 4 compute nodes.
• Some details about the output processing results, which are provided on the right. Characters that differ from the reference node are highlighted in red. In this example, the “seconds” field differs because of time drift. Depending on the output length, the output of cmudiff can be piped to the less pager to enable scrolling through the output with the arrow keys. To exit less, enter q.
cmudiff filter is , with parameters cmu_pdsh> -d cmu_pdsh> dmidecode The comment now shows “(2 populations) o185i[040,042] are 83% similar”. This comment suggests that those two compute nodes have a different BIOS release date than all other nodes. NOTE: A nonresponsive node in the node selection for single window pdsh causes the answer from other nodes to be delayed until a timeout occurs from the nonresponsive node. You can reduce this delay by setting the value in the ConnectTimeout in .
Figure 52 Parallel distributed copy window 3. Complete the Source and Destination fields, and then click OK to execute the distributed copy. 6.14 User group management User groups are not required for backup and cloning operations. However, you can use the User Group Management window to add, delete, or rename a user group. A user group is a set of nodes named by the HP Insight CMU administrator. Each node can belong to several user groups.
Figure 53 User group management Select any number of nodes from the list of “Nodes in Cluster” on the left and use the arrows to move the nodes to the list of “Nodes in User Group” on the right. 6.14.2 Deleting user groups 1. In the User Group Management window, select the user group to delete. 2. Click Delete. 3. Click OK. 6.14.3 Renaming user groups 1. In the User Group Management window, select the user group to rename. 2. Click Rename. 3. Enter the new name. 4. Click OK. 6.
HP Insight CMU provides the latest conrep kit available at release time. If a different or newer version of conrep is required for the servers in your cluster, you can configure the full path and file name of the correct conrep binary by editing the CMU_BIOS_SETTINGS_TOOL variable in /opt/cmu/etc/cmuserver.conf. The conrep tool also requires an XML file containing the information necessary to interpret the BIOS flash memory data on your server into human-readable text.
1. In the /opt/cmu/etc/cmu_custom_menu file, uncomment the following line:
SERVER;audit|dmidecode;/opt/cmu/bin/cmu_dsh -f CMU_TEMP_NODE_FILE -c "dmidecode" -e "-b -n -v0 -R0"
2. Run the CLI.
3. Enter custom_run without arguments. The available custom commands are displayed.
cmu> custom_run
Title Command
-------------------|------
audit|dmidecode /opt/cmu/bin/cmu_dsh -f CMU_TEMP_NODE_FILE -c "dmidecode" -e "-b -n -v0 -R0"
cmu>
4. Run the dmidecode command on node10 from the CLI.
cmu> custom_run "audit|dmidecode" node10
6.16.
Help commands To get help during a CLI session, use the help command. This command displays all available commands of HP Insight CMU CLI.
halt halt nodes of logical group group_1 except node_exp delay "mesg" all group_1 group_2 halt nodes of group_1 and group_2 cmu> Displaying logical groups of a cluster The groups command displays the list of the logical groups. cmu> groups list of group(s) with active nodes : debian default nodevmap pfmon sfs2 list of available group(s) for backup and cloning : default sfs2 suse10 pfmon testrh3u4 debian nathclontest nodevmap cmu> You can also call this command followed by a group name.
Executing a command on a list of nodes To execute a command on multiple nodes, specify the node names.
cmu> boot o185i222 o185i233 o185i243
active node list selected: o185i222 o185i233 o185i243
cmu>
Executing a command on a range of nodes To execute a command on a range of nodes, you must specify the range using their attributes. Commands are executed on all nodes within the range.
Executing a command on specific nodes of a logical group You can use the but option to exclude active nodes of a group from the selection. Nodes to exclude can be specified with any combination of regular expressions.
cmu> boot all default but o185i222 - o185i252
active node list selected: o185i194 o185i202 o185i216 o185i253 o185i254
cmu>
6.17.4 Administration and cloning commands Booting a set of nodes You can boot any number of nodes in the cluster.
To broadcast on all nodes of the cluster: cmu> broadcast all selected o185i202 o185i214 o185i226 o185i238 o185i250 nodes: o185i192 o185i193 o185i194 o185i195 o185i196 o185i197 o185i198 o185i199 o185i200 o185i201 o185i203 o185i204 o185i205 o185i206 o185i207 o185i208 o185i209 o185i210 o185i211 o185i212 o185i213 o185i215 o185i216 o185i217 o185i218 o185i219 o185i220 o185i221 o185i222 o185i223 o185i224 o185i225 o185i227 o185i228 o185i229 o185i230 o185i231 o185i232 o185i233 o185i234 o185i235 o185i236 o185i237 o1
active node list selected: o185i192 Please read /opt/cmu/log/PowerOff.log for errors. cmu> Setting the locator LED on or off Sets the locator LED of any number of nodes on or off. You can use the regular expressions previously described.
Total | 1 | 0 | 0 Detailed logs are in /opt/cmu/log/cmucerbere.log and/opt/cmu/log/cmucerbere-*.log [INFO] CMU does not seem to be running /opt/cmu/tmp/GUI/config.txt was rewritten cmu> Adding a new logical group The add_logical_group command creates a new logical group. Parameters are specified on one line: cmu> add_logical_group image_name "device" For example: cmu> add_logical_group my_logical_group "cciss/c0d0" processing 1 logical group ...
[16:15:13] OSTYPE:Linux-CMU
[16:15:13] [DollyClient] Starting to get fstab files
[16:15:13] [DollyClient] Getting "/opt/cmu/tmp/fstab.txt"
[16:15:14] [DollyClient] fstab of /dev/sda1 received and stored into /opt/cmu/tmp/fstab.txt
[16:15:14] [DollyClient] Executing: /bin/grep "LABEL" /opt/cmu/tmp/fstab.txt | /usr/bin/wc -l >/opt/cmu/tmp/number_of_label
[16:15:14] [DollyClient] No label in /opt/cmu/tmp/fstab.
[16:25:06] [DollyClient] Device is sda
[16:25:06] [DollyClient] Asking for partition table of "/dev/sda"
[16:25:06] [DollyClient] Getting /opt/cmu/image/test_julien/parttbl-sda.txt
[16:25:07] [DollyClient] Getting /opt/cmu/image/test_julien/parttbl-sda.raw
[16:25:07] [DollyClient] Getting /opt/cmu/image/test_julien/partarchi-sda1.tgz
[16:25:17] [DollyClient] Getting /opt/cmu/image/test_julien/partarchi-sda5.tgz
[16:25:38] [DollyClient] Getting /opt/cmu/image/test_julien/partarchi-sda6.
6.17.5 Administration utilities pdcp and pdsh HP Insight CMU includes the open source software pdcp and pdsh. Usage example of pdcp: # /opt/cmu/bin/pdcp -w cn0001,cn0002 source /tmp/dest where: source is a file on the management node. dest is the name of the destination file copied to compute nodes cn0001 and cn0002. Usage example of pdsh: # /opt/cmu/bin/pdsh -w cn0001,cn0002 ls cn0001: cn0001: cn0002: cn0002: cn0002: cn0002: bin inst-sys anaconda-ks.cfg CMU_CLONING_INFO install.log.syslog install.
7 Advanced topics 7.1 Accessing the GUI for non-root users HP Insight CMU allows non-root users to log into the GUI and access some or all of the privileged HP Insight CMU functionality available through the GUI. The GUI supports non-root user accounts that exist either as local accounts or as NIS accounts on the HP Insight CMU management node. The high-level operational goal of this support is to make the HP Insight CMU GUI a graphical extension of logging into the head node.
Table 3 Operational HP Insight CMU GUI features available by default for non-root users (continued)
Cloning (Deploy Image)    user (requires sudo)
Autoinstall (kickstart|autoyast|preseed)    user (requires sudo)
Update→Get Nodes Static Info    user (requires sudo)
Update→Install CMU Monitoring Client    user (requires sudo)
Update→Rescan MAC    root
Insight→Show BIOS Settings    user (requires sudo)
Insight→Show BIOS Version    user (requires sudo)
Insight→Upgrade Firmware    user (requires sudo)
Any configured HP
Table 4 HP Insight CMU GUI features and their corresponding commands
HP Insight CMU GUI feature (right-click node selection)    HP Insight CMU management node command
Management Card Connection    /opt/cmu/bin/cmu_console
Shutdown    /opt/cmu/tools/halt.exp
Power Off    /opt/cmu/bin/cmu_power
Boot    /opt/cmu/tools/boot.exp
Reboot    /opt/cmu/tools/reboot.
In this context, the term "diskless" refers to any OS image that can be created and prepared locally on the HP Insight CMU management server and then served over the network to a PXE-booted set of compute nodes. A few different implementations of "diskless" OS images are: • stateful NFS-root — All reads and writes from the target compute nodes occur on the central NFS server. • stateless NFS-root — Reads occur from the central NFS server, but writes occur in memory (in a tmpfs filesystem).
-l The name of the logical group to delete. The delete_image program is expected to delete everything related to the diskless OS in /opt/cmu/image//. 7.2.3 Configure diskless node The configure_node program is called when a compute node is added to the HP Insight CMU diskless logical group of type . This program is called with the following arguments: -l The name of the diskless logical group.
-n The hostname of the target node to boot. -i The IP address of the target node to boot. -m The MAC address of the target node to boot. -e The active Ethernet device of the target node to boot. The boot_node program is typically a subset of the configure_node program, and ensures that the given node is ready to be PXE-booted. It may call /opt/cmu/tools/cmu_add_node_to_dhcp again, and may check that the correct PXE-boot file is in place for the given node.
ILOCM The method for integration with HP Moonshot 1500 Chassis. The HP Insight CMU hardware API consists of a collection of programs that reside in /opt/cmu/hardware// where refers to the name of the hardware API. For example, the iLO API programs reside in the /opt/cmu/hardware/ILO/ directory. The names of the API programs in the hardware API directory must conform to the following format: cmu__power_ Where is one of: off Remove power from the server.
CMU_VALID_HARDWARE_TYPES=ILO:lo100i:ILOCM To add the IPMI hardware API, add IPMI to the list of valid hardware types: CMU_VALID_HARDWARE_TYPES=ILO:lo100i:ILOCM:IPMI After this is done, you can configure servers in the HP Insight CMU database with this new "management card type". 7.4 Customizing kernel arguments for the HP Insight CMU provisioning kernel When backing up or cloning nodes, HP Insight CMU PXE-boots each node into an NFS-based diskless operating system provided by HP Insight CMU.
etc/bootopts/AC14000. The hexadecimal IP address AC14000 covers IP addresses 172.20.0.1 - 172.20.0.15. 7.5 Support for ScaleMP HP Insight CMU can be integrated to work with ScaleMP. To enable support for ScaleMP, add the following variable and setting to the /opt/cmu/etc/cmuserver.conf file: CMU_vSMP_PREFIX=vSMP_ This setting configures the prefix that is used to identify HP Insight CMU logical group nodes that can be pxe-booted into the virtual SMP environment.
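Returning to the hexadecimal file names used in 7.4 (Customizing kernel arguments): the following Python sketch shows how a hexadecimal IP prefix such as AC14000 maps to a range of addresses. It is illustrative only, with hypothetical function names. It assumes that a shortened name matches every address whose 8-digit hexadecimal form starts with those digits. The computed range includes the .0 address, whereas the guide lists the range starting at .1.

```python
import ipaddress


def ip_to_hex(address):
    """Return the 8-digit uppercase hexadecimal form of an IPv4 address."""
    return "{:08X}".format(int(ipaddress.IPv4Address(address)))


def hex_prefix_range(hex_prefix):
    """Return the lowest and highest IPv4 addresses matching a hex prefix."""
    if not 1 <= len(hex_prefix) <= 8:
        raise ValueError("prefix must contain 1 to 8 hexadecimal digits")
    low = int(hex_prefix.ljust(8, "0"), 16)
    high = int(hex_prefix.ljust(8, "F"), 16)
    return ipaddress.IPv4Address(low), ipaddress.IPv4Address(high)


if __name__ == "__main__":
    print(ip_to_hex("172.20.0.5"))
    low, high = hex_prefix_range("AC14000")
    print(low, high)
```

Dropping one more digit from the prefix widens the range by a factor of 16.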
The transfer uses TCP/IP sockets. The clone image is saved to the local disk. The node then asks the image server if any successors are waiting for upload. If any successors are waiting, the node then starts to transfer the image to a group member, while the image server uploads a third one. This process is called the tree propagation algorithm. After a node has received a completed image, it attempts to upload to another node within the entity.
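To give a feel for why tree propagation scales well, the following Python sketch counts transfer rounds in a deliberately simplified model. It is not the HP Insight CMU algorithm. It assumes that one node in the network entity already holds the image, that every node holding the image uploads to exactly one other node per round, and that all transfers take the same time. The function name propagation_rounds is hypothetical.

```python
def propagation_rounds(node_count):
    """Rounds until node_count nodes hold the image (simplified model).

    Assumption: the number of nodes holding the image doubles each round,
    starting from a single node that already received it.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    holders = 1
    rounds = 0
    while holders < node_count:
        holders = min(node_count, holders * 2)
        rounds += 1
    return rounds


if __name__ == "__main__":
    for count in (1, 2, 8, 9, 64):
        print(count, propagation_rounds(count))
```

Under these assumptions the number of rounds grows with the logarithm of the node count rather than linearly.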
8 Support and other resources 8.1 Contacting HP 8.1.1 Before you contact HP Be sure to have the following information available before you contact HP: • Technical support registration number (if applicable) • Product serial number • Product identification number • Applicable error message • Add-on boards or hardware • Third-party hardware or software • Operating system type and revision level 8.1.
• Installation and user guides for your specific operating system. 8.3 Typographic conventions This document uses the following typographical conventions: %, $, or # A percent sign represents the C shell system prompt. A dollar sign represents the system prompt for the Bourne, Korn, and POSIX shells. A number sign represents the superuser prompt. audit(5) A manpage. The manpage name is audit, and it is located in Section 5. Command A command name or qualified command phrase.
CAUTION A caution calls attention to important information that if not understood or followed will result in data loss, data corruption, or damage to hardware or software. IMPORTANT This alert provides essential information to explain a concept or to complete a task. NOTE A note contains additional information to emphasize or supplement important points of the main text. 8.
A Troubleshooting Issues encountered while using HP Insight CMU can be classified as: • Network boot issues which affect cloning and backup • Backup specific issues • Cloning specific issues • Administration command issues • GUI specific issues A.1 HP Insight CMU logs Every HP Insight CMU command logs information in a dedicated log file. All log files are available in /opt/cmu/log. A.1.
• An incorrect MAC address in the HP Insight CMU database • The HP Insight CMU configuration on the management node is lost. Troubleshooting switch issues 1. Verify that the management node pings the iLO and the nodes. 2. Verify that broadcast is enabled and is redirected to the switch. 3. Verify that the spanning tree is disabled on all ports connected to a node. 4. Verify that « multicast IGMP snoop loop » is disabled on the switch.
A.4 Cloning issues If only one node cannot be cloned: 1. Verify that you can boot in network mode. 2. Verify that the node has the same hardware as other nodes. 3. Verify that the node does not have a hardware problem. 4. Power off manually, then relaunch cloning. If no nodes in a network entity can be cloned: 1. Clone all nodes except the first node in the network entity again. 2. Verify that you can boot in network mode. If no node in the cluster can be cloned: 1. Verify that you can boot in network mode.
3. Verify that rsh or ssh is enabled between all nodes of the cluster and the management node. All nodes must be able to execute commands as root on any other node without needing a password. 4. Verify that the HP Insight CMU rpm is properly installed on all nodes. If the HP Insight CMU GUI fails to start with the message "Failed to validate certificate": Figure 54 Certificate error The detailed Java exception is: java.security.cert.CertPathValidatorException: java.security.
On Windows, go to System Preferences→Other→Java→Advanced→Enable online certificate validation. On Linux, run javaws -viewer in a shell, click the Advanced tab, then Enable online certificate validation. TIP: If you still encounter problems, try toggling the setting.
B Detailed installation instructions B.1 Install required RPMs
1. Install expect library.
2. Install DHCP.
3. Install the TFTP server.
4. Install the TFTP client.
5. Install Java Runtime Environment. For details, see “Java installation” (page 132).
6. Install tcl-8 libraries.
7. Install OpenSSL library.
8. Install NFS server.
9. Install xterm rpm.
10. Install libX11 rpm.
11. Install libXau rpm.
12. Install libXdmcp rpm.
13. Install perl-IO-Socket-SSL.
14. Install perl-Net-SSLeay
B.
• On SLES: # chkconfig nfsserver on # /etc/init.d/nfsserver start B.
3. Install the HP Insight CMU rpm:
# rpm --import /mnt/cmuteam-rpm-key.asc
# rpm -ivh /mnt/cmu-v7.1-1.i386.rpm
Preparing... 1:cmu ########################################### [100%]
NOTE: If you do not import the cmuteam-rpm-key.asc file, the following warning message is displayed:
# rpm -ivh REPOSITORY/cmu-v7.1-1.i386.rpm
warning: REPOSITORY/cmu-v7.1-1.i386.rpm: Header V3 DSA signature: NOKEY, key ID b59742b4
Preparing...
1. Edit the /opt/cmu/etc/cmuserver.conf file:
# vi /opt/cmu/etc/cmuserver.conf
2. Search for the CMU_CLUSTER_IP variable.
3. Replace the default value with the IP address of the Ethernet interface used for cloning.
#CMU mgt node IP address from the point of view of the compute nodes
#
#'default' means this is backward compatibility mode (assuming hostname -i
#not recommended anymore)
CMU_CLUSTER_IP=X.X.X.X
4. Verify the address.
# /opt/cmu/tools/cmu_mgt_net_info X.X.X.
monitoring Status of the monitoring daemon that gathers the information reported by the small monitoring agent installed on the compute nodes.
web service Status of the web service that lets you start the HP Insight CMU GUI from a web browser on a workstation.
nfs server Status of the NFS server.
dhcpd.conf Status of the DHCPD configuration.
NOTE: Because compute nodes are not installed on the cluster at this time, the monitoring agent is not started after the installation. This behavior is normal.
B.14.1 Configuring the GUI client on Linux workstations On Linux workstations, you can use a secure ssh tunnel or an X Window server to communicate between the workstation running the HP Insight CMU GUI and the HP Insight CMU management server. Using an ssh tunnel 1. To open the ssh tunnel, the following settings are required on the HP Insight CMU management server. • Put Xauth in the PATH.
• The server access control must allow access. To authorize access, use the xhost + command. • Allow rmi connection and X display export in your firewall configuration. B.14.
Figure 56 HP Insight CMU GUI NOTE: At this point in the installation process, the GUI window will not contain most of the details shown in the previous figure.
HP Insight CMU manpages
cmu_show_nodes(8) NAME cmu_show_nodes -- Display a list of nodes and node attributes. SYNOPSIS # /opt/cmu/bin/cmu_show_nodes [-a | -n ] [-i] [-d] [-f ] [-o ] DESCRIPTION Display a list of HP Insight CMU nodes and node attributes.
%c (ILOCM only) cartridge number %N (ILOCM only) node number EXAMPLES Default behavior: # /opt/cmu/bin/cmu_show_nodes cn0004 cn0005 cn0006 cn0008 cn0009 To show details for a specific node: # /opt/cmu/bin/cmu_show_nodes -n node1 -o "%n %i %k %m default %b %t" node1 16.16.184.40 255.255.248.0 1C-C1-DE-6E-24-AE default 16.16.188.40 lo100i To show details for all nodes: # /opt/cmu/bin/cmu_show_nodes -a -o "%n %i %k %m default %b %t" node1 16.16.184.40 255.255.248.0 1C-C1-DE-6E-24-AE default 16.16.188.
cmu_show_logical_groups(8) NAME cmu_show_logical_groups -- Show nodes belonging to a logical group. SYNOPSIS # /opt/cmu/bin/cmu_show_logical_groups <-h | [logical_group_name]> DESCRIPTION Show nodes belonging to an HP Insight CMU logical group.
cmu_show_network_entities(8) NAME cmu_show_network_entities -- Show network entities. SYNOPSIS # /opt/cmu/bin/cmu_show_network_entities <-h | [network_entity]> DESCRIPTION Show network entities.
cmu_show_user_groups(8) NAME cmu_show_user_groups -- Show user groups. SYNOPSIS # /opt/cmu/bin/cmu_show_user_groups <-h | [user_group]> DESCRIPTION Show user groups.
cmu_show_archived_user_groups(8) NAME cmu_show_archived_user_groups -- Show archived user groups. SYNOPSIS # /opt/cmu/bin/cmu_show_archived_user_groups [-h] | [-p] [-H] [-c] [-s separator] [-f] [-w width] DESCRIPTION Show archived user groups.
cmu_add_node(8) NAME cmu_add_node -- Add node(s) to the HP Insight CMU database. SYNOPSIS # /opt/cmu/bin/cmu_add_node <-h | -s | -i | -f filename> # /opt/cmu/bin/cmu_add_node -H|--hostname hostname -I|--ip ipaddress [-M|--mask netmask] [-A|--mac macaddress] [-L|--lg logicalgroup] [-G|--mgt-ip mgtcardip] [-T|--mgt-card ILO|lo100i|ILOCM] [-R|--arch architecture] [-C|--cartridge num] [-N|--node-number num] DESCRIPTION Adds one or more nodes to the HP Insight CMU database.
EXAMPLES Command-line mode: # /opt/cmu/bin/cmu_add_node -H cn0006 -I 16.16.184.116 -M 255.255.254.0 -A 00-02-A5-52-EB-F8 -L default -G 192.168.0.1 -T ILO -R x86_64 processing 1 node ... Interactive mode: In interactive mode, you are prompted for node parameters: # /opt/cmu/bin/cmu_add_node -i hostname> n10 ip address> 16.16.184.116 netmask> 255.255.248.0 mac address> 00-1C-C4-79-35-83 architecture> x86_64 mgtcard> ILO mgtcard ip address> 16.16.188.116 processing 1 node ...
cmu_add_network_entity(8) NAME cmu_add_network_entity -- Add network entities. SYNOPSIS # /opt/cmu/bin/cmu_add_network_entity <-f filename | -h> # /opt/cmu/bin/cmu_add_network_entity DESCRIPTION Add HP Insight CMU network entities.
cmu_add_logical_group(8) NAME cmu_add_logical_group -- Add logical groups. SYNOPSIS # /opt/cmu/bin/cmu_add_logical_group <-n | -i | -f filename | -s> # /opt/cmu/bin/cmu_add_logical_group <-n name -d devicename> # /opt/cmu/bin/cmu_add_logical_group <-n name -d diskless -I golden_node_ip -k kernel_version> DESCRIPTION Add HP Insight CMU logical groups.
cmu_add_to_logical_group_candidates(8) NAME cmu_add_to_logical_group_candidates -- Add nodes as candidates for logical groups. SYNOPSIS # /opt/cmu/bin/cmu_add_to_logical_group_candidates <-h | -t logical_group nodename> # /opt/cmu/bin/cmu_add_to_logical_group_candidates <-t logical_group nodename -f nodenamefile> DESCRIPTION Add nodes as candidates for an HP Insight CMU logical group.
cmu_add_user_group(8) NAME cmu_add_user_group -- Add user groups. SYNOPSIS # /opt/cmu/bin/cmu_add_user_group <-f filename | -h> # /opt/cmu/bin/cmu_add_user_group DESCRIPTION Add user groups.
cmu_add_to_user_group(8) NAME cmu_add_to_user_group -- Add nodes to user groups. SYNOPSIS # /opt/cmu/bin/cmu_add_to_user_group <-h | -t user_group nodename> # /opt/cmu/bin/cmu_add_to_user_group <-t user_group nodename -f nodenamefile> DESCRIPTION Add nodes to user groups.
cmu_change_active_logical_group(8) NAME cmu_change_active_logical_group -- Change the active logical group for a node. SYNOPSIS # /opt/cmu/bin/cmu_change_active_logical_group <-h | -t logical_group nodename1 [nodename2] [...]> # /opt/cmu/bin/cmu_change_active_logical_group <-t logical_group nodename -f nodenamefile> DESCRIPTION Change the active logical group for a node or a group of nodes.
cmu_change_network_entity(8) NAME cmu_change_network_entity -- Change the network entity for a node. SYNOPSIS # /opt/cmu/bin/cmu_change_network_entity <-h | -t network_entity nodename1 [nodename2] [...]> DESCRIPTION Change the network entity for a node. A node can belong to only one network entity. A newly added node does not belong to any network entity.
cmu_del_from_logical_group_candidates(8) NAME cmu_del_from_logical_group_candidates -- Delete nodes from logical groups. SYNOPSIS # /opt/cmu/bin/cmu_del_from_logical_group_candidates <-h | -t logical_group nodename1 [nodename2] [...]> # /opt/cmu/bin/cmu_del_from_logical_group_candidates <-t logical_group nodename -f nodenamefile> DESCRIPTION Delete one or more nodes from a logical group.
cmu_del_from_network_entity(8) NAME cmu_del_from_network_entity -- Delete nodes from network entities. SYNOPSIS # /opt/cmu/bin/cmu_del_from_network_entity <-h | -t network_entity nodename1 [nodename2] [...]> # /opt/cmu/bin/cmu_del_from_network_entity <-t network_entity nodename -f nodenamefile> DESCRIPTION Delete one or more nodes from a network entity.
cmu_del_archived_user_group(8) NAME cmu_del_archived_user_group -- Delete an archived user group. SYNOPSIS # /opt/cmu/bin/cmu_del_archived_user_group [-h] | [-v] [-t timeout] [-d] DESCRIPTION Delete an archived user group.
cmu_del_from_user_group(8) NAME cmu_del_from_user_group -- Delete one or more nodes from a user group. SYNOPSIS # /opt/cmu/bin/cmu_del_from_user_group <-h | -t user_group nodename1 [nodename2] [...]> # /opt/cmu/bin/cmu_del_from_user_group <-t user_group nodename -f nodenamefile> DESCRIPTION Delete one or more nodes from a user group.
cmu_del_logical_group(8) NAME cmu_del_logical_group -- Delete a logical group. SYNOPSIS # /opt/cmu/bin/cmu_del_logical_group <-f filename | -h> # /opt/cmu/bin/cmu_del_logical_group DESCRIPTION Delete a logical group.
cmu_del_network_entity(8) NAME cmu_del_network_entity -- Delete a network entity. SYNOPSIS # /opt/cmu/bin/cmu_del_network_entity <-f filename | -h> # /opt/cmu/bin/cmu_del_network_entity DESCRIPTION Delete a network entity.
cmu_del_node(8) NAME cmu_del_node -- Delete a node. SYNOPSIS # /opt/cmu/bin/cmu_del_node <-f filename | -h> # /opt/cmu/bin/cmu_del_node DESCRIPTION Delete a node.
cmu_del_snapshots(8) NAME cmu_del_snapshots -- Delete monitoring snapshots from the history database. SYNOPSIS # /opt/cmu/bin/cmu_del_snapshots [-h] | <-a timestamp | -b timestamp | -z> [-v verbose] [-d dryrun] [-r] DESCRIPTION Delete monitoring snapshots from the history database.
cmu_del_user_group(8) NAME cmu_del_user_group -- Delete a user group. SYNOPSIS # /opt/cmu/bin/cmu_del_user_group <-f filename | -h> [-a] [-m] # /opt/cmu/bin/cmu_del_user_group DESCRIPTION Delete a user group.
cmu_console(8) NAME cmu_console -- Connect to compute node management ports. SYNOPSIS # /opt/cmu/bin/cmu_console DESCRIPTION Invoke directly from the operating system shell to connect to compute node management ports (iLO/lo100i). EXAMPLES # /opt/cmu/bin/cmu_console contacting ilo_ip_address... Warning: Permanently added 'ilo_ip_address' (RSA) to the list of known hosts. cmu@x.x.x.
cmu_power(8) NAME cmu_power -- Perform power actions on compute nodes. SYNOPSIS # /opt/cmu/bin/cmu_power <-h | -p action -n nodename1 [nodename2] [nodename3] | -a | -l logical_group_name | -u user_group_name | -f nodefile [ -e error_log ]> DESCRIPTION Perform iLO actions such as power on, power off, emulate power button, get power status, and UID on/off. OPTIONS -h show help -p action specifies the action to perform; valid actions are: OFF Power off.
EXAMPLES To power off one node: .cmu_power -p OFF -n cn0001 To power off nodes belonging to user group user1: .cmu_power -p OFF -u user1 To boot nodes belonging to logical group rh6u0_x86_64: .cmu_power -p BOOT -l rh6u0_x86_64 To turn on the UID LED on nodes belonging to user group user2: .
cmu_custom_run(8) NAME cmu_custom_run -- A CLI to HP Insight CMU custom menu options. SYNOPSIS # /opt/cmu/bin/cmu_custom_run <-h | -l | -t command_title [-f nodefile]> DESCRIPTION Perform custom defined commands on a group of nodes or all nodes. The same custom defined commands are also available from the GUI.
cmu_clone(8) NAME cmu_clone -- Clone nodes in a logical group. SYNOPSIS # /opt/cmu/bin/cmu_clone <-n | -f nodelistfile> <-i imagename> [-s summarylog] [-b] [-p] [-r] DESCRIPTION Clone the specified node or nodelist in the specified logical group.
cmu_backup(8) NAME cmu_backup -- Issue backup commands directly from the Linux shell. SYNOPSIS # /opt/cmu/bin/cmu_backup <-h> | <-l logical_group -n compute_nodename -p "partition_list" | -r root_partition_number> [-e log_file] DESCRIPTION Create a backup image.
cmu_scan_macs(8) NAME cmu_scan_macs -- Scan IP addresses and create HP Insight CMU node definitions. SYNOPSIS # /opt/cmu/bin/cmu_scan_macs -h [-p ] -i -m -t [-b [-n ] | -b ] [-f ] [-a ] [-N ] [-s ] [-S ] [-o ] If no options are specified, then they are gathered through an interactive session.
when there is an intervening empty slot. The -S 0 option effectively forces a sequential set of values to be generated for %xi and the IP, since intervening slots without cartridges do not affect their values. -p hostname_prefix If this option is specified, the hostname specified in -h must be a fixed string and have a numeric suffix. For example, 'n01', 'node_01', 'zeus001'. The suffix is incrementally increased to create subsequent hostnames.
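The suffix increment described for -p can be illustrated with a short shell function. This is a sketch of the general idea only, not the code cmu_scan_macs uses; in particular, preserving the zero-padding width of the suffix (n01 becomes n02, not n2) is an assumption, and the function name next_hostname is hypothetical.

```shell
#!/bin/sh
# next_hostname NAME: print NAME with its numeric suffix increased by one.
# Keeping the zero-padding width of the suffix is an assumption of this sketch.
next_hostname() {
    name=$1
    # Everything before the trailing run of digits is the fixed prefix.
    prefix=$(printf '%s' "$name" | sed 's/[0-9]*$//')
    suffix=${name#"$prefix"}
    [ -n "$suffix" ] || { echo "no numeric suffix: $name" >&2; return 1; }
    width=${#suffix}
    # Strip leading zeros so the shell does not read the suffix as octal.
    value=$(printf '%s' "$suffix" | sed 's/^0*//')
    printf "%s%0${width}d\n" "$prefix" "$(( ${value:-0} + 1 ))"
}

next_hostname n01        # prints n02
next_hostname node_09    # prints node_10
next_hostname zeus001    # prints zeus002
```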
EXAMPLES Example 1 To scan 128 sequential ILO addresses starting at 3.4.5.6 and put node definitions similar to the following in the HP Insight CMU database: # /opt/cmu/bin/cmu_scan_macs -h node%i -i 1.2.3.4 -m 255.255.0.0 -t ILO -b 3.4.5.6 -n 128 node1 1.2.3.4 255.255.0.0 00-1C-C4-AB-06-56 default 3.4.5.6 ILO x86_64 -1 -1 node2 1.2.3.5 255.255.0.0 00-1F-29-66-4C-F2 default 3.4.5.7 ILO x86_64 -1 -1 . .
n03_C01_N3 1.2.3.3 255.255.0.0 44-1e-a1-d3-b4-02 default 10.84.202.42 ILOCM x86_64 1 3
n04_C01_N4 1.2.3.4 255.255.0.0 44-1e-a1-d3-b3-de default 10.84.202.42 ILOCM x86_64 1 4
n09_C03_N1 1.2.3.9 255.255.0.0 44-1e-a1-d3-b3-ac default 10.84.202.42 ILOCM x86_64 3 1
n10_C03_N2 1.2.3.10 255.255.0.0 44-1e-a1-d3-ac-68 default 10.84.202.42 ILOCM x86_64 3 2
n11_C03_N3 1.2.3.11 255.255.0.0 44-1e-a1-d3-ac-24 default 10.84.202.42 ILOCM x86_64 3 3
n12_C03_N4 1.2.3.12 255.255.0.0 38-ea-a7-0f-01-dc default 10.84.202.
cmu_rescan_mac(8) NAME cmu_rescan_mac -- Rescan the MAC address of a node. SYNOPSIS # /opt/cmu/tools/cmu_rescan_mac -n nodename [-N NIC_num] [-h] DESCRIPTION Use this command after you replace a failing node. After node replacement, you can add the new MAC address of the node to the HP Insight CMU database using /opt/cmu/tools/cmu_rescan_mac. OPTIONS -n nodename The node name in the HP Insight CMU database. -N NIC_num (ILOCM only) Indicates which of the node's NICs is attached to the admin network.
cmu_mod_node(8) NAME cmu_mod_node -- Modify node(s) in the HP Insight CMU database. SYNOPSIS # /opt/cmu/bin/cmu_mod_node <-h | -s | -i | -f filename> # /opt/cmu/bin/cmu_mod_node -H|--hostname hostname [-I|--ip ipaddress] [-M|--mask netmask] [-A|--mac macaddress] [-L|--lg logicalgroup] [-G|--mgt-ip mgtcardip] [-R|--arch architecture] [-C|--cartridge num] [-N|--node-number num] DESCRIPTION Modify one or more nodes in the HP Insight CMU database.
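If several nodes were replaced in one maintenance window, the rescan can be prepared as a batch. The sketch below is a dry run that only prints one cmu_rescan_mac command per node name; the node names are hypothetical, and only the -n option from the SYNOPSIS is used.

```shell
#!/bin/sh
# Dry-run sketch: print one cmu_rescan_mac command per replaced node.
# Node names passed in are hypothetical; nothing is executed.
CMU_RESCAN=/opt/cmu/tools/cmu_rescan_mac

# rescan_cmds NODE...
rescan_cmds() {
    for node in "$@"; do
        printf '%s -n %s\n' "$CMU_RESCAN" "$node"
    done
}

rescan_cmds cn0006 cn0012
```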
# /opt/cmu/bin/cmu_mod_node -H cn0006 -I 16.16.184.116 -M 255.255.254.0 -A 00-02-A5-52-EB-F8 -L default -G 192.168.0.1 -R x86_64 processing 1 node ... Interactive mode: In interactive mode, you are prompted for node parameters: # /opt/cmu/bin/cmu_mod_node -i hostname> n10 ip address> 16.16.184.116 netmask> 255.255.248.0 mac address> 00-1C-C4-79-35-83 architecture> x86_64 mgtcard> ILO mgtcard ip address> 16.16.188.116 processing 1 node ...
cmu_monstat(8) NAME cmu_monstat -- Use monitoring to list sensors and alerts.
--all-lg Select all logical groups. --all-ne Select all network entities --all-ug Select all user groups --lg=lg1,lg2,... Specify the logical group(s) names or range. --ne=ne1,ne2,... Specify the network entity names or range. --nodes=node1,node2,... Specify the node(s) names or range. --ug=ug1,ug2,... Specify the user group(s) names or range.
cmu_image_open(8) NAME cmu_image_open -- Open an existing backup image for modification. SYNOPSIS # /opt/cmu/bin/cmu_image_open <-h | -i imagename> DESCRIPTION Open an existing HP Insight CMU backup image for modification.
cmu_image_commit(8) NAME cmu_image_commit -- Save a backup image previously expanded with cmu_image_open. SYNOPSIS # /opt/cmu/bin/cmu_image_commit <-h | -i imagename [-n new_image_name]> DESCRIPTION Saves an HP Insight CMU backup image that was previously expanded with the cmu_image_open command.
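cmu_image_open and cmu_image_commit are used as a pair: open the image, modify it, then commit. The following dry-run sketch prints that sequence for one image and executes nothing. The image names are hypothetical, and the modification step is left as a placeholder comment because it depends on the change being made.

```shell
#!/bin/sh
# Dry-run sketch: print the open/commit command pair for one backup image.
# Image names are hypothetical; nothing is executed.
CMU_BIN=/opt/cmu/bin

# image_edit_cmds IMAGE [NEW_IMAGE]
# With NEW_IMAGE, the commit uses -n to save under a new image name.
image_edit_cmds() {
    printf '%s/cmu_image_open -i %s\n' "$CMU_BIN" "$1"
    echo '# ... modify the expanded image here ...'
    if [ -n "${2:-}" ]; then
        printf '%s/cmu_image_commit -i %s -n %s\n' "$CMU_BIN" "$1" "$2"
    else
        printf '%s/cmu_image_commit -i %s\n' "$CMU_BIN" "$1"
    fi
}

image_edit_cmds rh6u0_x86_64 rh6u0_x86_64_patched
```

Committing with -n leaves the choice of a new image name to the administrator; omit the second argument to print a commit without -n.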
cmu_config_nvidia(8) NAME cmu_config_nvidia -- Configure NVIDIA GPU monitoring. SYNOPSIS # /opt/cmu/bin/cmu_config_nvidia <-h | -r | -n numGPUs> Where numGPUs specifies the number of GPUs in each client. DESCRIPTION This command configures NVIDIA GPU monitoring metrics in the HP Insight CMU /opt/cmu/etc/ActionAndAlertsFile.txt file. Restart HP Insight CMU monitoring after using this command.
cmu_config_amd(8) NAME cmu_config_amd -- Configure AMD GPU monitoring. SYNOPSIS # /opt/cmu/bin/cmu_config_amd <-h | -n numGPUs> Where numGPUs specifies the number of GPUs in each client. DESCRIPTION This command configures AMD GPU monitoring metrics in the HP Insight CMU /opt/cmu/etc/ActionAndAlertsFile.txt file. Restart HP Insight CMU monitoring after using this command.
cmu_config_intel(8) NAME cmu_config_intel -- Configure Intel coprocessor monitoring. SYNOPSIS # /opt/cmu/bin/cmu_config_intel <-h | -r | -n> DESCRIPTION This command configures Intel coprocessor monitoring metrics in the HP Insight CMU /opt/cmu/etc/ActionAndAlertsFile.txt file. These metrics can subsequently be removed using the -r option. Restart HP Insight CMU monitoring after using this command.
cmu_mgt_config(8) NAME cmu_mgt_config -- Configure or test a set of Linux components required by HP Insight CMU. SYNOPSIS # /opt/cmu/bin/cmu_mgt_config [-c] [-t] [-d] [-e eth [:num2]] [-h] [-i] [-n num] [-s step,...] DESCRIPTION cmu_mgt_config attempts to configure (-c) or test (-t) a collection of Linux components required by HP Insight CMU. cmu_mgt_config can be run repeatedly without adversely affecting already configured components.
ssh_key Check for existence of the root ssh key or create one. firewall Check and optionally disable the firewall. tftp Check and configure tftp. nfs Check and configure NFS. dhcp Check and configure DHCP listening interface. java Check for required Java configuration. license Check for a valid HP Insight CMU license.
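To check only a subset of the components listed above, the step names can be joined into the comma-separated form that -s expects. The sketch below is a dry run that prints a single cmu_mgt_config command line and executes nothing. Combining -t with -s in this way is an assumption based on the SYNOPSIS, and the helper names join_steps and mgt_test_cmd are hypothetical.

```shell
#!/bin/sh
# Dry-run sketch: print a cmu_mgt_config test command for selected steps.
# Combining -t with -s this way is an assumption based on the SYNOPSIS.
CMU_MGT_CONFIG=/opt/cmu/bin/cmu_mgt_config

# join_steps STEP...: join the arguments with commas for the -s option.
join_steps() {
    # The subshell keeps the IFS change local to this function.
    ( IFS=,; printf '%s\n' "$*" )
}

# mgt_test_cmd STEP...: print the test (-t) command for the given steps.
mgt_test_cmd() {
    printf '%s -t -s %s\n' "$CMU_MGT_CONFIG" "$(join_steps "$@")"
}

mgt_test_cmd ssh_key firewall tftp nfs dhcp
```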
cmu_firmware_mgmt(8) NAME cmu_firmware_mgmt -- Verify and execute firmware SYNOPSIS # /opt/cmu/bin/cmu_firmware_mgmt [-h] [-d -f [-o"cmudiff_parameters"]] | [-c -f ] | [-v -f ] | [-u -f ] DESCRIPTION cmu_firmware_mgmt performs the following operations: • Display BIOS settings for the specified nodes • Display BIOS version for the specified nodes • Execute the firmware executable on the specified nodes OPTIONS -h Print this help text
Glossary administration disk The disk located on the image server on which HP Insight CMU is installed. A dedicated space can be allocated to the cloned images. administration network The private network within the system that is used for administrative operations. clone image The compressed image of the installation from the master disk. One clone image is needed for each logical group.
2. A software package that is capable of being installed or removed with the RPM software package management. secondary server A dedicated node in a network entity where the cloned image is temporarily stored. The cloned image is propagated only to the other nodes that are defined inside the entity. target disk The hard drive on a target node where the cloned image is installed. target node A compute node that will receive the cloned image from a secondary server.