HP XC System Software Administration Guide Version 3.
© Copyright 2003, 2004, 2005, 2006, 2007 Hewlett-Packard Development Company, L.P. Confidential computer software. Valid license from HP required for possession, use or copying. Consistent with FAR 12.211 and 12.212, Commercial Computer Software, Computer Software Documentation, and Technical Data for Commercial Items are licensed to the U.S. Government under vendor's standard commercial license. The information contained herein is subject to change without notice.
Table of Contents About This Document.......................................................................................................19 Intended Audience................................................................................................................................19 New and Changed Information in This Edition...................................................................................19 Typographic Conventions.....................................................................
2 Improved Availability...................................................................................................47 2.1 Purpose of the Availability Tool......................................................................................................47 2.2 Services Eligible for Improved Availability....................................................................................47 2.3 Availability Sets...........................................................................................
Managing Licenses......................................................................................................79 5.1 License Manager and License File...................................................................................................79 5.2 Determining If the License Manager Is Running............................................................................79 5.3 Starting and Stopping the License Manager.............................................................................
8 Monitoring the System with Nagios........................................................................105 8.1 Nagios Overview...........................................................................................................................105 8.1.1 Nagios Components..............................................................................................................106 8.1.2 Nagios Hosts....................................................................................................
11.5 Golden Image Checksum.............................................................................................................145 11.6 Updating the Golden Image........................................................................................................146 11.6.1 The cluster_config Utility....................................................................................................147 11.6.2 The updateimage Command....................................................................
15.2.5 Configuring SLURM Features.............................................................................................175 15.2.6 Propagating Resource Limits...............................................................................................176 15.3 Restricting User Access to Nodes................................................................................................178 15.4 Job Accounting.............................................................................................
18.3.1 Understanding the csys Utility in the Mounting Instructions............................................218 18.3.2 Mounting Internal File Systems...........................................................................................219 18.4 Mounting Remote File Systems...................................................................................................222 18.4.1 Understanding the Mounting Instructions.........................................................................223 18.4.
21.5.1 How To Start HP Serviceguard When Only the Head Node is Running...........................260 21.5.2 Restart Serviceguard Quorum Server if Quorum Server Node is Re-imaged....................260 21.5.3 Known Limitation if Nagios is Configured for Improved Availability..............................260 21.5.4 Network Restart Command Negatively Affects Serviceguard...........................................261 21.5.5 Problem Failing Over Database Package Under Serviceguard................................
C Setting Up MPICH.....................................................................................................309 C.1 Downloading the MPICH Source Files........................................................................................309 C.2 Building MPICH on the HP XC System.......................................................................................309 C.3 Running the MPICH Self-Tests..................................................................................................
List of Figures
1-1 HP XC File System Hierarchy...29
1-2 HP XC Hierarchy Under /opt/hptc...32
1-3 LVS View of Cluster...
List of Tables
1-1 Log Files...32
1-2 HP XC System Commands...33
1-3 HP XC Configuration Files...
List of Examples
4-1 Sample gconfig Script: Client Selection and Client-to-Server Assignment...69
4-2 Sample service.ini File...74
7-1 Using the collectl Utility from the Command Line...
About This Document This document describes the procedures and tools that are required to maintain the HP XC system. It provides an overview of the administrative environment and describes administration tasks, node maintenance tasks, Load Sharing Facility (LSF®) administration tasks, and troubleshooting information. An HP XC system is integrated with several open source software components. Some open source software components are being used for underlying technology, and their deployment is transparent.
• There is a description of the ovp utility --opts=--queue option that allows you to specify the LSF queue for performance health tests.
• A note was added regarding which of the ovp utility's performance health tests apply to Standard LSF and which apply to LSF-HPC integrated with SLURM.
...
    The preceding element can be repeated an arbitrary number of times.
|
    Separates items in a list of choices.
WARNING
    A warning calls attention to important information that if not understood or followed will result in personal injury or nonrecoverable system problems.
CAUTION
    A caution calls attention to important information that if not understood or followed will result in data loss, data corruption, or damage to hardware or software.
IMPORTANT
NOTE
HP Message Passing Interface HP Message Passing Interface (HP-MPI) is an implementation of the MPI standard that has been integrated in HP XC systems. The home page and documentation are located at the following web address: http://www.hp.com/go/mpi HP Serviceguard HP Serviceguard is a service availability tool supported on an HP XC system. HP Serviceguard enables some system services to continue if a hardware or software failure occurs.
— Administering Platform LSF
— Administration Primer
— Platform LSF Reference
— Quick Reference Card
— Running Jobs with Platform LSF
LSF procedures and information supplied in the HP XC documentation, particularly the documentation relating to the LSF-HPC integration with SLURM, supersede the information supplied in the LSF manuals from Platform Computing Corporation. The Platform Computing Corporation LSF manpages are installed by default.
• http://sourceforge.net/projects/modules/ web address for Modules, which provide for easy dynamic modification of a user's environment through modulefiles, which typically instruct the module command to alter or set shell environment variables. • http://dev.mysql.com/ Home page for MySQL AB, developer of the MySQL database. This web address contains a link to the MySQL documentation, particularly the MySQL Reference Manual.
Compiler Web Addresses • http://www.intel.com/software/products/compilers/index.htm web address for Intel® compilers. • http://support.intel.com/support/performancetools/ web address for general Intel software development information. • http://www.pgroup.com/ Home page for The Portland Group™, supplier of the PGI® compiler. Debugger Web Address http://www.etnus.com Home page for Etnus, Inc., maker of the TotalView® parallel debugger. Software RAID Web Addresses • http://www.tldp.
HP Encourages Your Comments HP encourages comments concerning this document. We are committed to providing documentation that meets your needs. Send any errors found, suggestions for improvement, or compliments to: feedback@fc.hp.com Include the document title, manufacturing part number, and any comment, error found, or suggestion for improvement you have concerning this document.
1 HP XC Administration Environment This chapter introduces the HP XC Administration Environment.
Note: Perform all system administration from the management node that provides the appropriate service, usually the head node. 1.1.1.2 Local Storage The local storage for each node holds the operating system, a copy of the HP XC system software, and temporary space that can be used by jobs. When possible, ensure that jobs that use local storage clean up files after they are run. You might need to clean up temporary storage on local machines if jobs do not do so adequately. 1.1.
• Image server service The golden client maintains an image of the file system that is the model for all the nodes in the HP XC system. SystemImager is the underlying technology used to install the HP XC software, distribute the golden image, and distribute configuration changes. With the SystemImager, you can update every node with systemwide information, such as user account data. Note: The golden client and SystemImager are bound to the same node.
IMPORTANT: The HP XC system relies on key files. Interfering with these files can cause the system to fail. The best way to avoid this situation is to respect the placement of directories and files, especially when installing software packages. The file system layout is structured to isolate the files specific to the HP XC System Software from base operating system files.
1.2.1.1 Systemwide Directory, /hptc_cluster The /hptc_cluster directory is the global file system on an HP XC system. This file system is shared and mounted by all nodes. This directory contains configuration and log file information that is applicable across the system; various services rely on the files in this directory. These log files are in the /hptc_cluster/adm/logs directory. Use the following guidelines for the /hptc_cluster directory:
• Keep this directory small.
Figure 1-2 HP XC Hierarchy Under /opt/hptc 1.2.1.3 HP XC Service Configuration Files The /opt/hptc/etc/ directory includes several subdirectories containing scripts used to configure services on nodes at installation time. The /opt/hptc/etc/sconfig.d directory contains scripts for system configuration. The /opt/hptc/etc/gconfig.d directory contains scripts used to gather information needed to configure a service on the HP XC system. The /opt/hptc/etc/nconfig.
Table 1-1 Log Files (continued) Component pathname of Log File Myrinet® gm_drain_test /var/log/diag/myrinet/gm_drain_test/ Myrinet gm_prodmode_mon diagnostic tool /var/log/diag/myrinet/gm_prodmode_mon/links.log ovp /hptc_cluster/adm/logs/ovp/ovp_nodename_mmddyy[rnn] /hptc_cluster/adm/logs/aggregator_nodename.log (alerts) /hptc_cluster/adm/logs/ovp/current_ovp_log (a symbolic link to the most recent log file) powerd /var/log/powerd/powerd.
Table 1-2 HP XC System Commands (continued) Command Description cluster_config The cluster_config command enables you to view and modify the default role assignments and node configuration, modify the default role assignments on any node, and add, modify, or delete Ethernet connections to any node except the head node. Manpage: cluster_config(8) collectl The collectl utility collects data on the nodes of the HP XC system and plays back the information as ASCII text or in a plot form.
Table 1-2 HP XC System Commands (continued) Command Description openipport The superuser uses the openipport command to open a specified port in the firewall. Manpage: openipport(8) ovp Use the ovp utility to verify the installation, configuration, and operation of the HP XC system. Manpage: ovp(8) perfplot The perfplot utility graphs the data from the data files generated with the xcxperf utility. This utility is described in the HP XC System Software User's Guide.
Table 1-2 HP XC System Commands (continued) Command Description xcxclus The xcxclus utility is a graphic utility that enables you to monitor a number of nodes simultaneously. This utility is described in the HP XC System Software User's Guide. Manpage: xcxclus(1) xcxperf The xcxperf utility provides a graphic display of node performance for a variety of metrics. This utility is described in the HP XC System Software User's Guide. Manpage: xcxperf(1) 1.3.
Important: Do not pass a command that requires interaction as an argument to the pdsh command. Prompting from the remote node can cause the command to hang. The following example runs the uptime command on all the nodes in a four-node system.
# pdsh -a "uptime"
n4: 15:51:40 up 2 days, 2:41, 4 users, load average: 0.48, 0.29, 0.11
n3: 15:49:17 up 1 day, 4:55, 0 users, load average: 0.00, 0.00, 0.00
n2: 15:50:32 up 1 day, 4:55, 0 users, load average: 0.00, 0.00, 0.00
n1: 15:47:21 up 1 day, 4:55, 0 users, load average: 0.00, 0.00, 0.00
# cexec -a "who --count" n12: root bg rmk n12: # users=3 n25: root wra guest spg n25: # users=4 For additional information, see cexec(1). 1.4 Configuration and Management Database The HP XC system stores information about the nodes and system configuration in the configuration and management database (CMDB). This is a MySQL database that runs on the node with the node management role. The CMDB is constructed during HP XC system installation.
Reconfiguration For a system reconfiguration, the policy in effect is to preserve any customizing you have done to a Linux configuration file unless the change undermines the proper operation of the HP XC System Software. In that case, the HP XC System Software overwrites the configuration file and the changes you made are deleted. Some configuration parameters can be expressed in terms of a range.
Table 1-3 HP XC Configuration Files
Component | Referenced in | Configuration Files
collectl utility | Chapter 7 (page 87) | /opt/hp/collectl/etc/collectl.ini
Cluster configuration: | Chapter 16 (page 189), Appendix A (page 289) | /opt/hptc/config/base_addr.ini
Configuration and management database (CMDB) | N/A | /etc/my.cnf
Ethernet port mappings | N/A | /opt/hptc/config/modelmap
Firewall | Chapter 12 (page 153) | /etc/sysconfig/iptables.proto /etc/sysconfig/ip6tables.
golden master and how to distribute software throughout the HP XC system, see Chapter 11 (page 139). 1.6 Installation and Software Distribution HP XC System Software is installed during the initial installation (new installation), described in the HP XC System Software Installation Guide. Periodically, later releases may require you to reinstall the system software or upgrade it from its most recent version.
1.8.1 Linux Virtual Server for HP XC Cluster Alias The HP XC system uses the Linux Virtual Server (LVS) to present a single host name for user logins. LVS is a highly scalable virtual server built on a cluster of real servers. By using LVS, the architecture of the HP XC system is transparent to end users, and they see only a single virtual server. This eliminates the need for users to know how the cluster is configured in order to successfully log in and use it.
Use either of the following methods for setting up NIS on your HP XC system: • Set all the nodes as NIS clients. Both the master and slave NIS server are external to the HP XC system. • Set the head node as a NIS slave (secondary) server. The NIS master server is external to the HP XC system. Nodes within the HP XC system use the internal server for NIS information. HP recommends this configuration for larger systems using NIS. 1.
1.10.1 Administrator Passwords During system installation, you assigned the following passwords: • Root password Used to access the superuser account, to perform system administration, and to invoke management tools. • Database administrator password Used for changes to the configuration and management database.
Table 1-4 Recommended Administrative Tasks When Task Reference Once, after initial installation Create a system log book for monitoring and configuration changes to your system. Run the ovp utility. Run the sys_check command to establish a baseline. N/A Chapter 7: “Monitoring the System” (page 87) Run the dgemm command to detect any nodes that are not performing at their peak performance. Frequently Regularly Consult the Nagios Web interface to monitor the system status.
2 Improved Availability The improved availability feature of the HP XC system offers the following benefits: • • It enables services and, thus, user jobs, to continue to run, even after a node failure. It enables you to run new jobs. The improved availability feature relies on an availability tool controlling nodes and services in an availability set. The HP XC System Software provides commands to transfer control of services to the availability tool.
• nat for Network Address Translation
• nagios for the Nagios Master service
NOTE: The nagios_monitor service is not eligible. For more information on services, see Chapter 4 (page 59). 2.3 Availability Sets A set of nodes is designated as an availability set during the configuration of the HP XC System Software. These nodes provide failover and failback functionality. One node in the availability set is typically the primary provider for the service.
http://docs.hp.com/en/ha.html#Serviceguard Each availability set relates to a corresponding Serviceguard cluster. NOTE: In the examples in this section, assume the PATH environment variable has been updated for Serviceguard commands. 2.4.1 Viewing the Serviceguard Cluster Status View the status of the Serviceguard cluster (that is, the availability set). The Serviceguard cmviewcl command is a key component of this procedure.
3. Use the Serviceguard cmviewcl command on one node of each availability set to view the status of the Serviceguard cluster. For example: # pdsh -w n13,n14 /usr/local/cmcluster/bin/cmviewcl n13: n13: CLUSTER STATUS n13: avail2 up n13: n13: NODE STATUS STATE n13: n12 up running n13: n13: PACKAGE STATUS STATE AUTO_RUN n13: n12 up running enabled n13: n13: NODE STATUS STATE n13: n13 up running n13: n13: PACKAGE STATUS STATE AUTO_RUN n13: nat.
Nodes must be stopped and restarted during the reimaging process. When a node is reimaged, but its partner in an availability set is still running, the reimaged node comes under the control of the availability tool automatically. When both or all the nodes in an availability set are down, you must use the transfer_to_avail command on the head node to transfer control of services to the availability tool.
Info: Serviceguard found running on these nodes: 'n16 n14'
Info: Starting transfer of services from Serviceguard...
stopAvailTool: ========== Executing '/opt/hptc/availability/serviceguard/stop_avail'...
Stopping HP Serviceguard cluster [-C /usr/local/cmcluster/conf/avail1.config -P /usr/local/cmcluster/conf/nat.n16/nat.n16.config -P /usr/local/cmcluster/conf/nat.n14/nat.n14.config -P /usr/local/cmcluster/conf/lvs.n16/lvs.n16.config -P /usr/local/cmcluster/conf/nagios.n16/nagios.n16.
3 Starting Up and Shutting Down the HP XC System This chapter addresses the following topics:
• “Understanding the Node States” (page 53)
• “Starting Up the HP XC System” (page 54)
• “Shutting Down the HP XC System” (page 56)
• “Shutting Down One or More Nodes” (page 57)
• “Determining a Node's Power Status” (page 57)
• “Locating a Given Node” (page 57)
• “Disabling and Enabling a Node” (page 58)
3.
Table 3-1 Node States (continued)
Node State | Description
Boot_Fail | The node failed to boot. The node is returned to the Boot_Ready state.
AVAILABLE | The node is ready for use.
Nodes transition between node states accordingly. Figure 3-1 illustrates the transition of node states.
Note: Nodes that are disabled either as a result of a critical service configuration failure or with the setnode --disable command cannot be started with the startsys command until they are enabled with the setnode --enable command. You can start up only specified nodes by specifying them in a nodelist parameter.
2. Invoke the startsys command with the --image_and_boot option: # startsys --image_and_boot 3. If your system has been configured for improved availability and the nodes that provide availability have been reimaged, enter the transfer_to_avail command to transfer control to the availability tool: # transfer_to_avail For additional information on options that affect imaging, see startsys(8). 3.2.
NOTE: If all these conditions apply, follow the procedure to restart HP Serviceguard: • Your system is configured for improved availability. • Your system is configured for HP Serviceguard, but it does not start up. • You rebooted the head node. • You want to start up the head node only. 1. Enter the following command where hn_name is the name of the head node: # cmruncl -n hn_name The default installed path name for this command is /usr/local/cmcluster/bin/cmruncl. 2. Answer y at the prompt. 3.
The Unit Identifier LED on the node illuminates on the node's front panel. 3. Invoke the locatenode command again, this time with the --off option to turn off the Unit Identifier LED when you are done. 3.7 Disabling and Enabling a Node You can disable one or more nodes in the HP XC system. Disabled nodes are ignored when the HP XC system is started or stopped with the startsys and stopsys commands, respectively.
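For example, the following sketch disables a node and later re-enables it; the node name is illustrative, and you should consult setnode(8) for the complete option list:
# setnode --disable n32
# setnode --enable n32
While n32 is disabled, subsequent startsys and stopsys operations skip it.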
4 Managing and Customizing System Services This chapter describes the HP XC system services and the procedures for their use. This chapter addresses the following topics:
• “HP XC System Services” (page 59)
• “Displaying Services Information” (page 61)
• “Restarting a Service” (page 63)
• “Stopping a Service” (page 64)
• “Adding a New Service” (page 76)
• “Global System Services” (page 64)
• “Customizing Services and Roles” (page 64)
4.
Table 4-1 Linux and third-party System Services (continued)
Service | Function | Database Name
IP Firewall | Sets up IP firewalls on nodes. | iptables
LSF Master Node | Load Sharing Facility for HP XC master node. | lsf
LVS Director | Handles the placement of user login sessions on nodes when a user logs in to the cluster alias. | lvs
NAT Server | Network Address Translation server. | nat
NAT Client | Network Address Translation client.
Table 4-2 HP XC System Services (continued)
HP XC Service | Function | Database Name
Slurm Launch | Allows user to launch jobs on nodes with the slurm_compute service. | slurm_launch
smartd daemon | Monitors the reliability of specific hard drives on CP6000 systems. | smartd
Supermon Aggregator | Gathers information from subordinate nodes running Supermon. | supermond
Image Server | Holds and distributes the system images.
nsca: n3
ntp: n3
pdsh: n[1-3]
pwrmgmtserver: n3
slurm: n3
supermond: n3
swmlogger: n3
syslogng_forward: n3
For more information, see shownode(8).
You can obtain an extensive list of all services running on a given node by invoking the following command:
# service --status-all
4.2.2 Displaying the Nodes That Provide a Specified Service You can use the shownode servers command to display the node or nodes that provide a specific service to a given node. You do not need to be superuser to use this command.
The shownode services node client command does not display any output if no client exists. Another keyword, servers, allows you to determine the node that provides a specified node its services.
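For example, the following sketch lists the nodes that serve the lvs service; the output shown is illustrative only and differs on each system:
# shownode servers lvs
n[15-16]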
4.4 Stopping a Service The method to use to stop a service depends on whether or not improved availability is in effect for that service.
• “Advance Planning” (page 72)
• “Editing the roles_services.ini File” (page 72)
• “Creating a service.ini File” (page 73)
• “Adding a New Service” (page 76)
• “Verifying a New Service” (page 78)
4.6.1 Overview of the HP XC Services Configuration HP XC System Software includes a predefined set of services that are delivered using node role assignments; however, a third-party software installation might require you to add a service that is not part of the default HP XC services model.
4.6.2 Service Configuration Sequence of Operation To understand the relationship between the cluster_config utility and the service configuration scripts, it is important to know the sequence of events that occur during cluster_config processing:
1. Service-specific attributes are made available to the cluster_config utility in service-specific *.ini files.
2. As the superuser (root), you run the cluster_config utility on the head node to configure the HP XC system.
Table 4-4 Location of Configuration Script Directories (continued)
Script Directory | Invoked by This cluster_config Configuration Argument | Invoked by This cluster_config Unconfiguration Argument
/opt/hptc/etc/nconfig.d/ | nconfigure | nunconfigure
/opt/hptc/etc/cconfig.d/ | cconfigure | cunconfigure
To see the sconfig, gconfig, nconfig, and cconfig scripts that are delivered as part of the default services configuration mode, look in the /opt/hptc/etc/*config.d/ directories. 4.6.
The head node is the golden client, and only one golden client is supported. Each script in this directory is executed unconditionally during the sconfigure process. The sconfigure scripts return 0 (zero) on success and return a nonzero value on failures. You can stop the configuration process on a nonrecoverable sconfigure error, which is indicated by the sconfigure script exiting with a return code of 255. Alternatively, you can use config_die( ) in ConfigUtils.pm to return 255.
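The following minimal sketch illustrates the return-code convention described above. It is not an actual HP XC script; the service name, file names, and copy step are hypothetical:
#!/usr/bin/perl
# Hypothetical sconfigure-style script for an add-on service.
use strict;
use warnings;

my $proto = "/opt/myservice/myservice.conf.proto";   # illustrative path
my $conf  = "/etc/myservice.conf";                    # illustrative path

# Copy the prototype configuration file into place.
if (system("cp", $proto, $conf) != 0) {
    # Exit 255 to signal a nonrecoverable error and stop configuration.
    exit 255;
}

exit 0;    # 0 indicates success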
Writing gconfigure Scripts The information in this section provides information about how to write a gconfigure script. A sample gconfigure script is provided in the /opt/hptc/templates/gconfig.d/gconfig_template.pl file for your reference. The gconfigure and other configuration scripts often use the Perl Set::Node package, which is derived from the Set::Scalar package (from Perl's CPAN) to facilitate working with sets of nodes with the usual set operators (union, intersection, difference, and so on).
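Because Set::Node is HP specific, the following sketch uses the underlying Set::Scalar module to show the style of set manipulation these scripts perform; the node names are placeholders:
#!/usr/bin/perl
use strict;
use warnings;
use Set::Scalar;

my $servers = Set::Scalar->new("n14", "n15", "n16");
my $clients = Set::Scalar->new("n15", "n16", "n17", "n18");

my $both   = $servers->intersection($clients);   # nodes that are both
my $all    = $servers->union($clients);          # every node mentioned
my $only_c = $clients->difference($servers);     # clients that are not servers

print "clients that are not servers: $only_c\n";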
3 The $assignment_flags are used to determine the pattern of the client-to-server assignments. By default, a server may be assigned to itself as a client. The following service attributes control a client's assignment to itself as a server and are valid for client and server assignments:
• sa_do_not_assign_to_self
• sa_must_assign_to_self
• sa_may_assign_to_self
The sa_may_assign_to_self attribute is the default, which is the same as the double quote character (").
Nodes with na_disable_server.service assigned for a service are excluded from the server list passed into the gconfig script as servers. Nodes with na_disable_client.service assigned for a service are not returned as potential clients of that service. In general, these flags are something the gconfig script does not need to handle explicitly. Nothing precludes each gconfig script from offering optimal choices to you through its user interface.
scripts could choose to each use a disjoint 5 of the 10 servers passed in, to spread the load. At present, for services that are part of the same role, there are no mechanisms in place to achieve this. 4.6.7 Advance Planning You use the cluster_config utility to assign roles, and hence, services, to nodes. To add a new service to the roles model, perform one of the following:
• Add a new service to an existing role or roles.
common management_server EOT • The second stanza lists all the services: services = <
Example 4-2 Sample service.ini File
[Config]
# Is the service included in the default configuration? [0/1]
service_recommended = 0 or 1
# How many nodes can each server handle optimally?
# (Use 1000000 if there should be only one server per system.)
optimal_scale_out = some_value
# Must the service be run on the head node? [0/1]
# Is it advantageous to run the service on the head node? [0/1]
# (No more than one of these should be 1.
head_node_desired Assigning 1 to this parameter indicates that running the service on the head node is beneficial but not necessary. Assigning 0 indicates that the service does not benefit from running on the head node. Do not assign 1 to both the head_node_required and head_node_desired parameters. If you assign 1 to the head_node_desired parameter, assign 0 to the head_node_required parameter.
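For example, a hypothetical my_service.ini for an add-on service that scales to 64 clients per server and benefits from (but does not require) the head node might contain values such as these; the numbers are illustrative only:
[Config]
service_recommended = 1
optimal_scale_out = 64
head_node_required = 0
head_node_desired = 1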
1 Nodes assigned service roles
0 Exclusive servers
1 Non-exclusive servers
3 Compute nodes
Role HN HN Ext Ext Exc Recommend Assigned Rec Req Rec Req Rec Rec Role
----------------------------------------------------------------
3 3 compute
1 1 disk_io
1 1 external (optional)
1 0 login (optional)
1 1 management_hub
1 1 management_server
1 0 nis_server (optional)
1 1 resource_management
The column headings in the middle of the report correspond to parameters in the service.
4. Use the text editor of your choice to edit the roles_services.ini file as follows: a. Add the name of the new service to the stanza that lists the services. services = <
a. Add the name of the new role to the stanza that lists all the roles: roles = <
5 Managing Licenses This chapter describes the following topics:
• “License Manager and License File” (page 79)
• “Determining If the License Manager Is Running” (page 79)
• “Starting and Stopping the License Manager” (page 79)
5.1 License Manager and License File The license manager service runs on the head node and maintains licensing information for software on the HP XC system. You can find additional information on the FLEXlm license manager at the Macrovision web address: http://www.macrovision.
5.3.1 Starting the License Manager Use the following command to start the license manager: # service hptc-lm start 5.3.2 Stopping the License Manager Use the following command to stop the license manager: # service hptc-lm stop 5.3.
6 Managing the Configuration and Management Database The configuration and management database, CMDB, is key to the configuration of the HP XC system. It keeps track of which nodes are enabled or disabled, the services that a node provides, the services that a node receives, and so on.
host_name: cp-n1
hwaddr: 00:e0:8b:01:02:03
ipaddr: 172.21.0.1
level: 1
location: Level 1 Switch 172.20.65.2, Port 40
netmask: 255.224.0.0
region: 0

cp-n2:
cp_type: IPMI
host_name: cp-n2
hwaddr: 00:e0:8b:01:02:04
ipaddr: 172.21.0.2
level: 1
location: Level 1 Switch 172.20.65.2, Port 41
netmask: 255.224.0.0
region: 0
.
.
.
6.2.3 Displaying Blade Enclosure Information You can use the shownode command to provide information for the HP XC system with HP BladeSystems. With this command, you can:
• Display the names of the blade enclosures and the nodes (server blades) within them.
• List all the blade enclosures in the HP XC system and, for each, the nodes within them.
• List the nodes for a specified blade enclosure.
• List the blade enclosure for a specified node.
For more information, see shownode(1). 6.
in .Log. Archiving has the advantage of decreasing the size of the log tables, which enables the shownode metrics command to run more quickly. You must be superuser to use this command. Use the --archive archive-file option to specify the file to which the sensor data will be archived. The --tmpdir directory option lets you assign a temporary directory for use while archiving. You can retain previous archives with the --keep n option.
command. See “Archiving Sensor Data from the Configuration Database” (page 83) for a description of the time parameter. The following command purges sensor data older than two weeks: # managedb purge 2w For more information on the managedb command, see managedb(8). The archive.cron script enables you to automate the process of archiving sensor data from the CMDB. For more information on this script, see archive.cron(4). 6.
7 Monitoring the System System monitoring can identify situations before they become problems. This chapter addresses the following topics:
• “Monitoring Tools” (page 87)
• “Monitoring Strategy” (page 88)
• “Displaying System Environment Data” (page 89)
• “Monitoring Disks” (page 90)
• “Displaying System Statistics” (page 90)
• “Logging Node Events” (page 92)
• “The collectl Utility” (page 94)
• “HP Graph” (page 97)
• “The resmon Utility” (page 101)
• “The netdump and crash Utilities” (page 102)
7.
The Supermon components consist of the kernel modules to collect the statistics, the mond and supermond daemons, and the script to load and configure the daemons. The data collected by Supermon includes system performance sensor and environment data, such as fan, temperature, and power supply status. This data is collected on a regular basis. — The syslog and syslog-ng Services The syslog service runs on each node in the HP XC system.
Figure 7-1 System Monitoring The mond and syslog daemons run on every node. The Supermon service manages requests for mond daemons that run on a subset of nodes. The mond daemon can be configured to pass any metric data for aggregation to the parent Supermon service. The Nagios master and other Nagios monitors run their check_metrics plug-in periodically, which causes Supermon data collection and storage into the database.
management interface include the hpasm package. You can use the /sbin/hplog utility to display the following environment data:
• Thermal sensor data
• Fan data
• Power data
In addition, most hpasm errors are logged to the syslog system logger. For more information, see hpasm(4) and hplog(8). 7.4 Monitoring Disks The Self-Monitoring, Analysis and Reporting Technology (SMART) system is built into many IDE, SCSI-3, and other hard drives.
7.5.2 Monitoring Processor Usage and Load from the Command Line The shownode metrics cpus command displays the nice value (this value reflects the amount of time the CPU has spent in user mode with low priority) as well as the user, system, and idle times (in milliseconds) for each processor on the node from which this command is issued.
date and time stamp |n15 |6289408 |1134780 |5154628 |1164160
date and time stamp |n16 |0 |0 |0 |0
7.6 Logging Node Events This section describes how the HP XC system uses the syslog and syslogng_forward services to log node events and how these events are arranged according to the syslog-ng.conf rules file. 7.6.1 Understanding the Event Logging Structure The HP XC System Software uses aggregator nodes to log events from clients.
Filters
    Define the rules to segregate messages. For example, messages can be separated by host, severity, facility, and so on.
Destinations
    Contains the devices and files where the messages are sent or saved.
Logs
    Combines the sources, filters, and destination into specific rules to handle the different messages.
You can use a text editor, such as emacs or vi, to read the log files, and you can use a variety of text manipulation commands to find, sort, and format these log files.
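As a hedged illustration of how these pieces fit together, the following generic syslog-ng fragment (not taken from the HP XC syslog-ng.conf file) defines a source, a filter, a destination, and a log rule:
source s_local { unix-stream("/dev/log"); internal(); };
filter f_auth { facility(authpriv); };
destination d_auth { file("/var/log/secure"); };
log { source(s_local); filter(f_auth); destination(d_auth); };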
9. Update the golden image to ensure a permanent change. For more information on updating the golden image, see Chapter 11 (page 139) . 7.7 The collectl Utility The collectl utility collects data on the nodes of the HP XC system. As a development or debug tool, the collectl utility typically gathers more detail more frequently than the supermon utility. The collectl utility does have some overhead, but for most situations, it consumes less than 0.
Example 7-1 Using the collectl Utility from the Command Line # collectl waiting for 10 second sample... ### RECORD 1 >>> n3 <<< (m.n) (date and time stamp) ### # CPU SUMMARY (INTR, CTXSW & PROC /sec) # USER NICE SYS IDLE WAIT INTR CTXSW PROC RUNQ RUN AVG1 AVG5 AVG15 0 0 0 99 0 1055 65 0 151 0 0.02 0.04 0.
By default, the collectl service gathers information on the following subsystems:
• CPU
• Disk
• Inode and file system
• Lustre file system
• Memory
• Networks
• Sockets
• TCP
• Interconnect
The collectl(1) manpage discusses running the collectl utility as a service. 7.7.3 Running the collectl Utility in a Batch Job Submission You can run the collectl utility as one job in a batch job submission.
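A minimal sketch of such a submission follows; the bsub options, sample count, and output location are assumptions for illustration and are not taken from this guide:
# bsub -o collectl.%J.out srun collectl -c 600 -f /hptc_cluster/tmp/collectl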
7.8 HP Graph The RRDtool software is integrated into the HP XC system to create and display graphs of network bandwidth and other system utilization. You can access this display by selecting HP Graph in the Nagios menu. Figure 7-4 is an example of the default display. It provides an overview of the system with graphs for node allocation, CPU usage, memory, Ethernet traffic, and, if relevant, Interconnect traffic.
Figure 7-4 HP Graph System Display By selecting an item in the menu in the upper left-hand side, you can specify the graphical data on any Nagios host. Figure 7-5 shows the graphical data for one node on the system.
NOTE: The detail graphs for a system display show the graphs for a specified metric on all the Nagios hosts. The detail graphs for a Nagios host display show all the applicable metrics for that Nagios host. 7.
Figure 7-5 HP Graph Host Display
The Metric menu influences the display of the detail graphs for a system display. This menu offers the following choices:
bytes in
    This graph reports the rate of data received on all network devices on the node.
bytes out
    This graph reports the rate of data transmitted on all network devices on the node.
cpu idle
    This graph indicates how much of the node's CPU set was available for other tasks.
cpu iowait
cpu system
cpu usage
useful commands to collect and present data in a scalable and intuitive fashion. The Web pages update automatically at a preconfigured interval (120 seconds by default). To open the Web page, open a browser on the head node and point it to the following: https://head_node_fully_qualified_domain_name/resmon. You are prompted to supply your Nagios user name and password (which were defined during the initial installation and configuration of the HP XC system).
The rev_number and platform parameters have the same meaning as in step 2. At the time of this publication, the revision number is 4.0-2.17.1hp. 4. Distribute the software to the appropriate nodes in the HP XC system. See Chapter 11 (page 139) for more information. 7.10.2 Configuring Netdump This section describes how to configure the netdump-server and the netdump client. 7.10.2.1 Configuring the Netdump Client Use the following procedure to configure the netdump client: 1. 2.
See netdump(8) and netdump-server(8) for more information. 7.10.4 Obtaining the Kernel Dump When a node running the netdump service experiences a kernel crash, its oops message and kernel memory are automatically transmitted to the netdump-server node. This data is stored in a subdirectory (identified by date and time) of the /var/crash directory on the netdump-server node. After the data is saved, the node that crashed reboots. 7.10.
8 Monitoring the System with Nagios The Nagios open source application has been customized and configured to monitor the HP XC system and network health. This chapter introduces Nagios and discusses these modifications.
This section addresses the following topics:
• “Nagios Components” (page 106)
• “Nagios Hosts” (page 106)
• “Nagios Plug-Ins” (page 106)
• “Nagios Web Interface” (page 107)
• “Nagios Files” (page 107)
8.1.1 Nagios Components The components that comprise Nagios are as follows:
Nagios
• Nagios engine
• Nagios Web interface
• Nagiostats tool
Standard Plug-Ins These plug-ins are not configured for any particular system. Although they are all provided, not all these plug-ins are used on HP XC systems.
• Syslog alerts status
• System event log
• System free space status report
For more information on the services monitored by Nagios and the type of function monitored for that service, see Table 8-2. 8.1.4 Nagios Web Interface Nagios provides a Web interface capable of displaying current system and networking information in a browser window. See “Using the Nagios Web Interface” (page 107) for more information. 8.1.
Figure 8-1 Nagios Main Window You can choose any of the options on the left navigation bar. These options are shown in Figure 8-2.
Figure 8-2 Nagios Menu (Truncated) After you choose an option from the window, you are initially prompted for a login and a password. This login and password were established when the HP XC system was configured. Usually, the login name is nagiosadmin. The Nagios passwords are maintained in the /opt/hptc/nagios/etc/htpasswd.users file. Use the htpasswd command to manipulate this file to add a user, to delete a user, or to change the user password. Nagios offers various views of the HP XC system.
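As an example of managing the password file, the following sketch adds a new Web user (or updates the password of an existing user); the user name is illustrative, and htpasswd prompts for the password:
# htpasswd /opt/hptc/nagios/etc/htpasswd.users operator1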
Note: The term Hosts on the Nagios window refers to any entity with an IP address, not just nodes. For example, Nagios monitors the 1,024 nodes and four switches in an HP XC system, and reports on the status of 1,028 hosts. SFS is also an example of a Nagios host; Nagios finds the name of the SFS server and displays its status. Keep this in mind when using the Nagios application. The HP XC System Software provides plug-ins that monitor these and other system statistics. 8.2.
Figure 8-4 Nagios Service Detail View The Status column identifies problems. In this example, the Status column flags a problem with the head node's Slurm Monitor. Selecting a link for a Nagios service in the Service column opens the Nagios Service Information view for the corresponding Nagios service. For example, selecting the Slurm Monitor link in this example opens the following view, as shown in Figure 8-5. 8.
Figure 8-5 Nagios Service Information View You can also use the Nagios report generator utility, nrg, to obtain an analysis of the Nagios service (plug-in). Select the analyze option to display a two-column listing of service status. The following is the command line entry for this feature: # nrg --mode analyze nh [Slurm Monitor - Critical] 'sinfo' reported problems with nodes in some partitions, specifically, some nodes may be marked with an '*' which indicates they may be unresponsive to SLURM.
Figure 8-6 Nagios Service Problems View Selecting the link corresponding to the Nagios Host opens the Nagios Host Information view for that Nagios host. Figure 8-7 is an example of the Nagios Host Information view displayed by selecting the link for xc10n4 in the Nagios Service Problems view shown in Figure 8-6. 8.
Figure 8-7 Nagios Host Information View
• “Changing Sensor Thresholds” (page 117)
• “Adjusting the Time Allotted for Metrics Collection” (page 118)
• “Changing the Default Nagios User Name” (page 118)
• “Disabling Individual Nagios Plug-Ins” (page 120)
8.3.1 Stopping and Restarting Nagios Nagios can record a multitude of alerts on large systems when many nodes undergo known maintenance operations. These operations can include restarting or shutting down the HP XC system.
NOTE: If you change the nagios_vars.ini file, you must propagate it to all nodes. For more information, see Chapter 11 (page 139).
Figure 8-8 Nagios Configuration
When you change the Nagios configuration, you must perform the following tasks:
1. Read the Nagios documentation carefully.
2. Change the template files accordingly.
3. Stop the Nagios service. For instructions on how to stop the Nagios service, see “Stopping and Restarting Nagios” (page 115).
NOTE: Ensure that the sendmail utility is running. For information on the implementation of the sendmail utility on the HP XC system, see “Modifying Sendmail” (page 133). You can customize the Nagios configuration to specify whom to contact by editing the /opt/hptc/nagios/etc/contacts.cfg file.
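The following fragment shows the general shape of a Nagios contact definition; the field values and the notification command names are stock Nagios examples, not the HP XC defaults, and must match commands defined in your local configuration:
define contact{
        contact_name                    siteadmin
        alias                           Site Administrator
        service_notification_period     24x7
        host_notification_period        24x7
        service_notification_options    w,u,c,r
        host_notification_options       d,u,r
        service_notification_commands   notify-by-email
        host_notification_commands      host-notify-by-email
        email                           siteadmin@example.com
        }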
8.3.5 Adjusting the Time Allotted for Metrics Collection Table 8-1 displays the default collection intervals for the Supermon Metrics Monitor service. The Supermon Metrics Monitor schedules and collects individual metrics at a specified interval. You can change an interval. The interval must be a multiple of the time specified by the value of the normal_check_interval parameter defined in the /opt/hptc/nagios/etc/templates/nagios_template.cfg or /opt/hptc/nagios/etc/templates/nagios_monitor.
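For reference, normal_check_interval appears in a service template in the usual Nagios form shown below; the template name and values here are illustrative, not the shipped HP XC settings:
define service{
        name                    xc_service_template
        normal_check_interval   5
        retry_check_interval    1
        max_check_attempts      3
        register                0
        }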
2. Verify the Nagios user ID:
# grep nagios /etc/passwd
nagios:x:222:222::/home/nagios:/bin/bash
NOTE: The default Nagios user account ID, nagios, is 222.
3. Use the standard user account utilities to delete the nagios user account, then add another:
# userdel -r nagios
# useradd -u 222 -g hpadm newname
Alternatively, you can use NIS to change the user account name if this is appropriate for your site.
NOTE: This example retains the default user ID for Nagios.
4.
11. Restart Nagios. For instructions on how to restart Nagios, see “Stopping and Restarting Nagios” (page 115). 8.3.7 Disabling Individual Nagios Plug-Ins All the Nagios plug-ins developed for the HP XC system are enabled by default. However, you can modify the /opt/hptc/nagios/etc/templates/*_template.cfg files to customize the service checks as needed. IMPORTANT: Do not modify files in the /opt/hptc/nagios/etc directory with file names of the form *_local.cfg or xc_*.cfg.
column of the Nagios Service Detail View and Service Problems View windows. Figure 8-4 (page 111) and Figure 8-6 (page 113) show examples of these windows, respectively.
Table 8-2 Monitored Nagios Services
Category | Nagios Service | Function
Monitoring Plug-Ins | Configuration Monitor | This plug-in updates node configuration. It periodically generates and updates configuration display information for all nodes in the HP XC system (see Configuration).
Table 8-2 Monitored Nagios Services (continued)
Category | Nagios Service | Function
System Service Reports | Apache HTTPS Server | This plug-in monitors the Web server providing the Nagios Web interface.
| Root key synchronization | This plug-in verifies that the ssh configuration files are synchronized across the HP XC system.
| Switch status | This plug-in gathers switch status and metrics through SNMP.
Service Description
    Specifies the Nagios service name.
Actively launched on service node?
    Indicates whether or not Nagios periodically runs this service check at the specified normal check interval.
Max Check Attempts
    Indicates the number of times Nagios examines the service before reporting a failure.
Normal check interval
    Indicates the frequency of the check.
Retry check interval
    Indicates the amount of time Nagios waits before retrying after a failure.
NOTE: These default settings may have been altered by site customizations. To display the current values for your installation, use the Nagios Web interface: select View Config from the Configuration section under the Nagios menu. 8.4.3 Understanding Nagios Alert Messages The HP XC System Software provides several value-added plug-ins that can generate alert messages based on patterns provided by various data sources, such as syslog and the Hardware System Event logs.
[n47] Power Unit Power Redundancy Redundancy Lost
7    A date and time stamp indicating when the cause for the alert happened.
8    How long the message waited in the nand queue, that is, how much time elapsed before this message was mailed.
9    The nand sequence number.
The nand daemon receives and batches messages generated by Nagios and sends them by e-mail.
8.5 Nan Notification Aggregator and Delimiter The HP XC System Software incorporates the Nan notification aggregator and delimiter for the Nagios paging system. Nan is an open source supplement to the Nagios application. Nagios is capable of sending large quantities of messages, especially when the system is starting up, shutting down, or experiencing a failure.
Example 8-1 The nrg Utility System State Analysis # nrg --mode analyze Nodelist ---------------------n[3-7] nh Description --------------------------------------------------[Environment - NODATA] No sensor data is available for reporting. Use 'shownode metrics sensors -last 20m node xxxx' for each of these nodes to verify if sensor data has been recently collected. This status is drawn from the same source as the shownode metrics sensors command.
Run 'sinfo' for more information. n[3-7] [Slurm Status - Critical] sinfo reported problems with partitions for this node nh [Supermon Metrics Monitor - Critical] The metrics monitor has returned a critical status indicating a number of nodes have reported critical thresholds. If the actual status is 'Service timed out' then the monitor has taken too long to complete a single iteration.
— Ok
— Unknown
— Pending
• Nagios hosts status.
• Nagios services status.
• Nagios monitors status.
• A list of nodes that are up or down.
For additional information about this utility, see nrg(8).
9 Network Administration This chapter addresses the following network topics:
• “Network Address Translation Administration” (page 131)
• “Network Time Protocol Service” (page 132)
• “Changing the External IP Address of a Head Node” (page 133)
• “Modifying Sendmail” (page 133)
9.1 Network Address Translation Administration Network Address Translation (NAT) enables compute nodes that do not contain external devices to have external network access.
Improved Availability Is Not in Effect You establish the external role assignment when you configure the HP XC system using the cluster_config utility. When nodes are configured as NAT clients, the default gateways are established. By default, each NAT client has a single gateway. If a NAT server fails, however, the NAT client loses connectivity. You can configure a system for multiple gateways to lessen the possibility of loss of connectivity, but the system may have performance problems.
Other tools (ntpq and ntpdc) are also available. For more information on NTP, see ntpd(1), ntpdc(1), and ntpq(1). 9.3 Changing the External IP Address of a Head Node Use the following procedure to change the external IP address of the head node: NOTE: This procedure requires you to reboot the head node. 1. Edit the /etc/sysconfig/netinfo file as follows: a. Specify the new head node external IP address in the --ip option of the network command. b.
10 Managing Patches and RPM Updates This chapter addresses the following topics:
• “Sources for Software Packages and Information” (page 135)
• “Downloading and Installing Patches” (page 135)
• “Rebuild Kernel Dependent Modules” (page 136)
• “Rebuilding Serviceguard Modules” (page 136)
10.1 Sources for Software Packages and Information For each supported version of the HP XC System Software, HP releases all Linux security updates and HP XC software patches on the HP IT Resource Center (ITRC) website.
3. From the registration confirmation window, select the option to go directly to the ITRC home page.
4. From the IT Resource Center home page, select patch/firmware database from the maintenance and support (hp products) list.
5. From the patch / firmware database page, select Linux under find individual patches.
6. From the search for patches page, in step 1 of the search utility, select vendor and version, select hpxc as the vendor. Select the HP XC version that is appropriate for the cluster platform.
NOTE: As an alternative, you can reinstall the Serviceguard RPMs rather than rebuilding the kernel modules. The /usr/local/cmcluster/bin directory is the default location of the Serviceguard commands. If you installed Serviceguard in a location other than the default, look in the /etc/cmcluster.conf file for the location of the Serviceguard bin directory. The examples in this procedure use the default location; specify another path if you installed Serviceguard in a different directory. 1.
11 Distributing Software Throughout the System This chapter addresses the following topics:
• “Overview of the Image Replication and Distribution Environment” (page 139)
• “Installing and Distributing Software Patches” (page 140)
• “Adding Software or Modifying Files on the Golden Client” (page 140)
• “Determining Which Nodes Will Be Imaged” (page 145)
• “Golden Image Checksum” (page 145)
• “Updating the Golden Image” (page 146)
• “Propagating the Golden Image to All Nodes” (page 149)
• “Maintaining a Globa
3. Distribute the golden image or individual files to the client nodes. See “Propagating the Golden Image to All Nodes” (page 149). 11.2 Installing and Distributing Software Patches The following is a generic procedure for installing software patches:
1. Log in as superuser (root) on the head node.
2. Use the rpm command to install the software package on the head node:
# rpm -Uvh package.
configuration file that is replicated across the HP XC system, such as a NIS or NTP configuration file. Note: It is important to have a consistent procedure for managing software updates and changes to your HP XC system. If you need to add a software package or service configuration file to the golden client that should not be distributed to all nodes, be sure to prevent these files from becoming part of the golden image by using an exclusion file. See “Exclusion Files” (page 149) for further details.
3. In the /var/lib/systemimager/scripts directory, create symbolic links to this master autoinstallation script for the nodes that will receive this override. The symbolic link names must follow the format name.sh, where name is the host name of each node to receive the override. For further information on using overrides in the SystemImager environment, see the FAQ chapter in the SystemImager Manual, located at the following web address: http://systemimager.
# for i in $(expandnodes $(shownode servers lvs)) do ln -sf compiler.master.0 $i.sh done 10. Verify that the links are correct # ls -l . . . lrwxrwxrwx lrwxrwxrwx lrwxrwxrwx 1 root 1 root 1 root root root root ... n7.sh -> compiler.master.0 ... n8.sh -> compiler.master.0 ... n9.sh -> compiler.master.0 Now the system can be imaged.
[override_n8_override] DIR = /var/lib/systemimager/overrides/n8_override [override_base_image] DIR = /var/lib/systemimager/overrides/base_image [override_compiler] DIR = /var/lib/systemimager/overrides/compiler Save the file and exit the text editor. 8. Reimage the nodes: # setnode --resync --all # stopsys # startsys --image_and_boot 9. Verify that the propagation occurred as expected by examining the files on the node. 11.3.
Global service configuration scripts are located in the /opt/hptc/etc/gconfig.d directory. • Node-Specific Configuration The node-specific service configuration step uses the results of the global service configuration step described previously to apply to a specific node its “personality” with respect to the service. User interaction is not permitted because this step runs on a per-node basis.
NTP_SERVER2: NTP_SERVER3: RPCNFSDCOUNT: 8 xc_version: Version number The table entry golden_image_md5sum identifies the MD5 checksum of the golden image file structure. The table entry golden_image_modification_time identifies the date and time the current golden image was created. The table entry golden_image_tar_valid is set to 1 when the compressed tar file of the golden image is created. It is set to 0 during the creation of the golden image tar file.
Whichever method you use to update the golden image, you can protect the golden image from contamination with golden client (head node) specific personality by using an exclusion file. This exclusion file is passed to the rsync command as a list of exclude patterns. For a detailed description of exclusion files, and how to use exclusion files to manage software updates, see “Exclusion Files” (page 149). Note: Before updating the golden image, make a copy in case you need to revert.
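For example, assuming the golden image is kept in the default SystemImager images directory (the path below is an assumption; verify the image name and location on your system), a copy can be made with a command such as the following. The -a option preserves ownership, permissions, and timestamps.
# cp -a /var/lib/systemimager/images/base_image \
  /var/lib/systemimager/images/base_image.save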
The --infile and --outfile options to the cluster_config command enable you to export and import configuration data, and to bypass much of the cluster_config utility interface if the hardware configuration is unchanged. The configuration data includes the configuration of availability sets (for systems using improved availability), role assignments, and external connections. The following lists the sequence of events for using the cluster_config command with these options: 1.
# updateimage --gc `nodename` --no-netboot 11.6.3 Exclusion Files Exclusion files protect the golden image from being contaminated with node-specific content from the golden client. While the golden client represents the configuration from which all other nodes are replicated, the golden client is also an actively participating node, and has its own configuration. A simple example of the types of files to be excluded is a log file, one of many such files in the /var directory.
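As a sketch only, an exclusion file of this kind contains rsync exclude patterns, one per line; the patterns below are illustrative, and the exclusion files shipped with the HP XC System Software should be used as the starting point:
/var/log/*
/var/spool/mail/*
/var/tmp/*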
11.7.1 Using the Full Imaging Installation A recommended procedure to propagate the golden image is to install all client nodes automatically. This ensures that they receive the updated image and any updated configuration information automatically. When all nodes are set to network boot, a reboot of each client node starts the automatic installation. After each node completes its installation, it automatically reboots and is available for service.
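The commands shown earlier in this chapter perform these steps; run them from the head node in this order:
# setnode --resync --all
# stopsys
# startsys --image_and_boot
The stopsys and startsys commands shut down and boot the client nodes, and the --image_and_boot option causes each client node to be reimaged from the golden image as it boots.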
the service in the default OFF orientation, because few nodes run this service. Use the following procedure: 1. Log in as the superuser on the head node. 2. Use the squeue command to ensure that no current LSF-HPC with SLURM jobs are running and that the SLURM queues are idle. 3. Install the software on the golden client. Note the names of the services that are created as a result. 4. 5. 6. 7.
12 Opening an IP Port in the Firewall This chapter addresses the following topics: • “Open Ports” (page 153) • “Opening Ports in the Firewall” (page 154) 12.1 Open Ports Each node in an HP XC system is set up with an IP firewall, for security purposes, to block communications on unused network ports. External system access is restricted to a small set of externally exposed ports. Table 12-1 lists the base ports that are always open by default; these ports are labeled “External”.
Table 12-2 Service Ports Service Internal or External Port Number Protocol Comments Flamethrower Internal 9000 to 9020 udp The highest port number used is based on the number of modules configured to udpcast. Usually, the upper limit is 9020. LSF External 6878 to 6879, 6881 to 6883 tcp/udp Only if the HP XC system is set up as a member of a larger LSF cluster.
12.2.1 Opening a Temporary Port in the Firewall The openipport command enables the superuser to open an IP service port in the firewall using the following information: • The port number to open • The protocol to be used • The list of interfaces on which the port is to be opened NOTE: Use the openipport command judiciously. The port remains open until the node is reimaged, even if the node is rebooted.
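The exact option syntax is not reproduced in this excerpt, so the following invocation is a hypothetical sketch only (the argument order and names are assumptions; see the openipport reference page for the actual usage). It illustrates opening TCP port 8080 on the external interface of the local node:
# openipport 8080 tcp external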
Notes: For clarity, the mnemonics for the interface are shown in bold, and the noncomment lines are printed here across two lines. In the iptables.proto file itself, each noncomment line must occupy a single line.
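As an illustration of the rule format, a single noncomment line that permanently opens TCP port 443 on the external interface might resemble the following; the chain name and option details are assumptions, so model any addition on the existing noncomment lines in your iptables.proto file:
-A RH-Firewall-1-INPUT -i External -p tcp -m tcp --dport 443 -j ACCEPT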
13 Connecting to a Remote Console This chapter addresses the following topics: • “Console Management Facility” (page 157) • “Accessing a Remote Console” (page 157) 13.1 Console Management Facility The Console Management Facility (CMF) daemon collects and stores console output for all other nodes on the system. This information is stored individually for every node, under dated directories in the /hptc_cluster/adm/logs/cmf.dated/current directory, and is backed up periodically.
6. Enter the escape character returned by the console command in Step 3 to end the connection. Note: Some nodes, depending on the machine type, accept a key sequence to enter and exit their command-line mode. See Table 13-1 to determine if these key sequences apply to your node machine type. Do not enter the key sequence to enter command-line mode. Doing so stops the Console Management Facility (CMF) from logging the console data for the node.
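As a hypothetical example only (the exact arguments depend on your configuration), connecting to the console of a client node from the head node might look like the following; the node name is illustrative:
# console n15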
14 Managing Local User Accounts and Passwords This chapter describes how to add, modify, and delete a local user account on the HP XC system.
2. Collect as much of the following information about this account as possible: • Login name This information is required. • User's name • User's password Note: A customary practice is to assign a temporary password that the user changes with the passwd command, but this data must be propagated to all the other system nodes also. See “Distributing Software Throughout the System” (page 139) for more information.
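For example, a local account can be created on the golden client node with standard Linux commands such as the following; the login name and comment are illustrative. Afterward, propagate the resulting /etc/passwd, /etc/shadow, /etc/group, and /etc/gshadow changes to the other nodes as described in that chapter.
# useradd -m -c "Jane Doe" jdoe
# passwd jdoe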
Note: Make sure that users who change their user account parameters do so on the golden client node, or that they notify you from which node they changed their parameters. You must propagate these user account changes to all the other nodes in the system as described in “Distributing Software Throughout the System” (page 139). 14.5 Deleting a Local User Account Remove a user account with the userdel command; you must be superuser on the golden client node to use this command.
3. Use the text editor of your choice as follows: a. Open the temporary file. b. Append the following lines to the temporary file: # Download NIS maps according to update frequency. 20 * * * * /usr/lib/yp/ypxfr_1perhour 40 6 * * * /usr/lib/yp/ypxfr_1perday 55 6,18 * * * /usr/lib/yp/ypxfr_2perday c. Save your changes and exit the text editor. 4. Replace the existing root user's crontab file with the temporary file: # crontab /tmp/root_crontab 5. Remove the temporary file: # rm -f /tmp/root_crontab 6.
4. Run the following commands to update all the appropriate files throughout the HP XC system:
# pdcp -a -x `hostname` /etc/passwd /etc/passwd
# pdcp -a -x `hostname` /etc/group /etc/group
# pdcp -a -x `hostname` /etc/shadow /etc/shadow
# pdcp -a -x `hostname` /etc/gshadow /etc/gshadow
When this step is complete, the root password is changed on all the nodes in the HP XC system. 14.8.
Enter 1-9 and press return: 8 1. Passwd settings 2. Access protocol settings Enter 1,2 and press return: 1 changing password for quadrics Current Password: New password: Retype new password: 14.8.3.3 Voltaire InfiniBand Switch Administrative Password The documentation that came with your model of the Voltaire InfiniBand switch describes how to set the administrative password.
4. Copy the /etc/shadow file into the cliPassWord.crpt file: # cp /etc/shadow /mnt/jffs/voltaire/config/cliPassWord.crpt 14.8.4 Changing the Console Port Password When you change the password for an iLO, MP, or IPMI, you must ensure that the console port password matches its counterpart in the CMDB. CAUTION: Changing the console port password can cause complications if you do not also update the CMDB manually with the new password. Updating the CMDB is beyond the scope of this document.
2. Use the ipmitool to set the BMC password.
4. Run the following commands to update all the appropriate files throughout the HP XC system:
# pdcp -a -x `hostname` /etc/passwd /etc/passwd
# pdcp -a -x `hostname` /etc/group /etc/group
# pdcp -a -x `hostname` /etc/shadow /etc/shadow
# pdcp -a -x `hostname` /etc/gshadow /etc/gshadow
When this step is complete, the lsfadmin account password is changed on all the nodes in the HP XC system. 14.
15 Managing SLURM The HP XC system uses the Simple Linux Utility for Resource Management (SLURM).
primary slurmctld daemon. On returning to service, the primary slurmctld daemon regains control of the SLURM subsystem from the backup slurmctld daemon. SLURM offers a set of utilities that provide information about SLURM configuration, state, and jobs, most notably scontrol, squeue, and sinfo. See scontrol(1), squeue(1), and sinfo(1) for more information about these utilities. SLURM enables you to collect and analyze job accounting information.
Table 15-1 SLURM Configuration Settings (continued) Setting Default Value* PartitionName lsf RootOnly=YES Shared=FORCE Nodes=compute_nodes SwitchType switch/elan for systems with the Quadrics interconnect; switch/none for systems with any other interconnect * Default values can be adjusted during installation. You can also use the scontrol show config command to examine the current SLURM configuration.
TIP: Run the badmin reconfig command after the spconfig command to update LSF HPC with the information on each node's static resources (that is, core and memory), as reported by SLURM. 15.2.1 Configuring SLURM System Interconnect Support SLURM has system interconnect support for Quadrics ELAN, which assists MPI jobs with the global exchange process during startup, when each process is establishing the communication channels with the other processes in the job.
Weight The scheduling priority of the node. Nodes of lower priority are scheduled before nodes of higher priority, all else being equal. To change the configuration of a set of nodes, first locate the line in the slurm.conf file that starts with the following text to specify the configuration: NodeName= Multiple node sets are allowed on the HP XC system; the initial configuration specifies a single node set. Consider a system that has 512 nodes, and all those nodes are in the same partition.
Note: The root-only lsf partition is provided for submitting and managing jobs through an interaction of SLURM and LSF. If you intend to use SLURM independently from LSF, consider configuring a separate SLURM partition for that purpose. Table 15-2 describes the SLURM partition characteristics available on HP XC systems. Table 15-2 SLURM Partition Characteristics Characteristic Description Nodes List of nodes that constitute this partition.
PartitionName=lsf RootOnly=yes Shared=Force Nodes=n[1-128]
PartitionName=cs Default=YES Shared=YES Nodes=n[129-256]
If you make any changes, be sure to run the scontrol reconfigure command to update SLURM with these new settings. 15.2.5 Configuring SLURM Features A standard element of SLURM is the ability to configure and subsequently use a feature. You can use features to assign characteristics to nodes to manage multiple node types. SLURM features are specified in the slurm.conf file.
Example 15-1 Using a SLURM Feature to Manage Multiple Node Types a. Use the text editor of your choice to edit the slurm.conf file to change the node configuration to the following: NodeName=exn[1-64] Procs=2 Feature=single,compute NodeName=exn[65-96] Procs=4 Feature=dual,compute NodeName=exn[97-98] Procs=4 Feature=service Save the file. b. Update SLURM with the new configuration: # scontrol reconfig c. Verify the configuration with the sinfo command. The output has been edited to fit on the page.
POSIX message queues     (bytes, -q) 819200
stack size               (kbytes, -s) 10240
cpu time                 (seconds, -t) unlimited
max user processes       (-u) 8113
virtual memory           (kbytes, -v) unlimited
file locks               (-x) unlimited
Only soft resource limits can be manipulated. Soft and hard resource limits differ.
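For example, the bash ulimit built-in distinguishes the two with its -S (soft) and -H (hard) options; the core file size limit is used here for illustration:
$ ulimit -S -c
$ ulimit -H -c
$ ulimit -S -c 100000
The first two commands display the current soft and hard limits for core file size; the third raises the soft limit, which cannot be set higher than the hard limit.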
If you make any changes, be sure to run the scontrol reconfigure command to update SLURM with these new settings. If a user tries to propagate a resource limit with the srun --propagate command, but the compute node has a lower hard limit than the soft limit, an error message results: $ srun --propagate=RLIMIT_CORE . . . Can't propagate RLIMIT_CORE of 100000 from submit host. For more information, see slurm.conf(5). 15.
SLURM job accounting attempts to gather all the statistics available on the systems on which it is run.
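For example, the SLURM sacct command reports the collected accounting data; the option names below follow common sacct usage and the job ID and user name are illustrative, so see sacct(1) for the exact syntax supported by your release:
$ sacct -j 1234
$ sacct -u jdoe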
Note: The bacct command reports a slightly increased value for a job's runtime when compared to the value reported by the sacct command. LSF-HPC with SLURM sums the resource usage values reported by itself and SLURM. 15.4.2 Disabling Job Accounting Job accounting is turned on by default. Note that job accounting is required if you are using LSF. Follow this procedure to turn off job accounting: 1. Log in as the superuser on the SLURM server (see “Configuring SLURM Servers” (page 172)). 2.
Note: You must specify an absolute pathname for the log file; it must begin with the / character. You can choose to isolate this data log on one node or in the /hptc_cluster directory so that all nodes can access it. However, this log file must be accessible to the following: • Nodes that run the slurmctld daemon • LSF • Any node from which you execute the sacct command Note: Ensure that the log file is located on a file system with adequate storage to avoid file system full conditions.
MaxSendRetryDelay
The maximum number of seconds to pause before sending an accounting message. The actual delay is a random value between 1 and this value. The default value is 5 seconds.
StaggerSlotSize
Generally, the increment of time a process pauses before sending its message. For n tasks, an equal number of staggered time slots are defined in increments of (StaggerSlotSize * 0.001) seconds.
lsf       up   infinite   1   down   n17
swaptest  up   infinite   4   idle   n[1-4]
In this example, node n17 is down. The squeue utility reports the state of jobs currently running under SLURM's control. For more information about the squeue utility, see squeue(1). The SLURM log files on each node in /var/slurm/log are helpful for diagnosing specific problems. The log files slurmctld.log and slurmd.log log entries from their respective daemons.
Table 15-4 Output of the sinfo command for Various Transitions (continued) Transition Cause: sinfo shows: Node fails while no job is running on idle the node. idle* Node fails while a job is running on the node The System Administrator sets the node state to down. The System Administrator sets the node state to drain while a job is running on the node. The System Administrator sets the node state to drain while a job is running on the node.
NOTE: If the user logged in from a node that is also a compute node, the epilog script also ends the user's login. You can avoid this problem by editing the EPILOG_EXCLUDE_NODES variable in the epilog file. It is empty by default. Specify the host names of the login nodes, separated by spaces, so that the epilog script does not kill the user jobs on those nodes; for example: EPILOG_EXCLUDE_NODES="n101 n102 n103 n104 n105" The SLURM epilog is located at /opt/hptc/slurm/etc/slurm.epilog.clean initially.
15.9 Enabling SLURM to Recognize a New Node Use the following procedure to enable SLURM to recognize a new node, that is, a node known to the HP XC system but not managed by SLURM. This procedure adds node n9 to the SLURM lsf partition, which already consists of nodes n1 through n8. 1. Log in to the head node as the superuser (root). 2. Log in to the node to be added to gather information on the node's characteristics: a.
NodeName=n[1-5] Procs=2 RealMemory=1994
NodeName=n[6-8] Procs=2 RealMemory=4032
NodeName=n9 Procs=2 RealMemory=2008
NOTE: If the value for the RealMemory characteristic of node n9 were 4032 in the example, that portion of the file would be changed to the following:
NodeName=n[1-5] Procs=2 RealMemory=1994
NodeName=n[6-9] Procs=2 RealMemory=4032
The order of the NodeName entries listed in this file is important because SLURM uses this order to determine the contiguity of the nodes. b.
16 Managing LSF The Load Sharing Facility (LSF) from Platform Computing Corporation is a batch system resource manager used on the HP XC system. LSF is an integral part of the HP XC environment. On an HP XC system, a job is submitted to LSF, which places the job in a queue and allows it to run when the necessary resources become available. In addition to launching jobs, LSF provides extensive job management and information capabilities.
Standard LSF-HPC is installed and configured on all nodes of the HP XC system by default. The LSF RPM places the LSF tar files from Platform Computing in the /opt/hptc/lsf/files/lsf/ directory. Standard LSF-HPC is installed, during the operation of the cluster_config utility, in the /opt/hptc/lsf/top directory.
16.2.1 Integration of LSF-HPC with SLURM The LSF component of the LSF-HPC with SLURM product acts primarily as the workload scheduler and node allocator running on top of SLURM. The SLURM component provides a job execution and monitoring layer for LSF-HPC with SLURM. LSF-HPC with SLURM uses SLURM interfaces to perform the following: • • • • • • To query system topology information for scheduling purposes. To create allocations for user jobs. To dispatch and launch user jobs. To monitor user job status.
The environment in which the job is launched contains SLURM and LSF-HPC with SLURM environment variables that describe the job's allocation. SLURM srun commands in the user's job use the SLURM environment variables to distribute the tasks throughout the allocation. The integration of LSF-HPC with SLURM has one drawback: the bsub command's -i option for providing input to the user job is not supported. A workaround is to provide any file input directly to the job.
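For example, instead of using bsub -i, the input file can be named on the job command line itself; the application and file names here are illustrative:
$ bsub -n8 -o myjob.out "myapp < input.dat"
The shell redirection is evaluated where the job runs, so the input file must be visible on the allocated nodes (for example, in a file system shared across the HP XC system).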
The LSB_HOSTS and LSB_MCPU_HOSTS environment variables, as initially established by LSF-HPC with SLURM, do not accurately reflect the host names of the HP XC system nodes that SLURM allocated for the user's job. This JOB_STARTER script corrects these environment variables so that existing applications compatible with LSF can use them without further adjustment. The SLURM srun command used by the JOB_STARTER script ensures that every interactive job submitted by a user begins on the first allocated node.
Table 16-1 LSF-HPC with SLURM Interpretation of SLURM Node States (continued) Node Description In Use A node in any of the following states: Unavailable ALLOCATED The node is allocated to a job. COMPLETING The node is allocated to a job that is in the process of completing. The node state is removed when all the job processes have ended and the SLURM epilog program (if any) has ended. DRAINING The node is currently running a job but will not be allocated to additional jobs.
2. Rerun the cluster_config utility. 3. Proceed through the process until you reach the LSF section. 4. When you are prompted to configure LSF, enter yes. 5. When prompted, select the type of LSF you want to install: • Standard LSF-HPC is choice 1. • LSF-HPC with SLURM is choice 2, the default. 6. When prompted, enter d to delete the existing LSF installation. 7. Answer the remainder of the questions as appropriate for your system. The cluster_config utility updates the golden image.
• HP OEM licensing is configured. HP OEM licensing is enabled in LSF-HPC with SLURM by adding the following string to the configuration file, /opt/hptc/lsf/top/conf/lsf.conf. This tells LSF-HPC with SLURM where to look for the shared object to interface with HP OEM licensing. XC_LIBLIC=/opt/hptc/lib/libsyslic.so • Access to LSF-HPC with SLURM from every node in the cluster is configured.
Alternatively, you can invoke the following command to start LSF-HPC with SLURM on the current node: # controllsf start here 16.5.2 Shutting Down LSF-HPC with SLURM At system shutdown, the /etc/init.d/lsf script ensures an orderly shutdown of LSF-HPC with SLURM. You can use the controllsf command, as shown here, to stop LSF-HPC with SLURM regardless of where it is active in the HP XC system: # controllsf stop 16.
Example 16-3 Basic Job Launch Without the JOB_STARTER Script Configured $ bsub -I hostname Job <20> is submitted to default queue . <> <> n120 Example 16-4 is a similar example, but 20 processors are reserved. Example 16-4 Launching Another Job Without the JOB_STARTER Script Configured $ bsub -I -n20 hostname Job <21> is submitted to default queue . <> <
You can use the -l (long) option to obtain detailed information about a job, as shown in this example: $ bjobs -l 116 Job <116>, User , Project default, Status , Queue , Co mmand date time: Submitted from host , CWD <$HOME>, Ou tput File <./>, 8 Processors Requested; date time: Started on 8 Hosts/Processors <8*lsfhost.
3. Copy the JOB_STARTER script to this new directory: # cp /opt/hptc/lsf/bin/job_starter.sh /hptc_cluster/lsf/bin/ 4. Use the text editor of your choice to edit the copied file as follows: a. Open the file. b. Locate the line with the /opt/hptc/bin/srun command: /opt/hptc/bin/srun -n1 -X /bin/env -u SLURM_NNODES -u SLURM_DISTRIBUTION \ c. Add the -u option as the first argument to the /opt/hptc/bin/srun command: /opt/hptc/bin/srun -u -n1 -X /bin/env -u SLURM_NNODES -u SLURM_DISTRIBUTION \ d. 5.
16.10 Job Accounting Standard LSF job accounting using the bacct command is available. The output of a job contains total CPU time and memory usage:
$ cat 231.out
. . .
Resource usage summary:
CPU time   : 8252.65 sec.
Max Memory : 4 MB
Max Swap   : 113 MB
. . .
16.12 Load Indexes and Resource Information LSF-HPC with SLURM gathers limited resource information and load indexes from the LSF execution host and from its integration with SLURM. Not all indexes are reported because SLURM does not provide the same information that LSF-HPC with SLURM usually reports. The LSF lshosts and lsload commands are two common commands for obtaining resource information from LSF-HPC with SLURM.
/hptc_cluster/slurm/etc/slurm.conf; it is not obtained directly from the nodes. See the SLURM documentation for more information on configuring the slurm.conf file. 16.13 LSF-HPC with SLURM Monitoring LSF-HPC with SLURM is monitored and controlled by Nagios using the check_lsf plug-in.
Note: At least two nodes must have the resource management role to enable LSF-HPC with SLURM failover. One is selected as the master (primary LSF execution host), and the others are considered backup nodes. At any time, LSF-HPC with SLURM daemons start and run only on the master node. The Nagios LSF failover module monitors the virtual IP associated with the primary LSF execution host.
If more than two nodes are assigned the resource management role, the first becomes the primary resource management host and the second becomes the backup SLURM host and the first LSF-HPC with SLURM failover candidate. Additional nodes with the resource management role can serve as LSF-HPC with SLURM failover nodes if either or both of the first two nodes are down. Resource management candidate nodes are ordered in ASCII sort order by node name, after the head node, which is taken first.
2. Use the following command to stop LSF-HPC with SLURM: # controllsf stop 3. Use the following command to restart LSF-HPC with SLURM on another node; this example starts LSF-HPC with SLURM on node n18: # ssh n18 controllsf start here 16.15 Moving SLURM and LSF Daemons to Their Backup Nodes It may be necessary to move SLURM and LSF daemons from their primary node to their backup node. One reason for this is to perform maintenance on the primary node.
16.16 Enhancing LSF-HPC with SLURM You can set environment variables to influence the operation of LSF-HPC with SLURM in the HP XC system. These environment variables affect the operation directly and set thresholds for LSF-HPC with SLURM and SLURM interplay. 16.16.1 LSF-HPC with SLURM Enhancement Settings Table 16-3 describes the environment variables in the lsf.conf file that you can use to enhance LSF-HPC with SLURM. Table 16-3 Environment Variables for LSF-HPC with SLURM Enhancement (lsf.
Table 16-3 Environment Variables for LSF-HPC with SLURM Enhancement (lsf.conf File) (continued) Environment Variable Description LSF_ENABLE_EXTSCHEDULER=Y|y This setting enables external scheduling for LSF-HPC with SLURM The default value is Y, which is automatically set by lsfinstall. LSF_HPC_EXTENSIONS="ext_name,..." This setting enables Platform LSF extensions. This setting is undefined by default.
Table 16-3 Environment Variables for LSF-HPC with SLURM Enhancement (lsf.conf File) (continued) Environment Variable Description LSF_HPC_NCPU_COND=and|or This entry in the lsf.conf file defines how any two LSF_HPC_NCPU_* thresholds are combined. The default value is or. LSF_HPC_NCPU_INCREMENT=increment This entry in the lsf.conf file defines the upper limit for the number of processors that are changed since the last checking cycle. The default value is 0.
Table 16-4 describes the environment variables in the lsb.queues file that you can use to enhance LSF-HPC with SLURM. Table 16-4 Environment Variables for LSF-HPC with SLURM Enhancement (lsb.queues File) Environment Variable Description DEFAULT_EXTSCHED= SLURM[options[;options]...] This entry specifies SLURM allocation options for the queue. The -ext options to the bsub command are merged with DEFAULT_EXTSCHED options, and -ext options override any conflicting queue-level options set by DEFAULT_EXTSCHED.
Table 16-4 Environment Variables for LSF-HPC with SLURM Enhancement (lsb.queues File) (continued) Environment Variable Description You can use the MANDATORY_EXTSCHED environment variable in combination with DEFAULT_EXTSCHED in the same queue.
16.17 Configuring an External Virtual Host Name for LSF-HPC with SLURM on HP XC Systems An external virtual host name is needed when LSF-HPC with SLURM on an HP XC system must be accessible from the external network. This access could be required if the HP XC system is added to an existing LSF cluster, or if the HP XC system is 'Multi-Clustered' with another LSF cluster. See the LSF documentation for more details on LSF Multi-Clusters. Perform the following steps to configure an external virtual host name: 1.
17 Managing Modulefiles This chapter describes how to load, unload, and examine modulefiles. Modulefiles provide a mechanism for accessing software commands and tools, particularly for third-party software. The HP XC System Software does not use modules for system-level manipulation. A modulefile contains the information that alters or sets shell environment variables, such as PATH and MANPATH. Some modulefiles are provided with the HP XC System Software and are available for you to load.
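For example, the following session lists the available modulefiles, loads the HP-MPI modulefile named elsewhere in this guide, shows what is loaded, and then unloads it:
$ module avail
$ module load mpi/hp/default
$ module list
$ module unload mpi/hp/default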
18 Mounting File Systems This chapter provides information and procedures for performing tasks to mount file systems that are internal and external to the HP XC system. It addresses the following topics: • “Overview of the Network File System on the HP XC System” (page 215) • “Understanding the Global fstab File” (page 215) • “Mounting Internal File Systems Throughout the HP XC System” (page 217) • “Mounting Remote File Systems” (page 222) 18.
Example 18-1 Unedited fstab.proto File # This file contains additional file system information # for all # the nodes in the HP XC cluster. When a node # boots, this file will be parsed and from it a new # /etc/fstab will be created. # # Do not edit /etc/fstab! Edit this file # (/hptc_cluster/etc/fstab) instead. # # How this file is organized: # # * Comments begin with # and continue to the end of line # # * Each non-comment line is a line that may be copied # to /etc/fstab verbatim.
The file systems can be either of the following: • External to the node, but internal to the HP XC system. “Mounting Internal File Systems Throughout the HP XC System” (page 217) describes this situation. The use of csys is strongly recommended. For more information, see csys(5). • External to the HP XC system. “Mounting Remote File Systems” (page 222) describes this situation. NFS mounting is recommended for remote file system mounting. 18.
Figure 18-1 Mounting an Internal File System [Figure: within the HP XC cluster, node n60 exports the /scratch file system on /dev/sdb1, and nodes n61, n62, and n63 mount /scratch from it.] 18.3.1 Understanding the csys Utility in the Mounting Instructions The csys utility provides a facility for managing file systems on a systemwide basis. It works in conjunction with the mount and umount commands by providing a pseudo file system type. The csys utility is documented in csys(5).
interconnect) to be used. The hostaddress is specified either by its node name or by its IP address. The following entry uses the node name for n60 over the administration network: hostaddress=n60 The prefix ic- specifies the system interconnect. The following entry uses the node name for n60 over the system interconnect: hostaddress=ic-n60 The following entry uses the IP address for n60 over the administration network, 172.22.0.60.
Note: The node that exports the file system to the other nodes in the HP XC system must have the disk_io role. 3. Determine whether you want to mount this file system over the administration network or over the system interconnect. As a general rule, specify the administration network for administrative data and the system interconnect for application data. 4. Edit the /hptc_cluster/etc/fstab.proto file as follows: a.
7. Verify the internal file system mounting by entering the following command, which ensures that the file system is mounted on the nodes: # cexec -a "mount | grep /scratch" n62: 192.168.0.60:/scratch on /scratch type nfs (rw,hard,intr,bg,rsize=8192,wsize=8192,addr=192.168.0.60) n61: 192.168.0.60:/scratch on /scratch type nfs (rw,hard,intr,bg,rsize=8192,wsize=8192,addr=192.168.0.60) n63: 192.168.0.60:/scratch on /scratch type nfs (rw,hard,intr,bg,rsize=8192,wsize=8192,addr=192.168.0.
Example 18-2 The fstab.proto File Edited for Internal File System Mounting # This file contains additional file system information # for all # the nodes in the HP XC cluster. When a node # boots, this file will be parsed and from it a new # /etc/fstab will be created. # # Do not edit /etc/fstab! Edit this file # (/hptc_cluster/etc/fstab) instead.
Figure 18-2 Mounting a Remote File System [Figure: the external server xeno exports the /extra file system to nodes n21 through n25 of the HP XC cluster.] 18.4.1 Understanding the Mounting Instructions The syntax of the fstab entry for remote mounting using NFS is as follows: exphost:expfs mountpoint fstype options exphost Specifies the external server that is exporting the file system. The exporting host can be expressed as an IP address or as a fully qualified domain name.
18.4.2 Mounting a Remote File System Use the following procedure to mount a remote file system to one or more nodes in an HP XC system: 1. Determine which file system to export. In this example, the file system /extra is exported by the external server xeno. 2. Ensure that this file system can be NFS exported. Note: This information is system dependent and is not covered in this document. Consult the documentation for the external server. 3. 4. Log in as superuser on the head node.
Example 18-3 The fstab.proto File Edited for Remote File System Mounting # This file contains additional file system information # for all # the nodes in the HP XC cluster. When a node # boots, this file will be parsed and from it a new # /etc/fstab will be created. # # Do not edit /etc/fstab! Edit this file # (/hptc_cluster/etc/fstab) instead.
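For reference, a minimal noncomment entry for the xeno:/extra example in this section might resemble the following single line; the mount options shown are illustrative rather than prescriptive:
xeno:/extra  /extra  nfs  rw,hard,intr,bg  0 0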
19 Managing Software RAID Arrays The HP XC system can mirror data on a RAID array. This chapter addresses the following topics: • “Overview of Software RAID” (page 227) • “Installing Software RAID on the Head Node” (page 227) • “Installing Software RAID on Client Nodes” (page 227) • “Examining a Software RAID Array” (page 228) • “Error Reporting” (page 229) • “Removing Software RAID from Client Nodes” (page 229) 19.
Use the following procedure to install software RAID on client nodes: 1. Log in as superuser (root) on the head node. 2. Generate a list of nodes on whose disks you will install the HP XC System Software with software RAID. 3. Edit the /etc/systemimager/systemimager.
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Number   Major   Minor   RaidDevice   State
0        8       1       0            active sync   /dev/sda1
1        8       17      1            active sync   /dev/sdb1
UUID : eead90a0:35c0bf46:9160b26b:2d754a4d
Events : 0.10
Nagios uses the mdadm command to verify the status of the RAID array. 19.5 Error Reporting Errors can be reported during the installation of software RAID on a client node.
20 Using Diagnostic Tools This chapter discusses the diagnostic tools that the HP XC system provides. It addresses the following topics: • “Using the sys_check Utility” (page 231) • “Using the ovp Utility for System Verification” (page 231) • “Using the dgemm Utility to Analyze Performance” (page 237) • “Using the System Interconnect Diagnostic Tools” (page 238) Troubleshooting procedures are described in Chapter 21: Troubleshooting (page 245). 20.
• • • The administration network is operational. All application nodes are responding and available to run applications. The nodes in the HP XC system are performing optimally. These nodes are tested for the following: — CPU core usage — CPU core performance — Memory usage — Memory performance — Network performance under stress — Bidirectional network performance between pairs of nodes — Unidirectional network performance between pairs of nodes For a complete list of verification tests, see ovp(8).
Test list for license: file_integrity server_status Test list for SLURM: spconfig daemon_responds partition_state node_state Test list for LSF: identification hosts_static_resource_info hosts_status Test list for interconnect: myrinet/monitoring_line_card_setup Test list for nagios: configuration Test list for xring: xring (X) Test list for perf_health: cpu_usage memory_usage cpu memory network_stress network_bidirectional network_unidirectional Test list for myrinet_status: myrinet_status An 'X' indicates
By default, if any part of the verification fails, the ovp command ignores the test failure and continues with the next test. You can use the --failure_action option to control how the ovp command treats test failures. When you run the ovp command as superuser (root), it stores a record of the verification in a log file in the /hptc_cluster/adm/logs/ovp directory.
tests. Perform these tests on a large number of nodes for the most accurate results. The default value for the number of nodes is 4, which is the minimum value to use. The --all_group option enables you to select the node grouping size. network_bidirectional network_unidirectional Tests network performance between pairs of nodes using the Pallas benchmark's Exchange test. Tests network performance between pairs of nodes using the HP MPI ping_pong_ring test.
Testing cpu_usage ... The headnode is excluded from the cpu usage test. Number of nodes allocated for this test is 14 Job <102> is submitted to default queue . <> <>> All nodes have cpu usage less than 10%. +++ PASSED +++ This verification has completed successfully. A total of 1 test was run. Details of this verification have been recorded in: /hptc_cluster/lsf/home/ovp_n16_mmddyy.
/hptc_cluster/root/home/ovp_n16_mmddyy.tests Details of this verification have been recorded in: /hptc_cluster/root/home/ovp_n16_mmddyyr1.log 20.3 Using the dgemm Utility to Analyze Performance You can use the dgemm utility, in conjunction with other diagnostic utilities, to help detect nodes that may not be performing at their peak performance. When a processor is not performing at its peak efficiency, the dgemm utility displays a WARNING message.
2. Load the mpi/hp/default modulefile: # module load mpi/hp/default 3. Invoke the following command if you are superuser (root): # mpirun -prot -TCP -srun -v -p lsf -n max \ /opt/hptc/contrib/bin/dgemm.x Invoke the following command if you are not superuser: $ bsub -nmax -o ./ mpirun -prot -TCP -srun -v -n max \ /opt/hptc/contrib/bin/dgemm.x The max parameter is the maximum number of processors available to you in the lsf partition.
The gm_prodmode_mon diagnostic tool searches /etc/hosts for entries whose name matches the regular expression “MR0[NT][0-9][0-9]”. This command uses the links -dump command to obtain the current values and parses the output. The gm_prodmode_mon diagnostic tool generates an alert if any errors are found. All alerts are logged in the /var/log/messages file.
health to the network, either to a log file for generic usage or through a MySQL database (as is the case in HP XC systems). In addition to logging errors in the QsNet database, the swmlogger daemon also logs all errors to the /var/log/messages file. See the diagnostics section of the installation and operation guide for your model of HP cluster platform for additional information on the generic use of swmlogger.
-r rail
Specifies the rail number. The default is 0.
-clean
Clears out the log file directory. This ensures that if the file already exists, the old data is deleted before the new test is run, to ensure that the data is fresh from the current run. HP recommends using this option.
Specifies that you want to run this test only on a subset of nodes; the nodes parameter is a comma-separated list of nodes. The default is to run this test on all nodes.
-v
Specifies verbose output, which is required to identify which component or location is causing errors.
-t timeout
Specifies the timeout value (in seconds), that is, the length of time to wait for any test to finish. The default value is 300.
-N nodes
Enables you to run the qsnet2_level_test on only a subset of nodes. The argument nodes is a comma-separated list, for example: n1,n2,n4. The default operation is to run the qsnet2_level_test utility on all nodes.
-clean
Killed Test ran on: n1,n2,n3 Parsing output level3: n1 - (NodeId = 4) ERROR: Test incomplete level3: n2 - (NodeId = 3) ERROR: Test incomplete level3: n3 - (NodeId = 2) ERROR: Test incomplete Parsing complete Example 4 The following example parses the output files created from a previous run of this command. This example specifies the log file directory created after unzipping and extracting the qsnet2_drain_test log file, which is described in the next section.
• • Links reporting IB_TIMEOUT, meaning the node is down. Links reporting a state other than PORT_ACTIVE, meaning the link is down. The output that ib_prodmode_mon produces identifies the bad links so that you can take corrective action.
21 Troubleshooting This chapter provides information to help you troubleshoot problems with HP XC systems.
21.1.2 Mismatched Secure Shell Keys If a node on your system has a mismatched Secure Shell (ssh) key, review the following list for the source of the problem: • The node was not imaged, and was booted an old image, which had older ssh keys. In this instance, it is the image, not the keys, that is out of synchronization. You can solve this problem by imaging the node properly and rebooting. • The keys were regenerated on the head node.
21.2 Nagios Troubleshooting This section contains general troubleshooting information for Nagios application. NOTE: Nagios runs only nodes with the management_server or management_hub roles. See “Messages Reported by Nagios” for additional information. 21.2.1 Determining the Status of the Nagios Service Use the following command to determine if Nagios is running properly: # pdsh -a "service nagios status" Nagios ok: located 1 process, status log updated 22 seconds ago Gathering status for nrpe ...
21.2.3 Nagios Log Files The following log files provide information on Nagios operation: • • • Examine the /opt/hptc/nagios/nagios.log file for errors. Examine the /opt/hptc/nagios/status.log file for the system status. Examine the /var/log/messages file for Nagios errors on nodes running Nagios 21.2.4 Running Nagios Plug-Ins Manually The Nagios plug-ins are located in the /opt/hptc/nagios/libexec directory. You can invoke them from the command line if needed.
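For example, most standard Nagios plug-ins accept -w (warning) and -c (critical) thresholds and print a one-line status; the plug-in name and thresholds below are illustrative:
# /opt/hptc/nagios/libexec/check_disk -w 20% -c 10% -p /
The plug-in prints its status text and sets its exit code to indicate OK, WARNING, or CRITICAL, which is how Nagios interprets the result.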
command' plug-in. nh [Environment - ASSUMEDOK] Pending services are normal, they indicate data has not yet been received by the Nagios engine. Service *may* be fine, but if it continues to pend for more then about 30 minutes it may indicate data is not being collected. n[8-15] nh [Load Average - ASSUMEDOK] Pending services are normal, they indicate data has not yet been received by the Nagios engine.
a Warning or Critical message, find the information for that service in the Status Information column and apply it to the specified Nagios host. NOTE: The following messages are based on the default values installed on your HP XC system. They will differ if the messages in the /opt/hptc/nagios/etc/misccommands.cfg file have been changed. Service: Apache HTTPS Server Status Information: HTTPS performance information Displays the status of the Apache HTTPS web server on the HP XC system.
Nagios performs a ping command on the interconnect at regular intervals. Typically, this entry provides the status information output from that command and the Interconnect's IP address. A warning or critical message indicates that the specified node or system interconnect failed to respond to the ping command in the allotted time. Determine if the node is powered on, enabled, and responsive.
Typically, this entry reports the number of new records processed in the /hptc_cluster/adm/logs/consolidated.log file. A warning or critical message occurs when there is insufficient time to process a huge volume of messages before the Nagios service_check_timeout period expires. Nagios examines the recent incoming consolidated log messages and issues a warning or critical message if the incoming rate since last interval exceeds a configured number of records.
2. Make sure that you are running an HP XC kernel. The HP XC kernels are identified by the presence of XC in the kernel name: # uname -a Linux n16 2.4.21-15.7hp.XCsmp #1 SMP date ... GNU/Linux 3. Make sure that your system has Myrinet boards installed: # lspci -v | grep Myrinet 05:0d.0 Network controller: MYRICOM Inc. Myrinet 2000 . . . Subsystem: MYRICOM Inc. Myrinet 2000 Scalable Cluster Interconnect 4.
# uname -a Linux n16 2.4.21-15.7hp.XCsmp #1 SMP date ... GNU/Linux 3. Make sure that your system has Quadrics boards installed. The system shown in the following example has both Elan3 and Elan4 boards; most systems have only one type. # lspci -v | grep Quad 60:01.0 Network controller: Quadrics Ltd QsNet Elan3 Network Adapter (rev 01) 80:01.0 Network controller: Quadrics Ltd QsNetII Elan4 Network Adapter (rev 01) 4. Make sure that all the Quadrics RPMs are installed: # rpm -q -a | grep qs . . . qsswm-2.
You can try to ping other nodes that are connected to the network. 8. Make sure that the nodes are wired to the Quadrics switches in the correct order. See the HP XC Hardware Preparation Guide for additional information. 9. You can find additional information about Quadrics in the /proc/qsnet directory. Use the find command to display it: # find /proc/qsnet -type f -print -exec cat {} \; 21.4.
3. Enter 7 to exit. 4. Make sure that you are running an HP XC kernel. The HP XC kernels are identified by the presence of XC in the kernel name: # uname -a Linux n3 2.4.21-15.7hp.XCsmp #1 SMP date ... GNU/Linux Make sure that your system has InfiniBand boards installed: # lspci -v | grep Infini 04:00.0 InfiniBand: Mellanox Technology MT23108 InfiniHost (rev a1) Subsystem: Mellanox Technology MT23108 InfiniHost 5. Make sure that the InfiniBand RPM is installed: # rpm -q -a | grep ibhost ibhost-biz-3.0.
You can try to ping other nodes that are connected to the network. 8. You can find additional information about InfiniBand in the /proc/voltaire directory. Use the find command to display it: # find /proc/voltaire -type f -print -exec cat {} \; 21.4.4 OFED Troubleshooting Procedures Starting with Version 3.2, the HP XC System Software uses the OpenFabrics Enterprise Distribution (OFED) InfiniBand software stack.
IMPORTANT: The fw_ver parameter indicates the firmware version. The InfiniBand board firmware should be the latest version available with your software release, and must be at least as recent as the minimum firmware versions listed in the HP XC master firmware list: http://www.docs.hp.com/en/linuxhpc.html When examining the ibv_devinfo command output, you should see a PORT_ACTIVE state indication for at least one port of the InfiniBand board.
libibverbs-utils-1.0.4-0 libipathverbs-1.0-0 libipathverbs-devel-1.0-0 libmthca-1.0.3-0 libmthca-devel-1.0.3-0 libopensm-2.0.0-0 libopensm-devel-2.0.0-0 libosmcomp-2.0.0-0 libosmcomp-devel-2.0.0-0 libosmvendor-2.0.0-0 libosmvendor-devel-2.0.0-0 librdmacm-0.9.0-0 librdmacm-devel-0.9.0-0 librdmacm-utils-0.9.0-0 mpitests_openmpi_gcc-2.0-0 mstflint-1.0-0 ofed-docs-1.1-0.noarch.rpm ofed-scripts-1.1-0.noarch.rpm openib-diags-1.1.0-0 openmpi_gcc-1.1.1-1 perftest-1.0-0 tvflash-0.9.0-0 ib-enhanced-services-0.9.0-1.
8. Find additional information about InfiniBand in the /sys/class/infiniband* directory. Use the find command to locate the information: # find /sys/class/infiniband* -type f -print -exec cat {} \; 9. Consult the documentation for the available OFED commands located in the /usr/local/ofed directory tree; there are manpages for the commands and other online OFED documentation in the /usr/local/ofed/docs and /usr/local/ofed/src/openib-1.1/Documentation/infiniband directories.
Running /sbin/service nagios restart on the non-headnode in the availability set causes the nagios master to fail over. 21.5.4 Network Restart Command Negatively Affects Serviceguard If a node is actively participating in a Serviceguard cluster, the Serviceguard tool manages some HP XC services and their aliases. Because Serviceguard handles relocating these aliases after a node dies, there are no network scripts defined for the aliases.
slurm.conf The SLURM configuration file, /hptc_cluster/slurm/etc/slurm.conf.
Healthy node is down The most common reason for SLURM to list an apparently healthy node down is that a specified resource has dropped below the level defined for the node in the /hptc_cluster/slurm/etc/slurm.conf file. For example, if the temporary disk space specification is TmpDisk=4096, but the available temporary disk space falls below 4 GB on the system, SLURM marks it as down.
• Ensure that the lsf partition is configured correctly. • Verify that the system licensing is operational. • Use the lmstat -a command. • Ensure that munge is running on all compute nodes. • If you are experiencing LSF communication problems, check for potential firewall issues.
4. The RUN_WINDOW for the night queue ends but Job #75 did not complete. Job #75 is suspended. 5. Job #76 is scheduled on a higher priority queue named main but is suspended. 6. The RUN_WINDOW for queue night opens again according to the queue definition. 7. Job #75 resumes on the night queue. 8. Job #76 runs on the main queue. A workaround is to ensure that jobs end when the RUN_WINDOW for the queue ends. Use the LSF RUNLIMIT or TERMINATE_WHEN setting in the lsb.queues file to do so.
22 Servicing the HP XC System This section describes procedures for servicing the HP XC system. For more information, see the service guide for your cluster platform. This chapter addresses the following topics: • “Adding a Node” (page 267) • “Replacing a Client Node” (page 269) • “Replacing a System Interconnect Board in an HP CP6000 System” (page 272) • “Software RAID Disk Replacement” (page 272) • “Incorporating External Network Interface Cards” (page 275) 22.1
8. If your system is configured for improved availability, enter the transfer_from_avail command: # transfer_from_avail 9. Run the cluster_config utility to configure the nodes and set the imaging environment: # ./cluster_config The cluster_config utility prompts you for key information, as shown here: a. The following menu is displayed: [L]ist Nodes, [M]odify Nodes, [H]elp, [P]roceed, [Q]uit: Modify the default role assignments depending upon your system requirements.
22.2 Replacing a Client Node The following procedure describes how to replace a faulty client node in an HP XC system. The example commands in the procedure use node n3. CAUTION: Do not use this procedure to replace the head node. CAUTION: The replacement node must have the identical (exact) hardware configuration to the node being replaced; the following characteristics must be identical: • Number of processors • Memory size • Number of ports 1.
Notes: The -oldmp option is also required for CP6000 systems because their management processors (MPs) have statically-set IP addresses and are not configured to use DHCP.
5. Set the Onboard Administrator password by following the procedure in the HP XC Hardware Preparation Guide. 6. Follow these steps to find the MAC address of the new Onboard Administrator. a. Connect a terminal device to the port of the Onboard Administrator. b. Log in to the Onboard Administrator using the administrator password you set in Step 5. c.
22.4 Replacing a System Interconnect Board in an HP CP6000 System Use the following procedure to replace a Myrinet system interconnect board, InfiniBand system interconnect board, or a Quadrics system interconnect board in a CP6000 system. The example commands in the procedure use node n3. Caution: The replacement system interconnect board must be the same as the system interconnect board to be replaced. 1. Log in as superuser on the head node. 2.
1. Examine the array.
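For example, the standard Linux software RAID tools report the array status; the md device name below is illustrative:
# cat /proc/mdstat
# mdadm --detail /dev/md1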
7. Partition the new disk. 8. Add the new partitions back to their arrays:
# mdadm /dev/md1 -a /dev/sdb1
# mdadm /dev/md2 -a /dev/sdb2
# mdadm /dev/md3 -a /dev/sdb3
The new partition begins synchronizing with the existing corresponding partition automatically. 9. Use the following two commands to update the mdadm configuration file, /etc/mdadm.conf, which the mdadm command uses to manage the RAID arrays.
1. Use the systemconfigurator command as follows: # /usr/bin/systemconfigurator -runboot -stdin <
• “Reconfiguring the Nodes” (page 287) • “Verifying Success” (page 287) • “Updating the Golden Image” (page 288) 22.6.1 Gathering Information You need to gather information on the nodes, the NIC, and the network to incorporate an external NIC. This section discusses how to acquire that information and provides a worksheet you can use to note the settings for your system.
Complete the corresponding portions of Table 22-1 (page 279) with the information from this section. 22.6.1.2 Determining NIC-Specific Information For most model types, you need to know the following Ethernet interface data for the NIC: • the PCI bus ID NOTE: • • The PCI bus ID does not apply to the model type rx8620 server.
description: Ethernet interface product: NetXtreme BCM5721 Gigabit Ethernet PCI Express vendor: Broadcom Corporation physical id: 0 bus info: pci@03:00.0 logical name: eth1 version: 11 serial: 00:00:00:00:00:01 description: Ethernet interface product: NetXtreme BCM5703 Gigabit Ethernet vendor: Broadcom Corporation physical id: 1 bus info: pci@05:01.0 logical name: eth2 version: 10 serial: 00:00:00:00:00:02 ... The dd variable was used in this example to denote the last portion of the MAC address.
This indicates that the disconnected Ethernet device is eth0. 4. Repeat steps 2 and 3 for the remaining Ethernet devices. Complete the corresponding portions of Table 22-1 (page 279) with the information from this section. 22.6.1.
!Ethernet EOT expression Admin_value Interconnect_value External_value Depending whether or not Gigabit Ethernet is used for the interconnect switch and depending on the number of Ethernet ports that are detected, the values in the table are assigned to the Administration Port, the Interconnect port, and the External port. These values can be stated as Ethernet device names, PCI bus IDs, and the literal strings undef and offboard.
Table 22-2 Modelmap Values (continued) Column Values AdminPort The port specified in this column is used to connect the node to the administration network of the HP XC system. Valid values for this column are: Bus_ID1 Indicates the hardware PCI bus ID, for example, 20:02.0; this is the most reliable method for designating a physical Ethernet port. Indicates an Ethernet device, starting with eth0. ethn Interconnect The port specified in this column is used for the system interconnect.
7. Change the text in the External column accordingly: • If the entry in the External column is offboard, change that text to the PCI bus ID of the first added NIC. Before: offboard After: 06:01.0 If there is more than one NIC, add a comma character (,) then enter the PCI bus ID of the next added NIC. For example: Before: offboard After: 06:01.0,06:01.1 Repeat as necessary. IMPORTANT: The list to specify multiple PCI bus IDs must be a comma-separated list without space characters.
IMPORTANT: Be sure that the output represents the results you want before proceeding to the next task, Otherwise, repeat the procedure in this section until the output accurately represents the results you want. 22.6.3 Using the device_config Command The device_config command allows you to update the command and management database (CMDB) to incorporate one or more new external Ethernet ports, external1, external2, and so on. The cluster_config utility configures the first external Ethernet port, external.
IMPORTANT: configure. You need to repeat this procedure for each external Ethernet port that you 1. Review the information gathered from Table 22-1 (page 279). You will need the following information for the device_config command: • Node name • External device/port • IP address • External host name • Netmask • Gateway • MAC address 2. Enter the device_config command with the --dryrun option to perform a practice run.
NOTE: If you are using IPv6, you need to configure the /etc/sysconfig/ip6tables.proto file. The method for doing so is analogous to configuring the iptables.proto file. If a service is not aware of the external physical Ethernet port, it will not be able to communicate through its corresponding virtual ports unless you custom configure the firewall. As shipped, the firewall prototype file, /etc/sysconfig/iptables.
1 This line opens virtual port 443 for TCP on the first added physical external Ethernet port, External1, on all nodes in the HP XC system. The text -i External1 matches all nodes, so virtual port 443 will be open on all nodes with External1 connections. 3 This line opens the ftp virtual ports (20 and 21) on the first added physical external Ethernet port, External1, on node n19.
ipaddr: 192.0.2.2
ipv6addr:
mtu:
name: station2.example.com
netmask: 255.255.248.0
Interconnect:
device: ipoib0
gateway:
hwaddr:
iftype: Infiniband
ifusage: Interconnect
interface_number:
ipaddr: 172.22.0.16
ipv6addr:
mtu:
name: ic-n19
netmask: 255.255.0.0
. . .
22.6.6 Reconfiguring the Nodes After editing the platform_vars.ini file and updating the database, you need to reconfigure the nodes with the new NICs. Use the following procedure: 1. Log in as superuser (root) on the head node. 2.
22.6.7.1 Verifying the Ethernet Port Use the ifconfig command to verify that the Ethernet port for the NIC that you incorporated into the HP XC system is functioning correctly: # ifconfig eth2 eth2 Link encap:Ethernet HWaddr 00:00:00:00:00:02 inet addr:192.0.2.2 Bcast:192.0.2.100 Mask:255.255.248.
A Installing LSF-HPC with SLURM into an Existing Standard LSF Cluster This appendix describes how to join an HP XC system running LSF-HPC with SLURM (integrated with the SLURM resource manager) to an existing standard LSF Cluster without destroying existing LSF-HPC with SLURM configuration. After installation, the HP XC system is treated as one host in the overall LSF cluster, that is, it becomes a cluster within the LSF cluster.
• • • You should be familiar with the LSF installation documentation and the README file provided in the LSF installation tar file. You should also be familiar with the normal procedures in adding a node to an existing LSF cluster, such as: — Establishing default communications (rhosts or ssh keys) — Setting up shared directories — Adding common users You should also have read Chapter 16: Managing LSF (page 189) in this document. A.
1. Log in as superuser (root) on the head node of the HP XC system. 2. Make sure any current LSF application on the HP XC system is shut down and won't interfere. If LSF-HPC with SLURM is currently installed and running on the HP XC system, shut it down with the controllsf command: # controllsf stop 3. Consider removing this installation to avoid confusion: # /opt/hptc/etc/gconfig.d/C55lsf gunconfigure removing /opt/hptc/lsf/top/conf... removing /opt/hptc/lsf/top/6.2... removing /opt/hptc/lsf/top/work...
# shownode roles --role resource_management external resource_management: xc[127-128] external: xc[125-128] If this command is not available, examine the role assignments by running the cluster_config command and viewing the node configurations. Be sure to quit after you determine the configuration of the nodes. Do not "proceed" with reconfiguring the HP XC system with any changes at this point. There will be another opportunity to reconfigure the system with cluster_config utility later.
7. Modify the Head Node.
   These steps modify the head node and propagate those changes to the rest of the HP XC system. The recommended method is to use the updateimage and updateclient commands as documented in Chapter 11: Distributing Software Throughout the System (page 139). Make the modifications first, then propagate the following changes:
   a. Lower the firewall on the HP XC external network. LSF daemons communicate through preconfigured ports in the lsf.conf file.
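      The exact firewall changes depend on the port values defined in your lsf.conf file. The following sketch only illustrates the idea: the parameter names shown in angle brackets are standard LSF settings, but you must substitute the numeric values from your own lsf.conf, and you should follow the rule format already used in /etc/sysconfig/iptables.proto (the chain name below is an assumption).

          # Sketch only -- replace the <...> placeholders with the port numbers
          # defined in lsf.conf; chain name assumed from the Red Hat defaults.
          -A RH-Firewall-1-INPUT -i External1 -p tcp --dport <LSB_MBD_PORT> -j ACCEPT
          -A RH-Firewall-1-INPUT -i External1 -p tcp --dport <LSB_SBD_PORT> -j ACCEPT
          -A RH-Firewall-1-INPUT -i External1 -p tcp --dport <LSF_RES_PORT> -j ACCEPT
          -A RH-Firewall-1-INPUT -i External1 -p tcp --dport <LSF_LIM_PORT> -j ACCEPT
          -A RH-Firewall-1-INPUT -i External1 -p udp --dport <LSF_LIM_PORT> -j ACCEPT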
          fi
      esac

      # cat lsf.csh
      if ( "${path}" !~ *-slurm/etc* ) then
          if ( -f /opt/hptc/lsf/top/conf/cshrc.lsf ) then
              source /opt/hptc/lsf/top/conf/cshrc.lsf
          endif
      endif

      The goal of these custom files is to source (only once) the appropriate LSF environment file: $LSF_ENVDIR/cshrc.lsf for csh users, and $LSF_ENVDIR/profile.lsf for users of sh, bash, and other shells based on sh. Create /etc/profile.d/lsf.sh and /etc/profile.d/lsf.csh on the HP XC system to set up the LSF environment on HP XC.
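      Only the tail of the corresponding lsf.sh file (the fi and esac lines above) appears in this excerpt. As a rough guide, a Bourne-shell counterpart of the csh file shown above might look like the following; this is a sketch written to mirror the cshrc.lsf logic, not the exact file shipped with HP XC.

          # /etc/profile.d/lsf.sh -- illustrative sketch only; mirrors lsf.csh above
          case "${PATH}" in
              *-slurm/etc*)
                  # LSF environment already in PATH; do nothing
                  ;;
              *)
                  if [ -f /opt/hptc/lsf/top/conf/profile.lsf ]; then
                      . /opt/hptc/lsf/top/conf/profile.lsf
                  fi
                  ;;
          esac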
d. The HP XC controllsf command can double as the Red Hat /etc/init.d/ service script for starting LSF-HPC with SLURM when booting the HP XC system and stopping LSF-HPC with SLURM when shutting it down. When starting LSF-HPC with SLURM, the controllsf command establishes the LSF alias and starts the LSF daemons.
2. Copy the LSF-HPC with SLURM tar files to a temporary location on the node that hosts LSF_TOP. In the sample case, this node is plain. When the copy completes, unpack the installation scripts:

   [root@plain lsf]# mkdir hpctmp
   [root@plain lsf]# scp root@xc-head:/opt/hptc/lsf/files/hpc* hpctmp/
   root@xc-head's password:
   hpc6.2_lsfinstall.ta   100% |****************************|   237 KB
   hpc6.2_linux2.4-glib   100% |****************************| 37039 KB
   [root@plain lsf]# cd hpctmp/
   [root@plain hpctmp]# tar zxf hpc6.
4. Start the LSF installation process:

   # ./lsfinstall -f install.config
   Logging installation sequence in /shared/lsf/hpctmp/hpc6.2_lsfinstall/Install.log
   LSF pre-installation check ...
   Checking the LSF TOP directory /shared/lsf ...
   ... Done checking the LSF TOP directory /shared/lsf ...
   LSF license is defined in "/shared/lsf/conf/lsf.conf", LSF_LICENSE is ignored ...
   Checking LSF Administrators ...
   Enabling LSB_SHORT_HOSTLIST in /shared/lsf/conf/lsf.conf ...
   Enabling schmod_slurm in /shared/lsf/conf/lsbatch/corplsf/configdir/lsb.modules ...
   Setting JOB_ACCEPT_INTERVAL = 0 to /shared/lsf/conf/lsbatch/corplsf/configdir/lsb.params ...
   Setting MXJ to ! in /shared/lsf/conf/lsbatch/corplsf/configdir/lsb.hosts ...
   Adding server hosts ...
   Host(s) "xclsf" has (have) been added to the cluster "corplsf".
   ... LSF configuration is done.
   Creating lsf_getting_started.html ...
   ... Done creating lsf_getting_started.html ...
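The install.config file passed to lsfinstall in the step above is not reproduced in this excerpt. For orientation only, a minimal file consistent with the names used in this example (LSF_TOP of /shared/lsf, cluster corplsf, virtual host xclsf) might resemble the following; the parameter names are standard lsfinstall options, but the administrator name is an assumption and you should confirm the required set against the LSF installation documentation.

   # install.config -- illustrative sketch only; values follow this example
   LSF_TOP="/shared/lsf"
   LSF_ADMINS="lsfadmin"          # assumed administrator account
   LSF_CLUSTER_NAME="corplsf"
   LSF_ADD_SERVERS="xclsf"
   LSF_TARDIR="/shared/lsf/hpctmp"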
3. Edit the LSF_TOP/conf/lsf.cluster.clustername file using the text editor of your choice:
   a. In the Host section, find the HP XC "node" and add slurm in the RESOURCES column. For our example the new entry resembles the following:

      Begin Host
      HOSTNAME   model   type   server   r1m   mem   swp   RESOURCES   #Keywords
      ...
      xclsf      !       !      1        3.5   ()    ()    (slurm)
      End Host

   b. In the Parameters section, set up the floating client address range (FLOAT_CLIENTS_ADDR_RANGE) using the nodeBase entry from Step 1 (see the sketch below).
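      FLOAT_CLIENTS_ADDR_RANGE is set in the Parameters section of the same file. The address range below is only an illustration (it assumes the HP XC internal node addresses fall in 172.20.*.*); use the range implied by the nodeBase value from Step 1.

      Begin Parameters
      ...
      FLOAT_CLIENTS_ADDR_RANGE=172.20.*.*
      End Parameters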
The next step is to configure the LSF alias on the HP XC system. An alias is used on the HP XC system to avoid hard-wiring LSF to any one node, so that the LSF node in HP XC can fail over to another node if the current node hangs or crashes. HP XC provides infrastructure to monitor the LSF node and fail over the LSF daemons to another node if necessary. The selected IP address and host name must not be in use but must be known on the external network.
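As an illustration only: if the alias is xclsf (as in the running example) and the unused external address chosen for it were 192.0.2.99 (a hypothetical value), making the alias known on the external network could be as simple as publishing it in DNS or adding a host entry on the systems that must resolve it:

   # /etc/hosts entry -- the IP address shown is hypothetical
   192.0.2.99    xclsf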
A.9 Sample Running Jobs

Example A-1 Running Jobs as a User on an External Node Launching to a Linux Itanium Resource

$ bsub -I -n1 -R type=LINUX86 hostname
Job <411> is submitted to default queue.
<<Waiting for dispatch ...>>
<<Starting on plain>>
plain

Example A-2 Running Jobs as a User on an External Node Launching to an HP XC Resource

$ bsub -I -n1 -R type=SLINUX64 hostname
Job <412> is submitted to default queue.
A.10 Troubleshooting
• Use the following commands to verify your configuration changes:
  — iptables -L and other options to confirm the firewall settings
  — pdsh -a 'ls -l /etc/init.d/lsf' to confirm the startup script
  — pdsh -a 'ls -ld /shared/lsf/' (using our running example) to confirm that the LSF tree was properly mounted
  — pdsh -a 'ls -l /etc/profile.d/lsf.
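In addition to the checks above, the standard LSF commands can confirm that the HP XC host has actually joined the existing cluster. A brief illustration, using the host and cluster names from the running example:

   $ lsid            # should report cluster corplsf and its current master
   $ lshosts xclsf   # the HP XC virtual host should show the slurm resource
   $ bhosts xclsf    # the host should be listed and eventually reach status ok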
B Installing Standard LSF on a Subset of Nodes
This appendix provides instructions for installing standard LSF on a subset of nodes in the HP XC system while another subset of nodes runs LSF-HPC with SLURM. This configuration is useful for an HP XC system that is composed of two different types of nodes, for example, a set of large SMP nodes (“fat” nodes) running LSF-HPC with SLURM and a set of “thin” nodes running standard LSF, as Figure B-1 shows.
B.2 Assumptions
• LSF-HPC with SLURM was installed by the cluster_config process using default values.
• You have a proper Platform LSF license.
• There is no need to communicate with an external LSF cluster (this can be done, but involves additional procedures to prepare the external network connections).

B.3 Sample Case
Consider an HP XC system of 128 nodes consisting of:
• A head node with a host name of xc128
• 6 large SMP nodes (or “fat” nodes) with the host names xc[1-6]
• 122 thin nodes.
  # pdsh -a -x xc[1-6] "touch /var/lsf/lsfslurm"

• Change the file name in the setup files by executing the following sed commands:

  # sed -e "s?/etc/hptc-release?/var/slurm/lsfslurm?g" \
      < profile.lsf.notxc > profile.tmp
  # sed -e "s?/etc/hptc-release?/var/slurm/lsfslurm?g" \
      < cshrc.lsf.notxc > cshrc.tmp

• Verify that only the file name changed:

  # diff profile.tmp profile.lsf.notxc
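  The diff output should show nothing except the path substitution. The following is a purely hypothetical illustration of the kind of output to expect; the line number and the surrounding shell test are invented for illustration, and the real file contents will differ:

  # diff profile.tmp profile.lsf.notxc
  12c12
  < if [ -f /var/slurm/lsfslurm ]; then
  ---
  > if [ -f /etc/hptc-release ]; then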
c. Set the file permissions:

   # chmod 555 /opt/hptc/lsf/etc/slsf

d. Create the appropriate soft link to the file:

   # ln -s /opt/hptc/lsf/etc/slsf /etc/init.d/slsf

e. Enable the file:

   # chkconfig --add slsf
   # chkconfig --list slsf
   slsf      0:off   1:off   2:off   3:on    4:on    5:on    6:off

f. Edit the /opt/hptc/systemimager/etc/chkconfig.map file to add the following line to enable this new "service" on all nodes in the HP XC system:

   slsf
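The contents of the /opt/hptc/lsf/etc/slsf script are not reproduced in this excerpt. As a rough sketch only, an init script for standard LSF daemons typically wraps the lsadmin and badmin startup and shutdown commands; the following illustrates that pattern and is not the script HP XC ships (the profile location is an assumption based on the paths used earlier in this document).

   #!/bin/sh
   # slsf -- illustrative init-script sketch for standard LSF daemons
   # chkconfig: 345 95 5
   # description: standard LSF daemons on the thin nodes (sketch only)

   . /opt/hptc/lsf/top/conf/profile.lsf    # assumed location of the LSF profile

   case "$1" in
       start)
           lsadmin limstartup      # start the LIM daemon on this host
           lsadmin resstartup      # start the RES daemon on this host
           badmin hstartup         # start the sbatchd daemon on this host
           ;;
       stop)
           badmin hshutdown
           lsadmin resshutdown
           lsadmin limshutdown
           ;;
       *)
           echo "Usage: $0 {start|stop}"
           exit 1
           ;;
   esac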
lsfhost.loc   SLINUX6   Itanium2   60.0   228   1973M           Yes   (slurm)
xc1           LINUX64   Itanium2   60.0     8   3456M   6143M   Yes   ()
xc2           LINUX64   Itanium2   60.0     8   3456M   6143M   Yes   ()
xc3           LINUX64   Itanium2   60.0     8   3456M   6143M   Yes   ()
xc4           LINUX64   Itanium2   60.0     8   3456M   6143M   Yes   ()
xc5           LINUX64   Itanium2   60.0     8   3456M   6143M   Yes   ()
xc6           LINUX64   Itanium2   60.0     8   3456M   6143M   Yes   ()
C Setting Up MPICH
MPICH, as described at its website, http://www-unix.mcs.anl.gov/mpi/mpich1/, is a freely available, portable implementation of MPI. This appendix provides the information you need to set up MPICH on an HP XC system.
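For orientation only, a typical MPICH 1.2.x configure and build sequence with an explicit installation directory looks something like the following; the installation prefix shown is just an example, and any additional configure options your site needs are omitted.

   # build MPICH from its unpacked source tree; /opt/mpich is an example prefix
   ./configure -prefix=/opt/mpich
   make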
NOTE: Be sure to specify the directory with the -prefix= option.

6. Build MPICH with the make command:

   # make

   NOTE: Building MPICH may take longer than 2 hours.

C.3 Running the MPICH Self-Tests
Optionally, you can run the MPICH self-tests with the following command:

   % make testing

Two Fortran tests are expected to fail because they are not 64-bit clean. Tests that use ADIOI_Set_lock() fail on some platforms as well, for unknown reasons.
D HP MCS Monitoring You can monitor the optional HP Modular Cooling System (MCS) by using the Nagios interface. During HP XC system installation, you generated an initialization file, /opt/hptc/config/mcs.ini, which specifies the names and IP addresses of the MCS devices. This file is used in the creation of the /opt/hptc/nagios/etc/mcs_local.cfg file, which Nagios uses to monitor the MCS devices.
6. a. Issue the following command:

      # /opt/hptc/config/sbin/mcs_config

   b. Restart Nagios. For more information, see “Stopping and Restarting Nagios” (page 115).
7. Examine the /opt/hptc/config/mcs_advExpected.static.db file to ensure that the values for the MCS advanced settings are appropriate for your site.
8. Restart Nagios if you changed this file. For more information, see “Stopping and Restarting Nagios” (page 115).
D.4 MCS Log Files
The following log files contain MCS-related data collected by the check_mcs_trends plug-in:

• /opt/hptc/nagios/var/mcs_trends.staticdb
  This log file tracks the following:
  — tempWaterIn
  — waterFlow
  — lastStatus
  — lastCheck

• /opt/hptc/nagios/var/env_logs/mcs_trends.log
  This log file tracks the following:
  — Hex1InTemp
  — Hex1OutTemp
  — Hex2InTemp
  — Hex2OutTemp
  — Hex3InTemp
  — Hex3OutTemp
  — waterInTemp
  — waterOutTemp
  — waterFlowRate
  — waterFlowRetries

  In this file, the data is stored in a row.
Figure D-1 MCS Hosts in Nagios Service Details Window
Glossary

A

administration branch
The half (branch) of the administration network that contains all of the general-purpose administration ports to the nodes of the HP XC system.

administration network
The private network within the HP XC system that is used for administrative operations.

availability set
An association of two individual nodes so that one node acts as the first server and the other node acts as the second server of a service. See also improved availability, availability tool.

operating system and its loader. Together, these provide a standard environment for booting an operating system and running preboot applications.

enclosure
The hardware and software infrastructure that houses HP BladeSystem servers.

extensible firmware interface
See EFI.

external network node
A node that is connected to a network external to the HP XC system.

F

fairshare
An LSF job-scheduling policy that specifies how resources should be shared by competing users.
image server
A node specifically designated to hold images that will be distributed to one or more client systems. In a standard HP XC installation, the head node acts as the image server and golden client.

improved availability
A service availability infrastructure that is built into the HP XC system software to enable an availability tool to fail over a subset of eligible services to nodes that have been designated as a second server of the service. See also availability set, availability tool.
LVS
Linux Virtual Server. Provides a centralized login capability for system users. LVS handles incoming login requests and directs them to a node with a login role.

M

Management Processor
See MP.

master host
See LSF master host.

MCS
An optional integrated system that uses chilled water technology to triple the standard cooling capacity of a single rack. This system helps take the heat out of high-density deployments of servers and blades, enabling greater densities in data centers.

onboard administrator
See OA.

P

parallel application
An application that uses a distributed programming model and can run on multiple processors. An HP XC MPI application is a parallel application. That is, all interprocessor communication within an HP XC parallel application is performed through calls to the MPI message passing library.

PXE
Preboot Execution Environment.

an HP XC system, the use of SMP technology increases the number of CPUs (amount of computational power) available per unit of space.

ssh
Secure Shell. A shell program for logging in to and executing commands on a remote computer. It can provide secure encrypted communications between two untrusted hosts over an insecure network.

standard LSF
A workload manager for any kind of batch job.
Index Symbols A adding a local user account, 159 adding a node, 267 adding a service, 76 administrative passwords, 162–167 archive.
D database cannot connect, 245 dbsysparams command, 34, 83, 132 deadman module, 136 deleting a local user account, 161 device_config command, 34, 283 dgemm utility, 45, 237 DHCP service, 29 diagnostic tools dgemm, 237 Gigabit Ethernet system interconnect, 244 gm_drain_test, 239 gm_prodmode_mon, 238 InfiniBand system interconnect, 243 Myrinet system interconnect, 238 ovp, 231 qsdiagadm, 240 qselantestp, 240 qsnet2_drain_test, 243 qsnet2_level_test, 241 Quadrics system interconnect, 239 swmlogger daemon, 239
HP BladeSystems information, 83 HP documentation providing feedback for, 26 HP Graph, 97–101 HP Serviceguard, 48–51 HP XC command set, 33 configuration file guidelines, 38 HP XC system booting, 53 file system hierarchy, 29 log files, 32 shutdown, 56 startup, 53 hpasm, 89 /hptc_cluster directory, 31, 60, 144, 262, 263 guidelines, 31 troubleshooting mount failure, 246 I I/O service, 28 image replication and distribution, 139 exclusion files, 149 image server services, 29 improved availability, 41, 47–52 avai
troubleshooting, 263–265 LSF-HPC with SLURM failover, 203, 204, 264 running jobs, 205 LSF-HPC with SLURM integration, 191 LSF-HPC with SLURM interplay, 211 LSF-HPC with SLURM jobs controlling, 198 monitoring, 198 lsf.
Network Address Translation (see NAT) network boot, 147, 148 Network File System (see NFS) Network Information Service (see NIS) Network Interface Cards (see NICs) Network Time Protocol (see NTP) NFS, 215 attribute caching, 246 mount options, 246 RPC services, 215 troubleshooting mount failure, 246 NICs incorporating external, 275–288 NIS, 42 synchronizing, 161 node, 27 adding, 267 disabling, 58 display services for, 62 distributing file images to, 139 enabling, 58 locating, 57 replacing, 269 services provi
scp command, 44 secure shell, 44 security, 43 security patches, 135 Self-Monitoring Analysis and Reporting Technology (see SMART) sendmail utility, 117 sensor thresholds changing, 117 server blades information, 83 service, 27, 59 adding, 76 central control daemon service, 28 compute service, 28 configuration and management database, 29 configuration files, 32, 40 customizing, 64 DHCP, 29 display all, 61 display node that provides a service, 62, 82 display services for node, 62 global maintenance, 150 global
superuser password changing, 162 swmlogger daemon, 239 sys_check utility, 35, 45, 231 syslog service, 60, 88, 92 syslog-ng configuration files, 40 syslog-ng rules files modifying, 93 templates, 93 syslog-ng service, 88 syslog-ng.