HP XC System Software User's Guide Version 3.
© Copyright 2003, 2005, 2006, 2007 Hewlett-Packard Development Company, L.P. Confidential computer software. Valid license from HP required for possession, use or copying. Consistent with FAR 12.211 and 12.212, Commercial Computer Software, Computer Software Documentation, and Technical Data for Commercial Items are licensed to the U.S. Government under vendor's standard commercial license. The information contained herein is subject to change without notice.
Table of Contents About This Document.......................................................................................................15 Intended Audience................................................................................................................................15 New and Changed Information in This Edition...................................................................................15 Typographic Conventions.....................................................................
2.3.1 Determining the LSF Cluster Name and the LSF Execution Host..........................................36 2.4 Getting System Help and Information............................................................................................36 3 Configuring Your Environment with Modulefiles.......................................................37 3.1 Overview of Modules......................................................................................................................37 3.
5.2 Submitting a Serial Job Using LSF-HPC.........................................................................................53 5.2.1 Submitting a Serial Job with the LSF bsub Command............................................................53 5.2.2 Submitting a Serial Job Through SLURM Only......................................................................54 5.3 Submitting a Parallel Job...............................................................................................................
9.3.3 Using the srun Command with LSF-HPC...............................................................................92 9.4 Monitoring Jobs with the squeue Command..................................................................................92 9.5 Terminating Jobs with the scancel Command.................................................................................93 9.6 Getting System Information with the sinfo Command...................................................................93 9.
A.2 Launching a Serial Interactive Shell Through LSF-HPC..............................................................125 A.3 Running LSF-HPC Jobs with a SLURM Allocation Request........................................................126 A.3.1 Example 1. Two Cores on Any Two Nodes..........................................................................126 A.3.2 Example 2. Four Cores on Two Specific Nodes....................................................................127 A.
List of Figures 4-1 4-2 7-1 7-2 7-3 7-4 7-5 10-1 11-1 Library Directory Structure...........................................................................................................51 Recommended Library Directory Structure..................................................................................51 The xcxclus Utility Display...........................................................................................................74 The xcxclus Utility Display Icon...........................
List of Tables 1-1 1-2 3-1 4-1 5-1 10-1 10-2 10-3 Determining the Node Platform...................................................................................................24 HP XC System Interconnects.........................................................................................................26 Supplied Modulefiles....................................................................................................................38 Compiler Commands........................................
List of Examples 5-1 5-2 5-3 5-4 5-5 5-6 5-7 5-8 5-9 5-10 5-11 5-12 5-13 5-14 5-15 5-16 5-17 5-18 8-1 8-2 9-1 9-2 9-3 9-4 9-5 9-6 9-7 9-8 10-1 10-2 10-3 10-4 10-5 10-6 10-7 10-8 10-9 10-10 Submitting a Job from the Standard Input....................................................................................54 Submitting a Serial Job Using LSF-HPC .......................................................................................54 Submitting an Interactive Serial Job Using LSF-HPC only............
About This Document This document provides information about using the features and functions of the HP XC System Software. It describes how the HP XC user and programming environments differ from standard Linux® system environments.
Command
    A command name or qualified command phrase.
Computer output
    Text displayed by the computer.
Ctrl+x
    A key sequence. A sequence such as Ctrl+x indicates that you must hold down the key labeled Ctrl while you press another key or mouse button.
ENVIRONMENT VARIABLE
    The name of an environment variable, for example, PATH.
[ERROR NAME]
    The name of an error, usually returned in the errno variable.
Key
    The name of a keyboard key.
Term User input Variable [] {} ... | WARNING CAUTION IMPORTANT NOTE
HP XC System Software User's Guide Provides an overview of managing the HP XC user environment with modules, managing jobs with LSF, and describes how to build, run, debug, and troubleshoot serial and parallel applications on an HP XC system.
software components are generic, and the HP XC adjective is not added to any reference to a third-party or open source command or product name. For example, the SLURM srun command is simply referred to as the srun command. The location of each Web site or link to a particular topic listed in this section is subject to change without notice by the site provider. • http://www.platform.com Home page for Platform Computing Corporation, the developer of the Load Sharing Facility (LSF).
• http://www.balabit.com/products/syslog_ng/ Home page for syslog-ng, a logging tool that replaces the traditional syslog functionality. The syslog-ng tool is a flexible and scalable audit trail processing tool. It provides a centralized, securely stored log of all devices on the network. • http://systemimager.org Home page for SystemImager®, which is the underlying technology that distributes the golden image to all nodes and distributes configuration changes throughout the system.
MPI Web Sites • http://www.mpi-forum.org Contains the official MPI standards documents, errata, and archives of the MPI Forum. The MPI Forum is an open group with representatives from many organizations that define and maintain the MPI standard. • http://www-unix.mcs.anl.gov/mpi/ A comprehensive site containing general information, such as the specification and FAQs, and pointers to other resources, including tutorials, implementations, and other MPI-related sites. Compiler Web Sites • http://www.
Manpages for third-party software components might be provided as a part of the deliverables for that component. Using discover(8) as an example, you can use either one of the following commands to display a manpage: $ man discover $ man 8 discover If you are not sure about a command you need to use, enter the man command with the -k option to obtain a list of commands that are related to a keyword. For example: $ man -k keyword HP Encourages Your Comments HP encourages comments concerning this document.
1 Overview of the User Environment The HP XC system is a collection of computer nodes, networks, storage, and software, built into a cluster, that work together. It is designed to maximize workload and I/O performance, and to provide the efficient management of large, complex, and dynamic workloads.
$ head /proc/cpuinfo

Table 1-1 presents representative output for each of the platforms. The output may differ according to processor model and other factors.
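The same check can be scripted on any Linux system; this is a hedged sketch (the field names shown vary by platform, as the table indicates):

```shell
# Inspect the processor type; on x86 platforms look for "vendor_id"
# and "model name", on Itanium look for "family".
head -5 /proc/cpuinfo
# Count the CPU cores the kernel sees:
grep -c '^processor' /proc/cpuinfo
```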
login role distributes login requests from users. A node with the login role is referred to as a login node in this manual. The compute role is assigned to nodes where jobs are to be distributed and run. Although all nodes in the HP XC system are capable of carrying out computations, the nodes with the compute role are the primary nodes used to run jobs. Nodes with the compute role become a part of the resource pool used by LSF-HPC and SLURM, which manage and distribute the job workload.
the HP XC. So, for example, if the HP XC system interconnect is based on a Quadrics® QsNet II® switch, then the SFS will serve files over ports on that switch. The file operations are able to proceed at the full bandwidth of the HP XC system interconnect because these operations are implemented directly over the low-level communications libraries.
Additional information on supported system interconnects is provided in the HP XC Hardware Preparation Guide. 1.1.8 Network Address Translation (NAT) The HP XC system uses Network Address Translation (NAT) to enable nodes in the HP XC system that do not have direct external network connections to open outbound network connections to external network resources. 1.
Modulefiles can be loaded into your environment automatically when you log in to the system, or any time you need to alter the environment. The HP XC system does not preload modulefiles. See Chapter 3 “Configuring Your Environment with Modulefiles” for more information. 1.3.3 Commands The HP XC user environment includes standard Linux commands, LSF commands, SLURM commands, HP-MPI commands, and modules commands. This section provides a brief overview of these command sets.
1.4.2 Serial Applications You can build and run serial applications under the HP XC development environment. A serial application is a command or application that does not use any form of parallelism. Full details and examples of how to build, run, debug, and troubleshoot serial applications are provided in “Building Serial Applications”. 1.5 Run-Time Environment This section describes LSF-HPC, SLURM, and HP-MPI, and how these components work together to provide the HP XC run-time environment.
1.5.3 Standard LSF Standard LSF is also available on the HP XC system. The information for using standard LSF is documented in the LSF manuals from Platform Computing. For your convenience, the HP XC documentation CD contains these manuals. 1.5.4 How LSF-HPC and SLURM Interact In the HP XC environment, LSF-HPC cooperates with SLURM to combine the powerful scheduling functionality of LSF-HPC with the scalable parallel job launching capabilities of SLURM.
— however, it manages the global MPI exchange so that all processes can communicate with each other. See the HP-MPI documentation for more information. 1.6 Components, Tools, Compilers, Libraries, and Debuggers This section provides a brief overview of some of the common tools, compilers, libraries, and debuggers available for use on HP XC. An HP XC system is integrated with several open source software components.
2 Using the System This chapter describes the tasks and commands that the general user must know to use the system. It addresses the following topics:

• “Logging In to the System” (page 33)
• “Overview of Launching and Managing Jobs” (page 33)
• “Performing Other Common User Tasks” (page 35)
• “Getting System Help and Information” (page 36)

2.1 Logging In to the System Logging in to an HP XC system is similar to logging in to any standard Linux system.
overview about some basic ways of running and managing jobs. Full information and details about the HP XC job launch environment are provided in “Using SLURM” and the LSF-HPC section “Using LSF-HPC” of this document. 2.2.1 Introduction As described in “Run-Time Environment” (page 29), SLURM and LSF-HPC cooperate to run and manage jobs on the HP XC system, combining LSF-HPC's powerful and flexible scheduling functionality with SLURM's scalable parallel job-launching capabilities.
For more information about using this command and a sample of its output, see “Examining System Core Status” (page 105).

• The LSF lshosts command displays machine-specific information for the LSF execution host node.

$ lshosts

For more information about using this command and a sample of its output, see “Getting Information About the LSF Execution Host Node” (page 105).

• The LSF lsload command displays load information for the LSF execution host node.
2.3.1 Determining the LSF Cluster Name and the LSF Execution Host The lsid command returns the LSF cluster name, the LSF-HPC version, and the name of the LSF execution host:

$ lsid
Platform LSF HPC version number for SLURM, date and time stamp
Copyright 1992-2005 Platform Computing Corporation
My cluster name is hptclsf
My master name is lsfhost.localdomain

In this example, hptclsf is the LSF cluster name, and lsfhost.
3 Configuring Your Environment with Modulefiles The HP XC system supports the use of Modules software to make it easier to configure and modify your environment. Modules software enables dynamic modification of your environment by the use of modulefiles.
access the mpi* scripts and libraries. You can specify the compiler it uses through a variety of mechanisms long after the modulefile is loaded. The previous scenarios were chosen in particular because the HP-MPI mpicc command uses heuristics to try to find a suitable compiler when MPI_CC or other default-overriding mechanisms are not in effect. It is possible that mpicc will choose a compiler inconsistent with the most recently loaded compiler module.
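One way to sidestep the heuristic is to name the compiler explicitly; in this sketch, MPI_CC is the HP-MPI override variable mentioned above, and gcc is only an illustrative choice (these commands assume a cluster with HP-MPI installed):

```shell
# Force mpicc to wrap a specific C compiler instead of letting its
# heuristics pick one (gcc is an illustrative choice).
export MPI_CC=gcc
mpicc -o hello hello.c
```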
Table 3-1 Supplied Modulefiles (continued) Modulefile Sets the HP XC User Environment to Use: icc/8.1/default Intel C/C++ Version 8.1 compilers. icc/9.0/default Intel C/C++ Version 9.0 compilers. icc/9.1/default Intel C/C++ Version 9.1 compilers. idb/7.3/default Intel IDB debugger. idb/9.0/default Intel IDB debugger. idb/9.1/default Intel IDB debugger. ifort/8.0/default Intel Fortran Version 8.0 compilers. ifort/8.1/default Intel Fortran Version 8.1 compilers. ifort/9.
3.3 Modulefiles Automatically Loaded on the System The HP XC system does not load any modulefiles into your environment by default. However, there may be modulefiles designated by your system administrator that are automatically loaded. “Viewing Loaded Modulefiles” describes how you can determine what modulefiles are currently loaded on your system. You can also automatically load your own modules by creating a login script and designating the modulefiles to be loaded in the script.
For example, if you wanted to automatically load the TotalView modulefile when you log in, edit your shell startup script to include the following instructions. This example uses bash as the login shell. Edit the ~/.bashrc file as follows: # if the 'module' command is defined, $MODULESHOME # will be set if [ -n "$MODULESHOME" ]; then module load totalview fi From now on, whenever you log in, the TotalView modulefile is automatically loaded in your environment. 3.
In this example, a user attempted to load the ifort/8.0 modulefile. After the user issued the command to load the modulefile, an error message occurred, indicating a conflict between this modulefile and the ifort/8.1 modulefile, which is already loaded. When a modulefile conflict occurs, unload the conflicting modulefile before loading the new modulefile. In the previous example, you should unload the ifort/8.0 modulefile before loading the ifort/8.1 modulefile.
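Following the example above, a minimal recovery session might look like this (module names taken from the text; the module command is available only where Modules software is installed):

```shell
# Unload the conflicting modulefile, then load the one you want.
module unload ifort/8.0
module load ifort/8.1
module list        # confirm which modulefiles are now loaded
```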
4 Developing Applications This chapter discusses topics associated with developing applications in the HP XC environment. Before reading this chapter, you should read and understand Chapter 1 “Overview of the User Environment” and Chapter 2 “Using the System”.
HP UPC is a parallel extension of the C programming language, which runs on both common types of multiprocessor systems: those with a common global address space (such as SMP) and those with distributed memory. UPC provides a simple shared memory model for parallel programming, allowing data to be shared or distributed among a number of communicating processors.
4.3 Examining Nodes and Partitions Before Running Jobs Before launching an application, you can determine the availability and status of the system's nodes and partitions. Having node and partition information before you launch a job lets you match the job to the resources that are available on the system.
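A typical pre-launch check, assuming the SLURM commands described in Chapter 9 are in your path:

```shell
# Summarize partition and node availability before submitting a job.
sinfo
# Show per-node detail, including node state and core counts.
sinfo -N -l
```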
4.6.1 Serial Application Build Environment You can build and run serial applications in the HP XC programming environment. A serial application is a command or application that does not use any form of parallelism. An example of a serial application is a standard Linux command, such as the ls or hostname command. A serial application is basically a single-core application that has no communication library calls such as MPI. 4.6.
4.7.1.1 Modulefiles The basics of your working environment are set up automatically by your system administrator during the installation of HP XC. However, your application development environment can be modified by means of modulefiles, as described in “Overview of Modules”. There are modulefiles available that you can load yourself to further tailor your environment to your specific application development requirements. For example, the TotalView module is available for debugging applications.
To compile programs that use SHMEM, it is necessary to include the shmem.h file and to use the SHMEM and Elan libraries. For example: $ gcc -o shping shping.c -lshmem -lelan 4.7.1.6 MPI Library The MPI library supports MPI 1.2 as described in the 1997 release of MPI: A Message Passing Interface Standard. Users should note that the MPI specification describes the application programming interface, but does not specify the contents of the MPI header files, mpi.h and mpif.h.
4.7.1.12 MKL Library MKL is a math library that references pthreads, and in enabled environments, can use multiple threads. MKL can be linked in a single-threaded manner with your application by specifying the following in the link command: • On the CP3000 and CP4000 platforms (as appropriate): -L/opt/intel/mkl70/lib/32 -lmkl_ia32 -lguide -pthread -L/opt/intel/mkl70/lib/em64t -lmkl_em64t -lguide -pthread • On the CP6000 platforms: -L/opt/intel/mkl70/lib/64 -lmkl_ipf -lguide -pthread 4.7.1.
To compile and link a C application using the mpicc command: $ mpicc -o mycode hello.c To compile and link a Fortran application using the mpif90 command: $ mpif90 -o mycode hello.f In the above examples, the HP-MPI commands invoke compiler utilities that call the C and Fortran compilers with the appropriate libraries and search paths specified to build the parallel application called hello. The -o option specifies that the resulting program is called mycode. 4.
names. However, HP recommends an alternative method. The dynamic linker, during its attempt to load libraries, will suffix candidate directories with the machine type. The HP XC system on the CP4000 platform uses i686 for 32-bit binaries and x86_64 for 64-bit binaries. HP recommends structuring directories to reflect this behavior.
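The recommended structure can be sketched as follows; mylib and libfoo.so are hypothetical names used only for illustration:

```shell
# Create machine-type subdirectories so the dynamic linker can pick
# the right library flavor from a single search path: mylib/ resolves
# to mylib/i686 for 32-bit binaries and mylib/x86_64 for 64-bit ones.
mkdir -p mylib/i686 mylib/x86_64
touch mylib/i686/libfoo.so      # 32-bit build would be placed here
touch mylib/x86_64/libfoo.so    # 64-bit build would be placed here
ls mylib
```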
NOTE: There is no shortcut as there is for the dynamic loader.
5 Submitting Jobs This chapter describes how to submit jobs on the HP XC system; it addresses the following topics:

• “Overview of Job Submission” (page 53)
• “Submitting a Serial Job Using LSF-HPC” (page 53)
• “Submitting a Parallel Job” (page 55)
• “Submitting a Parallel Job That Uses the HP-MPI Message Passing Interface” (page 56)
• “Submitting a Batch Job or Job Script” (page 60)
• “Submitting a Job from a Host Other Than an HP XC Host” (page 65)
• “Running Preexecution Programs” (page 65)

5.
The srun command is only necessary to launch the job on the allocated nodes if the HP XC JOB STARTER script is not configured to run a job on the compute nodes in the lsf partition. The jobname parameter can be the name of an executable or a batch script. If jobname is an executable, the job is launched on the LSF execution host node. If jobname is a batch script (containing srun commands), the job is launched on an LSF-HPC node allocation (compute nodes).
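As a sketch of the batch-script case, a hypothetical myscript.sh might contain the following; the script itself runs on one node of the allocation, and each srun line fans out across the allocated compute nodes (bsub and srun assumed available on the cluster):

```shell
#!/bin/sh
# Hypothetical batch script, submitted with: bsub -n4 ./myscript.sh
hostname         # runs once, on the first allocated node
srun hostname    # runs on every allocated core
```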
The following is the C source code for this program; the file name is hw_hostname.c.

#include <unistd.h>
#include <stdio.h>

int main()
{
    char name[100];
    gethostname(name, sizeof(name));
    printf("%s says Hello!\n", name);
    return 0;
}

The following is the command line used to compile this program:

$ cc hw_hostname.
bsub -n num-procs [bsub-options] srun [srun-options] jobname [job-options] The bsub command submits the job to LSF-HPC. The -n num-procs parameter, which is required for parallel jobs, specifies the number of cores requested for the job. The num-procs parameter may be expressed as minprocs[,maxprocs] where minprocs specifies the minimum number of cores and the optional value maxprocs specifies the maximum number of cores. The SLURM srun command is required to run jobs on an LSF-HPC node allocation.
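For example, using the minprocs[,maxprocs] form described above (cluster-specific; bsub and srun assumed available):

```shell
# Accept anywhere from 4 to 8 cores, and run hostname on each task
# interactively.
bsub -n 4,8 -I srun hostname
```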
The srun command, used by the mpirun command to launch the MPI tasks in parallel in the lsf partition, determines the number of tasks to launch from the SLURM_NPROCS environment variable that was set by LSF-HPC; this environment variable is equivalent to the number provided by the -n option of the bsub command. Any additional SLURM srun options are job specific, not allocation-specific. The mpi-jobname is the executable file to be run.
With LSF-HPC integrated with SLURM, you can use the LSF-SLURM External Scheduler to specify SLURM options that specify the minimum number of nodes required for the job, specific nodes for the job, and so on. Note: The SLURM external scheduler is a plug-in developed by Platform Computing Corporation for LSF-HPC; it is not actually part of SLURM. This plug-in communicates with SLURM to gather resource information and request allocations of nodes, but it is integrated with the LSF-HPC scheduler.
Example 5-9 Using the External Scheduler to Submit a Job to Run on Specific Nodes $ bsub -n4 -ext "SLURM[nodelist=n6,n8]" -I srun hostname Job <70> is submitted to default queue . <> <> n6 n6 n8 n8 In the previous example, the job output shows that the job was launched from the LSF execution host lsfhost.localdomain, and it ran on four cores on the specified nodes, n6 and n8.
Example 5-13 Using the External Scheduler to Constrain Launching to Nodes with a Given Feature $ bsub -n 10 -ext "SLURM[constraint=dualcore]" -I srun hostname You can use the bqueues command to determine the SLURM scheduler options that apply to jobs submitted to a specific LSF-HPC queue, for example: $ bqueues -l dualcore | grep SLURM MANDATORY_EXTSCHED: SLURM[constraint=dualcore] 5.
Example 5-15 Submitting a Batch Script with the LSF-SLURM External Scheduler Option $ bsub -n4 -ext "SLURM[nodes=4]" -I ./myscript.sh Job <79> is submitted to default queue . <> <> n1 n2 n3 n4 Hello world! I'm 0 of 4 on n1 Hello world! I'm 1 of 4 on n2 Hello world! I'm 2 of 4 on n3 Hello world! I'm 3 of 4 on n4 Example 5-16 and Example 5-17 show how the jobs inside the script can be manipulated within the allocation.
Example 5-18 Environment Variables Available in a Batch Job Script $ cat ./envscript.sh #!/bin/sh name=`hostname` echo "hostname = $name" echo "LSB_HOSTS = '$LSB_HOSTS'" echo "LSB_MCPU_HOSTS = '$LSB_MCPU_HOSTS'" echo "SLURM_JOBID = $SLURM_JOBID" echo "SLURM_NPROCS = $SLURM_NPROCS" $ bsub -n4 -I ./envscript.sh Job <82> is submitted to default queue . <> <
The ping_pong_ring application is submitted twice in a Makefile named mymake; the first time as run1 and the second as run2: $ cat mymake PPR_ARGS=10000 NODES=2 TASKS=4 all: run1 run2 run1: mpirun -srun -N ${NODES} -n ${TASKS} ./ping_pong_ring ${PPR_ARGS} run2: mpirun -srun -N ${NODES} -n ${TASKS} ./ping_pong_ring ${PPR_ARGS} The following command line makes the program and executes it: $ bsub -o %J.out -n2 -ext "SLURM[nodes=2]" make -j2 -f .
1 This line attempts to submit a program that does not exist. The following command line makes the program and executes it: $ bsub -o %J.out -n2 -ext "SLURM[nodes=2]" make -j3 \ -f ./mymake PPR_ARGS=100000 Job <117> is submitted to default queue . The output file contains error messages related to the attempt to launch the nonexistent program. $ cat 117.out . . . mpirun -srun -N 2 -n 4 ./ping_pong_ring 100000 mpirun -srun -N 2 -n 4 ./ping_pong_ring 100000 mpirun -srun -N 2 -n 4 .
5.6 Submitting a Job from a Host Other Than an HP XC Host To submit a job from a host other than an HP XC host to the HP XC system, use the LSF -R option, and the HP XC host type SLINUX64 (defined in lsf.shared) in the job submission resource requirement string.
6 Debugging Applications This chapter describes how to debug serial and parallel applications in the HP XC development environment. In general, effective debugging of applications requires the applications to be compiled with debug symbols, typically the -g switch. Some compilers allow -g with optimization. This chapter addresses the following topics:

• “Debugging Serial Applications” (page 67)
• “Debugging Parallel Applications” (page 67)

6.
6.2.1 Debugging with TotalView TotalView is a full-featured, GUI-based debugger specifically designed to meet the requirements of parallel applications running on many cores. You can purchase the TotalView debugger from Etnus, Inc. for use on the HP XC cluster. TotalView is not included with the HP XC software, and technical support is not provided by HP. Contact Etnus, Inc. for any issues with TotalView. This section provides only minimum instructions to get you started using TotalView.
6.2.1.3 Using TotalView with SLURM Use the following commands to allocate the nodes you need before you debug an application with SLURM, as shown here: $ srun -Nx -A $ mpirun -tv -srun application These commands allocate x nodes and run TotalView to debug the program named application. Be sure to exit from the SLURM allocation created with the srun command when you are done. 6.2.1.4 Using TotalView with LSF-HPC HP recommends the use of xterm when debugging an application with LSF-HPC.
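The SLURM-based sequence above can be expanded into a full session sketch; myapp is a hypothetical application name, and the commands assume a cluster where SLURM and TotalView are installed:

```shell
# Allocate two nodes, debug under TotalView, then release the nodes.
srun -N2 -A            # creates the allocation and starts a subshell
mpirun -tv -srun ./myapp
exit                   # leave the subshell to free the allocation
```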
6.2.1.6 Debugging an Application This section describes how to use TotalView to debug an application. 1. Compile the application to be debugged. For example: $ mpicc -g -o Psimple simple.c -lm Use the -g option to enable debugging information. 2. Run the application in TotalView: $ mpirun -tv -srun -n2 ./Psimple 3. The TotalView main control window, called the TotalView root window, opens. It displays the following message in the window header: Etnus TotalView Version# 4.
6.2.1.7 Debugging Running Applications As an alternative to the method described in “Debugging an Application”, it is also possible to "attach" an instance of TotalView to an application which is already running. 1. Compile a long-running application as in “Debugging an Application”: $ mpicc -g -o Psimple simple.c -lm 2. Run the application: $ mpirun -srun -n2 Psimple 3. Start TotalView: $ totalview 4. Select Unattached in the TotalView Root Window to display a list of running processes.
7 Monitoring Node Activity This chapter describes the optional utilities that provide performance information about the set of nodes associated with your jobs.
Figure 7-1 The xcxclus Utility Display The icons show most node utilization statistics as a percentage of the total resource utilization. For example, Figure 7-1 indicates that the CPU cores are almost fully utilized, at 94 and 95 percent of available CPU time. These values are rounded to the nearest integer. Selecting (that is, clicking on) an icon automatically invokes another utility, xcxperf, described in “Using the xcxperf Utility to Display Node Performance” (page 77).
Figure 7-2 The xcxclus Utility Display Icon The following describes the format of the node icons: 1. 2. The node designator is on the upper left of the icon. The left portion of the icon represents the Ethernet connection or connections. In this illustration, two Ethernet connections are used. The data for eth0 is above the data for eth1. As many as 4 Ethernet connections can be displayed. 3. The center portion of the icon displays core usage data for each CPU core in the node.
The following describes the menu options at the top of the xcxclus display window:

File
    Exit...      Terminates the xcxclus utility.

Options
    Utilization  Enables you to specify the utilization data in terms of cumulative or incremental utilization.
    Refresh...   Opens a dialog box that enables you to set the refresh rate.
    CPU Info     Enables you to display core utilization in terms of user or system statistics, or both.
    Key          Turns off the display of the color key at the bottom of the display.
Figure 7-3 The clusplot Utility Display The clusplot utility uses the GNUplot open source plotting program. 7.4 Using the xcxperf Utility to Display Node Performance The xcxperf utility provides a graphic display of node performance for a variety of metrics. You can invoke the xcxperf utility either by entering it on the command line or by selecting a node icon in the xcxclus display. The xcxperf utility displays a dynamic graph showing the performance metrics for the node.
When you specify the -o option and a prefix, the xcxperf utility generates a data file for the node it monitors; this data file differs from those generated by the xcxclus utility. The following example runs the xcxperf utility and stores the output in a file named test.
Specifying the data file prefix when you invoke the xcxperf utility from the command line plays back the display according to the recorded data. The following command line plays back the test.xcxperf data file: $ xcxperf test The graphical display differs from the depiction in Figure 7-4 because there is an additional pull-down menu named Control next to the File menu. Choosing the Play... option from the Control menu opens a dialog box that you can use to control the playback.
Figure 7-5 The perfplot Utility Display 7.6 Running Performance Health Tests You can run the ovp command to generate reports on the performance health of the nodes. Use the following format to run a specific performance health test:

ovp [options] [-verify=perf_health/test]

Where:

options
    Specify additional command line options for the test. The ovp --help perf_health command lists the command line options for each test.
NOTE: The --nodelist=nodelist option is particularly useful for determining problematic nodes. If you use this option and the --nnodes=n option, the --nnodes=n option is ignored.

The --queue LSF_queue option specifies the LSF queue for the performance health tests.

test
    Indicates the test to perform. The following tests are available:

    cpu         Tests CPU core performance using the Linpack benchmark.
    cpu_usage   Tests CPU core usage. All CPU cores should be idle during the test.
    network_stress
    network_bidirectional
    network_unidirectional

By default, the ovp command reports whether the nodes passed or failed the given test. Use the ovp --verbose option to display additional information. The results of the test are written to a file in your home directory. The file name has the form ovp_node_date[rx].log, where node is the node from which the command was launched and date is a date stamp in the form mmddyy.
The following example tests the memory of nodes n11, n12, n13, n14, and n15. The --keep option preserves the test data in a temporary directory. $ ovp --verbose --opts=--nodelist=n[11-15] --keep \ -verify=perf_health/memory XC CLUSTER VERIFICATION PROCEDURE date time Verify perf_health: Testing memory ... Specified nodelist is n[11-15] Number of nodes allocated for this test is 5 Job <103> is submitted to default queue . <> <
8 Tuning Applications This chapter discusses how to tune applications in the HP XC environment. 8.1 Using the Intel Trace Collector and Intel Trace Analyzer This section describes how to use the Intel Trace Collector (ITC) and Intel Trace Analyzer (ITA) with HP-MPI on an HP XC system. The Intel Trace Collector/Analyzer were formerly known as VampirTrace and Vampir, respectively.
Example 8-1 The vtjacobic Example Program

For the purposes of this example, the examples directory under /opt/IntelTrace/ITC is copied to the user's home directory and renamed to examples_directory. The GNU Makefile looks as follows:

CC      = mpicc.mpich
F77     = mpif77.mpich
CLINKER = mpicc.mpich
FLINKER = mpif77.mpich
8.2 The Intel Trace Collector and Analyzer with HP-MPI on HP XC

NOTE: The Intel Trace Collector (ITC) was formerly known as VampirTrace. The Intel Trace Analyzer was formerly known as Vampir.

8.2.1 Installation Kit

The following are installation-related notes. There are two installation kits for the Intel Trace Collector:

• ITC-IA64-LIN-MPICH-PRODUCT.4.0.2.1.tar.gz
• ITA-IA64-LIN-AS21-PRODUCT.4.0.2.1.tar.gz

The Intel Trace Collector is installed in the /opt/IntelTrace/ITC directory.
Running a Program

Ensure that the -static-libcxa flag is used when you use mpirun.mpich to launch a C or Fortran program. The following is a C example called vtjacobic:

# mpirun.mpich -np 2 ~/xc_PDE_work/ITC_examples_xc6000/vtjacobic
warning: this is a development version of HP-MPI for internal R&D use only
/nis.home/user_name/xc_PDE_work/ITC_examples_xc6000/vtjacobic:
100 iterations in 0.228252 secs (28.712103 MFlops), m=130 n=130 np=2
[0] Intel Trace Collector INFO: Writing tracefile vtjacobic.stf
[0] Intel Trace Collector INFO: Writing tracefile vtjacobif.stf in
/nis.home/user_name/xc_PDE_work/ITC_examples_xc6000
mpirun exits with status: 0

Running a Program Across Nodes (Using LSF)

The following is an example that uses the LSF bsub command to run the program named vtjacobic across four nodes:

# bsub -n4 -I mpirun.mpich -np 2 ./vtjacobic

The license file and the ITC directory need to be distributed across the nodes.
9 Using SLURM

HP XC uses the Simple Linux Utility for Resource Management (SLURM) for system resource management and job scheduling.
The srun command handles both serial and parallel jobs. The srun command has many options for controlling the execution of your application closely, but you can also use it for a simple launch of a serial program, as Example 9-1 shows.

Example 9-1 Simple Launch of a Serial Program

$ srun hostname
n1

9.3.1 The srun Roles and Modes

The srun command submits jobs to run under SLURM management. The srun command can perform many roles in launching and managing your job.
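As a sketch of how srun options shape a launch, a command such as the following runs four tasks across two nodes and labels each output line with its task number (the node names and output shown are hypothetical):

```shell
$ srun -n4 -N2 -l hostname
0: n1
1: n1
2: n2
3: n2
```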
Example 9-2 Displaying Queued Jobs by Their JobIDs

$ squeue --jobs 12345,12346
JOBID  PARTITION  NAME  USER  ST  TIME_USED  NODES  NODELIST(REASON)
12345  debug      job1  jody  R   0:21       4      n[9-12]
12346  debug      job2  jody  PD  0:00       8

The squeue command can report on jobs in the job queue according to their state; possible states are: pending, running, completing, completed, failed, timeout, and node_fail. Example 9-3 uses the squeue command to report on failed jobs.
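In the spirit of Example 9-3, a state filter such as the following reports only failed jobs (the job shown is hypothetical; check your SLURM version for the exact option spelling):

```shell
$ squeue --states=FAILED
JOBID  PARTITION  NAME  USER  ST  TIME_USED  NODES  NODELIST(REASON)
59     debug      job3  jody  F   0:12       2      n[3-4]
```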
Example 9-8 Reporting Reasons for Downed, Drained, and Draining Nodes

$ sinfo -R
REASON          NODELIST
Memory errors   n[0,5]
Not Responding  n8

9.7 Job Accounting

HP XC System Software provides an extension to SLURM for job accounting. The sacct command displays job accounting data in a variety of forms for your analysis. Job accounting data is stored in a log file; the sacct command filters that log file to report on your jobs, jobsteps, status, and errors.
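A minimal sketch of filtering the accounting log for a single job; the job ID and the output columns shown here are hypothetical:

```shell
$ sacct --jobs 123
Jobstep  Jobname  Partition  Ncpus  Status     Error
123      job1     lsf        8      COMPLETED  0
```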
10 Using LSF-HPC

The Load Sharing Facility (LSF-HPC) from Platform Computing Corporation is a batch system resource manager used on the HP XC system. On an HP XC system, a job is submitted to LSF-HPC, which places the job in a queue and allows it to run when the necessary resources become available. In addition to launching jobs, LSF-HPC provides extensive job management and information capabilities.
LSF-HPC is installed and configured on all nodes of the HP XC system by default. Nodes without the compute role are closed, with '0' job slots available for use. The LSF environment is set up automatically for the user on login; LSF commands and their manpages are readily accessible:

• The bhosts command is useful for viewing LSF batch host information.
• The lshosts command provides static resource information.
• The lsload command provides dynamic resource information.
SLURM_JOBID This environment variable is created so that subsequent srun commands make use of the SLURM allocation created by LSF-HPC for the job. This variable can be used by a job script to query information about the SLURM allocation, as shown here: $ squeue --jobs $SLURM_JOBID “Translating SLURM and LSF-HPC JOBIDs” describes the relationship between the SLURM_JOBID and the LSF-HPC JOBID.
Example 10-2 Examples of Launching LSF-HPC Jobs Without the srun Command

The following bsub command line invokes the bash shell to run the hostname command with the pdsh command:

[lsfadmin@n16 ~]$ bsub -n4 -I -ext "SLURM[nodes=4]" /bin/bash -c 'pdsh -w "$LSB_HOSTS" hostname'
Job <118> is submitted to default queue.
• LSF-HPC integrated with SLURM runs daemons on only one node within the HP XC system. This node hosts an HP XC LSF Alias, which is an IP address and corresponding host name specifically established for LSF-HPC integrated with SLURM on HP XC to use. The HP XC system is known by this HP XC LSF Alias within LSF. Various LSF-HPC commands, such as lsid, lshosts, and bhosts, display the HP XC LSF Alias in their output. The default value of the HP XC LSF Alias is lsfhost.localdomain.
sometime in the future, depending on resource availability and batch system scheduling policies. Batch job submissions typically provide instructions on I/O management, such as files from which to read input and filenames to collect output. By default, LSF-HPC jobs are batch jobs. The output is e-mailed to the user, which requires that e-mail be set up properly. SLURM batch jobs are submitted with the srun -b command. By default, the output is written to $CWD/slurm-SLURMjobID.
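A sketch of such a SLURM batch submission (the script name, job ID, and response shown are hypothetical); the output file lands in the submission directory:

```shell
$ srun -b myscript.sh
srun: jobid 54 submitted
$ cat slurm-54.out
n1
```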
LSF-HPC allocates the appropriate whole node for exclusive use by the serial job in the same manner as it does for parallel jobs, hence the name “pseudo-parallel”.

Parallel job    A job that requests more than one slot, regardless of any other constraints. Parallel jobs are allocated up to the maximum number of nodes specified by the following specifications:

• SLURM[nodes=min-max] (if specified)
• SLURM[nodelist=node_list] (if specified)
• bsub -n

Parallel jobs and serial jobs cannot run on the same node.
The HP XC system has several features that make it optimal for running parallel applications, particularly (but not exclusively) MPI applications. You can use the bsub command's -n option to request more than one core for a job. This option, coupled with the external SLURM scheduler, discussed in “LSF-SLURM External Scheduler”, gives you much flexibility in selecting resources and shaping how the job is executed on those resources.
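As an illustration, the following request (script name hypothetical) asks for 8 cores spread over exactly 4 nodes by combining -n with the external SLURM scheduler option:

```shell
$ bsub -n8 -ext "SLURM[nodes=4]" -o output.out ./myscript
```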
Figure 10-1 How LSF-HPC and SLURM Launch and Manage a Job

[Figure 10-1 shows a user on login node n16 submitting a job with
$ bsub -n4 -ext "SLURM[nodes=4]" -o output.out ./myscript.
The LSF execution host (lsfhost.localdomain) runs the job_starter.sh script, which uses srun (with SLURM_JOBID=53 and SLURM_NPROCS=4) to start myscript on the first allocated compute node, n1. myscript then runs hostname locally, srun hostname across compute nodes n1 through n4, and mpirun -srun ./hellompi.]
4. LSF-HPC prepares the user environment for the job on the LSF execution host node and dispatches the job with the job_starter.sh script. This user environment includes standard LSF environment variables and two SLURM-specific environment variables: SLURM_JOBID and SLURM_NPROCS. SLURM_JOBID is the SLURM job ID of the job. Note that this is not the same as the LSF-HPC jobID. “Translating SLURM and LSF-HPC JOBIDs” describes the relationship between the SLURM_JOBID and the LSF-HPC JOBID.
10.10.1 Examining System Core Status

The bhosts command displays LSF-HPC resource usage information and is useful for examining the status of the system cores. The bhosts command provides a summary of the jobs on the system and information about the current state of LSF-HPC; for example, it can be used to determine if LSF-HPC is ready to start accepting batch jobs.
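On an idle system the summary might look as follows; the host name is the HP XC LSF alias, MAX is the total number of job slots, and the values shown are hypothetical:

```shell
$ bhosts
HOST_NAME            STATUS  JL/U  MAX  NJOBS  RUN  SSUSP  USUSP  RSV
lsfhost.localdomain  ok      -     8    0      0    0      0      0
```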
• The maxmem column displays the minimum maxmem over all available compute nodes in the lsf partition.
• The maxtmp column (not shown) displays the minimum maxtmp over all available compute nodes in the lsf partition. Use the lshosts -l command to display this column.

10.10.3 Getting Host Load Information

The LSF lsload command displays load information for LSF execution hosts.

$ lsload
HOST_NAME lsfhost.
10.11 Getting Information About Jobs

There are several ways you can get information about a specific job after it has been submitted to LSF-HPC integrated with SLURM. This section briefly describes some of the commands that are available under LSF-HPC integrated with SLURM to gather information about a job.
Example 10-3 Job Allocation Information for a Running Job

$ bjobs -l 24
Job <24>, User , Project , Status , Queue , Interactive pseudo-terminal shell mode, Extsched , Command
date and time stamp: Submitted from host , CWD <$HOME>, 4 Processors Requested, Requested Resources ;
date and time stamp: Started on 4 Hosts/Processors <4*lsfhost.
Example 10-5 Using the bjobs Command (Short Output)

$ bjobs 24
JOBID  USER    STAT  QUEUE   FROM_HOST  EXEC_HOST            JOB_NAME   SUBMIT_TIME
24     msmith  RUN   normal  n16        lsfhost.localdomain  /bin/bash  date and time

As shown in the previous output, the bjobs command returns information that includes the job ID, user name, job status, queue name, submitting host, executing host, job name, and submit time.
Table 10-2 Output Provided by the bhist Command

Field      Description
JOBID      The job ID that LSF-HPC assigned to the job.
USER       The user who submitted the job.
JOB_NAME   The job name assigned by the user.
PEND       The total waiting time, excluding user suspended time, before the job is dispatched.
PSUSP      The total user suspended time of a pending job.
RUN        The total run time of the job.
USUSP      The total user suspended time after the job is dispatched.
$ bjobs -l 99 | grep slurm date and time stamp: slurm_id=123;ncpus=8;slurm_alloc=n[13-16]; The SLURM JOBID is 123 for the LSF JOBID 99. You can also find the allocation information in the output of the bhist command: $ bhist -l 99 | grep slurm date and time stamp: slurm_id=123;ncpus=8;slurm_alloc=n[13-16]; When LSF-HPC creates an allocation in SLURM, it constructs a name for the allocation by combining the LSF cluster name with the LSF-HPC JOBID.
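The slurm_id field can be extracted from that line mechanically; a minimal sketch (the sample line is hypothetical):

```shell
# Pull the SLURM job ID out of a "bjobs -l ... | grep slurm" line.
line='date and time stamp: slurm_id=123;ncpus=8;slurm_alloc=n[13-16];'
slurm_id=$(printf '%s\n' "$line" | sed -n 's/.*slurm_id=\([0-9]*\);.*/\1/p')
echo "$slurm_id"    # prints 123
```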
$ bjobs -l 124 | grep slurm
date and time stamp: slurm_id=150;ncpus=8;slurm_alloc=n[1-4];

LSF allocated nodes n[1-4] for this job. The SLURM JOBID is 150 for this allocation. Begin your work in another terminal. Use ssh to log in to one of the compute nodes. If you want to run tasks in parallel, use the srun command with the --jobid option to specify the SLURM JOBID.
Example 10-10 Launching an Interactive MPI Job on All Cores in the Allocation

This example assumes 2 cores per node.
10.14 LSF-HPC Equivalents of SLURM srun Options

Table 10-3 describes the srun options and lists their LSF-HPC equivalents.

Table 10-3 LSF-HPC Equivalents of SLURM srun Options

srun Option            Description                               LSF-HPC Equivalent
-n, --ntasks=ntasks    Number of processes (tasks) to run.       bsub -n num
-c, --cpus-per-task    Specifies the number of cores per task.
Table 10-3 LSF-HPC Equivalents of SLURM srun Options (continued)

srun Option               Description                              LSF-HPC Equivalent
-i, --input=none,tasked   Specify how stdin is to be redirected.   Use as an argument to srun when launching parallel tasks.
-e, --error=none,tasked   Specify how stderr is to be redirected.  bsub -e error_file; also usable as an argument to srun when launching parallel tasks.
-J                        Specify a name for the job.              You cannot use this option; when creating the allocation, LSF-HPC sets the SLURM job name automatically.
Table 10-3 LSF-HPC Equivalents of SLURM srun Options (continued)

srun Option                        Description                                      LSF-HPC Equivalent
-l                                 Prepend task number to lines of stdout/stderr.   Use as an argument to srun when launching parallel tasks.
-u, --unbuffered                   Do not line buffer stdout from remote tasks.     Use as an argument to srun when launching parallel tasks.
-m, --distribution=block|cyclic    Distribution method for remote processes.        Use as an argument to srun when launching parallel tasks.
11 Advanced Topics

This chapter covers topics intended for the advanced user. This chapter addresses the following topics:

• “Enabling Remote Execution with OpenSSH” (page 117)
• “Running an X Terminal Session from a Remote Node” (page 117)
• “Using the GNU Parallel Make Capability” (page 119)
• “Local Disks on Compute Nodes” (page 122)
• “I/O Performance Considerations” (page 123)
• “Communication Between Nodes” (page 123)
$ echo $DISPLAY
:0

Next, get the name of the local machine serving your display monitor:

$ hostname
mymachine

Then, use the host name of your local machine to retrieve its IP address:

$ host mymachine
mymachine has address 14.26.206.134

Step 2. Logging in to HP XC System

Next, you need to log in to a login node on the HP XC system. For example:

$ ssh user@xc-node-name

Once logged in to the HP XC system, you can start an X terminal session using SLURM or LSF-HPC.
$ sinfo
PARTITION  AVAIL  TIMELIMIT  NODES  STATE  NODELIST
lsf        up     infinite   2      idle   n[46,48]

According to the information returned about this HP XC system, LSF-HPC has two nodes available for use, n46 and n48. Determine the address of your monitor's display server, as shown at the beginning of “Running an X Terminal Session from a Remote Node”. You can start an X terminal session using this address information in a bsub command with the appropriate options.
One way is to prefix the actual compilation line in the rule with an srun command. So, instead of executing cc foo.c -o foo.o, the rule executes srun cc foo.c -o foo.o. With concurrency, make runs multiple srun commands instead of multiple cc commands. For projects that recursively run make on subdirectories, the recursive make can be run on the compute nodes. For example:

$ cd subdir; srun $(MAKE)...
	then \
	  echo "Making $$i ..."; \
	  (cd $$i; make); \
	  echo ""; \
	fi; \
	done

clean:
	@ \
	for i in ${HYPRE_DIRS}; \
	do \
	  if [ -d $$i ]; \
	  then \
	    echo "Cleaning $$i ..."; \
	    (cd $$i; make clean); \
	  fi; \
	done

veryclean:
	@ \
	for i in ${HYPRE_DIRS}; \
	do \
	  if [ -d $$i ]; \
	  then \
	    echo "Very-cleaning $$i ..."; \
	    (cd $$i; make veryclean); \
	  fi; \
	done

11.3.1 Example Procedure 1

Go through the directories serially and have the make procedure within each directory be parallel.
	$(MAKE) $(MAKE_J) struct_matrix_vector/libHYPRE_mv.a \
	  struct_linear_solvers/libHYPRE_ls.a utilities/libHYPRE_utilities.a
	$(PREFIX) $(MAKE) -C test

struct_matrix_vector/libHYPRE_mv.a:
	$(PREFIX) $(MAKE) -C struct_matrix_vector

struct_linear_solvers/libHYPRE_ls.a:
	$(PREFIX) $(MAKE) -C struct_linear_solvers

utilities/libHYPRE_utilities.a:
	$(PREFIX) $(MAKE) -C utilities

The modified Makefile is invoked as follows:

$ make PREFIX='srun -n1 -N1' MAKE_J='-j4'
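The PREFIX idiom can be tried without a cluster by leaving PREFIX empty (on an HP XC system it would be 'srun -n1 -N1'); this self-contained sketch uses a hypothetical makefile whose recipes only echo, so make -j shows the same fan-out that srun-prefixed compiles would have:

```shell
# Write a toy makefile whose recipe lines are prefixed by $(PREFIX).
cat > /tmp/prefix_demo.mk <<'EOF'
PREFIX =
objs = a.o b.o c.o
all: $(objs)
%.o: ; @$(PREFIX) echo building $@
EOF
# Run the three per-target recipes concurrently, as with MAKE_J='-j4'.
out=$(make -s -j4 -f /tmp/prefix_demo.mk)
printf '%s\n' "$out"
```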
11.5 I/O Performance Considerations

Before building and running your parallel application, consider the I/O performance issues on the HP XC cluster. The I/O control system provides two basic types of standard file system views to the application:

• Shared
• Private

11.5.1 Shared File View

Although a file opened by multiple processes of an application is shared, each core maintains a private file pointer and file position.
respectively. These subsections are not full solutions for integrating MPICH with the HP XC System Software.

Figure 11-1 MPICH Wrapper Script

#!/bin/csh
srun csh -c 'echo `hostname`:2' | sort | uniq > machinelist
set hostname = `head -1 machinelist | awk -F: '{print $1}'`
ssh $hostname /opt/mpich/bin/mpirun options... -machinefile machinelist a.out

The wrapper script is based on the following assumptions:

• Each node in the HP XC system contains two CPUs.
A Examples

This appendix provides examples that illustrate how to build and run applications on the HP XC system. The examples in this section show you how to take advantage of some of the many methods available, and demonstrate a variety of other user commands to monitor, control, or kill jobs. The examples in this section assume that you have read the information in previous chapters describing how to use the HP XC commands to build and run parallel applications.
Examine the partition information:

$ sinfo
PARTITION  AVAIL  TIMELIMIT  NODES  STATE  NODELIST
lsf        up     infinite   6      idle   n[5-10]

Examine the local host information:

$ hostname
n2

Examine the job information:

$ bjobs
No unfinished job found

Run the LSF bsub -Is command to launch the interactive shell:

$ bsub -Is -n1 /bin/bash
Job <120> is submitted to default queue.
date and time stamp: Submitted from host , CWD <$HOME>, 2 Processors Requested; date and time stamp: Started on 2 Hosts/Processors <2*lsfhost.localdomain>; date and time stamp: slurm_id=24;ncpus=4;slurm_alloc=n[13-14]; date and time stamp: Done successfully. The CPU time used is 0.0 seconds.
example steps through a series of commands that illustrate what occurs when you launch an interactive shell.

Examine the LSF execution host information:

$ bhosts
HOST_NAME STATUS lsfhost.
Summary of time in seconds spent in various states by date and time:

PEND  PSUSP  RUN  USUSP  SSUSP  UNKWN  TOTAL
11    0      124  0      0      0      135

Exit from the shell:

$ exit
exit

Examine the finished job's information:

$ bhist -l 124
Job <124>, User , Project , Interactive pseudo-terminal shell mode, Extsched , Command
date and time stamp: Submitted from host , to Queue , CWD <$HOME>, 4 Processors Requested, Requested Resources ;
date and time stamp: Dispat
srun hostname
srun uname -a

Run the job:

$ bsub -I -n4 myjobscript.sh
Job <1006> is submitted to default queue.
n14
n14
n16
n16
Linux n14 2.4.21-15.3hp.XCsmp #2 SMP date and time stamp ia64 ia64 ia64 GNU/Linux
Linux n14 2.4.21-15.3hp.XCsmp #2 SMP date and time stamp ia64 ia64 ia64 GNU/Linux
Linux n16 2.4.21-15.3hp.XCsmp #2 SMP date and time stamp ia64 ia64 ia64 GNU/Linux
Linux n16 2.4.21-15.3hp.
Show the SLURM job ID:

$ env | grep SLURM
SLURM_JOBID=74
SLURM_NPROCS=8

Run some commands from the pseudo-terminal:

$ srun hostname
n13
n13
n14
n14
n15
n15
n16
n16

$ srun -n3 hostname
n13
n14
n15

Exit the pseudo-terminal:

$ exit
exit

View the interactive jobs:

$ bjobs -l 1008
Job <1008>, User smith, Project , Status , Queue , Interactive pseudo-terminal mode, Command
date and time stamp: Submitted from host n16, CWD <$HOME/tar_drop1/test>, 8 Processors Requested;
date and
View the node state:

$ sinfo
PARTITION  AVAIL  TIMELIMIT  NODES  STATE  NODELIST
lsf        up     infinite   4      idle   n[13-16]

A.7 Submitting an HP-MPI Job with LSF-HPC

This example shows how to run an MPI job with the bsub command. Show the environment:

$ lsid
Platform LSF HPC version number for SLURM, date and time stamp
Copyright 1992-2006 Platform Computing Corporation
My cluster name is penguin
My master name is lsfhost.localdomain
EXTERNAL MESSAGES: MSG_ID FROM POST_TIME 0 1 lsfadmin date and time MESSAGE SLURM[nodes=2] ATTACHMENT N View the finished job: $ bhist -l 1009 Job <1009>, User , Project , Interactive mode, Extsched , Command date and time stamp: Submitted from host , to Queue ,CWD <$HOME>, 6 Processors Requested; date and time stamp: Dispatched to 6 Hosts/Processors <6*lsfhost.
Glossary A administration branch The half (branch) of the administration network that contains all of the general-purpose administration ports to the nodes of the HP XC system. administration network The private network within the HP XC system that is used for administrative operations. availability set An association of two individual nodes so that one node acts as the first server and the other node acts as the second server of a service. See also improved availability, availability tool.
operating system and its loader. Together, these provide a standard environment for booting an operating system and running preboot applications. enclosure The hardware and software infrastructure that houses HP BladeSystem servers. extensible firmware interface See EFI. external network node A node that is connected to a network external to the HP XC system. F fairshare An LSF job-scheduling policy that specifies how resources should be shared by competing users.
image server A node specifically designated to hold images that will be distributed to one or more client systems. In a standard HP XC installation, the head node acts as the image server and golden client. improved availability A service availability infrastructure that is built into the HP XC system software to enable an availability tool to fail over a subset of eligible services to nodes that have been designated as a second server of the service See also availability set, availability tool.
LVS Linux Virtual Server. Provides a centralized login capability for system users. LVS handles incoming login requests and directs them to a node with a login role. M Management Processor See MP. master host See LSF master host. MCS An optional integrated system that uses chilled water technology to triple the standard cooling capacity of a single rack. This system helps take the heat out of high-density deployments of servers and blades, enabling greater densities in data centers.
onboard administrator See OA. P parallel application An application that uses a distributed programming model and can run on multiple processors. An HP XC MPI application is a parallel application. That is, all interprocessor communication within an HP XC parallel application is performed through calls to the MPI message passing library. PXE Preboot Execution Environment.
an HP XC system, the use of SMP technology increases the number of CPUs (amount of computational power) available per unit of space. ssh Secure Shell. A shell program for logging in to and executing commands on a remote computer. It can provide secure encrypted communications between two untrusted hosts over an insecure network. standard LSF A workload manager for any kind of batch job.