HP XC System Software User's Guide Version 3.2

9.3.3 Using the srun Command with LSF-HPC...............................................................................92
9.4 Monitoring Jobs with the squeue Command..................................................................................92
9.5 Terminating Jobs with the scancel Command.................................................................................93
9.6 Getting System Information with the sinfo Command...................................................................93
9.7 Job Accounting................................................................................................................................94
9.8 Fault Tolerance................................................................................................................................94
9.9 Security............................................................................................................................................94
10 Using LSF-HPC............................................................................................................95
10.1 Information for LSF-HPC..............................................................................................................95
10.2 Overview of LSF-HPC Integrated with SLURM...........................................................................96
10.3 Differences Between LSF-HPC and LSF-HPC Integrated with SLURM.......................................98
10.4 Job Terminology............................................................................................................................99
10.5 Using LSF-HPC Integrated with SLURM in the HP XC Environment.......................................101
10.5.1 Useful Commands...............................................................................................................101
10.5.2 Job Startup and Job Control.................................................................................................101
10.5.3 Preemption..........................................................................................................................101
10.6 Submitting Jobs............................................................................................................................101
10.7 LSF-SLURM External Scheduler..................................................................................................102
10.8 How LSF-HPC and SLURM Launch and Manage a Job.............................................................102
10.9 Determining the LSF Execution Host..........................................................................................104
10.10 Determining Available System Resources.................................................................................104
10.10.1 Examining System Core Status..........................................................................................105
10.10.2 Getting Information About the LSF Execution Host Node...............................................105
10.10.3 Getting Host Load Information.........................................................................................106
10.10.4 Examining System Queues................................................................................................106
10.10.5 Getting Information About the lsf Partition......................................................................106
10.11 Getting Information About Jobs................................................................................................107
10.11.1 Getting Job Allocation Information...................................................................................107
10.11.2 Examining the Status of a Job............................................................................................108
10.11.3 Viewing the Historical Information for a Job....................................................................109
10.12 Translating SLURM and LSF-HPC JOBIDs...............................................................................110
10.13 Working Interactively Within an Allocation..............................................................................111
10.14 LSF-HPC Equivalents of SLURM srun Options........................................................................114
11 Advanced Topics......................................................................................................117
11.1 Enabling Remote Execution with OpenSSH................................................................................117
11.2 Running an X Terminal Session from a Remote Node................................................................117
11.3 Using the GNU Parallel Make Capability...................................................................................119
11.3.1 Example Procedure 1...........................................................................................................121
11.3.2 Example Procedure 2...........................................................................................................121
11.3.3 Example Procedure 3...........................................................................................................122
11.4 Local Disks on Compute Nodes..................................................................................................122
11.5 I/O Performance Considerations.................................................................................................123
11.5.1 Shared File View..................................................................................................................123
11.5.2 Private File View..................................................................................................................123
11.6 Communication Between Nodes.................................................................................................123
11.7 Using MPICH on the HP XC System...........................................................................................123
11.7.1 Using MPICH with SLURM Allocation..............................................................................124
11.7.2 Using MPICH with LSF Allocation.....................................................................................124
A Examples....................................................................................................................125
A.1 Building and Running a Serial Application.................................................................................125