Notes on lsf-hpc – HP XC System 3.x Software User Manual


exclude=list-of-nodes

contiguous=yes

The srun(1) manpage provides details on these options and their arguments.

The following are interactive examples showing how these options can be used on an HP XC system.

To launch the hostname command on 10 cores in parallel:

$ bsub -n 10 -I srun hostname

To launch the hostname command on 10 nodes in parallel:

$ bsub -n 10 -ext "SLURM[nodes=10]" -I srun hostname

To launch the hostname command on 10 nodes in parallel, but avoiding node n16:

$ bsub -n 10 -ext "SLURM[nodes=10;exclude=n16]" -I srun hostname

To launch the hostname command on 10 cores on nodes with a dualcore SLURM feature assigned
to them:

$ bsub -n 10 -ext "SLURM[constraint=dualcore]" -I srun hostname

To launch the hostname command once on nodes n1 through n10 (n[1-10]):

$ bsub -n 10 -ext "SLURM[nodelist=n[1-10]]" srun hostname

To determine the external SLURM scheduler options that apply to jobs submitted to the LSF dualcore
queue:

$ bqueues -l dualcore | grep SLURM

MANDATORY_EXTSCHED: SLURM[constraint=dualcore]

Notes on LSF-HPC

The following are noteworthy items for users of LSF-HPC on HP XC systems:

A SLURM partition named lsf is used to manage LSF-HPC jobs. You can view information about this
partition with the sinfo command.
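For example, the following command displays the state, node count, and node list of the lsf partition:

$ sinfo -p lsf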

LSF-HPC daemons only run on one node in the HP XC system. As a result, the lshosts and bhosts
commands only list one host that represents all the resources of the HP XC system.

The total number of cores listed by the lshosts and bhosts commands for that host should be equal
to the total number of cores assigned to the SLURM lsf partition.

When a job is submitted and the resources are available, LSF-HPC creates a properly sized SLURM
allocation and adds several standard LSF environment variables to the environment in which the job is
to be run. The following two environment variables are also added:

SLURM_JOBID

This environment variable is created so that subsequent srun commands make
use of the SLURM allocation created by LSF-HPC for the job. This variable can be
used by a job script to query information about the SLURM allocation, as shown
here:

$ squeue --jobs $SLURM_JOBID

"Translating SLURM and LSF-HPC JOBIDs" describes the relationship between the SLURM_JOBID and the LSF-HPC JOBID.

SLURM_NPROCS

This environment variable passes along the total number of tasks requested with
the bsub -n option to all subsequent srun commands. User scripts can
override this value with the srun -n option, but the new value must be less
than or equal to the original number of requested tasks.
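These two variables can be used together in a job script. The following is a minimal sketch, in which myapp stands for a hypothetical application binary:

$ cat myjob.sh
#!/bin/sh
# Show the SLURM allocation that LSF-HPC created for this job
squeue --jobs $SLURM_JOBID
# srun inherits SLURM_NPROCS, so this launches one task per requested core
srun myapp

$ bsub -n 4 -I ./myjob.sh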

Use the bjobs -l and bhist -l LSF commands to see the components of the actual SLURM allocation
command.

Use the bkill command to kill jobs.
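For example, to kill the job with LSF-HPC JOBID 123 (a placeholder; use the JOBID reported by bjobs):

$ bkill 123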

72

Using LSF
