HP XC System 3.x Software User Manual

LSF-HPC allocates the appropriate whole node for exclusive
use by the serial job in the same manner as it does for parallel
jobs, hence the name “pseudo-parallel”.

Parallel job

A job that requests more than one slot, regardless of any other
constraints. Parallel jobs are allocated up to the maximum
number of nodes allowed by the following specifications:

SLURM[nodes=min-max] (if specified)
SLURM[nodelist=node_list] (if specified)
bsub -n

Parallel jobs and serial jobs cannot run on the same node.
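The node cap for a parallel job can be modeled as the smallest of the limits listed above. This is a minimal sketch that rests on one assumption: each specification acts as an independent upper bound. Under that reading, `bsub -n` caps the node count because every allocated node holds at least one slot. The function name and parameters are hypothetical and are not part of LSF-HPC.

```python
from typing import Optional, Sequence


def max_nodes_for_parallel_job(
    n_slots: int,
    nodes_range: Optional[tuple[int, int]] = None,
    nodelist: Optional[Sequence[str]] = None,
) -> int:
    """Hypothetical model: upper bound on nodes allocated to a parallel job."""
    if n_slots < 2:
        raise ValueError("a parallel job requests more than one slot")
    # bsub -n: each node gets at least one slot, so n_slots bounds the node count.
    limit = n_slots
    # SLURM[nodes=min-max]: the max value caps the node count, if specified.
    if nodes_range is not None:
        limit = min(limit, nodes_range[1])
    # SLURM[nodelist=node_list]: assumed to cap the count at the list length.
    if nodelist is not None:
        limit = min(limit, len(nodelist))
    return limit
```

For example, `max_nodes_for_parallel_job(8, nodes_range=(2, 4))` yields 4, because the `nodes` maximum is tighter than the eight requested slots.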

Small job

A parallel job that can potentially fit into a single node and
does not explicitly request more than one node (through a SLURM[nodes]
or SLURM[nodelist] specification). LSF-HPC tries to allocate
a single node for a small job.
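The job categories defined above can be expressed as a small classification function. This is a minimal sketch built on three assumptions:

- One slot corresponds to one core.
- `cores_per_node` is known in advance.
- "Explicitly requests more than one node" means either a `SLURM[nodes]` minimum above one or a `SLURM[nodelist]` naming more than one node.

The function and its parameters are hypothetical.

```python
from typing import Optional, Sequence


def classify_job(
    n_slots: int,
    cores_per_node: int,
    nodes_range: Optional[tuple[int, int]] = None,
    nodelist: Optional[Sequence[str]] = None,
) -> str:
    """Hypothetical model: return 'serial', 'small', or 'parallel'."""
    if n_slots <= 1:
        # One slot: a serial job, still given a whole node ("pseudo-parallel").
        return "serial"
    # Assumed meaning of an explicit multi-node request.
    asks_many_nodes = (nodes_range is not None and nodes_range[0] > 1) or (
        nodelist is not None and len(nodelist) > 1
    )
    # A small job is a parallel job that fits on one node and does not
    # explicitly ask for more than one node.
    if n_slots <= cores_per_node and not asks_many_nodes:
        return "small"
    return "parallel"
```

On 4-core nodes, for example, `classify_job(4, 4)` yields `"small"` and `classify_job(5, 4)` yields `"parallel"`.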

10.5 Using LSF-HPC Integrated with SLURM in the HP XC Environment

This section provides additional information about using LSF-HPC
in the HP XC environment.

10.5.1 Useful Commands

The following commands are useful with LSF-HPC integrated with SLURM:

Use the bjobs -l and bhist -l commands to see the components of the actual SLURM
allocation command.

Use the bkill command to kill jobs.

Use the bjobs command to monitor job status in LSF-HPC integrated with SLURM.

Use the bqueues command to list the configured job queues in LSF-HPC integrated with
SLURM.
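Scripts that wrap these commands often build the argument vector first and then run it. The sketch below uses only the command forms described above: `bjobs -l`, `bhist -l`, `bkill`, `bjobs`, and `bqueues`, each optionally followed by a job ID. The action names (`"details"`, `"kill"`, and so on) and both helper functions are hypothetical. Running the commands requires an LSF-HPC installation, so only `lsf_argv` is self-contained.

```python
import subprocess
from typing import Optional


def lsf_argv(action: str, job_id: Optional[int] = None) -> list[str]:
    """Build the argument vector for one of the LSF commands listed above."""
    jid = [] if job_id is None else [str(job_id)]
    if action == "details":  # bjobs -l shows the SLURM allocation command
        return ["bjobs", "-l", *jid]
    if action == "history":  # bhist -l shows it as well
        return ["bhist", "-l", *jid]
    if action == "kill":
        if job_id is None:
            raise ValueError("bkill needs a job ID")
        return ["bkill", *jid]
    if action == "status":  # monitor job status
        return ["bjobs", *jid]
    if action == "queues":  # list the configured queues
        return ["bqueues"]
    raise ValueError(f"unknown action: {action}")


def run_lsf(action: str, job_id: Optional[int] = None) -> str:
    """Run the command. Raises FileNotFoundError if LSF is not installed."""
    result = subprocess.run(
        lsf_argv(action, job_id), capture_output=True, text=True, check=True
    )
    return result.stdout
```

For example, `run_lsf("details", 42)` runs `bjobs -l 42` and returns its output.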

10.5.2 Job Startup and Job Control

When LSF-HPC starts a SLURM job, it sets SLURM_JOBID to associate the job with the SLURM
allocation. While a job is running, all operating-system-enforced resource limits supported by
LSF-HPC are enforced, including the core, CPU time, data, file size, memory, and stack limits.
If the user kills a job, LSF-HPC propagates the signals to the entire job, including the job file
running on the local node and all tasks running on remote nodes.
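A job can inspect its own environment to confirm which SLURM allocation it belongs to and which limits apply. The following minimal sketch reads `SLURM_JOBID` and queries the POSIX resource limits that correspond to the limits named above. Mapping the memory limit to `RLIMIT_RSS` is an assumption, and LSF-HPC may enforce memory differently. The sketch uses Python's Unix-only `resource` module, and the helper names are hypothetical.

```python
import os
import resource
from typing import Mapping, Optional

# Limits named in the text mapped to POSIX rlimits.
# Assumption: the memory limit corresponds to RLIMIT_RSS.
LIMITS = {
    "core": resource.RLIMIT_CORE,
    "cpu_time": resource.RLIMIT_CPU,
    "data": resource.RLIMIT_DATA,
    "file_size": resource.RLIMIT_FSIZE,
    "memory": resource.RLIMIT_RSS,
    "stack": resource.RLIMIT_STACK,
}


def slurm_allocation(env: Mapping[str, str] = os.environ) -> Optional[int]:
    """Return the SLURM job ID set by LSF-HPC, or None outside an allocation."""
    value = env.get("SLURM_JOBID")
    return int(value) if value else None


def _fmt(limit: int) -> str:
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


def current_limits() -> dict[str, tuple[str, str]]:
    """Return (soft, hard) values for each limit in LIMITS."""
    limits = {}
    for name, res in LIMITS.items():
        soft, hard = resource.getrlimit(res)
        limits[name] = (_fmt(soft), _fmt(hard))
    return limits


if __name__ == "__main__":
    print("SLURM allocation:", slurm_allocation())
    for name, (soft, hard) in current_limits().items():
        print(f"{name:10s} soft={soft} hard={hard}")
```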

10.5.3 Preemption

LSF-HPC uses the SLURM "node share" feature to facilitate preemption. When a low-priority job is
preempted, its processes are suspended on the allocated nodes, and LSF-HPC places the
high-priority job on the same nodes. After the high-priority job completes, LSF-HPC resumes
the suspended low-priority jobs.
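The suspend, place, and resume sequence described above can be illustrated with a toy single-node model. This sketch captures only the ordering of events. It does not model SLURM node sharing, multi-node allocations, or how processes are actually suspended, and the `Node` class and both functions are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy node: jobs currently running and jobs suspended by preemption."""

    name: str
    running: list[str] = field(default_factory=list)
    suspended: list[str] = field(default_factory=list)


def preempt(node: Node, high_job: str) -> None:
    """Suspend the low-priority jobs on the node, then place the high-priority job."""
    node.suspended.extend(node.running)
    node.running = [high_job]


def complete(node: Node, job: str) -> None:
    """Finish a job; once nothing is running, resume the suspended jobs."""
    node.running.remove(job)
    if not node.running:
        node.running, node.suspended = node.suspended, []
```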

10.6 Submitting Jobs

The bsub command submits jobs to LSF-HPC; it is used to request a set of resources on which
to launch a job. This section focuses on the enhancements to this command that come from the
LSF-HPC integration with SLURM on the HP XC system; it does not discuss standard bsub
functionality or flexibility. See the Platform LSF documentation and the bsub(1) manpage for
more information on this important command. Submitting jobs with the LSF-SLURM External
Scheduler is explored in detail in “Submitting a Parallel Job Using the SLURM External Scheduler”.
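The SLURM specifications discussed in this chapter reach LSF-HPC through bsub's external scheduler option, `-ext`. The following minimal sketch assembles such a command line from the slot count and the optional `nodes` and `nodelist` specifications. It relies on two assumptions:

- A single node count is written as `nodes=N`.
- Several keywords inside `SLURM[...]` are separated by semicolons.

Check both against the bsub(1) manpage. The function name and parameters are hypothetical.

```python
from typing import Optional, Sequence


def bsub_argv(
    n_slots: int,
    command: Sequence[str],
    nodes: Optional[tuple[int, int]] = None,
    nodelist: Optional[str] = None,
) -> list[str]:
    """Build a bsub command line that passes SLURM options through -ext."""
    argv = ["bsub", "-n", str(n_slots)]
    slurm_opts = []
    if nodes is not None:
        lo, hi = nodes
        slurm_opts.append(f"nodes={lo}" if lo == hi else f"nodes={lo}-{hi}")
    if nodelist is not None:
        slurm_opts.append(f"nodelist={nodelist}")
    if slurm_opts:
        # Assumption: multiple SLURM keywords are joined with ';'.
        argv += ["-ext", "SLURM[" + ";".join(slurm_opts) + "]"]
    return argv + list(command)
```

For example, `bsub_argv(4, ["srun", "hostname"], nodes=(2, 2))` produces the equivalent of `bsub -n 4 -ext "SLURM[nodes=2]" srun hostname`. Passing the list directly to `subprocess.run` avoids shell quoting of the brackets.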
