Submitting the Job, prolog File – HP XC System 3.x Software User Manual


Submitting the Job

Use the HP-LSF bsub command to submit the following job:

% bsub -n num_nodes \
    mpirun -srun \
    --task-prolog=`pwd`/slurm.task-prolog.hpcpi \
    --task-epilog=`pwd`/slurm.task-epilog.hpcpi \
    myApp myArgs

In this command, num_nodes is the number of nodes for the job, myApp is the name of the MPI
application, and myArgs represents any arguments for the MPI application.

The prolog file is slurm.task-prolog.hpcpi and the epilog file is slurm.task-epilog.hpcpi.
Both files are located in the current working directory, and prefacing the file names with
`pwd`/ ensures that SLURM can locate the files.
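
Before submitting, it can help to confirm that both scripts resolve to an absolute path and can be run. The following is a minimal sketch in POSIX sh, not part of HPCPI or SLURM; the function name check_task_script is hypothetical, and the sketch assumes the scripts must carry execute permission.

```shell
#!/bin/sh
# Hypothetical helper (not part of HPCPI or SLURM): resolve a script name
# against the current working directory and verify that it is executable.
check_task_script() {
    script_path="$(pwd)/$1"
    if [ ! -x "$script_path" ]; then
        echo "error: $script_path is missing or not executable" >&2
        return 1
    fi
    # Print the absolute path, suitable for --task-prolog= or --task-epilog=
    echo "$script_path"
}
```

For example, prolog=`check_task_script slurm.task-prolog.hpcpi` stores the absolute path for use in the bsub command, and the helper returns a nonzero status if the file cannot be found.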

The prolog file starts the hpcpid daemon on all nodes in the job allocation.

To use an HPCPI label, run the hpcpictl label command from the mpirun utility. Replace
the myApp myArgs run string in the example with an hpcpictl label command that launches
myApp as follows:

hpcpictl label myLabel [label_selectors] myApp myArgs
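
Combining the bsub command with the label wrapper gives one complete submission line. The sketch below assembles and prints that line as a dry run instead of submitting it. The function name build_submit_cmd is hypothetical, the optional label selectors are omitted, and the values 4, myLabel, myApp, and myArgs are placeholders.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the bsub command line described above,
# with an hpcpictl label wrapper around the application.
# Arguments: num_nodes label app [app_args...]
build_submit_cmd() {
    num_nodes=$1; label=$2; shift 2
    echo "bsub -n $num_nodes mpirun -srun" \
         "--task-prolog=$(pwd)/slurm.task-prolog.hpcpi" \
         "--task-epilog=$(pwd)/slurm.task-epilog.hpcpi" \
         "hpcpictl label $label $*"
}

# Print the command for a 4-node job labeled myLabel.
build_submit_cmd 4 myLabel myApp myArgs
```

Printing the command first makes it easy to check the absolute prolog and epilog paths before the job enters the queue.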

prolog File

The contents of the prolog file (slurm.task-prolog.hpcpi) are as follows:

#!/bin/csh -f

if ( ! $?SLURM_LOCALID ) then
    exit
endif
if ( ! $?SLURM_TASK_PID ) then
    exit
endif

# Only start the HPCPI daemon from one task per node.
#
if ( $SLURM_LOCALID == 0 ) then
    #
    # Start hpcpid with the -terminate-with option to ensure that
    # hpcpid terminates when the SLURM job finishes, in case
    # the epilog doesn't run or some other catastrophe.
    #
    # We want hpcpid to terminate with the task. $SLURM_TASK_PID is the pid of
    # the slurmstepd. Its parent, the one we want to terminate with,
    # is the initial slurmstepd on this node for this task.
    #
    # -epoch uses the current epoch, so each node will use the
    # previously created epoch.
    # The >& redirection of the output is useful for logging
    # and debugging, but is also used because SLURM
    # expects script output in the form VAR=VAL pairs.
    #
    set termWithPID=`ps --no-heading --format ppid -p $SLURM_TASK_PID`
    hpcpid -terminate-with $termWithPID -epoch >& $HPCPIDB/task-prolog.`hostname`.$$
endif
# Each task should wait until the HPCPI daemon is up
#
foreach try (1 2 3 4 5 6 7 8 9 10)
    sleep 1
    hpcpictl show >& /dev/null
    if ( $status == 0 ) then
        exit
    endif
end
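
The wait loop at the end of the prolog is a bounded polling pattern: probe once per second and give up after a fixed number of attempts. A generic sketch of that pattern in POSIX sh (not csh, and not part of the HPCPI scripts) follows. The helper name wait_until is hypothetical, and unlike the prolog it returns a failure status when the attempts run out.

```shell
#!/bin/sh
# Hypothetical helper: run a probe command up to max_tries times, sleeping
# delay seconds before each attempt, and stop as soon as the probe succeeds.
# Arguments: max_tries delay probe_command [probe_args...]
wait_until() {
    max_tries=$1; delay=$2; shift 2
    try=1
    while [ "$try" -le "$max_tries" ]; do
        sleep "$delay"
        if "$@" >/dev/null 2>&1; then
            return 0        # probe succeeded
        fi
        try=$((try + 1))
    done
    return 1                # probe never succeeded
}
```

Under these assumptions, wait_until 10 1 hpcpictl show would mirror the ten one-second attempts made by the prolog.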

In normal operation, the daemon is terminated by code in the epilog script. The prolog script
starts hpcpid with the option -terminate-with pid as a contingency method to terminate

