Usage examples
Submitting your job to a batch queue
Use the qsub command to submit your job to the PPPL cluster for processing.
Your job is described by a batch script,
and it is this script that is submitted.
qsub returns a job id that contains the job number.
For example:
[sunfire05.pppl.gov|82] qsub batch_test
82029.phoenix.pppl.gov
[sunfire05.pppl.gov|83] _
(Note: the job number is returned by the PBS server, phoenix.)
When the job is done, the standard output and error files (stdout, stderr) will
be left in the current working directory, i.e. the directory from which you
submitted the job. By default, the stdout file is named <jobname>.o<jobid>
and the stderr file is named <jobname>.e<jobid>. These names may be overridden.
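As a minimal sketch of how the job id relates to the default file names, the following helpers split the job number off the id and build the two names. The function names job_number and default_output_files are hypothetical. The sketch assumes that the job name defaults to the name of the submitted script and that the file suffix uses the numeric job number, as is usual for PBS; check both against your own output files.

```shell
#!/bin/bash
# Hypothetical helper: strip the server suffix from a PBS job id,
# e.g. 82029.phoenix.pppl.gov -> 82029
job_number() {
    local job_id="$1"
    printf '%s\n' "${job_id%%.*}"
}

# Hypothetical helper: print the default stdout and stderr file names.
# Assumption: the suffix is the numeric job number, not the full job id.
default_output_files() {
    local job_name="$1" job_id="$2" num
    num="$(job_number "$job_id")"
    printf '%s.o%s\n%s.e%s\n' "$job_name" "$num" "$job_name" "$num"
}

# Uses the job id from the example above; no PBS commands are run here.
default_output_files batch_test 82029.phoenix.pppl.gov
# prints:
#   batch_test.o82029
#   batch_test.e82029
```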
Using PBS directives
PBS allows you to specify directives in your job script that will customize
your job. The format of a PBS directive is:
#PBS <flag> [arguments]
where the string "#PBS" is NOT an ordinary comment, but rather a special string
used by PBS to denote a PBS directive (the shell itself still ignores the line).
An example is the specification of the job name:
#PBS -N myjob
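Because directives look like comments, it can be handy to list the ones a job script actually carries before submitting it. The following is a small sketch; the function name list_directives and the throwaway script it inspects are illustrative only.

```shell
#!/bin/bash
# Hypothetical helper: print the PBS directives found in a job script.
# Ordinary comments (lines starting with "#" but not "#PBS ") are skipped.
list_directives() {
    grep -E '^#PBS[[:space:]]' "$1"
}

# Build a throwaway job script to demonstrate.
demo_script="$(mktemp)"
cat > "$demo_script" <<'EOF'
#!/bin/bash
# an ordinary comment
#PBS -N myjob
/bin/date
EOF

list_directives "$demo_script"   # prints: #PBS -N myjob
rm -f "$demo_script"
```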
Specifying the number of hosts on which to run
By default, the job will be run on a single host. However, you can specify the
use of multiple hosts (especially if you have a parallelized program) by
specifying a PBS directive; for example:
#PBS -l nodes=12:ppn=2
where 12 is the number of nodes on which to run, and ppn is the number of
processors per node.
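The total number of processors requested is the product of the two values. A rough sketch of that arithmetic follows; the function name total_procs is hypothetical, and the sketch assumes that a missing nodes or ppn field counts as 1.

```shell
#!/bin/bash
# Hypothetical helper: total processors requested by a resource string
# such as "nodes=12:ppn=2". Other fields (e.g. node attributes) are ignored.
# Assumption: a missing nodes= or ppn= field counts as 1.
total_procs() {
    local spec="$1" nodes=1 ppn=1 field
    local IFS=':'
    for field in $spec; do          # split the string on ':'
        case "$field" in
            nodes=*) nodes="${field#nodes=}" ;;
            ppn=*)   ppn="${field#ppn=}" ;;
        esac
    done
    echo $(( nodes * ppn ))
}

total_procs nodes=12:ppn=2   # prints 24
```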
Job queue specification
By default, a job is put into dque, the default queue. The queue selection can
be overridden in your job script with the PBS directive:
#PBS -q <queue>
where <queue> is the queue to which you wish to submit your job (kestrel, kite,
or sque); for example:
#PBS -q sque
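A queue name can be checked before it is written into a job script. The sketch below is only illustrative: the function name valid_queue is hypothetical, and the list contains just the four queues named in this section, which may not match the queues currently defined on the cluster.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the name is one of the queues
# mentioned in this section (assumed list, not queried from PBS).
valid_queue() {
    case "$1" in
        dque|kestrel|kite|sque) return 0 ;;
        *) return 1 ;;
    esac
}

if valid_queue sque; then
    echo "#PBS -q sque"
fi
```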
Specifying the standard output and standard error files
By default, the standard output (stdout) and error files (stderr) are named
<jobname>.o<jobid> and <jobname>.e<jobid> respectively. These names may be
overridden using the PBS directives:
#PBS -o myoutput.out
#PBS -e myerror.err
You can also specify that standard output and error be combined into one file,
named <jobname>.o<jobid> by default, using the directive:
#PBS -j oe
Wall time
The amount of wall clock time needed to run the job may be specified by a PBS
directive:
#PBS -l walltime=60:30:00
This wall clock time estimate (in this case, 60 hours and 30 minutes)
lets the scheduler know when the systems you are using will be
available again. Your job will be terminated (via a kill -15 command)
when the estimated wall time is exceeded. So be generous, but not too
generous: a reasonably accurate estimate helps scheduling and load
balancing.
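To sanity-check an estimate, the walltime string can be converted to seconds and compared with timings from earlier runs. This is a minimal sketch that assumes the HH:MM:SS form shown above; the function name walltime_seconds is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: convert a walltime of the form HH:MM:SS to seconds.
walltime_seconds() {
    local h m s
    IFS=':' read -r h m s <<< "$1"
    # 10# forces base 10 so that fields such as "08" are not read as octal
    echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

walltime_seconds 60:30:00   # prints 217800
```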
Using nodes with larger amounts of memory
Hosts in the PPPL cluster have at least 2GB of memory. However,
at least 80 hosts have 4GB of memory. These hosts are contained
in the kestrel and sque queues. To use these
systems, you can specify the "largemem" attribute.
For example:
#PBS -l nodes=4:ppn=2:largemem
where 4 is the number of nodes on which to run, ppn is the number of processors
per node, and largemem specifies that PBS should only schedule the job
to nodes with large memory sizes (typically 4GB).
A generic batch script
A generic batch job script is available that includes the most common
directives and options used by PPPL jobs.
Some simple example batch scripts
> vi test.job
#!/bin/bash
# --- send the output to the test.out file
# the default is <jobname>.o<jobid>
#PBS -o test.out
# --- send the error output to the test.err file
# the default is <jobname>.e<jobid>
#PBS -e test.err
echo "Print out the hostname and date"
/bin/hostname
/bin/date
exit 0
To submit this job, type:
> qsub test.job
This example job runs on multiple hosts:
#!/bin/bash
# --- send the job to 4 nodes, with 2 processors per node
#PBS -l nodes=4:ppn=2
# --- send the output to the test.out file
# the default is <jobname>.o<jobid>
#PBS -o test.out
# --- send the error output to the test.err file
# the default is <jobname>.e<jobid>
#PBS -e test.err
# --- print out the list of nodes this job is running upon
/bin/cat $PBS_NODEFILE
echo "Print out the hostname and date"
/bin/hostname
/bin/date
exit 0
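Inside a multi-host job, the node file can also be summarized rather than just printed. The sketch below assumes the file holds one hostname per line, with a host repeated once for each processor allocated on it. The function names and the sample hostnames (node01, node02) are made up for illustration; within a real job you would pass "$PBS_NODEFILE" instead of the sample file.

```shell
#!/bin/bash
# Hypothetical helper: number of processor slots (lines in the node file).
count_slots() {
    wc -l < "$1" | tr -d '[:space:]'
}

# Hypothetical helper: number of distinct hosts in the node file.
count_nodes() {
    sort -u "$1" | wc -l | tr -d '[:space:]'
}

# Sample node file standing in for $PBS_NODEFILE (2 nodes, 2 slots each).
sample="$(mktemp)"
printf 'node01\nnode01\nnode02\nnode02\n' > "$sample"

echo "nodes: $(count_nodes "$sample"), processor slots: $(count_slots "$sample")"
# prints: nodes: 2, processor slots: 4
rm -f "$sample"
```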