Usage examples
Submitting your job to a batch queue
Use the qsub command to submit your job to the PPPL cluster for processing. Your job is
described by a batch script, and it is this script that you submit.
qsub returns a job ID that contains the job number. For example:
[sunfire05.pppl.gov|82] qsub batch_test
82029.bennu.pppl.gov
[sunfire05.pppl.gov|83] _
(Note: the job number is returned by the PBS server, bennu.pppl.gov, whose name makes up the rest of the job ID.)
When the job is done, the standard output and error files (stdout,
stderr) will be left in the current working directory, i.e. the
directory from which you submitted the job. By default, the stdout file
is named <jobname>.o<jobid>
and the stderr file is named <jobname>.e<jobid>.
These names may be overridden.
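When scripting around qsub, it can help to know in advance which files to look for. The sketch below derives the default file names from a job name and the job ID that qsub prints (captured, for example, with jobid=$(qsub batch_test)). The function name default_output_files is hypothetical, and the sketch assumes, as standard PBS does, that only the numeric part of the job ID appears in the file names and that the job name defaults to the script name.

```bash
#!/bin/bash
# Hypothetical helper: predict the default stdout/stderr file names of a job.
# Assumption: only the numeric part of the job ID (before the first ".")
# is used, e.g. batch_test.o82029 for job 82029.bennu.pppl.gov.
default_output_files() {
    local jobname=$1 jobid=$2
    local number=${jobid%%.*}   # strip everything from the first "." onward
    echo "${jobname}.o${number} ${jobname}.e${number}"
}

default_output_files batch_test 82029.bennu.pppl.gov
```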
Using PBS directives
PBS allows you to specify directives in your job
script that will customize your job. The format of a PBS directive
is:
#PBS <flag> [arguments]
where the string "#PBS" is NOT an ordinary comment: the shell ignores the line,
but PBS reads it as a directive. An example is the specification
of the job name:
#PBS -N myjob
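Where the directives sit in the script matters: in standard PBS/Torque, qsub only honors directives that appear before the first executable command. The sketch below lists the directives qsub would see by scanning from the top and stopping at the first command. The function name pbs_directives is hypothetical, and it assumes directives start in column 1.

```bash
#!/bin/bash
# Hypothetical helper: print the PBS directives in a job script, stopping at
# the first executable line, since directives after it are ignored by PBS.
pbs_directives() {
    awk '/^#PBS[ \t]/                { print; next }   # a directive: print it
         /^[ \t]*#/ || /^[ \t]*$/     { next }          # other comments, blank lines
         { exit }                                       # first command: stop scanning
        ' "$1"
}

example=$(mktemp)
cat > "$example" <<'EOF'
#!/bin/bash
#PBS -N myjob
echo "Running"
#PBS -q sque
EOF
pbs_directives "$example"   # prints only: #PBS -N myjob
rm -f "$example"
```

The "#PBS -q sque" line in the example comes after the echo command, so it is not listed; PBS would likewise ignore it and the job would go to the default queue.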
Specifying the number of hosts on which to run
By default, the job runs on a single host. To use multiple hosts
(for example, to run a parallelized program), add a PBS directive such as:
#PBS -l nodes=12:ppn=2
where 12 is the number of nodes to run upon and ppn=2 is the number of
processors per node, for a total of 24 processors.
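Inside a running job, the allocation can be checked through $PBS_NODEFILE, the file that lists the nodes assigned to the job. In standard PBS/Torque each node appears once per processor, so nodes=12:ppn=2 gives 24 lines covering 12 distinct hosts. The sketch below counts both; the function names and the node01/node02 fallback file used outside PBS are hypothetical.

```bash
#!/bin/bash
# Hypothetical helpers: count processor slots (lines) and distinct hosts
# in a PBS node file, where each node is listed once per processor.
count_slots() { wc -l < "$1" | tr -d ' '; }
count_hosts() { sort -u "$1" | wc -l | tr -d ' '; }

# Inside a job, use the node file PBS provides; outside PBS, fall back to a
# made-up file for two nodes with ppn=2.
nodefile=${PBS_NODEFILE:-}
if [[ -z $nodefile ]]; then
    nodefile=$(mktemp)
    printf '%s\n' node01 node01 node02 node02 > "$nodefile"
fi
echo "processor slots: $(count_slots "$nodefile"), hosts: $(count_hosts "$nodefile")"
```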
Job queue specification
By default, a job is put into dque, the default queue. You can override the
queue selection in your job script with the PBS directive:
#PBS -q <queue>
where <queue> is one of kestrel, kite, or sque. For
example:
#PBS -q sque
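The same choice can also be made at submission time, since qsub accepts -q on the command line, where in standard PBS/Torque it takes precedence over a #PBS -q line in the script. The sketch below is a hypothetical wrapper (not a PBS tool) that checks the queue name against the queues named on this page before calling qsub; setting DRY_RUN=1 prints the command instead of running it.

```bash
#!/bin/bash
# Hypothetical wrapper: submit a script to one of the queues named on this page.
# With DRY_RUN=1 it prints the qsub command instead of running it.
submit_to_queue() {
    local queue=$1 script=$2
    case $queue in
        dque|kestrel|kite|sque) ;;                 # known queues
        *) echo "unknown queue: $queue" >&2; return 1 ;;
    esac
    if [[ ${DRY_RUN:-0} == 1 ]]; then
        echo "qsub -q $queue $script"
    else
        qsub -q "$queue" "$script"                 # same effect as "#PBS -q <queue>"
    fi
}

DRY_RUN=1 submit_to_queue sque test.job
```

To see which queues the server actually offers, qstat -Q lists them with their status.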
Specifying the standard output and standard error files
By default, the standard output (stdout) and error files (stderr) are
named <jobname>.o<jobid>
and <jobname>.e<jobid>
respectively. These names may be overridden using the PBS directive:
#PBS -o myoutput.out
#PBS -e myerror.err
You can also combine standard output and standard error into a single file,
which by default is named <jobname>.o<jobid>, using the directive:
#PBS -j oe
Wall time
The amount of wall clock time needed to run the job may be specified with
the PBS directive:
#PBS -l walltime=60:30:00
This estimate (in this case, 60 hours and 30 minutes) lets the scheduler
know when the systems you are using will be available again. Your job will
be terminated (with kill -15, i.e. SIGTERM) when it exceeds the requested
wall time. So be generous, but not too generous: reasonably accurate
estimates help with scheduling and load balancing.
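Two small pieces of arithmetic and signal handling follow from this. The sketch below converts an HH:MM:SS walltime into seconds (60:30:00 is 217800 seconds) and installs a trap for the SIGTERM sent when the limit is exceeded. The function names are hypothetical, and the cleanup action is a placeholder for whatever saves your own results.

```bash
#!/bin/bash
# walltime_seconds HH:MM:SS -> total seconds ("10#" forces base 10, so "08" is valid)
walltime_seconds() {
    local h m s
    IFS=: read -r h m s <<< "$1"
    echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# Hypothetical cleanup hook for the kill -15 (SIGTERM) sent at the walltime limit.
on_term() {
    echo "SIGTERM received: saving partial results" >&2
    exit 143    # 128 + 15, the usual exit status after SIGTERM
}
trap on_term TERM

walltime_seconds 60:30:00   # prints 217800 (60 h 30 min)
```

Bash runs a trap only after the current foreground command finishes, so a long-running program is often started in the background with & and followed by wait, which returns as soon as a trapped signal arrives. How long PBS waits before following up with a harder kill depends on the site configuration.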
Using nodes with larger amounts of memory
Every host in the PPPL cluster has at least 2GB of memory, and at least
80 hosts have 4GB. These larger-memory hosts are in the kestrel and sque
queues. To use them, specify the "largemem" attribute. For example:
#PBS -l nodes=4:ppn=2:largemem
where 4 is the number of nodes to run upon, ppn=2 is the number of
processors per node, and largemem tells PBS to schedule the job only on
nodes with large memory (typically 4GB).
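To confirm that a job really landed on a large-memory host, the job script can report each host's memory. The sketch below assumes Linux compute nodes, where /proc/meminfo gives MemTotal in kB; the function name mem_total_gb is hypothetical. MemTotal is usually a little below the installed amount, so a 4GB host may report slightly less than 4.0.

```bash
#!/bin/bash
# Hypothetical helper: total memory in GB, read from /proc/meminfo (Linux only).
# An alternative meminfo-format file can be passed as the first argument.
mem_total_gb() {
    local meminfo=${1:-/proc/meminfo}
    awk '/^MemTotal:/ { printf "%.1f\n", $2 / 1024 / 1024 }' "$meminfo"
}

echo "$(uname -n): $(mem_total_gb) GB"
```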
A generic batch script
PPPL provides a generic batch job script that includes most of the common
directives and options used by PPPL jobs.
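For orientation, the sketch below combines the directives described on this page into one job script. It is not the PPPL generic script; the job name, queue, node count, and walltime are placeholders. PBS_O_WORKDIR is the standard PBS variable holding the directory qsub was run from; the fallback to "." only lets the script be tried outside PBS.

```bash
#!/bin/bash
# --- placeholder values: adjust for your own job
#PBS -N myjob
#PBS -q sque
#PBS -l nodes=4:ppn=2
#PBS -l walltime=01:00:00
# --- combine stdout and stderr into myjob.o<jobid>
#PBS -j oe

# Run from the submission directory (falls back to "." outside PBS).
cd "${PBS_O_WORKDIR:-.}" || exit 1

echo "Job started on $(hostname) at $(date)"
if [[ -n ${PBS_NODEFILE:-} ]]; then
    echo "Allocated processor slots: $(wc -l < "$PBS_NODEFILE")"
fi
```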
Some simple example batch scripts
> vi test.job
#!/bin/bash
# --- send the output to the test.out file
# the default is <jobname>.o<jobid>
#PBS -o test.out
# --- send the error output to the test.err file
# the default is <jobname>.e<jobid>
#PBS -e test.err
echo "Print out the hostname and date"
/bin/hostname
/bin/date
exit 0
To submit this job, type:
> qsub test.job
This example job runs on multiple hosts:
#!/bin/bash
# --- send the job to 4 nodes, with 2 processors per node
#PBS -l nodes=4:ppn=2
# --- send the output to the test.out file
# the default is <jobname>.o<jobid>
#PBS -o test.out
# --- send the error output to the test.err file
# the default is <jobname>.e<jobid>
#PBS -e test.err
# --- print out the list of nodes this job is running upon
/bin/cat "$PBS_NODEFILE"
echo "Print out the hostname and date"
/bin/hostname
/bin/date
exit 0