Usage examples

Submitting your job to a batch queue

Use the qsub command to submit your job to the PPPL cluster for processing. Your job is described by a batch script, and it is this script that is submitted.

qsub prints a job id that contains the job number. For example:
[sunfire05.pppl.gov|82] qsub batch_test
82029.phoenix.pppl.gov

[sunfire05.pppl.gov|83] _

(Note: the job number is returned by the PBS server, phoenix.)

When the job is done, the standard output and error files (stdout, stderr) will be left in the current working directory, i.e. the directory from which you submitted the job. By default, the stdout file is named <jobname>.o<jobid> and the stderr file is named <jobname>.e<jobid>. These names may be overridden.
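
As a small sketch of how the returned job id relates to the default file names, the following bash fragment splits a job id into its job number and server name, then builds the default stdout and stderr names. The helper names job_number, job_server and default_output_names are hypothetical (they are not PBS commands), and the sketch assumes that the <jobid> part of the file name is the job number alone; check the files your own jobs produce.

```shell
#!/bin/bash
# Hypothetical helpers (not part of PBS) for a job id such as
# 82029.phoenix.pppl.gov.

# Print the job number: everything before the first dot.
job_number() {
    printf '%s\n' "${1%%.*}"
}

# Print the server name: everything after the first dot.
job_server() {
    printf '%s\n' "${1#*.}"
}

# Print the default stdout and stderr file names for a job name and job id.
# Assumption: <jobid> in the file name is the job number only.
default_output_names() {
    local jobname=$1 num
    num=$(job_number "$2")
    printf '%s.o%s\n%s.e%s\n' "$jobname" "$num" "$jobname" "$num"
}

# In real use the id would be captured from qsub: jobid=$(qsub batch_test)
jobid=82029.phoenix.pppl.gov
job_number "$jobid"
job_server "$jobid"
default_output_names batch_test "$jobid"
```

Run as written, this prints 82029, phoenix.pppl.gov, batch_test.o82029 and batch_test.e82029, one per line.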

Using PBS directives

PBS allows you to specify directives in your job script that will customize your job. The format of a PBS directive is:
#PBS <flag> [arguments]
where the string "#PBS" is NOT an ordinary comment: the shell ignores the line, but PBS reads it as a directive. For example, to specify the job name:
#PBS -N myjob

Specifying the number of hosts on which to run

By default, the job runs on a single host. You can request multiple hosts (useful if you have a parallelized program) with a PBS directive; for example:
#PBS -l nodes=12:ppn=2
where 12 is the number of nodes to run on, and ppn=2 requests 2 processors per node.
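
To illustrate the arithmetic behind such a request, the bash sketch below computes the total number of processors (nodes times ppn) from a resource string. The function name total_procs is hypothetical, and treating a missing ppn as 1 is an assumption of this sketch. Extra attributes after ppn are ignored.

```shell
#!/bin/bash
# Hypothetical helper: compute nodes * ppn from a resource string such as
# "nodes=12:ppn=2". Assumption: a missing ppn counts as 1.
total_procs() {
    local spec=$1 nodes=1 ppn=1 field
    local IFS=:                      # split the string on colons
    for field in $spec; do
        case $field in
            nodes=*) nodes=${field#nodes=} ;;
            ppn=*)   ppn=${field#ppn=} ;;
        esac
    done
    printf '%s\n' "$((nodes * ppn))"
}

total_procs nodes=12:ppn=2
```

For the directive above this prints 24, i.e. the job asks for 24 processors in total.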

Job queue specification

By default, a job is put into dque, the default queue. You can override the queue selection in your job script with the PBS directive:
#PBS -q <queue>
where <queue> is one of kestrel, kite, or sque. For example:
#PBS -q sque

Specifying the standard output and standard error files

By default, the standard output (stdout) and error files (stderr) are named <jobname>.o<jobid> and <jobname>.e<jobid> respectively. These names may be overridden using the PBS directive:
#PBS -o myoutput.out
#PBS -e myerror.err
You can also combine standard output and standard error into one file, whose default name is <jobname>.o<jobid>, using the directive:

#PBS -j oe

Wall time

The amount of wall clock time needed to run the job may be specified with a PBS directive:
#PBS -l walltime=60:30:00
This wall clock time estimate (in this case, 60 hours and 30 minutes) tells the scheduler when the systems you are using will be available again. Your job will be terminated (via a kill -15 command) when the estimated wall time is exceeded. So be generous, but not too generous: a reasonably accurate estimate helps scheduling and load balancing.
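
When comparing an estimate against measured run times, it can help to express the walltime value in seconds. The bash sketch below does that conversion. The function name walltime_seconds is hypothetical, and the sketch assumes the value is always written as HH:MM:SS, as in the directive above.

```shell
#!/bin/bash
# Hypothetical helper: convert a walltime string HH:MM:SS to seconds.
# Assumption: the value always has all three fields.
walltime_seconds() {
    local h m s rest
    h=${1%%:*}          # text before the first colon
    rest=${1#*:}        # text after the first colon (MM:SS)
    m=${rest%%:*}
    s=${rest#*:}
    # 10# forces base 10 so fields like "08" are not read as octal.
    printf '%s\n' "$(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))"
}

walltime_seconds 60:30:00
```

For 60:30:00 this prints 217800, that is, 60 hours and 30 minutes in seconds.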

Using nodes with larger amounts of memory

Hosts in the PPPL cluster have at least 2GB of memory. However, at least 80 hosts have 4GB of memory. These hosts are contained in the kestrel and sque queues. To use these systems, you can specify the "largemem" attribute. For example:
#PBS -l nodes=4:ppn=2:largemem
where 4 is the number of nodes to run on, ppn=2 requests 2 processors per node, and largemem specifies that PBS should only schedule the job on nodes with large memory sizes (typically 4GB).

A generic batch script

Click here for a generic batch job script that includes the most common directives and options used by PPPL jobs.
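
As an illustration only, and not the PPPL generic script referred to above, the following sketch combines the directives described on this page into one job script header. The job name, queue, node count and wall time are example values taken from the sections above, and you should adjust them for your own job. PBS_JOBID is an environment variable set by PBS inside a running job; the fallback text is an assumption of this sketch so that the script also runs outside PBS.

```shell
#!/bin/bash
# --- job name
#PBS -N myjob
# --- queue to submit to
#PBS -q sque
# --- 4 large-memory nodes, 2 processors per node
#PBS -l nodes=4:ppn=2:largemem
# --- wall clock time estimate: 60 hours and 30 minutes
#PBS -l walltime=60:30:00
# --- combine stdout and stderr into one file
#PBS -j oe

# The job's commands start here; directives must come before them.
echo "Job ${PBS_JOBID:-(not running under PBS)} started"
date
```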

Some simple example batch scripts

> vi test.job

#!/bin/bash

# --- send the output to the test.out file
#     the default is <jobname>.o<jobid>
#PBS -o test.out
# --- send the error output to the test.err file
#     the default is <jobname>.e<jobid>
#PBS -e test.err

echo "Print out the hostname and date"
/bin/hostname
/bin/date
exit 0

To submit this job, type:

> qsub test.job


This example job runs on multiple hosts:

#!/bin/bash
# --- send the job to 4 nodes, with 2 processors per node
#PBS -l nodes=4:ppn=2
# --- send the output to the test.out file
#     the default is <jobname>.o<jobid>
#PBS -o test.out
# --- send the error output to the test.err file
#     the default is <jobname>.e<jobid>
#PBS -e test.err

# --- print out the list of nodes this job is running on
/bin/cat "$PBS_NODEFILE"

echo "Print out the hostname and date"
/bin/hostname
/bin/date
exit 0
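
The node list printed by the script above can also be summarised inside a job. The bash sketch below counts the distinct hosts and the total number of entries in the node file. The function name summarize_nodefile is hypothetical, and the sketch assumes the file holds one host name per line, repeated once for each processor allocated on that host; verify this against the output of your own jobs.

```shell
#!/bin/bash
# Hypothetical helper: summarise a PBS node file.
# Assumption: one host name per line, repeated once per allocated processor.
summarize_nodefile() {
    local file=$1 procs hosts
    procs=$(grep -c . "$file")              # non-empty lines
    hosts=$(sort -u "$file" | grep -c .)    # distinct host names
    echo "hosts=$hosts procs=$procs"
}

# PBS_NODEFILE is only set inside a batch job, so guard the call.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    summarize_nodefile "$PBS_NODEFILE"
fi
```

Under the stated assumption, a job submitted with nodes=4:ppn=2 would report 4 hosts and 8 processors.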