Scheduling Jobs

The Engineering clusters use Slurm as the job scheduler for reserving resources and running jobs. This document provides an introduction to the most common options used to submit jobs to an Engineering cluster.

sbatch: error: Batch job submission failed: Invalid account or account/partition combination
If you see the above error when submitting a job with any of the instructions below, please send an email to linux-engr-helpdesk@tamu.edu, as this indicates a problem with your Slurm account.

 

Job Submission Script

The preferred method of submitting a job is by using a submission script. This section goes over the basics of writing a submission script as well as a few common examples.

Writing the script

A submission script is a plain-text Bash script. Within the script, you declare the resources needed for the job to run as well as the commands needed to run your job. Below is a simple example of a job submission script.

#!/bin/bash
#SBATCH --job-name=myappjob
#SBATCH --output=screenout.txt
#SBATCH --error=screenerror.txt
#SBATCH --ntasks=2

module load mpi/openmpi-x86_64
./myprogram

The first line of the script, #!/bin/bash, is called the shebang line. This line is required in all submission scripts. It tells the compute node which interpreter to use to run the commands in the script. In our case everyone's default shell is Bash, so we tell the compute nodes to use Bash to interpret our commands.

The second part of the script is the set of lines beginning with #SBATCH. These lines are the options that tell Slurm what type of resources you need and how many. In this example, we first set the job name to myappjob. The job name is simply an easy way for you to identify your job later. The next two lines are the output and error parameters. These redirect the screen output and screen error to the files you specify. These are not your program's output files, but rather what you would normally see on the screen if you ran your command without the scheduler. The last parameter is the number of tasks the job needs. In this example, I am requesting 2 tasks, which is essentially 2 CPUs. Please read the Tasks versus CPUs section below for more information about tasks, and the Slurm Parameters section below for more information about SBATCH parameters.

The last part of the script contains the actual commands needed to run the program. In this example, I first load the MPI module. You will need to load any modules your program needs before you actually run it. I then run the program itself, ./myprogram.

Submitting the script

Now that we have written the submission script, we need to submit it to the Slurm scheduler for queuing. Assuming the submission file we created is called myscript.job, we can submit it to the Slurm scheduler with the following command:

sbatch myscript.job

You will be given a job ID, which you can use later to get detailed information about your job.
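
The output will look similar to the following (the job ID shown here is only an example and will differ for your job):

Submitted batch job 12345

You can then check on the job with squeue -j 12345 while it is queued or running, or get full details with scontrol show job 12345.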

Interactive Job Submissions

Not all jobs fit the use case of submitting a job script, for example when a user needs to interact with the job, whether through a GUI or through the command line. In these cases, an interactive job is the best way to go. The command to start an interactive job is srun. The following is a basic example of an interactive job:

srun --pty /bin/bash

In this example we are starting an interactive Bash session, or terminal session, with one task. This essentially gives us a shell on a compute node, much like an SSH session, so we can run our commands on it. All the Slurm parameters listed in the Slurm Parameters section below can also be used with the srun command, along with two additional options: --pty and --x11. Both are explained in the table below, and an example of a GUI-capable interactive session follows the table.

| Slurm Option | Description |
|--------------|-------------|
| --pty | Allows for interacting with the submitted job. This option sets up a pseudo-terminal through which you interact with the job |
| --x11 | Enables X11 forwarding for your job. If you plan on running a GUI application, you will need to use this option |
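
For example, to start an interactive session that can run GUI applications, you could combine the two options. This assumes you connected to the cluster with X11 forwarding enabled (for example, ssh -X):

srun --x11 --pty /bin/bash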

Slurm Parameters

The following table is a list of common Slurm parameters, examples of how to use them, and a brief description of what they do. For a complete list of parameters, please refer to the Slurm SBATCH documentation.

| Slurm Option | Example | Description |
|--------------|---------|-------------|
| --error=<filename> | --error=errorout.txt | Redirects the screen error (standard error) to the specified file |
| --output=<filename> | --output=screenout.txt | Redirects the screen output (standard output) to the specified file. This is NOT your program's output file |
| --job-name=<jobname> | --job-name=myjob-name | A friendly name given to a job |
| --mail-type=<type> | --mail-type=END,FAIL | Notifies the user when certain event types occur. Valid types are NONE, BEGIN, END, FAIL, REQUEUE, ALL. --mail-user must also be set |
| --mail-user=<email> | --mail-user=netid@tamu.edu | User to receive email notifications of the state changes defined by --mail-type |
| --ntasks=<numtasks> | --ntasks=4 | The number of tasks needed. Please read the Tasks versus CPUs section for more information |
| --cpus-per-task=<num> | --cpus-per-task=20 | The number of CPUs needed per task. Please read the Tasks versus CPUs section for more information |
| --partition=<partition_name> | --partition=large | Specifies the partition to submit your job to |
| --qos=<qos> | --qos=normal | Specifies the Quality of Service (QOS) your job should use |
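
As a rough sketch, the script below combines several of these options in one submission script. The partition and QOS names (large, normal) are taken from the examples above and may differ on your cluster, and netid@tamu.edu and ./myprogram are placeholders for your own email address and program.

#!/bin/bash
#SBATCH --job-name=myappjob
#SBATCH --output=screenout.txt
#SBATCH --error=errorout.txt
#SBATCH --partition=large
#SBATCH --qos=normal
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=netid@tamu.edu
#SBATCH --ntasks=1

./myprogram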

Tasks versus CPUs

Slurm has the concepts of tasks and CPUs; this section explains the difference between the two.

A task in Slurm is to be understood as a process, so a multi-process program is composed of several tasks. An example of a multi-process program is MCNP, or any program that uses MPI, because with MPI multiple processes are spawned that communicate with each other. To request resources for these types of jobs, use the --ntasks Slurm option.
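
For example, a sketch of a submission script for an MPI program might look like the following. The program name ./myprogram is just a placeholder, and whether you launch with srun or mpirun depends on how your MPI program was built:

#!/bin/bash
#SBATCH --job-name=mpijob
#SBATCH --ntasks=4

# Load the MPI module used elsewhere in this document
module load mpi/openmpi-x86_64

# Launch one MPI process per requested task
srun ./myprogram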

However, a multithreaded program is a single process that can use multiple CPUs. If you are running a program that uses several threads but only one process, you will need to use the --cpus-per-task option. An example of a multithreaded program is MATLAB. This allows a single task to use more than one CPU.
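
A sketch of a submission script for a multithreaded program might look like the following. Slurm sets the environment variable SLURM_CPUS_PER_TASK to the value requested with --cpus-per-task, and OMP_NUM_THREADS is just one common way a threaded (OpenMP-style) program decides how many threads to start; your program may read the thread count differently:

#!/bin/bash
#SBATCH --job-name=threadedjob
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8

# Tell the program to use as many threads as CPUs were requested
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

./myprogram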

A task cannot be split across multiple compute nodes, so requesting CPUs with --cpus-per-task ensures that all of the CPUs are allocated on the same compute node. By contrast, requesting the same number of CPUs with the --ntasks option may result in CPUs being allocated on several different compute nodes.