...

Warning

sbatch: error: Batch job submission failed: Invalid account or account/partition combination
If you see the above error when submitting a job using any of the instructions below, please send an email to linux-engr-helpdesk@tamu.edu, as this indicates a problem with your Slurm account.


Job Submission Script

The preferred method of submitting a job is by using a submission script. This section goes over the basics of writing a submission script as well as a few common examples.

...

The first line of the script, #!/bin/bash, is called the shebang line. This line is required in all submission scripts. It tells the compute nodes which interpreter to use for the commands in the script when the script is run. In our case everyone’s default shell is Bash, so we tell the compute nodes to use Bash to interpret our commands.

The second part of the script is the lines beginning with #SBATCH. These lines are the options that tell Slurm what type of resources you need and how many. In this example, we first set the job name to myappjob. The job name is just an easy way for you to identify your job later. The output and error parameters are the next two lines. These set up the redirection of screen output and screen error to the files you specify. These are not your program’s output files, but rather what you would usually see on the screen if you ran your command without the scheduler. The last parameter is the number of tasks the job needs. In this example, I am requesting 2 tasks, which is essentially 2 CPUs. Please read the Tasks versus CPUs section below for more information about tasks. Please read the Slurm Parameters section below for more information about SBATCH parameters.

The last part of this script is the actual commands needed to run my program. In this example, I first load the mpi module. You will need to load any modules your program needs before you actually run your program. I then call the program name, myapp, telling it to start running.
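As a rough sketch of the three parts described above, the following commands write a submission script to myscript.job. The output and error file names are borrowed from the parameter examples on this page, and the module name mpi may be spelled differently on your cluster, so treat those values as assumptions and adjust them to your job.

```shell
# Write the submission script to myscript.job.
# The quoted 'EOF' keeps the script text literal (no variable expansion).
cat > myscript.job <<'EOF'
#!/bin/bash
# Job name, screen output file, screen error file, and number of tasks.
#SBATCH --job-name=myappjob
#SBATCH --output=screenout.txt
#SBATCH --error=errorout.txt
#SBATCH --ntasks=2

# Load the modules the program needs (module name is site-specific).
module load mpi

# Start the program.
myapp
EOF
```

The #SBATCH lines must come before the first command in the script, because Slurm stops reading options once it reaches a line that is not a comment.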

...

Code Block
sbatch myscript.job

And that’s it! We have now submitted our job to the queue. You will be given a job ID here, which you can use later to get detailed information about your job.
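If you want to keep the job ID in a variable for later use, one possible approach is to parse the confirmation message that sbatch normally prints, which has the form "Submitted batch job <id>". The helper name extract_job_id and the sample ID below are made up for illustration; on the cluster you would pipe the real sbatch output into the helper.

```shell
# Extract the numeric job ID from sbatch's confirmation message,
# which normally reads "Submitted batch job <id>".
extract_job_id() {
    awk '/^Submitted batch job/ {print $4}'
}

# On the cluster you would pipe sbatch itself into the helper:
#   job_id=$(sbatch myscript.job | extract_job_id)
# Here a sample message with a made-up ID stands in for sbatch.
job_id=$(echo "Submitted batch job 123456" | extract_job_id)
echo "Job ID: $job_id"

# With the ID in hand, these standard Slurm commands show job details:
#   squeue -j "$job_id"           # status while the job is pending or running
#   scontrol show job "$job_id"   # full job record
```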

...

Not all jobs fit the use case of submitting a job script, for example when a user needs to interact with the job, whether through a GUI or through the command line. In these cases, an interactive job is the best way to go. The command to start an interactive job is srun. The following is a basic example of an interactive job:

...

In this example we are starting an interactive Bash session, or terminal session, with one task. This will basically open up an SSH session with a compute node so we can run our commands on it. All the Slurm options listed above can be used with the srun command, along with two additional options, --pty and --x11. Both are explained in the table below. The format of the srun command is below:

...

| Slurm Option | Description |
| --- | --- |
| --pty | Allows you to interact with the submitted job. This option sets up a pseudo-terminal through which you interact with the job. |
| --x11 | Enables X11 forwarding for your job. If you plan on running a GUI application, you will need to use this option. |
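A minimal sketch of how the two options might be combined is shown here as two small shell functions. The function names are hypothetical, and the exact options your cluster requires (for example a partition or account) may differ; run the functions on a login node where srun is available.

```shell
# Start an interactive Bash session with one task.
# --pty attaches a pseudo-terminal so you can type commands on the compute node.
start_interactive() {
    srun --ntasks=1 --pty bash
}

# Same session with X11 forwarding enabled, so GUI programs started
# from the resulting shell can display on your local screen.
start_interactive_gui() {
    srun --ntasks=1 --x11 --pty bash
}

# Usage on a login node:
#   start_interactive
#   start_interactive_gui
```

X11 forwarding generally also requires that your connection to the login node itself forwards X11, for example by connecting with ssh -X.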

...

The following table is a list of common Slurm parameters, examples of how to use them, and a brief description of what they do. For a complete list of parameters, please refer to the Slurm sbatch documentation.

| Slurm Option | Example | Description |
| --- | --- | --- |
| --error=<filename> | --error=errorout.txt | Redirects the screen error (standard error) to the specified file. |
| --output=<filename> | --output=screenout.txt | Redirects the screen output (standard out) to the specified file. This is NOT your program’s output file. |
| --job-name=<jobname> | --job-name=myjob-name | A friendly name given to a job. |
| --mail-type=<type> | --mail-type=END,FAIL | Notifies the user when certain event types occur. Valid types are NONE, BEGIN, END, FAIL, REQUEUE, ALL. --mail-user must also be set. |
| --mail-user=<email> | --mail-user=netid@tamu.edu | User to receive email notifications of state changes defined by --mail-type. |
| --ntasks=<numtasks> | --ntasks=4 | The number of tasks needed. Please read the Tasks versus CPUs section for more information. |
| --cpus-per-task=<num> | --cpus-per-task=20 | The number of CPUs needed per task. Please read the Tasks versus CPUs section for more information. |
| --partition=<partition_name> | --partition=large | Specifies the partition to submit your job to. |
| --qos=<qos> | --qos=normal | Specifies the Quality of Service (QOS) your job should use. |
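To show how several of these parameters fit together, the sketch below writes a script that uses the example values from the table. The file name notify.job is hypothetical, and the partition large and QOS normal are only the table's example values; they may not be valid for your account, so replace them with the ones you have been given.

```shell
# Write an example script that combines several parameters from the table.
cat > notify.job <<'EOF'
#!/bin/bash
#SBATCH --job-name=myjob-name
#SBATCH --output=screenout.txt
#SBATCH --error=errorout.txt
#SBATCH --ntasks=4
# Partition and QOS are example values; use the ones valid for your account.
#SBATCH --partition=large
#SBATCH --qos=normal
# Send an email when the job ends or fails; --mail-user is needed for this.
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=netid@tamu.edu

# Start the program.
myapp
EOF
```

The same options can also be given on the sbatch command line, where they take precedence over the #SBATCH lines in the script, for example sbatch --ntasks=8 notify.job.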

Tasks versus CPUs

Slurm distinguishes between tasks and CPUs. This section explains the difference.

A task in Slurm is to be understood as a process. Therefore, a multi-process program is composed of several tasks. An example of a multi-process program is MCNP, or any program using MPI. This is because with MPI, multiple processes are spawned that communicate with each other. To request resources for these types of jobs, you need to use the --ntasks Slurm option.

However, a multithreaded program is a single process that can use multiple CPUs. An example of a multithreaded program is MATLAB. If you are running a program that uses several threads, but not several processes, you will need to use the --cpus-per-task option. This allows a single task to use more than one CPU.
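For a multithreaded program, a submission script might look like the sketch below. The file name threaded.job, the job name, and the program my_threaded_app are hypothetical, and the script assumes the program uses OpenMP, which reads its thread count from the OMP_NUM_THREADS environment variable. Slurm sets SLURM_CPUS_PER_TASK inside the job when --cpus-per-task is given.

```shell
# Write an example script for a single task that uses 20 CPUs.
cat > threaded.job <<'EOF'
#!/bin/bash
#SBATCH --job-name=threadedjob
# One process (task) with 20 CPUs available to its threads.
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=20

# Match the thread count to the CPUs Slurm allocated;
# fall back to 1 if the variable is not set.
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"

# Hypothetical OpenMP program standing in for your own application.
./my_threaded_app
EOF
```

Programs that do not use OpenMP usually have their own way of setting the number of threads, so check the documentation of your application.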

Info

A task cannot be split across multiple compute nodes, so requesting CPUs with --cpus-per-task will ensure that all CPUs are allocated on the same compute node. By contrast, requesting the same number of CPUs with the --ntasks option may result in CPUs being allocated on different compute nodes.
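One way to see how a job was actually laid out is to print a few of the environment variables Slurm sets inside a job. The function name report_allocation is hypothetical; the variables are standard Slurm ones, although SLURM_NTASKS and SLURM_CPUS_PER_TASK are only set when the corresponding option was requested, which is why the sketch uses fallback values.

```shell
# Print how Slurm laid out the current job. Values fall back to
# placeholders when a variable is not set (for example outside a job).
report_allocation() {
    echo "nodes=${SLURM_JOB_NUM_NODES:-unknown} tasks=${SLURM_NTASKS:-unknown} cpus_per_task=${SLURM_CPUS_PER_TASK:-1} nodelist=${SLURM_JOB_NODELIST:-unknown}"
}

# Call this near the top of a submission script, or in an interactive job.
report_allocation
```

Submitting the same script once with --ntasks=1 --cpus-per-task=4 and once with --ntasks=4 and comparing the reported node count shows whether the second request was spread over more than one node.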

...
