...
Warning |
---|
sbatch: error: Batch job submission failed: Invalid account or account/partition combination |
Job Submission Script
The preferred method of submitting a job is by using a submission script. This section goes over the basics of writing a submission script as well as a few common examples.
...
The first line of the script, #!/bin/bash, is called the shebang line. This line is required in all submission scripts. It tells the compute nodes which program to use to interpret the commands in the script when it runs. In our case everyone’s default shell is bash, so we tell the compute nodes to use bash to interpret our commands.
The second part of the script is the lines beginning with #SBATCH. These lines are the options that tell Slurm what type of resources you need and how many. In this example, we first set the job name to “myappjob”. The job name is just an easy way for you to identify your job later. The output and error parameters are the next two lines. These set up the screen output and screen error redirects to the files you specify. These are not your program’s output files, but rather what you would usually see on the screen if you ran your command without the scheduler. The last parameter is the number of tasks my job needs. In this example, I am requesting 2 tasks, which is essentially 2 CPUs. Please read the Tasks versus CPUs section below for more information about tasks. Please read the Slurm Parameters section below for more information about SBATCH parameters.
The last part of this script is the actual commands needed to run my program. In this example, I first load the mpi module. You will need to load any modules your program needs before you actually run your program. I then call the program name, myapp, telling it to start running.
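Putting the three parts together, a submission script along the lines described above might look like the following sketch. The file name myappjob.sh and the output file names screenout.txt and errorout.txt are assumptions chosen for illustration; the module name and program name follow the description above. The snippet writes the script to disk and submits it only where sbatch is available.

```shell
# Write the submission script described above to a file.
# The file name myappjob.sh and the two .txt names are illustrative assumptions.
cat > myappjob.sh <<'EOF'
#!/bin/bash
# Name used to identify the job later
#SBATCH --job-name=myappjob
# Screen output (standard out) and screen error (standard error)
#SBATCH --output=screenout.txt
#SBATCH --error=errorout.txt
# Number of tasks
#SBATCH --ntasks=2

# Load the modules the program needs, then run it.
module load mpi
myapp
EOF

# Submit only where Slurm is installed.
if command -v sbatch >/dev/null 2>&1; then
    sbatch myappjob.sh
fi
```

Keeping the #SBATCH lines directly under the shebang, before any command, keeps all resource requests in one place at the top of the script.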
...
Not all jobs fit the use case of submitting a job script, for example when a user needs to interact with the job, whether through a GUI or through the command line. In these cases, an interactive job is the best way to go. The command to start an interactive job is srun. The following is a basic example of an interactive job:
...
In this example we are starting an interactive Bash session, or terminal session, with one task. This works much like opening an SSH session on a compute node so we can run our commands on it. All the Slurm options listed above can be used with the srun command, along with two additional options: --pty and --x11. Both are explained in the table below. The format of the srun command is below:
...
Slurm Option | Description |
---|---|
--pty | Allows you to interact with the submitted job. This option sets up a pseudo-terminal through which you interact with the job |
--x11 | Enables X11 forwarding for your job. If you plan on running a GUI application, you will need to use this option |
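
As a rough illustration of how these two options combine with the regular Slurm options, the helper below assembles an srun command line. The function name interactive_cmd and its arguments are hypothetical; it only prints the command, so the result can be inspected before being run on a login node.

```shell
# Hypothetical helper: print an srun command for an interactive Bash session.
#   $1 = number of tasks (default 1)
#   $2 = "x11" to add X11 forwarding for GUI applications (optional)
interactive_cmd() {
    ntasks="${1:-1}"
    cmd="srun --ntasks=${ntasks} --pty"
    if [ "${2:-}" = "x11" ]; then
        cmd="${cmd} --x11"
    fi
    # The program to run interactively goes last.
    echo "${cmd} bash"
}

interactive_cmd 1        # prints: srun --ntasks=1 --pty bash
interactive_cmd 1 x11    # prints: srun --ntasks=1 --pty --x11 bash
```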
...
Slurm Option | Example | Description |
---|---|---|
--error=<filename> | --error=errorout.txt | Redirect the screen error (standard error) to the specified file |
--output=<filename> | --output=screenout.txt | Redirect the screen output (standard out) to the specified file. This is NOT your program’s output file |
--job-name=<jobname> | --job-name=myjob-name | A friendly name given to a job |
--mail-type=<type> | --mail-type=END,FAIL | Notifies the user when certain event types occur. Valid types are NONE, BEGIN, END, FAIL, REQUEUE, ALL. --mail-user must also be set |
--mail-user=<email> | --mail-user=netid@tamu.edu | User to receive email notifications of state changes defined by --mail-type |
--ntasks=<numtasks> | --ntasks=4 | The number of tasks needed. Please read the Tasks versus CPUs section for more information |
--cpus-per-task=<num> | --cpus-per-task=20 | The number of CPUs needed per task. Please read the Tasks versus CPUs section for more information |
--partition=<partition_name> | --partition=large | Specifies the partition to submit your job to |
--qos=<qos> | --qos=normal | Specifies the Quality of Service (QOS) your job should use |
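
For illustration, several of these options can be combined in one script header, as in the sketch below. The file name notify_job.sh is hypothetical, and the values are simply the example values from the table; the partition and QOS names in particular may not match what is available to your account. The final check reflects the note in the table that --mail-type needs --mail-user to be set as well.

```shell
# Sketch of a script header combining several options from the table.
# notify_job.sh is a hypothetical file name; the values are the table's examples.
cat > notify_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=myjob-name
#SBATCH --output=screenout.txt
#SBATCH --error=errorout.txt
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=netid@tamu.edu
#SBATCH --partition=large
#SBATCH --qos=normal
#SBATCH --ntasks=4
EOF

# Flag a script that sets --mail-type without --mail-user.
if grep -q -e '--mail-type=' notify_job.sh && ! grep -q -e '--mail-user=' notify_job.sh; then
    echo "warning: --mail-type is set without --mail-user" >&2
fi
```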
Tasks versus CPUs
Slurm distinguishes between tasks and CPUs. This section explains the difference.
A task in Slurm is to be understood as a process. Therefore, a multi-process program is composed of several tasks. An example of a multi-process program is MCNP, or any program using MPI. This is because with MPI, multiple processes are spawned that communicate with each other. To request resources for these types of jobs, you need to use the --ntasks Slurm option.
By contrast, a multithreaded program is a single process that can use multiple CPUs. If you are running a program that uses several threads, but not several processes, you will need to use the --cpus-per-task option. This allows a single task to use more than one CPU. An example of a multithreaded program is MATLAB.
Info |
---|
A task cannot be split across multiple compute nodes, so requesting CPUs with --cpus-per-task ensures that all CPUs are allocated on the same compute node. By contrast, requesting the same number of CPUs with the --ntasks option may result in the CPUs being allocated on different compute nodes. |
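
The difference can be made concrete with two requests for four CPUs. The sketch below uses a hypothetical helper, total_cpus, that multiplies the two values. It assumes that inside a running job Slurm exposes them as the SLURM_NTASKS and SLURM_CPUS_PER_TASK environment variables, and that each is set only when the corresponding option was given, so the helper falls back to 1.

```shell
# Two ways to ask for four CPUs (header lines shown as comments):
#   multi-process (e.g. MPI):  #SBATCH --ntasks=4
#   multithreaded:             #SBATCH --ntasks=1
#                              #SBATCH --cpus-per-task=4

# Hypothetical helper: total CPUs = tasks x CPUs per task.
total_cpus() {
    ntasks="${SLURM_NTASKS:-1}"
    cpus_per_task="${SLURM_CPUS_PER_TASK:-1}"
    echo $((ntasks * cpus_per_task))
}

# Outside a job the variables are unset, so set them by hand in a subshell.
( SLURM_NTASKS=4; unset SLURM_CPUS_PER_TASK; total_cpus )   # prints 4
( SLURM_NTASKS=1; SLURM_CPUS_PER_TASK=4; total_cpus )       # prints 4
```

Both requests yield four CPUs in total, but only the second guarantees that all four sit on the same compute node.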
...