Scheduling Jobs

The Engineering clusters use Slurm as the job scheduler for reserving resources and running jobs. This document provides an introduction to the most common options used to submit jobs to an Engineering cluster.

sbatch: error: Batch job submission failed: Invalid account or account/partition combination
If you see the above error when submitting a job with any of the instructions below, please send an email to linux-engr-helpdesk@tamu.edu, as this indicates a problem with your Slurm account.

 

Job Submission Script

The preferred method of submitting a job is by using a submission script. This section goes over the basics of writing a submission script as well as a few common examples.

Writing the script

A submission script is a plain-text Bash script. Within the script, you declare the resources needed for the job to run as well as the commands needed to run your job. Below is a simple example of a job submission script.

#!/bin/bash
#SBATCH --job-name=myappjob
#SBATCH --output=screenout.txt
#SBATCH --error=screenerror.txt
#SBATCH --ntasks=2

module load mpi/openmpi-x86_64
./myprogram

The first line of the script, #!/bin/bash, is called the shebang line. This line is required in all submission scripts. It tells the compute node which interpreter to use to run the commands in the script. In our case everyone's default shell is Bash, so we tell the compute nodes to use Bash to interpret our commands.

The second part of the script is the set of lines beginning with #SBATCH. These lines are the options that tell Slurm what type of resources you need and how many. In this example, we first set the job name to myappjob. The job name is simply an easy way for you to identify your job later. The next two lines are the output and error parameters. These redirect the screen output and screen error to the files you specify. These are not your program's output files, but rather what you would normally see on the screen if you ran your command without the scheduler. The last parameter is the number of tasks the job needs. In this example, I am requesting 2 tasks, which is essentially 2 CPUs. Please read the Tasks versus CPUs section below for more information about tasks, and the Slurm Parameters section below for more information about SBATCH parameters.

The last part of the script contains the actual commands needed to run the program. In this example, I first load the MPI module. You will need to load any modules your program needs before you actually run it. I then run the program itself, ./myprogram.

Submitting the script

Now that we have written the submission script, we need to submit it to the Slurm scheduler for queuing. Assuming the submission file we created is called myscript.job, we can submit it to the Slurm scheduler with the following command:

sbatch myscript.job

You will be given a job ID, which you can use later to get detailed information about your job.
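
The output will look similar to the following (the job ID shown here is only an example and will differ for your job):

Submitted batch job 12345

You can then check on the job with squeue -j 12345 while it is queued or running, or get full details with scontrol show job 12345.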

Interactive Job Submissions

Not all jobs fit the use case of submitting a job script, for example when a user needs to interact with the job, whether through a GUI or through the command line. In these cases, an interactive job is the best way to go. The command to start an interactive job is srun. The following is a basic example of an interactive job:

srun --pty /bin/bash

In this example we are starting an interactive Bash session, or terminal session, with one task. This essentially gives us a shell on a compute node, much like an SSH session, so we can run our commands on it. All the Slurm parameters listed in the Slurm Parameters section below can also be used with the srun command, along with two additional options: --pty and --x11. Both are explained in the table below, and an example of a GUI-capable interactive session follows the table.

| Slurm Option | Description |
|--------------|-------------|
| --pty | Allows for interacting with the submitted job. This option sets up a pseudo-terminal through which you interact with the job |
| --x11 | Enables X11 forwarding for your job. If you plan on running a GUI application, you will need to use this option |
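
For example, to start an interactive session that can run GUI applications, you could combine the two options. This assumes you connected to the cluster with X11 forwarding enabled (for example, ssh -X):

srun --x11 --pty /bin/bash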

Slurm Parameters

The following table is a list of common Slurm parameters, examples of how to use them, and a brief description of what they do. For a complete list of parameters, please refer to the Slurm SBATCH documentation.

| Slurm Option | Example | Description |
|--------------|---------|-------------|
| --error=<filename> | --error=errorout.txt | Redirects the screen error (standard error) to the specified file |
| --output=<filename> | --output=screenout.txt | Redirects the screen output (standard output) to the specified file. This is NOT your program's output file |
| --job-name=<jobname> | --job-name=myjob-name | A friendly name given to a job |
| --mail-type=<type> | --mail-type=END,FAIL | Notifies the user when certain event types occur. Valid types are NONE, BEGIN, END, FAIL, REQUEUE, ALL. --mail-user must also be set |
| --mail-user=<email> | --mail-user=netid@tamu.edu | User to receive email notifications of the state changes defined by --mail-type |
| --ntasks=<numtasks> | --ntasks=4 | The number of tasks needed. Please read the Tasks versus CPUs section for more information |
| --cpus-per-task=<num> | --cpus-per-task=20 | The number of CPUs needed per task. Please read the Tasks versus CPUs section for more information |
| --partition=<partition_name> | --partition=large | Specifies the partition to submit your job to |
| --qos=<qos> | --qos=normal | Specifies the Quality of Service (QOS) your job should use |
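
As a rough sketch, the script below combines several of these options in one submission script. The partition and QOS names (large, normal) are taken from the examples above and may differ on your cluster, and netid@tamu.edu and ./myprogram are placeholders for your own email address and program.

#!/bin/bash
#SBATCH --job-name=myappjob
#SBATCH --output=screenout.txt
#SBATCH --error=errorout.txt
#SBATCH --partition=large
#SBATCH --qos=normal
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=netid@tamu.edu
#SBATCH --ntasks=1

./myprogram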

Tasks versus CPUs

Slurm has the concepts of tasks and CPUs; this section explains the difference between the two.

A task in Slurm is to be understood as a process, so a multi-process program is composed of several tasks. An example of a multi-process program is MCNP, or any program that uses MPI, because with MPI multiple processes are spawned that communicate with each other. To request resources for these types of jobs, use the --ntasks Slurm option.
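
For example, a sketch of a submission script for an MPI program might look like the following. The program name ./myprogram is just a placeholder, and whether you launch with srun or mpirun depends on how your MPI program was built:

#!/bin/bash
#SBATCH --job-name=mpijob
#SBATCH --ntasks=4

# Load the MPI module used elsewhere in this document
module load mpi/openmpi-x86_64

# Launch one MPI process per requested task
srun ./myprogram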

However, a multithreaded program is a single process that can use multiple CPUs. If you are running a program that uses several threads but only one process, you will need to use the --cpus-per-task option. An example of a multithreaded program is MATLAB. This allows a single task to use more than one CPU.
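
A sketch of a submission script for a multithreaded program might look like the following. Slurm sets the environment variable SLURM_CPUS_PER_TASK to the value requested with --cpus-per-task, and OMP_NUM_THREADS is just one common way a threaded (OpenMP-style) program decides how many threads to start; your program may read the thread count differently:

#!/bin/bash
#SBATCH --job-name=threadedjob
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8

# Tell the program to use as many threads as CPUs were requested
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

./myprogram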

A task cannot be split across multiple compute nodes, so requesting CPUs with --cpus-per-task ensures that all of the CPUs are allocated on the same compute node. By contrast, requesting the same number of CPUs with the --ntasks option may result in CPUs being allocated on several different compute nodes.