Olympus GPU User Information
This page provides basic information for researchers using the Olympus cluster for GPU-based research.
Requirements to use GPU resources on Olympus.
You will need PI approval to have your account enabled in the research QOS groups.
Send an email to linux-engr-helpdesk@tamu.edu.
Additional instructions on getting PI approval will be provided in the ticket.
A scratch working directory will be set up when your access is approved. Your directory is mounted at /mnt/shared-scratch/<your-PI>/<your-netid>. THIS DIRECTORY IS NOT BACKED UP!
You will also have access to your research group's network storage directory. This is mounted at /mnt/research/<your-PI>. It contains a Shared directory and a Students/<your-netid> directory.
If you are using X11 interactive programs, you will need an SSH/X11 client on your computer.
On Windows systems, install MobaXterm Personal Edition.
PuTTY and Xming are also an option for Windows users.
On Macintosh, install the XQuartz software. Detailed instructions for accessing Olympus from off campus can be found here:
Graphical Applications on the Olympus Cluster and ECEN Interactive Machines from Off-Campus
How to log in to Olympus
Open MobaXterm on Windows or the Terminal program on Mac.
ssh to olympus.ece.tamu.edu, i.e. ssh -Y <netid>@olympus.ece.tamu.edu (replace <netid> with your NetID).
Log in using your NetID password.
How to access GPU resources
IT IS EXTREMELY IMPORTANT THAT YOU ALLOCATE RESOURCES PROPERLY.
Do NOT leave interactive GPU sessions open if you are not actively using the session!
Each GPU node has 32 cores and 4 GPUs, for a total of 20 GPUs across the cluster. When requesting resources, please select 8 CPUs per GPU. If you are unsure of the processor/GPU requirements for your job, please contact the Linux helpdesk at linux-engr-helpdesk@tamu.edu.
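Before requesting a session, you can check how busy the GPU nodes are with standard Slurm commands (a quick sketch; the partition name gpu is the one used throughout this page):

# Show the state of the nodes in the gpu partition
sinfo -p gpu
# List jobs currently running or queued in the gpu partition
squeue -p gpu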
Deep/Machine Learning environments
For information on the differences between Anaconda and Singularity, see
Anaconda and Singularity basic information
In most cases Anaconda can provide the virtual environment needed. Be sure to install Anaconda in either your /mnt/shared-scratch or /mnt/research/<your-PI> directory; you do not have enough space in your home directory for multiple Anaconda environments.
The following site provides information on the installation and configuration of Anaconda on Linux. Be sure to install Anaconda in your /mnt/shared-scratch directory!
Installing Anaconda on Linux - Problem Solving with Python
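As a rough sketch of such an installation (the installer version shown is a placeholder; download whichever release is current from the Anaconda site, and adjust the scratch path to your own):

# Download the Linux installer (replace the version with the current one)
wget https://repo.anaconda.com/archive/Anaconda3-2023.09-0-Linux-x86_64.sh
# Install non-interactively (-b) with the prefix (-p) set to your scratch directory
bash Anaconda3-2023.09-0-Linux-x86_64.sh -b -p /mnt/shared-scratch/<your-PI>/<your-netid>/anaconda3
# Make conda available in your shell and set it up for future logins
source /mnt/shared-scratch/<your-PI>/<your-netid>/anaconda3/bin/activate
conda init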
When using TensorFlow, be sure to install the GPU-enabled version.
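A quick way to confirm that a GPU-enabled install can actually see a GPU, assuming TensorFlow 2.x inside your activated conda environment on a GPU node:

python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"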
A second option is to use a container; we use Singularity. This solution is required if you have software that needs a different distribution (Ubuntu, CentOS 8, etc.). The setup/configuration of Singularity containers is more involved than Anaconda.
Singularity Containers on Olympus GPU Nodes
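Once you have an interactive shell on a GPU node (see the next section), you can try a container interactively. For example, using the CUDA container referenced in the batch example later on this page:

# Open a shell inside the container; --nv passes the NVIDIA driver/GPU through
singularity shell --nv /mnt/shared-scratch/containers/cuda_10.2-devel-ubuntu18.04.sif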
Interactive jobs
The following command will open an interactive shell on a GPU node. This shell can be used for either anaconda or singularity container development.
srun -p gpu -J gpu-job1 --cpus-per-task=8 --gres=gpu:tesla:1 -q olympus-research-gpu --pty --x11=first bash
-p gpu - which partition to use
-J gpu-job1 - job name assigned in Slurm
--cpus-per-task=8 - number of CPU cores to assign to the job
-q olympus-research-gpu - choose the GPU QOS that you have access to. It will be one of the following: ecen-ugrad-gpu, olympus-research-gpu, or olympus-research-gpu2. If you are unsure of which QOS to use, please contact the Linux helpdesk at linux-engr-helpdesk@tamu.edu.
--pty - connects stdout and stderr to your current terminal session
--x11=first - needed if using X11 forwarding for graphical display. If you are only using the terminal, this is not needed
--gres=gpu:tesla:1 - number and type of GPUs. This requests one V100 GPU. If you would like to use the NVIDIA A100 GPUs, use --gres=gpu:a100:1 instead.
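For example, an interactive session on an A100 GPU would look like the following (identical to the command above except for the --gres value; substitute the QOS you have access to):

srun -p gpu -J gpu-job1 --cpus-per-task=8 --gres=gpu:a100:1 -q olympus-research-gpu --pty --x11=first bash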
Batch jobs
Batch jobs run in the background with no interactive shell. A script file is required to submit batch jobs to the scheduler; example script files for Anaconda and Singularity jobs are shown below. The lines starting with #SBATCH are Slurm directives read by the scheduler, not ordinary comments.
Example script using Anaconda:
#!/bin/sh
#SBATCH --job-name=Gpu_Batch            # Job name
#SBATCH -o gpu_batch_test.out           # output file name
#SBATCH -e gpu_batch_test.err           # error file name
#SBATCH --mail-type=ALL                 # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=your-email@tamu.edu # Where to send mail
#SBATCH --nodes=1                       # Use one node
#SBATCH --ntasks=1                      # Run a single task
#SBATCH --cpus-per-task=8               # Number of CPU cores per task
#SBATCH --gres=gpu:tesla:1              # Type and number of GPUs
#SBATCH --partition=gpu                 # Partition/Queue to run in
#SBATCH --qos=olympus-research-gpu      # Set QOS to use
#SBATCH --time=01:00:00                 # Time limit hrs:min:sec - set to 1 hour
# enter your commands below
# set working directory if different from the current directory
# cd /working/directory
# Start anaconda shell (if needed)
# Activate conda environment
# Run Python
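A minimal sketch of what those last three steps might look like, assuming Anaconda is installed in your scratch directory and using placeholder environment and script names:

source /mnt/shared-scratch/<your-PI>/<your-netid>/anaconda3/bin/activate
conda activate my-gpu-env
python my_training_script.py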
Example script using Singularity:
#!/bin/sh
#SBATCH --job-name=Gpu_Batch            # Job name
#SBATCH -o singularity_test.out         # output file name
#SBATCH -e singularity_test.err         # error file name
#SBATCH --mail-type=ALL                 # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=your-email@tamu.edu # Where to send mail
#SBATCH --nodes=1                       # Use one node
#SBATCH --ntasks=1                      # Run a single task
#SBATCH --cpus-per-task=8               # Number of CPU cores per task
#SBATCH --gres=gpu:tesla:1              # Type and number of GPUs
#SBATCH --partition=gpu                 # Partition/Queue to run in
#SBATCH --qos=olympus-research-gpu      # Set QOS to use
#SBATCH --time=01:00:00                 # Time limit hrs:min:sec - set to 1 hour
# enter your commands below
# set working directory if different from the current directory
# cd /working/directory
# Run singularity container
singularity run --nv /mnt/shared-scratch/containers/cuda_10.2-devel-ubuntu18.04.sif ~/test-script.sh
You will need a second script file when using Singularity containers in batch mode. In this example, the second script (test-script.sh) contains the commands that will be executed inside the Singularity container (/mnt/shared-scratch/containers/cuda_10.2-devel-ubuntu18.04.sif).
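As a sketch of what that second script might contain (the commands are placeholders; adapt them to your own workflow, and make the script executable with chmod +x test-script.sh):

#!/bin/bash
# These commands run inside the container on the GPU node
nvidia-smi
python3 my_training_script.py

Submit either batch script to the scheduler with sbatch and monitor it with squeue:

sbatch my_gpu_job.sh
squeue -u $USER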