Olympus GPU User information

This page provides basic information for researchers using the Olympus cluster for GPU-based research.

Requirements to use GPU resources on Olympus

  1. You will need PI approval to have your account enabled in the research QOS groups.

    1. Send an email to linux-engr-helpdesk@tamu.edu.

    2. Additional instructions on getting PI approval will be provided in the ticket.

  2. A scratch working directory will be set up when your access is approved. Your directory is mounted at /mnt/shared-scratch/<your-PI>/<your-netid>. THIS DIRECTORY IS NOT BACKED UP!

  3. You will also have access to your research group's network storage directory. This will be mounted at /mnt/research/<your-PI>. There is a Shared directory and a Students/<your-netid> directory located here.

  4. If you are using X11 interactive programs, you will need an SSH/X Window client on your computer (a quick way to verify X11 forwarding is shown after this list).

    1. On Windows systems, install MobaXTerm Personal Edition.

    2. PuTTY and Xming are also an option for Windows users.

    3. On macOS, install the XQuartz software. Detailed instructions for accessing Olympus from off campus can be found here:

Graphical Applications on the Olympus Cluster and ECEN Interactive Machines from Off-Campus
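
If you connect with X11 forwarding enabled (ssh -Y, as shown in the login steps below), a quick way to confirm that forwarding works is to launch a small graphical program from the Olympus shell. This example assumes the xclock utility is available on the login node; any simple X application will do.

xclock &

If a clock window appears on your local desktop, X11 forwarding is working; close the window or kill the process when you are done.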

How to log in to Olympus

  1. Open MobaXTerm on Windows or the Terminal program on a Mac

  2. ssh to olympus.ece.tamu.edu, i.e. ssh -Y <netid>@olympus.ece.tamu.edu (replace <netid> with your NetID)

  3. Log in using your NetID password.

How to access GPU resources

IT IS EXTREMELY IMPORTANT THAT YOU ALLOCATE RESOURCES PROPERLY.

Do NOT leave interactive GPU sessions open if you are not actively using the session!

Each GPU node has 32 cores and 4 GPUs, for a total of 20 GPUs across the cluster. When requesting resources, please request 8 CPUs per GPU. If you are unsure of the processor/GPU requirements for your job, please contact the Linux helpdesk at linux-engr-helpdesk@tamu.edu.
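
To see which of your jobs and interactive sessions are still running, and to release resources you no longer need, the standard Slurm commands below can be used from the Olympus login node. The job ID is a placeholder taken from the squeue output.

squeue -u $USER      # list your running and pending jobs
scancel <jobid>      # cancel a job or idle interactive session you no longer need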

Deep/Machine Learning environments

For information on the differences between Anaconda and Singularity, see:

Anaconda and Singularity basic information

In most cases Anaconda can provide the virtual environment needed. Be sure to install Anaconda in either your /mnt/shared-scratch or /mnt/research/<your-PI> directory. You do not have enough space in your home directory for multiple Anaconda environments.

The following site provides information on the installation and configuration of Anaconda on Linux. Be sure to install Anaconda in your /mnt/shared-scratch directory!

https://problemsolvingwithpython.com/01-Orientation/01.05-Installing-Anaconda-on-Linux/
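
As a rough sketch of the installation steps described at the link above, the commands below download the Miniconda installer (a minimal Anaconda distribution) and install it under your shared-scratch directory rather than your home directory. The installer file name and the <your-PI>/<your-netid> path components are placeholders; adjust them for the current installer version and your own directory.

cd /mnt/shared-scratch/<your-PI>/<your-netid>
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -p /mnt/shared-scratch/<your-PI>/<your-netid>/miniconda3
source /mnt/shared-scratch/<your-PI>/<your-netid>/miniconda3/bin/activate

The -p option sets the installation prefix, which is what keeps the environments out of your home directory.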

When using TensorFlow, be sure to install the GPU-enabled version.
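
As an illustration only, the commands below create a conda environment and verify that TensorFlow can see a GPU; the environment name tf-gpu and the Python version are arbitrary placeholders. For recent TensorFlow 2.x releases GPU support is included in the main tensorflow package, while older 1.x releases used a separate tensorflow-gpu package.

conda create -n tf-gpu python=3.9     # environment name and Python version are placeholders
conda activate tf-gpu
pip install tensorflow                # or tensorflow-gpu for older releases
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

Run the verification line from an interactive GPU session (see below) so that a GPU is actually allocated to you.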

A second option is to use a container; we use Singularity. This solution is required if your software needs a different distribution (Ubuntu, CentOS 8, etc.). The setup and configuration of Singularity containers is more involved than Anaconda.

https://tamuengr.atlassian.net/wiki/spaces/helpdesk/pages/1987510273
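
As a minimal sketch (the image name and paths are placeholders), the commands below pull a public Docker image into a Singularity image file and run a command inside it. The --nv flag exposes the host's NVIDIA driver and GPUs to the container, so the GPU check should be run from an interactive GPU session rather than the login node.

cd /mnt/shared-scratch/<your-PI>/<your-netid>
singularity pull docker://ubuntu:20.04             # creates ubuntu_20.04.sif in the current directory
singularity exec --nv ubuntu_20.04.sif nvidia-smi  # confirm the GPU is visible inside the container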

Interactive jobs

The following command will open an interactive shell on a GPU node. This shell can be used for either Anaconda or Singularity container development.

srun -p gpu --cpus-per-task=8 --gres=gpu:tesla:1 -J gpu-job1 -q olympus-research-gpu --pty --x11=first bash

-p gpu - which partition to use

-J gpu-job1 - job name assigned in slurm

--cpus-per-task=8 - number of CPU cores to assign to job

-q olympus-research-gpu - choose the GPU QOS that you have access to. It will be one of the following: ecen-ugrad-gpu, olympus-research-gpu, or olympus-research-gpu2. If you are unsure of which QOS to use, please contact the Linux helpdesk at linux-engr-helpdesk@tamu.edu.

--pty - connects the job's stdout and stderr to your current terminal session

--x11=first - needed if using X11 forwarding for graphical display. If you are only using the terminal, this is not needed.

--gres=gpu:tesla:1 - number and type of GPUs. This requests one V100 GPU. If you would like to use the NVIDIA A100 GPUs, use --gres=gpu:a100:1
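
Once the interactive shell opens on a GPU node, you can confirm which GPU was allocated and, per the warning above, release the node as soon as you are finished:

nvidia-smi   # show the GPU(s) assigned to your session
exit         # close the shell and release the GPU when you are done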

Batch jobs

Batch jobs run in the background with no interactive shell. A script file is required to submit batch jobs to the scheduler (using sbatch). Example script files for Anaconda and Singularity jobs are shown below. The lines starting with #SBATCH are Slurm directives, not comments.

Example script using Anaconda:

#!/bin/sh
#SBATCH --job-name=Gpu_Batch             # Job name
#SBATCH -o anaconda_test.out             # Output file name
#SBATCH -e anaconda_test.err             # Error file name
#SBATCH --mail-type=ALL                  # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=your-email@tamu.edu  # Where to send mail
#SBATCH --nodes=1                        # Use one node
#SBATCH --ntasks=1                       # Run a single task
#SBATCH --cpus-per-task=8                # Number of CPU cores per task
#SBATCH --gres=gpu:tesla:1               # Type and number of GPUs
#SBATCH --partition=gpu                  # Partition/Queue to run in
#SBATCH --qos=olympus-research-gpu       # Set QOS to use
#SBATCH --time=01:00:00                  # Time limit hrs:min:sec - set to 1 hour

# enter your commands below
# set working directory if different than current directory
# cd /working/directory
# Start anaconda shell (if needed)
# Activate conda environment
# Run Python
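
The last three comment lines above are placeholders for your own commands. As a hedged sketch only, they could be filled in along these lines, assuming a Miniconda install in shared-scratch, a conda environment named tf-gpu, and a program called train.py; all three names are hypothetical.

source /mnt/shared-scratch/<your-PI>/<your-netid>/miniconda3/bin/activate  # start the Anaconda shell (path is a placeholder)
conda activate tf-gpu                                                      # activate your conda environment (name is a placeholder)
python train.py                                                            # run your Python program (name is a placeholder)

The completed script is then submitted with sbatch <scriptname>.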

Example script using singularity:

#!/bin/sh
#SBATCH --job-name=Gpu_Batch             # Job name
#SBATCH -o singularity_test.out          # Output file name
#SBATCH -e singularity_test.err          # Error file name
#SBATCH --mail-type=ALL                  # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=your-email@tamu.edu  # Where to send mail
#SBATCH --nodes=1                        # Use one node
#SBATCH --ntasks=1                       # Run a single task
#SBATCH --cpus-per-task=8                # Number of CPU cores per task
#SBATCH --gres=gpu:tesla:1               # Type and number of GPUs
#SBATCH --partition=gpu                  # Partition/Queue to run in
#SBATCH --qos=olympus-research-gpu       # Set QOS to use
#SBATCH --time=01:00:00                  # Time limit hrs:min:sec - set to 1 hour

# enter your commands below
# set working directory if different than current directory
# cd /working/directory

# Run singularity container
singularity run --nv /mnt/shared-scratch/containers/cuda_10.2-devel-ubuntu18.04.sif ~/test-script.sh

You will need a second script file when using Singularity containers in batch mode. In this example, the second script (test-script.sh) contains the commands that will be executed inside the Singularity container (/mnt/shared-scratch/containers/cuda_10.2-devel-ubuntu18.04.sif).
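
As an example of what the second script might contain (this is only an illustration, not the contents of any provided script), test-script.sh just needs to hold the commands you want to run inside the container:

#!/bin/sh
# test-script.sh - commands executed inside the Singularity container
nvidia-smi                          # confirm the GPU is visible inside the container
# python your-training-script.py    # replace with your own workload

Make the script executable with chmod +x ~/test-script.sh before submitting the batch job with sbatch.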