This document describes the ECEN Olympus cluster and how to use it to remotely access the Linux software used in the ECEN Zachry academic Linux labs, as well as GPU resources for research.
What is the Olympus Cluster
The Olympus cluster consists of the login node (olympus.ece.tamu.edu), eight non-GPU compute nodes, and five GPU compute nodes. The cluster runs scheduling software that ensures users receive the resources needed for their labs and research by distributing users' jobs across the compute nodes based on their course requirements.
Compute nodes are assigned based on the user's requirements. There is limited software installed on the Olympus head node.
Five Nodes - PowerEdge R730xd - dual Xeon E5-2650 v3 - 20 cores (40 with HT) with 256GB RAM
100 cores total
Three Nodes - PowerEdge R6525 - dual AMD EPYC 7443 - 48 cores (96 with HT) with 256GB RAM
144 cores total
Three Nodes - PowerEdge C4140 - dual Xeon Gold 6130 - 32 cores (64 with HT) with 196GB RAM, four Tesla V100s per node
96 cores and 12 Nvidia V100s total
Two Nodes - PowerEdge R750xa - dual Xeon Gold 6326 - 32 cores (64 with HT) with 256GB RAM, four Ampere A100s per node
64 cores and 8 Nvidia A100s total
Cluster Configuration and Usage Limitations
To ensure resources are available to all students, the following limitations are enforced.
Each user is allowed two simultaneous interactive sessions on the non-GPU compute nodes. Users can log in to Olympus with ssh in two different sessions and run the proper load-ecen-### command in each ssh session. Each interactive session is limited to a maximum of 12 hours.
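For example, the two-session limit works like the sketch below; ECEN 449 is used only as a placeholder course number.

    # Terminal window 1 - first interactive session
    ssh -Y netid@olympus.ece.tamu.edu
    load-ecen-449        # starts interactive session 1 on a compute node

    # Terminal window 2 - second interactive session (the per-user maximum)
    ssh -Y netid@olympus.ece.tamu.edu
    load-ecen-449        # starts interactive session 2; a third concurrent session would exceed the limit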
How to Use the Cluster
On Windows systems, install MobaXterm Personal Edition. On a Mac, install the XQuartz software. PuTTY and Xming are also an option for Windows users. Detailed instructions for accessing Olympus from off campus can be found here:
Graphical Applications on the Olympus Cluster and ECEN Interactive Machines from Off-Campus
If you are on campus, you can connect directly to Olympus from your personal computer.
1. Open MobaXterm on Windows or the Terminal program on a Mac.
2. ssh to olympus.ece.tamu.edu, i.e. ssh -Y netid@olympus.ece.tamu.edu (replace netid with your NetID).
3. Log in using your NetID password.
4. Next, you will need to connect to an available compute node. Enter the proper load-ecen-### command at the prompt and hit return. The command that you will run depends on which course you are taking. The following are valid commands:
load-ecen-248
load-ecen-350
load-ecen-403
load-ecen-425
load-ecen-449
load-ecen-454
load-ecen-468
load-ecen-474
load-ecen-475
load-ecen-651
load-ecen-655
load-ecen-676
load-ecen-704
load-ecen-714
load-ecen-749
5. Source the same file that you use in the Zachry Linux Labs.
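Putting the steps together, a typical interactive session looks like the sketch below. The course number and the path of the file you source are placeholders; use your own course's load-ecen command and the same setup file you source in the Zachry Linux labs.

    # On your own machine (MobaXterm on Windows, Terminal on a Mac)
    ssh -Y netid@olympus.ece.tamu.edu       # -Y enables X11 forwarding for graphical tools
    # ...log in with your NetID password...

    # On the Olympus head node: request a compute node for your course
    load-ecen-449

    # On the compute node: source the same environment file used in the Zachry Linux labs
    source /path/to/your/course/setup.sh    # placeholder path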
How to Start a Non-Interactive (Batch) Job
Once you have set up your environment and debugged your programs in the interactive session, you can submit a job to run in batch mode. These jobs run in the background on the cluster and do not require an active terminal session once submitted.
The GPU queue has the following limitations:
Maximum of 8 CPU cores per job
Maximum of 1 GPU per job
Maximum of 1 job running per user; you can queue multiple jobs in the system
Maximum runtime of 36 hours per job
Jobs are submitted using a script file. An example script file is located at:
/mnt/lab_files/ECEN403-404/submit-gpu.sh
This file has comment lines detailing what each command does. Copy this file to your home directory and update it to match your virtual environment and program. Once this has been done, submit the script to the scheduler using the command sbatch name_of_shell_file.sh. If you did not change the name of the script file, the command would be sbatch submit-gpu.sh. You can check the status of your job using the command qstat or squeue.
You can observe the progress of your job by checking the log files that are generated. These files are updated as your program runs.
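For orientation, a GPU submit script generally has the shape sketched below. This is only an outline with assumed names; the commented submit-gpu.sh file at the path above is authoritative, and the partition and QOS names shown here are placeholders based on the tables later in this document.

    #!/bin/bash
    #SBATCH --job-name=my-gpu-job         # name shown by squeue (placeholder)
    #SBATCH --partition=gpu               # partition name is a placeholder; use the value in submit-gpu.sh
    #SBATCH --qos=olympus-ugrad-gpu       # QOS matching your access level (see the QOS table below)
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=8             # GPU queue limit: at most 8 CPU cores
    #SBATCH --gres=gpu:1                  # GPU queue limit: at most 1 GPU
    #SBATCH --time=36:00:00               # GPU queue limit: at most 36 hours
    #SBATCH --output=slurm-%j.log         # log file you can follow while the job runs

    source ~/my-venv/bin/activate         # activate your virtual environment (placeholder)
    python my_program.py                  # your program (placeholder)

Typical submission and monitoring commands would then be:

    sbatch submit-gpu.sh          # submit the job
    squeue -u $USER               # check the status of your jobs
    tail -f slurm-<jobid>.log     # follow the log file as the job runs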
Instructions for Using Singularity Containers for GPU and specialty programs on Olympus
Singularity Containers on Olympus GPU Nodes
Nodes are grouped into partitions. The following partitions are configured:
CPU: Eight nodes - Five nodes have academic priority (academic jobs will run on these nodes first)
CPU-RESEARCH: Three nodes - research jobs will run on these nodes - requires PI approval for access
GPU: Five nodes for projects and research - requires PI/Faculty approval for access
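You can see the partitions and nodes visible to your account with Slurm's sinfo command, for example:

    sinfo                  # one line per partition and node-state group
    sinfo -N -l -p gpu     # long, per-node listing for one partition (partition name assumed)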
Resource allocation is set using Quality of Service (QOS) groups in Slurm.
QOS name | Hardware Limits | Default Time Limit | Hard Time Limit | Partition |
olympus-academic | 6 CPU cores | 12 hours | 12 hours | academic |
olympus-cpu-research | 144 CPU cores | 48 hours | 7 days | cpu-research |
olympus-ugrad-gpu | 8 CPU, 1 GPU | 36 hours | 36 hours | gpu-research |
olympus-research-gpu-sh | 16 CPU, 2 GPU | 12 hours | 12 hours | gpu-research |
olympus-research-gpu | 32 CPU, 4 GPU | 4 days | 4 days | gpu-research |
olympus-research-gpu2 | 160 CPU, 20 GPU | 7 days | 14 days | gpu-research |
QOS Uses –
olympus-academic – access to the academic partition for courses with Linux requirements
olympus-cpu-research – access to the cpu-research partition
olympus-ugrad-gpu – undergraduate access to the gpu-research partition
olympus-research-gpu – access to the gpu-research partition
olympus-research-gpu-sh – interactive job access to the gpu-research partition
olympus-research-gpu2 – unlimited access to the gpu-research partition, special case use
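The QOS and partition are selected when a job is submitted. For example (a sketch using names from the table above; each QOS requires the access approvals noted earlier, and my_job.sh is a placeholder script name):

    # Batch job under the research GPU QOS
    sbatch --partition=gpu-research --qos=olympus-research-gpu my_job.sh

    # Short interactive GPU session under the interactive QOS
    srun --partition=gpu-research --qos=olympus-research-gpu-sh --gres=gpu:1 --pty bash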