Olympus Cluster Information
This document details the ECEN Olympus cluster and how to use it to remotely access Linux software used in academic Linux labs and for research.
What is the Olympus Cluster
The Olympus cluster consists of the login node (olympus.ece.tamu.edu), 21 non-GPU compute nodes, and six GPU compute nodes. Scheduling software distributes users' jobs across the compute nodes based on each job's resource requirements, ensuring users receive the resources needed for their labs and research. Only limited software is installed on the Olympus login node, and the login node is not to be used for CPU-intensive or long-running jobs.
CPU resources
14 nodes - HPE DL385 Gen10 - Dual AMD 7F72 - 48 cores(96 with HT) with 1TB RAM
672 cores
Three nodes - PowerEdge R6525 - Dual AMD EPYC 7443 - 48 cores (96 with HT) with 256GB RAM
144 cores
Four nodes - Dual Xeon E5-2697A - 32 cores (64 with HT) with 512GB RAM
128 cores
There are a total of 944 physical cores available for CPU jobs.
GPU resources
Three nodes - PowerEdge C4140 - Dual Xeon Gold 6130 - 32 cores (64 with HT) with 196GB RAM, four Tesla V100s per node
96 cores and 12 Nvidia V100s total
Two nodes - PowerEdge R750xa - Dual Xeon Gold 6326 - 32 cores (64 with HT) with 256GB RAM, four Ampere A100s per node
64 cores and 8 Nvidia A100s total
One node - Mercury GPU208 - EPYC 9575F - 64 cores (128 with HT) with 1152GB RAM, two H200 GPUs
64 cores and 2 Nvidia H200s total
There are a total of 22 GPUs: 12 V100s, 8 A100s, and 2 H200s.
Cluster Configuration and Usage Limitations
To ensure resources are available to all students, the following limitations are enforced. Nodes are grouped into partitions; the following partitions are configured.
CPU: 24 nodes - three nodes have academic priority (academic jobs will run on these nodes first)
CPU-RESEARCH: 21 nodes - research jobs will run on these nodes - requires PI/Faculty approval for access
GPU: Six nodes for projects and research - requires PI/Faculty approval for access
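Once logged in to olympus.ece.tamu.edu, the configured partitions can be inspected with Slurm's standard sinfo command. A minimal sketch (the partition name used with -p is taken from this document; check the actual names sinfo reports on the live system):

```shell
# List all partitions, their state, and the nodes they contain.
sinfo

# One summary line per partition (node counts collapsed).
sinfo -s

# Details for a single partition, e.g. the GPU research partition.
sinfo -p gpu-research
```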
Resource allocation is set using Quality of Service (QOS) groups in Slurm.
QOS name | Hardware Limits | Default Time Limit | Hard Time Limit | Partition |
olympus-academic | 6 CPU cores | 12 hours | 12 hours | academic |
olympus-cpu-research | none | 48 hours | 7 days | cpu-research |
olympus-cesg | none | none | none | cesg* |
olympus-ugrad-gpu | 8 CPU cores, 1 GPU | 36 hours | 36 hours | gpu-research or gpu-research-sh |
olympus-research-gpu-sh | 16 CPU cores, 2 GPUs | 12 hours | 12 hours | gpu-research-sh |
olympus-research-gpu | 32 CPU cores, 4 GPUs | 4 days | 4 days | gpu-research |
olympus-research-gpu2 | none | 7 days | 14 days | gpu-research |
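Which of the QOS values above your account is allowed to use can be checked with sacctmgr, Slurm's accounting tool. A sketch, assuming the QOS names in the table above (the format= field lists are standard sacctmgr column names):

```shell
# Show the QOS values associated with your user account.
sacctmgr show assoc user=$USER format=user,account,qos

# Show the configured limits for one QOS, e.g. olympus-academic.
sacctmgr show qos olympus-academic format=name,maxtrespu,maxwall
```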
QOS Uses –
olympus-academic – access to the academic partition for courses with Linux requirements
olympus-cpu-research – access to cpu-research partition
olympus-ugrad-gpu – undergraduate access to gpu-research partition (sbatch) or gpu-research-sh (interactive)
olympus-research-gpu – access to the gpu-research partition for sbatch jobs only
olympus-research-gpu-sh – interactive job access to gpu-research partition
olympus-research-gpu2 – unlimited access to the gpu-research partition; special-case use
olympus-cesg – unlimited access to the 4 CESG nodes in the cluster. Access is restricted to CESG users.
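As a concrete illustration of selecting a partition and QOS from the tables above, a batch job script might look like the following. This is a hypothetical sketch, not an official template; the job name, resource counts, and workload are placeholders you should adapt, keeping requests within the QOS limits listed above:

```shell
#!/bin/bash
#SBATCH --job-name=example          # placeholder job name
#SBATCH --partition=gpu-research    # partition from the table above
#SBATCH --qos=olympus-research-gpu  # QOS from the table above
#SBATCH --cpus-per-task=8           # within the 32-CPU QOS limit
#SBATCH --gres=gpu:1                # request one GPU (limit is 4)
#SBATCH --time=04:00:00             # wall time, under the 4-day hard limit

# Workload goes here; this placeholder just reports the allocated GPU.
nvidia-smi
```

Submit the script with `sbatch job.sh`. For an interactive session on the gpu-research-sh partition, the equivalent request can be made with srun, e.g. `srun --partition=gpu-research-sh --qos=olympus-research-gpu-sh --gres=gpu:1 --pty bash`.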
Useful commands: