Olympus Cluster Information

This document describes the ECEN Olympus cluster and explains how to use it to remotely access the Linux software used in academic labs and for research.

What is the Olympus Cluster

The Olympus cluster consists of a login node (olympus.ece.tamu.edu), 21 non-GPU compute nodes, and six GPU compute nodes. The cluster runs scheduling software that ensures users receive the resources needed for their labs and research by distributing users' jobs across the compute nodes based on each user's requirements. Only limited software is installed on the Olympus login node, and the login node is not to be used for CPU-intensive or long-running jobs.
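Access to the cluster starts at the login node. As a sketch (assuming your cluster account name is your university NetID; check with your instructor or PI if it differs), you would connect over SSH:

    # connect to the Olympus login node (replace NetID with your own username)
    ssh NetID@olympus.ece.tamu.edu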

CPU resources

  • 14 nodes - HPE DL385 Gen10 - Dual AMD EPYC 7F72 - 48 cores (96 with HT) with 1TB RAM

672 cores

  • Three nodes - PowerEdge R6525 - Dual AMD EPYC 7443 - 48 cores (96 with HT) with 256GB RAM

144 cores

  • Four nodes - Dual Xeon E5-2697A - 32 cores (64 with HT) with 512GB RAM

128 cores

There is a total of 944 physical cores available for CPU jobs.

GPU resources

  • Three nodes - PowerEdge C4140 - Dual Xeon Gold 6130 - 32 cores (64 with HT) with 196GB RAM, 4 Tesla V100s per node

96 cores and 12 Nvidia V100s total

  • Two nodes - PowerEdge R750xa - Dual Xeon Gold 6326 - 32 cores (64 with HT) with 256GB RAM, 4 Ampere A100s per node

64 cores and 8 Nvidia A100s total

  • One node - Mercury GPU208 - EPYC 9575F - 64 cores (128 with HT) with 1152GB RAM, two H200 GPUs

64 cores and 2 Nvidia H200s total

There is a total of 22 GPUs: 12 V100s, 8 A100s, and 2 H200s.

Cluster Configuration and Usage Limitations

To ensure resources are available to all students, the following limitations are enforced. Nodes are grouped into partitions; the following partitions are configured:

CPU: 24 nodes - three nodes have academic priority (academic jobs will run on these nodes first)

CPU-RESEARCH: 21 nodes - research jobs will run on these nodes - requires PI/Faculty approval for access

GPU: Six nodes for projects and research - requires PI/Faculty approval for access
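To see which partitions exist and which nodes belong to them, Slurm's sinfo command can be run from the login node. A minimal sketch (the partition name below is taken from the QOS table that follows):

    # summarize all partitions: node counts, states, and time limits
    sinfo -s

    # list the individual nodes in one partition, e.g. gpu-research
    sinfo -N -p gpu-research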

Resource allocation is set using Quality of Service (QOS) groups in Slurm.

QOS name                  Hardware Limits   Default Time Limit   Hard Time Limit   Partition
olympus-academic          6 CPU cores       12 hours             12 hours          academic
olympus-cpu-research      none              48 hours             7 days            cpu-research
olympus-cesg              none              none                 none              cesg*
olympus-ugrad-gpu         8 CPU, 1 GPU      36 hours             36 hours          gpu-research or gpu-research-sh
olympus-research-gpu-sh   16 CPU, 2 GPU     12 hours             12 hours          gpu-research-sh
olympus-research-gpu      32 CPU, 4 GPU     4 days               4 days            gpu-research
olympus-research-gpu2     none              7 days               14 days           gpu-research
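For example (a sketch only; the job name, resource amounts, and the train.py script are placeholders, not part of the cluster documentation), a batch job submitted under the olympus-research-gpu QOS could request its partition, QOS, GPUs, and walltime like this:

    #!/bin/bash
    #SBATCH --job-name=example-job         # placeholder job name
    #SBATCH --partition=gpu-research       # partition from the table above
    #SBATCH --qos=olympus-research-gpu     # QOS from the table above
    #SBATCH --gres=gpu:1                   # one GPU (the QOS allows up to 4)
    #SBATCH --cpus-per-task=8              # within the 32-CPU limit
    #SBATCH --time=1-00:00:00              # 1 day, under the 4-day hard limit
    #SBATCH --output=%x-%j.out             # log file named from job name and ID

    # commands to run on the compute node go here, e.g.
    python train.py

The script would be submitted with "sbatch jobscript.sh", and "squeue -u $USER" shows its state in the queue.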

QOS Uses –

olympus-academic – access to the academic partition for courses with Linux requirements.

olympus-cpu-research – access to cpu-research partition

olympus-ugrad-gpu – undergraduate access to gpu-research partition (sbatch) or gpu-research-sh (interactive)

olympus-research-gpu – access to the gpu-research partition for sbatch jobs only

olympus-research-gpu-sh – interactive job access to gpu-research partition

olympus-research-gpu2 – unlimited access to the gpu-research partition, special-case use

olympus-cesg – unlimited access to the 4 cesg nodes in the cluster; access is restricted to CESG users.
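For the interactive QOSes (olympus-research-gpu-sh and olympus-ugrad-gpu), an interactive shell on a GPU node can be requested with srun. The sketch below assumes your account has already been granted the QOS; the resource amounts and time are only examples:

    # request an interactive shell with one GPU for two hours under the interactive QOS
    srun --partition=gpu-research-sh --qos=olympus-research-gpu-sh \
         --gres=gpu:1 --cpus-per-task=4 --time=02:00:00 --pty bash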

Useful commands: