
This document details how to use the ECEN Olympus cluster to remotely access software used in academic Linux labs and for research.

What is the Cluster?

The Olympus cluster consists of a login node (olympus.ece.tamu.edu), eight non-GPU compute nodes, and five GPU compute nodes. Scheduling software distributes users across the compute nodes based on their course requirements, ensuring each user receives the resources needed for their labs. Only limited software is installed on the Olympus head node.

Nodes 1-5: PowerEdge R730XD, dual Xeon E5-2650 v3, 20 cores (40 with HT) per node, 256GB RAM (100 cores total)

Nodes 6-8: PowerEdge R6525, dual AMD EPYC 7443, 48 cores (96 with HT) per node, 256GB RAM (144 cores total)

Nodes 9-11: PowerEdge C4140, dual Xeon Gold 6130, 32 cores (64 with HT) per node, 196GB RAM, 4 Tesla V100s per node (96 cores and 12 V100s total)

Nodes 12-13: PowerEdge R750xa, dual Xeon Gold 6326, 32 cores (64 with HT) per node, 256GB RAM, 4 Tesla A100s per node (64 cores and 8 A100s total)

Cluster Usage Limitations

To ensure resources are available to all students, the following limits are enforced.

Nodes are grouped into partitions. The following partitions are configured:

CPU: nodes 1-8. Nodes 1-5 have academic priority, so jobs run on them first.

CPU-RESEARCH: nodes 6-8. Research jobs run on these nodes; requires PI approval.

GPU: nodes 9-13, for coursework and research; requires PI approval.
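
You can check the current partition layout and node status yourself from the Olympus login node using Slurm's sinfo command (this assumes the standard Slurm client tools are available there, which is how the examples below are written):

    # List the partitions, their time limits, and the nodes assigned to them
    sinfo

    # Show per-node details such as CPUs, memory, and node state
    sinfo -N -l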

Resource allocation is set using Quality of Service (QOS) settings in Slurm.

QOS name | Hardware Limits | Default Time Limit | Hard Time Limit | Partition
Ugrad (academic) | 4 CPU cores | 12 hours | 12 hours | CPU
Grad (academic) | 6 CPU cores | 12 hours | 12 hours | CPU
Research | 12 CPU cores | 48 hours | 48 hours | CPU-RESEARCH
Ecen-ugrad-gpu | 8 CPU cores, 1 GPU | 36 hours | 36 hours | GPU
Olympus-research-gpu | 32 CPU cores, 4 GPUs | 4 days | 4 days | GPU
Olympus-research-gpu2 | 160 CPU cores, 20 GPUs | 7 days | 21 days | GPU

Link for Academic (ECEN lab) users. Link for Research users.
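
If you have research access and want to request resources from Slurm directly (rather than through the interactive load commands described below), an srun request along the following lines can be used. This is only a sketch; the partition and QOS names are taken from the table above, must match the live configuration, and your account must be authorized for them:

    # Request an interactive shell with 4 CPU cores for 2 hours on the research CPU nodes
    # (partition and QOS names assumed from the table above)
    srun --partition=CPU-RESEARCH --qos=research --cpus-per-task=4 --time=02:00:00 --pty bash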

Non-GPU limitations:

  1. Undergraduate users (academic)

    1. are allowed two simultaneous interactive sessions on the non-GPU compute nodes. Log in to Olympus over ssh in two separate sessions and run the proper load-ecen-### command in each ssh session.

    2. Each interactive session is limited to a maximum of 12 hours.

  2. Graduate users (academic)

    1. are allowed to use up to eight cores on the non-GPU compute nodes. Log in to Olympus over ssh in up to four separate sessions and run the proper load-ecen-### command in each ssh session.

    2. Each interactive session is limited to a maximum of 12 hours.

  3. Research users

    1. are allowed to use up to 10 cores on the non-GPU compute nodes.

    2. Each job is limited to a maximum of 48 hours.
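
To check your current usage against these limits, you can list your own jobs from the Olympus login node:

    # Show your running and pending jobs (interactive sessions appear here too)
    squeue -u $USER

    # Cancel a job or session you no longer need, using the JOBID shown by squeue
    scancel <jobid>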

GPU Limitations

GPU nodes are available for faculty and students for approved instructional and research use. If you need GPU access, please have your professor contact the Linux support team.

  1. Undergraduate users

    1. are limited to 8 CPU cores and 1 GPU.

  2. Graduate/Research users

    1. are limited to a total of 32 CPU cores and 4 GPUs.
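
Once your access has been approved, the general shape of an interactive GPU request in Slurm looks like the sketch below. The Singularity instructions linked later in this document describe the supported GPU workflow on Olympus, and the partition and QOS names here are assumptions based on the table above:

    # Request 1 GPU and 8 CPU cores for 4 hours (within the undergraduate GPU limits)
    # Partition and QOS names are assumed from the table above
    srun --partition=GPU --qos=ecen-ugrad-gpu --gres=gpu:1 --cpus-per-task=8 --time=04:00:00 --pty bash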

How to Use the Cluster

Requirements to Log in to Olympus

  1. You will need an SSH/X Windows client on your computer.

    1. On Windows systems, install MobaXterm Personal Edition.

    2. PuTTY and Xming are also an option for Windows users.

    3. On macOS, install the XQuartz software. Detailed instructions for accessing Olympus from off campus can be found here:

Graphical Applications on the Olympus Cluster and ECEN Interactive Machines from Off-Campus

How to log in to Olympus

  1. Open MobaXterm on Windows or the Terminal program on macOS.

  2. ssh to olympus.ece.tamu.edu, e.g. ssh -Y netid@olympus.ece.tamu.edu (replace netid with your NetID).

  3. Log in using your NetID password.

  4. For non-GPU academic users, you will need to connect to an available compute node. Enter the proper load-ecen-### command at the prompt and press Enter. The command to run depends on which course you are taking (a complete example session is shown after this list). The following are valid commands:

    1. load-ecen-248

    2. load-ecen-350

    3. load-ecen-403

    4. load-ecen-425

    5. load-ecen-449

    6. load-ecen-454

    7. load-ecen-468

    8. load-ecen-474

    9. load-ecen-475

    10. load-ecen-620

    11. load-ecen-625

    12. load-ecen-651

    13. load-ecen-655

    14. load-ecen-676

    15. load-ecen-680

    16. load-ecen-704

    17. load-ecen-714

    18. load-ecen-720

    19. load-ecen-749

  5. Source the same file that you use in the Zachry Linux Labs.

  6. For CPU research users, the following interactive load commands are available:

    1. load-2core - creates a 2-core job on a CPU node

    2. load-4core - creates a 4-core job on a CPU node

  7. For GPU users, see the instructions below on setting up containers using Singularity. Singularity is similar to Docker and allows you to create custom environments for your GPU jobs, including running different versions of Linux inside the container.
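
Putting steps 1-5 together, a typical academic (non-GPU) session looks something like the following. The course number is only an example, and the setup file you source is the same one you already use in the Zachry Linux labs (the path below is a placeholder):

    # From your own computer: connect to the login node with X forwarding
    ssh -Y netid@olympus.ece.tamu.edu

    # On Olympus: get a shell on a compute node for your course (example course number)
    load-ecen-449

    # On the compute node: source the same setup file you use in the Zachry Linux labs
    # (placeholder path - substitute the file your course instructions specify)
    source /path/to/your/course/setup-file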

Instructions for Using Singularity Containers for GPU and specialty programs on Olympus

Singularity Containers on Olympus GPU Nodes

Once you have set up and debugged your environment and programs in an interactive GPU session, you can submit a job to run in batch mode.

How to start a non-interactive (batch) job

These jobs run in the background on the cluster and do not require an active terminal session once submitted.  

The GPU queue has the following limitations:  

  1. Maximum of 8 CPU cores per job

  2. Maximum of 1 GPU per job

  3. Maximum of 1 job running per user. You can queue multiple jobs in the system.

  4. Maximum runtime of 36 hours per job

Jobs are submitted using a script file.  An example script file is located at:

/mnt/lab_files/ECEN403-404/submit-gpu.sh

This file has comment lines explaining what each command does. Copy it to your home directory and update it to match your virtual environment and program. Once this is done, submit the script to the scheduler with sbatch name_of_shell_file.sh; if you did not change the name of the script file, the command would be sbatch submit-gpu.sh. You can check the status of your job with qstat or squeue.
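
The copy at /mnt/lab_files/ECEN403-404/submit-gpu.sh is the authoritative template. Purely to illustrate the general shape of such a script (the directive values, container name, and program below are placeholders, not the contents of the real file), a minimal GPU batch script looks something like this:

    #!/bin/bash
    #SBATCH --job-name=my-gpu-job        # name shown in squeue (placeholder)
    #SBATCH --partition=GPU              # partition name assumed from the table above
    #SBATCH --qos=ecen-ugrad-gpu         # QOS name assumed from the table above
    #SBATCH --gres=gpu:1                 # maximum of 1 GPU per job in this queue
    #SBATCH --cpus-per-task=8            # maximum of 8 CPU cores per job
    #SBATCH --time=36:00:00              # stay within the 36-hour runtime limit
    #SBATCH --output=slurm-%j.log        # log file updated while the job runs

    # Run your program inside your Singularity container (names are placeholders)
    singularity exec --nv my-container.sif python3 my_program.py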

You can observe the progress of your job by checking the log files that are generated.  These files are updated as your program runs.
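
For example, if your submit script writes its output to a slurm-<jobid>.log file, you can follow it live from the login node:

    # Follow the job's log output as it is written (replace 12345 with your job ID)
    tail -f slurm-12345.log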
