
CPU and GPU Resources on the OSCER Supercomputer

With funding from the OU Vice President for Research, the Data Institute for Societal Challenges (DISC) has acquired a set of CPU and GPU nodes for the OSCER Supercomputer, as well as large-scale storage on OURdisk. These resources are available for use by all members of DISC located on the OU campuses, including faculty, researchers, postdocs, and students.

 


Resources

Purchased resources (only some are available at this time):

  • 2 x 64-core CPU-only nodes with 2 TB of RAM

  • 1 x quad 80 GB A100 node

  • 5 x dual 80 GB A100 nodes

  • 2 x dual 80 GB H100 nodes

  • 93 TB of OURdisk storage


Getting Access

 


More Information

The queues associated with the standard DISC partitions operate according to the typical OSCER policies, which take into account the resource requests of specific jobs and the recent resource utilization of specific users. A key effect of these policies is that resource allocations are prioritized to balance available resources across users. For the purposes of meeting deadlines for submission of papers, research proposals/reports, theses, or dissertations, users can request short-term, high-priority access to the DISC resources. Jobs in the high-priority partitions will generally be executed before jobs in the standard partitions, though they will not interrupt currently executing jobs.

Proposals should include:

  • Project title and short narrative (1 paragraph)

  • Name and username of requester and (if applicable) name of supervisor

  • Resources requested: CPUs (number of processes and threads), GPUs (number), and memory footprint

  • Requested duration of the high-priority access, in months (typically 1-2 months)

  • The nature of the deadline that the user is working to meet

  • Short discussion of steps taken to optimize the use of the requested resources

Send proposals (pdf format) to: disc@ou.edu
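
Once a request has been approved, jobs can be pointed at the corresponding high-priority partition. The high-priority partition names are listed in the table further down; as an illustration, a job targeting the dual-A100 high-priority partition would include:

#SBATCH --partition=disc_dual_a100_hp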

 

 

Because these resources are shared by a large number of users, it is important for all of us to take steps to make efficient use of them.  Specific steps include:

  • Reserve appropriate amounts of memory and numbers of CPU threads

  • Reserve GPUs as part of your resource request, and only use GPUs that have been explicitly assigned to you

  • Optimize use of allocated GPUs (our goal is that allocated GPUs be used at near 100% capacity); a quick utilization check is sketched below
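
One way to verify that an allocated GPU is actually being exercised is to query it from within the running job (for example, from a second terminal on the assigned node). This is a minimal sketch using standard NVIDIA tooling; nothing DISC-specific is assumed:

nvidia-smi --query-gpu=index,utilization.gpu,memory.used --format=csv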

The current partition state is as follows.  Note that not all nodes have been installed yet.

Partition Name  | High Priority Name | Notes                              | Nodes          | Threads Available | Memory Available | GPUs Available | Max LSCRATCH
disc            | n/a                | All DISC-owned nodes               | Currently 12   | n/a               | n/a              | n/a            | n/a
disc_largemem   | disc_largemem_hp   | Not yet installed                  | 2 (c915-c916)  | 128               | 2 TB             | 0              | 852 GB
disc_dual_a100  | disc_dual_a100_hp  | 4/5 installed; GPU: 2x A100 80 GB  | 5 (c862-c866)  | 128               | 500 GB           | 2              | 852 GB
disc_quad_a100  | disc_quad_a100_hp  | GPU: 4x A100 80 GB                 | 1 (c856)       | 128               | 1 TB              | 4              | 852 GB
disc_dual_h100  | disc_dual_h100_hp  | GPU: 2x H100 80 GB                 | 4 (c849-c852)  | 128               | 500 GB           | 2              | 852 GB

Threads, memory, GPU, and Max LSCRATCH figures are per node.
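
The live state of these partitions can also be checked with the standard Slurm tools. For example, using the partition names from the table above:

sinfo --partition=disc
sinfo -N --partition=disc_dual_a100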

Storage options include:

  • Your own home directory (~20 GB): /home/username

  • Temporary storage of data and results: /scratch/username. Data here has a limited lifetime (~2 weeks) and will be automatically deleted
  • Medium-term storage of data:
    • DISC OURdisk partition: /ourdisk/hpc/disc/
    • This space is managed entirely by the users; we expect that unneeded data will be removed in a timely fashion
    • Never place conda environments in OURdisk
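
Because the OURdisk space is user-managed, it is worth checking occasionally how much of it you are holding. A minimal sketch (the subdirectory name is a placeholder for your own project directory):

du -sh /ourdisk/hpc/disc/your_project_directory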

 

 

Local Scratch Disk

Local scratch disks are high-speed SSDs attached to each compute node; a per-job scratch space is created on them dynamically when your job starts.

  • The name of your assigned local scratch space is stored in the $LSCRATCH environment variable

  • The size of your local scratch is 852 GB * cpus-per-task / available_threads (for example, requesting 10 CPU threads on a 128-thread node yields 852 GB * 10 / 128 ≈ 66.5 GB)

  • The local scratch is destroyed as soon as your job completes, so do not plan to store results here permanently (a staging sketch follows this list)

  • All 12 nodes are up
  • DISC OURdisk space is available by request
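
To illustrate the intended workflow, the batch-file fragment below stages input data onto the local scratch disk, works on the fast local copy, and copies results back before the job ends. It is a sketch only: the program name and the OURdisk project directory are placeholders.

# Stage input onto the node-local SSD (the /ourdisk project path is a placeholder)
cp /ourdisk/hpc/disc/your_project/input.dat $LSCRATCH/

# Run against the fast local copy (program name is a placeholder)
./my_program --input $LSCRATCH/input.dat --output $LSCRATCH/results.out

# Copy results back to persistent storage before the job ends;
# $LSCRATCH is destroyed when the job completes
cp $LSCRATCH/results.out /ourdisk/hpc/disc/your_project/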

Batch files specify the details of your jobs, including the resource request and the specific program to execute. For the DISC nodes, your resource request should include the following lines (as applicable):

 

Select one of the disc partitions:

#SBATCH --partition=disc

 

Select a specific node or list of nodes to execute on (optional):

#SBATCH --nodelist=c851

 

Maximum physical memory to use (example: 15 gigabytes):

#SBATCH --mem=15G

 

Maximum number of threads your program will use (example: 10 threads):

#SBATCH --cpus-per-task=10

 

Number of GPUs (example: 2 GPUs):

#SBATCH --gpus-per-node=2

(never use GPUs without including this reservation, as you can otherwise disrupt other users)
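
Putting these directives together, a complete batch file might look like the sketch below. The job name, output file, time limit, and final command are illustrative placeholders, and the optional --nodelist line has been omitted; adjust everything for your own work.

#!/bin/bash
#SBATCH --partition=disc
#SBATCH --cpus-per-task=10
#SBATCH --mem=15G
#SBATCH --gpus-per-node=2
#SBATCH --job-name=disc_example
#SBATCH --output=disc_example_%j.out
#SBATCH --time=12:00:00

# Placeholder command: replace with your own program
python my_training_script.py

Submit the batch file with sbatch:

sbatch my_batch_file.sh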

 

 

For more information, please contact Chongle Pan at cpan@ou.edu and Andrew Fagg at andrewhfagg@gmail.com.