Table of Contents
- Basic SLURM Usage
- Environment Modules (LMOD)
- Running Jobs on Sooner (EL9/EL7 Containers)
- Sample Batch Scripts
- Monitoring Jobs
- Using Job Arrays
Basic SLURM Usage
All jobs run on the HPC system must be submitted through the SLURM batch scheduling system. To run a job:
1) Create a batch script that specifies the resources required (e.g. CPU, memory, runtime, etc.) and the commands needed to execute your job.
2) Submit the batch script using sbatch <script_name>
3) View your submitted jobs with: squeue -u <username>
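The three steps above can be wrapped in a small shell function. This is an illustrative sketch only: submit_and_watch is a hypothetical helper, and it relies on sbatch --parsable, which prints just the job ID (or jobid;cluster) instead of the usual "Submitted batch job" message.

```bash
#!/usr/bin/env bash
# Hypothetical helper: submit a batch script, then show it in the queue.
submit_and_watch() {
  local script="$1"
  local out jobid

  # Step 2: submit. --parsable prints "jobid" or "jobid;cluster".
  out=$(sbatch --parsable "$script") || return 1
  jobid=${out%%;*}   # keep only the job ID
  echo "Submitted job $jobid"

  # Step 3: list this user's jobs (a new job usually shows as PENDING first).
  squeue -u "$USER"
}
```

For example, submit_and_watch <script_name> submits the script and then lists your jobs.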
Below are some common SLURM commands used to manipulate jobs:
- scancel <jobID> cancels a job (running or pending)
- scontrol hold <jobID> places a pending job on hold
- scontrol release <jobID> releases a job from hold
- scontrol update JobID=<jobID> <options> modifies job attributes (e.g. job name, partition)
For more details, see the SLURM documentation or run man <command> (for example, man sbatch).
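These commands combine well with squeue. As a sketch (hold_all_pending is a hypothetical name, and plain, non-array job IDs are assumed), the function below places every pending job you own on hold, listing their IDs with squeue -h (no header) and -o %i (job ID only).

```bash
#!/usr/bin/env bash
# Hypothetical helper: put all of the current user's pending jobs on hold.
hold_all_pending() {
  local jobid
  # -h drops the header line; -o %i prints only the job ID column.
  squeue -u "$USER" -t PENDING -h -o %i | while read -r jobid; do
    [ -n "$jobid" ] || continue
    scontrol hold "$jobid"
    echo "held $jobid"
  done
}
```

Release the jobs later with scontrol release <jobID>.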
Environment Modules (LMOD)
Both Sooner and Schooner provide a wide range of software, compilers, and libraries. Multiple versions of the same software may be available. LMOD allows you to manage your environment by loading only the modules you need. Modules use the following naming format: <application_name>/<application_version>-<compiler_type>
Example: Python/3.13.5-GCCcore-14.3.0
Example: MATLAB/2025a (no compiler specified)
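To make the naming format concrete, the hypothetical function below splits a module name into its three parts using bash parameter expansion. It assumes the version string itself contains no hyphen, which holds for the examples above but may not hold for every module.

```bash
#!/usr/bin/env bash
# Split <application_name>/<application_version>-<compiler_type> into parts.
# Assumption: the version contains no "-".
parse_module_name() {
  local name="$1"
  local app="${name%%/*}"        # text before the first "/"
  local rest="${name#*/}"        # text after the first "/"
  local version="${rest%%-*}"    # text before the first "-"
  local compiler=""
  if [[ "$rest" == *-* ]]; then
    compiler="${rest#*-}"        # everything after the first "-"
  fi
  echo "app=$app version=$version compiler=${compiler:-none}"
}
```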
Accessing Modules on Sooner
To use modules on Sooner, enter an EL9 container by typing el9. After configuring your environment, exit the container before submitting jobs by typing exit. To access modules that were installed on Schooner (EL7), type el7 instead.
Some commands to interact with the modules are as follows:
- module avail lists all available modules. Modules marked with (D) are the default versions, which are loaded when you do not specify a version.
- module load <application> loads the default version of a module and its dependencies. module load <application_name>/<application_version>-<compiler_type> loads that specific version and its dependencies.
- module list displays the currently loaded modules.
- module unload <application> removes a specific module from your environment.
- module purge removes all loaded modules.
Running Jobs on Sooner
Sooner uses containerized environments based on Enterprise Linux 9 (EL9) and Enterprise Linux 7 (EL7) to ensure compatibility with software originally built on Schooner (EL7).
Before submitting jobs on Sooner, ensure you are NOT inside a container. If the module command is available (e.g. module list runs without a "command not found" error), you are still inside a container. Exit by typing exit.
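The check described above can be scripted. This sketch follows the same rule of thumb (if module resolves to a command or shell function, you are probably still inside a container); the function names are hypothetical. command -v also succeeds for shell functions, which matters because LMOD defines module as one.

```bash
#!/usr/bin/env bash
# Rule of thumb from the text: if "module" can be found, you are most likely
# still inside an EL9/EL7 container and should type "exit" before submitting.
inside_container() {
  # command -v succeeds for external commands and for shell functions
  command -v module >/dev/null 2>&1
}

check_before_submit() {
  if inside_container; then
    echo "module command found: type 'exit' to leave the container before running sbatch" >&2
    return 1
  fi
  echo "OK: not inside a container"
}
```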
Sample Batch Scripts
A SLURM job script typically consists of two parts: resource requests and job steps. Requesting resources involves indicating the number of CPUs, memory, runtime, and other such requirements using #SBATCH directives. Job steps specify the commands that will run on the compute node.
Serial Job Example
The example below shows a simple serial (non-parallel) job on Sooner. It prints the date and the name of the node where the job executes (using hostname), then runs a Python script.
You can create or edit this script using any text editor. If you do not have a preferred editor, nano is a simple option for beginners. Script filenames are flexible, but using a .sbatch extension is recommended to distinguish them from other shell scripts. In this example, single-test.sbatch requests a single core, 1GB of RAM, and a 1-hour time limit.
#!/bin/bash
#SBATCH --partition=sooner_test
#SBATCH --container=el9hw
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=1GB
#SBATCH --time=01:00:00
#SBATCH --job-name=jobname
#SBATCH --output=jobname_%J_stdout.txt
#SBATCH --error=jobname_%J_stderr.txt
#SBATCH --mail-user=youremailaddress@yourinstitution.edu
#SBATCH --mail-type=ALL
#SBATCH --chdir=/path/to/working/directory
# Load modules and set environment
module purge
module load Python/3.13.5-GCCcore-14.3.0
# Display the node where the job is running
echo "Job started at: $(date) on $(hostname)"
# Run your command
python script.py
After saving this file as single-test.sbatch, you can optionally make it executable (sbatch reads the script directly, so this step is not required):
chmod +x single-test.sbatch
Then submit the job with:
sbatch single-test.sbatch
Explanation of SBATCH Directives
The following directives control how your job is scheduled and executed:
#SBATCH --partition=sooner_test
Specifies the partition (queue) where the job will run. In most cases, use sooner_test. For a list of all available partitions, see this link, or run sinfo to list the partitions you can access.
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
Requests one task with one CPU core. This configuration runs a serial (non-parallel) job.
#SBATCH --mem=1GB
Specifies the amount of memory required per node (the default unit is MB). Accurate memory requests help the scheduler allocate resources efficiently and avoid oversubscription.
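To see how a --mem value translates into megabytes, here is a small conversion sketch. The rules it encodes are assumptions for illustration: a bare number means MB (the default noted above), the K/M/G/T suffixes are treated as 1024-based, and an optional trailing B (as in 1GB) is accepted. mem_to_mb is a hypothetical name.

```bash
#!/usr/bin/env bash
# Convert a --mem value to megabytes (illustrative assumptions):
#   no suffix -> MB, K/M/G/T -> 1024-based units, optional trailing "B".
mem_to_mb() {
  local spec="${1^^}"           # upper-case, e.g. 1gb -> 1GB
  spec="${spec%B}"              # drop an optional trailing B: 1GB -> 1G
  local num="${spec%[KMGT]}"    # numeric part
  local unit="${spec#"$num"}"   # unit suffix, possibly empty
  [[ "$num" =~ ^[0-9]+$ ]] || return 1
  num=$(( 10#$num ))            # force base 10 (e.g. "08")
  case "$unit" in
    "" | M) echo "$num" ;;
    K) echo $(( (num + 1023) / 1024 )) ;;   # round up to a whole MB
    G) echo $(( num * 1024 )) ;;
    T) echo $(( num * 1024 * 1024 )) ;;
  esac
}
```

For example, mem_to_mb 1GB prints 1024.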
#SBATCH --output=jobname_%J_stdout.txt
#SBATCH --error=jobname_%J_stderr.txt
Defines files for standard output and error messages. %J is replaced with the job ID at runtime.
#SBATCH --time=01:00:00
Sets the maximum runtime (HH:MM:SS). Most partitions allow up to 48 hours, specified as 48:00:00 or 2-00:00:00.
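The two ways of writing 48 hours describe the same limit. The sketch below (time_to_seconds is a hypothetical helper) converts both forms to seconds; it deliberately handles only HH:MM:SS and D-HH:MM:SS, although SLURM accepts other formats as well.

```bash
#!/usr/bin/env bash
# Convert a --time value to seconds. Only HH:MM:SS and D-HH:MM:SS are
# handled here; other SLURM time formats are rejected.
time_to_seconds() {
  local spec="$1" days=0 hms="$1" h m s
  local re='^[0-9]+:[0-9]{2}:[0-9]{2}$'
  if [[ "$spec" == *-* ]]; then
    days="${spec%%-*}"
    hms="${spec#*-}"
  fi
  [[ "$days" =~ ^[0-9]+$ && "$hms" =~ $re ]] || return 1
  IFS=: read -r h m s <<< "$hms"
  # 10# forces base 10 so values such as "08" are not read as octal.
  echo $(( 10#$days * 86400 + 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}
```

Both time_to_seconds 48:00:00 and time_to_seconds 2-00:00:00 print 172800.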
#SBATCH --job-name=jobname
Assigns a name to the job. This name appears in squeue.
#SBATCH --mail-user=youremailaddress@yourinstitution.edu
Indicates the e-mail address to send notifications to.
#SBATCH --mail-type=ALL
Notifies when the job begins, ends, or fails. If this directive is omitted, notifications are typically sent only on failure.
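#SBATCH --chdir=/path/to/working/directory
Sets the directory where the job will run. This directory must exist before you submit the job. Give an absolute path: lines starting with #SBATCH are not processed by the shell, so variables such as $USER are not expanded there.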
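Because the working directory must exist before the job is submitted, one option is to create it and pass --chdir on the sbatch command line, where your own shell expands variables. Options given on the sbatch command line override the matching #SBATCH directives in the script. submit_in_dir is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: create the working directory, then submit with
# --chdir on the command line (overrides any #SBATCH --chdir in the script).
submit_in_dir() {
  local workdir="$1" script="$2"
  mkdir -p "$workdir" || return 1
  sbatch --chdir="$workdir" "$script"
}
```

For example: submit_in_dir "/home/$USER/working_directory" single-test.sbatch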
echo "Job started at: $(date) on $(hostname)"
python script.py
These commands print the date and time and the name of the compute node running the job, and then run a Python script. Replace them with your actual application, script, or workflow.
Parallel Job Example
The batch script below can be used for a simple parallel job on Sooner. It runs 40 MPI processes, 20 per node.
You can copy the MPI example program from the Wikipedia article on the Message Passing Interface and save it as mpi_example.c in your working directory.
Compile it from inside the EL9 container (type el9 first, since modules are only available there) using the following commands:
module load OpenMPI/5.0.8-GCC-14.3.0
mpicc mpi_example.c -o hello.mpi
Then submit the following batch script. See the serial job example above for more information about naming and submitting batch scripts:
#!/bin/bash
#SBATCH --partition=sooner_test
#SBATCH --container=el9hw
#SBATCH --exclusive
#SBATCH --nodes=2
#SBATCH --mem=0
#SBATCH --time=00:10:00
#SBATCH --ntasks=40
#SBATCH --ntasks-per-node=20
#SBATCH --job-name=mpi_job
#SBATCH --output=jobname_%J_stdout.txt
#SBATCH --error=jobname_%J_stderr.txt
#SBATCH --chdir=/path/to/working/directory
# Load modules and set environment
module purge
module load OpenMPI/5.0.8-GCC-14.3.0
# Run your MPI program
mpirun hello.mpi
Explanation of SBATCH Directives
#SBATCH --exclusive
Requests exclusive access to the participating compute nodes. This prevents other users’ jobs from running on the same nodes and reduces resource contention that could affect performance.
#SBATCH --mem=0
A value of 0 requests all of the memory on each allocated node.
#SBATCH --nodes=2
#SBATCH --ntasks=40
#SBATCH --ntasks-per-node=20
Requests two compute nodes and a total of 40 MPI tasks, with 20 MPI tasks running on each node. This example uses 20 MPI tasks per node to evenly distribute processes across the allocated nodes. Different node types may have different core counts (for example, some nodes have 24 cores), so ntasks-per-node should generally match the available cores on the nodes being used.
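When node types differ in core count, it helps to check the count before choosing --nodes and --ntasks-per-node. The sketch below assumes one MPI task per core; it reads the CPUs per node with sinfo -o %c (the output may contain several lines if a partition mixes node types, in which case only the first is used) and computes the smallest node count with ceiling division. Both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: choose --nodes for a total task count, assuming one task per core.

# CPUs per node as reported by sinfo (-h = no header, %c = CPUs per node).
cores_per_node() {
  sinfo -p "$1" -h -o %c | head -n 1
}

# Smallest number of nodes that holds all tasks (ceiling division).
nodes_needed() {
  local ntasks="$1" per_node="$2"
  echo $(( (ntasks + per_node - 1) / per_node ))
}
```

With 20 tasks per node, nodes_needed 40 20 gives 2, matching the example script.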
module purge
module load OpenMPI/5.0.8-GCC-14.3.0
These commands reset the module environment and then load OpenMPI. In practice, load whichever modules your program needs.
mpirun hello.mpi
This command launches the hello.mpi program compiled earlier across the 40 allocated tasks.
When the job completes successfully, the output file should look like this:
We have 40 processes.
Process 1 reporting for duty.
Process 2 reporting for duty.
Process 3 reporting for duty.
Process 4 reporting for duty.
Process 5 reporting for duty.
Process 6 reporting for duty.
Process 7 reporting for duty.
Process 8 reporting for duty.
Process 9 reporting for duty.
Process 10 reporting for duty.
Process 11 reporting for duty.
Process 12 reporting for duty.
Process 13 reporting for duty.
Process 14 reporting for duty.
Process 15 reporting for duty.
Process 16 reporting for duty.
Process 17 reporting for duty.
Process 18 reporting for duty.
Process 19 reporting for duty.
Process 20 reporting for duty.
Process 21 reporting for duty.
Process 22 reporting for duty.
Process 23 reporting for duty.
Process 24 reporting for duty.
Process 25 reporting for duty.
Process 26 reporting for duty.
Process 27 reporting for duty.
Process 28 reporting for duty.
Process 29 reporting for duty.
Process 30 reporting for duty.
Process 31 reporting for duty.
Process 32 reporting for duty.
Process 33 reporting for duty.
Process 34 reporting for duty.
Process 35 reporting for duty.
Process 36 reporting for duty.
Process 37 reporting for duty.
Process 38 reporting for duty.
Process 39 reporting for duty.
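Since the output contains one "reporting for duty" line for each of ranks 1 through 39, a quick sanity check is to count those lines. The function below is a hypothetical sketch that assumes the output format shown above.

```bash
#!/usr/bin/env bash
# Sketch: confirm that ranks 1..N-1 all appear in the MPI example output.
check_mpi_output() {
  local file="$1" ntasks="$2" count
  [ -r "$file" ] || return 1
  count=$(grep -c 'reporting for duty' "$file")
  if [ "$count" -eq $(( ntasks - 1 )) ]; then
    echo "OK: $count of $(( ntasks - 1 )) ranks reported"
  else
    echo "MISMATCH: expected $(( ntasks - 1 )), found $count" >&2
    return 1
  fi
}
```

For example: check_mpi_output jobname_<jobID>_stdout.txt 40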
GPU Job Example
The sooner_gpu_test partition contains GPU cards that can significantly accelerate certain parallel computing tasks through CUDA. Access to GPU nodes is available for free to OSCER users.
To request a GPU, use the #SBATCH --gres=gpu:<count> SLURM directive in your job script. The scheduler automatically selects a GPU node based on availability and the other resources requested. To request a specific GPU type, use #SBATCH --gres=gpu:<GPU_type>:<count>.
Below is a template batch script that requests 1 GPU with 1 CPU core and 2GB of RAM.
#!/bin/bash
#SBATCH --partition=sooner_gpu_test
#SBATCH --container=el9hw
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:1
#SBATCH --mem=2GB
#SBATCH --time=02:00:00
#SBATCH --chdir=/path/to/working/directory
./run_code.sh
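Two small helpers can make GPU jobs easier to adjust. gpu_gres builds the --gres value in the two forms described above, and visible_gpu_count counts the devices listed in CUDA_VISIBLE_DEVICES, which SLURM typically sets for jobs that request GPUs (this depends on the site configuration). Both names are hypothetical, as is any GPU type passed in; check which type names OSCER defines before using one.

```bash
#!/usr/bin/env bash
# gpu_gres <count> [gpu_type]  ->  gpu:<count>  or  gpu:<gpu_type>:<count>
gpu_gres() {
  local count="$1" type="${2:-}"
  [[ "$count" =~ ^[1-9][0-9]*$ ]] || return 1
  if [ -n "$type" ]; then
    echo "gpu:${type}:${count}"
  else
    echo "gpu:${count}"
  fi
}

# Count the GPUs listed in CUDA_VISIBLE_DEVICES (comma-separated), 0 if unset.
visible_gpu_count() {
  local devs="${CUDA_VISIBLE_DEVICES:-}"
  if [ -z "$devs" ]; then
    echo 0
    return
  fi
  local IFS=,
  set -- $devs
  echo $#
}
```

For example, sbatch --gres="$(gpu_gres 1)" <script_name> requests one GPU from the command line.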
Monitoring Jobs
We have developed a utility called memprofile that enables real-time monitoring of CPU, GPU, and memory usage.
- Load the Oscer module in your batch script (for example: module load Oscer/0.2.0)
- Add the following command to your batch script: memprofile
By default, once the job begins running, memprofile records resource statistics every 5 seconds and writes them to the job’s error log. You can run memprofile -h to view additional options, including specifying a different output file for logging or adjusting the logging interval.
The output format is still being refined, but it already provides useful information about resource utilization. Since this utility is currently under active development, we welcome any feedback or suggestions for improvement.
Information on jobs (squeue)
The squeue command provides information about jobs running on Schooner or Sooner in the following format: Job ID, Partition, Name, User, Job State, Time, Nodes, Reason.
Typing squeue lists all running and pending jobs in the queue.
Typing squeue -u <username> lists all jobs in the queue for the specified user.
Typing squeue -u <username> -t PENDING lists all the pending jobs for the specified user.
Typing squeue -u <username> -t RUNNING lists all the running jobs for the specified user.
More information about the squeue command can be found on the SLURM online documentation or by typing man squeue on Schooner or Sooner.
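The state filters above can be extended into a quick summary. This sketch (job_state_summary is a hypothetical name) asks squeue for only the state column (-h removes the header, -o %T prints the full state name) and counts each state.

```bash
#!/usr/bin/env bash
# Sketch: count a user's jobs by state, e.g. "PENDING: 2" and "RUNNING: 1".
job_state_summary() {
  local user="${1:-$USER}"
  squeue -u "$user" -h -o %T | sort | uniq -c | awk '{print $2": "$1}'
}
```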
Partition Information (sinfo)
The sinfo command provides information about the partitions (or queues) you have access to on Schooner or Sooner. The information is displayed in the following format: Partition Name, Availability, Time Limit, Nodes, State, Node List.
Typing sinfo provides information on all the queues you are assigned access to.
Typing sinfo -p <partition_name> provides information for the specified queue.
More information about the sinfo command can be found on the SLURM online documentation or by typing man sinfo on Schooner or Sooner.
Debugging Jobs (scontrol)
The scontrol command can be used to get configuration details about a job, node, or partition, and is especially useful for debugging jobs.
Typing scontrol show job <job_number> gives details about the particular job.
Typing scontrol show partition <partition_name> provides configuration details (for example: priority, allowed accounts, allowed memory per node, etc.) for the specified partition. Omitting the partition name shows the details for all partitions.
More information about the scontrol command can be found on the SLURM online documentation or by typing man scontrol on Schooner or Sooner.
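When a job stays pending, the JobState and Reason fields of scontrol show job are usually the first things to check. The hypothetical helper below extracts one Key=Value field from that output; it assumes the value contains no spaces, which holds for fields such as JobState and Reason.

```bash
#!/usr/bin/env bash
# Sketch: print the value of one Key=Value field from "scontrol show job".
job_field() {
  local jobid="$1" key="$2"
  # Split the output into one token per line, then keep the first match.
  scontrol show job "$jobid" | tr ' ' '\n' | grep "^${key}=" | head -n 1 | cut -d= -f2-
}
```

For example, job_field <jobID> Reason prints why a pending job has not started yet.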
Using Job Arrays
Job arrays allow you to use SLURM to create multiple jobs from a single SLURM script. Instead of creating several nearly identical batch scripts, you submit a single script and SLURM launches multiple tasks automatically.
Job arrays work best when:
- Jobs are independent of one another
- The same program is run with different input files or arguments
- You are performing parameter sweeps or processing datasets in parallel
- Tasks require similar resources (runtime, memory, CPUs/GPUs)
- Jobs do not require MPI communication across tasks
For additional information, see the SLURM job array support documentation.
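A common way to drive parameter sweeps or per-file processing is to keep one input per line in a text file and let each array task pick the line matching its index. In this sketch, input_for_task and the list file (for example, a hypothetical inputs.txt) are assumptions; inside a job script you might call it as python my_script.py "$(input_for_task inputs.txt)".

```bash
#!/usr/bin/env bash
# Sketch: return line N of a list file, where N is the array task ID
# (passed explicitly or taken from SLURM_ARRAY_TASK_ID).
input_for_task() {
  local list="$1"
  local task_id="${2:-$SLURM_ARRAY_TASK_ID}"
  [[ "$task_id" =~ ^[1-9][0-9]*$ ]] || return 1
  sed -n "${task_id}p" "$list"
}
```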
Basic Example
The script below creates 90 tasks. Each task receives a unique index through $SLURM_ARRAY_TASK_ID, which can be used to select different input files or parameters.
#!/bin/bash
#SBATCH --partition=sooner_test
#SBATCH --container=el9hw
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=1GB
#SBATCH --array=1-90
#SBATCH --time=01:00:00
#SBATCH --job-name=array_example
#SBATCH --output=array_%A_%a.out
#SBATCH --error=array_%A_%a.err
#SBATCH --chdir=/path/to/working/directory
# Load modules and set environment
module purge
module load Python/3.13.5-GCCcore-14.3.0
echo "Array job ID: $SLURM_ARRAY_JOB_ID"
echo "Task ID: $SLURM_ARRAY_TASK_ID"
# Example: use the task ID as an input number
python my_script.py input_${SLURM_ARRAY_TASK_ID}.txt
When the script is submitted with sbatch job_array.sbatch, SLURM launches 90 tasks that run:
python my_script.py input_1.txt
python my_script.py input_2.txt
python my_script.py input_3.txt
...
python my_script.py input_90.txt
Each task runs independently and writes its own output file, named according to the directive #SBATCH --output=array_%A_%a.out:
array_12345_1.out
array_12345_2.out
...
where:
- %A = master array job ID
- %a = task index within the array
Array Options
#SBATCH --array=1-15
Runs tasks: 1, 2, 3, ..., 15
#SBATCH --array=2,4,6
Runs tasks: 2, 4, 6
#SBATCH --array=10-20:2
Runs tasks: 10, 12, 14, 16, 18, 20
You can limit the number of simultaneously running tasks using the % separator:
#SBATCH --array=1-100%10
This submits 100 tasks but limits execution to 10 running at a time.
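To check which task IDs a given --array value produces, the hypothetical function below expands the forms shown above: ranges, ranges with a step, comma-separated lists, and a %limit suffix (the limit only caps how many tasks run at once, so it is dropped before expansion).

```bash
#!/usr/bin/env bash
# Expand a --array value into the task IDs it describes.
# Handles N-M, N-M:step, comma-separated lists, and a trailing %limit.
expand_array_spec() {
  local spec="${1%%[%]*}"     # drop "%limit": it caps concurrency, not IDs
  local parts part range step first last i
  local ids=()
  IFS=, read -ra parts <<< "$spec"
  for part in "${parts[@]}"; do
    range="${part%%:*}"       # text before an optional ":step"
    step=1
    if [[ "$part" == *:* ]]; then
      step="${part#*:}"
    fi
    first="${range%%-*}"      # a single number gives first == last
    last="${range#*-}"
    # Reject non-numeric values and a zero step.
    [[ "$first" =~ ^[0-9]+$ && "$last" =~ ^[0-9]+$ && "$step" =~ ^[1-9][0-9]*$ ]] || return 1
    for (( i = 10#$first; i <= 10#$last; i += step )); do
      ids+=("$i")
    done
  done
  echo "${ids[*]}"
}
```

For example, expand_array_spec 10-20:2 prints 10 12 14 16 18 20.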
Useful Environment Variables
| SLURM Environment Variable | Description |
|---|---|
| SLURM_ARRAY_JOB_ID | Parent array job ID |
| SLURM_ARRAY_TASK_ID | Current task index |
| SLURM_ARRAY_TASK_COUNT | Total number of tasks |
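SLURM_ARRAY_TASK_COUNT is handy when each task should process a slice of a larger dataset. The sketch below (chunk_range is a hypothetical name) splits N items evenly across the tasks; it assumes the array indices run contiguously from 1 (for example --array=1-10), so index sets such as 2,4,6 would need a different mapping.

```bash
#!/usr/bin/env bash
# Sketch: print the first and last item (1-based) this array task handles.
# Assumes array indices 1..SLURM_ARRAY_TASK_COUNT.
chunk_range() {
  local total="$1"
  local task_id="${2:-$SLURM_ARRAY_TASK_ID}"
  local task_count="${3:-$SLURM_ARRAY_TASK_COUNT}"
  [[ "$total" =~ ^[1-9][0-9]*$ && "$task_id" =~ ^[1-9][0-9]*$ && "$task_count" =~ ^[1-9][0-9]*$ ]] || return 1
  local size=$(( (total + task_count - 1) / task_count ))   # ceiling division
  local start=$(( (task_id - 1) * size + 1 ))
  local end=$(( start + size - 1 ))
  if (( end > total )); then
    end=$total                # the last chunk may be shorter
  fi
  echo "$start $end"
}
```

Inside a job script: read -r first last <<< "$(chunk_range 1000)"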

