Submitting Jobs

Basic SLURM Usage

All jobs run on the HPC system must be submitted through the SLURM batch scheduling system. To run a job:

1) Create a batch script that specifies the resources required (e.g. CPU, memory, runtime, etc.) and the commands needed to execute your job.

2) Submit the batch script using sbatch <script_name>

3) View your submitted jobs with: squeue -u <username>
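
The submit-and-check steps above can be scripted once you have the job ID. The sketch below assumes sbatch prints its usual confirmation line, "Submitted batch job <jobID>". The helper name parse_job_id, the script name my_job.sbatch, and the sample job ID are illustrative only.

```shell
#!/bin/bash
# Hypothetical helper: pull the numeric job ID out of sbatch's confirmation line.
parse_job_id() {
    local line="$1"
    # sbatch normally prints: "Submitted batch job <jobID>"
    if [[ "$line" =~ ^Submitted\ batch\ job\ ([0-9]+) ]]; then
        echo "${BASH_REMATCH[1]}"
    else
        return 1
    fi
}

# On the cluster you would capture real output, for example:
#   submit_output=$(sbatch my_job.sbatch)
# A sample line stands in here so the sketch runs without SLURM.
submit_output="Submitted batch job 12345"

job_id=$(parse_job_id "$submit_output")
echo "Job ID: $job_id"
# Possible follow-ups on the cluster: squeue -u "$USER"  or  scancel "$job_id"
```

Keeping the job ID in a variable makes it easy to pass to the scancel and scontrol commands described in this section.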

Below are some common SLURM commands used to manipulate jobs:

scancel <jobID> cancels a job (running or pending)
scontrol hold <jobID> places a pending job on hold
scontrol release <jobID> releases a job from hold
scontrol update JobID=<jobID> <options> modifies job attributes (e.g. job name, partition)

For more details, see SLURM references or run man <command>.

Environment Modules (LMOD)

Both Sooner and Schooner provide a wide range of software, compilers, and libraries. Multiple versions of the same software may be available. LMOD allows you to manage your environment by loading only the modules you need. Modules use the following naming format: <application_name>/<application_version>-<compiler_type>

Example: Python/3.13.5-GCCcore-14.3.0

Example: MATLAB/2025a (no compiler specified)
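
To make the naming format concrete, here is a minimal bash sketch that splits a module name into its parts. It assumes the version itself contains no hyphen; split_module_name is a hypothetical helper, not an LMOD command.

```shell
#!/bin/bash
# Split "<application_name>/<application_version>-<compiler_type>" into parts.
# Hypothetical helper for illustration; assumes the version has no "-" in it.
split_module_name() {
    local full="$1"
    local name="${full%%/*}"      # text before the first "/"
    local rest="${full#*/}"       # text after the first "/"
    local version="${rest%%-*}"   # text before the first "-"
    local compiler=""
    if [[ "$rest" == *-* ]]; then
        compiler="${rest#*-}"     # everything after the first "-"
    fi
    echo "name=$name version=$version compiler=${compiler:-none}"
}

split_module_name "Python/3.13.5-GCCcore-14.3.0"
split_module_name "MATLAB/2025a"
```

The first call reports GCCcore-14.3.0 as the compiler part, and the second reports none because MATLAB/2025a has no compiler suffix.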

Accessing Modules on Sooner

Some commands to interact with the modules are as follows:

module avail lists all available modules. Modules marked with (D) are the default versions and are automatically loaded if a version isn't specified.
module load <application> loads the default version of a module and its dependencies. Typing module load <application_name>/<application_version>-<compiler_type> loads that specific module and its dependencies.
module list displays the currently loaded modules.
module unload <application> removes a specific module from your environment.
module purge removes all loaded modules.

Running Jobs on Sooner

Sooner uses containerized environments based on Enterprise Linux 9 (EL9) and Enterprise Linux 7 (EL7) to ensure a smooth transition away from the older Schooner supercomputer.

Sample Batch Scripts

A SLURM job script typically consists of two parts: resource requests and job step specification. Requesting resources involves indicating the number of CPUs, memory, runtime, and other such requirements using #SBATCH directives. Job steps specify the commands that will run on the compute node.

Serial Job Example

The example below shows a simple serial (non-parallel) job on Sooner. It runs the hostname command to display the name of the node where the job executes.

You can create or edit this script using any text editor. If you do not have a preferred editor, nano is a simple option for beginners. Script filenames are flexible, but using a .sbatch extension is recommended to distinguish them from other shell scripts. In this example, single-test.sbatch requests a single core, 1GB of RAM, and 1 hour runtime.

#!/bin/bash

#SBATCH --partition=normal
#SBATCH --container=el9hw
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=1GB
#SBATCH --time=01:00:00
#SBATCH --job-name=jobname
#SBATCH --output=jobname_%J_stdout.txt
#SBATCH --error=jobname_%J_stderr.txt
#SBATCH --mail-user=youremailaddress@yourinstitution.edu
#SBATCH --mail-type=ALL
#SBATCH --chdir=/path/to/working/directory

# Load modules and set environment
module purge
module load Python/3.13.5-GCCcore-14.3.0

# Display the node where the job is running
echo "Job started at: $(date) on $(hostname)"

# Run your command
python script.py

After you have saved this file (here called single-test.sbatch), make it executable with the command

chmod +x single-test.sbatch

And then you can submit your job with

sbatch single-test.sbatch

Explanation of SBATCH Directives

The following directives control how your job is scheduled and executed:

#SBATCH --partition=sooner_test

Specifies the partition (queue) where the job will run. In most cases, use sooner_test. For a list of all available partitions, see this link.

#SBATCH --ntasks=1

Requests one task (CPU core). This configuration runs a serial (non-parallel) job.

#SBATCH --mem=1GB

Specifies the amount of memory required per node (default unit is MB). Accurate memory requests help the scheduler allocate resources efficiently and avoid oversubscription.

#SBATCH --output=jobname_%J_stdout.txt
#SBATCH --error=jobname_%J_stderr.txt

Defines files for standard output and error messages. %J is replaced with the job ID at runtime.

#SBATCH --time=01:00:00

Sets the maximum runtime (HH:MM:SS). Most partitions allow up to 48 hours, specified as 48:00:00 or 2-00:00:00.
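
The two ways of writing a time limit can be checked with a short bash sketch. It handles only the HH:MM:SS and D-HH:MM:SS forms shown here; slurm_time_to_seconds is a hypothetical helper, not a SLURM command.

```shell
#!/bin/bash
# Hypothetical helper: convert a time limit written as HH:MM:SS or
# D-HH:MM:SS into seconds. Other formats SLURM accepts are not handled.
slurm_time_to_seconds() {
    local spec="$1" days=0 h m s
    if [[ "$spec" == *-* ]]; then
        days="${spec%%-*}"   # part before the "-" is the day count
        spec="${spec#*-}"    # remainder is HH:MM:SS
    fi
    IFS=: read -r h m s <<< "$spec"
    # 10# forces base 10 so fields such as "08" are not parsed as octal.
    echo $(( 10#$days * 86400 + 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

slurm_time_to_seconds "01:00:00"
slurm_time_to_seconds "48:00:00"
slurm_time_to_seconds "2-00:00:00"
```

The last two calls print the same number of seconds, which confirms that 48:00:00 and 2-00:00:00 describe the same limit.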

#SBATCH --job-name=jobname

Assigns a name to the job. This name appears in squeue.

#SBATCH --mail-user=youremailaddress@yourinstitution.edu

Indicates the e-mail address to send notifications to.

#SBATCH --mail-type=ALL

Notifies when the job begins, ends, or fails. If this directive is omitted, notifications are typically sent only on failure.

#SBATCH --chdir=/path/to/working/directory

Sets the directory where the job will run. This directory must exist before submitting the job.

echo "Job started at: $(date) on $(hostname)"
python script.py

These commands print the current date and time and the name of the compute node running the job, then run a Python script. Replace them with your actual application, script, or workflow.

Parallel Job Example

The batch script below runs a simple parallel job on Sooner. It launches 40 MPI processes, 20 per node.

You can download the sample code from Wikipedia and save it as mpi_example.c in your working directory.

Compile it using the following commands:

module load OpenMPI/5.0.8-GCC-14.3.0
mpicc mpi_example.c -o hello.mpi

Now submit the following batch script. See the serial job example above for more information about batch script naming and submission:

#!/bin/bash

#SBATCH --partition=normal
#SBATCH --container=el9hw
#SBATCH --exclusive
#SBATCH --nodes=2
#SBATCH --mem=0
#SBATCH --time=00:10:00
#SBATCH --ntasks=40
#SBATCH --ntasks-per-node=20
#SBATCH --job-name=mpi_job
#SBATCH --output=jobname_%J_stdout.txt
#SBATCH --error=jobname_%J_stderr.txt
#SBATCH --chdir=/path/to/working/directory

# Load modules and set environment
module purge
module load OpenMPI/5.0.8-GCC-14.3.0

# Run your MPI program
mpirun hello.mpi

Explanation of SBATCH directives

#SBATCH --exclusive

Requests exclusive access to the participating compute nodes. This prevents other users’ jobs from running on the same nodes and reduces resource contention that could affect performance.

#SBATCH --nodes=2
#SBATCH --ntasks=40
#SBATCH --ntasks-per-node=20

Requests two compute nodes and a total of 40 MPI tasks, with 20 MPI tasks running on each node. This example uses 20 MPI tasks per node to evenly distribute processes across the allocated nodes. Different node types may have different core counts (for example, some nodes have 24 cores), so ntasks-per-node should generally match the available cores on the nodes being used.

For ntasks <= cores per node, set ntasks-per-node equal to ntasks.

For ntasks > cores per node, set ntasks-per-node equal to the number of available cores per node.
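
The rule above can be sketched as a small bash function. The node count is derived here by rounding up ntasks divided by ntasks-per-node, which is an assumption for illustration. plan_mpi_layout is a hypothetical helper, and the core counts passed in are examples.

```shell
#!/bin/bash
# Hypothetical helper applying the ntasks-per-node rule.
# Arguments: total MPI tasks, cores available per node.
plan_mpi_layout() {
    local ntasks="$1" cores_per_node="$2" per_node nodes
    if (( ntasks <= cores_per_node )); then
        per_node=$ntasks            # everything fits on one node
    else
        per_node=$cores_per_node    # fill each node completely
    fi
    # Round up so every task has a slot on some node.
    nodes=$(( (ntasks + per_node - 1) / per_node ))
    echo "--nodes=$nodes --ntasks=$ntasks --ntasks-per-node=$per_node"
}

plan_mpi_layout 40 20
plan_mpi_layout 12 20
plan_mpi_layout 50 24
```

The first call reproduces the values used in the example script: two nodes, 40 tasks, 20 tasks per node.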

module purge
module load OpenMPI

These commands reset the module environment and then load OpenMPI. In practice, load the modules needed to execute your program.

mpirun hello.mpi

This command executes the hello.mpi file we compiled earlier.

Upon successfully completing your job, your output file should look like this:

We have 40 processes.
Process 1 reporting for duty.
Process 2 reporting for duty.
Process 3 reporting for duty.
Process 4 reporting for duty.
Process 5 reporting for duty.
Process 6 reporting for duty.
Process 7 reporting for duty.
Process 8 reporting for duty.
Process 9 reporting for duty.
Process 10 reporting for duty.
Process 11 reporting for duty.
Process 12 reporting for duty.
Process 13 reporting for duty.
Process 14 reporting for duty.
Process 15 reporting for duty.
Process 16 reporting for duty.
Process 17 reporting for duty.
Process 18 reporting for duty.
Process 19 reporting for duty.
Process 20 reporting for duty.
Process 21 reporting for duty.
Process 22 reporting for duty.
Process 23 reporting for duty.
Process 24 reporting for duty.
Process 25 reporting for duty.
Process 26 reporting for duty.
Process 27 reporting for duty.
Process 28 reporting for duty.
Process 29 reporting for duty.
Process 30 reporting for duty.
Process 31 reporting for duty.
Process 32 reporting for duty.
Process 33 reporting for duty.
Process 34 reporting for duty.
Process 35 reporting for duty.
Process 36 reporting for duty.
Process 37 reporting for duty.
Process 38 reporting for duty.
Process 39 reporting for duty.

GPU Job Example

The gpu partition contains GPU cards that can greatly accelerate certain parallel computing tasks through CUDA-based frameworks. Access to GPU nodes is free for OSCER users.

To request a GPU, add the #SBATCH --gres=gpu:<count> directive to your job script. The scheduler automatically selects a GPU node based on availability and the other resources requested. To request a specific GPU type, use #SBATCH --gres=gpu:<GPU_type>:<count>.

Below is a template batch script that requests 1 GPU with 1 CPU core and 2GB of RAM.

#!/bin/bash

#SBATCH --partition=gpu
#SBATCH --container=el9hw
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:1
#SBATCH --mem=2GB
#SBATCH --time=02:00:00
#SBATCH --chdir=/path/to/working/directory

./run_code.sh
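
Inside a GPU job it can help to confirm how many GPUs were assigned before launching the main program. The sketch below assumes that SLURM exports CUDA_VISIBLE_DEVICES as a comma-separated list of device indices for jobs that use --gres=gpu. This is common, but it depends on the site configuration. count_visible_gpus is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: count the GPUs listed in CUDA_VISIBLE_DEVICES.
# Assumes a comma-separated list such as "0" or "0,1"; prints 0 when unset.
count_visible_gpus() {
    local devices="${CUDA_VISIBLE_DEVICES:-}"
    if [[ -z "$devices" ]]; then
        echo 0
        return
    fi
    local -a ids
    IFS=',' read -r -a ids <<< "$devices"
    echo "${#ids[@]}"
}

echo "GPUs visible to this job: $(count_visible_gpus)"
```

A check like this could go just before ./run_code.sh in the template so that a missing GPU is noticed early in the job log.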

Monitoring Jobs

We have developed a utility called memprofile that enables real-time monitoring of CPU, GPU, and memory usage.

Load the Oscer module in your batch script (for example: module load Oscer/0.2.0)
Add the following command to your batch script: memprofile

By default, once the job begins running, memprofile records resource statistics every 5 seconds and writes them to the job’s error log. You can run memprofile -h to view additional options, including specifying a different output file for logging or adjusting the logging interval.

The output format is still being refined, but it already provides useful information about resource utilization. Since this utility is currently under active development, we welcome any feedback or suggestions for improvement.

Information on jobs (squeue)

The squeue command provides information about jobs running on Schooner or Sooner in the following format: Job ID, Partition, Name, User, Job State, Time, Nodes, Reason.

Typing squeue lists all current (and pending) jobs in the queue.
Typing squeue -u <username> lists all jobs in the queue for the specified user.
Typing squeue -u <username> -t PENDING lists all the pending jobs for the specified user.
Typing squeue -u <username> -t RUNNING lists all the running jobs for the specified user.
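
For a quick summary instead of a full listing, the job states can be tallied. The sketch below reads one state per line from standard input. On the cluster that input would come from squeue with the -h (no header) and -o "%T" (state only) options. count_job_states is a hypothetical helper, and the sample states are made up.

```shell
#!/bin/bash
# Hypothetical helper: tally job states read from standard input, one per line.
count_job_states() {
    sort | uniq -c | awk '{ printf "%s=%s\n", $2, $1 }'
}

# On the cluster:
#   squeue -u "$USER" -h -o "%T" | count_job_states
# Sample input so the sketch runs without SLURM:
printf 'RUNNING\nPENDING\nRUNNING\nRUNNING\n' | count_job_states
```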

More information about the squeue command can be found on the SLURM online documentation or by typing man squeue on Schooner or Sooner.

Partition Information (sinfo)

The sinfo command provides information about the partitions (or queues) you have access to on Schooner or Sooner. The information is displayed in the following format: Partition Name, Availability, Time Limit, Nodes, State, Node List.

Typing sinfo provides information on all the queues you are assigned access to.
Typing sinfo -p <partition_name> provides information for the specified queue.

More information about the sinfo command can be found on the SLURM online documentation or by typing man sinfo on Schooner or Sooner.

Debugging Jobs (scontrol)

The scontrol command can be used to get configuration details about a job, node, or partition, and is especially useful for debugging jobs.

Typing scontrol show job <job_number> gives details about the particular job.
Typing scontrol show partition <partition_name> provides configuration details (for example: priority, allowed accounts, allowed memory per node, etc.) for the specified partition.

More information about the scontrol command can be found on the SLURM online documentation or by typing man scontrol on Schooner or Sooner.
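
The output of scontrol show job is a series of Key=Value fields, so a single field such as JobState or Reason can be extracted with standard tools. The sketch below is a minimal example that assumes the value contains no spaces. job_field is a hypothetical helper, and the sample text is illustrative rather than captured from a real job.

```shell
#!/bin/bash
# Hypothetical helper: print the value of one Key=Value field from
# "scontrol show job" output supplied on standard input.
# Assumes the value contains no whitespace.
job_field() {
    local key="$1"
    # Put each Key=Value pair on its own line, then match the key exactly.
    tr -s ' \t' '\n' |
        awk -F= -v k="$key" '$1 == k { print substr($0, length(k) + 2); exit }'
}

# On the cluster: scontrol show job <job_number> | job_field JobState
# Illustrative sample text (not from a real job):
sample='   JobState=PENDING Reason=Resources Dependency=(null)
   Partition=normal NumNodes=1 NumCPUs=1'

echo "$sample" | job_field JobState
echo "$sample" | job_field Reason
```

For a pending job, the Reason field is often the quickest hint as to why it has not started.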

Using Job Arrays

Job arrays allow you to use SLURM to create multiple jobs from a single SLURM script. Instead of creating several nearly identical batch scripts, you submit a single script and SLURM launches multiple tasks automatically.

Job arrays work best when:

Jobs are independent of one another
The same program is run with different input files or arguments
You are performing parameter sweeps or processing datasets in parallel
Tasks require similar resources (runtime, memory, CPUs/GPUs)
Jobs do not require MPI communication across tasks

For additional information, see the SLURM job array support page.

Basic Example

The script below creates 90 tasks. Each task receives a unique index through $SLURM_ARRAY_TASK_ID, which can be used to select different input files or parameters.

#!/bin/bash

#SBATCH --partition=normal
#SBATCH --container=el9hw
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=1GB
#SBATCH --array=1-90
#SBATCH --time=01:00:00
#SBATCH --job-name=array_example
#SBATCH --output=array_%A_%a.out
#SBATCH --error=array_%A_%a.err
#SBATCH --chdir=/path/to/working/directory

# Load modules and set environment
module purge
module load Python/3.13.5-GCCcore-14.3.0

echo "Array job ID: $SLURM_ARRAY_JOB_ID"
echo "Task ID: $SLURM_ARRAY_TASK_ID"

# Example: use the task ID as an input number
python my_script.py input_${SLURM_ARRAY_TASK_ID}.txt

When the job is submitted with sbatch job_array.sh, SLURM launches 90 tasks:

python my_script.py input_1.txt
python my_script.py input_2.txt
python my_script.py input_3.txt
...
python my_script.py input_90.txt

Each task runs independently and writes a separate output file, named by the SLURM directive #SBATCH --output=array_%A_%a.out:

array_12345_1.out
array_12345_2.out
...

where:

%A = master array jobID
%a = task index within the array
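
When the inputs are parameters rather than numbered files, the task index can instead select one line from a parameter list. The sketch below shows the idea. The parameter values, the temporary file, and the helper name param_for_task are all hypothetical, and the task ID defaults to 2 so the sketch also runs outside an array job.

```shell
#!/bin/bash
# Hypothetical helper: print line number <task_id> of a parameter file.
param_for_task() {
    local file="$1" task_id="$2"
    sed -n "${task_id}p" "$file"   # sed -n "Np" prints only line N
}

# Illustrative parameter list, one set of arguments per line.
params_file=$(mktemp)
printf 'alpha=0.1\nalpha=0.5\nalpha=1.0\n' > "$params_file"

# Inside an array job SLURM sets SLURM_ARRAY_TASK_ID; default to 2 for a dry run.
task_id="${SLURM_ARRAY_TASK_ID:-2}"
params=$(param_for_task "$params_file" "$task_id")
echo "Task $task_id would run with: $params"

rm -f "$params_file"
```

With this layout, an array range of 1 to the number of lines in the file gives each task exactly one parameter set.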

Array Options

#SBATCH --array=1-15

Runs tasks: 1, 2, 3, ..., 15

#SBATCH --array=2,4,6

Runs tasks: 2, 4, 6

#SBATCH --array=10-20:2

Runs tasks: 10, 12, 14, 16, 18, 20

You can limit the number of simultaneously running tasks using the % separator:

#SBATCH --array=1-100%10

This submits 100 tasks but limits execution to 10 running at a time.
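
To check which task indices an --array value selects before submitting, the forms above can be expanded with a short bash sketch. It covers only the range, list, step, and % forms shown in this section. expand_array_spec is a hypothetical helper, not a SLURM command.

```shell
#!/bin/bash
# Hypothetical helper: list the task indices selected by an --array value.
# Handles "a-b", "a,b,c", "a-b:step", and ignores a trailing "%N" limit.
expand_array_spec() {
    local spec="$1"
    spec="${spec%%[%]*}"          # drop the %N concurrency limit, if any
    local part range step first last i
    local -a parts=() out=()
    IFS=',' read -r -a parts <<< "$spec"
    for part in "${parts[@]}"; do
        step=1
        range="$part"
        if [[ "$part" == *:* ]]; then
            step="${part##*:}"    # value after ":" is the step
            range="${part%%:*}"
        fi
        first="${range%%-*}"
        last="${range##*-}"       # equals first when there is no "-"
        for (( i = first; i <= last; i += step )); do
            out+=("$i")
        done
    done
    echo "${out[*]}"
}

expand_array_spec "2,4,6"
expand_array_spec "10-20:2"
expand_array_spec "1-100%10" | wc -w
```

The % limit changes only how many tasks run at once, not which indices exist, so the last call still counts 100 tasks.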

Useful Environment Variables

SLURM Environment Variable	Description
SLURM_ARRAY_JOB_ID	Parent array job ID
SLURM_ARRAY_TASK_ID	Current task index
SLURM_ARRAY_TASK_COUNT	Total number of tasks