Running Jobs on Schooner
Table of Contents
- Schooner's Job Scheduler - SLURM
- Documentation
- Translating to SLURM commands from other workload managers
- Basic SLURM Commands
- Schooner's Environment Module System - LMOD
- Sample Jobs
Schooner's Job Scheduler - SLURM
Schooner uses SLURM to manage jobs on the cluster. The Simple Linux Utility for Resource Management (SLURM) is an open-source, scalable cluster management and job scheduling system, and is used on about 60% of the largest compute clusters in the world.
Learning SLURM
Online: Official SLURM documentation
On Schooner: Use the man command to learn more about the commands.
Example: Typing man sbatch will give you the manual page for the sbatch command
Got scripts for other workload managers?
If you have scripts written for other workload managers like PBS/Torque, LSF, etc., please refer to this conversion guide for the most common SLURM commands, environment variables, and job specification options.
Information on jobs (squeue)
The squeue command provides information about running and pending jobs on Schooner in the following format: Job ID, Partition, Name, User, Job State, Time, Nodes, Nodelist (Reason)
Typing squeue lists all current (and pending) jobs in the queue
Typing squeue -u <username> lists all jobs in the queue for the specified user
Typing squeue -u <username> -t PENDING lists all the pending jobs for the specified user
Typing squeue -u <username> -t RUNNING lists all the running jobs for the specified user
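As a quick sketch (the username, job IDs, and node names below are made up), the output of squeue for a single user typically looks something like this:
squeue -u jdoe
  JOBID PARTITION     NAME   USER ST     TIME  NODES NODELIST(REASON)
1234567    normal  jobname   jdoe  R  1:23:45      2 c[101-102]
1234568    normal jobname2   jdoe PD     0:00      1 (Resources)
Here ST is the job state: R means running and PD means pending.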
More information about the squeue command can be found on the SLURM online documentation or by typing man squeue on Schooner.
Partition Information (sinfo)
The sinfo command provides information about the partitions (or queues) you have access to on Schooner. The information is displayed in the following format: Partition Name, Availability, Time Limit, Nodes, State, Node List
Typing sinfo provides information on all the queues you are assigned access to
Typing sinfo -p <partition name> provides information for the specified queue
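For example, querying the normal partition used in the sample jobs below might produce output along these lines (the time limit, node counts, and node names here are illustrative):
sinfo -p normal
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
normal       up 2-00:00:00     40   idle c[101-140]
normal       up 2-00:00:00    120  alloc c[141-260]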
More information about the sinfo command can be found on the SLURM online documentation or by typing man sinfo on Schooner.
Debugging Jobs (scontrol)
The scontrol command can be used for getting configuration details about a job, node or partition, and is especially useful when debugging jobs.
scontrol show job <job number> gives details about the particular job
scontrol show partition provides configuration details (example: priority, allowed accounts, allowed memory per node, etc.) of all available partitions
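For example, to inspect a particular job and the normal partition (the job ID here is hypothetical), you might type:
scontrol show job 1234567
scontrol show partition normal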
More information about the scontrol command can be found on the SLURM online documentation or by typing man scontrol on Schooner.
Submitting Batch Jobs (sbatch)
The sbatch command is used to submit jobs to SLURM. SBATCH directives within a script file can be used to specify resource parameters such as the job name, output file, run time, etc. The Sample Jobs section below goes over some basic sbatch commands and SBATCH flags.
More information about the sbatch command can be found on the SLURM online documentation or by typing man sbatch on Schooner.
Cancelling Jobs (scancel)
Use the scancel command to cancel pending and running jobs.
scancel <job number> cancels the specified job
scancel -u <username> cancels all jobs for the specified user
scancel -t PENDING -u <username> cancels all pending jobs for the specified user
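These options can also be combined. For example, the following command (with a made-up username) cancels only the pending jobs that user has in the normal partition:
scancel -u jdoe -p normal -t PENDING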
More information about the scancel command can be found on the SLURM online documentation or by typing man scancel on Schooner.
Schooner's Environment Module System - LMOD
Schooner hosts a large number of software packages, compilers, and libraries to meet the needs of our users. Oftentimes, we have to deploy multiple versions of the same software or compiler. Lmod helps manage such installations by setting up modules, and users can customize their environment by loading only the modules they need.
LMOD modules are displayed in the following form: <application name>/<application version>-<compiler type>
Example: NAMD/2.11-intel-2016a-mpi
Software that is installed as a prebuilt binary, and therefore not compiled on Schooner, will not have the <compiler type> information. Example: MATLAB/2015a
Listing all available modules on Schooner
Typing module avail gives you a list of all the modules installed on Schooner. The list will be in the following format: <application name>/<application version>-<compiler type>.
Default Modules
You will notice that some modules have a (D) next to them. This appears for modules that have multiple versions installed. The (D) indicates that the marked version has been designated as the default module, and it is the version that gets loaded into your environment if you do not specify an application version.
Example:
The Autoconf/2.69-intel-2016a module has been designated as the default module for Autoconf on Schooner:
Autoconf/2.69-GCC-4.9.2
Autoconf/2.69-GCC-4.9.3-2.25
Autoconf/2.69-GNU-4.9.3-2.25
Autoconf/2.69-goolf-1.4.10
Autoconf/2.69-intel-2016a (D)
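To see only the versions available for a particular package, pass its name to module avail. For example, the following lists the Autoconf modules shown above, with the default marked by (D):
module avail Autoconf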
Loading a module
Typing module load <application name> will load the default version of the module AND its dependencies.
Typing module load <application name>/<application version>-<compiler type> will load that specific module AND its dependencies
Example:
The following versions of Autoconf have been installed on Schooner:
Autoconf/2.69-GCC-4.9.2
Autoconf/2.69-GCC-4.9.3-2.25
Autoconf/2.69-GNU-4.9.3-2.25
Autoconf/2.69-goolf-1.4.10
Autoconf/2.69-intel-2016a (D)
Typing module load Autoconf will load Autoconf/2.69-intel-2016a
If you wish to load Autoconf/2.69-GCC-4.9.2, you will have to type module load Autoconf/2.69-GCC-4.9.2
Listing all loaded modules in your environment
Typing module list displays all the modules currently loaded in your environment
Removing modules from your environment
To remove specific modules, type module unload <application name>
To remove ALL modules, type module purge
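Putting these commands together, a typical session that sets up a specific environment and then resets it might look like this (using the Autoconf module from the example above):
module load Autoconf/2.69-GCC-4.9.2
module list
module unload Autoconf
module purge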
Sample Jobs
A typical job script has two parts: requesting resources and job steps. Requesting resources includes specifying the number of CPUs needed, the run time, the memory, and so on; this is done within your script using SBATCH directives. Job steps describe the tasks that must be performed.
Non-Parallel Job
The following code is a simple non-parallel job that uses the hostname command to get the name of the node that executed this job. You can create or edit this file with your favorite editor. If you don't have a favorite editor, we suggest nano. The filename of the submit script can be anything you like, but we suggest the extension .sbatch, to distinguish it from other shell scripts. In this example, let's call it single-test.sbatch.
#!/bin/bash
#
#SBATCH --partition=normal
#SBATCH --ntasks=1
#SBATCH --mem=1024
#SBATCH --output=jobname_%J_stdout.txt
#SBATCH --error=jobname_%J_stderr.txt
#SBATCH --time=12:00:00
#SBATCH --job-name=jobname
#SBATCH --mail-user=youremailaddress@yourinstitution.edu
#SBATCH --mail-type=ALL
#SBATCH --chdir=/home/yourusername/directory_to_run_in
#
#################################################
hostname
After you have saved this file -- here called single-test.sbatch -- you will need to make it executable with the command
chmod +x single-test.sbatch
And then you can submit your job with
sbatch single-test.sbatch
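If the submission succeeds, sbatch prints the ID assigned to the new job; the ID shown here is just an example:
Submitted batch job 1234567
You can then use that number with squeue, scontrol show job, and scancel to monitor or cancel the job.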
Code Walkthrough
The SBATCH directive below specifies the partition to be used. In most cases, you should use the queue named normal.
#SBATCH --partition=normal
The SBATCH directive below says to use 1 CPU core of 1 CPU chip on 1 compute node, meaning that this batch job is non-parallel (serial).
#SBATCH --ntasks=1
The SBATCH directive below tells the scheduler how much memory your job will use, in megabytes. This is critical information for scheduling non-exclusive jobs, since it prevents the scheduler from assigning more jobs to a given compute node than that node has memory for.
#SBATCH --mem=1024
The default unit is MB, but you can also specify GB, for example --mem=8G
The SBATCH directives below tell SLURM where to send output and error messages. Note that, in these filenames, %J will be replaced by the batch job ID number.
#SBATCH --output=jobname_%J_stdout.txt
#SBATCH --error=jobname_%J_stderr.txt
The SBATCH directive below says to run for up to 12 hours (and zero minutes and zero seconds).
#SBATCH --time=12:00:00
The maximum time limit for most partitions is 48 hours, which can be specified as 48:00:00 or 2-00:00:00.
The SBATCH directive below specifies the name of the batch job. This name will appear in the queue listing when you run the squeue command. You can change jobname to any name you like.
#SBATCH --job-name=jobname
The SBATCH directive below specifies the e-mail address to send notifications to; change it to your own e-mail address.
#SBATCH --mail-user=youremailaddress@yourinstitution.edu
The SBATCH directive below says to e-mail a notification when the batch job starts, completes, or fails. If you do not include this SBATCH directive, you will not receive any e-mail notifications about the job.
#SBATCH --mail-type=ALL
Change to the directory that you want to run in.
#SBATCH --chdir=/home/yourusername/directory_to_run_in
This directory needs to exist before the job is submitted.
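If the directory does not exist yet, you can create it before submitting, for example:
mkdir -p /home/yourusername/directory_to_run_in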
This command prints the name of the compute node that runs the job. This is just a very simple example; in a real job script, you would put your actual executable here, or your job loop, or whatever payload you need to run.
hostname
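After the job completes, the stdout file should contain a single line with the name of the compute node that ran the job; you can check it with cat (the job ID in the filename here is hypothetical):
cat jobname_1234567_stdout.txt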
Parallel Job
The following code is a simple parallel job that runs 40 MPI processes, with 20 MPI processes per node.
Download the sample code from Wikipedia and save it as mpi_example.c in your working directory
Compile it using the following commands:
module load OpenMPI
mpicc mpi_example.c -o hello.mpi
Now run the following code (see the non-parallel job example above for more information about batch script naming and submission):
#!/bin/bash
#SBATCH --partition=normal
#SBATCH --exclusive
#SBATCH --nodes=2
#SBATCH --ntasks=40
#SBATCH --ntasks-per-node=20
#SBATCH --output=jobname_%J_stdout.txt
#SBATCH --error=jobname_%J_stderr.txt
#SBATCH --time=10:00
#SBATCH --job-name=jobname
#SBATCH --mail-user=youremailaddress@yourinstitution.edu
#SBATCH --mail-type=ALL
#SBATCH --chdir=/home/yourusername/directory_to_run_in
module load OpenMPI
mpirun hello.mpi
Code Walkthrough
The SBATCH directive below specifies the partition to be used. In most cases, you should use the queue named normal.
#SBATCH --partition=normal
The SBATCH directive below requests exclusive access to the participating compute nodes, so that other batch jobs (for example, those submitted by other users) don't run on the same compute nodes as this batch job and therefore can't interfere with it.
#SBATCH --exclusive
Use 40 MPI processes at 20 MPI processes per node, which is to say 2 nodes in the case of the normal partition.
Please use the following pattern for nodes in the normal partition:
For ntasks <= 20, please use ntasks-per-node equal to ntasks unless you have a very good reason to do otherwise.
For ntasks >= 20, please use ntasks-per-node equal to 20 unless you have a very good reason to do otherwise.
This is because each compute node has 2 chips and each chip has 10 cores, for a total of 20 cores per node. We recommend using the same number of MPI processes per node as cores, unless you've benchmarked your code's performance and found that you take fewer node hours by using fewer than 20 per node.
#SBATCH --nodes=2
#SBATCH --ntasks=40
#SBATCH --ntasks-per-node=20
The SBATCH directives below tell SLURM where to send output and error messages. Note that, in these filenames, %J will be replaced by the batch job ID number.
#SBATCH --output=jobname_%J_stdout.txt
#SBATCH --error=jobname_%J_stderr.txt
The SBATCH directive below says to run for up to 10 minutes (and zero seconds).
#SBATCH --time=10:00
The SBATCH directive below specifies the name of the batch job. This name will appear in the queue listing when you run the squeue command. You can change jobname to any name you like.
#SBATCH --job-name=jobname
The SBATCH directive below specifies the e-mail address to send notifications to; change it to your own e-mail address.
#SBATCH --mail-user=youremailaddress@yourinstitution.edu
The SBATCH directive below says to e-mail a notification when the batch job starts, completes, or fails. If you do not include this SBATCH directive, you will not receive any e-mail notifications about the job.
#SBATCH --mail-type=ALL
Change to the directory that you want to run in.
#SBATCH --chdir=/home/yourusername/directory_to_run_in
This command loads the modules needed to execute the program. In this case, we are using OpenMPI.
module load OpenMPI
This command runs the hello.mpi program we compiled earlier.
mpirun hello.mpi
When your job completes successfully, your output file should look like this:
We have 40 processes.
Process 1 reporting for duty.
Process 2 reporting for duty.
Process 3 reporting for duty.
Process 4 reporting for duty.
Process 5 reporting for duty.
Process 6 reporting for duty.
Process 7 reporting for duty.
Process 8 reporting for duty.
Process 9 reporting for duty.
Process 10 reporting for duty.
Process 11 reporting for duty.
Process 12 reporting for duty.
Process 13 reporting for duty.
Process 14 reporting for duty.
Process 15 reporting for duty.
Process 16 reporting for duty.
Process 17 reporting for duty.
Process 18 reporting for duty.
Process 19 reporting for duty.
Process 20 reporting for duty.
Process 21 reporting for duty.
Process 22 reporting for duty.
Process 23 reporting for duty.
Process 24 reporting for duty.
Process 25 reporting for duty.
Process 26 reporting for duty.
Process 27 reporting for duty.
Process 28 reporting for duty.
Process 29 reporting for duty.
Process 30 reporting for duty.
Process 31 reporting for duty.
Process 32 reporting for duty.
Process 33 reporting for duty.
Process 34 reporting for duty.
Process 35 reporting for duty.
Process 36 reporting for duty.
Process 37 reporting for duty.
Process 38 reporting for duty.
Process 39 reporting for duty.