
Submitting and Running Jobs

Overview

To submit, run, monitor, and cancel jobs on Savio, you’ll use SLURM (the Simple Linux Utility for Resource Management). SLURM is an open-source scheduler that manages jobs, job steps, nodes, partitions, and accounts on the cluster.

Common SLURM Commands

| Command | Description | Example |
| --- | --- | --- |
| sbatch | Submit a batch job script | sbatch myjob.sh |
| srun | Run an interactive job | srun --pty bash |
| scancel | Cancel a job | scancel 12345 |
| squeue | View job queue | squeue -u $USER |
| sq | Explain why a job is pending | module load sq; sq |
| sacctmgr | Check project/account access | sacctmgr -p show associations user=$USER |
| sinfo | Show node and partition status | sinfo |
| sacct | View accounting and completed jobs | sacct -j 123451 |

Submitting Jobs

Jobs on Savio can be batch (non-interactive) or interactive. Every submission must specify an account, a partition, and a time limit.

Account (required): (--account)

Each job runs under an account that determines which resources you can use and how usage is billed. Check your available accounts using:

sacctmgr -p show associations user=$USER
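The -p flag makes sacctmgr print pipe-delimited rows, so a short pipeline can pull out just the account names. As a sketch, the sample text below stands in for real sacctmgr output (fc_example and co_example are made-up accounts):

```shell
# Extract unique account names (second pipe-delimited field) from
# sacctmgr-style output. The here-string is a stand-in for:
#   sacctmgr -p show associations user=$USER
sample='Cluster|Account|User|Partition|QOS|
brc|fc_example|myuser||savio_normal|
brc|co_example|myuser||savio_lowprio|'
accounts=$(echo "$sample" | tail -n +2 | cut -d'|' -f2 | sort -u)
echo "$accounts"
```

On the cluster, replace the sample text with the actual sacctmgr command in a command substitution.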

Partition (required): (--partition)

Specifies the group of nodes (e.g., CPU, GPU, HTC). For example:

#SBATCH --partition=savio3_htc

Time limit (required): (--time)

Specifies the maximum wall-clock time your job can run. Format: days-hours:minutes:seconds. For example:

#SBATCH --time=00:30:00   # 30 minutes
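As a sanity check on --time values, the days-hours:minutes:seconds format can be converted to total seconds with a small bash function. This is a hypothetical helper, not part of SLURM, and it assumes the full HH:MM:SS or D-HH:MM:SS form:

```shell
# Convert a SLURM time limit such as "1-02:30:00" (days-hours:minutes:seconds)
# to total seconds. Assumes HH:MM:SS or D-HH:MM:SS input.
slurm_time_to_seconds() {
    local t=$1 days=0
    if [[ $t == *-* ]]; then
        days=${t%%-*}   # portion before the dash is whole days
        t=${t#*-}
    fi
    IFS=: read -r h m s <<< "$t"
    # 10# forces base-10 so values like "08" are not read as octal
    echo $(( days * 86400 + 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

slurm_time_to_seconds 00:30:00     # 30 minutes -> 1800
slurm_time_to_seconds 1-02:30:00   # 1 day, 2.5 hours -> 95400
```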

QoS (optional; depends on your project): Defines job priority and limits. For example, the savio_debug QoS is intended for short test jobs:

#SBATCH --qos=savio_debug

Example: Basic Batch Job

#!/bin/bash
#SBATCH --job-name=test
#SBATCH --account=fc_NAME
#SBATCH --partition=savio3_htc
#SBATCH --time=00:00:30

echo "Hello world from Savio!"

Submit with:

sbatch myjob.sh

Output is saved as:

slurm-<jobid>.out

Interactive Jobs

For interactive sessions (e.g., debugging, testing, GUI tools):

srun --pty -A <account> -p <partition> -t 00:30:00 bash -i

You’ll see:

srun: job 669120 queued and waiting for resources
srun: job 669120 has been allocated resources
[user@n0047 ~]$

Now you’re on a compute node. Run your commands normally.

Job Arrays

Use job arrays to run multiple similar jobs efficiently:

#!/bin/bash
#SBATCH --job-name=array_example
#SBATCH --account=fc_NAME
#SBATCH --partition=savio3_htc
#SBATCH --array=0-31
#SBATCH --output=array_job_%A_task_%a.out
#SBATCH --error=array_job_%A_task_%a.err
#SBATCH --time=00:01:00

echo "Running task $SLURM_ARRAY_TASK_ID"

Submit with:

sbatch array_job.sh

Each array element runs as a separate job (task 0, task 1, etc.).
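A common pattern is to use $SLURM_ARRAY_TASK_ID to pick each task's input. The sketch below uses made-up file names under inputs/ and defaults the ID to 0 so it also runs outside the scheduler:

```shell
# Map each array task to one input file. Inside a job, SLURM sets
# SLURM_ARRAY_TASK_ID automatically; the :-0 default lets this snippet
# run standalone for testing.
TASK_ID="${SLURM_ARRAY_TASK_ID:-0}"
INPUTS=(inputs/sample_a.txt inputs/sample_b.txt inputs/sample_c.txt)  # hypothetical files
INPUT_FILE="${INPUTS[$TASK_ID]}"
echo "Task $TASK_ID processing $INPUT_FILE"
```

With --array=0-2, each of the three tasks would process a different file from the list.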

Low Priority Jobs

Use the savio_lowprio QoS to take advantage of idle cluster resources.

Example:

#SBATCH --qos=savio_lowprio
#SBATCH --requeue
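Because low-priority jobs can be preempted and requeued, the job script should be able to resume rather than restart from scratch. A minimal checkpointing sketch, where checkpoint.txt is a made-up state file:

```shell
# Resume a loop from the last recorded step so a requeued job does not
# redo finished work. checkpoint.txt is a hypothetical state file.
CKPT=checkpoint.txt
START=0
if [ -f "$CKPT" ]; then
    START=$(cat "$CKPT")   # pick up where the previous run stopped
fi
for (( i = START; i < 5; i++ )); do
    echo "step $i"
    echo $(( i + 1 )) > "$CKPT"   # record progress after each step
done
```

Real workloads would checkpoint application state (e.g., model weights or partial results), but the requeue-then-resume structure is the same.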

Email Notifications and Output Files

By default, SLURM writes all job output to slurm-%j.out, where %j is the job ID. To customize the file names:

#SBATCH --output=myjob_%j.out
#SBATCH --error=myjob_%j.err

If you would like to receive email updates:

#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=EMAIL

Example:

#!/bin/bash
#SBATCH --job-name=test
#SBATCH --account=fc_NAME
#SBATCH --partition=savio3_htc
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=8
#SBATCH --time=00:00:30
#SBATCH --output=test_%j.out
#SBATCH --error=test_%j.err
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=EMAIL

echo "hello world"

Monitoring Jobs

Via Command Line

| Command | Purpose | Example |
| --- | --- | --- |
| squeue -u $USER | View active jobs | |
| sacct -j <jobid> | View completed job info | sacct -j 12345 --format=JobID,JobName,Elapsed,MaxRSS |
| sinfo | Show node/partition status | sinfo -p savio3_htc |
| wwall -j <jobid> | Check resource usage snapshot | |
| wwtop | “top”-like node summary | |
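sacct's parsable output is easy to post-process, for example to flag job steps that did not finish cleanly. The sample text below stands in for real sacct output (job 12345 is made up):

```shell
# Flag job steps whose State is not COMPLETED. The here-string stands in
# for: sacct -j <jobid> --parsable2 --format=JobID,State
sample='JobID|State
12345|COMPLETED
12345.batch|FAILED'
failed=$(echo "$sample" | awk -F'|' 'NR > 1 && $2 != "COMPLETED" { print $1, $2 }')
echo "$failed"
```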

To monitor a running job interactively:

srun --jobid=<jobid> --pty /bin/bash

Via MyBRC Portal

  1. Visit MyBRC Portal and go to Jobs → Job List

  2. Filter by user/project

  3. Click a job’s SLURM ID to view its details.

Pending Jobs

To diagnose why a job is pending, run the sq tool, which provides an explanation:

module load sq
sq

For a raw queue check:

squeue -p <partition_name> --state=PD -l
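The pending-job listing can be summarized by reason: squeue's %R output field shows why each job is waiting (e.g., Priority or Resources). The sample text below stands in for real output:

```shell
# Count pending jobs by reason. The here-string is a stand-in for:
#   squeue -p <partition_name> --state=PD -h -o '%R'
sample='(Priority)
(Resources)
(Priority)'
counts=$(echo "$sample" | sort | uniq -c | sort -rn)
echo "$counts"
```

Many jobs pending on Resources suggests the partition is simply full; Priority means other jobs are ahead in the queue.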

To estimate start time:

squeue -j <jobid> --start

Tips to start faster: request only the wall-clock time and resources your job actually needs, consider a less busy partition, and use the savio_lowprio QoS if your workload tolerates preemption.

For more details on running and submitting jobs, see the full documentation provided by Research IT.