Submitting and Running Jobs
Overview
To submit, run, monitor, and cancel jobs on Savio, you’ll use SLURM (the Simple Linux Utility for Resource Management). SLURM is an open-source scheduler that manages jobs, job steps, nodes, partitions, and accounts on the cluster.
Common SLURM Commands
| Command | Description | Example |
| --- | --- | --- |
| `sbatch` | Submit a batch job script | `sbatch myjob.sh` |
| `srun` | Run an interactive job | `srun --pty bash -i` |
| `scancel` | Cancel a job | `scancel <jobid>` |
| `squeue` | View job queue | `squeue -u $USER` |
| `sq` | Explain why a job is pending | `module load sq; sq` |
| `sacctmgr` | Check project/account access | `sacctmgr -p show associations user=$USER` |
| `sinfo` | Show node and partition status | `sinfo` |
| `sacct` | View accounting and completed jobs | `sacct -j <jobid>` |
Submitting Jobs
Jobs on Savio can be batch (non-interactive) or interactive. When submitting, you must specify:
**Account** (required; `--account`): Each job runs under an account that determines which resources you can use and how usage is billed. Check your available accounts using:

```
sacctmgr -p show associations user=$USER
```

**Partition** (required; `--partition`): Specifies the group of nodes (e.g., CPU, GPU, HTC). For example:

```
#SBATCH --partition=savio3_htc
```

**Time limit** (required; `--time`): Specifies the maximum wall-clock time your job can run. Format: days-hours:minutes:seconds. For example:

```
#SBATCH --time=00:30:00  # 30 minutes
```

**QoS** (optional; depends on your project): Defines job priority and limits. Common QoS values include:
- `savio_normal`: the default
- `savio_debug`: for short tests
- `savio_lowprio`: low-priority, preemptible jobs

For example:

```
#SBATCH --qos=savio_debug
```

Example: Basic Batch Job
```
#!/bin/bash
#SBATCH --job-name=test
#SBATCH --account=fc_NAME
#SBATCH --partition=savio3_htc
#SBATCH --time=00:00:30

echo "Hello world from Savio!"
```

Submit with:

```
sbatch myjob.sh
```

Output is saved as `slurm-<jobid>.out`.

Interactive Jobs
For interactive sessions (e.g., debugging, testing, GUI tools):

```
srun --pty -A <account> -p <partition> -t 00:30:00 bash -i
```

You’ll see:

```
srun: job 669120 queued and waiting for resources
srun: job 669120 has been allocated resources
[user@n0047 ~]$
```

Now you’re on a compute node. Run your commands normally.
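The same `srun` line can request more than the defaults. As a sketch (the flag values here are illustrative; `-c` sets CPUs per task):

```
srun --pty -A <account> -p <partition> -t 01:00:00 -c 4 bash -i
```

As with batch jobs, the session ends automatically when the time limit expires.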
Job Arrays
Use job arrays to run multiple similar jobs efficiently:
```
#!/bin/bash
#SBATCH --job-name=array_example
#SBATCH --account=fc_NAME
#SBATCH --partition=savio3_htc
#SBATCH --array=0-31
#SBATCH --output=array_job_%A_task_%a.out
#SBATCH --error=array_job_%A_task_%a.err
#SBATCH --time=00:01:00

echo "Running task $SLURM_ARRAY_TASK_ID"
```

Submit with:

```
sbatch array_job.sh
```

Each array element runs as a separate job (task 0, task 1, etc.).
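Within each array task, `$SLURM_ARRAY_TASK_ID` is commonly used to select that task's input. A minimal sketch, assuming a hypothetical `data_<N>.txt` naming scheme (the variable defaults to 0 so the snippet also runs outside SLURM):

```shell
#!/bin/bash
# Pick this task's input file from the array index.
# SLURM sets SLURM_ARRAY_TASK_ID inside an array job; default to 0
# so the script can also be tested outside SLURM.
TASK_ID=${SLURM_ARRAY_TASK_ID:-0}
INPUT="data_${TASK_ID}.txt"   # hypothetical naming scheme
echo "Task ${TASK_ID} will process ${INPUT}"
```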
Low Priority Jobs
Use the `savio_lowprio` QoS to take advantage of idle cluster resources.

- **Pros:** doesn’t count toward your condo allocation
- **Cons:** lower priority, and jobs can be preempted (killed or requeued)
Example:

```
#SBATCH --qos=savio_lowprio
#SBATCH --requeue
```

Email Notifications and Output Files
By default, SLURM writes all job output to `slurm-%j.out`. To customize this:

```
#SBATCH --output=myjob_%j.out
#SBATCH --error=myjob_%j.err
```

If you would like to receive email updates:

```
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=EMAIL
```

Example:
```
#!/bin/bash
#SBATCH --job-name=test
#SBATCH --account=fc_NAME
#SBATCH --partition=savio3_htc
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=8
#SBATCH --time=00:00:30
#SBATCH --output=test_%j.out
#SBATCH --error=test_%j.err
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=EMAIL

echo "hello world"
```

Monitoring Jobs
Via Command Line
| Command | Purpose | Example |
| --- | --- | --- |
| `squeue` | View active jobs | `squeue -u $USER` |
| `sacct` | View completed job info | `sacct -j <jobid>` |
| `sinfo` | Show node/partition status | `sinfo` |
| `wwall` | Check resource usage snapshot | `wwall -j <jobid>` |
| `wwall -t` | “top”-like node summary | `wwall -j <jobid> -t` |
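These commands are also easy to script around. As an illustration, the sketch below tallies pending (PD) and running (R) jobs from `squeue`-style output; the sample lines are hard-coded so it runs without cluster access (in practice you would pipe in `squeue -u $USER -h` instead):

```shell
#!/bin/bash
# Hard-coded sample in squeue's default column order:
# JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
sample='669120 savio3_htc test user1 PD 0:00 1 (Priority)
669121 savio3_htc test user1 R 5:12 1 n0047
669122 savio3_htc test user1 R 1:03 1 n0048'

# Column 5 is the job state; count the PD and R entries.
pending=$(printf '%s\n' "$sample" | awk '$5 == "PD" { n++ } END { print n + 0 }')
running=$(printf '%s\n' "$sample" | awk '$5 == "R" { n++ } END { print n + 0 }')
echo "pending=$pending running=$running"
```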
To monitor a running job interactively:

```
srun --jobid=<jobid> --pty /bin/bash
```

Via MyBRC Portal
Visit the MyBRC Portal and go to **Jobs → Job List**. Filter by user/project, then click a job’s SLURM ID to view:

- Start/end times
- Nodes used
- CPUs & memory
- Service Units consumed
Pending Jobs
To diagnose pending jobs:

```
module load sq
sq
```

`sq` will provide you with an explanation.

For a raw queue check:

```
squeue -p <partition_name> --state=PD -l
```

To estimate start time:

```
squeue -j <jobid> --start
```

Tips to start faster:

- Request fewer nodes or shorter wall time
- Submit to a less-busy partition
- Use `sinfo` to see idle nodes
- Use the `savio_lowprio` QoS when possible
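When trimming a wall-time request, it can help to generate the days-hours:minutes:seconds string programmatically rather than by hand. A small bash sketch (`to_slurm_time` is a hypothetical helper, not a SLURM command):

```shell
#!/bin/bash
# Convert a minute count into SLURM's days-hours:minutes:seconds format.
to_slurm_time() {
  local mins=$1
  printf '%d-%02d:%02d:00\n' $((mins / 1440)) $((mins % 1440 / 60)) $((mins % 60))
}

to_slurm_time 30     # -> 0-00:30:00
to_slurm_time 1500   # -> 1-01:00:00
```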
For more details on running and submitting jobs, see the full documentation provided by Research IT.