SLURM Introduction and Job Submission Script

SLURM Overview

SLURM (Simple Linux Utility for Resource Management) is an open-source workload manager for Linux clusters of all sizes. It allocates compute resources to users so that they can run jobs such as computations, simulations, or data analysis tasks, and it schedules and manages the execution of those jobs on the cluster.

Key components of SLURM:

  • Nodes: Physical or virtual machines in the cluster.

  • Partitions: Logical sets of nodes grouped together, often by hardware characteristics or purpose.

  • Jobs: Tasks submitted by users to be executed on the cluster.

  • Schedulers: Components responsible for deciding which jobs run on which nodes and when.
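
The components above can be inspected from a login node with Slurm's standard command-line tools. The following is a minimal sketch, assuming the Slurm client commands are on the PATH. The partition name normal is taken from the job script in this document, and the function name show_cluster_overview is made up for illustration.

```shell
#!/bin/bash
# Sketch: print a quick overview of the cluster components.
# show_cluster_overview is a hypothetical helper name, not a Slurm command.
show_cluster_overview() {
    local partition="${1:-normal}"   # assumed partition name

    if ! command -v sinfo >/dev/null 2>&1; then
        echo "Slurm client tools not found on this machine" >&2
        return 1
    fi

    # Partitions: summary of the partition and the state of its nodes
    sinfo --partition="$partition"

    # Nodes: one line per node, with CPU and memory details
    sinfo --Node --long --partition="$partition"

    # Jobs: your own pending and running jobs
    squeue --user="$USER"

    # Scheduler view: limits and configuration of the partition
    scontrol show partition "$partition"
}
```

Calling show_cluster_overview without arguments uses the normal partition; pass another partition name as the first argument to inspect it instead.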

Job Submission Script

The following SLURM job submission script is explained in detail below.

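#!/bin/bash
# Note: sbatch does not expand shell variables in directive lines, so
# replace "$ACCOUNT_NAME" below with the actual project account.
#SBATCH --account="$ACCOUNT_NAME"
#SBATCH --constraint='gpu'
#SBATCH --nodes=1
#SBATCH --ntasks-per-core=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=12
#SBATCH --partition=normal
#SBATCH --hint=nomultithread
#SBATCH --time=6:00:00
# %j expands to the job ID; the slurm/ directory must exist before submission.
#SBATCH --output=slurm/slurm_%j.out
#SBATCH --error=slurm/slurm_%j.err

# Match the OpenMP thread count to the CPUs requested above (falls back to 12).
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-12}"

# Stop early if the project directory is missing.
cd "${SCRATCH}/lit_ecology_classifier" || exit 1

# Start from a clean module environment, then load the GPU and Python modules.
module purge
module load daint-gpu cray-python

# Activate the project's virtual environment.
source lit_ecology/bin/activate

# Run prediction; the three paths are placeholders.
python -m lit_ecology_classifier.predict --datapath /path/to/data --outpath /outdirectory --model_path /path/to/model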
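
To run the script, save it to a file and hand it to sbatch. The sketch below assumes the file is called predict_job.sh (a hypothetical name, as is the helper submit_job) and that ACCOUNT_NAME is set in your shell. Two details motivate it: Slurm does not create the directory named in --output and --error, so slurm/ has to exist before submission, and sbatch does not expand shell variables inside #SBATCH lines, so the account is passed on the command line, where it takes precedence over the directive.

```shell
#!/bin/bash
# Sketch: prepare the log directory and submit the job script.
# submit_job and predict_job.sh are hypothetical names used for illustration.
submit_job() {
    local script="${1:-predict_job.sh}"

    # Slurm will not create this directory; without it the job
    # typically fails right away because the log files cannot be opened.
    mkdir -p slurm

    if ! command -v sbatch >/dev/null 2>&1; then
        echo "sbatch not found; run this on a cluster login node" >&2
        return 1
    fi

    if [ -z "${ACCOUNT_NAME:-}" ]; then
        echo "ACCOUNT_NAME is not set" >&2
        return 1
    fi

    # --parsable prints only the job ID (plus ";cluster" on multi-cluster setups).
    local job_id
    job_id="$(sbatch --parsable --account="$ACCOUNT_NAME" "$script")" || return 1
    echo "Submitted job ${job_id}"

    # Show the job's current state in the queue.
    squeue --jobs="${job_id%%;*}"
}
```

Run submit_job from the directory that contains the job script, since the relative slurm/ path in the directives is resolved against the submission directory.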

Breakdown of the Script

  1. Shebang Line:

    The first line, #!/bin/bash, indicates that the script should be run using the Bash shell.

  2. SBATCH Directives:

    These lines configure the SLURM job parameters:

      • #SBATCH --account="$ACCOUNT_NAME": Specifies the account to charge for the job.

      • #SBATCH --constraint='gpu': Restricts the job to nodes that have the gpu feature, that is, nodes with GPUs.

      • #SBATCH --nodes=1: Requests 1 node for the job.

      • #SBATCH --ntasks-per-core=1: Runs at most 1 task per CPU core.

      • #SBATCH --ntasks-per-node=1: Runs 1 task per node.

      • #SBATCH --cpus-per-task=12: Allocates 12 CPU cores per task.

      • #SBATCH --partition=normal: Submits the job to the normal partition.

      • #SBATCH --time=6:00:00: Requests a maximum runtime of 6 hours.
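
When reviewing a job script, it can help to list the options it sets, for example to spot an option that appears twice. The following is a rough sketch using only grep, sed, sort, and uniq. The function names and the sample file are hypothetical, and the parsing only handles the --option=value long form used in this document.

```shell
#!/bin/bash
# Sketch: inspect the #SBATCH directives of a job script.
# list_sbatch_options and duplicate_sbatch_options are hypothetical helpers.

# Print the long option names (e.g. --nodes), one per line.
list_sbatch_options() {
    grep -E '^#SBATCH[[:space:]]+--' "$1" |
        sed -E 's/^#SBATCH[[:space:]]+(--[a-z-]+).*/\1/'
}

# Print option names that occur more than once.
duplicate_sbatch_options() {
    list_sbatch_options "$1" | sort | uniq -d
}

# Small sample header to try the helpers on (not a complete job script).
sample="$(mktemp)"
cat > "$sample" <<'EOF'
#!/bin/bash
#SBATCH --constraint='gpu'
#SBATCH --nodes=1
#SBATCH --cpus-per-task=12
#SBATCH --constraint=gpu
#SBATCH --time=6:00:00
EOF

list_sbatch_options "$sample"
echo "Duplicates:"
duplicate_sbatch_options "$sample"
```

On the sample header the duplicate check reports --constraint, which is set twice. Pointing the helpers at a real job script works the same way.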