====== slurm ======

For HPC (high performance computing) environments where many users have more computation to do than the computers can handle at one time, a job queue scheduling system exists to fairly distribute and queue computation for all users. **slurm** provides this.

This means we cannot simply run an interactive terminal (e.g. matlab, R, ipython) or even a script directly. Instead we need to wrap our computation in scripts that are formatted to enter the job queue and submit them with ''sbatch''. The biggest change is that we need to know how many cores the job needs and how long it will take.

Both [[:tools:PSC]] and [[:tools:CRC]] use [[https://slurm.schedmd.com/documentation.html|slurm]] for job scheduling. It's an alternative to PBS or [[https://en.wikipedia.org/wiki/TORQUE|torque]]. We interact with slurm primarily through ''sbatch'' and ''squeue''.

===== Resources and Documentation =====

There are many guides to using slurm, some written specifically for the supercomputers we use.

  * [[https://www.psc.edu/resources/bridges-2/user-guide/|PSC bridges guide]]
  * [[https://slurm.schedmd.com/tutorials.html#tools|slurm's video and tutorial links]]
  * [[https://crc.pitt.edu/getting-started/running-jobs-slurm|Running Jobs With SLURM @ CRC]]
  * [[https://github.com/LabNeuroCogDevel/PSC_free_all_t1/|LNCD code running freesurfer on bridges1 circa 2020]]

===== sbatch options =====

Jobs are submitted with options given to ''sbatch''. Here's a subset of important options:

  * ''-p RM-shared'' : the partition you are requesting resources from. The most common one is RM-shared, but there are also RM, RM-512, and EM (extreme memory).
  * ''--time hh:mm:ss'' : maximum run time for your job. On RM-shared, the max run time appears to be 48:00:00. If the partition you are requesting is full/backed up, jobs with shorter requested run times are prioritized over those with longer ones.
  * ''--nodes'' : the number of nodes to use. Typically 1 is sufficient (and appears to be the max you can request on RM-shared).
  * ''--ntasks-per-node'' : the number of cores to use per node. Importantly, increasing the number of cores requested also increases your job's memory (RAM) allocation. On RM-shared, each core comes with 1.95 GB of memory, so four cores, for example, will get you 7.81 GB.
  * ''-n'' : number of cores requested in total (useful if you are requesting >1 node and dividing the requested cores over nodes).
  * ''-J "$subid-$script"'' : the name of your job. By default, jobs are named by their job id, but you can customize the name via variables like ''$subid''.
  * ''-o'' : output log file name.
  * ''-e'' : error log file name.
  * ''--export'' : comma separated list of variables sent to the script; a replacement for using command line arguments.

===== Usage =====

From a "head" node (i.e. after ''ssh bridges2.psc.edu''), interactively run:

<code bash>
export SUBJECT=ABCD
sbatch -o $logfile -e $logfile -J $jobname my_batch_job.bash

# can also use --export to explicitly set variables used by the batch script
sbatch --export=ALL,SUBJECT=ABCD my_batch_job.bash
</code>

Before launching a job, you need to know

  - how long the job will take ("walltime"). Overestimate.
    * if you underestimate, the job will be killed by the scheduler before it finishes
    * the higher your estimate, the longer it'll take your job to leave the queue and start running
    * 1000s of very short jobs will also be penalized by the scheduler
  - how many cores (forks, threads, tasks) to use. This sets cpu hours "billing."
    * You're charged walltime*requested cores. Even if you don't use the cores, you're blocking others from them.
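One way to calibrate both estimates is to look at what a finished job actually used and adjust the next request accordingly. A minimal sketch: the job id ''12345'' is a placeholder, ''sacct'' and ''squeue'' are standard slurm tools, and ''seff'' is a contrib script that may or may not be installed on a given cluster.

<code bash>
# list your own queued and running jobs
squeue -u "$USER"

# actual elapsed time, peak memory, and exit state of a finished job
sacct -j 12345 --format=JobID,Elapsed,MaxRSS,State

# CPU and memory efficiency summary, if the seff contrib script is installed
seff 12345
</code>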
The script given to ''sbatch'' should contain special ''#SBATCH'' comments for slurm settings not specified on the command line. Usually this is the expected runtime and the number of CPU cores. The script should use environment variables instead of input arguments/options. Here we use ''$SUBJECT'' in the script and ''export SUBJECT='' before submitting with ''sbatch''.

<code bash>
#!/bin/bash
#SBATCH --partition=RM-shared
#SBATCH --ntasks-per-node=1
#SBATCH --nodes=1
#SBATCH --time=8:00:00
# above allocates 8 hours (note: "8:00" alone would mean 8 minutes)
# Acceptable time formats include:
# "minutes", "minutes:seconds", "hours:minutes:seconds",
# "days-hours", "days-hours:minutes", "days-hours:minutes:seconds"

# example command using the exported "$SUBJECT" variable
long_running_process /path/to/$SUBJECT
</code>
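The ''$SUBJECT'' pattern makes it easy to submit the same script once per subject. A sketch under assumed names: ''subject_list.txt'' (one subject id per line), the ''logs/'' directory, and the ''-preproc'' job-name suffix are all hypothetical; ''my_batch_job.bash'' is the script above.

<code bash>
mkdir -p logs
# submit one job per subject, each with its own name and log files
while read -r SUBJECT; do
  sbatch -J "$SUBJECT-preproc" \
         -o "logs/$SUBJECT.log" -e "logs/$SUBJECT.err" \
         --export=ALL,SUBJECT="$SUBJECT" my_batch_job.bash
done < subject_list.txt
</code>

Each ''sbatch'' call returns immediately with a job id; the jobs themselves run whenever the scheduler finds room for them.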