Pittsburgh Supercomputing Center (PSC)
General information
For planned outages see: https://www.psc.edu/calendar/
To log in to the supercomputer: ssh userid@bridges2.psc.edu. This is world-accessible and entirely independent of both the Pitt and UPMC networks.
To see which allocations you have access to, type projects.
Quick test for the interactive queue: salloc. See slurm for more.
Getting an account
- ??
- get approval from your PI (Finn or Bea)
Getting data to and from the PSC
You can copy files to/from rhea-PSC via rsync, for example: rsync --size-only -avhi --exclude CuBIDS --exclude miniconda3 $software_dir $psc:${psc_destdir}
Check which files will be transferred before running it for real by adding --dry-run to the rsync call.
Submitting jobs on the PSC
Jobs are submitted on the PSC via sbatch. See sbatch options here https://www.psc.edu/resources/bridges-2/user-guide/#system-configuration. A description of some options is below:
- -p RM-shared : the partition you are requesting resources from. The most common is RM-shared, but there are also RM, RM-512, and EM (extreme memory)
- --time hh:mm:ss : maximum run time for your job. On RM-shared, the max run time appears to be 48:00:00. If the partition you are requesting is full/backed up, jobs with shorter requested run times are prioritized over those with longer ones
- --nodes : the number of nodes to use. Typically 1 is sufficient (and appears to be the max you can request on RM-shared)
- --ntasks-per-node : the number of cores to use per node. Importantly, increasing the number of cores requested increases your job's memory (RAM) allocation. On RM-shared, each core comes with 1.95 GB of memory, so e.g. four cores will get you 7.8 GB
- -n : number of cores requested in total (useful if you are requesting >1 node and dividing requested cores over nodes)
- -J "$subid-$script" : the name of your job. By default, jobs are named by their job ID, but you can customize the job name via variables like $subid
- -o : output log file name
- -e : error log file name
- --export : see below
*If you need to run a script that requires command-line arguments, you can export them. For example, put
export bids_dir freesurfer_dir freesurfer_sif license acq_label
in your script, and then add
--export="ALL,SUBJECT_ID=$subject_id,ACQ=$acq_label,BIDS_DIR=$bids_dir,FS_DIR=$freesurfer_dir,FS_SIF=$freesurfer_sif,LIC=$license"
to your sbatch call.
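Putting the options together: the same flags can equivalently be embedded as #SBATCH directives at the top of a job script. A sketch, not a tested recipe; the job name, time limit, and variable names below are illustrative:

```shell
#!/bin/bash
#SBATCH -p RM-shared                # partition
#SBATCH --time 12:00:00             # max run time (48:00:00 is the RM-shared cap)
#SBATCH --nodes 1
#SBATCH --ntasks-per-node 4         # 4 cores ~= 7.8 GB RAM on RM-shared
#SBATCH -J freesurfer-test          # job name
#SBATCH -o %x-%j.out                # output log (%x = job name, %j = job ID)
#SBATCH -e %x-%j.err                # error log

# These variables arrive via the --export option on the sbatch call
echo "Running subject $SUBJECT_ID with BIDS dir $BIDS_DIR"
```

Submitted as, for example: sbatch --export="ALL,SUBJECT_ID=$subject_id,BIDS_DIR=$bids_dir" jobscript.sh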
Not sure what resources to request? Run one job with more resources than you think you will need. When the job completes successfully, check the resources it actually used via seff $jobid; this reports the CPU utilized, the wall-clock run time, the amount of memory used, etc. Hence, when first testing an sbatch submission, it is recommended to launch just one test participant (or one test run) to check that the job completes successfully. If you are launching jobs for a list of participants or a range of runs, this can be accomplished by adding break to the bash loop that launches the jobs in succession.
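The break pattern can be sketched as follows. The subject list and submission command are hypothetical; echo stands in for the real sbatch call:

```shell
subjects=(sub-001 sub-002 sub-003)   # hypothetical participant list

launched=()
for subid in "${subjects[@]}"; do
    # Real use: sbatch -J "$subid-job" ... myscript.sh "$subid"
    echo "submitting test job for $subid"
    launched+=("$subid")
    break    # delete this line once the single test job succeeds
done
```

With break in place only the first participant is submitted; removing it launches the full list.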
Checking the status and resource usage of your submitted jobs
Once you have launched some jobs, you can check whether they are running via squeue -u $userid.
To find the job IDs of jobs you previously ran (whether still running, completed successfully, or exited with an error), use sacct --starttime yyyy-mm-dd. This will list the JobID, JobName, Partition, Account, AllocCPUS, State, and ExitCode of each job.
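Since sacct wants --starttime in yyyy-mm-dd form, the date can be computed rather than typed by hand. A sketch assuming GNU date (with a BSD/macOS fallback); the 7-day window is arbitrary:

```shell
# Compute the date 7 days ago as yyyy-mm-dd (GNU date, else BSD date)
start=$(date -d '7 days ago' +%F 2>/dev/null || date -v-7d +%F)

# On the PSC you would then run the command below (shown here with echo):
echo "sacct --starttime $start"
```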
- allocation hour calculator: TODO