Pittsburgh Supercomputing Center (PSC)
General information
- To log in to the supercomputer
ssh userid@bridges2.psc.edu
- For graphical apps (e.g. AFNI), see SSH Display Forwarding (X11)
- This is world-accessible and totally independent of both the Pitt and UPMC networks: no MyApps, Global Connect, or other remote access is needed!
- To test what resources you have access to, type
projects
- Quick test for interactive queue:
salloc
- See slurm for more; use
-p RM-shared --ntasks-per-node=1
for the lowest hour burn rate (a minimal salloc example appears after the login commands below).
- For planned outages, see: https://www.psc.edu/calendar/
ssh -Y $USER@bridges2.psc.edu # -Y for X11 display forwarding
projects # see account information
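A minimal interactive-session sketch (the one-hour limit and single core here are illustrative values, not requirements):
# request 1 core on the shared RM partition for 1 hour
salloc -p RM-shared --ntasks-per-node=1 --time=01:00:00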
Getting an account
- you may need (as of 2024-04-24) to set a Bridges/PSC-specific password via the reset page at https://apr.psc.edu/
- send your user account name to a PI (Finn or Bea) for approval in the project/grant settings
- request MATLAB access via https://www.psc.edu/resources/software/matlab/permission-form/
Getting data to and from the PSC
You can copy files to/from rhea-PSC via rsync, for example:
rsync --size-only -avhi --exclude CuBIDS --exclude miniconda3 $software_dir $psc:${psc_destdir}
Check which files will be transferred before actually running it by adding --dry-run to the rsync call.
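For example, a dry run followed by the real transfer might look like this ($software_dir, $psc, and $psc_destdir are placeholders carried over from the example above):
# preview which files would be copied
rsync --dry-run --size-only -avhi --exclude CuBIDS --exclude miniconda3 $software_dir $psc:${psc_destdir}
# then run the actual transfer
rsync --size-only -avhi --exclude CuBIDS --exclude miniconda3 $software_dir $psc:${psc_destdir}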
Alternatively, setting up Globus Connect endpoints, while more work, can transfer large amounts of data very quickly and in the background (to keep an rsync running after you disconnect, see tmux).
Submitting jobs on the PSC
Jobs are submitted on the PSC via sbatch. sbatch is part of slurm. Usage and links are described on the slurm wiki page, including links to PSC's user guide, especially the RM and EM nodes' RAM and cores summary table.
Jobs run on the RM partition are allocated the whole node (128 cores) and always bill 128 core-hours for every wall-clock hour on the node. Use -p RM-shared --ntasks-per-node=N to restrict the job to N cores (and be billed only for those cores).
If your job uses a lot of memory, you will see OOM (out-of-memory) errors on any RM partition. Request access to the EM partition and use -p EM for high-memory jobs.
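As a rough sketch of an sbatch script header for a shared-node job (the task count, time limit, job name, and script name below are placeholders to adjust for your own job):
#!/bin/bash
#SBATCH -p RM-shared              # shared partition: billed only for the cores requested
#SBATCH --ntasks-per-node=4       # request 4 of the node's 128 cores
#SBATCH -t 12:00:00               # wall-clock limit
#SBATCH -J example_job            # placeholder job name
# for high-memory jobs, swap the partition line for: #SBATCH -p EM
./my_pipeline.sh                  # placeholder command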
If you need to run a script that requires command-line arguments, you can pass them as exported environment variables, for example:
# in your script to be run by the job queue
export bids_dir freesurfer_dir freesurfer_sif license acq_label
do_thing $bids_dir
# in your sbatch call
--export="ALL,SUBJECT_ID=$subject_id,ACQ=$acq_label,BIDS_DIR=$bids_dir,FS_DIR=$freesurfer_dir,FS_SIF=$freesurfer_sif,LIC=$license"
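A fuller sketch of that pattern, assuming a hypothetical job script run_freesurfer.sbatch and placeholder paths; variables named in --export show up in the job script's environment:
# in the submitting shell: set values and pass them explicitly to sbatch
subject_id=sub-001; acq_label=mprage   # placeholders
sbatch --export="ALL,SUBJECT_ID=$subject_id,ACQ=$acq_label,BIDS_DIR=$bids_dir,FS_DIR=$freesurfer_dir,FS_SIF=$freesurfer_sif,LIC=$license" run_freesurfer.sbatch
# inside run_freesurfer.sbatch: the exported names are available directly
echo "processing $SUBJECT_ID (acq-$ACQ) from $BIDS_DIR"
do_thing "$BIDS_DIR"                   # do_thing is a placeholder command from the example above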
Not sure what resources to request? You can run one job with more resources than you think you will need. When the job completes successfully, check the resources it actually used via seff $jobid; this will tell you the CPU utilized, the job wall-clock run time, the amount of memory utilized, etc. Hence, when initially testing an sbatch submission, it is recommended to launch just one test participant (or one test run) to figure out whether the job completes successfully. If you are launching jobs for a list of participants or a range of runs, this can be done by adding break to the bash loop that launches the jobs in succession (see the sketch below).
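A minimal sketch of that single-test pattern (subject_list.txt and run_freesurfer.sbatch are placeholders):
for subject_id in $(cat subject_list.txt); do
    sbatch --export="ALL,SUBJECT_ID=$subject_id" run_freesurfer.sbatch
    break   # remove this line once the single test job finishes cleanly (check it with seff)
done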
Checking the status and resource usage of your submitted jobs
When you have launched some jobs, you can check whether they are running via squeue -u $userid
To find the jobid of jobs that you previously ran (whether still running, completed successfully, or exited with an error), use sacct --starttime yyyy-mm-dd. This will list the JobID, JobName, Partition, Account, AllocCPUS, State, and ExitCode of each job.
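For example (the date below is just a placeholder):
squeue -u $USER # jobs currently queued or running
sacct --starttime 2024-04-01 # jobs started since that date
sacct --starttime 2024-04-01 --format=JobID,JobName,Partition,Account,AllocCPUS,State,ExitCode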
Checking hour allocations
To check the CPU time and memory usage of a specific job:
sacct -j <jobid>.batch --format=JobID,MaxRSS,AveRSS,TotalCPU
To check the hours in general:
projects
To check the hours per person:
projects --usage-by-user soc230004p
- allocation hour calculator: TODO