Table of Contents

Pittsburgh Supercomputing Center (PSC)

General information

ssh -Y $USER@bridges2.psc.edu # -Y for X11 display forwarding
projects                      # see account information 

Getting an account

  1. you may need (as of 2024-04-24) to set a Bridges/PSC-specific password via the reset form at https://apr.psc.edu/
  2. send your username to a PI (Finn or Bea) so they can approve it in the project/grant settings

Getting data to and from the PSC

You can copy files to/from rhea-PSC via rsync, for example rsync --size-only -avhi --exclude CuBIDS --exclude miniconda3 $software_dir $psc:${psc_destdir}. Check which files will be transferred before officially running it by adding --dry-run to the rsync call.

Alternatively, setting up Globus Connect endpoints, while more work, can transfer large amounts of data very fast and in the background (for rsync, see tmux).
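The dry-run-then-copy workflow above can be sketched as follows. The variables ($software_dir, $psc, $psc_destdir) are placeholders for your own source directory, PSC login, and destination path; set them before running.

```shell
# Placeholders: set these to your own source directory, PSC login,
# and destination directory before running.
software_dir=$HOME/software
psc=$USER@bridges2.psc.edu
psc_destdir=/path/on/psc

# Preview: --dry-run lists what would be transferred without copying.
rsync --size-only -avhi --dry-run \
    --exclude CuBIDS --exclude miniconda3 \
    "$software_dir" "$psc:$psc_destdir"

# Same call without --dry-run performs the actual transfer.
rsync --size-only -avhi \
    --exclude CuBIDS --exclude miniconda3 \
    "$software_dir" "$psc:$psc_destdir"
```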

Submitting jobs on the PSC

Jobs are submitted on the PSC via sbatch, which is part of Slurm. Usage and links are described on the slurm wiki page, including links to PSC's user guide, especially the summary table of RAM and cores for the RM and EM nodes.

Jobs run on the RM partition request the whole node (128 cores) and always bill 128 core-hours for every hour spent on the node. Use -p RM-shared --ntasks-per-node=N to restrict the job to N cores (and be billed only for those).
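A minimal sketch of an sbatch script header for RM-shared; the job name, core count, and time limit are illustrative values, not recommendations:

```shell
#!/bin/bash
# Sketch of an RM-shared job header; values below are examples only.
#SBATCH -p RM-shared
#SBATCH --ntasks-per-node=4   # bill only 4 of the node's 128 cores
#SBATCH -t 02:00:00           # wall-clock limit
#SBATCH -J example_job        # job name (placeholder)

echo "Running on $(hostname)"
```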

If your job uses a lot of memory, you will see OOM (out-of-memory) errors on any RM partition. Request -p EM for high-memory jobs.
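For a high-memory job, the header might instead look like this sketch; the memory request and time limit are illustrative placeholders to size for your own workload:

```shell
#!/bin/bash
# Sketch of an EM (extreme-memory) job header; values are placeholders.
#SBATCH -p EM
#SBATCH --mem=500G    # example memory request; size to your workload
#SBATCH -t 12:00:00

my_memory_hungry_command   # placeholder for your actual workload
```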

If you need to run a script that requires command line arguments, you can pass them as environment variables via sbatch's --export option, for example:

# in your sbatch call
--export="ALL,SUBJECT_ID=$subject_id,ACQ=$acq_label,BIDS_DIR=$bids_dir,FS_DIR=$freesurfer_dir,FS_SIF=$freesurfer_sif,LIC=$license"

# in your script to be run by the job queue: the variables named on the
# --export line (SUBJECT_ID, ACQ, BIDS_DIR, ...) are already in the environment
do_thing "$BIDS_DIR" "$ACQ"

Not sure what resources to request? Run one job with more resources than you think you will need. When the job completes successfully, check the resources it actually used via seff $jobid; this reports the CPU utilized, the job wall-clock run time, the amount of memory utilized, etc. When initially testing an sbatch submission, it is therefore recommended to launch just one test participant (or one test run) to confirm the job completes successfully. If you are launching jobs for a list of participants or range of runs, this can be accomplished by adding break to the bash loop that launches the jobs in succession.
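The break-in-loop trick above can be sketched as follows; here echo stands in for your real sbatch call, and the subject IDs are placeholders:

```shell
# Submission loop that launches only the first participant while testing.
# "echo" stands in for your real sbatch call; remove the break once the
# single test job completes successfully.
for subject_id in sub-01 sub-02 sub-03; do
    echo "submitting $subject_id"
    break   # launch only the first participant while testing
done
# → submitting sub-01
```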

Checking the status and resource usage of your submitted jobs

When you have launched some jobs, you can check whether they are running via squeue -u $userid.

To find the jobid of jobs that you previously ran (whether running, completed successfully, or exited with an error), use sacct --starttime yyyy-mm-dd. This will list the JobID, JobName, Partition, Account, AllocCPUS, State, and ExitCode of each job.

Checking hour allocations

To check the CPU time and memory usage of a specific job:

 sacct -j <jobid>.batch --format=JobID,MaxRSS,AveRSS,TotalCPU 

To check the hours in general:

 projects 

To check the hours per person:

 projects --usage-by-user soc230004p 

Resources