====== Pittsburgh Supercomputing Center (PSC) ======

**General information**

  * To log in to the supercomputer: ''ssh userid@bridges2.psc.edu''
  * For graphical apps (e.g. AFNI), see [[:tools:ssh display forwarding]]
  * Bridges-2 is reachable from anywhere and is completely independent of both the Pitt and UPMC networks. **No myapps, [[:tools:globalconnect]], or other [[:admin:remoteaccess]] needed!**
  * To see what resources you have access to, type ''projects''
  * Quick test of the interactive queue: ''salloc''. See [[:tools:slurm]] for more.
  * Use ''%%-p RM-shared --ntasks-per-node=1%%'' for the lowest hour burn rate.
  * User Guide: https://www.psc.edu/resources/bridges-2/user-guide/
  * For planned outages, see https://www.psc.edu/calendar/

  ssh -Y $USER@bridges2.psc.edu   # -Y for X11 display forwarding
  projects                        # see account information

===== Getting an account =====

  - Create an account at https://operations.access-ci.org/identity/new-user
  - You [[https://www.psc.edu/resources/bridges-2/user-guide/|may need (20240424)]] to set a Bridges/PSC-specific password via the reset form at https://apr.psc.edu/
  - Send your user account name to the PI (Finn or Bea) for approval in the project/grant settings
  - Request MATLAB access via https://www.psc.edu/resources/software/matlab/permission-form/

===== Getting data to and from the PSC =====

You can copy files between rhea and the PSC via rsync, for example ''rsync --size-only -avhi --exclude CuBIDS --exclude miniconda3 $software_dir $psc:${psc_destdir}''. Check which files would be transferred before actually running it by adding ''--dry-run'' to the rsync call.

Alternatively, setting up [[:tools:globus]] endpoints, while more work, can transfer large amounts of data very quickly and in the background (to keep long rsync transfers alive, see [[:tools:tmux]]).

===== Submitting jobs on the PSC =====

Jobs are submitted on the PSC via ''sbatch'', which is part of [[:tools:slurm]]. Usage and links are described on the [[:tools:slurm|slurm wiki page]], including links to [[https://www.psc.edu/resources/bridges-2/user-guide/#system-configuration|PSC's user guide]], especially the RAM and core summary tables for the ''[[https://www.psc.edu/resources/bridges-2/user-guide/#:~:text=Bridges-2%20RM-,nodes,-RM|RM]]'' and ''[[https://www.psc.edu/resources/bridges-2/user-guide/#:~:text=of%20the%20EM-,partition,-EM%20partition|EM]]'' nodes.

Jobs run on the ''RM'' partition [[https://www.psc.edu/resources/bridges-2/user-guide/#:~:text=jobs%20in%20the%20rm%20partition%20are%20charged%20for%20all%20128%20cores%20on%20every%20node%20they%20use.%20|are allocated the whole node (128 cores) and are billed for all 128 cores for every hour they run]]. Use ''%%-p RM-shared --ntasks-per-node=N%%'' to request only N cores.

If your job uses a lot of memory, you will see OOM (out-of-memory) errors on either ''RM'' partition. Request and use ''-p EM'' for high-memory jobs.

**If you need to run a script that requires command line arguments**, you can export them, for example:

  # in your script to be run by the job queue
  export bids_dir freesurfer_dir freesurfer_sif license acq_label
  do_thing $bids_dir

  # in your sbatch call
  --export="ALL,SUBJECT_ID=$subject_id,ACQ=$acq_label,BIDS_DIR=$bids_dir,FS_DIR=$freesurfer_dir,FS_SIF=$freesurfer_sif,LIC=$license"

Not sure what resources to request? Run one job with more resources than you think you will need. When it completes successfully, check the resources it actually used via ''seff $jobid''; this reports the CPU utilized, the wall-clock run time, the memory used, etc.
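Putting the pieces above together, here is a minimal sketch of submitting one cheap test job and then checking what it actually used. The batch script name ''myjob.sbatch'', the time limit, and the exported variables are illustrative placeholders, not a tested recipe:

  # submit one test job on RM-shared with a single core (lowest burn rate);
  # myjob.sbatch is a hypothetical batch script that reads the exported variables
  jobid=$(sbatch --parsable -p RM-shared --ntasks-per-node=1 -t 04:00:00 \
    --export="ALL,SUBJECT_ID=$subject_id,BIDS_DIR=$bids_dir" \
    myjob.sbatch)

  squeue -u $USER   # is the job still queued or running?
  seff $jobid       # after completion: CPU used, wall-clock time, memory used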
Hence, when initially testing an sbatch submission, it is recommended to launch just one test participant (or one test run) in order to figure out whether the job will complete successfully. If you are launching jobs for a list of participants or a range of runs, this can be accomplished by adding ''break'' to the bash loop that launches the jobs in succession (see the example loop at the bottom of this page).

===== Checking the status and resource usage of your submitted jobs =====

Once you have launched jobs, you can check whether they are queued or running via ''squeue -u $userid''.

To find the JobID of jobs that you previously ran (running, completed successfully, or exited with an error), use ''sacct --starttime yyyy-mm-dd''. This lists the JobID, JobName, Partition, Account, AllocCPUS, State, and ExitCode of each job.

===== Checking hour allocations =====

To check the hours used by a specific job (''$jobid'' is the job's ID as reported by sbatch or ''sacct''):

  sacct -j $jobid.batch --format=JobID,MaxRSS,AveRSS,TotalCPU

To check the hours in general:

  projects

To check the hours per person:

  projects --usage-by-user soc230004p

  * allocation hour calculator: TODO

===== Resources =====

  * [[:tools:slurm]]
  * [[:tools:crc]]
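===== Example: launching one test participant =====

A minimal sketch of the "launch just one test participant" pattern described under //Submitting jobs on the PSC//. The data path, job script name (''run_fmriprep.sbatch''), and time limit are hypothetical placeholders; the ''break'' stops the loop after the first participant so you can verify the job (e.g. with ''seff'') before launching the rest:

  #!/usr/bin/env bash
  # loop over BIDS participants and submit one sbatch job per subject
  bids_dir=/ocean/projects/soc230004p/$USER/bids   # hypothetical data location
  for subject_dir in "$bids_dir"/sub-*; do
      subject_id=$(basename "$subject_dir")
      sbatch -p RM-shared --ntasks-per-node=1 -t 08:00:00 \
        --export="ALL,SUBJECT_ID=$subject_id,BIDS_DIR=$bids_dir" \
        run_fmriprep.sbatch                        # hypothetical job script
      break   # remove this line once the test participant finishes cleanly
  done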