====== BFG: git Repo Cleaner ======

As mentioned in the [[https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/removing-sensitive-data-from-a-repository|GitHub docs]], [[https://github.com/rtyley/bfg-repo-cleaner|BFG]] is the preferred way to scrub secrets, credentials, PHI, and other sensitive information accidentally included in a [[:tools:git|git]] repo. (**Avoid** ''git add -A'' and you're unlikely to end up in this situation.)

On rhea, use the command: ''java -jar /opt/ni_tools/bfg-git-repo-cleaner/bfg-1.14.0.jar''

===== Example =====

Here an ssh key was accidentally included and uploaded to GitHub. \\
{{:tools:pasted:20231020-093647.png}}

==== Why a special tool? ====

You might try removing the file as you would any other source file:

<code bash>
git rm sshkey*
git commit -m 'remove secrets'
</code>

Though the file is gone from the latest commit, it still exists in the history! For most things under version control, this is ideal: you want to be able to go back in time and see what was included. For a secret, it means anyone with access to the repo can still recover it from an older commit.

That commit is still worth making: by default BFG protects the contents of your latest commit and only cleans the commits before it.

==== Scrub ====

Run BFG to delete the file from every earlier commit, expire the reflog and garbage-collect so the old objects are actually dropped, then force-push the rewritten history. In the example, we ''cd'' to the repository first:

<code bash>
cd /Volumes/Hera/Projects/7TBrainMech/scripts/eeg/Shane
java -jar /opt/ni_tools/bfg-git-repo-cleaner/bfg-1.14.0.jar --delete-files sshkey
git reflog expire --expire=now --all && git gc --prune=now --aggressive
git push --force
</code>

===== Prevention =====

Add the name of each secret file on its own line in ''.gitignore''. ''git add -A'' will then skip those files, and git will warn you (and require ''git add -f'') if you try to add them manually.

===== Regenerate credentials =====

GitHub is aggressively mirrored by third parties, so a secret that was published and later removed has likely already been captured elsewhere. You should regenerate any compromised credentials **in addition to** scrubbing them from the repo.
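For a leaked ssh key like the one in the example, regenerating means creating a new key pair and revoking the old public key everywhere it was trusted. Below is a minimal sketch, assuming bash and OpenSSH's ''ssh-keygen''; the function names ''new_ssh_key'' and ''revoke_pubkey'' and any paths you pass them are hypothetical, not part of the rhea setup.

```shell
#!/usr/bin/env bash
# Sketch only: new_ssh_key and revoke_pubkey are hypothetical helpers, not installed tools.

# Generate a replacement ed25519 key pair at the given path (writes $1 and $1.pub).
new_ssh_key() {
  local keyfile="$1"
  # -q: quiet; -N '': empty passphrase (consider setting one); -C: comment stored in the .pub file
  ssh-keygen -q -t ed25519 -N '' -C "rotated $(date +%F)" -f "$keyfile"
}

# Remove every line of an authorized_keys file that contains the leaked public key.
# Matching uses the base64 key body (2nd field of the .pub file), so a different comment still matches.
revoke_pubkey() {
  local leaked_pub="$1" authkeys="$2" body
  body=$(awk 'NF >= 2 {print $2; exit}' "$leaked_pub")
  if [ -z "$body" ]; then
    # Refuse to continue: an empty pattern would match (and delete) every line.
    echo "revoke_pubkey: no key found in $leaked_pub" >&2
    return 1
  fi
  # Keep a backup, then rewrite the original file in place (same inode, same permissions).
  cp "$authkeys" "$authkeys.bak"
  awk -v k="$body" 'index($0, k) == 0' "$authkeys.bak" > "$authkeys"
}
```

Run ''revoke_pubkey sshkey.pub ~/.ssh/authorized_keys'' on each machine where the old key was authorized. If only the private half is at hand, ''ssh-keygen -y -f sshkey > sshkey.pub'' recovers the public key. If the key was registered with a GitHub account, also delete it under Settings > SSH and GPG keys.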
See [[https://blog.acolyer.org/2019/04/08/how-bad-can-it-git-characterizing-secret-leakage-in-public-github-repositories/|blog.acolyer.org]]'s breakdown of [[http://dx.doi.org/10.14722/ndss.2019.23418|10.14722/ndss.2019.23418]] [2019] for a scary look at extracting secrets through GitHub's search API: 30 minutes to scan all of GitHub and find 100,000s of leaked credentials.
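A much smaller version of that search can be pointed at your own repo's history, before or after scrubbing. git's "pickaxe" option (''git log -S'') lists commits whose changes add or remove a given string. Below is a minimal sketch; the helper name ''commits_touching'' and the search string ''PRIVATE KEY'' (which appears in the header line of common private-key formats) are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list commits on any ref (branches, tags, stash) whose
# changes add or remove the given string. Run it inside the repository.
commits_touching() {
  local needle="$1"
  # -S ("pickaxe") selects commits where the number of occurrences of $needle changes,
  # i.e. the string was added or removed in that commit.
  git log --all --oneline -S "$needle"
}
```

Before scrubbing, ''commits_touching 'PRIVATE KEY''' should list the commit that added the key and the one that removed it. After the Scrub steps it should print nothing, provided no other private key was ever committed.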