Sensitive data or too much storage consumption: there are good reasons to want to change the Git history. In an earlier blog post, I explained how to purge files from Git history using BFG. A weak point of BFG is its lack of support for direct paths, so you cannot specifically remove files or folders inside subfolders from the history. With that, it's time to look at alternative solutions.
Besides git filter-branch, which is officially discouraged, git-filter-repo is one of the tools for cleaning up the history. After a short installation, we first analyze the repository to find, for example, the largest folders in the history:
git filter-repo --analyze
This generates a number of TXT files in the folder .git/filter-repo/analysis:
The file directories-all-sizes.txt is worth a closer look:
=== All directories by reverse size ===
Format: unpacked size, packed size, date deleted, directory name
4624417043 3796607988 <present> <toplevel>
4475940396 3778033787 <present> wp-content
4060236681 3694449320 <present> wp-content/uploads
305163809 70576241 <present> wp-content/plugins
123818107 15442735 <present> wp-includes
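The report is plain text, so it can be summarized with standard tools. The following is a minimal sketch, assuming the column layout shown above (unpacked size, packed size, date deleted, directory name); the function name top_dirs is hypothetical and not part of git-filter-repo.

```shell
# Print the first N directories of a git-filter-repo analysis report
# (the report is already sorted by size), with the packed size in MiB.
# Assumed row layout: unpacked size, packed size, date deleted, directory name.
top_dirs() {
    report=$1
    count=${2:-5}
    awk -v n="$count" '
        # Data rows start with a number; the two header lines do not.
        $1 ~ /^[0-9]+$/ && shown < n {
            # Re-join the name in case the directory contains spaces.
            name = $4
            for (i = 5; i <= NF; i++) name = name " " $i
            printf "%8.1f MiB  %s\n", $2 / 1048576, name
            shown++
        }
    ' "$report"
}
```

A call such as `top_dirs .git/filter-repo/analysis/directories-all-sizes.txt 3` would list the three largest entries, which is usually enough to spot candidates like wp-content/uploads.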
It often happens that the history still contains data that has long been ignored and removed from HEAD (for example, the WordPress media folder wp-content/uploads/ or a folder that was pushed by accident). git-filter-repo recommends pushing to a new, empty repository after cleaning. Numerous reasons are given for why this makes sense and avoids many problems. Nevertheless, you may want to push to the same repository, and that is also possible with a few hints.
Importantly, the major code hosting platforms GitHub and GitLab recommend approaches that differ in some respects. On GitHub, for example, we remove wp-content/uploads/ from the history with git-filter-repo using the following steps:
mkdir tmp-repo
cd tmp-repo
git clone firstname.lastname@example.org:foo/bar.git .
cp .git/config /tmp/config-backup
git filter-repo --invert-paths --path wp-content/uploads/

# option 1: same repo
mv /tmp/config-backup .git/config
git push origin --force --all

# option 2: new repo
git remote add origin email@example.com:foo/bar-new.git
git push origin --force --all

cd ..
rm -rf tmp-repo
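Before force-pushing, it can be reassuring to confirm that the path really is gone from every ref. A minimal sketch using plain git log; the function name path_is_purged is hypothetical, and it has to be run inside the filtered clone.

```shell
# Exit 0 if no commit reachable from any ref touches the given path,
# exit 1 otherwise. Must be run inside the repository to check.
path_is_purged() {
    path=$1
    # --all walks every ref; the pathspec after "--" limits the output
    # to commits that touch $path, so empty output means "purged".
    [ -z "$(git log --all --format=%H -- "$path")" ]
}
```

Run as `path_is_purged wp-content/uploads/ && echo "clean"` after git filter-repo and before git push. This only inspects reachable commits, so it says nothing about unreferenced objects that a later garbage collection still has to remove.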
We can now also check the size remotely (it can take up to 24 hours for the size to update in the API and in the UI). To do this, open the repository settings (if the repository belongs to an organization, you must first add your own account to the organization). Now we see the size:
The procedure is slightly different on GitLab:
mkdir tmp-repo
cd tmp-repo

# option 1: same repo
# Settings > General > Advanced > Export project > download tar.gz file into tmp-repo
tar xzf 20*.tar.gz
git clone --bare --mirror project.bundle
cd project.git
git filter-repo --invert-paths --path wp-content/uploads/
cp ./filter-repo/commit-map /tmp/commit-map-1
# copying the commit-map has to be done after every single command from git filter-repo
# you need the commit-map files later
git remote remove origin
git remote add origin firstname.lastname@example.org:foo/bar.git
# Settings > Repository > Protected branches >
# enable "Allowed to force push to main/master"
git push origin --force 'refs/heads/*'
git push origin --force 'refs/tags/*'
git push origin --force 'refs/replace/*'
# Settings > Repository > Protected branches >
# disable "Allowed to force push to main/master"
date
# wait 30 minutes (😱)
date
# Settings > Repository > upload /tmp/commit-map-X

# option 2: new repo
git clone email@example.com:foo/bar.git .
git filter-repo --invert-paths --path wp-content/uploads/
git remote add origin firstname.lastname@example.org:foo/bar-new.git
# Settings > Repository > Protected branches >
# enable "Allowed to force push to main/master"
git push origin --force --all
# Settings > Repository > Protected branches >
# disable "Allowed to force push to main/master"

cd ..
rm -rf tmp-repo
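Since the commit-map has to be copied after every git filter-repo run, a small helper can take care of the numbering. This is only a sketch: save_commit_map is a hypothetical name, and the commit-map-N naming simply mirrors the /tmp/commit-map-1 convention used above.

```shell
# Copy a commit-map into dest_dir under the next free name
# commit-map-1, commit-map-2, ... and print the path that was written.
save_commit_map() {
    src=$1
    dest_dir=$2
    n=1
    # Find the first number that is not taken yet.
    while [ -e "$dest_dir/commit-map-$n" ]; do
        n=$((n + 1))
    done
    cp "$src" "$dest_dir/commit-map-$n" && echo "$dest_dir/commit-map-$n"
}
```

In the bare clone from option 1, calling `save_commit_map ./filter-repo/commit-map /tmp` after each filter run would leave one numbered file per run for the later upload step.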
After waiting another ~5 minutes, we can view the storage space under Settings > Usage Quotas:
After the removal, it is important that all developers involved take part in the final steps: if a user now performs a normal push from their own local copy, the large files would migrate back into the central repository. Therefore, one of the following 3 options is recommended:
- "poor man's fresh clone"
  rm -rf .git && git clone xxx temp && mv temp/.git ./.git && rm -rf temp
  - For changed files (depending on the application):
    git checkout -- .
    or
    git add -A . && git commit -m "Push obscure file changes." && git push
- "start from scratch"
  rm -rf repo && git clone xxx .
- "ugly pull with rebase"
  git pull -r
  - Here you still have the uncleaned history, but in most cases you no longer accidentally overwrite the remote repository with the large local variant.
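To find out whether a local clone still sits on the old history, a developer could look up their HEAD in one of the saved commit-map files. The sketch below assumes the commit-map layout of one header line followed by "old-hash new-hash" pairs, and that the file has been shared with the team; was_rewritten is a hypothetical helper name.

```shell
# Exit 0 if the given full commit hash appears in the "old" column of the
# commit-map with a different hash in the "new" column, exit 1 otherwise.
was_rewritten() {
    map=$1
    commit=$2
    awk -v c="$commit" '
        # Appending "" forces string comparison, so hashes that happen to
        # look like numbers are never compared numerically.
        NR > 1 && ($1 "") == (c "") && ($2 "") != (c "") { found = 1; exit }
        END { exit (found ? 0 : 1) }
    ' "$map"
}
```

For example, `was_rewritten /tmp/commit-map-1 "$(git rev-parse HEAD)" && echo "stale clone, re-clone first"` warns before a push from an outdated copy. A commit that is missing from the map is reported as not rewritten, so the check is a hint rather than a guarantee.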
Given the current quotas (especially GitLab's new restrictions), it is always worth checking the size of your repositories' history and cleaning it up if necessary:
|Limit||GitHub Free||GitLab Free|
|Max file size limit||100MB||∞|
|Max repo size limit||5,000MB||∞|
|Max repo count limit||∞||∞|
|Max overall size limit||∞||5,000MB|
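The analysis report from earlier can serve as a rough early warning against these limits. The sketch below compares the packed size of the <toplevel> row with a limit given in MB; it assumes 1 MB = 1,000,000 bytes and that the packed size approximates what the host counts, which need not hold exactly. The name exceeds_limit is hypothetical.

```shell
# Exit 0 if the packed size of <toplevel> in the report exceeds limit_mb,
# exit 1 if it does not, exit 2 if the report has no <toplevel> row.
exceeds_limit() {
    report=$1
    limit_mb=$2
    awk -v limit="$limit_mb" '
        $NF == "<toplevel>" { packed = $2; seen = 1 }
        END {
            if (!seen) exit 2
            # Assumption: 1 MB = 1,000,000 bytes.
            exit (packed > limit * 1000000 ? 0 : 1)
        }
    ' "$report"
}
```

With the numbers shown earlier, `exceeds_limit .git/filter-repo/analysis/directories-all-sizes.txt 5000` would report "not exceeded", since the packed size of about 3.8 GB is below 5,000 MB.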
Finally, it's also worth taking a look at a free, self-hosted alternative such as Gitea. With little effort, you can host your own Git instance on a very small server (GUI secured via SSL, backups included, control through a powerful API); it is also highly configurable and superior in terms of data protection. Incidentally, you can also use git-filter-repo here to slim down repositories easily:
mkdir tmp-repo
cd tmp-repo
git clone email@example.com:foo/bar.git .
cp .git/config /tmp/config-backup
git filter-repo --invert-paths --path wp-content/uploads/

# option 1: same repo
mv /tmp/config-backup .git/config
git push origin --mirror
# login on the remote command line and run in the repo-folder
sudo -u git git reflog expire --expire=now --all
sudo -u git git gc --aggressive --prune=now
# if you face memory limit issues, modify the git configuration
sudo -u git git config --global pack.windowMemory "100m"
sudo -u git git config --global pack.packSizeLimit "100m"
sudo -u git git config --global pack.threads "1"
# if in web ui the size does not change, make a slight
# modification to a file and push again normally

# option 2: new repo
git remote add origin firstname.lastname@example.org:foo/bar-new.git
git push origin --force --all

cd ..
rm -rf tmp-repo
The command sudo -u git git gc --aggressive --prune=now is particularly important here (the git gc run by cron otherwise uses a prune time of 2 weeks, which is too long).
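To see whether the garbage collection actually freed space, the packfile size can be read before and after the gc run. A minimal sketch based on git count-objects -v, whose size-pack field is reported in KiB; pack_size_kib is a hypothetical helper name.

```shell
# Print the total size of all packfiles (in KiB) for the repository
# in the current directory; works in bare repositories as well.
pack_size_kib() {
    git count-objects -v | awk -F': ' '$1 == "size-pack" { print $2 }'
}
```

On the server, one could run `sudo -u git git count-objects -v` in the repo folder before and after the gc command and compare the size-pack values; the function above just extracts that one number for scripting.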