How I Use Git
20 Mar 2023
Git is fundamentally quite a simple tool. It performs a few core operations and they are not especially complex. However, to be useful it is necessary to combine several operations. There are many ways they can be combined and arguments abound over the best way to combine them into a workflow. Git itself is not particularly opinionated (although GitHub is more so). (Terms in italics are Git keywords you should search for if you aren’t familiar with them.)
Git stores the history of a project. There are several reasons you might want to do this and so several different strategies for how to do it best.
Levels of granularity
Git does not store every intermediate state of your project. It does not record every keystroke like your editor does in its undo buffer, nor even every saved version of a file as some file systems do. It is coarser than that: it only stores what you commit. So the exact level of granularity is up to you, and different situations and different user roles will call for different levels.
If you use Git as an undo buffer, and/or a method of backing up your work, you will want it to be fine grained. You will want to make lots of small commits. However, the usefulness of such a fine grained history decays with time and distance. Once you have finished working on a feature, you probably don’t care about all the intermediate steps you took, especially if some of those steps were dead ends that you undid. If you made intermediate commits entirely for backup purposes because your work was interrupted, they may have no logical consistency at all and not even build.
Another common use case for Git is collaboration. Other people may want to read and understand your Git history, and they will certainly not want to do so at that fine grained level. How can you achieve this?
You could just not make such fine grained commits. If the work is easy and straightforward this is quite a reasonable workflow.
You could make use of the Git stash feature. A stash is a special sort of commit that serves as a backup but is not made part of the history and not shared with others.
You could compromise by only ever keeping the most recent fine grained commit. When you commit, use the amend option to overwrite your previous commit.
You could make fine commits but then edit them afterwards to make them more coarse grained.
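The amend and stash options can be illustrated with a short Python script that drives the Git command line in a throwaway repository. It is a sketch, not a recommended tool: it assumes `git` is installed and on the PATH, and the helpers `git`, `new_repo` and `write`, along with the file names, are hypothetical.

```python
import os
import subprocess
import tempfile


def git(repo, *args):
    """Run a git command inside `repo` and return its trimmed stdout."""
    result = subprocess.run(
        ["git", "-C", repo, *args], check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def new_repo():
    """Create a scratch repository on a 'master' branch with a local identity."""
    repo = tempfile.mkdtemp()  # left on disk so it can be inspected afterwards
    git(repo, "init", "-q")
    git(repo, "symbolic-ref", "HEAD", "refs/heads/master")
    git(repo, "config", "user.name", "Example User")
    git(repo, "config", "user.email", "user@example.com")
    git(repo, "config", "commit.gpgsign", "false")  # avoid signing prompts
    return repo


def write(repo, name, text):
    """Overwrite a file in the working tree."""
    with open(os.path.join(repo, name), "w") as f:
        f.write(text)


repo = new_repo()
write(repo, "notes.txt", "first draft\n")
git(repo, "add", "notes.txt")
git(repo, "commit", "-q", "-m", "Add notes")

# Keep only the most recent fine grained commit: fold new edits into it.
write(repo, "notes.txt", "second draft\n")
git(repo, "commit", "-q", "-a", "--amend", "--no-edit")
print(git(repo, "rev-list", "--count", "HEAD"))  # 1

# Park an experiment in the stash: it is saved, but not on the branch history.
write(repo, "notes.txt", "risky experiment\n")
git(repo, "stash", "push", "-q", "-m", "experiment")
print(git(repo, "status", "--porcelain") == "")  # True: back to "second draft"
git(repo, "stash", "pop", "-q")  # bring the experiment back into the working tree
```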
When you should make these edits is a matter of debate.
Building a pull request
The typical GitHub workflow is to create a branch in the history tree for the current feature you are working on. You make commits to this branch, then when done you request that the owner of the master branch pulls your changes into his branch.
So first you must decide what level of granularity you want him to see. In many cases, he and you will be the same person, so the question is irrelevant for this stage and you may as well leave your commits unedited. More commonly, he or his delegates will want to review your code, but only at a cursory level. GitHub makes this easy by showing the combined diff of all your commits against the master branch. So again in this situation the individual commits do not matter.
They start to matter when pull requests become large. People find it difficult to review large changes, so they review the individual commits separately. If you expect them to do this, you should consider editing your commits before submitting the request. The granularity of a pull request is itself a debated issue. Reviewers generally prefer several smaller pull requests to one large one. However, creating them can be difficult if features cannot be decomposed or depend on one another, which requires stacking pull requests on top of each other.
So if you decide you don’t want your reviewer to see your unedited history, what can you do?
Combine everything into one commit (‘squash’). You could make a new branch off master and commit all your changes to it at once, or you could reset your current branch history back to its initial state (with a soft or mixed reset, which leaves your files untouched) and then make one single commit.
Combine commits using interactive rebase.
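The first option, squashing by resetting, might look like the following Python sketch. It assumes `git` is installed and on the PATH; the helpers `git`, `new_repo` and `write`, and the branch and file names other than `master`, are hypothetical. A soft reset moves the branch pointer back to where it forked from master but leaves the index and working files alone, so one new commit captures all the work. Interactive rebase (`git rebase -i master`) gives finer control over which commits to combine, but it opens an editor, so it is left out here.

```python
import os
import subprocess
import tempfile


def git(repo, *args):
    """Run a git command inside `repo` and return its trimmed stdout."""
    result = subprocess.run(
        ["git", "-C", repo, *args], check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def new_repo():
    """Create a scratch repository on a 'master' branch with a local identity."""
    repo = tempfile.mkdtemp()
    git(repo, "init", "-q")
    git(repo, "symbolic-ref", "HEAD", "refs/heads/master")
    git(repo, "config", "user.name", "Example User")
    git(repo, "config", "user.email", "user@example.com")
    git(repo, "config", "commit.gpgsign", "false")
    return repo


def write(repo, name, text):
    """Overwrite a file in the working tree."""
    with open(os.path.join(repo, name), "w") as f:
        f.write(text)


repo = new_repo()
write(repo, "app.txt", "base\n")
git(repo, "add", "app.txt")
git(repo, "commit", "-q", "-m", "Initial commit")

# A feature branch with several work-in-progress commits.
git(repo, "checkout", "-q", "-b", "feature")
for i in range(3):
    write(repo, "app.txt", f"base\nstep {i}\n")
    git(repo, "commit", "-q", "-a", "-m", f"WIP {i}")

# Reset the branch to its fork point without touching the files, then recommit.
fork_point = git(repo, "merge-base", "master", "feature")
git(repo, "reset", "-q", "--soft", fork_point)
git(repo, "commit", "-q", "-m", "Add feature")

print(git(repo, "rev-list", "--count", "master..feature"))  # 1
```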
Rebase is controversial because it means you are no longer using Git as a truthful history of the project. You are editing history with the goal of telling a story. If in the real history you attempted something and then undid it (but you didn’t use Git to undo the attempt at the time) you will probably want to elide that attempt from the history as if it never happened. But some prefer to preserve paths not taken.
Rebase can also create combined commits in the history that never existed at any point in reality. You may want to ensure all your commits build and run correctly so you can return to any previous version of the project, but rebasing can create commits that don’t. Nevertheless, most people believe the advantages of presenting a logical story in the history outweigh these problems.
I would recommend being careful with your initial commits: prefer to amend them at the time rather than fixing them later with a rebase. However, be careful if you amend a commit after you have already pushed it to a remote repo. In that case you will have to force push, which will confuse anyone else who has checked out your branch. If you are the only one working on a feature branch this should not be a problem, but if you have the same branch checked out on multiple machines it’s easy to confuse yourself this way.
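The amend-after-push situation can be reproduced with a Python sketch that drives the Git command line. It assumes `git` is installed and on the PATH, uses a local bare repository as a stand-in for the remote, and the helpers `git`, `new_repo` and `write` and the file names are hypothetical. After the amend, a plain push is rejected because the pushed commit is no longer an ancestor of yours. `git push --force-with-lease` overwrites it, but only if the remote branch still matches your remote-tracking ref, which gives some protection when the branch is also checked out elsewhere.

```python
import os
import subprocess
import tempfile


def git(repo, *args):
    """Run a git command inside `repo` and return its trimmed stdout."""
    result = subprocess.run(
        ["git", "-C", repo, *args], check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def new_repo():
    """Create a scratch repository on a 'master' branch with a local identity."""
    repo = tempfile.mkdtemp()
    git(repo, "init", "-q")
    git(repo, "symbolic-ref", "HEAD", "refs/heads/master")
    git(repo, "config", "user.name", "Example User")
    git(repo, "config", "user.email", "user@example.com")
    git(repo, "config", "commit.gpgsign", "false")
    return repo


def write(repo, name, text):
    """Overwrite a file in the working tree."""
    with open(os.path.join(repo, name), "w") as f:
        f.write(text)


remote = tempfile.mkdtemp()
git(remote, "init", "-q", "--bare")  # stand-in for the remote repo

repo = new_repo()
git(repo, "remote", "add", "origin", remote)
write(repo, "notes.txt", "draft\n")
git(repo, "add", "notes.txt")
git(repo, "commit", "-q", "-m", "Add notes")
git(repo, "push", "-q", "origin", "master")

# Amend after pushing: the new commit replaces, rather than extends, the pushed one.
write(repo, "notes.txt", "fixed draft\n")
git(repo, "commit", "-q", "-a", "--amend", "--no-edit")

plain = subprocess.run(
    ["git", "-C", repo, "push", "-q", "origin", "master"],
    capture_output=True, text=True,
)
print(plain.returncode != 0)  # True: rejected as a non-fast-forward

git(repo, "push", "-q", "--force-with-lease", "origin", "master")
print(git(remote, "rev-parse", "master") == git(repo, "rev-parse", "HEAD"))  # True
```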
Another decision you will have to make is whether to use rebase when updating your feature branches. If you (or someone else) have done work on the master branch since you forked your feature branch, and you expect the feature branch to be pulled back into master, then you should incorporate those changes before requesting the pull. The default behaviour of Git is to merge those changes, but the majority of users seem to prefer the rebase option. The difference is that rebasing makes it look as if you started your work on the most recent version of master and made all your commits after that, rather than the reality that you worked on an older version. The same philosophical debate about whether rewriting history is good applies, but assuming you don’t have such objections I can see no practical reason not to always use this option.
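Updating a feature branch by rebasing can be sketched in Python as below. It assumes `git` is installed and on the PATH; the helpers `git`, `new_repo` and `write`, and the branch and file names other than `master`, are hypothetical. Everything here is local, so `git rebase master` is called directly; when pulling from a remote, `git pull --rebase` (or setting the `pull.rebase` config option to `true`) applies the same idea.

```python
import os
import subprocess
import tempfile


def git(repo, *args):
    """Run a git command inside `repo` and return its trimmed stdout."""
    result = subprocess.run(
        ["git", "-C", repo, *args], check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def new_repo():
    """Create a scratch repository on a 'master' branch with a local identity."""
    repo = tempfile.mkdtemp()
    git(repo, "init", "-q")
    git(repo, "symbolic-ref", "HEAD", "refs/heads/master")
    git(repo, "config", "user.name", "Example User")
    git(repo, "config", "user.email", "user@example.com")
    git(repo, "config", "commit.gpgsign", "false")
    return repo


def write(repo, name, text):
    """Overwrite a file in the working tree."""
    with open(os.path.join(repo, name), "w") as f:
        f.write(text)


repo = new_repo()
write(repo, "a.txt", "a\n")
git(repo, "add", "a.txt")
git(repo, "commit", "-q", "-m", "Initial commit")

# Start a feature branch and commit some work on it.
git(repo, "checkout", "-q", "-b", "feature")
write(repo, "feature.txt", "feature work\n")
git(repo, "add", "feature.txt")
git(repo, "commit", "-q", "-m", "Add feature")

# Meanwhile, someone adds work to master.
git(repo, "checkout", "-q", "master")
write(repo, "b.txt", "b\n")
git(repo, "add", "b.txt")
git(repo, "commit", "-q", "-m", "Unrelated fix on master")

# Replay the feature commit on top of the current master instead of merging.
git(repo, "checkout", "-q", "feature")
git(repo, "rebase", "-q", "master")

# The fork point is now master's tip, so the history is a straight line.
print(git(repo, "merge-base", "master", "feature") == git(repo, "rev-parse", "master"))  # True
```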
Merging a pull request
Once your pull request is reviewed, GitHub offers you three options: merge, squash or rebase. (Note that you can do any of these yourself with Git at any stage, independently of GitHub.) If your pull request consists of just a single commit, perhaps because you already squashed it yourself prior to review or because you split a large change into multiple pull requests, then these options are effectively identical, except that merge also adds a merge commit.
Squash is the coarse grained option. You are throwing away information (the separate commits) on the basis that you won’t need it again and keeping it will make your history more cluttered. I would typically squash only because I had made “work in progress” commits and hadn’t bothered to amend or rebase before making the pull request. Sometimes, too, a pull request grows after it is created until it contains far more commits than originally intended, perhaps as a result of bugs found in review. In that case the best solution might be an interactive rebase, but if you can’t be bothered then squash is a quick way of hiding the clutter.
Rebase is the fine grained option. All your commits get appended to master as if you had been working on master the whole time. If you carefully crafted your commits through amend and interactive rebase then this is a good option. The problem comes when your pull request contains more than a couple of commits. In the pull request they are all grouped together as part of the same feature, but once added to master the link between them is no longer obvious. Some people solve this by giving all the commit messages a common prefix. For large projects with many commits each day, just reading the log can be an issue, so they might insist that all pull requests are squashed or merged.
Merging is something of a compromise. Unlike squash, it retains all of the commits in the pull request. Unlike rebase, it doesn’t pretend the commits were made on the master branch. Nothing is thrown away at all, which pleases those who want Git to be a historical record. It is fine grained, but it also records that all the commits came from the same pull request branch. So someone viewing the history can choose whether to view the fine grained detail or collapse those entries into one coarse entry.
Merge sounds like the best of both worlds, so why isn’t it more popular? Perhaps because GitHub and other tools do not collapse merged branches when viewing history, people think it looks cluttered. Understanding the history of a project that has multiple forks and merges is certainly not straightforward, so it’s not my first choice. If a pull request contains so many large commits that I don’t want to lose their history by squashing, and I don’t want to pollute the master history by rebasing, then I will merge it. Ideally, though, I would have made a pull request with fewer commits in the first place.
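The three ways of landing a pull request can be compared side by side in a scratch repository. This Python sketch assumes `git` is installed and on the PATH; the helpers `git`, `new_repo` and `write`, and the branch and file names other than `master`, are hypothetical. It applies the same two-commit branch to three copies of master using plain Git equivalents of the GitHub options (`merge --no-ff` for merge, `merge --squash` plus a commit for squash, and a rebase followed by a fast-forward for rebase); GitHub’s own implementation differs in details such as committer metadata. It also counts commits with `--first-parent`, the option `git log` accepts to show only the mainline, which collapses each merged branch into its merge commit.

```python
import os
import subprocess
import tempfile


def git(repo, *args):
    """Run a git command inside `repo` and return its trimmed stdout."""
    result = subprocess.run(
        ["git", "-C", repo, *args], check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def new_repo():
    """Create a scratch repository on a 'master' branch with a local identity."""
    repo = tempfile.mkdtemp()
    git(repo, "init", "-q")
    git(repo, "symbolic-ref", "HEAD", "refs/heads/master")
    git(repo, "config", "user.name", "Example User")
    git(repo, "config", "user.email", "user@example.com")
    git(repo, "config", "commit.gpgsign", "false")
    return repo


def write(repo, name, text):
    """Overwrite a file in the working tree."""
    with open(os.path.join(repo, name), "w") as f:
        f.write(text)


repo = new_repo()
write(repo, "app.txt", "base\n")
git(repo, "add", "app.txt")
git(repo, "commit", "-q", "-m", "Initial commit")

# A pull request branch with two commits.
git(repo, "checkout", "-q", "-b", "feature")
for i in range(2):
    write(repo, f"part{i}.txt", f"part {i}\n")
    git(repo, "add", f"part{i}.txt")
    git(repo, "commit", "-q", "-m", f"Add part {i}")

for target in ("via-merge", "via-squash", "via-rebase"):
    git(repo, "branch", target, "master")

# Merge: keep every commit plus a merge commit recording the branch.
git(repo, "checkout", "-q", "via-merge")
git(repo, "merge", "-q", "--no-ff", "--no-edit", "feature")

# Squash: one new commit containing the combined changes.
git(repo, "checkout", "-q", "via-squash")
git(repo, "merge", "-q", "--squash", "feature")
git(repo, "commit", "-q", "-m", "Add parts (squashed)")

# Rebase: replay the commits onto the target, then fast-forward it.
git(repo, "checkout", "-q", "-b", "feature-rebased", "feature")
git(repo, "rebase", "-q", "via-rebase")
git(repo, "checkout", "-q", "via-rebase")
git(repo, "merge", "-q", "--ff-only", "feature-rebased")

for target in ("via-merge", "via-squash", "via-rebase"):
    total = git(repo, "rev-list", "--count", target)
    collapsed = git(repo, "rev-list", "--count", "--first-parent", target)
    print(target, total, collapsed)
# via-merge 4 2
# via-squash 2 2
# via-rebase 3 3
```

The merged copy keeps all four commits yet shows only two along its first-parent line, which is the collapsed, one-entry-per-pull-request view.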
Conclusion
Each time you make a commit you should be aware of whether you intend it to appear in the final history of the project. If you do not, you should have a plan for what to do with it. If it’s just one work-in-progress commit, maybe you will amend it later. But if you’ve just made two such commits in a row you’re going to need to rebase. If you’re planning to squash all your commits eventually anyway, then perhaps it doesn’t matter now.
You should know who your audience is at each stage:
During development.
Reviewing the PR.
Reading and perhaps bisecting project history in future.
It may just be you and your future self. You should decide what level of granularity is appropriate for each audience. This will inform how much history you want to keep and how much you want to throw away.
You should ask yourself how much you care about preserving a true history compared to how much you care about telling a useful story.
My favourite workflows
If the feature is small enough, make only one commit. Keep amending it as necessary, then request a pull. If you do this you can even use software to automatically generate the branches and stacked pull requests.
Make a small number of commits. Amend the latest one if necessary before starting the next. Interactive rebase shouldn’t be necessary if you plan each commit to be logically separate, but it is available as a last resort. Use stash for experiments and to save work in progress. When the pull request is accepted, land it with the rebase option.
Sometimes you need to make lots of commits because they are part of your deployment pipeline and there is something you can’t test locally. Maybe the commits trigger GitHub Pages, or maybe you are developing on one machine while testing on another and using Git to synchronise them. In that case make as many half-cocked commits as you like, then squash when merging the pull request.