Tailor's Blog

Why good Git hygiene matters

My guidelines below put extra responsibility and work onto a code author’s shoulders. What might seem like unnecessary overhead, turns out to be the opposite in practice: An efficient use of time that pays off pretty much immediately in collaborative development, and pays further dividends throughout the lifecycle of a codebase.

A code change’s author has full knowledge over why they are making code changes in that very moment, and why the chosen changes are a good solution for the problem at hand. This understanding is a natural by-product, no, a requirement for any code change. The guidelines below propose a tiny investment: Noting down this understanding right in the moment. That’s cheaper than to recreate the understanding at a later time.

In a team with a process where someone else reviews code changes, a person without intimate knowledge of the code and its change will spend considerable brain energy to onboard themselves into the topic and follow along the code change. Doing so without guidance is more expensive than for the code author to leave helpful breadcrumbs. That’s why throughout the review process, extra-effort on the code change owner’s site pays off. It’s also respectful: Its the code change author’s desire to merge a change. It’s their responsibility to convince that the code change is worth being merged. Any effort the author extends in favor of the reviewer increases likelihood and speed of getting the change merged. It also strengthens the constructive collaboration culture.

The Git history is the code base’s diary. That’s immensely useful to have. If used properly, it can answer questions like: Did an earlier version of the codebase behave differently? When was a regression shipped? What was the train of thought that drove a change? Why were certain decisions made? Was there any alternatives that were considered?

During development

Keep as close to the integration branch as possible to avoid (invisible) merge conflicts. This is achieved through continuous rebasing on top of the integration branch.

Commit often, more often than might seem necessary in the moment. Stage line by line. These two behaviors result in small commits with a single purpose.

Focused code diffs are easy to understand for future readers.

Secondly, reshaping a Git commit history is easier when starting with small work chunks. Merging commits is easier than to split them apart after the fact.

Commits provide a great, standardized undo/redo primitive, which helps to change course during implementation. Commits are like building blocks. Creating multiple branches enables to explore alternative paths in parallel. Having more small building stones affords you flexibility. It’s not always possible to know upfront if a seemingly harmless code change like an optional refactoring spirals out of control. Sometimes code changes turn into a rabbit hole that draws you into making more and more unrelated changes, and you find yourself wanting to reset to the state before having stumbled into the rabbit hole.

When writing a commit message: Don’t describe what the commit does - the code diff needs to explain itself. Instead, describe the why, the motivation for the code change. If alternatives were considered, or the code has known trade-offs, state why this particular solution was chosen. Explaining a code change like this tends to lead to realizations that the chosen implementation overlooked aspects, or that different paths would be preferable after all.

Before code review

Rebase the Git commits one last time to craft a commit history. The commit history should tell an easy-to-follow story. It should not be transparent about the actual, messy process the work followed. The commit history is supposed to show the most logical break-up / assembly of work steps you would take to implement the change now, with all the knowledge you gained during implementation. A reviewer should perceive a clear, linear thread of thought and be able to imagine the code changes based on the commit messages. If that’s the case, initial code review will be much quicker. Additional back-and-forth will be minimized. And hence the PR merged as quickly as possible.

To craft the perfect history, put yourself into the reviewer’s shoes: review your code change. Go commit-by-commit, and also look at the totality of code changes. Then act upon your self-judgment: Iterate code and commit history. Sharpen the PR to a minimal, essential, coherent body of work. Any change that is useful but doesn’t belong can be cherry-picked to a new feature branch.

During pull request review

At least on GitHub: Never change commit hashes which are part of the already reviewed commit graph. Doing so makes it hard for the reviewer to know what changes were made between the commit that the reviewer originally viewed, and the force-pushed version. That forces the reviewer to re-review the whole commit. A waste of time, and rude to the reviewer, who can’t be sure whether code they are reading is ever immutable and “done”. At least in the past, the GitHub UI would sometimes not show code review comments in the main overview if they were left on commits that had since been removed through a force-pushed rebase, which opened the door for overlooking crucial feedback.

Address each PR review comment with one commit, referencing the comment in the commit message. This practice might feel like overhead at first and creates a busy history, but it will speed up the PR’s review lifecycle. Having one commit per review comment allows the code change author to post a commit hash in reply to every review comment. The reviewer can go over unresolved comments, and can see with one click whether a particular concern has been addressed, and how.

Merge strategy

Don’t squash. All of the work put into the Git history beforehand will continue to be valuable if you keep it around. A granular history will allow future readers to understand why any change was made, through tools like blame, diff, dissect. Squashing would be a lossy change that serves little purpose but to simplify the commit list. A short list only containing one entry per finished feature might look prettier than a long commit list containing plenty fix commits. But no human is ever reading the commit history like that, so go with function over aesthetics!