Warning: This post is awful if you expect a practical introduction to code review. Most of the article is not concerned with code review at all; instead, it chooses to establish a frame of thinking about code. My git netiquette contains practical advice on how to use Git(Hub) (during code review), and Sarah Vessels and Phil Booth provide nice, holistic overviews.
A definition of code
Code is language. Like spoken language, code comes in a multitude of languages. Think of code as a group of languages that are optimized to condense the expression of logic (algorithms). Like “regular” language, code has a grammar. The difference is that the grammar is less flexible, and the vocabulary is smaller. Those tight limitations are a feature, not a bug: they make it possible to express logic in a condensed manner.
Unlike spoken language, code really only exists in written form. That’s because the end goal is to feed code as instructions to machines, not to use it for communication between humans. Code is not optimized to be spoken out loud; in fact, it’s terrible for that. Characters that are unspoken or hard to speak, like brackets, hyphens, and arrows, are frequent.
When we “code”, we write text in languages that, in theory, should be easier to learn than spoken languages, because they are more restricted. The difficulty of authoring code text does not lie in a complex grammar or a wide vocabulary, but in thinking in the logical patterns the code language provides, and in using them to express the precise instructions that make up the software.
In a team
When multiple people edit a shared document simultaneously, different tools are called for than when one person writes alone. When writing “regular” text in a team, we nowadays use collaborative real-time editors like Google Docs, Notion, et al.
In a software developer team, all members edit the same text documents at the same time. But we don’t use a collaborative real-time editor.
Why is that?
It’s because code is so precise that it’s fragile. In a “regular” text document, there’s no bad side effect for person A when person B hasn’t finished their last sentence in a different paragraph, or when person C used invalid grammar. Code is not as forgiving: if there’s a mistake in any of the documents, usually the whole application can no longer be executed.
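To make the contrast concrete, here is a minimal Python sketch (the snippets are made up for illustration). A typo in prose still leaves the sentence understandable, but a single unclosed bracket makes Python's built-in `compile()` reject the whole source during its syntax check, so none of it can run, not even the lines that are fine on their own.

```python
def can_run(source: str) -> bool:
    """Return True if Python accepts `source` as syntactically valid code, False otherwise."""
    try:
        compile(source, "<shared-document>", "exec")
    except SyntaxError:
        return False
    return True


# A typo in prose ("jumsp") doesn't stop a human from understanding the sentence:
# "The quick brown fox jumsp over the lazy dog."

# Two hypothetical states of the same piece of code:
finished = "total = sum([1, 2, 3])\nprint(total)\n"
half_written = "total = sum([1, 2, 3)\nprint(total)\n"  # closing bracket still missing

print(can_run(finished))      # True
print(can_run(half_written))  # False: not even the valid second line can run
```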
So for teams of developers to work productively, we use a more old-school, elaborate flow: we copy the latest agreed-upon revision of all documents in the document folder onto our own computer and make our own changes locally. Everyone is independent of everyone else. Besides keeping us from breaking each other’s application while coding, this separation allows us to release an individual developer’s document revisions as soon as they are ready, even if other people in the team take longer for their current “chapters”.
Git Workflow
Crucially, there’s tooling to manage the copying of document revisions and their history, and to synchronize them between the local computer and a server. The most popular code version control tool is Git¹.
Before starting to change code, developers create an independent local copy of the documents. To do so, the developer creates a new Git branch of the code², usually based on the latest document versions stored on the Git server. The Git server serves as the source of truth for the team, keeping track of all document revisions that have been synchronized to it. The new branch initially only lives on the developer’s computer.
Then, the developer makes changes to the text files on their local machine. The text editors developers use have spelling, grammar, and style checking. This is similar to word processors like Word, but the rules and their validation are stricter: because code allows far fewer permutations than regular language, its validity is easier to assess. The developer bundles chunks of related text changes into Git commits. Those “packages” of changes are then synchronized with the Git server.
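The branch-and-commit flow described above can be modeled in a few lines. This is a deliberately simplified Python sketch, not how Git stores data internally; the `Commit` and `Repository` classes, the branch names, and the file names are all hypothetical. Loosely mirroring Git, a branch here is a name pointing at its latest commit, and each commit holds a snapshot of the documents plus a link to its parent. The sketch shows that a new branch starts from the latest revision on `main`, that a commit bundles related changes, and that `main` stays untouched by work on the branch.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class Commit:
    """A bundle of related changes: a full snapshot of the documents plus its parent."""
    message: str
    files: Dict[str, str]
    parent: Optional["Commit"] = None


@dataclass
class Repository:
    # Each branch name points at the latest commit on that branch.
    branches: Dict[str, Commit] = field(default_factory=dict)

    def create_branch(self, name: str, based_on: str) -> None:
        # The new branch starts at the same revision as the branch it is based on.
        self.branches[name] = self.branches[based_on]

    def commit(self, branch: str, message: str, changes: Dict[str, str]) -> None:
        head = self.branches[branch]
        snapshot = {**head.files, **changes}  # apply the changes on top of the latest revision
        self.branches[branch] = Commit(message, snapshot, parent=head)


repo = Repository()
repo.branches["main"] = Commit("Initial revision", {"greeting.py": 'print("Hello")\n'})

repo.create_branch("add-farewell", based_on="main")
repo.commit("add-farewell", "Add farewell", {"farewell.py": 'print("Bye")\n'})

print(sorted(repo.branches["add-farewell"].files))  # ['farewell.py', 'greeting.py']
print(sorted(repo.branches["main"].files))          # ['greeting.py']
```

Syncing with the Git server is left out here; in this model it would amount to copying branch pointers and the commits they reach between two `Repository` objects.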
Once the code changes are complete, they need to be integrated into a branch whose code is executed on an internal staging environment, or even the end-user environment. To achieve that, the developer uses the Git server’s UI to open a pull request (PR)³ from the branch containing their document revisions to the target branch. The UI displays:
- the changes that will occur in the target branch’s documents once the new revisions are merged in
- a description of the changes
- all revisions that were made in the developer’s branch, in the form of commits that are not present in the target branch
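The first item, the changes to the target branch’s documents, is usually shown as a diff. Python’s standard-library `difflib.unified_diff` can produce a similar unified diff between two revisions of one document. The file name and contents below are made up, and real Git servers render diffs with their own tooling, so treat this as an approximation of what the reviewer sees.

```python
import difflib

# Hypothetical revisions of one document: on the target branch, and on the PR branch.
target_revision = [
    "def greet(name):\n",
    "    print('Hello ' + name)\n",
]
pr_revision = [
    "def greet(name: str) -> None:\n",
    "    print(f'Hello, {name}!')\n",
]

diff = difflib.unified_diff(
    target_revision,
    pr_revision,
    fromfile="a/greet.py",  # the document as it reads on the target branch
    tofile="b/greet.py",    # the document as it will read once the PR is merged
)
print("".join(diff), end="")
```

Lines prefixed with `-` exist only on the target branch, lines prefixed with `+` are introduced by the pull request, and unchanged context lines would be prefixed with a space.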
Lastly, another developer goes over the code changes as a reviewer.
Code review
Code review is developers’ form of proofreading. The reviewer acts as “editor” of the code text, making sure that the code document can be published. The reviewer spends time with the code text to understand it, and then helps get a more polished version live. The reviewer reads the condensed logic contained in the code documents like a translator reads a document in a foreign language: extracting the meaning encoded in the source. The reviewer then compares this interpretation of the source code documents in their head to the task the document changes are intended to achieve. Everyone handles code review differently, but here are the main concerns we attempt to address by reviewing code:
- Ensure that the software fulfills the intent. I assume that this concern for “immediate quality” is the number one reason teams adopt code review. At the same time, in my experience, fulfilling this concern often does not require code review. That’s because developers have help assessing whether the code does what it’s supposed to do: product managers and quality assurance specialists test whether the software works as expected and is free of bugs, functionally speaking. To do so, they don’t read the code text; they directly use the software it encodes.⁴ To be clear: there are aspects of software correctness that internal functional testing is unlikely to reveal without reading the code text, like hairier security issues or race conditions.
- Make sure that the code is readable and maintainable. By “maintainable” we mean that future “writers” will be able to read the code text and then make quick, confident changes to it without breaking the existing code’s grammar or logic. Aesthetics play a role. Code text can be beautiful like poetry, in both form and meaning. In practice, code review spends a lot of energy on readability and maintainability. It is possible to objectively assess when code is unreadable and unmaintainable. At the same time, which code text (changes) fare best on those dimensions is very subjective: the beauty of code certainly lies in the eye of the beholder. For me personally, the simplest solution is usually the most elegant. In the long run, most of the value code review brings to the software lies in this subjective critique of each other’s work. Code that can’t be understood by anyone other than its original author can’t be changed anymore. Unless the software has already reached its truly “finished” state, it is dead.
- Learn from and teach each other. Our job, and often our personal goal, is to master thinking in and expressing ourselves in the code language.
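As a small, made-up illustration of the readability concern: both Python functions below compute the same thing, the sum of the even numbers in a list, and both are correct. Most reviewers would still likely ask for the second version, because the first forces the reader to simulate the loop and decode names like `f`, `r`, and `i` before its intent becomes clear. The function names are hypothetical.

```python
from typing import List


def f(x: List[int]) -> int:
    # Works, but the reader has to step through the loop in their head.
    r, i = 0, 0
    while i < len(x):
        if not x[i] % 2:
            r = r + x[i]
        i += 1
    return r


def sum_of_even_numbers(numbers: List[int]) -> int:
    """Return the sum of all even numbers in `numbers`."""
    return sum(n for n in numbers if n % 2 == 0)


print(f([1, 2, 3, 4]), sum_of_even_numbers([1, 2, 3, 4]))  # 6 6
```

Whether the generator expression or an explicit loop with good names reads better is exactly the kind of subjective call the list above describes.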
An alternative (or addition) to code review is pair programming, where two developers author the new code together. Whether pull requests are useful at all, and whether they are preferable or inferior to pair programming, is a separate discussion.
Why is code (review) not automated?
The intended outcome of code text changes is conveyed in many different forms: design files in Figma, text in fuzzy language in a Linear ticket, ad-hoc Slack discussions, points discussed in past conversations with colleagues, common sense, … As of now, we do not have automated tools that can collect and interpret all of this context and compare it to code text changes.
Once we have such tools, most activities developers spend time on today will have been automated away, since generating code itself is the “easy” part. So far, the closest we’ve come to automated tools that compare what code does with its intent is automated tests. For most automated tests, a human still needs to express the desired test cases in a formal, logical test language, similar to the code itself. We also haven’t automated building working end-user software. Most movement in that area has come in the form of WYSIWYG⁵/no-code tools, which move the abstraction layer further up. For those, the tool provider’s developers pre-build recurring software features as “widgets” or “Lego bricks” that the user can combine and configure to a limited degree. However, there’s movement in tooling that augments developers’ writing activity with the help of AI, like GitHub Copilot.
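Here is a minimal sketch of what expressing desired test cases in a formal test language can look like, using Python’s built-in `unittest` module. The function `apply_discount` and its rule (10% off orders of 100 or more) are hypothetical, standing in for intent that might come from a ticket; the point is that a human still has to translate that intent into precise assertions.

```python
import unittest


def apply_discount(total: float) -> float:
    """Hypothetical rule from a ticket: orders of 100 or more get 10% off."""
    if total >= 100:
        return round(total * 0.9, 2)
    return total


class ApplyDiscountTest(unittest.TestCase):
    # Each test encodes one piece of the intended behavior.
    def test_small_orders_pay_full_price(self):
        self.assertEqual(apply_discount(99.99), 99.99)

    def test_large_orders_get_ten_percent_off(self):
        self.assertEqual(apply_discount(200), 180.0)

    def test_threshold_is_inclusive(self):
        self.assertEqual(apply_discount(100), 90.0)


if __name__ == "__main__":
    # Run the suite without calling sys.exit(), so this also works in interactive sessions.
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ApplyDiscountTest)
    unittest.TextTestRunner(verbosity=2).run(suite)
```

Note that the tests only check what someone thought to write down: if the ticket actually meant “more than 100”, the inclusive-threshold test would encode the wrong intent, and no tool would notice.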
Footnotes
1. Git was created by Linus Torvalds, who also wrote the Linux kernel, to allow developers to collaborate on the Linux kernel. The Linux kernel probably powers most computers in the world.
2. This naming comes from picturing the history of document changes as a tree: separate lines of development fork off from it like branches.
3. Popular collaborative Git servers include GitHub and GitLab. Pull requests are also referred to as merge requests.
4. This requires the software built from the pull request’s code to be deployed to an environment where it can be tested.
5. “What you see is what you get”