Abstract concepts behind code review

Warning: This post is awful if you expect a practical introduction to code review. Most of the article is not concerned with code review at all, and instead chooses to establish a frame of thinking about code. My git netiquette contains practical advice on how to use Git(Hub) during code review, and Sarah Vessels and Phil Booth provide nice, holistic overviews.

A definition of code

Code is language. As with spoken languages, there is a multitude of them. Think of code as a group of languages optimized to condense the expression of logic (algorithms). Like “regular” language, code has a grammar. The difference is that the grammar is less flexible, and the vocabulary is smaller. Those tight limitations are a feature, not a bug: they make it possible to express logic in a condensed manner.
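To make the "small vocabulary, condensed logic" idea a bit more tangible, here is a minimal sketch in Python (chosen only as an example language). It lists the language's reserved words and then expresses one spoken-language sentence as a single line of code. The sentence and the variable name are made up for illustration.

```python
import keyword

# The reserved words of Python: a small, fixed core vocabulary.
print(len(keyword.kwlist), "reserved words, for example:", keyword.kwlist[:5])

# Spoken-language version of a piece of logic:
# "Take every number from 1 to 10, keep the even ones,
#  square each of them, and add up the results."
total = sum(n * n for n in range(1, 11) if n % 2 == 0)
print(total)  # 220
```

The code line says the same thing as the sentence in the comment, but leaves no room for interpretation.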

Unlike spoken language, code really only exists in written form. That’s because the end goal is to feed code as instructions to machines, not to use it for communication between humans. Code is not optimized to be spoken out loud; in fact, it’s terrible for that. All sorts of non-spoken or hard-to-speak characters are frequent, like brackets, hyphens, arrows, and so on.

When we “code”, we write text in languages which - in theory - should be easier to learn than spoken languages, because they are more restricted. The difficulty of authoring code text does not lie in a difficult grammar or a wide vocabulary, but in thinking in the logical patterns provided by the code language, and using them to express the precise instructions that make up the software.

In a team

When multiple people edit a shared document simultaneously, they need different tools than a single author does. When writing “regular” text in a team, nowadays we use collaborative real time editors, like Google Docs, Notion, et al.

In a software developer team, all members edit the same text documents at the same time. But we don’t use a collaborative real time editor.

Why is that?

It’s because code is so precise that it’s fragile. In a “regular” text document, there’s no bad side effect for person a when person b hasn’t completed their last sentence in a different paragraph, or person c used invalid grammar. Code is not as forgiving: if there’s a mistake in any of the documents, usually the whole application cannot be executed anymore.
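A rough sketch of that fragility, using Python as the example language: the file names and contents below are hypothetical, and "the application can run" is simplified to "every document compiles".

```python
# Hypothetical project: file names and contents are made up for illustration.
documents = {
    "greeting.py": "def greet(name):\n    return 'Hello, ' + name\n",
    "farewell.py": "def farewell(name):\n    return 'Bye, ' + name\n",
}


def broken_documents(docs: dict[str, str]) -> list[str]:
    """Return the names of all documents that are not valid Python."""
    broken = []
    for filename, source in docs.items():
        try:
            compile(source, filename, "exec")
        except SyntaxError:
            broken.append(filename)
    return broken


print(broken_documents(documents))  # nothing is broken yet

# Person b is halfway through typing a new function in farewell.py.
documents["farewell.py"] += "def shout(name):\n    return name.upper(\n"
print(broken_documents(documents))  # farewell.py no longer compiles
```

In a shared real time editor, everyone else's copy of the application would stop starting at the moment person b typed the unfinished line.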

So for teams of developers to work productively, we have a more old-school, elaborate flow: We copy the latest agreed-upon revision of all documents in the document folder onto our own computer, and make our own changes locally. Everyone is independent of everyone else. Besides keeping us from breaking each other’s application while coding, this separation allows us to release individual developers’ document revisions as soon as they are ready, even if other people in the team take longer for their current “chapters”.

Git Workflow

Crucially, there’s tooling to manage the copying of document revisions, their history, and to synchronize them between the local computer and a server. The most popular code versioning tool is called Git [1].

Before starting to change code, developers create an independent local copy of documents. To do so, the developer creates a new Git branch of the code [2], usually based on the latest document versions stored on the Git server. The Git server is used as source of truth for the team, keeping track of all document revisions which have been synchronized to it. The new branch initially only lives on the developer’s computer.

Then, the developer makes changes to the text files on their local machine. The text editors developers use have spelling, grammar, and style checking. This is similar to word processors like Word, but the rules and their validation are stricter, since the validity of code can be assessed more easily than that of regular language, due to the smaller number of allowed permutations. The developer bundles chunks of related text changes in Git commits. Those “packages” of changes are then synchronized with the Git server.
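The branch-and-commit flow described so far can be sketched as a toy model. This is not how Git is implemented or used: `Commit`, `Repository`, and all method, branch, and file names below are hypothetical, and the model only captures the idea that a branch is a name pointing at the latest commit, and that each commit remembers its predecessor.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Commit:
    message: str
    changes: dict                      # file name -> new file content
    parent: Optional["Commit"] = None  # the revision this one builds on


class Repository:
    """Toy stand-in for a Git repository: branch names pointing at commits."""

    def __init__(self, branches: Optional[dict] = None) -> None:
        if branches is None:
            branches = {"main": Commit("initial revision", {"chapter1.txt": "Helo"})}
        self.branches = dict(branches)

    def clone(self) -> "Repository":
        # An independent local copy: later changes do not affect the original.
        return Repository(self.branches)

    def create_branch(self, name: str, based_on: str = "main") -> None:
        # A new branch starts out pointing at the same commit as its base.
        self.branches[name] = self.branches[based_on]

    def commit(self, branch: str, message: str, changes: dict) -> None:
        # A commit bundles related changes and links back to its predecessor.
        self.branches[branch] = Commit(message, changes, self.branches[branch])

    def push(self, branch: str, remote: "Repository") -> None:
        # Synchronize one branch to another repository (the "server").
        remote.branches[branch] = self.branches[branch]

    def history(self, branch: str) -> list:
        # Walk the parent links from the newest commit back to the first one.
        messages, current = [], self.branches[branch]
        while current is not None:
            messages.append(current.message)
            current = current.parent
        return messages


server = Repository()
local = server.clone()
local.create_branch("fix-typo")
local.commit("fix-typo", "fix greeting typo", {"chapter1.txt": "Hello"})
print("fix-typo" in server.branches)  # False: the branch only lives locally

local.push("fix-typo", server)
print(server.history("fix-typo"))  # ['fix greeting typo', 'initial revision']
print(server.history("main"))      # ['initial revision']
```

Note that the server's `main` branch is untouched after the push. Getting the change into `main` is a separate step.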

Once the code changes are complete, they need to be integrated into a branch whose code is executed on an internal staging environment, or even the end user environment. To achieve that, the developer uses the Git server’s UI to open a pull request (PR) [3] from the branch containing their document revisions to the target branch. The UI displays:

Lastly, another developer will go over the code changes as reviewer.
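Reviewers usually read such changes as a "diff": a line-by-line comparison of the old and the new revision of a document. As a rough sketch of what that looks like, the snippet below uses Python's standard `difflib` module. The file content and the branch-like file labels are made up for illustration.

```python
import difflib

# Hypothetical old and new revision of the same document.
before = ["def greet(name):", "    return 'Helo, ' + name"]
after = ["def greet(name):", "    return 'Hello, ' + name"]

diff = list(
    difflib.unified_diff(
        before,
        after,
        fromfile="main/greeting.py",
        tofile="fix-typo/greeting.py",
        lineterm="",
    )
)
print("\n".join(diff))
```

In the output, lines starting with `-` were removed and lines starting with `+` were added. Unchanged lines are shown for context.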

Code review

Code review is the developers’ form of proofreading. The reviewer acts as “editor” of the code text, making sure that the code document can be published. The reviewer will spend time with the code text to understand it, and then help to get a more polished version live. The reviewer reads the condensed logic contained in the code documents like a translator reads a document in a foreign language: extracting the meaning encoded in the source. The reviewer will then compare this interpretation of the source code documents in their head to the task that the document changes are intended to achieve. Everyone handles code review differently, but here are the main concerns we attempt to address by reviewing code:

An alternative (or addition) to code review is pair programming, where two developers author the new code together. Whether pull requests are preferable or inferior to pair programming is a separate discussion.

Why is code (review) not automated?

The intended outcome of code text changes is conveyed in many different forms: design files in Figma, text in fuzzy language in a Linear ticket, ad-hoc Slack discussions, points discussed in past conversations with colleagues, common sense, and so on. As of now, we do not have automated tools available that can collect and interpret all of this context and compare it to code text changes.

Once we have such tools, most activities developers spend time on today will have been automated away, since generating code itself is the “easy” part. So far, the closest we’ve come to automated tools that compare what code does to the intent is automated tests. For most automated tests, a human still needs to express the desired test cases in a formal, logical test language, similar to the code itself. We also haven’t automated building working end user software. Most movement in that area has come in the form of WYSIWYG [5] / no-code tools, which move the abstraction layer further up. For those, the tool provider’s developers pre-build recurring software features as “widgets” / “lego stones” that can be combined and configured to a limited degree by the user. However, there’s movement in tooling to augment developers’ writing activity with the help of AI, like GitHub Copilot.
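To illustrate the point about automated tests: the intent ("10% off 200 is 180", "a discount above 100% makes no sense") still has to be written down by a human in a formal test language. Below is a minimal sketch using Python's standard `unittest` module. The function `apply_discount` and its rules are hypothetical and only serve as something to test.

```python
import unittest


def apply_discount(price: float, percent: float) -> float:
    """Hypothetical function under test: reduce a price by a percentage."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)


class ApplyDiscountTest(unittest.TestCase):
    # Each test case is one piece of intent, expressed formally.
    def test_ten_percent_off(self):
        self.assertEqual(apply_discount(200.0, 10), 180.0)

    def test_rejects_impossible_discount(self):
        with self.assertRaises(ValueError):
            apply_discount(200.0, 150)


# Run the test cases and report the result without exiting the program.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ApplyDiscountTest)
result = unittest.TextTestRunner().run(suite)
```

The tests can only check the intent someone thought to encode. Anything that lives only in a design file or a chat thread stays invisible to them.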

Footnotes

  1. Git was created by Linus Torvalds, who also wrote the Linux kernel. Git was created to allow developers to collaborate on the Linux kernel. The Linux kernel probably powers most computers in the world.

  2. This naming comes from the fact that Git uses a tree to keep track of document changes.

  3. Popular collaborative Git servers are, e.g., GitHub or GitLab. Pull requests are also referred to as merge requests.

  4. This requires the pull request’s software to be distributed.

  5. “What you see is what you get”