What we are missing on clean code
I don't think anyone disagrees that writing good, clean code makes a better software engineer. Many widely read books in the software engineering field are about writing better, cleaner code, such as Clean Code and Refactoring: Improving the Design of Existing Code. They often go into detail about writing smaller functions, practicing test-driven development, separating responsibilities, etc.
The Twitter crowd is fanatical about code readability too, often choosing the more readable option over the more efficient one. Discussions about .reduce are a recurring example:
reduce() is a very helpful function because it accomplishes two important tasks at once: writing unreadable code and showing off how smart you are— eevee (@eevee) May 23, 2021
All code using array.reduce should be rewritten without array.reduce so it's readable by humans *mutes thread*— Jake Archibald (@jaffathecake) January 3, 2020
However, I think the software industry has been looking at the issue from the wrong angle. This is frustrating because code readability is a big issue in many codebases, but the industry opinion and advice are distracting us from addressing it the right way.
Micro code readability is just a distraction
In many of the discussions about .reduce, detractors argue that it is hard to read and less efficient compared to simple loops. I agree with this sentiment and avoid using it.
However, the surface area of .reduce is small and contained to the routine it runs in. Encapsulated behind a function, users would not care whether we used .reduce or loops. The cost of reading the few lines that use .reduce is fairly small, on the order of seconds to minutes.
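To make the encapsulation point concrete, here is a minimal sketch of one hypothetical helper written twice, once with .reduce and once with a loop. The function names and sample data are made up for illustration; the point is only that callers get the same result either way.

```javascript
// Hypothetical helper: total the `price` field of a list of items.
// Version 1 uses Array.prototype.reduce.
function sumPricesWithReduce(items) {
  return items.reduce((total, item) => total + item.price, 0);
}

// Version 2 uses a plain for...of loop.
function sumPricesWithLoop(items) {
  let total = 0;
  for (const item of items) {
    total += item.price;
  }
  return total;
}

// Sample data, made up for this example.
const cart = [{ price: 5 }, { price: 10 }, { price: 20 }];

console.log(sumPricesWithReduce(cart)); // 35
console.log(sumPricesWithLoop(cart)); // 35
```

Whichever version sits inside the function, the reader of the calling code only has to understand the function's name and return value.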
I get that .reduce being a well-known method makes for viral content on Twitter, which is why it comes up so often. This sort of opinion on code readability is, however, on the micro scale. What we should be focusing on instead is macro code readability.
Macro clean code
To think about macro code readability, we need to take the perspective of a programmer trying to contribute to an existing codebase. How is this programmer, who is new to the code base, going to find the files relevant to the task at hand, to modify and add new behaviors?
Even if every function in the code base is well named, decoupled, and readable locally, without the knowledge of how the codebase is structured, organized, and architected, it is going to be very hard for the programmer to contribute. Where does he/she even start?
Many projects have well-written code, yet often lack a CONTRIBUTING.md explaining what is going on. How the code directory is structured, which frameworks are used, how folders are organized, and where new things should go all need to be communicated to a new contributor.
As an analogy, this is akin to every book in a library being labeled and organized neatly in shelves, but the shelves and floors in the library remain entirely undocumented.
When I contribute to a codebase, I find myself spending the majority of the time trying to understand the structure of the project. Here are a few things that can result in me wasting several minutes:
- Vague names such as "lib", "packages", "common", "src" and "shared", all existing in the same repository.
- Company codenames or project names such as "gardener", "sauron" and "raven", that require an explanation, but aren't explained in the repository.
- Unusual folder structures (e.g. main.js in the root directory along with a "src" folder).
- "Magical" setups hidden by layers of abstractions.
Documentation informing new contributors about quirks, structure, and organization of the codebase will go a long way to improving code readability as a whole, and make contributing easier and faster.
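One way to keep such documentation honest is a small check that flags folders the contributor guide never mentions. The sketch below is only an illustration: findUndocumentedFolders is a hypothetical helper, the folder names and guide text are invented, and a plain substring match is a deliberately crude assumption (a name like "lib" would also match "library").

```javascript
// Hypothetical check: which top-level folders are never mentioned in the
// contributor documentation? Both inputs are passed in as plain values so the
// sketch does not depend on any particular file-system layout.
function findUndocumentedFolders(folderNames, contributingText) {
  const text = contributingText.toLowerCase();
  return folderNames.filter((name) => !text.includes(name.toLowerCase()));
}

// Invented sample data for this example.
const folders = ["lib", "common", "shared", "sauron"];
const contributing = [
  "lib: third-party wrappers",
  "common: helpers used by every package",
].join("\n");

// Logs the two folder names the guide does not mention: shared and sauron.
console.log(findUndocumentedFolders(folders, contributing));
```

In a real repository the folder list would come from the file system and the text from the actual guide; a check like this cannot judge whether an explanation is good, only whether one exists.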
When we try to add or modify something, we have a context in mind as to why we are doing it. When things are hidden behind multiple layers of abstraction, or use different frameworks or patterns, we find it harder to keep track of what is going on. This layering makes it hard to maintain context, which hurts readability.
One example of context switching making code hard to manage is Redux. Despite it being the industry standard for state management, software engineers often find it hard to trace through the code and figure out what is happening, because they have to keep track of constants, states, actions, and reducers. This gets worse in projects that adopt the convention of splitting the four into different files.
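As a rough illustration of the four pieces a reader has to hold in their head, here is a minimal sketch in plain JavaScript. It does not use the Redux library itself, and the todo example, the identifiers, and the file names in the comments are all assumptions made up for this sketch.

```javascript
// 1. Constant: the action type (in split setups, often its own file such as constants.js).
const ADD_TODO = "todos/ADD_TODO";

// 2. State: the initial shape of this slice of the store.
const initialState = { todos: [] };

// 3. Action creator (in split setups, often actions.js).
function addTodo(text) {
  return { type: ADD_TODO, payload: { text } };
}

// 4. Reducer (in split setups, often reducer.js): a pure function that
// returns a new state object instead of mutating the old one.
function todosReducer(state = initialState, action) {
  switch (action.type) {
    case ADD_TODO:
      return { ...state, todos: [...state.todos, action.payload.text] };
    default:
      return state;
  }
}

// Passing undefined as the state makes the reducer fall back to initialState.
const next = todosReducer(undefined, addTodo("write CONTRIBUTING.md"));
console.log(next.todos.length); // 1
```

Even in this tiny case, following a single "add todo" from the call site to the resulting state touches all four pieces; spread across four files, each hop is a context switch.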
And yet, Redux is a benign case of context switching because the pattern is well established. In larger projects, you have to jump through multiple files to arrive at the logic. It is like going through a maze. At some point, you wonder whether you have seen this file before, but you are not sure.
More often than not, these layers of abstractions are caused by a few variables:
- Multiple software engineers handling the same product area, not realizing the same abstraction has been built somewhere already.
- The product is so complex that no one understands it enough to simplify it.
- Software engineers not getting time and resources to refactor the code.
- Modifying existing code causes so many breakages that adding new code or forking is preferable.
Death by a thousand cuts
These sorts of macro-level decisions are often left out of the discussion about clean code. And while readability is more important than performance, that does not mean performance can be ignored.
Clean code is certainly important. However, it helps to think of clean code not just in the context of a few lines of code, but also holistically within a codebase. Improving code cleanliness on a macro level can help a lot more in making code easier to understand.