Clean Code
Introduction
"Any fool can write code that a computer can understand. Good programmers write code that humans can understand."
-- Martin Fowler
The term "clean code" refers to code that is easy to read, understand, and modify. It was popularized by Robert C. Martin in his book Clean Code. It is a simple concept, but it's one that many developers struggle with. Writing readable code is a skill that takes time and practice to develop. It requires attention to detail, and a willingness to refactor and rewrite code until it is as clear and concise as possible.
Reading code is a big part of a developer's job. You will spend more time reading code than you will writing it. This is why it's so important to write code that is easy to read. If you write code that is hard to read, then you're making life harder for yourself and for your colleagues. A codebase that is hard to read will be causing problems for years to come.
The book Clean Code is a good starting point if you want some good tips on writing easier to understand code, and I recommend reading it at some stage in your career. That being said, I think that the book is a little dated, and mixes some questions recommendations in with the good ones. This guide is an attempt to distill the best advice from the book, and from my own experience, into a more concise form.
Why Clean Code Matters
A bad codebase gets worse over time. Messy code has a kind of gravity that attracts complexity. If you don't keep your code clean, then it will get harder and harder to work with. You'll find that you're spending more working around issues and less time adding new features.
Andy Hunt gives a great analogy for this:
“One broken window — a badly designed piece of code, a poor management decision that the team must live with for the duration of the project — is all it takes to start the decline. If you find yourself working on a project with quite a few broken windows, it’s all too easy to slip into the mindset of ‘All the rest of this code is crap, I’ll just follow suit.’”
-- Andy Hunt
If a codebase is already bad, then developers feel less pressure to organize and refactor their work before adding to the project. This leads to a snowball effect where the codebase gets worse and worse over time. It's much easier to keep a codebase clean than it is to clean it up once it's messy.
Formatting
Developers like to debate coding style a lot. I've seen long arguments about all of the following:
- Should we use tabs or spaces?
- Should the opening brace be on a new line?
- What order should the imports be in?
- Should I use wildcard imports?
- How much white space should I leave?
Every developer ends up with their own preferred answers to these questions, and they’re all equally right or equally wrong. The best style guide is always the one that everyone on the team uses. Rob Pike summed this up about the Go programming languages:
"Gofmt's style is nobody's favorite, but gofmt is everybody's favorite."
If you aren't familiar with gofmt, it's a tool that automatically formats Go code. It's a simple tool, but it's very powerful. It means that every Go codebase looks the same. You don't have to worry about what style to use, because the tool does it for you. The style it produces might not be your favorite, but it's consistent, and that's the most important thing.
As long your team agrees on a uniform style, then the exact style you use doesn't matter. The important thing is that everyone uses the same style. Whichever style you pick, you should configure your IDE to auto-apply that style to all code that you write. Pair that with a linter that enforces the style guide on each PR, and you'll have a consistent code base.
Mismatched styles are the problem. They dramatically increase the cognitive load. The code base should look like it was made by a team and not a collection of individuals. If you're working on a team, then you should follow the team's style guide. If you're working on your own, then you should pick a style guide and stick to it.
Code with inconsistent styling is much harder to read. It's like reading a book where every page is in a different font. It's possible to read, but it's much harder than it needs to be.
Naming
Here's a simple block of Kotlin code with intentionally bad naming:
fun checkNum(): Int {
return if (n < max) {
-1
} else {
1
}
}
What does this code do? It's hard to tell, because the names are basically useless. The function name checkNum
doesn't tell us what the function does, and the variable names n
and max
are equally unhelpful. It also seems awkward that the function returns -1
if n
is less than max
, and 1
otherwise. Why not return a boolean value?
fun hasTakenDamage(): Boolean {
return playerHealth < maxHealth
}
Simple by making the names more meaningful, it becomes clear that this function checks if the player has taken damage. The code is now self-documenting. We can clearly see what it's purpose is without looking outside of the function.
- Descriptive: They should tell you what the variable or function does.
- Detailed: They should be specific enough to be useful, but not so specific that they are misleading.
- Informative: They should tell you how the variable or function is used.
- Consistent: If you use
get
as a prefix for one function, use it as a prefix for all functions that do the same thing. - Pronounceable: If you can't say the name out loud, it's probably not a good name.
- Searchable: If you can't find the name in your codebase, it's not a good name.
- Visually Different: It should be easy to tell the difference between two names at a glance.
Descriptive Names
A common bad practice is to use single letter variable names. This is a holdover from the days when memory was scarce and every byte counted. Nowadays, memory is cheap and plentiful, and we should use more descriptive names. Consider something like this:
var d: Int
What does d
represent? It's impossible to tell without looking at the code around it. Compare that to this:
var daysSinceLastLogin: Int
It's clear that this variable represents the number of days since the user last logged in. The extra typing is worth it for the clarity it provides.
As a side-note, adding a comment to explain the variable when it is declared is the wrong solution.
// Number of days since the user last logged in
var d: Int
Comments can be useful, but are a terrible choice for clarifying a variable name. The comment won't show up to help you when you're using the variable. It doesn't show up in autocomplete, and it doesn't show up when you're reading the code where the variable is used. It's much better to have a descriptive variable name.
Many names might be superficially descriptive, but not actually tell you what the variable does.
var message: String
var category: String
This names might be fine, but without context, it's impossible to tell what they represent. Compare that to this:
var errorMessage: String
var listingCategory: String
These still require some context to understand, but they are much more descriptive than the previous examples.
Concise Names
your names should avoid extraneous information. Anything that isn't needed for understanding the code should be left out. For example, it used to be common practice to include the variable type as part of the name. Using it as prefix is called Hungarian notation, and is still seen in some code bases. With modern IDEs that provide type information on hover, this practice is no longer necessary.
var messageString: String = ""
var strCategory: String = ""
var iCount: Int = 0
This can be a particular problem when the type information is inaccurate. If you change the type of the variable, but forget to change the name, then you've introduced a bug. It's better to let the IDE handle the type information, and keep the name concise.
var messageList: String= ""
Maybe this variable is being used to store a comma-separated list of messages, but maybe not. The name messageList
is either misleading or just imprecise. It implies that it's a List
of messages, but this doesn't match the type.
Another source of excess length often comes from including noise words in the variable name. Anything that doesn't add context should be avoided. For example, consider this:
var theMessage: String
var messageVariable: String
var userObject = User()
var userInfo = User()
var userDate = Date()
class InfoManager()
These extra words don't add any information. They just make the name longer and harder to read. Leave them out.
Pronounceable Names
It's important that your names are easy to say out loud. You will frequently be discussing your code as a team, and you want to be able to refer to variables and functions by name. If you can't say the name out loud, then it's probably not a good name.
var fnfMessage: String
var ftsResp: Response
These names are hard to say out loud, and unless you know the codebase in detail their meaning is obscure. It's much easier to say:
var fileNotFoundMessage: String
var featureToggleServiceResponse: Response
Visually Different Names
It should be possible to tell the difference between two names at a glance. This is particularly important when you're scanning through code quickly. If two names are too similar, it's easy to confuse them.
var isAccountInformationOfUserRequired: Boolean
var isAccountInformationOfAdminRequired: Boolean
Without reading the names carefully, it's easy to confuse these two variables, and in this case the different looks like it might be important. It's better to make them visually distinct. One option is to move the difference to the end of the name, where it's easier to see.
var isAccountInformationRequiredForUser: Boolean
var isAccountInformationRequiredForAdmin: Boolean
It's better to try to avoid this situation entirely. If you find yourself with two variables that are too similar, then you should consider renaming one of them. I've seen quite a few bugs caused by developers confusing two variables that were too similar.
Consistent Names
If you use a particular naming convention, then you should stick to it. If you use get
as a prefix for one function, then use it as a prefix for all functions that do the same thing. If you use m_
as a prefix for member variables, then use it for all member variables. Consistency makes the code easier to read, because it means that you only have to learn the naming convention once.
employee.getId()
candidate.id.get()
supervisor.retrieveId()
dependent.showId()
When a new developer is first interacting with your codebase, they should be able to quickly understand the naming conventions that you use. If after their first day they make a guess at how to get an id
from an object, and they're right, then you've done a good job.
Consistent naming is particularly important for class names. I once worked on a codebase that included the following class names:
class AlertProcessor { }
class MessageProcessor { }
class SlackProcessor { }
class MailProcessor { }
These classes all did very different things, but they all had the same name structure. It was very easy to confuse them. After taking some time to understand the code, I chose to rename them to:
class AlertCoordinator { } // Owns the alert lifecycle
class MessageBuilder { } // Builds messages to be sent
class SlackSender { } // Sends slack messages
class MailSender { } // Sends emails
The original naming obscured the purpose of the classes, and made it harder to understand the code. The new names made it clear what each class did, and made the code easier to read.
Rename Variables Frequently
Try to avoid agonizing over the perfect name. It's better to pick a name that's good enough, and then change it later if you find a better one. This is particularly important when you're first writing the code. You might not know the best name for a variable until you've written the code that uses it. Just change the name when you know more about how it will be used. Modern IDEs make it easy to rename variables, so there's no reason not to do it.
The only caveat to this is that you should avoid renaming variables across a shared codebase as part of other feature development. You will end up with a giant PR that is hard to review, and it will be difficult to understand the changes that were made. You might also cause merge conflicts. Make the name change as part of a separate PR, so that the change is easy to review and understand.
This shouldn't discourage you from renaming variables or functions, even ones that you didn't write. If you see a bad name, change it. It will make the code easier to read, and it will make it easier for the next person who reads the code.
Comments
Different authors have very different opinions on comments. Robert Martin, in his book Clean Code, says that comments are a failure to express yourself in code. He argues that if you need a comment to explain what your code does, then you should rewrite your code to make it clearer. He also points out that comments are often out of date, and that they can be misleading.
On the other hand, John Ousterhout, in his book A Philosophy of Software Design, says that comments are a necessary tool for explaining why code does what it does. He argues that code can tell you what it does, but it can't tell you why it does it. Adding comments allows developers to explain the intent behind the code. This can be very useful when you're trying to understand a complex system.
My opinion tends to fall somewhere in the middle. I've encountered comments that were extremely helpful in understanding a piece of code, and I've encountered comments that were completely useless. That being said, as I've gained more experience, I've found that including some comments to be very useful. Good comments can speed up code reviews, and result in needing to answer fewer questions from colleagues.
Prioritize Readable Code
If your code is clear and easy to read, then you won't need as many comments. The code should be self-explanatory. Before you start adding a comment to explain what the code is doing, consider whether you can make the code clearer. Only when the code cannot be made clearer should you add a comment that explains what the code does.
There are a lot of good reasons that code might not be clear. It might be complex, or it might be dealing with a complex problem. It might be working around a bug in a library, or it might be dealing with a performance issue. In these cases, a comment can be very useful. Focus your comment on why the code does what it does, rather than on what the code does.
Commenting Interfaces
One place where comments should almost always be included is when you are defining an interface. No matter how clear your function names are, comments will add useful context. They can explain what the function does, what the inputs are, and what the outputs are. They can also explain any exceptions that the function might throw, or edge cases that the function might handle.
/**
* Returns the user with the given ID.
*
* @param id The ID of the user to return.
* @return The user with the given ID, or null if no user exists with that ID.
* @throws DatabaseException If there is an error accessing the database.
*/
fun getUserById(id: Int): User?
This captures a lot of information about the function that would be hard to glean from the function signature alone. Knowing the cases where the function might return null, or throw an exception, can be very useful when you're using the function.
Avoid Redundant Comments
The comments many developers are taught to write in school are often redundant, and generally useless. They simply repeat what the code is already saying. These comments are not helpful, and they can be misleading if they get out of date.
counter += 1 // Add 1 to the counter
Comments on each variable or function are also redundant. If you have to explain what a variable does, then you should probably rename the variable.
var userCount: Int // The number of users in the system
These are just noise. They don't add any information, and they make the code harder to read instead of easier.
Explain the Why
Comments should be used to explain why the code does what it does or to explain the intent behind the code. They should only be used to explain what the code does in exceptional circumstances. Developers can read most code to figure out what it does, but they can't read your mind to figure out why you wrote it that way.
// We use a list here instead of a set because we need to maintain the order of the elements.
val users = mutableListOf<User>()
Keep Your Comments Near the Code
Comments should be close to the code that they explain. If you have to scroll up and down to read the comment and the code, then the comment is in the wrong place. It should be right next to the related code.
This also helps with keeping the comments up to date. If the code changes, then the comment should change too. If the comment is right next to the code, then it's more likely to be updated when the code is updated.
Use the Commenting Style of the Codebase or Company
Different companies and codebases have different approaches to comments. Some companies require comments to be in a specific format and in specific locations. Some codebases might have no comments at all. If you're working on a team, then you should follow the team's approach to comments.
Deviating can cause serious issues when you're working on a team. If you're the only one writing comments, then other developers won't think to update them when making changes to the code. This can result in misleading comments that make the code harder to understand.
Clean Functions
Make each function you write clear and understandable on its own. There are a few rules that can help you write clean functions:
- Small Functions:Functions should be small. The book Clean Code suggests that functions should be no more than 20 lines long. I object to having a strict line limit, but the idea is sound. Functions should be small enough that you can understand them at a glance. If you have to scroll to see the whole function, then it's too long. I view small functions as a side effect of the rest of the advice given here. If you concentrate on making your functions do one thing, and do it well, then they will naturally be small.
- Single Responsibility Principle: The single responsibility principle is most commonly referred to in the context of classes, but it applies at all levels of abstraction. A function should do just one thing. If a function does more than one thing, then it's harder to understand. It's also harder to test, because you have to test all of the different things that the function does.
- Avoid Side Effects: Having some side effects is unavoidable, but you should try to minimize them. A function should do what it says it does, and nothing else. If a function has side effects, then it should only have side effects. It should not also return a value, or throw an exception. This makes the function easier to understand, because you only have to look at the function signature to have a good idea of what the function does.
- Readable: You should be able to read your function from top to bottom and understand what it does. If you have to jump around the function to understand it, then it's too complex. You should be able to understand the function by reading it like a book.
Other Tips and Examples
Avoid Flag Variables
Flag variables are variables that are used to control the flow of a function, and according to Clean Code, should generally be avoided. For example, this function uses a flag variable to determine whether to transform a message to uppercase or lowercase:
fun transformMessage(message: String, toUppercase: Boolean): String {
if (toUppercase) {
return message.uppercase()
} else {
return message.lowercase()
}
}
fun main() {
println(transformMessage("Hello, World!", true)) // HELLO, WORLD!
println(transformMessage("Hello, World!", false)) // hello, world!
}
Robert Martin argues that flag variables are sign that the function violates the single responsibility principle. The function is doing two things. I disagree with his reasoning, but agree with his advice. Flag variables should be avoided because:
- Unadorned
true
andfalse
statements in code render it opaque and unclear. - Replacing the
true/false
with static constants would at least make this more readable, but doesn’t really solve the root problem. - Anything that is short enough that it’s still readable with a flag variable should probably be just inlined in most cases. Otherwise, refactor into two functions.
Hidden Side Effects
Take this function for example:
fun checkPassword(name: String, password: String): Boolean {
val user = UserGateway.findByName(name)
if (user != null && user.password == password) {
Session.initialize()
return true
}
return false
}
Initializing the Session
object is a side effect that is not immediately obvious from the function signature. This can lead to bugs when the function is used in unexpected ways.
Commented out code tends to stick around. Other developers are afraid to remove it because they don't know why it was kept. If you need to keep the code around for reference, then use version control. Don't leave it in the codebase.
No commented out code should ever be merged into the main branch.
Magic Numbers
This function has two main issues:
fun notifyAdultUsers(users: List<User>) {
users.forEach { user ->
if (user.age >= 18) {
sendNotification(user)
}
}
}
The first issue is that the number 18
is a magic number. If adult
is a concept in your domain, then it should be defined in a constant. This function also has a second issue, which is that it does two things. It filters the list of users, and it sends a notification. This violates the single responsibility principle.
fun notifyAdultUsers(users: List<User>) {
users.filter { isAdult(it) }.map { sendNotification(it) }
}
private fun isAdult(user: User): Boolean {
return user.age >= ADULT_AGE
}
A commonly encountered argument against Clean Code is that it can have performance impacts. Some of the advice given in the book will often lead to slower code. Some examples include:
- Prefer polymorphism over conditionals
- Hide implementation details
- Small functions
- Dry code
All of these will have some performance impact. However, the performance impact is usually negligible and in some cases, it is handled through compiler optimizations. The performance impact of these changes is usually outweighed by the benefits of cleaner code. Optimizing every line of code is a foolish waste of time. You are better off prioritize your time over the computer's time.
Using "high performance" as an excuse to write bad code is almost always counterproductive. To create performant code you need to maximize the amount of work in the smallest number of instructions. Simplicity is the best way to achieve this. A complex piece of code likely obscures the simplest possible route through the system.
Code Smells
"A code smell is a surface indication that usually corresponds to a deeper problem in the system."
-- Martin Fowler
A code smell is something that is quick to spot and should lead you to look deeper. They mark places where further investigation is required. These smells aren't problems themselves, and might be perfectly fine in some contexts. However, they are a hint that there might be a problem.
Every code smell can be appropriate in some situations, as long as it is the product of a conscious choice. If you're using a code smell, then you should be aware of it, and you should be able to explain why you're using it. If you're using a code smell because you're lazy, or because you don't know any better, then you should fix it.
Not every smell should be eliminated. The goal here is just to be aware of them. If you're aware of the smells in your code, then you can make a conscious decision about whether to keep them or not.
Application Smells
Some code smells are signs that the application as a whole is not well designed. Addressing them might require large scale changes to the application but are likely to have the biggest impact.
- Duplicated Code: If similar code exists in more than one place, then it might be a sign that the code base isn't well understood. It also might indicate that developers haven't been given enough time to extract shared logic into common classes or functions.
- Shotgun Surgery: When fixing a defect or making a change causes you to have to make many changes in many different places, then you have a problem. This is usually a sign of tight coupling between different parts of the code base. It can make bug fixes time consuming and error prone.
- Contrived Complexity: When the code is more complex than it needs to be, then you have contrived complexity. This can be caused by developers trying to be clever, or by developers not understanding the problem they're trying to solve. It can also be caused by developers trying to solve problems that don't exist yet. Engineers like being clever, but it's usually better to be simple.
Class Smells
There are also code smells to look out for at the class level. These smells can indicate that the class is doing too much, or that it's not well designed.
- Large Class: If a class is too large, then it's probably doing too much. A class should have a single responsibility. Large classes tend to be unwieldy and hard to understand.
- Freeloader: A class that doesn't do much, but relies on other classes to do the work for it, is a freeloader. It forces developers to work through an extra layer of abstraction that doesn't add any value.
- Downcasting: Needing to downcast or to check the type of a subclass is usually the sign of an incorrect abstraction. It bypasses many compile-time checks and can lead to runtime errors.
- Inappropriate Coupling: When two classes are too tightly coupled they can become hard to work with. Changes to one class can have unexpected effects on the other class. This is usually a sign of bad OO design or a poor interface.
- Parallel Inheritance: When you have to create a subclass for each subclass of another class, then you have parallel inheritance. This is usually a sign that the classes are too tightly coupled, and that the design could be improved.
- Divergent Subclasses: When subclasses have to override a lot of methods, then they are divergent. This is usually a sign that the superclass shouldn't be a subclass at all, and that some design decisions need to be revisited. Why were these classes combined in the first place?
Function Smells
There are also code smells to look out for at the function level.
- Long Function: If a function is too long, then it's probably doing too much. Keep functions small and focused. Ideally they should be no more than 10-20 lines long. In special cases, they can be longer, but they should always fit on the screen.
- Speculative Generality: Beware of any function that is only called from tests, or that is only called from one place. This is usually a sign that the function is unnecessary.
- Message Chains: If you end up with a function that calls another function that calls another function etc, then you have a message chain. These sorts of chains make it hard to follow the flow of logic and can make the code hard to understand. This might be a sign of freeloader classes, but is more often an issue with encapsulation.
- Too Many Parameters: If a function has too many parameters, then it's probably doing too much. Mistakes will happen for this sort of function. Developers might mix up the order of the parameters, or forget to pass one of the parameters. This can be a sign that the function is doing too much, and should be split into smaller functions.
- Excessive Returner: If a function returns more data than it needs to, then it's an excessive returner. This is a common cause of security problems.
Dijkstra famously said that every function should have one entry point and one exit point, but this just isn't true. Multiple returns can make code easier to read, not harder. Guard clauses help to make the code easier to understand, and can help to reduce the complexity of the function. Don't be afraid to use multiple returns when it's called for, just avoid scattering them throughout the function. Keep them localized to the beginning and the end of the function.
Conclusion
You will spend at least 10 times as much time reading code as you will writing it. This is why it's so important to write clean code. Clean code is code that is easy to read, understand, and modify. It's code that is easy to work with, and that doesn't get in the way of adding new features.
Make sure that your code is clear and easy to understand. Your peers, and your future self, will thank you for it. Every developer has had the experience of looking at a piece of code and wondering what the original author was thinking, only to find out the original author was themselves. Don't be that developer. This great comic from Three Panel Soul sums it up nicely: