I’m going to keep a workshop about code review for my colleagues so I’ve collected every idea, practice, suggestion, recommendation into this post.
Any comments and suggestions are welcome.
In this article, I would like to talk about some code review techniques which I consider useful when changes or bug fixes have to be reviewed in legacy code. I have no intention to quote the existing articles, documents, books, because they are good as they are in their current state. I’m focusing more on the practical parts which help deliver quality product quite fast.
By legacy code, I mean the code base, which has not been developed in agile way, and about which the reviewer does not know every detail. This situation is quite common and requires a different approach than reviewing a code base which currently under development.
Before a Code Review
After a developer finishes a task, a user story or a bug fix, they should ask somebody for a code review. Developers usually do not like doing code reviews. There are a lot of pros and cons, but let’s assume that the developer would like to deliver quality product and understands the benefits of code review:
faster feedback on the work done
knowledge distribution (the reviewer will learn about the change and may see new techniques or tricks)
finding introduced bugs faster and cheaper
In companies where agile has just been introduced, management tends to say that code review is a waste of time, because if you are doing pair programming, you review the code while you are writing it. This is a common mistake, because pair programming does not replace code review. During implementation, two people know about the change. If they ask for a review, a third person will also know about the change. This is one way of knowledge distribution. On the other hand, if two developers introduce a change, and a senior developer says that the change is fine, but there are other ways to do it, ways which the developers were unaware of, they can learn new things. This is another way of knowledge distribution. This view is usually ignored, but it is very important.
Even with knowledge distribution there is no guarantee that the pair finds and implements the right solution. A code review after pair work can give very useful feedback on the performance of the pair. They’ll know what they did was right under the given circumstances. Also, the team introduced another useful feedback loop (first feedback loop during pair programming, second during review, third during continuous integration etc.)
One can argue what will happen if two senior developers make a change together. In this case, the reviewer can learn from their work and bring another perspective into the game. Imagine a case where the seniors have been working on the project for three years. They know every single detail about it. This knowledge is very valuable. On the other hand, they might be unable to think outside the box. A less experienced, but fresh mind can show them something that may make the change more robust, less resource consuming or something even better.
This is not a must, but a recommendation. If the team members are very experienced and they decide not to do code review after pair work, I can understand that. Nevertheless, I recommend doing reviews when somebody is doing solo work. Joshua Kerievsky said once that most of the problems in their code was introduced when somebody did solo work.
For a code review, the developer needs one or more reviewers - I prefer more. The characteristics of a good reviewer candidate:
He or she
did not contribute to the change
has a lot of business knowledge
knows the programming language and the code
has some testing skills
I agree that people who have these skills are very hard to find, still it is crucial to pick the person most skilled in the fields described above for review. The first three options are obvious, therefore check the fourth. The code written has to be testable. This does not mean only unit testability, but also acceptance testability. Finding a person with such testing skills is even harder. Maybe it’s worth talking to a tester and tell them what the change is about, so that they can give early feedback.
Do not write review request to mailing lists. A social psychology study proves that if a larger group is asked for help, one will have less chance of getting help than asking a small group or an individual. Find the person who fits best, and ask him or her directly. For example, if you have changed code that works with databases, ask the database expert of the team for review. In this particular example, it is very important to check certain aspects like data loss, performance, database size. It is perfectly fine that everyone dares change database related code base, but like with security related matters it’s worth having the expert review the change to avoid unnecessary error reports.
The First Important Step
As a first step, it’s worth checking the following list:
the continuous integration is green - if not available: the code compiles
the code follows the agreed code conventions
there are comments in the version control system describing the change
unit tests are available
If any of the items are missing, I recommend not to continue the code review, because every hour a developer spends away from development is considered as waste. A good review takes time - sometimes it takes minutes, but it can take also days -, but there is no need to increase this time unnecessarily, if it can be avoided by following some simple rules. If not everybody is aware of the rules, I suggest having a working agreement and making these rules clear for everybody.
Reasons for not continuing without all of the requirements above met:
The fact that the code does not compile suggests that the developer did not finish his or her work, and the dependencies have not been check for some reason. After performing this check, the code may change, so it will have to be reviewed again, which in the end will render the time spent on the first review wasted.
Following the code conventions is one of the XP techniques worth applying, because it makes comparison much easier. Imagine the situation when you have a piece of code which follows the conventions, but somebody commits some refactoring without the conventions applied. You can spend a lot of time separating the logical differences from the formatting differences. People tend to be stuck checking the formatting differences. That is not really necessary, because tools can do it for them. Therefore it is in everyone’s best interest to review only the logical parts.
It is hard to understand legacy code. If somebody changes it, and does not make a note about the contents of the change, the original code becomes more incomprehensible. Not every version control system allows changing the commit messages, but if yours does support it, ask the developer to add the comment.
Checking in code without testing is very dangerous and not recommended. If someone checks in a piece of code without test cases, that can suggest that the change wasn’t verified. After testing, the author may change code, so the effort spent on review will become a waste again.
During the Code Review
Code Review Methods
There several ways to do code review, but for now, consider only these two:
the author explains the code to the reviewer and they check the code together - the feedback is immediate
the author checks the code alone, and gives feedback after the review
I recommend doing the code review alone. Similar to testing, the author/developer can force the reviewer/tester in one direction so that they miss an important detail. Of course, if the reviewer has questions, he or she shall ask the author, but do not let the author take control over the review session.
There are plenty of ways to find out what has changed and where. It is a common way of working to take the diff/compare tool of the version control system and review the code there. I suggest not doing that. It is a very good thing to have the list of the affected files and the changed areas, but this will narrow down the view of the reviewer to the lines affected by the change. If they do the review with a compare tool, they can check that the change is good, but will have a hard time finding out what is missing; for example an important dependency injection or an exception handler.
If the checklist is complete, the reviewer shall try to understand what the change is about. If it is a change, check the product backlog, if it is a bug fix, check the problem description. Without knowing what the change is about it is very hard to do quality work. Hence, it is better to start the review with the review of the unit test cases. They can tell you about the nature of the change, and you can decide whether the change is correct or not. So always start with the test cases.
If you find something that is not included in the checklist above not stop the review just then, even if it is a major problem. I’ll talk about this later, but the code review has to be performed until the quality of the code satisfies the reviewer and the team. If the code contains 2 major and 4 minor findings, and the reviewer highlights only the first major one and stops the review, an important quality feedback is missed.
Separation of Changes
Usually, a change in the code affects only a few lines. However, agile teams tend to do a lot of refactoring, which is a good thing, but it is a nightmare when it comes to code review. A particular change consists of two things:
the changed parts
If you face some code where the refactored and the changed parts were checked in together, you have no other option but to talk to the author, so that he or she can help you separate them. If you are lucky, the change affects one file, and the refactoring another twenty. It is not cheating to point out where the change is, so that the reviewer does not waste a lot of time reviewing refactoring that has been done automatically by the editor tool. Therefore, you can check the refactored part later, or not check it at all and let the regression test suite verify it.
Nevertheless, if this is the case, you can have one comment for the author: do not commit refactoring together with the changes. Try to do them separately. It kind of fits into the idea that says write test cases first; refactor the code, so that the change is obvious; do the change and refactor again. Following the commit recommendation above, this means three individual commits: one after the first refactoring, one after the change, and one at the end. With this approach, the reviewer can prioritize his or her review process: change must be checked every time, and if there is little time left, the reviewer can decide whether to check the first refactoring. I would check it as well every time, because this is the change which has been made on the legacy code, and it is very important to make sure that the system continues to work as before. The last refactoring can be verified using the regression test suite. If there aren’t any, you have to review that part as well.
Large companies have templates even for code review. Usually these templates contain a category for review findings. They are used for deciding whether to perform the suggestions of the reviewer. Having categories is fine, but treat the review findings as user stories instead. Show the findings to the author and the team, and decide together how to continue. It makes no sense to implement only the major findings or a number of small items. Every code review shall be ended with a discussion between the author and the reviewer(s) - and, optionally, the team. If a finding makes no sense, or ignoring them does not harm the current code base, you can skip it.
A common mistake I observed is that people are using the version control system’s diff tool or an external tool called by the version control system for code review. They are focusing on the changed part only, which saves some time for them, but can cost a lot later. Have a look at the example below:
1 2 3 4 5 6
If you check only in the class scope, it is a fine operation. On the other hand, another class has a reference:
1 2 3 4 5 6 7 8
Using just a compare tool you will miss this impact. The recommendation here is to check the changes in the IDE you are using and check the references as well. If you see a field without the private modifier, be curious and check the references. If you are lucky, the regression test suite will find the problem later, but usually legacy systems do not have a lot of regression test cases.
Look for common mistakes
Have a look at Jeff Atwood’s Top 25 list. Look for these problems in the changed code.
If you see a change which affects code protected by the synchronized keyword, look for the wait, notify and notifyAll calls in order to make sure that the change does not introduce a [dead] lock situation.
Abuse of design patterns
Design patterns are good, but people sometimes abuse them. Use the right pattern for the right task. Do not solve everything with the factory pattern or with the command pattern. If the change seems to be an introduction of a design pattern, double check the pattern, and make sure that it is implemented properly.
Look for duplication
There is no way that every team member has a deep knowledge of the legacy code base. Therefore it may happen that somebody introduces a utility method or class that already existed. These utility classes have very unique characteristics: no operation on fields, descriptive and general name, used in different places in the changed code. Do a search on the name, or the used method, and there is a big chance that the utility method or class already exists somewhere in the legacy code base.
If a change affects the constructor, it may indicate a dependency change. It is a good place to start the review.
Length and complexity of test cases
As I said before, I like checking test cases first, because they tell a lot about the code. Large and complex test cases indicate large and complex code. If you meet such test cases, it’s worth talking to the author about the change before continuing the review. You may be about to spend a lot of time on a code which may have to be refactored or, in the worst case, is not even be ready. So talk to the author, pair up, solve the problem.
Wasting resources (memory, CPU, disk)
I strongly believe that a lot of resource wasting operations can be found by code review. If the changed code operates on streams or javax.sql classes, check that every stream or connection is handled and closed properly. Look for interesting loops and data structure operations.
If a conditional expression has been changed, the first thing I usually do is check the coverage of the branches. It is not part of the standard code review practices, but it helps me find out what is going on, and the possible effects of the change. The next thing is that I check every possible outcome of the expressions. It is important to check for things which may be missing. The test cases can help in this case.
Consistency is very important. Let’s have an example; the class solves every possible loop with while. A change requires another loop, and the fixer decides to use a for loop because he feels more comfortable with it. Even if the code works, after a certain amount time, nobody will remember why we have a lot of while loops and only one single for loop. Although it is a small thing, people will spend some time figuring this out. Eventually, they will end up at a particular commit, which does not make too much sense in their current point of view, and it will be hard to refactor the code, because the for has to be changed into a while. This is also a waste. The recommendation here is to be consistent with the class/package even if the author thinks differently. Do the fix, and refactor the class in two different commits.
Limit your time spent on review. It is easier said than done, but it is easy to get stuck in a code review. Limit yourself so that you’ll check each and every logical change (see prioritization above), and maybe just a couple of the refactoring or minor changes.
Follow the life of the reviewed code, so that you can have a feedback on your review. If the acceptance fails on a trivial error, it’s worth sitting down and checking the review way of working.
After the Review
List of Findings
The result of the code review is a list which contains the findings. It is a good practice to give hints in case of tricky problems. I mentioned above that sending code review invitation to mailing lists is not recommended, but sending the list of findings to the entire team is a good idea. With this approach, everyone can have a look and decide which findings will be implemented - like a small product backlog.
The Next Review
Never have ”accepted after change” as an outcome of the review. Nobody guarantees that the findings will be implemented as requested. Also, usually, a major finding means that additional changes will be done besides the existing ones. So I recommend having the review sessions until the current state of the code satisfies the reviewer and the team. If the changes are minor and the team is fine with them, the second review can be a peer review, but try to keep the basic rule: the first review should be done alone.
If there are a lot of changes, pair up with the author and do the fixes together. In this case, you need a new reviewer afterwards, but this way you can share the new code base with someone else as well.
It is a good practice to keep intermediate reviews if the changes seems to be big, or the author developer has the habit of checking in a large set of files. During these reviews, the first checklist should still be applied, because the earlier you find the problems, the cheaper they will be to fix.