I’m going to keep a workshop about code review for my colleagues so I’ve collected every idea, practice, suggestion recommendation into this post.
Any comments and suggestions are welcomed.
Abstract
In this article I would like to talk about some code review techniques, which I consider useful, when somebody has to review changes or review bug fixes on legacy code. I have no intention to quote the existing articles, documents, book, because they are good as they are in their current state. I’m focusing more on the practical parts, which helps to deliver quality product quite fast.
Under legacy code I mean the code base, which was not developed in agile way, and the reviewer does not know every details about. This situation is quite common and requires a different approach than reviewing a code base, which is right now under development.
Before a Code Review
Preparation
After a developer finishes a task, a user story or a bug fix, he should ask somebody for a code review. Developers usually do not like doing code reviews, there are a lot of pros and cons, but let’s assume that the developer would like to deliver quality product and understands the benefits of code review:
- faster feedback on the done work
- knowledge distribution (the reviewer will learn the change and may see new techniques or tricks)
- finding introduced bugs faster and cheaper
In companies where agile was just introduced, management tends to say that code review is waste of time, because if you are doing pair programming you review the code while you are writing it. This is a common mistake, because pair programming does not replaces code review. During implementation two people knew about the change. If they ask for a review a third person will know about the change, too. This is one way of knowledge distribution. On the other hand, if two developers did a change and a senior developer says that the change is fine, but there are other ways to do it, which ways they are not aware of, the developers can learn new things. This the other way of knowledge distribution. This view is usually ignored, but is very important.
Besides knowledge distribution there is no guarantee that the pair found and implemented the right solution. A code review after a pair work can give back a very useful feedback on the performance of the pair. They’ll know that what they did was right under the given circumstances. Yet again, the team introduced another useful feedback loop (first feedback loop during pair programming, second feedback loop during review, third feedback loop by the continuous integration etc).
One can argue, what shall happen if two senior developers did a change together. In this case the reviewer can learn from their work and bring another perspective into the game. Image the case that the seniors have been working on the project for three years. They know every single detail about it. This knowledge is very valuable. On the other hand they might be unable to think outside the box. A less experienced but fresh mind can show them something may make the change more robust, less resource consuming or something even better.
This is not a must, but a recommendation. If the team members are very experienced and they decided not to do code review after pair work, I can understand that. Nevertheless I recommend doing reviews when somebody doing solo work. Joshua Kerievsky said once, that most of the problems in their was introduced when somebody did solo work.
For a code review the developer needs a reviewer, or more reviewers (I prefer more reviewers). The characteristics of a good reviewer candidate:
- had nothing to do with the change
- has a lot of business knowledge
- knows the programming language and the code
- has some testing skills
I agree that a person who has these skills is very rare, therefore it is very crucial to pick the right person for review. The first three options are obvious, therefore check the forth. The written code has to be testable. This does not mean only unit testable, but also acceptance testable. Finding such a person is even harder, but it’s worth talking to a tester and tell him what the change is about, so that he can give back an early feedback.
Do not write review request to mailing lists (a social psychology study proves that if a larger group is asked for help, one will have less help than asking a small group or an individual). Find the right person who fits better, and ask him directly. For example if you have changed code which works with databases, ask the database expert of the team for review. In this particular example – databases -, it is very important to check certain aspects like data loss, performance, database size. It is perfectly fine that everyone dares to change database related code base, but like under security manners it’s worth having the expert reviewing the change to avoid unnecessary error reports.
The First Important Step
As a first step it’s worth checking the following list:
- the continuous integration is green – if not available the code compiles
- the code follows the agreed code conventions
- there are comments in the version control system what the change is about
- unit tests are available
If any of the items is missing, I recommend not to continue the code review, because every hour a developer spends away from development is considered as waste. A good review takes time – sometimes it takes minutes but can take days -, and there is no need to increase the unnecessary spent times, which can be avoided if the developers follow some simple rules. If not everybody is aware of the rules, I suggest to have a working agreement and make clear these rules.
Why not to continue:
- the fact that the code does not compile suggest that the writer did not finish his work, or for some reason the dependencies has not been check. After performing this check the code may change, which change has to be reviewed again, which makes the first review waste
- following the code conventions is one of the XP techniques, which is worth doing, because it makes the compare much more easier. Imagine the situation when you have a code which follows the conventions, but somebody commits some refactoring without conventions. You can spend a lot of time separating the logical differences from the formatting differences. People tend to stuck checking the formatting differences, which is not really necessary, because tools can do it, so it is everyones best interest to review only the logical parts
- it is hard to understand legacy code. If somebody changes it, and does not put down what the change is about the original code became more incomprehensible. Not every version control system allows to change the commit messages, but if your do support it, ask the developer to add the comment
- checking in code without testing is very dangerous and not recommended. If someone checks in a code without test cases can suggest that the change wasn’t checked and after testing the author may change code, so the review effort became a waste again
During the Code Review
Ways
There several ways to do code review, but for now consider only these two:
- when the author explains the code to the reviewer and they check the code together, the feedback is immediate
- the author checks the code alone, and give feedback after the review
I recommend to do the code review alone. Similar to testing the author/developer can drive the reviewer/tester so that they miss an important detail. Of course if the reviewer has questions he shall ask the author, but do not let him to take over the review session.
There are plenty of ways to find out what has changed and were. It is a usual way of working to take the diff/compare tool of the version control system tool and review the code there. I suggest not to do that. It is a very good thing to have the list of the affected files and the changed areas, but this will narrow down the view of the reviewer to the changed. If he does the review with a compare tool, he can check that the change is good, but will have a hard time to find out what is missing; for example an important dependency injection or an exception is thrown but handled incorrectly.
If the checklist is complete, the reviewer shall check what the change is about. If it is a change, check the product backlog, if it is a bug fix, check the problem description. Without knowing what the change is about it is very hard to do quality work. Hence it is better to start the review with the review of the unit test cases. They can tell you about the nature of the change, and you can decide whether the change is correct or not. So always start with the test cases.
Unlikely the check list, if you find something during the review, do not stop the review at the first finding, even if it is a major problem. I’ll talk about it later, but the code review has to performed until the code’s quality satisfies the reviewer and the team. If the code contains 2 major and 4 minor findings, and the reviewer highlights the first major one and stops the review, an important quality feedback won’t be given.
Separation of Changes
Usually the change on the code affects only a small amount of lines. However agile teams tends to do a lot of refactoring, which is a good thing, but is a nightmare in case of code review. A particular change consists of two things:
- refactored parts
- the changed parts
If you face a code where both the refactored parts and the changed parts were checked in together, you have no other option then talking to the author, so that he can help you separating them. In the good case the change affects one file, and the refactoring another twenty. It is not cheating to point out where the change is, so that the reviewer does waste a lot of time reviewing refactoring, which has been done by the editor tool automatically. Therefore you can check the refactoring part later, or not at all. Let the regression test suite verify it.
Nevertheless you can have one finding for the author; do not commit refactoring with changes. Try to do them separately. It is kind of fits into the idea which says that write test cases first, refactor the code, so that the change is obvious, do the change and refactor again. Following the commit recommendation above, this means three individual commits; one after the first refactoring, one after the change and one at the end. With this approach the reviewer can prioritize his review process; he must check the change every time. In case of less available time, he can decide whether to check the first refactoring. I would check it as well every time, because this is the change which was made on the legacy code, and it is very important to make sure that the system works as before. The last refactoring can be verified using the regression test suite. If there aren’t any, you have to review that part as well.
Categorization
Large companies have templates even for code review. Usually these templates contain a category for review findings. They are used for deciding whether to perform the suggestion of the reviewer. Having categories is fine, but treat the review findings as user stories instead. Show the findings to the author and to the team, and decide together how to continue. It makes no sense to implement only the major findings or setting an amount of small items.Every code review shall be ended with a discussion between the author and reviewer(s) and optionally the team. If a finding makes no sense, or has no harm on the current code base, you can skip it.
Practices
References
A common mistake I observed that people are using the version control system’s diff tool or an external tool called by the version control system for code review. They are focusing on the changed part only, which saves some time for them, but can cost a lot later. Have a look at the example below:
original:
class A {
boolean flag;
public A() {
flag = true;
}
}
the change:
boolean flag = true;
If you check only in the class scope, it is a fine operation. On the other hand, another class has a reference:
class B {
public void foo() {
if (A.flag) {
// something
} else {
// something else
}
Using just a compare tool you will miss this affect. The recommendation here is to check the changes in the IDE you are using and check the references as well. If you see a field without the private modifier, by curious and check the references. If you are lucky the regression test suite will find the problem later, but usually legacy systems do not have a lots of regression test cases.
Look for common mistakes
Have a look at Jeff Atwood’s Top 25 list. Look for these problems in the changed code.
Synchronization
If you see a change, which affects code, protected by synchronized, look for the wait, notify and notifyAll calls in order to make sure that the change does not introduce a [dead] lock situation.
Abuse of design patterns
Design patterns are good, but people sometimes abuse them. Use the right pattern for the right task. Do not solve everything with the factory pattern or with the command pattern. If the change seems to be an introduction of design pattern, double check the pattern, and make sure that it is done in the proper way.
Look for duplication
There is no way that every team member have a deep knowledge of the legacy code base. Therefore it may happen that somebody introduces a utility method or class which has already existed. They have a very unique characteristics; no operation on fields, descriptive and general name, used at different places in the changed code. Do a search on the name, or the used method, and there is a big chance that the utility method or class exists somewhere in the legacy code base.
Dependencies
If a change affects the constructor, may indicate a dependency change. It is a good place to start the review.
Length and complexity of test cases
As I said before I like checking test cases first, because they tell a much about the code. Large and complex test cases indicate large and complex code. If you meet such test cases, it’s worth talking to the author about the change before continuing the review, because you may spend a lot of time on a code, which can be refactored or in the worth case, it is not ready. So talk to him, pair up, solve the problem.
Wasting resources (memory, CPU, disk)
I strongly believe that a lot of resource wasting operation can be found by code review. If the change code operates with streams, javax.sql classes, check that every stream is handled and closed properly. Look for interesting loops and data structure operations.
Conditional expressions
If a conditional expression has been changed the first thing I usually do is to check the coverage of the branches. It is not part of the standard code review practices but helps me to find out what is going on, and the possible effects of the change. The next thing is that I check every possible outcome of the expressions. It is important to check for things, which may missing. The test cases can help in this case.
Consistency Check
Consistency is very important. Let’s have an example; the class solves every possible loop with while. A change requires another loop, and the fixer decides to use for loop because he feels more comfortable with it. Even if the code works after a certain time nobody will remember why we have a lot of while loops and why one single for loop. Although it is a small thing, people will spent some time figuring this out. Eventually they end up at a particular commit, which does not make too much sense in the current point of view and it is hard to refactor the code, because the for has to be changed into while. This is also a waste. The recommendation here is to be consistent with the class/package even if the author things that the previous part can be done differently. Do the fix, and refactor the class in two different commit.
Spent time
Limit your time on review. It is easier to say than doing it, but it is easy to stuck in a code review. Limit yourself so that you’ll check every logical change (see prioritization above) – really everything, and just couple of the refactoring or the minor changes.
Feedback
Follow the life of the reviewed code, so that you can have a feedback on your review. If the acceptance fails on a trivial error, it’s worth sitting down and check the review way of working.
After the Review
List of Findings
The result of the code review is a list, which list contains the findings. It is a good practice to give hints in case of tricky problems. I mentioned above that sending code review invitation to mailing lists is not recommended, but sending the list of findings is a good choice. With this approach everyone can have a look and decide which findings shall be implemented – like a small product backlog.
The Next Review
Never have an “accepted after change situation“. Nobody guarantees that the findings will be implemented as requested, or usually a major finding means a lot more changes than before. So I recommend to have the review sessions until the current state of the code satisfies the reviewer and the team. If the changes are minor changes and the team is fine with it, the second review can be a peer review, but endeavor to keep the basic rule; the first review should be done alone.
Correction
If there are a lot of changes pair up with the author and do the fixes together. In this case you need a new reviewer afterwards, however you can share the new code base with someone else as well.
Intermediate Review
It is a good practice to keep intermediate reviews if the changes seems to be big, or the author developer has the behaviour to check in large set of files. During these reviews the first checklist should be still applied, because earlier you find the problems the cheaper is the fix.
Comments
Leave a comment Trackback