I really like continuous integration, that’s why I’m always sad when I see one dying. Unfortunately, I’ve seen it happen a lot, and in every case its lifeline looked like this:
The introduction of the continuous integration is usually started by an enthusiastic team member who read about it, realized “it was cool”, and it was just what the team needed. She quickly affects the whole team, they set up the build servers and are really happy with it.
The first fracture happens when the first unexpected failed build appears on account of a test case failure. Nobody knows why the test fails, because it works locally, also works on the server, but mysteriously sometimes it just doesn’t work in either of these places. The team is aware of the phenomenon, closes the case saying “don’t worry, we know that sometimes it fails” and moves forward. As time goes on, the team adds more test cases, and another failing test case appears on the horizon. Mysteriously, just as before. The same decision is made, because a team can live with two known failing test cases, can’t it? Unfortunately, this question signs the death sentence of the continuous integration build, because failing test cases decrease the faith in the system, and team members will start ignoring the results, because they are not trustworthy. Criminologists call this phenomenon broken windows syndrome:
Consider a building with a few broken windows. If the windows are not repaired, the tendency is for vandals to break a few more windows. Eventually, they may even break into the building, and if it’s unoccupied, perhaps become squatters or light fires inside. (source: wikipedia)
Occasionally, some of the team members pull themselves together and decide to “fix the build”. They might even get management support. Unfortunately, the broken windows are more powerful than them: while they are desperately trying to fix the test cases, other team members are going forward and building software based on an insecure base and creating more failing test cases: the broken windows are working behind the curtains. At the end, only the initiator checks on the build, because everybody else has already given up the fight with the windmills. This is the end of the continuous integration for the team: “flatline”.
For me, sporadically failing test cases are like diseases, and should be handled accordingly: until it is not known what the problem is, they should be separated from the reliable (healthy) test cases. If a team relies only on reliable test cases and keeps the unreliable ones in a test quarantine, then the team won’t lose its faith in continuous integration and it can live and support the team. Let’s see how to set up a test quarantine. Read more »
