How and why to set Code Coverage Targets: Breaking the arbitrary goal trend

When discussing code coverage with people, particularly outside the TDD community, I often hear them talk about 80% Code Coverage. This is particularly true for legacy code bases. In a previous role, I was also given this as a target. As with many metrics and targets, the most interesting question is “why”. Or, phrased another way, what is the intent behind the goal?

The obvious intent is to encourage developers to write tests, which makes sense. However, I always wonder: why 80%? Is this a magic number where the impact of issues in the remaining 20% is significantly lower? Is it because in a typical codebase 20% of the code is too expensive to test? Or is it simply a number that someone picked?

[Image: statistics meme]

It is widely known that low code coverage is bad. But it is also known that the opposite does not hold true: high code coverage is not necessarily good. Turning my question around a little, let’s think about why low code coverage is bad. Essentially, the issue with low code coverage is that you haven’t tried running your code, so you do not know how it will behave when it is run. If you bought a car and the manufacturer said, "This is our new model and no one has tried it yet," I doubt you would be very impressed. Yet with low code coverage, this is the position we put our users in.

[Image: tech debt comic]

With our legacy code, why do we want to improve our code coverage? Simply to reduce the cost of bugs that customers find as we extend or change the code. So how many serious bugs will we find or prevent by improving code coverage from 79% to 80%? Well, it depends: what is the code that we are testing with that extra one percent, and how important is it? This shows that not every bit of code coverage is equal. It is therefore entirely possible to take two regression suites for the same code, each hitting 80% code coverage, and find that one is significantly better than the other.
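To make that concrete, here is a minimal sketch of two suites that both report 80% line coverage but protect very different code. The module names, line counts, and criticality weights are all made up for illustration; in practice you would have to decide for yourself how to rate the importance of each part of your codebase.

```python
# Hypothetical modules: (name, lines of code, criticality weight).
# The weights are illustrative assumptions, not measured values.
MODULES = [
    ("payments", 200, 5.0),
    ("auth", 200, 4.0),
    ("reporting", 400, 1.0),
    ("admin_ui", 200, 0.5),
]


def line_coverage(covered):
    """Plain line coverage in percent; `covered` maps module -> lines hit."""
    total = sum(loc for _, loc, _ in MODULES)
    hit = sum(covered.get(name, 0) for name, _, _ in MODULES)
    return 100.0 * hit / total


def weighted_coverage(covered):
    """Line coverage in percent, with each line scaled by its module's weight."""
    total = sum(loc * weight for _, loc, weight in MODULES)
    hit = sum(covered.get(name, 0) * weight for name, _, weight in MODULES)
    return 100.0 * hit / total


# Suite A concentrates on the critical modules.
suite_a = {"payments": 200, "auth": 200, "reporting": 300, "admin_ui": 100}
# Suite B concentrates on the code that is easiest to reach.
suite_b = {"payments": 100, "auth": 100, "reporting": 400, "admin_ui": 200}

for name, suite in (("A", suite_a), ("B", suite_b)):
    print(name, line_coverage(suite), round(weighted_coverage(suite), 1))
```

Both suites hit 800 of 1,000 lines, so the headline number is identical, yet once the (assumed) weights are applied suite A scores far higher than suite B because it exercises the code where a bug would hurt most.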

Being a triathlete, I do a fair amount of running, and I was recently thinking: what is a good time to complete a 5k run? I came to the shocking conclusion that it depends. If you’re new to running, 30 minutes might be good. If you're a good runner, it might be 20 minutes. For a professional, 15 minutes or faster. The only consistent quality of a good run is beating a personal best, something that @Garmin is successfully capitalizing on with #beatyesterday.

What does all of this mean for Code Coverage goals? I am a fan of looking at the trend, not an absolute number. If you are increasing your code coverage every sprint, month, or quarter, then great. If your rate of increase is also increasing, even better! Why? Because, all else being equal, higher coverage is better, so we should always strive to be better than yesterday. Also, by focusing on being better than yesterday, we can encourage developers to test the most critical code first, rather than the easiest code in pursuit of an absolute goal.
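As a rough sketch of what "looking at the trend" could mean in practice, the snippet below takes a list of coverage percentages, one per sprint, and reports whether coverage is still rising and whether the rate of increase is itself rising. The function name and the sample figures are hypothetical; how you collect the numbers will depend on your own coverage tooling.

```python
def coverage_trend(history):
    """Summarise a coverage history (percentages, oldest first).

    Returns the most recent change, whether coverage improved in the
    latest period, and whether the latest gain beat the previous gain.
    """
    if len(history) < 2:
        raise ValueError("need at least two measurements to see a trend")

    # Change between each pair of consecutive measurements.
    deltas = [later - earlier for earlier, later in zip(history, history[1:])]

    improving = deltas[-1] > 0
    accelerating = len(deltas) >= 2 and deltas[-1] > deltas[-2]
    return {
        "latest_change": deltas[-1],
        "improving": improving,
        "accelerating": accelerating,
    }


# Hypothetical per-sprint coverage figures for a legacy codebase.
print(coverage_trend([40, 42, 45]))  # gains of 2 then 3: improving and accelerating
print(coverage_trend([40, 45, 46]))  # gains of 5 then 1: improving, but slowing down
```

A check like this rewards a team at 46% that keeps climbing and flags a team at 85% that has started to slip, which is the behaviour an absolute target cannot give you.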

And for those companies that insist on a hard code coverage target? First, you need to consider that not all code bases are the same, and that the effort and cost will vary across applications. Next, you need to find a way to enable your teams to hit those metrics in a time-, resource- and cost-effective manner.

Here comes the shameless sales pitch. Diffblue Cover can help kick-start your quest to improve Java code coverage. Let AI do the work for you. If you want to find out more, take a look at Diffblue Playground, visit our website, or request a demo of Diffblue Cover. Or get in touch with me @jgwilson42.