How and why to set Code Coverage Targets: Breaking the arbitrary goal trend

When discussing code coverage with people, particularly outside the TDD community, I often hear them talk about 80% code coverage. This is particularly true for legacy code bases. In a previous role, I was given this 80% target myself. As with many metrics and targets, the question that interests me most is: how was this number chosen? What is the intent behind the goal?

The obvious intent is to encourage developers to write tests, which makes sense. But why 80%? Is this a magic number where the impact of issues in the remaining 20% is significantly lower? Is it because in a typical codebase, 20% of code is too expensive to test? Or is it simply a number that someone picked?

It is widely known that low code coverage is bad. But the opposite does not always hold true. In fact, high code coverage for the sake of reaching a target is not necessarily good. Turning around my question a little, let’s think about why low code coverage is bad. Essentially, the issue with low code coverage is that you haven’t tried running your code; you do not know how it will behave when it is run. If you were in the market for a car and the manufacturer said, “This is our new model. No one has tried it yet,” I doubt you would trust it to work. Yet with low code coverage, this is the position we put our users in.

So why do we want to improve code coverage of our legacy code? Simply to reduce the cost of bugs that customers find as we extend or change the code. So how many serious bugs will we find or prevent by improving code coverage from 79% to 80%? Well, it depends. What code are we testing with that extra one percent? How important is that code? Not every bit of code coverage is equal. It is therefore entirely possible to take two regression suites for the same code, each hitting 80% code coverage, and have one be significantly better than the other.
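To make the "two suites, same 80%" point concrete, here is a minimal Java sketch. Everything in it is an illustrative assumption rather than real data: the unit names, the line counts, and especially the `importance` weights, which stand in for whatever a team believes about how costly a bug in that code would be. It is not how any coverage tool reports results, just a way to show that the headline number hides where the coverage sits.

```java
import java.util.List;

final class CoverageComparison {

    /** Hypothetical unit of code: its size, the lines a suite covers, and an assumed importance weight. */
    record Unit(String name, int lines, int coveredLines, double importance) {}

    /** Plain line coverage: covered lines divided by total lines. */
    static double lineCoverage(List<Unit> units) {
        int total = 0;
        int covered = 0;
        for (Unit u : units) {
            total += u.lines();
            covered += u.coveredLines();
        }
        return total == 0 ? 0.0 : (double) covered / total;
    }

    /** Importance-weighted coverage: each line counts in proportion to its unit's weight. */
    static double weightedCoverage(List<Unit> units) {
        double total = 0.0;
        double covered = 0.0;
        for (Unit u : units) {
            total += u.lines() * u.importance();
            covered += u.coveredLines() * u.importance();
        }
        return total == 0.0 ? 0.0 : covered / total;
    }

    /** Suite A (made up): tests the critical payments code, skips low-risk admin helpers. */
    static List<Unit> suiteA() {
        return List.of(
            new Unit("payments", 200, 200, 5.0),
            new Unit("reporting", 600, 600, 1.0),
            new Unit("adminHelpers", 200, 0, 1.0));
    }

    /** Suite B (made up): tests the easy admin helpers, skips the critical payments code. */
    static List<Unit> suiteB() {
        return List.of(
            new Unit("payments", 200, 0, 5.0),
            new Unit("reporting", 600, 600, 1.0),
            new Unit("adminHelpers", 200, 200, 1.0));
    }

    public static void main(String[] args) {
        System.out.printf("Suite A: line %.2f, weighted %.2f%n",
            lineCoverage(suiteA()), weightedCoverage(suiteA()));
        System.out.printf("Suite B: line %.2f, weighted %.2f%n",
            lineCoverage(suiteB()), weightedCoverage(suiteB()));
    }
}
```

With these made-up numbers both suites report 80% line coverage, yet once the payments code is weighted five times as heavily, suite A covers roughly 89% of what matters and suite B roughly 44%. The weights are subjective, which is rather the point: the raw percentage cannot tell the two suites apart.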

Being a triathlete, I do a fair amount of running, and I was recently wondering: what is a good time for a 5k run? I came to the shocking conclusion that it depends. If you're new to running, 30 minutes might be good. If you're a good runner, it might be 20 minutes. For a professional, it might be 15 minutes or even faster. The only consistent quality of a good run is beating a personal best, something that @Garmin is successfully capitalizing on with #beatyesterday.

What does all of this mean for code coverage goals? I am a fan of looking at the trend, not an absolute number. If you are increasing your code coverage every sprint, month, or quarter, then great. If your rate of increase is also increasing, even better! Why? Because, all else being equal, higher coverage is better, so always strive to be better than yesterday. Also, by focusing on being better than yesterday, we can encourage developers to test the most critical code first, not just the easiest code in pursuit of an absolute goal.
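A "better than yesterday" goal is easy to check mechanically. The sketch below is one possible shape for it in Java, assuming you already record a coverage percentage at the end of each sprint (or month, or quarter). The class name, method names, and sample figures are all hypothetical and are not taken from any particular coverage tool.

```java
import java.util.List;

final class CoverageTrend {

    /** True when every measurement is strictly higher than the one before it. */
    static boolean isImproving(List<Double> history) {
        if (history.size() < 2) {
            return false; // a single data point is not a trend
        }
        for (int i = 1; i < history.size(); i++) {
            if (history.get(i) <= history.get(i - 1)) {
                return false;
            }
        }
        return true;
    }

    /** True when each period's gain is larger than the gain of the period before. */
    static boolean isAccelerating(List<Double> history) {
        if (history.size() < 3) {
            return false; // need at least two gains to compare
        }
        for (int i = 2; i < history.size(); i++) {
            double previousGain = history.get(i - 1) - history.get(i - 2);
            double currentGain = history.get(i) - history.get(i - 1);
            if (currentGain <= previousGain) {
                return false;
            }
        }
        return true;
    }

    /** Ratchet check: the new figure must not fall below the last recorded one. */
    static boolean passesRatchet(double previous, double current) {
        return current >= previous;
    }

    public static void main(String[] args) {
        // Made-up coverage percentages for four consecutive sprints.
        List<Double> history = List.of(41.0, 43.0, 46.0, 50.0);
        System.out.println("Improving: " + isImproving(history));
        System.out.println("Accelerating: " + isAccelerating(history));
        System.out.println("Ratchet holds for 50.0 -> 49.5: " + passesRatchet(50.0, 49.5));
    }
}
```

A check like `passesRatchet` could run in a build pipeline in place of a fixed threshold, so a team at 41% and a team at 85% are both held to the same standard: do not go backwards. It says nothing about which code the new tests cover, so it complements, rather than replaces, the judgment about testing the most critical code first.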

And for those companies that insist on a hard code coverage target? First, you need to consider that not all code bases are the same, and that the effort and cost of reaching the target will vary across applications. Next, you need to find a way to enable your teams to hit those metrics in a time-, resource- and cost-effective manner.

Here comes the shameless sales pitch. Diffblue Cover can help kickstart your quest to improve Java code coverage. Let AI do the work for you. If you want to find out more, take a look at Diffblue Playground, explore our website, request a demo of Diffblue Cover, or get in touch with me @jgwilson42.