Most companies have their own methods of doing quality assurance. One popular approach is called dogfooding: “eating your own dog food,” i.e. using your own software. Our tool, Diffblue Cover, creates unit tests automatically for Java code. But as part of the process of building and QAing Cover, we had a dilemma to solve: how do you test the test-writer?
For us, dogfooding would technically mean running Diffblue Cover against the development repository for Cover itself. We tried this, and it surfaced a number of interesting problems. Our assumption is that a user takes a fixed version of Cover and runs it against a repository that is changing and evolving. In dogfooding, however, both the repository and the version of Cover were evolving, so any change we saw had an ambiguous cause: did the code change, or did Cover's behavior change? To tease the causes apart, we moved to catfooding.
What is catfooding?
Different organizations define catfooding differently. For us, catfooding is a way of making sure our product works effectively with all the codebases thrown at it, since customers come to us with all kinds of unexpected code. We have learned that the best way to make sure Cover doesn't crash and burn on user code is to run it on a large corpus of projects: some found on GitHub and elsewhere, and some that customers have pointed out to us. This gives us reasonable confidence that it will work on pretty much anything users bring to us.
How does it work?
Catfooding, as we practice it, is a completely automatic process; whenever we push a change to GitHub and create a PR, it runs in parallel to the standard CI run. We follow this with quick manual checks when reviewing a PR to see if anything is out of the ordinary.
What we want to see at this point is as few unexpected changes as possible, along with (ideally) higher test coverage, shorter run times, and fewer errors. For the metrics that can't be automated, such as evaluating test quality, we do manual spot checks.
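The automated side of that check might look like the following sketch: compare a baseline run's metrics against a PR's run and flag regressions. The metric names, tolerances, and data layout here are illustrative assumptions, not Diffblue's actual thresholds.

```python
# Sketch: flag regressions between two catfooding runs.
# Metric names and tolerances are illustrative assumptions.
from typing import Dict, List


def flag_regressions(
    baseline: Dict[str, float],
    candidate: Dict[str, float],
    coverage_tolerance: float = 0.5,   # allowed coverage drop, in points
    runtime_tolerance: float = 1.10,   # allowed runtime growth factor
) -> List[str]:
    """Return human-readable descriptions of any metric regressions."""
    issues: List[str] = []
    if candidate["errors"] > baseline["errors"]:
        issues.append(
            f"errors rose from {baseline['errors']:.0f} "
            f"to {candidate['errors']:.0f}"
        )
    drop = baseline["line_coverage"] - candidate["line_coverage"]
    if drop > coverage_tolerance:
        issues.append(f"line coverage dropped by {drop:.1f} points")
    if candidate["runtime_s"] > baseline["runtime_s"] * runtime_tolerance:
        issues.append(
            f"runtime grew by more than {(runtime_tolerance - 1) * 100:.0f}%"
        )
    return issues
```

An empty list means the PR looks clean on these metrics; anything else is a prompt for a closer manual look during review.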
Catfooding is one of our key quality guarantees. If a proposed change doesn’t do well in catfooding, we either have to rework it or completely rethink what we are doing.
The process also provides another layer of automated testing that complements unit testing, and it catches problems we never wrote a test for because we didn't expect that path could break. These "unknown unknowns" often come out in the catfooding process when some project triggers the error path, allowing us to fix it.
It's also good for QA. Having more statistics helps confirm that nothing breaks with new changes, and checking the diffs makes it easier to verify that a change behaves as intended. Previously, our testers would run Cover on several projects by hand and then make sure the results were correct; catfooding automates much of that work and makes their lives easier.
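The diff check described above can be sketched as follows: given the generated test sources from a baseline run and a PR run, mapped by file path, summarize what was added, removed, or changed. The data layout is an illustrative assumption.

```python
# Sketch: categorize per-file differences between two catfooding runs,
# given {file path: file contents} maps. Layout is an assumption.
from typing import Dict, List


def summarize_changes(
    baseline: Dict[str, str], candidate: Dict[str, str]
) -> Dict[str, List[str]]:
    """Report which generated files were added, removed, or changed."""
    before, after = set(baseline), set(candidate)
    return {
        "added": sorted(after - before),
        "removed": sorted(before - after),
        "changed": sorted(
            path for path in before & after
            if baseline[path] != candidate[path]
        ),
    }
```

A reviewer can then skim the "changed" list for projects where the generated tests differ, rather than re-reading every output.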
As a result of this process, we can be reasonably confident our product is robust; it works on a variety of random, real-world projects and not just a few hand-picked examples.
[Figure: changes in line coverage]
How to get started
The specific setup we follow at Diffblue might not apply to other organizations, but the underlying principles are universal. Automated system tests that also do a bit of benchmarking, and that report precise metrics (more than just "does it run or not?"), help you understand how your product is evolving over time and give you much more confidence in its performance.
To start doing this yourself, think early about how to automate system testing and benchmarking: what you want to test, how to automate it, and how to integrate it into a CI pipeline. This infrastructure will be with you for a long time, so building it properly from the start makes life much easier down the line.