5 things we’ve learned about generating tests from working with the open source community

At Diffblue, we care a lot about high-quality testing and a good user experience. We QA the technology behind our flagship product, Diffblue Cover, every single day. One of the ways we do this is by writing unit tests for public open source Java repositories on GitHub, for free and without obligation, and working closely with repository owners to learn exactly how our tests work (or don’t) for them. Diffblue Cover can already pick up a code base we’ve never seen before and write tests for it right away, but learning what repo owners want to see in those tests is a different task.


Since we began this project, we’ve become much faster at creating pull requests, the quality of the tests we generate has improved, and we’ve learned a lot about how to use AI to generate tests that repo owners are happy with. In fact, about 70% of the PRs we’ve submitted have been accepted and merged. We’d like to share some of our main takeaways from this process so far:

1. Everyone likes finding bugs  

Though Diffblue Cover isn’t explicitly designed to find bugs, we’ve raised bugs in several of the 70 PRs that have been merged since January. This unexpected bonus has been pretty well received, and we’re excited to see what bugs we can uncover next.

2. Maintenance of generated tests is a big concern (but it doesn’t need to be)

A few of the comments we’ve received have expressed concerns that the tests generated by Diffblue Cover will be harder to maintain, or will commit code owners to only using tests generated by Diffblue Cover in the future. Neither one of these is the case!

The tests we generate can be edited or maintained like any other, and can be added/removed/integrated with the rest of your test suite as you wish, so you have no obligation or commitment to continue using our tests. You can choose to merge the tests we’ve generated for you and enjoy higher test coverage, and then maintain this however you’d like.

3. Naming conventions? What naming conventions?  

Everyone has their own preferred test naming convention, which means there isn’t much of a standard at all. Should a test name always start with ‘test’? How much information should be included in a name? We’ve been improving our automatically generated test names based on feedback to accommodate different preferences, but we could still use more feedback to find out what everyone likes best.

4. The best way to use generated tests varies based on how your code behaves

Diffblue Cover highlights the current behavior of your code, which has a few implications for how to use the tests we generate. If the behavior being checked by a test is expected, then that test can be useful for protecting functional code against future regressions. If the behavior isn’t expected, then Diffblue Cover can sometimes highlight edge cases you might not have thought of before.
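To make the two cases concrete, here is a minimal sketch in plain Java. It is not output from Diffblue Cover: the `percent` method, the `check` helper and the check names are all hypothetical, and the checks are hand-rolled so the example runs without a test framework. Each check simply pins down what the code does today.

```java
public class CharacterizationSketch {
    // Hypothetical production method, not taken from any real repository.
    static int percent(int part, int total) {
        return part * 100 / total;
    }

    // Tiny stand-in for a test framework assertion (an assumption made
    // so that the sketch has no external dependencies).
    static void check(boolean condition, String name) {
        if (!condition) {
            throw new AssertionError(name);
        }
        System.out.println("ok: " + name);
    }

    // Reports whether percent() currently throws ArithmeticException.
    static boolean throwsArithmetic(int part, int total) {
        try {
            percent(part, total);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Expected behavior: this check guards against future regressions.
        check(percent(1, 4) == 25, "percentOfQuarterIsTwentyFive");
        // Current behavior that may surprise: integer division truncates.
        check(percent(1, 3) == 33, "percentTruncatesTowardZero");
        // Current behavior that may be unintended: a zero total throws.
        check(throwsArithmetic(1, 0), "percentThrowsOnZeroTotal");
    }
}
```

If the first check matches what you intended, it is a regression guard. The last two record behavior you may not have thought about, and it is up to you whether to keep it or change the code.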

5. There are a lot of misconceptions about how AI writes tests

We’ve had a few skeptical questions and comments arguing that an AI-powered test generator doesn’t understand the design goals of a piece of code, and therefore can’t write tests that cover meaningful use cases. We understand these concerns: it’s commonly assumed that a deep understanding of a code base’s design goals is necessary to write good unit tests, and that AI isn’t yet advanced enough to have that understanding.

However, the mathematical reasoning and learning engine that makes up the AI in Diffblue Cover doesn’t actually need to interpret the purpose of a line of code in order to write tests that can cover all cases, including edge cases and corner cases. It’s enough to build a representation of your code, capture its behavior in a formula, generate a series of queries about it (such as which inputs are needed to cause various outcomes), and document the answers in the form of unit tests. Some of those tests are purely descriptive, and some do test design goals.
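The idea of asking “which inputs cause which outcomes” can be illustrated with a toy sketch. This is not how Diffblue Cover works internally: where the real engine reasons about a formula, the sketch below just tries every input in a small range. The `classify` method, the `witnesses` helper and the chosen range are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntFunction;

public class OutcomeQuerySketch {
    // Hypothetical method under test.
    static String classify(int n) {
        if (n < 0) return "negative";
        if (n == 0) return "zero";
        if (n > 100) return "large";
        return "small";
    }

    // For each distinct outcome, record the first input in [lo, hi]
    // that produces it. Exhaustive search is a stand-in (an assumption)
    // for the formula-based reasoning described in the text.
    static Map<String, Integer> witnesses(IntFunction<String> f, int lo, int hi) {
        Map<String, Integer> found = new LinkedHashMap<>();
        for (int n = lo; n <= hi; n++) {
            found.putIfAbsent(f.apply(n), n);
        }
        return found;
    }

    public static void main(String[] args) {
        // Each (input, outcome) pair could be written down as one unit test.
        witnesses(OutcomeQuerySketch::classify, -200, 200)
            .forEach((outcome, input) ->
                System.out.println("classify(" + input + ") -> " + outcome));
    }
}
```

Note that the search never needs to know what `classify` is for. It only needs to observe which inputs reach which results, including the boundaries at 0 and 101.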

Tell us what you think!

As you can see, we’re learning a lot from those of you in the open source community, and we hope our PRs are useful (or at least interesting) and save you time developing tests. We’d love to continue to get your feedback about what defines meaningful and thorough unit tests. If you’d like to have Diffblue Cover test your public GitHub Java repository for free, click here.