Diffblue Microservice Testing - a sneak peek at our early product and results

Diffblue Microservice Testing is a Diffblue Labs product that automatically generates regression tests for service and microservice applications. It finds inputs that exercise nasty corner cases and unexpected interactions between seemingly independent modules. Over the last few months, we have worked hard to get an early version of the system ready to demonstrate. Today, we are excited to share this demo, our first results, and the many issues the system has already found in a large Java code base.


Diffblue Microservice Testing generates regression suites of component tests in a completely automated manner. Component tests are ideally suited to catching regressions in your service. For instance, suppose that you are working on an HR service and have recently updated some functionality in the calendar module. You probably expect that your change does not affect, say, your public user API. But how can you be sure? How do you know there are no unintended interactions between the user and calendar functionality? How do you ensure that all user-related API requests still return the same responses?

Diffblue Microservice Testing generates regression test suites that exhaustively exercise the interesting logical paths of your code. The generation algorithms are extremely good at finding corner cases in your code logic. The best part is that you don't have to write these tests yourself: the process is automated. While this might look like magic to some, it's not. It builds on four decades of research in formal verification and in static and dynamic program analysis.

So, how does it work? Initially, you provide a small number of component tests. We analyze the logical paths they exercise. From that, we derive new test cases that cover new functions, as well as corner cases in the logic of your code.
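To make the idea more concrete, here is a toy sketch in Java of deriving new inputs from seed inputs by watching which paths they exercise. It is a generic path-guided loop, not the product's actual algorithm, which this post does not describe in detail. The handler, the path labels, and the variation rules are all hypothetical and chosen only for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy sketch only: a generic path-guided loop, not Diffblue's actual algorithm.
class PathGuidedGenerationSketch {

    // Hypothetical request handler standing in for a service endpoint.
    // It returns a label naming the logical path the input took.
    static String handle(String query) {
        if (query.isEmpty()) {
            return "empty";
        }
        if (query.startsWith("id:")) {
            try {
                int id = Integer.parseInt(query.substring(3));
                return id < 0 ? "negative-id" : "id-lookup";
            } catch (NumberFormatException e) {
                return "bad-id"; // the kind of corner case a seed rarely covers
            }
        }
        return "text-search";
    }

    // Hypothetical, deliberately simple variations of an input.
    static List<String> variants(String input) {
        List<String> out = new ArrayList<>();
        out.add("");                        // empty request
        out.add(input + "x");               // trailing non-digit
        out.add(input.replace(":", ":-"));  // negate a numeric field, if any
        return out;
    }

    // Starts from the seeds and keeps every input that reaches a path
    // no previously kept input has reached.
    static List<String> generate(List<String> seeds) {
        Set<String> seenPaths = new HashSet<>();
        List<String> suite = new ArrayList<>();
        Deque<String> worklist = new ArrayDeque<>(seeds);
        while (!worklist.isEmpty()) {
            String candidate = worklist.removeFirst();
            if (seenPaths.add(handle(candidate))) {
                suite.add(candidate);
                worklist.addAll(variants(candidate)); // explore around new behavior
            }
        }
        return suite;
    }

    public static void main(String[] args) {
        // Two hand-written seeds, in the spirit of the initial component tests.
        System.out.println(generate(List.of("id:42", "matrix")));
    }
}
```

With these two seeds, the loop keeps five inputs, one per path label: the two seeds plus an empty request, a malformed ID, and a negative ID. A real tool would derive its candidates from program analysis rather than from fixed string edits.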

To benchmark the product, we gave it 24 hours to generate tests for Apache Solr, a well-known enterprise search engine. Apache Solr is a fairly complex code base, spanning 288k lines of Java code in 2,287 source files. We configured Solr with a database of movies, following the quickstart tutorial. We also provided 31 initial test cases that we wrote by hand, mostly by taking URLs from the same tutorial.

Knowing next to nothing about Solr's source code, we selected the following seven Java packages to evaluate our success.

Package                    Lines of code   Function
org.apache.solr.core       12k             Application core
org.apache.solr.parser     4k              Database query parser
org.apache.solr.request    4k              Request handler
org.apache.solr.response   5k              Response formatter
org.apache.solr.search     31k             Search request execution
org.apache.solr.servlet    2k              Servlet container interface
org.apache.solr.util       12k             Utility classes
Total                      70k

Our goal was to get an average end-to-end coverage of at least 30% in these packages. In end-to-end execution, functions are only called in the environment that the running application sets up, which limits the paths that can be reached. Unit tests, in contrast, often use mocks, which can push coverage higher than end-to-end tests can achieve.

Given all of this, we were quite impressed when our early, unoptimized version of the product produced 36% line coverage out of the box:

[Screenshot of the coverage results]
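To put the percentages into line counts, here is a small back-of-the-envelope sketch in Java. It assumes that the 30% target and the 36% result both refer to line coverage aggregated over the seven selected packages, and it uses the rounded package sizes from the table above, so the results are approximate. The class and method names are made up for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Back-of-the-envelope arithmetic; assumes coverage is aggregated over all seven packages.
class CoverageArithmeticSketch {

    // Package sizes in thousands of lines, copied from the table above (rounded figures).
    static Map<String, Integer> packageKloc() {
        Map<String, Integer> kloc = new LinkedHashMap<>();
        kloc.put("org.apache.solr.core", 12);
        kloc.put("org.apache.solr.parser", 4);
        kloc.put("org.apache.solr.request", 4);
        kloc.put("org.apache.solr.response", 5);
        kloc.put("org.apache.solr.search", 31);
        kloc.put("org.apache.solr.servlet", 2);
        kloc.put("org.apache.solr.util", 12);
        return kloc;
    }

    // Sums the package sizes (still in thousands of lines).
    static int totalKloc(Map<String, Integer> kloc) {
        int total = 0;
        for (int size : kloc.values()) {
            total += size;
        }
        return total;
    }

    // Approximate number of lines that a given coverage percentage corresponds to.
    static long linesAtCoverage(long totalLines, int percent) {
        return totalLines * percent / 100;
    }

    public static void main(String[] args) {
        long totalLines = totalKloc(packageKloc()) * 1000L;
        System.out.println("Total lines (approx.):   " + totalLines);
        System.out.println("Lines at the 30% target: " + linesAtCoverage(totalLines, 30));
        System.out.println("Lines at the 36% result: " + linesAtCoverage(totalLines, 36));
    }
}
```

Under these assumptions, the 30% target corresponds to roughly 21,000 covered lines and the 36% result to roughly 25,200.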

But we found something even more exciting than exceeding our coverage target. The system found hundreds of requests producing Java exceptions that trigger an HTTP 500 error response. The table below shows the number of different code locations found to throw those error-producing exceptions. Note that we only consider the line where the exception is thrown, not the entire stack trace.

Exception                         Code locations
NullPointerException              24
NumberFormatException             19
SolrException                     9
IllegalArgumentException          4
IOException                       4
ClassCastException                4
IllegalStateException             3
ArrayIndexOutOfBoundsException    3
UnsupportedOperationException     2
StringIndexOutOfBoundsException   2
Other                             3
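The location-based counting used for the table above can be sketched in Java as follows. This is only an illustration of the idea, not the tooling we used: it assumes the throwing line can be read from the top frame of each exception's stack trace, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch: counts distinct throwing locations per exception type.
class ThrowLocationCounter {

    // Describes where an exception originated, using only the top stack frame
    // and ignoring the rest of the stack trace.
    static String throwLocation(Throwable t) {
        StackTraceElement[] trace = t.getStackTrace();
        if (trace.length == 0) {
            return "<unknown>";
        }
        StackTraceElement top = trace[0];
        return top.getClassName() + "." + top.getMethodName() + ":" + top.getLineNumber();
    }

    // Maps each exception type to the number of distinct locations that threw it.
    static Map<String, Integer> countLocations(List<Throwable> failures) {
        Map<String, Set<String>> locationsByType = new LinkedHashMap<>();
        for (Throwable t : failures) {
            locationsByType
                .computeIfAbsent(t.getClass().getSimpleName(), k -> new LinkedHashSet<>())
                .add(throwLocation(t));
        }
        Map<String, Integer> counts = new LinkedHashMap<>();
        locationsByType.forEach((type, locations) -> counts.put(type, locations.size()));
        return counts;
    }

    // Demo helper: always fails on the same line, however often it is called.
    static void alwaysFails() {
        throw new IllegalStateException("boom");
    }

    public static void main(String[] args) {
        List<Throwable> failures = new ArrayList<>();
        for (int i = 0; i < 2; i++) {
            try {
                alwaysFails();
            } catch (IllegalStateException e) {
                failures.add(e); // two failures, but one throwing location
            }
        }
        System.out.println(countLocations(failures));
    }
}
```

Counting this way means that hundreds of failing requests collapse into a much smaller number of distinct locations, because many requests reach the same throwing line through different call paths.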

Overall, we are very pleased with these initial results, especially given that this is an early version of the product. Over the coming months, we will be making Diffblue Microservice Testing even smarter at catching unintended behaviors and adding multiple classification criteria for the generated tests. So stay tuned! And be sure to register for our Labs updates and sign up for our monthly newsletter here.