Fuzz Testing Java and Other Managed Languages

Fuzz testing is an automated technique for finding program inputs that exercise interesting logical paths in your code. While many variants exist, the basic idea is simple to explain: from a set of initial inputs, take one, mutate some of its bits, and run the program on the mutated input. If the program does something new (something you didn't see with any of the previous inputs), save the input for future analysis; otherwise, discard it and start again. Repeat this at lightning speed, a few million times.
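The loop above can be sketched in a few lines of Java. This is our own illustration (names like `MiniFuzzer` and `fuzz` are invented, and the "behavior" check is a toy stand-in for the branch-coverage instrumentation real fuzzers use), but it shows the core mechanism: saving any input that triggers new behavior lets pure random mutation climb, step by step, toward a "deep" input, here a magic prefix:

```java
import java.util.*;

// Minimal sketch of a mutation-based fuzzing loop. A "behavior" is just the
// target's return value; real fuzzers track branch coverage instead.
public class MiniFuzzer {
    static final Random rng = new Random(42); // fixed seed for reproducibility

    // Toy target: returns how many leading bytes match the magic prefix "FUZZ".
    // Each longer match is a "new behavior" worth keeping.
    static int target(byte[] input) {
        byte[] magic = "FUZZ".getBytes();
        int score = 0;
        while (score < magic.length && score < input.length
                && input[score] == magic[score]) {
            score++;
        }
        return score;
    }

    // Mutate a copy of a seed by overwriting one random byte.
    static byte[] mutate(byte[] seed) {
        byte[] out = Arrays.copyOf(seed, seed.length);
        out[rng.nextInt(out.length)] = (byte) rng.nextInt(256);
        return out;
    }

    // The fuzzing loop: mutate a random corpus entry, run the target,
    // and keep the input only if it produced a behavior we haven't seen.
    static Set<Integer> fuzz(int iterations) {
        List<byte[]> corpus = new ArrayList<>();
        corpus.add("AAAA".getBytes()); // initial seed, behavior 0
        Set<Integer> seenBehaviors = new HashSet<>();
        seenBehaviors.add(target(corpus.get(0)));

        for (int i = 0; i < iterations; i++) {
            byte[] candidate = mutate(corpus.get(rng.nextInt(corpus.size())));
            if (seenBehaviors.add(target(candidate))) {
                corpus.add(candidate); // new behavior: keep for future mutation
            }
        }
        return seenBehaviors;
    }

    public static void main(String[] args) {
        System.out.println("distinct behaviors found: " + fuzz(1_000_000).size());
    }
}
```

Note the role of the growing corpus: an input matching "F" alone is useless to a user, but keeping it makes "FU" one mutation away instead of two simultaneous lucky flips, which is exactly why the feedback loop beats blind random generation.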

One might think that such a simplistic approach will never find anything interesting. Think again. This randomized mutational process can, for instance, synthesize well-formed JPEG files out of thin air. Fuzz testing has been extremely successful, finding thousands of security vulnerabilities even in mature projects of all kinds.

Accordingly, large organizations were quick to see its benefits. Google recently open-sourced ClusterFuzz, which has found ~16,000 bugs in Chrome and ~11,000 bugs in many other open-source projects. In 2017, Facebook acquired Sapienz, a tool that relies on fuzz testing to find crashes in Android applications.

Despite this success, fuzz testing has focused almost exclusively on non-managed, or 'unsafe', languages (C/C++). Dozens of open-source and commercial fuzzers exist for these languages, but fuzzing support for managed languages (e.g. Java, Python, JavaScript, C#) is nearly non-existent today. It's true that memory-corruption flaws in unsafe languages often translate directly into security breaches. But managed languages are subject to many of the same vulnerabilities that fuzzers detect in non-managed languages: crashing inputs, null pointer exceptions, unintended infinite loops, excessive resource consumption, and many others.
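For a concrete (and entirely hypothetical) example of what a managed-language "crash" looks like, consider a small parser that assumes well-formed input. Nothing here corrupts memory, yet a fuzzer surfaces the uncaught runtime exceptions, the managed-language analogue of a segfault, within seconds:

```java
// Hypothetical helper, invented for illustration: parses a version string
// like "1.12" into {major, minor}, silently assuming well-formed input.
public class VersionParser {
    static int[] parse(String version) {
        String[] parts = version.split("\\.");
        // A fuzzer quickly finds inputs that escape the happy path:
        //   "3"   -> ArrayIndexOutOfBoundsException (no dot, parts has length 1)
        //   "a.b" -> NumberFormatException (non-numeric components)
        return new int[] { Integer.parseInt(parts[0]), Integer.parseInt(parts[1]) };
    }
}
```

In a web service, any such uncaught exception bubbles up as a 500 response, so every input class that triggers one is a real, reportable bug even though the JVM never "crashes" in the C sense.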

At Diffblue we are building Diffblue Microservice Testing, a tool for automated regression testing of web services. It uses fuzzing, among other dynamic code analysis techniques, for test writing and bug finding, and our early results have been more than promising. In 24 hours of operation, our tool can achieve 36% end-to-end coverage of a large Java service (Apache Solr, 288 KLOC) and find more than 70 unique ways to crash the server response (the Solr developers have started fixing those crashes). Check here for the full story.

So what have we learned? Never underestimate the power of a well-configured random search. Fuzzing has proved capable of finding program inputs that exercise non-trivial code paths, and for managed languages there is still a lot left to explore. Stay tuned for more updates about fuzzing and our tools - sign up to our newsletter to be the first to hear about them.