Diffblue’s tagline is AI For Code, but what does that mean exactly in the context of Diffblue Cover, our product that automatically writes Java unit tests?
The core technique used in Cover is reinforcement learning, a machine learning approach that, unlike supervised learning, does not depend on a labelled training dataset. Supervised learning (of neural networks, for example) has received the lion’s share of attention among machine learning techniques. This is largely because convolutional neural networks and other supervised approaches have solved previously unsolved computer science problems, such as highly accurate image recognition.
Supervised Learning
Supervised learning is split into two phases. The first is the training phase, in which the model is built by showing it large numbers of labelled examples. Image recognition models, for instance, are commonly trained on ImageNet, a large dataset of labelled images gathered from the Internet. The second phase is inference, in which the trained model is shown a previously unseen input and returns a prediction. For an image recognition model, the input is an image and the model infers a set of probable labels for it.
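A deliberately tiny model can illustrate the two phases. The Java sketch below is a nearest-centroid classifier. It is not a neural network and has nothing to do with ImageNet, and its class names and data are hypothetical. `train` is the training phase: it averages the labelled examples into one centroid per label. `predict` is the inference phase: it labels an unseen input.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** A labelled training example: a 2-D feature vector plus its label (hypothetical data shape). */
record Example(double x, double y, String label) {}

/** Toy nearest-centroid classifier illustrating training vs. inference. */
class NearestCentroid {
    private final Map<String, double[]> centroids = new HashMap<>();

    /** Training phase: average the feature vectors that share each label. */
    void train(List<Example> examples) {
        Map<String, double[]> sums = new HashMap<>(); // label -> {sumX, sumY, count}
        for (Example e : examples) {
            double[] acc = sums.computeIfAbsent(e.label(), k -> new double[3]);
            acc[0] += e.x();
            acc[1] += e.y();
            acc[2] += 1;
        }
        centroids.clear();
        sums.forEach((label, s) -> centroids.put(label, new double[] {s[0] / s[2], s[1] / s[2]}));
    }

    /** Inference phase: return the label whose centroid is closest to the unseen input. */
    String predict(double x, double y) {
        String best = null;
        double bestDist = Double.POSITIVE_INFINITY;
        for (Map.Entry<String, double[]> c : centroids.entrySet()) {
            double dx = x - c.getValue()[0];
            double dy = y - c.getValue()[1];
            double dist = dx * dx + dy * dy; // squared Euclidean distance
            if (dist < bestDist) {
                bestDist = dist;
                best = c.getKey();
            }
        }
        return best;
    }
}
```

Suppose the model is trained on two points labelled "cat" near the origin and two points labelled "dog" near (10, 10). Then `predict(0.5, 1)` returns "cat" and `predict(8, 9)` returns "dog". The model never sees those inputs during training.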
Unsupervised Learning
Unsupervised learning, in contrast, does not learn from labelled examples. In reinforcement learning, the approach Cover uses, there is no separate training phase on a prepared dataset. Instead, learning takes place as the algorithm searches a space of potential answers to find a probable solution to the problem at hand. In the general case, reinforcement learning uses an agent that learns to maximize the reward it collects over time. The agent takes an action on the environment, observes what happens (the resulting state), and is given a reward value (a score, if you like). A positive reward indicates that the action improved the state. A strategy function (usually called the policy) uses the current state and what has been learned from past rewards to decide the next action. It aims to maximize long-term value, meaning the cumulative reward, rather than just the next reward.
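The agent, environment, reward and strategy loop can be sketched with tabular Q-learning. This is one standard reinforcement learning algorithm, and nothing here implies that Cover uses it. The environment is a hypothetical five-cell corridor in which reaching the right-hand end earns a reward of 1. The epsilon-greedy `chooseAction` method plays the role of the strategy function. The discount factor `gamma` makes the learned values reflect long-term reward rather than immediate reward. All names and parameters are illustrative assumptions.

```java
import java.util.Random;

/** Tiny tabular Q-learning agent on a hypothetical 5-cell corridor; reaching cell 4 pays reward 1. */
class CorridorQLearning {
    static final int STATES = 5;          // cells 0..4; cell 4 is the terminal goal
    static final int LEFT = 0, RIGHT = 1; // the two available actions

    final double[][] q = new double[STATES][2]; // learned long-term value of each (state, action)
    final Random rng = new Random(42);          // fixed seed so runs are reproducible

    /** Environment: apply an action and return the resulting state. */
    static int step(int state, int action) {
        return action == RIGHT ? state + 1 : Math.max(0, state - 1);
    }

    /** Strategy (policy): epsilon-greedy over current value estimates, ties broken at random. */
    int chooseAction(int state, double epsilon) {
        if (rng.nextDouble() < epsilon || q[state][LEFT] == q[state][RIGHT]) {
            return rng.nextInt(2); // explore
        }
        return q[state][RIGHT] > q[state][LEFT] ? RIGHT : LEFT; // exploit
    }

    void train(int episodes, double alpha, double gamma, double epsilon) {
        for (int ep = 0; ep < episodes; ep++) {
            int state = 0;
            for (int t = 0; t < 100 && state != STATES - 1; t++) {
                int action = chooseAction(state, epsilon);
                int next = step(state, action);                  // act on the environment
                double reward = next == STATES - 1 ? 1.0 : 0.0;  // observe the reward
                // The terminal goal has no future value; otherwise use the best next-state estimate.
                double future = next == STATES - 1 ? 0.0 : Math.max(q[next][LEFT], q[next][RIGHT]);
                // Q-learning update: move toward reward plus discounted long-term value.
                q[state][action] += alpha * (reward + gamma * future - q[state][action]);
                state = next;
            }
        }
    }

    /** Greedy action once learning is done. */
    int bestAction(int state) {
        return q[state][RIGHT] >= q[state][LEFT] ? RIGHT : LEFT;
    }
}
```

After `train(500, 0.5, 0.9, 0.1)`, `bestAction` returns `RIGHT` in every non-goal cell. The agent is only ever rewarded at the goal, but the discounted values propagate backwards, so moving right looks best everywhere. This is the "long-term value" that the strategy function optimizes.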
Reinforcement Learning
In Diffblue Cover, we’re searching for the best set of tests we can write that exercise a particular Java method, achieving the best line coverage while also being as human-readable as possible. We do this with reinforcement learning.
Perhaps the most famous example of reinforcement learning is Google’s AlphaGo, the program that learned to play the game of Go well enough to beat several top professional players, including Lee Sedol. AlphaGo (and its successor, AlphaGo Zero) combines reinforcement learning and neural network approaches, applying each to the parts of the problem where it is the better fit.
Learning to write unit tests—without a training dataset
Supervised learning works when you can assemble a labelled dataset large enough to build a general model from. In AlphaGo, neural networks were trained on previous Go games to make two predictions: the best move, and the likelihood of winning the game. For test writing, however, we don’t have such a training dataset: there is no large collection of code matched with good tests from which to generalize. It turns out we don’t need one, because we don’t need to predict how well a Java test will work. We can simply run the test and measure how well it works.
Reinforcement learning works when you can observe the state, compute a reward, and define a strategy function that chooses the next action. In AlphaGo that’s possible because the state is the game board, and the reward and strategy functions can be computed using the two neural network predictions. In test writing, we have the state of the program. We can run the Java test we wrote against the method under test and see how well it performs, for example which lines it covers. No neural network is needed to predict anything.
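A deliberately simple stand-in shows how a reward can be measured rather than predicted. The sketch below is a greedy, coverage-guided random search. It is not reinforcement learning proper and not Diffblue’s algorithm. `Subject.classify` is a hypothetical method under test, hand-instrumented with numbered "lines". Real coverage tools such as JaCoCo instrument bytecode instead. Each candidate input’s reward is the number of lines it newly covers when it is actually executed. The boundary constants tried first are an assumption of this sketch.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;
import java.util.Set;

/** Hypothetical method under test, hand-instrumented to record which numbered lines run. */
class Subject {
    static final Set<Integer> covered = new HashSet<>();

    static String classify(int n) {
        covered.add(1);
        if (n < 0) { covered.add(2); return "negative"; }
        covered.add(3);
        if (n == 0) { covered.add(4); return "zero"; }
        covered.add(5);
        if (n > 1000) { covered.add(6); return "large"; }
        covered.add(7);
        return "positive";
    }
}

/** Greedy coverage-guided search: each candidate's reward is measured by running it. */
class CoverageSearch {
    static final int TOTAL_LINES = 7;
    final Set<Integer> totalCovered = new HashSet<>();
    final Map<Integer, String> kept = new LinkedHashMap<>(); // chosen input -> observed output

    /** Run one candidate and return its reward: the number of lines it newly covers. */
    int tryInput(int input) {
        Subject.covered.clear();
        String output = Subject.classify(input);   // execute the code instead of predicting
        Set<Integer> fresh = new HashSet<>(Subject.covered);
        fresh.removeAll(totalCovered);
        if (!fresh.isEmpty()) {                     // positive reward: keep this input
            totalCovered.addAll(fresh);
            kept.put(input, output);
        }
        return fresh.size();
    }

    /** Try boundary constants first, then random inputs, until everything is covered. */
    void search(long seed, int budget) {
        Random rng = new Random(seed);
        int[] constants = {0, 1, -1};               // assumed "interesting" values
        for (int i = 0; i < budget && totalCovered.size() < TOTAL_LINES; i++) {
            int input = i < constants.length ? constants[i] : rng.nextInt(4001) - 2000;
            tryInput(input);
        }
    }
}
```

With a budget of a few hundred candidates, the search ends up keeping four inputs: 0, 1, -1, and some value above 1000. Each one reaches a branch no earlier input did, and the search stops once all seven lines are covered. At no point does it need a model to guess how good an input is.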
From there, we can work out how to alter the test for the next iteration. Of course, there is substantial Diffblue IP in how Cover computes the reward, constructs a human-readable test and determines the next action. Making all of this work with real-world Java code written by humans also takes a great deal of expertise.
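Turning a chosen input and its observed output into a readable test is a separate step. Cover’s actual test construction is not public, so the sketch below is a hypothetical illustration only. It renders one input/output pair as JUnit 5 source text, with a descriptive method name and arrange/act/assert comments as one common readability convention. `TestRenderer` and its naming scheme are invented for this example. The sketch assumes an `int` input and a `String` output that needs no escaping.

```java
/** Hypothetical renderer: turns one observed (input, output) pair into readable JUnit 5 source. */
class TestRenderer {
    static String render(String className, String methodName, int input, String expected) {
        // Spell out negative numbers so the generated method name is a valid identifier.
        String inputLabel = input < 0 ? "Minus" + (-(long) input) : Integer.toString(input);
        String testName = methodName + "With" + inputLabel + "Returns" + capitalize(expected);
        // Assumes `expected` contains no quotes or backslashes (no escaping is done here).
        return String.join("\n",
            "@Test",
            "void " + testName + "() {",
            "    // Arrange",
            "    int input = " + input + ";",
            "    // Act",
            "    String actual = " + className + "." + methodName + "(input);",
            "    // Assert",
            "    assertEquals(\"" + expected + "\", actual);",
            "}");
    }

    /** Upper-cases the first character, e.g. "negative" -> "Negative". */
    static String capitalize(String s) {
        return s.isEmpty() ? s : Character.toUpperCase(s.charAt(0)) + s.substring(1);
    }
}
```

For example, `TestRenderer.render("Subject", "classify", -1, "negative")` produces a test method named `classifyWithMinus1ReturnsNegative` whose body ends with `assertEquals("negative", actual);`. A descriptive name and a clear layout are among the things that make a machine-written test easier for a person to review.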
Reinforcement learning also offers practical advantages: its CPU and memory requirements are relatively low compared with the massive computational effort needed to train and run neural network models. Leading neural network image recognizers need on the order of 10 GFLOPs (10 billion floating-point operations) to classify a single image. Using reinforcement learning means Cover can run on a developer laptop with just 8 GB of memory and 2 Intel CPU cores.
Machine-written, human-readable, high-coverage tests
The general approach of Cover is to use reinforcement learning to search for tests that achieve the highest line coverage while remaining readable by humans. It may sound straightforward, but it is in fact very complicated to do within the sophisticated enterprise Java applications our customers use. To see the result of this process yourself, download our free Community Edition and start creating tests for your Java code.