Welcome to this Diffblue webinar on how artificial intelligence can help enhance shift left. My name is Mathew Lodge and I’m the CEO at Diffblue. My background is product development, and I have 25 years of experience in the industry starting out as a software developer and moving through product and marketing roles in the technology industry. I was previously SVP of product at Anaconda and VP of cloud services at VMware.
Let’s talk about shift left. The purpose of this webinar is to talk about shifting left: doing more testing sooner in the software development life cycle. In this chart that we found on Devopedia, you can see that shift left is about testing early and testing often, as opposed to the traditional quality model, which really comes from the waterfall world, where testing takes place much later in the cycle. Essentially, shift left is very similar to the testing methodologies and quality theories that came out of the manufacturing world. The idea is that you want to find defects as quickly as possible, before they go further into the production pipeline. Defects are much more expensive to fix later on, and it turns out the same is true for software; so shift left is the same idea: test early and test often.
So, without shift left, what happens in pipelines is that bugs don’t get caught early, and therefore they cause breakages or failures in the code later on, and the time taken to find and fix those errors dramatically increases. Think about bugs that could have been caught early on with unit testing. If the breakage is caught later on, after integration tests, the original developer has probably moved on to the next task. Now the developer has to stop what they are currently working on, context-switch back to that piece of code, and figure out the source of the failure. Then they write the change and submit it, then switch back to the new task, and that’s why it takes longer.
So when everyone is context switching, it’s much more difficult to triage the failure and figure out what caused it. This is much harder later on in the cycle than it is early on, where you’ve got the single commit from the developer, and with the unit test you can find a failure right away. Also, some of the tests that run later in the cycle just take a lot longer to run. They might be end-to-end tests that simulate the production environment, for example, or they might run in the production environment itself. In that case, you might have a failure after you’ve deployed, and those are always expensive.
The idea of shift left looks pretty easy, right? You are going to be running unit tests on every single commit. The idea is to find most of the logic bugs that have to do with the individual code in each unit. You want to identify those bugs and those regressions as quickly as possible as part of that commit cycle. So it’s an automated test that can run right away to test a branch, before it goes into the mainline, and then the integration tests later on catch integration issues separately from the logic of the individual units. But how do all these things come together? It sounds great in theory.
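To make “fast tests on every single commit” concrete, here is a minimal sketch in Java. The `PricingSketch` class, its `applyDiscount` method, and the discount rule are all invented for illustration, not taken from the webinar; the point is that a unit test of pure logic runs in milliseconds and can gate every commit:

```java
public class PricingSketch {
    // Hypothetical unit under test: pure logic, no I/O, so a test of it
    // runs in milliseconds and can run on every commit before merge.
    static double applyDiscount(double price, int quantity) {
        if (quantity >= 10) {
            return price * 0.5; // 50% bulk discount on orders of 10 or more
        }
        return price;
    }

    // Minimal commit-time checks; a real project would express these as
    // JUnit test methods wired into the CI pipeline.
    public static void main(String[] args) {
        if (applyDiscount(100.0, 10) != 50.0) throw new AssertionError("bulk discount");
        if (applyDiscount(100.0, 1) != 100.0) throw new AssertionError("no discount");
        System.out.println("all checks passed");
    }
}
```

A failing assertion here fails the branch build immediately, while the developer still has the change in their head — exactly the commit-time feedback loop described above.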
We’ve recently done a developer survey, because across our customer base, the companies we talk to, and the developers we talk to, we’ve seen that most organizations find it very difficult indeed to get good unit test coverage.
In that survey, over 300 developers in the US and UK said they spend around 20% of their time writing unit tests, which is quite a high amount given that they spend about 35% of their time writing tests of all kinds. 48% of developers found it very hard to meet the unit test requirements being set by their organizations. It’s difficult to write that number of unit tests, and 42% of developers say they have skipped writing unit tests in order to speed up development.
In other words, they essentially traded away time spent writing those tests in order to meet their goals and the timelines of the organization. 33% of developers wish they didn’t have to write unit tests at all, and that one is not too surprising if you think about it. Writing unit tests is very detailed and somewhat boring and repetitive, and those are the kinds of tasks the human brain finds difficult. Our attention wanders, and it is mentally difficult to maintain concentration on that sort of challenge.
So writing these tests in the first place is quite difficult, which is why organizations run into trouble. One of the things that AI is capable of doing is writing these shift-left tests from existing code. The idea is that you can analyze the existing code of the application, and the AI is able to write a set of tests that reflect the current logic. These are unit tests — not end-to-end tests or integration tests, just unit tests.
Essentially, what they do is capture the current behavior of the application. The idea is that you can run these tests and they get good coverage, in the sense that they provide inputs to those methods that will follow the different branches and the control flow of the method, and they will also check the result that comes back from the method. So it’s not just ‘do we hit all the lines?’ but ‘did we test that the answer that comes back is the correct answer?’ — and AI is capable of doing this today.
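The distinction between hitting lines and checking answers can be shown with a small Java sketch. The `classify` method and its values are invented for this illustration: both tests below execute every line of the method, but only the second would catch a regression in the returned value:

```java
public class CoverageVsAssertion {
    // Invented unit under test.
    static String classify(int age) {
        if (age < 18) return "minor";
        return "adult";
    }

    public static void main(String[] args) {
        // Line coverage only: executes both branches but checks nothing,
        // so a regression in the return values would go unnoticed.
        classify(10);
        classify(30);

        // Coverage plus assertions: exercises the same branches AND
        // checks the answer that comes back.
        if (!classify(10).equals("minor")) throw new AssertionError();
        if (!classify(30).equals("adult")) throw new AssertionError();
        System.out.println("assertions passed");
    }
}
```

Coverage figures alone would rate both halves of `main` equally, which is why the tests described here assert on the returned result as well as driving the branches.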
As a result, you get tests that run quickly and can find regressions, because they capture the current behavior, so new behavior will deviate from the logic of the previous code. That is how you will find those regressions (and of course, they can improve your coverage). The idea is that they are also easy to understand: they should be simple enough in their construction that a developer can look at a failing test, understand what it is doing very quickly, and then go figure out what the problem is.
The whole idea here is not to produce perfect tests, because AI fundamentally doesn’t have the capability to do that. All it understands is the code; it doesn’t understand the intent of the program, it doesn’t understand what the user stories are or anything like that. All it knows is the current code. Martin Fowler said back in 2006, when he was talking about continuous integration: ‘Imperfect tests, run frequently, are better than perfect tests that are never written at all’ — and if our developer survey is correct, then a lot of these tests are never written at all.
So what does an AI-written test look like? I wanted to make this very concrete. Here’s an example of an AI-written test on the right-hand side; on the left-hand side we have the source code, the Java method from a tic-tac-toe game (or noughts and crosses, if you’re British like me).
So we have some Java code that plays tic-tac-toe, and we’re looking at the method used to check whether one of the players has won. You can see the logic on the left-hand side, looking at the rows and columns of the board to see if a particular player has won. This is a test to check whether someone has won by getting a column of their symbol, an O or an X. What the AI is able to do is analyze the program and understand the control flow and the data flow — so we are looking at reachability, and we’re looking at the result that is generated — and work backwards from those in order to construct a test that will exercise the code.
You can see on the right-hand side that the AI has put together a board: it creates these arrays of integers, which is the representation the program uses for the board. It has put together a board that is a winner for player two, and then it feeds that into the method and checks the result — it asserts that player two has won, as you can see there.
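The slide code itself isn’t reproduced in this transcript, but a hypothetical reconstruction in Java gives the flavour of the example. The class name, the method name `hasWon`, and the 0/1/2 board encoding are all assumptions; what matches the description is the shape of the test: an integer-array board built to make player two win by a column, fed into the winner check, with an assertion on the result:

```java
public class TicTacToeSketch {
    // 0 = empty, 1 = player one, 2 = player two (encoding assumed for this sketch)
    static boolean hasWon(int[][] board, int player) {
        for (int col = 0; col < 3; col++) {
            if (board[0][col] == player && board[1][col] == player && board[2][col] == player) {
                return true; // player owns a full column
            }
        }
        for (int row = 0; row < 3; row++) {
            if (board[row][0] == player && board[row][1] == player && board[row][2] == player) {
                return true; // player owns a full row
            }
        }
        return false; // diagonals omitted to keep the sketch short
    }

    // The kind of test the AI derives: a concrete board that drives the
    // column-check branch, plus an assertion on the value that comes back.
    public static void main(String[] args) {
        int[][] board = {
            {2, 1, 0},
            {2, 1, 0},
            {2, 0, 1}
        };
        if (!hasWon(board, 2)) throw new AssertionError("expected player two to win");
        System.out.println("player two wins by the first column");
    }
}
```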
This is also a good example of how AI can help generate unit tests that reflect current behavior. The important thing to know is that it is not going to catch errors of logic, because all the AI has to go on is the code itself. If we assume the code is correct (as it will be, most of the time), and we assume that over time errors in the logic are fixed so the AI can work from the best working copy of the program, then that enables the AI to write a test that reflects the current logic. Therefore, if the test breaks, it means one of two things: either there’s been a deliberate change to the logic of a function, so it now has to do something differently, or a regression has been introduced, and the developer can look at the failing test.
Then they can make the call: either the test is failing because there has been a deliberate change in the functionality and the test reflects the old functionality (and that’s why it failed), or the test is catching an involuntarily introduced regression, in which case they can go and fix their code, the test will pass, and on they go. The software is then able to regenerate tests based on the new, updated code: once the change has passed, the tests can be regenerated automatically.
AI can help shift left in two different areas. For software developers, it can improve the velocity and the quality of the code: it generates an automated suite of tests that is updated directly by the AI, and therefore allows the developer to spend more time on the things that only they can do. For DevOps teams, it improves quality and catches more regressions — catching errors early is going to help them with shift left. It’s also going to help you find more problems earlier in the pipeline, when they are quicker and cheaper to fix. This will help you improve your deployment frequency and your time to recovery.
So hopefully this was useful for you. This is the role of AI in shift left testing, my name is Mathew Lodge, thanks very much for your time and attention.