Transcript

Hello, and welcome back to our webinar series. The first one aired a few weeks ago and looked at why everyone loves automated testing and what problems it can solve. You can watch that on our website. Today, I’m going to talk about why everyone is so bad at automated testing. If we think automated testing is a great thing, why are we so bad at it?

So let’s take a look at some of the problems we see with test automation. Here, I have two sections: false positives and false negatives. Starting with the false positives: this is when a test case fails but there isn’t really a problem. We quite often see this with unpredictable tests, where, say, if you run them five times, they fail once or twice. This is one of the worst problems I see in automated testing: you don’t quite believe the results.

How many times have we been guilty of saying, “That test failed, and no, my code change didn’t cause that. Let’s rerun it and see if it passes”? Every time that happens, it degrades your trust in the test suite.

Testing the wrong thing: I quite often describe this as testing the implementation, not the behavior. Let’s imagine a simple calculator, and we have a test that says ‘press the bottom left-hand button, then press a button on the top right, then press the bottom right-hand button, then press the top right-hand button, and the answer is 3.’ Well, for most calculators that would be 1+2=3, but what happens when the layout of the keys changes? Maybe the symbols end up down the side rather than on the top row. This test will fail. It’s not really indicating there’s a problem; it’s indicating the layout has changed.

Much better would be: press button 1, press the + button, press button 2, press the = button. This test won’t fail if the keys move around. It will prove the behavior 1+2=3 continues to work as the buttons move around. So, testing the implementation leads to brittle tests, and brittle tests lead to a higher cost of maintaining them.
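To make that concrete, here is a minimal sketch in Java with JUnit 5 (my choice of framework; the webinar doesn’t show code), assuming a hypothetical Calculator class with press(String key) and display() methods:

```java
import static org.junit.jupiter.api.Assertions.assertEquals;
import org.junit.jupiter.api.Test;

// A sketch of the two styles, assuming a hypothetical Calculator
// class with press(String key) and display() methods.
class CalculatorBehaviorTest {

    // Implementation-coupled (don't do this): pressing keys by screen
    // position encodes the keypad layout, so the test breaks the
    // moment the buttons move, even though 1 + 2 still equals 3.
    //
    //   calc.pressAt(ROW_BOTTOM, COL_LEFT);  // happens to be "1" today
    //   ...

    // Behavior-focused: pressing keys by meaning survives any layout
    // change and only fails if the arithmetic itself breaks.
    @Test
    void onePlusTwoEqualsThree() {
        Calculator calc = new Calculator();
        calc.press("1");
        calc.press("+");
        calc.press("2");
        calc.press("=");
        assertEquals("3", calc.display());
    }
}
```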

Let’s think about false negatives now. False negatives are where the tests pass, but really there’s a problem in the system that the tests haven’t identified. So how can we get false negatives? Well, I suppose the obvious answer is low coverage: you haven’t actually written any tests that cover that area of the code. Then we have a lack of asserts. If we think specifically about unit testing, we can write a test that covers a code path, but unless we actually write an assert at the end of the test, it’s not really testing anything. All it’s showing is that the program didn’t crash.
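Here is a short sketch of that trap, with a tiny hypothetical PriceCalculator defined inline (the 20% tax rate is an invented example):

```java
import static org.junit.jupiter.api.Assertions.assertEquals;
import org.junit.jupiter.api.Test;

// A sketch of the "lack of asserts" trap, with a hypothetical
// PriceCalculator defined inline (20% tax assumed for illustration).
class PriceCalculatorTest {

    static class PriceCalculator {
        double totalWithTax(double net) { return net * 1.20; }
    }

    // Executes the code path (and earns coverage), but only proves
    // the method didn't crash -- a wrong total would still pass.
    @Test
    void coversThePathButAssertsNothing() {
        new PriceCalculator().totalWithTax(100.0);
    }

    // The same path with an assert actually pins the behavior.
    @Test
    void totalIncludesTwentyPercentTax() {
        assertEquals(120.0, new PriceCalculator().totalWithTax(100.0), 0.001);
    }
}
```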

I’ve said already that when tests are unpredictable, people stop believing the results, and I think this is crucial here. False positives are almost worse than false negatives, because as soon as people stop believing the results of a test, they are not going to believe it when it tells them there’s a real problem. Overall, having highlighted a few reasons why we’re bad at test automation and writing unit tests, I think the thing we’re really worst at is allowing unpredictable, non-deterministic tests to end up in our test suites and diminish the return we get.

So, before we talk about how to write good unit tests, let’s talk about what makes a good unit test. Here we have a list of things that make a good test. The first one is testing one thing. Sometimes this seems counterintuitive. When you are writing a test, you might think, ‘I can check that and check this; I can check five different things with this test case.’ The problem is that when that test case fails, you don’t know immediately what the problem is. By testing one thing, one specific behavior per test, you know exactly what’s broken in your product when that test fails.
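As a sketch of what that looks like in practice, here are three focused JUnit 5 tests against a minimal ShoppingCart defined inline (an invented example, not from the webinar); if any test fails, its name says exactly which behavior broke:

```java
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertTrue;
import java.util.ArrayList;
import java.util.List;
import org.junit.jupiter.api.Test;

// One behavior per test, so a red test names the broken behavior.
class ShoppingCartTest {

    static class ShoppingCart {
        private final List<String> items = new ArrayList<>();
        void add(String item) { items.add(item); }
        void remove(String item) { items.remove(item); }
        boolean isEmpty() { return items.isEmpty(); }
        int itemCount() { return items.size(); }
    }

    @Test
    void newCartIsEmpty() {
        assertTrue(new ShoppingCart().isEmpty());
    }

    @Test
    void addingAnItemIncreasesTheCount() {
        ShoppingCart cart = new ShoppingCart();
        cart.add("apple");
        assertEquals(1, cart.itemCount());
    }

    @Test
    void removingTheLastItemEmptiesTheCart() {
        ShoppingCart cart = new ShoppingCart();
        cart.add("apple");
        cart.remove("apple");
        assertTrue(cart.isEmpty());
    }
}
```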

Testing the behavior, not the implementation. We’ve already talked about this using the example of a calculator, where testing the implementation leads to a more brittle test case. I think the key point here is that implementations change more regularly than the behavior of the program. What we want to do is make sure that when people change the implementation, maybe through refactoring, maybe through changing dependencies, the behavior of the system is preserved, and that’s when unit tests really come into their own.

Writing deterministic tests: I spoke earlier about how test cases that are unpredictable are bad. We don’t want any tests relying on a random number being in a particular range, on a certain date or time, or on a certain speed of test execution. For unit testing, we want to make sure that if the product functionally does what it did before, the test cases pass.
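One common way to remove a hidden time dependency is to inject a java.time.Clock instead of calling Instant.now() directly. Here is a sketch, with a hypothetical SessionChecker defined inline so the example is self-contained:

```java
import static org.junit.jupiter.api.Assertions.assertTrue;
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneOffset;
import org.junit.jupiter.api.Test;

// A hypothetical SessionChecker that takes a Clock, so the test can
// pin "now" and stays deterministic.
class SessionCheckerTest {

    static class SessionChecker {
        private final Clock clock;
        SessionChecker(Clock clock) { this.clock = clock; }
        boolean isExpired(Instant startedAt) {
            return Duration.between(startedAt, clock.instant()).toHours() >= 1;
        }
    }

    @Test
    void sessionOlderThanOneHourIsExpired() {
        // The test controls time, so it passes regardless of when,
        // where, or how fast it runs.
        Clock fixed = Clock.fixed(Instant.parse("2024-01-01T12:00:00Z"), ZoneOffset.UTC);
        SessionChecker checker = new SessionChecker(fixed);
        assertTrue(checker.isExpired(Instant.parse("2024-01-01T10:00:00Z")));
    }
}
```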

The final point here is writing readable and maintainable tests. Ultimately, a test case is at its most useful when it fails and points out a problem to a developer; by failing, it tells the developer they have broken or changed a piece of behavior. Writing test cases that are readable and maintainable is crucial, because the next developer to look at one might be you six, twelve, or eighteen months down the line, or it might be someone completely different who looks at your test case and asks, ‘What on earth is this testing?’ Unless they understand how that test case works and what it’s trying to achieve, they can’t sensibly edit it; they are likely to delete it instead, and that’s bad.

We’ve looked at a few things that can make a unit test good. Let’s talk about how we can get started on writing unit tests. I think the first point worth considering here is that writing your first test takes much longer than writing your second. When you write your first unit test, you have to make sure you have all the dependencies in place, for example JUnit and any other frameworks you want to use, and you need to make sure you have an environment where you can run the unit tests.

You will probably want a CI environment that’s running the unit tests, and you need to make sure appropriate reports are being produced. You want to measure code coverage and the number of tests, and decide whether you want to add any extra checks, like ensuring that code coverage doesn’t decrease while you’re running these test cases. Do you want to use a tool such as SonarQube to measure the quality of your test cases? All of these things come into play when you write your first test case.

Once you’ve written your first test case, you have a frame, you have a structure, and writing the second test case is a lot quicker. So, my key piece of advice here would be: pick something simple for your first test and focus on getting the framework right. With future tests, you can focus on covering all the behaviors you want covered. Running tests in CI is very important; make sure you’re doing continuous integration with your unit tests, because as soon as you stop running them regularly, you will find their usefulness starts to degrade. Some of them will start to fail, and those failures won’t be picked up at the time the code is about to be merged in. So, make sure that when you’re adding new commits to your master branch or your develop branch, you have a clean bill of health from your unit tests.
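A “something simple” first test might be nothing more than this (a deliberately trivial JUnit 5 sketch; its real job is to prove the JUnit dependency, build wiring, CI run, and reporting all work end to end):

```java
import static org.junit.jupiter.api.Assertions.assertEquals;
import org.junit.jupiter.api.Test;

// A deliberately tiny first test: if this runs and reports in CI,
// the framework is wired up and the second test will be much faster
// to write.
class FirstTest {

    @Test
    void frameworkIsWiredUp() {
        assertEquals(2, 1 + 1);
    }
}
```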

A lot of tools that check code quality will ignore the test directory, and I think this is a big mistake. If you’ve got a toolset that measures the quality of the code, or a linter that applies coding style guides to your project, make sure they cover your test directory. Your test code needs to be of the same quality as your production code. At the end of the day, you wouldn’t accept people pushing in poor-quality code that breaks the build or doesn’t compile; in the same way, you shouldn’t be allowing people to check in poor-quality tests that can break the CI. The best way to do this, and the best way to encourage high quality standards, is to hold your test code to the same standard as your product code.

Finally, setting realistic and measurable goals. Obviously, if you’re setting out on a journey of adding unit tests, you’ve probably already got a product and quite a lot of code that you want to cover. You need to think about how you’re going to start that journey: which areas of the code base are you going to tackle first? The obvious answer is to look at the high-risk areas of the code: start with the areas that you know have defects in them and the areas that have a high frequency of changes. You can pull all these stats from your source code management system and cross-reference them with other sources, like bug tracking databases, to get an idea of where the critical code is. You’ll probably also find that, as a developer, you know some of the code that you think is a little flaky and has some problems, and that might be a good place to start.

Of course, this being a webinar from Diffblue, I’m going to talk a little bit about Diffblue Cover. Diffblue Cover will automatically generate unit tests for your existing code base. When you are setting out on your automation effort, a great way to get started is to use Diffblue Cover and run it across the whole repository. Then you’ve got a great starting point for adding more unit tests on top to achieve the goals you want to reach.

Q&A 

So we’ve got to the end of my slides, and it’s time for a little bit of Q&A. We’ve already got some questions people have asked. The first one here is: what if you don’t know what the original outcome of the code being tested was supposed to be? I’m going to assume here that we’re talking about legacy code. Where we’re not really sure what the requirement is, we’re going to find it really difficult to test the expected behavior of the code, because we don’t really know what it is. I think the first thing to ask is: is this code in production? If it is, we can assume that the current behavior is OK. And if you can assume the current behavior is OK, then you can write tests for the current behavior. That way, you will be alerted any time the behavior changes. When it does change, you need to make a choice: is it a good change or a bad change? Ultimately, there’s power in knowing that behavior has changed.
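This style is often called a characterization test. Here is a sketch, assuming a hypothetical legacy method LegacyPricer.discountFor(int): run the code once, record what it actually returns today, and pin that observed value as the expected result.

```java
import static org.junit.jupiter.api.Assertions.assertEquals;
import org.junit.jupiter.api.Test;

// A characterization-test sketch for a hypothetical legacy class.
// There is no spec, so the expected value is whatever the code
// does in production today.
class LegacyPricerCharacterizationTest {

    @Test
    void discountForCustomer42MatchesCurrentProductionBehavior() {
        // 15 is the observed output, not a derived requirement. If
        // this fails, the behavior has changed, and someone has to
        // decide whether that change was intended.
        assertEquals(15, new LegacyPricer().discountFor(42));
    }
}
```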

Next question: how do you know when not to write a test for something? This is a great question. When wouldn’t we write an automated test? When wouldn’t we write a unit test? In my view, when the test isn’t going to add any value. Typically this comes up with very simple methods. If you are looking at something like a getter, a setter, or a constructor, it’s not very valuable to write a specific test case for that code; you can tell whether there is a bug just by inspecting the three or four lines. Now, I am making the assumption here that you haven’t put any business logic into these methods; as soon as you put business logic in, it becomes valuable to test.
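A sketch of that dividing line, using a hypothetical Account class (the overdraft fee is an invented example):

```java
// A plain accessor isn't worth a dedicated test, but the moment
// business logic creeps in, it is.
class Account {
    private double balance;

    // Trivial accessor: inspecting these lines is enough; a
    // dedicated test adds little value.
    public double getBalance() {
        return balance;
    }

    // Business logic hiding in a "getter" -- this one deserves a test.
    public double getBalanceAfterOverdraftFee() {
        return balance < 0 ? balance - 25.0 : balance;
    }
}
```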

Next question: where should we mock something? That’s an excellent question, and we have a blog post on this topic coming up soon. I will pick out the obvious things to mock, for example: database access, web requests, things that go outside the system. I know we have some great content coming up on this, so take a look at our website and Twitter to see when we release that post.
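As an illustration, here is a sketch using Mockito (my choice; the webinar doesn’t name a framework), with a hypothetical CustomerRepository standing in for real database access, defined inline so the example is self-contained:

```java
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;
import org.junit.jupiter.api.Test;

// Mock the boundary that leaves the system (here, the database),
// and test the logic on our side of it.
class CustomerServiceTest {

    interface CustomerRepository {
        String findName(long customerId);
    }

    static class CustomerService {
        private final CustomerRepository repo;
        CustomerService(CustomerRepository repo) { this.repo = repo; }
        String greetingFor(long id) { return "Hello, " + repo.findName(id) + "!"; }
    }

    @Test
    void greetingUsesTheCustomersName() {
        // Replace the database-backed repository with a mock, so the
        // test is fast and deterministic.
        CustomerRepository repo = mock(CustomerRepository.class);
        when(repo.findName(7L)).thenReturn("Ada");

        assertEquals("Hello, Ada!", new CustomerService(repo).greetingFor(7L));
    }
}
```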

Finally: how do you tackle different code styles? For example, test cases laid out as arrange, act, assert versus putting everything into one line. So, there are lots of different styles for writing test cases, and there’s a little bit of personal preference that comes in here. Let’s also remember that the test cases need to be useful to yourself, when you come back to look at this code, and to other people. In terms of coding style, I would definitely refer back to the coding standards that you apply to the whole repository.

A great answer to this is to have a specific coding style for your tests that covers things like: do you always have arrange, act, assert sections? Do you have a particular style for laying out your test cases? Do you have messages with every assert, or is that superfluous in your environment? These kinds of questions are very much individual choices, but what works best is being consistent across the repository.
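For reference, here is what the arrange, act, assert layout looks like in a JUnit 5 test, with a tiny BankAccount defined inline (an invented example) so the sketch is self-contained:

```java
import static org.junit.jupiter.api.Assertions.assertEquals;
import org.junit.jupiter.api.Test;

// The arrange/act/assert layout: each section is visually separated,
// so a reader can see setup, behavior, and expectation at a glance.
class BankAccountTest {

    static class BankAccount {
        private double balance;
        BankAccount(double openingBalance) { this.balance = openingBalance; }
        void deposit(double amount) { balance += amount; }
        double balance() { return balance; }
    }

    @Test
    void depositIncreasesTheBalance() {
        // Arrange: build the objects the test needs.
        BankAccount account = new BankAccount(100.0);

        // Act: perform the one behavior under test.
        account.deposit(50.0);

        // Assert: check the single expected outcome.
        assertEquals(150.0, account.balance(), 0.001);
    }
}
```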

So that concludes the Q&A section and our webinar. Do keep an eye on our website and Twitter for our next webinar in the series, and thank you for listening.