Modern development strategies employ different tactics to deliver code more quickly, from agile planning to cross-functional teams to ‘shift left’. Unit testing has an important role to play: it accelerates cycle times by detecting regressions at the earliest possible stage. Since testing remains a significant bottleneck in most CI pipelines, the gains can be substantial.
In this webinar, we cover:
1. The Critical Role of Unit Testing in Software Delivery
Main Theme: Unit testing is fundamental to delivering software faster and maintaining high code quality.
- Speed and Responsiveness: “The quicker and faster you can release means you can release more often which means you can respond to your environment quicker.” This includes responding to market opportunities, competitive pressures, and critical bug fixes.
- Quality as a Byproduct of Speed: “a company that’s able to ship faster is a company that must have good quality practices in place.” Fast shipping indicates robust quality assurance.
- Preventing Production Bugs: Bugs reaching production are extremely costly and time-consuming to fix. “bugs that are making it into the field can take hours, days, weeks, months. I had one bug that took over a year to track down.”
- Shifting Left: Finding and fixing bugs earlier in the development cycle drastically reduces time and cost. “as we shift left further away from production the time it takes to find and fix defects reduces.”
- Developer Accountability and Context Switching: Unit tests empower developers to find and fix bugs on their desktops before moving on. This avoids “context switching,” which “kills velocity” and occurs when a developer is pulled back to fix old code.
- Surgical Precision: Unlike integration tests, which only indicate that something failed somewhere, unit tests are “surgical,” testing “every single method very precisely.” This allows developers to know “exactly and we say surgically where that defect needs to be fixed.”
- Unit Tests as Documentation: “because unit tests are describing every single behavior of every single method in your application the unit test itself is documentation.” This aids new developers or those working on legacy code by clearly defining expected behavior.
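The “surgical precision” and “tests as documentation” points above can be illustrated with a toy sketch. The `PricingTest` class and `unitPrice` method below are invented for illustration (they are not from the webinar), and plain if/throw checks stand in for JUnit assertions so the snippet stays self-contained:

```java
// Hypothetical pricing method plus the unit checks that describe it.
// In a real project these would be JUnit test methods; plain if/throw
// checks keep this sketch runnable without any test framework.
public class PricingTest {

    // Method under test: orders of 100+ units get a discounted unit price.
    static double unitPrice(int quantity) {
        if (quantity <= 0) {
            throw new IllegalArgumentException("quantity must be positive");
        }
        return quantity >= 100 ? 9.0 : 10.0;
    }

    static void check(boolean condition, String behavior) {
        if (!condition) throw new AssertionError("regression in: " + behavior);
    }

    public static void main(String[] args) {
        // Each check pins down one behavior of one method, so a failure
        // points surgically at the code to fix -- and the checks double
        // as documentation of the method's expected behavior.
        check(unitPrice(1) == 10.0, "small orders pay full price");
        check(unitPrice(99) == 10.0, "boundary: 99 units is not bulk");
        check(unitPrice(100) == 9.0, "bulk orders get the discount");
        System.out.println("all behaviors verified");
    }
}
```

A new developer reading these checks learns the discount boundary without opening a spec.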
2. Characteristics of Good vs. Poor Unit Testing
Main Theme: Not all unit testing is equally effective; quality hinges on who writes the tests and their granularity.
- Good Unit Testing (Written by the Developer): Tests should be written by the developer at the time of the code change, which maintains accountability and lets issues be fixed immediately.
- Granular and Surgical: They “test every single pathway through each method to validate each individual behavior,” ensuring precise defect identification.
- Poor Unit Testing (Written by QA Engineers Later): QA engineers writing unit tests after code completion leads to delays, constant developer interruptions, bias, and a loss of developer accountability for code quality.
- Integration Tests Misrepresented as Unit Tests: “integration tests are a tool on top of unit testing.” While valuable, they test end-to-end functionality, not individual method behaviors, making precise defect identification difficult. “when a integration test fails you are not going to know why it failed and exactly which piece of code needs to be repaired.”
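What “test every single pathway through each method” means can be sketched with a small hypothetical example (the `shippingCost` method below is invented, not code from the webinar): a method with three pathways gets one focused check per pathway, so a failure names the exact branch that broke.

```java
// Hypothetical sketch: one granular check per pathway through a method,
// using plain if/throw in place of JUnit assertions for self-containment.
public class ShippingTest {

    // Method under test: three pathways (invalid input, free, flat rate).
    static double shippingCost(double orderTotal) {
        if (orderTotal < 0) throw new IllegalArgumentException("negative total");
        if (orderTotal >= 50.0) return 0.0;   // pathway 2: free shipping
        return 4.99;                          // pathway 3: flat rate
    }

    public static void main(String[] args) {
        if (shippingCost(50.0) != 0.0) throw new AssertionError("free-shipping pathway");
        if (shippingCost(10.0) != 4.99) throw new AssertionError("flat-rate pathway");
        boolean threw = false;
        try { shippingCost(-1.0); } catch (IllegalArgumentException e) { threw = true; }
        if (!threw) throw new AssertionError("invalid-input pathway");
        System.out.println("every pathway validated");
    }
}
```

An integration test that places a full order would exercise all three branches at once; when it fails, none of this pathway-level information is available.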
3. The Unit Test Velocity Paradox and Limitations of Code Coverage
Main Theme: Traditional unit testing is time-consuming, and simple code coverage metrics are insufficient indicators of quality or risk mitigation.
- Time Consumption: Developers “spend anywhere between 25 and 50 percent of their time writing unit tests.” One quote highlights this: “We end up spending more time writing unit tests than we do actually writing business logic.” This creates the “unit test velocity paradox” – testing slows down delivery.
- Limitations of Code Coverage (Focus on Low-Complexity Code): Developers often test “only the lowest complexity code because it’s easiest to test,” neglecting critical, complex business logic that is harder to test but carries higher risk. “code coverage is not the answer to knowing that you’ve got good quality code and that you’ve reduced risk.”
- Lack of Assertions: Code coverage only measures execution, not validation. An extreme example shows 100% code coverage without any assertions, meaning “this will never catch a regression.” Developers can also miss or bias assertions.
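The coverage trap above is easy to reproduce. In this invented example (not from the webinar), both “tests” execute 100% of `reverse()`, but only the second can ever fail:

```java
// Illustrates coverage without validation: both tests below produce
// identical coverage numbers, but only one can catch a regression.
public class CoverageTrap {

    static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    // Executes every line of reverse() -> 100% coverage, but asserts
    // nothing, so it passes even if reverse() is completely broken.
    static void coverageOnlyTest() {
        reverse("abc");
    }

    // Same coverage, but the assertion pins the expected behavior.
    static void regressionCatchingTest() {
        if (!reverse("abc").equals("cba")) throw new AssertionError("reverse is broken");
    }

    public static void main(String[] args) {
        coverageOnlyTest();
        regressionCatchingTest();
        System.out.println("identical coverage, very different value");
    }
}
```

A coverage report alone cannot distinguish these two tests, which is the point: coverage measures execution, not validation.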
4. Understanding AI: Supervised vs. Unsupervised Methods
Main Theme: AI can be categorized into two main types, with unsupervised methods being crucial for automated unit test generation.
- Supervised AI (e.g., Google Photos): Pre-training and labeling require “a huge bank of images” analyzed “offline” to group similar data, which is then labeled (e.g., “bus”).
- Computational Effort: Demands “a huge amount of computational effort because we’re doing all of this in advance.”
- Non-Deterministic: “the AI has learned the AI has maybe got better at determining what a bus looks like but may have also got worse.” Answers can change over time.
- Bias: “biased towards the inputs.” If trained only on London buses, it won’t recognize a Tokyo bus.
- Unsupervised AI (e.g., AlphaGo): Learns in real time by applying specific rules to bound “almost limitless moves.” AlphaGo “has never played go before” a match and “forgets it when it then goes on to its second game.”
- Lower Computational Effort: By focusing on possibilities within rules, it’s computationally more efficient.
- Deterministic: “you’re always going to get the same moves for a given board layout.” This is critical for regulated environments like finance.
- Zero Bias: “computationally, there is zero bias because we’re using those rules, we cannot break the rules.”
- Relevance to Unit Testing: “When we’re talking about writing unit tests using AI, we’re using an unsupervised method.” This is because there’s an “infinite number of unit tests possible” but “very specific rules” (Java syntax, frameworks). They use a “probabilistic search” based on rules and statistics to find the right unit test.
5. Diffblue Cover: AI-Powered Unit Test Automation
Main Theme: Diffblue Cover utilizes unsupervised reinforcement learning to automatically generate, maintain, and optimize human-like unit tests, addressing the velocity paradox.
- Reinforcement Learning Process (Initial Guess): Analyzes code and project configuration using “rules of Java” and frameworks to make an initial guess at a unit test.
- Execution and Measurement: Runs the test and measures its “goodness” based on coverage and regression-catching potential.
- Modification and Iteration: Modifies the test based on the score, then reruns, repeating “thousands of times until we get the test that is exercising the behavior of the code and that can actually catch regressions and can write unit tests that look human-like.”
- In-Production Workflow: Diffblue generates a baseline of unit tests for existing code.
- Regression Detection: When a developer makes a change, running these same tests immediately “tell you about how your code change changed the behavior of your application at the most granular level.”
- Automated Updates: Diffblue “has run and updated the unit tests” when code changes, addressing the “hidden cost of unit testing” maintenance.
- Code Review Aid: Displays the “diff between the before and after behavior,” showing how a change impacted existing tests and validating new behavior.
- Key Benefits (Speed): Generates tests rapidly; an S3 upload test took 1.6 seconds to generate, versus the “half an hour” a human would need.
- Human-like Quality: Tests are “human-like” and serve as good documentation.
- Accuracy (No Human Bias): Assertions are written by AI, ensuring “no bias from the human here making a mistake of missing an assertion or biasing an assertion.”
- “Year’s Worth of Code in 8 Hours”: Goldman Sachs used Diffblue Cover on a project and found it “wrote a year’s worth of code in eight hours,” doubling their coverage with high-quality, regression-catching tests.
- CLI Version (CI Pipeline): Integrates with any CI tool (GitHub, Jenkins, GitLab) to provide consistent test generation and policy enforcement across teams.
- Analytics Tool: Provides deep insights into code quality, coverage ceilings (why code isn’t testable), and areas of risk.
- Cover Optimize: “helps you to speed up your release process by only running the tests that are actually going to possibly find a regression.” Reduces test run times significantly (e.g., from 1 hour 20 minutes to 15 minutes at Diffblue itself).
- Cover Refactor: Automatically makes untestable legacy code testable (e.g., adding getters for private fields), saving developer time.
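The generate, run, score, modify loop described in this section can be sketched as a toy search. To be clear, this is not Diffblue’s actual algorithm; it is an invented illustration of the general idea of mutating candidate test inputs and keeping whichever candidate scores best (here, by how many branches of a hypothetical `classify` method it exercises):

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

// Toy illustration of a generate -> run -> score -> modify loop.
// NOT Diffblue's algorithm: a hypothetical hill-climbing search over
// candidate test inputs, scored by how many branches they cover.
public class TestSearchSketch {

    // Code under test: three branches to cover.
    static String classify(int x) {
        if (x < 0) return "negative";
        if (x > 100) return "large";
        return "normal";
    }

    // Score a candidate input set by the distinct branches it exercises.
    static int score(int[] inputs) {
        Set<String> branches = new HashSet<>();
        for (int x : inputs) branches.add(classify(x));
        return branches.size();
    }

    public static void main(String[] args) {
        Random rng = new Random(42);          // fixed seed: deterministic run
        int[] best = {0, 0, 0};               // initial guess covers one branch
        for (int iter = 0; iter < 1000 && score(best) < 3; iter++) {
            int[] candidate = best.clone();
            candidate[rng.nextInt(3)] = rng.nextInt(401) - 200; // modify one input
            if (score(candidate) > score(best)) best = candidate; // keep improvements
        }
        System.out.println("branches covered: " + score(best));
    }
}
```

In the real tool this loop also scores candidates on regression-catching ability and readability, and repeats “thousands of times,” but the shape, guess, measure, improve, is the same.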
6. Q&A Highlights
- CI Tool Integration: Diffblue Cover (CLI version) can integrate with any CI pipeline tool that can run a command-line package (e.g., GitHub, GitLab, Jenkins).
- Real-time Savings: After baseline tests are written, the significant savings come from developers no longer spending 25-50% of their time writing unit tests. This translates to “a 66% increase in performance in velocity.”
- Security/Data Transfer: As an unsupervised model, Diffblue Cover does not send any data to the cloud. All AI processing occurs locally on the build server or in the IDE. “there’s no data that goes anywhere; it’s all done on your build server or in your IntelliJ IDE.”
- Test-Driven Development (TDD): Diffblue doesn’t replace TDD but complements it. Developers can use TDD for the “most critical code” (e.g., 10% coverage), and then “let Diffblue come in afterwards and write all of the unit tests that are going to catch regressions.”
- Supported Languages: Currently focused on Java and specifically supports the Spring framework. Scala and Kotlin are being considered for future support.
- Final Takeaway: “Ask yourself: is it valuable? Is it able to catch regressions? If your unit testing is not able to catch regressions, it’s not worth doing the way you’re currently doing it.” The ultimate measure of good unit testing is its ability to catch regressions.