Case Study

Autonomous, high-quality refactoring with Diffblue Cover


Looking for a more effective AI-driven approach to unit testing, the custody technology team at a financial services customer replaced GitHub Copilot and a homegrown AI system with Diffblue Cover. In just two months, the AI agent boosted test coverage by 180% across 25 Java applications, with several surpassing an 80% coverage threshold.

180% increase in coverage

80%+ quality threshold met

THE CHALLENGE: Accelerating unit testing while modernizing a financial services custody technology platform.

At the core of this global financial services company is a technology backbone that safeguards over $50 trillion in client assets. As at many financial institutions, its engineers grapple with integrating aging legacy systems while balancing innovation, risk management, and the demands of high performance and scalability.

To modernize its systems, this financial institution is undergoing a comprehensive transformation, migrating to a platform-operating model to manage clients’ digital assets in a resilient, secure, and compliant way across the entire financial life cycle. Achieving this requires extensive refactoring of its technology systems.

The custody division, in particular, needed to improve the quality and maintainability of code across multiple Java-based applications, making it easier to understand, maintain, and extend the applications on which its business depends.

As they worked to modernize their platform, the overarching issue was that the legacy applications generally lacked proper unit test coverage, making it impossible to detect unintended breakages caused by code refactoring until it was too late in the process to fix efficiently. Key hurdles in the refactoring process included:

Inconsistent Test Coverage: The applications had varying levels of test coverage, with most hovering around 50% and some older applications closer to 10%. (Goal: 80%)
Quality of Existing Tests: The quality of past unit tests varied widely, often depending on the developer’s skill level. Some developers were more thorough (or more experienced) in their approach to unit testing, while others treated it as a “box-ticking” exercise. (Unit test writing can be tedious!)
Scalability: One of the core client reporting applications has over 100,000 lines of code (LOC). The team sought to automate the otherwise time-intensive manual process of writing unit tests, and to accelerate testing at scale. (Time better spent on innovation.)
AI Tool Limitations: Though they had used homegrown and commercial AI tools to assist with unit test generation, these tools didn’t operate effectively at scale, in part because they often produced tests that would not compile or pass, thus requiring considerable manual rework. (Could an autonomous unit testing agent be the solution?)
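To make the refactoring risk described above concrete, here is a minimal sketch of a characterization-style check: it pins down what a piece of legacy code currently does, so a later refactoring that changes the result fails immediately. The class, the custodyFee method, and the fee rule are hypothetical and invented purely for illustration; they are not taken from the customer's code or from any tool's output. A real project would normally express the same checks in a test framework such as JUnit.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public final class CharacterizationSketch {

    // Hypothetical legacy method (assumption): a fee in basis points,
    // rounded half-up to two decimal places.
    public static BigDecimal custodyFee(BigDecimal assets, int basisPoints) {
        return assets.multiply(BigDecimal.valueOf(basisPoints))
                .divide(BigDecimal.valueOf(10_000), 2, RoundingMode.HALF_UP);
    }

    // Tiny stand-in for a test framework assertion; compares numerically,
    // ignoring differences in scale such as 12.5 vs 12.50.
    public static void expectEquals(BigDecimal expected, BigDecimal actual) {
        if (expected.compareTo(actual) != 0) {
            throw new AssertionError("expected " + expected + " but got " + actual);
        }
    }

    public static void main(String[] args) {
        // Each check records the behavior observed before refactoring.
        expectEquals(new BigDecimal("12.50"), custodyFee(new BigDecimal("25000"), 5));
        expectEquals(new BigDecimal("0.17"), custodyFee(new BigDecimal("333"), 5));
        expectEquals(new BigDecimal("0.00"), custodyFee(BigDecimal.ZERO, 5));
        System.out.println("All characterization checks passed");
    }
}
```

If a refactoring changed the rounding mode or the divisor, the second check would fail at once rather than surfacing late in the process.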

The architect tasked with evaluating software for the company sought a more accurate and scalable approach to unit testing across their application portfolio. With a focus on speed and quality, one of the custody tech engineers recommended Diffblue Cover.

THE SOLUTION: Diffblue Cover, an enterprise-grade AI agent for automating the generation and management of Java unit tests.

To overcome these limitations, the custody division piloted Diffblue Cover, a deterministic AI agent purpose-built for automated Java unit test creation. Recommended by an engineer familiar with the tool from a previous organization, Diffblue offered a fundamentally different approach from assistant-based tools like Copilot.

Rather than suggesting test snippets, Diffblue Cover autonomously analyzes compiled Java bytecode to understand application behavior and generate executable, production-grade unit tests. The tool runs entirely on-prem and integrates directly into CI pipelines, allowing for unsupervised test generation without compromising security or developer flow.

EVALUATION: Refactoring with speed, precision, and scale

A team of five developers used Diffblue Cover to generate tests across 25 applications during the pilot. Unlike previous tools that broke down at scale, Diffblue performed consistently across both legacy and modern modules.

Results were substantial:

180% increase in test coverage
27,618 new tests generated—up from a manual baseline of just over 3,000
Several applications exceeded the 80% coverage gate, supporting the institution’s compliance targets
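For readers who want to relate a relative increase such as 180% to a fixed gate such as 80%, the arithmetic can be sketched as follows. The case study does not state the baseline behind the 180% figure, so the 30% baseline used here is a hypothetical value chosen only to show the calculation; the class and method names are likewise illustrative.

```java
public final class CoverageMath {

    // Relative increase, in percent, from one coverage level to another.
    public static double relativeIncreasePercent(double before, double after) {
        if (before <= 0) {
            throw new IllegalArgumentException("baseline coverage must be positive");
        }
        return (after - before) / before * 100.0;
    }

    // Coverage level reached after applying a relative increase to a baseline.
    public static double afterRelativeIncrease(double before, double increasePercent) {
        return before * (1.0 + increasePercent / 100.0);
    }

    // A gate is met when coverage is at or above the threshold.
    public static boolean meetsGate(double coveragePercent, double gatePercent) {
        return coveragePercent >= gatePercent;
    }

    public static void main(String[] args) {
        double baseline = 30.0; // hypothetical baseline, not from the case study
        double after = afterRelativeIncrease(baseline, 180.0);
        System.out.println("Coverage after increase (percent): " + after);
        System.out.println("Meets 80% gate: " + meetsGate(after, 80.0));
    }
}
```

Note that a relative increase is not the same as a gain in percentage points: under the hypothetical 30% baseline, a 180% increase adds about 54 points of coverage.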

Diffblue Cover’s impact was immediate. Developers no longer had to halt feature work to backfill brittle tests. Legacy codebases once considered untouchable were suddenly testable. More importantly, the tool consistently produced valid, compiling, behavior-accurate tests without any manual rework.

THE OUTCOMES: Diffblue Cover dramatically increases test coverage, replacing GitHub Copilot and the institution’s homegrown AI tool.

Coverage at Scale: As the only fully automated AI unit testing solution, the Diffblue agent generated unit tests quickly, out of the box, and without developer intervention. Over the course of the pilot, all applications showed improvement, with many exceeding the 80% coverage gate by a significant margin.

Improved Quality: By automating unit test generation, Diffblue Cover addressed the challenges of testing complex code in applications developed over half a decade ago. The AI agent ensured consistent test quality, moving beyond “box-ticking” exercises. It also freed senior developers from having to pause their current work to manually test and reconfigure outdated code.

Developer Efficiency: By offloading the burden of writing and maintaining unit tests to Diffblue Cover, development teams were able to reallocate valuable engineering time toward higher-impact work. Senior engineers, in particular, shifted their attention from repetitive and time-consuming test creation to more strategic initiatives such as architectural modernization, performance optimization, and feature delivery.

Seamless Integration into CI: Diffblue Cover was embedded directly into the organization’s CI/CD pipelines, allowing automated test generation to occur continuously as part of the development lifecycle. By operating at this infrastructure level—without requiring manual intervention or IDE-level prompts—Diffblue eliminated the typical bottlenecks caused by permission restrictions or inconsistent developer access.

With all the challenges identified and tackled during the pilot, the financial services institution quickly decided to move forward with Diffblue Cover as its Java unit testing solution, supporting their ongoing transformation efforts.

THE DIFFBLUE ADVANTAGE: Improved code quality, enhanced developer productivity, and application modernization acceleration.

Poor unit test coverage is a common challenge of applications in need of modernization. Legacy code is often fragile, hard to understand, and poorly documented, making unit test writing a difficult and time-consuming task.

Diffblue’s autonomous AI can help you test your business-critical applications faster and with less risk. Unlike LLMs or code completion tools, our technology uses reinforcement learning to generate code that is guaranteed to compile, run, and be correct every time. Plus, it operates on-prem, so your code stays within your own environment.

With comprehensive integrations for CI pipelines, Diffblue Cover seamlessly fits into your current development and testing workflows with minimal disruption, which is critical for refactoring legacy applications.

Want to evaluate Diffblue Cover for yourself?

Reach out to us today for a demo and discover how Diffblue Cover can help you increase test coverage, reduce risk, and accelerate your modernization journey.
