Michal Zalecki

Rethinking TDD: Modern AI Workflow for Better Software

One of the use cases that software engineers get excited about in the context of AI coding assistants is generating tests for their code. It is a reasonable candidate for automation, as writing tests can take up a significant portion of the time spent contributing a solution to the codebase.

TDD has been proclaimed dead many times. Is TDD even more dead now, with AI code assistance workflows? Perhaps!

It is clear that AI code assistance is influencing how we, as software developers, write code. The overarching goal is to reduce the time and cost of building software. Your approach will vary depending on your task and the type of system you are working on. There is a difference between bootstrapping a prototype and contributing to a large production system.

I would like to explore the idea that tests enable us to create better code with AI, and that it is all about the feedback loop rather than test-first fundamentalism.

Step 1: Start with the Interface

Whether you are about to write something small, like a single utility function, or something that requires more elaborate planning, like a new domain object, start with an interface.

interface Money {
  currencyCode: string; // e.g. "USD"
  units: number; // whole units of the amount
  nanos: number; // fractional part in nano (10^-9) units
}

export function value(money: Money): number {
  throw new Error("not implemented"); // stub only for now
}

export function add(money1: Money, money2: Money): Money {
  throw new Error("not implemented"); // stub only for now
}

Starting by defining an interface helps me think about structure and behaviour without delving into implementation details just yet, which leads to better software design. In the context of making AI-assisted coding workflows more efficient, it also provides very helpful context for the next step.

Step 2: Generate Tests

The interface defined in the previous step allows me to generate a test structure that reasonably identifies the behaviours I might want from this module. The model correctly identified the problematic case of trying to add two Money objects whose currencies differ.

describe("value", () => {
  it("should convert money to decimal value", () => { ... });
});

describe("add", () => {
  it("should add two money values", () => { ... });
  it("should handle nanos overflow", () => { ... });
  it("should throw when adding different currencies", () => { ... });
});

Now I can take a moment to review the generated test cases and modify them, either extending them or swapping in more relevant test data. I can also ask for all the value cases to be converted into a table-driven test using the it.each function. It is important to recognise that the better job you do when defining the interface, the better context you provide, and the better results you get in the initial iteration. You reap what you sow, and this is particularly evident when prompting AI.
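
For illustration, a table-driven version of the value cases could look roughly like this, assuming a Jest- or Vitest-style runner; the sample amounts are my own, not part of the generated suite:

describe("value", () => {
  it.each([
    { units: 1, nanos: 0, expected: 1 },
    { units: 1, nanos: 500_000_000, expected: 1.5 },
    { units: 0, nanos: 750_000_000, expected: 0.75 },
  ])("converts $units units and $nanos nanos to $expected", ({ units, nanos, expected }) => {
    expect(value({ currencyCode: "USD", units, nanos })).toBe(expected);
  });
});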

What I find interesting is that tests generated from an interface are more likely to suggest edge cases that developers might initially overlook. This is easy to demonstrate by prompting the model to "generate the tests" for this naive implementation.

// naive: no currency check, no nanos normalisation
export function add(money1: Money, money2: Money): Money {
  return {
    currencyCode: money1.currencyCode,
    units: money1.units + money2.units,
    nanos: money1.nanos + money2.nanos,
  };
}

// result of "generate the tests for add function" prompt:
describe("add", () => {
  it("adds two money values", () => { ... });
  it("adds money with zero values", () => { ... });
});

Claude 3.5 Sonnet generates two passing tests for the naive add implementation, while for the interface it identified that nanos can overflow and that different currencies should not be added together. I suspect the existing code biases the model towards writing a passing test rather than focusing on creating comprehensive checks.

Step 3: Generate the Implementation

The final step is simply to ask for the initial implementation. I can do this without providing additional context, because I already have tests that not only describe the behaviour but can also be executed by the AI agent (like Cursor) to create the feedback loop. This loop can be completed without my intervention. The feedback mechanism for AI agents is not limited to tests; it can also include linters and other code analysis tools.

export function value(money: Money): number {
  return money.units + money.nanos / 1_000_000_000;
}

export function add(money1: Money, money2: Money): Money {
  if (money1.currencyCode !== money2.currencyCode) {
    throw new Error("Cannot add different currencies");
  }

  const totalNanos = money1.nanos + money2.nanos;
  const nanosOverflow = Math.floor(totalNanos / 1_000_000_000);
  const remainingNanos = totalNanos % 1_000_000_000;

  return {
    currencyCode: money1.currencyCode,
    units: money1.units + money2.units + nanosOverflow,
    nanos: remainingNanos,
  };
}

Once the tests pass, I can put on a code reviewer's hat and carefully check the generated solution. I might discover new cases that make me update the tests and repeat the process, or I can update the code directly.
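
For instance, the generated suite above never exercises negative amounts. Assuming negative values are valid in this domain (my assumption, not something the interface spells out), a review pass might add a case like this and send it back through the loop; against the implementation above it fails, because Math.floor borrows a unit while the % remainder stays negative:

it("should handle adding a negative amount", () => {
  const result = add(
    { currencyCode: "USD", units: 1, nanos: 0 },
    { currencyCode: "USD", units: -1, nanos: -500_000_000 } // -1.5
  );
  // 1.0 + (-1.5) should be -0.5
  expect(value(result)).toBe(-0.5);
});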

When I want to change the existing code, for example to modify the error message, I adjust the test and ask the AI again to make the code pass. Extending the module with new functionality is just as quick: define a new interface and repeat the process, this time without enriching the prompt context at all, since the existing implementation and tests are already there to reference.
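
As a sketch of that extension step (multiply is my own example, not part of the original module), the new stub is all the context the next iteration needs:

export function multiply(money: Money, factor: number): Money {
  throw new Error("not implemented"); // stub: tests and implementation follow in the next round
}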

Workflow I Want

These three steps fit the workflow I want when working with AI coding assistants. It begins with a design process and then transitions into defining requirements, invariants, and assertions, which provides me with confidence in the solution's correctness. The implementation process is often an order of magnitude quicker, sometimes almost an afterthought.

By following this process, I remain engaged and avoid feeling disconnected from the codebase over time. I attribute this to being very intentional during the design phase and not settling for whatever design happens to fall out of my initial prompt.

This process also helps me avoid writing excessively long prompts that attempt to outline all the requirements upfront to achieve the desired functionality. I get to spend more time in the editor rather than correcting the AI agent's errors in a chat window.

Limitations and Challenges

While this TDD-derived cycle works for me in many use cases, it is less applicable in prototyping. I prefer to start with implementation when exploring new languages, tools, libraries, APIs, or when I am generally more interested in learning about capabilities rather than arriving at a definitive solution.

For other tasks at a higher level of abstraction, like migrating a codebase to a new framework, defining detailed implementation steps upfront for the AI coding assistant will yield quicker, more tangible results than starting with interfaces.

In the real world, there remains a significant amount of scepticism regarding AI tools that generate code, but by thoughtfully integrating AI into our development processes - particularly with a focus on design and testing - we can not only alleviate these concerns but also elevate code quality to build more robust solutions.


The photo showcases Grand Seiko's Spring Drive movement, which combined mechanical springs (the old) with quartz (the new) to achieve accuracy and a signature smooth-gliding second hand.