agentic-workflow

TDD as a Strategy for Building Reliable Software with AI Agents

TDD keeps AI coding agents honest by turning intent into executable checks and durable contracts.

m0 / python / tdd / testing / matic

AI coding agents make TDD more valuable, not less. Plausible changes are easy to generate. Correct changes are not. Tests are what turn intent into executable constraints and prevent a reasonable-looking patch from drifting away from the behavior the system actually needs.

Start with behavior

The discipline is simple: make the agent prove failure before writing source code.

That means writing the test around behavior, not implementation. If a test passes before the implementation changes, one of two things is true. Either the behavior already exists, or the test is too weak. TDD only works when that ambiguity is resolved immediately.

The red phase matters because it shows the change has a real target. Without that proof, it is too easy to accept a patch that only looks right.

Tests as durable memory

Tests are not just checks. They are durable memory for requirements, regressions, and interface contracts.

For agent-facing work, the useful categories are concrete:

  • behavior tests
  • validation tests
  • state tests
  • regression tests
  • CLI smoke checks

Those tests give the agent a tight feedback loop and give humans a way to confirm that the behavior still matches the intent. They also help the repo keep its own promises when the implementation evolves.

Protect the contracts that drift

AI-generated code tends to drift at the boundaries. That is where TDD earns its keep.

Constants, filesystem conventions, and public APIs are contracts. They should not be left to memory or to a code review that happens after the fact. Tests should protect them because they are part of the system's external behavior.

That is especially important in a filesystem-first codebase, where directory names, default paths, and persistent records are not incidental. They are the operational surface.

The loop that keeps the system honest

The human-agent loop should stay disciplined:

  • write the behavior
  • confirm the test fails for the expected reason
  • implement the smallest change that makes it pass
  • refactor only after green
  • update the test if it passed too early

That sequence prevents the agent from building a plausible but unstable implementation. It also keeps the work grounded in observable outcomes instead of vague confidence.

Where the repo rules fit

docs/rules/workflow-rules.md and docs/rules/python-source-rules.md turn TDD into a repo policy instead of a personal preference.

Those rules matter because they connect the test loop to the rest of the system:

  • source structure stays predictable
  • shared constants stay centralized
  • filesystem conventions stay explicit
  • behavioral claims stay backed by executable checks

That is the practical value of TDD in an agentic workflow. It turns code generation into a constrained feedback loop, which is the only reason it stays reliable long enough to ship.

Wachtlijst

Jouw org werkt door
als jij offline bent.

Matic draait autonome organisaties richting lange-horizondoelen — een Charter in de root, benoemde agents met eigen geheugen, markdown-staat in git, en een verplichte leer-loop na elke opdracht. Kom op de lijst voordat de eerste orgs live gaan.

Eerste-install toegangArchitectuurnotitiesMijlpalen zodra ze landen

Geen spam. Productmijlpalen, ontwerpbeslissingen en de denkrichting erachter — meer niet.

launchwaitlist@matic.sh

Architectuurnotities, eerste-install toegang en mijlpalen zodra ze landen. Geen marketing.