Victor Guillard

Why Agents Fail in Production

Most agent failures aren't model failures — they're orchestration failures. The model generates reasonable outputs. The problem is what happens between generations: state management, error recovery, context window overflow.

The orchestration problem

When you run multiple agents in sequence — one generating a plan, another executing it, a third verifying the result — state management becomes the dominant engineering challenge. Not because it's conceptually hard, but because the failure modes are invisible until production.

The executor agent modifies shared state as it works. If sub-step 2.3 of a five-step plan fails, you need to roll back sub-steps 2.1 and 2.2 before retrying step 2. Without checkpoints, you restart from step 1 and regenerate the entire plan, which may now differ because the model is stochastic.

State checkpoints

The fix is embarrassingly simple: checkpoint state after each successful step. On failure, roll back to the last checkpoint instead of restarting from scratch. This reduced our restart rate from 34% to under 5%.
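To make the idea concrete, here is a minimal sketch of checkpoint-and-rollback in Python. It is an illustration under assumptions, not the implementation behind the numbers above: the names `run_with_checkpoints`, `State`, and `Step` are hypothetical, state is assumed to be an in-memory dict that can be deep-copied, and each step is assumed to be a function that mutates that dict in place. Real systems would likely persist checkpoints and also undo external side effects, which this sketch does not attempt.

```python
import copy
from typing import Any, Callable

# Hypothetical types for illustration: shared state is a plain dict,
# and a step is any callable that mutates that dict in place.
State = dict[str, Any]
Step = Callable[[State], None]


def run_with_checkpoints(steps: list[Step], state: State, max_retries: int = 2) -> State:
    """Run steps in order, snapshotting state after each successful step."""
    checkpoint = copy.deepcopy(state)  # snapshot taken before any step runs
    for index, step in enumerate(steps):
        for attempt in range(max_retries + 1):
            try:
                step(state)
                checkpoint = copy.deepcopy(state)  # step succeeded: new checkpoint
                break
            except Exception as exc:
                # Roll back: discard partial writes from the failed attempt.
                state.clear()
                state.update(copy.deepcopy(checkpoint))
                if attempt == max_retries:
                    raise RuntimeError(
                        f"step {index} failed after {attempt + 1} attempts"
                    ) from exc
    return state


# Usage: a step that leaves a partial write behind, then fails once.
calls = {"flaky": 0}
seen: list[State] = []  # state observed at the start of each flaky attempt


def step_a(state: State) -> None:
    state["a"] = 1


def flaky_step(state: State) -> None:
    seen.append(dict(state))
    state["b"] = "half-written"  # partial write that must not survive a failure
    calls["flaky"] += 1
    if calls["flaky"] == 1:
        raise ValueError("transient failure")
    state["b"] = 2


result = run_with_checkpoints([step_a, flaky_step], {})
```

The retry of `flaky_step` starts from the state left by `step_a`, not from the half-written state and not from an empty one. Only the failed step is repeated, so earlier steps (including any plan generation) are not rerun.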

Measuring what matters

Most teams track the wrong metrics. Token usage and latency matter, but they're proxies. The metrics that actually predict production reliability are different.

The demo-production gap

The gap between demo and production performance is where most agent projects die. Understanding why requires looking at the architecture, not the model.