There is a temptation to see human review as a temporary constraint — something we will automate away once the models get good enough. This is wrong, and it is dangerous.
Human review is not the bottleneck to be eliminated. It is the quality gate that prevents AI-generated slop from compounding into technical debt that takes years to unwind. The organizations currently shipping AI-built products without engineering review are building on sand. They do not know it yet because the failures have not cascaded.
The pattern that works is simple: context in, background execution, reviewable output, human approval. The agent does the work. The human exercises judgment. Code does not merge without a human saying yes.
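That pattern can be sketched as a single gate function. This is a minimal illustration, not any particular product's API: `agent`, `approve`, and `merge` are hypothetical callables standing in for the background agent, the human reviewer, and the deploy step.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ReviewableOutput:
    """Artifact a background agent hands back for human review."""
    diff: str
    summary: str

def gate(context: str,
         agent: Callable[[str], ReviewableOutput],
         approve: Callable[[ReviewableOutput], bool],
         merge: Callable[[str], None]) -> bool:
    """Context in, background execution, reviewable output, human approval."""
    output = agent(context)    # background execution: the agent does the work
    if not approve(output):    # human approval: the human exercises judgment
        return False           # no human yes, no merge
    merge(output.diff)         # the only path into production runs through approval
    return True
```

The point of the sketch is structural: `merge` is unreachable except through `approve`. There is no flag, mode, or confidence threshold that lets machine output skip the human.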
This is not a conservative position. It is the only position that scales. The alternative — letting AI-generated code flow into production without review — is how you end up with 18,000 water bottles ordered at a Taco Bell. The error compounds silently because nobody is watching the seam where machine output meets the real world.
I've argued that the bottleneck has moved from generation to review. That shift is not a problem to solve. It is the design constraint that makes the rest of the system work. Invest in review tooling, in review skills, in review at scale. The teams that treat human judgment as the most valuable input — not the most expensive overhead — are the ones whose AI deployments will still be working in three years.
Related Essays
Review Is Not a Screen. It Is a Primitive
Build review as a UI screen and you have a feature. Build it as a primitive that takes an artifact type and returns a verification surface and you have leverage.
The Three Phases of AI-Assisted Engineering
Autocomplete, interactive agents, background agents. The bottleneck has moved from generating code to reviewing it, and almost no one is building for the new constraint.
Taste Does Not Scale With Token Throughput
Code production is no longer the constraint. Deploy pipelines, feature flags, and code review are. The new bottleneck is taste, and taste does not scale.
Key takeaways
- Treating human review as a temporary constraint to be automated away is wrong and dangerous.
- The pattern that scales is context in, background execution, reviewable output, human approval — code does not merge without a human saying yes.
- Organizations shipping AI-built products without engineering review are building on sand and do not know it yet.
FAQ
Should we try to automate human review out of the loop?
No. Review is the quality gate that prevents AI-generated slop from compounding into technical debt that takes years to unwind. Removing it is how you end up with 18,000 water bottles ordered at a Taco Bell — a system that looked fine until it cascaded.
What is the right operating pattern for background agents?
Context in, background execution, reviewable output, human approval. The agent does the work. The human exercises judgment. Code does not merge without a human saying yes. This is not a conservative position — it is the only position that scales without compounding errors.