Something strange is happening inside large companies. Teams that used to ship a handful of pull requests per week are generating dozens per day. Non-engineering teams are building their own products. Leadership is discovering, sometimes months too late, that five different teams built the same integration aggregator without talking to each other.
The conventional narrative about AI in software engineering is still stuck on generation. Can the model write the code? How accurate is it? What percentage of keystrokes does it save? These are Phase 1 questions, and most organizations have already moved past them — whether they realize it or not.
There are three phases of AI-assisted engineering. Phase 1: Autocomplete. GitHub Copilot, tab-completion, inline suggestions. Useful, incremental, easy to adopt. Phase 2: Interactive agents. Claude Code, Cursor, Codex CLI — a developer drives, the AI executes. The developer is still in the loop for every decision. Phase 3: Background agents. A ticket gets created in Linear or Jira. An agent picks it up, clones the repo into a sandbox, searches the codebase, writes the fix, opens a pull request. The developer's first interaction with the work is the review.
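The Phase 3 loop described above is a pipeline: ticket in, pull request out, with no human in between. A minimal sketch of that shape, with stubbed-out stages (every function name and the ticket ID here are illustrative, not any real agent's API):

```python
from dataclasses import dataclass


@dataclass
class Ticket:
    id: str
    description: str


# Stub stages. A real system would call the repo host, a sandbox
# runtime, and an agent API at each step; these just trace the flow.
def clone_repo_into_sandbox(ticket: Ticket) -> dict:
    return {"ticket": ticket.id, "files": ["src/app.py"]}


def search_codebase(sandbox: dict, query: str) -> list[str]:
    return sandbox["files"]  # pretend these files matched the query


def write_fix(relevant_files: list[str], description: str) -> str:
    return f"patch for {description!r} touching {relevant_files}"


def open_pull_request(sandbox: dict, patch: str) -> str:
    return f"PR opened for {sandbox['ticket']}"


def handle_ticket(ticket: Ticket) -> str:
    """Ticket in, pull request out; the human's first touch is the review."""
    sandbox = clone_repo_into_sandbox(ticket)
    relevant = search_codebase(sandbox, ticket.description)
    patch = write_fix(relevant, ticket.description)
    return open_pull_request(sandbox, patch)


print(handle_ticket(Ticket("TICKET-42", "fix null check in session lookup")))
```

The point of the shape, not the stubs: every stage before `open_pull_request` runs unattended, which is exactly why the review stage becomes the constraint.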
Phase 3 is where the paradigm actually shifts. And it is here right now — this is what we are shipping at Tembo. When your system can produce 45 pull requests in a single day, the constraint is no longer writing code. It is reading code. It is deciding whether the AI chose the right approach, not just whether the code compiles. It is taste.
The math is brutal. If an AI agent gets the implementation right nine times out of ten, that sounds great. But at dozens of PRs per day, the one-in-ten that chose the wrong approach is not obviously wrong — it compiles, it passes tests, it looks reasonable. Finding it requires someone who understands the system's intent. I've argued elsewhere that human review is the quality gate, not a limitation. The organizations that win will be the ones that build review at scale, not just generation at scale.
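The arithmetic is easy to check with the essay's own numbers: a nine-in-ten success rate at 45 PRs per day.

```python
# Expected count of subtly wrong PRs, using the numbers from this essay:
# 45 PRs per day, nine in ten implemented correctly.
prs_per_day = 45
success_rate = 0.9

wrong_per_day = prs_per_day * (1 - success_rate)
wrong_per_week = wrong_per_day * 5  # assuming a five-day working week

print(round(wrong_per_day, 2))   # → 4.5
print(round(wrong_per_week, 2))  # → 22.5
```

Four or five PRs a day, twenty-plus a week, each one compiling and passing tests, each one needing a reviewer with real system context to catch.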
Related Essays
Human Review Is Not a Limitation
Human review is not the bottleneck to be eliminated. It is the quality gate that keeps AI-generated slop from compounding into technical debt that takes years to unwind.
Agents Are Software, Not Prompts
The industry treats agents as a new category. They are not. Agents are software, and the same engineering principles that have always mattered still apply.
Sessions Replace Tasks, Runs, and Threads
When the same object has three names, your architecture is drifting. Tasks, runs, threads, chats — all of it is just a session. One container, many shapes of work.
Key takeaways
- Phase 1 is autocomplete, Phase 2 is interactive agents, Phase 3 is background agents that ship pull requests on their own.
- When a system produces 45 PRs in a single day, the constraint is no longer writing code — it is reading code with judgment.
- The industry is obsessed with making generation faster. The organizations that win will solve the review problem at scale.
FAQ
What are the three phases of AI-assisted engineering?
Phase 1 is autocomplete — Copilot, tab-completion, inline suggestions. Phase 2 is interactive agents — Claude Code, Cursor, Codex CLI, where a developer drives. Phase 3 is background agents — a ticket gets created, an agent picks it up, clones the repo into a sandbox, and opens a PR on its own. The first interaction is the review.
Why is review the new bottleneck?
If an AI agent gets the implementation right nine times out of ten, the one wrong PR is not obviously wrong — it compiles, it passes tests, it looks reasonable. Finding it requires someone with deep system context. At dozens of PRs per day, that judgment becomes the scarce resource.