There is a technical insight that matters for anyone building or buying agent infrastructure. Most enterprise AI architectures default to RAG — vectorize your codebase, retrieve relevant chunks, feed them to the model. It works for question-answering. It is a poor fit for code generation.
Background coding agents work differently. They clone the actual repository into an ephemeral sandbox. The agent uses standard tools — grep, find, the file system itself — to locate relevant code. It might take sixteen attempts to find the right file. That is fine. Compute is cheap. What matters is that when it starts writing code, it has real context, not a vector similarity approximation of context.
The sandbox gets destroyed when the task is done. There is no persistent state to manage, no vector index to keep in sync with your codebase. Stateless, disposable, accurate. This is the architecture that actually works for enterprise code generation at scale.
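The lifecycle above can be sketched in a few lines. This is a minimal illustration, not a production harness: a local directory copy stands in for `git clone`, and a plain substring scan stands in for the agent's grep calls.

```python
import os
import shutil
import tempfile

def run_task_in_sandbox(repo_path: str, needle: str) -> list[str]:
    """Clone the repo into a throwaway sandbox, search it, then destroy it."""
    sandbox = tempfile.mkdtemp(prefix="agent-sandbox-")
    try:
        # In production this would be `git clone`; a local copy stands in here.
        work = os.path.join(sandbox, "repo")
        shutil.copytree(repo_path, work)

        # The agent searches the real files -- grep, not vector similarity.
        hits = []
        for root, _dirs, files in os.walk(work):
            for name in files:
                path = os.path.join(root, name)
                with open(path, errors="ignore") as fh:
                    if needle in fh.read():
                        hits.append(os.path.relpath(path, work))
        return sorted(hits)
    finally:
        # Stateless and disposable: no index to keep in sync afterwards.
        shutil.rmtree(sandbox)
```

Nothing survives the `finally` block, which is the point: there is no second copy of the codebase left behind to drift out of date.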
Vector RAG made sense in 2023 when context windows were small and retrieval was the only way to fit a large codebase into a model's working memory. That constraint has loosened. Long contexts are cheap, repository-aware tools are good, and the operational overhead of maintaining a vector index in sync with a moving codebase outweighs the latency savings.
I've argued that agents are software, not prompts. The architecture follows from that. If you would not run your CI off a vectorized snapshot of your codebase, do not run your code-generation agents off one either. Give them the real repo, the real tools, and a clean room to work in.
Related Essays
Context Is the Moat — Don't Give It Away
If you upload your entire business to a frontier model provider, you have handed them the playbook. Keep context local. Use frontier models as the engine.
The Codebase Is the Territory. The Agent Needs a Map
Every quarter a new model writes marginally better benchmark code. And every quarter enterprise teams stall on the same context problems. The hard part is the engineering around the AI.
Tech Context Is Tractable. Org Context Is Not
The hardest unsolved problem in agent infrastructure is not compute or sandboxing. It is context — and most of that context lives in people, not repos.
Key takeaways
- RAG works for question-answering. It is a poor fit for code generation on a non-trivial codebase.
- Background agents clone the actual repo into an ephemeral sandbox and use grep, find, and the file system to locate context.
- Sixteen failed searches do not matter when compute is cheap. Real context dramatically improves output quality.
FAQ
Why does RAG fall short for code generation?
Vector retrieval gives you a similarity approximation of context. For code, that approximation drops critical structural relationships — imports, type signatures, call graphs, file layout. A sandbox with the real repo and standard tools beats it on any non-trivial task.
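To make "structural relationships" concrete, here is a toy sketch, assuming plain Python files, of following an import edge across the real files on disk. This is the kind of hop a chunk-similarity retriever cannot make when the imported module shares no surface vocabulary with the query; with the repo on disk it is one regex and one directory listing.

```python
import os
import re

# Matches `import foo` and `from foo.bar import baz` at the start of a line.
IMPORT_RE = re.compile(r"^\s*(?:from|import)\s+([\w.]+)", re.MULTILINE)

def imported_modules(path: str) -> set[str]:
    """Return the top-level module names imported by a Python file."""
    with open(path, errors="ignore") as fh:
        return {m.split(".")[0] for m in IMPORT_RE.findall(fh.read())}

def context_files(entry: str, repo: str) -> list[str]:
    """Start from one file and pull in the local modules it imports.

    Single hop only; a real agent would iterate, the way it iterates grep.
    """
    wanted = imported_modules(entry)
    return sorted(
        f for f in os.listdir(repo)
        if f.endswith(".py") and f[:-3] in wanted
    )
```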
What is the operational advantage of sandbox-based agents?
Statelessness. The sandbox is destroyed when the task finishes, so there is no persistent infrastructure to operate and no vector index to keep in sync with a moving codebase. Disposable, accurate environments are the architecture that scales for enterprise code generation.