By Ry Walker

Why Sandboxes Beat Vector RAG for Code Generation

Anyone building or buying agent infrastructure should understand one technical point. Most enterprise AI architectures default to RAG: vectorize your codebase, retrieve relevant chunks, feed them to the model. That works for question-answering. It is a poor fit for code generation.

Background coding agents work differently. They clone the actual repository into an ephemeral sandbox. The agent uses standard tools — grep, find, the file system itself — to locate relevant code. It might take sixteen attempts to find the right file. That is fine. Compute is cheap. What matters is that when it starts writing code, it has real context, not a vector similarity approximation of context.

The sandbox gets destroyed when the task is done. There is no persistent state to manage, no vector index to keep in sync with your codebase. Stateless, disposable, accurate. This is the architecture that actually works for enterprise code generation at scale.
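
The lifecycle described above (clone, search with plain tools, throw everything away) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular agent product works: the names `grep` and `run_in_sandbox` are hypothetical, a local directory copy stands in for `git clone` so the sketch runs offline, and the list of search patterns stands in for whatever the agent decides to try next.

```python
import re
import shutil
import tempfile
from pathlib import Path


def grep(root: Path, pattern: str) -> list[tuple[str, int, str]]:
    """Return (relative path, line number, stripped line) for each matching line."""
    regex = re.compile(pattern)
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binaries and unreadable files
        for lineno, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                hits.append((path.relative_to(root).as_posix(), lineno, line.strip()))
    return hits


def run_in_sandbox(repo_dir: str, patterns: list[str]):
    """Copy the repo into a throwaway directory and try patterns until one hits.

    Returns (attempts, hits, sandbox_path). The sandbox directory is deleted
    as soon as this function returns, so nothing persists between tasks.
    """
    with tempfile.TemporaryDirectory(prefix="agent-sandbox-") as tmp:
        workdir = Path(tmp) / "repo"
        # Assumption: a plain copy stands in for `git clone` in this sketch.
        shutil.copytree(repo_dir, workdir)
        for attempts, pattern in enumerate(patterns, start=1):
            hits = grep(workdir, pattern)
            if hits:
                return attempts, hits, tmp
        return len(patterns), [], tmp
```

A failed pattern costs one more pass over the files and nothing else. When the function returns, the copy is gone and there is no index left behind to keep current.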

Vector RAG made sense in 2023 when context windows were small and retrieval was the only way to fit a large codebase into a model's working memory. That constraint has loosened. Long contexts are cheap, repository-aware tools are good, and the operational overhead of maintaining a vector index in sync with a moving codebase outweighs the latency savings.

I've argued that agents are software, not prompts. The architecture follows from that. If you would not run your CI off a vectorized snapshot of your codebase, do not run your code-generation agents off one either. Give them the real repo, the real tools, and a clean room to work in.

Key takeaways

  • RAG works for question-answering. It is a poor fit for code generation on a non-trivial codebase.
  • Background agents clone the actual repo into an ephemeral sandbox and use grep, find, and the file system to locate context.
  • Sixteen search attempts do not matter when compute is cheap. What matters is that the agent writes code against real context, not an approximation of it.

FAQ

Why does RAG fall short for code generation?

Vector retrieval gives you a similarity approximation of context. For code, that approximation drops critical structural relationships — imports, type signatures, call graphs, file layout. A sandbox with the real repo and standard tools beats it on any non-trivial task.
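
As a rough illustration of what "structural relationships" means here, an agent holding the whole file can read imports and signatures straight from the syntax tree, with nothing lost to chunk boundaries. The sketch below uses Python's standard `ast` module; the helper name `structural_summary` is hypothetical, and it covers only imports and function signatures, not call graphs or file layout.

```python
import ast


def structural_summary(source: str) -> dict[str, list[str]]:
    """Collect imported names and function signatures from Python source."""
    tree = ast.parse(source)
    imports: list[str] = []
    signatures: list[str] = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            imports.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # Leading dots mark a relative import, e.g. "from .models import X".
            prefix = "." * node.level + (node.module or "")
            sep = "." if node.module else ""
            imports.extend(prefix + sep + alias.name for alias in node.names)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            sig = f"{node.name}({ast.unparse(node.args)})"
            if node.returns is not None:
                sig += f" -> {ast.unparse(node.returns)}"
            signatures.append(sig)
    return {"imports": imports, "signatures": signatures}
```

A retrieved chunk that begins halfway through a function body carries neither list. The full file on disk carries both.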

What is the operational advantage of sandbox-based agents?

The sandbox gets destroyed when the task is done. There is no persistent state to manage, no vector index to keep in sync with your codebase. Stateless, disposable, accurate. That is the architecture that scales for enterprise code generation.