The natural response to the context maintenance problem is to make the agent learn. When you correct it, it should remember. Every correction you type as a code review comment is information that could have been loaded into context from the start.
Obviously the right vision. Also, today, an unsolved research problem.
The closest thing shipping is Claude Code's memory feature, an append-only file the agent self-updates. Other teams have tried journals, learning logs, self-updating instruction files. All hit the same wall. Autoregressive transformers degrade as context documents grow. Compaction is lossy. The relationship between what an agent remembers and what it needs for a specific task is extraordinarily hard to model.
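The failure mode is easy to see in miniature. A minimal sketch of the append-only pattern (hypothetical file name and format, not any product's actual implementation): every correction is appended forever, and every stored lesson is loaded verbatim into context, so prompt cost grows linearly with the agent's lifetime and nothing ever decides what is still relevant.

```python
from pathlib import Path

# Hypothetical memory file; illustrative, not Claude Code's real format.
MEMORY_FILE = Path("agent_memory.md")

def remember(lesson: str) -> None:
    """Append a correction. Nothing is ever revised or pruned."""
    with MEMORY_FILE.open("a") as f:
        f.write(f"- {lesson}\n")

def load_context() -> str:
    """Load every stored lesson verbatim into the prompt.

    This is the wall: context cost grows with the agent's lifetime,
    and relevance to the current task is never modeled at all.
    """
    return MEMORY_FILE.read_text() if MEMORY_FILE.exists() else ""
```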
At the scale of an entire codebase, this is a genuine research problem. A question as simple as "what is the on-call schedule?" becomes nearly impossible to answer reliably when schedules are scattered across multiple team repos and the answer depends on who is asking and when.
Narrow to a single workflow, though, and it gets tractable. An agent that learns to run one guardrail check better over time is much more solvable than an agent that learns an entire codebase. I've argued elsewhere that each user needs their own agent instance precisely because the learning surface must stay narrow, and sharing blows it wide open. Same insight, different axis.
The real opportunity is not general-purpose memory. It is workflow-scoped learning that compounds. Pick the constraint. Define the input shape. Define the success signal. The memory problem stops being intractable the moment you stop trying to remember everything.
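That recipe can be sketched as a tiny data structure. Everything here is illustrative (the names, the guardrail example, the dedupe rule are assumptions, not a real system): the point is that memory keyed to one workflow never has to guess which lessons apply, because the workflow id is the retrieval query.

```python
from dataclasses import dataclass, field

@dataclass
class WorkflowMemory:
    workflow_id: str                      # the constraint: exactly one workflow
    lessons: list[str] = field(default_factory=list)

    def record(self, correction: str) -> None:
        """Success signal: a human correction on this workflow's output."""
        if correction not in self.lessons:  # dedupe keeps context bounded
            self.lessons.append(correction)

    def context(self) -> str:
        """Input shape: a fixed preamble plus only this workflow's lessons."""
        header = f"Rules learned for workflow '{self.workflow_id}':"
        return "\n".join([header, *(f"- {l}" for l in self.lessons)])

# Hypothetical usage: one memory per guardrail check, not one per codebase.
guardrail = WorkflowMemory("lint-guardrail")
guardrail.record("treat warnings in generated files as non-blocking")
```

The design choice that matters is the scoping, not the storage: a flat list works fine once the set of lessons it can accumulate is bounded by a single, repeated task.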
Related Essays
Each Person Needs Their Own Agent Instance
Shared agents with a single configuration are fundamentally broken. Each user gets a personal copy that learns through interactions — and an organizational layer that locks in what consistently works.
Context Engineering Is the Hard Problem
Models keep getting better, but agents without deep codebase and organizational context are just expensive autocomplete. Context engineering is the bottleneck nobody has productized.
Tech Context Is Tractable. Org Context Is Not
The hardest unsolved problem in agent infrastructure is not compute or sandboxing. It is context — and most of that context lives in people, not repos.
Key takeaways
- The dream is an agent that remembers every correction. The reality is that autoregressive transformers degrade as context documents grow.
- At the scale of an entire codebase, "what is the on-call schedule?" is an unsolved retrieval problem — the answer depends on who is asking, when, and across which repos.
- Narrow to a single workflow and learning becomes tractable. The opportunity is workflow-scoped, not general.
FAQ
Why is general-purpose agent memory hard?
Because the relationship between what the agent remembers and what it needs for a specific task is extraordinarily hard to model. Append-only logs degrade with context length, compaction is lossy, and retrieval across an entire organization breaks at the seams.
What does workflow-scoped learning look like in practice?
An agent that learns to run one guardrail check better over time. An agent that gets sharper at one ticket triage flow. The constraint is what makes the learning tractable — narrow input distribution, narrow output distribution, narrow feedback signal.