If models can handle git operations themselves, what is the harness still for?

The harness sets steerable defaults, provides the sandbox, tracks artifacts like PR records, and enforces the review and approval layer. It governs the agent rather than puppeteering it.

How do you know when harness logic has become a liability?

When the model fails at something your code forces — opening a PR for a stray temp file, refusing a workflow the user explicitly asked for — the scaffolding is fighting the model instead of helping it.

Why stay agnostic across coding agents instead of betting on one?

No single coding agent is going to win, and teams will keep switching as models leapfrog each other. A thin harness that supports whatever CLI ships next is in a stronger position than a wrapper that competes with the model vendors.

We Made the Agent Smarter by Deleting Code

The biggest upgrade we shipped to our agent platform recently was deletion.

We had two hardcoded jobs at the core of the product. One handled the initial task. The other was a feedback loop we bolted on later, because real work is never one-shot. Both were full of logic written to babysit the model. If the agent produced any diff at all, our code opened a pull request — even if the diff was an accidental temp file. The agent couldn't clone a specific branch, couldn't commit directly when a customer's workflow required it, couldn't open two PRs from one session. Not because the model couldn't do those things. Because our scaffolding wouldn't let it.

Part of the honest history here: we built the product for background jobs. It took three minutes to start, it wasn't efficient with tokens, and so we built a whole pile of guardrails to make it do smart stuff. That stuff worked for the models it was written against — and then it started getting in the way. So we're shedding it and getting back to the basics of what's already available in the CLIs.

We replaced both jobs with one loop: an agent, with tools, in a sandbox. That's it. The models got good enough that the forcing logic became the bottleneck. Every local coding agent already knows how to open a PR, work a branch, run a test suite. So the harness's job changed. Instead of forcing behavior, it steers defaults. Opening a PR against your branch is the default; if you tell the agent to commit directly, it does. The control didn't disappear — it moved from hardcoded paths to steerable defaults inside the harness, which is where it belongs.

The cleanest example of the new approach: PR tracking. We need a pull request record in our database every time the agent ships work — that record drives everything downstream. The old way would have been a custom tool the model has never seen, or a webhook listening for GitHub events. Instead, we patched the GitHub CLI. The models are deeply trained on gh; they reach for it instinctively. So when the agent calls it, the call routes through our CLI, we write the record, and the command behaves exactly as the model expects. The agent learned nothing new, and we deleted the webhook plumbing. Good harness code is invisible to the model — it works with the model's training instead of around it.

The same thing happened with context. Over months of incremental additions, we had stuffed so much guidance into the context window that we made the agent measurably dumber. The fix wasn't better prompting. It was removal. Thirty MCP tools dumped into context degrade the model; skills as files in the filesystem, searched on demand, don't. Our sandboxes now ship with pre-baked skills the agent discovers the way it discovers any file, and an MCP gateway that exposes a search tool instead of thirty tool schemas. Context engineering turns out to be as much about what you withhold as what you provide.

There's a strategic reason deletion matters, not just a technical one. We are not building a coding agent and we are not building an LLM. We're building the layer above them — an agnostic harness that can run Claude Code today, Codex tomorrow, and whatever shipped this morning the day after. That's gnarly in its own way: every agent supports a different feature set, and the harness has to track what works where. But it means we never compete with the model vendors on the thing they care most about. Their lightweight wrappers — a couple of engineers building a background-agent feature around their own model — can't match a team doing this full time, across all of them. Nobody is going to lock into one coding agent for a long time, and the chaos of the current state is our friend. The thinner the harness, the faster we ride each new model instead of fighting it.

Here's the pattern worth generalizing: most agent infrastructure built more than six months ago contains logic that exists solely because the model of that era couldn't be trusted. That logic doesn't age into neutrality. It ages into a constraint. The teams shipping the best agents right now aren't the ones adding the most scaffolding — they're the ones auditing their harness every model generation and asking what they can finally delete.

If your agent platform hasn't removed code in the last quarter, it's probably holding your agents back.

Sources

We Made the Agent Smarter by Deleting Code

Related Essays

The Agent Is the Primitive, Not the Automation

Agents Are Software, Not Prompts

Sessions Replace Tasks, Runs, and Threads