The biggest lie in agent infrastructure right now is that you can describe a skill in a markdown file and call it done. A skill that says "when the user asks about pipeline metrics, query the analytics database" is not a skill. It is a wish. And wishes do not survive contact with production data.
A real skill has executable tools wired to specific endpoints, automated tests on those tools, validation that the data shape matches what the agent expects, memory of what happened the last hundred times the agent ran this workflow, and integration tailored to the actual data model — not a generic description of the data model. When an agent needs analytics data, a markdown description of the platform is not enough. The skill needs to specify exactly which tables, which event names, which API endpoints. Without that, the agent burns enormous effort on discovery, trying to figure out where data lives instead of answering the question. You watch this happen in real time and realize the model is fine. The harness is starving.
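To make "wired to specific endpoints" concrete, here is a minimal sketch in Python. The endpoint URL, table name, and expected field set are hypothetical placeholders for whatever your analytics platform actually exposes, not a real API:

```python
# Minimal sketch of a skill as software rather than prose. The endpoint,
# table name, and expected fields below are hypothetical placeholders.
import requests

ANALYTICS_URL = "https://analytics.internal/api/v2/query"  # hypothetical endpoint
PIPELINE_TABLE = "events.pipeline_runs"                    # hypothetical table
EXPECTED_FIELDS = {"run_id", "stage", "duration_ms", "status"}

def fetch_pipeline_metrics(since: str) -> list[dict]:
    """Query one known table at one known endpoint -- no discovery step."""
    resp = requests.post(
        ANALYTICS_URL,
        json={"table": PIPELINE_TABLE, "since": since},
        timeout=30,
    )
    resp.raise_for_status()
    rows = resp.json()["rows"]
    # Validate the data shape before the agent ever sees it. If the schema
    # drifts, fail loudly here instead of letting the agent improvise.
    for row in rows:
        missing = EXPECTED_FIELDS - row.keys()
        if missing:
            raise ValueError(f"schema drift: {PIPELINE_TABLE} missing {missing}")
    return rows
```

Every constant in that sketch is a decision the agent no longer has to make at runtime. That is the point.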
Agent platforms shipping "skill marketplaces" full of markdown configurations are selling something closer to a wiki than to software. Skills need to be treated like any other software artifact — versioned, tested, reviewable, with deterministic checks that run before the agent uses them and observability into what happened when it did. A CTO agent told to never create PRs and only ship to main is not a markdown note. That is a governance rule enforced in the deployment pipeline.
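What "versioned, tested, reviewable" looks like in practice is nothing exotic: ordinary tests that run in CI and gate the skill before any agent touches it. A sketch building on the hypothetical skill above; the module path, manifest file, and tool name are all assumed for illustration:

```python
# Deterministic checks as ordinary pytest tests, run in CI before any agent
# uses the skill. Module path, manifest file, and tool name are assumptions.
import json
from pathlib import Path

import pytest

import skills.pipeline_metrics as skill  # hypothetical module layout

class FakeResponse:
    def raise_for_status(self):
        pass

    def json(self):
        return {"rows": [{"run_id": "r1"}]}  # missing most expected fields

def test_rejects_schema_drift(monkeypatch):
    """The skill must fail loudly when the upstream data shape changes."""
    monkeypatch.setattr(skill.requests, "post", lambda *a, **kw: FakeResponse())
    with pytest.raises(ValueError, match="schema drift"):
        skill.fetch_pipeline_metrics(since="2024-01-01")

def test_governance_ships_to_main_only():
    """Governance as an executable gate, not a markdown note: this agent's
    manifest must never grant a PR-creation tool (hypothetical manifest)."""
    manifest = json.loads(Path("agents/cto/manifest.json").read_text())
    assert "create_pull_request" not in manifest["allowed_tools"]
```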
The corollary I keep coming back to: context engineering is the real work, not the infrastructure plumbing. Standing up an EC2 instance with bash scripts and a YAML config is something a competent engineer can do in a week. Getting the context right — the docs, the deterministic checks, the retrieval logic that puts the right information in front of the model at the right time — is the work that never ends. The infrastructure is one-and-done. The context is continuous iteration, and longer context documents degrade performance, so the loop never closes.
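A sketch of what that retrieval logic can look like: score candidate documents against the task and pack only the best under a hard token budget, precisely because longer context degrades performance. The scoring and token counting here are deliberately crude placeholders; this function is the one you never stop iterating on:

```python
# Sketch of context assembly under a hard token budget. The relevance score
# and token estimate are crude placeholders, not a recommended implementation.
def assemble_context(task: str, docs: list[str], budget_tokens: int = 4000) -> str:
    def score(doc: str) -> float:
        # Placeholder relevance: keyword overlap. Swap in embeddings or a
        # reranker -- this function is where the continuous iteration lives.
        task_words = set(task.lower().split())
        return len(task_words & set(doc.lower().split())) / (len(doc.split()) or 1)

    def tokens(text: str) -> int:
        return len(text) // 4  # rough heuristic: ~4 chars per token

    picked, used = [], 0
    for doc in sorted(docs, key=score, reverse=True):
        if used + tokens(doc) > budget_tokens:
            continue  # skip rather than truncate; partial docs confuse models
        picked.append(doc)
        used += tokens(doc)
    return "\n\n".join(picked)
```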
I've made the broader case that the harness itself is commoditized. The skill layer is where the differentiation lives. If you are not treating skills like software, you are not building skills.
Related Essays
The Harness Is Commoditized. Everything Else Is Not
The agent harness — Claude Code, OpenCode, Goose, Aider — is a commodity. Companies migrate between them freely. The defensible layers are context, orchestration, and tools.
You Need a Directory of Agents
Companies have directories of employees. They have no equivalent for the agents doing real work. Every agent should be inspectable, auditable, and correctable.
Users Should Iterate on Agents, Not Developers
Agents are always slightly wrong when first built. The people who know what is wrong are not developers — they are the users who interact with the agent every day.
Key takeaways
- Skills marketplaces full of markdown configs are selling wikis, not software. Real skills need versioning, tests, and deterministic checks.
- Without specific endpoints, table names, and schema validation, agents burn enormous effort on discovery instead of doing the work.
- Context engineering is the maintenance burden that never ends. Infrastructure plumbing is one-and-done; context is continuous iteration.
FAQ
What makes a markdown skill insufficient?
A markdown description tells the agent what exists but not how to operate on it reliably. Real skills need executable tools wired to specific endpoints, automated tests, schema validation, and memory of prior runs so the agent does not waste effort rediscovering the environment.
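One way to give a skill that memory of prior runs, sketched with an assumed journal path and record fields: append one line per invocation to a journal, and feed the most recent entries back into context at startup.

```python
# Sketch of run memory: log what each invocation saw and did, so the next
# run starts from known facts instead of rediscovering the environment.
# The journal path and record fields are illustrative, not a real format.
import json
import time
from pathlib import Path

JOURNAL = Path("memory/pipeline_metrics_runs.jsonl")  # hypothetical path

def record_run(query: str, row_count: int, ok: bool) -> None:
    JOURNAL.parent.mkdir(parents=True, exist_ok=True)
    with JOURNAL.open("a") as f:
        f.write(json.dumps({
            "ts": time.time(), "query": query,
            "row_count": row_count, "ok": ok,
        }) + "\n")

def recent_runs(n: int = 100) -> list[dict]:
    """Feed the last n runs back into the skill's context on startup."""
    if not JOURNAL.exists():
        return []
    return [json.loads(line) for line in JOURNAL.read_text().splitlines()[-n:]]
```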
Why is context engineering harder than infrastructure?
Infrastructure layers — runtimes, dev containers, CI/CD integrations — tend to be one-and-done efforts that rarely break once set up. Context layers — agents.md files, skill definitions, memory journals, deterministic check libraries — need continuous iteration. Longer context degrades performance, so the loop never closes.