Key Takeaways
- Role-based agents fail because the context engineering burden scales with the breadth of the job; workflow-based agents scope context to a single task and compound.
- Agents are primitives — prompt, tools, model, sandbox — independent of trigger. Automations are just one trigger type, not a separate category of system.
- Sessions are the universal unit of work. Tasks, runs, threads, and chats are all the same object with different names.
- Most enterprise agent value comes from triggered workflows running in the background, not from humans typing into a chat box.
- Inspectable logic beats hidden prompts. Trust, not capability, is the real bottleneck for adoption.
- The companies that win the agent era will give developers maximum extensibility, not monolithic agents that try to replace them.
FAQ
Why do role-based agents fail in production?
Role-based agents require unbounded context engineering — every legacy corner, design system convention, and team-specific norm has to be taught. That burden scales with the breadth of the role, so the agent never reaches a stable, trustworthy baseline.
How is a workflow agent different from an automation?
An automation bundles trigger, prompt, tools, and model into one flat object. A workflow agent treats the agent as a primitive — prompt, tools, model, sandbox — independent of how it gets invoked. Triggers become a separate concern, which makes agents reusable across crons, humans, and other agents.
Where should a team start if they want to adopt this approach?
Pick the three most annoying, repetitive workflows your team does every week and build scoped agents for each one. Make the logic inspectable, let developers own them, and compose them over time. Avoid trying to replace a whole role on day one.
The default enterprise AI strategy right now goes something like this: hire an agent to be a software engineer. Give it the codebase. Let it fix bugs, implement features, review PRs, write changelogs, handle chores. Give it a role. Expect it to fill the seat.
This is the vision Devin popularized and the vision most enterprises are chasing. It is also a trap.
The problem is not that role-based agents cannot do these things in a demo. The problem is what is hidden behind the demo: the staggering amount of context engineering required to make any of it work in production. Every codebase has legacy corners that trip the agent up. Every team has a specific design system, a specific way they write PRs, a specific changelog format. The agent does not know any of this on day one, and teaching it is not a configuration step. It is an ongoing software engineering effort that scales with the breadth of what you ask the agent to do.
When you start with a role, you are starting with the hardest possible version of the problem.
Flip It: Start With Workflows
There is another way. Instead of starting at the top — here is a role, now figure out the context — you start at the bottom. Pick one specific, repeatable workflow. Make the agent do that one thing well. Then add the next workflow. Then the next.
This is not a new idea in computing. It maps directly to the microkernel versus monolithic kernel debate from decades ago: do you build one massive process that does everything, or do you compose many small, specialized processes that coordinate? The history of infrastructure suggests composability wins over time, even when the monolith looks more impressive in the short term.
A workflow-based agent starts with a specific chore. Writing a changelog every week. Enriching new leads in a CRM. Running a deterministic guardrail check on every deployment. It operates in a narrow domain. The context engineering is scoped and manageable. The deterministic guardrails are specific to that workflow. The person who set it up understands exactly what it does and why.
Over time, you compose these workflow agents into something that starts to resemble a role. But you build up from a foundation that actually works, rather than trying to boil the ocean on day one.
Agents Are Primitives, Not Automations
Most teams building with AI right now have an automation layer. It runs on a schedule or fires on a webhook. It has a prompt, maybe some repo access, maybe a tool or two. It works. And it is architecturally wrong.
The problem is not that automations are broken. The problem is that calling them automations constrains how your team thinks about what they can do. An automation is a cron job with a language model attached. An agent is something fundamentally different — and the distinction is not semantic. It is structural.
When you build an automation, you bundle too many concerns into one object. Trigger logic, prompt, tools, model selection, sandbox, reasoning level — all of it in one flat construct. That works when you have five automations. It falls apart at fifty.
Here is what needs to happen: the agent becomes its own primitive. It has a prompt, skills, MCP connections, a model, a reasoning level, sandbox configuration. It exists independently of when or how it gets invoked.
Triggers — schedules, webhooks, events — become a separate concern. A trigger creates a session that uses an agent. The agent does not know or care how it was invoked. Automations stop being a category of system and become what they always were: one of several ways to start a session.
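To make the separation concrete, here is a minimal sketch in Python. Every name and field is a hypothetical illustration of the shape, not a real framework API:

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    """The primitive: everything about HOW work gets done."""
    name: str
    prompt: str
    model: str
    reasoning_level: str = "medium"
    tools: list[str] = field(default_factory=list)  # skills, MCP connections
    sandbox: dict = field(default_factory=dict)     # execution constraints

@dataclass
class Session:
    """The container a trigger opens; expanded in the next section."""
    agent: Agent
    history: list = field(default_factory=list)

@dataclass
class Trigger:
    """A separate concern: WHEN work gets started."""
    kind: str     # "cron" | "webhook" | "event" | "human" | "agent"
    agent: Agent  # the agent never sees which trigger invoked it

    def fire(self, payload: dict) -> Session:
        # Every trigger type does exactly one thing: open a session
        # that runs its agent.
        return Session(agent=self.agent, history=[payload])
```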
Sessions Replace Everything
One of the most telling symptoms of architectural drift is when the same object has three names. In many AI-native codebases right now, you will find tasks, runs, issues, threads, and chats all referring to roughly the same thing: a unit of interaction between a human and an AI system.
The right word is session. A session is a container for work. It can include multiple tasks. It can span multiple interactions. It can be multiplayer — more than one person contributing to the same session. It can be paused, resumed, forked.
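Filling in the Session sketch from the previous section (still hypothetical, and reusing the Agent dataclass above), those properties map onto a small data model:

```python
from dataclasses import dataclass, field

@dataclass
class Session:
    """A container for work; it prescribes nothing about the work itself."""
    agent: Agent                                            # default agent
    participants: list[str] = field(default_factory=list)   # multiplayer
    history: list = field(default_factory=list)             # spans many interactions
    status: str = "active"                                   # "active" | "paused"

    def pause(self) -> None:
        self.status = "paused"

    def resume(self) -> None:
        self.status = "active"

    def fork(self) -> "Session":
        # A fork copies the history so the two sessions diverge from here.
        return Session(self.agent, list(self.participants), list(self.history))
```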
Sessions work because they do not prescribe what happens inside them. A session might be a code review. It might be a research task. It might be a conversation where someone asks a question, gets an answer, and then pivots to something completely different. Calling it a task implies a single unit of work with a completion state. Calling it a thread implies rapid back-and-forth. Calling it a chat undersells the work being done.
A session is just a session. You had a session. You got work done. Maybe you will come back to it tomorrow.
If agents are their own primitive and sessions are the universal container, you can invoke any agent inside any session. You start a session. You are writing code. You realize you need a security review. You select the security agent from a dropdown. That single message runs with the security agent's prompt, tools, model, and configuration. The next message goes back to your default. This is not a new session. This is not a new tab. You are composing agents within a single workflow, the same way you might call a function from a different library.
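In the sketched model, that per-message selection is nothing more than an optional override that falls back to the session default (run_turn is a stand-in for the real model call):

```python
def run_turn(agent: Agent, session: Session, text: str) -> str:
    ...  # stand-in: call the model with agent.prompt, agent.tools, agent.model

def send(session: Session, text: str, override: Agent | None = None) -> str:
    # The override applies to this one message; the next message falls
    # back to the session's default agent. Same session, same history.
    active = override or session.agent
    session.history.append(text)
    return run_turn(active, session, text)

# send(session, "implement the parser")                      # default agent
# send(session, "audit this diff", override=security_agent)  # one-off override
# send(session, "now add the tests")                         # back to default
```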
What This Looks Like in Practice
I have been working with a customer success team that illustrates the workflow-first approach perfectly. They have HubSpot as their system of record. Half the team barely uses it. People are managing their daily work across post-it notes, Excel spreadsheets, Jira tickets, Airtable, and Intercom — five or six different systems with no consistent workflow.
The instinct — and I have this instinct too, because my operational brain kicks in — is to say: let us build a dashboard, get everyone using tasks, enforce a work unit, standardize on one tool. That is the role-based approach. You are trying to solve everything at once by imposing a system from the top down.
Instead, we started with one workflow: a daily priority list. Every morning at 9 AM, an agent pulls data from HubSpot, checks for untouched accounts, looks at open support tickets, and sends each person a prioritized list of what they should focus on today. That is it. One workflow, one cadence, one output.
The agent pulls from the systems that already exist — HubSpot for account ownership, the ticketing system for open issues, call transcripts for recent activity. It does not require anyone to change their behavior or adopt a new tool. It meets them where they already work, which in this case is Slack.
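The skeleton of the workflow is deliberately boring. In sketch form, with every connector function a hypothetical placeholder for the team's actual integrations:

```python
def daily_priority_list(rep_email: str) -> None:
    # Pull from the systems that already exist; change nobody's behavior.
    accounts = fetch_hubspot_accounts(owner=rep_email)    # account ownership
    attach_open_tickets(accounts, fetch_open_tickets())   # support queue
    attach_activity(accounts, fetch_call_transcripts())   # recent touches
    ranked = prioritize(accounts)  # deterministic logic, shown two sections down
    post_to_slack(rep_email, render(ranked))  # deliver where they already work
```

A 9 AM cron trigger opens a session that runs this workflow for each rep; the function itself knows nothing about the schedule.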
From there, the next workflow is conversational updates. The rep talks to the agent: "I called this account and left a message." The agent updates HubSpot. No toggling between systems. No manual data entry.
Then the next workflow: the agent reviews call transcripts and flags stalled agreements. Then it incorporates revenue data to tier the priority list. Then escalation rules — if a high-value account has an urgent ticket open for ten days and no one has responded, surface it to a manager.
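That last rule, for instance, is a few lines of deterministic logic. The thresholds and field names here are illustrative assumptions:

```python
def should_escalate(account, ticket) -> bool:
    # Surface to a manager: high-value account, urgent ticket,
    # open for ten days, and nobody has responded.
    return (
        account.tier == "high_value"
        and ticket.priority == "urgent"
        and ticket.days_open >= 10
        and ticket.last_response is None
    )
```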
Each of these is a small, scoped workflow. Each one is independently valuable. Composed together, they start to look like something that could have been pitched as "an AI account manager" — except it actually works, because it was built from the bottom up with specific rules, specific data sources, and specific logic the team can inspect and modify.
The Algorithm Should Be Inspectable
The logic that drives these workflow agents should be visible and understandable, not hidden inside a prompt. When we build the prioritization algorithm for that daily list, it is written as code anyone can read. Even if you are not a developer, you can ask the agent to explain how the prioritization works, and it will walk through the logic.
This matters because trust is the bottleneck for agent adoption, not capability. If the daily priority list surfaces something that feels wrong, the rep needs to be able to understand why. Maybe the algorithm weights last-touch date too heavily and ignores account tier. Maybe it does not account for accounts in active onboarding. These are not model problems. They are logic problems, and they should be solved by adjusting inspectable rules, not by hoping the next model version gets it right.
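Here is a minimal sketch of what such a rule set might look like, with made-up weights and field names. The point is that a skeptical rep can read it, question it, and change it:

```python
TIER_WEIGHT = {"enterprise": 3.0, "mid_market": 2.0, "smb": 1.0}

def priority_score(account) -> float:
    score = float(account.days_since_last_touch)   # staleness: weighted too heavily?
    score *= TIER_WEIGHT.get(account.tier, 1.0)    # account tier is not ignored
    score += account.open_urgent_tickets * 5.0     # urgent tickets jump the queue
    if account.in_onboarding:
        score *= 0.5  # onboarding accounts get touched anyway (the edge case above)
    return score

def prioritize(accounts: list) -> list:
    return sorted(accounts, key=priority_score, reverse=True)
```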
The same principle applies to the data pipeline. The agent can pull from HubSpot, from Metabase, from Airtable, from call transcript tools — it does not matter where the data lives. What matters is that the transformation from raw data to prioritized output is deterministic and reviewable. The agent can write the code to do this transformation, but the code should persist as a permanent, inspectable artifact that evolves over time.
Automations Generate Most of the Volume
When I look at how teams actually use agent systems today — not how vendors pitch them, but how they get used in production — one pattern dominates. The vast majority of agent activity comes from automations and triggered workflows, not from human-initiated chat.
If you set up an hourly automation, it will generate 10x the volume of human-triggered tasks simply because machines do not sleep, do not forget, and do not context-switch. This ratio only grows over time. The teams getting the most value from their agent infrastructure are the ones that leaned into triggered workflows early.
This is not a temporary state. This is the equilibrium. The future of enterprise AI is not a chatbot you talk to when you remember to. It is a mesh of background workflows that execute continuously, produce reviewable output, and surface results for human approval when needed.
This is why the agent-as-primitive architecture matters so much. If your "automations" are bundled with their triggers, you cannot reuse them. The same workflow that runs on a 9 AM cron should be invokable by a human in a session, by another agent as a sub-task, by a webhook from a CRM. The agent is the unit of capability. The trigger is incidental.
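In the earlier sketch this reuse falls out for free: the agent definition never changes, only the trigger that opens the session (names remain hypothetical):

```python
weekly_report = Agent(name="weekly-report",
                      prompt="Draft this week's changelog from merged PRs.",
                      model="some-model")

# Three entry points, one capability.
cron = Trigger(kind="cron", agent=weekly_report)      # the 9 AM schedule
hook = Trigger(kind="webhook", agent=weekly_report)   # fired by a CRM event
sub  = Trigger(kind="agent", agent=weekly_report)     # invoked by another agent
# ...and a human can select the same agent mid-session, as shown earlier.
```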
The Customization Moat
Here is where it gets interesting from a competitive standpoint. The large AI companies — Anthropic, OpenAI, Google — are all trying to identify enterprise workflows and productize them. I know this firsthand. They are hiring people specifically to map workflows that can be tied to model improvements and shipped as products.
But they bring a fundamental bias. They want workflow quality to be a function of model quality, because that is what they sell. The contrarian bet is that a generic, model-driven product cannot match the quality of a workflow agent that has been customized with deep, specific attention to how a particular company actually operates.
Every company has different integrations. Different contexts. Different standards. Different edge cases that matter enormously to them and not at all to anyone else. The value is in the customization, not the model.
This is why the company that wins this era will be the one that gives developers the most freedom and extensibility. Not a monolithic agent that tries to replace the developer, but a platform that lets developers build, customize, and own their workflow agents.
The IKEA Effect Is Real
There is a well-documented cognitive bias called the IKEA effect: people disproportionately value products they partially created. Researchers found subjects were willing to pay 63% more for furniture they assembled themselves than for equivalent pre-assembled items.
This applies directly to agent tooling. When a developer builds a workflow agent — scopes it, customizes it, iterates on it — they care about it. They want to improve it. They feel ownership over it. Compare that to a vendor-provided agent that does something generically: the developer has no attachment, no investment, and no motivation to make it better.
Platforms that lean into this dynamic — that make developers feel like the agents are theirs — generate far more engagement, far more iteration, and far more production value than the ones that ship pre-built agents and hope for adoption.
This extends to end users too. When the customer success team can submit ideas for how the agent should prioritize their work — when they can say "I want to see my ten highest-revenue accounts with no touch in the last two weeks" and watch that become a real capability — they are invested. They are co-building the system. That is a fundamentally different relationship than being handed a tool and told to use it.
Think Small to Win Big
The AI discourse right now is dominated by visions I do not believe will come to pass in the next decade. Download a business, import it, run it. Agents that replace entire departments. The permanent underclass of displaced knowledge workers.
I think this is wrong. AI infrastructure will work the same way every other infrastructure wave has worked: with a ton of low-level primitives that slowly get built up into something useful. The companies that try to skip straight to the grand vision will burn through capital and credibility. The companies that start small — one workflow, done well, for a specific team — will compound their way into something much larger.
The path is not glamorous. It is not a demo that makes a VC gasp. It is a developer who was annoyed by a repetitive task, built a workflow agent to handle it in a weekend, and then never thought about that task again. Multiply that by every developer in the organization, and you have something that looks a lot like an agent mesh — not because someone architected it from the top down, but because it grew organically from the bottom up.
If you are leading an engineering or AI team and trying to figure out your agent strategy, here is what I would suggest. Stop trying to hire an agent for a role. Identify the three most annoying, repetitive workflows your team does every week. Build agents for those. Make them good. Make them inspectable. Let your developers own them.
Then do three more. Then three more after that.
You will end up with something far more valuable than a role-based agent that kind of works at everything. You will have a portfolio of workflow agents that each work extremely well at one thing, that your team actually trusts, and that compound in value every week they run.
The gap between AI demo and AI deployment is not a model problem. It is a software engineering problem. And software engineering has always been about building small things well and composing them into larger things that work.