Key takeaways
- ~1M lines of code shipped over 5 months, none of it written manually
- 3.5 PRs per engineer per day, with each engineer operating at 3-10x capacity
- Agent-to-agent code review eliminates the human review bottleneck
FAQ
What is OpenAI Harness Engineering?
OpenAI's internal engineering team that built a product with ~1M lines of code and zero manually written code. Codex agents run autonomously for 6+ hours per task, and agent-to-agent code review replaces human review.
How does Harness manage agent context?
Harness uses AGENTS.md as a table of contents pointing to a structured docs/ directory — not an encyclopedia file. This gives agents navigable, structured knowledge bases instead of monolithic context dumps.
What is the team's throughput?
Starting with 3 engineers growing to 7, the team merged ~1,500 PRs at 3.5 PRs per engineer per day. Each engineer operates at 3-10x capacity through autonomous agent delegation.
Executive Summary
OpenAI's Harness Engineering team represents the most extreme publicly documented case of agent-driven development. Over 5 months starting August 2025, a team of 3 engineers (growing to 7) shipped approximately 1 million lines of code with zero manually written code and ~1,500 merged PRs. Codex agents run autonomously for 6+ hours per task, and agent-to-agent code review eliminates the human review bottleneck entirely.
| Attribute | Value |
|---|---|
| Company | OpenAI |
| Type | Internal methodology |
| Agent Runtime | Codex |
| Public Documentation | March 2026 |
| Headquarters | San Francisco, CA |
Product Overview
Harness is not a product — it's a methodology and team at OpenAI that treats "no manually-written code" as a core philosophy. Engineers act as orchestrators, delegating all implementation to Codex agents. The key innovation is not just using agents for coding, but building an entire engineering practice around the assumption that humans never write code directly.
Key Capabilities
| Capability | Description |
|---|---|
| Zero manual code | All code written by Codex agents, no exceptions |
| 6+ hour autonomy | Agents run for extended periods without human intervention |
| Agent-to-agent review | Code review performed by agents, not humans |
| Structured knowledge base | AGENTS.md as table of contents, docs/ directory as encyclopedia |
| UI verification | Chrome DevTools Protocol wired into agent runtime |
| Observability access | LogQL and PromQL exposed directly to agents |
Technical Architecture
Context Management
The Harness team's key insight on context management: AGENTS.md should be a table of contents, not an encyclopedia. Rather than stuffing all project knowledge into a single file, they maintain a structured docs/ directory with AGENTS.md serving as a navigable map.
```
AGENTS.md (table of contents)
        ↓
docs/
├── architecture.md
├── conventions.md
├── api-reference.md
└── ...
```
This pattern gives agents structured, discoverable knowledge without overwhelming their context windows.
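OpenAI has not published the file's actual contents, so the following is only a minimal sketch of the pattern: an AGENTS.md that maps tasks to docs rather than inlining everything. File names and descriptions here are illustrative.

```markdown
# AGENTS.md (table of contents, not an encyclopedia)

- docs/architecture.md: service boundaries and data flow
- docs/conventions.md: naming, error-handling, and testing rules
- docs/api-reference.md: internal API contracts

Read only the docs relevant to the task at hand.
```

The point of the pattern is that the agent loads one small map up front and then fetches only the documents its current task requires.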
Agent-to-Agent Code Review
Harness eliminates the human review bottleneck by having agents review each other's code. This is more radical than StrongDM's approach (which eliminates review entirely via behavioral validation) — Harness maintains the code review practice but removes humans from it.
UI Verification
The Chrome DevTools Protocol is wired directly into the agent runtime, allowing Codex agents to:
- Render and inspect UI components
- Verify visual correctness
- Test interactive behavior
- Debug rendering issues
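OpenAI has not described how Harness wires CDP into Codex, so the class below is a hypothetical sketch of the mechanics: CDP is a JSON-over-websocket protocol where each command carries a unique id, a method name, and params. The method names used (`Page.captureScreenshot`, `DOM.querySelector`) are real CDP commands; the wrapper itself is an assumption.

```python
import json
from itertools import count


class CDPCommandBuilder:
    """Builds Chrome DevTools Protocol messages for a websocket session.

    Hypothetical sketch: shows the message shape an agent runtime would
    send to Chrome's DevTools websocket endpoint.
    """

    def __init__(self):
        self._ids = count(1)  # CDP requires a unique id per command

    def command(self, method, **params):
        # Every CDP message is a JSON object: {"id", "method", "params"}
        return json.dumps({"id": next(self._ids), "method": method, "params": params})

    def capture_screenshot(self, fmt="png"):
        # Page.captureScreenshot returns base64 image data, enabling visual checks
        return self.command("Page.captureScreenshot", format=fmt)

    def query_selector(self, node_id, selector):
        # DOM.querySelector locates an element for inspection or interaction
        return self.command("DOM.querySelector", nodeId=node_id, selector=selector)
```

Sending these messages over Chrome's websocket (and reading the matching responses by id) is what lets an agent render a component, screenshot it, and inspect the DOM without a human watching the browser.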
Observability
Agents have direct access to production observability tools:
- LogQL — Query logs in real time
- PromQL — Query metrics and alerting data
This means agents can not only write code but also verify its behavior in production.
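Harness's actual tool interface is not public, but both query languages sit behind well-documented HTTP APIs (Loki for LogQL, Prometheus for PromQL). A sketch of how an agent tool might construct those requests, with the wrapper functions as assumptions:

```python
from urllib.parse import urlencode


def loki_query_url(base, logql, limit=100):
    # Loki's documented instant-query endpoint; the wrapper is a sketch
    return f"{base}/loki/api/v1/query?" + urlencode({"query": logql, "limit": limit})


def prom_query_url(base, promql):
    # Prometheus's documented instant-query endpoint
    return f"{base}/api/v1/query?" + urlencode({"query": promql})
```

An agent that just shipped a change could, for example, fetch `loki_query_url(loki_base, '{app="harness"} |= "error"')` and a PromQL error-rate query to confirm the deploy is healthy before closing out a task.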
Results
Key Metrics
| Metric | Value |
|---|---|
| Lines of code | ~1,000,000 |
| Manually written code | 0 |
| Time period | 5 months (Aug 2025 start) |
| PRs merged | ~1,500 |
| Starting team size | 3 engineers |
| Final team size | 7 engineers |
| PRs per engineer per day | 3.5 |
| Engineer multiplier | 3-10x per person |
| Agent autonomy per task | 6+ hours |
Throughput Analysis
At 3.5 PRs per engineer per day with 7 engineers, the team sustains approximately 24.5 PRs per day, or roughly 500 PRs per month at full strength. (The ~1,500 total over 5 months reflects the ramp from 3 engineers to 7.) Each engineer effectively operates at 3-10x capacity, meaning the 7-person team produces output equivalent to a 21-70 person team.
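The arithmetic behind these figures, with the number of working days per month as an assumption:

```python
prs_per_engineer_per_day = 3.5
engineers = 7                 # final team size
workdays_per_month = 21       # assumption: ~21 working days per month

daily_prs = prs_per_engineer_per_day * engineers   # 24.5 PRs/day
monthly_prs = daily_prs * workdays_per_month       # ~514, i.e. "roughly 500"

team_equiv_low = engineers * 3                     # 21-person equivalent
team_equiv_high = engineers * 10                   # 70-person equivalent
```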
Key Insights
1. AGENTS.md as Table of Contents
The most transferable insight: structure your knowledge base as a navigable directory, not a monolithic file. AGENTS.md points to relevant docs, and agents can drill into what they need.
2. No Manual Code as Philosophy
This isn't "use agents when convenient" — it's "never write code manually, period." This constraint forces the team to invest in agent infrastructure, context management, and workflow design.
3. Agent-to-Agent Review Works
By removing humans from code review, the team eliminates what is typically the biggest bottleneck in agent-assisted development. The quality bar is maintained through agent review rather than human inspection.
4. Extended Autonomy is Viable
6+ hour autonomous agent sessions demonstrate that modern agents can handle complex, multi-step tasks without human intervention. This is significantly longer than most reported agent session lengths.
Strengths
- Unprecedented scale — ~1M LOC with zero manual code is the most extreme case documented
- Proven throughput — 3.5 PRs/engineer/day sustained over months
- Full autonomy — 6+ hour sessions, agent-to-agent review, no human bottlenecks
- Transferable insights — AGENTS.md pattern, docs/ structure, observability access are universally applicable
- Officially documented — Published by OpenAI with analysis by Martin Fowler
- Dogfooding — OpenAI building its own product with Codex validates the tool
Cautions
- OpenAI advantage — Team has privileged access to Codex capabilities and can directly influence product direction
- New product context — Building greenfield is easier for agents than modifying legacy code
- Small team — 3-7 engineers may not represent patterns that scale to larger organizations
- Codex-specific — Architecture and workflow designed around Codex's specific capabilities
- Survivorship bias — We see the successful project, not the failed attempts or rejected approaches
Competitive Positioning
vs. Other In-House Agents
| System | Comparison |
|---|---|
| Stripe Minions | Minions require human review; Harness uses agent-to-agent review |
| StrongDM Factory | StrongDM eliminates review; Harness replaces human review with agent review |
| Ramp Inspect | Inspect augments human engineers; Harness replaces manual coding entirely |
Unique Position
Harness represents the most aggressive position on the "agent autonomy spectrum":
- Conservative: Agents write code, humans review (Stripe, Ramp)
- Moderate: Agents write code, behavioral validation replaces review (StrongDM)
- Radical: Agents write code, agents review code, humans orchestrate (Harness)
Bottom Line
OpenAI's Harness Engineering is a proof-of-concept for fully agent-driven software development. The "no manual code" philosophy, agent-to-agent review, and 6+ hour autonomy sessions represent the frontier of what's possible today.
Key metrics: ~1M LOC, 0 manual, ~1,500 PRs, 3.5 PRs/engineer/day, 3-10x multiplier.
Architecture pattern: AGENTS.md as TOC → structured docs/ → Codex for all implementation → agent-to-agent review → Chrome DevTools for UI verification → LogQL/PromQL for observability.
Recommended study for: Engineering leaders interested in the upper bound of agent-driven development. The AGENTS.md-as-TOC pattern is immediately applicable regardless of scale.
Not recommended for: Teams expecting to replicate this without OpenAI-level access to frontier models and infrastructure.
Outlook: If Harness-style development becomes viable outside OpenAI, the economics of software engineering change fundamentally. The constraint is model capability — as frontier models improve, this approach becomes more accessible.
Research by Ry Walker Research
Disclosure: Author is CEO of Tembo, which offers agent orchestration as an alternative to building in-house.