
OpenAI Harness Engineering

OpenAI's Harness team shipped ~1M lines of code with zero lines manually written: ~1,500 PRs merged by 3-7 engineers using Codex for 6+ hour autonomous tasks.

Key takeaways

  • ~1M lines of code shipped with 0 manually written over 5 months
  • 3.5 PRs per engineer per day, with each engineer operating at 3-10x capacity
  • Agent-to-agent code review eliminates human review bottleneck

FAQ

What is OpenAI Harness Engineering?

OpenAI's internal engineering team that built a product with ~1M lines of code and zero manually written code. Codex agents run autonomously for 6+ hours per task, and agent-to-agent code review replaces human review.

How does Harness manage agent context?

Harness uses AGENTS.md as a table of contents pointing to a structured docs/ directory — not an encyclopedia file. This gives agents navigable, structured knowledge bases instead of monolithic context dumps.

What is the team's throughput?

Starting with 3 engineers growing to 7, the team merged ~1,500 PRs at 3.5 PRs per engineer per day. Each engineer operates at 3-10x capacity through autonomous agent delegation.

Executive Summary

OpenAI's Harness Engineering team represents the most extreme publicly documented case of agent-driven development. Over 5 months starting August 2025, a team of 3 engineers (growing to 7) shipped approximately 1 million lines of code with zero manually written code and ~1,500 merged PRs. Codex agents run autonomously for 6+ hours per task, and agent-to-agent code review eliminates the human review bottleneck entirely.

  • Company: OpenAI
  • Type: Internal methodology
  • Agent runtime: Codex
  • Public documentation: March 2026
  • Headquarters: San Francisco, CA

Product Overview

Harness is not a product — it's a methodology and team at OpenAI that treats "no manually-written code" as a core philosophy. Engineers act as orchestrators, delegating all implementation to Codex agents. The key innovation is not just using agents for coding, but building an entire engineering practice around the assumption that humans never write code directly.

Key Capabilities

  • Zero manual code: All code written by Codex agents, no exceptions
  • 6+ hour autonomy: Agents run for extended periods without human intervention
  • Agent-to-agent review: Code review performed by agents, not humans
  • Structured knowledge base: AGENTS.md as table of contents, docs/ directory as encyclopedia
  • UI verification: Chrome DevTools Protocol wired into the agent runtime
  • Observability access: LogQL and PromQL exposed directly to agents

Technical Architecture

Context Management

The Harness team's key insight on context management: AGENTS.md should be a table of contents, not an encyclopedia. Rather than stuffing all project knowledge into a single file, they maintain a structured docs/ directory with AGENTS.md serving as a navigable map.

AGENTS.md (table of contents)
    ↓
docs/
  ├── architecture.md
  ├── conventions.md
  ├── api-reference.md
  └── ...

This pattern gives agents structured, discoverable knowledge without overwhelming their context windows.
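As a hypothetical illustration of the pattern (the file names and wording below are assumptions, not Harness's actual documents), an AGENTS.md in this style is a short pointer file rather than an encyclopedia:

```
# AGENTS.md — table of contents

- Architecture overview: docs/architecture.md
- Coding conventions: docs/conventions.md
- API reference: docs/api-reference.md

Read only the documents relevant to your current task;
do not load the entire docs/ tree into context.
```

The agent reads this file first, then fetches only the docs it needs, keeping the context window small.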

Agent-to-Agent Code Review

Harness eliminates the human review bottleneck by having agents review each other's code. This is more radical than StrongDM's approach (which eliminates review entirely via behavioral validation) — Harness maintains the code review practice but removes humans from it.

UI Verification

The Chrome DevTools Protocol is wired directly into the agent runtime, allowing Codex agents to:

  • Render and inspect UI components
  • Verify visual correctness
  • Test interactive behavior
  • Debug rendering issues
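To make the wiring concrete, here is a minimal sketch of the shape of CDP traffic such a runtime might send. The method names (`Page.enable`, `Page.navigate`, `Page.captureScreenshot`) are real CDP methods; everything else — the helper, the URL, the command sequence — is an illustrative assumption, not Harness's implementation.

```python
import json

def cdp_message(msg_id, method, params=None):
    """Serialize a CDP command as the JSON frame sent over the
    browser's WebSocket debugging endpoint."""
    return json.dumps({"id": msg_id, "method": method, "params": params or {}})

# Hypothetical sequence: navigate to the app under test,
# then capture a screenshot for visual verification.
commands = [
    cdp_message(1, "Page.enable"),
    cdp_message(2, "Page.navigate", {"url": "http://localhost:3000"}),
    cdp_message(3, "Page.captureScreenshot", {"format": "png"}),
]

for frame in commands:
    print(frame)
```

An agent runtime would send these frames over a WebSocket to the browser and inspect the responses (for example, the base64-encoded screenshot) to verify visual correctness.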

Observability

Agents have direct access to production observability tools:

  • LogQL — Query logs in real time
  • PromQL — Query metrics and alerting data

This means agents can not only write code but verify its behavior in production.
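For concreteness, here is the kind of query an agent with this access could run. The label names (`service="checkout"`) are hypothetical; the query syntax is standard LogQL and PromQL.

```
# LogQL: recent error lines for a service
{service="checkout"} |= "error"

# PromQL: per-second request rate over the last 5 minutes
rate(http_requests_total{service="checkout"}[5m])
```

With queries like these, an agent that just shipped a change can check whether error logs or request rates moved afterward.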


Results

Key Metrics

  • Lines of code: ~1,000,000
  • Manually written code: 0
  • Time period: 5 months (Aug 2025 start)
  • PRs merged: ~1,500
  • Starting team size: 3 engineers
  • Final team size: 7 engineers
  • PRs per engineer per day: 3.5
  • Engineer multiplier: 3-10x per person
  • Agent autonomy per task: 6+ hours

Throughput Analysis

At 3.5 PRs per engineer per day with 7 engineers, the team sustains approximately 24.5 PRs per day — or roughly 500 PRs per month. Each engineer effectively operates at 3-10x capacity, meaning the 7-person team produces output equivalent to a 21-70 person team.
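The arithmetic above can be checked back-of-the-envelope, assuming ~21 working days per month:

```python
# Reproduce the throughput figures from the reported metrics.
engineers = 7
prs_per_engineer_per_day = 3.5
multiplier = (3, 10)  # reported 3-10x per-engineer capacity

prs_per_day = engineers * prs_per_engineer_per_day
prs_per_month = prs_per_day * 21  # assuming ~21 working days/month
team_equivalent = (engineers * multiplier[0], engineers * multiplier[1])

print(prs_per_day)      # 24.5
print(prs_per_month)    # 514.5, i.e. roughly 500 PRs per month
print(team_equivalent)  # (21, 70)
```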


Key Insights

1. AGENTS.md as Table of Contents

The most transferable insight: structure your knowledge base as a navigable directory, not a monolithic file. AGENTS.md points to relevant docs, and agents can drill into what they need.

2. No Manual Code as Philosophy

This isn't "use agents when convenient" — it's "never write code manually, period." This constraint forces the team to invest in agent infrastructure, context management, and workflow design.

3. Agent-to-Agent Review Works

By removing humans from code review, the team eliminates what is typically the biggest bottleneck in agent-assisted development. The quality bar is maintained through agent review rather than human inspection.

4. Extended Autonomy is Viable

6+ hour autonomous agent sessions demonstrate that modern agents can handle complex, multi-step tasks without human intervention. This is significantly longer than most reported agent session lengths.


Strengths

  • Unprecedented scale — ~1M LOC with zero manual code is the most extreme case documented
  • Proven throughput — 3.5 PRs/engineer/day sustained over months
  • Full autonomy — 6+ hour sessions, agent-to-agent review, no human bottlenecks
  • Transferable insights — AGENTS.md pattern, docs/ structure, observability access are universally applicable
  • Officially documented — Published by OpenAI with analysis by Martin Fowler
  • Dogfooding — OpenAI eating their own cooking with Codex validates the product

Cautions

  • OpenAI advantage — Team has privileged access to Codex capabilities and can directly influence product direction
  • New product context — Building greenfield is easier for agents than modifying legacy code
  • Small team — 3-7 engineers may not represent patterns that scale to larger organizations
  • Codex-specific — Architecture and workflow designed around Codex's specific capabilities
  • Survivorship bias — We see the successful project, not the failed attempts or rejected approaches

Competitive Positioning

vs. Other In-House Agents

  • Stripe Minions: Minions require human review; Harness uses agent-to-agent review
  • StrongDM Factory: StrongDM eliminates review entirely; Harness replaces human review with agent review
  • Ramp Inspect: Inspect augments human engineers; Harness replaces manual coding entirely

Unique Position

Harness represents the most aggressive position on the "agent autonomy spectrum":

  • Conservative: Agents write code, humans review (Stripe, Ramp)
  • Moderate: Agents write code, behavioral validation replaces review (StrongDM)
  • Radical: Agents write code, agents review code, humans orchestrate (Harness)

Bottom Line

OpenAI's Harness Engineering is a proof-of-concept for fully agent-driven software development. The "no manual code" philosophy, agent-to-agent review, and 6+ hour autonomy sessions represent the frontier of what's possible today.

Key metrics: ~1M LOC, 0 manual, ~1,500 PRs, 3.5 PRs/engineer/day, 3-10x multiplier.

Architecture pattern: AGENTS.md as TOC → structured docs/ → Codex for all implementation → agent-to-agent review → Chrome DevTools for UI verification → LogQL/PromQL for observability.

Recommended study for: Engineering leaders interested in the upper bound of agent-driven development. The AGENTS.md-as-TOC pattern is immediately applicable regardless of scale.

Not recommended for: Teams expecting to replicate this without OpenAI-level access to frontier models and infrastructure.

Outlook: If Harness-style development becomes viable outside OpenAI, the economics of software engineering change fundamentally. The constraint is model capability — as frontier models improve, this approach becomes more accessible.


Research by Ry Walker Research

Disclosure: Author is CEO of Tembo, which offers agent orchestration as an alternative to building in-house.