Key takeaways
- From Dave Sifry — 18 specialized agents coordinate through an 11-phase pipeline from GitHub issue to merged PR
- Cross-model adversarial review: Claude writes code, Codex or Gemini reviews it — eliminating single-model blind spots
- Enforced quality gates: blocking state transitions mean there is no path from FAIL to COMMIT — agents cannot bypass coverage enforcement
FAQ
What is Metaswarm?
Metaswarm is a multi-agent orchestration framework for Claude Code that coordinates 18 specialized agents through a structured development pipeline with TDD, adversarial review, and automated PR shepherding.
Who created Metaswarm?
Dave Sifry created Metaswarm. It builds on Steve Yegge's BEADS system and Jesse Vincent's Superpowers framework.
What makes Metaswarm different from other orchestrators?
Cross-model adversarial review (different model reviews than writes), blocking quality gates, coverage enforcement via pre-push hooks, and a Design Review Gate where 6 agents review plans in parallel.
Is Metaswarm production ready?
Metaswarm is actively maintained and available via npm. It requires BEADS CLI and is designed specifically for Claude Code users.
Executive Summary
Metaswarm is Dave Sifry's multi-agent orchestration framework for Claude Code, coordinating 18 specialized agents through an 11-phase pipeline from GitHub issue to merged PR.[1] Its distinguishing features are cross-model adversarial review (Claude writes, Codex or Gemini reviews) and blocking quality gates that prevent FAIL→COMMIT transitions. It is built on BEADS for git-native issue tracking and on Superpowers for foundational workflows.
| Attribute | Value |
|---|---|
| Creator | Dave Sifry |
| Type | Open Source |
| Package | npm (npx metaswarm init) |
| Language | Prompts + BEADS (Go) |
| Dependencies | Claude Code, BEADS CLI, Node.js 18+ |
The Problem Metaswarm Solves
Claude Code is good at writing code. It is not good at building and maintaining a production codebase.[1]
Shipping production code requires research, planning, security review, design review, tests, PR creation, CI monitoring, review comment handling, and closing the loop. That's nine distinct jobs. A single agent session cannot hold all of that context, and it cannot review its own work objectively.
The typical result: you become the orchestrator. You prime the agent, tell it what to build, review output, fix what it missed, create the PR, babysit CI, respond to review comments, and repeat. The agent is a fast typist, but you're still the project manager.
Product Overview
Metaswarm provides a full orchestration layer that breaks work into phases, assigns each to a specialist agent, iterates through multiple reviews, and coordinates handoffs through PR creation and shepherding.[2]
The 11-Phase Pipeline
| Phase | Description |
|---|---|
| 1. Research | Researcher agent explores codebase, finds patterns and dependencies |
| 2. Plan | Architect agent creates implementation plan with tasks |
| 3. Plan Validation | Pre-flight checklist: architecture, deps, API contracts, security |
| 4. Design Review Gate | PM, Architect, Designer, Security, UX, CTO review in parallel (6 agents) |
| 5. Decompose | Break plan into work units with DoD items, file scopes, dependency graph |
| 6. External Dependency Check | Identifies required API keys/credentials, prompts user |
| 7. Orchestrated Execution | Per work unit: Implement → Validate → Adversarial Review → Commit |
| 8. Final Review | Cross-unit integration check, full test suite, coverage enforcement |
| 9. PR Creation | Creates PR with structured description and test plan |
| 10. PR Shepherd | Monitors CI, handles review comments, resolves threads |
| 11. Close + Learn | Extracts learnings back into the knowledge base |
The 18 Agent Personas
| Agent | Phase | Role |
|---|---|---|
| Swarm Coordinator | Meta | Assigns work to worktrees, manages parallel execution |
| Issue Orchestrator | Meta | Decomposes issues into tasks, manages phase handoffs |
| Researcher | Research | Explores codebase, discovers patterns and dependencies |
| Architect | Planning | Designs implementation plan and service structure |
| Product Manager | Review | Validates use cases, scope, and user benefit |
| Designer | Review | Reviews API/UX design and consistency |
| Security Design | Review | Threat modeling, STRIDE analysis, auth review |
| CTO | Review | TDD readiness, codebase alignment, final approval |
| Coder | Implement | TDD implementation with coverage enforcement |
| Code Reviewer | Review | Collaborative or adversarial spec compliance |
| Security Auditor | Review | Vulnerability scanning, OWASP checks |
| PR Shepherd | Delivery | CI monitoring, comment handling, thread resolution |
| Knowledge Curator | Learning | Extracts learnings, updates knowledge base |
| Test Automator | Implement | Test generation and coverage enforcement |
| Metrics | Support | Analytics and weekly reports |
| SRE | Support | Infrastructure and performance |
| Slack Coordinator | Support | Notifications and human communication |
| Customer Service | Support | User support and triage |
Key Differentiators
Cross-Model Adversarial Review
A coding agent reviewing its own output has inherent bias. Metaswarm can delegate implementation and review tasks to external AI tools — OpenAI Codex CLI and Google Gemini CLI — with one rule: the writer is always reviewed by a different model.[1]
| Pattern | Description |
|---|---|
| Cross-Model Review | If Claude writes, Codex or Gemini reviews. If Codex writes, Claude or Gemini reviews. |
| Availability-Aware Escalation | Model A (2 tries) → Model B (2 tries) → Claude (1 try) → user alert |
| Shell Adapters | Each external tool has health checks, implement, and review commands |
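The two rules in the table — writer ≠ reviewer, and availability-aware escalation — can be sketched as a small piece of logic. This is an illustrative sketch, not Metaswarm's actual implementation; the function names and the boolean `tryModel` callback are assumptions made for the example:

```typescript
type Model = "claude" | "codex" | "gemini";

// Writer ≠ reviewer rule: pick a reviewer that is not the model that
// wrote the code, from whatever external tools are currently available.
function pickReviewer(writer: Model, available: Model[]): Model | null {
  return available.find((m) => m !== writer) ?? null;
}

// Availability-aware escalation, as described in the table:
// Model A (2 tries) → Model B (2 tries) → Claude (1 try) → user alert.
function escalate(
  tryModel: (m: Model) => boolean, // returns true if the model responded
  a: Model,
  b: Model,
): Model | "user-alert" {
  const plan: Array<[Model, number]> = [[a, 2], [b, 2], ["claude", 1]];
  for (const [model, tries] of plan) {
    for (let i = 0; i < tries; i++) {
      if (tryModel(model)) return model;
    }
  }
  return "user-alert"; // every model exhausted its retry budget
}
```

The escalation chain degrades gracefully: an outage in one external CLI costs two attempts, not the whole review.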
Blocking Quality Gates
The hardest problem is getting agents to maintain standards. Metaswarm implements blocking state transitions — there is no instruction path from FAIL to COMMIT.[1]
Three enforcement points, one config file (.coverage-thresholds.json):
| Gate | Mechanism |
|---|---|
| Pre-Push Hook | Husky git hook runs lint, typecheck, format, coverage before every push |
| CI Coverage Job | GitHub Actions workflow blocks merge on failure |
| Agent Completion Gate | Task completion checklist reads enforcement command from config |
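All three gates read from the same .coverage-thresholds.json file. Its exact schema is not documented in this review, so the example below is purely illustrative of what a shared-thresholds config might contain:

```json
{
  "global": { "lines": 90, "branches": 85, "functions": 90 },
  "perPath": { "src/auth/**": { "lines": 100 } },
  "enforceCommand": "npm run coverage:check"
}
```

The design point is that one file drives the pre-push hook, the CI job, and the agent checklist, so the thresholds cannot drift apart.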
Orchestrated Execution Loop
For complex tasks with written specs, every work unit runs through a 4-phase loop:
- Implement — Coding agent builds against spec using TDD
- Validate — Orchestrator runs tsc, eslint, vitest, coverage independently (never asks agent "did tests pass?")
- Adversarial Review — Fresh review agent checks each DoD item with file:line evidence. Binary PASS/FAIL.
- Commit — Only after adversarial PASS. If human checkpoint defined, system pauses for approval.
On failure: fix, re-validate, spawn a fresh reviewer (never the same one), retry up to 3 times before escalating.
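The loop above can be sketched as a small state machine. The hook names below are hypothetical, chosen only to mirror the four phases and the retry rules described in the text:

```typescript
// Per-work-unit loop: Implement → Validate → Adversarial Review → Commit,
// with up to 3 attempts and a fresh reviewer on each retry.
interface WorkUnitHooks {
  implement(attempt: number): void;              // coding agent builds against spec
  validate(): boolean;                           // orchestrator runs tsc/eslint/vitest itself
  freshReview(reviewerId: number): "PASS" | "FAIL"; // new reviewer each attempt
  commit(): void;
  escalateToHuman(): void;
}

function runWorkUnit(hooks: WorkUnitHooks, maxAttempts = 3): boolean {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    hooks.implement(attempt);
    if (!hooks.validate()) continue;       // validation failed: fix and re-validate
    // A fresh reviewer per attempt avoids anchoring on an earlier verdict.
    if (hooks.freshReview(attempt) === "PASS") {
      hooks.commit();                      // the ONLY path to COMMIT is a PASS
      return true;
    }
  }
  hooks.escalateToHuman();                 // retry budget exhausted
  return false;
}
```

Note that `commit()` is reachable only after a PASS, which is the "no path from FAIL to COMMIT" property expressed structurally rather than as an instruction the agent could ignore.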
Self-Improving Knowledge Base
Metaswarm maintains a JSONL knowledge base in your repo — patterns, gotchas, architectural decisions, anti-patterns. After every merged PR, the self-reflect workflow analyzes what happened and writes new entries.[1]
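A knowledge-base entry might look like the lines below. The field names are illustrative, not Metaswarm's actual schema; JSONL simply means one JSON object per line, which diffs and merges cleanly in git:

```json
{"type": "gotcha", "area": "testing", "note": "vitest restores mocks between files; re-stub in each suite"}
{"type": "anti-pattern", "area": "api", "note": "avoid returning raw DB rows from handlers; map to DTOs first"}
```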
Conversation introspection watches for signals:
- You repeated yourself → candidate for new skill or command
- You disagreed → captures your preferred approach for future alignment
- You did something manually → flags as workflow automation candidate
Technical Architecture
Dependencies
| Component | Purpose |
|---|---|
| Claude Code | Primary AI coding agent |
| BEADS CLI | Git-native, AI-first issue tracking (bd command) |
| Superpowers | Foundational agentic skills framework |
| GitHub CLI | PR and issue management |
| Node.js 18+ | Runtime for npm package |
Installation
```shell
cd your-project
npx metaswarm init
# In Claude Code: /project:metaswarm-setup
```
Claude detects language, framework, test runner, linter, CI — then installs and customizes everything automatically. Supports TypeScript, Python, Go, Rust, Java, Ruby, JavaScript.
GTG (Good-To-Go) Integration
Metaswarm integrates with GTG for deterministic merge readiness:[3]
- All CI checks passing
- All review comments addressed
- All discussion threads resolved
- Required approvals present
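Those four conditions reduce to a deterministic predicate. The sketch below is an assumption about the shape of the check, not GTG's actual data model or API:

```typescript
// Hypothetical PR status snapshot; field names are illustrative.
interface PrStatus {
  ciChecksPassing: boolean;
  openReviewComments: number;
  unresolvedThreads: number;
  approvals: number;
  requiredApprovals: number;
}

// "Good to go" only when every condition holds; there is no partial credit.
function goodToGo(pr: PrStatus): boolean {
  return (
    pr.ciChecksPassing &&
    pr.openReviewComments === 0 &&
    pr.unresolvedThreads === 0 &&
    pr.approvals >= pr.requiredApprovals
  );
}
```

Because the check is a pure function of observable PR state, the shepherd agent cannot talk itself into merging early.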
Strengths
| Strength | Description |
|---|---|
| Cross-model review | Eliminates single-model blind spots with writer ≠ reviewer rule |
| Enforced quality gates | No path from FAIL to COMMIT — agents cannot bypass coverage |
| Structured pipeline | 11 phases with clear handoffs, not ad-hoc prompting |
| Self-improving | Knowledge base grows from every PR, reduces repeated mistakes |
| Open source | Full visibility into agent definitions and workflows |
| Fresh reviewer rule | Spawns new reviewer on retry to prevent anchoring bias |
Weaknesses / Risks
| Risk | Description |
|---|---|
| BEADS dependency | Requires learning another tool (Steve Yegge's BEADS system) |
| Claude Code specific | Designed for Claude Code; not model-agnostic at the core |
| Complexity | 18 agents, 11 phases — significant learning curve |
| External tool setup | Cross-model review requires Codex CLI and/or Gemini CLI |
| New project | Less battle-tested than alternatives like Gastown |
Competitive Landscape
| Tool | Approach | Agents | Cross-Model | Quality Gates |
|---|---|---|---|---|
| Metaswarm | Structured pipeline | 18 | ✅ | ✅ Blocking |
| Gastown | Parallel execution | 7 roles | — | — |
| Ralph | Simple iteration loop | 1 | — | — |
| Pythagora | IDE-integrated | 14 | — | — |
Metaswarm sits between Gastown's raw parallelism and Ralph's simplicity, emphasizing structured workflows with enforced quality over maximum agent count.
For enterprise-grade orchestration with Jira integration, signed commits, and BYOK, evaluate Tembo.
Ideal Customer
Best fit:
- Teams using Claude Code who want structured, spec-driven development
- Projects requiring enforced test coverage and quality gates
- Developers who've been burned by agents that claim "tests pass" when they don't
- Those wanting cross-model review to catch blind spots
Not ideal for:
- Users wanting simple autonomous loops (use Ralph)
- Maximum parallel agents (use Gastown)
- IDE-integrated experience (use Pythagora)
- Enterprise air-gapped deployment (use Genie)
Bottom Line
Metaswarm represents a thoughtful approach to multi-agent orchestration: instead of maximizing parallelism, it maximizes trustworthiness. The cross-model adversarial review and blocking quality gates address the core problem that agents self-certify success even when things are broken.
The 18-agent, 11-phase pipeline is complex, but that complexity maps to real development workflow stages. For teams that want structured, spec-driven development with Claude Code, Metaswarm offers a well-designed alternative to ad-hoc prompting.
| Verdict | Assessment |
|---|---|
| Maturity | Early but actively maintained |
| Differentiation | Strong — cross-model review is unique |
| Best For | Quality-conscious Claude Code teams |
| Watch For | BEADS learning curve, external tool setup |
Research by Ry Walker Research • methodology
Disclosure: Author is CEO of Tembo, which offers agent orchestration as an alternative to Metaswarm.