Key takeaways
- From Dave Sifry — 18 specialized agents coordinate through an 11-phase pipeline from GitHub issue to merged PR
- Cross-model adversarial review: Claude writes code, Codex or Gemini reviews it — eliminating single-model blind spots
- Enforced quality gates: blocking state transitions mean there is no path from FAIL to COMMIT — agents cannot bypass coverage enforcement
FAQ
What is Metaswarm?
Metaswarm is a multi-agent orchestration framework for Claude Code that coordinates 18 specialized agents through a structured development pipeline with TDD, adversarial review, and automated PR shepherding.
Who created Metaswarm?
Dave Sifry created Metaswarm. It builds on Steve Yegge's BEADS system and Jesse Vincent's Superpowers framework.
What makes Metaswarm different from other orchestrators?
Cross-model adversarial review (different model reviews than writes), blocking quality gates, coverage enforcement via pre-push hooks, and a Design Review Gate where 6 agents review plans in parallel.
Is Metaswarm production ready?
Metaswarm is actively maintained and available via npm. It requires BEADS CLI and is designed specifically for Claude Code users.
Executive Summary
Metaswarm is Dave Sifry's multi-agent orchestration framework for Claude Code, coordinating 18 specialized agents through an 11-phase pipeline from GitHub issue to merged PR.[1] Its distinguishing features are cross-model adversarial review (Claude writes, Codex or Gemini reviews) and blocking quality gates that prevent FAIL→COMMIT transitions. It is built on BEADS for git-native issue tracking and on Superpowers for foundational workflows.
| Attribute | Value |
|---|---|
| Creator | Dave Sifry |
| Type | Open Source |
| Package | npm (npx metaswarm init) |
| Language | Prompts + BEADS (Go) |
| Dependencies | Claude Code, BEADS CLI, Node.js 18+ |
The Problem Metaswarm Solves
Claude Code is good at writing code. It is not good at building and maintaining a production codebase.[1]
Shipping production code requires research, planning, security review, design review, tests, PR creation, CI monitoring, review comment handling, and closing the loop. That's nine distinct jobs. A single agent session cannot hold all of that context, and it cannot review its own work objectively.
The typical result: you become the orchestrator. You prime the agent, tell it what to build, review output, fix what it missed, create the PR, babysit CI, respond to review comments, and repeat. The agent is a fast typist, but you're still the project manager.
Product Overview
Metaswarm provides a full orchestration layer that breaks work into phases, assigns each to a specialist agent, iterates through multiple reviews, and coordinates handoffs through PR creation and shepherding.[2]
The 11-Phase Pipeline
| Phase | Description |
|---|---|
| 1. Research | Researcher agent explores codebase, finds patterns and dependencies |
| 2. Plan | Architect agent creates implementation plan with tasks |
| 3. Plan Validation | Pre-flight checklist: architecture, deps, API contracts, security |
| 4. Design Review Gate | PM, Architect, Designer, Security, UX, CTO review in parallel (6 agents) |
| 5. Decompose | Break plan into work units with DoD items, file scopes, dependency graph |
| 6. External Dependency Check | Identifies required API keys/credentials, prompts user |
| 7. Orchestrated Execution | Per work unit: Implement → Validate → Adversarial Review → Commit |
| 8. Final Review | Cross-unit integration check, full test suite, coverage enforcement |
| 9. PR Creation | Creates PR with structured description and test plan |
| 10. PR Shepherd | Monitors CI, handles review comments, resolves threads |
| 11. Close + Learn | Extracts learnings back into the knowledge base |
The 18 Agent Personas
| Agent | Phase | Role |
|---|---|---|
| Swarm Coordinator | Meta | Assigns work to worktrees, manages parallel execution |
| Issue Orchestrator | Meta | Decomposes issues into tasks, manages phase handoffs |
| Researcher | Research | Explores codebase, discovers patterns and dependencies |
| Architect | Planning | Designs implementation plan and service structure |
| Product Manager | Review | Validates use cases, scope, and user benefit |
| Designer | Review | Reviews API/UX design and consistency |
| Security Design | Review | Threat modeling, STRIDE analysis, auth review |
| CTO | Review | TDD readiness, codebase alignment, final approval |
| Coder | Implement | TDD implementation with coverage enforcement |
| Code Reviewer | Review | Collaborative or adversarial spec compliance |
| Security Auditor | Review | Vulnerability scanning, OWASP checks |
| PR Shepherd | Delivery | CI monitoring, comment handling, thread resolution |
| Knowledge Curator | Learning | Extracts learnings, updates knowledge base |
| Test Automator | Implement | Test generation and coverage enforcement |
| Metrics | Support | Analytics and weekly reports |
| SRE | Support | Infrastructure and performance |
| Slack Coordinator | Support | Notifications and human communication |
| Customer Service | Support | User support and triage |
Key Differentiators
Cross-Model Adversarial Review
A coding agent reviewing its own output has inherent bias. Metaswarm can delegate implementation and review tasks to external AI tools — OpenAI Codex CLI and Google Gemini CLI — with one rule: the writer is always reviewed by a different model.[1]
| Pattern | Description |
|---|---|
| Cross-Model Review | If Claude writes, Codex or Gemini reviews. If Codex writes, Claude or Gemini reviews. |
| Availability-Aware Escalation | Model A (2 tries) → Model B (2 tries) → Claude (1 try) → user alert |
| Shell Adapters | Each external tool has health checks, implement, and review commands |
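The two rules in the table — writer ≠ reviewer, and availability-aware escalation — can be sketched as a small piece of logic. This is an illustrative sketch, not Metaswarm's actual implementation; the function names and the boolean `tryModel` callback are assumptions made for the example:

```typescript
type Model = "claude" | "codex" | "gemini";

// Writer ≠ reviewer rule: pick a reviewer that is not the model that
// wrote the code, from whatever external tools are currently available.
function pickReviewer(writer: Model, available: Model[]): Model | null {
  return available.find((m) => m !== writer) ?? null;
}

// Availability-aware escalation, as described in the table:
// Model A (2 tries) → Model B (2 tries) → Claude (1 try) → user alert.
function escalate(
  tryModel: (m: Model) => boolean, // returns true if the model responded
  a: Model,
  b: Model,
): Model | "user-alert" {
  const plan: Array<[Model, number]> = [[a, 2], [b, 2], ["claude", 1]];
  for (const [model, tries] of plan) {
    for (let i = 0; i < tries; i++) {
      if (tryModel(model)) return model;
    }
  }
  return "user-alert"; // every model exhausted its retry budget
}
```

The escalation chain degrades gracefully: an outage in one external CLI costs two attempts, not the whole review.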
Blocking Quality Gates
The hardest problem is getting agents to maintain standards. Metaswarm implements blocking state transitions — there is no instruction path from FAIL to COMMIT.[1]
Three enforcement points, one config file (.coverage-thresholds.json):
| Gate | Mechanism |
|---|---|
| Pre-Push Hook | Husky git hook runs lint, typecheck, format, coverage before every push |
| CI Coverage Job | GitHub Actions workflow blocks merge on failure |
| Agent Completion Gate | Task completion checklist reads enforcement command from config |
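All three gates read from the same .coverage-thresholds.json file. Its exact schema is not documented in this review, so the example below is purely illustrative of what a shared-thresholds config might contain:

```json
{
  "global": { "lines": 90, "branches": 85, "functions": 90 },
  "perPath": { "src/auth/**": { "lines": 100 } },
  "enforceCommand": "npm run coverage:check"
}
```

The design point is that one file drives the pre-push hook, the CI job, and the agent checklist, so the thresholds cannot drift apart.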
Orchestrated Execution Loop
For complex tasks with written specs, every work unit runs through a 4-phase loop:
- Implement — Coding agent builds against spec using TDD
- Validate — Orchestrator runs tsc, eslint, vitest, coverage independently (never asks agent "did tests pass?")
- Adversarial Review — Fresh review agent checks each DoD item with file:line evidence. Binary PASS/FAIL.
- Commit — Only after adversarial PASS. If human checkpoint defined, system pauses for approval.
On failure: fix, re-validate, spawn a fresh reviewer (never the same one), retry up to 3 times before escalating.
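The loop above can be sketched as a small state machine. The hook names below are hypothetical, chosen only to mirror the four phases and the retry rules described in the text:

```typescript
// Per-work-unit loop: Implement → Validate → Adversarial Review → Commit,
// with up to 3 attempts and a fresh reviewer on each retry.
interface WorkUnitHooks {
  implement(attempt: number): void;              // coding agent builds against spec
  validate(): boolean;                           // orchestrator runs tsc/eslint/vitest itself
  freshReview(reviewerId: number): "PASS" | "FAIL"; // new reviewer each attempt
  commit(): void;
  escalateToHuman(): void;
}

function runWorkUnit(hooks: WorkUnitHooks, maxAttempts = 3): boolean {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    hooks.implement(attempt);
    if (!hooks.validate()) continue;       // validation failed: fix and re-validate
    // A fresh reviewer per attempt avoids anchoring on an earlier verdict.
    if (hooks.freshReview(attempt) === "PASS") {
      hooks.commit();                      // the ONLY path to COMMIT is a PASS
      return true;
    }
  }
  hooks.escalateToHuman();                 // retry budget exhausted
  return false;
}
```

Note that `commit()` is reachable only after a PASS, which is the "no path from FAIL to COMMIT" property expressed structurally rather than as an instruction the agent could ignore.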
Self-Improving Knowledge Base
Metaswarm maintains a JSONL knowledge base in your repo — patterns, gotchas, architectural decisions, anti-patterns. After every merged PR, the self-reflect workflow analyzes what happened and writes new entries.[1]
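A knowledge-base entry might look like the lines below. The field names are illustrative, not Metaswarm's actual schema; JSONL simply means one JSON object per line, which diffs and merges cleanly in git:

```json
{"type": "gotcha", "area": "testing", "note": "vitest restores mocks between files; re-stub in each suite"}
{"type": "anti-pattern", "area": "api", "note": "avoid returning raw DB rows from handlers; map to DTOs first"}
```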
Conversation introspection watches for signals:
- You repeated yourself → candidate for new skill or command
- You disagreed → captures your preferred approach for future alignment
- You did something manually → flags as workflow automation candidate
Technical Architecture
Dependencies
| Component | Purpose |
|---|---|
| Claude Code | Primary AI coding agent |
| BEADS CLI | Git-native, AI-first issue tracking (bd command) |
| Superpowers | Foundational agentic skills framework |
| GitHub CLI | PR and issue management |
| Node.js 18+ | Runtime for npm package |
Installation
```shell
cd your-project
npx metaswarm init
# In Claude Code: /project:metaswarm-setup
```
Claude detects language, framework, test runner, linter, CI — then installs and customizes everything automatically. Supports TypeScript, Python, Go, Rust, Java, Ruby, JavaScript.
GTG (Good-To-Go) Integration
Metaswarm integrates with GTG for deterministic merge readiness:[3]
- All CI checks passing
- All review comments addressed
- All discussion threads resolved
- Required approvals present
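Those four conditions reduce to a deterministic predicate. The sketch below is an assumption about the shape of the check, not GTG's actual data model or API:

```typescript
// Hypothetical PR status snapshot; field names are illustrative.
interface PrStatus {
  ciChecksPassing: boolean;
  openReviewComments: number;
  unresolvedThreads: number;
  approvals: number;
  requiredApprovals: number;
}

// "Good to go" only when every condition holds; there is no partial credit.
function goodToGo(pr: PrStatus): boolean {
  return (
    pr.ciChecksPassing &&
    pr.openReviewComments === 0 &&
    pr.unresolvedThreads === 0 &&
    pr.approvals >= pr.requiredApprovals
  );
}
```

Because the check is a pure function of observable PR state, the shepherd agent cannot talk itself into merging early.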
Strengths
| Strength | Description |
|---|---|
| Cross-model review | Eliminates single-model blind spots with writer ≠ reviewer rule |
| Enforced quality gates | No path from FAIL to COMMIT — agents cannot bypass coverage |
| Structured pipeline | 11 phases with clear handoffs, not ad-hoc prompting |
| Self-improving | Knowledge base grows from every PR, reduces repeated mistakes |
| Open source | Full visibility into agent definitions and workflows |
| Fresh reviewer rule | Spawns new reviewer on retry to prevent anchoring bias |
Weaknesses / Risks
| Risk | Description |
|---|---|
| BEADS dependency | Requires learning another tool (Steve Yegge's BEADS system) |
| Claude Code specific | Designed for Claude Code; not model-agnostic at the core |
| Complexity | 18 agents, 11 phases — significant learning curve |
| External tool setup | Cross-model review requires Codex CLI and/or Gemini CLI |
| New project | Less battle-tested than alternatives like Gastown |
Competitive Landscape
| Tool | Approach | Agents | Cross-Model | Quality Gates |
|---|---|---|---|---|
| Metaswarm | Structured pipeline | 18 | ✅ | ✅ Blocking |
| Gastown | Parallel execution | 7 roles | — | — |
| Ralph | Simple iteration loop | 1 | — | — |
| Pythagora | IDE-integrated | 14 | — | — |
Metaswarm sits between Gastown's raw parallelism and Ralph's simplicity, emphasizing structured workflows with enforced quality over maximum agent count.
For enterprise-grade orchestration with Jira integration, signed commits, and BYOK, evaluate Tembo.
Ideal Customer
Best fit:
- Teams using Claude Code who want structured, spec-driven development
- Projects requiring enforced test coverage and quality gates
- Developers who've been burned by agents that claim "tests pass" when they don't
- Those wanting cross-model review to catch blind spots
Not ideal for:
- Users wanting simple autonomous loops (use Ralph)
- Maximum parallel agents (use Gastown)
- IDE-integrated experience (use Pythagora)
- Enterprise air-gapped deployment (use Genie)
Bottom Line
Metaswarm represents a thoughtful approach to multi-agent orchestration: instead of maximizing parallelism, it maximizes trustworthiness. The cross-model adversarial review and blocking quality gates address the core problem that agents self-certify success even when things are broken.
The 18-agent, 11-phase pipeline is complex, but that complexity maps to real development workflow stages. For teams that want structured, spec-driven development with Claude Code, Metaswarm offers a well-designed alternative to ad-hoc prompting.
| Verdict | Assessment |
|---|---|
| Maturity | Early but actively maintained |
| Differentiation | Strong — cross-model review is unique |
| Best For | Quality-conscious Claude Code teams |
| Watch For | BEADS learning curve, external tool setup |
Research by Ry Walker Research • methodology
Disclosure: Author is CEO of Tembo, which offers agent orchestration as an alternative to Metaswarm.