Metaswarm

Metaswarm is Dave Sifry's open-source multi-agent orchestrator for Claude Code with 18 specialized agents, cross-model adversarial review, and enforced quality gates.

Key takeaways

  • Created by Dave Sifry: 18 specialized agents coordinate through an 11-phase pipeline from GitHub issue to merged PR
  • Cross-model adversarial review: Claude writes code, and Codex or Gemini reviews it, reducing single-model blind spots
  • Enforced quality gates: blocking state transitions leave no path from FAIL to COMMIT, so agents cannot bypass coverage enforcement

FAQ

What is Metaswarm?

Metaswarm is a multi-agent orchestration framework for Claude Code that coordinates 18 specialized agents through a structured development pipeline with TDD, adversarial review, and automated PR shepherding.

Who created Metaswarm?

Dave Sifry created Metaswarm. It builds on Steve Yegge's BEADS system and Jesse Vincent's Superpowers framework.

What makes Metaswarm different from other orchestrators?

Metaswarm combines cross-model adversarial review (the reviewer is always a different model from the writer), blocking quality gates, coverage enforcement via pre-push hooks, and a Design Review Gate where 6 agents review plans in parallel.

Is Metaswarm production ready?

Metaswarm is actively maintained and available via npm. It requires BEADS CLI and is designed specifically for Claude Code users.

Executive Summary

Metaswarm is Dave Sifry's multi-agent orchestration framework for Claude Code, coordinating 18 specialized agents through an 11-phase pipeline from GitHub issue to merged PR.[1] Its distinguishing features are cross-model adversarial review (Claude writes, Codex or Gemini reviews) and blocking quality gates that prevent FAIL→COMMIT transitions. It is built on BEADS for git-native issue tracking and Superpowers for foundational workflows.

Attribute | Value
Creator | Dave Sifry
Type | Open Source
Package | npm (npx metaswarm init)
Language | Prompts + BEADS (Go)
Dependencies | Claude Code, BEADS CLI, Node.js 18+

The Problem Metaswarm Solves

Claude Code is good at writing code. It is not good at building and maintaining a production codebase.[1]

Shipping production code requires research, planning, security review, design review, tests, PR creation, CI monitoring, review comment handling, and closing the loop. That's nine distinct jobs. A single agent session cannot hold all of that context, and it cannot review its own work objectively.

The typical result: you become the orchestrator. You prime the agent, tell it what to build, review output, fix what it missed, create the PR, babysit CI, respond to review comments, and repeat. The agent is a fast typist, but you're still the project manager.


Product Overview

Metaswarm provides a full orchestration layer that breaks work into phases, assigns each to a specialist agent, iterates through multiple reviews, and coordinates handoffs through PR creation and shepherding.[2]

The 11-Phase Pipeline

Phase | Description
1. Research | Researcher agent explores codebase, finds patterns and dependencies
2. Plan | Architect agent creates implementation plan with tasks
3. Plan Validation | Pre-flight checklist: architecture, deps, API contracts, security
4. Design Review Gate | PM, Architect, Designer, Security, UX, CTO review in parallel (6 agents)
5. Decompose | Break plan into work units with DoD items, file scopes, dependency graph
6. External Dependency Check | Identifies required API keys/credentials, prompts user
7. Orchestrated Execution | Per work unit: Implement → Validate → Adversarial Review → Commit
8. Final Review | Cross-unit integration check, full test suite, coverage enforcement
9. PR Creation | Creates PR with structured description and test plan
10. PR Shepherd | Monitors CI, handles review comments, resolves threads
11. Close + Learn | Extracts learnings back into the knowledge base
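
The strict ordering of the phases above can be modeled as an ordered enumeration. This is a minimal Python sketch for illustration, not Metaswarm's actual implementation; the `Phase` enum and the `next_phase` helper are hypothetical names that do not come from the project.

```python
from enum import IntEnum
from typing import Optional


class Phase(IntEnum):
    """The 11 pipeline phases, numbered as in the table (names are illustrative)."""
    RESEARCH = 1
    PLAN = 2
    PLAN_VALIDATION = 3
    DESIGN_REVIEW_GATE = 4
    DECOMPOSE = 5
    EXTERNAL_DEPENDENCY_CHECK = 6
    ORCHESTRATED_EXECUTION = 7
    FINAL_REVIEW = 8
    PR_CREATION = 9
    PR_SHEPHERD = 10
    CLOSE_AND_LEARN = 11


def next_phase(current: Phase) -> Optional[Phase]:
    """Return the phase that follows `current`, or None after the last phase."""
    if current is Phase.CLOSE_AND_LEARN:
        return None
    return Phase(current + 1)
```

For example, `next_phase(Phase.PLAN)` returns `Phase.PLAN_VALIDATION`, and `next_phase(Phase.CLOSE_AND_LEARN)` returns `None`.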

The 18 Agent Personas

Agent | Phase | Role
Swarm Coordinator | Meta | Assigns work to worktrees, manages parallel execution
Issue Orchestrator | Meta | Decomposes issues into tasks, manages phase handoffs
Researcher | Research | Explores codebase, discovers patterns and dependencies
Architect | Planning | Designs implementation plan and service structure
Product Manager | Review | Validates use cases, scope, and user benefit
Designer | Review | Reviews API/UX design and consistency
Security Design | Review | Threat modeling, STRIDE analysis, auth review
CTO | Review | TDD readiness, codebase alignment, final approval
Coder | Implement | TDD implementation with coverage enforcement
Code Reviewer | Review | Collaborative or adversarial spec compliance
Security Auditor | Review | Vulnerability scanning, OWASP checks
PR Shepherd | Delivery | CI monitoring, comment handling, thread resolution
Knowledge Curator | Learning | Extracts learnings, updates knowledge base
Test Automator | Implement | Test generation and coverage enforcement
Metrics | Support | Analytics and weekly reports
SRE | Support | Infrastructure and performance
Slack Coordinator | Support | Notifications and human communication
Customer Service | Support | User support and triage

Key Differentiators

Cross-Model Adversarial Review

A coding agent reviewing its own output has inherent bias. Metaswarm can delegate implementation and review tasks to external AI tools — OpenAI Codex CLI and Google Gemini CLI — with one rule: the writer is always reviewed by a different model.[1]

Pattern | Description
Cross-Model Review | If Claude writes, Codex or Gemini reviews. If Codex writes, Claude or Gemini reviews.
Availability-Aware Escalation | Model A (2 tries) → Model B (2 tries) → Claude (1 try) → user alert
Shell Adapters | Each external tool has health checks, implement, and review commands
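
The writer-is-never-the-reviewer rule and the escalation chain above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Metaswarm's code: the model identifiers, `eligible_reviewers`, and `run_with_escalation` are hypothetical names, and `attempt` stands in for whatever shell adapter actually invokes a tool.

```python
from typing import Callable, List, Optional

# Hypothetical identifiers for the three tools named in the article.
MODELS = ("claude", "codex", "gemini")


def eligible_reviewers(writer: str) -> List[str]:
    """Return every model except the one that wrote the code."""
    if writer not in MODELS:
        raise ValueError(f"unknown model: {writer}")
    return [m for m in MODELS if m != writer]


def run_with_escalation(
    model_a: str,
    model_b: str,
    attempt: Callable[[str], bool],
) -> Optional[str]:
    """Try Model A twice, Model B twice, then Claude once.

    Returns the model that succeeded, or None when every try failed,
    in which case the caller is expected to alert the user.
    """
    chain = [(model_a, 2), (model_b, 2), ("claude", 1)]
    for model, max_tries in chain:
        for _ in range(max_tries):
            if attempt(model):
                return model
    return None
```

Here `eligible_reviewers("claude")` returns `["codex", "gemini"]`, matching the first row of the table.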

Blocking Quality Gates

The hardest problem is getting agents to maintain standards. Metaswarm implements blocking state transitions — there is no instruction path from FAIL to COMMIT.[1]

Three enforcement points, one config file (.coverage-thresholds.json):

Gate | Mechanism
Pre-Push Hook | Husky git hook runs lint, typecheck, format, coverage before every push
CI Coverage Job | GitHub Actions workflow blocks merge on failure
Agent Completion Gate | Task completion checklist reads enforcement command from config
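
"No instruction path from FAIL to COMMIT" reads like a state machine whose transition table simply lacks that edge. The Python sketch below illustrates the idea; the state names and the `transition` helper are assumptions made for this example, since the article does not document how Metaswarm encodes its states.

```python
from enum import Enum


class State(Enum):
    # Hypothetical state names, chosen for this illustration only.
    IMPLEMENTED = "implemented"
    PASS = "pass"
    FAIL = "fail"
    COMMITTED = "committed"


# Allowed transitions. FAIL can only lead back to IMPLEMENTED (fix and retry);
# there is deliberately no FAIL -> COMMITTED entry.
ALLOWED = {
    State.IMPLEMENTED: {State.PASS, State.FAIL},
    State.PASS: {State.COMMITTED},
    State.FAIL: {State.IMPLEMENTED},
    State.COMMITTED: set(),
}


def transition(current: State, target: State) -> State:
    """Move to `target`, or raise if the transition table has no such edge."""
    if target not in ALLOWED[current]:
        raise ValueError(f"blocked transition: {current.name} -> {target.name}")
    return target
```

Because the gate is expressed as data rather than as an instruction the agent could reinterpret, a failed unit has to go back through implementation and validation before it can reach a commit.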

Orchestrated Execution Loop

For complex tasks with written specs, every work unit runs through a 4-phase loop:

  1. Implement — Coding agent builds against spec using TDD
  2. Validate — Orchestrator runs tsc, eslint, vitest, coverage independently (never asks agent "did tests pass?")
  3. Adversarial Review — Fresh review agent checks each DoD item with file:line evidence. Binary PASS/FAIL.
  4. Commit — Only after adversarial PASS. If human checkpoint defined, system pauses for approval.

On failure: fix, re-validate, spawn a fresh reviewer (never the same one), retry up to 3 times before escalating.
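
The loop and its failure handling can be sketched as follows. This is a simplified Python illustration, not Metaswarm's implementation: all function and parameter names are hypothetical, the optional human checkpoint is omitted, and "retry up to 3 times" is interpreted here as three attempts in total, which is an assumption.

```python
from dataclasses import dataclass
from typing import Callable

MAX_ATTEMPTS = 3  # assumption: "retry up to 3 times" means 3 attempts in total


@dataclass
class Result:
    committed: bool
    attempts: int


def run_work_unit(
    implement: Callable[[int], None],
    validate: Callable[[], bool],
    spawn_reviewer: Callable[[], Callable[[], bool]],
    commit: Callable[[], None],
) -> Result:
    """Run Implement -> Validate -> Adversarial Review -> Commit for one unit."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        implement(attempt)         # build on the first pass, fix on later ones
        if not validate():         # orchestrator runs the checks itself
            continue
        review = spawn_reviewer()  # a fresh reviewer on every attempt
        if review():               # binary PASS/FAIL
            commit()               # reached only after an adversarial PASS
            return Result(committed=True, attempts=attempt)
    # All attempts failed; the caller is expected to escalate.
    return Result(committed=False, attempts=MAX_ATTEMPTS)
```

Passing `validate` and `spawn_reviewer` in as callables keeps the verdicts outside the coding agent's control, which mirrors the rule that the orchestrator never asks the agent whether tests passed.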

Self-Improving Knowledge Base

Metaswarm maintains a JSONL knowledge base in your repo — patterns, gotchas, architectural decisions, anti-patterns. After every merged PR, the self-reflect workflow analyzes what happened and writes new entries.[1]

Conversation introspection watches for signals:

  • You repeated yourself → candidate for new skill or command
  • You disagreed → captures your preferred approach for future alignment
  • You did something manually → flags as workflow automation candidate
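
A JSONL knowledge base stores one JSON object per line, so new learnings can be appended without rewriting the file. The Python sketch below shows that pattern only; the field names `kind` and `summary` and both helper functions are hypothetical, because the article does not describe Metaswarm's actual entry schema or file location.

```python
import json
from pathlib import Path
from typing import Dict, Iterator


def append_entry(path: Path, kind: str, summary: str) -> None:
    """Append one knowledge entry as a single JSON line."""
    entry = {"kind": kind, "summary": summary}  # hypothetical schema
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")


def read_entries(path: Path) -> Iterator[Dict[str, str]]:
    """Yield entries in the order they were written, skipping blank lines."""
    with path.open(encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)
```

Append-only, line-oriented files also tend to produce small, readable diffs in git, which fits a knowledge base that lives in the repo.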

Technical Architecture

Dependencies

Component | Purpose
Claude Code | Primary AI coding agent
BEADS CLI | Git-native, AI-first issue tracking (bd command)
Superpowers | Foundational agentic skills framework
GitHub CLI | PR and issue management
Node.js 18+ | Runtime for npm package

Installation

cd your-project
npx metaswarm init
# In Claude Code: /project:metaswarm-setup

Claude detects the language, framework, test runner, linter, and CI setup, then installs and customizes everything automatically. Supported languages are TypeScript, Python, Go, Rust, Java, Ruby, and JavaScript.

GTG (Good-To-Go) Integration

Metaswarm integrates with GTG for deterministic merge readiness:[3]

  • All CI checks passing
  • All review comments addressed
  • All discussion threads resolved
  • Required approvals present
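
"Deterministic" here means merge readiness is a pure function of the four conditions above, with no model judgment involved. The Python sketch below illustrates that idea and is not GTG's API; `PullRequestStatus`, `blocking_reasons`, and `good_to_go` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PullRequestStatus:
    """The four conditions listed above (field names are illustrative)."""
    ci_checks_passing: bool
    review_comments_addressed: bool
    threads_resolved: bool
    required_approvals_present: bool


def blocking_reasons(status: PullRequestStatus) -> List[str]:
    """Return a human-readable reason for every unmet condition."""
    reasons = []
    if not status.ci_checks_passing:
        reasons.append("CI checks failing")
    if not status.review_comments_addressed:
        reasons.append("review comments not addressed")
    if not status.threads_resolved:
        reasons.append("discussion threads unresolved")
    if not status.required_approvals_present:
        reasons.append("required approvals missing")
    return reasons


def good_to_go(status: PullRequestStatus) -> bool:
    """A PR is merge-ready only when nothing is blocking it."""
    return not blocking_reasons(status)
```

Returning the list of reasons rather than a bare boolean gives a PR-shepherding agent something concrete to act on next.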

Strengths

Strength | Description
Cross-model review | Eliminates single-model blind spots with writer ≠ reviewer rule
Enforced quality gates | No path from FAIL to COMMIT; agents cannot bypass coverage
Structured pipeline | 11 phases with clear handoffs, not ad-hoc prompting
Self-improving | Knowledge base grows from every PR, reduces repeated mistakes
Open source | Full visibility into agent definitions and workflows
Fresh reviewer rule | Spawns new reviewer on retry to prevent anchoring bias

Weaknesses / Risks

Risk | Description
BEADS dependency | Requires learning another tool (Steve Yegge's BEADS system)
Claude Code specific | Designed for Claude Code; not model-agnostic at the core
Complexity | 18 agents, 11 phases: significant learning curve
External tool setup | Cross-model review requires Codex CLI and/or Gemini CLI
New project | Less battle-tested than alternatives like Gastown

Competitive Landscape

Tool | Approach | Agents | Cross-Model | Quality Gates
Metaswarm | Structured pipeline | 18 | ✅ | Blocking
Gastown | Parallel execution | 7 roles | |
Ralph | Simple iteration loop | 1 | |
Pythagora | IDE-integrated | 14 | |

Metaswarm sits between Gastown's raw parallelism and Ralph's simplicity, emphasizing structured workflows with enforced quality over maximum agent count.

For enterprise-grade orchestration with Jira integration, signed commits, and BYOK, evaluate Tembo.


Ideal Customer

Best fit:

  • Teams using Claude Code who want structured, spec-driven development
  • Projects requiring enforced test coverage and quality gates
  • Developers who've been burned by agents that claim "tests pass" when they don't
  • Those wanting cross-model review to catch blind spots

Not ideal for:

  • Users wanting simple autonomous loops (use Ralph)
  • Teams that need maximum parallel agents (use Gastown)
  • Developers who want an IDE-integrated experience (use Pythagora)
  • Enterprises that need air-gapped deployment (use Genie)

Bottom Line

Metaswarm represents a thoughtful approach to multi-agent orchestration: instead of maximizing parallelism, it maximizes trustworthiness. The cross-model adversarial review and blocking quality gates address the core problem that agents self-certify success even when things are broken.

The 18-agent, 11-phase pipeline is complex, but that complexity maps to real development workflow stages. For teams that want structured, spec-driven development with Claude Code, Metaswarm offers a well-designed alternative to ad-hoc prompting.

Verdict | Assessment
Maturity | Early but actively maintained
Differentiation | Strong: cross-model review is unique
Best For | Quality-conscious Claude Code teams
Watch For | BEADS learning curve, external tool setup

Research by Ry Walker Research • methodology

Disclosure: Author is CEO of Tembo, which offers agent orchestration as an alternative to Metaswarm.