
Stripe Minions

Stripe's in-house coding agents, built on a Goose fork with a ~500-tool MCP server, produce 1,300+ merged PRs per week: the agents write the code unattended, and humans review it.

Key takeaways

  • 1,300+ pull requests merged per week with zero human-written code (up from 1,000+ at first disclosure)
  • Built on a fork of Block's open-source Goose agent
  • MCP server ('Toolshed') provides nearly 500 internal tools for context
  • 'Blueprints' orchestration primitive mixes deterministic workflow nodes with agent loops

FAQ

What is Stripe Minions?

Stripe Minions is Stripe's internal coding agent system. It writes code end-to-end and ships over 1,300 merged pull requests per week, with human review but no human-written code.

How do Stripe Minions work?

Engineers invoke agents via Slack, CLI, or web. Agents run in isolated pre-warmed 'devboxes' (standardized AWS EC2 instances), access nearly 500 internal tools via the Toolshed MCP server, and produce PR-ready code within 2 CI rounds.

What is Stripe Minions built on?

Minions are built on a fork of Block's open-source Goose coding agent, extended with deep internal tool integrations.

Executive Summary

Stripe Minions is the most detailed public case study of enterprise in-house coding agents. As of February 2026, the system produces over 1,300 merged pull requests per week (up from 1,000+ at first disclosure) — with humans reviewing the code but writing none of it. Built on a fork of Block's open-source Goose agent, Minions integrate deeply with Stripe's existing developer infrastructure through a central MCP server called "Toolshed" with nearly 500 internal tools. Per InfoQ, the code Minions touch supports more than $1 trillion in annual payment volume.

Attribute | Value
Company | Stripe
Type | Internal tool (not for sale)
Foundation | Goose fork (Block)
Public Documentation | February 2026 (Part 1: Feb 9; Part 2: Feb 19)
Headquarters | San Francisco, CA

Product Overview

Stripe Minions are fully unattended coding agents designed for "one-shot" tasks — given a ticket, bug report, or specification, they produce a complete pull request without human intervention during the coding process. The key innovation is deep integration with Stripe's existing developer tooling, allowing agents to work with the same context and tools as human engineers.

Key Capabilities

Capability | Description
End-to-end coding | Writes code from start to finish based on tickets/specs
Multi-surface invocation | Slack (primary), CLI, web UI, internal tool integrations
MCP context | Nearly 500 tools via central "Toolshed" server (docs, tickets, Sourcegraph)
Parallel execution | Multiple minions run simultaneously without git conflicts
Blueprints | Orchestration primitive: deterministic workflow nodes wrapped around agent loops
Shared rule files | Cursor-format rules synced across Cursor, Claude Code, and Minions
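
The Blueprints capability, deterministic workflow nodes wrapped around agent loops, can be illustrated with a minimal Python sketch. Stripe has not published the Blueprint API, so every name below (DeterministicNode, AgentNode, run_blueprint, fake_agent_turn) is hypothetical, and the agent turn is a stub standing in for a real model call.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

State = Dict[str, Any]


@dataclass
class DeterministicNode:
    """A fixed step that always runs the same code (hypothetical name)."""
    name: str
    run: Callable[[State], State]


@dataclass
class AgentNode:
    """A step that repeats an agent turn until done or out of budget."""
    name: str
    step: Callable[[State], State]  # one agent turn; stub for a model call
    max_turns: int = 5

    def run(self, state: State) -> State:
        for _ in range(self.max_turns):
            state = self.step(state)
            if state.get("done"):
                break
        return state


def run_blueprint(nodes: List[Any], state: State) -> State:
    """Run nodes in order and record which ones executed."""
    trace = []
    for node in nodes:
        state = node.run(state)
        trace.append(node.name)
    return {**state, "trace": trace}


def fake_agent_turn(state: State) -> State:
    # Stub: pretends the agent finishes its work on the second turn.
    turns = state.get("turns", 0) + 1
    return {**state, "turns": turns, "done": turns >= 2}


blueprint = [
    DeterministicNode("checkout", lambda s: {**s, "branch": "minion/" + s["ticket"]}),
    AgentNode("implement", fake_agent_turn),
    DeterministicNode("lint", lambda s: {**s, "linted": True}),
]
result = run_blueprint(blueprint, {"ticket": "T-1"})
print(result["trace"])
```

The point of the split is that steps with a known right answer (branch creation, linting) never consume agent turns, while the open-ended step gets a bounded loop.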

Product Surfaces

Surface | Description | Notes
Slack | Tag agent in any thread to invoke | Primary interface
CLI | Command-line for power users | Secondary
Web | Visibility and management | Administrative
Internal tools | Deep integration with ticketing, feature flags | Stripe-specific

Technical Architecture

Minions run on pre-warmed "devboxes" — standardized AWS EC2 instances that provision in 10 seconds. These environments are identical to what human engineers use, but isolated from production and the internet for security.
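
One common way to get a fast spin-up from slow-to-provision machines is a warm pool: boxes are provisioned ahead of demand and handed out from a queue. The sketch below illustrates that general technique only. Stripe's actual devbox scheduler is not public, and WarmPool, Devbox, and their fields are hypothetical names.

```python
from collections import deque
from dataclasses import dataclass
from itertools import count


@dataclass
class Devbox:
    box_id: int
    internet_access: bool = False    # the article says devboxes have no internet
    production_access: bool = False  # and no production access


class WarmPool:
    """Keeps `target_size` boxes ready so acquire() never waits on provisioning."""

    def __init__(self, target_size: int) -> None:
        self._ids = count(1)
        self._target = target_size
        self._ready: deque = deque()
        self._refill()

    def _provision(self) -> Devbox:
        # Stand-in for the slow step (booting an instance, cloning the repo).
        return Devbox(box_id=next(self._ids))

    def _refill(self) -> None:
        while len(self._ready) < self._target:
            self._ready.append(self._provision())

    def acquire(self) -> Devbox:
        # Hand out a ready box if one exists, then top the pool back up.
        box = self._ready.popleft() if self._ready else self._provision()
        self._refill()
        return box

    @property
    def ready(self) -> int:
        return len(self._ready)


pool = WarmPool(target_size=3)
first = pool.acquire()
second = pool.acquire()
print(first.box_id, second.box_id, pool.ready)
```

Because each run gets its own box, two agents never share a working tree, which is one plausible reading of how parallel runs avoid git conflicts.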

Architecture Flow

Invocation (Slack/CLI/Web)
    ↓
Devbox (isolated EC2 sandbox, 10s spin-up)
    ↓
MCP Server (Toolshed: ~500 tools)
    ↓
Agent Loop (Goose fork)
    ↓
Local lint (under 5s) → CI (max 2 rounds) → Auto-fix
    ↓
Pull Request (human review required)
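
The CI stage of this flow, capped at two rounds with fixes applied between them, can be sketched as a bounded retry loop. This is a minimal illustration under stated assumptions: run_ci_loop and its callbacks are hypothetical names, and the "needs_human" outcome when the cap is reached is an assumption, since the source states only the two-round limit.

```python
from typing import Callable, Dict, List

MAX_CI_ROUNDS = 2  # the cap reported in the article


def run_ci_loop(
    run_ci: Callable[[], List[str]],
    apply_fixes: Callable[[List[str]], None],
    max_rounds: int = MAX_CI_ROUNDS,
) -> Dict[str, object]:
    """Run CI up to `max_rounds` times, applying fixes between rounds."""
    failures: List[str] = []
    rounds = 0
    while rounds < max_rounds:
        rounds += 1
        failures = run_ci()  # returns a list of failure messages; empty means green
        if not failures:
            return {"status": "ready_for_review", "rounds": rounds}
        if rounds < max_rounds:
            apply_fixes(failures)  # no point fixing after the final round
    # Assumption: once the cap is hit, the run is handed to a person.
    return {"status": "needs_human", "rounds": rounds, "failures": failures}


# Fake CI that fails once, then passes after the fix.
ci_results = iter([["test_refund failed"], []])
fixes_applied: List[List[str]] = []
outcome = run_ci_loop(lambda: next(ci_results), fixes_applied.append)
print(outcome)

# Fake CI that never passes: the loop stops at the cap instead of retrying forever.
stuck = run_ci_loop(lambda: ["flaky_test failed"], lambda failures: None)
print(stuck["status"], stuck["rounds"])
```

Capping the loop bounds the CI cost of any single run; a task that cannot go green in two rounds is surfaced rather than retried indefinitely.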

Key Technical Details

Aspect | Detail
Foundation | Fork of Block's Goose agent, tailored for unattended runs
Execution | Pre-warmed devboxes: standardized AWS EC2 (10-second spin-up)
Context | Toolshed MCP server with nearly 500 tools
Orchestration | "Blueprints": deterministic workflow nodes combined with agent loops
CI Strategy | Local lint (under 5s), max 2 CI rounds, auto-apply fixes
Rules | Cursor rule format synced across Cursor, Claude Code, and Minions
Isolation | No production access, no internet

Strengths

  • Proven scale — 1,300+ PRs merged weekly (as of February 2026), validated in production at one of the world's most demanding codebases
  • Deep integration — MCP server provides agents the same context human engineers have (docs, tickets, build status, Sourcegraph search)
  • Fast iteration — 10-second devbox spin-up enables rapid feedback loops
  • Leverages existing infra — Uses same environments as human engineers, reducing agent-specific edge cases
  • Parallelization — Engineers run multiple Minions simultaneously without git worktree conflicts

Cautions

  • Requires massive codebase investment — Stripe has hundreds of millions of lines of code and years of devex tooling; this isn't replicable overnight
  • Human review bottleneck — Agents write code but don't merge; humans must still review everything
  • Stack-specific patterns — Ruby/Sorbet environment; some patterns may not transfer to other stacks
  • Dedicated team required — a dedicated internal team maintains the system full-time (team composition not publicly disclosed as of February 2026)
  • Not for sale — This is internal tooling, not a product you can buy

What Developers Say

Practitioner reaction on the Hacker News thread for Part 2 (131 points) centers on review quality and the lack of concrete examples:

"I am wondering if a lot of the human review is substantive or rubber stamping - as we see with long Pull Requests from humans. I know I would half-ass a review of a PR containing lots of robot code." — 3rodents

"Code reviews are also an educational moment for seniors teaching juniors as well as an opportunity for people who know a system to point out otherwise undocumented constraints of the system. If people slack on reviews with the agent it means these other externalities suffer." — fnord123

"The glass-half-full here is it's an incredible signal that one of the largest financial gateways in the world is able to do this with current capabilities. Personally, this is exciting." — trevorhinesley

"Where is the detail? Examples? Something concrete? ... Lots of generic statements everyone knows." — gas9S9zw3P9c

The recurring skeptical theme: Stripe has not published example PRs, so outsiders cannot judge whether the 1,300/week figure represents substantive changes or low-stakes mechanical work. No Stripe engineers responded in-thread as of June 2026.


Competitive Positioning

vs. Other In-House Agents

System | Differentiation
StrongDM Factory | Minions require human review; StrongDM eliminates it entirely
Ramp Inspect | Similar architecture; Minions have deeper MCP integration
Coinbase Claudebot | Minions built on Goose fork; Coinbase appears Anthropic-native

When to Reference Stripe Minions

  • Reference when: Building in-house agents at 1,000+ engineer scale
  • Consider alternatives when: Team under 100 engineers, no existing devex investment
  • Buy instead when: Need agents in weeks, not quarters (consider Tembo, Claude Code, Codex)

Ideal Customer Profile

This is internal tooling, not a product for sale. However, the architecture is worth studying if you are:

Good fit for similar build:

  • Engineering organization with 1,000+ engineers
  • Existing devex team (3+ engineers) that can be redirected
  • Codebase exceeding 10M LOC with proprietary frameworks
  • Mature internal tooling (ticketing, CI, monitoring)

Poor fit:

  • Teams under 100 engineers (buy instead)
  • Standard tech stack without proprietary frameworks
  • No existing devex investment
  • Need results in weeks, not quarters

Viability Assessment

Factor | Assessment
Documentation Quality | Excellent (detailed two-part public blog series)
Replicability | Difficult (requires massive devex investment)
Open Source Foundation | Yes (Goose is open source)
Architecture Maturity | High (production-validated at scale)
Transferability | Medium (principles transfer; implementation is Stripe-specific)

Stripe's detailed documentation makes Minions the reference architecture for enterprise in-house coding agents. While the specific implementation requires Stripe-level investment, the underlying patterns (MCP context, isolated sandboxes, CI integration, Slack invocation) are broadly applicable.


Bottom Line

Stripe Minions represent the gold standard for in-house coding agents at elite engineering organizations. The key insight: agents need the same context and tools as human engineers, not a bolted-on integration.

Key metrics: 1,300+ PRs/week (as of February 2026), 10-second devbox spin-up, nearly 500 MCP tools, max 2 CI rounds.

Architecture pattern: Slack invocation → isolated sandbox → MCP context → CI loop → human review → merge.

Recommended study for: Engineering leaders evaluating build vs. buy for coding agents, devex teams designing agent infrastructure.

Not recommended for: Small teams, organizations without existing devex investment, anyone expecting a product they can purchase.

Outlook: Stripe has defined the reference architecture and is iterating publicly — Part 2 (February 19, 2026) added Blueprints orchestration and cross-tool rule syncing. Expect commercial vendors (Tembo, Devin, etc.) to offer "Stripe Minions in a box" for organizations that can't build their own.


Research by Ry Walker Research • methodology

Disclosure: Author is CEO of Tembo, which offers agent orchestration as an alternative to building in-house.