
StrongDM Software Factory

StrongDM's radical approach to AI coding: no human-written code, no human review — code treated as opaque weights validated purely by behavior.

Key takeaways

  • Code must not be written by humans; code must not be reviewed by humans
  • Digital Twin Universe provides behavioral clones of third-party APIs for testing at scale
  • Target: $1,000/day in tokens per engineer as a productivity benchmark

FAQ

What is StrongDM Software Factory?

A non-interactive development system where specs and scenarios drive coding agents that write, test, and converge code without any human review or intervention.

What is the Digital Twin Universe?

Behavioral clones of third-party services (Okta, Jira, Slack, Google Docs) that enable testing at volumes exceeding production limits without rate limits or API costs.

How does StrongDM validate code without human review?

Through 'satisfaction testing' — probabilistic validation where LLMs judge whether observed trajectories through scenarios satisfy user expectations, similar to ML holdout sets.

Executive Summary

StrongDM's Software Factory represents the most radical publicly documented approach to AI coding agents. While Stripe requires human review, StrongDM has eliminated it entirely. Their charter: "Code must not be written by humans. Code must not be reviewed by humans." Code is treated as opaque weights — correctness is inferred from behavior, not inspection. A three-person team built the system in just three months.

Two major developments have followed the initial publication. First, StrongDM open-sourced Attractor (the factory's non-interactive coding agent, published as a natural-language spec, Apache-2.0, ~1,200 GitHub stars as of June 2026) and CXDB (its context store for agents). Second, on March 5, 2026, Delinea completed its acquisition of StrongDM (announced January 15, 2026; terms undisclosed), which places the factory's long-term direction under new ownership.

Attribute | Value
Company | StrongDM (acquired by Delinea, completed March 5, 2026)
Team Formed | July 14, 2025
Team Size | 3 engineers (as of Feb 2026 documentation)
Public Documentation | February 2026
Open-Source Releases | Attractor (nlspec), CXDB (Feb 2026)
Headquarters | Not disclosed

Product Overview

The Software Factory is a non-interactive development system where specifications and scenarios drive agents that write code, run validation harnesses, and converge toward working software without human intervention. By StrongDM's account, the catalyst was the October 2024 revision of Anthropic's Claude 3.5, which enabled "compounding correctness" in long-horizon agentic workflows, where earlier models accumulated errors over time.

Key Capabilities

Capability | Description
Non-interactive development | Code written and tested without human involvement
Digital Twin Universe (DTU) | Behavioral clones of 6+ third-party services
Satisfaction testing | Probabilistic LLM-judged validation against scenarios
Scenario holdouts | Test cases stored outside codebase like ML holdout sets

Philosophy

"Prior to this model improvement, iterative application of LLMs to coding tasks would accumulate errors of all imaginable varieties. The app or product would decay and ultimately 'collapse': death by a thousand cuts."

The October 2024 Claude 3.5 revision changed this equation — models began compounding correctness rather than error.


Technical Architecture

The system operates on four core principles, stated as koans:

  1. Why am I doing this? (implied: the model should be doing this instead)
  2. Code must not be written by humans
  3. Code must not be reviewed by humans
  4. If you haven't spent at least $1,000 on tokens today per human engineer, your software factory has room for improvement

Validation Loop

Seed (PRD, sentences, screenshot, existing code)
    ↓
Agent writes code (Cursor YOLO mode initially)
    ↓
Validation harness runs scenarios against DTU
    ↓
LLM-as-judge evaluates "satisfaction"
    ↓
Feedback fed back for self-correction
    ↓
Convergence (no human review)
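
The loop above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not StrongDM's implementation: the `Scenario` type, the `converge` function, the 0.95 threshold, and the round limit are all hypothetical, and the agent, harness, and judge are passed in as plain callables so that deterministic stubs can stand in for a coding agent, a DTU-backed harness, and an LLM judge.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Scenario:
    name: str
    expectation: str  # plain-language description of a satisfying outcome


# Hypothetical signatures; real versions would call a coding agent,
# a harness running against the DTU, and an LLM judge.
Agent = Callable[[str, List[str]], str]    # (seed, feedback) -> candidate code
Harness = Callable[[str, Scenario], str]   # (code, scenario) -> observed trajectory
Judge = Callable[[Scenario, str], bool]    # (scenario, trajectory) -> satisfied?


def converge(seed: str, scenarios: List[Scenario], agent: Agent, run: Harness,
             judge: Judge, threshold: float = 0.95,
             max_rounds: int = 10) -> Tuple[str, float, int]:
    """Iterate until the fraction of satisfied scenarios reaches the threshold."""
    if not scenarios:
        raise ValueError("at least one scenario is required")
    feedback: List[str] = []
    for round_no in range(1, max_rounds + 1):
        code = agent(seed, feedback)
        failures: List[str] = []
        for scenario in scenarios:
            trajectory = run(code, scenario)
            if not judge(scenario, trajectory):
                failures.append(f"{scenario.name}: observed {trajectory!r}")
        score = 1 - len(failures) / len(scenarios)
        if score >= threshold:
            return code, score, round_no
        feedback = failures  # fed back to the agent for self-correction
    raise RuntimeError(f"no convergence after {max_rounds} rounds")


# Deterministic stubs so the sketch runs without any model calls.
def stub_agent(seed: str, feedback: List[str]) -> str:
    return "fixed" if feedback else "buggy"


def stub_run(code: str, scenario: Scenario) -> str:
    return "ok" if code == "fixed" else "error"


def stub_judge(scenario: Scenario, trajectory: str) -> bool:
    return trajectory == "ok"
```

With the stubs, `converge("seed PRD", scenarios, stub_agent, stub_run, stub_judge)` fails every scenario in the first round, passes the failures back as feedback, and converges in the second. The feedback here carries only the observed trajectory; how much of a holdout scenario to reveal to the agent is a design choice the public documentation summarized here does not spell out.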

Digital Twin Universe

The DTU provides behavioral clones of third-party services the software depends on:

Service | Purpose
Okta | Identity/authentication testing
Jira | Issue tracking integration
Slack | Messaging integration
Google Docs | Document collaboration
Google Drive | File storage
Google Sheets | Spreadsheet operations

Why DTU matters: Testing against real APIs has limits — rate limits, API costs, abuse detection. DTU enables testing at volumes far exceeding production, testing failure modes that would be dangerous against live services, and running thousands of scenarios per hour.
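
To make the idea of a behavioral clone concrete, here is a minimal sketch of an in-memory twin for a chat-style service. Everything in it is an assumption for illustration: the `ChatTwin` class, its `post_message` and `history` methods, and the `TwinRateLimited` error are hypothetical names, not Slack's API and not StrongDM's DTU code. The point is only that a twin has no real rate limits or costs, and that failures can be injected on purpose.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List


class TwinRateLimited(Exception):
    """Simulated rate-limit failure injected by the twin."""


@dataclass
class ChatTwin:
    failure_rate: float = 0.0  # probability that a call fails on purpose
    seed: int = 0              # fixed seed keeps injected failures reproducible
    channels: Dict[str, List[str]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        self._rng = random.Random(self.seed)

    def post_message(self, channel: str, text: str) -> dict:
        # Inject a failure before touching state, as a real outage would.
        if self._rng.random() < self.failure_rate:
            raise TwinRateLimited(f"simulated rate limit on {channel}")
        self.channels.setdefault(channel, []).append(text)
        return {"ok": True, "channel": channel,
                "index": len(self.channels[channel]) - 1}

    def history(self, channel: str) -> List[str]:
        # Return a copy so callers cannot mutate the twin's state.
        return list(self.channels.get(channel, []))
```

A scenario can then post thousands of messages against `ChatTwin()` in a loop, or construct `ChatTwin(failure_rate=0.2)` to check how the software under test behaves when a fifth of its calls are rejected, something that would be costly or risky to provoke against a live service.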

Key Technical Details

Aspect | Detail
Development Style | Non-interactive ("grown software")
Initial Foundation | Cursor YOLO mode
Validation | LLM-as-judge satisfaction testing
Test Storage | Scenarios stored outside codebase (holdout sets)
Target Spend | $1,000/day/engineer in tokens
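
To put the $1,000/day target in annual terms, the arithmetic can be sketched as follows. The 250 working days per year is an assumption, not a figure from StrongDM; if the benchmark is read as applying to every calendar day, the totals would be correspondingly higher.

```python
def annual_token_spend(daily_usd: float = 1_000.0,
                       working_days: int = 250,
                       engineers: int = 1) -> float:
    """Annual token spend in USD under the assumptions passed in."""
    return daily_usd * working_days * engineers


per_engineer = annual_token_spend()            # one engineer, assumed 250 days
three_person = annual_token_spend(engineers=3)  # the documented team size
```

Under these assumptions the target works out to $250,000 per engineer per year, or $750,000 for a three-person team, which is the comparison behind the objection that token spend at this level can exceed salaries at many companies.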

Factory Outputs (as of June 2026)

The factory's products page lists three artifacts, two of which are public on GitHub:

Product | Description | Status
Attractor | Non-interactive coding agent structured as a graph of phases; published as a natural-language spec ("nlspec") that any modern coding agent can implement | Open source, Apache-2.0, ~1,200 stars (June 2026)
CXDB | Self-hosted context store for AI agents, offering branch-friendly conversation/tool-output storage with content-addressed deduplication | Open source (github.com/strongdm/cxdb)
StrongDM ID | Identity for humans, workloads, and AI agents with federated auth | Listed on products page

Notably, Attractor's repo contains a spec rather than an implementation — consistent with the factory's philosophy that code is disposable output. Community reimplementations have already appeared (e.g., Go ports built by pointing Claude Code at the spec).
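
The CXDB description above combines two ideas, branching and content-addressed deduplication, that a short sketch can make concrete. The code below is an illustration of those two general techniques only; the `ContextStore` class and its methods are hypothetical and are not CXDB's data model or API.

```python
import hashlib
from typing import Dict, List, Optional, Tuple


class ContextStore:
    """Toy store: turns form a tree, payloads are stored once per digest."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}                      # digest -> payload
        self.turns: Dict[int, Tuple[Optional[int], str]] = {}  # id -> (parent, digest)
        self._next_id = 1

    def append(self, parent: Optional[int], payload: bytes) -> int:
        if parent is not None and parent not in self.turns:
            raise KeyError(f"unknown parent turn {parent}")
        digest = hashlib.sha256(payload).hexdigest()
        self.blobs.setdefault(digest, payload)  # identical payloads share one blob
        turn_id = self._next_id
        self._next_id += 1
        self.turns[turn_id] = (parent, digest)
        return turn_id

    def history(self, turn_id: int) -> List[bytes]:
        """Walk parent links back to the root and return payloads in order."""
        out: List[bytes] = []
        cur: Optional[int] = turn_id
        while cur is not None:
            parent, digest = self.turns[cur]
            out.append(self.blobs[digest])
            cur = parent
        return out[::-1]
```

Appending two different turns under the same parent creates a branch, and because payloads are keyed by their SHA-256 digest, a tool output that recurs across branches is stored once no matter how many turns reference it.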


Strengths

  • No review bottleneck: code ships without human inspection, eliminating the slowest step in most workflows
  • Testing beyond API limits: DTU enables test volumes that rate limits, API costs, and abuse detection rule out against real APIs
  • ML-inspired validation: scenarios act as holdout sets, which makes it harder for agents to reward-hack the tests they are judged by
  • Third-party API coverage: behavioral clones handle integration complexity
  • Clear success metric: "$1,000/day in tokens per engineer" is concrete and measurable

Cautions

  • Requires validation investment — Building DTU took significant engineering effort; not every domain has clear third-party APIs to clone
  • Domain-specific fit — Works well for integration-heavy software (StrongDM's access management domain); unclear for other domains
  • Opaque code — Teams must accept not reading or understanding generated code — a cultural shift many organizations may resist
  • Novel approach — Less battle-tested than human-reviewed workflows; potential failure modes not yet discovered
  • Small team documented — 3-person AI team built the system; long-term maintenance and scalability at larger organizations unclear
  • Not for sale — This is internal methodology, not a product

Competitive Positioning

vs. Other In-House Agents

System | Differentiation
Stripe Minions | Minions require human review; Factory eliminates it
Ramp Inspect | Inspect uses traditional CI; Factory uses DTU + satisfaction
Traditional CI | CI tests can be reward-hacked; scenarios are holdouts

Philosophical Spectrum

StrongDM occupies the radical end of the human-review spectrum:

Approach | Human Review | Example
Conservative | Required | Stripe, Coinbase, Ramp
Moderate | Optional | Some internal systems
Radical | Eliminated | StrongDM

What Developers Say

The factory essay hit the Hacker News front page in February 2026 (304 points, 459 comments). Reaction was sharply divided.

Positive:

"They're the most ambitious team I've see [sic] exploring the limits of what you can do with this stuff. It's eye-opening." — simonw (Simon Willison), Hacker News

"This is one of the clearest takes I've seen that starts to get me to the point of possibly being able to trust code that I haven't reviewed." — japhyr, Hacker News, on scenario holdouts

"The Digital Twin Universe is the most interesting thing in this article and the part most people are glossing over... Their answer of keeping scenarios external to the codebase like a holdout set is smart." — Zakodiac, Hacker News

Skeptical:

"I can't tell if this is genius or terrifying given what their software does. Probably a bit of both. I wonder what the security teams at companies that use StrongDM will think about this." — CubsFan1060, Hacker News

"as a previous strongDM customer, i will never recommend their offering again. for a core security product, this is not the flex they think it is" — rileymichael, Hacker News

"At that point, outside of FAANG and their salaries, you are spending more on AI than you are on your humans." — codingdave, Hacker News, on the $1,000/day token benchmark

"You still have to have a human who knows the system to validate that the thing that was built matches the intent of the spec." — CuriouslyC, Hacker News

The split is notable: validation-infrastructure ideas (DTU, holdout scenarios) drew genuine technical admiration even from skeptics, while the "no human review" charter — for a security product — drew the harshest criticism.


Ideal Customer Profile

This is internal methodology, not a product for sale. However, the approach is worth studying if:

Good fit for similar approach:

  • Integration-heavy software with clear third-party dependencies
  • Team comfortable with opaque generated code
  • Mature test/scenario infrastructure already exists
  • Strong observability and behavioral monitoring
  • High token budget tolerance

Poor fit:

  • Domain without clear behavioral boundaries
  • Regulatory requirements for code review audit trails
  • Team culture requires understanding code before shipping
  • Limited observability infrastructure

Viability Assessment

Factor | Assessment
Documentation Quality | Good (detailed website + external coverage)
Replicability | Difficult (DTU requires significant investment)
Cultural Fit | Controversial (requires accepting opaque code)
Architecture Maturity | Early (publicly documented Feb 2026; Attractor spec still updated through March 2026)
External Validation | High (Simon Willison visit, Stanford Law coverage, HN front page, Ethan Mollick/Garry Tan attention)
Ownership Risk | New (Delinea acquisition completed March 5, 2026)

Simon Willison visited the team in October 2025 (three months after formation) and reported they already had working demos of the agent harness, DTU, and satisfaction testing framework. External coverage from Stanford Law and tech media suggests growing interest in the methodology; StrongDM's own February 19, 2026 blog post notes attention from Ethan Mollick, Garry Tan, and Willison.

The Delinea acquisition (announced January 15, 2026, weeks before the factory essay went public in February 2026, and completed March 5, 2026) is the major open question. Delinea positioned the deal around StrongDM's runtime authorization and AI-agent identity capabilities, not the Software Factory; whether the factory methodology continues, scales, or quietly winds down under new ownership cannot be determined from public information as of June 2026. The Attractor spec saw active updates through mid-March 2026.


Bottom Line

StrongDM Software Factory represents the most philosophically radical approach to AI-native development publicly documented. The core insight: if validation infrastructure is strong enough, human code review becomes unnecessary.

Key innovations:

  • Digital Twin Universe for integration testing at scale
  • Satisfaction testing as probabilistic, LLM-judged validation
  • Scenario holdouts preventing reward hacking
  • Attractor open-sourced as a natural-language spec — the methodology is now replicable by anyone with a coding agent

Recommended study for: Organizations exploring the limits of AI coding autonomy, teams building integration-heavy software, infrastructure engineers designing validation systems.

Not recommended for: Regulated industries requiring audit trails, teams uncomfortable with opaque code, organizations without significant observability investment.

Outlook: StrongDM's approach may prove too radical for most enterprises in the near term, but the DTU pattern — behavioral clones for testing — is likely to become standard practice regardless of human-review policies. The open-sourced Attractor spec means the factory pattern now spreads independently of StrongDM itself — important, since the Delinea acquisition (completed March 2026) leaves the in-house team's future direction unconfirmed as of June 2026.


Research by Ry Walker Research

Disclosure: Author is CEO of Tembo, which offers agent orchestration as an alternative to building in-house.