
Expect

Expect lets coding agents test code changes in a real browser. One command scans git diffs, generates a test plan via AI, and executes it against a live browser with Playwright. 3,501 stars, TypeScript, FSL-1.1-MIT.

Key takeaways

  • Bridges the gap between coding agents and browser-based validation — agents can now verify their own UI changes without human intervention
  • 3,501 stars as of June 2026, up from 952 two weeks after its March 12, 2026 launch. Built by the Million.js team (YC W24, $14.1M raised, 40k+ combined GitHub stars)
  • Cookie extraction from local browsers means tests run with real auth state — no fixture setup, no mock accounts, no manual login flows
  • Change validation, not regression testing — it tests what changed in the diff, not what might have broken elsewhere
  • FSL-1.1-MIT license (converts to MIT after 2 years) restricts competing commercial use — not true open source
  • Repo momentum has cooled — last push May 6, 2026 and last npm release (0.1.3) April 10, 2026, as the company's focus centers on Ami

FAQ

What is Expect?

A CLI that scans your git diff, sends it to an AI agent (Claude Code, Codex, or Cursor), generates a test plan, and executes it in a real browser via Playwright. You review the plan in a TUI, then the agent runs it.

How does Expect handle authentication for browser tests?

It extracts cookies from your local browser profiles (Chrome, Firefox, Safari) and injects them into Playwright. Tests run with your real login sessions, eliminating manual auth setup.

Is Expect open source?

It uses the Functional Source License (FSL-1.1-MIT), which prohibits competing commercial use but converts to MIT after 2 years. It is source-available but not OSI-approved open source.

How does Expect compare to Vercel agent-browser?

agent-browser is a general-purpose browser automation CLI for agents. Expect is specifically focused on testing code changes — it reads your git diff and generates a targeted test plan. Different scope, some overlapping infrastructure.

Does Expect replace a regression test suite or a QA agent?

No. Expect validates the changes in a PR or diff with ephemeral, AI-generated test plans. It does not maintain a persistent suite or crawl the whole app for side effects, which is where AI QA agents like Momentic, QA Wolf, Ranger, Bug0, and Passmark position themselves.

Overview

Expect is a CLI tool from Million Software that lets coding agents (Claude Code, Codex CLI, Cursor) automatically test code changes in a real browser. The workflow is straightforward: run expect in your terminal, it scans your unstaged changes or branch diff, an AI agent generates a test plan, you review it in an interactive TUI, then the agent executes each step against a live Playwright browser instance. Session recordings capture everything for replay.

Key stats (as of June 2026): 3,501 stars, 155 forks, TypeScript, FSL-1.1-MIT license. Created March 12, 2026; last push to the GitHub repo was May 6, 2026, and the latest npm release (expect-cli 0.1.3) shipped April 10, 2026 — early-launch velocity has cooled.

 Scan changes ──▶ Generate plan ──▶ Run in browser ──▶ Report
 (git diff)       (AI agent)        (Playwright)       (pass/fail)
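
The four stages in the diagram can be sketched as a small pipeline. This is an illustrative outline only: the type and function names (Stages, runPipeline, and so on) are hypothetical and not taken from Expect's codebase, and treating an empty diff as a pass is an assumption.

```typescript
// Hypothetical types: none of these names come from Expect's source.
type Step = { description: string };
type StepResult = { step: Step; passed: boolean };

interface Stages {
  scanChanges: () => Promise<string>;              // e.g. returns git diff output
  generatePlan: (diff: string) => Promise<Step[]>; // AI agent turns the diff into steps
  runStep: (step: Step) => Promise<boolean>;       // executes one step in a browser
}

async function runPipeline(
  stages: Stages
): Promise<{ results: StepResult[]; exitCode: 0 | 1 }> {
  const diff = await stages.scanChanges();
  // Assumption: with nothing changed there is nothing to validate.
  if (diff.trim() === "") return { results: [], exitCode: 0 };

  const plan = await stages.generatePlan(diff);
  const results: StepResult[] = [];
  for (const step of plan) {
    // Steps run one at a time because later steps may rely on earlier browser state.
    results.push({ step, passed: await stages.runStep(step) });
  }
  // Report: a single failed step fails the whole run.
  const exitCode = results.every((r) => r.passed) ? 0 : 1;
  return { results, exitCode };
}
```

Passing the stages in as functions keeps the sketch independent of any particular agent or browser driver; each stage can be stubbed when experimenting.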

Architecture

Expect is a pnpm monorepo with a clean separation of concerns:

  • expect-cli — Ink-based terminal UI (React for the terminal). Stateless renderer.
  • @expect/supervisor — Core orchestration. Owns all state management, agent lifecycle, and git operations.
  • @expect/agent — AI SDK providers wrapping Claude Code and Codex CLI as LanguageModelV3 implementations. Both work with Vercel AI SDK's generateText and streamText.
  • @expect/browser — Playwright automation with accessibility snapshots, ref-based interaction, and rrweb session recording.
  • @expect/cookies — Extracts cookies from local browser profile databases (Chrome, Firefox, Safari) for real auth state injection.
  • @expect/shared — Domain models and constants.

The tech stack is notably opinionated: Effect-TS throughout the backend (not just for error handling — full service architecture with layers, scoped resources, and structured concurrency), React + Ink for the TUI, and Playwright for browser automation.
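
The split between a state-owning supervisor and a stateless renderer can be pictured with a minimal sketch. It uses plain TypeScript rather than Effect-TS or Ink, and every name in it (RunState, Supervisor, render) is hypothetical, chosen only to illustrate the separation described above.

```typescript
// Hypothetical sketch: illustrative names, not Expect's actual packages or APIs.
type RunState =
  | { phase: "planning" }
  | { phase: "running"; done: number; total: number }
  | { phase: "finished"; passed: boolean };

// "Supervisor" side: the only place where state is stored and changed.
class Supervisor {
  private state: RunState = { phase: "planning" };
  private listeners: Array<(s: RunState) => void> = [];

  subscribe(listener: (s: RunState) => void): void {
    this.listeners.push(listener);
    listener(this.state); // hand over the current snapshot immediately
  }

  update(next: RunState): void {
    this.state = next;
    for (const listener of this.listeners) listener(next);
  }
}

// "Renderer" side: a pure function of the snapshot, keeping no state of its own.
function render(state: RunState): string {
  switch (state.phase) {
    case "planning":
      return "Generating test plan...";
    case "running":
      return `Running step ${state.done + 1} of ${state.total}`;
    case "finished":
      return state.passed ? "All steps passed" : "Some steps failed";
  }
}
```

Because render depends only on its argument, the same snapshots could drive a terminal UI, a log line, or a test assertion without touching the orchestration code.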


The most interesting technical detail is the cookie extraction layer. Instead of requiring test accounts or mock auth, Expect reads your actual browser's cookie databases and injects them into the Playwright context. This means when the agent navigates to your staging URL or localhost, it's already logged in as you.

This is a pragmatic shortcut that eliminates one of the biggest friction points in browser testing: auth setup. It's also the kind of thing that only makes sense in a dev-local context — you'd never want this in CI with shared credentials.
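
A rough sketch of the injection side follows, assuming the cookies have already been read and decrypted from a browser profile. The ExtractedCookie shape and both function names are hypothetical; the output fields follow the cookie objects that Playwright's browserContext.addCookies() accepts. Domain matching is simplified here and treats every cookie as valid for subdomains.

```typescript
// Hypothetical input shape: what an extractor might return after reading a profile.
interface ExtractedCookie {
  name: string;
  value: string;
  domain: string; // e.g. ".example.com" or "localhost"
  path: string;
  expiresUnixSeconds: number; // -1 marks a session cookie (assumption of this sketch)
  secure: boolean;
  httpOnly: boolean;
}

// Subset of the fields accepted by Playwright's browserContext.addCookies().
interface PlaywrightCookie {
  name: string;
  value: string;
  domain: string;
  path: string;
  expires?: number; // Unix time in seconds; omitted for session cookies
  secure: boolean;
  httpOnly: boolean;
}

// Simplified matching: ".example.com" and "example.com" both cover subdomains.
function domainMatches(cookieDomain: string, host: string): boolean {
  const bare = cookieDomain.startsWith(".") ? cookieDomain.slice(1) : cookieDomain;
  return host === bare || host.endsWith("." + bare);
}

// Keep only unexpired cookies that apply to the host under test.
function toPlaywrightCookies(
  all: ExtractedCookie[],
  host: string,
  nowSeconds: number
): PlaywrightCookie[] {
  return all
    .filter((c) => domainMatches(c.domain, host))
    .filter((c) => c.expiresUnixSeconds === -1 || c.expiresUnixSeconds > nowSeconds)
    .map((c) => ({
      name: c.name,
      value: c.value,
      domain: c.domain,
      path: c.path,
      secure: c.secure,
      httpOnly: c.httpOnly,
      ...(c.expiresUnixSeconds === -1 ? {} : { expires: c.expiresUnixSeconds }),
    }));
}
```

The resulting array could then be handed to context.addCookies(...) on a Playwright browser context before navigating. Filtering by host first limits how much of a personal profile ever reaches the automated browser.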

For CI, the tool supports headless mode with -y (skip plan review), exiting 0 on success and 1 on failure.
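
A CI wrapper might translate that exit status into a verdict along these lines. The function and type names are hypothetical, and handling anything other than 0 or 1 as a separate "did not finish" case is an assumption of this sketch, not documented Expect behavior.

```typescript
type Verdict = "pass" | "fail" | "did-not-finish";

// `status` mirrors what Node's child_process.spawnSync reports:
// a number, or null when the process was terminated by a signal.
function verdictFromExit(status: number | null): Verdict {
  if (status === 0) return "pass"; // documented: 0 on success
  if (status === 1) return "fail"; // documented: 1 on failure
  // Assumption: any other outcome (signal, timeout kill, unexpected code)
  // is reported separately so it is not mistaken for a real test failure.
  return "did-not-finish";
}

// Only a clean pass lets the change through in this sketch.
function shouldBlockMerge(verdict: Verdict): boolean {
  return verdict !== "pass";
}
```

In a Node script the status could come from spawnSync("expect", ["-y"]).status. Keeping crashes and stalls distinct from genuine failures makes flaky infrastructure easier to tell apart from broken UI changes.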


Company and Team

Million Software, Inc. (YC W24) was founded by Aiden Bai, who created Million.js at age 16 and later React Scan. Those projects have a combined 40k+ GitHub stars and are used in production by Airbnb, Robinhood, Perplexity, and Shopify.

The company has raised $14.1M across 2 rounds and has pivoted through several products:

  1. Million.js — React performance optimizer (the original product)
  2. React Scan — Browser devtool for finding slow React renders
  3. Same.new — AI-powered full-stack app builder (no-code)
  4. Ami — Desktop coding agent (current main product, per million.dev)
  5. Expect — Agent-driven browser testing (this tool)

Notable angel investors include Scott Wu (Cognition/Devin CEO), Amjad Masad (Replit CEO), Evan You (Vue.js creator), David Cramer (Sentry CEO), Paul Klein IV (Browserbase CEO), and others.

The company's LinkedIn page now reads "Ami — building the post-IDE." Expect appears to be either a standalone product or a component of the broader Ami vision.


Competitive Landscape

The agent-driven browser testing space is heating up:

Tool                    | Approach                                                   | Backing
------------------------|------------------------------------------------------------|-----------------------------------
Expect                  | Git-diff-aware test plan generation + execution            | Million Software (YC W24, $14.1M)
Vercel agent-browser    | General browser automation CLI for agents                  | Vercel Labs
Stagehand (Browserbase) | AI browser automation framework with act/extract/observe   | Browserbase
Browser Use             | Python framework for agent browser automation              | Open source (50k+ stars)
QA Wolf                 | Managed E2E testing service with AI maintenance            | $56M+ raised
Playwright MCP          | Microsoft's official MCP server for Playwright             | Microsoft

Expect's differentiator is the git-diff-to-test-plan pipeline. Other tools give agents browser access; Expect answers a narrower question, "does the change I just made actually work in the browser?", by reading the diff and generating targeted validation steps. The scope is narrower but more opinionated.

Stagehand and Browser Use are more general frameworks for building agent-browser interactions. Vercel agent-browser is the closest competitor in the "CLI for coding agents" space but lacks the diff-aware test planning.

Against the emerging AI QA agent category (Momentic, QA Wolf, Ranger, Bug0, Passmark), Expect sits at the opposite end of the testing lifecycle. Bug0's comparison is direct about the boundary: "Expect is designed for change validation, not regression testing. It tests what changed, not what might have broken elsewhere." Bug0 frames Expect and Passmark as "bookends of the testing lifecycle" — Expect validates a PR's diff inside Claude Code or Cursor with zero-config, ephemeral test plans that never land in your codebase, while QA agents maintain persistent suites and hunt for unintended side effects across the whole app.


Licensing

FSL-1.1-MIT — the Functional Source License. This is not OSI-approved open source. It prohibits using the software in competing commercial products or services. After 2 years, it automatically converts to MIT.

This is a deliberate choice: source-available for transparency and developer trust, but protected from direct competitors. Same approach used by Sentry, GitButler, and others. For individual developers and internal use, it's effectively permissive. For anyone building a competing testing product, it's off-limits until 2028.
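
The two-year conversion can be worked out with simple date arithmetic. The sketch below assumes the clock starts on the date a given version was published, so each release would convert on its own second anniversary; the function name is hypothetical and the dates are the ones cited in this article.

```typescript
// Returns the date (YYYY-MM-DD) two years after a release date.
// Assumption: the MIT conversion is counted per release from its publication date.
function mitConversionDate(releasedIso: string): string {
  const date = new Date(releasedIso + "T00:00:00Z");
  date.setUTCFullYear(date.getUTCFullYear() + 2);
  return date.toISOString().slice(0, 10);
}

// Repo created March 12, 2026
const firstCode = mitConversionDate("2026-03-12"); // "2028-03-12"
// expect-cli 0.1.3 released April 10, 2026
const latestRelease = mitConversionDate("2026-04-10"); // "2028-04-10"
```

Under that assumption the earliest code would become MIT-licensed in March 2028, with later releases following on their own anniversaries.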


Strengths

  • Solves a real problem. Coding agents generate code but can't verify UI changes. Expect closes that loop.
  • Strong founder pedigree. Aiden Bai has shipped multiple viral open-source projects. The team knows how to build developer tools that get adopted.
  • Smart architecture. Effect-TS for reliability, clean monorepo structure, proper separation between orchestration and rendering.
  • Cookie extraction is clever. Eliminates the auth setup problem that plagues browser testing.
  • Fast traction. 952 stars in 14 days, 3,501 by June 2026 — strong developer interest.
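
The traction figures can be turned into a rough daily rate. This is back-of-the-envelope arithmetic, assuming the repo started at zero stars on March 12, 2026 and that the 3,501 figure corresponds to the June 11, 2026 verification date; the helper names are hypothetical.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Whole days between two YYYY-MM-DD dates, computed in UTC.
function daysBetween(fromIso: string, toIso: string): number {
  const from = Date.parse(fromIso + "T00:00:00Z");
  const to = Date.parse(toIso + "T00:00:00Z");
  return Math.round((to - from) / DAY_MS);
}

// Average stars gained per day over a period, rounded to one decimal place.
function starsPerDay(
  startStars: number,
  endStars: number,
  fromIso: string,
  toIso: string
): number {
  const rate = (endStars - startStars) / daysBetween(fromIso, toIso);
  return Math.round(rate * 10) / 10;
}

// First 14 days after launch: 0 to 952 stars
const launchRate = starsPerDay(0, 952, "2026-03-12", "2026-03-26"); // 68
// Following 77 days: 952 to 3,501 stars
const laterRate = starsPerDay(952, 3501, "2026-03-26", "2026-06-11"); // 33.1
```

On these assumptions the average daily gain roughly halved after the first two weeks, which is still healthy growth but a clear step down from launch velocity.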

Weaknesses

  • Still early, and momentum has slowed. Launched March 2026; the repo's last push was May 6, 2026 and the last npm release was April 10, 2026. 22 open issues, including documented process leaks on macOS, silent stalls, and inconsistent cookie injection during auth flows.
  • Change validation only. It tests the diff, not the app — no persistent suite, no regression coverage.
  • Narrow agent support. Claude Code, Codex CLI, and Cursor. No Gemini, no open-source models.
  • FSL license may limit adoption. Developers and companies increasingly care about true open-source licensing; Bug0 calls FSL-1.1-MIT "more restrictive than MIT or Apache-2.0."
  • Company has pivoted several times. Million.js to React Scan to Same.new to Ami to Expect — is this the one that sticks? Million's flagship is Ami, and Expect's commit cadence suggests it's a side product.
  • Cookie approach has limits. Only works locally with personal browser profiles. CI environments need a different auth strategy.

What Developers Say

Expect launched on Product Hunt (109 upvotes, #8 product of the day) with a mix of enthusiasm and practical skepticism:

  • "One command to generate and run a full test plan. that's a massive time saver for solo devs shipping fast." — One9Founders
  • "This looks promising for catching bugs early! However, I'm curious how Expect handles complex user interactions or dynamic content." — Trydoff
  • "Agent-driven browser testing that actually runs in a real browser rather than a headless mock is interesting." — Mir Mubashshir, who also asked how Expect handles bot detection and CAPTCHA flows

Founder Aiden Bai's framing of the problem resonated more broadly than the tool itself: "Agents are bottlenecked because it needs YOU to test its changes." Notably, there is no substantive Hacker News or Reddit discussion of Expect as of June 2026 — community signal is concentrated on X and Product Hunt.


Relevance to Tembo

Expect validates a thesis that's central to Tembo's work: coding agents need to close the loop between generating code and verifying it works. The git-diff-aware approach is particularly interesting — instead of testing everything, test what changed. This is the kind of targeted validation that makes agent-driven development practical at scale.

The Effect-TS architecture is also worth noting as an emerging pattern in serious agent tooling — structured concurrency, typed errors, and service layers over ad-hoc async/await.


Bottom Line

Expect is the most opinionated take on agent self-verification: free, local, diff-scoped, and wired directly into Claude Code, Codex, and Cursor. Three months in, the idea has clearly landed (3.5k stars) but the project itself shows signs of back-burner status — a month-plus without a push and a months-old npm release at a company whose flagship is Ami.

Recommended for: solo devs and small teams using Claude Code/Cursor who want zero-config PR/diff validation in a real browser, with real auth state, before merging.

Not recommended for: teams that need regression coverage, persistent test suites, CI-grade auth, or a vendor with a support SLA — that's AI QA agent territory (Momentic, QA Wolf, Ranger, Bug0, Passmark).

Outlook: The diff-to-test-plan pattern will outlive the tool either way — expect coding agent platforms to absorb this capability natively. Whether Expect itself thrives depends on whether Million keeps investing in it alongside Ami; the May–June 2026 commit gap is the metric to watch.


Research by Ry Walker Research. Last verified June 11, 2026.