
Expect

Expect lets coding agents test code changes in a real browser. One command scans the git diff, has an AI agent generate a test plan, and executes that plan against a live browser with Playwright. 952 stars, TypeScript, FSL-1.1-MIT.

Key takeaways

  • Bridges the gap between coding agents and browser-based validation — agents can now verify their own UI changes without human intervention
  • 952 stars in 2 weeks since launch (Mar 12, 2026). Built by the Million.js team (YC W24, $14.1M raised, 40k+ combined GitHub stars)
  • Cookie extraction from local browsers means tests run with real auth state — no fixture setup, no mock accounts, no manual login flows
  • FSL-1.1-MIT license (converts to MIT after 2 years) restricts competing commercial use — not true open source

FAQ

What is Expect?

A CLI that scans your git diff, sends it to an AI agent (Claude Code or Codex), generates a test plan, and executes it in a real browser via Playwright. You review the plan in a TUI, then the agent runs it.

How does Expect handle authentication for browser tests?

It extracts cookies from your local browser profiles (Chrome, Firefox, Safari) and injects them into Playwright. Tests run with your real login sessions, eliminating manual auth setup.

Is Expect open source?

It uses the Functional Source License (FSL-1.1-MIT), which prohibits competing commercial use but converts to MIT after 2 years. It is source-available but not OSI-approved open source.

How does Expect compare to Vercel agent-browser?

agent-browser is a general-purpose browser automation CLI for agents. Expect is specifically focused on testing code changes — it reads your git diff and generates a targeted test plan. Different scope, some overlapping infrastructure.

Overview

Expect is a CLI tool from Million Software that lets coding agents (Claude Code, Codex CLI, Cursor) automatically test code changes in a real browser. The workflow has four steps: you run expect in your terminal, it scans your unstaged changes or branch diff, an AI agent generates a test plan that you review in an interactive TUI, and the agent then executes each step against a live Playwright browser instance. Session recordings capture every run for replay.

Key stats: 952 stars, TypeScript, FSL-1.1-MIT license. Created March 12, 2026 — just 2 weeks old with active daily commits.

 Scan changes ──▶ Generate plan ──▶ Run in browser ──▶ Report
 (git diff)       (AI agent)        (Playwright)       (pass/fail)
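
The four stages in the diagram can be sketched as plain functions. This is a minimal illustration, not Expect's implementation: every name here (changedFiles, draftPlan, runPlan, report) is hypothetical, the planner is a stub standing in for the AI agent, and the executor is passed in where a Playwright-driven runner would go.

```typescript
// Hypothetical types and names; none of these are taken from Expect's source.
type PlanStep = { file: string; instruction: string };
type StepResult = { step: PlanStep; passed: boolean };

// Stage 1: collect changed file paths from unified diff text (as printed by `git diff`).
// "+++ b/<path>" names the new version of a file; deleted files show "+++ /dev/null".
function changedFiles(diff: string): string[] {
  const files: string[] = [];
  for (const line of diff.split("\n")) {
    const path = /^\+\+\+ b\/(.+)$/.exec(line)?.[1];
    if (path && files.indexOf(path) === -1) files.push(path);
  }
  return files;
}

// Stage 2 stand-in: the real tool asks an AI agent for the plan.
// This stub just emits one step per changed file.
function draftPlan(files: string[]): PlanStep[] {
  return files.map((file) => ({
    file,
    instruction: `Exercise the UI affected by ${file}`,
  }));
}

// Stage 3: run every step through a caller-supplied executor.
function runPlan(
  plan: PlanStep[],
  execute: (step: PlanStep) => boolean,
): StepResult[] {
  return plan.map((step) => ({ step, passed: execute(step) }));
}

// Stage 4: summarise the run as pass/fail.
function report(results: StepResult[]): { passed: number; failed: number; ok: boolean } {
  const failed = results.filter((r) => !r.passed).length;
  return { passed: results.length - failed, failed, ok: failed === 0 };
}

const sampleDiff = [
  "diff --git a/src/Button.tsx b/src/Button.tsx",
  "--- a/src/Button.tsx",
  "+++ b/src/Button.tsx",
  "@@ -1,3 +1,3 @@",
  "-old",
  "+new",
  "diff --git a/src/old.css b/src/old.css",
  "--- a/src/old.css",
  "+++ /dev/null",
].join("\n");

const sampleResults = runPlan(draftPlan(changedFiles(sampleDiff)), () => true);
console.log(report(sampleResults)); // { passed: 1, failed: 0, ok: true }
```

Keeping the executor as a parameter mirrors the separation the diagram implies: the planning side only needs to know whether each step passed, not how the browser was driven.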

Architecture

Expect is a pnpm monorepo with a clean separation of concerns:

  • expect-cli — Ink-based terminal UI (React for the terminal). Stateless renderer.
  • @expect/supervisor — Core orchestration. Owns all state management, agent lifecycle, and git operations.
  • @expect/agent — AI SDK providers wrapping Claude Code and Codex CLI as LanguageModelV3 implementations. Both work with Vercel AI SDK's generateText and streamText.
  • @expect/browser — Playwright automation with accessibility snapshots, ref-based interaction, and rrweb session recording.
  • @expect/cookies — Extracts cookies from local browser profile databases (Chrome, Firefox, Safari) for real auth state injection.
  • @expect/shared — Domain models and constants.
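
To make "accessibility snapshots" and "ref-based interaction" concrete, here is a small sketch of the general idea: walk an accessibility tree, give each node a short ref, print a text snapshot for the agent to read, and resolve a ref back to its node when the agent asks to act on it. The node shape, the "e1, e2, ..." ref scheme, and the function names are assumptions for illustration; the source does not describe Expect's actual snapshot format.

```typescript
// Hypothetical node shape; Expect's real snapshot format is not described in the source.
interface A11yNode {
  role: string;
  name: string;
  children?: A11yNode[];
}

interface Snapshot {
  text: string; // what an agent would read
  refs: { [ref: string]: A11yNode }; // lookup table from ref to node
}

// Depth-first walk that assigns refs e1, e2, ... in document order.
function snapshot(root: A11yNode): Snapshot {
  const refs: { [ref: string]: A11yNode } = {};
  const lines: string[] = [];
  let counter = 0;

  const visit = (node: A11yNode, indent: string): void => {
    counter += 1;
    const ref = `e${counter}`;
    refs[ref] = node;
    lines.push(`${indent}- ${node.role} "${node.name}" [ref=${ref}]`);
    for (const child of node.children ?? []) visit(child, indent + "  ");
  };

  visit(root, "");
  return { text: lines.join("\n"), refs };
}

// Resolve a ref chosen by the agent back to the node it points at.
function resolveRef(snap: Snapshot, ref: string): A11yNode {
  if (!Object.prototype.hasOwnProperty.call(snap.refs, ref)) {
    throw new Error(`Unknown ref: ${ref}`);
  }
  return snap.refs[ref];
}

const loginForm: A11yNode = {
  role: "form",
  name: "Login",
  children: [
    { role: "textbox", name: "Email" },
    { role: "button", name: "Sign in" },
  ],
};

const snap = snapshot(loginForm);
console.log(snap.text);
console.log(resolveRef(snap, "e3").name); // Sign in
```

The appeal of refs is that the agent can say "click e3" instead of inventing a CSS selector, and a stale or made-up ref fails loudly rather than matching the wrong element.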

The tech stack is notably opinionated: Effect-TS throughout the backend (not just for error handling — full service architecture with layers, scoped resources, and structured concurrency), React + Ink for the TUI, and Playwright for browser automation.


The most interesting technical detail is the cookie extraction layer. Instead of requiring test accounts or mock auth, Expect reads your actual browser's cookie databases and injects them into the Playwright context. This means when the agent navigates to your staging URL or localhost, it's already logged in as you.

This is a pragmatic shortcut that eliminates one of the biggest friction points in browser testing: auth setup. It's also the kind of thing that only makes sense in a dev-local context — you'd never want this in CI with shared credentials.
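
As a rough sketch of the injection half of this idea, the snippet below converts cookie rows into the object shape that Playwright's browserContext.addCookies() accepts and keeps only the cookies that apply to a target host. The StoredCookie row type and both function names are hypothetical, and the sketch assumes the values have already been read and decrypted; real browser stores differ per browser and operating system. The one browser-specific detail shown is Chrome's timestamp format, which counts microseconds from 1601-01-01 UTC rather than from the Unix epoch.

```typescript
// Hypothetical, already-decrypted row. This is not Expect's data model.
interface StoredCookie {
  host: string; // ".example.com" (domain cookie) or "example.com" (host-only)
  name: string;
  value: string;
  path: string;
  expiresChromeMicros: number; // microseconds since 1601-01-01 UTC, 0 = session cookie
  secure: boolean;
  httpOnly: boolean;
}

// Shape accepted by Playwright's browserContext.addCookies().
interface PlaywrightCookie {
  name: string;
  value: string;
  domain: string;
  path: string;
  expires: number; // Unix time in seconds, -1 for a session cookie
  httpOnly: boolean;
  secure: boolean;
}

// Seconds between 1601-01-01 and 1970-01-01.
const SECONDS_1601_TO_1970 = 11644473600;

function toPlaywrightCookie(row: StoredCookie): PlaywrightCookie {
  const expires =
    row.expiresChromeMicros === 0
      ? -1
      : Math.floor(row.expiresChromeMicros / 1000000) - SECONDS_1601_TO_1970;
  return {
    name: row.name,
    value: row.value,
    domain: row.host,
    path: row.path,
    expires,
    httpOnly: row.httpOnly,
    secure: row.secure,
  };
}

// A leading dot means the cookie also applies to subdomains.
function matchesHost(cookieHost: string, hostname: string): boolean {
  if (cookieHost.charAt(0) !== ".") return cookieHost === hostname;
  return (
    hostname === cookieHost.slice(1) ||
    hostname.slice(-cookieHost.length) === cookieHost
  );
}

function cookiesForHost(rows: StoredCookie[], hostname: string): PlaywrightCookie[] {
  return rows.filter((row) => matchesHost(row.host, hostname)).map(toPlaywrightCookie);
}

// With a real Playwright BrowserContext named `context`, the result would be injected as:
//   await context.addCookies(cookiesForHost(rows, "app.example.com"));
```

Filtering by host before injection keeps the test browser from carrying every session in the developer's profile, which matters given that these are real credentials.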

For CI, the tool supports headless mode with -y (skip plan review), exiting 0 on success and 1 on failure.


Company and Team

Million Software, Inc. (YC W24) was founded by Aiden Bai, who created Million.js at age 16 and later React Scan. Those projects have a combined 40k+ GitHub stars and are used in production by Airbnb, Robinhood, Perplexity, and Shopify.

The company has raised $14.1M across 2 rounds and has pivoted through several products:

  1. Million.js — React performance optimizer (the original product)
  2. React Scan — Browser devtool for finding slow React renders
  3. Same.new — AI-powered full-stack app builder (no-code)
  4. Ami — Desktop coding agent (current main product, per million.dev)
  5. Expect — Agent-driven browser testing (this tool)

Notable angel investors include Scott Wu (Cognition/Devin CEO), Amjad Masad (Replit CEO), Evan You (Vue.js creator), David Cramer (Sentry CEO), Paul Klein IV (Browserbase CEO), and others.

The company's LinkedIn page now reads "Ami — building the post-IDE." Expect appears to be either a standalone product or a component of the broader Ami vision.


Competitive Landscape

The agent-driven browser testing space is heating up:

Tool                     Approach                                                  Backing
-----------------------  --------------------------------------------------------  ---------------------------------
Expect                   Git-diff-aware test plan generation + execution           Million Software (YC W24, $14.1M)
Vercel agent-browser     General browser automation CLI for agents                 Vercel Labs
Stagehand (Browserbase)  AI browser automation framework with act/extract/observe  Browserbase
Browser Use              Python framework for agent browser automation             Open source (50k+ stars)
QA Wolf                  Managed E2E testing service with AI maintenance           $56M+ raised
Playwright MCP           Microsoft's official MCP server for Playwright            Microsoft

Expect's differentiator is the git-diff-to-test-plan pipeline. Other tools give agents browser access; Expect specifically answers "did my code changes break anything?" by reading the diff and generating targeted validation steps. This is a narrower but more opinionated scope.

Stagehand and Browser Use are more general frameworks for building agent-browser interactions. Vercel agent-browser is the closest competitor in the "CLI for coding agents" space but lacks the diff-aware test planning.


Licensing

FSL-1.1-MIT — the Functional Source License. This is not OSI-approved open source. It prohibits using the software in competing commercial products or services. After 2 years, it automatically converts to MIT.

This is a deliberate choice: the code is source-available for transparency and developer trust, but protected from direct competitors. Sentry, GitButler, and others take the same approach. For individual developers and internal use, it's effectively permissive. For anyone building a competing testing product, the code is off-limits until the two-year conversion, which for the initial March 2026 releases means 2028.
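
The conversion date is simple date arithmetic. The FSL text grants the MIT license on the second anniversary of the date a version is made available, so each release has its own conversion date. The helper below is a hypothetical illustration (the function name is made up), and it uses the repository creation date from this article as a stand-in for a release date.

```typescript
// Returns the second anniversary (UTC) of a release date given as YYYY-MM-DD.
// Hypothetical helper for illustration only.
function fslConversionDate(releasedOn: string): string {
  const released = new Date(`${releasedOn}T00:00:00Z`);
  if (isNaN(released.getTime())) {
    throw new Error(`Invalid date: ${releasedOn}`);
  }
  const converted = new Date(released.getTime());
  converted.setUTCFullYear(released.getUTCFullYear() + 2);
  return converted.toISOString().slice(0, 10);
}

console.log(fslConversionDate("2026-03-12")); // 2028-03-12
```

Because the clock runs per release, code shipped later in 2026 stays under the FSL restrictions later into 2028 than the initial commit does.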


Strengths

  • Solves a real problem. Coding agents generate code but can't verify UI changes. Expect closes that loop.
  • Strong founder pedigree. Aiden Bai has shipped multiple viral open-source projects. The team knows how to build developer tools that get adopted.
  • Smart architecture. Effect-TS for reliability, clean monorepo structure, proper separation between orchestration and rendering.
  • Cookie extraction is clever. Eliminates the auth setup problem that plagues browser testing.
  • Fast traction. 952 stars in 14 days suggests strong developer interest.

Weaknesses

  • Very early. 2 weeks old. Only 9 open issues. Limited real-world battle-testing.
  • Narrow agent support. Currently only Claude Code and Codex CLI. No Gemini, no open-source models.
  • FSL license may limit adoption. Developers and companies increasingly care about true open-source licensing.
  • Company has pivoted several times. Million.js to React Scan to Same.new to Ami to Expect — is this the one that sticks?
  • Cookie approach has limits. Only works locally with personal browser profiles. CI environments need a different auth strategy.

Relevance to Tembo

Expect validates a thesis that's central to Tembo's work: coding agents need to close the loop between generating code and verifying it works. The git-diff-aware approach is particularly interesting — instead of testing everything, test what changed. This is the kind of targeted validation that makes agent-driven development practical at scale.

The Effect-TS architecture is also worth noting as an emerging pattern in serious agent tooling — structured concurrency, typed errors, and service layers over ad-hoc async/await.


Research by Ry Walker Research