Spur | Ry Walker Research

Key takeaways

$4.5M seed led by First Round Capital in April 2025, with Pear VC, Neo, Conviction, and Liquid2 participating, raised on the strength of 30+ enterprise customers landed in under a year after YC Summer 2024
Started as a horizontal "AI QA engineer" and verticalized into e-commerce QA — five purpose-built agents (functional, exploratory, UI/UX, localization, AI-feature testing) tuned for storefront realities like pop-ups, stock changes, and promotions
A 2026 MCP launch inverts the interface: ChatGPT, Claude, and Copilot can write, run, and analyze Spur tests, making the QA agent a tool other agents call
No public pricing and no independent community footprint — every impressive number (20X faster releases, 80% fewer false positives) is vendor-stated

FAQ

What is Spur?

Spur is an AI QA platform whose browser agents plan, execute, and report tests described in plain English across web and native mobile, with a product focus on e-commerce release testing.

How much does Spur cost?

Pricing is not publicly listed; engagements are quote-based and begin with a pilot program.

Who founded Spur?

Sneha Sivakumar (CEO, previously growth engineering at Figma and Snap) and Anushka Nijhawan (CTO, previously at DeepMind and Meta), Yale classmates who researched web agents in an NLP lab before founding the company.

How is Spur different from Momentic?

Both replace scripted browser tests with natural-language AI testing, but Momentic sells horizontally to engineering teams with public pricing and a CLI-first workflow, while Spur verticalized into e-commerce with merchandising-aware agents and quote-based enterprise deals.

Executive Summary

Spur sells an AI QA engineer: multimodal browser agents that take test intent in plain English — "add to cart," "apply a promo code" — then plan, execute, and report the tests across web and native mobile, adapting to the UI changes that break scripted Selenium and Playwright suites.^[1]^[2] The company launched out of Y Combinator's Summer 2024 batch as a horizontal natural-language testing tool and has since verticalized: the current tagline is "Release Faster with Agentic QA" for e-commerce, with five purpose-built agents (functional, exploratory, UI/UX, localization, and AI-feature testing) tuned for storefront-specific variables like pop-ups, stock status, and promotions.^[2]

The bet attracted real backing fast. Spur closed a $4.5M round led by First Round Capital in April 2025 — with Pear VC, Neo, Conviction, Liquid2 Ventures, and angels from Figma, OpenAI, Rippling, and Dropbox participating — on the strength of 30+ enterprise customers, including Fortune 500 companies, acquired in under a year.^[3] The customer wall now includes Abercrombie & Fitch, HelloFresh, Alo Yoga, Vuori, Living Spaces, Eight Sleep, Factor, Wander, and Uncommon Goods.^[2] The 2026 story is the MCP launch: external AI agents can now write, run, and analyze Spur tests, repositioning the product from a QA tool humans drive to a testing capability other agents call.^[4]

Attribute	Value
Company	Spur (New York City)^[1]
Founders	Sneha Sivakumar (CEO; ex-Figma, Snap), Anushka Nijhawan (CTO; ex-DeepMind, Meta)^[1]
Founded	2024; YC Summer 2024^[1]
Funding	$4.5M (April 2025) led by First Round Capital; Pear VC, Neo, Conviction, Liquid2, Predictive Venture Partners^[3]
Team Size	8 (per YC profile)^[1]
Named Customers	Abercrombie & Fitch, HelloFresh, Alo Yoga, Vuori, Living Spaces, Eight Sleep, Factor, Wander, Norse Atlantic Airways^[2]^[3]
Open Source	No

Product Overview

The core loop: describe the flow to cover in natural language, and Spur's agent simulates a real user executing it — clicking, filling, navigating — then reports pass/fail with step-by-step evidence.^[1]^[5] Because the agent interprets the page like a user rather than binding to selectors, tests are pitched as surviving UI changes and real-world storefront noise — promotional pop-ups, out-of-stock states, A/B variants — that produce false positives in scripted suites.^[2]^[1]
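The brittleness argument can be illustrated with a toy example. The sketch below is not Spur's method (its agents are model-driven and the stack is undisclosed); it only contrasts a lookup bound to an element id with one keyed on what the user sees. The page structure, ids, and function names are all hypothetical.

```python
from typing import Optional

# A toy "page": each element has an id, an ARIA-style role, and visible text.
PAGE_V1 = [
    {"id": "promo-close", "role": "button", "text": "No thanks"},
    {"id": "add-to-cart-btn", "role": "button", "text": "Add to cart"},
]

# After a redesign the ids change, but what the shopper sees does not.
PAGE_V2 = [
    {"id": "modal-dismiss", "role": "button", "text": "No thanks"},
    {"id": "pdp-atc-primary", "role": "button", "text": "Add to Cart"},
]


def find_by_id(page: list[dict], element_id: str) -> Optional[dict]:
    """Selector-bound lookup: succeeds only if the exact id still exists."""
    return next((el for el in page if el["id"] == element_id), None)


def find_by_intent(page: list[dict], role: str, label: str) -> Optional[dict]:
    """User-facing lookup: match on role and case-insensitive visible text."""
    wanted = label.casefold()
    return next(
        (el for el in page if el["role"] == role and wanted in el["text"].casefold()),
        None,
    )
```

Against `PAGE_V2`, `find_by_id(PAGE_V2, "add-to-cart-btn")` returns `None`, while `find_by_intent(PAGE_V2, "button", "add to cart")` still finds the button. A real agent faces a much harder version of this problem, but the failure mode of the scripted lookup is the same.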

The e-commerce verticalization is the strategic move. Rather than competing for every engineering team's test suite, Spur packages agents around what online brands actually break: checkout and core purchase flows, localization across markets, UI/UX regressions, and — notably — testing of the brands' own AI features.^[2] The vendor claims 95% of brands automate their core flows within the first month, 20X faster release times, 80% fewer false positives, and 5X more experiments per release; all of these are homepage numbers without independent verification.^[2]
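Because these figures are vendor-stated, a buyer would want to recompute them from its own run history. The sketch below shows one way a claim like "80% fewer false positives" could be checked, assuming the team can count runs that failed without a real defect behind them. The function names and the sample counts are illustrative, not Spur data.

```python
def false_positive_rate(false_failures: int, total_runs: int) -> float:
    """Share of runs that failed even though no real defect was present."""
    if total_runs <= 0:
        raise ValueError("total_runs must be positive")
    return false_failures / total_runs


def relative_reduction(baseline_rate: float, pilot_rate: float) -> float:
    """Fractional drop from the baseline rate to the pilot rate (0.8 means 80% fewer)."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return 1.0 - pilot_rate / baseline_rate


# Hypothetical counts: 50 false failures in 1,000 scripted runs,
# versus 10 false failures in 1,000 agent runs during a pilot.
baseline = false_positive_rate(50, 1000)
pilot = false_positive_rate(10, 1000)
print(f"Relative reduction: {relative_reduction(baseline, pilot):.0%}")
```

The comparison only holds if both suites cover the same flows over a similar period; a pilot that runs fewer or easier tests will flatter the result.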

Key Capabilities

Capability	Description
Natural-language tests	Plain-English test authoring; no code required^[1]
Five agent types	Functional, exploratory, UI/UX, localization, AI-feature testing^[2]
Dynamic adaptation	Handles pop-ups, stock status, promotions mid-run^[2]
Web + native mobile	Parallel execution across both surfaces^[2]
CI/CD integration	GitHub Actions support for release gating^[2]
MCP access	External AI agents (ChatGPT, Claude, Copilot) write, run, and analyze tests^[4]
Reporting	Step-by-step run reports; Scenario Tables, Test Plans, reporting integrations added in the 2025 cycle^[5]^[4]
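Release gating in CI generally reduces to turning a run report into a process exit code. Spur's report format is not public, so the result shape below (a list of records with `name` and `status` fields) is an assumption made for illustration only.

```python
import sys


def gate(results: list[dict]) -> int:
    """Return an exit code for a CI step: 0 only if every test passed.

    An empty report fails the gate, since it gives no evidence that
    the release was tested at all.
    """
    failed = [r["name"] for r in results if r["status"] != "passed"]
    for name in failed:
        print(f"FAILED: {name}", file=sys.stderr)
    return 1 if failed or not results else 0


# Hypothetical report for two storefront flows.
sample_report = [
    {"name": "add to cart", "status": "passed"},
    {"name": "apply a promo code", "status": "failed"},
]
exit_code = gate(sample_report)
```

A pipeline step would pass `exit_code` to `sys.exit`, so a non-zero value blocks the release.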

Product Surfaces

Surface	Description	Availability
Web app	Test authoring, preview editor, run management	GA^[6]
CLI / CI	Pipeline-triggered runs	GA^[6]^[2]
MCP server	Agent-driven test authoring and execution	Launched 2026^[4]
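MCP is built on JSON-RPC 2.0, and a client invokes a server-side tool with the `tools/call` method. The sketch below builds such a request to show roughly what an external agent sends. The tool name `run_test` and its `instructions` argument are hypothetical; Spur's actual tool catalog is not documented in the sources cited here.

```python
import json
from itertools import count

# Monotonic request ids, as JSON-RPC expects each request to carry its own id.
_request_ids = count(1)


def tools_call_request(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical tool and argument names, for illustration only.
request = tools_call_request(
    "run_test",
    {"instructions": "Add an item to the cart and apply a promo code"},
)
```

In practice an agent first discovers the available tools with `tools/list` and then issues calls like this one through its MCP client rather than hand-building messages.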

Technical Architecture

Spur is a managed cloud service built on multimodal browser agents that perceive and act on pages the way users do, a lineage that traces to the founders' NLP-lab research on web agents at Yale.^[1] The platform runs thousands of agent sessions in parallel to emulate user load within minutes.^[5] The underlying model stack is not publicly disclosed, and the public docs describe workflow surfaces (preview editor, run modal, prompt library) rather than architecture.^[6]

Key Technical Details

Aspect	Detail
Deployment	Managed cloud only; no self-hosting offered^[2]
Model(s)	Not disclosed^[6]
Integrations	GitHub Actions CI/CD; MCP for external agents; reporting integrations^[2]^[4]
Open Source	No

Strengths

Verticalization is a real strategy, not a retreat — e-commerce releases break in domain-specific ways (promos, inventory, localization), and purpose-built agents for those failure modes are harder for horizontal QA tools to match.^[2]
Enterprise logos unusual for an 8-person seed company — Abercrombie & Fitch, HelloFresh, Alo Yoga, and Living Spaces are named customers, and the raise cited 30+ enterprise accounts including Fortune 500s within the first year.^[2]^[3]^[1]
Top-tier seed syndicate — First Round led, with Pear, Neo, Conviction, and angels from Figma, OpenAI, Rippling, and Dropbox; Liz Wessel cited "tangible ROI" as the thesis.^[3]
MCP launch is the right 2026 move — exposing test authoring and execution to ChatGPT, Claude, and Copilot positions Spur inside agentic dev workflows rather than competing against them.^[4]
Founder-market fit on the agent side — both founders did web-agent NLP research before the agent wave, and the CTO came from DeepMind and Meta.^[1]

Cautions

Every headline number is vendor-stated — 20X faster releases, 80% fewer false positives, 95% first-month automation, and up-to-50X productivity claims all live on Spur's own homepage with no independent benchmark or audit.^[2]
No public pricing — engagements are quote-based behind a demo and a pilot program, which adds procurement friction and prevents cost comparison against list-priced rivals.^[5]^[4]
Zero independent community footprint — no Hacker News stories and no substantive Reddit discussion surfaced as of June 2026; the public voice is entirely vendor-curated.^[7]
Small team, modest capital, crowded category — 8 people and $4.5M against a field that includes QA Wolf, Momentic, Ranger, and the incumbent automation vendors all racing to add AI agents.^[1]^[3]
Closed and managed-only — no self-hosting, no disclosed model stack, and no open-source layer to audit; e-commerce buyers ship customer data through a seed-stage vendor's cloud.^[6]
Vertical focus cuts both ways — teams outside e-commerce (and adjacent travel customers like Norse Atlantic Airways) are no longer the marketed audience, an implicit narrowing from the original horizontal "test your websites" pitch.^[2]^[3]

What Developers Say

There is no substantive independent community discussion of Spur as of June 2026: an HN Algolia search returns no stories about the product, and no Reddit threads with real usage reports were found.^[7] The only public user voice is vendor-curated — testimonials on Spur's own site and YC launch page ("not a single test has broken due to flakiness," per one) — which this research does not treat as independent evidence.^[1] Third-party directory listings carry no critical assessment either.^[5] For a company claiming 30+ enterprise customers, the total absence of practitioner discussion is itself a data point: the buyers are e-commerce QA and growth teams, not the developers who post on HN.^[3]^[7]

Pricing & Licensing

Tier	Price	Includes
Pilot	Quote-based	Personalized pilot program; vendor advertises core-flow setup within 7 days^[4]
Enterprise	Quote-based	Full agent suite, web + mobile, CI/CD and MCP access^[5]^[2]

No public price list exists as of June 2026; the pricing page returns no published tiers and directory listings classify the product as quote-based.^[5]

Licensing model: Proprietary managed SaaS; no open-source components.^[2]

Hidden costs: Unquantifiable until quoted — budget for pilot-to-contract negotiation, and note that test volume, parallelism, and mobile coverage are the likely pricing levers.^[5]

Competitive Positioning

Direct Competitors

Competitor	Differentiation
Momentic	Horizontal AI testing for engineering teams with public pricing and developer-first workflow; Spur counters with e-commerce-specific agents and white-glove enterprise onboarding
Ranger	Also an AI QA agent with autonomous exploration; Spur differentiates on the e-commerce vertical and its five specialized agent types
QA Wolf	Service-heavy model with humans in the loop guaranteeing coverage; Spur is agent-first with a smaller team and lower-touch pitch
Selenium / Playwright (in-house)	Free and fully controlled, but selector-bound and maintenance-heavy — exactly the brittleness Spur's adaptive agents target^[4]

When to Choose Spur Over Alternatives

Choose Spur when: you run an e-commerce or DTC storefront, release velocity is gated by manual QA of purchase flows, and you want agents that already understand promos, stock states, and localization.
Choose Momentic when: you are an engineering team that wants transparent pricing, a developer-first workflow, and horizontal coverage beyond commerce.
Choose Ranger when: autonomous exploratory coverage across a general web app is the priority rather than vertical depth.
Choose in-house Playwright when: you have dedicated SDET capacity and need full control, self-hosting, or zero data egress.

Ideal Customer Profile

Best fit:

E-commerce and DTC brands releasing storefront changes weekly or faster without a large QA org
Teams burned by flaky selector-based suites who want tests that survive UI churn
Organizations standardizing on agentic workflows that want QA exposed via MCP to their other AI tools

Poor fit:

Non-commerce products outside the marketed vertical
Buyers requiring self-hosting, disclosed model providers, or an auditable open-source core
Teams that need published pricing to clear procurement quickly

Viability Assessment

Factor	Assessment
Financial Health	Adequate for stage — $4.5M from a strong syndicate (April 2025), but a small raise for an enterprise-sales motion^[3]
Market Position	Differentiated niche: the only verticalized e-commerce QA agent among the horizontal rivals compared here, with genuine retail logos^[2]
Innovation Pace	Active — Scenario Tables, Test Plans, environments/browsers, and reporting integrations shipped in 2025; MCP launched in the 2026 cycle^[4]
Community/Ecosystem	Absent — no HN, Reddit, or independent review presence as of June 2026^[7]
Long-term Outlook	Hinges on whether vertical depth beats the better-funded horizontal QA agents and on a Series A confirming the enterprise traction^[3]

The company is demonstrably active: the blog is publishing into 2026, the MCP launch is recent, named-customer case material (Living Spaces running 1,000+ tests a month on bi-weekly releases) keeps appearing, and the YC profile lists the company as active.^[4]^[1] The open question is durability: an 8-person team selling quote-based enterprise contracts against QA Wolf's service guarantees and Momentic's developer motion needs the vertical wedge to hold.^[1]^[3]

Bottom Line

Spur is the e-commerce specialist in the AI QA agent field: a credible founding team, a top-tier seed syndicate, and retail logos that most seed companies never land, wrapped around agents purpose-built for the way storefronts actually break. The offsetting reality is that every performance number is vendor-stated, pricing is opaque, the community footprint is zero, and the company is betting a small team and $4.5M on a vertical wedge in a category where rivals are better capitalized.

Recommended for: E-commerce and DTC teams whose releases are gated by manual QA of purchase, promo, and localization flows — pilot it against your flakiest suite and demand the ROI numbers in your own data.

Not recommended for: Non-commerce products, self-hosting or model-transparency requirements, or buyers who need list pricing and independent benchmarks before engaging.

Outlook: Watch for a Series A to validate the enterprise traction, any independent benchmark of the false-positive claims, and whether the MCP surface turns Spur into the default QA tool that coding agents call — that, more than the vertical pitch, is the upside case.

Research by Ry Walker Research

Sources