QA Wolf | Ry Walker Research

Key takeaways

$36M Series B led by Scale Venture Partners (July 2024), $57M total — the most-funded company in the AI QA agents category, with 130+ customers including Salesloft, Drata, and AutoTrader.ca and a Sacra-estimated $15-20M ARR in 2024
The promise is an outcome, not a tool: 80% automated end-to-end coverage within 4 months, maintained indefinitely with a zero-flake guarantee, on open-source Playwright (web) and Appium (mobile) code the customer owns
The honest framing is AI-assisted humans-as-a-service: AI agents map, generate, and triage, but human QA engineers own the suite; autonomous-agent competitors like Ranger and QA.tech position themselves against exactly this
Pricing is per-test: roughly $40-44 per test per month with a ~$90K median annual contract, which buys creation, maintenance, unlimited parallel runs, and 24-hour failure triage

FAQ

What is QA Wolf?

QA Wolf is a managed QA service plus AI testing platform that builds and maintains automated end-to-end test suites in Playwright and Appium, guaranteeing 80% coverage within about 4 months for a per-test monthly fee.

How much does QA Wolf cost?

Roughly $40-44 per automated test per month per third-party Vendr data, with a median annual contract value around $90,000; the fee includes test creation, maintenance, unlimited parallel runs, and 24-hour failure investigation.

Is QA Wolf an autonomous AI agent?

No — it is a hybrid. AI agents map the app, generate test code, and triage failures, but human QA engineers review, refine, and own the suite on the customer's behalf; competitors position against it on precisely this axis.

How is QA Wolf different from Ranger?

Both pair AI with human review, but QA Wolf is service-first — a dedicated external team owning your suite — while Ranger is AI-first with lighter expert oversight and claims faster initial coverage on web apps.

Executive Summary

QA Wolf sells an outcome, not a tool: 80% automated end-to-end test coverage within about 4 months, maintained indefinitely, with a zero-flake guarantee — delivered as "Coverage-as-a-Service" by AI agents and dedicated human QA engineers working together on open-source Playwright (web) and Appium (mobile) code the customer owns.^[1]^[2] It is the funding leader of the AI QA agents category: a $36M Series B led by Scale Venture Partners closed in July 2024, bringing the total to $57M, with 130+ customers including Salesloft, Drata, and AutoTrader.ca at the time of the raise.^[3]^[4]

The honest framing matters: QA Wolf is AI-assisted humans-as-a-service, not an autonomous agent. Its AI layer — Mapping AI that outlines the app, Automation AI that generates production-grade test code from natural language, and parallel run infrastructure — accelerates a human team that reviews, refines, and owns the suite on the customer's behalf.^[5] Autonomous-first competitors (Ranger, QA.tech) position against exactly this, framing QA Wolf as an outsourced engineering layer between your product and your coverage.^[6]^[7] Sacra estimates $15-20M ARR in 2024 on average contract values of $100K-200K — real revenue, but a services-heavy margin profile the company is working to automate away.^[2]

Attribute	Value
Company	QA Wolf (Seattle, WA)^[8]
Founded	2019 (Series A Sept 2022, led by Inspired Capital)^[2]^[8]
Funding	$57M total; $36M Series B (July 2024) led by Scale Venture Partners, with Threshold Ventures, Inspired Capital, Notation Capital^[3]^[4]
Revenue	$15-20M ARR (2024, Sacra estimate); 130+ customers^[2]^[3]
Named Customers	Salesloft, Drata, AutoTrader.ca, Metronome, Meow Wolf, Lifesum, UserTesting^[1]^[3]
Open Source	Test code is open-source Playwright/Appium owned by the customer; platform proprietary^[1]

Product Overview

The customer experience is closer to hiring a QA department than buying software. QA Wolf's team (AI plus humans) maps the application, writes the test suite, runs it in full parallel on hosted infrastructure, investigates every failure within 24 hours, files human-verified bug reports, and maintains the tests as the product changes — all for a flat per-test monthly fee.^[1]^[9] Because tests are plain Playwright and Appium, customers can take the code and leave without vendor lock-in.^[1]

The AI layer added through 2025-2026 shifts more of that labor to agents. Mapping AI "autonomously outline[s] your entire app in minutes"; Automation AI converts natural-language prompts into production-grade test code for web, iOS, and Android, handling edge cases like Canvas APIs, iBeacon, barcode scanning, and LLM-as-a-judge assertions for validating AI-product output; Run Infra orchestrates fully parallel execution with dependency management.^[1]^[5] Sacra notes growth has accelerated as AI-native coding tools speed up development cycles and create more code to test.^[2]
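To make "fully parallel execution with dependency management" concrete, the following is a minimal sketch of the general technique, not QA Wolf's implementation (the internals of Run Infra are not public). It assumes each test declares which tests must pass before it can start. The `run_suite` helper, the test names, and the stub runner are all hypothetical.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter  # Python 3.9+


def run_suite(deps, run_test, max_workers=8):
    """Run tests in parallel while honoring declared dependencies.

    deps maps each test name to the set of tests that must pass first.
    run_test is a callable(name) -> bool, where True means the test passed.
    Returns a dict of name -> "passed" | "failed" | "skipped".
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    results = {}
    in_flight = {}  # future -> test name

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            for name in sorter.get_ready():
                # Skip a test when any of its prerequisites did not pass.
                if any(results.get(d) != "passed" for d in deps.get(name, ())):
                    results[name] = "skipped"
                    sorter.done(name)
                else:
                    in_flight[pool.submit(run_test, name)] = name
            if not in_flight:
                continue  # skips may have unblocked more tests
            finished, _ = wait(list(in_flight), return_when=FIRST_COMPLETED)
            for future in finished:
                name = in_flight.pop(future)
                results[name] = "passed" if future.result() else "failed"
                sorter.done(name)
    return results


# Hypothetical suite: everything depends on login; export depends on create.
deps = {
    "login": set(),
    "create_invoice": {"login"},
    "export_invoice": {"create_invoice"},
    "update_profile": {"login"},
}
# Stub runner (assumption): every test passes except create_invoice.
outcome = run_suite(deps, lambda name: name != "create_invoice")
print(outcome)
```

In this run `login` and `update_profile` pass, `create_invoice` fails, and `export_invoice` is skipped rather than reported as a second failure. That is the point of tracking dependencies: independent tests still run side by side, while one broken prerequisite does not produce a cascade of misleading failures.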

Key Capabilities

Capability	Description
Coverage guarantee	80% automated E2E coverage within ~4 months, maintained thereafter^[2]
Mapping AI	Autonomously explores and outlines the app's workflows^[5]
Automation AI	Natural language → production-grade Playwright/Appium test code^[5]
Run infrastructure	Unlimited fully parallel runs; one customer reports 300 tests in 11 minutes^[1]^[9]
Failure triage	24-hour investigation SLA, human-verified bug reports, zero-flake guarantee^[2]^[9]
Mobile testing	Appium-based iOS/Android automation, launched with the Series B^[4]
AI-product testing	LLM-as-a-judge assertions for validating AI output^[1]

Product Surfaces

Surface	Description	Availability
Managed service	Dedicated QA engineers embedded with the team	GA^[1]
Web testing (Playwright)	Core E2E suite, CI/CD-integrated	GA^[1]
Mobile testing (Appium)	iOS and Android native app testing	GA (waitlist opened July 2024)^[4]
AI platform	Mapping AI, Automation AI, Run Infra	GA^[5]

Technical Architecture

QA Wolf is a managed cloud service: tests run on QA Wolf's hosted infrastructure in full parallel, integrated into the customer's CI pipeline for PR-gating and smoke tests.^[5] The deliberate architectural bet is deterministic code over runtime AI: agents generate reviewable Playwright/Appium scripts rather than interpreting the app with a model on every run, which QA Wolf argues handles "the most complex functionality in a reproducible way."^[5] Humans sit in the loop at every stage — reviewing generated code, confirming root causes before tests are updated, and verifying bugs before reports reach the customer.^[1]^[6]

Key Technical Details

Aspect	Detail
Deployment	Managed cloud only; tests run on QA Wolf infrastructure, triggered from customer CI^[5]
Test frameworks	Open-source Playwright (web) and Appium (mobile); customer owns the code^[1]
Model(s)	Not disclosed
Execution	Unlimited parallel runs; entire suites in minutes^[9]^[2]
Open Source	Test code yes; platform and AI agents proprietary^[1]

Strengths

The only guaranteed outcome in the category — 80% coverage in ~4 months with a zero-flake guarantee and a 24-hour triage SLA is a contractual promise, not a product capability, and the per-test pricing makes the vendor eat maintenance costs when tests break.^[2]^[10]
Category-leading capitalization and revenue — $57M raised, Scale Venture Partners leading the B, and a Sacra-estimated $15-20M ARR across 130+ customers; no competitor in the AI QA agents field matches either number.^[3]^[2]
No lock-in on the artifact — suites are standard Playwright/Appium code the customer owns and can take elsewhere, unusual for a managed service.^[1]
Production-scale proof points — Salesloft runs 800+ PR-gating tests and releases 15x a day; Metronome runs 400+ tests releasing 4x a day.^[1]
Humans absorb the hard parts AI still fails at — flake investigation, root-cause confirmation, and bug verification are exactly where autonomous agents still hallucinate; QA Wolf's model keeps a person accountable for signal quality.^[1]^[9]
Real review volume — a 4.8-star G2 rating across 100+ reviews, far more independent review mass than any autonomous-agent rival.^[1]^[11]

Cautions

It is not an autonomous agent, whatever the marketing says — the product is a human QA team accelerated by AI; buyers expecting self-driving test coverage are buying outsourced engineering with an AI toolchain. Competitors build their entire pitch on this distinction.^[6]^[7]
Services economics — Sacra pegs ACVs at $100K-200K with a labor-heavy delivery model; the company's margin path depends on agents replacing more of the human work, a transition that is promised but not yet demonstrated publicly.^[2]
Expensive at suite scale: at ~$40-44 per test per month, a 500-test suite runs roughly $20K-22K per month ($240K-264K per year); the ~$90K median ACV is real budget, and flat-fee competitors undercut it aggressively.^[9]
Slower initial ramp than AI-first rivals claim — Ranger asserts QA Wolf takes "3-4 months" to meaningful coverage versus its own 1-2 weeks; that is adversarial sourcing, but the 4-month figure matches QA Wolf's own guarantee window.^[6]^[2]
G2 critics flag delivery friction — some reviewers report dissatisfaction with test flakiness and results, difficulty expanding coverage as expected, limitations around API-call testing, and sales-cycle expectations on test-creation speed that delivery did not match.^[11]
An external team between you and your tests — coverage decisions, maintenance, and triage run through QA Wolf's engineers; teams that want QA knowledge in-house are structurally outside the model.^[7]

What Developers Say

Independent discussion is thinner than QA Wolf's funding profile would suggest: Hacker News mentions are sparse and mostly incidental, while the substantive review volume lives on G2 (4.8 stars, 100+ reviews).^[12]^[1]^[11]

"None of this is wrong, but this is mostly an advertisement for QA Wolf." — pavel_lishin on Hacker News, on a QA Wolf engineering blog post^[12]

"QA Wolf takes a much longer route — typically 3-4 months — due to its manual, script-heavy approach." — Ranger's comparison page (a direct competitor; read as adversarial)^[6]

"QA Wolf gives you a dedicated team. QA.tech gives you AI agents that work 24/7 inside yours." — QA.tech's comparison page (also a competitor, but a fair one-line summary of the model difference)^[7]

G2 reviewers (paraphrased here, as G2 restricts excerpting) consistently praise the QA Wolf team's care, attentiveness, and communication, with some calling it among the best SaaS experiences they have had and crediting a 90-day pilot with delivering a regression suite that would have taken internal teams 9+ months; the critical minority cites flakiness, slower-than-pitched test creation, and API-testing limitations.^[11] Net: customer sentiment is strongly positive, while developer-community sentiment is mostly silence punctuated by content-marketing skepticism.^[11]^[12]

Pricing & Licensing

No public price list — pricing is per-test and quoted by scope. Third-party Vendr data via Bug0 puts it at roughly $40-44 per automated test per month with a median annual contract value of ~$90,000.^[9]

Tier	Price	Includes
Per-test (sole model)	~$40-44/test/month; ~$90K median ACV^[9]	Test creation (no setup fees), continuous maintenance, unlimited parallel runs on hosted infra, 24-hour failure triage, human-verified bug reports, zero-flake guarantee^[10]^[9]

QA Wolf's own sizing rule of thumb is roughly 30 tests per engineer on staff.^[10] All pricing as of June 2026.

Licensing model: Proprietary managed service; generated test suites are open-source Playwright/Appium code owned by the customer.^[1]

Hidden costs: Cost scales linearly with suite size, so comprehensive coverage of a large app compounds quickly; complexity of the application also drives the quote, and possible integration fees are not publicly detailed.^[9]

Competitive Positioning

Direct Competitors

Competitor	Differentiation
Ranger	AI-first "cyborg" model — agents generate Playwright tests with expert review rather than a dedicated human team; claims 80%+ coverage in 1-2 weeks vs QA Wolf's ~4 months, web-only vs QA Wolf's web + mobile^[6]
Momentic	Self-serve AI testing tool your own engineers drive — a product you operate, versus QA Wolf's service that operates for you
QA.tech	Autonomous agents working inside your team 24/7; positions QA Wolf as an external dedicated team by contrast^[7]
Bug0 / Rainforest QA	Managed-QA rivals competing on price — Bug0 advertises $2,500/month flat with a dedicated engineer against QA Wolf's ~$90K median ACV^[9]

When to Choose QA Wolf Over Alternatives

Choose QA Wolf when: you have no QA function and want a guaranteed outcome — 80% coverage, maintained, with humans accountable for every failure — and the ~$90K/year budget exists.
Choose Ranger when: you want AI-first speed to coverage on a web app and accept lighter human involvement.
Choose Momentic when: your engineers want to own testing in-house with an AI tool rather than delegate it to an external team.
Choose a flat-fee managed rival when: budget is the constraint and the zero-flake guarantee premium isn't worth ~$90K/year.^[9]

Ideal Customer Profile

Best fit:

VC-backed B2B SaaS and digital commerce teams shipping weekly or daily with no dedicated QA function — QA Wolf's own stated core market^[2]
Teams that want a contractual coverage outcome with human accountability for triage, not another tool to staff
Products with both web and native mobile surfaces that want one vendor across Playwright and Appium^[4]

Poor fit:

Teams that want QA expertise and test ownership to live in-house
Budgets under ~$50K/year, where per-test pricing collapses against flat-fee or self-serve alternatives^[9]
Buyers specifically seeking autonomous-agent testing with no external human layer^[7]

Viability Assessment

Factor	Assessment
Financial Health	Strongest in category — $57M raised, $36M Series B (July 2024) led by Scale Venture Partners; $15-20M estimated ARR^[3]^[2]
Market Position	Category revenue leader with 130+ customers and the deepest review base, but positioned as the incumbent every autonomous-agent startup attacks^[3]^[6]
Innovation Pace	Active — mobile (Appium) launched with the Series B; Mapping AI and natural-language Automation AI shipped since^[4]^[5]
Community/Ecosystem	Modest — strong G2 presence (4.8, 100+ reviews) but little organic developer discussion on HN/Reddit^[11]^[12]
Long-term Outlook	Hinges on automating its own labor — margins improve if agents absorb more of the human work before autonomous rivals get good enough^[2]

QA Wolf has the category's revenue, funding, and customer proof, and its services-with-a-guarantee model is precisely what made it sellable into teams that distrust testing tools. The strategic question is sequencing: it must convert a human-delivered service into an agent-delivered one faster than AI-first competitors convert their agents into something enterprises trust. Its AI investments and 50M+ test runs of operational data are real advantages in that race; its cost structure is the handicap.^[2]^[5]

Bottom Line

QA Wolf is the safe, expensive choice in AI QA: the only vendor that contractually guarantees 80% maintained coverage, backed by the category's biggest war chest, real enterprise logos, and humans who answer for every flaky test. Just buy it with clear eyes — you are hiring an AI-accelerated external QA team, not deploying an autonomous agent, and you will pay ~$90K/year median for the difference.

Recommended for: Fast-shipping product teams without a QA function that want a guaranteed, maintained E2E suite on portable Playwright/Appium code and have the budget to delegate the whole problem.

Not recommended for: Teams that want testing expertise in-house, sub-$50K budgets, or buyers whose thesis is that autonomous agents make managed QA services obsolete.

Outlook: Watch the gross-margin story — whether Mapping AI and Automation AI visibly shrink the human layer — plus mobile traction post-Series B and whether the next funding round (none announced since July 2024, as of June 2026) confirms the transition from service to platform.

Research by Ry Walker Research

Sources