
Vapi

Vapi is a developer-focused voice AI platform for building production voice agents, supporting both OpenAI Realtime and traditional STT+LLM+TTS pipelines with native phone integration.

Key takeaways

  • Developer-first platform supporting both OpenAI Realtime API and traditional voice pipelines for flexibility
  • Strong phone/telephony focus with native SIP trunking and Twilio integration
  • $72M raised, including a $50M Series B led by Peak XV at a ~$500M valuation (May 2026); 1B+ calls handled; Amazon Ring routes 100% of inbound support calls through Vapi

FAQ

What is Vapi?

Vapi is a developer platform for building voice AI agents that supports multiple backends (OpenAI Realtime, traditional pipelines), phone integration, and production deployment.

How much does Vapi cost?

Vapi charges a $0.05/minute platform fee plus at-cost provider charges (STT, LLM, TTS, telephony). If you bring your own API keys, Vapi passes through $0 in provider fees and you pay those providers directly. Real-world all-in costs typically run $0.13-0.33/minute depending on configuration.

Does Vapi support OpenAI Realtime API?

Yes, Vapi supports both OpenAI Realtime API for speech-to-speech and traditional STT+LLM+TTS pipelines, giving developers flexibility to choose their approach.

Executive Summary

Vapi is a developer-focused platform for building production voice AI agents. Unlike end-to-end solutions such as ElevenLabs or OpenAI Realtime, Vapi acts as an orchestration layer that supports multiple backends: OpenAI Realtime API for speech-to-speech, or traditional STT+LLM+TTS pipelines with provider choice. The platform has strong phone integration and has raised $72M in total: a $20M Series A in December 2024, followed by a $50M Series B led by Peak XV at a roughly $500M valuation in May 2026, after Amazon Ring chose Vapi over 40+ rival vendors to handle its inbound support calls.

Attribute | Value
Company | Vapi
Founded | 2023
Funding | $72M total ($50M Series B, May 2026)
Investors | Peak XV, Bessemer, Y Combinator, M12, Kleiner Perkins
Valuation | ~$500M (May 2026)
Notable customers | Amazon Ring, Intuit, ServiceTitan, New York Life
Focus | Developer platform, phone agents
Headquarters | San Francisco, CA

Product Overview

Vapi positions itself as the developer platform for voice AI, offering flexibility in how agents are built. Developers can use OpenAI Realtime API for native speech-to-speech or construct traditional pipelines mixing different STT, LLM, and TTS providers.

The platform is particularly strong in telephony, with native support for phone calls, SIP trunking, and Twilio integration—making it popular for call center automation, outbound sales, and customer support applications.

Since early 2026, Vapi has leaned hard into enterprise. The platform now advertises SSO, OAuth, RBAC, SOC 2/HIPAA/PCI compliance, AI guardrails against hallucinations, and sub-500ms latency at scale, and as of June 2026 it claims 1B+ calls supported, 2.5M+ agents launched, and 750K+ developers. Newer platform capabilities include Squads (multi-assistant orchestration with context-preserving transfers), a CLI, and SMS/chat alongside voice ($0.005/message).

Key Capabilities

Capability | Description
OpenAI Realtime Support | Native speech-to-speech integration
Traditional Pipeline | Mix STT, LLM, TTS providers
Phone Integration | SIP trunking, Twilio, phone numbers
Web/Mobile SDKs | Embeddable voice agents
Function Calling | Tool use and integrations
Analytics | Call metrics and conversation analysis

Supported Providers

Category | Providers
STT | Deepgram, AssemblyAI, Azure, Google
LLM | OpenAI, Anthropic, Groq, custom
TTS | ElevenLabs, Deepgram, PlayHT, Azure
Telephony | Twilio, SIP trunking, BYOC

Technical Architecture

Vapi acts as an orchestration layer between the voice interface (phone, web, mobile) and the AI backend. This architecture provides flexibility, but compared with direct API access it adds an extra hop in the audio path and another vendor between you and the providers.

┌─────────────────────────────────────────────────┐
│                 User Interface                   │
│     Phone | Web Widget | Mobile SDK              │
├─────────────────────────────────────────────────┤
│                 Vapi Platform                    │
│  ┌───────────────────────────────────────────┐  │
│  │         Orchestration Layer               │  │
│  │  ┌─────────┐ ┌─────────┐ ┌─────────┐     │  │
│  │  │ OpenAI  │ │ Trad.   │ │ Custom  │     │  │
│  │  │Realtime │ │Pipeline │ │ Config  │     │  │
│  │  └─────────┘ └─────────┘ └─────────┘     │  │
│  └───────────────────────────────────────────┘  │
├─────────────────────────────────────────────────┤
│              Provider Integrations               │
│   STT: Deepgram, AssemblyAI, Azure              │
│   LLM: OpenAI, Anthropic, Groq                  │
│   TTS: ElevenLabs, Deepgram, PlayHT             │
└─────────────────────────────────────────────────┘
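To make the two backend modes concrete, here is a minimal Python sketch of the kind of agent definition an orchestration layer might check before routing a call. The `AgentConfig` and `Pipeline` classes, the `mode` values, and the `is_valid` helper are hypothetical names for illustration, not Vapi's API schema; only the provider names come from the Supported Providers table.

```python
from dataclasses import dataclass, field
from typing import Optional

# Provider names follow the "Supported Providers" table; the data model itself
# is hypothetical and is NOT Vapi's actual API schema.
SUPPORTED = {
    "stt": {"deepgram", "assemblyai", "azure", "google"},
    "llm": {"openai", "anthropic", "groq", "custom"},
    "tts": {"elevenlabs", "deepgram", "playht", "azure"},
}


@dataclass
class Pipeline:
    """Traditional cascade: speech-to-text -> LLM -> text-to-speech."""
    stt: str
    llm: str
    tts: str


@dataclass
class AgentConfig:
    """One agent definition reused across channels (phone, web widget, mobile).

    mode "realtime": a single speech-to-speech model (OpenAI Realtime).
    mode "pipeline": separately chosen STT, LLM, and TTS providers.
    """
    name: str
    mode: str
    pipeline: Optional[Pipeline] = None
    channels: list[str] = field(default_factory=lambda: ["phone", "web"])

    def validate(self) -> None:
        if self.mode == "realtime":
            # One model handles audio in and audio out, so no per-stage providers.
            if self.pipeline is not None:
                raise ValueError("realtime mode does not take an STT/LLM/TTS pipeline")
        elif self.mode == "pipeline":
            if self.pipeline is None:
                raise ValueError("pipeline mode needs STT, LLM, and TTS providers")
            # Each stage can use a different vendor, as long as it is supported.
            for stage in ("stt", "llm", "tts"):
                provider = getattr(self.pipeline, stage)
                if provider not in SUPPORTED[stage]:
                    raise ValueError(f"unsupported {stage} provider: {provider}")
        else:
            raise ValueError(f"unknown mode: {self.mode}")


def is_valid(config: AgentConfig) -> bool:
    """Return True if the config passes the checks in validate()."""
    try:
        config.validate()
        return True
    except ValueError:
        return False


# Usage: a mixed-provider pipeline and a speech-to-speech agent.
mixed = AgentConfig("support", "pipeline", Pipeline("deepgram", "anthropic", "elevenlabs"))
realtime = AgentConfig("sales", "realtime")
print(is_valid(mixed), is_valid(realtime))  # True True
```

Keeping the mode explicit mirrors the diagram: the same agent definition can be served on any channel, while the backend choice decides whether one model or three providers sit behind it.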

Strengths

  • Provider flexibility — Mix and match STT, LLM, TTS providers; not locked to one vendor
  • OpenAI Realtime support — Native integration with OpenAI's latest speech-to-speech API
  • Phone-first — Strong telephony integration (SIP, Twilio, phone numbers)
  • Developer-focused — Well-documented APIs and SDKs
  • Well-funded — $72M total; $50M Series B led by Peak XV at ~$500M valuation (May 2026)
  • Enterprise validation — Amazon Ring routes 100% of inbound support calls through Vapi; Intuit, ServiceTitan, New York Life also customers
  • Cost optimization — Choose cheaper providers per component; with your own API keys, Vapi bills $0 in provider pass-through and you pay those providers directly
  • Web + phone — Same agent for web widget and phone calls

Cautions

  • Hidden costs — Base $0.05/min plus provider pass-through; independent reviews put real-world all-in costs at $0.13-0.33/min, 3-6x the headline rate
  • Orchestration overhead — Additional latency vs direct API access; reviewers measure 500-800ms in production despite the sub-500ms claim
  • Complexity — More configuration than end-to-end platforms; production deployments can require billing relationships with 4-6 providers
  • Developer-heavy — Requires technical expertise; not suitable for no-code users
  • Middleware dependency — Another vendor layer between you and providers
  • Support model — Self-serve support runs through Discord; reviews cite slow responses and agents breaking after platform updates
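The gap between the sub-500ms claim and the 500-800ms that reviewers measure is easier to reason about as a latency budget. The sketch below sums per-stage times and adds a fixed cost per extra network hop. Every number in it (stage latencies, `hop_ms`, the hop count) is a hypothetical placeholder, not a measurement of Vapi or any provider, and treating the stages as strictly sequential is a simplification.

```python
# Hypothetical per-stage latencies in milliseconds: illustrative placeholders,
# not measurements of Vapi or any provider.
STAGES_MS = {
    "stt_final_transcript": 150,
    "llm_first_token": 250,
    "tts_first_audio": 100,
}


def time_to_first_audio_ms(stages: dict[str, int], extra_hops: int, hop_ms: int) -> int:
    """Estimate when the caller hears audio after they stop speaking.

    Simplification: each stage must emit its first output before the next
    starts, and every extra hop through a middleware layer adds one fixed
    round trip of hop_ms.
    """
    return sum(stages.values()) + extra_hops * hop_ms


direct = time_to_first_audio_ms(STAGES_MS, extra_hops=0, hop_ms=40)
orchestrated = time_to_first_audio_ms(STAGES_MS, extra_hops=3, hop_ms=40)
print(direct, orchestrated)  # 500 620
```

The point of the exercise is that a pipeline already near 500ms has no headroom: even small per-hop costs from an orchestration layer push it past the target.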

Pricing & Licensing

Vapi uses a base platform fee plus pass-through provider costs (as of June 2026):

Component | Cost
Platform fee (voice) | $0.05/minute
SMS/Chat | $0.005/message
STT / LLM / TTS | At cost ($0 if you bring your own API keys)
Call concurrency | 10 lines included, then $10/line/month
HIPAA add-on | $2,000/month
Zero Data Retention add-on | $1,000/month

The Scale plan (annual contract) replaces usage pricing with a fixed platform fee, committed volume, and volume-based per-minute rates, plus enterprise SLA, SOC 2/HIPAA/PCI, SSO, RBAC, and data residency options.

Total typical cost: independent reviews put real-world all-in costs at $0.13-0.33/minute depending on configuration — well above the $0.05/min headline once STT, LLM, TTS, and telephony are added.
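The pricing components above combine into a simple estimate. This sketch uses the published platform fee, concurrency, and add-on figures; the per-component provider rates in `EXAMPLE_RATES` are hypothetical placeholders for illustration only, so substitute your own providers' actual rates. It also shows that bringing your own keys shifts spend rather than removing it: Vapi bills $0 for that component, but the provider still bills you. Scale plan pricing is not modeled.

```python
from collections.abc import Iterable

# Published figures from the pricing table (as of June 2026).
PLATFORM_FEE_PER_MIN = 0.05
INCLUDED_LINES = 10
EXTRA_LINE_PER_MONTH = 10.0
ADD_ONS_PER_MONTH = {"hipaa": 2000.0, "zero_data_retention": 1000.0}

# Hypothetical provider rates in USD/minute: placeholders, not real quotes.
EXAMPLE_RATES = {"stt": 0.01, "llm": 0.06, "tts": 0.05, "telephony": 0.01}


def per_minute_cost(rates: dict[str, float], byok: Iterable[str] = ()) -> tuple[float, float]:
    """Return (billed_by_vapi, paid_to_providers_directly) in USD/minute.

    Components listed in `byok` use your own API keys: Vapi passes through
    $0 for them, but you still pay those providers directly.
    """
    own_keys = set(byok)
    via_vapi = PLATFORM_FEE_PER_MIN + sum((r for k, r in rates.items() if k not in own_keys), 0.0)
    direct = sum((r for k, r in rates.items() if k in own_keys), 0.0)
    return round(via_vapi, 4), round(direct, 4)


def monthly_cost(minutes: int, rates: dict[str, float], concurrent_lines: int,
                 add_ons: Iterable[str] = (), byok: Iterable[str] = ()) -> float:
    """Total monthly spend: usage + concurrency lines beyond 10 + add-ons."""
    via_vapi, direct = per_minute_cost(rates, byok)
    usage = minutes * (via_vapi + direct)
    extra_lines = max(0, concurrent_lines - INCLUDED_LINES)
    fixed = extra_lines * EXTRA_LINE_PER_MONTH + sum(ADD_ONS_PER_MONTH[a] for a in add_ons)
    return round(usage + fixed, 2)


print(per_minute_cost(EXAMPLE_RATES))                # (0.18, 0.0)
print(per_minute_cost(EXAMPLE_RATES, byok={"llm"}))  # (0.12, 0.06)
# 10,000 minutes/month on 15 concurrent lines with the HIPAA add-on.
print(monthly_cost(10_000, EXAMPLE_RATES, 15, add_ons={"hipaa"}))  # 3850.0
```

With these placeholder rates the all-in figure lands at $0.18/min whether or not the LLM key is your own; only the split between Vapi's invoice and the provider's changes.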


What Developers Say

Developer sentiment is split between praise for flexibility and frustration with total cost:

"Vapi.ai (the best in my opinion)"

— andrewoodleyjr on Hacker News, comparing voice agent platforms

"If you add up costs... it comes to be $18/hr. That's not exactly cheap."

— paraschopra on Hacker News, on Vapi's all-in per-minute economics

Third-party reviews echo both sides. Vapi "gives developers serious control over the full voice pipeline" with a model-agnostic architecture, but the gap between the $0.05/min headline and $0.13-0.33/min real-world costs is the most-mentioned friction point. Reviewers also cite Discord-based support that "works for hobby projects and frustrates production teams" and reports of working assistants breaking after platform updates.
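The Hacker News figure lines up with the reviewed range once the units match: $18/hr is $0.30/min, near the top of the $0.13-0.33/min band and six times the $0.05/min platform fee on its own. A trivial conversion sketch:

```python
def usd_per_minute(hourly: float) -> float:
    """Convert an hourly call cost to a per-minute rate."""
    return round(hourly / 60, 4)


def usd_per_hour(per_minute: float) -> float:
    """Convert a per-minute rate to the cost of one hour of talk time."""
    return round(per_minute * 60, 2)


print(usd_per_minute(18.0))                    # 0.3  (the $18/hr quote)
print(usd_per_hour(0.13), usd_per_hour(0.33))  # 7.8 19.8  (reviewed all-in range)
print(usd_per_hour(0.05))                      # 3.0  (platform fee alone)
```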


Competitive Positioning

Direct Competitors

Competitor | Differentiation
Retell AI | Retell is simpler, with a $0.07/min base rate and more predictable pricing; Vapi has more provider flexibility
ElevenLabs | ElevenLabs is end-to-end with best-in-class voices; Vapi orchestrates multiple providers
LiveKit Agents | LiveKit is an open-source framework; Vapi is a managed platform with more telephony features
OpenAI Realtime | OpenAI offers direct API access; Vapi adds orchestration and phone integration

When to Choose Vapi

  • Choose Vapi when: You want provider flexibility, strong phone integration, or a mix of Realtime and traditional pipelines
  • Choose Retell when: You want simpler setup and predictable pricing
  • Choose ElevenLabs when: Voice quality is paramount and you want all-in-one
  • Choose LiveKit when: You want open-source control

Ideal Customer Profile

Best fit:

  • Developers building phone-first voice agents
  • Teams wanting to mix providers for cost/quality optimization
  • Organizations experimenting with different voice AI approaches
  • Call center automation projects
  • Outbound sales and appointment setting

Poor fit:

  • Non-technical teams (requires developer expertise)
  • Simple use cases (over-engineered for basic needs)
  • Teams wanting simplest possible integration
  • Organizations preferring single-vendor solutions

Viability Assessment

Factor | Assessment
Financial Health | Strong: $72M raised; ~$500M valuation; eight-figure ARR run rate per TechCrunch
Market Position | Strong: 1B+ calls handled, 1-5M calls/day; enterprise logos (Amazon Ring, Intuit)
Innovation Pace | Good: quick to adopt OpenAI Realtime; Squads, CLI, SMS/chat added
Ecosystem | Moderate: good docs, growing integrations
Long-term Outlook | Positive: well-positioned in voice AI infrastructure

Bottom Line

Vapi is the platform for developers who want flexibility in building voice agents. The ability to use OpenAI Realtime for speech-to-speech or traditional STT+LLM+TTS pipelines with provider choice sets it apart from single-stack alternatives. Its strong phone integration (SIP, Twilio) makes it particularly suited for call center and telephony applications.

The trade-off is complexity and potentially higher total costs than simpler alternatives like Retell AI. The orchestration layer adds value for sophisticated use cases but may be overkill for straightforward applications. The May 2026 Series B — $50M led by Peak XV at a ~$500M valuation, won partly on the strength of the Amazon Ring deal — answers the viability question that hangs over most middleware vendors. For technical teams wanting maximum flexibility and phone-first voice AI, Vapi is an excellent choice.

Recommended for: Developers building phone-first voice agents who want provider flexibility and don't mind configuration complexity.

Not recommended for: Non-technical teams, simple use cases, or those wanting the simplest possible integration path.

Outlook: Strong. With enterprise validation (Ring, Intuit), 1B+ calls handled, and fresh capital, Vapi looks like a durable leader in voice agent orchestration — though competition from end-to-end platforms and direct speech-to-speech APIs keeps pressure on the middleware layer.


Research by Ry Walker Research