Vapi

Vapi is a developer-focused voice AI platform for building production voice agents, supporting both OpenAI Realtime and traditional STT+LLM+TTS pipelines with native phone integration.

Key takeaways

  • Developer-first platform supporting both OpenAI Realtime API and traditional voice pipelines for flexibility
  • Strong phone/telephony focus with native SIP trunking and Twilio integration
  • $25M+ raised from investors including Bessemer and Y Combinator; growing rapidly in the voice AI developer ecosystem

FAQ

What is Vapi?

Vapi is a developer platform for building voice AI agents that supports multiple backends (OpenAI Realtime, traditional pipelines), phone integration, and production deployment.

How much does Vapi cost?

Vapi charges a $0.05/minute platform fee plus provider costs (STT, LLM, TTS, telephony). Total costs typically range from $0.10 to $0.35 per minute depending on configuration.

Does Vapi support OpenAI Realtime API?

Yes, Vapi supports both OpenAI Realtime API for speech-to-speech and traditional STT+LLM+TTS pipelines, giving developers flexibility to choose their approach.

Executive Summary

Vapi is a developer-focused platform for building production voice AI agents. Unlike end-to-end solutions such as ElevenLabs or OpenAI Realtime, Vapi acts as an orchestration layer that supports multiple backends: the OpenAI Realtime API for speech-to-speech, or traditional STT+LLM+TTS pipelines with a choice of provider at each stage. The platform has strong phone integration and has raised $25M+ from investors including Bessemer and Y Combinator.

Attribute | Value
Company | Vapi
Founded | 2023
Funding | $25M+ (Series A, December 2024)
Investors | Bessemer, Y Combinator, AI Grant
Focus | Developer platform, phone agents
Headquarters | San Francisco, CA

Product Overview

Vapi positions itself as the developer platform for voice AI, offering flexibility in how agents are built. Developers can use OpenAI Realtime API for native speech-to-speech or construct traditional pipelines mixing different STT, LLM, and TTS providers.

The platform is particularly strong in telephony, with native support for phone calls, SIP trunking, and Twilio integration—making it popular for call center automation, outbound sales, and customer support applications.

Key Capabilities

Capability | Description
OpenAI Realtime Support | Native speech-to-speech integration
Traditional Pipeline | Mix STT, LLM, TTS providers
Phone Integration | SIP trunking, Twilio, phone numbers
Web/Mobile SDKs | Embeddable voice agents
Function Calling | Tool use and integrations
Analytics | Call metrics and conversation analysis

Supported Providers

Category | Providers
STT | Deepgram, AssemblyAI, Azure, Google
LLM | OpenAI, Anthropic, Groq, custom
TTS | ElevenLabs, Deepgram, PlayHT, Azure
Telephony | Twilio, SIP trunking, BYOC

Technical Architecture

Vapi acts as an orchestration layer between the voice interface (phone, web, mobile) and the AI backend. This architecture provides flexibility but puts an intermediary between the application and the providers, compared with calling provider APIs directly.

┌─────────────────────────────────────────────────┐
│                 User Interface                   │
│     Phone | Web Widget | Mobile SDK              │
├─────────────────────────────────────────────────┤
│                 Vapi Platform                    │
│  ┌───────────────────────────────────────────┐  │
│  │         Orchestration Layer               │  │
│  │  ┌─────────┐ ┌─────────┐ ┌─────────┐     │  │
│  │  │ OpenAI  │ │ Trad.   │ │ Custom  │     │  │
│  │  │Realtime │ │Pipeline │ │ Config  │     │  │
│  │  └─────────┘ └─────────┘ └─────────┘     │  │
│  └───────────────────────────────────────────┘  │
├─────────────────────────────────────────────────┤
│              Provider Integrations               │
│   STT: Deepgram, AssemblyAI, Azure              │
│   LLM: OpenAI, Anthropic, Groq                  │
│   TTS: ElevenLabs, Deepgram, PlayHT             │
└─────────────────────────────────────────────────┘
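The two backend modes in the diagram can be sketched as a small configuration check. This is an illustrative model only: the `AgentConfig` class, its field names, and the rule that realtime mode takes no separate stage providers are assumptions made for this example and are not Vapi's actual API schema. Only the provider names come from the Supported Providers table.

```python
from dataclasses import dataclass
from typing import Optional

# Provider names copied from the "Supported Providers" table.
# The config shape itself is hypothetical, not Vapi's real schema.
SUPPORTED = {
    "stt": {"Deepgram", "AssemblyAI", "Azure", "Google"},
    "llm": {"OpenAI", "Anthropic", "Groq", "custom"},
    "tts": {"ElevenLabs", "Deepgram", "PlayHT", "Azure"},
}
STAGES = ("stt", "llm", "tts")


@dataclass(frozen=True)
class AgentConfig:
    mode: str                   # "realtime" or "pipeline"
    stt: Optional[str] = None
    llm: Optional[str] = None
    tts: Optional[str] = None

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the config is consistent."""
        problems = []
        if self.mode == "realtime":
            # Assumption: speech-to-speech replaces the three-stage pipeline,
            # so per-stage providers should not be set.
            for stage in STAGES:
                if getattr(self, stage) is not None:
                    problems.append(f"{stage} must be unset in realtime mode")
        elif self.mode == "pipeline":
            for stage in STAGES:
                choice = getattr(self, stage)
                if choice is None:
                    problems.append(f"{stage} provider is required in pipeline mode")
                elif choice not in SUPPORTED[stage]:
                    problems.append(f"{choice!r} is not a listed {stage} provider")
        else:
            problems.append(f"unknown mode {self.mode!r}")
        return problems
```

For example, `AgentConfig("pipeline", "Deepgram", "Anthropic", "ElevenLabs").validate()` returns an empty list, while omitting the TTS provider in pipeline mode returns a single problem message.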

Strengths

  • Provider flexibility — Mix and match STT, LLM, TTS providers; not locked to one vendor
  • OpenAI Realtime support — Native integration with latest speech-to-speech API
  • Phone-first — Strong telephony integration (SIP, Twilio, phone numbers)
  • Developer-focused — Well-documented APIs and SDKs
  • Well-funded — $25M+ from Bessemer, Y Combinator
  • Cost optimization — Choose cheaper providers per component
  • Web + phone — Same agent for web widget and phone calls

Cautions

  • Hidden costs — Base $0.05/min plus all provider costs can add up to $0.35/min
  • Orchestration overhead — Additional latency vs direct API access
  • Complexity — More configuration than end-to-end platforms
  • Developer-heavy — Requires technical expertise; not suitable for no-code users
  • Middleware dependency — Another vendor layer between you and providers
  • Limited testing tools — Less built-in testing compared to Retell AI

Pricing & Licensing

Vapi uses a base platform fee plus pass-through provider costs:

Component | Cost
Platform Fee | $0.05/minute
STT | Varies by provider (~$0.01-0.02/min)
LLM | Varies by model (~$0.01-0.08/min)
TTS | Varies by provider (~$0.01-0.04/min)
Telephony | ~$0.015/min (Twilio)

Total typical cost: $0.10-0.35/minute depending on configuration.
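As a rough check on how the per-minute total is built up, the sketch below sums the approximate component ranges from the pricing table. The figures are the table's estimates, not quoted prices, and the helper name `per_minute_range` is made up for this example. Summing the table as given yields roughly $0.095 to $0.205 per minute, so the upper end of the stated typical range presumably reflects pricier provider or model choices than the approximate ranges listed.

```python
# Approximate per-minute ranges in USD, copied from the pricing table.
# Each entry is (low, high); fixed fees use the same value twice.
COMPONENTS = {
    "platform": (0.05, 0.05),
    "stt": (0.01, 0.02),
    "llm": (0.01, 0.08),
    "tts": (0.01, 0.04),
    "telephony": (0.015, 0.015),
}


def per_minute_range(components=COMPONENTS):
    """Sum the low and high ends of every component range."""
    low = sum(lo for lo, _ in components.values())
    high = sum(hi for _, hi in components.values())
    # Round to a tenth of a cent to hide floating-point noise.
    return round(low, 4), round(high, 4)


if __name__ == "__main__":
    low, high = per_minute_range()
    print(f"${low:.3f} to ${high:.3f} per minute")
```

Swapping in a specific provider's rate for one component (for example, a single LLM price instead of the range) narrows the estimate for a given configuration.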

OpenAI Realtime mode: ~$0.50/minute (includes OpenAI's premium pricing).
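To see what these per-minute rates mean at volume, the following sketch projects monthly spend for both modes. The rates are the ones stated in this section; the function names are invented for this example and any call volume plugged in is a hypothetical workload.

```python
# Per-minute rates in USD as stated in this section.
PIPELINE_RANGE = (0.10, 0.35)   # total typical cost, traditional pipeline
REALTIME_RATE = 0.50            # approximate OpenAI Realtime mode rate


def monthly_cost(minutes: int, rate_per_minute: float) -> float:
    """Monthly spend in USD, rounded to cents."""
    return round(minutes * rate_per_minute, 2)


def compare_modes(minutes: int) -> dict:
    """Project monthly spend for the pipeline range and for Realtime mode."""
    low, high = PIPELINE_RANGE
    return {
        "pipeline_low": monthly_cost(minutes, low),
        "pipeline_high": monthly_cost(minutes, high),
        "realtime": monthly_cost(minutes, REALTIME_RATE),
    }
```

At a hypothetical 10,000 minutes per month, this works out to $1,000 to $3,500 for a traditional pipeline and about $5,000 for Realtime mode.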


Competitive Positioning

Direct Competitors

Competitor | Differentiation
Retell AI | Retell is simpler, with base pricing of $0.07/min; Vapi has more provider flexibility
ElevenLabs | ElevenLabs is end-to-end with best voices; Vapi orchestrates multiple providers
LiveKit Agents | LiveKit is an open-source framework; Vapi is a managed platform with more telephony support
OpenAI Realtime | OpenAI is direct access; Vapi adds orchestration and phone integration

When to Choose Vapi

  • Choose Vapi when: You want provider flexibility, strong phone integration, or a mix of Realtime and traditional pipelines
  • Choose Retell when: You want simpler setup and predictable pricing
  • Choose ElevenLabs when: Voice quality is paramount and you want all-in-one
  • Choose LiveKit when: You want open-source control

Ideal Customer Profile

Best fit:

  • Developers building phone-first voice agents
  • Teams wanting to mix providers for cost/quality optimization
  • Organizations experimenting with different voice AI approaches
  • Call center automation projects
  • Outbound sales and appointment setting

Poor fit:

  • Non-technical teams (requires developer expertise)
  • Simple use cases (over-engineered for basic needs)
  • Teams wanting simplest possible integration
  • Organizations preferring single-vendor solutions

Viability Assessment

Factor | Assessment
Financial Health | Strong — $25M+ raised from top-tier investors
Market Position | Growing — Popular in developer community
Innovation Pace | Good — Quick to adopt OpenAI Realtime
Ecosystem | Moderate — Good docs, growing integrations
Long-term Outlook | Positive — Well-positioned in voice AI infrastructure

Bottom Line

Vapi is a platform for developers who want flexibility in building voice agents. Its main differentiator is the ability to use either OpenAI Realtime for speech-to-speech or traditional STT+LLM+TTS pipelines with a choice of provider at each stage. The strong phone integration (SIP, Twilio) makes it particularly suited to call center and telephony applications.

The trade-off is complexity and potentially higher total costs than simpler alternatives like Retell AI. The orchestration layer adds value for sophisticated use cases but may be overkill for straightforward applications. For technical teams wanting maximum flexibility and phone-first voice AI, Vapi is an excellent choice.

Recommended for: Developers building phone-first voice agents who want provider flexibility and don't mind configuration complexity.

Not recommended for: Non-technical teams, simple use cases, or those wanting the simplest possible integration path.


Research by Ry Walker Research