Key takeaways
- Developer-first platform supporting both OpenAI Realtime API and traditional voice pipelines for flexibility
- Strong phone/telephony focus with native SIP trunking and Twilio integration
- $25M+ raised from investors including Bessemer and Y Combinator; growing quickly in the voice AI developer ecosystem
FAQ
What is Vapi?
Vapi is a developer platform for building voice AI agents that supports multiple backends (OpenAI Realtime, traditional pipelines), phone integration, and production deployment.
How much does Vapi cost?
Vapi charges a $0.05/minute platform fee plus provider costs (STT, LLM, TTS, telephony). Total costs typically range from $0.10 to $0.35/minute depending on configuration.
Does Vapi support OpenAI Realtime API?
Yes, Vapi supports both OpenAI Realtime API for speech-to-speech and traditional STT+LLM+TTS pipelines, giving developers flexibility to choose their approach.
Executive Summary
Vapi is a developer-focused platform for building production voice AI agents. Unlike end-to-end solutions such as ElevenLabs or OpenAI Realtime, Vapi acts as an orchestration layer that supports multiple backends: the OpenAI Realtime API for speech-to-speech, or traditional STT+LLM+TTS pipelines with a choice of provider at each stage. The platform has strong phone integration and has raised $25M+ from investors including Bessemer and Y Combinator.
| Attribute | Value |
|---|---|
| Company | Vapi |
| Founded | 2023 |
| Funding | $25M+ (Series A, December 2024) |
| Investors | Bessemer, Y Combinator, AI Grant |
| Focus | Developer platform, phone agents |
| Headquarters | San Francisco, CA |
Product Overview
Vapi positions itself as the developer platform for voice AI, offering flexibility in how agents are built. Developers can use OpenAI Realtime API for native speech-to-speech or construct traditional pipelines mixing different STT, LLM, and TTS providers.
The platform is particularly strong in telephony, with native support for phone calls, SIP trunking, and Twilio integration—making it popular for call center automation, outbound sales, and customer support applications.
Key Capabilities
| Capability | Description |
|---|---|
| OpenAI Realtime Support | Native speech-to-speech integration |
| Traditional Pipeline | Mix STT, LLM, TTS providers |
| Phone Integration | SIP trunking, Twilio, phone numbers |
| Web/Mobile SDKs | Embeddable voice agents |
| Function Calling | Tool use and integrations |
| Analytics | Call metrics and conversation analysis |
Supported Providers
| Category | Providers |
|---|---|
| STT | Deepgram, AssemblyAI, Azure, Google |
| LLM | OpenAI, Anthropic, Groq, custom |
| TTS | ElevenLabs, Deepgram, PlayHT, Azure |
| Telephony | Twilio, SIP trunking, BYOC |
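As a rough illustration of what "mixing providers" means in practice, the sketch below checks a chosen combination against the table above. The `PipelineChoice` record and `unsupported` helper are hypothetical names invented for this example; they are not Vapi's configuration schema or API.

```python
from dataclasses import dataclass

# Provider names copied from the "Supported Providers" table above.
SUPPORTED = {
    "stt": {"Deepgram", "AssemblyAI", "Azure", "Google"},
    "llm": {"OpenAI", "Anthropic", "Groq", "custom"},
    "tts": {"ElevenLabs", "Deepgram", "PlayHT", "Azure"},
    "telephony": {"Twilio", "SIP trunking", "BYOC"},
}


@dataclass(frozen=True)
class PipelineChoice:
    """Hypothetical record of one provider per category (not Vapi's schema)."""
    stt: str
    llm: str
    tts: str
    telephony: str


def unsupported(choice: PipelineChoice) -> list[str]:
    """Return 'category=provider' strings for picks missing from the table."""
    problems = []
    for category, allowed in SUPPORTED.items():
        picked = getattr(choice, category)
        if picked not in allowed:
            problems.append(f"{category}={picked}")
    return problems


choice = PipelineChoice(stt="Deepgram", llm="Anthropic", tts="ElevenLabs", telephony="Twilio")
print(unsupported(choice))  # an empty list means every pick appears in the table
```

Because each category is chosen independently, a team could swap only the TTS provider, for example, without touching the rest of the combination.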
Technical Architecture
Vapi acts as an orchestration layer between the voice interface (phone, web, mobile) and the AI backend. This architecture provides flexibility but puts an intermediary between the application and the providers, which direct API access avoids.
┌─────────────────────────────────────────────────┐
│                 User Interface                  │
│         Phone | Web Widget | Mobile SDK         │
├─────────────────────────────────────────────────┤
│                  Vapi Platform                  │
│  ┌───────────────────────────────────────────┐  │
│  │            Orchestration Layer            │  │
│  │   ┌─────────┐  ┌─────────┐  ┌─────────┐   │  │
│  │   │ OpenAI  │  │  Trad.  │  │ Custom  │   │  │
│  │   │Realtime │  │Pipeline │  │ Config  │   │  │
│  │   └─────────┘  └─────────┘  └─────────┘   │  │
│  └───────────────────────────────────────────┘  │
├─────────────────────────────────────────────────┤
│              Provider Integrations              │
│        STT: Deepgram, AssemblyAI, Azure         │
│        LLM: OpenAI, Anthropic, Groq             │
│        TTS: ElevenLabs, Deepgram, PlayHT        │
└─────────────────────────────────────────────────┘
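The layering in the diagram can be sketched in a few lines. This is a toy model under stated assumptions, not Vapi's implementation: the `Orchestrator` class, `pipeline_backend` function, and the fake stages are all hypothetical, and audio is treated as UTF-8 text so the example runs without any provider.

```python
from typing import Callable

# Each stage is a plain function so any provider could be swapped in.
# Real stages would call provider APIs; these types are illustrative only.
Stt = Callable[[bytes], str]
Llm = Callable[[str], str]
Tts = Callable[[str], bytes]
SpeechToSpeech = Callable[[bytes], bytes]


def pipeline_backend(stt: Stt, llm: Llm, tts: Tts) -> SpeechToSpeech:
    """Chain STT -> LLM -> TTS into a single audio-in, audio-out handler."""
    def handle(audio_in: bytes) -> bytes:
        transcript = stt(audio_in)
        reply_text = llm(transcript)
        return tts(reply_text)
    return handle


class Orchestrator:
    """Routes a caller's audio turn to whichever backend is configured."""

    def __init__(self, backends: dict[str, SpeechToSpeech]):
        self.backends = backends

    def handle_turn(self, mode: str, audio_in: bytes) -> bytes:
        if mode not in self.backends:
            raise ValueError(f"unknown backend mode: {mode}")
        return self.backends[mode](audio_in)


# Toy stages that treat audio as UTF-8 text, for demonstration only.
fake_stt = lambda audio: audio.decode("utf-8")
fake_llm = lambda text: f"You said: {text}"
fake_tts = lambda text: text.encode("utf-8")
fake_realtime = lambda audio: b"(speech-to-speech) " + audio

orchestrator = Orchestrator({
    "pipeline": pipeline_backend(fake_stt, fake_llm, fake_tts),
    "realtime": fake_realtime,
})
print(orchestrator.handle_turn("pipeline", b"hello"))
print(orchestrator.handle_turn("realtime", b"hello"))
```

The point of the design is that the interface layer only ever sees an audio-in, audio-out handler, so a speech-to-speech backend and a three-stage pipeline are interchangeable from the caller's side.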
Strengths
- Provider flexibility — Mix and match STT, LLM, TTS providers; not locked to one vendor
- OpenAI Realtime support — Native integration with latest speech-to-speech API
- Phone-first — Strong telephony integration (SIP, Twilio, phone numbers)
- Developer-focused — Well-documented APIs and SDKs
- Well-funded — $25M+ from Bessemer, Y Combinator
- Cost optimization — Choose cheaper providers per component
- Web + phone — Same agent for web widget and phone calls
Cautions
- Hidden costs — The $0.05/min base fee plus all provider costs can add up to $0.35/min
- Orchestration overhead — Additional latency vs direct API access
- Complexity — More configuration than end-to-end platforms
- Developer-heavy — Requires technical expertise; not suitable for no-code users
- Middleware dependency — Another vendor layer between you and providers
- Limited testing tools — Less built-in testing compared to Retell AI
Pricing & Licensing
Vapi uses a base platform fee plus pass-through provider costs:
| Component | Cost |
|---|---|
| Platform Fee | $0.05/minute |
| STT | Varies by provider (~$0.01-0.02/min) |
| LLM | Varies by model (~$0.01-0.08/min) |
| TTS | Varies by provider (~$0.01-0.04/min) |
| Telephony | ~$0.015/min (Twilio) |
Total typical cost: $0.10-0.35/minute depending on configuration.
OpenAI Realtime mode: ~$0.50/minute (includes OpenAI's premium pricing).
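To make the pricing arithmetic concrete, the sketch below sums the low and high ends of each component in the table. Note that the listed ranges add up to roughly $0.10 to $0.21 per minute, so the upper end of the quoted $0.35 figure presumably reflects configurations priced above the approximate ranges shown; the source does not say which. The 10,000-minute monthly volume is an assumption chosen purely for illustration.

```python
# (low, high) per-minute costs in USD, copied from the pricing table above.
COMPONENTS = {
    "platform": (0.05, 0.05),
    "stt": (0.01, 0.02),
    "llm": (0.01, 0.08),
    "tts": (0.01, 0.04),
    "telephony": (0.015, 0.015),
}


def per_minute_range(components):
    """Sum the low ends and the high ends of every component."""
    low = sum(lo for lo, _ in components.values())
    high = sum(hi for _, hi in components.values())
    return round(low, 3), round(high, 3)


def monthly_cost(per_minute, minutes):
    """Scale a per-minute rate to a monthly call volume."""
    return round(per_minute * minutes, 2)


low, high = per_minute_range(COMPONENTS)
print(f"per minute: ${low:.3f} to ${high:.3f}")

# Assumed volume for illustration: 10,000 call minutes per month.
minutes = 10_000
print(f"monthly: ${monthly_cost(low, minutes):,.2f} to ${monthly_cost(high, minutes):,.2f}")
```

The platform fee is fixed, so the spread between the low and high totals comes entirely from provider choice, with the LLM line contributing the widest range.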
Competitive Positioning
Direct Competitors
| Competitor | Differentiation |
|---|---|
| Retell AI | Retell is simpler, with a quoted base price of $0.07/min (versus Vapi's $0.05/min platform fee plus provider costs); Vapi has more provider flexibility |
| ElevenLabs | ElevenLabs is end-to-end with best voices; Vapi orchestrates multiple providers |
| LiveKit Agents | LiveKit is an open-source framework; Vapi is a managed platform with more telephony support |
| OpenAI Realtime | OpenAI is direct access; Vapi adds orchestration and phone integration |
When to Choose Vapi
- Choose Vapi when: You want provider flexibility, strong phone integration, or a mix of Realtime and traditional pipelines
- Choose Retell when: You want simpler setup and predictable pricing
- Choose ElevenLabs when: Voice quality is paramount and you want all-in-one
- Choose LiveKit when: You want open-source control
Ideal Customer Profile
Best fit:
- Developers building phone-first voice agents
- Teams wanting to mix providers for cost/quality optimization
- Organizations experimenting with different voice AI approaches
- Call center automation projects
- Outbound sales and appointment setting
Poor fit:
- Non-technical teams (requires developer expertise)
- Simple use cases (over-engineered for basic needs)
- Teams wanting simplest possible integration
- Organizations preferring single-vendor solutions
Viability Assessment
| Factor | Assessment |
|---|---|
| Financial Health | Strong — $25M+ raised from top-tier investors |
| Market Position | Growing — Popular in developer community |
| Innovation Pace | Good — Quick to adopt OpenAI Realtime |
| Ecosystem | Moderate — Good docs, growing integrations |
| Long-term Outlook | Positive — Well-positioned in voice AI infrastructure |
Bottom Line
Vapi is a platform for developers who want flexibility in building voice agents. Its main differentiator is the ability to use either OpenAI Realtime for speech-to-speech or traditional STT+LLM+TTS pipelines with a choice of provider at each stage. The strong phone integration (SIP, Twilio) makes it particularly suited for call center and telephony applications.
The trade-off is complexity and potentially higher total costs than simpler alternatives like Retell AI. The orchestration layer adds value for sophisticated use cases but may be overkill for straightforward applications. For technical teams wanting maximum flexibility and phone-first voice AI, Vapi is an excellent choice.
Recommended for: Developers building phone-first voice agents who want provider flexibility and don't mind configuration complexity.
Not recommended for: Non-technical teams, simple use cases, or those wanting the simplest possible integration path.
Research by Ry Walker Research