Vapi | Ry Walker Research

Key takeaways

Developer-first platform supporting both OpenAI Realtime API and traditional voice pipelines for flexibility
Strong phone/telephony focus with native SIP trunking and Twilio integration
$25M+ raised from Bessemer and Y Combinator; growing rapidly in the voice AI developer ecosystem

FAQ

What is Vapi?

Vapi is a developer platform for building voice AI agents that supports multiple backends (OpenAI Realtime, traditional pipelines), phone integration, and production deployment.

How much does Vapi cost?

Vapi charges a $0.05/minute platform fee plus provider costs (STT, LLM, TTS, telephony). Total costs typically range from $0.10 to $0.35 per minute depending on configuration.

Does Vapi support OpenAI Realtime API?

Yes, Vapi supports both OpenAI Realtime API for speech-to-speech and traditional STT+LLM+TTS pipelines, giving developers flexibility to choose their approach.

Executive Summary

Vapi is a developer-focused platform for building production voice AI agents. Unlike end-to-end solutions such as ElevenLabs or OpenAI Realtime, Vapi acts as an orchestration layer that supports multiple backends: OpenAI Realtime API for speech-to-speech, or traditional STT+LLM+TTS pipelines with a choice of provider at each stage. The platform has strong phone integration and has raised $25M+ from investors including Bessemer and Y Combinator.

Attribute	Value
Company	Vapi
Founded	2023
Funding	$25M+ (Series A, December 2024)
Investors	Bessemer, Y Combinator, AI Grant
Focus	Developer platform, phone agents
Headquarters	San Francisco, CA

Product Overview

Vapi positions itself as the developer platform for voice AI, offering flexibility in how agents are built. Developers can use OpenAI Realtime API for native speech-to-speech or construct traditional pipelines mixing different STT, LLM, and TTS providers.

The platform is particularly strong in telephony, with native support for phone calls, SIP trunking, and Twilio integration—making it popular for call center automation, outbound sales, and customer support applications.

Key Capabilities

Capability	Description
OpenAI Realtime Support	Native speech-to-speech integration
Traditional Pipeline	Mix STT, LLM, TTS providers
Phone Integration	SIP trunking, Twilio, phone numbers
Web/Mobile SDKs	Embeddable voice agents
Function Calling	Tool use and integrations
Analytics	Call metrics and conversation analysis

Supported Providers

Category	Providers
STT	Deepgram, AssemblyAI, Azure, Google
LLM	OpenAI, Anthropic, Groq, custom
TTS	ElevenLabs, Deepgram, PlayHT, Azure
Telephony	Twilio, SIP trunking, BYOC
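
To make the mix-and-match idea concrete, here is a minimal Python sketch that checks one provider pick per category against the table above. The `PipelineChoice` class and `SUPPORTED` mapping are hypothetical names invented for illustration; they are not Vapi's configuration schema or API.

```python
from dataclasses import dataclass

# Provider names copied from the table above, kept exactly as listed.
SUPPORTED = {
    "stt": {"Deepgram", "AssemblyAI", "Azure", "Google"},
    "llm": {"OpenAI", "Anthropic", "Groq", "custom"},
    "tts": {"ElevenLabs", "Deepgram", "PlayHT", "Azure"},
    "telephony": {"Twilio", "SIP trunking", "BYOC"},
}


@dataclass(frozen=True)
class PipelineChoice:
    """Hypothetical record of one provider pick per category (not Vapi's schema)."""

    stt: str
    llm: str
    tts: str
    telephony: str

    def unsupported(self) -> list[str]:
        """Return the categories whose chosen provider is not in the table."""
        return [
            category
            for category, allowed in SUPPORTED.items()
            if getattr(self, category) not in allowed
        ]


choice = PipelineChoice(
    stt="Deepgram", llm="Anthropic", tts="ElevenLabs", telephony="Twilio"
)
print(choice.unsupported())  # empty list: every pick appears in the table
```

A check like this is only a planning aid; the authoritative provider list is whatever Vapi's own documentation currently states.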

Technical Architecture

Vapi acts as an orchestration layer between the voice interface (phone, web, mobile) and the AI backend. This architecture provides provider flexibility but adds an extra hop, and therefore some latency, compared to direct API access.

┌─────────────────────────────────────────────────┐
│                 User Interface                   │
│     Phone | Web Widget | Mobile SDK              │
├─────────────────────────────────────────────────┤
│                 Vapi Platform                    │
│  ┌───────────────────────────────────────────┐  │
│  │         Orchestration Layer               │  │
│  │  ┌─────────┐ ┌─────────┐ ┌─────────┐     │  │
│  │  │ OpenAI  │ │ Trad.   │ │ Custom  │     │  │
│  │  │Realtime │ │Pipeline │ │ Config  │     │  │
│  │  └─────────┘ └─────────┘ └─────────┘     │  │
│  └───────────────────────────────────────────┘  │
├─────────────────────────────────────────────────┤
│              Provider Integrations               │
│   STT: Deepgram, AssemblyAI, Azure              │
│   LLM: OpenAI, Anthropic, Groq                  │
│   TTS: ElevenLabs, Deepgram, PlayHT             │
└─────────────────────────────────────────────────┘
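
The layering in the diagram can be sketched in Python as a single interface with two interchangeable backends. This is an illustrative model only, assuming stub functions in place of real providers; the names `Backend`, `TraditionalPipeline`, `SpeechToSpeech`, and `handle_turn` are hypothetical and do not describe Vapi's internals.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


class Backend(Protocol):
    """Anything that turns caller audio into agent audio for one turn."""

    def respond(self, audio_in: bytes) -> bytes: ...


@dataclass
class TraditionalPipeline:
    """Chains three independently chosen providers: STT -> LLM -> TTS."""

    stt: Callable[[bytes], str]
    llm: Callable[[str], str]
    tts: Callable[[str], bytes]

    def respond(self, audio_in: bytes) -> bytes:
        transcript = self.stt(audio_in)
        reply_text = self.llm(transcript)
        return self.tts(reply_text)


@dataclass
class SpeechToSpeech:
    """Wraps a single model that maps audio directly to audio."""

    model: Callable[[bytes], bytes]

    def respond(self, audio_in: bytes) -> bytes:
        return self.model(audio_in)


def handle_turn(backend: Backend, audio_in: bytes) -> bytes:
    """The orchestration layer depends only on the shared Backend interface."""
    return backend.respond(audio_in)


# Stub providers standing in for real STT/LLM/TTS services (hypothetical).
pipeline = TraditionalPipeline(
    stt=lambda audio: audio.decode("utf-8"),
    llm=lambda text: f"You said: {text}",
    tts=lambda text: text.encode("utf-8"),
)
realtime = SpeechToSpeech(model=lambda audio: audio[::-1])

print(handle_turn(pipeline, b"hello"))
print(handle_turn(realtime, b"hello"))
```

The point of the shared interface is that the calling surface (phone, web widget, mobile SDK) does not need to know which backend is answering, which is also where the extra layer between caller and provider comes from.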

Strengths

Provider flexibility — Mix and match STT, LLM, TTS providers; not locked to one vendor
OpenAI Realtime support — Native integration with latest speech-to-speech API
Phone-first — Strong telephony integration (SIP, Twilio, phone numbers)
Developer-focused — Well-documented APIs and SDKs
Well-funded — $25M+ from Bessemer, Y Combinator
Cost optimization — Choose cheaper providers per component
Web + phone — Same agent for web widget and phone calls

Cautions

Hidden costs — The $0.05/min base fee plus all provider costs can add up to $0.35/min
Orchestration overhead — Additional latency vs direct API access
Complexity — More configuration than end-to-end platforms
Developer-heavy — Requires technical expertise; not suitable for no-code users
Middleware dependency — Another vendor layer between you and providers
Limited testing tools — Less built-in testing compared to Retell AI

Pricing & Licensing

Vapi uses a base platform fee plus pass-through provider costs:

Component	Cost
Platform Fee	$0.05/minute
STT	Varies by provider (~$0.01-0.02/min)
LLM	Varies by model (~$0.01-0.08/min)
TTS	Varies by provider (~$0.01-0.04/min)
Telephony	~$0.015/min (Twilio)

Total typical cost: $0.10-0.35/minute depending on configuration.

OpenAI Realtime mode: ~$0.50/minute (includes OpenAI's premium pricing).
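
As a rough worked example, the Python sketch below sums the per-component ranges from the table and projects a monthly bill. The 10,000-minute volume is a hypothetical figure chosen for illustration, and the function names are invented here. Note that the table's approximate ranges sum to about $0.095 to $0.205 per minute; the higher end of the quoted typical range presumably reflects pricier models or voices, though that is an assumption.

```python
# Per-minute USD ranges copied from the pricing table above: (low, high).
COMPONENTS = {
    "platform": (0.05, 0.05),
    "stt": (0.01, 0.02),
    "llm": (0.01, 0.08),
    "tts": (0.01, 0.04),
    "telephony": (0.015, 0.015),
}
REALTIME_PER_MIN = 0.50  # approximate OpenAI Realtime mode figure quoted above


def per_minute_range(
    components: dict[str, tuple[float, float]],
) -> tuple[float, float]:
    """Sum the low and high ends of every component, rounded to 0.1 cent."""
    low = round(sum(lo for lo, _ in components.values()), 3)
    high = round(sum(hi for _, hi in components.values()), 3)
    return low, high


def monthly_cost(rate_per_min: float, minutes: int) -> float:
    """Project a monthly bill in USD for a given per-minute rate."""
    return round(rate_per_min * minutes, 2)


low, high = per_minute_range(COMPONENTS)
minutes = 10_000  # hypothetical monthly call volume

print(f"Pipeline per minute: ${low:.3f} to ${high:.3f}")
print(f"Pipeline per month:  ${monthly_cost(low, minutes):,.2f} "
      f"to ${monthly_cost(high, minutes):,.2f}")
print(f"Realtime per month:  ${monthly_cost(REALTIME_PER_MIN, minutes):,.2f}")
```

Under these assumptions, the LLM line is the widest range in the table, so model choice moves the total more than any other single component.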

Competitive Positioning

Direct Competitors

Competitor	Differentiation
Retell AI	Retell is simpler, with more predictable pricing ($0.07/min base); Vapi offers more provider flexibility
ElevenLabs	ElevenLabs is end-to-end with the best voice quality; Vapi orchestrates multiple providers
LiveKit Agents	LiveKit is an open-source framework; Vapi is a managed platform with more telephony support
OpenAI Realtime	OpenAI is direct access; Vapi adds orchestration and phone integration

When to Choose Vapi

Choose Vapi when: You want provider flexibility, strong phone integration, or a mix of Realtime and traditional pipelines
Choose Retell when: You want simpler setup and predictable pricing
Choose ElevenLabs when: Voice quality is paramount and you want all-in-one
Choose LiveKit when: You want open-source control

Ideal Customer Profile

Best fit:

Developers building phone-first voice agents
Teams wanting to mix providers for cost/quality optimization
Organizations experimenting with different voice AI approaches
Call center automation projects
Outbound sales and appointment setting

Poor fit:

Non-technical teams (requires developer expertise)
Simple use cases (over-engineered for basic needs)
Teams wanting simplest possible integration
Organizations preferring single-vendor solutions

Viability Assessment

Factor	Assessment
Financial Health	Strong — $25M+ raised from top-tier investors
Market Position	Growing — Popular in developer community
Innovation Pace	Good — Quick to adopt OpenAI Realtime
Ecosystem	Moderate — Good docs, growing integrations
Long-term Outlook	Positive — Well-positioned in voice AI infrastructure

Bottom Line

Vapi is a platform for developers who want flexibility in building voice agents. Its main differentiator is the ability to use either OpenAI Realtime for speech-to-speech or traditional STT+LLM+TTS pipelines with a choice of provider at each stage. The strong phone integration (SIP, Twilio) makes it particularly suited to call center and telephony applications.

The trade-off is complexity and potentially higher total costs than simpler alternatives like Retell AI. The orchestration layer adds value for sophisticated use cases but may be overkill for straightforward applications. For technical teams wanting maximum flexibility and phone-first voice AI, Vapi is an excellent choice.

Recommended for: Developers building phone-first voice agents who want provider flexibility and don't mind configuration complexity.

Not recommended for: Non-technical teams, simple use cases, or those wanting the simplest possible integration path.

Research by Ry Walker Research
