
LiveKit Agents

LiveKit Agents is an open-source framework for building realtime voice, video, and physical AI agents with provider flexibility, multi-agent handoff, and self-hosting options.

Key takeaways

  • Open-source framework with full provider flexibility — mix OpenAI, Deepgram, ElevenLabs, or any STT/LLM/TTS
  • $122.5M+ raised, including a $100M round in January 2026 led by Index Ventures; OpenAI is a customer
  • Self-hostable, with optional LiveKit Cloud for managed infrastructure at $0.004/min for audio

FAQ

What is LiveKit Agents?

LiveKit Agents is an open-source framework for building realtime AI agents that can see, hear, and speak, with flexible provider integration and optional managed cloud infrastructure.

How much does LiveKit cost?

Self-hosting is free (open source). LiveKit Cloud charges $0.004/minute for audio-only, $0.015/minute for video, plus provider costs for STT/LLM/TTS.

Can I self-host LiveKit?

Yes, LiveKit is fully open source under Apache 2.0. You can deploy on your own infrastructure with no licensing fees.

Executive Summary

LiveKit Agents is an open-source framework for building realtime voice, video, and physical AI agents. Unlike managed platforms such as Vapi or ElevenLabs, LiveKit gives developers full control over provider selection and deployment. The company raised $100M in January 2026, bringing total funding to $122.5M+, and counts OpenAI among its customers.

Attribute | Value
Company | LiveKit
Founded | 2021
Funding | $122.5M+
Investors | Index Ventures, Salesforce, Altimeter
License | Apache 2.0 (open source)
Notable Customer | OpenAI

Product Overview

LiveKit started as realtime video/audio infrastructure and expanded into AI agents with the Agents framework. The framework allows developers to build AI-driven applications that can see (video), hear (audio), and speak (TTS) in realtime, with full flexibility in choosing AI providers.

The platform supports both managed cloud deployment (LiveKit Cloud) and self-hosting, making it attractive to teams wanting control without building infrastructure from scratch.

Key Capabilities

Capability | Description
Voice Agents | Build conversational AI with any STT/LLM/TTS
Video Agents | Add vision capabilities to agents
Provider Flexibility | OpenAI, Deepgram, ElevenLabs, AssemblyAI, etc.
Multi-Agent | Handoff between specialized agents
SIP Integration | Connect to phone networks
Self-Hosting | Deploy on your own infrastructure
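
The multi-agent handoff capability listed above can be illustrated with a small, framework-independent sketch. This is not the LiveKit Agents API: `SketchAgent`, `SketchSession`, and the keyword-based routing rule are hypothetical names invented here to show the idea that one agent can pass the conversation to a more specialized one while the session keeps its history.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class SketchAgent:
    """Hypothetical agent: 'route' names the agent to hand off to, or None."""
    name: str
    route: Callable[[str], Optional[str]]


@dataclass
class SketchSession:
    """Hypothetical session that tracks the active agent and shared history."""
    agents: Dict[str, SketchAgent]
    current: str
    history: List[str] = field(default_factory=list)

    def handle(self, user_text: str) -> str:
        # Ask the active agent whether another agent should take over.
        target = self.agents[self.current].route(user_text)
        if target is not None and target in self.agents:
            self.current = target
        # History lives on the session, so it survives the handoff.
        self.history.append(user_text)
        return self.current


# Assumed routing rule for illustration: mentions of "invoice" go to billing.
greeter = SketchAgent("greeter", lambda t: "billing" if "invoice" in t.lower() else None)
billing = SketchAgent("billing", lambda t: None)

session = SketchSession(agents={"greeter": greeter, "billing": billing}, current="greeter")
first = session.handle("Hi there")
second = session.handle("I have a question about my invoice")
```

In this sketch the first message stays with the greeter and the second moves the session to the billing agent. A real deployment would route on LLM output or tool calls rather than keyword matching.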

Framework Components

Component | Description
Python SDK | pip install livekit-agents
TypeScript SDK | Node.js agent development
Plugins | Pre-built integrations (OpenAI, Deepgram, etc.)
LiveKit Cloud | Optional managed infrastructure

Technical Architecture

LiveKit Agents runs as a server-side process that connects to LiveKit's realtime infrastructure (cloud or self-hosted). Agents can use any combination of AI providers through plugins.

┌─────────────────────────────────────────────────┐
│              Client Applications                 │
│       Web | Mobile | Phone (SIP)                │
├─────────────────────────────────────────────────┤
│         LiveKit Infrastructure                   │
│    (Cloud or Self-Hosted)                       │
│  ┌───────────────────────────────────────────┐  │
│  │         Realtime Media Routing            │  │
│  │    Audio/Video Streams ↔ Agents           │  │
│  └───────────────────────────────────────────┘  │
├─────────────────────────────────────────────────┤
│              LiveKit Agents                      │
│  ┌─────────────────────────────────────────────┐│
│  │         Agent Process                       ││
│  │  ┌─────────┐ ┌─────────┐ ┌─────────┐       ││
│  │  │   STT   │ │   LLM   │ │   TTS   │       ││
│  │  │(plugin) │ │(plugin) │ │(plugin) │       ││
│  │  └─────────┘ └─────────┘ └─────────┘       ││
│  │        ↓           ↓           ↓           ││
│  │  Deepgram    OpenAI      ElevenLabs        ││
│  │  AssemblyAI  Anthropic   Cartesia          ││
│  │  Whisper     Groq        Rime              ││
│  └─────────────────────────────────────────────┘│
└─────────────────────────────────────────────────┘
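
The STT, LLM, and TTS plugin slots in the diagram can be sketched as interchangeable interfaces. The code below is a minimal illustration of that plugin pattern, not the LiveKit Agents API: the protocol names, `VoicePipeline`, and the stand-in providers are all hypothetical, and real plugins would stream audio to services such as Deepgram or OpenAI instead of transforming bytes locally.

```python
from dataclasses import dataclass
from typing import Protocol


class SpeechToText(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class LanguageModel(Protocol):
    def complete(self, prompt: str) -> str: ...


class TextToSpeech(Protocol):
    def synthesize(self, text: str) -> bytes: ...


# Stand-in providers (hypothetical) so the sketch runs without network access.
class EchoSTT:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")


class UppercaseLLM:
    def complete(self, prompt: str) -> str:
        return prompt.upper()


class BytesTTS:
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


@dataclass
class VoicePipeline:
    """Chains the three stages; any object matching a protocol can be swapped in."""
    stt: SpeechToText
    llm: LanguageModel
    tts: TextToSpeech

    def handle_turn(self, audio_in: bytes) -> bytes:
        text = self.stt.transcribe(audio_in)   # audio -> text
        reply = self.llm.complete(text)        # text -> reply text
        return self.tts.synthesize(reply)      # reply text -> audio


pipeline = VoicePipeline(stt=EchoSTT(), llm=UppercaseLLM(), tts=BytesTTS())
reply_audio = pipeline.handle_turn(b"hello")
```

Swapping a provider means passing a different object for one slot and leaving the rest of the pipeline unchanged, which is the flexibility the diagram describes.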

Deployment Options

Option | Description | Cost
LiveKit Cloud | Managed infrastructure | $0.004-0.015/min
Self-Hosted | Your own servers | Infrastructure only
Hybrid | Mix of both | Varies

Strengths

  • Open source — Apache 2.0 license; no vendor lock-in; self-hostable
  • Provider flexibility — Mix any STT, LLM, TTS providers through plugins
  • Strong funding — $122.5M+ from top investors including Index and Salesforce
  • OpenAI customer — Validation from industry leader
  • Multi-modal — Voice, video, and physical agents in one framework
  • Multi-agent — Built-in support for agent handoff and specialization
  • Active community — 6K+ GitHub stars, active Discord

Cautions

  • More complexity — Requires more setup than managed platforms like Vapi
  • DIY responsibility — You manage provider accounts, rate limits, failover
  • No turn-taking model — Relies on provider capabilities for natural conversation
  • Learning curve — Framework concepts require developer investment
  • Cloud costs stack — LiveKit + STT + LLM + TTS costs add up
  • Less no-code — Primarily developer-focused; limited visual tools

Pricing & Licensing

Open Source (Self-Hosted):

  • Framework: Free (Apache 2.0)
  • Infrastructure: Your costs

LiveKit Cloud:

Component | Cost
Audio-only | $0.004/minute
Video | $0.015/minute
Participant connection | $0.0005/minute (decreases with volume)
Egress (recording) | Per-minute based on format

Provider costs (in addition to LiveKit):

  • STT: ~$0.006-0.02/minute (Deepgram, AssemblyAI)
  • LLM: ~$0.01-0.10/minute (OpenAI, Anthropic)
  • TTS: ~$0.01-0.04/minute (ElevenLabs, Deepgram)

Typical total: $0.03-0.15/minute depending on providers.
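
To see how the stacked costs add up, the per-minute ranges above can be summed directly. This is a rough sketch under stated assumptions: it uses the audio-only LiveKit Cloud rate, ignores the participant connection and egress fees, and the 10,000-minute monthly volume is a hypothetical figure chosen for illustration. Summing every top-of-range price gives a worst case slightly above the typical upper figure quoted above.

```python
# Per-minute price ranges in USD, copied from the figures in this section.
LIVEKIT_AUDIO = (0.004, 0.004)
STT_RANGE = (0.006, 0.02)
LLM_RANGE = (0.01, 0.10)
TTS_RANGE = (0.01, 0.04)


def total_range(*components):
    """Sum the low ends and the high ends of each (low, high) price range."""
    low = sum(c[0] for c in components)
    high = sum(c[1] for c in components)
    return round(low, 3), round(high, 3)


def monthly_cost(minutes, per_minute):
    """Projected monthly spend for a given number of agent minutes."""
    return round(minutes * per_minute, 2)


low, high = total_range(LIVEKIT_AUDIO, STT_RANGE, LLM_RANGE, TTS_RANGE)

# Hypothetical volume: 10,000 agent minutes per month.
monthly_low = monthly_cost(10_000, low)
monthly_high = monthly_cost(10_000, high)
```

With these inputs the range works out to $0.03 to $0.164 per minute, or about $300 to $1,640 per month at the assumed volume. The LLM line dominates the spread, so model choice matters more than the LiveKit fee.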


Competitive Positioning

Direct Competitors

Competitor | Differentiation
Vapi | Vapi is a managed platform with more hand-holding; LiveKit is a framework with more control
Retell AI | Retell is simpler and phone-focused; LiveKit supports video and self-hosting
ElevenLabs | ElevenLabs is end-to-end with best voices; LiveKit lets you choose any provider
Daily, Agora | Similar infrastructure; LiveKit has a stronger AI agent focus

When to Choose LiveKit Agents

  • Choose LiveKit when: You want open-source control, provider flexibility, or self-hosting
  • Choose Vapi when: You want a managed platform with less setup
  • Choose Retell when: You want a phone-focused platform with a no-code option
  • Choose ElevenLabs when: Voice quality is the top priority

Ideal Customer Profile

Best fit:

  • Developer teams wanting full control over their voice AI stack
  • Organizations requiring self-hosted deployment
  • Multi-modal applications (voice + video agents)
  • Teams wanting to mix best-of-breed providers
  • Companies avoiding vendor lock-in

Poor fit:

  • Non-technical teams needing no-code solutions
  • Teams wanting the fastest path to production
  • Simple phone automation use cases
  • Organizations preferring single-vendor simplicity

Viability Assessment

Factor | Assessment
Financial Health | Strong: $122.5M+ raised, top-tier investors
Market Position | Growing: OpenAI customer, active open-source community
Innovation Pace | Rapid: Regular framework updates, new plugins
Ecosystem | Extensive: Many provider plugins, active GitHub
Long-term Outlook | Very Positive: Well-funded, open-source moat

LiveKit's combination of strong funding ($122.5M+), open-source model, and validation from OpenAI positions it well for long-term success in the voice AI infrastructure space.


Bottom Line

LiveKit Agents is the best choice for developer teams wanting maximum control over their voice AI stack. The open-source framework with provider flexibility means no vendor lock-in and the ability to mix best-of-breed components. Self-hosting options make it suitable for organizations with data sovereignty requirements.

The trade-off is complexity—more setup than managed platforms, more responsibility for provider management, and a steeper learning curve. For teams with developer resources who want control, it's excellent. For teams wanting the fastest path to production, managed alternatives like Vapi or Retell may be more practical.

Recommended for: Developer teams wanting open-source voice AI with provider flexibility, self-hosting options, and multi-modal capabilities.

Not recommended for: Non-technical teams, teams wanting the fastest setup, or simple phone-only use cases.


Research by Ry Walker Research