
Pipecat

Pipecat is Daily's open-source Python framework for real-time voice and multimodal conversational agents — vendor-neutral orchestration across ~90 STT/LLM/TTS/transport integrations, 12.7K+ GitHub stars, BSD-2-Clause, with NVIDIA shipping its own extension library. Daily monetizes via Pipecat Cloud at $0.01/min.

Key takeaways

  • One of the two de facto standard open-source voice-agent stacks (alongside LiveKit Agents): 12.7K+ stars, 2.1K+ forks, BSD-2-Clause, roughly 90 integrations, and NVIDIA publishes its own nvidia-pipecat extension library and features Pipecat on build.nvidia.com
  • Hit v1.0.0 on April 14, 2026 after ~2.5 years of 0.0.x releases, then shipped v1.1 through v1.3 in about six weeks; v1.3.0 (May 29, 2026) added a multi-agent framework in which every PipelineWorker becomes a peer on a shared message bus
  • The framework is free and vendor-neutral; Daily monetizes through Pipecat Cloud managed hosting at $0.01/min active ($0.0005/min reserved), with SIP at $0.005/min and PSTN at $0.018/min — model/provider costs are separate

FAQ

What is Pipecat?

Pipecat is an open-source Python framework from Daily for building real-time voice and multimodal conversational agents, orchestrating STT, LLM, TTS, and transport services from dozens of vendors into low-latency pipelines.

How much does Pipecat cost?

The framework is free under BSD-2-Clause and self-hostable anywhere. Daily's managed Pipecat Cloud charges $0.01/min per active agent instance ($0.0005/min reserved), plus $0.005/min for SIP and $0.018/min for PSTN telephony; STT/LLM/TTS provider costs are billed separately.

What models and services does Pipecat support?

Roughly 90 integrations across speech-to-text (Deepgram, AssemblyAI, Whisper), LLMs (OpenAI, Anthropic, Gemini, Groq, Mistral), text-to-speech (ElevenLabs, Cartesia, OpenAI), and transports (Daily WebRTC, LiveKit, Twilio, Telnyx, Vonage, WhatsApp).

How is Pipecat different from LiveKit Agents?

Both are open-source realtime agent frameworks, but Pipecat is transport-neutral (it runs over Daily, LiveKit, Twilio, or plain WebSockets), while LiveKit Agents is built around the LiveKit media server. Pipecat is BSD-2-Clause licensed (LiveKit Agents is Apache-2.0) and organizes each agent as a pipeline of frame processors.

Executive Summary

Pipecat is an open-source Python framework for building real-time voice and multimodal conversational agents, created and maintained by Daily, the WebRTC infrastructure company, alongside a community of contributors.[1][2] Its pitch is vendor neutrality: rather than locking you to one speech stack, Pipecat orchestrates pipelines across roughly 90 integrations, including Deepgram or AssemblyAI for STT; OpenAI, Anthropic, Gemini, Groq, or Mistral for the LLM; ElevenLabs or Cartesia for TTS; and transports spanning Daily WebRTC, LiveKit, Twilio, Telnyx, Vonage, and WhatsApp.[1][3] Introduced publicly in a May 2024 Show HN with the stated ambition of being "LlamaIndex or LangChain for real-time/conversational AI," it is now, in one HN commenter's words, one of "the 2 major stacks for building voice ai," the other being LiveKit Agents.[4][5]

The repo holds 12.7K+ stars and 2.1K+ forks under a BSD-2-Clause license as of June 2026, with daily commit activity.[6] After roughly two and a half years of 0.0.x releases (the repo dates to December 2023), v1.0.0 landed April 14, 2026, followed by v1.1.0, v1.2.x, and v1.3.0 by May 29, 2026 — the last adding a multi-agent framework, Vonage WebRTC transport, and major cold-start optimizations.[6][7] NVIDIA ships its own nvidia-pipecat extension library integrating Nemotron Speech ASR/TTS and NIM microservices, and features Pipecat on build.nvidia.com.[8][9] Daily's business model wraps the free framework with Pipecat Cloud, a generally available managed hosting service billed per active agent minute.[10][11]

| Attribute | Value |
|---|---|
| Company/Creator | Daily (Daily.co engineering team) + Pipecat community[2] |
| First release | Repo created December 2023; Show HN May 2024[6][4] |
| GitHub stars | 12.7K+ stars, 2.1K+ forks (June 2026)[6] |
| License | BSD-2-Clause[6] |
| Maturity | v1.0.0 April 14, 2026; v1.3.0 May 29, 2026; 100+ releases[7] |
| Commercial arm | Pipecat Cloud managed hosting, also sold via AWS Marketplace[12][13] |

Product Overview

Pipecat models a voice agent as a pipeline of frame processors: audio frames flow in from a transport, through VAD and speech-to-text, into an LLM, out through text-to-speech, and back to the user — with interruption handling, phrase endpointing, and turn detection managed by the framework.[1][4] Daily CEO Kwindla Hultman Kramer's founding observation was that everyone building conversational AI re-solves the same problems: "low-latency media transport, echo cancellation, voice activity detection, phrase endpointing, pipelining data between models/services, handling voice interruptions, swapping out different models/services."[4]
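To make the frame/pipeline idea concrete, here is a framework-free toy sketch in plain Python. It is not Pipecat's API: the classes (Frame, Processor, Pipeline), the fake STT/LLM/TTS stages, and the interruption behaviour are hypothetical stand-ins chosen to show how frames flow through an ordered chain of processors and how an interruption frame can flush speech that has not yet been played. Production frameworks typically give such control frames priority over queued data; this sketch simply passes them through every stage in order.

```python
from dataclasses import dataclass


# Hypothetical frame types; Pipecat defines its own, much richer hierarchy.
@dataclass
class Frame:
    pass


@dataclass
class AudioFrame(Frame):
    text: str  # stand-in for raw audio: pretend the audio already "says" this


@dataclass
class TextFrame(Frame):
    text: str


@dataclass
class InterruptionFrame(Frame):
    """The user started speaking while the bot was still talking."""


class Processor:
    """One pipeline stage: consumes a frame, emits zero or more frames."""

    def process(self, frame: Frame) -> list[Frame]:
        return [frame]  # default behaviour: pass the frame through unchanged


class FakeSTT(Processor):
    def process(self, frame: Frame) -> list[Frame]:
        if isinstance(frame, AudioFrame):
            return [TextFrame(frame.text)]  # "transcribe" the audio
        return [frame]


class FakeLLM(Processor):
    def process(self, frame: Frame) -> list[Frame]:
        if isinstance(frame, TextFrame):
            return [TextFrame(f"You said: {frame.text}")]
        return [frame]


class FakeTTS(Processor):
    """Queues replies for playback; an interruption drops unplayed speech."""

    def __init__(self) -> None:
        self.playback_queue: list[str] = []

    def process(self, frame: Frame) -> list[Frame]:
        if isinstance(frame, InterruptionFrame):
            self.playback_queue.clear()  # barge-in: stop talking immediately
        elif isinstance(frame, TextFrame):
            self.playback_queue.append(frame.text)
        return [frame]


class Pipeline:
    def __init__(self, processors: list[Processor]) -> None:
        self.processors = processors

    def push(self, frame: Frame) -> None:
        frames = [frame]
        for proc in self.processors:
            # Every frame, including control frames, visits every stage in order.
            frames = [out for f in frames for out in proc.process(f)]


tts = FakeTTS()
pipeline = Pipeline([FakeSTT(), FakeLLM(), tts])
pipeline.push(AudioFrame("hello"))
print(tts.playback_queue)  # ['You said: hello']
pipeline.push(InterruptionFrame())
print(tts.playback_queue)  # []
```

The point of the design is that each stage only knows about frame types, so adding VAD, endpointing, or a different model means inserting or replacing a processor rather than rewiring the loop.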

Install with "pip install pipecat-ai" (Python 3.11 minimum, 3.12+ recommended); integrations ship as optional extras, so the base install stays lean.[1][7]
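A tiny preflight check can confirm the interpreter meets that requirement before running "pip install pipecat-ai" in a fresh environment. The thresholds simply encode the 3.11 minimum and 3.12+ recommendation stated above; python_support is a hypothetical helper, not part of Pipecat.

```python
import sys

# Thresholds as stated in this profile; confirm against the current
# pipecat-ai package metadata before relying on them.
MINIMUM = (3, 11)
RECOMMENDED = (3, 12)


def python_support(version: tuple[int, int]) -> str:
    """Classify an interpreter (major, minor) version against the stated requirements."""
    if version < MINIMUM:
        return "unsupported"
    if version < RECOMMENDED:
        return "supported (3.12+ recommended)"
    return "recommended"


print(f"Python {sys.version_info[0]}.{sys.version_info[1]}: "
      f"{python_support(tuple(sys.version_info[:2]))}")
```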

Key Capabilities

| Capability | Description |
|---|---|
| Vendor-neutral pipelines | ~90 integrations across STT, LLM, TTS, video, vision, memory, analytics[3][1] |
| Transports | Daily WebRTC, LiveKit, Twilio, Telnyx, Vonage, WhatsApp; WebSocket and P2P WebRTC modules[1][12] |
| Telephony serializers | Twilio, Telnyx, Vonage, Genesys protocol serializers for phone-call agents[1] |
| Multi-agent framework | v1.3.0 turns every PipelineWorker into a peer on a shared typed-message bus[7] |
| Smart Turn | Open turn-detection model; v3 vendored its STFT to cut import overhead from ~566MB to ~60MB[7] |
| Pipecat Flows | Structured-conversation layer for state-machine dialog design[1] |
| Tooling | Pipecat CLI, Whisker debugger, Tail terminal dashboard, Voice UI Kit[1] |
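The v1.3 multi-agent idea, workers as peers on one typed bus, can be illustrated with a small asyncio sketch. Everything here is hypothetical: MessageBus, Worker, and the two message types are invented for illustration and do not mirror the real PipelineWorker API. Routing by exact Python type is just one simple way to make a bus "typed".

```python
import asyncio
from collections import defaultdict
from dataclasses import dataclass
from typing import Awaitable, Callable


# Hypothetical message types; the real v1.3 bus defines its own.
@dataclass
class Transcript:
    agent: str
    text: str


@dataclass
class HandoffRequest:
    to_agent: str
    summary: str


Handler = Callable[[object], Awaitable[None]]


class MessageBus:
    """Delivers each message to every handler subscribed to its exact type."""

    def __init__(self) -> None:
        self._subscribers: dict[type, list[Handler]] = defaultdict(list)

    def subscribe(self, msg_type: type, handler: Handler) -> None:
        self._subscribers[msg_type].append(handler)

    async def publish(self, msg: object) -> None:
        handlers = self._subscribers.get(type(msg), [])
        await asyncio.gather(*(h(msg) for h in handlers))  # fan out concurrently


class Worker:
    """A peer agent: no central coordinator, only the shared bus."""

    def __init__(self, name: str, bus: MessageBus) -> None:
        self.name = name
        self.inbox: list[object] = []
        bus.subscribe(Transcript, self.on_transcript)
        bus.subscribe(HandoffRequest, self.on_handoff)

    async def on_transcript(self, msg: Transcript) -> None:
        if msg.agent != self.name:  # observe what other peers said
            self.inbox.append(msg)

    async def on_handoff(self, msg: HandoffRequest) -> None:
        if msg.to_agent == self.name:  # ignore handoffs addressed to other peers
            self.inbox.append(msg)


async def demo() -> tuple[list[object], list[object]]:
    bus = MessageBus()
    triage, billing = Worker("triage", bus), Worker("billing", bus)
    await bus.publish(Transcript("triage", "Caller asks about an invoice."))
    await bus.publish(HandoffRequest("billing", "Invoice question"))
    return triage.inbox, billing.inbox


if __name__ == "__main__":
    triage_inbox, billing_inbox = asyncio.run(demo())
    print(len(triage_inbox), len(billing_inbox))  # 0 2
```

Because every worker both publishes and subscribes, a handoff is just another message rather than a special control path.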

Product Surfaces

| Surface | Description | Availability |
|---|---|---|
| Python framework | pipecat-ai on PyPI, BSD-2-Clause | GA (v1.x)[7] |
| Client SDKs | JavaScript, React, React Native, Swift, Kotlin, C++, ESP32 | GA[1] |
| Pipecat Cloud | Daily-managed agent hosting with autoscaling and observability | GA[10] |
| NVIDIA extension | nvidia-pipecat library for Nemotron ASR/TTS and NIMs | GA (March 2026)[8] |

Technical Architecture

Pipecat is a cascaded (STT → LLM → TTS) orchestration framework at its core, with speech-to-speech model support as providers ship realtime APIs; the framework's job is the realtime plumbing — media transport, interruption, turn-taking — not the models themselves.[4][1] Agents run as ordinary Python processes, which means deployment is your problem: self-host on any infrastructure, or hand the container to Pipecat Cloud, which runs pipelines on Daily's global infrastructure with automatic scaling, containerized deployment, built-in observability, and Daily WebRTC transport included at no extra cost.[12]
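Because the framework owns the plumbing and not the models, each cascaded stage is effectively an interface that any vendor can fill. The toy sketch below shows that STT → LLM → TTS shape with a config-driven vendor registry. AlphaSTT, BetaSTT, EchoLLM, BytesTTS, and the REGISTRY layout are hypothetical fakes, not Pipecat service classes; in Pipecat itself the equivalent swap is constructing a different service object for that slot in the pipeline.

```python
from dataclasses import dataclass
from typing import Protocol


class STT(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class LLM(Protocol):
    def reply(self, prompt: str) -> str: ...


class TTS(Protocol):
    def synthesize(self, text: str) -> bytes: ...


# Hypothetical stand-ins for real vendors; each just fakes its output.
class AlphaSTT:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")  # pretend the "audio" is UTF-8 text


class BetaSTT:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8").lower()  # a different vendor, different output


class EchoLLM:
    def reply(self, prompt: str) -> str:
        return f"echo: {prompt}"


class BytesTTS:
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")  # pretend these bytes are audio


REGISTRY = {
    "stt": {"alpha": AlphaSTT, "beta": BetaSTT},
    "llm": {"echo": EchoLLM},
    "tts": {"bytes": BytesTTS},
}


@dataclass
class CascadedAgent:
    stt: STT
    llm: LLM
    tts: TTS

    @classmethod
    def from_config(cls, config: dict[str, str]) -> "CascadedAgent":
        # Changing a vendor means editing config, not the turn logic below.
        return cls(*(REGISTRY[stage][config[stage]]() for stage in ("stt", "llm", "tts")))

    def turn(self, audio: bytes) -> bytes:
        text = self.stt.transcribe(audio)   # speech -> text
        answer = self.llm.reply(text)       # text -> text
        return self.tts.synthesize(answer)  # text -> speech


agent = CascadedAgent.from_config({"stt": "alpha", "llm": "echo", "tts": "bytes"})
print(agent.turn(b"Hello"))  # b'echo: Hello'
agent = CascadedAgent.from_config({"stt": "beta", "llm": "echo", "tts": "bytes"})
print(agent.turn(b"Hello"))  # b'echo: hello'
```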

Key Technical Details

| Aspect | Detail |
|---|---|
| Deployment | Self-host anywhere, or Pipecat Cloud managed containers (also via AWS Marketplace)[12][13] |
| Model(s) | Bring-your-own across ~90 integrations; no bundled inference[3][1] |
| Language | Python 3.11+ server; JS/React/React Native/Swift/Kotlin/C++/ESP32 clients[1] |
| Telephony | Twilio/Telnyx/Vonage/Genesys serializers; Cloud SIP $0.005/min, PSTN $0.018/min[1][11] |
| Open source | BSD-2-Clause for the entire framework; not open-core[6] |
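Telephony serializers sit between a carrier's wire protocol and the pipeline's audio frames. As a hedged illustration, the sketch below decodes a media event shaped like Twilio Media Streams messages (base64-encoded 8 kHz G.711 μ-law audio in media.payload) into linear PCM samples. It is not Pipecat's serializer code: deserialize_media and ulaw_to_pcm16 are hypothetical helpers, and a real serializer also handles outbound audio, stream IDs, and control events. The μ-law decoder is hand-written because Python's audioop module was removed in 3.13.

```python
import base64
import json
from typing import Optional


def ulaw_to_pcm16(byte: int) -> int:
    """Decode one G.711 mu-law byte to a linear PCM sample (range -32124..32124)."""
    byte = ~byte & 0xFF                      # mu-law bytes are transmitted bit-inverted
    sign = byte & 0x80
    exponent = (byte >> 4) & 0x07
    mantissa = byte & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84  # 0x84 = 132, the mu-law bias
    return -magnitude if sign else magnitude


def deserialize_media(message: str) -> Optional[list[int]]:
    """Convert a Twilio-Media-Streams-style 'media' event into PCM samples.

    Returns None for any other event type (e.g. 'start', 'mark', 'stop').
    """
    event = json.loads(message)
    if event.get("event") != "media":
        return None
    payload = base64.b64decode(event["media"]["payload"])  # 8 kHz mu-law bytes
    return [ulaw_to_pcm16(b) for b in payload]


example = json.dumps({
    "event": "media",
    "streamSid": "MZ-placeholder",  # hypothetical stream id
    "media": {"payload": base64.b64encode(bytes([0xFF, 0x80, 0x00])).decode()},
})
print(deserialize_media(example))  # [0, 32124, -32124]
```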

Strengths

  • The widest vendor-neutral integration surface in the category: roughly 90 integrations spanning every major STT, LLM, TTS, and transport vendor mean no single provider can hold your agent hostage, and switching providers means swapping one service in the pipeline rather than rewriting the agent.[3][1]
  • Genuinely open source, permissively licensed — the whole framework is BSD-2-Clause with 2.1K+ forks, not an open-core teaser for a managed product.[6]
  • Ecosystem gravity beyond Daily — NVIDIA maintains its own nvidia-pipecat extension and features Pipecat on its build platform; per Daily's CEO, NVIDIA, AWS, and multiple foundation and voice AI labs use and contribute to the framework.[8][9][3]
  • Fast, substantive release cadence post-1.0: three minor versions (v1.1, v1.2.x, and v1.3) in the roughly six weeks after v1.0.0, including a multi-agent framework and a ~9x reduction in Smart Turn import overhead (~566MB to ~60MB).[7]
  • Cheap managed path when you want it — Pipecat Cloud's $0.01/min active platform fee with $0.0005/min reserved instances (1/20th active cost) undercuts most managed voice platforms' orchestration fees, with Daily WebRTC included.[11][12]

Cautions

  • Deployment at scale is the named pain point — Pipecat ships a framework, not infrastructure; HN's most pointed criticism is that "the problem with PipeCat and LiveKit... is the deployment at scale," pushing teams toward Pipecat Cloud or significant DevOps work.[5]
  • Cascaded-pipeline architecture is contested — skeptics argue orchestrated STT→LLM→TTS chains look "strictly inferior" next to natively speech-to-speech models; Pipecat's answer is integrating those models too, but the framework's value shrinks if end-to-end models win.[4]
  • No published latency SLA — community benchmarking of Pipecat-vs-LiveKit network performance is still early and inconclusive, so transport-level latency claims rest on Daily's WebRTC reputation rather than public numbers.[14]
  • Python-only server runtime — teams standardized on Node/Go/Rust backends must run Pipecat as a separate Python service; client SDKs are polyglot but the pipeline is not.[1]
  • Vendor-funded neutrality — Daily employs the core team and owns the default transport; the framework is neutral, but the commercial gravity points at Daily WebRTC and Pipecat Cloud.[2][12]
  • 2.5 years to 1.0 — the long 0.0.x run (over 100 releases) meant breaking changes for early adopters; API stability is only weeks old as of June 2026.[7][6]

What Developers Say

Community discussion spans multiple HN threads, from the May 2024 Show HN through late-2025 architecture debates.[4][5]

"Nice to see an open source implementation, i have been seeing many startups get into this space" — awenix on Hacker News[4]

"The problem with PipeCat and LiveKit (the 2 major stacks for building voice ai) is the deployment at scale." — ldenoue on Hacker News, who built a Cloudflare Workers alternative[5]

"When you compare to a natively multimodal model like GPT-4o it seems strictly inferior." — avarun on Hacker News, on the cascaded-pipeline approach[4]

"Cool stuff. I prefered the experience with lk but i always wonder whats the performance like with pipecat" — focom on Hacker News, in a Pipecat-vs-LiveKit benchmark thread[14]

One caveat: Daily CEO kwindla is an active HN participant — e.g., "Pipecat has 90 or so integrations with all the models/services people use for voice AI these days," with NVIDIA, AWS, and various labs contributing — so some pro-Pipecat framing in threads is vendor voice.[3]


Pricing & Licensing

The framework costs nothing; Pipecat Cloud is metered per agent-instance minute, with provider (STT/LLM/TTS) costs always separate.[6][11]

| Tier | Price | Includes |
|---|---|---|
| Pipecat (OSS) | Free | Full framework, BSD-2-Clause, self-host anywhere[6] |
| Pipecat Cloud (active) | $0.01/min (agent-1x) | Autoscaling, containerized deploys, observability, Daily WebRTC included[11][12] |
| Pipecat Cloud (reserved) | $0.0005/min | Warm instances at 1/20th active cost to avoid cold starts[11] |
| Telephony add-ons | SIP $0.005/min; PSTN $0.018/min; transfers $0.20/event | Built-in dial-in/dial-out[11] |

Licensing model: BSD-2-Clause for the entire framework on GitHub — permissive enough for closed-source commercial embedding; Pipecat Cloud is proprietary managed infrastructure, also procurable through AWS Marketplace.[6][13]

Hidden costs: Model and provider fees (STT, LLM, TTS, third-party telephony) dominate real per-minute cost and are billed by each vendor separately; self-hosters carry the full burden of scaling stateful, long-lived realtime processes.[11][5]
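The per-minute prices above combine into a monthly bill in a straightforward way, and a small calculator makes the arithmetic explicit. It assumes the list prices quoted in this profile, treats each reserved instance as held warm 24 hours a day, and leaves provider_per_min as a parameter to fill in from your own STT/LLM/TTS quotes; Usage and monthly_cost are hypothetical names.

```python
from dataclasses import dataclass

# Pipecat Cloud list prices as quoted in this profile (USD per minute or event).
# Check current pricing before using these numbers for budgeting.
ACTIVE_PER_MIN = 0.01      # agent-1x, billed while a session runs
RESERVED_PER_MIN = 0.0005  # warm instance held to avoid cold starts
SIP_PER_MIN = 0.005
PSTN_PER_MIN = 0.018
TRANSFER_PER_EVENT = 0.20


@dataclass
class Usage:
    active_minutes: float
    reserved_instances: int = 0     # assumed warm around the clock
    days: int = 30
    pstn_minutes: float = 0.0
    sip_minutes: float = 0.0
    transfers: int = 0
    provider_per_min: float = 0.0   # your blended STT+LLM+TTS rate, billed by each vendor


def monthly_cost(u: Usage) -> dict[str, float]:
    reserved_minutes = u.reserved_instances * u.days * 24 * 60
    costs = {
        "platform_active": u.active_minutes * ACTIVE_PER_MIN,
        "platform_reserved": reserved_minutes * RESERVED_PER_MIN,
        "telephony": (u.pstn_minutes * PSTN_PER_MIN
                      + u.sip_minutes * SIP_PER_MIN
                      + u.transfers * TRANSFER_PER_EVENT),
        "providers": u.active_minutes * u.provider_per_min,
    }
    costs["total"] = sum(costs.values())  # sums the four line items above
    return {k: round(v, 2) for k, v in costs.items()}


print(monthly_cost(Usage(active_minutes=10_000, reserved_instances=2,
                         pstn_minutes=10_000, transfers=50)))
# {'platform_active': 100.0, 'platform_reserved': 43.2, 'telephony': 190.0, 'providers': 0.0, 'total': 333.2}
```

In this example the PSTN minutes already cost more than the platform fee, and any realistic provider_per_min would add a larger line still, which is the hidden-cost point above. One reserved instance kept warm for a 30-day month works out to 43,200 minutes × $0.0005 = $21.60.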


Competitive Positioning

Direct Competitors

| Competitor | Differentiation |
|---|---|
| LiveKit Agents | The closest peer and the other "major stack"; LiveKit Agents is Apache-2.0 and built around the LiveKit media server (with a $1B-valued cloud behind it), while Pipecat is transport-neutral and runs over Daily, LiveKit, Twilio, or WebSockets[5][1] |
| Vapi | Closed managed platform with a $0.05/min orchestration fee and strong telephony focus; Pipecat trades Vapi's turnkey hosting for open-source control and a 5x-cheaper managed option[11] |
| Retell AI | Application-layer managed voice-agent platform for contact-center use cases; Pipecat sits a layer lower as the framework such platforms could be built on |
| OpenAI Realtime / speech-to-speech APIs | Single-vendor, natively multimodal; the architectural bet against cascaded frameworks like Pipecat[4] |

When to Choose Pipecat Over Alternatives

  • Choose Pipecat when: you want full open-source control of the pipeline, the freedom to swap any STT/LLM/TTS/transport vendor, Python is acceptable server-side, and you'll either self-host or take the cheap managed path.
  • Choose LiveKit Agents when: you are standardizing end-to-end on LiveKit's media infrastructure, or you want its better-funded ecosystem and built-in inference bundle.
  • Choose Vapi when: you want a fully managed, telephony-first product with no framework code to operate, and the platform fee is acceptable.
  • Choose a speech-to-speech API when: a single vendor's native multimodal model meets your quality bar and vendor lock-in is acceptable.

Ideal Customer Profile

Best fit:

  • Engineering teams building differentiated voice products who need provider flexibility — swapping STT/LLM/TTS vendors as quality and pricing shift
  • Python-native AI teams that want the agent pipeline in code, under version control, with BSD licensing for commercial embedding
  • Enterprises with NVIDIA-stack commitments, given first-party nvidia-pipecat support and Nemotron/NIM integrations[8]
  • Startups that want to prototype free and graduate to $0.01/min managed hosting without changing frameworks[11]

Poor fit:

  • Teams that want a no-code or turnkey voice-agent product rather than a framework
  • Non-Python backend shops unwilling to run a separate Python service
  • Operators without the DevOps capacity to scale stateful realtime processes — unless they accept Pipecat Cloud[5]

Viability Assessment

| Factor | Assessment |
|---|---|
| Financial health | Backed by Daily's WebRTC business; Pipecat-specific revenue and Daily's current financials are not publicly disclosed[2] |
| Market position | Co-leader: one of "the 2 major stacks" for open voice AI, with LiveKit Agents as the rival[5] |
| Innovation pace | High: v1.0 to v1.3 in about six weeks, multi-agent framework, Smart Turn v3, new transports[7] |
| Community/Ecosystem | Strong: 12.7K+ stars, 2.1K+ forks, NVIDIA and AWS contributing, multi-platform client SDKs, active HN presence[6][3] |
| Long-term outlook | Hinges on cascaded pipelines staying relevant against native speech-to-speech models, and on Daily converting framework adoption into Cloud revenue[4][10] |

The structural picture is favorable: a permissive license, the category's broadest integration matrix, and third-party ecosystem investment (NVIDIA shipping its own extension library) make Pipecat hard to displace as the neutral substrate for voice agents.[8][1] The two open questions are economic — Daily must monetize a free framework through Pipecat Cloud against LiveKit's $1B-valuation war chest — and architectural, if natively multimodal models compress the pipeline Pipecat exists to orchestrate.[11][4]


Bottom Line

Pipecat is the strongest vendor-neutral foundation for teams that treat voice agents as software they own rather than a platform they rent: fully BSD-licensed, ~90 integrations deep, newly API-stable at v1.0, and validated by NVIDIA building on it. The trade is that you operate a Python realtime service yourself or pay Daily — deployment at scale is the community's loudest complaint, and Pipecat Cloud at $0.01/min is the intended answer.

Recommended for: Python-capable engineering teams that want provider flexibility, open-source control, and a cheap managed escape hatch; NVIDIA-stack enterprises; anyone avoiding voice-platform lock-in.

Not recommended for: Teams wanting turnkey or no-code voice agents, non-Python backends, or operators unwilling to manage (or pay for) stateful realtime infrastructure.

Outlook: Watch whether post-1.0 API stability holds, whether Pipecat Cloud wins meaningful share against LiveKit Cloud and Vapi, and whether native speech-to-speech models erode the cascaded-pipeline category Pipecat leads.


Research by Ry Walker Research