Deepgram Aura

Deepgram Aura is an enterprise-grade text-to-speech API optimized for voice AI applications, offering sub-200ms latency, domain-tuned pronunciation, and flexible deployment including on-premise.

Key takeaways

  • Enterprise-focused TTS with domain-tuned pronunciation for healthcare, finance, and legal terminology
  • Sub-200ms latency optimized for realtime voice AI agent integration
  • Flexible deployment options including cloud, private cloud, and on-premise for data sovereignty

FAQ

What is Deepgram Aura?

Deepgram Aura is an enterprise text-to-speech API designed for voice AI applications, offering fast latency, domain-specific pronunciation, and flexible deployment options.

How much does Deepgram Aura cost?

Aura-2 costs $0.030 per 1,000 characters ($0.027 at Growth tier). Deepgram offers $200 free credit to start.

Is Deepgram Aura a full voice agent platform?

No, Aura is a TTS building block. Deepgram also offers Voice Agent API for end-to-end voice agents, but Aura itself is focused on text-to-speech.

Executive Summary

Deepgram Aura is an enterprise-grade text-to-speech API designed for voice AI applications. Unlike consumer-focused TTS, Aura-2 is optimized for professional contexts with domain-tuned pronunciation, sub-200ms latency, and flexible deployment including on-premise. Deepgram raised $130M at a $1.3B valuation in January 2026, validating their position in enterprise speech AI. As of June 2026, Aura-2 remains the current model — there is no Aura-3 — but it expanded beyond English and Spanish in January 2026 with Dutch, French, German, Italian, and Japanese support. Deepgram claims over 200,000 developers on its platform.

Attribute     | Value
Company       | Deepgram
Founded       | 2015
Valuation     | $1.3B (January 2026)
Total Funding | $230M+
Model         | Aura-2 (current as of June 2026)
Languages     | 7 (EN, ES, NL, FR, DE, IT, JA)
Focus         | Enterprise TTS for voice AI

Product Overview

Deepgram built its reputation on speech-to-text and expanded into TTS with Aura. The Aura-2 model (April 2025) represents a significant upgrade, focusing on enterprise requirements: accurate pronunciation of industry terminology, low latency for realtime applications, and deployment flexibility. In January 2026, Aura-2 added Dutch, French, German, Italian, and Japanese, joining the existing English and Spanish models.

Important distinction: Aura is a TTS building block, not a complete voice agent platform. Deepgram also offers the Voice Agent API — generally available since 2025 — a single voice-to-voice interface combining Nova-3 STT, LLM orchestration, and Aura-2 TTS, with bring-your-own LLM and TTS options. This profile focuses on Aura as the TTS component commonly used in voice AI stacks.

Key Capabilities

Capability           | Description
Sub-200ms Latency    | Optimized for realtime voice AI
Domain Pronunciation | Healthcare, finance, legal terminology
40+ English Voices   | Localized accents (US, Australian, Philippine) plus 10 Spanish voices
7 Languages          | English, Spanish, Dutch, French, German, Italian, Japanese
Streaming            | Real-time audio streaming over WebSocket
On-Premise           | Self-hosted deployment option
Prompting            | Context-aware delivery adjustment

Voices

Voice    | Accent             | Gender
Thalia   | US English         | Feminine
Odysseus | US English         | Masculine
Harmonia | US English         | Feminine
Theia    | Australian English | Feminine
Apollo   | US English         | Masculine
Luna     | US English         | Feminine
40+ more | Various            | Various

Technical Architecture

Deepgram Aura runs on the Deepgram Enterprise Runtime (DER), the same infrastructure powering their STT. This unified architecture means improvements in speech recognition automatically enhance TTS through shared learning.

┌─────────────────────────────────────────────────┐
│         Deepgram Enterprise Runtime             │
├─────────────────────────────────────────────────┤
│  ┌───────────────────┐ ┌───────────────────────┐│
│  │  Speech-to-Text   │ │    Aura TTS           ││
│  │  (Nova-3)         │ │    (Aura-2)           ││
│  └─────────┬─────────┘ └───────────┬───────────┘│
│            │   Cross-model         │            │
│            │   Learning            │            │
│            └───────────────────────┘            │
├─────────────────────────────────────────────────┤
│        Deployment Options                       │
│   Cloud | Private Cloud | On-Premise            │
└─────────────────────────────────────────────────┘

Integration Points

Integration     | Description
REST API        | Standard HTTP requests
WebSocket       | Real-time streaming
SDKs            | Python, Node.js, Go
Voice Platforms | LiveKit, Vapi, Retell plugins
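
As a rough illustration of the REST integration path, the sketch below builds a single text-to-speech request using only the Python standard library. The endpoint path (`/v1/speak`), the `Token` authorization scheme, and the model identifier `aura-2-thalia-en` are assumptions based on Deepgram's public documentation and should be checked against the current API reference; the function names and the `DEEPGRAM_API_KEY` environment variable are hypothetical.

```python
import json
import os
import urllib.parse
import urllib.request

# Assumed endpoint and model id; confirm both against Deepgram's API reference.
SPEAK_URL = "https://api.deepgram.com/v1/speak"
DEFAULT_MODEL = "aura-2-thalia-en"


def build_speak_request(text: str, api_key: str, model: str = DEFAULT_MODEL) -> urllib.request.Request:
    """Build (but do not send) a POST request asking for synthesized speech."""
    query = urllib.parse.urlencode({"model": model})
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{SPEAK_URL}?{query}",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "application/json",
        },
    )


def synthesize_to_file(text: str, path: str) -> None:
    """Send the request and write the returned audio bytes to disk."""
    request = build_speak_request(text, os.environ["DEEPGRAM_API_KEY"])
    with urllib.request.urlopen(request, timeout=30) as response, open(path, "wb") as out:
        out.write(response.read())
```

Calling `synthesize_to_file("Your prescription is ready.", "out.mp3")` would perform one blocking round trip. For realtime agents the WebSocket path is the better fit, since audio can start playing before the full text has been synthesized.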

Strengths

  • Enterprise focus — Domain-tuned pronunciation for professional contexts
  • Low latency — Sub-200ms for realtime voice AI applications
  • On-premise option — Deploy in your own infrastructure for data sovereignty
  • Unified platform — STT + TTS from same vendor simplifies integration
  • Cost effective — $0.030/1K characters is a fraction of ElevenLabs' ~$0.18-0.30, though roughly double OpenAI and Google
  • Well funded — $1.3B valuation, $230M+ raised
  • Cross-model learning — STT improvements enhance TTS pronunciation

Cautions

  • TTS only — Not a complete voice agent platform; requires additional components
  • Limited voices — 40+ English voices vs ElevenLabs' thousands
  • No voice cloning — Cannot create custom voices from samples
  • Limited languages — 7 languages as of June 2026 vs ElevenLabs' 70+; non-English voice catalogs are thin
  • Less emotional range — Professional focus may limit expressive applications; developers consistently rate ElevenLabs higher on expressiveness
  • Enterprise pricing — Aura-2 doubled per-character price over Aura-1; may be expensive for consumer applications

Pricing & Licensing

Deepgram Aura uses character-based pricing (verified June 2026):

Model      | Pay-as-you-go   | Growth
Aura-2     | $0.030/1K chars | $0.027/1K chars
Aura-1     | $0.015/1K chars | $0.0135/1K chars
Enterprise | Custom          | Custom

Free credits: $200 to start, no credit card required.
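
To make the character-based pricing concrete, here is a small estimator that uses only the rates in the table above. The function names and the tier keys (`payg`, `growth`) are made up for illustration, and the credit calculation assumes the full $200 is spent on TTS alone.

```python
from decimal import Decimal

# USD per 1,000 characters, copied from the pricing table above.
PRICE_PER_1K = {
    ("aura-2", "payg"): Decimal("0.030"),
    ("aura-2", "growth"): Decimal("0.027"),
    ("aura-1", "payg"): Decimal("0.015"),
    ("aura-1", "growth"): Decimal("0.0135"),
}
FREE_CREDIT_USD = Decimal("200")


def tts_cost(characters: int, model: str = "aura-2", tier: str = "payg") -> Decimal:
    """Cost in USD of synthesizing the given number of characters."""
    return PRICE_PER_1K[(model, tier)] * characters / 1000


def characters_covered_by_credit(model: str = "aura-2", tier: str = "payg") -> int:
    """Whole characters the free credit buys, assuming it is spent only on TTS."""
    return int(FREE_CREDIT_USD / PRICE_PER_1K[(model, tier)] * 1000)
```

At pay-as-you-go rates, one million characters of Aura-2 output comes to $30, and the $200 credit covers about 6.67 million characters.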

Comparison to alternatives:

  • ElevenLabs: ~$0.18-0.30/1K characters
  • OpenAI TTS: ~$0.015/1K characters
  • Google TTS: ~$0.016/1K characters

Voice Agent API (separate product, billed on WebSocket connection time rather than characters): $0.075/min Standard, $0.163/min Advanced, dropping to $0.041–$0.050/min when you bring your own LLM and TTS.
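
Because the Voice Agent API bills on connection time, cost tracks how long a session stays open, not how much audio is synthesized. The sketch below applies the per-minute rates quoted above. The plan keys are hypothetical labels, and per-second proration is an assumption; the source does not say how partial minutes are rounded.

```python
from decimal import Decimal

# USD per minute of connection time, from the rates quoted above.
RATE_PER_MINUTE = {
    "standard": Decimal("0.075"),
    "advanced": Decimal("0.163"),
    "byo_low": Decimal("0.041"),   # lower bound with your own LLM and TTS
    "byo_high": Decimal("0.050"),  # upper bound with your own LLM and TTS
}


def session_cost(connection_seconds: int, plan: str = "standard") -> Decimal:
    """Cost of one session, assuming billing is prorated per second."""
    minutes = Decimal(connection_seconds) / 60
    return (RATE_PER_MINUTE[plan] * minutes).quantize(Decimal("0.0001"))
```

Under these assumptions a five-minute call costs $0.375 on the Standard plan, including any stretches of silence while the connection is open.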


What Developers Say

Developer sentiment tracks the official positioning closely: Deepgram is praised as STT and latency infrastructure, while its TTS is seen as serviceable rather than best-in-class.

On the streaming architecture that makes Aura fast in voice agent stacks:

"Some like Deepgram and ElevenLabs let you stream the LLM text (or chunks per sentence) over their websocket API, making your Voice AI bot really really low latency." — ldenoue, Hacker News, December 2025

The most common criticism is voice quality relative to ElevenLabs:

"Deepgram is really good for diorization. And speech to text, but they're not as good as the others for text to speech. ElevenLabs is awesome for speech generation (nothing beats it), but their speech to text is terrible especially for voice activity detection." — schappim, Hacker News, January 2026

In practice, many production voice agent stacks discussed on HN pair Deepgram STT with a different TTS vendor (ElevenLabs, Cartesia, Inworld) — Aura wins when teams want one vendor, on-premise deployment, or the lowest-latency websocket path rather than the most expressive voices.
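
The sentence-by-sentence streaming that ldenoue describes can be sketched without any vendor API: buffer the LLM's text fragments and release each sentence as soon as it is complete, so the TTS service can start speaking before the LLM finishes. The function name and the splitting rule are illustrative assumptions, and sending each sentence over a TTS WebSocket is left out.

```python
import re
from typing import Iterable, Iterator

# A sentence ends at ., !, or ? followed by whitespace. This is a
# simplification: abbreviations such as "Dr." will also trigger a split.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentence_chunks(tokens: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as they appear in a stream of text fragments."""
    buffer = ""
    for token in tokens:
        buffer += token
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last is a finished sentence.
        for sentence in parts[:-1]:
            if sentence:
                yield sentence
        buffer = parts[-1]
    # Flush whatever is left once the stream ends.
    if buffer.strip():
        yield buffer.strip()
```

Each yielded sentence would be forwarded to the TTS connection immediately. Smaller chunks lower the time to first audio but give the model less context for prosody, so sentence boundaries are a common compromise.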


Competitive Positioning

Direct Competitors (TTS)

Competitor       | Differentiation
ElevenLabs       | ElevenLabs has more voices and voice cloning; Aura has better enterprise features and on-premise
OpenAI TTS       | OpenAI is cheaper; Aura has domain pronunciation and on-premise
Google Cloud TTS | Google has more languages; Aura has voice AI optimization
Amazon Polly     | Polly integrates with AWS; Aura has better latency for realtime

When to Choose Deepgram Aura

  • Choose Aura when: You need enterprise TTS with domain pronunciation, on-premise deployment, or are already using Deepgram STT
  • Choose ElevenLabs when: Voice variety and quality are paramount
  • Choose OpenAI TTS when: Cost is primary concern for simple use cases
  • Choose full platforms when: You want end-to-end voice agents

Ideal Customer Profile

Best fit:

  • Enterprise voice AI applications needing domain pronunciation
  • Organizations requiring on-premise TTS deployment
  • Teams already using Deepgram STT wanting unified stack
  • Healthcare, finance, legal applications with terminology requirements
  • High-volume applications needing cost-effective TTS

Poor fit:

  • Applications requiring extensive voice variety
  • Consumer entertainment applications
  • Teams needing voice cloning
  • Non-English-primary applications

Viability Assessment

Factor            | Assessment
Financial Health  | Strong — $1.3B valuation, $230M+ raised
Market Position   | Leader in enterprise speech AI; 200K+ developers claimed as of June 2026
Innovation Pace   | Good — Aura-2 in 2025, 5 new languages January 2026
Ecosystem         | Strong — Integrations with major voice platforms
Long-term Outlook | Positive — Clear enterprise niche

Deepgram's $130M raise at $1.3B valuation in January 2026 validates their enterprise speech AI strategy. The acquisition of a YC startup shows commitment to continued expansion.


Bottom Line

Deepgram Aura is best suited to enterprise voice AI applications that require domain-specific pronunciation, low latency, and flexible deployment including on-premise. The unified STT+TTS platform simplifies integration for teams building voice AI stacks.

The trade-off is that Aura is a building block, not a complete platform. You'll need to integrate it with STT and LLM components (though Deepgram offers those too, via Nova-3 and the Voice Agent API). For applications requiring voice variety or cloning, ElevenLabs is better. For complete voice agent platforms, consider Vapi or Retell, both of which offer Deepgram plugins.

Recommended for: Enterprise voice AI applications needing domain pronunciation, on-premise deployment, or unified STT+TTS from one vendor.

Not recommended for: Applications requiring extensive voice variety, voice cloning, or teams wanting complete managed voice agent platforms.

Outlook: Positive. The January 2026 $130M raise, the language expansion to seven languages, and a GA Voice Agent API show Deepgram converting its STT lead into a full voice stack. The open question is whether Aura's voice quality closes the gap with ElevenLabs and Cartesia faster than those vendors close the latency and enterprise-deployment gap with Deepgram.


Research by Ry Walker Research