Key takeaways
- Enterprise-focused TTS with domain-tuned pronunciation for healthcare, finance, and legal terminology
- Sub-200ms latency optimized for realtime voice AI agent integration
- Flexible deployment options including cloud, private cloud, and on-premise for data sovereignty
FAQ
What is Deepgram Aura?
Deepgram Aura is an enterprise text-to-speech API designed for voice AI applications, offering fast latency, domain-specific pronunciation, and flexible deployment options.
How much does Deepgram Aura cost?
Aura-2 costs $0.030 per 1,000 characters ($0.027 at the Growth tier). Deepgram offers $200 in free credit to start.
Is Deepgram Aura a full voice agent platform?
No, Aura is a TTS building block. Deepgram also offers Voice Agent API for end-to-end voice agents, but Aura itself is focused on text-to-speech.
Executive Summary
Deepgram Aura is an enterprise-grade text-to-speech API designed for voice AI applications. Unlike consumer-focused TTS, Aura-2 is optimized for professional contexts with domain-tuned pronunciation, sub-200ms latency, and flexible deployment including on-premise. Deepgram raised $130M at a $1.3B valuation in January 2026, a sign of investor confidence in its enterprise speech AI position. As of June 2026, Aura-2 remains the current model (there is no Aura-3), but in January 2026 it expanded beyond English and Spanish to add Dutch, French, German, Italian, and Japanese. Deepgram claims over 200,000 developers on its platform.
| Attribute | Value |
|---|---|
| Company | Deepgram |
| Founded | 2015 |
| Valuation | $1.3B (January 2026) |
| Total Funding | $230M+ |
| Model | Aura-2 (current as of June 2026) |
| Languages | 7 (EN, ES, NL, FR, DE, IT, JA) |
| Focus | Enterprise TTS for voice AI |
Product Overview
Deepgram built its reputation on speech-to-text and expanded into TTS with Aura. The Aura-2 model (April 2025) represents a significant upgrade, focusing on enterprise requirements: accurate pronunciation of industry terminology, low latency for realtime applications, and deployment flexibility. In January 2026, Aura-2 added Dutch, French, German, Italian, and Japanese, joining the existing English and Spanish models.
Important distinction: Aura is a TTS building block, not a complete voice agent platform. Deepgram also offers the Voice Agent API — generally available since 2025 — a single voice-to-voice interface combining Nova-3 STT, LLM orchestration, and Aura-2 TTS, with bring-your-own LLM and TTS options. This profile focuses on Aura as the TTS component commonly used in voice AI stacks.
Key Capabilities
| Capability | Description |
|---|---|
| Sub-200ms Latency | Optimized for realtime voice AI |
| Domain Pronunciation | Healthcare, finance, legal terminology |
| 40+ English Voices | Localized accents (US, Australian, Philippine) plus 10 Spanish voices |
| 7 Languages | English, Spanish, Dutch, French, German, Italian, Japanese |
| Streaming | Real-time audio streaming over WebSocket |
| On-Premise | Self-hosted deployment option |
| Prompting | Context-aware delivery adjustment |
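The sub-200ms figure in the table is a vendor claim, and latency of this kind is usually measured as time to first audio chunk. Below is a minimal sketch of how a team might check it on its own network path, assuming the audio arrives as any Python iterator of byte chunks. The `fake_stream` generator is a hypothetical stand-in, not a Deepgram client.

```python
import time
from typing import Iterable, Iterator, Optional, Tuple


def time_to_first_chunk(chunks: Iterable[bytes]) -> Tuple[Optional[float], int]:
    """Return (seconds until the first non-empty chunk, total bytes received).

    The timer starts when iteration begins, so call this right after the
    request is sent. The first element is None if no audio ever arrives.
    """
    start = time.perf_counter()
    first: Optional[float] = None
    total = 0
    for chunk in chunks:
        if chunk and first is None:
            first = time.perf_counter() - start
        total += len(chunk)
    return first, total


def fake_stream(delay_s: float, n_chunks: int) -> Iterator[bytes]:
    """Hypothetical stand-in for a streaming TTS response."""
    time.sleep(delay_s)        # simulated network + synthesis delay
    for _ in range(n_chunks):
        yield b"\x00" * 320    # 320 bytes of silence per chunk


if __name__ == "__main__":
    ttfb, size = time_to_first_chunk(fake_stream(0.05, 10))
    print(f"first chunk after {ttfb * 1000:.0f} ms, {size} bytes total")
```

In a real test, `fake_stream` would be replaced by the chunk iterator of whatever HTTP or WebSocket client is in use, and the measurement repeated many times to get percentiles rather than a single sample.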
Voices
| Voice | Accent | Gender |
|---|---|---|
| Thalia | US English | Feminine |
| Odysseus | US English | Masculine |
| Harmonia | US English | Feminine |
| Theia | Australian English | Feminine |
| Apollo | US English | Masculine |
| Luna | US English | Feminine |
| 40+ more | Various | Various |
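The voice table can be treated as a small catalog to select from at request time. A minimal sketch follows, covering only the six voices named above. The `aura-2-<voice>-en` identifier pattern is an assumption and should be confirmed against Deepgram's published model list.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Voice:
    name: str
    accent: str
    gender: str

    @property
    def model_id(self) -> str:
        # Assumed naming pattern; confirm against the vendor's model list.
        return f"aura-2-{self.name.lower()}-en"


# Rows copied from the voice table above.
CATALOG = [
    Voice("Thalia", "US English", "Feminine"),
    Voice("Odysseus", "US English", "Masculine"),
    Voice("Harmonia", "US English", "Feminine"),
    Voice("Theia", "Australian English", "Feminine"),
    Voice("Apollo", "US English", "Masculine"),
    Voice("Luna", "US English", "Feminine"),
]


def find_voices(accent: Optional[str] = None,
                gender: Optional[str] = None) -> List[Voice]:
    """Filter the catalog; a None criterion matches every voice."""
    return [
        v for v in CATALOG
        if (accent is None or v.accent == accent)
        and (gender is None or v.gender == gender)
    ]


if __name__ == "__main__":
    for voice in find_voices(accent="US English", gender="Masculine"):
        print(voice.name, voice.model_id)
```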
Technical Architecture
Deepgram Aura runs on the Deepgram Enterprise Runtime (DER), the same infrastructure that powers its STT models. According to Deepgram, this unified architecture lets improvements in speech recognition carry over to TTS pronunciation through shared learning.
┌─────────────────────────────────────────────────┐
│           Deepgram Enterprise Runtime           │
├─────────────────────────────────────────────────┤
│ ┌───────────────────┐ ┌───────────────────────┐ │
│ │  Speech-to-Text   │ │       Aura TTS        │ │
│ │     (Nova-3)      │ │       (Aura-2)        │ │
│ └─────────┬─────────┘ └───────────┬───────────┘ │
│           │      Cross-model      │             │
│           │       Learning        │             │
│           └───────────────────────┘             │
├─────────────────────────────────────────────────┤
│               Deployment Options                │
│       Cloud | Private Cloud | On-Premise        │
└─────────────────────────────────────────────────┘
Integration Points
| Integration | Description |
|---|---|
| REST API | Standard HTTP requests |
| WebSocket | Real-time streaming |
| SDKs | Python, Node.js, Go |
| Voice Platforms | LiveKit, Vapi, Retell plugins |
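As an illustration of the REST row, the sketch below builds a text-to-speech request using only the Python standard library, with sending kept in a separate helper. The endpoint `https://api.deepgram.com/v1/speak`, the `model` query parameter, the `Token` authorization scheme, and the `aura-2-thalia-en` model name are assumptions that should be checked against Deepgram's current API reference before use.

```python
import json
import urllib.parse
import urllib.request

SPEAK_URL = "https://api.deepgram.com/v1/speak"  # assumed endpoint


def build_speak_request(text: str, api_key: str,
                        model: str = "aura-2-thalia-en") -> urllib.request.Request:
    """Build (but do not send) a POST request asking for `text` to be synthesized."""
    query = urllib.parse.urlencode({"model": model})
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{SPEAK_URL}?{query}",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Token {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
    )


def synthesize_to_file(req: urllib.request.Request, path: str) -> int:
    """Send the request and write the returned audio bytes to `path`."""
    with urllib.request.urlopen(req, timeout=30) as resp, open(path, "wb") as out:
        audio = resp.read()
        out.write(audio)
    return len(audio)
```

Separating request construction from sending keeps the first function easy to unit test without network access or a real API key.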
Strengths
- Enterprise focus — Domain-tuned pronunciation for professional contexts
- Low latency — Sub-200ms for realtime voice AI applications
- On-premise option — Deploy in your own infrastructure for data sovereignty
- Unified platform — STT + TTS from same vendor simplifies integration
- Cost effective — $0.030/1K characters is well below ElevenLabs (~$0.18-0.30), though roughly double the OpenAI and Google list prices
- Well funded — $1.3B valuation, $230M+ raised
- Cross-model learning — STT improvements enhance TTS pronunciation
Cautions
- TTS only — Not a complete voice agent platform; requires additional components
- Limited voices — 40+ English voices vs ElevenLabs' thousands
- No voice cloning — Cannot create custom voices from samples
- Limited languages — 7 languages as of June 2026 vs ElevenLabs' 70+; non-English voice catalogs are thin
- Less emotional range — Professional focus may limit expressive applications; developer commentary on Hacker News rates ElevenLabs higher for speech generation
- Enterprise pricing — Aura-2 costs twice as much per character as Aura-1 ($0.030 vs $0.015 per 1K); may be expensive for consumer applications
Pricing & Licensing
Deepgram Aura uses character-based pricing (verified June 2026):
| Model | Pay-as-you-go | Growth |
|---|---|---|
| Aura-2 | $0.030/1K chars | $0.027/1K chars |
| Aura-1 | $0.015/1K chars | $0.0135/1K chars |
| Enterprise | Custom | Custom |
Free credits: $200 to start, no credit card required.
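The per-character rates translate into budgets as follows. This is a small arithmetic sketch using only the prices in the table above. `Decimal` is used to avoid floating-point rounding, and the function and tier names are illustrative.

```python
from decimal import Decimal

# USD per 1,000 characters, copied from the pricing table above.
RATES = {
    ("aura-2", "payg"): Decimal("0.030"),
    ("aura-2", "growth"): Decimal("0.027"),
    ("aura-1", "payg"): Decimal("0.015"),
    ("aura-1", "growth"): Decimal("0.0135"),
}
FREE_CREDIT_USD = Decimal("200")


def tts_cost(characters: int, model: str = "aura-2", tier: str = "payg") -> Decimal:
    """Cost in USD to synthesize `characters` characters."""
    return RATES[(model, tier)] * Decimal(characters) / Decimal(1000)


def characters_covered(budget_usd: Decimal, model: str = "aura-2",
                       tier: str = "payg") -> int:
    """Whole number of characters that `budget_usd` pays for."""
    return int(budget_usd * 1000 / RATES[(model, tier)])


if __name__ == "__main__":
    print(tts_cost(1_000_000))                  # 30 USD per million characters
    print(characters_covered(FREE_CREDIT_USD))  # about 6.67 million characters
```

At pay-as-you-go rates, the $200 credit therefore covers roughly 6.67 million Aura-2 characters, or twice that on Aura-1.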
Comparison to alternatives:
- ElevenLabs: ~$0.18-0.30/1K characters
- OpenAI TTS: ~$0.015/1K characters
- Google TTS: ~$0.016/1K characters
Voice Agent API (separate product, billed on websocket connection time): $0.075/min Standard, $0.163/min Advanced, dropping to $0.041–$0.050/min when you bring your own LLM and TTS.
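Because the Voice Agent API is billed on connection time rather than characters, estimates depend on call volume and duration. Here is a minimal sketch using the per-minute prices quoted above. The tier keys and the example workload are illustrative assumptions, and the bring-your-own rate is treated as the quoted range.

```python
from decimal import Decimal
from typing import Tuple

# USD per minute of connection time, from the figures quoted above.
# Each entry is a (low, high) range; fixed prices repeat the same value.
AGENT_RATES = {
    "standard": (Decimal("0.075"), Decimal("0.075")),
    "advanced": (Decimal("0.163"), Decimal("0.163")),
    "byo_llm_and_tts": (Decimal("0.041"), Decimal("0.050")),
}


def monthly_agent_cost(calls: int, avg_minutes: Decimal,
                       tier: str) -> Tuple[Decimal, Decimal]:
    """Return the (low, high) monthly cost in USD for a given call volume."""
    low, high = AGENT_RATES[tier]
    minutes = Decimal(calls) * avg_minutes
    return low * minutes, high * minutes


if __name__ == "__main__":
    # Hypothetical workload: 10,000 calls per month averaging 3 minutes each.
    for tier in AGENT_RATES:
        low, high = monthly_agent_cost(10_000, Decimal("3"), tier)
        print(f"{tier}: ${low:,.2f} to ${high:,.2f}")
```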
What Developers Say
Developer sentiment tracks the official positioning closely: Deepgram is praised as STT and latency infrastructure, while its TTS is seen as serviceable rather than best-in-class.
On the streaming architecture that makes Aura fast in voice agent stacks:
"Some like Deepgram and ElevenLabs let you stream the LLM text (or chunks per sentence) over their websocket API, making your Voice AI bot really really low latency." — ldenoue, Hacker News, December 2025
The most common criticism is voice quality relative to ElevenLabs:
"Deepgram is really good for diorization. And speech to text, but they're not as good as the others for text to speech. ElevenLabs is awesome for speech generation (nothing beats it), but their speech to text is terrible especially for voice activity detection." — schappim, Hacker News, January 2026
In practice, many production voice agent stacks discussed on HN pair Deepgram STT with a different TTS vendor (ElevenLabs, Cartesia, Inworld) — Aura wins when teams want one vendor, on-premise deployment, or the lowest-latency websocket path rather than the most expressive voices.
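The per-sentence chunking that ldenoue describes can be sketched as a small buffer that sits between the LLM token stream and the TTS socket. The message shapes `{"type": "Speak", "text": ...}` and `{"type": "Flush"}` are assumptions about Deepgram's streaming TTS protocol and should be verified against its documentation. The sentence splitter is a deliberately naive regular expression.

```python
import json
import re
from typing import Iterable, Iterator

# A sentence ends at ., ! or ? followed by whitespace (naive on purpose).
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentences_from_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Buffer streamed LLM tokens and yield each sentence once it is complete."""
    buffer = ""
    for token in tokens:
        buffer += token
        parts = _SENTENCE_END.split(buffer)
        # Everything except the last part is a finished sentence.
        for sentence in parts[:-1]:
            if sentence:
                yield sentence
        buffer = parts[-1]
    if buffer.strip():
        yield buffer.strip()  # emit whatever remains at end of stream


def tts_messages(tokens: Iterable[str]) -> Iterator[str]:
    """Turn tokens into JSON text frames for a TTS WebSocket (assumed schema)."""
    for sentence in sentences_from_tokens(tokens):
        yield json.dumps({"type": "Speak", "text": sentence})
    yield json.dumps({"type": "Flush"})  # ask the server to emit remaining audio


if __name__ == "__main__":
    llm_tokens = ["Hello", " there.", " How", " are", " you?", " Fine"]
    for frame in tts_messages(llm_tokens):
        print(frame)
```

Sending each sentence as soon as it completes lets synthesis start while the LLM is still generating, which is where most of the perceived latency gain comes from. A production splitter would also need to handle abbreviations and decimal numbers.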
Competitive Positioning
Direct Competitors (TTS)
| Competitor | Differentiation |
|---|---|
| ElevenLabs | ElevenLabs has more voices and voice cloning; Aura offers on-premise deployment and domain-tuned pronunciation |
| OpenAI TTS | OpenAI is cheaper; Aura has domain pronunciation and on-premise |
| Google Cloud TTS | Google has more languages; Aura has voice AI optimization |
| Amazon Polly | Polly integrates with AWS; Aura has better latency for realtime |
When to Choose Deepgram Aura
- Choose Aura when: You need enterprise TTS with domain pronunciation, on-premise deployment, or are already using Deepgram STT
- Choose ElevenLabs when: Voice variety and quality are paramount
- Choose OpenAI TTS when: Cost is the primary concern for simple use cases
- Choose full platforms when: You want end-to-end voice agents
Ideal Customer Profile
Best fit:
- Enterprise voice AI applications needing domain pronunciation
- Organizations requiring on-premise TTS deployment
- Teams already using Deepgram STT that want a unified stack
- Healthcare, finance, legal applications with terminology requirements
- High-volume applications needing cost-effective TTS
Poor fit:
- Applications requiring extensive voice variety
- Consumer entertainment applications
- Teams needing voice cloning
- Non-English-primary applications
Viability Assessment
| Factor | Assessment |
|---|---|
| Financial Health | Strong — $1.3B valuation, $230M+ raised |
| Market Position | Leader in enterprise speech AI; 200K+ developers claimed as of June 2026 |
| Innovation Pace | Good — Aura-2 in 2025, 5 new languages January 2026 |
| Ecosystem | Strong — Integrations with major voice platforms |
| Long-term Outlook | Positive — Clear enterprise niche |
Deepgram's $130M raise at a $1.3B valuation in January 2026 signals investor support for its enterprise speech AI strategy. The acquisition of a YC startup points to continued expansion.
Bottom Line
Deepgram Aura is a strong TTS choice for enterprise voice AI applications that require domain-specific pronunciation, low latency, and flexible deployment including on-premise. The unified STT+TTS platform simplifies integration for teams building voice AI stacks.
The trade-off is that Aura is a building block, not a complete platform. You'll need to pair it with STT and LLM components, which Deepgram also sells separately or bundles in its Voice Agent API. For applications requiring voice variety or cloning, ElevenLabs is the better fit. For complete voice agent platforms, consider Vapi or Retell, both of which integrate with Aura as a TTS option.
Recommended for: Enterprise voice AI applications needing domain pronunciation, on-premise deployment, or unified STT+TTS from one vendor.
Not recommended for: Applications requiring extensive voice variety, voice cloning, or teams wanting complete managed voice agent platforms.
Outlook: Positive. The January 2026 $130M raise, the language expansion to seven languages, and a GA Voice Agent API show Deepgram converting its STT lead into a full voice stack. The open question is whether Aura's voice quality closes the gap with ElevenLabs and Cartesia faster than those vendors close the latency and enterprise-deployment gap with Deepgram.
Research by Ry Walker Research
Sources
- [1] Deepgram Text-to-Speech Product Page
- [2] Deepgram Pricing
- [3] Deepgram Aura-2 Announcement (Deepgram Blog)
- [4] Deepgram $130M Funding
- [5] Aura-2 Now Speaks Dutch, French, German, Italian, and Japanese
- [6] Deepgram Launches Voice Agent API
- [7] Hacker News comment on Deepgram TTS vs ElevenLabs (schappim)
- [8] Hacker News comment on Deepgram websocket streaming latency (ldenoue)