Key takeaways
- Native Bedrock integration provides enterprise-grade security, compliance, and unified billing for AWS customers
- Unified speech understanding and generation in a single model, with speech priced at $3 input / $12 output per 1M tokens
- Nova 2 Sonic upgrade adds polyglot voices across seven languages, asynchronous tool calling, and a 1M token context window
FAQ
What is AWS Nova 2 Sonic?
Amazon Nova 2 Sonic is the second generation of Amazon Nova Sonic, a speech-to-speech foundation model that unifies speech understanding and generation to enable human-like voice conversations in AI applications. It is available through Amazon Bedrock.
How much does Nova 2 Sonic cost?
Nova 2 Sonic costs $3 per million speech input tokens and $12 per million speech output tokens ($0.003 and $0.012 per 1K), an estimated $0.015 per minute of conversation. That is approximately 80% cheaper than OpenAI's GPT-4o Realtime.
How does Nova 2 Sonic differ from Nova Sonic?
Nova 2 Sonic (December 2025) adds polyglot voices, seven-language support including Portuguese and Hindi, asynchronous tool calling, cross-modal voice/text switching, and a 1M token context window.
Executive Summary
Amazon Nova Sonic is AWS's native speech-to-speech foundation model, available through Amazon Bedrock. It unifies speech understanding and generation into a single model, enabling human-like voice conversations with low latency and industry-leading price performance. Nova 2 Sonic (announced at re:Invent on December 2, 2025) adds polyglot voices across seven languages, asynchronous tool calling, and a 1M token context window.[1][2] As of June 2026, Nova 2 Sonic runs in four AWS regions: N. Virginia, Oregon, Stockholm, and Tokyo.[3]
| Attribute | Value |
|---|---|
| Company | Amazon Web Services |
| Launched | April 2025 (Nova Sonic), December 2025 (Nova 2 Sonic) |
| Platform | Amazon Bedrock (model ID amazon.nova-2-sonic-v1:0) |
| Context Window | 1M tokens, 64K max output (Nova 2 Sonic) |
| Regions | us-east-1, us-west-2, eu-north-1, ap-northeast-1 (June 2026) |
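The attributes in the table lend themselves to a small pre-flight check before a session is opened. The following is a minimal sketch, not an AWS SDK call: `ModelProfile`, `NOVA_2_SONIC`, and `check_request` are hypothetical names, the values are copied from the table (as of June 2026), and "1M" is read as 1,000,000 tokens, which is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    """Published limits for a model (hypothetical helper, not part of any AWS SDK)."""
    model_id: str
    context_window: int  # tokens; "1M" is assumed to mean 1,000,000
    regions: frozenset


# Values copied from the attribute table above.
NOVA_2_SONIC = ModelProfile(
    model_id="amazon.nova-2-sonic-v1:0",
    context_window=1_000_000,
    regions=frozenset({"us-east-1", "us-west-2", "eu-north-1", "ap-northeast-1"}),
)


def check_request(region: str, context_tokens: int,
                  profile: ModelProfile = NOVA_2_SONIC) -> list[str]:
    """Return a list of problems; an empty list means the request fits the listed limits."""
    problems = []
    if region not in profile.regions:
        problems.append(f"{region} is not a listed region for {profile.model_id}")
    if context_tokens > profile.context_window:
        problems.append(
            f"{context_tokens} tokens exceeds the {profile.context_window}-token window"
        )
    return problems


if __name__ == "__main__":
    print(check_request("us-east-1", 500_000))
    print(check_request("eu-west-1", 2_000_000))
```

Because Nova 2 Sonic has no cross-region inference, a check like this would typically run before any client is created, so that an unsupported region fails fast rather than at call time.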
Product Overview
Nova Sonic is part of Amazon's Nova family of foundation models, designed specifically for conversational AI applications. The model handles speech understanding and generation in a unified architecture, eliminating the latency and complexity of traditional ASR→LLM→TTS pipelines.[4]
Key applications include customer service automation, virtual assistants, and interactive voice response (IVR) systems.
Key Capabilities
| Capability | Description |
|---|---|
| Unified Speech Model | Single model for understanding and generation |
| Low Latency | Optimized for real-time conversational AI |
| Polyglot Voices | Single voice (e.g., Tiffany) speaks all seven languages with mid-sentence code-switching (Nova 2)[2] |
| Seven Languages | English, French, Italian, German, Spanish, plus Portuguese and Hindi added in Nova 2[2] |
| 1M Context Window | Extended conversations without context loss |
| Async Tool Calling | Keeps responding to the user while tools run in the background (Nova 2)[2] |
| Cross-Modal Sessions | Switch between voice and text input mid-session; supports DTMF keypad tones for IVR (Nova 2)[2] |
| Bedrock Integration | Native AWS security, compliance, billing |
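The asynchronous tool calling row describes a general concurrency pattern: start the tool, keep the conversation going, and fold the result in when it arrives. The sketch below illustrates that pattern with Python's `asyncio` only. It does not use the Bedrock API; `slow_tool` and `converse` are hypothetical stand-ins, and the delay is simulated.

```python
import asyncio


async def slow_tool(query: str) -> str:
    """Stand-in for a backend lookup; the delay is simulated."""
    await asyncio.sleep(0.05)
    return f"result for {query}"


async def converse() -> list[str]:
    transcript = []
    # Start the tool without awaiting it, so the conversation is not blocked.
    task = asyncio.create_task(slow_tool("order 123"))
    # Keep responding while the tool runs in the background.
    transcript.append("assistant: Let me look that up. Anything else meanwhile?")
    await asyncio.sleep(0)  # yield control so the background task can start
    transcript.append("assistant: Sure, I can also update your address.")
    # Fold the tool result into the conversation once it is ready.
    transcript.append(f"assistant: Here is the {await task}.")
    return transcript


if __name__ == "__main__":
    for line in asyncio.run(converse()):
        print(line)
```

A synchronous design would await `slow_tool` before producing any further turns, which in a voice session shows up as dead air.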
Product Editions
| Edition | Description | Availability |
|---|---|---|
| Nova Sonic | Base speech-to-speech model | GA (April 2025) |
| Nova 2 Sonic | Polyglot voices, 7 languages, async tools, 1M context | GA (December 2025) |
Technical Architecture
Nova Sonic operates within the Amazon Bedrock infrastructure, providing enterprise-grade security and scalability. The model receives audio input and generates audio output directly, with optional text output for transcription and tool calling.
┌──────────────────────────────────────────────┐
│                Amazon Bedrock                │
├──────────────────────────────────────────────┤
│  ┌────────────────────────────────────────┐  │
│  │            Nova Sonic Model            │  │
│  │  ┌──────────┐    ┌──────────────────┐  │  │
│  │  │  Speech  │    │      Speech      │  │  │
│  │  │  Input   │ →  │      Output      │  │  │
│  │  └──────────┘    └──────────────────┘  │  │
│  │       ↓                   ↑            │  │
│  │  ┌──────────────────────────────────┐  │  │
│  │  │   Text Tokens (tools, history)   │  │  │
│  │  └──────────────────────────────────┘  │  │
│  └────────────────────────────────────────┘  │
├──────────────────────────────────────────────┤
│    IAM, VPC, CloudWatch, Cost Management     │
└──────────────────────────────────────────────┘
Integration Options
| Integration | Description |
|---|---|
| Bedrock API | Direct API access with IAM authentication |
| Vonage | Telephony integration for voice calls |
| Connect | Amazon Connect contact center integration |
| Lex | Conversational interface integration |
Strengths
- Bedrock-native: Enterprise security, IAM, VPC, and compliance certifications included
- Unified billing: A single AWS bill under existing enterprise agreements
- Price performance: Speech rates of $3 input / $12 output per 1M tokens, roughly 80% below OpenAI's GPT-4o Realtime[6]
- 1M context window: Long conversations without context loss (Nova 2)
- Polyglot support: Multiple languages and accents in a single model
- Tool invocation: Native function calling for task completion
- AWS ecosystem: Integrates with Connect, Lex, Lambda, and other AWS services
Cautions
- AWS lock-in: Tightly coupled to Bedrock; no self-hosting or multi-cloud deployment
- Regional availability: Four regions as of June 2026 (N. Virginia, Oregon, Stockholm, Tokyo); no cross-region inference for Nova 2 Sonic[3]
- Newer entrant: Less mature than the OpenAI Realtime API, with a smaller ecosystem
- Documentation gaps: Less community content and fewer examples than competitors
- Limited voices: Fewer voice options than ElevenLabs
- No MCP support: Uses AWS-native tool calling rather than the open Model Context Protocol (MCP)
Pricing & Licensing
Nova 2 Sonic is priced through Amazon Bedrock with token-based billing. Rates as of June 2026:[5][6]
| Component | Rate |
|---|---|
| Speech Input | $3.00 / 1M tokens ($0.003 / 1K) |
| Speech Output | $12.00 / 1M tokens ($0.012 / 1K) |
| Text Input | $0.33 / 1M tokens |
| Text Output | $2.75 / 1M tokens |
Nova 2 Sonic's speech rates undercut the original Nova Sonic ($3.40/1M input, $13.60/1M output), though text token rates are higher than Sonic v1's.[6]
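The token rates above translate into a session cost with simple arithmetic. The sketch below assumes the rates quoted in this section; `RATES` and `speech_cost` are hypothetical names, the token counts in the example are made up for illustration, and text tokens are left out because only Nova 2 Sonic's text rates are given here.

```python
# USD per 1M speech tokens, copied from the pricing table and the Sonic v1 figures above.
RATES = {
    "nova-2-sonic": {"speech_in": 3.00, "speech_out": 12.00},
    "nova-sonic-v1": {"speech_in": 3.40, "speech_out": 13.60},
}


def speech_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Speech-only cost in USD for the given token counts (text tokens excluded)."""
    rate = RATES[model]
    total = input_tokens * rate["speech_in"] + output_tokens * rate["speech_out"]
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 200K speech input tokens, 100K speech output tokens.
    v2 = speech_cost("nova-2-sonic", 200_000, 100_000)
    v1 = speech_cost("nova-sonic-v1", 200_000, 100_000)
    print(f"Nova 2 Sonic: ${v2:.2f}, Nova Sonic v1: ${v1:.2f}")
    print(f"Saving: {(1 - v2 / v1) * 100:.1f}%")
```

Both speech rates fall by the same ratio (3.00/3.40 and 12.00/13.60), so the speech-only saving over Sonic v1 is about 11.8% whatever the input/output mix.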
Estimated per-minute cost: ~$0.015/min (speech input and output combined)
Cost comparison:
- ~80% cheaper than OpenAI's GPT-4o Realtime, according to AWS and third-party analysis[6]
- Competitive with Retell AI (~$0.07/min plus provider costs)
- More expensive than Deepgram Aura TTS, though Nova is full speech-to-speech rather than TTS alone
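The per-minute figures in this comparison can be scaled to a monthly volume with a few lines of arithmetic. This is a back-of-envelope sketch: `monthly_cost` and `implied_baseline` are hypothetical helpers, the 100,000-minute volume is an assumption, and the baseline rate is only what the "~80% cheaper" claim implies, not a quoted OpenAI price.

```python
def monthly_cost(per_minute_usd: float, minutes: int) -> float:
    """Total USD for a given number of conversation minutes."""
    return per_minute_usd * minutes


def implied_baseline(per_minute_usd: float, fraction_cheaper: float) -> float:
    """Rate a competitor would need to charge for per_minute_usd to be fraction_cheaper below it."""
    return per_minute_usd / (1.0 - fraction_cheaper)


NOVA_2_SONIC_PER_MIN = 0.015     # estimate quoted in this section
RETELL_PLATFORM_PER_MIN = 0.07   # excludes provider costs, per this section

if __name__ == "__main__":
    minutes = 100_000  # hypothetical monthly volume
    print(f"Nova 2 Sonic: ${monthly_cost(NOVA_2_SONIC_PER_MIN, minutes):,.0f}")
    print(f"Retell platform fee: ${monthly_cost(RETELL_PLATFORM_PER_MIN, minutes):,.0f}")
    print(f"Implied baseline: ${implied_baseline(NOVA_2_SONIC_PER_MIN, 0.80):.3f}/min")
```

At that assumed volume the estimate works out to about $1,500 for Nova 2 Sonic against about $7,000 in Retell platform fees before provider costs. The implied baseline of roughly $0.075/min is inferred from the two figures above and should be checked against current OpenAI pricing.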
Service tiers: Nova 2 Sonic supports only Bedrock's Standard pay-per-token tier as of June 2026; Priority, Flex, and Reserved tiers are not available for this model.[3]
What Developers Say
Community discussion of Nova Sonic remains thin compared to OpenAI Realtime — most published material is AWS's own blogs and samples. The Hacker News commentary that exists is measured rather than enthusiastic.
On capability, one developer who built a post-sales call assistant wrote: "Amazon has a commercial Speech-to-Text model (Nova Sonic) that is passable. I used it to create a post-sales call assistant and was surprised that the underlying model was able to do a bunch of stuff I thought I was going to have to use Claude for." — coredog64, Hacker News, August 2025[7]
On developer experience, the same engineer later cautioned: "Having done some implementations it's trickier than you might like (e.g. the Python library for Sonic had problems with echoes and we had to use the Java library)" — coredog64, Hacker News, May 2026[8]
No substantial Reddit threads on Nova 2 Sonic surfaced as of June 2026 — a signal that grassroots adoption still trails the AWS enterprise channel.
Competitive Positioning
Direct Competitors
| Competitor | Differentiation |
|---|---|
| OpenAI Realtime API | OpenAI has better instruction following and MCP support; Nova Sonic has Bedrock integration and lower estimated per-minute costs[6] |
| ElevenLabs | ElevenLabs has superior voice quality and variety; Nova Sonic has enterprise AWS integration |
| Deepgram | Deepgram excels at STT+TTS building blocks; Nova Sonic is end-to-end speech-to-speech |
When to Choose AWS Nova 2 Sonic
- Choose Nova Sonic when: You're already on AWS, need Bedrock compliance, or want unified billing
- Choose OpenAI Realtime when: You need best-in-class instruction following and MCP
- Choose ElevenLabs when: Voice quality and variety are paramount
- Choose LiveKit when: You want open-source flexibility
Ideal Customer Profile
Best fit:
- AWS-native enterprises with existing Bedrock usage
- Organizations requiring AWS compliance certifications (HIPAA, SOC 2, FedRAMP)
- Teams wanting unified AWS billing and IAM
- Contact centers using Amazon Connect
- Applications needing long context (1M tokens)
Poor fit:
- Multi-cloud organizations avoiding AWS lock-in
- Teams needing extensive voice variety
- Startups without AWS enterprise agreements
- Use cases requiring MCP support
Viability Assessment
| Factor | Assessment |
|---|---|
| Financial Health | Excellent — AWS is one of the most profitable businesses globally |
| Market Position | Growing — Strong AWS enterprise base, newer in voice AI; community adoption still thin as of June 2026 |
| Innovation Pace | Rapid — Nova 2 Sonic released within 8 months of Nova Sonic; four regions by mid-2026 |
| Ecosystem | Extensive — Deep AWS service integration |
| Long-term Outlook | Positive — Core to AWS's AI strategy |
Bottom Line
AWS Nova 2 Sonic is the natural choice for AWS-native enterprises building voice AI applications. The Bedrock integration provides enterprise-grade security, compliance, and unified billing that are difficult to replicate with third-party providers. Nova 2 Sonic's 1M context window, asynchronous tool calling, and polyglot voices across seven languages make it competitive for complex, multilingual applications, at speech rates ($3 input / $12 output per 1M tokens) that undercut both its predecessor and OpenAI's GPT-4o Realtime.[2][6]
The trade-off is AWS lock-in and a smaller voice AI ecosystem compared to dedicated providers like ElevenLabs or the OpenAI Realtime API. For organizations already committed to AWS, Nova Sonic reduces complexity. For multi-cloud or voice-quality-focused teams, alternatives may be better fits.
Recommended for: AWS-native enterprises needing compliant, scalable voice AI with unified billing.
Not recommended for: Multi-cloud organizations, teams requiring extensive voice variety, or those prioritizing MCP support.
Outlook: Positive. AWS shipped a major version, two new languages, async tooling, and three additional regions within fourteen months of launch. The open question is grassroots traction — developer SDKs still draw complaints and community discussion remains sparse relative to OpenAI Realtime.
Research by Ry Walker Research
Sources
- [1] Amazon Nova Models Overview
- [2] Introducing Amazon Nova 2 Sonic (AWS News Blog)
- [3] Nova 2 Sonic Model Card — Amazon Bedrock Docs
- [4] Introducing Amazon Nova Sonic Blog
- [5] Amazon Bedrock Pricing
- [6] The Batch: Nova 2 Family Boosts Cost-Effective Performance
- [7] Hacker News comment on Nova Sonic (Aug 2025)
- [8] Hacker News comment on Nova Sonic implementation (May 2026)
- [9] Amazon Nova Sonic Announcement