AWS Nova 2 Sonic | Ry Walker Research

Key takeaways

Native Bedrock integration provides enterprise-grade security, compliance, and unified billing for AWS customers
Strong price performance ($3 / $12 per 1M speech input / output tokens) with unified speech understanding and generation in a single model
Nova 2 Sonic upgrade adds polyglot voices across seven languages, asynchronous tool calling, and a 1M token context window

FAQ

What is AWS Nova 2 Sonic?

Amazon Nova 2 Sonic is the second generation of Amazon Nova Sonic, a speech-to-speech foundation model that unifies speech understanding and generation to enable human-like voice conversations in AI applications. It is available through Amazon Bedrock.

How much does Nova 2 Sonic cost?

Nova 2 Sonic costs $3 per million speech input tokens and $12 per million speech output tokens ($0.003/$0.012 per 1K). That works out to an estimated $0.015 per minute, approximately 80% cheaper than OpenAI's GPT-4o Realtime.

How does Nova 2 Sonic differ from Nova Sonic?

Nova 2 Sonic (December 2025) adds polyglot voices, seven-language support including Portuguese and Hindi, asynchronous tool calling, cross-modal voice/text switching, and a 1M token context window.

Executive Summary

Amazon Nova Sonic is AWS's native speech-to-speech foundation model, available through Amazon Bedrock. It unifies speech understanding and generation into a single model, enabling human-like voice conversations with low latency and industry-leading price performance. Nova 2 Sonic (announced at re:Invent on December 2, 2025) adds polyglot voices across seven languages, asynchronous tool calling, and a 1M token context window.^[1]^[2] As of June 2026, Nova 2 Sonic runs in four AWS regions: N. Virginia, Oregon, Stockholm, and Tokyo.^[3]

Attribute	Value
Company	Amazon Web Services
Launched	April 2025 (Nova Sonic), December 2025 (Nova 2 Sonic)
Platform	Amazon Bedrock (model ID `amazon.nova-2-sonic-v1:0`)
Context Window	1M tokens, 64K max output (Nova 2 Sonic)
Regions	us-east-1, us-west-2, eu-north-1, ap-northeast-1 (June 2026)
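The model ID and region list in the table above can be captured as a small preflight check, which may be useful because Nova 2 Sonic has no cross-region inference. This is a minimal sketch: the constants are copied from the table, while the helper name `check_region` is hypothetical and not part of any AWS SDK.

```python
# Values copied from the attribute table (June 2026); the helper is illustrative only.
NOVA_2_SONIC_MODEL_ID = "amazon.nova-2-sonic-v1:0"
NOVA_2_SONIC_REGIONS = frozenset(
    {"us-east-1", "us-west-2", "eu-north-1", "ap-northeast-1"}
)


def check_region(region: str) -> str:
    """Return the region unchanged if it is listed, otherwise raise ValueError."""
    if region not in NOVA_2_SONIC_REGIONS:
        listed = ", ".join(sorted(NOVA_2_SONIC_REGIONS))
        raise ValueError(
            f"{region!r} is not listed for {NOVA_2_SONIC_MODEL_ID}; "
            f"listed regions: {listed}"
        )
    return region


if __name__ == "__main__":
    print(check_region("us-west-2"))
```

Failing fast on an unlisted region keeps the error close to the configuration mistake instead of surfacing later as a service-side failure.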

Product Overview

Nova Sonic is part of Amazon's Nova family of foundation models, designed specifically for conversational AI applications. The model handles speech understanding and generation in a unified architecture, eliminating the latency and complexity of traditional ASR→LLM→TTS pipelines.^[4]

Key applications include customer service automation, virtual assistants, and interactive voice response (IVR) systems.

Key Capabilities

Capability	Description
Unified Speech Model	Single model for understanding and generation
Low Latency	Optimized for real-time conversational AI
Polyglot Voices	Single voice (e.g., Tiffany) speaks all seven languages with mid-sentence code-switching (Nova 2)^[2]
Seven Languages	English, French, Italian, German, Spanish, plus Portuguese and Hindi added in Nova 2^[2]
1M Context Window	Extended conversations without context loss
Async Tool Calling	Keeps responding to the user while tools run in the background (Nova 2)^[2]
Cross-Modal Sessions	Switch between voice and text input mid-session; supports DTMF keypad tones for IVR (Nova 2)^[2]
Bedrock Integration	Native AWS security, compliance, billing
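The asynchronous tool calling row describes the model continuing to respond while a tool runs in the background. The general pattern can be sketched with Python's `asyncio`. This is a conceptual illustration only and does not use the Bedrock API; `slow_tool`, `converse`, and the order ID are hypothetical names.

```python
import asyncio


async def slow_tool(order_id: str) -> str:
    """Hypothetical tool that stands in for a slow backend lookup."""
    await asyncio.sleep(0.05)
    return f"order {order_id} shipped"


async def converse() -> list[str]:
    transcript: list[str] = []
    # Start the tool without awaiting it, so the conversation is not blocked.
    task = asyncio.create_task(slow_tool("A123"))
    transcript.append("assistant: Let me check that order for you.")
    # Keep responding while the tool is still running.
    while not task.done():
        transcript.append("assistant: (still responding to the user)")
        await asyncio.sleep(0.02)
    # Fold the tool result into the conversation once it is ready.
    transcript.append(f"assistant: {await task}")
    return transcript


if __name__ == "__main__":
    for line in asyncio.run(converse()):
        print(line)
```

A synchronous design would await `slow_tool` before saying anything further, which in a voice session is heard as dead air.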

Product Editions

Edition	Description	Availability
Nova Sonic	Base speech-to-speech model	GA (April 2025)
Nova 2 Sonic	Polyglot voices, 7 languages, async tools, 1M context	GA (December 2025)

Technical Architecture

Nova Sonic operates within the Amazon Bedrock infrastructure, providing enterprise-grade security and scalability. The model receives audio input and generates audio output directly, with optional text output for transcription and tool calling.

┌──────────────────────────────────────────────┐
│                Amazon Bedrock                │
├──────────────────────────────────────────────┤
│  ┌────────────────────────────────────────┐  │
│  │            Nova Sonic Model            │  │
│  │  ┌──────────┐    ┌──────────────────┐  │  │
│  │  │  Speech  │    │      Speech      │  │  │
│  │  │  Input   │ →  │      Output      │  │  │
│  │  └──────────┘    └──────────────────┘  │  │
│  │       ↓                   ↑            │  │
│  │  ┌──────────────────────────────────┐  │  │
│  │  │   Text Tokens (tools, history)   │  │  │
│  │  └──────────────────────────────────┘  │  │
│  └────────────────────────────────────────┘  │
├──────────────────────────────────────────────┤
│  IAM, VPC, CloudWatch, Cost Management       │
└──────────────────────────────────────────────┘

Integration Options

Integration	Description
Bedrock API	Direct API access with IAM authentication
Vonage	Telephony integration for voice calls
Connect	Amazon Connect contact center integration
Lex	Conversational interface integration

Strengths

Bedrock-native — Enterprise security, IAM, VPC, compliance certifications included
Unified billing — Single AWS bill with existing enterprise agreements
Price performance — Industry-leading cost efficiency for speech-to-speech
1M context window — Long conversations without context loss (Nova 2)
Polyglot support — Multiple languages and accents in a single model
Tool invocation — Native function calling for task completion
AWS ecosystem — Integrates with Connect, Lex, Lambda, and other AWS services

Cautions

AWS lock-in — Tightly coupled to Bedrock; no self-hosting or multi-cloud
Regional availability — Four regions as of June 2026 (N. Virginia, Oregon, Stockholm, Tokyo); no cross-region inference for Nova 2 Sonic^[3]
Newer entrant — Less mature than OpenAI Realtime API; smaller ecosystem
Documentation gaps — Less community content and examples than competitors
Limited voices — Fewer voice options than ElevenLabs
No MCP support — Uses AWS-native tool calling, not the open Model Context Protocol (MCP)

Pricing & Licensing

Nova 2 Sonic is priced through Amazon Bedrock with token-based billing. Rates as of June 2026:^[5]^[6]

Component	Rate
Speech Input	$3.00 / 1M tokens ($0.003 / 1K)
Speech Output	$12.00 / 1M tokens ($0.012 / 1K)
Text Input	$0.33 / 1M tokens
Text Output	$2.75 / 1M tokens

Nova 2 Sonic's speech rates are about 12% lower than the original Nova Sonic's ($3.40/1M input, $13.60/1M output), though its text token rates are higher than Sonic v1's.^[6]

Estimated per-minute cost: ~$0.015/min (speech input and output combined)
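The token rates in the table can be turned into a per-session or per-minute figure with a few lines of arithmetic. This is a minimal sketch: the rates are copied from the table, but the token counts in the usage example are placeholders, since the article does not state how many speech tokens a minute of audio consumes. The names `Usage`, `session_cost`, and `cost_per_minute` are hypothetical.

```python
from dataclasses import dataclass

# USD per 1M tokens, copied from the rate table (June 2026).
RATES_PER_MILLION = {
    "speech_in": 3.00,
    "speech_out": 12.00,
    "text_in": 0.33,
    "text_out": 2.75,
}


@dataclass
class Usage:
    """Token counts for one session; every field defaults to zero."""
    speech_in: int = 0
    speech_out: int = 0
    text_in: int = 0
    text_out: int = 0


def session_cost(usage: Usage) -> float:
    """Total USD cost of a session at the table rates."""
    return sum(
        getattr(usage, kind) * rate / 1_000_000
        for kind, rate in RATES_PER_MILLION.items()
    )


def cost_per_minute(usage: Usage, minutes: float) -> float:
    """Average USD per minute over a session of the given length."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return session_cost(usage) / minutes


if __name__ == "__main__":
    # Placeholder counts chosen for easy checking, not measured usage.
    usage = Usage(speech_in=1_000_000, speech_out=1_000_000)
    print(f"${session_cost(usage):.2f}")  # $15.00
```

Feeding in token counts from real Bedrock usage reports would show how close a given workload comes to the ~$0.015/min estimate.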

Cost comparison:

~80% cheaper than OpenAI's GPT-4o Realtime, according to AWS and third-party analysis^[6]
Competitive with Retell AI (~$0.07/min + provider costs)
More expensive than Deepgram Aura TTS, though Nova is a full speech-to-speech model rather than TTS only

Service Tiers: Nova 2 Sonic supports only Bedrock's Standard pay-per-token tier as of June 2026 — Priority, Flex, and Reserved tiers are not supported for this model.^[3]

What Developers Say

Community discussion of Nova Sonic remains thin compared to OpenAI Realtime — most published material is AWS's own blogs and samples. The Hacker News commentary that exists is measured rather than enthusiastic.

On capability, one developer who built a post-sales call assistant wrote: "Amazon has a commercial Speech-to-Text model (Nova Sonic) that is passable. I used it to create a post-sales call assistant and was surprised that the underlying model was able to do a bunch of stuff I thought I was going to have to use Claude for." — coredog64, Hacker News, August 2025^[7]

On developer experience, the same engineer later cautioned: "Having done some implementations it's trickier than you might like (e.g. the Python library for Sonic had problems with echoes and we had to use the Java library)" — coredog64, Hacker News, May 2026^[8]

No substantial Reddit threads on Nova 2 Sonic surfaced as of June 2026 — a signal that grassroots adoption still trails the AWS enterprise channel.

Competitive Positioning

Direct Competitors

Competitor	Differentiation
OpenAI Realtime API	OpenAI has better instruction following and MCP support; Nova Sonic has Bedrock integration and lower estimated speech costs
ElevenLabs	ElevenLabs has superior voice quality and variety; Nova Sonic has enterprise AWS integration
Deepgram	Deepgram excels at STT+TTS building blocks; Nova Sonic is end-to-end speech-to-speech

When to Choose AWS Nova 2 Sonic

Choose Nova Sonic when: You're already on AWS, need Bedrock compliance, or want unified billing
Choose OpenAI Realtime when: You need best-in-class instruction following and MCP
Choose ElevenLabs when: Voice quality and variety are paramount
Choose LiveKit when: You want open-source flexibility

Ideal Customer Profile

Best fit:

AWS-native enterprises with existing Bedrock usage
Organizations requiring AWS compliance certifications (HIPAA, SOC 2, FedRAMP)
Teams wanting unified AWS billing and IAM
Contact centers using Amazon Connect
Applications needing long context (1M tokens)

Poor fit:

Multi-cloud organizations avoiding AWS lock-in
Teams needing extensive voice variety
Startups without AWS enterprise agreements
Use cases requiring MCP support

Viability Assessment

Factor	Assessment
Financial Health	Excellent — AWS is one of the most profitable businesses globally
Market Position	Growing — Strong AWS enterprise base, newer in voice AI; community adoption still thin as of June 2026
Innovation Pace	Rapid — Nova 2 Sonic released within 8 months of Nova Sonic; four regions by mid-2026
Ecosystem	Extensive — Deep AWS service integration
Long-term Outlook	Positive — Core to AWS's AI strategy

Bottom Line

AWS Nova 2 Sonic is the natural choice for AWS-native enterprises building voice AI applications. The Bedrock integration provides enterprise-grade security, compliance, and unified billing that's difficult to replicate with third-party providers. Nova 2 Sonic's 1M context window, asynchronous tool calling, and polyglot voices across seven languages make it competitive for complex, multilingual applications — at speech rates ($3/$12 per 1M tokens) that undercut both its predecessor and OpenAI's GPT-4o Realtime.^[2]^[6]

The trade-off is AWS lock-in and a smaller voice AI ecosystem compared to dedicated providers like ElevenLabs or the OpenAI Realtime API. For organizations already committed to AWS, Nova Sonic reduces complexity. For multi-cloud or voice-quality-focused teams, an alternative may be a better fit.

Recommended for: AWS-native enterprises needing compliant, scalable voice AI with unified billing.

Not recommended for: Multi-cloud organizations, teams requiring extensive voice variety, or those prioritizing MCP support.

Outlook: Positive. AWS shipped a major version, two new languages, async tooling, and three additional regions within fourteen months of launch. The open question is grassroots traction — developer SDKs still draw complaints and community discussion remains sparse relative to OpenAI Realtime.

Research by Ry Walker Research

Sources