
AWS Nova 2 Sonic

Amazon Nova Sonic is AWS's speech-to-speech foundation model for Amazon Bedrock, offering human-like voice conversations with low latency and industry-leading price performance.

Key takeaways

  • Native Bedrock integration provides enterprise-grade security, compliance, and unified billing for AWS customers
  • Industry-leading price performance with unified speech understanding and generation in a single model
  • Nova 2 Sonic upgrade adds polyglot voices across seven languages, asynchronous tool calling, and a 1M token context window

FAQ

What is AWS Nova 2 Sonic?

Amazon Nova Sonic is a speech-to-speech foundation model that unifies speech understanding and generation to enable human-like voice conversations in AI applications, available through Amazon Bedrock.

How much does Nova 2 Sonic cost?

Nova 2 Sonic costs $3 per million speech input tokens and $12 per million speech output tokens ($0.003/$0.012 per 1K). That works out to an estimated $0.015/min, approximately 80% cheaper than OpenAI's GPT-4o Realtime.

How does Nova 2 Sonic differ from Nova Sonic?

Nova 2 Sonic (December 2025) adds polyglot voices, seven-language support including Portuguese and Hindi, asynchronous tool calling, cross-modal voice/text switching, and a 1M token context window.

Executive Summary

Amazon Nova Sonic is AWS's native speech-to-speech foundation model, available through Amazon Bedrock. It unifies speech understanding and generation into a single model, enabling human-like voice conversations with low latency and industry-leading price performance. Nova 2 Sonic (announced at re:Invent on December 2, 2025) adds polyglot voices across seven languages, asynchronous tool calling, and a 1M token context window.[1][2] As of June 2026, Nova 2 Sonic runs in four AWS regions: N. Virginia, Oregon, Stockholm, and Tokyo.[3]

Attribute | Value
Company | Amazon Web Services
Launched | April 2025 (Nova Sonic), December 2025 (Nova 2 Sonic)
Platform | Amazon Bedrock (model ID amazon.nova-2-sonic-v1:0)
Context Window | 1M tokens, 64K max output (Nova 2 Sonic)
Regions | us-east-1, us-west-2, eu-north-1, ap-northeast-1 (June 2026)
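
Because the model is limited to four regions, an application may want to check its target region before opening a session. The following is a minimal sketch only: the constant and function names are hypothetical, and the only values taken from this article are the model ID and the four region codes (accurate as of June 2026).

```python
# Values copied from the attribute table above; all names are illustrative.
NOVA_2_SONIC_MODEL_ID = "amazon.nova-2-sonic-v1:0"
NOVA_2_SONIC_REGIONS = frozenset(
    {"us-east-1", "us-west-2", "eu-north-1", "ap-northeast-1"}
)


def resolve_region(requested: str, fallback: str = "us-east-1") -> str:
    """Return `requested` if Nova 2 Sonic runs there, otherwise `fallback`."""
    if fallback not in NOVA_2_SONIC_REGIONS:
        raise ValueError(f"fallback region {fallback!r} does not host the model")
    return requested if requested in NOVA_2_SONIC_REGIONS else fallback


if __name__ == "__main__":
    print(NOVA_2_SONIC_MODEL_ID, resolve_region("eu-west-1"))
```

A silent fallback is convenient for prototypes, but teams with data-residency requirements would probably prefer to raise an error instead of rerouting traffic to another region.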

Product Overview

Nova Sonic is part of Amazon's Nova family of foundation models, designed specifically for conversational AI applications. The model handles speech understanding and generation in a unified architecture, eliminating the latency and complexity of traditional ASR→LLM→TTS pipelines.[4]

Key applications include customer service automation, virtual assistants, and interactive voice response (IVR) systems.

Key Capabilities

Capability | Description
Unified Speech Model | Single model for understanding and generation
Low Latency | Optimized for real-time conversational AI
Polyglot Voices | Single voice (e.g., Tiffany) speaks all seven languages with mid-sentence code-switching (Nova 2)[2]
Seven Languages | English, French, Italian, German, Spanish, plus Portuguese and Hindi added in Nova 2[2]
1M Context Window | Extended conversations without context loss
Async Tool Calling | Keeps responding to the user while tools run in the background (Nova 2)[2]
Cross-Modal Sessions | Switch between voice and text input mid-session; supports DTMF keypad tones for IVR (Nova 2)[2]
Bedrock Integration | Native AWS security, compliance, billing
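
The asynchronous tool calling behavior can be illustrated with a small simulation. This sketch does not use the Bedrock API at all. It uses Python's asyncio to show the general pattern of starting a tool in the background and continuing to speak until the result arrives. The tool, its latency, and the spoken lines are all hypothetical.

```python
import asyncio


async def slow_tool(order_id: str) -> str:
    """Stand-in for a backend lookup; the sleep simulates tool latency."""
    await asyncio.sleep(0.05)
    return f"order {order_id} ships tomorrow"


async def handle_turn(order_id: str) -> list[str]:
    """Start the tool, keep talking, then deliver the result once it is ready."""
    spoken: list[str] = []
    task = asyncio.create_task(slow_tool(order_id))  # scheduled in the background
    spoken.append("Let me check that for you.")  # said before the tool finishes
    while not task.done():
        spoken.append("Still looking...")  # conversation continues meanwhile
        await asyncio.sleep(0.02)  # yield so the tool can make progress
    spoken.append(f"Found it: {task.result()}")
    return spoken


if __name__ == "__main__":
    for line in asyncio.run(handle_turn("A123")):
        print(line)
```

In a blocking design the assistant would stay silent for the full tool latency. Here the turn always opens with an acknowledgement and ends with the tool result, with filler speech in between.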

Product Editions

Edition | Description | Availability
Nova Sonic | Base speech-to-speech model | GA (April 2025)
Nova 2 Sonic | Polyglot voices, 7 languages, async tools, 1M context | GA (December 2025)

Technical Architecture

Nova Sonic operates within the Amazon Bedrock infrastructure, providing enterprise-grade security and scalability. The model receives audio input and generates audio output directly, with optional text output for transcription and tool calling.

┌──────────────────────────────────────────────┐
│            Amazon Bedrock                     │
├──────────────────────────────────────────────┤
│  ┌────────────────────────────────────────┐  │
│  │           Nova Sonic Model             │  │
│  │  ┌──────────┐    ┌──────────────────┐ │  │
│  │  │  Speech  │    │     Speech       │ │  │
│  │  │  Input   │ →  │     Output       │ │  │
│  │  └──────────┘    └──────────────────┘ │  │
│  │          ↓              ↑             │  │
│  │  ┌──────────────────────────────────┐│  │
│  │  │  Text Tokens (tools, history)    ││  │
│  │  └──────────────────────────────────┘│  │
│  └────────────────────────────────────────┘  │
├──────────────────────────────────────────────┤
│  IAM, VPC, CloudWatch, Cost Management       │
└──────────────────────────────────────────────┘

Integration Options

Integration | Description
Bedrock API | Direct API access with IAM authentication
Vonage | Telephony integration for voice calls
Connect | Amazon Connect contact center integration
Lex | Conversational interface integration

Strengths

  • Bedrock-native — Enterprise security, IAM, VPC, compliance certifications included
  • Unified billing — Single AWS bill with existing enterprise agreements
  • Price performance — Industry-leading cost efficiency for speech-to-speech
  • 1M context window — Long conversations without context loss (Nova 2)
  • Polyglot support — Multiple languages and accents in single model
  • Tool invocation — Native function calling for task completion
  • AWS ecosystem — Integrates with Connect, Lex, Lambda, and other AWS services

Cautions

  • AWS lock-in — Tightly coupled to Bedrock; no self-hosting or multi-cloud
  • Regional availability — Four regions as of June 2026 (N. Virginia, Oregon, Stockholm, Tokyo); no cross-region inference for Nova 2 Sonic[3]
  • Newer entrant — Less mature than OpenAI Realtime API; smaller ecosystem
  • Documentation gaps — Less community content and examples than competitors
  • Limited voices — Fewer voice options than ElevenLabs
  • No MCP support — Uses AWS-native tool calling, not open MCP protocol

Pricing & Licensing

Nova 2 Sonic is priced through Amazon Bedrock with token-based billing. Rates as of June 2026:[5][6]

Component | Rate
Speech Input | $3.00 / 1M tokens ($0.003 / 1K)
Speech Output | $12.00 / 1M tokens ($0.012 / 1K)
Text Input | $0.33 / 1M tokens
Text Output | $2.75 / 1M tokens

Nova 2 Sonic's speech rates undercut the original Nova Sonic ($3.40/1M input, $13.60/1M output), though text token rates are higher than Sonic v1's.[6]

Estimated per-minute cost: ~$0.015/min (speech input and output combined)
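
To make the token-based billing concrete, the sketch below computes a session's cost from the rates listed above and compares the Nova 2 Sonic speech rates with the original Nova Sonic rates. The function names and the sample token counts are hypothetical. The article does not state how many tokens one minute of audio consumes, so the sketch does not attempt to reproduce the per-minute estimate.

```python
# USD per 1M tokens, copied from the rate table above (June 2026).
RATES_PER_MILLION = {
    "speech_in": 3.00,
    "speech_out": 12.00,
    "text_in": 0.33,
    "text_out": 2.75,
}


def session_cost(tokens: dict[str, int]) -> float:
    """Return the USD cost of a session given token counts per component."""
    return sum(
        RATES_PER_MILLION[kind] * count / 1_000_000
        for kind, count in tokens.items()
    )


def discount_vs(old_rate: float, new_rate: float) -> float:
    """Fractional price reduction when moving from old_rate to new_rate."""
    return (old_rate - new_rate) / old_rate


if __name__ == "__main__":
    # Hypothetical session: 10,000 speech input and 5,000 speech output tokens.
    print(f"${session_cost({'speech_in': 10_000, 'speech_out': 5_000}):.2f}")
    print(f"{discount_vs(3.40, 3.00):.1%} lower input rate than Nova Sonic")
    print(f"{discount_vs(13.60, 12.00):.1%} lower output rate than Nova Sonic")
```

With these rates, the speech input and speech output prices each come out just under 12% below the original Nova Sonic prices.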

Cost comparison:

  • ~80% cheaper than OpenAI's GPT-4o Realtime per AWS and third-party analysis[6]
  • Below Retell AI's per-minute rate (~$0.07/min + provider costs)
  • More expensive than Deepgram Aura TTS, which covers only text-to-speech while Nova Sonic is full speech-to-speech (S2S)

Service Tiers: Nova 2 Sonic supports only Bedrock's Standard pay-per-token tier as of June 2026 — Priority, Flex, and Reserved tiers are not supported for this model.[3]


What Developers Say

Community discussion of Nova Sonic remains thin compared to OpenAI Realtime — most published material is AWS's own blogs and samples. The Hacker News commentary that exists is measured rather than enthusiastic.

On capability, one developer who built a post-sales call assistant wrote: "Amazon has a commercial Speech-to-Text model (Nova Sonic) that is passable. I used it to create a post-sales call assistant and was surprised that the underlying model was able to do a bunch of stuff I thought I was going to have to use Claude for." — coredog64, Hacker News, August 2025[7]

On developer experience, the same engineer later cautioned: "Having done some implementations it's trickier than you might like (e.g. the Python library for Sonic had problems with echoes and we had to use the Java library)" — coredog64, Hacker News, May 2026[8]

No substantial Reddit threads on Nova 2 Sonic surfaced as of June 2026 — a signal that grassroots adoption still trails the AWS enterprise channel.


Competitive Positioning

Direct Competitors

Competitor | Differentiation
OpenAI Realtime API | OpenAI has better instruction following and MCP support; Nova Sonic has Bedrock integration and potentially lower costs
ElevenLabs | ElevenLabs has superior voice quality and variety; Nova Sonic has enterprise AWS integration
Deepgram | Deepgram excels at STT+TTS building blocks; Nova Sonic is end-to-end speech-to-speech

When to Choose AWS Nova 2 Sonic

  • Choose Nova Sonic when: You're already on AWS, need Bedrock compliance, or want unified billing
  • Choose OpenAI Realtime when: You need best-in-class instruction following and MCP
  • Choose ElevenLabs when: Voice quality and variety are paramount
  • Choose LiveKit when: You want open-source flexibility

Ideal Customer Profile

Best fit:

  • AWS-native enterprises with existing Bedrock usage
  • Organizations requiring AWS compliance certifications (HIPAA, SOC 2, FedRAMP)
  • Teams wanting unified AWS billing and IAM
  • Contact centers using Amazon Connect
  • Applications needing long context (1M tokens)

Poor fit:

  • Multi-cloud organizations avoiding AWS lock-in
  • Teams needing extensive voice variety
  • Startups without AWS enterprise agreements
  • Use cases requiring Model Context Protocol (MCP) support

Viability Assessment

Factor | Assessment
Financial Health | Excellent: AWS is one of the most profitable businesses globally
Market Position | Growing: strong AWS enterprise base, newer in voice AI; community adoption still thin as of June 2026
Innovation Pace | Rapid: Nova 2 Sonic released within 8 months of Nova Sonic; four regions by mid-2026
Ecosystem | Extensive: deep AWS service integration
Long-term Outlook | Positive: core to AWS's AI strategy

Bottom Line

AWS Nova 2 Sonic is the natural choice for AWS-native enterprises building voice AI applications. The Bedrock integration provides enterprise-grade security, compliance, and unified billing that's difficult to replicate with third-party providers. Nova 2 Sonic's 1M context window, asynchronous tool calling, and polyglot voices across seven languages make it competitive for complex, multilingual applications — at speech rates ($3/$12 per 1M tokens) that undercut both its predecessor and OpenAI's GPT-4o Realtime.[2][6]

The trade-off is AWS lock-in and a smaller voice AI ecosystem compared to dedicated providers like ElevenLabs or the OpenAI Realtime API. For organizations already committed to AWS, Nova Sonic reduces complexity. For multi-cloud or voice-quality-focused teams, alternatives may be better fits.

Recommended for: AWS-native enterprises needing compliant, scalable voice AI with unified billing.

Not recommended for: Multi-cloud organizations, teams requiring extensive voice variety, or those prioritizing MCP support.

Outlook: Positive. AWS shipped a major version, two new languages, async tooling, and three additional regions within fourteen months of launch. The open question is grassroots traction — developer SDKs still draw complaints and community discussion remains sparse relative to OpenAI Realtime.


Research by Ry Walker Research