Key takeaways
- Custom LPU delivers 10x+ faster inference than GPU-based platforms with deterministic latency
- Generous free tier makes it accessible for developers and prototyping
- No batching delays — each request gets dedicated compute for consistent performance
- Limited to supported models; no custom model deployment
FAQ
What is Groq?
A company building custom Language Processing Units (LPUs) optimized for AI inference, offering the fastest LLM inference speeds available.
How fast is Groq?
Often 10x+ faster than GPU-based platforms, with deterministic latency and no batching delays.
Is Groq free?
Groq offers a generous free tier. Production usage is pay-per-token.
What's the difference between Groq and GPU platforms?
Groq uses custom silicon (LPU) designed specifically for sequential inference, while GPU platforms use general-purpose graphics processors.
Company Overview
Groq builds custom Language Processing Units (LPUs) — purpose-built silicon designed from the ground up for AI inference.[1] Unlike GPU-based platforms that repurpose graphics hardware for AI, Groq's LPU architecture is optimized for the sequential nature of language model inference, delivering dramatically faster token generation.
Founded by Jonathan Ross, who led the development of Google's TPU, Groq has become synonymous with "fast inference" in the developer community, often serving as the speed benchmark others measure against.
What It Does
- LLM inference API — OpenAI-compatible endpoints for popular open-source models[2]
- Deterministic latency — Consistent response times without batching delays
- Free tier — Generous free usage for development and prototyping
- Production API — Pay-per-token for production workloads
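Because the API is OpenAI-compatible, a chat completion is just the familiar JSON payload POSTed to a different base URL. A minimal sketch using only the standard library follows; the base URL and model id here are assumptions taken as examples, so check Groq's documentation for current values:

```python
import json
import os
import urllib.request

# Assumed values — verify against Groq's current documentation.
GROQ_BASE_URL = "https://api.groq.com/openai/v1"
EXAMPLE_MODEL = "llama-3.1-8b-instant"  # example model id, may change

def build_chat_request(prompt: str, model: str = EXAMPLE_MODEL) -> dict:
    """Build an OpenAI-style chat-completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def send(payload: dict) -> dict:
    """POST the payload to the OpenAI-compatible endpoint (needs GROQ_API_KEY set)."""
    req = urllib.request.Request(
        f"{GROQ_BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['GROQ_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = build_chat_request("Say hello in one word.")
```

Because the payload shape matches OpenAI's, existing client code typically only needs the base URL and API key swapped.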
How It Works
The LPU differs fundamentally from GPUs in several respects:
- No batching — Each request gets dedicated compute; no waiting for batch formation
- Deterministic — Same input produces same latency every time
- Sequential optimization — Architecture designed for autoregressive token generation
- SRAM-based — On-chip memory eliminates external memory bottlenecks
The result: token generation speeds that are often 10x+ faster than GPU-based platforms, with consistent low latency.
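The batching difference can be illustrated with a toy latency model: a batch-serving GPU platform holds a request until the current batching window closes, while dedicated per-request compute starts immediately. The numbers below are illustrative only, not Groq benchmarks:

```python
def batched_latency(compute_ms: float, batch_window_ms: float,
                    arrival_offset_ms: float) -> float:
    """Latency when a request waits for the batch window to close.

    arrival_offset_ms: how far into the window the request arrives,
    so the queueing delay varies per request (non-deterministic latency).
    """
    wait = batch_window_ms - arrival_offset_ms  # time left until batch dispatch
    return wait + compute_ms

def dedicated_latency(compute_ms: float) -> float:
    """Latency with dedicated compute: no queueing, same result every time."""
    return compute_ms

# Illustrative numbers: a 50 ms batch window adds up to 50 ms of
# request-dependent delay on top of a 40 ms compute time.
print(batched_latency(compute_ms=40, batch_window_ms=50, arrival_offset_ms=10))  # 80.0
print(dedicated_latency(compute_ms=40))  # 40.0
```

The point of the sketch: batched latency depends on arrival timing, while the dedicated path returns the same number every time, which is the determinism the LPU design targets.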
Pricing
- Free tier — Generous rate limits for development
- Pay-per-token — Production pricing by model
- No GPU management — Fully managed API
Groq's per-token pricing is competitive with GPU-based alternatives despite the speed advantage, making it attractive for latency-sensitive applications.
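Pay-per-token billing is simple arithmetic: (tokens ÷ 1M) × price per million tokens, usually with separate input and output rates. A small helper makes the estimate explicit; the rates used below are placeholders, not Groq's actual prices:

```python
def token_cost(input_tokens: int, output_tokens: int,
               input_price_per_m: float, output_price_per_m: float) -> float:
    """Dollar cost of one request under per-token pricing.

    Prices are expressed per million tokens, as most providers quote them.
    """
    return (input_tokens / 1_000_000) * input_price_per_m \
         + (output_tokens / 1_000_000) * output_price_per_m

# Placeholder rates ($ per million tokens) — check Groq's pricing page.
cost = token_cost(1_200, 300, input_price_per_m=0.05, output_price_per_m=0.10)
print(f"${cost:.5f}")  # $0.00009
```

At these magnitudes the per-request cost is fractions of a cent, which is why latency, not price, is usually the deciding factor for Groq.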
Strengths
- Fastest inference — Consistently benchmarks as the fastest LLM API
- Deterministic latency — No variance from batching or contention
- Generous free tier — Low barrier to adoption
- OpenAI-compatible — Drop-in replacement for existing code
- Developer beloved — Strong community adoption and mindshare
- TPU pedigree — Founder led the design of Google's TPU
Weaknesses / Risks
- Limited model selection — Only supported open-source models; no custom deployment
- No fine-tuning — Can't train custom models on Groq hardware
- Hardware supply — Custom silicon means limited scale vs commodity GPUs
- Model lag — New models take time to be optimized for LPU architecture
- No image/video — Focused on text/language models
- Single vendor risk — Entirely dependent on Groq's custom hardware roadmap
Competitive Landscape
vs. Cerebras: Both use custom silicon. Cerebras bets on wafer-scale chips (the largest single chips made); Groq on deterministic latency at scale.
vs. Together AI/Fireworks: GPU-based platforms offer more models and fine-tuning. Groq wins purely on speed.
vs. DeepInfra: DeepInfra may be cheaper per token. Groq is significantly faster.
vs. SambaNova: Both custom silicon. SambaNova targets enterprise on-prem; Groq targets cloud API developers.
Ideal User
- Applications where latency is the #1 priority (real-time chat, voice, gaming)
- Developers prototyping with the free tier
- Teams wanting the fastest possible open-source model inference
- Products where consistent response time matters more than model customization
Bottom Line
Groq is the speed king of AI inference. Custom LPU hardware delivers a genuine architectural advantage that GPU-based platforms can't easily replicate. The trade-off is flexibility: you're limited to supported models with no custom deployment or fine-tuning. For latency-sensitive applications using popular open-source models, Groq is hard to beat. For everything else, GPU platforms offer more versatility.
Sources