Key takeaways
- Wafer-Scale Engine 3 contains 4 trillion transistors — the largest chip ever built; no WSE-4 has been announced as of June 2026
- Completed the biggest tech IPO of 2026 on May 14 — raised $5.5B at $185/share, popped 108% at open, closed day one at $311 (~$66B market cap)
- 2025 revenue of $510M (up 76% YoY) with a swing to $237.8M net income, plus an OpenAI supply relationship and AWS availability
- Cerebras Inference delivers 2,000+ tokens/sec on supported models — roughly 1,000 tok/s even on trillion-parameter models like Kimi K2.6
FAQ
What is Cerebras?
A company that builds the Wafer-Scale Engine (WSE) — the largest chip ever made — for AI inference and training.
How big is the Cerebras chip?
The WSE-3 is wafer-scale — an entire silicon wafer as a single chip with 4 trillion transistors, ~56x larger than the largest GPU.
Is Cerebras publicly traded?
Yes. Cerebras went public on Nasdaq on May 14, 2026, raising $5.5 billion at $185 per share — the biggest tech IPO of 2026. The stock opened at $385 and closed its first day at $311.
How fast is Cerebras inference?
Up to 15x faster than GPUs — 2,000+ tokens per second on models like Llama 4 Scout, and roughly 1,000 tokens per second on trillion-parameter models.
Company Overview
Cerebras takes the most radical approach in AI hardware: building the largest chip ever made.[1] The Wafer-Scale Engine (WSE) uses an entire silicon wafer as a single processor, containing 4 trillion transistors — approximately 56x larger than the largest NVIDIA GPU.
The IPO bet paid off. On May 14, 2026, Cerebras went public on Nasdaq in the biggest tech IPO of the year, raising $5.5 billion at $185 per share — far above its initial $115–$125 range. The stock opened at $385 (a 108% pop) and closed its first day at $311, an ~$66 billion market cap.[2] Underneath the listing: $510 million in 2025 revenue (up 76% year-over-year), a swing to $237.8 million net income, and a supply relationship with OpenAI.[2]
What It Does
- Cerebras Inference — Cloud API for fast LLM inference on WSE hardware, OpenAI-compatible, with models including Llama 4 Scout, GLM 4.7, GPT-OSS-120B, and Kimi K2.6[3]
- Training systems — CS-3 systems for large-scale model training
- Deployment options — Cloud, dedicated capacity, and on-premise WSE systems[1]
- Research partnerships — Collaborations with national labs and universities
How It Works
The Wafer-Scale Engine eliminates the interconnect bottleneck that limits GPU clusters. Instead of networking thousands of small chips together, Cerebras puts everything on one massive chip:
- 900,000 AI cores on a single wafer
- 44GB on-chip SRAM — no external memory bottleneck
- Wafer-scale interconnect — all cores communicate at silicon speed
- MemoryX — External memory system for models larger than on-chip capacity
For inference, Cerebras offers an API similar to other providers. For training, customers deploy CS-3 systems (each containing one WSE-3). As of June 2026, no WSE-4 has been announced — the company is still scaling on WSE-3.
Pricing
As of June 2026:[3]
- Free tier — API access with no upfront cost, for experimentation
- Developer — Self-serve pay-per-token pricing starting at a $10 deposit, with 10x higher rate limits than the free tier
- Enterprise — Production-scale inference with dedicated queue priority and custom terms
- Hardware/training — Custom pricing for CS-3 deployments
Strengths
- Fastest single-model throughput — 2,000+ tokens/sec on supported models, up to 15x faster than GPUs; ~1,000 tok/s even on trillion-parameter models[3]
- No interconnect bottleneck — Single chip eliminates multi-GPU communication overhead
- On-chip memory — 44GB SRAM avoids HBM bandwidth limits
- Marquee customers — OpenAI, Meta, AWS, Notion, AlphaSense, GSK, and Mayo Clinic appear on its customer roster as of June 2026[1]
- Public-company footing — $5.5B IPO raise, profitable in 2025, AWS Marketplace availability[2]
- Training + inference — Full-stack custom silicon platform
Weaknesses / Risks
- Limited model ecosystem — Only supported models available on inference API
- No fine-tuning API — Less accessible than GPU platforms for customization
- Hardware cost — WSE systems are expensive; cloud API offsets this
- Supply constraints — Wafer-scale manufacturing is complex and limited
- Customer concentration history — Group 42 once accounted for nearly all revenue, and its Abu Dhabi investment triggered a lengthy CFIUS review that delayed the original IPO[2]
- Circular OpenAI economics — TechCrunch flags a "complicated circular-deal relationship" with OpenAI; post-IPO volatility (open $385, close $311 day one) shows the market is still pricing the story[2]
What Developers Say
Developer sentiment on Hacker News is strongly positive on speed, with the main caveats being model selection rather than performance:
"Nothing (may be except groq ?) comes even close to Cerebras in inference speed. The difference in using them as an inference provider vs anything else is like night and day." — allisdust, Hacker News[4]
"Iterating on code with 1000 tok/s makes it feel even more magical." — 0vermorrow, Hacker News[5]
"Also consider using Cerebras' inference APIs. They released a voice demo a while back and the latency of their model inference is insane." — alfalfasprout, Hacker News[6]
Competitive Landscape
vs. Groq: Both custom silicon. Groq focuses on deterministic latency for individual requests; Cerebras on maximum throughput via wafer-scale.
vs. NVIDIA/GPU platforms: GPUs offer more model flexibility and ecosystem. Cerebras wins on raw performance for supported models.
vs. SambaNova: Both custom silicon with enterprise focus. Cerebras has the more radical hardware approach; SambaNova focuses on reconfigurability.
vs. Modal/Baseten: GPU platforms offer more flexibility. Cerebras wins on speed for supported workloads.
Ideal User
- Enterprise teams needing maximum inference throughput
- Builders of latency-sensitive products — voice agents, real-time coding tools, interactive agents
- Research organizations and national labs with large-scale AI workloads
- Teams running supported models where speed is the primary metric
Bottom Line
Cerebras represents the boldest bet in AI hardware — that a single massive chip beats a cluster of smaller ones — and as of June 2026 the bet is validated: the biggest tech IPO of the year, $510M in revenue, profitability, and an OpenAI supply deal.[2] The Wafer-Scale Engine delivers on performance, and the inference API makes that speed accessible to any developer with $10. The remaining risk is ecosystem breadth and the durability of circular AI-economy deals.
Recommended for: Teams where inference speed is the product — real-time agents, voice, fast coding loops — running models Cerebras supports.
Not recommended for: Teams needing arbitrary model choice, fine-tuning APIs, or maximum ecosystem flexibility; GPU platforms remain the safer default there.
Outlook: Public-company capital plus the OpenAI relationship should fund the next wafer generation (WSE-4 is speculated but unannounced). Watch customer concentration and whether post-IPO volatility settles into durable demand.
Research by Ry Walker Research • methodology