Key takeaways
- NVIDIA invested $2B in Nebius on March 11, 2026 as part of a full-stack AI cloud partnership targeting 5+ gigawatts of NVIDIA systems deployed by end of 2030
- Q1 2026 revenue hit $399M (up 684% YoY) with AI ARR at $1.9B; full-year guidance calls for $7–9B ARR exit and $20–25B of capex
- Token Factory — per-token managed inference over 60+ open models — is management's stated fastest-growing segment, reinforced by the $643M Eigen AI acquisition and the Clarifai and Tavily deals
- A $27B five-year Meta contract ($12B fixed, $15B optional) anchors the infrastructure side, making the inference platform a high-margin layer on top of a contracted neocloud
FAQ
What is Nebius Token Factory?
Token Factory is Nebius's managed AI inference platform — per-token API access to 60+ open-source models plus dedicated endpoints and fine-tuning, running on Nebius's own GPU cloud.
How much does Nebius Token Factory cost?
Per-token pricing with separate input/output rates and volume discounts, in Fast (low-latency) and Base (cost-efficient batch) tiers; exact rates are listed per model on the self-service pricing page.
What models does Token Factory serve?
60+ open models including DeepSeek-V4-Pro, Qwen3-235B, Llama, GPT OSS 120B/20B, GLM-5.1, MiniMax-M2.5, plus embedding and safety models, as of June 2026.
How is Nebius different from Baseten or Together AI?
Nebius is a public company that owns its data centers and sells the full stack from GPU clusters to per-token inference, while Baseten and Together are venture-backed inference-first platforms that largely rent or partner for capacity.
Executive Summary
Nebius (NASDAQ: NBIS) is the Amsterdam-headquartered neocloud — formerly Yandex N.V., which divested its Russian assets and relisted in 2024 — that has layered a managed inference business, Token Factory, on top of its GPU cloud. Token Factory serves 60+ open-source models (DeepSeek-V4-Pro, Qwen3-235B, Llama, GPT OSS, GLM-5.1, MiniMax-M2.5) through per-token APIs, dedicated endpoints with a 99.9% uptime SLA, and fine-tuning pipelines.[1] Management called Token Factory the company's fastest-growing segment on the Q1 2026 earnings call, naming Revolut and 1X Technologies as recent customers.[2][3]
The scale behind it is unusual for an inference vendor: Q1 2026 revenue of $399M grew 684% year-over-year, AI ARR reached $1.9B (up from $1.25B at end-2025), and NVIDIA made a $2B strategic equity investment on March 11, 2026, in a partnership targeting more than 5 gigawatts of NVIDIA systems deployed by end of 2030.[2][4] Nebius then spent ~$643M to acquire inference-optimization firm Eigen AI (May 1, 2026) and folded in Clarifai and Tavily, assembling a full inference and agentic-deployment stack on top of a $27B five-year Meta infrastructure contract.[5][6][2]
| Attribute | Value |
|---|---|
| Company | Nebius Group N.V. (NASDAQ: NBIS), Amsterdam[4] |
| Heritage | Spun out of Yandex N.V. after divesting Russian assets; relisted 2024[7] |
| Q1 2026 Revenue | $399M (+684% YoY); AI revenue $390M (+841%)[2] |
| AI ARR | $1.9B as of Q1 2026; FY2026 guidance $7–9B[2] |
| Strategic Backing | NVIDIA $2B equity investment (March 11, 2026)[4] |
| Anchor Contracts | Meta $27B/5yr ($12B fixed, $15B optional); Microsoft contract on schedule[2] |
Product Overview
Token Factory is the managed-inference layer of the Nebius cloud: pick an open model, hit an API with transparent $/token pricing, and scale to dedicated endpoints or fine-tuned variants without managing GPUs.[1] It launched as the production-grade successor to Nebius AI Studio, pitched at companies deploying open-source and custom models with enterprise security, and counts Higgsfield AI as an adopter and Hugging Face as a collaborator for open-model access.[8]
Two serving configurations cover the latency/cost trade: Fast (sub-second responses for interactive agents and chat) and Base (cost-efficient throughput for large-scale or background processing), switchable on the same API and endpoints.[1]
Key Capabilities
| Capability | Description |
|---|---|
| Model library | 60+ open models — DeepSeek-V4-Pro, Qwen3-235B, Llama, GPT OSS 120B/20B, GLM-5.1, MiniMax-M2.5, plus embeddings and Llama-Guard safety models[1] |
| Serving tiers | Fast (sub-second, interactive) and Base (batch/background), same API[1] |
| Dedicated endpoints | Reserved capacity, 99.9% uptime SLA, custom autoscaling, EU/US regional deployment, guaranteed isolation[1] |
| Fine-tuning | Custom fine-tuned models deploy via dashboard or API at per-token pricing; post-training and distillation forthcoming[1] |
| Optimization stack | Eigen AI's post-training quantization, KV-cache optimization, and custom CUDA kernels being integrated directly into Token Factory[6][5] |
| Multimodal/vision | Clarifai acquisition adds production-grade inference for multimodal and computer-vision workloads[2] |
| Compliance | SOC 2 Type II, HIPAA, ISO 27001; data centers in Finland, France, and the US[1] |
Product Surfaces
| Surface | Description | Availability |
|---|---|---|
| Per-token API | OpenAI-style shared endpoints over the open-model library | GA[1] |
| Dedicated endpoints | Reserved-capacity instances with SLA and regional pinning | GA[1] |
| Fine-tuning | Dashboard/API fine-tune-and-deploy pipeline | GA[1] |
| GPU cloud / clusters | The underlying Nebius AI cloud for training and custom inference | GA[4] |
Technical Architecture
Unlike inference-first startups, Token Factory sits on infrastructure Nebius owns end to end — the NVIDIA partnership spans "AI factory architecture to production software" and targets 5+ GW of NVIDIA systems by end of 2030, with the $2B investment cited by management as supply-chain certainty for GPU allocation.[4][2] The serving layer uses autoscaling, speculative decoding, and multi-region routing to hold sub-second latency targets at scale.[8]
The Eigen AI acquisition is the architecture bet: a 20-person team of MIT HAN Lab alumni whose quantization, KV-cache, and custom CUDA-kernel stack reduces compute and memory per token, integrated directly into Token Factory's inference and post-training layers, with the founding team establishing a Nebius engineering presence in the San Francisco Bay Area.[6][5]
Key Technical Details
| Aspect | Detail |
|---|---|
| Deployment | Managed cloud on Nebius-owned data centers (Finland, France, US); EU/US data residency[1] |
| Models | 60+ open models, new ones onboarded on customer demand; custom fine-tunes supported[1] |
| Optimization | Eigen AI quantization/KV-cache/CUDA kernels; speculative decoding; autoscaling[6][8] |
| Hardware | NVIDIA systems exclusively at partnership scale (5+ GW by 2030); no custom silicon[4] |
| Open Source | Serves open-weight models; platform itself is proprietary[1] |
Strengths
- Vertically integrated economics — Nebius owns the data centers, the power contracts, and the serving layer, so per-token margins are not squeezed by a cloud landlord; the $20–25B capex program is the moat competitors must rent.[2]
- NVIDIA as investor, not just supplier — the $2B strategic investment and full-stack partnership (March 11, 2026) give allocation certainty on next-generation systems through 2030.[4]
- Hyperscale-grade financial visibility — $399M Q1 revenue (+684%), $1.9B AI ARR, positive $130M adjusted EBITDA, and a $27B Meta contract make this the rare inference vendor with audited, public financials.[2]
- Acquired optimization depth — paying $643M for 20-person Eigen AI signals inference efficiency is the strategic priority; Clarifai and Tavily round out multimodal inference and agentic search.[6][2]
- Enterprise compliance posture — SOC 2 Type II, HIPAA, and ISO 27001 with EU data residency, a differentiator for European buyers that US-only rivals lack.[1]
Cautions
- Inference is the side bet, not the business — Token Factory is the fastest-growing segment, but Nebius's revenue is dominated by raw AI infrastructure; per-token serving could be deprioritized if hyperscale leases keep outgrowing it.[2]
- Customer-concentration risk — a $27B Meta contract anchors the model; the company's fate is tied to a handful of mega-deals, and capex guidance of $20–25B against $3.0–3.4B revenue guidance means heavy ongoing financing needs.[2]
- No published per-token rate card in marketing pages — pricing is "transparent $/token" with volume discounts, but specific model rates require the self-service console, complicating comparison shopping.[1]
- Heritage friction in some procurement processes — the Yandex lineage still surfaces in vendor reviews; one HN commenter reported legal counsel vetoing Nebius during GPU-provider due diligence.[7]
- Quantized serving trade-offs — community reports note FP8-quantized model variants on Nebius's serving stack, which buyers benchmarking against full-precision endpoints should verify per model.[7]
- Open models only — no proprietary frontier models (Claude, GPT, Gemini) are served, so teams needing closed models still need a second provider.[1]
What Developers Say
Community discussion of Nebius on Hacker News through June 2026 is thinner than its financial profile would suggest — there is no major dedicated Token Factory thread, and most mentions are stock chatter or passing infrastructure references; that gap between Wall Street attention and developer mindshare is itself a data point.[7]
"We do both: managed Kubernetes when it's available (AWS, Nebius, others)" — reissbaker on HN, building a GPU cloud, April 2026[7]
"Nebius AI Studio" runs FP8 quantized model versions; "my experience with FP8 output has been pretty decent" — KronisLV on HN, using GLM with Claude Code[7]
"When we were doing a deep-dive into cloud GPU providers, legal counsel veto'd them for this reason." — chias on HN, on Nebius's Yandex heritage during a European-infrastructure evaluation[7]
The pattern: practitioners who use Nebius treat it as a competent GPU/inference utility, while the criticism that does surface is about provenance and procurement comfort rather than product quality.[7]
Pricing & Licensing
| Tier | Price | Includes |
|---|---|---|
| Shared endpoints (Fast) | Per-token, input/output separated | Sub-second responses for interactive agents and chat[1] |
| Shared endpoints (Base) | Per-token, lower rates | Cost-efficient throughput for batch/background processing[1] |
| Dedicated endpoints | Reserved capacity, custom | 99.9% uptime SLA, custom autoscaling, EU/US regional deployment, isolation[1] |
| Enterprise | Custom | Dedicated Slack/support channels, custom DPAs, SSO, RBAC, unified billing[1] |
Per-model $/token rates are published in the self-service console rather than the marketing site; volume discounts apply as usage scales, with no infrastructure or idle-GPU charges on shared endpoints.[1]
Licensing model: Proprietary managed platform serving open-weight models; fine-tuned custom models deploy under the same per-token billing.[1]
Hidden costs: Dedicated endpoints carry reserved-capacity commitments; quantized (e.g., FP8) variants may differ from reference model quality, which is a cost-quality trade rather than a line item.[1][7]
Competitive Positioning
Direct Competitors
| Competitor | Differentiation |
|---|---|
| Baseten | Baseten (~$600M ARR, reported $11B valuation talks) leads on custom-model deployment, Truss tooling, and forward-deployed engineers; Nebius counters with owned data centers, public-company financials, and EU residency |
| Together AI | Together (~$1B annualized revenue) pairs inference with research credibility (FlashAttention lineage) and GPU clusters; Nebius matches the full-stack scope but at hyperscale capex with NVIDIA equity backing |
| Fireworks AI | Fireworks competes on serving-speed optimization; Nebius answers with the acquired Eigen AI stack plus vertically integrated capacity |
| DeepInfra | DeepInfra is the budget per-token option; Nebius targets enterprise SLAs, compliance, and dedicated capacity rather than the lowest sticker price |
| Hyperscalers (AWS/Azure/GCP) | Broader clouds with proprietary-model access; Nebius is AI-only, cheaper at GPU scale, and now a Meta/Microsoft supplier itself |
When to Choose Nebius Over Alternatives
- Choose Nebius Token Factory when: you serve open models at volume, want per-token pricing backed by an owner-operator with a 99.9% SLA, need EU data residency or SOC 2/HIPAA/ISO 27001, or want one vendor spanning GPU clusters to managed inference.
- Choose Baseten when: custom-model deployment workflows, compliance plus hands-on engineering support, and broad GPU selection matter more than owning-the-stack economics.
- Choose Together AI when: you want research-driven serving optimizations and a training-to-inference platform from one venture-backed vendor.
- Choose Groq/Cerebras when: custom-silicon latency is the deciding factor.
Ideal Customer Profile
Best fit:
- Enterprises standardizing on open-weight models (DeepSeek, Qwen, Llama) that want a contractual SLA and audited public-company counterparty rather than a startup
- European organizations needing EU data residency with SOC 2 Type II/HIPAA/ISO 27001 inference[1]
- Teams scaling from per-token APIs to dedicated endpoints to GPU clusters without switching vendors
- High-volume batch workloads that can exploit the Base tier's throughput pricing[1]
Poor fit:
- Teams that need proprietary frontier models (Claude, GPT, Gemini) from the same endpoint
- Buyers whose procurement or legal teams balk at the Yandex heritage[7]
- Small projects wanting the absolute lowest per-token price over SLAs and compliance
Viability Assessment
| Factor | Assessment |
|---|---|
| Financial Health | Strong but capital-hungry — $1.9B AI ARR, positive $130M adjusted EBITDA, $621M Q1 net income (incl. non-cash ClickHouse gain), against $20–25B capex guidance[2] |
| Market Position | Top-tier neocloud — NVIDIA equity backing, Meta ($27B) and Microsoft contracts; Token Factory is a challenger, not yet leader, in managed inference[4][2] |
| Innovation Pace | High — Token Factory launch, then Eigen AI ($643M), Clarifai, and Tavily acquisitions within roughly two quarters[8][6][2] |
| Community/Ecosystem | Weakest dimension — Hugging Face collaboration helps, but developer mindshare trails Baseten/Together and HN discussion is sparse[8][7] |
| Long-term Outlook | Favorable if mega-contract demand holds; the market priced both catalysts in, with shares up ~16% on the NVIDIA deal and ~12% on Eigen AI[4][9] |
The viability question inverts the usual startup calculus: Nebius will not run out of money quietly — it is a public company with NVIDIA on the cap table and $27B of contracted Meta demand — but Token Factory's fate depends on whether management keeps investing in the high-margin serving layer or lets the infrastructure mega-deals consume all strategic attention.[2][4]
Bottom Line
Nebius is the public-market heavyweight entering managed inference from below: it owns the GPUs, the data centers, and now (via Eigen AI) a credible optimization stack, and it sells all of that through Token Factory at per-token prices with enterprise SLAs and EU residency.[1][6] For buyers, the trade is financial durability and integration depth in exchange for a thinner developer ecosystem and an open-models-only catalog.
Recommended for: Enterprises serving open-weight models at volume that value an audited, NVIDIA-backed counterparty, compliance certifications, EU residency, and a path from per-token APIs to dedicated capacity.[1][4]
Not recommended for: Teams needing closed frontier models, developer-experience-first startups better served by Baseten or Together's tooling and community, or procurement environments where the Yandex heritage is disqualifying.[7]
Outlook: Watch whether Token Factory's "fastest-growing segment" claim gets quantified in coming quarters, how fast Eigen AI's optimizations land in production endpoints, and whether $7–9B ARR guidance survives 2026 — execution there would make Nebius the first inference platform with hyperscaler economics.[2][5]
Research by Ry Walker Research • methodology
Sources
- [1] Nebius Token Factory
- [2] BigGo Finance: NBIS Q1 2026 Earnings Call Summary
- [3] Motley Fool: Nebius (NBIS) Q1 2026 Earnings Transcript
- [4] NVIDIA Newsroom: NVIDIA and Nebius Partner to Scale Full-Stack AI Cloud
- [5] Nebius Newsroom: Agreement to Acquire Eigen AI
- [6] Techzine: Nebius acquires Eigen AI for $643 million
- [7] Nebius mentions on Hacker News (Algolia search)
- [8] Nebius Newsroom: Token Factory Launch
- [9] TradingView: Nebius shares jump 12% on Eigen AI deal