Nebius | Ry Walker Research

Key takeaways

NVIDIA invested $2B in Nebius on March 11, 2026 as part of a full-stack AI cloud partnership targeting 5+ gigawatts of NVIDIA systems deployed by end of 2030
Q1 2026 revenue hit $399M (up 684% YoY) with AI ARR at $1.9B; full-year guidance calls for $7–9B ARR exit and $20–25B of capex
Token Factory — per-token managed inference over 60+ open models — is management's stated fastest-growing segment, reinforced by the $643M Eigen AI acquisition and the Clarifai and Tavily deals
A $27B five-year Meta contract ($12B fixed, $15B optional) anchors the infrastructure side, making the inference platform a high-margin layer on top of a contracted neocloud

FAQ

What is Nebius Token Factory?

Token Factory is Nebius's managed AI inference platform — per-token API access to 60+ open-source models plus dedicated endpoints and fine-tuning, running on Nebius's own GPU cloud.

How much does Nebius Token Factory cost?

Per-token pricing with separate input/output rates and volume discounts, in Fast (low-latency) and Base (cost-efficient batch) tiers; exact rates are listed per model on the self-service pricing page.

What models does Token Factory serve?

60+ open models including DeepSeek-V4-Pro, Qwen3-235B, Llama, GPT OSS 120B/20B, GLM-5.1, MiniMax-M2.5, plus embedding and safety models, as of June 2026.

How is Nebius different from Baseten or Together AI?

Nebius is a public company that owns its data centers and sells the full stack from GPU clusters to per-token inference, while Baseten and Together are venture-backed inference-first platforms that largely rent or partner for capacity.

Executive Summary

Nebius (NASDAQ: NBIS) is the Amsterdam-headquartered neocloud — formerly Yandex N.V., which divested its Russian assets and relisted in 2024 — that has layered a managed inference business, Token Factory, on top of its GPU cloud. Token Factory serves 60+ open-source models (DeepSeek-V4-Pro, Qwen3-235B, Llama, GPT OSS, GLM-5.1, MiniMax-M2.5) through per-token APIs, dedicated endpoints with a 99.9% uptime SLA, and fine-tuning pipelines.^[1] Management called Token Factory the company's fastest-growing segment on the Q1 2026 earnings call, naming Revolut and 1X Technologies as recent customers.^[2]^[3]

The scale behind it is unusual for an inference vendor: Q1 2026 revenue of $399M grew 684% year-over-year, AI ARR reached $1.9B (up from $1.25B at end-2025), and NVIDIA made a $2B strategic equity investment on March 11, 2026, in a partnership targeting more than 5 gigawatts of NVIDIA systems deployed by end of 2030.^[2]^[4] Nebius then spent ~$643M to acquire inference-optimization firm Eigen AI (May 1, 2026) and folded in Clarifai and Tavily, assembling a full inference and agentic-deployment stack on top of a $27B five-year Meta infrastructure contract.^[5]^[6]^[2]

Attribute	Value
Company	Nebius Group N.V. (NASDAQ: NBIS), Amsterdam^[4]
Heritage	Spun out of Yandex N.V. after divesting Russian assets; relisted 2024^[7]
Q1 2026 Revenue	$399M (+684% YoY); AI revenue $390M (+841%)^[2]
AI ARR	$1.9B as of Q1 2026; FY2026 guidance $7–9B^[2]
Strategic Backing	NVIDIA $2B equity investment (March 11, 2026)^[4]
Anchor Contracts	Meta $27B/5yr ($12B fixed, $15B optional); Microsoft contract on schedule^[2]

Product Overview

Token Factory is the managed-inference layer of the Nebius cloud: pick an open model, hit an API with transparent $/token pricing, and scale to dedicated endpoints or fine-tuned variants without managing GPUs.^[1] It launched as the production-grade successor to Nebius AI Studio, pitched at companies deploying open-source and custom models with enterprise security, and counts Higgsfield AI as an adopter and Hugging Face as a collaborator for open-model access.^[8]

Two serving configurations cover the latency/cost trade: Fast (sub-second responses for interactive agents and chat) and Base (cost-efficient throughput for large-scale or background processing), switchable on the same API and endpoints.^[1]

Key Capabilities

Capability	Description
Model library	60+ open models — DeepSeek-V4-Pro, Qwen3-235B, Llama, GPT OSS 120B/20B, GLM-5.1, MiniMax-M2.5, plus embeddings and Llama-Guard safety models^[1]
Serving tiers	Fast (sub-second, interactive) and Base (batch/background), same API^[1]
Dedicated endpoints	Reserved capacity, 99.9% uptime SLA, custom autoscaling, EU/US regional deployment, guaranteed isolation^[1]
Fine-tuning	Custom fine-tuned models deploy via dashboard or API at per-token pricing; post-training and distillation forthcoming^[1]
Optimization stack	Eigen AI's post-training quantization, KV-cache optimization, and custom CUDA kernels being integrated directly into Token Factory^[6]^[5]
Multimodal/vision	Clarifai acquisition adds production-grade inference for multimodal and computer-vision workloads^[2]
Compliance	SOC 2 Type II, HIPAA, ISO 27001; data centers in Finland, France, and the US^[1]

Product Surfaces

Surface	Description	Availability
Per-token API	OpenAI-style shared endpoints over the open-model library	GA^[1]
Dedicated endpoints	Reserved-capacity instances with SLA and regional pinning	GA^[1]
Fine-tuning	Dashboard/API fine-tune-and-deploy pipeline	GA^[1]
GPU cloud / clusters	The underlying Nebius AI cloud for training and custom inference	GA^[4]

Technical Architecture

Unlike inference-first startups, Token Factory sits on infrastructure Nebius owns end to end — the NVIDIA partnership spans "AI factory architecture to production software" and targets 5+ GW of NVIDIA systems by end of 2030, with the $2B investment cited by management as supply-chain certainty for GPU allocation.^[4]^[2] The serving layer uses autoscaling, speculative decoding, and multi-region routing to hold sub-second latency targets at scale.^[8]

The Eigen AI acquisition is the architecture bet: a 20-person team of MIT HAN Lab alumni whose quantization, KV-cache, and custom CUDA-kernel stack reduces compute and memory per token, integrated directly into Token Factory's inference and post-training layers, with the founding team establishing a Nebius engineering presence in the San Francisco Bay Area.^[6]^[5]

Key Technical Details

Aspect	Detail
Deployment	Managed cloud on Nebius-owned data centers (Finland, France, US); EU/US data residency^[1]
Models	60+ open models, new ones onboarded on customer demand; custom fine-tunes supported^[1]
Optimization	Eigen AI quantization/KV-cache/CUDA kernels; speculative decoding; autoscaling^[6]^[8]
Hardware	NVIDIA systems exclusively at partnership scale (5+ GW by 2030); no custom silicon^[4]
Open Source	Serves open-weight models; platform itself is proprietary^[1]

Strengths

Vertically integrated economics — Nebius owns the data centers, the power contracts, and the serving layer, so per-token margins are not squeezed by a cloud landlord; the $20–25B capex program is the moat competitors must rent.^[2]
NVIDIA as investor, not just supplier — the $2B strategic investment and full-stack partnership (March 11, 2026) give allocation certainty on next-generation systems through 2030.^[4]
Hyperscale-grade financial visibility — $399M Q1 revenue (+684%), $1.9B AI ARR, positive $130M adjusted EBITDA, and a $27B Meta contract make this the rare inference vendor with audited, public financials.^[2]
Acquired optimization depth — paying $643M for 20-person Eigen AI signals inference efficiency is the strategic priority; Clarifai and Tavily round out multimodal inference and agentic search.^[6]^[2]
Enterprise compliance posture — SOC 2 Type II, HIPAA, and ISO 27001 with EU data residency, a differentiator for European buyers that US-only rivals lack.^[1]

Cautions

Inference is the side bet, not the business — Token Factory is the fastest-growing segment, but Nebius's revenue is dominated by raw AI infrastructure; per-token serving could be deprioritized if hyperscale leases keep outgrowing it.^[2]
Customer-concentration risk — a $27B Meta contract anchors the model; the company's fate is tied to a handful of mega-deals, and capex guidance of $20–25B against $3.0–3.4B revenue guidance means heavy ongoing financing needs.^[2]
No published per-token rate card in marketing pages — pricing is "transparent $/token" with volume discounts, but specific model rates require the self-service console, complicating comparison shopping.^[1]
Heritage friction in some procurement processes — the Yandex lineage still surfaces in vendor reviews; one HN commenter reported legal counsel vetoing Nebius during GPU-provider due diligence.^[7]
Quantized serving trade-offs — community reports note FP8-quantized model variants on Nebius's serving stack, which buyers benchmarking against full-precision endpoints should verify per model.^[7]
Open models only — no proprietary frontier models (Claude, GPT, Gemini) are served, so teams needing closed models still need a second provider.^[1]

What Developers Say

Community discussion of Nebius on Hacker News through June 2026 is thinner than its financial profile would suggest — there is no major dedicated Token Factory thread, and most mentions are stock chatter or passing infrastructure references; that gap between Wall Street attention and developer mindshare is itself a data point.^[7]

"We do both: managed Kubernetes when it's available (AWS, Nebius, others)" — reissbaker on HN, building a GPU cloud, April 2026^[7]

"Nebius AI Studio" runs FP8 quantized model versions; "my experience with FP8 output has been pretty decent" — KronisLV on HN, using GLM with Claude Code^[7]

"When we were doing a deep-dive into cloud GPU providers, legal counsel veto'd them for this reason." — chias on HN, on Nebius's Yandex heritage during a European-infrastructure evaluation^[7]

The pattern: practitioners who use Nebius treat it as a competent GPU/inference utility, while the criticism that does surface is about provenance and procurement comfort rather than product quality.^[7]

Pricing & Licensing

Tier	Price	Includes
Shared endpoints (Fast)	Per-token, input/output separated	Sub-second responses for interactive agents and chat^[1]
Shared endpoints (Base)	Per-token, lower rates	Cost-efficient throughput for batch/background processing^[1]
Dedicated endpoints	Reserved capacity, custom	99.9% uptime SLA, custom autoscaling, EU/US regional deployment, isolation^[1]
Enterprise	Custom	Dedicated Slack/support channels, custom DPAs, SSO, RBAC, unified billing^[1]

Per-model $/token rates are published in the self-service console rather than the marketing site; volume discounts apply as usage scales, with no infrastructure or idle-GPU charges on shared endpoints.^[1]

Licensing model: Proprietary managed platform serving open-weight models; fine-tuned custom models deploy under the same per-token billing.^[1]

Hidden costs: Dedicated endpoints carry reserved-capacity commitments; quantized (e.g., FP8) variants may differ from reference model quality, which is a cost-quality trade rather than a line item.^[1]^[7]

Competitive Positioning

Direct Competitors

Competitor	Differentiation
Baseten	Baseten (~$600M ARR, reported $11B valuation talks) leads on custom-model deployment, Truss tooling, and forward-deployed engineers; Nebius counters with owned data centers, public-company financials, and EU residency
Together AI	Together (~$1B annualized revenue) pairs inference with research credibility (FlashAttention lineage) and GPU clusters; Nebius matches the full-stack scope but at hyperscale capex with NVIDIA equity backing
Fireworks AI	Fireworks competes on serving-speed optimization; Nebius answers with the acquired Eigen AI stack plus vertically integrated capacity
DeepInfra	DeepInfra is the budget per-token option; Nebius targets enterprise SLAs, compliance, and dedicated capacity rather than the lowest sticker price
Hyperscalers (AWS/Azure/GCP)	Broader clouds with proprietary-model access; Nebius is AI-only, cheaper at GPU scale, and now a Meta/Microsoft supplier itself

When to Choose Nebius Over Alternatives

Choose Nebius Token Factory when: you serve open models at volume, want per-token pricing backed by an owner-operator with a 99.9% SLA, need EU data residency or SOC 2/HIPAA/ISO 27001, or want one vendor spanning GPU clusters to managed inference.
Choose Baseten when: custom-model deployment workflows, compliance plus hands-on engineering support, and broad GPU selection matter more than owning-the-stack economics.
Choose Together AI when: you want research-driven serving optimizations and a training-to-inference platform from one venture-backed vendor.
Choose Groq/Cerebras when: custom-silicon latency is the deciding factor.

Ideal Customer Profile

Best fit:

Enterprises standardizing on open-weight models (DeepSeek, Qwen, Llama) that want a contractual SLA and audited public-company counterparty rather than a startup
European organizations needing EU data residency with SOC 2 Type II/HIPAA/ISO 27001 inference^[1]
Teams scaling from per-token APIs to dedicated endpoints to GPU clusters without switching vendors
High-volume batch workloads that can exploit the Base tier's throughput pricing^[1]

Poor fit:

Teams that need proprietary frontier models (Claude, GPT, Gemini) from the same endpoint
Buyers whose procurement or legal teams balk at the Yandex heritage^[7]
Small projects wanting the absolute lowest per-token price over SLAs and compliance

Viability Assessment

Factor	Assessment
Financial Health	Strong but capital-hungry — $1.9B AI ARR, positive $130M adjusted EBITDA, $621M Q1 net income (incl. non-cash ClickHouse gain), against $20–25B capex guidance^[2]
Market Position	Top-tier neocloud — NVIDIA equity backing, Meta ($27B) and Microsoft contracts; Token Factory is a challenger, not yet leader, in managed inference^[4]^[2]
Innovation Pace	High — Token Factory launch, then Eigen AI ($643M), Clarifai, and Tavily acquisitions within roughly two quarters^[8]^[6]^[2]
Community/Ecosystem	Weakest dimension — Hugging Face collaboration helps, but developer mindshare trails Baseten/Together and HN discussion is sparse^[8]^[7]
Long-term Outlook	Favorable if mega-contract demand holds; the market priced both catalysts in, with shares up ~16% on the NVIDIA deal and ~12% on Eigen AI^[4]^[9]

The viability question inverts the usual startup calculus: Nebius will not run out of money quietly — it is a public company with NVIDIA on the cap table and $27B of contracted Meta demand — but Token Factory's fate depends on whether management keeps investing in the high-margin serving layer or lets the infrastructure mega-deals consume all strategic attention.^[2]^[4]

Bottom Line

Nebius is the public-market heavyweight entering managed inference from below: it owns the GPUs, the data centers, and now (via Eigen AI) a credible optimization stack, and it sells all of that through Token Factory at per-token prices with enterprise SLAs and EU residency.^[1]^[6] For buyers, the trade is financial durability and integration depth in exchange for a thinner developer ecosystem and an open-models-only catalog.

Recommended for: Enterprises serving open-weight models at volume that value an audited, NVIDIA-backed counterparty, compliance certifications, EU residency, and a path from per-token APIs to dedicated capacity.^[1]^[4]

Not recommended for: Teams needing closed frontier models, developer-experience-first startups better served by Baseten or Together's tooling and community, or procurement environments where the Yandex heritage is disqualifying.^[7]

Outlook: Watch whether Token Factory's "fastest-growing segment" claim gets quantified in coming quarters, how fast Eigen AI's optimizations land in production endpoints, and whether $7–9B ARR guidance survives 2026 — execution there would make Nebius the first inference platform with hyperscaler economics.^[2]^[5]

Research by Ry Walker Research • methodology

Sources