← Back to research
·12 min read·product

Nebius

Nebius is the public neocloud (NASDAQ: NBIS) whose Token Factory managed inference platform is its fastest-growing segment — backed by a $2B NVIDIA strategic investment, a $643M Eigen AI acquisition, and a $27B Meta infrastructure contract.

Key takeaways

  • NVIDIA invested $2B in Nebius on March 11, 2026 as part of a full-stack AI cloud partnership targeting 5+ gigawatts of NVIDIA systems deployed by end of 2030
  • Q1 2026 revenue hit $399M (up 684% YoY) with AI ARR at $1.9B; full-year guidance calls for $7–9B ARR exit and $20–25B of capex
  • Token Factory — per-token managed inference over 60+ open models — is management's stated fastest-growing segment, reinforced by the $643M Eigen AI acquisition and the Clarifai and Tavily deals
  • A $27B five-year Meta contract ($12B fixed, $15B optional) anchors the infrastructure side, making the inference platform a high-margin layer on top of a contracted neocloud

FAQ

What is Nebius Token Factory?

Token Factory is Nebius's managed AI inference platform — per-token API access to 60+ open-source models plus dedicated endpoints and fine-tuning, running on Nebius's own GPU cloud.

How much does Nebius Token Factory cost?

Per-token pricing with separate input/output rates and volume discounts, in Fast (low-latency) and Base (cost-efficient batch) tiers; exact rates are listed per model on the self-service pricing page.

What models does Token Factory serve?

60+ open models including DeepSeek-V4-Pro, Qwen3-235B, Llama, GPT OSS 120B/20B, GLM-5.1, MiniMax-M2.5, plus embedding and safety models, as of June 2026.

How is Nebius different from Baseten or Together AI?

Nebius is a public company that owns its data centers and sells the full stack from GPU clusters to per-token inference, while Baseten and Together are venture-backed inference-first platforms that largely rent or partner for capacity.

Executive Summary

Nebius (NASDAQ: NBIS) is the Amsterdam-headquartered neocloud — formerly Yandex N.V., which divested its Russian assets and relisted in 2024 — that has layered a managed inference business, Token Factory, on top of its GPU cloud. Token Factory serves 60+ open-source models (DeepSeek-V4-Pro, Qwen3-235B, Llama, GPT OSS, GLM-5.1, MiniMax-M2.5) through per-token APIs, dedicated endpoints with a 99.9% uptime SLA, and fine-tuning pipelines.[1] Management called Token Factory the company's fastest-growing segment on the Q1 2026 earnings call, naming Revolut and 1X Technologies as recent customers.[2][3]

The scale behind it is unusual for an inference vendor: Q1 2026 revenue of $399M grew 684% year-over-year, AI ARR reached $1.9B (up from $1.25B at end-2025), and NVIDIA made a $2B strategic equity investment on March 11, 2026, in a partnership targeting more than 5 gigawatts of NVIDIA systems deployed by end of 2030.[2][4] Nebius then spent ~$643M to acquire inference-optimization firm Eigen AI (May 1, 2026) and folded in Clarifai and Tavily, assembling a full inference and agentic-deployment stack on top of a $27B five-year Meta infrastructure contract.[5][6][2]

AttributeValue
CompanyNebius Group N.V. (NASDAQ: NBIS), Amsterdam[4]
HeritageSpun out of Yandex N.V. after divesting Russian assets; relisted 2024[7]
Q1 2026 Revenue$399M (+684% YoY); AI revenue $390M (+841%)[2]
AI ARR$1.9B as of Q1 2026; FY2026 guidance $7–9B[2]
Strategic BackingNVIDIA $2B equity investment (March 11, 2026)[4]
Anchor ContractsMeta $27B/5yr ($12B fixed, $15B optional); Microsoft contract on schedule[2]

Product Overview

Token Factory is the managed-inference layer of the Nebius cloud: pick an open model, hit an API with transparent $/token pricing, and scale to dedicated endpoints or fine-tuned variants without managing GPUs.[1] It launched as the production-grade successor to Nebius AI Studio, pitched at companies deploying open-source and custom models with enterprise security, and counts Higgsfield AI as an adopter and Hugging Face as a collaborator for open-model access.[8]

Two serving configurations cover the latency/cost trade: Fast (sub-second responses for interactive agents and chat) and Base (cost-efficient throughput for large-scale or background processing), switchable on the same API and endpoints.[1]

Key Capabilities

CapabilityDescription
Model library60+ open models — DeepSeek-V4-Pro, Qwen3-235B, Llama, GPT OSS 120B/20B, GLM-5.1, MiniMax-M2.5, plus embeddings and Llama-Guard safety models[1]
Serving tiersFast (sub-second, interactive) and Base (batch/background), same API[1]
Dedicated endpointsReserved capacity, 99.9% uptime SLA, custom autoscaling, EU/US regional deployment, guaranteed isolation[1]
Fine-tuningCustom fine-tuned models deploy via dashboard or API at per-token pricing; post-training and distillation forthcoming[1]
Optimization stackEigen AI's post-training quantization, KV-cache optimization, and custom CUDA kernels being integrated directly into Token Factory[6][5]
Multimodal/visionClarifai acquisition adds production-grade inference for multimodal and computer-vision workloads[2]
ComplianceSOC 2 Type II, HIPAA, ISO 27001; data centers in Finland, France, and the US[1]

Product Surfaces

SurfaceDescriptionAvailability
Per-token APIOpenAI-style shared endpoints over the open-model libraryGA[1]
Dedicated endpointsReserved-capacity instances with SLA and regional pinningGA[1]
Fine-tuningDashboard/API fine-tune-and-deploy pipelineGA[1]
GPU cloud / clustersThe underlying Nebius AI cloud for training and custom inferenceGA[4]

Technical Architecture

Unlike inference-first startups, Token Factory sits on infrastructure Nebius owns end to end — the NVIDIA partnership spans "AI factory architecture to production software" and targets 5+ GW of NVIDIA systems by end of 2030, with the $2B investment cited by management as supply-chain certainty for GPU allocation.[4][2] The serving layer uses autoscaling, speculative decoding, and multi-region routing to hold sub-second latency targets at scale.[8]

The Eigen AI acquisition is the architecture bet: a 20-person team of MIT HAN Lab alumni whose quantization, KV-cache, and custom CUDA-kernel stack reduces compute and memory per token, integrated directly into Token Factory's inference and post-training layers, with the founding team establishing a Nebius engineering presence in the San Francisco Bay Area.[6][5]

Key Technical Details

AspectDetail
DeploymentManaged cloud on Nebius-owned data centers (Finland, France, US); EU/US data residency[1]
Models60+ open models, new ones onboarded on customer demand; custom fine-tunes supported[1]
OptimizationEigen AI quantization/KV-cache/CUDA kernels; speculative decoding; autoscaling[6][8]
HardwareNVIDIA systems exclusively at partnership scale (5+ GW by 2030); no custom silicon[4]
Open SourceServes open-weight models; platform itself is proprietary[1]

Strengths

  • Vertically integrated economics — Nebius owns the data centers, the power contracts, and the serving layer, so per-token margins are not squeezed by a cloud landlord; the $20–25B capex program is the moat competitors must rent.[2]
  • NVIDIA as investor, not just supplier — the $2B strategic investment and full-stack partnership (March 11, 2026) give allocation certainty on next-generation systems through 2030.[4]
  • Hyperscale-grade financial visibility — $399M Q1 revenue (+684%), $1.9B AI ARR, positive $130M adjusted EBITDA, and a $27B Meta contract make this the rare inference vendor with audited, public financials.[2]
  • Acquired optimization depth — paying $643M for 20-person Eigen AI signals inference efficiency is the strategic priority; Clarifai and Tavily round out multimodal inference and agentic search.[6][2]
  • Enterprise compliance posture — SOC 2 Type II, HIPAA, and ISO 27001 with EU data residency, a differentiator for European buyers that US-only rivals lack.[1]

Cautions

  • Inference is the side bet, not the business — Token Factory is the fastest-growing segment, but Nebius's revenue is dominated by raw AI infrastructure; per-token serving could be deprioritized if hyperscale leases keep outgrowing it.[2]
  • Customer-concentration risk — a $27B Meta contract anchors the model; the company's fate is tied to a handful of mega-deals, and capex guidance of $20–25B against $3.0–3.4B revenue guidance means heavy ongoing financing needs.[2]
  • No published per-token rate card in marketing pages — pricing is "transparent $/token" with volume discounts, but specific model rates require the self-service console, complicating comparison shopping.[1]
  • Heritage friction in some procurement processes — the Yandex lineage still surfaces in vendor reviews; one HN commenter reported legal counsel vetoing Nebius during GPU-provider due diligence.[7]
  • Quantized serving trade-offs — community reports note FP8-quantized model variants on Nebius's serving stack, which buyers benchmarking against full-precision endpoints should verify per model.[7]
  • Open models only — no proprietary frontier models (Claude, GPT, Gemini) are served, so teams needing closed models still need a second provider.[1]

What Developers Say

Community discussion of Nebius on Hacker News through June 2026 is thinner than its financial profile would suggest — there is no major dedicated Token Factory thread, and most mentions are stock chatter or passing infrastructure references; that gap between Wall Street attention and developer mindshare is itself a data point.[7]

"We do both: managed Kubernetes when it's available (AWS, Nebius, others)" — reissbaker on HN, building a GPU cloud, April 2026[7]

"Nebius AI Studio" runs FP8 quantized model versions; "my experience with FP8 output has been pretty decent" — KronisLV on HN, using GLM with Claude Code[7]

"When we were doing a deep-dive into cloud GPU providers, legal counsel veto'd them for this reason." — chias on HN, on Nebius's Yandex heritage during a European-infrastructure evaluation[7]

The pattern: practitioners who use Nebius treat it as a competent GPU/inference utility, while the criticism that does surface is about provenance and procurement comfort rather than product quality.[7]


Pricing & Licensing

TierPriceIncludes
Shared endpoints (Fast)Per-token, input/output separatedSub-second responses for interactive agents and chat[1]
Shared endpoints (Base)Per-token, lower ratesCost-efficient throughput for batch/background processing[1]
Dedicated endpointsReserved capacity, custom99.9% uptime SLA, custom autoscaling, EU/US regional deployment, isolation[1]
EnterpriseCustomDedicated Slack/support channels, custom DPAs, SSO, RBAC, unified billing[1]

Per-model $/token rates are published in the self-service console rather than the marketing site; volume discounts apply as usage scales, with no infrastructure or idle-GPU charges on shared endpoints.[1]

Licensing model: Proprietary managed platform serving open-weight models; fine-tuned custom models deploy under the same per-token billing.[1]

Hidden costs: Dedicated endpoints carry reserved-capacity commitments; quantized (e.g., FP8) variants may differ from reference model quality, which is a cost-quality trade rather than a line item.[1][7]


Competitive Positioning

Direct Competitors

CompetitorDifferentiation
BasetenBaseten (~$600M ARR, reported $11B valuation talks) leads on custom-model deployment, Truss tooling, and forward-deployed engineers; Nebius counters with owned data centers, public-company financials, and EU residency
Together AITogether (~$1B annualized revenue) pairs inference with research credibility (FlashAttention lineage) and GPU clusters; Nebius matches the full-stack scope but at hyperscale capex with NVIDIA equity backing
Fireworks AIFireworks competes on serving-speed optimization; Nebius answers with the acquired Eigen AI stack plus vertically integrated capacity
DeepInfraDeepInfra is the budget per-token option; Nebius targets enterprise SLAs, compliance, and dedicated capacity rather than the lowest sticker price
Hyperscalers (AWS/Azure/GCP)Broader clouds with proprietary-model access; Nebius is AI-only, cheaper at GPU scale, and now a Meta/Microsoft supplier itself

When to Choose Nebius Over Alternatives

  • Choose Nebius Token Factory when: you serve open models at volume, want per-token pricing backed by an owner-operator with a 99.9% SLA, need EU data residency or SOC 2/HIPAA/ISO 27001, or want one vendor spanning GPU clusters to managed inference.
  • Choose Baseten when: custom-model deployment workflows, compliance plus hands-on engineering support, and broad GPU selection matter more than owning-the-stack economics.
  • Choose Together AI when: you want research-driven serving optimizations and a training-to-inference platform from one venture-backed vendor.
  • Choose Groq/Cerebras when: custom-silicon latency is the deciding factor.

Ideal Customer Profile

Best fit:

  • Enterprises standardizing on open-weight models (DeepSeek, Qwen, Llama) that want a contractual SLA and audited public-company counterparty rather than a startup
  • European organizations needing EU data residency with SOC 2 Type II/HIPAA/ISO 27001 inference[1]
  • Teams scaling from per-token APIs to dedicated endpoints to GPU clusters without switching vendors
  • High-volume batch workloads that can exploit the Base tier's throughput pricing[1]

Poor fit:

  • Teams that need proprietary frontier models (Claude, GPT, Gemini) from the same endpoint
  • Buyers whose procurement or legal teams balk at the Yandex heritage[7]
  • Small projects wanting the absolute lowest per-token price over SLAs and compliance

Viability Assessment

FactorAssessment
Financial HealthStrong but capital-hungry — $1.9B AI ARR, positive $130M adjusted EBITDA, $621M Q1 net income (incl. non-cash ClickHouse gain), against $20–25B capex guidance[2]
Market PositionTop-tier neocloud — NVIDIA equity backing, Meta ($27B) and Microsoft contracts; Token Factory is a challenger, not yet leader, in managed inference[4][2]
Innovation PaceHigh — Token Factory launch, then Eigen AI ($643M), Clarifai, and Tavily acquisitions within roughly two quarters[8][6][2]
Community/EcosystemWeakest dimension — Hugging Face collaboration helps, but developer mindshare trails Baseten/Together and HN discussion is sparse[8][7]
Long-term OutlookFavorable if mega-contract demand holds; the market priced both catalysts in, with shares up ~16% on the NVIDIA deal and ~12% on Eigen AI[4][9]

The viability question inverts the usual startup calculus: Nebius will not run out of money quietly — it is a public company with NVIDIA on the cap table and $27B of contracted Meta demand — but Token Factory's fate depends on whether management keeps investing in the high-margin serving layer or lets the infrastructure mega-deals consume all strategic attention.[2][4]


Bottom Line

Nebius is the public-market heavyweight entering managed inference from below: it owns the GPUs, the data centers, and now (via Eigen AI) a credible optimization stack, and it sells all of that through Token Factory at per-token prices with enterprise SLAs and EU residency.[1][6] For buyers, the trade is financial durability and integration depth in exchange for a thinner developer ecosystem and an open-models-only catalog.

Recommended for: Enterprises serving open-weight models at volume that value an audited, NVIDIA-backed counterparty, compliance certifications, EU residency, and a path from per-token APIs to dedicated capacity.[1][4]

Not recommended for: Teams needing closed frontier models, developer-experience-first startups better served by Baseten or Together's tooling and community, or procurement environments where the Yandex heritage is disqualifying.[7]

Outlook: Watch whether Token Factory's "fastest-growing segment" claim gets quantified in coming quarters, how fast Eigen AI's optimizations land in production endpoints, and whether $7–9B ARR guidance survives 2026 — execution there would make Nebius the first inference platform with hyperscaler economics.[2][5]


Research by Ry Walker Research • methodology