← Back to research
·6 min read·product

Together AI

Together AI is the AI Native Cloud — inference, fine-tuning, pre-training, GPU clusters, and code sandboxes — at roughly $1B annualized revenue and reportedly raising $1B at a $7.5B valuation.

Key takeaways

  • Research-driven platform from FlashAttention creators — now shipping FlashAttention-4, ATLAS speculative decoding, and Mamba-3 alongside RedPajama and DeepCoder
  • Roughly $1B annualized revenue as of early 2026 — up more than 3x from mid-2025 — and reportedly raising $1B at a $7.5B pre-money valuation
  • Full-stack AI cloud covering inference, fine-tuning, pre-training, GPU clusters, and Together Code Sandbox (from the CodeSandbox acquisition)
  • Current performance claims: 2x faster inference, 60% cost reduction, 90% faster pre-training with the Together Kernel Collection

FAQ

What is Together AI?

An AI cloud platform providing inference, fine-tuning, pre-training, GPU clusters, and code sandboxes with research-grade infrastructure.

Who founded Together AI?

Founded in 2021 by Vipul Ved Prakash, Percy Liang, and Chris Ré — the team behind FlashAttention, RedPajama, Mixture of Agents, and DeepCoder.

What hardware does Together AI use?

NVIDIA Blackwell hardware — on-demand B200 GPU clusters, with GB200 and GB300 NVL72 racks announced for large-scale deployments.

How many models does Together AI support?

A broad catalog of open models — Llama, Qwen, DeepSeek, MiniMax-M3, and more — via OpenAI-compatible APIs.

Company Overview

Together AI positions itself as "The AI Native Cloud" — a full-stack platform for AI inference, fine-tuning, pre-training, and GPU cluster management.[1] What distinguishes Together AI from pure inference providers is its research DNA: the team created FlashAttention (now standard in virtually all transformer training, with FlashAttention-4 published at MLSys) and has contributed the RedPajama datasets — 100T+ tokens powering 500+ models — plus Mixture of Agents, DeepCoder, ATLAS, and Mamba-3 to the open ecosystem.[2]

The business has scaled fast. As of early 2026 the company is at roughly $1B in annualized revenue — up more than 3x from mid-2025 — and was reported in March 2026 to be in talks to raise $1B at a $7.5B pre-money valuation, more than double the $3.3B mark from its $305M Series B in February 2025.[3] Prior backers include NVIDIA, General Catalyst, Kleiner Perkins, and Prosperity7 Ventures (Aramco's venture arm), with $537M raised before the new round.[3] Customer logos as of June 2026 include Cursor, Decagon, Cohere, ElevenLabs, and DeepMind.[1]

What It Does

  • Inference — A broad catalog of open models via OpenAI-compatible APIs, including Llama, Qwen, DeepSeek, and MiniMax-M3, plus audio, image, and vision models[4]
  • Fine-tuning — Custom model training on Together's infrastructure
  • Pre-training — Full model training from scratch on dedicated clusters
  • GPU clusters — On-demand NVIDIA Blackwell B200 clusters, with GB200/GB300 NVL72 racks for large-scale workloads[1]
  • Together Code Sandbox — Fast, secure code sandboxes for AI apps and agents, built on the December 2024 CodeSandbox acquisition (see Together Code Sandbox)[5]
  • Serverless & dedicated — Choose between shared endpoints or reserved capacity

How It Works

Together AI runs on current NVIDIA hardware — on-demand B200 GPU clusters today, with GB200/GB300 NVL72 racks announced for large deployments.[1] The inference stack is optimized with custom kernels from FlashAttention-4 and ThunderKittens research, runtime-learning speculative decoding (ATLAS), and the Together Kernel Collection for training.[2]

For inference, you call OpenAI-compatible API endpoints. For fine-tuning and pre-training, you configure jobs through the API or dashboard. Code execution runs in isolated sandboxes alongside model calls, documented under the platform's code-execution APIs.[4] GPU clusters provide dedicated multi-node hardware for organizations needing guaranteed capacity.

What's New (as of June 2026)

  • ISO 27001:2022 certification (June 2026) — enterprise-grade security validation for production AI workloads[6]
  • MiniMax-M3 inference (June 2026) — served with KV-block-major sparse attention and a Rust-based multimodal gateway[6]
  • Speech-to-text stack (May 2026) — claims the fastest ASR on Artificial Analysis benchmarks[6]
  • Coding-agent benchmarks (April 2026) — claims 31% more TPS than TensorRT-LLM, 2x better time-to-first-token at saturation, and 76% lower cost than Claude Opus 4.6 for coding agents[6]
  • Together Code Sandbox — the CodeSandbox SDK has matured into a first-party sandbox product for agents[5]

Pricing

  • Per-token pricing for inference (varies by model)
  • Per-GPU-hour for fine-tuning, training, and on-demand B200 clusters
  • Dedicated clusters — custom pricing for reserved hardware
  • Free credits available for new accounts

Together AI now claims a 60% cost reduction through workload-specific optimization (up from the earlier "20% lower cost" claim), and 76% lower cost than Claude Opus 4.6 for coding-agent workloads — vendor benchmarks that vary by model and workload.[1]

Strengths

  • Research pedigree — FlashAttention-4, ATLAS, and Mamba-3 show the optimization engine is still producing[2]
  • Current hardware — On-demand Blackwell B200 clusters, with GB200/GB300 NVL72 for scale
  • Full stack — Inference + fine-tuning + pre-training + clusters + code sandboxes under one roof
  • Open-source contributions — RedPajama (100T+ tokens, 500+ models), DeepCoder build community trust[2]
  • Real traction — ~$1B annualized revenue and marquee customers like Cursor and ElevenLabs[3]
  • Performance claims — 2x faster inference, 60% cost reduction, 90% faster pre-training[1]

Weaknesses / Risks

  • Premium positioning — Cutting-edge hardware comes at a price
  • Compliance still maturing — ISO 27001:2022 landed in June 2026, but HIPAA is not advertised (vs Baseten, Fireworks)[6]
  • Crowded market — Competing with well-funded Baseten, Fireworks, and custom silicon players; some developers rank cheaper hosts above it on price-to-performance[7]
  • Scope expansion — CodeSandbox/sandbox products and speech-to-text broaden the surface area beyond core inference
  • Valuation expectations — A reported $7.5B valuation prices in continued 3x-style growth[3]

What Developers Say

Public developer commentary on Together AI is thinner than its revenue scale would suggest; the most candid thread remains the Hacker News discussion of the CodeSandbox acquisition. There, user ilrwbwrkhv pushed back on the company's self-description: "'Together AI has quickly grown into the leading hosting platform for AI' Lol. Definitely not." — ranking Together "at the bottom of the pile" against Cerebras, DeepInfra, and Hyperbolic on price-to-performance.[7] That is one voice from December 2024, and the company's customer list (Cursor, Decagon, ElevenLabs) suggests production buyers see it differently[1] — but no strong corpus of recent first-hand developer reviews surfaced as of June 2026.

Competitive Landscape

vs. Baseten: Baseten leads on compliance (HIPAA) and GPU breadth. Together AI leads on hardware generation, research, and now scale (~$1B annualized revenue).

vs. Fireworks AI: Fireworks focuses on pure inference speed and reliability. Together AI offers a broader platform including pre-training and code sandboxes.

vs. Groq/Cerebras: Custom silicon is faster for supported models. Together AI offers more model variety and fine-tuning.

vs. Modal: Modal is general-purpose serverless compute. Together AI is purpose-built for AI workloads with optimized serving.

Ideal User

  • AI teams wanting current-generation NVIDIA hardware without managing infrastructure
  • Organizations needing the full ML lifecycle (train → fine-tune → serve), plus sandboxed code execution for agents
  • Research groups that value open-source contributions and community alignment
  • Companies running large-scale inference across many model types

Bottom Line

Together AI is the research-meets-production play in AI inference, and the bet is now validated by numbers: roughly $1B annualized revenue, a reported $1B raise at $7.5B, and customers like Cursor and ElevenLabs.[3] The FlashAttention-4/ATLAS pipeline and on-demand Blackwell clusters create genuine technical differentiation, and the CodeSandbox acquisition has matured into a real agent-infrastructure product rather than a distraction.[5]

Recommended for: Teams that want one platform for the entire ML lifecycle — training through serving through agent sandboxes — with research-grade optimization.

Not recommended for: Buyers who shop purely on price-per-token (cheaper hosts exist) or who need HIPAA-grade compliance today.

Outlook: Expect the funding round to close and the platform to keep absorbing adjacent agent infrastructure (sandboxes, speech, coding-agent serving). The main thing to watch is whether premium pricing holds as Blackwell capacity commoditizes.