Key takeaways
- Research-driven platform from FlashAttention creators — now shipping FlashAttention-4, ATLAS speculative decoding, and Mamba-3 alongside RedPajama and DeepCoder
- Roughly $1B annualized revenue as of early 2026 — up more than 3x from mid-2025 — and reportedly raising $1B at a $7.5B pre-money valuation
- Full-stack AI cloud covering inference, fine-tuning, pre-training, GPU clusters, and Together Code Sandbox (from the CodeSandbox acquisition)
- Current performance claims: 2x faster inference, 60% cost reduction, 90% faster pre-training with the Together Kernel Collection
FAQ
What is Together AI?
An AI cloud platform providing inference, fine-tuning, pre-training, GPU clusters, and code sandboxes with research-grade infrastructure.
Who founded Together AI?
Founded in 2021 by Vipul Ved Prakash, Percy Liang, and Chris Ré — the team behind FlashAttention, RedPajama, Mixture of Agents, and DeepCoder.
What hardware does Together AI use?
NVIDIA Blackwell hardware — on-demand B200 GPU clusters, with GB200 and GB300 NVL72 racks announced for large-scale deployments.
How many models does Together AI support?
A broad catalog of open models — Llama, Qwen, DeepSeek, MiniMax-M3, and more — via OpenAI-compatible APIs.
Company Overview
Together AI positions itself as "The AI Native Cloud" — a full-stack platform for AI inference, fine-tuning, pre-training, and GPU cluster management.[1] What distinguishes Together AI from pure inference providers is its research DNA: the team created FlashAttention (now standard in virtually all transformer training, with FlashAttention-4 published at MLSys) and has contributed the RedPajama datasets — 100T+ tokens powering 500+ models — plus Mixture of Agents, DeepCoder, ATLAS, and Mamba-3 to the open ecosystem.[2]
The business has scaled fast. As of early 2026 the company is at roughly $1B in annualized revenue — up more than 3x from mid-2025 — and was reported in March 2026 to be in talks to raise $1B at a $7.5B pre-money valuation, more than double the $3.3B mark from its $305M Series B in February 2025.[3] Prior backers include NVIDIA, General Catalyst, Kleiner Perkins, and Prosperity7 Ventures (Aramco's venture arm), with $537M raised before the new round.[3] Customer logos as of June 2026 include Cursor, Decagon, Cohere, ElevenLabs, and DeepMind.[1]
What It Does
- Inference — A broad catalog of open models via OpenAI-compatible APIs, including Llama, Qwen, DeepSeek, and MiniMax-M3, plus audio, image, and vision models[4]
- Fine-tuning — Custom model training on Together's infrastructure
- Pre-training — Full model training from scratch on dedicated clusters
- GPU clusters — On-demand NVIDIA Blackwell B200 clusters, with GB200/GB300 NVL72 racks for large-scale workloads[1]
- Together Code Sandbox — Fast, secure code sandboxes for AI apps and agents, built on the December 2024 CodeSandbox acquisition (see Together Code Sandbox)[5]
- Serverless & dedicated — Choose between shared endpoints or reserved capacity
How It Works
Together AI runs on current NVIDIA hardware — on-demand B200 GPU clusters today, with GB200/GB300 NVL72 racks announced for large deployments.[1] The inference stack is optimized with custom kernels from FlashAttention-4 and ThunderKittens research, runtime-learning speculative decoding (ATLAS), and the Together Kernel Collection for training.[2]
For inference, you call OpenAI-compatible API endpoints. For fine-tuning and pre-training, you configure jobs through the API or dashboard. Code execution runs in isolated sandboxes alongside model calls, documented under the platform's code-execution APIs.[4] GPU clusters provide dedicated multi-node hardware for organizations needing guaranteed capacity.
What's New (as of June 2026)
- ISO 27001:2022 certification (June 2026) — enterprise-grade security validation for production AI workloads[6]
- MiniMax-M3 inference (June 2026) — served with KV-block-major sparse attention and a Rust-based multimodal gateway[6]
- Speech-to-text stack (May 2026) — claims the fastest ASR on Artificial Analysis benchmarks[6]
- Coding-agent benchmarks (April 2026) — claims 31% more TPS than TensorRT-LLM, 2x better time-to-first-token at saturation, and 76% lower cost than Claude Opus 4.6 for coding agents[6]
- Together Code Sandbox — the CodeSandbox SDK has matured into a first-party sandbox product for agents[5]
Pricing
- Per-token pricing for inference (varies by model)
- Per-GPU-hour for fine-tuning, training, and on-demand B200 clusters
- Dedicated clusters — custom pricing for reserved hardware
- Free credits available for new accounts
Together AI now claims a 60% cost reduction through workload-specific optimization (up from the earlier "20% lower cost" claim), and 76% lower cost than Claude Opus 4.6 for coding-agent workloads — vendor benchmarks that vary by model and workload.[1]
Strengths
- Research pedigree — FlashAttention-4, ATLAS, and Mamba-3 show the optimization engine is still producing[2]
- Current hardware — On-demand Blackwell B200 clusters, with GB200/GB300 NVL72 for scale
- Full stack — Inference + fine-tuning + pre-training + clusters + code sandboxes under one roof
- Open-source contributions — RedPajama (100T+ tokens, 500+ models), DeepCoder build community trust[2]
- Real traction — ~$1B annualized revenue and marquee customers like Cursor and ElevenLabs[3]
- Performance claims — 2x faster inference, 60% cost reduction, 90% faster pre-training[1]
Weaknesses / Risks
- Premium positioning — Cutting-edge hardware comes at a price
- Compliance still maturing — ISO 27001:2022 landed in June 2026, but HIPAA is not advertised (vs Baseten, Fireworks)[6]
- Crowded market — Competing with well-funded Baseten, Fireworks, and custom silicon players; some developers rank cheaper hosts above it on price-to-performance[7]
- Scope expansion — CodeSandbox/sandbox products and speech-to-text broaden the surface area beyond core inference
- Valuation expectations — A reported $7.5B valuation prices in continued 3x-style growth[3]
What Developers Say
Public developer commentary on Together AI is thinner than its revenue scale would suggest; the most candid thread remains the Hacker News discussion of the CodeSandbox acquisition. There, user ilrwbwrkhv pushed back on the company's self-description: "'Together AI has quickly grown into the leading hosting platform for AI' Lol. Definitely not." — ranking Together "at the bottom of the pile" against Cerebras, DeepInfra, and Hyperbolic on price-to-performance.[7] That is one voice from December 2024, and the company's customer list (Cursor, Decagon, ElevenLabs) suggests production buyers see it differently[1] — but no strong corpus of recent first-hand developer reviews surfaced as of June 2026.
Competitive Landscape
vs. Baseten: Baseten leads on compliance (HIPAA) and GPU breadth. Together AI leads on hardware generation, research, and now scale (~$1B annualized revenue).
vs. Fireworks AI: Fireworks focuses on pure inference speed and reliability. Together AI offers a broader platform including pre-training and code sandboxes.
vs. Groq/Cerebras: Custom silicon is faster for supported models. Together AI offers more model variety and fine-tuning.
vs. Modal: Modal is general-purpose serverless compute. Together AI is purpose-built for AI workloads with optimized serving.
Ideal User
- AI teams wanting current-generation NVIDIA hardware without managing infrastructure
- Organizations needing the full ML lifecycle (train → fine-tune → serve), plus sandboxed code execution for agents
- Research groups that value open-source contributions and community alignment
- Companies running large-scale inference across many model types
Bottom Line
Together AI is the research-meets-production play in AI inference, and the bet is now validated by numbers: roughly $1B annualized revenue, a reported $1B raise at $7.5B, and customers like Cursor and ElevenLabs.[3] The FlashAttention-4/ATLAS pipeline and on-demand Blackwell clusters create genuine technical differentiation, and the CodeSandbox acquisition has matured into a real agent-infrastructure product rather than a distraction.[5]
Recommended for: Teams that want one platform for the entire ML lifecycle — training through serving through agent sandboxes — with research-grade optimization.
Not recommended for: Buyers who shop purely on price-per-token (cheaper hosts exist) or who need HIPAA-grade compliance today.
Outlook: Expect the funding round to close and the platform to keep absorbing adjacent agent infrastructure (sandboxes, speech, coding-agent serving). The main thing to watch is whether premium pricing holds as Blackwell capacity commoditizes.
Sources
- [1] Together AI Website
- [2] Together AI Research
- [3] Investing.com: Together AI reportedly in talks to raise $1 billion at $7.5 billion valuation (Mar 2026)
- [4] Together AI Documentation
- [5] Together AI: CodeSandbox acquisition announcement
- [6] Together AI Blog
- [7] Hacker News: CodeSandbox Acquired by Together AI