Key takeaways
- $5B valuation with $585M raised — Series E led by CapitalG, IVP, and NVIDIA
- Truss open-source framework simplifies custom model deployment across GPU types from T4 to B200
- SOC 2 Type II and HIPAA compliant — one of few inference platforms with enterprise-grade compliance
- Per-minute GPU billing with volume discounts and forward-deployed engineers for enterprise customers
FAQ
What is Baseten?
An AI inference platform for deploying custom and open-source models with enterprise compliance and broad GPU support.
How much funding has Baseten raised?
$585M total, including a $300M Series E in January 2026 from CapitalG, IVP, and NVIDIA at a $5B valuation.
What GPUs does Baseten support?
T4, L4, A10G, A100, H100, and B200.
Is Baseten HIPAA compliant?
Yes. Baseten is SOC 2 Type II and HIPAA compliant.
Company Overview
Baseten is an AI inference platform that lets teams deploy custom and open-source models to production with enterprise-grade reliability and compliance.[1] Founded to close the gap between training a model and serving it at scale, Baseten supplies the infrastructure layer between model development and production serving.
Baseten has raised $585M in total, including a $300M Series E in January 2026 led by CapitalG (Alphabet's growth fund), IVP, and NVIDIA at a $5B valuation.[2] Customers serving 70M+ end users rely on Baseten for production inference.
What It Does
Baseten provides a managed platform for deploying ML models as API endpoints. Key capabilities:
- Model deployment — Deploy any model (custom PyTorch, TensorFlow, or open-source) as a scalable API
- Truss framework — Open-source model packaging tool that standardizes deployment[3]
- Model APIs — Pre-optimized endpoints for popular open-source models, compatible with the OpenAI client (see the sketch after this list)
- Training — Managed fine-tuning and training infrastructure
- Autoscaling — Automatic scale-to-zero and scale-up based on traffic
- Multi-cloud & self-hosted — Deploy on Baseten's cloud or in your own environment
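Because Model APIs are OpenAI-compatible, the standard `openai` Python client works with a swapped base URL. A minimal sketch; the base URL and model slug here are illustrative assumptions, not documented values:

```python
# Calling an OpenAI-compatible Baseten Model API -- illustrative sketch.
# The base_url and model slug are assumptions; copy the real values
# from your Baseten dashboard.
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["BASETEN_API_KEY"],       # Baseten API key
    base_url="https://inference.baseten.co/v1",  # hypothetical endpoint
)

response = client.chat.completions.create(
    model="llama-3.1-8b-instruct",  # example open-source model slug
    messages=[{"role": "user", "content": "Explain scale-to-zero in one line."}],
)
print(response.choices[0].message.content)
```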
How It Works
1. Package your model using Truss (open-source) or bring a container
2. Deploy to Baseten's GPU fleet — choose from T4, L4, A10G, A100, H100, or B200
3. Optimize with TensorRT-LLM and Baseten's inference engine
4. Scale automatically based on request volume, including scale-to-zero
5. Monitor with built-in observability and logging
Baseten handles GPU orchestration, load balancing, and infrastructure management. The Truss framework provides a standardized way to package models with their dependencies, pre/post-processing logic, and configuration.
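A Truss package is a small directory: a `config.yaml` declaring resources and Python dependencies, plus a `model/model.py` exposing `load()` and `predict()` hooks. Below is a minimal sketch of that entry point, using a Hugging Face sentiment pipeline as a stand-in model; the pipeline choice and response shape are illustrative assumptions:

```python
# model/model.py -- minimal Truss model class (illustrative sketch).
# Truss calls load() once at startup and predict() once per request;
# the sentiment pipeline is a stand-in, not Baseten's own example.
from transformers import pipeline


class Model:
    def __init__(self, **kwargs):
        # kwargs carries Truss context (config, secrets); unused here.
        self._pipeline = None

    def load(self):
        # Runs once when the replica boots: load weights into memory.
        self._pipeline = pipeline("sentiment-analysis")

    def predict(self, model_input: dict) -> dict:
        # Runs per request: model_input is the parsed JSON body.
        return {"prediction": self._pipeline(model_input["text"])}
```

From there, the Truss CLI (`truss push`) uploads the package for Baseten to build, deploy, and scale.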
Pricing
- Per-minute GPU billing — pay only for compute time used
- Volume discounts — negotiated pricing for high-volume customers
- Scale-to-zero — no charges when models aren't receiving traffic
- Forward-deployed engineers — dedicated engineering support for enterprise accounts
| GPU | Approximate hourly rate (billed per minute) |
|---|---|
| T4 | ~$0.30/hour |
| A10G | ~$0.80/hour |
| A100 (40GB) | ~$2.50/hour |
| H100 | ~$4.00/hour |
Pricing varies by commitment and volume.
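To see what per-minute billing with scale-to-zero implies, here is a back-of-envelope sketch using the approximate rates above; the 90-minutes-per-day duty cycle is an assumed workload, not a benchmark:

```python
# Back-of-envelope monthly cost under per-minute billing (sketch).
# Rates mirror the table above; the duty cycle is an assumption.
HOURLY_RATE = {"T4": 0.30, "A10G": 0.80, "A100-40GB": 2.50, "H100": 4.00}

active_minutes_per_day = 90  # scale-to-zero: idle time is unbilled
days_per_month = 30

for gpu, rate in HOURLY_RATE.items():
    per_minute = rate / 60
    monthly = per_minute * active_minutes_per_day * days_per_month
    print(f"{gpu}: ~${monthly:,.2f}/month at {active_minutes_per_day} active min/day")
```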
Strengths
- GPU breadth — Among the widest GPU selections of any managed inference platform (T4 through B200)
- Truss open-source — No vendor lock-in for model packaging
- Compliance — SOC 2 Type II and HIPAA set it apart from most competitors
- Enterprise support — Forward-deployed engineers, not just support tickets
- NVIDIA backing — Strategic investment that can translate into better hardware access
- Scale-to-zero — Cost-efficient for bursty workloads
- Self-hosted option — Deploy in your own VPC for maximum control
Weaknesses / Risks
- Complexity — More setup required than API-first platforms like Replicate or DeepInfra
- Not the cheapest — Premium pricing reflects enterprise features
- Less developer-friendly — Steeper learning curve for simple use cases
- Competition — Together AI and Fireworks AI are competing aggressively on similar turf
- GPU dependency — No custom-silicon differentiation against Groq or Cerebras
Competitive Landscape
vs. Together AI: Together AI offers a broader platform (pre-training, research) with newer hardware (GB200/GB300). Baseten wins on compliance and GPU breadth.
vs. Fireworks AI: Fireworks focuses on speed optimization. Baseten offers broader GPU options and stronger compliance posture.
vs. Modal: Modal is more general-purpose (any Python workload). Baseten is purpose-built for model serving with better model-specific tooling.
vs. Replicate: Replicate is simpler but offers less control. Baseten targets teams with custom models and enterprise requirements.
Ideal User
- Enterprise ML teams deploying custom models with compliance requirements
- Companies needing broad GPU selection for different model sizes
- Teams wanting open-source tooling (Truss) without vendor lock-in
- Organizations requiring HIPAA-compliant AI inference
Bottom Line
Baseten is the enterprise play in AI inference — strong compliance, broad GPU support, and hands-on engineering support. The $5B valuation and NVIDIA backing signal confidence in the custom model deployment market. Best suited for teams that have moved past API wrappers and need production-grade infrastructure for their own models.