Key takeaways
- $5B valuation with $585M raised — Series E led by CapitalG, IVP, and NVIDIA
- Truss open-source framework simplifies custom model deployment across GPU types from T4 to B200
- SOC 2 Type II and HIPAA compliant — one of few inference platforms with enterprise-grade compliance
- Per-minute GPU billing with volume discounts and forward-deployed engineers for enterprise customers
FAQ
What is Baseten?
An AI inference platform for deploying custom and open-source models with enterprise compliance and broad GPU support.
How much funding has Baseten raised?
$585M total, including a $300M Series E in January 2026 from CapitalG, IVP, and NVIDIA at a $5B valuation.
What GPUs does Baseten support?
T4, L4, A10G, A100, H100, and B200.
Is Baseten HIPAA compliant?
Yes. Baseten is SOC 2 Type II and HIPAA compliant.
Company Overview
Baseten is an AI inference platform that lets teams deploy custom and open-source models to production with enterprise-grade reliability and compliance.[1] Founded to close the gap between training a model and serving it at scale, Baseten supplies the infrastructure layer between model development and production serving.
Baseten has raised $585M in total, including a $300M Series E in January 2026 led by CapitalG (Alphabet's growth fund), IVP, and NVIDIA at a $5B valuation.[2] Customers serving 70M+ end users rely on Baseten for production inference.
What It Does
Baseten provides a managed platform for deploying ML models as API endpoints. Key capabilities:
- Model deployment — Deploy any model (custom PyTorch, TensorFlow, or open-source) as a scalable API
- Truss framework — Open-source model packaging tool that standardizes deployment[3]
- Model APIs — Pre-optimized endpoints for popular open-source models, compatible with the OpenAI client (see the sketch after this list)
- Training — Managed fine-tuning and training infrastructure
- Autoscaling — Automatic scale-to-zero and scale-up based on traffic
- Multi-cloud & self-hosted — Deploy on Baseten's cloud or in your own environment
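Because Model APIs are OpenAI-compatible, the standard `openai` Python client works with a swapped base URL. A minimal sketch; the base URL and model slug here are illustrative assumptions, not documented values:

```python
# Calling an OpenAI-compatible Baseten Model API -- illustrative sketch.
# The base_url and model slug are assumptions; copy the real values
# from your Baseten dashboard.
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["BASETEN_API_KEY"],       # Baseten API key
    base_url="https://inference.baseten.co/v1",  # hypothetical endpoint
)

response = client.chat.completions.create(
    model="llama-3.1-8b-instruct",  # example open-source model slug
    messages=[{"role": "user", "content": "Explain scale-to-zero in one line."}],
)
print(response.choices[0].message.content)
```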
How It Works
1. Package your model using Truss (open-source) or bring a container
2. Deploy to Baseten's GPU fleet — choose from T4, L4, A10G, A100, H100, or B200
3. Optimize with TensorRT-LLM and Baseten's inference engine
4. Scale automatically based on request volume, including scale-to-zero
5. Monitor with built-in observability and logging
Baseten handles GPU orchestration, load balancing, and infrastructure management. The Truss framework provides a standardized way to package models with their dependencies, pre/post-processing logic, and configuration.
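A Truss package is a small directory: a `config.yaml` declaring resources and Python dependencies, plus a `model/model.py` exposing `load()` and `predict()` hooks. Below is a minimal sketch of that entry point, using a Hugging Face sentiment pipeline as a stand-in model; the pipeline choice and response shape are illustrative assumptions:

```python
# model/model.py -- minimal Truss model class (illustrative sketch).
# Truss calls load() once at startup and predict() once per request;
# the sentiment pipeline is a stand-in, not Baseten's own example.
from transformers import pipeline


class Model:
    def __init__(self, **kwargs):
        # kwargs carries Truss context (config, secrets); unused here.
        self._pipeline = None

    def load(self):
        # Runs once when the replica boots: load weights into memory.
        self._pipeline = pipeline("sentiment-analysis")

    def predict(self, model_input: dict) -> dict:
        # Runs per request: model_input is the parsed JSON body.
        return {"prediction": self._pipeline(model_input["text"])}
```

From there, the Truss CLI (`truss push`) uploads the package for Baseten to build, deploy, and scale.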
Pricing
- Per-minute GPU billing — pay only for compute time used
- Volume discounts — negotiated pricing for high-volume customers
- Scale-to-zero — no charges when models aren't receiving traffic
- Forward-deployed engineers — dedicated engineering support for enterprise accounts
| GPU | Approximate hourly rate (billed per minute) |
|---|---|
| T4 | ~$0.30/hour |
| A10G | ~$0.80/hour |
| A100 (40GB) | ~$2.50/hour |
| H100 | ~$4.00/hour |
Pricing varies by commitment and volume.
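To see what per-minute billing with scale-to-zero implies, here is a back-of-envelope sketch using the approximate rates above; the 90-minutes-per-day duty cycle is an assumed workload, not a benchmark:

```python
# Back-of-envelope monthly cost under per-minute billing (sketch).
# Rates mirror the table above; the duty cycle is an assumption.
HOURLY_RATE = {"T4": 0.30, "A10G": 0.80, "A100-40GB": 2.50, "H100": 4.00}

active_minutes_per_day = 90  # scale-to-zero: idle time is unbilled
days_per_month = 30

for gpu, rate in HOURLY_RATE.items():
    per_minute = rate / 60
    monthly = per_minute * active_minutes_per_day * days_per_month
    print(f"{gpu}: ~${monthly:,.2f}/month at {active_minutes_per_day} active min/day")
```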
Strengths
- GPU breadth — Among the widest GPU selections of any managed inference platform (T4 through B200)
- Truss open-source — No vendor lock-in for model packaging
- Compliance — SOC 2 Type II and HIPAA set it apart from most competitors
- Enterprise support — Forward-deployed engineers, not just support tickets
- NVIDIA backing — Strategic investment that can translate into better hardware access
- Scale-to-zero — Cost-efficient for bursty workloads
- Self-hosted option — Deploy in your own VPC for maximum control
Weaknesses / Risks
- Complexity — More setup required than API-first platforms like Replicate or DeepInfra
- Not the cheapest — Premium pricing reflects enterprise features
- Less developer-friendly — Steeper learning curve for simple use cases
- Competition — Together AI and Fireworks AI are competing aggressively on similar turf
- GPU dependency — No custom-silicon differentiation against Groq or Cerebras
Competitive Landscape
vs. Together AI: Together AI offers a broader platform (pre-training, research) with newer hardware (GB200/GB300). Baseten wins on compliance and GPU breadth.
vs. Fireworks AI: Fireworks focuses on speed optimization. Baseten offers broader GPU options and stronger compliance posture.
vs. Modal: Modal is more general-purpose (any Python workload). Baseten is purpose-built for model serving with better model-specific tooling.
vs. Replicate: Replicate is simpler but offers less control. Baseten targets teams with custom models and enterprise requirements.
Ideal User
- Enterprise ML teams deploying custom models with compliance requirements
- Companies needing broad GPU selection for different model sizes
- Teams wanting open-source tooling (Truss) without vendor lock-in
- Organizations requiring HIPAA-compliant AI inference
Bottom Line
Baseten is the enterprise play in AI inference — strong compliance, broad GPU support, and hands-on engineering support. The $5B valuation and NVIDIA backing signal confidence in the custom model deployment market. Best suited for teams that have moved past API wrappers and need production-grade infrastructure for their own models.