
Baseten

Baseten is an AI inference platform for deploying custom and open-source models at scale with enterprise compliance.

Key takeaways

  • $5B valuation with $585M raised — Series E led by CapitalG, IVP, and NVIDIA
  • Truss open-source framework simplifies custom model deployment across GPU types from T4 to B200
  • SOC 2 Type II and HIPAA compliant — one of few inference platforms with enterprise-grade compliance
  • Per-minute GPU billing with volume discounts and forward deployed engineers for enterprise customers

FAQ

What is Baseten?

An AI inference platform for deploying custom and open-source models with enterprise compliance and broad GPU support.

How much funding has Baseten raised?

$585M total, including a $300M Series E in January 2026 from CapitalG, IVP, and NVIDIA at a $5B valuation.

What GPUs does Baseten support?

T4, L4, A10G, A100, H100, and B200.

Is Baseten HIPAA compliant?

Yes. SOC 2 Type II and HIPAA compliant.

Company Overview

Baseten is an AI inference platform that lets teams deploy custom and open-source models to production with enterprise-grade reliability and compliance.[1] Founded to close the gap between training a model and serving it at scale, Baseten provides the infrastructure layer between model development and production serving.

The company raised $585M total, including a $300M Series E in January 2026 led by CapitalG (Google's investment arm), IVP, and NVIDIA, valuing the company at $5B.[2] Customers serving 70M+ end users rely on Baseten for production inference.

What It Does

Baseten provides a managed platform for deploying ML models as API endpoints. Key capabilities:

  • Model deployment — Deploy any model (custom PyTorch, TensorFlow, or open-source) as a scalable API
  • Truss framework — Open-source model packaging tool that standardizes deployment[3]
  • Model APIs — Pre-optimized endpoints for popular open-source models (OpenAI-compatible)
  • Training — Managed fine-tuning and training infrastructure
  • Autoscaling — Automatic scale-to-zero and scale-up based on traffic
  • Multi-cloud & self-hosted — Deploy on Baseten's cloud or in your own environment
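
Because the Model APIs are OpenAI-compatible, any client that speaks the standard chat-completions format can call them. A minimal sketch of building such a request; the endpoint URL and model slug below are placeholders, not real Baseten values:

```python
import json

# Placeholder endpoint -- a real deployment would use the URL Baseten assigns.
BASE_URL = "https://example.baseten.invalid/v1/chat/completions"

def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-compatible chat-completions request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

payload = build_chat_request("my-llama-deployment", "Say hello")
body = json.dumps(payload)  # POST this to BASE_URL with an API-key header
```

The payload shape is the same one the OpenAI SDKs emit, which is what lets existing client code point at a Baseten endpoint by swapping the base URL.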

How It Works

  1. Package your model using Truss (open-source) or bring a container
  2. Deploy to Baseten's GPU fleet — choose from T4, L4, A10G, A100, H100, or B200
  3. Optimize with TensorRT-LLM and Baseten's inference engine
  4. Scale automatically based on request volume, including scale-to-zero
  5. Monitor with built-in observability and logging
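
Step 1, packaging with Truss, centers on a small Python class with load and predict hooks. A minimal sketch of that shape, using a toy stand-in for real weights (the exact interface is defined by the Truss framework; the uppercase logic here is purely illustrative):

```python
# model/model.py -- shape of a Truss model package (sketch; the toy logic
# below stands in for loading real weights and running real inference).
class Model:
    def __init__(self, **kwargs):
        self._model = None

    def load(self):
        # Called once at startup: load weights, tokenizers, etc. here.
        self._model = lambda text: text.upper()  # stand-in for a real model

    def predict(self, model_input: dict) -> dict:
        # Called per request with the deserialized request body.
        return {"output": self._model(model_input["prompt"])}

m = Model()
m.load()
result = m.predict({"prompt": "hello"})  # → {"output": "HELLO"}
```

Separating load() from predict() lets the platform pay the model-loading cost once per replica rather than once per request.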

Baseten handles GPU orchestration, load balancing, and infrastructure management. The Truss framework provides a standardized way to package models with their dependencies, pre/post-processing logic, and configuration.
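
The dependencies and configuration mentioned above live in a declarative file alongside the model code. A hedged sketch of what such a config can look like; the field names follow Truss's documented schema, but the deployment name, versions, and accelerator choice here are illustrative:

```yaml
# config.yaml -- illustrative Truss configuration (check Truss docs for the current schema)
model_name: sentiment-classifier     # hypothetical deployment name
python_version: py311
requirements:                        # pip dependencies bundled with the model
  - torch
  - transformers
resources:
  accelerator: A10G                  # one of the GPU types Baseten offers
  use_gpu: true
```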

Pricing

  • Per-minute GPU billing — pay only for compute time used
  • Volume discounts — negotiated pricing for high-volume customers
  • Scale-to-zero — no charges when models aren't receiving traffic
  • Forward deployed engineers — dedicated engineering support for enterprise accounts
GPU           Approximate Cost
T4            ~$0.30/hour
A10G          ~$0.80/hour
A100 (40GB)   ~$2.50/hour
H100          ~$4.00/hour

Pricing varies by commitment and volume.
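
Per-minute billing plus scale-to-zero means cost tracks active traffic rather than wall-clock time. A back-of-the-envelope sketch using the approximate rates above:

```python
def monthly_cost(hourly_rate: float, active_minutes_per_day: float, days: int = 30) -> float:
    """Per-minute GPU billing: pay only for minutes the deployment is serving."""
    return (hourly_rate / 60) * active_minutes_per_day * days

# An A10G at ~$0.80/hour, busy 2 hours/day, scaled to zero the rest of the time:
bursty = monthly_cost(0.80, 120)        # ≈ $48/month
# The same GPU billed around the clock:
always_on = monthly_cost(0.80, 24 * 60)  # ≈ $576/month
```

The gap between the two figures is the scale-to-zero saving for bursty workloads; actual rates depend on commitment and volume, per the note above.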

Strengths

  • GPU breadth — One of the widest GPU selections among inference platforms (T4 through B200)
  • Truss open-source — No vendor lock-in for model packaging
  • Compliance — SOC 2 Type II and HIPAA set it apart from most competitors
  • Enterprise support — Forward deployed engineers, not just tickets
  • NVIDIA backing — Strategic investor brings hardware access advantages
  • Scale-to-zero — Cost-efficient for bursty workloads
  • Self-hosted option — Deploy in your own VPC for maximum control

Weaknesses / Risks

  • Complexity — More setup required than API-first platforms like Replicate or DeepInfra
  • Not the cheapest — Premium pricing reflects enterprise features
  • Less developer-friendly — Steeper learning curve for simple use cases
  • Competition — Together AI and Fireworks AI competing aggressively on similar turf
  • GPU dependency — No custom silicon differentiation vs Groq/Cerebras

Competitive Landscape

vs. Together AI: Together AI offers a broader platform (pre-training, research) with newer hardware (GB200/GB300). Baseten wins on compliance and GPU breadth.

vs. Fireworks AI: Fireworks focuses on speed optimization. Baseten offers broader GPU options and stronger compliance posture.

vs. Modal: Modal is more general-purpose (any Python workload). Baseten is purpose-built for model serving with better model-specific tooling.

vs. Replicate: Replicate is simpler but offers less control. Baseten targets teams with custom models and enterprise requirements.

Ideal User

  • Enterprise ML teams deploying custom models with compliance requirements
  • Companies needing broad GPU selection for different model sizes
  • Teams wanting open-source tooling (Truss) without vendor lock-in
  • Organizations requiring HIPAA-compliant AI inference

Bottom Line

Baseten is the enterprise play in AI inference — strong compliance, broad GPU support, and hands-on engineering support. The $5B valuation and NVIDIA backing signal confidence in the custom model deployment market. Best suited for teams that have moved past API wrappers and need production-grade infrastructure for their own models.