
AI Agent Sandboxes

A comparison of 19 leading sandbox and infrastructure platforms for AI agents — E2B, Daytona, Modal, Sprites, Vercel Sandbox, Cloudflare Sandbox SDK, AWS AgentCore Code Interpreter, Google Agent Sandbox, NVIDIA OpenShell, Blaxel, Northflank, OpenSandbox, Runloop, CodeSandbox SDK, Microsandbox, AIO Sandbox, ComputeSDK, Quilt, and Zeroboot.

Key takeaways

  • Every big cloud now has an agent sandbox: AWS AgentCore, Google Agent Sandbox, Cloudflare Sandbox SDK, Vercel Sandbox, and NVIDIA OpenShell all arrived or matured within one quarter
  • The ephemeral-vs-persistent debate is over — E2B shipped pause/resume with full memory state while Sprites, Daytona, and Vercel made persistence default; checkpointing is becoming table stakes
  • The money arrived: Modal raised $355M at $4.65B with ~$300M annualized revenue; Daytona raised $24M and hit a $1M run rate in under 3 months
  • The category got its first public security stress-test — researchers showed DNS exfiltration and credential-extraction paths in AWS's code interpreter sandboxes

FAQ

What's the best sandbox for AI agents?

E2B for scale (1B+ sandboxes started), Daytona for open source + Computer Use, Sprites for persistent state with instant checkpoints, Modal for GPU workloads, Vercel Sandbox if you're on Vercel, and the hyperscaler options (AWS AgentCore, Google Agent Sandbox) if you're committed to one cloud.

Should I use ephemeral or persistent sandboxes?

The distinction is collapsing. E2B (historically ephemeral) now supports pause/resume with memory state; Sprites, Daytona, Vercel Sandbox, and Blaxel are persistent-first with checkpoints or hibernation. Choose based on security posture and pricing model, not the ephemeral/persistent label.

Which sandbox supports GPUs?

Modal leads for GPU workloads (T4 through B200, per-second billing). Daytona added GPU sandboxes (H100 $3.95/hr) and Northflank offers H100s at $2.74/hr with VPC deployment.

What's the cheapest option?

Most platforms have free tiers or credits ($30-200). For production, compare billing models: Vercel bills only active CPU, Sprites and Blaxel bill nothing while idle/hibernated, E2B and Modal bill per-second. Self-hosted open source (OpenSandbox, Microsandbox, AIO Sandbox) is free plus your infrastructure.

Executive Summary

A new infrastructure category has matured fast: sandbox platforms for AI agent code execution. These platforms solve the problem of "where should AI-generated code run?" with isolated, secure environments agents can use to execute arbitrary code. Since our March report the category changed shape: every big cloud entered, the money arrived, and the ephemeral-vs-persistent debate effectively ended.

Key Findings:

  • Big cloud arrived in force: Vercel Sandbox (GA January), Cloudflare Sandbox SDK (plus isolate-based Dynamic Worker Loader, "100x faster than containers"), AWS AgentCore Code Interpreter, Google Agent Sandbox (April), and NVIDIA's OpenShell policy runtime (GTC, 7K stars)[1][2]
  • The money arrived: Modal raised $355M at $4.65B with ~$300M annualized revenue; Daytona raised a $24M Series A (FirstMark, with Datadog and Figma Ventures) and hit a $1M run rate in under 3 months[3][4]
  • E2B is no longer ephemeral-only — 1B+ sandboxes started, 94% of Fortune 100, and pause/resume that preserves full memory state plus snapshots[5]
  • Persistence converged: Sprites checkpoints dropped to ~300ms, Daytona shipped mid-execution snapshots and forking, Vercel made persistence the default, and Blaxel built its whole model on hibernation[6][7]
  • First security stress-test — researchers demonstrated DNS-exfiltration and credential-extraction paths in AWS's code interpreter sandboxes; AWS called the network behavior intended functionality and published hardening guidance[8]
  • The long tail is churning: Quilt went dormant (site offline, 15 stars), Zeroboot stalled as a prototype after its viral Show HN, and CodeSandbox SDK is mid-rebrand to Together Code Sandbox
  • Open source consolidated upward: OpenSandbox (now org-independent, CNCF Landscape) hit 11.5K stars and shipped persistence + Firecracker/Kata isolation; Daytona reached ~72.5K stars

Strategic Planning Assumptions:

  • By 2027, checkpoint/restore will become table stakes across all platforms. Confirmed early: E2B, Daytona, Sprites, Vercel, Runloop, Microsandbox, and CodeSandbox all ship some form of snapshot/checkpoint as of June 2026
  • By 2028, Computer Use (browser/desktop) will be a standard feature, not a differentiator
  • By 2027, hyperscaler bundling (AWS, Google, Cloudflare, Vercel) will force independent platforms to compete on multi-cloud neutrality, price, or isolation depth

Market Definition

AI agent sandboxes are isolated execution environments designed for running AI-generated or arbitrary code safely. They provide:

  • Isolation — Code runs in VMs or containers, separated from host systems
  • APIs/SDKs — Programmatic creation and management
  • Security — Protection against malicious or buggy code
  • Scalability — Ability to run thousands of concurrent sandboxes

Key distinction from cloud compute: These are purpose-built for agent use cases, not general application hosting. They optimize for fast creation, easy cleanup, and developer-friendly SDKs.

Membership note: Vercel AI Gateway, included in the March edition for infrastructure completeness, is a model-routing proxy rather than an execution sandbox — it now lives in our AI Inference Platforms comparison. Its slot here goes to Vercel's actual sandbox product. Raw isolation technologies (Firecracker, gVisor, Unikraft) are covered separately in Container & VM Runtimes.


Comparison Matrix

| Platform | Model | Creation | Persist | Checkpoints | GPU | Isolation |
| --- | --- | --- | --- | --- | --- | --- |
| AgentCore | Managed sessions | Not published | Session ≤8h | | | microVM |
| AIO Sandbox | All-in-one | ~seconds | Container-lifetime | | | Docker |
| Blaxel | Perpetual/hibernating | Sub-25ms resume | ✅ (hibernate) | | | microVM |
| Cloudflare | Containers + isolates | ~seconds / ms (isolates) | ✅ (DO state) | | | Containers/V8 |
| CodeSandbox | Persistent | 2.7s P95 | | | | microVMs |
| ComputeSDK | Abstraction | Varies | Varies | Varies | Varies | Varies |
| Daytona | Persistent | 90ms | | ✅ (snapshots, fork) | ✅ (H100) | Docker |
| E2B | Ephemeral-first | ~150ms | ✅ (pause/resume) | ✅ (snapshots) | | Firecracker |
| Google Agent Sandbox | Managed (preview) | Sub-second | TTL ≤14 days | | | Hardened containers |
| Microsandbox | Local-first | ~320ms | | ✅ (fork/restore) | | libkrun microVMs |
| Modal | Serverless | Sub-second | Volumes | ✅ (memory snapshots) | ✅ (T4–B200) | gVisor |
| Northflank | Both | ~200ms | | | | microVMs/gVisor |
| OpenSandbox | Self-hosted | Pool pre-warm | ✅ (PVC, volumes) | ✅ (pause/resume) | | gVisor/Kata/Firecracker |
| OpenShell | Policy runtime | Not benchmarked | ✅ (long-lived) | | | Landlock/seccomp/OPA |
| Quilt | Ephemeral (dormant) | ~200ms | | | | Namespaces |
| Runloop | Persistent | ~100ms exec | | ✅ (snapshot, branch) | | Custom hypervisor |
| Sprites | Persistent | 1-2s | | ✅ (~300ms) | | Firecracker |
| Vercel Sandbox | Persistent-default | Milliseconds | | ✅ (snapshots) | | Firecracker |
| Zeroboot | Ephemeral (stalled) | 0.79ms | | | | Firecracker CoW |

Daytona also supports Computer Use (Linux/Windows/macOS/Android virtual desktops); Google Agent Sandbox includes browser computer-use.

Status Check

| Platform | Status as of June 2026 |
| --- | --- |
| Quilt | Dormant: quilt.sh offline (deployment disabled), no commits since February, cloud never launched[9] |
| Zeroboot | Stalled prototype: no commits since March 21 despite 2.4K stars and a 311-point Show HN; demo API still live[10] |
| CodeSandbox SDK | Mid-rebrand to Together Code Sandbox under Together AI; actively maintained, pricing cut post-acquisition[11] |
| Microsandbox | Company rebranded: Zerocore AI → Super Rad Company; cloud in closed beta |
| OpenSandbox | Moved orgs: alibaba → opensandbox-group; CNCF Landscape-listed |

Product Profiles

AIO Sandbox — 5K ★, all-in-one Docker container[12]

  • Browser, shell, VSCode, Jupyter, MCP in one container — ByteDance-affiliated Agent Infra team
  • v1.9.3 (May 2026), stateful bash API, desktop recording; linux/amd64 only
  • Free (Apache 2.0, self-hosted); Docker-level isolation

AWS AgentCore Code Interpreter — the hyperscaler incumbent[13]

  • Managed Python/JS/TS sessions to 8 hours, S3 file access, CloudTrail audit; GA October 2025
  • $0.0895/vCPU-hr + $0.00945/GB-hr, per-second, idle free
  • 2026 security research showed DNS-exfil and credential-extraction paths; AWS ruled the network behavior intended and published hardening guidance[8]
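
The per-second rates above can be turned into a rough per-session estimate. What follows is a minimal sketch, assuming cost scales linearly with billed seconds and that only the two listed rates apply. The function name and the example session shape (2 vCPU, 4 GB, 10 billed minutes) are illustrative and are not taken from AWS's own calculator.

```python
# Rates quoted in this report for AgentCore Code Interpreter (USD).
VCPU_HOUR = 0.0895
GB_HOUR = 0.00945


def session_cost(vcpus: float, memory_gb: float, billed_seconds: float) -> float:
    """Estimate one session's cost under linear per-second billing.

    Assumption: idle time is already excluded from billed_seconds,
    since the report lists idle as free.
    """
    hours = billed_seconds / 3600
    return vcpus * hours * VCPU_HOUR + memory_gb * hours * GB_HOUR


# Hypothetical session: 2 vCPU, 4 GB, 10 minutes of billed time.
estimate = session_cost(vcpus=2, memory_gb=4, billed_seconds=600)
print(f"${estimate:.4f}")
```

Under these assumptions the CPU term dominates, so the amount of billed (non-idle) time matters more to the total than the memory size.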

Blaxel — perpetual sandboxes with hibernation economics[7]

  • Sub-25ms resume with memory and processes intact; $0 compute on standby
  • $7.3M seed (First Round); Webflow, Shortwave, Strapi; 7.5M+ requests/day
  • Managed-only; hypervisor undisclosed

Cloudflare Sandbox SDK — edge containers plus V8 isolates[2]

  • Exec, files, code interpreter, preview URLs on Workers + Durable Objects; 1K ★, beta pre-v1
  • Dynamic Worker Loader isolates start in milliseconds — powers Code Mode (agents write TypeScript instead of tool calls)
  • Containers + isolates, not microVMs — Cloudflare itself notes isolates carry a bigger attack surface than hardware VMs

CodeSandbox SDK — becoming Together Code Sandbox[11]

  • microVMs with git-versioned filesystem, memory snapshots, ~500ms resume, sub-second live forks
  • $0.01486/credit (cut post-acquisition), free tier with $100 credit
  • Migration to the Together platform is the open risk

ComputeSDK — the abstraction layer[14]

  • One TypeScript API over 9 documented providers (E2B, Modal, Daytona, Runloop, Cloudflare, Vercel…); 212 ★, MIT
  • v2.0 Sandbox Gateway: BYOK orchestration, failover strategies
  • "Terraform for running other people's code" — still pre-community
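
The idea of one API with failover across providers can be sketched in a few lines. This is not ComputeSDK's API (which is TypeScript). It is a hypothetical Python illustration in which `Runner`, `ProviderError`, `run_with_failover`, and both stand-in providers are invented names, and each provider is reduced to a plain callable.

```python
from typing import Callable, Sequence


class ProviderError(Exception):
    """Raised by a provider adapter when it cannot run the code."""


# A provider adapter is just a callable: source code in, output out.
Runner = Callable[[str], str]


def run_with_failover(
    code: str, providers: Sequence[tuple[str, Runner]]
) -> tuple[str, str]:
    """Try each provider in priority order; return (provider_name, output)."""
    errors = []
    for name, run in providers:
        try:
            return name, run(code)
        except ProviderError as exc:
            errors.append(f"{name}: {exc}")
    raise ProviderError("all providers failed: " + "; ".join(errors))


# Stand-in adapters; real ones would wrap each vendor's own SDK.
def flaky_provider(code: str) -> str:
    raise ProviderError("quota exceeded")


def local_provider(code: str) -> str:
    return f"ran {len(code)} bytes"


used, output = run_with_failover(
    "print('hi')",
    [("primary", flaky_provider), ("fallback", local_provider)],
)
```

Here the first adapter fails, so the call falls through to the second. A real gateway would also have to reconcile differences in filesystem state, timeouts, and isolation guarantees between providers, which this sketch ignores.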

Daytona — open-source leader, freshly funded[4]

  • ~72.5K ★ (AGPL-3.0), 90ms creation, mid-execution snapshots + forking, GPU sandboxes (H100 $3.95/hr)
  • $24M Series A (FirstMark + Datadog/Figma strategic); $1M run rate in under 3 months; LangChain, Writer, SambaNova
  • Computer Use across Linux/Windows/macOS/Android; Docker-based isolation is the trade-off

E2B — the scale leader[5]

  • 1B+ sandboxes started, 94% of Fortune 100, 3.5M+ monthly SDK downloads, 12.5K ★
  • No longer ephemeral-only: pause/resume preserves full memory state; snapshots via SDK
  • Firecracker microVMs (same tech as AWS Lambda); $21M Series A (Insight Partners), $32M total

Google Agent Sandbox — Gemini Enterprise's execution layer[15]

  • Managed hardened sandboxes for model-generated code + browser computer-use; announced April 22, 2026
  • Sub-second creation, stateful TTL to 14 days, 150+ preinstalled Python packages
  • Preview: us-central1 only, no network access, no custom installs; billing starts July 1, 2026

Microsandbox — local-first microVMs[16]

  • libkrun hardware isolation on your machine; secrets injected at network layer, never visible to sandbox code
  • 6.5K ★; snapshot/fork/restore with sub-ms restores; Python/JS/Rust/Go SDKs; YC X26 (now Super Rad Company)
  • Cloud in closed beta; local free forever

Modal — the GPU heavyweight[3]

  • $355M Series C at $4.65B (May 2026), ~$300M annualized revenue; 1B+ sandboxes run
  • GPUs T4 through B200 per-second; only GPU-accelerated provider in the OpenAI Agents SDK
  • gVisor isolation; memory snapshots give 10x faster starts; Lovable ran 1M sandboxes in 48 hours

Northflank — enterprise VPC deployment[17]

  • microVM (Kata) or gVisor per workload; deploy in their cloud or your VPC (AWS, GCP, Azure, Oracle, Civo)
  • 80K+ developers, 2,000+ companies; H100s $2.74/hr, B200s available; Sentry and Writer as customers
  • $24.9M raised (Bain Capital Ventures Series A)

OpenSandbox — protocol-driven open source at K8s scale[18]

  • 11.5K ★, now org-independent (opensandbox-group) and CNCF Landscape-listed
  • Persistence shipped (PVC auto-provisioning, pause/resume rootfs snapshots); gVisor/Kata/Firecracker options
  • Go SDK GA joins Python/Java/JS/.NET; free, Apache 2.0, no managed tier

OpenShell — NVIDIA's policy runtime[19]

  • Declarative YAML policy over filesystem (Landlock), network (OPA proxy), exec (seccomp), and inference routing
  • 7K ★ in ~3.5 months; Canonical ships an Ubuntu snap; wraps Claude Code, Codex, Copilot, OpenCode
  • Alpha by its own README ("one developer, one environment"); a governance layer, not a hosted platform

Quilt — dormant[9]

  • Rust + Linux namespaces with inter-container networking; ~200ms creation
  • Site offline (deployment disabled), 15 ★, no commits since February 2026, cloud never launched
  • Reference implementation only

Runloop — devboxes for agent evals, now with public pricing[20]

  • Git-style disk snapshots + branching on a custom bare-metal hypervisor; SWE-bench/Terminal-Bench built in
  • Basic $0/mo, Pro $250/mo + $0.108/CPU-hr; benchmark runs priced per round ($1.17–$18.66)
  • Trajectory ran 10,000 concurrent devboxes; RL fine-tuning tilt; thin third-party mindshare

Sprites — persistent VMs, the anti-ephemeral original[6]

  • 100GB ext4 on object storage; Live Checkpoints ~300ms, sub-second restore; auto-sleep with zero idle charges
  • Native MCP endpoint (March 2026); plans $20–$2,000/mo by concurrent Sprites
  • Firecracker isolation; Fly.io-only infrastructure, no GPU path

Vercel Sandbox — the platform default for Vercel shops[1]

  • Firecracker microVMs, millisecond starts, persistent by default with snapshots; GA January 30, 2026
  • Active-CPU billing ($0.128/vCPU-hr — I/O wait unbilled) + $0.60/1M creations; up to 32 vCPU/64GB
  • In production under v0, Blackbox AI, RooCode; single region (iad1) is the main limitation
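
To see why active-CPU billing matters for I/O-heavy agents, the sketch below applies the two rates quoted above to a hypothetical workload. It covers only CPU and creation charges, and the wall-clock figure is a what-if that reuses the same rate. It is not a quote of any other vendor's pricing, and the function name and workload numbers are assumptions.

```python
ACTIVE_VCPU_HOUR = 0.128       # USD per vCPU-hour of active CPU (from this report)
PER_MILLION_CREATIONS = 0.60   # USD per 1M sandbox creations (from this report)


def cpu_and_creation_cost(creations: int, cpu_seconds_each: float) -> float:
    """CPU plus creation charges only; other line items are out of scope."""
    cpu_hours = creations * cpu_seconds_each / 3600
    creation_fee = creations / 1_000_000 * PER_MILLION_CREATIONS
    return cpu_hours * ACTIVE_VCPU_HOUR + creation_fee


# Hypothetical workload: 10,000 sandboxes, each alive 5 minutes on 1 vCPU
# but burning CPU for only 30 seconds (the rest is I/O wait).
active_billed = cpu_and_creation_cost(10_000, cpu_seconds_each=30)
wall_clock_billed = cpu_and_creation_cost(10_000, cpu_seconds_each=300)
print(f"active: ${active_billed:.2f}  wall-clock what-if: ${wall_clock_billed:.2f}")
```

For this workload the CPU charge scales with the 30 active seconds rather than the 300 seconds of lifetime, roughly a tenfold difference. The gap shrinks as agents become more CPU-bound.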

Zeroboot — the stalled speed record[10]

  • 0.79ms p50 spawn via copy-on-write forks of Firecracker snapshots; ~265KB per sandbox
  • 2.4K ★ after a 311-point Show HN — but zero commits since March 21; networking never shipped
  • Proof of concept worth studying, not adopting
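
The copy-on-write idea behind those numbers can be illustrated with a toy model. This is a conceptual sketch only, not Zeroboot's implementation: the `Snapshot` and `Fork` classes are invented here, and pages are modelled as dictionary entries rather than real memory mappings.

```python
class Snapshot:
    """Immutable base memory image, shared by every fork."""

    def __init__(self, pages: dict[int, bytes]):
        self._pages = dict(pages)

    def read(self, page_no: int) -> bytes:
        return self._pages[page_no]

    def fork(self) -> "Fork":
        # Forking copies nothing; it only records a reference to the base.
        return Fork(self)


class Fork:
    """A sandbox view: reads fall through to the base, writes stay private."""

    def __init__(self, base: Snapshot):
        self._base = base
        self._dirty: dict[int, bytes] = {}

    def read(self, page_no: int) -> bytes:
        if page_no in self._dirty:
            return self._dirty[page_no]
        return self._base.read(page_no)

    def write(self, page_no: int, data: bytes) -> None:
        # Only pages a sandbox actually touches consume extra memory.
        self._dirty[page_no] = data

    def private_pages(self) -> int:
        return len(self._dirty)


base = Snapshot({0: b"kernel", 1: b"runtime", 2: b"heap"})
a, b = base.fork(), base.fork()
a.write(2, b"heap-a")
```

After the write, fork `a` sees its own copy of page 2 while `b` and the base are unchanged. That is why spawn time and per-sandbox memory can both stay small when most pages are never modified.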

Architecture Patterns

Isolation Technologies

| Technology | Used By | Security Level | Performance |
| --- | --- | --- | --- |
| Firecracker | E2B, Sprites, Vercel Sandbox | ⭐⭐⭐ Hardware-level | Fast |
| Firecracker CoW | Zeroboot | ⭐⭐⭐ Hardware-level | Fastest |
| libkrun | Microsandbox | ⭐⭐⭐ Hardware-level | Fast |
| microVMs (managed) | CodeSandbox, Northflank, Runloop, Blaxel, AgentCore | ⭐⭐⭐ Hardware-level | Fast |
| gVisor / Kata | Modal, Northflank, OpenSandbox | ⭐⭐ Kernel-level | Very Fast |
| Hardened containers | Google Agent Sandbox, Cloudflare | ⭐⭐ Container+ | Very Fast |
| Kernel policy (Landlock/seccomp) | OpenShell | ⭐⭐ Kernel-level | Native speed |
| Docker/K8s | AIO Sandbox, Daytona | ⭐ Container-level | Fastest |
| V8 isolates | Cloudflare (Worker Loader) | ⭐ Process-level | Instant (ms) |
| Namespaces | Quilt | ⭐ Kernel-level | Very Fast |

The Convergence

In March this market split into philosophical camps — ephemeral (E2B), persistent (Sprites), local-first (Microsandbox). Fly.io argued "ephemeral sandboxes are obsolete" — Claude doesn't want a stateless container, it wants a computer.[21] Three months later the camps have largely merged:

  • E2B (the ephemeral standard-bearer) shipped pause/resume that preserves full memory state, plus snapshots
  • Daytona, Vercel Sandbox, CodeSandbox, Runloop, and Blaxel are persistent-first with snapshot/fork/hibernate mechanics
  • Modal added memory snapshots for 10x faster starts

What still differentiates platforms is the billing model around idle state (Vercel bills active CPU only; Sprites and Blaxel bill ~zero when sleeping; session-based AgentCore expires at 8 hours), the isolation depth (hardware microVM vs gVisor vs containers vs isolates), and where it runs (your cloud, their cloud, your laptop). The local-first camp (Microsandbox) and the policy-runtime camp (OpenShell, with Anthropic's Claude Code among its wrapped agents) remain genuinely distinct approaches.

The Security Stress-Test

The category's implicit promise — "run untrusted code safely" — got its first public audit. BeyondTrust showed AWS AgentCore's default "sandbox" network mode allowed DNS-based exfiltration and command-and-control, and Sonrai demonstrated credential-extraction paths via the metadata service; AWS ruled the network behavior intended functionality and published hardening guidance rather than a patch.[8] The lesson for buyers: "sandboxed" is a spectrum, not a checkbox — ask specifically about network egress, metadata services, and credential handling.
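
Two of those questions, network egress and metadata-service reachability, can be checked from inside a sandbox with a short probe. The sketch below uses only Python's standard `socket` module. The function names are invented for illustration, `example.com` is a placeholder target, and `169.254.169.254` is the link-local address commonly used by cloud metadata services. An "open" result means the path deserves a follow-up question to the vendor, not that an exploit exists.

```python
import socket
from typing import Callable

# Link-local address commonly used by cloud instance metadata services.
METADATA_ADDR = ("169.254.169.254", 80)


def can_resolve(hostname: str) -> bool:
    """True if DNS resolution works from inside the sandbox."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except OSError:
        return False


def can_connect(addr: tuple[str, int], timeout: float = 1.0) -> bool:
    """True if a TCP connection to addr succeeds within the timeout."""
    try:
        with socket.create_connection(addr, timeout=timeout):
            return True
    except OSError:
        return False


def audit_egress(
    resolve: Callable[[str], bool] = can_resolve,
    connect: Callable[[tuple[str, int]], bool] = can_connect,
) -> dict[str, bool]:
    """Report which egress paths appear open from this environment."""
    return {
        "dns_resolution_open": resolve("example.com"),
        "metadata_service_reachable": connect(METADATA_ADDR),
        "outbound_tcp_open": connect(("example.com", 443)),
    }
```

Calling `audit_egress()` with no arguments runs the real socket probes. The injectable `resolve` and `connect` parameters exist so the report logic can be exercised without network access. Credential handling is not covered by this probe and still has to be reviewed separately.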


Strategic Recommendations

By Use Case

| Use Case | Recommended | Runner-Up |
| --- | --- | --- |
| High-security enterprise | E2B | Northflank (VPC) |
| Persistent agent state | Sprites | Vercel Sandbox |
| GPU/ML workloads | Modal | Daytona / Northflank |
| Browser automation / Computer Use | Daytona | Google Agent Sandbox |
| Agent benchmarking (SWE-bench) | Runloop | E2B |
| Fastest creation time | Vercel Sandbox (ms) | Daytona (90ms) |
| Checkpoint/restore | Sprites (~300ms) | Daytona / E2B |
| Local-first / secrets on-host | Microsandbox | |
| Policy/governance over agents | OpenShell | Microsandbox |
| All-in-one dev environment | AIO Sandbox | Daytona |
| Open source preference | Daytona | OpenSandbox |
| Idle-cost optimization | Blaxel | Sprites |
| Already on Vercel | Vercel Sandbox | |
| Already on Cloudflare Workers | Cloudflare Sandbox SDK | |
| Already on AWS | AgentCore Code Interpreter | Northflank (VPC) |
| Already on Google Cloud | Google Agent Sandbox | |
| VPC/BYOC deployment | Northflank | Daytona (self-hosted) |
| Multi-provider flexibility | ComputeSDK | |
| Self-hosted Kubernetes-native | OpenSandbox | |

By Team Profile

Security-first enterprise team: → E2B (Firecracker, SOC2, F100-proven) or Northflank (VPC deployment) — and read the AgentCore security research before trusting any "sandbox" network mode

AI agent startup iterating fast: → Sprites or Vercel Sandbox (persistence + checkpoints without rebuild time); Blaxel if idle cost dominates

ML/AI team needing GPUs: → Modal (deepest GPU catalog, OpenAI Agents SDK native) or Daytona (open source + H100s)

Hyperscaler-committed enterprise: → AgentCore (AWS), Google Agent Sandbox (GCP, preview), Cloudflare Sandbox SDK (Workers) — accept single-cloud lock-in for procurement simplicity

Security-paranoid team (secrets must stay on-host): → Microsandbox (local-first microVMs, network-layer secret injection); OpenShell for policy enforcement around existing agents

Running agent evaluations: → Runloop (built-in benchmark catalog) or E2B (scale)


Market Outlook

Near-Term (2026)

  • Hyperscaler GA waves: Google Agent Sandbox exits preview (billing starts July 1), AgentCore expands regions, Cloudflare's SDK approaches v1 — bundled distribution starts pressuring independent pricing
  • E2B, Daytona, and Modal consolidate the independent tier with funding and revenue scale the long tail can't match
  • Expect more security research in the AgentCore vein — network egress and metadata-service hardening become buying criteria

Medium-Term (2027)

  • The ephemeral/persistent distinction disappears from marketing entirely; billing-while-idle becomes the comparison axis
  • Computer Use becomes standard; GPU sandboxes spread beyond Modal/Daytona/Northflank
  • Acquisitions likely in the long tail; abstraction layers (ComputeSDK) matter more as provider count grows

Long-Term (2028+)

  • Category consolidates around 2-3 independents plus the hyperscaler bundles
  • Integration with agent orchestration platforms (Tembo, etc.) becomes the default consumption path
  • Specialized sandboxes for specific agent types (RL training, browser fleets, evals)

Bottom Line

19 platforms serve the AI agent sandbox market across four tiers — independent leaders, hyperscaler bundles, self-hosted open source, and a churning long tail:

| Platform | Best For | Key Differentiator |
| --- | --- | --- |
| E2B | Scale + security | 1B+ sandboxes, 94% of F100, now with pause/resume |
| Daytona | Open source + Computer Use | 72.5K ★, $24M Series A, GPU sandboxes |
| Modal | GPU workloads | $4.65B valuation, T4–B200, OpenAI Agents SDK |
| Sprites | Persistent state | ~300ms checkpoints, zero idle cost |
| Vercel Sandbox | Vercel platform teams | ms starts, active-CPU-only billing |
| Cloudflare Sandbox SDK | Workers/edge teams | Containers + ms-start V8 isolates, Code Mode |
| AgentCore | AWS enterprises | 8-hr sessions, CloudTrail, per-second billing |
| Google Agent Sandbox | GCP enterprises | Gemini Enterprise integration (preview) |
| OpenShell | Agent governance | NVIDIA-backed policy runtime, 7K ★ |
| Blaxel | Idle-cost economics | Hibernation, sub-25ms resume |
| Northflank | Enterprise VPC + GPUs | BYOC on five clouds, Kata/gVisor |
| OpenSandbox | Self-hosted K8s scale | 11.5K ★, protocol-driven, persistence shipped |
| Runloop | Agent evals | Benchmark catalog, snapshot/branch |
| CodeSandbox SDK | Web dev agents | Forking; rebranding under Together AI |
| Microsandbox | Local-first secrets | libkrun, network-layer secret injection |
| AIO Sandbox | All-in-one dev env | Browser+IDE+MCP in one container |
| ComputeSDK | Multi-provider | One API over 9 providers |
| Quilt | (dormant) | Reference implementation only |
| Zeroboot | (stalled) | 0.79ms CoW forking proof of concept |

The story of this quarter is the big clouds showing up — Vercel, Cloudflare, AWS, Google, and NVIDIA all now have credible agent-execution offerings, most bundled into platforms developers already pay for. The independents answered with funding (Modal's $355M, Daytona's $24M), scale proof (E2B's billionth sandbox), and feature velocity (checkpointing everywhere). The next twelve months decide whether "agent sandbox" stays a product category or becomes a feature of every cloud — and the GitHub graveyard already forming at the long tail suggests the middle won't hold.


Research by Ry Walker Research

Disclosure: Author is CEO of Tembo, which may integrate with sandbox platforms for agent execution.