Key takeaways
- Every big cloud now has an agent sandbox: AWS AgentCore, Google Agent Sandbox, Cloudflare Sandbox SDK, Vercel Sandbox, and NVIDIA OpenShell all arrived or matured within one quarter
- The ephemeral-vs-persistent debate is over — E2B shipped pause/resume with full memory state while Sprites, Daytona, and Vercel made persistence default; checkpointing is becoming table stakes
- The money arrived: Modal raised $355M at $4.65B with ~$300M annualized revenue; Daytona raised $24M and hit a $1M run rate in under 3 months
- The category got its first public security stress-test — researchers showed DNS exfiltration and credential-extraction paths in AWS's code interpreter sandboxes
FAQ
What's the best sandbox for AI agents?
E2B for scale (1B+ sandboxes started), Daytona for open source + Computer Use, Sprites for persistent state with instant checkpoints, Modal for GPU workloads, Vercel Sandbox if you're on Vercel, and the hyperscaler options (AWS AgentCore, Google Agent Sandbox) if you're committed to one cloud.
Should I use ephemeral or persistent sandboxes?
The distinction is collapsing. E2B (historically ephemeral) now supports pause/resume with memory state; Sprites, Daytona, Vercel Sandbox, and Blaxel are persistent-first with checkpoints or hibernation. Choose based on security posture and pricing model, not the ephemeral/persistent label.
Which sandbox supports GPUs?
Modal leads for GPU workloads (T4 through B200, per-second billing). Daytona added GPU sandboxes (H100 $3.95/hr) and Northflank offers H100s at $2.74/hr with VPC deployment.
What's the cheapest option?
Most platforms have free tiers or credits ($30-200). For production, compare billing models: Vercel bills only active CPU, Sprites and Blaxel bill nothing while idle/hibernated, E2B and Modal bill per-second. Self-hosted open source (OpenSandbox, Microsandbox, AIO Sandbox) is free plus your infrastructure.
Executive Summary
A new infrastructure category has matured fast: sandbox platforms for AI agent code execution. These platforms answer the question "where should AI-generated code run?" with isolated, secure environments in which agents can execute arbitrary code. Since our March report, the category has changed shape: every big cloud entered, the money arrived, and the ephemeral-vs-persistent debate effectively ended.
Key Findings:
- Big cloud arrived in force — Vercel Sandbox (GA January), Cloudflare Sandbox SDK (plus isolate-based Dynamic Worker Loader, "100x faster than containers"), AWS AgentCore Code Interpreter, Google Agent Sandbox (April), and NVIDIA's OpenShell policy runtime (GTC, 7K stars)[1][2]
- The money arrived — Modal raised $355M at $4.65B with ~$300M annualized revenue; Daytona raised a $24M Series A (FirstMark, with Datadog and Figma Ventures) and hit a $1M run rate in under 3 months[3][4]
- E2B is no longer ephemeral-only — 1B+ sandboxes started, 94% of Fortune 100, and pause/resume that preserves full memory state plus snapshots[5]
- Persistence converged — Sprites checkpoints dropped to ~300ms, Daytona shipped mid-execution snapshots and forking, Vercel made persistence the default, Blaxel built its whole model on hibernation[6][7]
- First security stress-test — researchers demonstrated DNS-exfiltration and credential-extraction paths in AWS's code interpreter sandboxes; AWS called the network behavior intended functionality and published hardening guidance[8]
- The long tail is churning — Quilt went dormant (site offline, 15 stars), Zeroboot stalled as a prototype after its viral Show HN, and CodeSandbox SDK is mid-rebrand to Together Code Sandbox
- Open source consolidated upward — OpenSandbox (now org-independent, CNCF Landscape) hit 11.5K stars and shipped persistence + Firecracker/Kata isolation; Daytona reached ~72.5K stars
Strategic Planning Assumptions:
- By 2027, checkpoint/restore will become table stakes across all platforms. Confirmed early: E2B, Daytona, Sprites, Vercel, Runloop, Microsandbox, and CodeSandbox all ship some form of snapshot/checkpoint as of June 2026
- By 2028, Computer Use (browser/desktop) will be a standard feature, not a differentiator
- By 2027, hyperscaler bundling (AWS, Google, Cloudflare, Vercel) will force independent platforms to compete on multi-cloud neutrality, price, or isolation depth
Market Definition
AI agent sandboxes are isolated execution environments designed for running AI-generated or arbitrary code safely. They provide:
- Isolation — Code runs in VMs or containers, separated from host systems
- APIs/SDKs — Programmatic creation and management
- Security — Protection against malicious or buggy code
- Scalability — Ability to run thousands of concurrent sandboxes
Key distinction from cloud compute: These are purpose-built for agent use cases, not general application hosting. They optimize for fast creation, easy cleanup, and developer-friendly SDKs.
Membership note: Vercel AI Gateway, included in the March edition for infrastructure completeness, is a model-routing proxy rather than an execution sandbox — it now lives in our AI Inference Platforms comparison. Its slot here goes to Vercel's actual sandbox product. Raw isolation technologies (Firecracker, gVisor, Unikraft) are covered separately in Container & VM Runtimes.
Comparison Matrix
| Platform | Model | Creation | Persist | Checkpoints | GPU | Isolation |
|---|---|---|---|---|---|---|
| AgentCore | Managed sessions | Not published | Session ≤8h | — | — | microVM |
| AIO Sandbox | All-in-one | ~seconds | Container-lifetime | — | — | Docker |
| Blaxel | Perpetual/hibernating | Sub-25ms resume | ✅ | ✅ (hibernate) | — | microVM |
| Cloudflare | Containers + isolates | ~seconds / ms (isolates) | ✅ (DO state) | — | — | Containers/V8 |
| CodeSandbox | Persistent | 2.7s P95 | ✅ | ✅ | — | microVMs |
| ComputeSDK | Abstraction | Varies | Varies | Varies | Varies | Varies |
| Daytona | Persistent | 90ms | ✅ | ✅ (snapshots, fork) | ✅ (H100) | Docker |
| E2B | Ephemeral-first | ~150ms | ✅ (pause/resume) | ✅ (snapshots) | — | Firecracker |
| Google Agent Sandbox | Managed (preview) | Sub-second | TTL ≤14 days | — | — | Hardened containers |
| Microsandbox | Local-first | ~320ms | ✅ | ✅ (fork/restore) | — | libkrun microVMs |
| Modal | Serverless | Sub-second | Volumes | ✅ (memory snapshots) | ✅ (T4–B200) | gVisor |
| Northflank | Both | ~200ms | ✅ | — | ✅ | microVMs/gVisor |
| OpenSandbox | Self-hosted | Pool pre-warm | ✅ (PVC, volumes) | ✅ (pause/resume) | — | gVisor/Kata/Firecracker |
| OpenShell | Policy runtime | Not benchmarked | ✅ (long-lived) | — | — | Landlock/seccomp/OPA |
| Quilt | Ephemeral (dormant) | ~200ms | — | — | — | Namespaces |
| Runloop | Persistent | ~100ms exec | ✅ | ✅ (snapshot, branch) | — | Custom hypervisor |
| Sprites | Persistent | 1-2s | ✅ | ✅ (~300ms) | — | Firecracker |
| Vercel Sandbox | Persistent-default | Milliseconds | ✅ | ✅ (snapshots) | — | Firecracker |
| Zeroboot | Ephemeral (stalled) | 0.79ms | — | — | — | Firecracker CoW |
Daytona also supports Computer Use (Linux/Windows/macOS/Android virtual desktops); Google Agent Sandbox includes browser computer-use.
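For readers who want to shortlist platforms against hard requirements, the matrix can be treated as data. The sketch below transcribes a subset of rows from the table above; the `Platform` dataclass, its field names, and the `shortlist` helper are illustrative assumptions for this report, not any vendor's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Platform:
    name: str
    checkpoints: bool
    gpu: bool
    isolation: str


# Subset of rows transcribed from the comparison matrix above.
MATRIX = [
    Platform("Daytona", checkpoints=True, gpu=True, isolation="Docker"),
    Platform("E2B", checkpoints=True, gpu=False, isolation="Firecracker"),
    Platform("Modal", checkpoints=True, gpu=True, isolation="gVisor"),
    Platform("Northflank", checkpoints=False, gpu=True, isolation="microVMs/gVisor"),
    Platform("Sprites", checkpoints=True, gpu=False, isolation="Firecracker"),
    Platform("Vercel Sandbox", checkpoints=True, gpu=False, isolation="Firecracker"),
]


def shortlist(platforms, *, need_gpu=False, need_checkpoints=False, isolation_contains=None):
    """Return the names of platforms that satisfy every stated requirement."""
    result = []
    for p in platforms:
        if need_gpu and not p.gpu:
            continue
        if need_checkpoints and not p.checkpoints:
            continue
        if isolation_contains and isolation_contains.lower() not in p.isolation.lower():
            continue
        result.append(p.name)
    return result
```

Calling `shortlist(MATRIX, need_gpu=True, need_checkpoints=True)` narrows this subset to Daytona and Modal; adding an isolation filter shows how quickly the trade-offs in the matrix start to bind.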
Status Check
| Platform | Status as of June 2026 |
|---|---|
| Quilt | Dormant — quilt.sh offline (deployment disabled), no commits since February, cloud never launched[9] |
| Zeroboot | Stalled prototype — no commits since March 21 despite 2.4K stars and a 311-point Show HN; demo API still live[10] |
| CodeSandbox SDK | Mid-rebrand to Together Code Sandbox under Together AI; actively maintained, pricing cut post-acquisition[11] |
| Microsandbox | Company rebranded: Zerocore AI → Super Rad Company; cloud in closed beta |
| OpenSandbox | Moved orgs: alibaba → opensandbox-group; CNCF Landscape-listed |
Product Profiles
AIO Sandbox — 5K ★, all-in-one Docker container[12]
- Browser, shell, VSCode, Jupyter, MCP in one container — ByteDance-affiliated Agent Infra team
- v1.9.3 (May 2026), stateful bash API, desktop recording; linux/amd64 only
- Free (Apache 2.0, self-hosted); Docker-level isolation
AWS AgentCore Code Interpreter — the hyperscaler incumbent[13]
- Managed Python/JS/TS sessions to 8 hours, S3 file access, CloudTrail audit; GA October 2025
- $0.0895/vCPU-hr + $0.00945/GB-hr, per-second, idle free
- 2026 security research showed DNS-exfil and credential-extraction paths; AWS ruled the network behavior intended and published hardening guidance[8]
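To make the AgentCore rates concrete, here is a rough cost sketch in Python. It uses only the two published rates quoted above and assumes, for illustration, that vCPU and memory are billed uniformly for every active second of a session; how AWS actually meters consumption within a session is not covered in this report, and the function name is hypothetical.

```python
VCPU_RATE_PER_HOUR = 0.0895   # USD per vCPU-hour, from the profile above
MEM_RATE_PER_HOUR = 0.00945   # USD per GB-hour, from the profile above


def session_cost(vcpus: float, memory_gb: float, active_seconds: float) -> float:
    """Estimate the cost of one session, billed per second while active.

    Assumption: the full vCPU and memory allocation is charged for each
    active second, and idle time costs nothing.
    """
    hours = active_seconds / 3600
    return hours * (vcpus * VCPU_RATE_PER_HOUR + memory_gb * MEM_RATE_PER_HOUR)
```

Under those assumptions, a 2 vCPU / 4 GB session that is active for 30 minutes comes to about $0.108 (`session_cost(2, 4, 1800)`).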
Blaxel — perpetual sandboxes with hibernation economics[7]
- Sub-25ms resume with memory and processes intact; $0 compute on standby
- $7.3M seed (First Round); Webflow, Shortwave, Strapi; 7.5M+ requests/day
- Managed-only; hypervisor undisclosed
Cloudflare Sandbox SDK — edge containers plus V8 isolates[2]
- Exec, files, code interpreter, preview URLs on Workers + Durable Objects; 1K ★, beta pre-v1
- Dynamic Worker Loader isolates start in milliseconds — powers Code Mode (agents write TypeScript instead of tool calls)
- Containers + isolates, not microVMs — Cloudflare itself notes isolates carry a bigger attack surface than hardware VMs
CodeSandbox SDK — becoming Together Code Sandbox[11]
- microVMs with git-versioned filesystem, memory snapshots, ~500ms resume, sub-second live forks
- $0.01486/credit (cut post-acquisition), free tier with $100 credit
- Migration to the Together platform is the open risk
ComputeSDK — the abstraction layer[14]
- One TypeScript API over 9 documented providers (E2B, Modal, Daytona, Runloop, Cloudflare, Vercel…); 212 ★, MIT
- v2.0 Sandbox Gateway: BYOK orchestration, failover strategies
- "Terraform for running other people's code" — still pre-community
Daytona — open-source leader, freshly funded[4]
- ~72.5K ★ (AGPL-3.0), 90ms creation, mid-execution snapshots + forking, GPU sandboxes (H100 $3.95/hr)
- $24M Series A (FirstMark + Datadog/Figma strategic); $1M run rate in under 3 months; LangChain, Writer, SambaNova
- Computer Use across Linux/Windows/macOS/Android; Docker-based isolation is the trade-off
E2B[5]
- 1B+ sandboxes started, 94% of Fortune 100, 3.5M+ monthly SDK downloads, 12.5K ★
- No longer ephemeral-only: pause/resume preserves full memory state; snapshots via SDK
- Firecracker microVMs (same tech as AWS Lambda); $21M Series A (Insight Partners), $32M total
Google Agent Sandbox — Gemini Enterprise's execution layer[15]
- Managed hardened sandboxes for model-generated code + browser computer-use; announced April 22, 2026
- Sub-second creation, stateful TTL to 14 days, 150+ preinstalled Python packages
- Preview: us-central1 only, no network access, no custom installs; billing starts July 1, 2026
Microsandbox — local-first microVMs[16]
- libkrun hardware isolation on your machine; secrets injected at network layer, never visible to sandbox code
- 6.5K ★; snapshot/fork/restore with sub-ms restores; Python/JS/Rust/Go SDKs; YC X26 (now Super Rad Company)
- Cloud in closed beta; local free forever
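The "secrets injected at the network layer" idea can be sketched generically: sandboxed code only ever sees a placeholder, and a host-side proxy swaps in the real value just before an allowed request leaves the machine. The Python below illustrates that pattern only. It is not Microsandbox's implementation, and the function name, placeholder syntax, and allowlist shape are all assumptions.

```python
from urllib.parse import urlsplit


def inject_secrets(url, headers, secrets, allowed_hosts):
    """Host-side header rewrite performed outside the sandbox.

    secrets:       placeholder -> real secret value
    allowed_hosts: placeholder -> set of hostnames that may receive it
    Requests to any other host keep the placeholder, so the real value
    never reaches a destination it was not scoped to.
    """
    host = urlsplit(url).hostname or ""
    rewritten = {}
    for name, value in headers.items():
        for placeholder, real in secrets.items():
            if placeholder in value and host in allowed_hosts.get(placeholder, set()):
                value = value.replace(placeholder, real)
        rewritten[name] = value
    return rewritten
```

The design point is that the substitution happens per destination: even if agent-written code sends the placeholder to an attacker-controlled host, only the placeholder string leaves the machine.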
Modal — the GPU heavyweight[3]
- $355M Series C at $4.65B (May 2026), ~$300M annualized revenue; 1B+ sandboxes run
- GPUs T4 through B200 per-second; only GPU-accelerated provider in the OpenAI Agents SDK
- gVisor isolation; memory snapshots give 10x faster starts; Lovable ran 1M sandboxes in 48 hours
Northflank — enterprise VPC deployment[17]
- microVM (Kata) or gVisor per workload; deploy in their cloud or your VPC (AWS, GCP, Azure, Oracle, Civo)
- 80K+ developers, 2,000+ companies; H100s $2.74/hr, B200s available; Sentry and Writer as customers
- $24.9M raised (Bain Capital Ventures Series A)
OpenSandbox — protocol-driven open source at K8s scale[18]
- 11.5K ★, now org-independent (opensandbox-group) and CNCF Landscape-listed
- Persistence shipped (PVC auto-provisioning, pause/resume rootfs snapshots); gVisor/Kata/Firecracker options
- Go SDK GA joins Python/Java/JS/.NET; free, Apache 2.0, no managed tier
OpenShell — NVIDIA's policy runtime[19]
- Declarative YAML policy over filesystem (Landlock), network (OPA proxy), exec (seccomp), and inference routing
- 7K ★ in ~3.5 months; Canonical ships an Ubuntu snap; wraps Claude Code, Codex, Copilot, OpenCode
- Alpha by its own README ("one developer, one environment"); a governance layer, not a hosted platform
Quilt[9]
- Rust + Linux namespaces with inter-container networking; ~200ms creation
- Site offline (deployment disabled), 15 ★, no commits since February 2026, cloud never launched
- Reference implementation only
Runloop — devboxes for agent evals, now with public pricing[20]
- Git-style disk snapshots + branching on a custom bare-metal hypervisor; SWE-bench/Terminal-Bench built in
- Basic $0/mo, Pro $250/mo + $0.108/CPU-hr; benchmark runs priced per round ($1.17–$18.66)
- Trajectory ran 10,000 concurrent devboxes; RL fine-tuning tilt; thin third-party mindshare
Sprites — persistent VMs, the anti-ephemeral original[6]
- 100GB ext4 on object storage; Live Checkpoints ~300ms, sub-second restore; auto-sleep with zero idle charges
- Native MCP endpoint (March 2026); plans $20–$2,000/mo by concurrent Sprites
- Firecracker isolation; Fly.io-only infrastructure, no GPU path
Vercel Sandbox — the platform default for Vercel shops[1]
- Firecracker microVMs, millisecond starts, persistent by default with snapshots; GA January 30, 2026
- Active-CPU billing ($0.128/vCPU-hr — I/O wait unbilled) + $0.60/1M creations; up to 32 vCPU/64GB
- In production under v0, Blackbox AI, RooCode; single region (iad1) is the main limitation
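Active-CPU billing matters most for agents that spend their time waiting on model responses or network I/O. The Python sketch below uses the two Vercel rates quoted above and compares them with a hypothetical wall-clock model at the same hourly rate. It ignores memory, storage, and network charges, which this profile does not list, and the function names are illustrative.

```python
ACTIVE_CPU_RATE = 0.128            # USD per active vCPU-hour, from the profile above
CREATION_RATE = 0.60 / 1_000_000   # USD per sandbox creation, from the profile above


def active_cpu_cost(active_vcpu_seconds: float, creations: int = 1) -> float:
    """Cost when only vCPU-seconds spent executing are billed (I/O wait is free)."""
    return active_vcpu_seconds / 3600 * ACTIVE_CPU_RATE + creations * CREATION_RATE


def wall_clock_cost(vcpus: float, wall_seconds: float,
                    rate_per_vcpu_hour: float = ACTIVE_CPU_RATE) -> float:
    """Hypothetical comparison: the same rate applied to every second the sandbox exists."""
    return vcpus * wall_seconds / 3600 * rate_per_vcpu_hour
```

For a 2-vCPU sandbox that lives for ten minutes but accumulates only 180 active vCPU-seconds, `active_cpu_cost(180)` is about $0.0064, while `wall_clock_cost(2, 600)` would be about $0.0427 at the same rate.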
Zeroboot — the stalled speed record[10]
- 0.79ms p50 spawn via copy-on-write forks of Firecracker snapshots; ~265KB per sandbox
- 2.4K ★ after a 311-point Show HN — but zero commits since March 21; networking never shipped
- Proof of concept worth studying, not adopting
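The copy-on-write idea behind that spawn time can be shown in miniature with Python's standard `mmap` module: many "forks" can map one snapshot file privately, and each pays only for the pages it modifies. This is a general illustration of private (copy-on-write) mappings, not Zeroboot's code; the function name and byte strings are made up for the demo.

```python
import mmap
import os
import tempfile


def cow_fork_demo(snapshot_bytes: bytes, patch: bytes):
    """Map a 'snapshot' file copy-on-write, modify the mapping, and return
    (what the fork sees, what the snapshot file still holds).

    snapshot_bytes must be non-empty and at least as long as patch.
    """
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "r+b") as f:
            f.write(snapshot_bytes)
            f.flush()
            # ACCESS_COPY creates a private mapping: writes change memory only,
            # never the underlying file.
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY) as view:
                view[: len(patch)] = patch
                fork_view = bytes(view)
            f.seek(0)
            on_disk = f.read()
        return fork_view, on_disk
    finally:
        os.remove(path)
```

The fork sees its own modified bytes while the shared snapshot stays untouched, which is why per-sandbox overhead can stay small: at the ~265KB per sandbox quoted above, a thousand forks would add on the order of 265MB.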
Architecture Patterns
Isolation Technologies
| Technology | Used By | Security Level | Performance |
|---|---|---|---|
| Firecracker | E2B, Sprites, Vercel Sandbox | ⭐⭐⭐ Hardware-level | Fast |
| Firecracker CoW | Zeroboot | ⭐⭐⭐ Hardware-level | Fastest |
| libkrun | Microsandbox | ⭐⭐⭐ Hardware-level | Fast |
| microVMs (managed) | CodeSandbox, Northflank, Runloop, Blaxel, AgentCore | ⭐⭐⭐ Hardware-level | Fast |
| gVisor | Modal, Northflank, OpenSandbox | ⭐⭐ Kernel-level | Very Fast |
| Kata Containers | Northflank, OpenSandbox | ⭐⭐⭐ Hardware-level | Fast |
| Hardened containers | Google Agent Sandbox, Cloudflare | ⭐⭐ Container+ | Very Fast |
| Kernel policy (Landlock/seccomp) | OpenShell | ⭐⭐ Kernel-level | Native speed |
| Docker/K8s | AIO Sandbox, Daytona | ⭐ Container-level | Fastest |
| V8 isolates | Cloudflare (Worker Loader) | ⭐ Process-level | Instant (ms) |
| Namespaces | Quilt | ⭐ Kernel-level | Very Fast |
The Convergence
In March, this market was split into philosophical camps: ephemeral (E2B), persistent (Sprites), and local-first (Microsandbox). Fly.io argued that "ephemeral sandboxes are obsolete": Claude doesn't want a stateless container, it wants a computer.[21] Three months later the camps have largely merged:
- E2B (the ephemeral standard-bearer) shipped pause/resume that preserves full memory state, plus snapshots
- Daytona, Vercel Sandbox, CodeSandbox, Runloop, Blaxel are persistent-first with snapshot/fork/hibernate mechanics
- Modal added memory snapshots for 10x faster starts
What still differentiates platforms is the billing model around idle state (Vercel bills active CPU only; Sprites and Blaxel bill ~zero when sleeping; session-based AgentCore expires at 8 hours), the isolation depth (hardware microVM vs gVisor vs containers vs isolates), and where it runs (your cloud, their cloud, your laptop). The local-first camp (Microsandbox) and the policy-runtime camp (OpenShell, with Anthropic's Claude Code among its wrapped agents) remain genuinely distinct approaches.
The Security Stress-Test
The category's implicit promise — "run untrusted code safely" — got its first public audit. BeyondTrust showed AWS AgentCore's default "sandbox" network mode allowed DNS-based exfiltration and command-and-control, and Sonrai demonstrated credential-extraction paths via the metadata service; AWS ruled the network behavior intended functionality and published hardening guidance rather than a patch.[8] The lesson for buyers: "sandboxed" is a spectrum, not a checkbox — ask specifically about network egress, metadata services, and credential handling.
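Buyers can turn those questions into a quick check run from inside a candidate sandbox. The Python sketch below tests two things: whether arbitrary hostnames resolve (the precondition for DNS-based exfiltration) and whether a link-local metadata endpoint accepts connections. The `169.254.169.254` address is the conventional cloud metadata address and is an assumption here, as are the function and parameter names; the resolver and connector are injectable so the logic can be exercised without network access.

```python
import socket

# Conventional link-local metadata address; assumed, not taken from the research cited above.
METADATA_ADDR = ("169.254.169.254", 80)


def probe_egress(test_hostname, resolve=socket.gethostbyname,
                 connect=socket.create_connection, timeout=2.0):
    """Report which outbound paths are open from the current environment."""
    findings = {"dns_resolves": False, "metadata_reachable": False}
    try:
        resolve(test_hostname)
        findings["dns_resolves"] = True
    except OSError:
        pass  # resolution blocked or failed
    try:
        conn = connect(METADATA_ADDR, timeout=timeout)
        conn.close()
        findings["metadata_reachable"] = True
    except OSError:
        pass  # endpoint unreachable, refused, or timed out
    return findings
```

A result with both values `False` does not prove a sandbox is safe, but a `True` on either is a concrete prompt to ask the vendor how that path is meant to be restricted.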
Strategic Recommendations
By Use Case
| Use Case | Recommended | Runner-Up |
|---|---|---|
| High-security enterprise | E2B | Northflank (VPC) |
| Persistent agent state | Sprites | Vercel Sandbox |
| GPU/ML workloads | Modal | Daytona / Northflank |
| Browser automation / Computer Use | Daytona | Google Agent Sandbox |
| Agent benchmarking (SWE-bench) | Runloop | E2B |
| Fastest creation time | Vercel Sandbox (ms) | Daytona (90ms) |
| Checkpoint/restore | Sprites (~300ms) | Daytona / E2B |
| Local-first / secrets on-host | Microsandbox | — |
| Policy/governance over agents | OpenShell | Microsandbox |
| All-in-one dev environment | AIO Sandbox | Daytona |
| Open source preference | Daytona | OpenSandbox |
| Idle-cost optimization | Blaxel | Sprites |
| Already on Vercel | Vercel Sandbox | — |
| Already on Cloudflare Workers | Cloudflare Sandbox SDK | — |
| Already on AWS | AgentCore Code Interpreter | Northflank (VPC) |
| Already on Google Cloud | Google Agent Sandbox | — |
| VPC/BYOC deployment | Northflank | Daytona (self-hosted) |
| Multi-provider flexibility | ComputeSDK | — |
| Self-hosted Kubernetes-native | OpenSandbox | — |
By Team Profile
Security-first enterprise team: → E2B (Firecracker, SOC2, F100-proven) or Northflank (VPC deployment) — and read the AgentCore security research before trusting any "sandbox" network mode
AI agent startup iterating fast: → Sprites or Vercel Sandbox (persistence + checkpoints without rebuild time); Blaxel if idle cost dominates
ML/AI team needing GPUs: → Modal (deepest GPU catalog, OpenAI Agents SDK native) or Daytona (open source + H100s)
Hyperscaler-committed enterprise: → AgentCore (AWS), Google Agent Sandbox (GCP, preview), Cloudflare Sandbox SDK (Workers) — accept single-cloud lock-in for procurement simplicity
Security-paranoid team (secrets must stay on-host): → Microsandbox (local-first microVMs, network-layer secret injection); OpenShell for policy enforcement around existing agents
Running agent evaluations: → Runloop (built-in benchmark catalog) or E2B (scale)
Market Outlook
Near-Term (2026)
- Hyperscaler GA waves: Google Agent Sandbox exits preview (billing starts July 1), AgentCore expands regions, Cloudflare's SDK approaches v1 — bundled distribution starts pressuring independent pricing
- E2B, Daytona, and Modal consolidate the independent tier with funding and revenue scale the long tail can't match
- Expect more security research in the AgentCore vein — network egress and metadata-service hardening become buying criteria
Medium-Term (2027)
- The ephemeral/persistent distinction disappears from marketing entirely; billing-while-idle becomes the comparison axis
- Computer Use becomes standard; GPU sandboxes spread beyond Modal/Daytona/Northflank
- Acquisitions likely in the long tail; abstraction layers (ComputeSDK) matter more as provider count grows
Long-Term (2028+)
- Category consolidates around 2-3 independents plus the hyperscaler bundles
- Integration with agent orchestration platforms (Tembo, etc.) becomes the default consumption path
- Specialized sandboxes for specific agent types (RL training, browser fleets, evals)
Bottom Line
19 platforms serve the AI agent sandbox market across four tiers — independent leaders, hyperscaler bundles, self-hosted open source, and a churning long tail:
| Platform | Best For | Key Differentiator |
|---|---|---|
| E2B | Scale + security | 1B+ sandboxes, 94% of F100, now with pause/resume |
| Daytona | Open source + Computer Use | 72.5K ★, $24M Series A, GPU sandboxes |
| Modal | GPU workloads | $4.65B valuation, T4–B200, OpenAI Agents SDK |
| Sprites | Persistent state | ~300ms checkpoints, zero idle cost |
| Vercel Sandbox | Vercel platform teams | ms starts, active-CPU-only billing |
| Cloudflare Sandbox SDK | Workers/edge teams | Containers + ms-start V8 isolates, Code Mode |
| AgentCore | AWS enterprises | 8-hr sessions, CloudTrail, per-second billing |
| Google Agent Sandbox | GCP enterprises | Gemini Enterprise integration (preview) |
| OpenShell | Agent governance | NVIDIA-backed policy runtime, 7K ★ |
| Blaxel | Idle-cost economics | Hibernation, sub-25ms resume |
| Northflank | Enterprise VPC + GPUs | BYOC on five clouds, Kata/gVisor |
| OpenSandbox | Self-hosted K8s scale | 11.5K ★, protocol-driven, persistence shipped |
| Runloop | Agent evals | Benchmark catalog, snapshot/branch |
| CodeSandbox SDK | Web dev agents | Forking; rebranding under Together AI |
| Microsandbox | Local-first secrets | libkrun, network-layer secret injection |
| AIO Sandbox | All-in-one dev env | Browser+IDE+MCP in one container |
| ComputeSDK | Multi-provider | One API over 9 providers |
| Quilt | (dormant) | Reference implementation only |
| Zeroboot | (stalled) | 0.79ms CoW forking proof of concept |
The story of this quarter is the big clouds showing up — Vercel, Cloudflare, AWS, Google, and NVIDIA all now have credible agent-execution offerings, most bundled into platforms developers already pay for. The independents answered with funding (Modal's $355M, Daytona's $24M), scale proof (E2B's billionth sandbox), and feature velocity (checkpointing everywhere). The next twelve months decide whether "agent sandbox" stays a product category or becomes a feature of every cloud — and the GitHub graveyard already forming at the long tail suggests the middle won't hold.
Research by Ry Walker Research
Disclosure: Author is CEO of Tembo, which may integrate with sandbox platforms for agent execution.
Sources
- [1] Vercel Sandbox Documentation
- [2] Cloudflare Sandbox SDK GitHub
- [3] Modal Website
- [4] Daytona Website
- [5] E2B Website
- [6] Sprites Website
- [7] Blaxel Website
- [8] BeyondTrust: Pwning AWS AgentCore Code Interpreter
- [9] Quilt GitHub
- [10] Zeroboot GitHub
- [11] CodeSandbox SDK
- [12] AIO Sandbox GitHub Repository
- [13] AWS AgentCore Code Interpreter Documentation
- [14] ComputeSDK Website
- [15] Google Agent Sandbox Documentation
- [16] Microsandbox GitHub Repository
- [17] Northflank Sandboxes
- [18] OpenSandbox GitHub
- [19] NVIDIA OpenShell GitHub
- [20] Runloop Website
- [21] Code And Let Live