Local AI Agent Sandboxes Compared | Ry Walker Research

Key takeaways

First-party sandboxing has arrived: Anthropic's open-source sandbox-runtime (4,388 stars) powers Claude Code's /sandbox tool and Codex ships Landlock + seccomp sandboxing by default — eating the niche the standalone wrappers were built to fill
Survivors differentiate on what first-party tools won't do: cross-agent support and credential brokering — nono (2,643 stars, from Sigstore creator Luke Hinds) keeps API keys outside the sandbox entirely via a credential-injection proxy
Three of the five original members are dormant or quiet: landrun stalled in October 2025 (frozen at Landlock ABI v5 while the kernel reached v9), yolo-cage has had no commits since February 2026, and shai is down to maintenance releases at 40 stars
yolobox is the healthy original — 603 stars, weekly releases (v0.18.4, June 2026), and a pivot from simple safety net to local parallel-agent workbench
The ~27-tool gold rush cataloged in the wincent "new wave of AI agent sandboxes" HN thread is consolidating — around OS primitives (Landlock, Seatbelt, bubblewrap) rather than containers

FAQ

What's the best local sandbox for AI coding agents?

If you use Claude Code, enable the built-in /sandbox — it's powered by Anthropic's open-source sandbox-runtime. For multi-agent setups, nono adds kernel enforcement plus credential proxying on macOS and Linux, fence adds command deny rules and SSH filtering, and yolobox remains the simplest container-based option.

Do I need a local sandbox if my agent has built-in safety?

Less than before. Claude Code's /sandbox and Codex's default Landlock + seccomp sandbox now provide real OS-level isolation, not just permission prompts. Standalone tools still earn their keep if you run multiple agents (one policy across all of them), want credentials brokered outside the sandbox, or need policy features like command deny rules the first-party sandboxes lack.

How do local sandboxes differ from cloud sandboxes like E2B?

Local sandboxes run on your machine and protect your filesystem. Cloud sandboxes (E2B, Daytona, Sprites) provide remote isolated environments via API. Local is for individual developers; cloud is for production agent infrastructure.

Which sandbox works on macOS?

anthropic-sandbox-runtime, nono, and fence all use Apple's Seatbelt natively on macOS. yolobox (Docker/Podman/Apple Containers) and shai (Docker) work via containers. litterbox and landrun are Linux-only, and yolo-cage's macOS path (QEMU) is experimental and unmaintained.

Executive Summary

AI coding agents are most productive when you let them run without permission prompts — but one bad command and your home directory is gone. A category of local sandboxing tools emerged to solve this: run your agent in full-auto mode while keeping your machine safe.

Since this comparison's March 2026 snapshot, the story has changed. First-party sandboxing arrived. Anthropic open-sourced sandbox-runtime — the isolation layer behind Claude Code's /sandbox tool, which cut permission prompts by 84% in internal testing — and Codex ships a Landlock + seccomp sandbox by default. The agent vendors are eating the niche the standalone wrappers were built to fill, and the original wave shows it: three of the five tools profiled in March are now dormant or quiet.

This is distinct from cloud sandbox platforms (E2B, Daytona, Sprites) which provide remote isolated environments via API. Local sandboxes run on your machine, wrapping your existing agent CLI in a protective layer.

8 tools compared: anthropic-sandbox-runtime, nono, landrun, fence, yolobox, litterbox, yolo-cage, and shai — plus Codex's built-in sandbox.

Key findings:

The first-party entrant leads: Anthropic's sandbox-runtime (4,388 stars) is the reference implementation the rest of the category now defines itself against
The strongest independent is nono (2,643 stars) — kernel enforcement on both macOS and Linux, plus a credential-injection proxy that keeps API keys outside the sandbox entirely, from Sigstore creator Luke Hinds
The market consolidated around OS primitives — Landlock, Seatbelt, bubblewrap — over containers; every healthy new entrant is container-free
3 of the 5 original members are dormant or quiet: landrun (stalled October 2025), yolo-cage (stalled February 2026), shai (maintenance mode, 40 stars)
yolobox is the healthy original — 603 stars, weekly releases, and a pivot toward being a local parallel-agent workbench
Survivors differentiate on cross-agent support and credential brokering — the two things first-party sandboxes structurally won't provide

The Problem

Every major AI coding agent now ships with a "yolo mode":

Agent	Full-Auto Flag
Claude Code	`--dangerously-skip-permissions`
Codex	`--ask-for-approval never`
Gemini CLI	`--yolo`
Copilot	`--yolo`

These modes are where agents are most productive: no interruptions, no decision fatigue, no waiting. But they're also where agents are most dangerous. A misinterpreted prompt can run `rm -rf ~`, overwrite your SSH keys, push to main, or exfiltrate secrets through a dependency install.

Local sandboxes exist to decouple agent autonomy from machine safety. Let the agent go wild inside the sandbox. Keep your actual system untouchable.

Status Check

The defining update of this refresh: the original wave is splitting into survivors and casualties.

Tool	Status (June 11, 2026)	Evidence
anthropic-sandbox-runtime	🟢 Active (beta)	v0.0.54 (June 4), repo pushed June 11, but labeled "Beta Research Preview," with 118 open issues and contributors reporting slow responses to community PRs
nono	🟢 Healthy	v0.62.0 (June 7), same-day commits, weekly releases
fence	🟢 Healthy	~60 releases since December 2025, pushed June 11
yolobox	🟢 Healthy	v0.18.4 (June 9), weekly releases, 0 open issues
litterbox	🟢 Active	v0.6.0 (May 27), pushed June 10 — small but steady
shai	🟡 Quiet	Maintenance-grade monthly releases (v0.0.11, June 1); 40 stars, ~455 monthly npm downloads
yolo-cage	🔴 Stalled	No commits since February 1, 2026 — two days after launch
landrun	🔴 Stalled	No commits since October 1, 2025; no release since v0.1.14 (April 2025); frozen at Landlock ABI v5 while the kernel reached v9

A dormant security tool is worse than a dormant utility: blocklists, secret patterns, and kernel ABI support all rot. landrun still works — the enforcement lives in the kernel — but it exposes none of the Landlock v6–v9 capabilities (scoped signals, audit logging, Unix socket controls) the kernel has since shipped.
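To see how far a host kernel has moved past what a pinned tool supports, the Landlock ABI version can be queried directly. The following is a minimal, Linux-only sketch and not code from landrun or nono: it calls `landlock_create_ruleset` with the version flag through `ctypes`. The syscall number and flag value are taken from the Linux UAPI headers. The feature table is a partial summary that stops at v7 and should be checked against the kernel documentation, and `features_beyond` is a hypothetical helper name.

```python
import ctypes
import sys

# Values from the Linux UAPI headers (444 on x86_64 and arm64).
SYS_LANDLOCK_CREATE_RULESET = 444
LANDLOCK_CREATE_RULESET_VERSION = 1

# Partial summary of what each ABI version added; verify against kernel docs.
ABI_FEATURES = {
    1: "filesystem access rules",
    2: "file rename/link across directories (REFER)",
    3: "file truncation control",
    4: "TCP bind/connect rules",
    5: "ioctl control on device files",
    6: "scoping of signals and abstract Unix sockets",
    7: "audit logging",
}


def landlock_abi_version() -> int:
    """Return the kernel's Landlock ABI version, or 0 if unavailable."""
    if not sys.platform.startswith("linux"):
        return 0
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    result = libc.syscall(
        ctypes.c_long(SYS_LANDLOCK_CREATE_RULESET),
        ctypes.c_void_p(None),  # attr = NULL: we only ask for the version
        ctypes.c_size_t(0),     # size = 0
        ctypes.c_uint32(LANDLOCK_CREATE_RULESET_VERSION),
    )
    # A negative result means Landlock is disabled, unsupported, or blocked.
    return int(result) if result > 0 else 0


def features_beyond(tool_abi: int, kernel_abi: int) -> list[str]:
    """List known features the kernel offers that a tool pinned at tool_abi cannot use."""
    return [
        ABI_FEATURES[v]
        for v in range(tool_abi + 1, kernel_abi + 1)
        if v in ABI_FEATURES
    ]


if __name__ == "__main__":
    kernel = landlock_abi_version()
    print(f"kernel Landlock ABI: {kernel or 'unavailable'}")
    for feature in features_beyond(5, kernel):
        print(f"not reachable from an ABI v5 tool: {feature}")
```

A result of 0 does not distinguish "kernel too old" from "Landlock disabled at boot"; a real tool would inspect `errno` to tell them apart.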

Comparison Matrix

Tool	Language	Stars	Isolation	macOS	Network Control	Status
anthropic-sandbox-runtime	TypeScript	4,388	Seatbelt (macOS), bubblewrap (Linux)	✅	Domain-allowlist proxy	🟢 Active (beta)
nono	Rust	2,643	Landlock (Linux), Seatbelt (macOS)	✅	Allowlist proxy + credential injection	🟢 Healthy
landrun	Go	2,217	Landlock LSM	❌ Linux only	TCP bind/connect rules	🔴 Stalled (Oct 2025)
fence	Go	794	Seatbelt (macOS), bubblewrap (Linux)	✅	Default-deny + domain allowlist proxy	🟢 Healthy
yolobox	Go	603	Docker/Podman/Apple Containers	✅	All-or-nothing	🟢 Healthy
litterbox	Rust	115	Podman + Landlock	❌ Linux only	Limited	🟢 Active
yolo-cage	Python	110	Vagrant VM + K8s	⚠️ Experimental	Egress proxy + secret scanning	🔴 Stalled (Feb 2026)
shai	Go	40	Docker	✅	Allowlist per resource set	🟡 Quiet
Codex built-in	—	—	Landlock + seccomp (Linux), Seatbelt (macOS)	✅	Workspace-scoped	🟢 First-party default

Two Architectures

The March version of this report framed the market as permissive vs restrictive. That axis still exists, but the more important split is now containers vs OS primitives — and the market picked a winner.

Container-based (yolobox, shai, litterbox, yolo-cage): wrap the agent in Docker/Podman/VM isolation. Broader containment, at the cost of a runtime dependency, image management, and (VMs aside) a shared-kernel trust boundary.

Container-free OS primitives (anthropic-sandbox-runtime, nono, fence, landrun, both built-in sandboxes): apply Seatbelt, Landlock, bubblewrap, or seccomp policies directly to the process. No daemon, no images, near-zero overhead, restrictions inherited by every child process — and on macOS, nothing to install.

Every healthy new entrant since late 2025 is container-free, and both agent vendors chose OS primitives for their first-party sandboxes. The permissive/restrictive philosophy debate hasn't gone away — yolobox is still "let it rip inside the box" while nono and fence are default-deny — but the architectural question is settled.
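To make "apply the policy directly to the process" concrete, here is a minimal sketch of what a container-free wrapper might assemble on Linux. It builds a bubblewrap command line that leaves the filesystem readable, makes only the workspace writable, and drops network access. It is an illustration under assumptions, not how srt or fence actually invoke bubblewrap; `bwrap_argv` and the example workspace path are hypothetical, while the `bwrap` flags themselves are standard bubblewrap options.

```python
import shlex


def bwrap_argv(workspace: str, command: list[str], allow_network: bool = False) -> list[str]:
    """Build a bubblewrap command line: read-only root, writable workspace."""
    argv = [
        "bwrap",
        "--ro-bind", "/", "/",           # whole filesystem visible, but read-only
        "--bind", workspace, workspace,  # only the workspace stays writable
        "--dev", "/dev",                 # minimal device nodes
        "--proc", "/proc",               # fresh procfs
        "--tmpfs", "/tmp",               # private scratch space
        "--die-with-parent",             # sandbox exits when the wrapper exits
    ]
    if not allow_network:
        argv.append("--unshare-net")     # new, empty network namespace
    return argv + ["--", *command]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch is safe anywhere.
    print(shlex.join(bwrap_argv("/home/dev/project", ["claude"])))
```

Because the restrictions are namespace and mount properties of the wrapped process, anything it spawns starts inside the same view, which is the inheritance property described above.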

Product Profiles

anthropic-sandbox-runtime (srt)

"Enforcing filesystem and network restrictions on arbitrary processes at the OS level, without requiring a container."

The first-party entrant that pressures the whole category. srt is the open-sourced isolation layer behind Claude Code's /sandbox (its sandboxed Bash tool): Seatbelt profiles on macOS, bubblewrap on Linux, plus an HTTP/SOCKS5 proxy that enforces a domain allowlist on every child process. Anthropic reports an 84% reduction in permission prompts in internal testing.

Isolation: Seatbelt (macOS), bubblewrap + network namespaces (Linux/WSL2) — no containers
Defaults: reads allowed (deny rules available), writes denied, network denied
Traction: 4,388 stars, 320 forks in under eight months; Apache-2.0
Caveats: "Beta Research Preview" in the anthropic-experimental org; TLS-blind proxy (domain fronting, DNS exfiltration demonstrated); credentials readable by default without explicit denyRead rules; 118 open issues
Best for: Claude Code users (enable /sandbox) and teams wanting the reference architecture as a library

Full profile →

nono

"Kernel-enforced capability sandbox for AI agents."

The strongest independent — and the fastest-growing tool in the niche, overtaking landrun's star count in a fraction of the time. Built by Luke Hinds (creator of Sigstore, co-founder of Stacklok) under his new company Always Further. Kernel enforcement via Landlock on Linux and Seatbelt on macOS, irreversible once applied and inherited by all child processes.

Isolation: Landlock (Linux, including WSL2), Seatbelt (macOS); native Windows planned
Credential injection proxy: API keys never enter the sandboxed process — a trusted proxy injects them into outbound requests from the OS keystore or 1Password
Extras: filesystem snapshots/rollback, cryptographic audit logs, Sigstore attestation of CLAUDE.md/SKILLS.md, profile registry (nono pull) for Claude Code, Codex, OpenCode and more
Traction: 2,643 stars, 184 forks since January 31, 2026; v0.62.0 (June 7); Apache-2.0
Caveats: pre-1.0 with 161 open issues; community discussion thinner than the star count suggests; enterprise pricing undefined
Best for: Multi-agent users on macOS/Linux where API-key exfiltration is the top threat
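The credential-injection idea can be sketched in a few lines. This is a simplified model under assumptions, not nono's implementation: `OutboundRequest`, `ROUTES`, and `inject_credentials` are hypothetical names, and the secrets dictionary stands in for the OS keystore or 1Password. The one external detail relied on is that Anthropic's API reads its key from the `x-api-key` header.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class OutboundRequest:
    host: str
    path: str
    headers: dict[str, str] = field(default_factory=dict)


# Hypothetical routing table: destination host -> (header to set, secret name).
ROUTES = {
    "api.anthropic.com": ("x-api-key", "anthropic"),
}


def inject_credentials(request: OutboundRequest, secrets: dict[str, str]) -> OutboundRequest:
    """Runs in the trusted proxy, outside the sandbox.

    The sandboxed agent sends requests with no key (or a placeholder);
    the proxy attaches the real key only for known destinations.
    """
    route = ROUTES.get(request.host)
    if route is None:
        return request  # unknown host: never attach a secret
    header_name, secret_name = route
    # Drop whatever the agent put in that header, then set the real value.
    headers = {k: v for k, v in request.headers.items() if k.lower() != header_name}
    headers[header_name] = secrets[secret_name]
    return OutboundRequest(request.host, request.path, headers)


if __name__ == "__main__":
    agent_request = OutboundRequest("api.anthropic.com", "/v1/messages", {"X-Api-Key": "placeholder"})
    forwarded = inject_credentials(agent_request, {"anthropic": "real-key-from-keystore"})
    print(sorted(forwarded.headers))
```

The design point is that the secret exists only in the proxy's address space: a compromised agent can still make requests through the proxy, but it has nothing to read from its own environment or disk and send elsewhere.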

Full profile →

fence

"Sandbox CLI commands with network/filesystem restrictions."

A Tusk spinout (now its own fencesandbox GitHub org) that openly credits Anthropic's sandbox-runtime as its inspiration and adds the policy features srt lacks: command deny rules (`rm -rf /`, `git push`), SSH command filtering, inbound port exposure, and per-agent templates for Claude Code, Codex, Amp, Gemini CLI, Copilot, OpenCode, and Factory Droid.

Isolation: Seatbelt (macOS), bubblewrap + socat (Linux) — container-free
Defaults: outbound network blocked, filesystem writes denied; JSONC config (fence.jsonc) with template inheritance
Traction: 794 stars and ~60 tagged releases in under six months (created December 18, 2025); Apache-2.0
Caveats: explicitly not a hostile-code boundary per its own security docs; proxy-based allowlisting fails on non-proxy-aware network stacks; no resource limits
Best for: Developers who want one prefix command (`fence -t code -- claude`) that puts default-deny policy around any agent or install script
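A command deny rule can be pictured as a token-prefix match against the command line. The sketch below is a hypothetical model of that idea, not fence's actual matching logic or config format; `DENY_RULES` and `is_denied` are illustrative names.

```python
import shlex

# Illustrative rules, taken from the examples mentioned above.
DENY_RULES = ["rm -rf /", "git push"]


def is_denied(command_line: str, deny_rules: list[str] = DENY_RULES) -> bool:
    """Return True if the command starts with the tokens of any deny rule."""
    try:
        tokens = shlex.split(command_line)
    except ValueError:
        return True  # unparsable input (e.g. unbalanced quotes): fail closed
    for rule in deny_rules:
        rule_tokens = shlex.split(rule)
        if tokens[: len(rule_tokens)] == rule_tokens:
            return True
    return False


if __name__ == "__main__":
    for cmd in ["git push origin main", "git status", "rm -rf /tmp/build"]:
        print(f"{cmd!r}: {'denied' if is_denied(cmd) else 'allowed'}")
```

Matching on whole tokens keeps `rm -rf /tmp/build` from tripping the `rm -rf /` rule. A naive matcher like this one is still easy to sidestep with `sh -c "git push"` or a shell alias, which is one reason command rules complement OS-level enforcement and do not replace it.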

Full profile →

yolobox

"Let your AI go full send. Your home directory stays home."

The healthiest of the original five. One command (`yolobox claude`) drops you into a Docker container with your project mounted, all agent CLIs pre-installed and pre-aliased to skip permissions, and full sudo inside. Since March it has been pivoting from safety net toward local parallel-agent workbench: fork isolation, clipboard bridge, runtime context manifest, and .localhost reverse-proxy integration.

Isolation: Docker/Podman/Apple Containers; home directory not mounted
Traction: 603 stars (up from 536 in March), v0.18.4 (June 9, 2026), still shipping roughly weekly, 0 open issues
Network: full access by default, no_network for air-gapped mode — still no fine-grained allowlists
Best for: Developers who want maximum agent productivity with a simple safety net, and increasingly, parallel local agents

Full profile →

litterbox

"Somewhat isolated development environments."

Still the only tool sandboxing your entire GUI dev environment: Wayland socket forwarding runs editors and agents inside a Podman container, and a custom SSH agent prompts before every key signing. Small but genuinely active — v0.6.0 shipped May 27, 2026, and the repo was pushed June 10.

Isolation: Podman containers + Landlock (Linux only)
SSH agent: per-key exposure with confirmation pop-ups before every signing operation
Traction: 115 stars (up from 66 in March) — slow, steady, early
Honest disclaimers: documents its own limits (shared kernel, clipboard access, audio risk) better than most tools document features
Best for: Linux developers who want their whole dev environment sandboxed, not just agents

Full profile →

shai

"Sandboxing shell for AI coding agents."

The "cellular development" pioneer — read-only workspace by default, opt-in write per subdirectory, and Resource Sets (named bundles of allowed HTTP destinations, mounts, ports, env vars) committed to the repo as team policy. The ideas remain genuinely differentiated; the traction hasn't followed.

Isolation: Docker containers, non-root, read-only mounts
Status: alive but quiet — maintenance-grade monthly releases (v0.0.11, June 1, 2026), 40 stars essentially unchanged since March, ~455 monthly npm downloads
Best for: Security-conscious teams who want per-component access control and accept real abandonment risk

Full profile →

yolo-cage

"AI coding agents that can't exfiltrate secrets or merge their own PRs."

The most security-focused architecture of the original wave: an egress proxy scanning all HTTP for secrets (`sk-ant-`, `AKIA`, `ghp_*`), branch isolation, blocked `gh pr merge`/`gh repo delete`, and TruffleHog pre-push scans. But development stopped two days after launch: no commits since February 1, 2026.

Isolation: Vagrant VM + MicroK8s pods (8 GB RAM, 4 CPUs)
Status: stalled at 110 stars; not archived, but the blocklists and secret patterns are rotting in place
Best for: Reading the architecture. The egress-proxy-plus-dispatcher design is worth studying; adopting it is not advisable
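For readers studying the design, the secret-scanning half of an egress proxy reduces to pattern matching on outbound bodies. The sketch below is an assumption-laden illustration, not yolo-cage's code: the prefixes come from the list above, but the exact regular expressions (character classes and lengths) are guesses based on commonly published token formats, and `find_secrets` is a hypothetical name.

```python
import re

# Assumed patterns; real scanners maintain far larger, regularly updated sets.
SECRET_PATTERNS = {
    "anthropic_api_key": re.compile(r"sk-ant-[A-Za-z0-9_\-]{20,}"),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
}


def find_secrets(body: str) -> list[str]:
    """Return the names of all secret patterns found in an outbound body."""
    return sorted(name for name, pattern in SECRET_PATTERNS.items() if pattern.search(body))


def should_block(body: str) -> bool:
    """An egress proxy would refuse to forward any request that matches."""
    return bool(find_secrets(body))


if __name__ == "__main__":
    sample = "POST data: token=" + "ghp_" + "a" * 36
    print(find_secrets(sample), should_block(sample))
```

This also shows why such a tool rots when unmaintained: every new token format a vendor introduces is invisible until someone adds a pattern for it.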

Full profile →

landrun

"Sandbox any Linux process using Landlock. No root. No containers."

The tool that proved kernel primitives were the right approach — then stopped. landrun wraps any Linux process in Landlock LSM filesystem and TCP rules with near-zero overhead, and at 2,217 stars it validated the entire container-free category. But there have been no commits since October 1, 2025 and no release since v0.1.14 (April 2025), leaving it frozen at Landlock ABI v5 while the kernel advanced to v9.

Isolation: Landlock LSM (kernel-level, no containers), Linux 5.13+
Status: stalled; what it does it still does well (enforcement is in the kernel), but nobody is extending it
Best for: Linux users who want the smallest auditable binary and accept an unmaintained dependency — nono now covers the same ground, actively, on two platforms

Full profile →

Codex Built-in Sandbox

"OS-enforced sandbox that limits what Codex can touch."

OpenAI's first-party answer: Codex CLI runs commands inside a Landlock + seccomp sandbox by default on Linux and Seatbelt on macOS, scoped to the workspace with configurable approval policies. Not a standalone tool — but together with Claude Code's /sandbox, it means both major agents now ship the protection this category was invented to bolt on.

Isolation Technologies Compared

Technology	Used By	Container Required	Root Required	macOS	Kernel Sharing
Seatbelt (sandbox-exec)	srt, nono, fence, Codex built-in	No	No	✅ (macOS-only)	Yes (same process)
Landlock LSM	landrun, nono, litterbox, Codex built-in	No	No	❌	Yes (same process)
bubblewrap	srt, fence	No	No	❌	Yes
seccomp	Codex built-in	No	No	❌	Yes
Docker	yolobox, shai	Yes	No (daemon does)	✅	Yes
Podman	litterbox	Yes	No (rootless)	❌	Yes
Vagrant/VM	yolo-cage	Yes (VM)	No	⚠️	No

Seatbelt is what makes the new generation cross-platform: srt, nono, and fence all use Apple's built-in sandbox-exec on macOS — nothing to install. The shared risk: Apple has long marked sandbox-exec deprecated.
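As a rough picture of what "using Seatbelt" means, the sketch below generates a small profile in the sandbox profile language and the matching `sandbox-exec -p` invocation. It is a hedged illustration, not the profile srt, nono, or fence ship: the profile language is not officially documented by Apple, the allow-then-deny-then-allow layout follows a pattern seen in other open-source build sandboxes, and `seatbelt_profile` and `sandbox_exec_argv` are hypothetical helpers.

```python
def seatbelt_profile(writable_dirs: list[str], allow_network: bool = False) -> str:
    """Generate a profile: reads allowed, writes limited to the given dirs."""
    lines = [
        "(version 1)",
        "(allow default)",      # start permissive, then carve out denials
        "(deny file-write*)",   # no writes anywhere...
    ]
    for directory in writable_dirs:
        escaped = directory.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'(allow file-write* (subpath "{escaped}"))')  # ...except here
    if not allow_network:
        lines.append("(deny network*)")
    return "\n".join(lines)


def sandbox_exec_argv(profile: str, command: list[str]) -> list[str]:
    """Command line that applies the profile to a process and its children."""
    return ["sandbox-exec", "-p", profile, *command]


if __name__ == "__main__":
    profile = seatbelt_profile(["/Users/dev/project"])
    print(profile)
    print(sandbox_exec_argv(profile, ["claude"])[:2])
```

In practice, writable paths likely need to be resolved first (for example with `os.path.realpath`), since locations such as `/tmp` are symlinks on macOS.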

Landlock is the kernel-native Linux option — irreversible, inherited by children, unprivileged. landrun pioneered it; nono carries it forward actively.

Proxy-based network filtering (srt, nono, fence) is the common new pattern: the OS blocks direct outbound connections, and a local HTTP/SOCKS proxy lets only allowlisted domains through. It's also the common weakness: srt's docs acknowledge domain-fronting and DNS-tunneling bypasses.
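The allowlist decision itself is small, which is also why it is limited. The sketch below shows a hypothetical proxy deciding on an HTTPS `CONNECT` request using only the hostname in the request line; `host_allowed` and `handle_connect` are illustrative names, and the wildcard syntax is an assumption rather than any specific tool's config format.

```python
def host_allowed(host: str, allowlist: list[str]) -> bool:
    """Exact match, or '*.example.com' matching any subdomain of example.com."""
    host = host.lower().rstrip(".")
    for pattern in allowlist:
        pattern = pattern.lower()
        if pattern.startswith("*."):
            if host.endswith(pattern[1:]):  # ".example.com" suffix check
                return True
        elif host == pattern:
            return True
    return False


def handle_connect(request_line: str, allowlist: list[str]) -> int:
    """Return the HTTP status a filtering proxy would answer with."""
    try:
        method, target, _version = request_line.split()
    except ValueError:
        return 400  # malformed request line
    host, _, _port = target.rpartition(":")
    if method == "CONNECT" and host_allowed(host, allowlist):
        return 200  # tunnel opened; the proxy sees only ciphertext from here on
    return 403


if __name__ == "__main__":
    allow = ["github.com", "*.github.com", "registry.npmjs.org"]
    for line in ["CONNECT api.github.com:443 HTTP/1.1", "CONNECT example.com:443 HTTP/1.1"]:
        print(line, "->", handle_connect(line, allow))
```

Once the tunnel is open, a proxy that does not terminate TLS cannot see which site the encrypted request actually addresses, which is the gap domain fronting exploits.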

Containers and VMs still provide the broadest containment (separate filesystem, PID namespace, resource limits) — the case for yolobox's approach — at the cost of a runtime dependency.

Differentiators That Matter Now

With OS-level isolation becoming table stakes, the surviving tools compete on what's layered around it:

Differentiator	Who Has It	Why It Matters
Cross-agent support	nono, fence, yolobox	First-party sandboxes serve one agent; most power users run several
Credential brokering	nono (injection proxy)	Keys outside the sandbox can't be exfiltrated — structurally stronger than path blocking
Command deny rules	fence	Block `git push` or `rm -rf /` by policy, not hope
SSH protection	fence (command filtering), litterbox (per-key signing prompts)	Git operations without exposing the whole agent socket
Parallel-agent workflow	yolobox (fork isolation), nono (multiplexing)	The emerging use case: many agents, one machine
Attestation & audit	nono (Sigstore signing of CLAUDE.md, audit logs)	Prompt-injection supply chain is the next threat surface
Team-sharable policy	shai (.shai/config.yaml), fence (fence.jsonc), nono (profile registry)	Security policy that travels with the repo

When to Use What

By Situation

Situation	Best Tool	Why
You use Claude Code	Built-in `/sandbox` (srt)	Already installed; 84% fewer prompts in Anthropic's internal testing
You use Codex	Built-in sandbox	Landlock + seccomp on by default
Multiple agents, credential paranoia	nono	One policy across agents; keys never enter the sandbox
Wrap anything in default-deny, zero setup	fence	One prefix command, command deny rules, agent templates
Maximum agent productivity, simple safety	yolobox	Full sudo in a container, home dir not mounted
Entire IDE sandboxed (Linux)	litterbox	Wayland forwarding + confirming SSH agent
Per-component team policy	shai	Resource Sets in version control — if you accept the project risk

By Platform

Platform	Options
macOS	srt, nono, fence (all Seatbelt, container-free); yolobox, shai (containers)
Linux	Everything
Windows	WSL2 only (srt, nono, fence via WSL); native support is nobody's story yet

The Built-in Sandbox Question, Answered

The March version of this report asked whether built-in agent sandboxes would make standalone tools unnecessary, and concluded "not yet." Three months later, the answer is: mostly, for single-agent users.

Claude Code now ships /sandbox — OS-enforced Seatbelt/bubblewrap isolation with proxy-based network allowlisting, generally available, with managed-settings enforcement for organizations — alongside its permission system
Codex runs Landlock + seccomp sandboxing by default on Linux and Seatbelt on macOS

What's left for standalone tools is real but narrower:

Cross-agent coverage — first-party sandboxes serve their own agent. If you run Claude Code, Codex, and Gemini CLI, only a third-party tool gives you one policy across all of them.
Credential brokering — no first-party sandbox keeps API keys outside the agent process; nono's injection proxy does.
Policy depth — srt has no command deny rules, no SSH filtering, and read-denylist-only file policy; fence and forks exist precisely to fill those gaps.
Maintenance trust — srt is a "Beta Research Preview" with a known pattern of slow PR responses; some teams will prefer a tool whose only job is sandboxing.

The predicted endgame — built-ins get good enough for most users, standalone tools survive for power users — is no longer a prediction. It's the current market structure.

Market Context

The wincent "Ask HN: The new wave of AI agent sandboxes?" thread cataloged roughly 27 sandbox tools launched within a year — a genuine gold rush. The consolidation since is visible in this report's own membership: of the five tools profiled in March 2026, two are stalled, one is in maintenance mode, and the survivors are the ones that found a differentiated reason to exist beyond "isolation" — which the agent vendors now ship themselves.

The local sandbox category remains distinct from cloud sandboxes (E2B, Daytona, Sprites, Modal):

	Local Sandboxes	Cloud Sandboxes
Runs on	Your machine	Remote servers
Protects	Your filesystem	Multi-tenant isolation
Use case	Individual dev	Production agent infra
Billing	Free (open source)	Usage-based
Networking	Host network	Isolated network
Persistence	Your disk	Ephemeral or managed

Most developers will use both: local sandbox for development, cloud sandbox for production.

Bottom Line

Tool	Stars	Status	Best For
anthropic-sandbox-runtime	4,388	🟢 Active (beta)	Claude Code users; the reference architecture
nono	2,643	🟢 Healthy	Cross-agent kernel enforcement + credential brokering
landrun	2,217	🔴 Stalled (Oct 2025)	Reading; nono absorbed its niche
fence	794	🟢 Healthy	Default-deny wrapper with command/SSH policy
yolobox	603	🟢 Healthy	Simplest full-auto experience; parallel-agent workbench
litterbox	115	🟢 Active	Linux GUI dev environment isolation
yolo-cage	110	🔴 Stalled (Feb 2026)	Studying the egress-proxy architecture
shai	40	🟡 Quiet	Per-component team policy, with abandonment risk
Codex built-in	—	🟢 First-party	Codex users — it's already on

The local sandbox space is no longer early and fragmented; it's consolidating, fast. First-party sandboxing from Anthropic and OpenAI made OS-level isolation the default expectation, most of the original wrapper wave is dormant or quiet, and the tools still growing (nono, fence, yolobox) each picked a job the built-ins won't do: cross-agent policy, credential brokering, parallel-agent workflows.

The agents didn't get safe enough to make sandboxes unnecessary. The sandboxes got absorbed into the agents — and what survives outside them is the part the agent vendors can't or won't build.

Sources