Runloop | Ry Walker Research

Key takeaways

Git-style disk snapshots enable reproducible agent development and experimentation
Built-in SWE-bench and public benchmark support for agent evaluation at scale
Custom bare-metal hypervisor; the company claims 2x faster vCPUs and ~100 ms command execution
Pricing went public in 2026 — Basic $0/mo, Pro $250/mo plus metered compute at $0.108/CPU-hr
Proven at scale: Trajectory ran 10,000 concurrent devboxes on Runloop (May 2026)

FAQ

What is Runloop?

Runloop provides devbox infrastructure for AI coding agents with disk snapshots, benchmark integration, and enterprise security for building and evaluating agents.

How is Runloop different from E2B?

Runloop offers disk snapshots (git for sandboxes), built-in SWE-bench, and ARM support. E2B has a larger ecosystem and Fortune 100 adoption.

How much does Runloop cost?

Pricing is now public: Basic is $0/month plus usage, Pro is $250/month plus usage, and Enterprise is custom. Compute is metered at $0.108/CPU-hr and $0.0252/GB-hr memory, with a $50 free trial credit.

Who competes with Runloop?

E2B, Daytona, Modal, CodeSandbox SDK, and Fly.io Sprites are direct competitors.

Executive Summary

Runloop provides devbox infrastructure specifically designed for AI coding agents, with a focus on benchmarking, evaluation, and reproducible development. The platform's git-style disk snapshots and built-in SWE-bench integration make it a natural fit for teams developing and evaluating AI coding agents. As of June 2026, pricing is fully public and usage-based, and the platform has demonstrated scale with customer Trajectory running 10,000 concurrent devboxes.

Attribute	Value
Company	Runloop AI
Founded	2024
Funding	$7M (Seed, July 2025)
Employees	~12
Headquarters	San Francisco, CA

Product Overview

Runloop was built to solve the specific infrastructure needs of AI coding agent development. The platform provides "devboxes" — secure, sandboxed environments where agents can execute code, run tests, and interact with git repositories.

Devboxes reached general availability in May 2025. In July 2025 the company raised a $7M seed round led by The General Partnership, with participation from Blank Ventures. Founder Jonathan Wall previously co-founded Google Wallet and Index (acquired by Stripe), a track record that signals enterprise ambitions. In May 2026, customer Trajectory announced that it had run 10,000 concurrent devboxes on Runloop for continual-learning workloads, the platform's most significant public proof of scale to date.

Key Capabilities

Capability	Description
Disk Snapshots	Git-style snapshot and branch from sandbox disk state
SWE-Bench Integration	Built-in support for running SWE-bench and other benchmarks
Custom Blueprints	Team-shared templates with pre-configured environments
Repo Connections	Automatic environment inference for git repositories
Browser & Computer Use	Headless browser and computer-use capabilities for web interaction
Docker Support	Docker Compose and nested container configurations inside devboxes
Suspend/Resume	Minimize costs with pause/resume for bursty workloads (Pro tier and above)
Axon	Agent coordination primitive, metered at $0.006/axon-hr
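As a rough mental model of "git for disk" (not Runloop's actual implementation or SDK, which this profile does not document), snapshot-and-branch can be pictured as a tree of copy-on-write deltas: each snapshot stores only the disk blocks written since its parent, so branching two experiments from the same state is cheap and neither branch can disturb the other. The `Snapshot` class and `restore` function below are hypothetical names invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical snapshot: disk blocks written since `parent`, plus a label."""
    label: str
    delta: dict[int, bytes]            # block index -> contents written since parent
    parent: Optional[Snapshot] = None

    def branch(self, label: str, writes: dict[int, bytes]) -> Snapshot:
        # Branching creates a new child; the parent snapshot is never modified.
        return Snapshot(label=label, delta=dict(writes), parent=self)


def restore(snap: Snapshot) -> dict[int, bytes]:
    """Rebuild the full disk image by replaying deltas from the root to `snap`."""
    chain = []
    node: Optional[Snapshot] = snap
    while node is not None:
        chain.append(node)
        node = node.parent
    disk: dict[int, bytes] = {}
    for s in reversed(chain):          # oldest first, so newer writes win
        disk.update(s.delta)
    return disk


# Two agent experiments branched from the same base state (illustrative data).
base = Snapshot("repo-cloned", {0: b"src", 1: b"tests"})
fix_a = base.branch("agent-a", {1: b"tests-patched-a"})
fix_b = base.branch("agent-b", {2: b"new-file-b"})
```

Because snapshots are immutable in this model, rolling back is simply restoring an earlier node, and `base` still restores to its original contents after both branches diverge.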

Product Surfaces / Editions

Surface	Description	Availability
Python SDK	Primary SDK for devbox control	GA
TypeScript SDK	TypeScript bindings	GA
CLI	Command-line management tools	GA
Dashboard	Web UI for monitoring and management	GA
Public Benchmarks	Hosted SWE-bench, SWE-bench Verified, SWE-Smith, BigCodeBench, DS-1000, Terminal-Bench, HumanEval	GA
Custom Benchmarks	Build and run private evals (Pro tier and above)	GA

Technical Architecture

Runloop uses a custom bare-metal hypervisor optimized for AI agent workloads, claiming 2x faster vCPUs than standard cloud VMs. The architecture supports both ephemeral and stateful use cases with disk snapshot capabilities.

┌─────────────────────────────────────────┐
│          Runloop Platform               │
├─────────────────────────────────────────┤
│  ┌──────────┐  ┌──────────┐             │
│  │  Devbox  │  │  Devbox  │    ...      │
│  │  (μVM)   │  │  (μVM)   │             │
│  └────┬─────┘  └────┬─────┘             │
│       │             │                   │
│  ┌────┴─────────────┴─────┐             │
│  │   Snapshot Store       │             │
│  │   (Git for Disk)       │             │
│  └────────────────────────┘             │
│                                         │
│  ┌────────────────────────────────┐     │
│  │  Custom Bare-Metal Hypervisor  │     │
│  │  (2x faster vCPUs)             │     │
│  └────────────────────────────────┘     │
└─────────────────────────────────────────┘

Key Technical Details

Aspect	Detail
Isolation	Custom hypervisor (hardware-level)
vCPU Performance	Claimed 2x faster than standard cloud VMs
Command Latency	~100 ms per command (vendor figure)
Persistence	Stateful with disk snapshots
ARM Support	Full arm64 and x86 support
Open Source	No (proprietary platform)
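The vCPU and latency figures above come from Runloop's own materials, so teams evaluating the platform may want to time commands on their own workloads. A minimal sketch of such a harness follows; `run_command` is a hypothetical local stand-in built on `subprocess`, not a Runloop SDK call, and would be replaced by whatever execute call your sandbox client provides.

```python
import statistics
import subprocess
import sys
import time
from typing import Callable


def run_command(cmd: list[str]) -> None:
    """Hypothetical stand-in executor: runs cmd locally. Swap in your sandbox client call."""
    subprocess.run(cmd, check=True, capture_output=True)


def measure_latency(execute: Callable[[], None], runs: int = 20) -> dict[str, float]:
    """Time `execute` `runs` times; return p50, p95, and max in milliseconds."""
    if runs < 2:
        raise ValueError("need at least 2 runs to compute percentiles")
    samples_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        execute()
        samples_ms.append((time.perf_counter() - start) * 1000)
    # 99 cut points; "inclusive" keeps estimates within the observed range.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": statistics.median(samples_ms), "p95": cuts[94], "max": max(samples_ms)}


if __name__ == "__main__":
    stats = measure_latency(lambda: run_command([sys.executable, "-c", "pass"]), runs=10)
    print({k: round(v, 1) for k, v in stats.items()})
```

Reporting p95 alongside the median matters for agent loops, since a handful of slow commands can dominate total episode time even when the typical command is fast.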

Strengths

Disk snapshots — Git-style snapshot and branch enables reproducible experiments and rollback
SWE-bench integration — One-click benchmark execution; compare against published baselines
Performance — Custom hypervisor with 2x faster vCPUs and 100ms command execution
ARM support — Only provider with full arm64 support alongside x86
Framework agnostic — Works with any agent framework (LangChain, AutoGPT, custom)
Enterprise-ready — SOC2, HIPAA, GDPR compliant; VPC deployment available
Suspend/resume — Cost optimization for bursty agent workloads

Cautions

Newer platform — Founded 2024; less battle-tested than E2B at scale
Smaller ecosystem — Fewer integrations and community resources than market leaders
Not open source — Proprietary platform; no self-hosting option currently
Benchmark-focused — Stronger for evaluation use cases than general agent deployment
No GPU support — CPU-only; ML workloads requiring GPUs need Modal
Limited third-party mindshare — Prominent 2026 sandbox comparisons (e.g., Northflank's) omit Runloop entirely, suggesting weaker awareness outside the eval niche
Pro tier base fee of $250/month, charged on top of compute, is required to unlock suspend/resume, repo connections, and custom benchmarks

What Developers Say

No substantive third-party developer commentary on Runloop was found on Hacker News, Reddit, or X as of June 2026 — HN search returns only unrelated uses of "runloop" (iOS NSRunLoop, event loops), and coverage is dominated by Runloop's own blog and press releases. The strongest independent signal is a customer announcement: Trajectory publicly stated it ran 10,000 concurrent devboxes on Runloop for continual-learning workloads. The absence of organic community discussion — and omission from major 2026 sandbox comparison articles — is itself a data point: Runloop's traction appears concentrated in direct enterprise relationships rather than bottoms-up developer adoption.

Pricing & Licensing

Pricing became fully public in 2026 (previously contact-only).

Tier	Price	Includes
Free Trial	$50 credits	Full Pro features; limits: 3 devboxes, 5 blueprints, 10 snapshots; no credit card
Basic	$0/mo + usage	Public benchmarks, devboxes, blueprints, snapshots, 100 GB free storage, email support
Pro	$250/mo + usage	Suspend/resume, repo connections, custom benchmarks, beta access, 1 TB free storage, Slack support
Enterprise	Custom	Reinforcement fine-tuning, VPC deployment, priority support, custom storage

Compute rates: $0.108/CPU-hr, $0.0252/GB-hr memory, $0.00034236/GB-hr devbox storage; snapshot/blueprint/object storage at $0.000072/GB-hr.
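To turn these per-unit rates into a per-devbox figure, a back-of-the-envelope calculator is sketched below. It assumes "CPU-hr" means one vCPU for one hour and that charges scale linearly with size, and it ignores the free storage allowances (100 GB on Basic, 1 TB on Pro); the function names are hypothetical.

```python
# Published per-unit rates from the pricing section (USD).
CPU_PER_HR = 0.108             # assumed: per vCPU-hour
MEM_PER_GB_HR = 0.0252         # per GB of memory per hour
DISK_PER_GB_HR = 0.00034236    # devbox storage, per GB-hour
SNAPSHOT_PER_GB_HR = 0.000072  # snapshot/blueprint/object storage, per GB-hour
HOURS_PER_MONTH = 730          # 8,760 hours / 12 months


def devbox_hourly_cost(vcpus: float, mem_gb: float, disk_gb: float) -> float:
    """Estimated USD per hour for one running devbox (free allowances ignored)."""
    return vcpus * CPU_PER_HR + mem_gb * MEM_PER_GB_HR + disk_gb * DISK_PER_GB_HR


def snapshot_monthly_cost(snapshot_gb: float, hours: float = HOURS_PER_MONTH) -> float:
    """Estimated USD to retain `snapshot_gb` of snapshots for `hours` (default ~1 month)."""
    return snapshot_gb * SNAPSHOT_PER_GB_HR * hours
```

For example, `devbox_hourly_cost(2, 4, 10)` gives about $0.32 per hour (0.216 + 0.1008 + 0.0034), so a hypothetical fleet of 10,000 such devboxes would cost roughly $3,200 per hour of wall time, while keeping 50 GB of snapshots for a month costs about $2.63.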

Benchmark runs: Hosted evals priced per round, from $1.17 (HumanEval) to $18.66 (SWE-bench Verified).

Licensing model: Proprietary, usage-based with published rates

Hidden costs: Benchmark runs can consume significant compute; Pro's $250/month base fee applies before any usage
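The hidden-cost point can be made concrete with a rough monthly estimate that adds the tier's base fee, devbox compute, and hosted benchmark rounds. This sketch assumes the per-round benchmark prices are billed separately from devbox compute and omits storage; both assumptions should be checked against an actual invoice. `Usage` and `monthly_bill` are illustrative names.

```python
from dataclasses import dataclass

BASE_FEE = {"basic": 0.0, "pro": 250.0}  # monthly platform fee (USD); Enterprise is custom
CPU_PER_HR = 0.108                       # assumed: per vCPU-hour
MEM_PER_GB_HR = 0.0252                   # per GB of memory per hour


@dataclass
class Usage:
    devbox_hours: float   # total running hours summed across all devboxes
    vcpus: float          # vCPUs per devbox
    mem_gb: float         # memory (GB) per devbox
    benchmark_rounds: dict[str, tuple[int, float]]  # benchmark -> (rounds, USD per round)


def monthly_bill(tier: str, usage: Usage) -> float:
    """Base fee + compute + hosted eval rounds. Raises KeyError for custom (Enterprise) tiers."""
    compute = usage.devbox_hours * (usage.vcpus * CPU_PER_HR + usage.mem_gb * MEM_PER_GB_HR)
    evals = sum(rounds * price for rounds, price in usage.benchmark_rounds.values())
    return BASE_FEE[tier] + compute + evals


usage = Usage(
    devbox_hours=500, vcpus=2, mem_gb=4,
    benchmark_rounds={"SWE-bench Verified": (10, 18.66), "HumanEval": (20, 1.17)},
)
```

With this usage, the estimate is $368.40 on Basic and $618.40 on Pro (compute $158.40, evals $210.00), so for light usage the Pro base fee is the largest single line item.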

Competitive Positioning

Direct Competitors

Competitor	Differentiation
E2B	E2B has larger ecosystem and Fortune 100 adoption; Runloop has disk snapshots and SWE-bench
Daytona	Daytona has Computer Use and is open source; Runloop has a benchmarking focus
Modal	Modal has GPUs; Runloop is specialized for coding agent development
Sprites	Sprites has checkpoint/restore; Runloop has SWE-bench and agent tooling

When to Choose Runloop Over Alternatives

Choose Runloop when: You're building/evaluating coding agents and need disk snapshots or SWE-bench integration
Choose E2B when: You need proven enterprise scale or the largest ecosystem
Choose Daytona when: You need Computer Use or want open-source self-hosting
Choose Modal when: You need GPU access for ML workloads

Ideal Customer Profile

Best fit:

Teams building and evaluating AI coding agents
Research organizations running SWE-bench or custom benchmarks
Companies needing reproducible agent development environments
Organizations wanting ARM support for cost optimization
Enterprise teams requiring SOC2/HIPAA compliance with VPC deployment
RL/continual-learning teams needing thousands of concurrent environments

Poor fit:

General AI code execution (E2B may be simpler)
Computer Use/desktop automation needs (Daytona better fit)
GPU-required ML workloads (Modal better fit)
Hobbyists for whom Pro's $250/month base fee is prohibitive

Viability Assessment

Factor	Assessment
Financial Health	Moderate — $7M seed (July 2025); early stage; no new round announced as of June 2026
Market Position	Niche Leader — Strong in agent benchmarking; proven 10K-concurrent-devbox scale
Innovation Pace	Rapid — Active development, benchmark focus
Community/Ecosystem	Growing — Smaller but focused on agent builders
Long-term Outlook	Positive — Well-positioned for coding agent market

Runloop has carved out a niche in the AI coding agent development and evaluation space. The combination of disk snapshots and SWE-bench integration addresses real pain points for agent builders. The main risk is competition from better-funded platforms expanding into benchmarking.

Bottom Line

Runloop is purpose-built for AI coding agent development and evaluation. The disk snapshot feature (git for sandboxes) enables reproducible experiments that are difficult to achieve on ephemeral platforms like E2B. The built-in SWE-bench integration makes it a strong default for teams running agent evaluations.

The trade-off is a smaller ecosystem and limited third-party mindshare — though the earlier pricing-opacity concern is resolved (rates are now published), and the Trajectory deployment of 10,000 concurrent devboxes answers the scale question. For general code execution, E2B's larger ecosystem may be simpler. For agent development, benchmarking, and RL-style workloads, Runloop's specialized features add real value.

Recommended for: Teams building and evaluating AI coding agents who need reproducible environments, disk snapshots, and benchmark integration.

Not recommended for: General AI code execution use cases, Computer Use/desktop automation, or teams needing GPU access.

Outlook: Runloop is well-positioned to capture the growing agent development tools market, and its 2026 pivot toward RL/continual-learning workloads (Trajectory, reinforcement fine-tuning on Enterprise) opens a second growth vector. Watch for a Series A — the $7M seed is approaching a year old with no follow-on announced.

Research by Ry Walker Research

Sources