
Runloop

Runloop provides devbox infrastructure for AI coding agents with disk snapshots, SWE-bench integration, public usage-based pricing, and enterprise-grade security for agent development and evaluation.

Key takeaways

  • Git-style disk snapshots enable reproducible agent development and experimentation
  • Built-in SWE-bench and public benchmark support for agent evaluation at scale
  • Custom bare-metal hypervisor with claimed 2x faster vCPUs and ~100 ms command latency
  • Pricing went public in 2026 — Basic $0/mo, Pro $250/mo plus metered compute at $0.108/CPU-hr
  • Proven at scale: Trajectory ran 10,000 concurrent devboxes on Runloop (May 2026)

FAQ

What is Runloop?

Runloop provides devbox infrastructure for AI coding agents with disk snapshots, benchmark integration, and enterprise security for building and evaluating agents.

How is Runloop different from E2B?

Runloop offers disk snapshots (git for sandboxes), built-in SWE-bench, and ARM support. E2B has a larger ecosystem and Fortune 100 adoption.

How much does Runloop cost?

Pricing is now public: Basic is $0/month plus usage, Pro is $250/month plus usage, and Enterprise is custom. Compute is metered at $0.108/CPU-hr and $0.0252/GB-hr memory, with a $50 free trial credit.

Who competes with Runloop?

E2B, Daytona, Modal, CodeSandbox SDK, and Fly.io Sprites are direct competitors.

Executive Summary

Runloop provides devbox infrastructure specifically designed for AI coding agents, with a focus on benchmarking, evaluation, and reproducible development. The platform's git-style disk snapshots and built-in SWE-bench integration make it the go-to choice for teams developing and evaluating AI coding agents. As of June 2026, pricing is fully public and usage-based, and the platform has demonstrated scale with customer Trajectory running 10,000 concurrent devboxes.

| Attribute | Value |
|---|---|
| Company | Runloop AI |
| Founded | 2024 |
| Funding | $7M (Seed, July 2025) |
| Employees | ~12 |
| Headquarters | San Francisco, CA |

Product Overview

Runloop was built to solve the specific infrastructure needs of AI coding agent development. The platform provides "devboxes" — secure, sandboxed environments where agents can execute code, run tests, and interact with git repositories.

Devboxes reached general availability in May 2025. In July 2025 the company raised a $7M seed round led by The General Partnership, with participation from Blank Ventures. Founder Jonathan Wall previously co-founded Google Wallet and Index (acquired by Stripe), a track record that signals enterprise ambitions. In May 2026, customer Trajectory announced it had run 10,000 concurrent devboxes on Runloop for continual-learning workloads, the platform's most significant public proof of scale to date.

Key Capabilities

| Capability | Description |
|---|---|
| Disk Snapshots | Git-style snapshot and branch from sandbox disk state |
| SWE-Bench Integration | Built-in support for running SWE-bench and other benchmarks |
| Custom Blueprints | Team-shared templates with pre-configured environments |
| Repo Connections | Automatic environment inference for git repositories |
| Browser & Computer Use | Headless browser and computer-use capabilities for web interaction |
| Docker Support | Docker Compose and nested container configurations inside devboxes |
| Suspend/Resume | Minimize costs with pause/resume for bursty workloads (Pro tier and above) |
| Axon | Agent coordination primitive, metered at $0.006/axon-hr |

Product Surfaces / Editions

| Surface | Description | Availability |
|---|---|---|
| Python SDK | Primary SDK for devbox control | GA |
| TypeScript SDK | TypeScript bindings | GA |
| CLI | Command-line management tools | GA |
| Dashboard | Web UI for monitoring and management | GA |
| Public Benchmarks | Hosted SWE-bench, SWE-bench Verified, SWE-Smith, BigCodeBench, DS-1000, Terminal-Bench, HumanEval | GA |
| Custom Benchmarks | Build and run private evals (Pro tier and above) | GA |

Technical Architecture

Runloop uses a custom bare-metal hypervisor optimized for AI agent workloads, claiming 2x faster vCPUs than standard cloud VMs. The architecture supports both ephemeral and stateful use cases with disk snapshot capabilities.

┌─────────────────────────────────────────┐
│          Runloop Platform               │
├─────────────────────────────────────────┤
│  ┌──────────┐  ┌──────────┐             │
│  │  Devbox  │  │  Devbox  │    ...      │
│  │  (μVM)   │  │  (μVM)   │             │
│  └────┬─────┘  └────┬─────┘             │
│       │             │                   │
│  ┌────┴─────────────┴─────┐             │
│  │   Snapshot Store       │             │
│  │   (Git for Disk)       │             │
│  └────────────────────────┘             │
│                                         │
│  ┌────────────────────────────────┐     │
│  │  Custom Bare-Metal Hypervisor  │     │
│  │  (2x faster vCPUs)             │     │
│  └────────────────────────────────┘     │
└─────────────────────────────────────────┘
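The "Snapshot Store (Git for Disk)" layer is easiest to reason about as a tree of immutable states. The sketch below is a conceptual model only, assuming git-like semantics (snapshots never change, branching means restoring a snapshot and diverging from it). The class and method names (`Snapshot`, `SnapshotStore`, `snapshot`, `restore`, `lineage`) are hypothetical and are not Runloop's SDK; a dict of file contents stands in for real disk state.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Snapshot:
    snap_id: str
    files: Dict[str, str]         # path -> contents (stand-in for disk state)
    parent: Optional[str] = None  # snapshot this state was derived from


class SnapshotStore:
    """Hypothetical toy model: immutable snapshots linked to their parent."""

    def __init__(self) -> None:
        self._snaps: Dict[str, Snapshot] = {}

    def snapshot(self, files: Dict[str, str], parent: Optional[str] = None) -> str:
        snap_id = f"snap-{len(self._snaps) + 1}"
        # Copy so later edits in a live sandbox cannot rewrite history.
        self._snaps[snap_id] = Snapshot(snap_id, dict(files), parent)
        return snap_id

    def restore(self, snap_id: str) -> Dict[str, str]:
        # Fresh copy per restore: branching is "restore, then diverge".
        return dict(self._snaps[snap_id].files)

    def lineage(self, snap_id: Optional[str]) -> List[str]:
        # Walk parent links back to the root, newest first.
        chain: List[str] = []
        while snap_id is not None:
            chain.append(snap_id)
            snap_id = self._snaps[snap_id].parent
        return chain


store = SnapshotStore()
base = store.snapshot({"app.py": "v0"})

# Two agent attempts branch from the same known-good state.
attempt_a = store.restore(base)
attempt_a["app.py"] = "fix-a"
snap_a = store.snapshot(attempt_a, parent=base)

attempt_b = store.restore(base)
attempt_b["app.py"] = "fix-b"
snap_b = store.snapshot(attempt_b, parent=base)

print(store.lineage(snap_b))          # ['snap-3', 'snap-1']
print(store.restore(base)["app.py"])  # v0
```

Because neither attempt can mutate `base`, a failed run can be discarded and retried from the identical starting state, which is the reproducibility property the profile attributes to disk snapshots.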

Key Technical Details

| Aspect | Detail |
|---|---|
| Isolation | Custom hypervisor (hardware-level) |
| vCPU Performance | Claimed 2x faster than standard cloud VMs |
| Command Latency | ~100 ms per command |
| Persistence | Stateful, with disk snapshots |
| ARM Support | Full arm64 and x86 support |
| Open Source | No (proprietary platform) |

Strengths

  • Disk snapshots — Git-style snapshot and branch enables reproducible experiments and rollback
  • SWE-bench integration — One-click benchmark execution; compare against published baselines
  • Performance — Custom hypervisor with 2x faster vCPUs and 100ms command execution
  • ARM support — Only provider with full arm64 support alongside x86
  • Framework agnostic — Works with any agent framework (LangChain, AutoGPT, custom)
  • Enterprise-ready — SOC2, HIPAA, GDPR compliant; VPC deployment available
  • Suspend/resume — Cost optimization for bursty agent workloads

Cautions

  • Newer platform — Founded 2024; less battle-tested than E2B at scale
  • Smaller ecosystem — Fewer integrations and community resources than market leaders
  • Not open source — Proprietary platform; no self-hosting option currently
  • Benchmark-focused — Stronger for evaluation use cases than general agent deployment
  • No GPU support — CPU-only; ML workloads requiring GPUs need Modal
  • Limited third-party mindshare — Prominent 2026 sandbox comparisons (e.g., Northflank's) omit Runloop entirely, suggesting weaker awareness outside the eval niche
  • Pro tier base fee — Suspend/resume, repo connections, and custom benchmarks require Pro, which charges a $250/month platform fee before any compute

What Developers Say

No substantive third-party developer commentary on Runloop was found on Hacker News, Reddit, or X as of June 2026 — HN search returns only unrelated uses of "runloop" (iOS NSRunLoop, event loops), and coverage is dominated by Runloop's own blog and press releases. The strongest independent signal is a customer announcement: Trajectory publicly stated it ran 10,000 concurrent devboxes on Runloop for continual-learning workloads. The absence of organic community discussion — and omission from major 2026 sandbox comparison articles — is itself a data point: Runloop's traction appears concentrated in direct enterprise relationships rather than bottoms-up developer adoption.


Pricing & Licensing

Pricing became fully public in 2026 (previously contact-only).

| Tier | Price | Includes |
|---|---|---|
| Free Trial | $50 in credits | Full Pro features; limits: 3 devboxes, 5 blueprints, 10 snapshots; no credit card required |
| Basic | $0/mo + usage | Public benchmarks, devboxes, blueprints, snapshots, 100 GB free storage, email support |
| Pro | $250/mo + usage | Suspend/resume, repo connections, custom benchmarks, beta access, 1 TB free storage, Slack support |
| Enterprise | Custom | Reinforcement fine-tuning, VPC deployment, priority support, custom storage |

Compute rates: $0.108/CPU-hr, $0.0252/GB-hr memory, $0.00034236/GB-hr devbox storage; snapshot/blueprint/object storage at $0.000072/GB-hr.
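To make the metered rates concrete, here is a rough estimator built only from the three running-devbox rates above. It assumes "CPU-hr" is billed per vCPU, and the devbox size (2 vCPU, 4 GB RAM, 20 GB disk) is a hypothetical example, not a Runloop default. It ignores the free storage allowances and snapshot storage, so it slightly overstates real bills.

```python
# Published per-hour rates quoted in this profile (USD).
CPU_HR = 0.108
MEM_GB_HR = 0.0252
DISK_GB_HR = 0.00034236


def devbox_hourly_cost(vcpus: float, mem_gb: float, disk_gb: float) -> float:
    """Metered cost of one running devbox for one hour (no base fee, no snapshots)."""
    return vcpus * CPU_HR + mem_gb * MEM_GB_HR + disk_gb * DISK_GB_HR


# Hypothetical devbox size: 2 vCPU, 4 GB RAM, 20 GB disk.
hourly = devbox_hourly_cost(vcpus=2, mem_gb=4, disk_gb=20)
trial_hours = 50 / hourly       # how far the $50 trial credit stretches
fleet_hourly = 10_000 * hourly  # Trajectory-scale fleet at the same hypothetical size

print(f"{hourly:.4f} $/hr, {trial_hours:.0f} trial hours, {fleet_hourly:,.0f} $/hr at 10k")
# 0.3236 $/hr, 154 trial hours, 3,236 $/hr at 10k
```

CPU and memory dominate: at this size, disk accounts for about 2% of the hourly rate, so right-sizing vCPUs and RAM matters far more than disk.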

Benchmark runs: Hosted evals priced per round, from $1.17 (HumanEval) to $18.66 (SWE-bench Verified).
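A quick way to budget hosted evals is to multiply checkpoints by rounds by per-round price. This sketch uses only the two prices quoted above and assumes a "round" is one full pass over the benchmark; the sweep size and the `eval_budget` helper are hypothetical.

```python
from typing import Sequence

# Per-round prices quoted in this profile (USD). Prices for the other hosted
# benchmarks are not listed here, so they are left out rather than guessed.
ROUND_PRICE = {
    "HumanEval": 1.17,
    "SWE-bench Verified": 18.66,
}


def eval_budget(checkpoints: int, rounds: int, benchmarks: Sequence[str]) -> float:
    """Cost of running every checkpoint through each benchmark `rounds` times."""
    per_round = sum(ROUND_PRICE[name] for name in benchmarks)
    return checkpoints * rounds * per_round


# Hypothetical sweep: 10 agent checkpoints, 3 rounds on each benchmark.
total = eval_budget(checkpoints=10, rounds=3, benchmarks=["HumanEval", "SWE-bench Verified"])
print(f"${total:.2f}")  # $594.90
```

SWE-bench Verified accounts for about 94% of that total, so round count on the expensive benchmark is the main lever in an eval budget.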

Licensing model: Proprietary, usage-based with published rates

Hidden costs: Benchmark runs can consume significant compute; Pro's $250/month base fee applies before any usage
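Whether Pro's $250 base fee pays for itself depends mostly on suspend/resume. The sketch below compares a month on each tier for a fleet of identical long-lived devboxes. Two loud assumptions: a suspended devbox is billed for its disk only, and a Basic devbox stays running all month (it ignores tearing boxes down and recreating them from snapshots). Fleet size, devbox size, and activity level are hypothetical.

```python
# Rates and fee quoted in this profile (USD).
CPU_HR, MEM_GB_HR, DISK_GB_HR = 0.108, 0.0252, 0.00034236
PRO_BASE_FEE = 250.0
HOURS_PER_MONTH = 730  # 365 * 24 / 12


def monthly_cost(devboxes: int, vcpus: float, mem_gb: float, disk_gb: float,
                 active_fraction: float, tier: str) -> float:
    """Rough monthly bill for a fleet of identical long-lived devboxes."""
    running = vcpus * CPU_HR + mem_gb * MEM_GB_HR + disk_gb * DISK_GB_HR
    # ASSUMPTION: a suspended devbox is billed for its disk only.
    suspended = disk_gb * DISK_GB_HR
    if tier == "basic":
        # ASSUMPTION: without suspend/resume the box stays up all month.
        return devboxes * running * HOURS_PER_MONTH
    if tier == "pro":
        active_hours = HOURS_PER_MONTH * active_fraction
        idle_hours = HOURS_PER_MONTH - active_hours
        per_box = running * active_hours + suspended * idle_hours
        return PRO_BASE_FEE + devboxes * per_box
    raise ValueError(f"unknown tier: {tier!r}")


# Hypothetical fleet: 20 devboxes (2 vCPU, 4 GB, 20 GB disk), busy 25% of the time.
for tier in ("basic", "pro"):
    print(tier, round(monthly_cost(20, 2, 4, 20, 0.25, tier), 2))
# basic 4725.25
# pro 1506.29
```

Under these assumptions, a single such devbox at 25% activity is cheaper on Basic ($236.26 vs $312.81 per month), but from two devboxes upward the Pro fee is recovered through suspended hours.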


Competitive Positioning

Direct Competitors

| Competitor | Differentiation |
|---|---|
| E2B | E2B has a larger ecosystem and Fortune 100 adoption; Runloop has disk snapshots and SWE-bench |
| Daytona | Daytona has Computer Use and is open source; Runloop has a benchmarking focus |
| Modal | Modal has GPUs; Runloop is specialized for coding agent development |
| Sprites | Sprites has checkpoint/restore; Runloop has SWE-bench and agent tooling |

When to Choose Runloop Over Alternatives

  • Choose Runloop when: You're building/evaluating coding agents and need disk snapshots or SWE-bench integration
  • Choose E2B when: You need proven enterprise scale or the largest ecosystem
  • Choose Daytona when: You need Computer Use or want open-source self-hosting
  • Choose Modal when: You need GPU access for ML workloads

Ideal Customer Profile

Best fit:

  • Teams building and evaluating AI coding agents
  • Research organizations running SWE-bench or custom benchmarks
  • Companies needing reproducible agent development environments
  • Organizations wanting ARM support for cost optimization
  • Enterprise teams requiring SOC2/HIPAA compliance with VPC deployment
  • RL/continual-learning teams needing thousands of concurrent environments

Poor fit:

  • General AI code execution (E2B may be simpler)
  • Computer Use/desktop automation needs (Daytona better fit)
  • GPU-required ML workloads (Modal better fit)
  • Hobbyists for whom Pro's $250/month base fee is prohibitive

Viability Assessment

| Factor | Assessment |
|---|---|
| Financial Health | Moderate: $7M seed (July 2025); early stage; no new round announced as of June 2026 |
| Market Position | Niche Leader: strong in agent benchmarking; proven 10K-concurrent-devbox scale |
| Innovation Pace | Rapid: active development, benchmark focus |
| Community/Ecosystem | Growing: smaller but focused on agent builders |
| Long-term Outlook | Positive: well-positioned for the coding agent market |

Runloop has carved out a niche in AI coding agent development and evaluation. The combination of disk snapshots and SWE-bench integration addresses real pain points for agent builders. The main risk is competition from better-funded platforms expanding into benchmarking.


Bottom Line

Runloop is purpose-built for AI coding agent development and evaluation. The disk snapshot feature (git for sandboxes) enables reproducible experiments that are difficult to achieve on ephemeral platforms like E2B. The built-in SWE-bench integration makes it the obvious choice for teams running agent evaluations.

The trade-offs are a smaller ecosystem and limited third-party mindshare. Two earlier concerns have eased: rates are now published, resolving the pricing-opacity issue, and Trajectory's deployment of 10,000 concurrent devboxes answers the scale question. For general code execution, E2B's larger ecosystem may be simpler. For agent development, benchmarking, and RL-style workloads, Runloop's specialized features add real value.

Recommended for: Teams building and evaluating AI coding agents who need reproducible environments, disk snapshots, and benchmark integration.

Not recommended for: General AI code execution use cases, Computer Use/desktop automation, or teams needing GPU access.

Outlook: Runloop is well-positioned to capture the growing agent development tools market, and its 2026 pivot toward RL/continual-learning workloads (Trajectory, reinforcement fine-tuning on Enterprise) opens a second growth vector. Watch for a Series A — the $7M seed is approaching a year old with no follow-on announced.


Research by Ry Walker Research