pi-autoresearch | Ry Walker Research

Key takeaways

Generalizes autoresearch beyond ML — any measurable metric becomes an optimization target: test speed, bundle size, Lighthouse scores, build times
Clean extension/skill separation: infrastructure (run/log/dashboard) is global, domain knowledge (command, metric, scope) is per-project
Resumable sessions — `.auto/prompt.md` + `.auto/log.jsonl` let any fresh agent continue where the last left off, surviving context resets and crashes
Grew from 817 stars at our March 2026 profile to ~7K by June 2026, with five releases since then and a confidence-scoring system that separates real gains from benchmark jitter
The most-starred derivative of Karpathy's autoresearch — and the template the community ported to Claude Code and Cursor

FAQ

What is pi-autoresearch?

An extension for the Pi coding agent that adds autonomous experiment loops for any optimization target. Try an idea, measure it, keep what works, discard what doesn't, repeat forever.

What metrics can it optimize?

Anything measurable: test execution time, JavaScript bundle size, build speed, Lighthouse performance scores, LLM training loss, or any custom metric that outputs a number.

How much does pi-autoresearch cost?

It's free, MIT-licensed open source, distributed as the `pi-autoresearch` npm package. You pay only for the LLM tokens your Pi agent consumes while running experiments.

Overview

pi-autoresearch takes Karpathy's autoresearch pattern and makes it domain-agnostic. Built as a TypeScript extension for the Pi coding agent (now stewarded by Earendil), it adds autonomous experiment loops that work for any optimization target — not just ML training.

"Try an idea, measure it, keep what works, discard what doesn't, repeat forever."

Status (as of June 2026)

The project is healthy and actively maintained. The GitHub repo sits at ~7K stars and 413 forks (up from 817 stars at our March 2026 profile), with the latest push on June 8, 2026. Five releases have shipped since this profile was first written — v1.2.0 through v1.6.0 — with v1.6.0 (June 8, 2026) migrating the npm scope from @mariozechner to @earendil-works, tracking Pi's move under Earendil stewardship at pi.dev. It is listed as an official package in the pi.dev registry.

Pricing: free, MIT-licensed open source; the only cost is the LLM tokens Pi consumes while iterating.

Architecture

Extension + Skill Pattern

The key insight is separating infrastructure from domain knowledge:

Layer	Scope	What it provides
Extension (global)	All projects	`run_experiment`, `log_experiment`, widget, dashboard
Skill (per-domain)	Specific project	command, metric, direction, scope, ideas

This means one extension serves any number of domains, from React bundle size to ML training loss.
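A minimal sketch of that split, assuming a hypothetical `SkillConfig` shape (the field names mirror the table above, but the type and the `isImprovement` helper are illustrative, not the extension's actual API). The generic comparison knows nothing about the domain; each project only supplies data.

```typescript
// Hypothetical types for illustration; not the real pi-autoresearch API.
type Direction = "lower" | "higher";

// Per-project "skill": pure domain knowledge.
interface SkillConfig {
  command: string;      // shell command that produces the metric
  metric: string;       // metric name
  direction: Direction; // which way is better
  scope: string[];      // files the agent may touch (example paths are made up)
  ideas: string[];      // starting hypotheses
}

// Global "extension" logic: works for any skill without knowing the domain.
function isImprovement(skill: SkillConfig, best: number, candidate: number): boolean {
  return skill.direction === "lower" ? candidate < best : candidate > best;
}

const bundleSkill: SkillConfig = {
  command: "pnpm build && du -sb dist",
  metric: "bundle_size",
  direction: "lower",
  scope: ["src/"],
  ideas: ["drop unused dependencies"],
};

const lighthouseSkill: SkillConfig = {
  command: "lighthouse http://localhost:3000 --output=json",
  metric: "perf_score",
  direction: "higher",
  scope: ["src/"],
  ideas: ["lazy-load images"],
};
```

Under these assumptions, adding a new domain means writing another `SkillConfig` value, with no change to the shared logic.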

Three Tools

Tool	Description
`init_experiment`	One-time session config — name, metric, unit, direction (lower/higher)
`run_experiment`	Runs any command, times wall-clock duration, captures output
`log_experiment`	Records result, auto-commits, updates widget and dashboard
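The three tools can be pictured as a loop. The sketch below is an assumption-laden illustration: the function names echo the tool names, but every signature and field is hypothetical, the measurement is an injected function rather than a real shell command, and the auto-commit and UI updates are left out.

```typescript
// Hypothetical sketch; signatures and fields are assumptions, not the real tools.
type Direction = "lower" | "higher";

interface LoopEntry {
  idea: string;
  value: number;
  kept: boolean;
}

interface LoopSession {
  sessionName: string;
  metric: string;
  unit: string;
  direction: Direction;
  best: number | null;
  entries: LoopEntry[];
}

// init_experiment: one-time session config.
function initExperiment(sessionName: string, metric: string, unit: string, direction: Direction): LoopSession {
  return { sessionName, metric, unit, direction, best: null, entries: [] };
}

// run_experiment: the real tool runs a command; here the measurement is injected
// so the sketch stays self-contained. Wall-clock time is still recorded.
function runExperiment(measure: () => number): { value: number; wallMs: number } {
  const start = Date.now();
  const value = measure();
  return { value, wallMs: Date.now() - start };
}

// log_experiment: record the result and decide keep vs. discard.
function logExperiment(s: LoopSession, idea: string, value: number): LoopEntry {
  const improved =
    s.best === null || (s.direction === "lower" ? value < s.best : value > s.best);
  if (improved) s.best = value;
  const entry: LoopEntry = { idea, value, kept: improved };
  s.entries.push(entry);
  return entry;
}

// Illustrative run with made-up numbers.
const testSpeed = initExperiment("faster tests", "test_time", "s", "lower");
const trials: [string, number][] = [["baseline", 42], ["parallel workers", 31], ["extra logging", 35]];
for (const [idea, seconds] of trials) {
  const result = runExperiment(() => seconds);
  logExperiment(testSpeed, idea, result.value);
}
```

In this sketch the third idea is discarded because it is slower than the best kept result; the session keeps 31 as its best value.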

Session Files

As of v1.6.0, all session files live in a `.auto/` subfolder (previously top-level `autoresearch.*` files):

File	Purpose
`.auto/prompt.md`	Session document: objective, metrics, files in scope, what's been tried
`.auto/log.jsonl`	Append-only experiment log: commit hash, metric value, kept/discarded status
Backpressure checks	Optional: tests, types, lint. Failures block keeping changes
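To illustrate why an append-only JSONL log makes sessions resumable, here is a small sketch. The record shape is an assumption inferred from the description above (commit hash, metric value, kept/discarded status); the real field names in `.auto/log.jsonl` may differ, and the commit hashes below are made up. The log is handled as a string so the example runs without file access.

```typescript
// Hypothetical record shape; real field names may differ.
interface LogRecord {
  commit: string;
  value: number;
  status: "kept" | "discarded";
}

// Append one JSON object per line, never rewriting earlier lines.
function appendRecord(logText: string, rec: LogRecord): string {
  return logText + JSON.stringify(rec) + "\n";
}

// Parse the log back, skipping blank lines.
function parseLog(logText: string): LogRecord[] {
  return logText
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as LogRecord);
}

// A fresh agent can rebuild its state from the log alone.
function resumeState(
  records: LogRecord[],
  direction: "lower" | "higher"
): { runs: number; best: number | null } {
  const keptValues = records.filter((r) => r.status === "kept").map((r) => r.value);
  if (keptValues.length === 0) return { runs: records.length, best: null };
  const best = direction === "lower" ? Math.min(...keptValues) : Math.max(...keptValues);
  return { runs: records.length, best };
}

let logText = "";
logText = appendRecord(logText, { commit: "a1b2c3d", value: 42, status: "kept" });
logText = appendRecord(logText, { commit: "e4f5a6b", value: 31, status: "kept" });
logText = appendRecord(logText, { commit: "c7d8e9f", value: 35, status: "discarded" });

const resumed = resumeState(parseLog(logText), "lower");
```

Because each line is independent, a crash mid-run can at worst lose the final line, and everything before it stays readable.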

Example Targets

Domain	Metric	Command
Test speed	seconds ↓	`pnpm test`
Bundle size	bytes ↓	`pnpm build && du -sb dist`
LLM training	val_bpb ↓	`uv run train.py`
Build speed	seconds ↓	`pnpm build`
Lighthouse	perf score ↑	`lighthouse http://localhost:3000 --output=json`
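Each command above has to be reduced to a single number before it can be compared. How pi-autoresearch does this is not described here, so the helpers below are hypothetical. They rely on two facts about the underlying tools: GNU `du -sb` prints a byte count followed by the path, and Lighthouse's JSON report stores the performance score as a 0 to 1 value under `categories.performance.score`.

```typescript
// Hypothetical helpers; not part of pi-autoresearch.

// `du -sb dist` prints "<bytes>\t<path>"; take the first field.
function parseDuBytes(stdout: string): number {
  const first = stdout.trim().split(/\s+/)[0];
  // Guard against empty output, since Number("") would silently yield 0.
  if (first === "" || !Number.isFinite(Number(first))) {
    throw new Error("could not parse a byte count from du output");
  }
  return Number(first);
}

// Lighthouse JSON reports the performance score as 0..1; scale to 0..100.
function parseLighthousePerf(json: string): number {
  const report = JSON.parse(json) as {
    categories?: { performance?: { score?: number | null } };
  };
  const score = report.categories?.performance?.score;
  if (typeof score !== "number") {
    throw new Error("performance score missing from Lighthouse report");
  }
  return Math.round(score * 100);
}

// Illustrative inputs (made-up values).
const bundleBytes = parseDuBytes("482133\tdist\n");
const perfScore = parseLighthousePerf(
  JSON.stringify({ categories: { performance: { score: 0.9 } } })
);
```

Failing loudly on unparseable output matters in an unattended loop: a silent 0 would look like a dramatic improvement for any "lower is better" metric.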

UX

Status widget — Always visible above the editor with live run counts and best metric
Dashboard — Defaults to expanded since v1.6.0; fullscreen overlay via Ctrl+Shift+F
Confidence scoring — After 3+ experiments, green/yellow/red indicators distinguish real gains from benchmark jitter on noisy signals
Resumable — .auto/prompt.md captures enough context for any fresh agent to continue; auto-compaction keeps long loops running across context limits
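The profile does not say how the confidence indicator is computed, so the following is only one plausible approach, not the extension's method: compare the size of the claimed gain against a noise estimate (here the median absolute deviation of repeated measurements). The thresholds of 2x and 1x are arbitrary assumptions chosen for illustration.

```typescript
// Illustrative only: an assumed noise-vs-gain heuristic, not the real algorithm.
type Confidence = "green" | "yellow" | "red" | null;

function median(xs: number[]): number {
  const sorted = [...xs].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// noiseSamples: repeated measurements of the same code, used to estimate jitter.
function scoreConfidence(baseline: number, best: number, noiseSamples: number[]): Confidence {
  if (noiseSamples.length < 3) return null; // too little data to judge
  const med = median(noiseSamples);
  const mad = median(noiseSamples.map((x) => Math.abs(x - med)));
  const gain = Math.abs(best - baseline);
  if (mad === 0) return gain > 0 ? "green" : "red"; // perfectly stable benchmark
  const ratio = gain / mad;
  if (ratio >= 2) return "green";  // gain clearly exceeds the noise
  if (ratio >= 1) return "yellow"; // gain comparable to the noise
  return "red";                    // gain is within the noise
}

// Made-up numbers: about 0.05 s of jitter around a 10 s benchmark.
const jitter = [10, 10.1, 9.9, 10.05];
const bigGain = scoreConfidence(10, 8, jitter);
const tinyGain = scoreConfidence(10, 9.98, jitter);
```

With these inputs a 2 s gain is far above the noise and a 0.02 s gain sits inside it, which is the distinction the indicator is meant to surface.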

Changes Since March 2026

Five releases (v1.2.0–v1.6.0) between late April and June 8, 2026
Earendil migration — npm scope moved from @mariozechner to @earendil-works as Pi itself (61.7K stars) moved under Earendil stewardship at pi.dev
New features: confidence scoring, config-file support, and a hooks system for custom behavior at iteration boundaries
Ecosystem spread — the architecture has been ported to a Claude Code plugin (ozeron/autoresearch) and a Cursor port (cursor-autoresearch), making pi-autoresearch the de facto template for non-ML autoresearch

Strengths & Limitations

Strengths:

Proves autoresearch generalizes beyond ML
Clean architecture (extension/skill separation)
Correctness checks gate keeping changes (tests/lint must pass)
Good UX with real-time dashboard
Branch-aware, resumable across context resets

Limitations:

Pi agent only (not standalone CLI) — though community ports cover Claude Code and Cursor
Single-maintainer project; bus factor remains a risk despite the Earendil ecosystem alignment
No multi-agent coordination

What Developers Say

Community commentary is positive but still thin — most discussion happens in ecosystem roundups rather than long review threads.

paddo.dev's autoresearch ecosystem survey called it "the most popular derivative by stars, largely because it makes the loop usable for non-ML tasks with a proper interface," noting it "adds persistent sessions that survive restarts and context resets, a dashboard UI, and branch-aware experiment tracking."
On Hacker News, developer ozeron called it a "great find" and announced a Claude Code plugin port of its architecture.

Bottom Line

Recommended if you use Pi and have any metric worth optimizing overnight. pi-autoresearch is the proof point that the autoresearch pattern is bigger than ML. By separating infrastructure from domain knowledge, it shows that any CI pipeline with a metric is an autoresearch target. Not recommended as your entry point if you're not on Pi — use the Claude Code or Cursor ports instead.

Outlook: the March question — would this stay editor-specific or become a platform? — has been half-answered: it stayed a Pi extension, but its architecture became the template everyone else ports. With Earendil now stewarding Pi and the npm scope migrated accordingly, pi-autoresearch looks like durable first-party-adjacent infrastructure rather than a weekend hack.

Research by Ry Walker Research • methodology
