Key takeaways
- Generalizes autoresearch beyond ML — any measurable metric becomes an optimization target: test speed, bundle size, Lighthouse scores, build times
- Clean extension/skill separation: infrastructure (run/log/dashboard) is global, domain knowledge (command, metric, scope) is per-project
- Resumable sessions — autoresearch.md + autoresearch.jsonl let any fresh agent continue where the last left off, surviving context resets
- Built for the pi editor; proves the autoresearch pattern is extractable from ML into general software engineering
FAQ
What is pi-autoresearch?
An extension for the pi editor that adds autonomous experiment loops for any optimization target. Try an idea, measure it, keep what works, discard what doesn't, repeat forever.
What metrics can it optimize?
Anything measurable: test execution time, JavaScript bundle size, build speed, Lighthouse performance scores, LLM training loss, or any custom metric that outputs a number.
Overview
pi-autoresearch takes Karpathy's autoresearch pattern and makes it domain-agnostic. Built as an extension for the pi editor, it adds autonomous experiment loops that work for any optimization target — not just ML training.
"Try an idea, measure it, keep what works, discard what doesn't, repeat forever."
Architecture
Extension + Skill Pattern
The key insight is separating infrastructure from domain knowledge:
| Layer | Scope | What it provides |
|---|---|---|
| Extension (global) | All projects | run_experiment, log_experiment, widget, dashboard |
| Skill (per-domain) | Specific project | command, metric, direction, scope, ideas |
This means one extension serves unlimited domains — from optimizing React bundle size to ML training loss.
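To make the split concrete, a per-project skill can be little more than a small declarative file. The sketch below is purely illustrative — the field names and file format are assumptions, not the actual skill schema:

```yaml
# Hypothetical skill file for a bundle-size session (field names are assumptions).
command: pnpm build && du -sb dist   # how to produce the metric
metric: kb                           # what the number is called
direction: lower                     # lower is better
scope:                               # files the agent may touch
  - src/
ideas:                               # seed experiments to try first
  - "tree-shake large imports"
  - "lazy-load rarely used routes"
```

The extension never needs to know what "kb" means or what pnpm is; it only runs the command, parses the number, and tracks the direction.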
Three Tools
| Tool | Description |
|---|---|
| init_experiment | One-time session config — name, metric, unit, direction (lower/higher) |
| run_experiment | Runs any command, times wall-clock duration, captures output |
| log_experiment | Records result, auto-commits, updates widget and dashboard |
Session Files
| File | Purpose |
|---|---|
| autoresearch.md | Session document: objective, metrics, files in scope, what's been tried |
| autoresearch.sh | Benchmark script: pre-checks, runs workload, outputs METRIC name=number |
| autoresearch.checks.sh | Optional backpressure: tests, types, lint. Failures block keeping changes |
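As a concrete illustration, a minimal autoresearch.sh for a test-speed session could look like the sketch below. The pre-check and workload commands are assumptions (commented out so the sketch runs anywhere); only the METRIC name=number output format comes from the table above:

```shell
# Hypothetical autoresearch.sh sketch: pre-check, time the workload,
# then emit the metric in the METRIC name=number format.
set -e

# Pre-check (assumption): bail out early if the tree does not even build.
# pnpm build >/dev/null

start=$(date +%s)
# pnpm test >/dev/null   # the real workload would go here (assumption)
sleep 1                  # stand-in workload so the sketch runs anywhere
end=$(date +%s)

# The extension parses this line to record the run.
echo "METRIC seconds=$((end - start))"
```

Because the contract is just "print one METRIC line to stdout," any language or toolchain can serve as the benchmark.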
Example Targets
| Domain | Metric | Command |
|---|---|---|
| Test speed | seconds ↓ | pnpm test |
| Bundle size | KB ↓ | pnpm build && du -sb dist |
| LLM training | val_bpb ↓ | uv run train.py |
| Build speed | seconds ↓ | pnpm build |
| Lighthouse | perf score ↑ | lighthouse http://localhost:3000 --output=json |
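For instance, the bundle-size row in the table needs nothing beyond a command that ends in a metric line. A hedged sketch, with the build step commented out and a stand-in artifact so it runs anywhere (the dist path and kb metric name are assumptions):

```shell
# Hypothetical bundle-size measurement for the "KB ↓" target above.
# pnpm build                                # real build step (assumption)
mkdir -p dist
head -c 2048 /dev/zero > dist/app.js        # stand-in build artifact

bytes=$(du -sb dist | cut -f1)              # GNU du: apparent size in bytes
kb=$((bytes / 1024))
echo "METRIC kb=$kb"
```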
UX
- Status widget — always visible above the editor: 🔬 autoresearch 12 runs 8 kept │ best: 42.3s
- /autoresearch dashboard — full results table (Ctrl+X to toggle, Escape to close)
- Resumable — autoresearch.md captures enough context for any fresh agent to continue
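The append-only run log (autoresearch.jsonl, mentioned in the takeaways) can be illustrated with a small sketch: one JSON object per run, so a fresh agent can recover the best kept result with standard tools. The field names below are assumptions, not the actual schema:

```shell
# Hypothetical autoresearch.jsonl records (field names are assumptions).
cat > autoresearch.jsonl <<'EOF'
{"run": 1, "idea": "baseline", "seconds": 48.1, "kept": true}
{"run": 2, "idea": "parallel workers", "seconds": 42.3, "kept": true}
{"run": 3, "idea": "skip setup", "seconds": 51.0, "kept": false}
EOF

# Recover the best kept result: filter kept runs, extract the metric, sort.
best=$(grep '"kept": true' autoresearch.jsonl \
  | sed 's/.*"seconds": \([0-9.]*\).*/\1/' \
  | sort -n | head -1)
echo "best: ${best}s"
```

A log in this shape is what makes the session survivable across context resets: no agent state is needed beyond the files on disk.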
Strengths & Limitations
Strengths:
- Proves autoresearch generalizes beyond ML
- Clean architecture (extension/skill separation)
- Correctness checks gate keeping changes (tests/lint must pass)
- Good UX with real-time dashboard
- Branch-aware, resumable across context resets
Limitations:
- pi editor only (not standalone CLI)
- Very new (2 days old) with a small community (817 stars)
- No multi-agent coordination
Bottom Line
pi-autoresearch is the proof point that the autoresearch pattern is bigger than ML. By separating infrastructure from domain knowledge, it shows that any CI pipeline with a metric is an autoresearch target. The question is whether this pattern stays editor-specific or becomes a standalone platform.
Research by Ry Walker Research • methodology