Key takeaways
- The hobbyist/experimental tier froze: Karpathy's autoresearch is dormant at 86K stars (no commits since March 26, 2026, no LICENSE file), and AutoAgent, AutoKernel, and autoresearch-at-home all stalled within days of launch
- Commercial science arrived in its place — Kosmos ($70M seed, $200/run, embedded in Incyte's R&D), Autoscience ($14M led by General Catalyst), and Google Co-Scientist (GA via Gemini for Science at I/O May 2026, deployed to all 17 DOE national labs)
- Nature published the first peer-reviewed methodology for a fully automated AI research system — Sakana's AI Scientist, March 26, 2026 — credentialing the category even as the open-source repos went quiet
- Recursive Superintelligence's $650M raise signals "automated AI research" is now a venture category — the platform gap the hobbyist wave exposed is being filled by funded companies, not weekend repos
FAQ
What is autoresearch?
Autoresearch is a pattern where AI agents autonomously run experiment loops: modify code, run a benchmark, measure the result, keep improvements or revert failures, and repeat. Andrej Karpathy coined the term in March 2026.
How many experiments can autoresearch run overnight?
Karpathy's original runs ~12 experiments/hour (~100 overnight) with 5-minute training budgets. AutoKernel runs ~40 experiments/hour with 90-second cycles. Results vary by domain and benchmark duration.
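The throughput figures above follow from the cycle length alone. A minimal sketch of that arithmetic, assuming an 8-hour overnight window and zero overhead between runs (both are assumptions, and the function names are illustrative):

```python
def experiments_per_hour(cycle_seconds: float) -> float:
    """Upper bound on experiments per hour when each cycle takes cycle_seconds."""
    return 3600 / cycle_seconds


def overnight_total(cycle_seconds: float, hours: float = 8.0) -> int:
    """Whole experiments that fit in an assumed overnight window."""
    return int(hours * 3600 // cycle_seconds)


print(experiments_per_hour(5 * 60))  # 5-minute training budget -> 12.0
print(experiments_per_hour(90))      # 90-second cycle -> 40.0
print(overnight_total(5 * 60))       # 8 hours of 5-minute runs -> 96
```

Real runs also spend time on the agent's own reasoning and on setup, so observed counts will usually sit below these bounds.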
Does autoresearch only work for ML training?
No. pi-autoresearch generalizes the pattern to any measurable metric — test speed, bundle size, build times, Lighthouse scores. AutoKernel applies it to GPU kernel optimization. The pattern is domain-agnostic.
What is the difference between autoresearch and deep research agents?
Autoresearch runs code experiment loops (edit → benchmark → keep/revert). Deep research agents search the web, read sources, and synthesize knowledge reports. Both are autonomous but solve different problems.
Is Karpathy's autoresearch still maintained?
No. The repo has had no commits since March 26, 2026, with 185 open issues and no LICENSE file despite the README's MIT reference. The pattern lives on in forks and ports — pi-autoresearch is the most active — but the original repo is dormant.
What are the commercial AI scientist platforms?
Kosmos (Edison Scientific) runs 12-hour autonomous discovery loops at $200 per run; Autoscience deploys its Mira agent into customers' production ML models; Google Co-Scientist generates and ranks research hypotheses via Gemini for Science. All three are closed, funded, and enterprise-focused.
Executive Summary
Three months after Karpathy's autoresearch lit up GitHub, the category has split in two. The hobbyist/experimental tier froze: Karpathy's repo is dormant at 86,192 stars (no commits since March 26, 2026, no LICENSE file), and the launch-week derivatives — AutoAgent, AutoKernel, autoresearch-at-home — all stalled within days of shipping.
Meanwhile, commercial science arrived. Kosmos (Edison Scientific) raised a $70M seed, charges $200 per 12-hour discovery run, and is embedded across Incyte's R&D. Autoscience raised $14M led by General Catalyst for its Carl and Mira agents. Google took Co-Scientist GA via Gemini for Science at I/O May 2026 and deployed it to all 17 DOE national labs. And on the same day Karpathy's repo went quiet, Nature published the first peer-reviewed methodology for a fully automated AI research system — Sakana's AI Scientist.
Key Findings:
- The experiment-loop tier is a graveyard with one survivor — only pi-autoresearch still ships releases; everything else froze within weeks of launch
- The money moved to scientific discovery — Kosmos, Autoscience, and Google Co-Scientist are funded, closed, enterprise products, not repos
- Credentialing replaced code — Nature publication (AI Scientist), arXiv papers (AutoKernel, AutoResearchClaw), and peer-review milestones now define the leaderboard
- Recursive Superintelligence's $650M raise confirms "automated AI research" as a venture category — the platform gap is being filled by companies, not open source
Strategic Planning Assumptions:
- By 2027, the dominant autoresearch products will be closed commercial platforms (Kosmos-style per-run pricing or Mira-style embedded agents), with open source relegated to reference implementations
- The experiment-loop pattern will survive as a feature inside coding agents (pi-autoresearch's ports to Claude Code and Cursor are the template), not as standalone tools
- Verification — citation integrity, result registries, accuracy audits — becomes the primary differentiator as venues crack down on AI-generated papers
Market Definition
Autoresearch tools are AI systems that autonomously conduct research — whether through code experimentation, web knowledge synthesis, or scientific discovery — with minimal human intervention.
Inclusion Criteria:
- Autonomous operation (agent decides what to try next)
- Measurable output (metrics, reports, papers, or deployed improvements)
- Open source, or commercially available with publicly documented capabilities
Exclusion Criteria:
- Manual AI-assisted tools (copilots that suggest but don't act)
- Pure benchmarking frameworks without agent loops
Note: the original April 2026 edition excluded proprietary products and required active development. Both criteria are gone — the most consequential entrants are now closed commercial platforms, and half the open-source field is dormant. Dormancy is now a status flag, not a disqualifier.
Status Check: The Dormant Tier (June 2026)
The defining fact of this refresh — most of the open-source wave stopped moving:
| Tool | Stars | Last activity | Status |
|---|---|---|---|
| karpathy/autoresearch | 86,192 | Mar 26, 2026 | Dormant. 185 open issues, no LICENSE file despite README's MIT reference |
| AutoAgent | ~4,500 | Apr 3, 2026 | Frozen since launch week. Zero commits, zero releases since one day after creation |
| AutoKernel | 1,404 | Mar 19, 2026 | Stalled six days after launch; companion arXiv paper is the lasting artifact |
| autoresearch-at-home | 487 | Mar 13, 2026 | Dormant three days after creation; one documented coordinated run |
| dzhng/deep-research | 19,099 | mid-2025 | Dormant. Only commit since Sep 2025 is an April 2026 README attribution change (Aomni → Duet) |
| AI Scientist | 13.9k (v1) / 6.5k (v2) | Dec 2025 | Repo quiet since the custom-license change; 2026 milestone was the Nature paper, not code |
| Tongyi DeepResearch | 19,360 | Feb 27, 2026 | Cooling. Sept 2025 weights remain the only release |
The paradox: stars keep climbing across the board while commits stop. Usage of the patterns is accelerating — PostHog and Shopify ran the Karpathy loop against production codebases — but the repos themselves are unmaintained.
Tier 1: Experiment Loop Agents
The "overnight optimization" pattern. Agent modifies a single file, runs a fixed-time benchmark, keeps improvements, reverts failures, repeats autonomously.
Market Map
| Tool | ⭐ Stars | Created | Domain | Status (June 2026) |
|---|---|---|---|---|
| karpathy/autoresearch | 86,192 | Mar 6, 2026 | LLM training | Dormant since Mar 26; no license |
| davebcn87/pi-autoresearch | ~7,000 | Mar 11, 2026 | Any metric | Active — v1.6.0 (Jun 8), Earendil stewardship |
| kevinrgu/autoagent | ~4,500 | Apr 2026 | Agent harnesses | Frozen since Apr 3 (launch week) |
| Hyperspace AGI | 1,923 | Mar 8, 2026 | Multi-domain | Agent branches alive; pivoting to crypto (A1 blockchain, airdrop trackers) |
| RightNow-AI/autokernel | 1,404 | Mar 11, 2026 | GPU kernels | Stalled Mar 19; arXiv paper published |
| autoresearch-at-home | 487 | Mar 10, 2026 | Distributed | Dormant since Mar 13 |
The launch-week ports (MLX, Windows/RTX, autoresearch-mlx-mkw) remain small curiosities.
pi-autoresearch is the only survivor — five releases since March (v1.2.0–v1.6.0), confidence scoring to separate real gains from benchmark jitter, and an npm-scope migration to @earendil-works tracking Pi's move under Earendil stewardship. Its architecture became the template for community ports to Claude Code and Cursor.
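One way to separate a real gain from benchmark jitter is to repeat each measurement and compare the improvement with the run-to-run noise. The sketch below is an illustration of that idea only; it is not pi-autoresearch's scoring method, and the function names and the threshold of 2.0 are assumptions.

```python
from statistics import mean, stdev


def confidence(baseline_runs: list[float], candidate_runs: list[float]) -> float:
    """Improvement divided by the larger run-to-run standard deviation.

    Assumes a lower metric is better and at least two runs per side.
    """
    gain = mean(baseline_runs) - mean(candidate_runs)
    noise = max(stdev(baseline_runs), stdev(candidate_runs), 1e-12)
    return gain / noise


def should_keep(baseline_runs: list[float], candidate_runs: list[float],
                threshold: float = 2.0) -> bool:
    """Keep a change only when the gain clearly exceeds the noise."""
    return confidence(baseline_runs, candidate_runs) >= threshold


baseline = [10.0, 10.2, 9.8]   # seconds per benchmark run
jitter = [9.9, 10.1, 9.7]      # slightly faster, but within noise
real = [9.0, 9.2, 8.8]         # clearly faster

print(should_keep(baseline, jitter))  # False
print(should_keep(baseline, real))    # True
```

Repeating runs costs benchmark time, so a loop that uses this kind of check trades throughput for fewer false keeps.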
The Core Architecture
All experiment loop tools share this pattern:
- program.md: Natural language instructions defining what to optimize, constraints, and strategy
- Single file constraint: Agent only modifies one file (e.g., train.py), keeping scope manageable
- Fixed time budget: Each experiment runs for the same duration, making results comparable
- Append-only log: Results survive restarts and context resets
- Keep/revert decision: Binary outcome per experiment, committed to git
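The shared pattern can be sketched in a few lines. This is a toy illustration under stated assumptions, not any listed tool's implementation: `propose` stands in for the agent editing the single file, `benchmark` stands in for the fixed-time run (lower is better), and the keep step only updates a variable where a real loop would commit to git.

```python
import json
import random
import tempfile
from pathlib import Path
from typing import Callable


def run_loop(
    initial: float,
    propose: Callable[[float], float],    # hypothetical stand-in for the agent's edit
    benchmark: Callable[[float], float],  # hypothetical stand-in for the timed run
    log_path: Path,
    n_experiments: int,
) -> float:
    """Keep/revert loop: accept a candidate only if its score improves."""
    best = initial
    best_score = benchmark(best)
    for i in range(n_experiments):
        candidate = propose(best)
        score = benchmark(candidate)
        kept = score < best_score
        if kept:
            # "Keep": a real loop would commit the edited file here.
            best, best_score = candidate, score
        # "Revert" is implicit: a rejected candidate is simply discarded.
        with log_path.open("a") as f:  # append-only, so the log survives restarts
            f.write(json.dumps({"i": i, "score": score, "kept": kept}) + "\n")
    return best


# Toy usage: random perturbations of one number, scored by distance from 3.0.
with tempfile.TemporaryDirectory() as tmp:
    rng = random.Random(0)
    log = Path(tmp) / "results.jsonl"
    best = run_loop(
        initial=10.0,
        propose=lambda x: x + rng.uniform(-1.0, 1.0),
        benchmark=lambda x: (x - 3.0) ** 2,
        log_path=log,
        n_experiments=50,
    )
    print(f"best: {best:.3f}, logged experiments: {len(log.read_text().splitlines())}")
```

Because the score can only improve or stay flat, the loop is a greedy hill climb; that simplicity is also why it can overfit to the benchmark it is given.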
Karpathy's insight — you're not writing code, you're writing the markdown that tells the agent how to write code — survives the repo's dormancy. The pattern's real-world record is now mixed and instructive: it found a three-year-old query-engine bug at PostHog, and produced an overfit 53% "speedup" at Shopify that reviewers rejected.
AutoAgent took the loop meta — optimizing the agent harness itself (prompts, tools, routing) via Harbor benchmarks — before freezing at the proof-of-concept stage.
Tier 2: Deep Research Agents
Web-based knowledge synthesis. These agents search, read, reason, and produce comprehensive research reports. The tier has matured into a two-speed market: one actively shipping incumbent, four slowing or static alternatives.
Market Map
| Tool | ⭐ Stars | Created | Status (June 2026) |
|---|---|---|---|
| gpt-researcher | 27,643 | May 2023 | Active — v3.5.0 (May 28, 2026), steady release cadence since 2023 |
| Tongyi DeepResearch | 19,360 | Jan 2025 | Cooling — repo quiet since Feb 2026; Sept 2025 weights still SOTA-class, now on OpenRouter with a free tier |
| dzhng/deep-research | 19,099 | Feb 2025 | Dormant since mid-2025; attribution moved to Duet. Still the best ~500-LoC teaching artifact |
| open_deep_research | 11,671 | Nov 2024 | Maintenance mode — dependency bumps only since the July 2025 supervisor rewrite |
| DeepResearchAgent | 3,449 | May 2025 | Slowing — last push May 4, 2026; v2.0.0 "self evolving" remains the only major release |
Approaches Diverging
The two schools of thought from April still hold, with a status update:
Prompt-based: Use frontier models with good prompting and tool orchestration. GPT Researcher is the durable winner here — three years of releases, ~$0.10 per research run, provider-agnostic. dzhng/deep-research and LangChain's open_deep_research persist as reference architectures rather than evolving products.
Fine-tuned: Train specialized models via RL. Tongyi DeepResearch proved a 30B MoE (3.3B active params) can lead BrowseComp, GAIA, and HLE — and it runs in production inside Alibaba (Amap, Tongyi FaRui) — but nine months without a new checkpoint is eroding the bet. The promised "next generation of agentic models" hasn't shipped.
SkyworkAI's Autogenesis protocol (self-evolving agents that version their own tools and prompts) remains the most architecturally ambitious design in the tier, with near-zero independent validation.
Tier 3: Scientific Discovery Agents
End-to-end: ideation → experiment → paper or report. This is where the category's center of gravity moved — and where the money went.
Market Map
| Tool | Access model | Created | Key Differentiator |
|---|---|---|---|
| Kosmos (Edison Scientific) | Closed SaaS, $200/run | 2025 | 12-hour runs: ~200 rollouts, ~42K lines of code, ~1,500 papers read; every statement cited; $70M seed; embedded in Incyte's R&D |
| Google Co-Scientist | Closed (Gemini) | Feb 2025 | Multi-agent Gemini research partner; Gemini for Science GA at I/O May 2026; DOE national labs; enterprise (Daiichi Sankyo, Bayer Crop Science) |
| Autoscience (Carl + Mira) | Closed, early access | Mar 2025 | First AI papers through double-blind review (ICLR 2025 workshops, later withdrawn amid controversy); Mira deploys research into production ML models; $14M seed led by General Catalyst |
| AutoResearchClaw | Open source (MIT) | Mar 15, 2026 | UNC AIMING Lab's 23-stage pipeline: idea → cited LaTeX paper; 13.4K stars in under 3 months; 4-layer citation verification; walked back its own "no human intervention" claim |
| AI Scientist (Sakana) | Open source (custom license) | Aug 2024 | First AI paper through peer review; methodology published in Nature (Mar 26, 2026) — but no v3 and quiet repos since Dec 2025 |
The Commercial Arrival
The April edition excluded proprietary products. That's no longer tenable — the three most consequential entrants are closed:
Kosmos is the most credible commercial AI scientist: traceable citations on every statement, three of seven reported discoveries reproducing unpublished findings, and a real pharma deployment. The catch is accuracy — independent scientists rated 79.4% of statements accurate, with synthesis claims at just 58%.
Autoscience has the strongest external validation (double-blind peer review, a Kaggle featured-competition silver medal) and the most tainted milestone — the ICLR papers were submitted without organizers' knowledge and withdrawn after academics accused the company of co-opting peer review for publicity. Its commercial product Mira reads 1,200+ papers a week and implements improvements directly into customers' production models.
Google Co-Scientist is the scale play: a supervisor coordinating Generation, Reflection, Ranking, Evolution, Proximity, and Meta-review agents, with Nature-published validations and distribution through Gemini for Science to individual researchers, enterprise pharma, and all 17 DOE national labs.
AutoResearchClaw is the open-source counterweight — MIT-licensed where AI Scientist relicensed restrictively, actively shipping where everything else froze, and refreshingly honest: its own paper concludes targeted human collaboration beats full autonomy. Caveats: its 54.7% benchmark win over AI Scientist v2 is self-graded on the lab's own ARC-Bench, and its 13.4K stars coexist with near-zero independent community discussion.
The Verification Arms Race
The tier's emerging differentiator isn't generation — it's verification. Kosmos cites every statement to code or literature; AutoResearchClaw ships 4-layer citation checks and anti-fabrication registries; arXiv now bans authors who submit unverified AI-generated content. The systems winning trust are the ones engineering their errors to be findable.
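To make "findable errors" concrete, here is a minimal sketch of a citation audit. It covers two simple checks only (every statement carries a citation, and every citation resolves to a known source); it is not Kosmos's or AutoResearchClaw's pipeline, and the `Statement` type, the `audit` function, and the example identifiers are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    text: str
    citations: list[str] = field(default_factory=list)


def audit(statements: list[Statement], registry: set[str]) -> dict[str, list[str]]:
    """Group statements that fail each check so a reviewer can find them."""
    report: dict[str, list[str]] = {"uncited": [], "unresolved": []}
    for s in statements:
        if not s.citations:
            report["uncited"].append(s.text)
        elif any(c not in registry for c in s.citations):
            report["unresolved"].append(s.text)
    return report


registry = {"paper:001", "code:cell-17"}  # hypothetical source identifiers
statements = [
    Statement("Gene A is upregulated.", ["paper:001"]),
    Statement("The effect replicates.", ["paper:999"]),  # points at an unknown source
    Statement("This implies a new mechanism."),          # no citation at all
]

print(audit(statements, registry))
```

A check like this only shows that a citation exists and resolves; whether the source actually supports the statement still needs a human or a separate accuracy audit.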
Competitive Dynamics
What Changed Since April
- The ecosystem flywheel stopped. April's story was "7 days from Karpathy's release to 6+ variants." June's story is that nearly all of those variants, and the original, are frozen. The pattern won; the repos lost.
- Capital replaced stars as the scoreboard. Kosmos's $70M, Autoscience's $14M, and Recursive Superintelligence's $650M say more about where autoresearch is going than any GitHub metric.
- Credentialing became the moat. Nature publication (Sakana), peer-review acceptances (Autoscience's Carl), Nature-validated discoveries (Google Co-Scientist): the scientific establishment's gatekeepers are now the benchmark that matters.
- The platform gap is being filled top-down. April's "nobody has built autoresearch-as-a-service" observation is being answered, but by closed commercial platforms (per-run Kosmos, embedded Mira), not the horizontal open-source layer the ecosystem expected.
The Platform Gap, Revisited
The vertical fragmentation remains on the open-source side: Karpathy's repo is LLM-training-specific (and dormant), pi-autoresearch is Pi-editor-specific (though its ports spread the architecture), and AutoKernel is kernel-specific (and stalled). The horizontal layer is emerging as a commercial product category instead, which is exactly what the funding signals predicted.
Technical Comparison
| Dimension | Experiment Loops | Deep Research | Scientific Discovery |
|---|---|---|---|
| Input | Code + metric | Query/topic | Research goal + datasets |
| Agent action | Edit code, run benchmark | Search web, read sources | Design experiments, analyze data, write papers |
| Output | Optimized code + log | Markdown/PDF report | Cited report or LaTeX paper |
| Loop type | Keep/revert per experiment | Depth/breadth exploration | Tree search / multi-cycle discovery runs |
| Duration | Hours to days | Minutes to hours | Hours (Kosmos: 12-hour runs) |
| Business model | Free OSS (mostly dormant) | Free OSS + BYO API keys | Per-run ($200), seats, enterprise deals |
| Verification | Benchmark + git history | Source citations | Citation-to-code/literature, accuracy audits |
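The "depth/breadth exploration" loop type in the table can be sketched as a small recursion. This is an illustration under assumptions, not the implementation of any tool listed here: `search` and `follow_ups` are hypothetical stubs for web search plus model calls, and halving the breadth at each level is one possible design choice.

```python
from typing import Callable


def deep_research(
    query: str,
    search: Callable[[str], list[str]],      # hypothetical: findings for a query
    follow_ups: Callable[[str], list[str]],  # hypothetical: next queries from a finding
    breadth: int,
    depth: int,
) -> list[str]:
    """Collect findings, then recurse on follow-up queries with a narrower breadth."""
    findings = search(query)[:breadth]
    if depth <= 1:
        return findings
    collected = list(findings)
    next_breadth = max(1, breadth // 2)
    for finding in findings:
        for q in follow_ups(finding)[:next_breadth]:
            collected.extend(deep_research(q, search, follow_ups, next_breadth, depth - 1))
    return collected


# Stub usage: fake search results and follow-up questions.
def fake_search(q: str) -> list[str]:
    return [f"{q}/finding{i}" for i in range(3)]


def fake_follow_ups(finding: str) -> list[str]:
    return [f"{finding}?why", f"{finding}?how"]


print(len(deep_research("topic", fake_search, fake_follow_ups, breadth=2, depth=2)))  # 4
```

Breadth bounds how many findings are kept per query and depth bounds how many rounds of follow-ups run, so the two parameters together cap the cost of a single research run.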
What to Watch
Near-term (H2 2026)
- Whether any dormant Tier 1 repo revives — a Karpathy follow-up (the "research community of PhD students" vision) would restart the wave
- Named Mira customers with verifiable results, and whether Kosmos's Incyte collaboration reports productivity gains net of the verification tax
- Independent replication of AutoResearchClaw's ARC-Bench results, and a first peer-review acceptance for one of its generated papers
Medium-term (2026-2027)
- Tongyi's promised next-generation agentic models — if nothing ships in 2026, prompt-based agents on newer frontier models erode its benchmark lead
- Whether venue defenses (arXiv's ban policy, reviewer-consent norms) leave autonomous paper generation publishable at all, or push the category fully toward Kosmos-style data discovery
- Sakana's v3 leveraging the Nature paper's scaling-law claim: paper quality tracks foundation-model capability
Long-term (2027+)
- Whether "automated AI research" as a venture category (Recursive Superintelligence's $650M bet) produces a genuine research breakthrough attributable to an agent
- Consolidation: the deep research tier's likely endgame is absorption into frontier-lab products, with GPT Researcher as the durable open-source baseline
Bottom Line
The April thesis — autoresearch as a paradigm shift hiding inside a simple loop — survived. The repos didn't. Karpathy's 86K-star category definition is dormant and unlicensed; the launch-week derivatives froze; only pi-autoresearch still ships. The pattern's value was proven in production (PostHog's three-year-old bug) and its failure mode documented (Shopify's overfit benchmark) — and then the energy moved up-stack.
The category's second act is commercial science: Kosmos selling $200 discovery runs into pharma R&D, Autoscience deploying research agents into production ML models, Google distributing Co-Scientist through Gemini for Science to national labs. Nature publishing Sakana's methodology gave the field its scientific credential at exactly the moment the open-source wave receded.
The biggest opportunity has changed shape: in April it was a horizontal open-source platform — any repo + any metric. In June it's trust infrastructure: verification, citation integrity, and accuracy auditing for autonomous research systems whose best-in-class still gets one in five conclusions wrong. The tools that make AI research checkable will outlast the tools that merely make it fast.
Research by Ry Walker Research
Sources
- [1] karpathy/autoresearch
- [2] davebcn87/pi-autoresearch
- [3] RightNow-AI/autokernel
- [4] mutable-state-inc/autoresearch-at-home
- [5] assafelovic/gpt-researcher
- [6] dzhng/deep-research
- [7] Alibaba-NLP/DeepResearch (Tongyi)
- [8] langchain-ai/open_deep_research
- [9] SakanaAI/AI-Scientist-v2
- [10] SkyworkAI/DeepResearchAgent
- [11] VentureBeat Coverage
- [12] Awesome Deep Research (comprehensive list)
- [13] matt-k-wong/autoresearch_mlx_mkw
- [14] hyperspaceai/agi
- [15] kevinrgu/autoagent
- [16] aiming-lab/AutoResearchClaw
- [17] Edison Scientific (Kosmos)
- [18] Incyte and Edison Scientific Strategic Collaboration
- [19] TechFundingNews: Edison lands $70M to build autonomous AI scientists
- [20] Autoscience
- [21] R&D World: Autoscience raises $14M seed round
- [22] TechCrunch: Academics accuse AI startups of co-opting peer review
- [23] Google: Gemini for Science at I/O 2026
- [24] Google DeepMind supports the US DOE Genesis Mission
- [25] Sakana AI: The AI Scientist, Now Published in Nature