Key takeaways
- v2 produced the first paper written entirely by AI to be accepted through peer review (at an ICLR 2025 workshop). The pipeline is end-to-end: hypothesis generation, experiment design, execution, data analysis, and LaTeX paper writing
- v2 uses a progressive agentic tree search guided by an experiment manager agent, which removes the reliance on human-authored templates and lets it generalize across ML domains
- v1 (12.3k stars) works best when given strong templates; v2 (2.3k stars) is designed for open-ended scientific exploration, trading a lower per-run success rate for broader coverage
- Caution: it executes LLM-written code autonomously, so run it in a sandboxed environment. Requires NVIDIA GPUs with CUDA and PyTorch
FAQ
What is AI Scientist?
A fully autonomous scientific research system from Sakana AI that generates hypotheses, designs and runs experiments, analyzes data, and writes scientific papers in LaTeX, all without human intervention. v2 was the first AI system to have a paper written entirely by AI accepted through peer review.
What is the difference between v1 and v2?
v1 follows human-authored templates to achieve high success rates on well-defined tasks. v2 removes the template dependency, uses an agentic tree search, and generalizes across ML domains, but has a lower success rate on any given run.
Overview
AI Scientist is Sakana AI's fully autonomous scientific research system. v2 made history as the first AI system to have a paper written entirely by AI accepted through peer review at an ICLR 2025 workshop.
The system autonomously generates hypotheses, designs and runs experiments, analyzes data, and writes complete LaTeX manuscripts without human intervention. v2 also drops human-authored templates, relying instead on a progressive agentic tree search guided by an experiment manager agent.
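To make the idea of a tree search over experiments concrete, here is a toy best-first variant: each node is one experiment attempt, children are proposed refinements, and the node with the best score so far is expanded next. This is a sketch under stated assumptions, not Sakana AI's implementation: `Node`, `tree_search`, `path_to`, `propose_widths`, `score_width`, and the `hidden_units` config are all hypothetical, and the deterministic proposer and scorer stand in for the LLM calls and real experiment runs that v2 uses. The actual node-selection policy in v2 may differ.

```python
import heapq
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Config = Dict[str, int]


@dataclass
class Node:
    """One experiment attempt in the search tree (hypothetical structure)."""
    config: Config
    score: float
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)


def tree_search(
    root_config: Config,
    propose: Callable[[Config], List[Config]],
    evaluate: Callable[[Config], float],
    budget: int,
) -> Node:
    """Best-first search: expand the highest-scoring unexpanded node, `budget` times."""
    root = Node(root_config, evaluate(root_config))
    seen = {tuple(sorted(root_config.items()))}
    # heapq is a min-heap, so scores are negated; the counter breaks ties
    # so Node objects are never compared directly.
    frontier = [(-root.score, 0, root)]
    counter = 1
    best = root
    for _ in range(budget):
        if not frontier:
            break
        _, _, node = heapq.heappop(frontier)
        for child_config in propose(node.config):
            key = tuple(sorted(child_config.items()))
            if key in seen:  # skip configurations already tried
                continue
            seen.add(key)
            child = Node(child_config, evaluate(child_config), parent=node)
            node.children.append(child)
            heapq.heappush(frontier, (-child.score, counter, child))
            counter += 1
            if child.score > best.score:
                best = child
    return best


def path_to(node: Node) -> List[Node]:
    """Follow parent links back to the root to recover the research trajectory."""
    path = []
    current: Optional[Node] = node
    while current is not None:
        path.append(current)
        current = current.parent
    return path[::-1]


# Toy stand-ins for the LLM proposer and the experiment runner (hypothetical).
def propose_widths(config: Config) -> List[Config]:
    h = config["hidden_units"]
    return [{"hidden_units": h * 2}, {"hidden_units": max(1, h // 2)}]


def score_width(config: Config) -> float:
    return -abs(config["hidden_units"] - 64)  # pretend 64 units is optimal


best = tree_search({"hidden_units": 8}, propose_widths, score_width, budget=4)
print(best.config, [n.config["hidden_units"] for n in path_to(best)])
# prints {'hidden_units': 64} [8, 16, 32, 64]
```

Best-first expansion is used here only because it is the simplest way to show a manager spending a fixed budget on the most promising branches while keeping the whole tree available for inspection.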
Key stats: v2 has 2,277 GitHub stars (v1 has 12,330), an MIT-equivalent license, and is written in Python. It requires NVIDIA GPUs with CUDA.
Architecture
v2 replaces v1's template-driven approach with open-ended exploration:
- Hypothesis generation — LLM generates research ideas without human templates
- Agentic tree search — Experiment manager agent explores multiple research directions simultaneously
- Experiment execution — Runs code autonomously (requires sandboxed environment)
- Paper generation — Writes complete LaTeX papers with results, analysis, and citations
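The experiment execution stage is where the sandboxing caveat bites. The sketch below shows one way a pipeline could run a generated script and label the outcome, so that a search loop can, for example, treat failed runs differently from successful ones. `run_generated_code` and `RunResult` are hypothetical names, not the project's API. A subprocess with a timeout is not a sandbox: it only bounds runtime and starts the script in a throwaway directory, so untrusted code still needs a container or VM around it.

```python
import subprocess
import sys
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class RunResult:
    status: str  # "ok", "buggy" (non-zero exit), or "timeout"
    stdout: str
    stderr: str


def run_generated_code(source: str, timeout_s: float = 60.0) -> RunResult:
    """Run an LLM-written script in a child process with a time limit.

    This limits runtime and uses a temporary working directory; it is NOT a
    security sandbox (the script can still touch the filesystem and network).
    """
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "experiment.py"
        script.write_text(source)
        try:
            proc = subprocess.run(
                # -I: isolated mode (ignores PYTHON* env vars and user site-packages)
                [sys.executable, "-I", str(script)],
                cwd=workdir,
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child before re-raising the exception
            return RunResult("timeout", "", f"exceeded {timeout_s}s limit")
    status = "ok" if proc.returncode == 0 else "buggy"
    return RunResult(status, proc.stdout, proc.stderr)


result = run_generated_code("print(2 + 3)", timeout_s=10)
print(result.status, result.stdout.strip())
```

Capturing stderr rather than discarding it matters in this setting: the traceback from a "buggy" run is exactly what an LLM would need to attempt a fix.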
The tradeoff: v1 has higher success rates on well-defined tasks (strong templates), while v2 tackles open-ended exploration with lower per-run success but broader coverage.
Competitive Position
Strengths: First peer-reviewed AI-authored paper. True end-to-end scientific discovery. No human templates required in v2. Open source.
Weaknesses: High compute requirements (NVIDIA GPUs). Executes arbitrary code (security risk). Lower success rate than template-based v1. Papers are workshop-level, not conference-level.
Research by Ry Walker Research