Key Takeaways
- One of the fastest-growing repos of 2026: launched January 18, 2026 and at 47.4K stars by June 11 — including a reported 21,424 stars gained in a single week in late May — yet still pre-1.0 (v0.9.9) and ~91% single-maintainer by commits
- Architecture is the differentiator: tree-sitter parses 21 languages into a local SQLite symbol/call/import graph with FTS5 search and OS-native file watchers for incremental sync — no embeddings, no vector store, no API keys, nothing leaves the machine
- Vendor benchmarks across 7 codebases report a median 58% fewer tool calls and 47% fewer tokens; one independent week-long test measured a 70% median tool-call cut, and a separate independent review notes that symbol graphs fail on fuzzy semantic queries
- MIT-licensed and free today, but a hosted "CodeGraph platform" waitlist signals a coming commercial layer from a solo maintainer
FAQ
What is CodeGraph?
CodeGraph is a free, fully local CLI and MCP server that pre-indexes a codebase into a SQLite knowledge graph of symbols, calls, and imports using tree-sitter, so AI coding agents can query code structure directly instead of burning tokens on grep-and-read exploration.
How much does CodeGraph cost?
Nothing — it is MIT-licensed and runs entirely locally with no API keys or subscriptions; a hosted "CodeGraph platform" is announced as coming soon via waitlist.
Which agents and languages does CodeGraph support?
Eight agent integrations (Claude Code, Cursor, Codex CLI, opencode, Hermes Agent, Gemini CLI, Antigravity IDE, Kiro) over MCP, and 21 languages with full support including TypeScript, Python, Go, Rust, Java, C#, and C++.
How is CodeGraph different from Serena or claude-context?
Serena derives symbol understanding live from language servers and claude-context does embedding-based semantic search; CodeGraph pre-computes a static symbol/call graph into embedded SQLite — faster and dependency-free for structural queries, but unable to answer fuzzy "where is the code that does X" questions.
Executive Summary
CodeGraph attacks the most expensive habit AI coding agents have: rediscovering a codebase one grep and file-read at a time, every session. It pre-indexes the repository instead — tree-sitter parses source into ASTs, language-specific queries extract symbols and edges (calls, imports, inheritance), and everything lands in a local SQLite database with full-text search at .codegraph/codegraph.db. Agents then query the graph over MCP, and OS-native file watchers keep the index current incrementally as code changes.[1] It is 100% local — no embeddings, no external APIs, no data leaving the machine — and MIT-licensed.[2][1]
The growth curve is the story. Created January 18, 2026, the repo stood at 47,413 stars and 2,893 forks by June 11, under five months later, including a reported 21,424 stars in a single seven-day stretch in late May that put it at #2 on GitHub Trending.[2][3][4] Against that velocity, almost everything else about the project is unusually thin: it is pre-1.0 (v0.9.9, June 2, 2026), creator Colby McHenry accounts for 391 of roughly 430 contributor commits (the next contributor has 16), there are 224 open issues, and no dedicated Hacker News launch thread exists despite the star count.[5][2][6]
| Attribute | Value |
|---|---|
| Creator | Colby McHenry (solo maintainer; 391 of ~430 commits)[2] |
| Launched | January 18, 2026[2] |
| Funding | None disclosed; hosted platform waitlist announced[1] |
| GitHub Stars | 47.4K (June 11, 2026); 2.9K forks, 115 watchers[2] |
| License | MIT[2] |
| Latest Release | v0.9.9 (June 2, 2026); 10 releases May 21–June 2[5] |
Product Overview
CodeGraph's pitch is "understand any codebase as a graph."[7] The workflow is three commands: install the binary, run codegraph install to wire the MCP server into your agents, and codegraph init -i per project to build the index. From then on, when Claude Code or Cursor needs to know who calls a function, where a symbol is defined, or what a file imports, it asks the graph in one tool call instead of running a grep-read-grep loop.[1]
Vendor benchmarks across 7 open-source codebases (7 languages, 110 to 10K files) report a median of 58% fewer tool calls, 47% fewer tokens, 22% faster responses, and roughly 16% average cost reduction — with honest variance disclosed: 25–40% savings on smaller codebases, near break-even on response-heavy ones.[1] An independent week-long review across four repositories measured a 70% median tool-call reduction, 59% fewer tokens, and 49% faster responses.[8]
Key Capabilities
| Capability | Description |
|---|---|
| Pre-indexed symbol graph | Functions, classes, methods as nodes; calls, imports, inheritance as edges[1] |
| Full-text search | SQLite FTS5 over the index[1] |
| Auto-sync | FSEvents/inotify/ReadDirectoryChangesW watchers, 2-second debounce, incremental updates[1] |
| 21 languages | TypeScript, JavaScript, Python, Go, Rust, Java, C#, PHP, Ruby, C, C++, Swift, Kotlin, Scala, Dart, Lua, Luau, Svelte, Vue, Liquid, Pascal/Delphi; partial Objective-C[1] |
| 8 agent integrations | Claude Code, Cursor, Codex CLI, opencode, Hermes Agent, Gemini CLI, Antigravity IDE, Kiro[1] |
| 100% local | No external APIs, no telemetry of code, nothing leaves the machine[1] |
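To make the "symbols as nodes, relationships as edges" idea concrete, here is a minimal sketch of a symbol graph stored in SQLite and queried for callers. The table names, columns, and the `callers_of` helper are assumptions made for illustration; CodeGraph's actual schema is not documented in this report and may differ.

```python
import sqlite3

# Hypothetical schema for illustration only; not CodeGraph's real tables.
SCHEMA = """
CREATE TABLE nodes (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    kind TEXT NOT NULL,      -- 'function', 'class', 'method'
    file TEXT NOT NULL
);
CREATE TABLE edges (
    src  INTEGER NOT NULL REFERENCES nodes(id),
    dst  INTEGER NOT NULL REFERENCES nodes(id),
    kind TEXT NOT NULL       -- 'calls', 'imports', 'inherits'
);
CREATE INDEX edges_by_dst ON edges(dst, kind);
"""


def build_demo_graph() -> sqlite3.Connection:
    """Create an in-memory graph with three functions and two call edges."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO nodes (id, name, kind, file) VALUES (?, ?, ?, ?)",
        [
            (1, "load_config", "function", "config.py"),
            (2, "main", "function", "app.py"),
            (3, "run_tests", "function", "tests.py"),
        ],
    )
    conn.executemany(
        "INSERT INTO edges (src, dst, kind) VALUES (?, ?, ?)",
        [(2, 1, "calls"), (3, 1, "calls")],
    )
    conn.commit()
    return conn


def callers_of(conn: sqlite3.Connection, symbol: str) -> list:
    """Answer 'who calls this symbol?' with a single indexed query."""
    return conn.execute(
        """
        SELECT caller.name, caller.file
        FROM edges
        JOIN nodes AS callee ON callee.id = edges.dst
        JOIN nodes AS caller ON caller.id = edges.src
        WHERE callee.name = ? AND edges.kind = 'calls'
        ORDER BY caller.name
        """,
        (symbol,),
    ).fetchall()


if __name__ == "__main__":
    print(callers_of(build_demo_graph(), "load_config"))
```

The point of the design is that a question like "who calls `load_config`?" becomes one lookup against a precomputed index rather than a series of grep and file-read steps.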
Product Surfaces
| Surface | Description | Availability |
|---|---|---|
| CLI | Self-contained binary (no Node.js required) or npm package | GA (pre-1.0)[1] |
| MCP server | Graph queries exposed to any MCP-capable agent | GA (pre-1.0)[1] |
| Hosted platform | "CodeGraph platform" at getcodegraph.com | Waitlist[1] |
Technical Architecture
Three layers: extraction (tree-sitter ASTs plus per-language queries), storage (embedded SQLite with FTS5 — no database server, no vector store, no Docker), and sync (native OS file watchers with incremental re-indexing).[1] The codebase is TypeScript, shipped as a self-contained binary.[2][1]
```shell
curl -fsSL https://raw.githubusercontent.com/colbymchenry/codegraph/main/install.sh | sh
codegraph install   # wire MCP server into agents
codegraph init -i   # index the current project
```
The README is candid about static-analysis ceilings: framework-specific entry points (ASP.NET, Spring, Django, Drupal, Play) resolve at roughly 74–84% coverage because of reflection and convention-based dispatch, and true runtime polymorphism "remains a frontier." Pre-0.9 builds had SQLite lock contention on network shares and WSL2 mounts, addressed with a bundled WAL-mode runtime.[1]
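The ceiling described above can be illustrated with a small sketch. It uses Python's standard `ast` module as a stand-in for tree-sitter (a swapped-in technique, not CodeGraph's extractor), and the sample source and the `static_call_edges` helper are hypothetical. A call made through a dynamically built name leaves no resolvable edge in the static graph, even though it reaches `handle_get` at runtime.

```python
import ast

# Hypothetical sample module: one direct call, one reflective call.
SOURCE = '''
def handle_get():
    return "ok"

def direct():
    return handle_get()

def reflective(verb):
    handler = globals()["handle_" + verb]
    return handler()
'''


def static_call_edges(source: str) -> set:
    """Return (caller, callee) pairs where the callee is a function
    defined in the same module and is called by its plain name."""
    tree = ast.parse(source)
    functions = [n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]
    defined = {f.name for f in functions}
    edges = set()
    for func in functions:
        for node in ast.walk(func):
            if (
                isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id in defined
            ):
                edges.add((func.name, node.func.id))
    return edges


if __name__ == "__main__":
    # Only the direct call is visible; the reflective one is missing.
    print(static_call_edges(SOURCE))
```

Framework routers that pick handlers by convention or annotation behave like `reflective` here, which is one plausible reading of why coverage on such entry points stays below 100%.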
Key Technical Details
| Aspect | Detail |
|---|---|
| Deployment | Local CLI + MCP server; index lives at .codegraph/codegraph.db[1] |
| Model(s) | None — purely static analysis; no embeddings or LLM calls[1] |
| Integrations | 8 agents via MCP; install via curl script, PowerShell, or npm[1] |
| Open Source | MIT, TypeScript, 47.4K stars as of June 2026[2] |
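The sync layer's debounce can be sketched as follows. This is a minimal illustration, assuming a trailing-edge debounce that waits for a quiet window before re-indexing; the `Debouncer` class and its method names are hypothetical, and only the 2-second window comes from the capabilities table. Timestamps are passed in explicitly so the logic can be exercised without a real file watcher.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

DEBOUNCE_SECONDS = 2.0  # the 2-second debounce cited in the report


@dataclass
class Debouncer:
    """Coalesce file-change events until no event has arrived for `window` seconds."""

    window: float = DEBOUNCE_SECONDS
    pending: Set[str] = field(default_factory=set)
    last_event: Optional[float] = None

    def on_change(self, path: str, now: float) -> None:
        """Record a changed path and restart the quiet-window timer."""
        self.pending.add(path)
        self.last_event = now

    def poll(self, now: float) -> List[str]:
        """Return the sorted batch of paths to re-index once the quiet
        window has elapsed; return an empty list otherwise."""
        if self.last_event is None or now - self.last_event < self.window:
            return []
        batch = sorted(self.pending)
        self.pending.clear()
        self.last_event = None
        return batch


if __name__ == "__main__":
    d = Debouncer()
    d.on_change("a.py", now=0.0)
    d.on_change("b.py", now=1.0)
    d.on_change("a.py", now=1.5)  # repeated saves collapse into one entry
    print(d.poll(now=3.0))        # still inside the quiet window
    print(d.poll(now=3.5))        # window elapsed, batch is released
```

Batching this way means a burst of saves from a formatter or a branch switch triggers one incremental re-index of the touched files instead of one per event.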
Strengths
- Zero-dependency architecture — embedded SQLite instead of Neo4j or a vector database means no server to run, no API keys, no embedding costs; the entire index is one file in the repo directory.[1]
- Benchmarks with disclosed variance — the vendor publishes per-repo results including the unflattering ones (near break-even on response-heavy codebases), and an independent reviewer's measurements (70% median tool-call cut) landed at or above the published claims.[1][8]
- Genuinely broad surface for a five-month-old project — 21 languages and 8 agent integrations, with a self-contained installer that requires no Node.js.[1]
- Privacy by construction — static parsing on-device with nothing leaving the machine, an easier security review than any embedding-based indexer that ships code to an API.[1]
- Rapid release cadence — 10 releases in the 12 days ending June 2, 2026, with documented fixes landing against known issues like WSL2 lock contention.[5][1]
Cautions
- Solo-maintainer concentration — Colby McHenry has 391 of roughly 430 contributor commits; the next-largest contributor has 16. A 47K-star dependency on one person's continued attention is real bus-factor risk.[2]
- The star curve outruns the community footprint — 21K+ stars in one week, yet 115 watchers, 224 open issues, and no dedicated HN launch thread as of June 2026. This research found no substantiated botting allegation, but the mismatch between star velocity and engagement depth is atypical and worth noting.[2][3][6]
- Symbol graphs cannot answer fuzzy questions — if you don't know the symbol's name, the graph can't find it; semantic "where is the code that does X" queries are exactly where embedding-based rivals win.[3]
- Headline savings are vendor-reported and methodology-limited — the published benchmarks test one question per repo rather than realistic multi-question sessions, and small codebases (under 300 files) see limited value.[3][1]
- Pre-1.0 with a monetization question mark — v0.9.9 software, and the announced hosted platform means the free local tool may become the funnel for a commercial product whose terms are unknown.[5][1]
- Static-analysis ceilings — framework routing resolves at ~74–84% and runtime polymorphism is unsolved, so call graphs in reflection-heavy enterprise codebases will have holes.[1]
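The fuzzy-query caution can be made concrete with a small sketch using SQLite's FTS5 module through Python's `sqlite3`. The table layout and symbol names are hypothetical, and the example assumes the local SQLite build has FTS5 enabled. Full-text search matches tokens that appear in symbol names, so a query phrased as a description of behaviour finds nothing unless its words happen to occur in the index.

```python
import sqlite3


def build_index() -> sqlite3.Connection:
    """Create a tiny in-memory full-text index over symbol names."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical FTS5 table; requires an SQLite build with FTS5 enabled.
    conn.execute("CREATE VIRTUAL TABLE symbols USING fts5(name, file)")
    conn.executemany(
        "INSERT INTO symbols (name, file) VALUES (?, ?)",
        [
            ("parse_config", "app/loader.py"),
            ("retry_with_backoff", "net/http.py"),
        ],
    )
    return conn


def search(conn: sqlite3.Connection, query: str) -> list:
    """Return symbol names whose indexed tokens match every query term."""
    rows = conn.execute(
        "SELECT name FROM symbols WHERE symbols MATCH ? ORDER BY rank",
        (query,),
    ).fetchall()
    return [name for (name,) in rows]


if __name__ == "__main__":
    conn = build_index()
    # A term from the symbol name hits, because the default tokenizer
    # splits 'parse_config' into 'parse' and 'config'.
    print(search(conn, "config"))
    # A description of what the code does shares no tokens with the index.
    print(search(conn, "code that reads yaml options"))
```

An embedding-based tool would attempt to bridge that second query by semantic similarity, which is the trade-off the competitive comparison later in this report turns on.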
What Developers Say
Despite the star count, there is no dedicated Hacker News thread for this project as of June 2026; the Algolia results for the name point to unrelated, smaller projects. Independent voices therefore live in individual reviews and r/ClaudeCode threads.[6]
"The strongest pitch yet for symbol graphs as the structural layer beneath AI coding agents." — andrew.ooo, independent review, May 28, 2026[3]
"No embeddings, no API keys, no Docker, no vector store." — andrew.ooo on the architecture[3]
"Symbol graphs don't help with fuzzy questions. If you don't know what the symbol is called, the graph can't find it." — andrew.ooo, same review[3]
"I gave Claude Code a map of my repo — CodeGraph killed 70% of its tool calls." — Chew Loong Nian, week-long test across four repositories, Level Up Coding[8]
The andrew.ooo reviewer reports r/ClaudeCode users putting long-session savings at 30–50%, broadly in line with the README's 47% median token reduction. Note, however, that much of the surrounding content (Medium explainers, SEO guides) is enthusiast or vendor-adjacent rather than adversarial testing.[3][8]
Pricing & Licensing
| Tier | Price | Includes |
|---|---|---|
| Open source | Free | Full CLI, MCP server, all 21 languages, all 8 agent integrations, unlimited local use[1] |
| CodeGraph platform | Not announced | Hosted offering; waitlist at getcodegraph.com[1] |
Licensing model: MIT — permissive, fork-friendly, no copyleft obligations.[2]
Hidden costs: None today — no API keys or metered services. The real costs are operational: index maintenance on very large monorepos, gaps in reflection-heavy frameworks, and the risk that future development effort shifts toward the paid hosted platform.[1]
Competitive Positioning
Direct Competitors
| Competitor | Differentiation |
|---|---|
| Repomix | Repomix packs the whole repo into a single LLM-friendly file — context stuffing; CodeGraph is queryable structure, sending only the symbols and edges the agent asks for |
| Serena | Serena derives symbol understanding live from language servers (LSP) — deeper semantic accuracy, heavier runtime; CodeGraph pre-computes a static index into embedded SQLite with no language-server dependency |
| claude-context | Embedding-based semantic search over MCP — wins on fuzzy "find the code that does X" queries, but needs a vector store and embedding API; CodeGraph is exact-structure, fully offline |
| CodeGraphContext | Near-namesake with the same graph thesis, but built on Neo4j — a database server to operate; CodeGraph's single-file SQLite is the lighter deployment |
When to Choose CodeGraph Over Alternatives
- Choose CodeGraph when: your agent burns most of its budget on structural exploration of a large polyglot codebase, you know your symbols by name, and code must never leave the machine.
- Choose Repomix when: the codebase fits comfortably in a context window and you want zero indexing infrastructure at all.
- Choose Serena when: you want compiler-grade semantic precision (rename-safe references, type-aware navigation) and accept the language-server runtime.
- Choose claude-context when: natural-language semantic search over the codebase matters more than exact call-graph traversal.
Ideal Customer Profile
Best fit:
- Individual developers and teams running Claude Code, Cursor, or Codex against large (1K+ file) repositories where agent exploration dominates token spend
- Polyglot codebases inside the 21 supported languages
- Privacy-constrained environments — regulated industries, air-gapped machines — that rule out embedding-API indexers
Poor fit:
- Small codebases (under 300 files), where measured savings are limited and may not justify maintaining an index[3]
- Teams whose dominant query pattern is fuzzy semantic search rather than symbol-anchored navigation
- Organizations that require a vendor with institutional backing, an SLA, or a contributor base deeper than one person
Viability Assessment
| Factor | Assessment |
|---|---|
| Financial Health | Unfunded solo project; hosted-platform waitlist is the only visible monetization path[1] |
| Market Position | Star leader in the code-graph-for-agents niche — 47.4K stars dwarfs every direct rival — but the position is five months old[2] |
| Innovation Pace | Very high — 10 releases in 12 days, 21 languages and 8 integrations since January[5][1] |
| Community/Ecosystem | Shallow relative to stars: ~91% single-maintainer commits, 115 watchers, 224 open issues, no HN thread[2][6] |
| Long-term Outlook | Hinges on whether the maintainer converts star momentum into contributors and a sustainable platform before agent vendors build indexing in natively |
The thesis is sound and the execution is fast, but the project's most-quoted number — the star count — is also its least informative. The meaningful signals are the architecture (embedded, local, dependency-free), the disclosed-variance benchmarks, and independent measurements that matched or beat the claims; the meaningful risks are one maintainer, pre-1.0 software, and a category that Anthropic, OpenAI, or Cursor could absorb into their agents directly.[1][8][2]
Bottom Line
CodeGraph is the best-packaged version of an idea whose time has clearly come: give agents a pre-built structural map instead of letting them grep their way to one. The single-file SQLite design makes it the easiest graph indexer to adopt and the only one with nothing to operate, and the savings claims have survived at least one independent week-long test. The discount to apply is concentration risk — one maintainer, five months old, version 0.9.x, and a community whose depth does not yet match its star count.
Recommended for: Developers running MCP-capable agents on large, symbol-rich codebases who want measurable token and tool-call reductions with zero infrastructure and zero data egress.
Not recommended for: Small repos, fuzzy-search-dominant workflows, or organizations that cannot take a bus-factor-of-one dependency in their daily toolchain.
Outlook: Watch for a 1.0 release, second-maintainer commit share, terms of the hosted platform — and whether the extraordinary star velocity translates into the contributor and issue-triage depth a project this depended-upon will need.
Research by Ry Walker Research
Sources
- [1] CodeGraph README: Architecture, Benchmarks, and Caveats
- [2] CodeGraph GitHub Repository
- [3] andrew.ooo: CodeGraph Review — Pre-Indexed Knowledge Graph for AI Agents
- [4] Trendshift: colbymchenry/codegraph trending stats
- [5] CodeGraph GitHub Releases
- [6] CodeGraph mentions on Hacker News (Algolia search)
- [7] CodeGraph Website
- [8] Level Up Coding: I Gave Claude Code a Map of My Repo — CodeGraph Killed 70% of Its Tool Calls