CodeGraph | Ry Walker Research

Key takeaways

One of the fastest-growing repos of 2026: launched January 18, 2026 and at 47.4K stars by June 11 — including a reported 21,424 stars gained in a single week in late May — yet still pre-1.0 (v0.9.9) and ~91% single-maintainer by commits
Architecture is the differentiator: tree-sitter parses 21 languages into a local SQLite symbol/call/import graph with FTS5 search and OS-native file watchers for incremental sync — no embeddings, no vector store, no API keys, nothing leaves the machine
Vendor benchmarks across 7 codebases report a median 58% fewer tool calls and 47% fewer tokens; one independent week-long review measured a 70% median tool-call cut, while noting symbol graphs fail on fuzzy semantic queries
MIT-licensed and free today, but a hosted "CodeGraph platform" waitlist signals a coming commercial layer from a solo maintainer

FAQ

What is CodeGraph?

CodeGraph is a free, fully local CLI and MCP server that pre-indexes a codebase into a SQLite knowledge graph of symbols, calls, and imports using tree-sitter, so AI coding agents can query code structure directly instead of burning tokens on grep-and-read exploration.

How much does CodeGraph cost?

Nothing — it is MIT-licensed and runs entirely locally with no API keys or subscriptions; a hosted "CodeGraph platform" is announced as coming soon via waitlist.

Which agents and languages does CodeGraph support?

Eight agent integrations (Claude Code, Cursor, Codex CLI, opencode, Hermes Agent, Gemini CLI, Antigravity IDE, Kiro) over MCP, and 21 languages with full support including TypeScript, Python, Go, Rust, Java, C#, and C++.

How is CodeGraph different from Serena or claude-context?

Serena derives symbol understanding live from language servers and claude-context does embedding-based semantic search; CodeGraph pre-computes a static symbol/call graph into embedded SQLite — faster and dependency-free for structural queries, but unable to answer fuzzy "where is the code that does X" questions.

Executive Summary

CodeGraph attacks the most expensive habit AI coding agents have: rediscovering a codebase one grep and file-read at a time, every session. It pre-indexes the repository instead — tree-sitter parses source into ASTs, language-specific queries extract symbols and edges (calls, imports, inheritance), and everything lands in a local SQLite database with full-text search at .codegraph/codegraph.db. Agents then query the graph over MCP, and OS-native file watchers keep the index current incrementally as code changes.^[1] It is 100% local — no embeddings, no external APIs, no data leaving the machine — and MIT-licensed.^[2]^[1]

The growth curve is the story. Created January 18, 2026, the repo stood at 47,413 stars and 2,893 forks by June 11, under five months later, including a reported 21,424 stars in a single seven-day stretch in late May that put it at #2 on GitHub Trending.^[2]^[3]^[4] Against that velocity, everything else is unusually thin: the project is pre-1.0 (v0.9.9, June 2, 2026), creator Colby McHenry accounts for 391 of roughly 430 contributor commits (about 91%; the next contributor has 16), there are 224 open issues, and no dedicated Hacker News launch thread exists despite the star count.^[5]^[2]^[6]

Attribute	Value
Creator	Colby McHenry (solo maintainer; 391 of ~430 commits)^[2]
Launched	January 18, 2026^[2]
Funding	None disclosed; hosted platform waitlist announced^[1]
GitHub Stars	47.4K (June 11, 2026); 2.9K forks, 115 watchers^[2]
License	MIT^[2]
Latest Release	v0.9.9 (June 2, 2026); 10 releases May 21–June 2^[5]

Product Overview

CodeGraph's pitch is "understand any codebase as a graph."^[7] The workflow is three commands: install the binary, run codegraph install to wire the MCP server into your agents, and codegraph init -i per project to build the index. From then on, when Claude Code or Cursor needs to know who calls a function, where a symbol is defined, or what a file imports, it asks the graph in one tool call instead of running a grep-read-grep loop.^[1]
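To make "one tool call" concrete, here is a minimal sketch of the message an MCP client sends when it invokes a server tool. The JSON-RPC 2.0 envelope and the `tools/call` method come from the MCP specification; the tool name `find_callers` and its `symbol` argument are hypothetical placeholders, since the source does not list CodeGraph's actual tool names.

```python
import json


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical tool name and argument, used only for illustration.
request = build_tool_call(1, "find_callers", {"symbol": "parseConfig"})
print(request)
```

The agent gets a structured answer back from this single request, where the grep-based alternative would take several search and file-read round trips.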

Vendor benchmarks across 7 open-source codebases (7 languages, 110 to 10K files) report a median of 58% fewer tool calls, 47% fewer tokens, 22% faster responses, and roughly 16% average cost reduction — with honest variance disclosed: 25–40% savings on smaller codebases, near break-even on response-heavy ones.^[1] An independent week-long review across four repositories measured a 70% median tool-call reduction, 59% fewer tokens, and 49% faster responses.^[8]
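As a reading aid for the "median reduction" figures, the sketch below shows how such a number is typically derived from per-repo measurements. The counts are made up for illustration and are not taken from either benchmark; the point is only that a median can sit well above the weakest repo's result.

```python
from statistics import median


def percent_reduction(baseline: float, with_index: float) -> float:
    """Percentage drop from the baseline run to the run with the index."""
    return 100.0 * (baseline - with_index) / baseline


# Made-up (baseline, with-index) tool-call counts for three illustrative repos.
runs = [(40, 18), (25, 10), (30, 21)]

reductions = [percent_reduction(base, indexed) for base, indexed in runs]
median_reduction = median(reductions)
weakest_reduction = min(reductions)

print(reductions, median_reduction, weakest_reduction)
```

This is why the per-repo spread matters alongside the headline median: a single summary figure hides the repositories where the index helps least.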

Key Capabilities

Capability	Description
Pre-indexed symbol graph	Functions, classes, methods as nodes; calls, imports, inheritance as edges^[1]
Full-text search	SQLite FTS5 over the index^[1]
Auto-sync	FSEvents/inotify/ReadDirectoryChangesW watchers, 2-second debounce, incremental updates^[1]
21 languages	TypeScript, JavaScript, Python, Go, Rust, Java, C#, PHP, Ruby, C, C++, Swift, Kotlin, Scala, Dart, Lua, Luau, Svelte, Vue, Liquid, Pascal/Delphi; partial Objective-C^[1]
8 agent integrations	Claude Code, Cursor, Codex CLI, opencode, Hermes Agent, Gemini CLI, Antigravity IDE, Kiro^[1]
100% local	No external APIs, no telemetry of code, nothing leaves the machine^[1]
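The symbol graph in the table above can be pictured as two SQLite tables, one for nodes and one for edges. The following is a minimal sketch under that assumption; the table names, column names, and sample rows are illustrative and are not CodeGraph's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nodes (
    id   INTEGER PRIMARY KEY,
    kind TEXT NOT NULL,   -- 'function', 'class', 'method'
    name TEXT NOT NULL,
    file TEXT NOT NULL
);
CREATE TABLE edges (
    src  INTEGER NOT NULL REFERENCES nodes(id),
    dst  INTEGER NOT NULL REFERENCES nodes(id),
    kind TEXT NOT NULL    -- 'calls', 'imports', 'inherits'
);
CREATE INDEX edges_by_dst ON edges(dst, kind);
""")

# Illustrative data: two functions that both call load_config.
conn.executemany("INSERT INTO nodes VALUES (?, ?, ?, ?)", [
    (1, "function", "load_config", "config.py"),
    (2, "function", "main", "app.py"),
    (3, "function", "run_tests", "tests.py"),
])
conn.executemany("INSERT INTO edges VALUES (?, ?, ?)", [
    (2, 1, "calls"),
    (3, 1, "calls"),
])


def callers_of(name):
    """Return (caller name, caller file) for every 'calls' edge into `name`."""
    return conn.execute("""
        SELECT caller.name, caller.file
        FROM edges
        JOIN nodes AS callee ON callee.id = edges.dst
        JOIN nodes AS caller ON caller.id = edges.src
        WHERE callee.name = ? AND edges.kind = 'calls'
        ORDER BY caller.name
    """, (name,)).fetchall()


print(callers_of("load_config"))
```

With this layout, "who calls this function" is one indexed join rather than a text search across the repository.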

Product Surfaces

Surface	Description	Availability
CLI	Self-contained binary (no Node.js required) or npm package	GA (pre-1.0)^[1]
MCP server	Graph queries exposed to any MCP-capable agent	GA (pre-1.0)^[1]
Hosted platform	"CodeGraph platform" at getcodegraph.com	Waitlist^[1]

Technical Architecture

Three layers: extraction (tree-sitter ASTs plus per-language queries), storage (embedded SQLite with FTS5 — no database server, no vector store, no Docker), and sync (native OS file watchers with incremental re-indexing).^[1] The codebase is TypeScript, shipped as a self-contained binary.^[2]^[1]
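The sync layer's debounce step can be sketched without any real file watcher. The version below assumes a trailing debounce, where a batch is released only after a quiet period with no new events; the source states a 2-second debounce but not its exact semantics, and the class and method names are hypothetical.

```python
class DebouncedSync:
    """Collect changed paths and release them once the quiet period elapses."""

    def __init__(self, quiet_seconds=2.0):
        self.quiet_seconds = quiet_seconds
        self.pending = set()
        self.last_event_at = None

    def on_change(self, path, now):
        """Record a file-change event observed at time `now` (in seconds)."""
        self.pending.add(path)
        self.last_event_at = now

    def due(self, now):
        """Return the paths to re-index, or [] if events are still arriving."""
        if self.last_event_at is None:
            return []
        if now - self.last_event_at < self.quiet_seconds:
            return []
        batch = sorted(self.pending)
        self.pending.clear()
        self.last_event_at = None
        return batch


sync = DebouncedSync()
sync.on_change("src/a.ts", now=0.0)
sync.on_change("src/b.ts", now=0.5)
sync.on_change("src/a.ts", now=1.0)   # repeated saves collapse into one entry

early = sync.due(now=2.0)   # only 1.0 s since the last event: nothing yet
ready = sync.due(now=3.0)   # 2.0 s of quiet: the batch is released
print(early, ready)
```

Batching this way means a burst of saves triggers one incremental re-index of the touched files instead of one per event.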

curl -fsSL https://raw.githubusercontent.com/colbymchenry/codegraph/main/install.sh | sh
codegraph install   # wire MCP server into agents
codegraph init -i   # index the current project

The README is candid about static-analysis ceilings: framework-specific entry points (ASP.NET, Spring, Django, Drupal, Play) resolve at roughly 74–84% coverage because of reflection and convention-based dispatch, and true runtime polymorphism "remains a frontier." Pre-0.9 builds had SQLite lock contention on network shares and WSL2 mounts, addressed with a bundled WAL-mode runtime.^[1]
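The ceiling described above is easy to reproduce in miniature. This sketch uses Python's standard `ast` module as a stand-in for tree-sitter (it is not CodeGraph's extractor) and records a call edge only when the call target is a plain name, which is roughly what syntax-level analysis can see.

```python
import ast

SOURCE = '''
def handle_create(): ...
def handle_delete(): ...

def direct():
    handle_create()

def dispatch(action):
    handler = globals()["handle_" + action]
    handler()
'''


def static_call_edges(source):
    """Return (caller, callee) pairs for calls whose target is a plain name."""
    edges = set()
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, ast.FunctionDef):
            continue
        for node in ast.walk(func):
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                edges.add((func.name, node.func.id))
    return edges


edges = static_call_edges(SOURCE)
print(sorted(edges))
```

The direct call is captured as `direct -> handle_create`, but `dispatch` only yields edges to `globals` and the local variable `handler`. The real targets, `handle_create` and `handle_delete`, are chosen from a string at runtime and never appear in the graph. Convention-based framework routing produces the same kind of gap.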

Key Technical Details

Aspect	Detail
Deployment	Local CLI + MCP server; index lives at `.codegraph/codegraph.db`^[1]
Model(s)	None — purely static analysis; no embeddings or LLM calls^[1]
Integrations	8 agents via MCP; install via curl script, PowerShell, or npm^[1]
Open Source	MIT, TypeScript, 47.4K stars as of June 2026^[2]

Strengths

Zero-dependency architecture — embedded SQLite instead of Neo4j or a vector database means no server to run, no API keys, no embedding costs; the entire index is one file in the repo directory.^[1]
Benchmarks with disclosed variance — the vendor publishes per-repo results including the unflattering ones (near break-even on response-heavy codebases), and an independent reviewer's measurements (70% median tool-call cut) landed at or above the published claims.^[1]^[8]
Genuinely broad surface for a five-month-old project — 21 languages and 8 agent integrations, with a self-contained installer that requires no Node.js.^[1]
Privacy by construction — static parsing on-device with nothing leaving the machine, an easier security review than any embedding-based indexer that ships code to an API.^[1]
Rapid release cadence — 10 releases in the 12 days ending June 2, 2026, with documented fixes landing against known issues like WSL2 lock contention.^[5]^[1]

Cautions

Solo-maintainer concentration — Colby McHenry has 391 of roughly 430 contributor commits; the next-largest contributor has 16. A 47K-star dependency on one person's continued attention is real bus-factor risk.^[2]
The star curve outruns the community footprint — 21K+ stars in one week, yet 115 watchers, 224 open issues, and no dedicated HN launch thread as of June 2026. This research found no substantiated botting allegation, but the mismatch between star velocity and engagement depth is atypical and worth noting.^[2]^[3]^[6]
Symbol graphs cannot answer fuzzy questions — if you don't know the symbol's name, the graph can't find it; semantic "where is the code that does X" queries are exactly where embedding-based rivals win.^[3]
Headline savings are vendor-reported and methodology-limited — the published benchmarks test one question per repo rather than realistic multi-question sessions, and small codebases (under 300 files) see limited value.^[3]^[1]
Pre-1.0 with a monetization question mark — v0.9.9 software, and the announced hosted platform means the free local tool may become the funnel for a commercial product whose terms are unknown.^[5]^[1]
Static-analysis ceilings — framework routing resolves at ~74–84% and runtime polymorphism is unsolved, so call graphs in reflection-heavy enterprise codebases will have holes.^[1]
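The fuzzy-question caution above can be illustrated with a small FTS5 table. This sketch assumes a Python build whose SQLite includes FTS5 and uses the default tokenizer; the table layout and symbol names are invented, and CodeGraph's real index configuration may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE symbols USING fts5(name, file)")
conn.executemany("INSERT INTO symbols VALUES (?, ?)", [
    ("reprocess_declined_charges", "billing/jobs.py"),
    ("load_config", "config.py"),
])


def search(query):
    """Return symbol names whose indexed text matches the FTS5 query."""
    rows = conn.execute(
        "SELECT name FROM symbols WHERE symbols MATCH ? ORDER BY rank",
        (query,),
    ).fetchall()
    return [name for (name,) in rows]


by_name = search("declined")                  # a word from the symbol's name
by_intent = search("retry failed payments")   # a description of what it does
print(by_name, by_intent)
```

Searching on a word from the identifier finds `reprocess_declined_charges`, while the intent-style query returns nothing because none of its words occur in the indexed text. Closing that gap is what embedding-based search is for.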

What Developers Say

Despite the star count, there is no dedicated Hacker News thread for this project as of June 2026; the Algolia results for the name point to unrelated, smaller projects. Independent voices therefore live in individual reviews and r/ClaudeCode threads.^[6]

"The strongest pitch yet for symbol graphs as the structural layer beneath AI coding agents." — andrew.ooo, independent review, May 28, 2026^[3]

"No embeddings, no API keys, no Docker, no vector store." — andrew.ooo on the architecture^[3]

"Symbol graphs don't help with fuzzy questions. If you don't know what the symbol is called, the graph can't find it." — andrew.ooo, same review^[3]

"I gave Claude Code a map of my repo — CodeGraph killed 70% of its tool calls." — Chew Loong Nian, week-long test across four repositories, Level Up Coding^[8]

The andrew.ooo reviewer reports r/ClaudeCode users putting long-session savings at 30–50%, consistent with the README's median — but note that much of the surrounding content (Medium explainers, SEO guides) is enthusiast or vendor-adjacent rather than adversarial testing.^[3]^[8]

Pricing & Licensing

Tier	Price	Includes
Open source	Free	Full CLI, MCP server, all 21 languages, all 8 agent integrations, unlimited local use^[1]
CodeGraph platform	Not announced	Hosted offering; waitlist at getcodegraph.com^[1]

Licensing model: MIT — permissive, fork-friendly, no copyleft obligations.^[2]

Hidden costs: None today — no API keys or metered services. The real costs are operational: index maintenance on very large monorepos, gaps in reflection-heavy frameworks, and the risk that future development effort shifts toward the paid hosted platform.^[1]

Competitive Positioning

Direct Competitors

Competitor	Differentiation
Repomix	Repomix packs the whole repo into a single LLM-friendly file — context stuffing; CodeGraph is queryable structure, sending only the symbols and edges the agent asks for
Serena	Serena derives symbol understanding live from language servers (LSP) — deeper semantic accuracy, heavier runtime; CodeGraph pre-computes a static index into embedded SQLite with no language-server dependency
claude-context	Embedding-based semantic search over MCP — wins on fuzzy "find the code that does X" queries, but needs a vector store and embedding API; CodeGraph is exact-structure, fully offline
CodeGraphContext	Near-namesake with the same graph thesis, but built on Neo4j — a database server to operate; CodeGraph's single-file SQLite is the lighter deployment

When to Choose CodeGraph Over Alternatives

Choose CodeGraph when: your agent burns most of its budget on structural exploration of a large polyglot codebase, you know your symbols by name, and code must never leave the machine.
Choose Repomix when: the codebase fits comfortably in a context window and you want zero indexing infrastructure at all.
Choose Serena when: you want compiler-grade semantic precision (rename-safe references, type-aware navigation) and accept the language-server runtime.
Choose claude-context when: natural-language semantic search over the codebase matters more than exact call-graph traversal.

Ideal Customer Profile

Best fit:

Individual developers and teams running Claude Code, Cursor, or Codex against large (1K+ file) repositories where agent exploration dominates token spend
Polyglot codebases inside the 21 supported languages
Privacy-constrained environments — regulated industries, air-gapped machines — that rule out embedding-API indexers

Poor fit:

Small codebases (under 300 files), where the reported savings are limited and may not justify maintaining an index^[3]
Teams whose dominant query pattern is fuzzy semantic search rather than symbol-anchored navigation
Organizations that require a vendor with institutional backing, an SLA, or a contributor base deeper than one person

Viability Assessment

Factor	Assessment
Financial Health	Unfunded solo project; hosted-platform waitlist is the only visible monetization path^[1]
Market Position	Star leader in the code-graph-for-agents niche, with 47.4K stars that dwarf every direct rival, but the position is only five months old^[2]
Innovation Pace	Very high — 10 releases in 12 days, 21 languages and 8 integrations since January^[5]^[1]
Community/Ecosystem	Shallow relative to stars: ~91% single-maintainer commits, 115 watchers, 224 open issues, no HN thread^[2]^[6]
Long-term Outlook	Hinges on whether the maintainer converts star momentum into contributors and a sustainable platform before agent vendors build indexing in natively

The thesis is sound and the execution is fast, but the project's most-quoted number — the star count — is also its least informative. The meaningful signals are the architecture (embedded, local, dependency-free), the disclosed-variance benchmarks, and independent measurements that matched or beat the claims; the meaningful risks are one maintainer, pre-1.0 software, and a category that Anthropic, OpenAI, or Cursor could absorb into their agents directly.^[1]^[8]^[2]

Bottom Line

CodeGraph is the best-packaged version of an idea whose time has clearly come: give agents a pre-built structural map instead of letting them grep their way to one. The single-file SQLite design makes it the easiest graph indexer to adopt and the only one with nothing to operate, and the savings claims have survived at least one independent week-long test. The discount to apply is concentration risk — one maintainer, five months old, version 0.9.x, and a community whose depth does not yet match its star count.

Recommended for: Developers running MCP-capable agents on large, symbol-rich codebases who want measurable token and tool-call reductions with zero infrastructure and zero data egress.

Not recommended for: Small repos, fuzzy-search-dominant workflows, or organizations that cannot take a bus-factor-of-one dependency in their daily toolchain.

Outlook: Watch for a 1.0 release, second-maintainer commit share, terms of the hosted platform — and whether the extraordinary star velocity translates into the contributor and issue-triage depth a project this depended-upon will need.

Research by Ry Walker Research

Sources