Code Intelligence Tools for AI Agents Compared | Ry Walker Research

Key takeaways

The category went vertical in 2026: CodeGraph hit 47.4k stars within five months of its January launch — the biggest tool in the category — and GitNexus rocketed from ~1.2k to 42k stars between April and June, adding an enterprise tier. Code intelligence is no longer a niche.
Local-first graphs are the winning pattern. The two breakout leaders (CodeGraph's embedded SQLite graph, GitNexus's zero-server LadybugDB) both pre-compute structure on-device and serve it over MCP — no cloud, no embeddings API, no code egress.
The "agentic grep vs semantic index" debate turned empirical. May 2026 benchmarks favor semantic/indexed retrieval: an independent test measured 97% fewer Claude Code input tokens via grepai, vendor and independent measurements put CodeGraph at 58–70% fewer tool calls, and Augment claims 70%+ agent quality gains. Anthropic itself, meanwhile, still ships grep-only retrieval.
Context engines are unbundling from IDEs: Augment Code ($252M raised) spun its proprietary context engine out as a standalone MCP server in February 2026, and Zilliz ships claude-context as a funnel to its vector cloud. The durable primitive is the index, not the agent.

FAQ

What is code intelligence for AI agents?

Tools that give AI coding agents structural understanding of a codebase (dependencies, call chains, blast radius, symbols, semantic retrieval) so they can make informed edits instead of blind changes. The category ranges from lightweight context packing to full knowledge graph engines and commercial context engines.

Which code intelligence tool should I use with Claude Code?

CodeGraph (MIT, 47.4k stars) is the easiest local graph to adopt — one SQLite file, eight agent integrations. GitNexus has the deepest Claude Code integration (16 MCP tools + skills + hooks) but a noncommercial license. Serena is the standard LSP-backed symbol layer. For lightweight context, Aider's built-in repo-map or Repomix work without extra setup.

Do I need a knowledge graph or is context packing enough?

For small repos (under 10k files), context packing tools like Repomix often suffice. For large codebases with complex dependency chains, a knowledge graph (CodeGraph, GitNexus, CodeGraphContext) provides blast radius analysis and impact detection that flat context cannot. Semantic search tools (claude-context, grepai) are the middle path for fuzzy "where is the code that does X" retrieval.

Are these tools safe to use with proprietary code?

CodeGraph, GitNexus, Serena, grepai, Aider repo-map, and Repomix all run entirely local. claude-context sends code chunks to a cloud embedding API and vector store by default (self-hosted Milvus + Ollama avoids it). Augment's remote mode hosts your index on its cloud; Sourcegraph Cody and Greptile have cloud components. Always check the data flow before indexing proprietary code.

Executive Summary

AI coding agents have a structural awareness problem. They can read code, generate code, and even reason about code — but they routinely break things because they do not understand how code connects. An agent edits a function without knowing that 47 other functions call it. It renames a class without tracing the import chain. It refactors a module without checking the blast radius.

Code intelligence tools solve this by building a structural understanding layer (knowledge graphs, symbol indexes, semantic search, dependency maps) and exposing it to agents via MCP, CLI, or API. In the three months since this category was first mapped, it went vertical: the two graph leaders now hold 47.4k and 42k GitHub stars respectively, a well-funded vendor unbundled its context engine as a standalone product, and the retrieval debate acquired actual benchmarks.

Key Findings:

The category went vertical — CodeGraph launched January 18, 2026 and hit 47.4k stars in under five months (the biggest tool in the category); GitNexus broke out from ~1.2k stars in April to 42k by June and added an enterprise/commercial tier via Akon Labs
Local-first graphs are the winning pattern — both breakouts pre-compute structure entirely on-device (CodeGraph: embedded SQLite + tree-sitter; GitNexus: LadybugDB, native or in-browser WASM) and serve it over MCP with zero code egress
The "agentic grep vs semantic index" debate turned empirical — May 2026 measurements favor indexed retrieval: 97% fewer Claude Code input tokens (grepai, independently benchmarked), 58–70% fewer tool calls (CodeGraph, vendor + independent), 88% fewer tool calls in a 17-agent production audit (GitNexus), and 70%+ agent quality gains (Augment, vendor-run)
Context engines are unbundling from IDEs — Augment Code ($252M raised, $977M valuation) shipped its proprietary engine as a standalone MCP server in February 2026; Zilliz maintains claude-context (11.8k stars) as a funnel to its vector cloud
The symbol-level tier has a standard — Serena (25.2k stars, MIT, LSP-over-MCP) was a pre-existing omission from this comparison and is the default answer for symbol-level retrieval and editing

Market Definition

Code intelligence tools for AI agents are systems that give coding agents structural understanding of a codebase — beyond what raw file reading provides — so agents can make informed, safe edits.

Inclusion Criteria:

Provides structural or semantic code understanding (dependencies, call chains, symbols, semantic retrieval, or comprehensive context)
Designed to work with AI coding agents (MCP, API, or agent-native integration)
Active development (updates in last 6 months) — flagged where status has changed

Exclusion Criteria:

Pure code editors/IDEs without dedicated intelligence layers
Static analysis tools that only report findings without agent integration
Documentation generators without structural analysis

Tier 1: Knowledge Graph Engines

Build a full graph of codebase relationships — every import, call, definition, extension. Expose the graph via MCP for agents to query before making changes. This tier produced both 2026 breakouts.

Market Map

Tool	Stars	Created	Language	License	Status	Key Differentiator
CodeGraph	47,413	Jan 2026	TypeScript	MIT	Very active (pre-1.0)	Biggest in category. Local SQLite symbol/call graph over MCP. 21 languages, 8 agent integrations, file-watcher incremental sync
GitNexus	41,958	Aug 2025	TypeScript	PolyForm NC	Very active	Deepest MCP integration (16 tools incl. cross-repo groups). Zero-server. New enterprise/commercial tier via Akon Labs
CodeGraphContext	3,702	Aug 2025	Python	MIT	Active	Pluggable graph backends (FalkorDB Lite, KuzuDB, Neo4j). 22 languages, ~31k PyPI downloads/month
Axon	711	Feb 2026	Python	MIT	Stalled — no commits since Mar 25, 2026	Best visualization: WebGL force-directed graph, coupling heatmaps, health scores

What Makes This Tier Different

Knowledge graph engines do not just search code — they model structural relationships:

IMPORTS — which modules depend on which
CALLS — which functions call which functions
DEFINES/IMPLEMENTS/EXTENDS — class hierarchies and interface contracts
Clusters — functional groups detected via community algorithms (Leiden)
Processes — execution flows traced through call chains

This enables capabilities that text search cannot provide:

Blast radius analysis — "if I change this function, what breaks?"
Impact detection — "these git changes affect these execution flows"
Safe rename — "rename this symbol across all 23 files that reference it"
Execution tracing — "trace this request from API endpoint to database query"
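To make the blast radius idea concrete, here is a minimal sketch of how a CALLS edge list can answer "if I change this function, what breaks?" It is an illustration only, not the algorithm any tool listed here is confirmed to use; the edge list and function names are hypothetical.

```python
from collections import defaultdict, deque


def build_reverse_index(calls):
    """Map each callee to the set of functions that call it directly."""
    callers = defaultdict(set)
    for caller, callee in calls:
        callers[callee].add(caller)
    return callers


def blast_radius(calls, changed):
    """Return {function: hops} for every direct or transitive caller of `changed`."""
    callers = build_reverse_index(calls)
    distance = {}
    queue = deque([(changed, 0)])
    while queue:
        node, depth = queue.popleft()
        for caller in callers.get(node, ()):
            # Skip already-visited nodes so cycles in the call graph terminate.
            if caller not in distance and caller != changed:
                distance[caller] = depth + 1
                queue.append((caller, depth + 1))
    return distance


# Hypothetical CALLS edges in (caller, callee) form.
CALLS = [
    ("api.get_user", "service.load_user"),
    ("api.update_user", "service.load_user"),
    ("service.load_user", "db.query"),
    ("jobs.cleanup", "db.query"),
]

impact = blast_radius(CALLS, "db.query")
```

Walking the edges in reverse is what flat text search cannot do cheaply: the hop count gives an agent a rough ordering of which callers to inspect first.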

CodeGraph vs GitNexus vs CodeGraphContext

Dimension	CodeGraph	GitNexus	CodeGraphContext
Stars (Jun 2026)	47.4k	42k	3.7k
License	MIT	PolyForm Noncommercial	MIT
Language	TypeScript	TypeScript	Python
Database	Embedded SQLite (FTS5)	LadybugDB (custom)	Pluggable: FalkorDB Lite, KuzuDB, Neo4j, more
Incremental updates	Yes — OS-native file watchers	Roadmap (chunk cache only)	Re-index
MCP depth	8 agent integrations	16 tools, 7 resources, skills, Claude Code hooks	MCP server + CLI
Benchmarked savings	58% fewer tool calls (vendor); 70% (independent)	88% fewer tool calls, 74% token savings (production audit)	Project-reported only
Maintainer base	Solo (~91% of commits)	Single core maintainer, 23 contributors/cycle	Community, small
Commercial use	Free (MIT); hosted platform on waitlist	Requires separate license (Akon Labs)	Free (MIT)

Bottom line: CodeGraph is the lightest to adopt (one SQLite file, MIT) and now the category's star leader; GitNexus has the deepest agent integration and the only enterprise track, but the noncommercial license drove at least one team (LangWatch) to MIT-licensed CodeGraphContext. Both breakouts carry solo-maintainer concentration risk, and both maintainers acknowledge star velocity outrunning community depth.
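As a rough picture of why an embedded SQLite graph is light to adopt, the sketch below stores call edges in one table, replaces a single file's edges when that file changes, and answers caller queries with a recursive CTE. The schema, table name, and function names are assumptions made for illustration; they are not CodeGraph's actual schema.

```python
import sqlite3


def open_graph(path=":memory:"):
    """Open (or create) a single-file graph store. Hypothetical schema."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS calls ("
        " caller TEXT NOT NULL, callee TEXT NOT NULL, file TEXT NOT NULL)"
    )
    conn.execute("CREATE INDEX IF NOT EXISTS idx_callee ON calls(callee)")
    return conn


def reindex_file(conn, file, edges):
    """Incremental update: replace only the edges that originate in one file."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM calls WHERE file = ?", (file,))
        conn.executemany(
            "INSERT INTO calls (caller, callee, file) VALUES (?, ?, ?)",
            [(caller, callee, file) for caller, callee in edges],
        )


def transitive_callers(conn, symbol):
    """All direct and indirect callers of `symbol`, sorted by name."""
    rows = conn.execute(
        """
        WITH RECURSIVE affected(name) AS (
            SELECT caller FROM calls WHERE callee = ?
            UNION  -- UNION (not UNION ALL) deduplicates, so cycles terminate
            SELECT c.caller FROM calls c JOIN affected a ON c.callee = a.name
        )
        SELECT name FROM affected ORDER BY name
        """,
        (symbol,),
    )
    return [name for (name,) in rows]


conn = open_graph()
reindex_file(conn, "service.py", [("load_user", "query")])
reindex_file(conn, "api.py", [("get_user", "load_user")])
callers_before = transitive_callers(conn, "query")

# Simulate a file-watcher event: api.py was edited and no longer calls load_user.
reindex_file(conn, "api.py", [])
callers_after = transitive_callers(conn, "query")
```

A file watcher would call something like `reindex_file` on each change event, so the index stays current without a full re-index.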

Tier 2: Symbol-Level and Semantic Code Search

Lighter than a full knowledge graph. These tools provide symbol navigation, semantic code search, and focused analysis via MCP — without modeling every structural relationship.

Market Map

Tool	Stars	Created	Language	License	Key Differentiator
Serena	25.2k	Mar 2025	Python	MIT	The symbol-level standard: LSP-over-MCP retrieval and editing/refactoring, 40+ languages, 170+ contributors
claude-context	11.8k	Jun 2025	TypeScript	MIT	Zilliz's hybrid BM25 + vector semantic search; AST chunking, Merkle-tree incremental indexing
grepai	1,734	Jan 2026	Go	MIT	Privacy-first semantic search + call-graph tracing; 100% local via Ollama embeddings
Octocode MCP	863	Jun 2025	TypeScript	MIT	14 MCP tools. LSP navigation, PR archaeology, GitHub multi-repo research, ~16k npm downloads/month
CodePathFinder	137	Nov 2023	Go	Apache-2.0	Security-focused: cross-file taint analysis, 211 rules, relicensed from AGPL
mcp-vector-search	47	Aug 2025	Python	Elastic 2.0	LanceDB semantic search + knowledge graph, complexity analysis, dead-code detection

The New Entrants

Serena is a pre-existing omission corrected, not a new arrival: created March 2025 by Munich-based Oraios AI, it wraps language servers and exposes their semantics as MCP tools (find_symbol, find_referencing_symbols, replace_symbol_body, project-wide rename). It is the only widely-adopted tool in this category that does symbol-level editing, not just retrieval, and at 25.2k stars it is the default mention in agent-tooling threads. The live debate is whether improvements in first-party agent tools are eroding its value.
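For readers unfamiliar with what "exposed as MCP tools" means on the wire, the sketch below builds the JSON-RPC 2.0 request an MCP client sends to invoke a tool. The `tools/call` method and the `name`/`arguments` parameters come from the MCP specification; the tool name is taken from the list above, while the argument key `symbol` and its value are hypothetical placeholders, not Serena's documented parameters.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests need a unique id per call


def tool_call(tool, arguments):
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


# Hypothetical arguments: the real tool defines its own input schema.
request = tool_call("find_referencing_symbols", {"symbol": "UserService.load"})
wire = json.dumps(request)
```

The agent never sees the language server directly; it sees a named tool with a JSON argument object, which is why the same request shape works across every MCP-speaking tool in this comparison.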

claude-context (also a pre-existing omission, created June 2025) is Zilliz's open-source flagship: hybrid BM25 + dense-vector search over AST-chunked code, stored in Milvus or Zilliz Cloud, with Merkle-tree incremental re-indexing. It is the most-cited semantic code-search MCP server — and unapologetically a funnel to Zilliz's managed vector database, with code chunks leaving the machine by default.
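The Merkle-tree idea behind incremental re-indexing can be sketched in a few lines: hash every file, hash every directory from its children's hashes, then compare two snapshots from the root down and skip any subtree whose hash is unchanged. This is a generic illustration under simplifying assumptions (in-memory file contents, "/"-separated paths, hypothetical function names), not claude-context's implementation.

```python
import hashlib
from collections import defaultdict


def _sha(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def snapshot(files):
    """files: {"dir/file.py": source}. Returns (hashes, children); "" is the root."""
    hashes = {path: _sha(source) for path, source in files.items()}
    children = defaultdict(set)
    for path in files:
        parts = path.split("/")
        for i in range(len(parts)):
            children["/".join(parts[:i])].add("/".join(parts[: i + 1]))
    # Hash the deepest directories first so child hashes exist before parents.
    depth = lambda d: d.count("/") + 1 if d else 0
    for directory in sorted(children, key=depth, reverse=True):
        payload = "".join(f"{c}:{hashes[c]}\n" for c in sorted(children[directory]))
        hashes[directory] = _sha(payload)
    return hashes, dict(children)


def changed_paths(old, new, node=""):
    """Files that differ between two snapshots, pruning identical subtrees."""
    old_hashes, old_children = old
    new_hashes, new_children = new
    if old_hashes.get(node) == new_hashes.get(node):
        return set()  # identical subtree: skip it without visiting any files
    kids = old_children.get(node, set()) | new_children.get(node, set())
    if not kids:
        return {node}  # a file that was added, removed, or modified
    changed = set()
    for kid in kids:
        changed |= changed_paths(old, new, kid)
    return changed


before = snapshot({"src/a.py": "x", "src/b.py": "y", "docs/readme.md": "z"})
after = snapshot({"src/a.py": "x", "src/b.py": "y2", "src/c.py": "w", "docs/readme.md": "z"})
to_reindex = changed_paths(before, after)
```

Only the paths in `to_reindex` would need re-chunking and re-embedding; the unchanged `docs` subtree is never descended into.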

grepai (January 2026, French solo maintainer Yoan Bernabeu) is the privacy-first counterpoint: embeddings run 100% locally through Ollama by default, with call-graph tracing and a file-watcher daemon in a single Go binary. Its headline claim has rare independent verification — a third-party benchmark measured a 97% reduction in Claude Code input tokens and 27.5% lower API cost.

Grep vs Semantic Index: The Debate Got Data

Anthropic ships grep-only retrieval in Claude Code after reportedly finding grep "just worked better" — and the existence of this entire tier is a bet against that position. As of May–June 2026 the bet has numbers: grepai's independently benchmarked 97% input-token cut, CodeGraph's independently measured 70% median tool-call reduction, GitNexus's production-audited 88% fewer tool calls, and Augment's vendor-run 70%+ quality claims all point the same direction — indexed retrieval beats raw agentic grep on token economics for non-trivial codebases. The honest caveats: the strongest numbers are tool-specific, several are vendor-reported, and grep still wins on zero setup and exact-pattern speed.

When to Use Tier 2 vs Tier 1

Tier 2 tools are best when you need:

Quick setup — no full-graph indexing step (Serena, Octocode)
Fuzzy retrieval — "where is the code that does X" via embeddings (claude-context, grepai)
Symbol-level editing — atomic rename/replace through the same semantic layer (Serena)
Cross-repo search — query across GitHub orgs, not just local repos (Octocode)
Security analysis — taint analysis and vulnerability rules (CodePathFinder)

They are weaker when you need:

Full dependency chain tracing
Blast radius analysis with confidence scoring
Execution flow mapping
Community/cluster detection

Tier 3: Context Packing

The simplest approach: flatten your codebase into a single LLM-friendly format. No graph, no database — just comprehensive context in one prompt.

Market Map

Tool	Stars	Created	Language	Status	Key Differentiator
Repomix	26,188	Jul 2024	TypeScript	Active	XML-structured output. Tree-sitter compression (~70% token reduction). ~255k npm downloads/month
Context Hub	13,556	Mar 2026	TypeScript	Active, slowing	Andrew Ng's curated, versioned API docs CLI for coding agents — 622 doc entries
code2prompt	7,399	Mar 2024	Rust	Stable, slowing	Fast CLI. Handlebars templates, Python bindings; no tagged release since Dec 2025
Aider repo-map	(built-in)	2023	Python	Active	Tree-sitter tag map. Dynamically optimized per chat context

The Context Packing Philosophy

These tools take the opposite approach from knowledge graphs: instead of building a queryable structure, they pack everything the LLM might need into a single context window.

Repomix is the category leader (26.2k stars, ~255k npm downloads/month). It packs entire repos into XML-structured files optimized for Claude's XML parsing; Tree-sitter compression cuts tokens by ~70% while preserving structure, and the April 2026 v1.14.0 release cut pack time 58% in response to speed criticism. It also has an MCP server for dynamic packing.
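A toy version of XML-structured packing shows the basic shape of the approach: a directory listing followed by one element per file. The element and attribute names here are invented for illustration and should not be read as Repomix's actual output format, and no Tree-sitter compression is attempted.

```python
import xml.etree.ElementTree as ET


def pack(files):
    """Flatten {path: source} into one XML document (hypothetical layout)."""
    root = ET.Element("repository")
    listing = ET.SubElement(root, "directory_structure")
    listing.text = "\n".join(sorted(files))
    for path in sorted(files):
        node = ET.SubElement(root, "file", path=path)
        node.text = files[path]  # ElementTree escapes <, >, and & on output
    return ET.tostring(root, encoding="unicode")


packed = pack({
    "src/app.py": "def main():\n    return 1 < 2\n",
    "README.md": "# Demo & notes\n",
})
```

The explicit file boundaries are the point: the model can tell where one file ends and the next begins, which plain concatenation does not guarantee.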

code2prompt remains the solid second — Rust-fast, with a template system and Python bindings Repomix lacks — but it is slowing: 7.4k stars to Repomix's 26.2k, with no tagged release since December 2025 despite continued commits. The adoption gap is widening, not closing.

Aider's repo-map is the most sophisticated built-in approach. It uses Tree-sitter to extract a tag map of all definitions and references, then dynamically selects the most relevant context for each chat. It is not a separate tool — it is integrated into Aider's agent loop.
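The tag-map idea can be approximated as follows: collect definitions and references, then keep the most-referenced definitions within a budget. This sketch swaps in Python's built-in `ast` module for Tree-sitter and a plain reference count for Aider's relevance ranking, so it handles Python sources only and is a simplification, not Aider's algorithm. The sample sources are hypothetical.

```python
import ast
from collections import Counter


def tag_map(sources):
    """sources: {filename: python source}. Returns (definitions, references)."""
    definitions = {}        # symbol name -> file that defines it
    references = Counter()  # symbol name -> number of uses across all files
    for filename, source in sources.items():
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                definitions[node.name] = filename
            elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load):
                references[node.id] += 1
            elif isinstance(node, ast.Attribute):
                references[node.attr] += 1
    return definitions, references


def repo_map(sources, budget):
    """Keep the `budget` most-referenced definitions as "file: symbol" lines."""
    definitions, references = tag_map(sources)
    ranked = sorted(definitions, key=lambda name: (-references[name], name))
    return [f"{definitions[name]}: {name}" for name in ranked[:budget]]


SOURCES = {
    "db.py": "def query(sql):\n    return []\n",
    "service.py": (
        "from db import query\n\n"
        "def load_user(uid):\n    return query('...')\n\n"
        "def unused():\n    pass\n"
    ),
    "api.py": (
        "from service import load_user\nfrom db import query\n\n"
        "def get_user(uid):\n    query('ping')\n    return load_user(uid)\n"
    ),
}

top_two = repo_map(SOURCES, budget=2)
```

A per-chat version would weight references from the files currently under discussion more heavily, which is the "dynamically selects" part described above.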

Context Hub packs a different kind of context: curated, versioned API documentation rather than your own repo, attacking the hallucinated-API problem. It grew from 68 docs at its March 2026 launch to 622 entries and 13.6k stars by June, though release cadence has cooled.

Limitations

Context packing breaks down at scale:

Token limits — even with compression, large monorepos exceed context windows
No structural queries — you cannot ask "what calls this function?" without the graph
No blast radius — changing code requires understanding that flat context does not provide
Stale context — packed files are snapshots, not live indexes
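The token-limit point can be checked before packing with a back-of-the-envelope estimate. The sketch below assumes roughly four characters per token, a common rule of thumb rather than a real tokenizer, and reserves part of the window for the conversation itself; both numbers and all names are assumptions to adjust.

```python
def estimate_tokens(text, chars_per_token=4):
    """Rough token estimate via ceiling division; not a real tokenizer."""
    return -(-len(text) // chars_per_token)


def fits(files, context_window, reserve=0.25):
    """Return (fits, estimated_tokens, budget) for a packed {path: source} dict.

    `reserve` is the fraction of the window kept free for prompts and replies.
    """
    total = sum(estimate_tokens(source) for source in files.values())
    budget = int(context_window * (1 - reserve))
    return total <= budget, total, budget


sample = {"a.py": "x" * 400, "b.py": "y" * 401}
small_window = fits(sample, context_window=200)
large_window = fits(sample, context_window=1000)
```

When the estimate exceeds the budget, that is the signal to move from packing to one of the indexed tiers rather than to trim files by hand.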

Tier 4: Platforms and Commercial Context Engines

Enterprise, cloud-hosted, and paid code intelligence with AI integration. The 2026 development: dedicated context engines unbundling from the IDEs and assistants that built them.

Market Map

Tool	Type	Key Differentiator
Augment Context Engine	Commercial MCP (closed source)	Augment Code's semantic engine unbundled as MCP, GA Feb 2026. Local + hosted cross-repo modes; vendor claims 70%+ agent quality gains. $252M-funded parent
Sourcegraph Cody	Enterprise SaaS	Code search + intelligence + Cody AI. RAG over entire codebase
DeepWiki	Free cloud tool	AI-generated documentation for any public GitHub repo
Greptile	YC-backed SaaS	AI code review with full codebase context. GitHub/GitLab/Bitbucket

The Unbundling

Augment Context Engine is the signal event: a $977M-valuation coding-assistant vendor conceded the agent layer to Claude Code and Cursor and shipped its real asset, the semantic index, as a standalone MCP server any agent can call. It offers two modes (local Auggie CLI with real-time indexing; Augment-hosted cross-repo index via GitHub App) and token-based pricing with a 40% service fee, and its vendor-run benchmarks claim 70%+ quality improvement (80% for Claude Code + Opus 4.5) on 300 Elasticsearch PRs. The numbers are self-reported and unreplicated, but the strategy of treating context as the durable primitive and agents as interchangeable consumers is the clearest articulation yet of where this category is heading.

Platform vs Open Source

Dimension	Platform / Commercial (Tier 4)	Open Source (Tiers 1-3)
Setup	Minutes (cloud)	Manual indexing
Privacy	Code or index goes to cloud (Augment local mode excepted)	Everything local (claude-context's default excepted)
Scale	Massive monorepos, cross-repo	Varies by tool
Cost	Paid plans / per-query fees	Free
Customization	Limited	Full control
Agent integration	MCP (Augment, Cody) or API	MCP-native

DeepWiki deserves special mention: it generates readable documentation and architecture diagrams for any public GitHub repo. GitNexus positions itself as "like DeepWiki but deeper" — DeepWiki describes code in natural language while GitNexus models structural relationships. They solve different problems.

Greptile is the most agent-oriented review platform: it indexes your entire codebase and uses that context for AI code review on every PR. It is YC-backed and growing fast in the enterprise segment.

Technical Comparison

Dimension	Knowledge Graphs	Symbol/Semantic Search	Context Packing	Platforms/Commercial
Structural awareness	Full (relationships, clusters, flows)	Partial (symbols, references, semantics)	None (flat text)	Varies
Blast radius	Yes	Limited (call-graph tracing in grepai)	No	Sourcegraph: partial
Fuzzy semantic retrieval	Limited (CodeGraph: no)	Yes (claude-context, grepai, mcp-vector-search)	No	Yes (Augment)
Setup effort	Low–medium (indexing step)	Low (MCP config; claude-context needs vector DB)	Low (CLI)	Low (cloud)
Privacy	Local	Local (claude-context cloud by default)	Local	Cloud (Augment has local mode)
Real-time updates	CodeGraph: file watchers; GitNexus: re-index	Serena: live LSP; claude-context: Merkle incremental; grepai: watcher	Re-pack required	Continuous sync
Agent integration	MCP (deep)	MCP	Prompt injection / MCP	MCP or API
Best for	Complex refactors, impact analysis	Navigation, fuzzy search, symbol editing	Small/medium repos, one-shot context	Enterprise teams, cross-repo

Competitive Dynamics

What Is Driving the Category

MCP as the standard. Every tool that wants to serve AI agents needs MCP support. The Cambrian explosion of late 2025/early 2026 has matured into a layered stack: graphs, symbols, semantics, packing — all over the same transport.
Token economics as the buying trigger. The 2026 benchmarks reframed the pitch from "fewer broken edits" to "measurably cheaper sessions" — 97% input-token cuts and 58–88% tool-call reductions translate directly into more tasks per capped plan.
The unbundling. Augment proved that a context engine can be a product independent of the assistant that built it. Expect more vendors to sell the index and let Claude Code and Cursor keep the seat.
The IDE convergence. Cursor, Windsurf, and Claude Code are all building native code intelligence. Every tool in this category is racing the possibility that agents absorb its function — the sharpest community criticism of Serena is precisely that Claude Code's native tools got better.

The Concentration Problem

The category's new scale sits on remarkably thin foundations. CodeGraph is ~91% one person's commits at 47.4k stars; GitNexus's core decisions rest with one maintainer; grepai, Octocode, CodePathFinder, and mcp-vector-search are all solo projects. Both breakout star curves also outrun their community footprints — CodeGraph has 115 watchers and no HN launch thread, and GitNexus's maintainer concedes some growth may be bot-driven. Treat star counts as awareness signals, not durability signals.

MCP provides the transport layer but not the semantic layer. A "code intelligence protocol" that standardizes how agents query codebase structure — independent of the backend engine — would unlock composability. Nobody has shipped it.

What to Watch

Near-term (H2 2026)

Whether CodeGraph ships 1.0, grows a second maintainer, and reveals hosted-platform terms — and whether its star velocity converts into contributor depth
GitNexus's commercial execution: public pricing and real enterprise references for the Akon Labs track, plus incremental indexing (still roadmap) and the documented heap-OOM scaling fixes
Independent replication of Augment's 70%+ benchmark — the whole paid tier's value proposition rests on it
Whether Anthropic ships first-party semantic retrieval in Claude Code, which would validate the semantic-index thesis while absorbing much of the niche
Axon: commits resume or it becomes a finished artifact

Medium-term (2026-2027)

Knowledge graph tools merge with agent memory — understanding codebase structure plus remembering past changes and decisions
Cross-repo intelligence matures (GitNexus group tools and Augment's hosted mode are the first real entries)
IDE-native intelligence narrows the gap, but complex refactors still need dedicated tools

Long-term (2027+)

Real-time, always-current code graphs become the norm (CodeGraph's file watchers and claude-context's Merkle sync are the early pattern)
Structural awareness becomes an expected capability, not a separate tool
The winners will be whoever defines the standard semantic layer for code intelligence

Bottom Line

Code intelligence for AI agents graduated from promising niche to breakout category in the spring of 2026: two tools north of 40k stars, a $252M-funded vendor unbundling its engine into the space, and benchmarks that finally put numbers on the grep-vs-index debate. The landscape is stratified:

Need blast radius and impact analysis? → Knowledge graph: CodeGraph (MIT, biggest, zero-dependency SQLite) or GitNexus (deepest MCP integration, enterprise tier — but PolyForm Noncommercial). CodeGraphContext is the pluggable-backend MIT alternative.
Need symbol-level navigation and editing? → Serena (25.2k stars, the de facto standard)
Need fuzzy semantic search? → claude-context (cloud-indexed, monorepo scale) or grepai (100% local, independently benchmarked)
Need to pack a repo into a prompt? → Repomix (26.2k stars, ~255k npm downloads/month, category leader)
Enterprise team with cloud budget? → Augment Context Engine for cross-repo retrieval, Sourcegraph Cody, or Greptile

Status flags from this refresh: Axon is stalled (no commits since March 25, 2026), code2prompt is stable but slowing (no release since December 2025), and Context Hub's post-launch cadence has cooled. Everything else in the category is shipping.

The biggest risk is unchanged in kind but larger in degree: IDE-native intelligence (Cursor, Claude Code, Windsurf) absorbing the standalone market — now with the added twist that Anthropic's grep-only stance is the position the entire semantic tier exists to refute. The concentration risk is new: the category's two 40k-star leaders are effectively solo-maintained.

The gap that closed: incremental, real-time updates — last cycle's "biggest gap" — is now table stakes at the front of the pack (CodeGraph's file watchers, claude-context's Merkle sync, grepai's daemon, Serena's live LSP). The gap that remains: a standard semantic layer, so agents can query any engine the same way. The first to define it wins the category.

Research by Ry Walker Research
