grepai

grepai is a privacy-first, MIT-licensed Go CLI for semantic code search and call-graph tracing. Embeddings run entirely locally via Ollama by default, and an MCP server exposes it to Claude Code and Cursor. The project drew 1.7K+ GitHub stars and shipped 50 releases within five months of its January 2026 launch.

Key takeaways

  • 1.7K+ GitHub stars, 140 forks, and 50 releases in roughly five months; launched January 9, 2026 by French solo maintainer Yoan Bernabeu, with the latest release, v0.35.0, shipped March 16, 2026
  • The pitch is token economics for AI agents: an independent benchmark measured a 97% reduction in Claude Code input tokens and 27.5% lower API cost when the agent searches through grepai instead of reading files
  • 100% local by design — embeddings run through Ollama by default (LM Studio and OpenAI optional), so code never leaves the machine unless you opt into a cloud embedding provider
  • Ships both an MCP server and a companion repo of 27 agent skills, making it a drop-in retrieval layer for Claude Code, Cursor, and Windsurf

FAQ

What is grepai?

grepai is an open-source CLI that indexes a codebase with vector embeddings so developers and AI agents can search code by meaning ("authentication logic") rather than exact text, and trace call graphs before changing a function.

How much does grepai cost?

It is free and MIT-licensed; the only costs are local compute for the Ollama embedding model (the default) or API fees if you opt into OpenAI embeddings.

Does grepai send my code to the cloud?

No. By default, embeddings are generated locally via Ollama (LM Studio is a second local option), and the index lives on your machine; code leaves it only if you explicitly configure the OpenAI embedding provider.

How is grepai different from mcp-vector-search?

Both give agents local semantic code search over MCP, but grepai adds call-graph tracing ("who calls this function"), a file-watcher daemon that keeps the index fresh, and ships as a single Go binary rather than a Python package.

Executive Summary

grepai bills itself as "grep for the AI era": a privacy-first CLI that indexes a codebase with vector embeddings so queries match meaning instead of text — ask for "authentication logic" and it finds handleUserSession — and that traces call graphs so an agent knows who calls a function before changing it.[1][2] Everything runs locally: embeddings come from Ollama by default (LM Studio and OpenAI are alternatives), the index lives on disk, and a file-watcher daemon keeps it fresh as code changes.[1] The headline use case is token economics — exposed as an MCP server, grepai hands Claude Code or Cursor a handful of relevant snippets instead of letting the agent grep and read raw files.[1]

The project is young and moving fast: created January 9, 2026 by French developer Yoan Bernabeu, it reached 1,734 GitHub stars, 140 forks, and 50 releases by June 2026, with v0.35.0 shipped March 16, 2026 and commits landing as recently as June 8.[1] The strongest independent validation is a third-party benchmark showing a 97% reduction in Claude Code input tokens (51,147 → 1,326) and a 27.5% API-cost reduction on a real task.[3] A companion repo ships 27 agent skills for grepai workflows, and the tool has drawn LinuxLinks coverage and a Product Hunt feature.[4][5][6]

| Attribute | Value |
| --- | --- |
| Creator | Yoan Bernabeu, lead developer and YouTuber, Drôme, France[1] |
| Founded | January 9, 2026 (repo created)[1] |
| Funding | None disclosed; independent open-source project[1] |
| GitHub Stars | 1.7K+ (1,734), 140 forks, as of June 2026[1] |
| License | MIT[1] |
| Latest Release | v0.35.0 (March 16, 2026); 50 releases total[1] |

Product Overview

The core loop is three steps: run grepai init in a project, start the indexing daemon with grepai watch, then query with grepai search "error handling" for semantic matches or grepai trace callers "Login" for call-graph questions.[1] Results are meant to be consumed by both humans at a terminal and AI agents over MCP; the README's framing is that grepai "drastically reduces AI agent input tokens by providing relevant context instead of raw search results."[1]

The independent benchmark behind that claim: on a Claude Code task, routing code retrieval through grepai cut input tokens by 97% (51,147 → 1,326) and cache-creation tokens by 71% (563,883 → 162,289), eliminated subagent spawning entirely, and reduced API cost from $6.78 to $4.92, a 27.5% saving.[3]
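The percentages above can be re-derived from the raw before/after figures. The short Python sketch below does only that arithmetic; the helper name `pct_reduction` is illustrative and not part of grepai or the benchmark.

```python
def pct_reduction(before: float, after: float) -> float:
    """Percentage drop from `before` to `after`."""
    return (before - after) / before * 100


# Before/after figures as reported in the cited benchmark.
input_tokens = pct_reduction(51_147, 1_326)
cache_tokens = pct_reduction(563_883, 162_289)
api_cost = pct_reduction(6.78, 4.92)

print(f"input tokens:   {input_tokens:.1f}% lower")   # 97.4% lower
print(f"cache creation: {cache_tokens:.1f}% lower")   # 71.2% lower
print(f"API cost:       {api_cost:.1f}% lower")       # 27.4% lower
```

Computed from the rounded dollar figures, the cost saving comes out near 27.4%, marginally under the reported 27.5%; the gap presumably reflects rounding in the published costs, though the source does not say. Note also that the cost saving is far smaller than the input-token saving, so input tokens were evidently only one part of the bill on this task.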

Key Capabilities

| Capability | Description |
| --- | --- |
| Semantic search | Natural-language queries over vector embeddings; finds conceptually related code across naming conventions[1] |
| Call-graph tracing | grepai trace callers/callees: see who calls a function before you change it[1] |
| File watcher | Daemon keeps the index up to date automatically as files change[1] |
| MCP server | Agents call grepai directly as a tool; works with Claude Code, Cursor, Windsurf out of the box[1] |
| Local-first privacy | Code never leaves the machine with the default Ollama backend[1][5] |
| Shell completion | Zsh, Bash, Fish, PowerShell, including dynamic values (workspaces, providers, backends)[1] |
| Agent skills | Companion grepai-skills repo with 27 skills for search, tracing, and troubleshooting workflows[4] |
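To make the semantic-search row concrete, here is a minimal sketch of embedding-based ranking in general, not grepai's implementation. The three-dimensional vectors are hand-written stand-ins (real embedding models emit hundreds of dimensions), and the function names in `index` are hypothetical apart from handleUserSession, which the summary uses as its example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy "embeddings", invented for illustration only.
index = {
    "handleUserSession": [0.9, 0.1, 0.2],
    "renderInvoicePdf": [0.1, 0.9, 0.1],
    "retryWithBackoff": [0.2, 0.1, 0.9],
}
# Stand-in for the embedding of the query "authentication logic".
query_vector = [0.8, 0.2, 0.1]

# Semantic ranking: order symbols by similarity to the query vector.
ranked = sorted(index, key=lambda name: cosine(query_vector, index[name]), reverse=True)

# Exact-text search for comparison: no symbol name contains "auth".
text_hits = [name for name in index if "auth" in name.lower()]

print(ranked[0])   # handleUserSession
print(text_hits)   # []
```

The contrast is the point: the text match finds nothing because no identifier contains the query string, while the vector comparison still surfaces the session handler.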

Product Surfaces

| Surface | Description | Availability |
| --- | --- | --- |
| CLI | init / watch / search / trace; Homebrew, install script, PowerShell installer | GA[1] |
| MCP server | Tool interface for Claude Code, Cursor, Windsurf | GA[1] |
| Agent skills | 27 packaged skills (separate MIT repo, January 28, 2026) | GA[4] |

Technical Architecture

grepai is written in Go and distributed as a single binary via Homebrew tap, curl installer, or PowerShell script.[1] It requires an embedding provider: Ollama is the default and recommended path (ollama pull nomic-embed-text), LM Studio is a second local option, and OpenAI is the opt-in cloud provider, which gives up the privacy guarantee.[1] The index is stored locally; an independent reviewer measured roughly 100–500MB of index for a medium project of around 10,000 files.[3] Call-graph tracing is language-specific, covering Go, TypeScript, JavaScript, Python, PHP, Java, C, C#, C++, Rust, and Zig per the same review.[3]

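```shell
brew install yoanbernabeu/tap/grepai        # install the CLI from the project's Homebrew tap
ollama pull nomic-embed-text                # fetch the default local embedding model
grepai init && grepai watch                 # initialize the project, then start the indexing daemon
grepai search "user authentication flow"    # run a semantic query against the index
```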
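The local-first claim rests on the embedding call going to a server on the same machine. The sketch below shows what such a call to Ollama's REST API can look like; it assumes Ollama's default port (11434) and its `/api/embeddings` endpoint, and it is not taken from grepai's code, whose internals the sources do not describe. The function names are illustrative.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default local port.
OLLAMA_URL = "http://localhost:11434/api/embeddings"


def build_embedding_request(text: str, model: str = "nomic-embed-text") -> urllib.request.Request:
    """Build (but do not send) a POST request asking Ollama to embed `text`."""
    payload = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(text: str) -> list:
    """Send the request; needs a running Ollama server with the model pulled."""
    with urllib.request.urlopen(build_embedding_request(text), timeout=30) as resp:
        return json.load(resp)["embedding"]


# Building the request involves no network traffic, so this runs anywhere.
request = build_embedding_request("user authentication flow")
print(request.get_method(), request.full_url)
```

Calling `embed("user authentication flow")` with Ollama running would return the vector for that text. Because the target host is localhost, the text being embedded does not cross the network boundary, which is the property the privacy guarantee depends on.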

Key Technical Details

| Aspect | Detail |
| --- | --- |
| Deployment | Local CLI + daemon; macOS, Linux, Windows[1] |
| Model(s) | Ollama nomic-embed-text by default; LM Studio or OpenAI embeddings optional[1] |
| Integrations | MCP server for Claude Code, Cursor, Windsurf; shell completions; 27 agent skills[1][4] |
| Open Source | MIT; 1.7K+ stars, 140 forks, 50 releases as of June 2026[1] |

Strengths

  • Independently benchmarked token savings — a third-party test measured 97% fewer Claude Code input tokens and 27.5% lower API cost on a real task, the rare case where a tool's headline claim has outside verification.[3]
  • Genuine local-first privacy — the default Ollama backend means code and embeddings never leave the machine, a hard requirement for many enterprise and regulated environments that cloud-indexing competitors cannot meet.[1][5]
  • Search plus structure in one binary — pairing semantic search with call-graph tracing covers both "find the concept" and "what breaks if I change this," where most rivals do only one.[1]
  • High shipping velocity — 50 releases in roughly five months, with pushes continuing into June 2026 and a 27-skill companion ecosystem.[1][4]
  • Low-friction adoption — Homebrew/script/PowerShell installers, a three-command quick start, and out-of-the-box MCP wiring for the major agent CLIs.[1]

Cautions

  • Solo-maintainer risk — the project is one developer's work with no disclosed funding or organization behind it; bus factor is the standard caveat for infrastructure you wire into every agent session.[1]
  • Young codebase with a sizable issue queue — created January 2026, with 88 open issues against it by June 2026; expect rough edges typical of a five-month-old tool.[1][7]
  • Release cadence has cooled — after 50 releases, the latest tag (v0.35.0) dates to March 16, 2026; commits continued through June 8, but the tagged-release pace of the first two months has not held.[1]
  • Embedding dependency adds setup and compute — usefulness depends on running Ollama (or LM Studio) locally, and the index can occupy 100–500MB for a medium repo; plain grep/ripgrep remains faster for exact-pattern lookups.[1][3]
  • Call-graph coverage is language-bound — tracing supports a fixed list of ~11 languages; codebases outside it get search but not structure.[3]
  • Thin independent scrutiny — no substantive Hacker News thread exists as of June 2026, and the README's testimonials are vendor-curated, so community evidence skews toward the project's own channels.[8][1]
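The language-bound caution above follows from how caller tracing works in general: answering "who calls this function" requires parsing source code, and a parser understands one language. The sketch below illustrates the idea with Python's standard `ast` module on a made-up snippet; it is a simplified stand-in, not grepai's implementation, and the function names in `SOURCE` are hypothetical.

```python
import ast

# Hypothetical source file to analyze.
SOURCE = '''
def login(user):
    return check_password(user)

def check_password(user):
    return True

def signup(user):
    login(user)
'''


def find_callers(source: str, target: str) -> list:
    """Return top-level functions whose bodies call `target` by bare name."""
    callers = []
    for node in ast.parse(source).body:
        if not isinstance(node, ast.FunctionDef):
            continue
        for inner in ast.walk(node):
            if (
                isinstance(inner, ast.Call)
                and isinstance(inner.func, ast.Name)
                and inner.func.id == target
            ):
                callers.append(node.name)
                break  # one match is enough to count this function as a caller
    return callers


print(find_callers(SOURCE, "login"))           # ['signup']
print(find_callers(SOURCE, "check_password"))  # ['login']
```

This toy version handles only bare-name calls in top-level Python functions. Method calls, imports, and every other language would each need additional parsing work, which is why a tracing feature ships with a fixed language list.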

What Developers Say

Community discussion is concentrated in a January 2026 r/ClaudeAI launch thread (the README claims 280K+ views for it) and scattered social posts; there is no substantive Hacker News thread as of June 2026, and the quotes below from Reddit and X circulate via the project's own README, so treat the curation accordingly.[9][1][8]

"I just hit my limit and it took 13% of my max5 plan just to read my codebase. I am very, very excited about your new tool." — u/911pleasehold on r/ClaudeAI[9]

"It works great! Takes 5 minutes to install. Crazy!" — @LesSaleGeek on X, as reproduced in the project README[1]

"GrepAI's semantic understanding finds conceptually related code that IDE search would miss." — Richard Joseph Porter, independent benchmark review[3]

The most balanced independent assessment — Porter's — also notes traditional grep stays faster for simple pattern matching and that call-graph tracing only covers specific languages; substantive critical discussion beyond that has not yet materialized as of June 2026.[3][8]


Pricing & Licensing

| Tier | Price | Includes |
| --- | --- | --- |
| Open source | Free | Full CLI, MCP server, call-graph tracing, file watcher, all 27 companion skills[1][4] |

Licensing model: MIT for both grepai and grepai-skills; no commercial tier, hosted service, or paid support exists.[1][4]

Hidden costs: local compute for the Ollama embedding model and 100–500MB of index storage on medium repos — or OpenAI API fees (and the loss of the privacy guarantee) if you choose the cloud embedding backend.[1][3]


Competitive Positioning

Direct Competitors

| Competitor | Differentiation |
| --- | --- |
| MCP Vector Search | The closest analogue: local semantic code search over MCP; grepai adds call-graph tracing, a file-watcher daemon, and ships as a single Go binary rather than a Python package |
| claude-context | Indexes codebases into a vector database for Claude Code; grepai's default path keeps embeddings and index entirely local with no external database dependency |
| CodeGraph | Leads with graph-structured code understanding; grepai leads with embedding search and treats the call graph as a lighter companion feature |
| grep / ripgrep | Exact text and regex matching: faster for known patterns, no setup; grepai trades startup cost for meaning-based retrieval[1] |

When to Choose grepai Over Alternatives

  • Choose grepai when: code cannot leave the machine, you want semantic search and caller/callee tracing from one binary, and the goal is cutting agent token burn in Claude Code or Cursor.[1][3]
  • Choose MCP Vector Search when: you live in a Python toolchain and only need semantic retrieval without call graphs.
  • Choose claude-context when: you want vector-database-backed indexing integrated with Claude Code and local-only operation is not a hard requirement.
  • Choose CodeGraph when: structural relationships across the codebase matter more than natural-language retrieval.
  • Choose ripgrep when: you know the exact string and want it in milliseconds with zero setup.

Ideal Customer Profile

Best fit:

  • Heavy Claude Code / Cursor users on usage-capped plans, where the benchmarked 97% input-token reduction translates directly into more tasks per limit[3]
  • Teams in regulated or IP-sensitive environments that need agent-grade code retrieval without any code leaving the machine[1]
  • Developers on large polyglot repos in the supported call-graph languages who want impact analysis before refactors[3]

Poor fit:

  • Organizations requiring vendor SLAs, paid support, or a corporate maintainer behind core tooling
  • Codebases in languages outside the call-graph list that mainly need structural analysis
  • Workflows where exact-pattern search suffices — ripgrep is faster and needs no daemon or embedding model[3]

Viability Assessment

| Factor | Assessment |
| --- | --- |
| Financial Health | N/A: unfunded solo open-source project; zero revenue dependency but also zero institutional backing[1] |
| Market Position | Early mover in local-first agent code search; 1.7K+ stars in five months is real, but the niche is crowding fast[1] |
| Innovation Pace | High: 50 releases since January 2026, though tagged releases paused after v0.35.0 in March while commits continue[1] |
| Community/Ecosystem | Growing: 140 forks, a 27-skill companion repo, LinuxLinks and Product Hunt coverage, an active Reddit launch audience; 88 open issues and no HN presence temper it[1][4][5][6][8] |
| Long-term Outlook | Hinges on one maintainer's stamina and on whether agent CLIs ship comparable retrieval natively[1] |

The traction-to-age ratio is strong for a solo project (1,734 stars, 140 forks, and an independent third-party benchmark inside five months), and MIT licensing means the work survives even if maintenance stalls.[1][3] The structural risk runs the other direction: semantic retrieval is an obvious feature for Claude Code, Cursor, and their peers to absorb first-party, which would compress grepai's reason to exist to its privacy guarantee and call-graph tracing.[1]


Bottom Line

grepai is the rare agent-tooling project whose headline claim — slashing Claude Code's input tokens — has independent benchmark verification, and its local-first Ollama default makes it one of the few semantic code-search options viable where code cannot leave the machine. The trade is depending on a five-month-old, solo-maintained binary with 88 open issues and a release cadence that has already slowed from its launch sprint.

Recommended for: capped-plan Claude Code and Cursor users burning tokens on codebase reads; privacy-constrained teams that need semantic retrieval plus call-graph tracing fully offline.

Not recommended for: organizations needing a corporate maintainer or SLA; repos outside the supported call-graph languages that need structural analysis; anyone for whom ripgrep already answers the question.

Outlook: Watch whether the maintainer sustains cadence past v0.35.0, whether the issue queue gets worked down, and whether first-party retrieval in the major agent CLIs erodes the niche before the project builds a contributor base.


Research by Ry Walker Research