AI Engineering Intelligence Tools

Key takeaways

The category now splits three ways on attribution: capture-at-commit-time (Oobo), model-based detection (Span's span-detect-1), and post-hoc heuristics/derived signals (everyone else)
DX's AI Measurement Framework has become the category's reference vocabulary, while Span shipped the only proprietary AI-code detection model — both were missing from the April version of this report
The incumbents moved fast: LinearB now ships labeling-based AI attribution (the April "none" rating no longer holds), and Jellyfish's AI Impact now covers assistants, agents, and AI code-review tools with spend tracking
GitHub's first-party Copilot metrics (GA Feb 2026, adoption cohorts May 2026) are squeezing the usage-dashboard value prop — outcome correlation is where the independents live now

FAQ

Why do teams need AI engineering intelligence?

As AI writes more code, engineering leaders need to know: Is AI actually improving productivity? Is AI-generated code maintaining quality? Which teams are adopting AI tools effectively? Traditional git analytics can't answer these questions.

What's the difference between capture-time, model-based, and post-hoc attribution?

Capture-time tools (Oobo) record which AI session produced which code at commit time, so attribution is observed rather than inferred (ground truth). Model-based detection (Span) classifies code chunks with a trained detector. Post-hoc tools (GitClear, Exceeds AI, Milestone) infer AI involvement from diffs, commit patterns, and metadata; survey/derived approaches (DX, Jellyfish) combine vendor APIs with developer self-reports.

Which tool should an enterprise start with?

DX if you want the measurement framework large enterprises are standardizing on. Exceeds AI or Milestone for the fastest executive ROI story. GitClear for deep code-quality metrics. Span for native detection without per-developer installs. Oobo for ground-truth attribution if you can manage local installs.

The Problem

AI coding tools changed how software gets written. Cursor, Claude Code, Copilot, Codex — engineers are shipping more code faster. But engineering leaders are left asking basic questions: How much of our code is AI-generated? Is it any good? Are we getting ROI on our AI tool spend?

Traditional engineering analytics platforms were built for a world where humans wrote all the code. They measure velocity, cycle time, and DORA metrics — useful, but blind to the AI contribution layer. A category of tools has emerged to fill this gap, and since April it has both expanded (DX, Span, Jellyfish join this report) and professionalized (the incumbents shipped real AI features).

The Market Map

The landscape splits along two axes: data collection method (how they learn about AI involvement) and primary audience (developers vs. engineering leadership vs. C-suite).

Tool	Data Source	AI Attribution	Deployment	Primary Audience	Stage
Oobo	Local capture at commit time	✅ Ground-truth (session linking, per-line)	CLI per developer + hosted	AI-heavy teams	v1.0 (May 2026), Techstars
Span	Git + tickets + agent traces	✅ Model-based (span-detect-1, chunk-level) + line-level traces	SaaS	Mid-to-large orgs	$25M Seed+A (Nov 2025)
GitClear	Git history analysis	✅ Line-level Diff Delta + AI-vendor APIs	SaaS / on-prem	Developers + managers	Bootstrapped, est. 2018
Exceeds AI	PR/commit diff analysis	✅ Heuristic multi-signal	SaaS (GitHub/GitLab app)	Engineering leadership	Seed $4.6M (Venrock)
Milestone	Git + PM + org + AI tool APIs	✅ Claimed >90% accuracy	SaaS / on-prem	C-suite / VPE	$10M seed (Nov 2025)
DX	System data + surveys	✅ Framework-level (utilization, impact, cost)	SaaS	Enterprise leadership	Growth (ICONIQ, 300+ customers)
Jellyfish	Git + PM + vendor APIs	⚠️ Derived signals + per-tool spend	SaaS	Executives + finance	Series C, $114M+ total
LinearB	Git + Jira + CI/CD	⚠️ Labeling-based (gitStream) + AI dashboards	SaaS	Managers	Series B, $71M total
Swarmia	Git + Jira + DX surveys	⚠️ PR-level detection, no code-level split	SaaS (GitHub app)	Team leads + VPE	Series A $11M (Jun 2025)
Faros AI	100+ integrations aggregated	⚠️ Cohort/baseline, % AI per repo	SaaS (enterprise)	C-suite / VPE	Series A $36M
Cortex	Service catalog + SDLC data	⚠️ Copilot-only usage label	SaaS (enterprise)	Platform engineering	Series C $60M

Corrections from the April edition: LinearB's "no AI attribution" rating no longer holds — it ships labeling-based attribution via gitStream plus AI adoption dashboards. Faros AI is Series A ($36M disclosed), not Series B. Milestone's $10M round is a seed, not a Series A.

The Three Approaches

Capture at Commit Time

Oobo intercepts commits as they happen and records which AI session contributed — ground-truth attribution with session transcripts, exact token counts, and line-level blame.^[1]

Pros: Most accurate data possible. Cons: Requires every developer to install a local CLI; still a solo-developer project.
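To make "capture at commit time" concrete, here is a minimal sketch of one way a session could be linked to a commit: by writing git-style trailers into the commit message and reading them back later. This is not Oobo's actual format. The `SessionAnchor` type and the `AI-Session`, `AI-Tool`, and `AI-Tokens` trailer keys are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SessionAnchor:
    """Hypothetical record linking one AI session to one commit."""
    session_id: str
    tool: str
    tokens: int


def add_trailers(message: str, anchor: SessionAnchor) -> str:
    """Append 'Key: value' trailers (hypothetical keys) to a commit message."""
    trailers = [
        f"AI-Session: {anchor.session_id}",
        f"AI-Tool: {anchor.tool}",
        f"AI-Tokens: {anchor.tokens}",
    ]
    # A blank line separates the message body from the trailer block.
    return message.rstrip("\n") + "\n\n" + "\n".join(trailers) + "\n"


def parse_trailers(message: str) -> Optional[SessionAnchor]:
    """Recover the anchor from a commit message, or None if any key is missing."""
    fields = {}
    for line in message.splitlines():
        key, sep, value = line.partition(": ")
        if sep and key.startswith("AI-"):
            fields[key] = value.strip()
    required = ("AI-Session", "AI-Tool", "AI-Tokens")
    if not all(key in fields for key in required):
        return None
    return SessionAnchor(
        fields["AI-Session"], fields["AI-Tool"], int(fields["AI-Tokens"])
    )
```

Because the link is written when the commit is made, later analysis only has to read it back. The cost is the one noted above: something has to run on each developer's machine to write it.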

Model-Based Detection

Span trained a proprietary detector (span-detect-1) that classifies AI-generated code at the chunk level across any tool, supplemented since June 2026 by line-level attribution from agent traces.^[2]

Pros: No per-developer install, tool-agnostic, much closer to code-level truth than heuristics. Cons: Detection models have error bars; accuracy claims are vendor-reported.
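The "error bars" point can be shown with a little arithmetic. When a detector is imperfect, the share of code it flags as AI-generated differs from the true share. If its sensitivity and specificity are known, the standard Rogan-Gladen correction recovers an estimate of the true share. The sketch below is generic statistics, not Span's method, and every number in the usage example is hypothetical.

```python
def corrected_ai_share(
    observed_share: float, sensitivity: float, specificity: float
) -> float:
    """Estimate the true AI-generated share from a detector's flagged share.

    Rogan-Gladen correction:
        true = (observed + specificity - 1) / (sensitivity + specificity - 1)
    """
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        # The detector is no better than chance, so no correction is possible.
        raise ValueError("sensitivity + specificity must exceed 1")
    estimate = (observed_share + specificity - 1.0) / denom
    # Clamp to [0, 1]; sampling noise can push the raw estimate outside it.
    return min(1.0, max(0.0, estimate))


# Hypothetical inputs: 40% of chunks flagged, 90% sensitivity, 95% specificity.
share = corrected_ai_share(0.40, sensitivity=0.90, specificity=0.95)
```

With these made-up inputs the corrected estimate is roughly 0.41, close to the flagged share. The gap widens as specificity drops, which is why vendor-reported accuracy figures matter when comparing tools.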

Analyze After the Fact

GitClear, Exceeds AI, and Milestone analyze diffs, commit patterns, and metadata; DX, Jellyfish, Swarmia, Faros, LinearB, and Cortex combine vendor APIs, derived signals, and surveys.^[3]^[4]^[5]

Pros: 15-minute setup, works retroactively. Cons: Attribution is inferred, not observed.
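A rough sketch of what "inferred, not observed" means in practice: a post-hoc tool can only combine indirect signals into a score. The signals, weights, and threshold below are hypothetical and are not taken from any vendor in this report.

```python
from dataclasses import dataclass

# Hypothetical weights, chosen only for illustration; they sum to 1.0.
WEIGHT_COAUTHOR_TRAILER = 0.5
WEIGHT_VENDOR_USAGE = 0.25
WEIGHT_BURST = 0.25
BURST_LINES_PER_MINUTE = 20.0  # hypothetical cutoff for "too fast to type"
LIKELY_AI_THRESHOLD = 0.5      # hypothetical decision threshold


@dataclass(frozen=True)
class CommitSignals:
    """Indirect evidence about one commit, gathered after the fact."""
    has_ai_coauthor_trailer: bool    # commit message credits an AI tool
    vendor_api_reports_usage: bool   # vendor API shows the author used a tool that day
    lines_added: int
    minutes_since_previous_commit: float


def ai_likelihood(signals: CommitSignals) -> float:
    """Sum the weights of the signals that fired; result is in [0, 1]."""
    score = 0.0
    if signals.has_ai_coauthor_trailer:
        score += WEIGHT_COAUTHOR_TRAILER
    if signals.vendor_api_reports_usage:
        score += WEIGHT_VENDOR_USAGE
    # Floor the interval at one minute to avoid division by zero.
    minutes = max(signals.minutes_since_previous_commit, 1.0)
    if signals.lines_added / minutes > BURST_LINES_PER_MINUTE:
        score += WEIGHT_BURST
    return score


def label(signals: CommitSignals) -> str:
    """Turn the score into a coarse label."""
    if ai_likelihood(signals) >= LIKELY_AI_THRESHOLD:
        return "likely-ai"
    return "likely-human"
```

A developer who pastes in a large vendored file would trip the burst signal without any AI involvement. That kind of false positive is the tradeoff for needing no per-developer install.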

Tool Profiles

Oobo — Ground-Truth Attribution

Git decorator enriching commits with anchor metadata: linked AI sessions, tokens, cost, per-line blame; 15 tool integrations^[1]
$20–200/member/month + free OSS CLI; v1.0 shipped May 2026
Still a solo-developer project with minimal community traction — the best data model in the category, unproven as a company

Span — Native Detection

span-detect-1 model classifies AI code chunk-level across all tools; AI Effectiveness suite (June 2026) adds line-level agent-trace attribution^[2]
$25M from Alt Capital, Craft Ventures et al.; customers include Ramp, Vanta, Carvana, Intercom
The strongest answer to "what did AI actually write?" without per-developer installs

GitClear — Deep Code Analytics + AI Research

2018-vintage, bootstrapped; Diff Delta line-level analysis plus direct Copilot/Cursor/Claude Code API integrations^[3]
Free Starter; Pro $14.95–Enterprise $34.95/dev/mo (annual); on-prem available
Its annual AI code-quality research (duplication, churn) is the most-cited in the field — and also contested by academic work

Exceeds AI — AI-First Analytics for Managers

Heuristic multi-signal detection; benchmarks now claim 356K+ engineers and 53.9B lines analyzed^[4]
Pricing moved to per-manager-seat ($49/mo); $4.6M Venrock seed; Wayfair and GoodRx logos
Fastest path to an executive ROI narrative; attribution remains heuristic

Milestone — GenAI ROI for the C-Suite

"GenAI data lake" joining git, PM, org, and AI-tool data; claimed >90% attribution accuracy^[5]
$10M seed (Heavybit, Hanaco, Atlassian Ventures); GitHub's COO is a reference customer
Enterprise-only and demo-led; deliberately declines smaller customers

DX — The Measurement Framework

AI Measurement Framework (utilization, impact, cost) has become the category's reference vocabulary; Agent Ops tooling for agent fleets^[6]^[7]
300+ enterprise customers — Dropbox, Adyen, Vanguard, Booking.com (3,500+ engineers measured)
Survey-plus-systems approach measures outcomes, not line-level attribution

Jellyfish — The Incumbent's AI Impact Product

Broadest multi-tool coverage: assistants (Copilot, Cursor, Claude Code, Gemini), agents (Devin, Jules), and AI code-review tools (CodeRabbit, Graphite, Greptile), with per-tool spend^[8]
Series C SEI platform; $114M+ raised; 20M PRs analyzed
Attribution is derived/correlational, not model-based

LinearB — Corrected: It Has AI Attribution Now

gitStream labeling attributes AI-assisted PRs; AI adoption and code-review dashboards shipped^[9]
Series B, $71M total; pricing $29–59/contributor/mo (annual), 45-day trial
Labeling-based approach requires workflow adoption to be accurate

Swarmia — Team Habits + DX Surveys

PR-level AI detection (Copilot/Cursor/Claude Code) added to its git + Jira + survey core^[10]
Series A $11M (June 2025); transparent per-dev pricing
Still no code-level AI/human split — process and sentiment remain its strengths

Faros AI — Enterprise Aggregation

100+ integrations; AI impact via cohort/baseline comparisons and % AI-generated per repo^[11]
Series A, $36M disclosed; founders ex-Salesforce Einstein
Breadth over depth: no per-line attribution; enterprise sales cycle

Cortex — Platform Engineering Lens

Internal developer portal (catalog, scorecards) with a Copilot-only AI usage label^[12]
Series C ($60M, Sept 2024)
The narrowest AI analytics in this report — include it when the IDP is the point

Feature Comparison

Feature	Oobo	Span	GitClear	Exceeds	Milestone	DX	Jellyfish
AI code attribution	✅ Line (ground truth)	✅ Chunk/line (model)	✅ Line (Diff Delta)	✅ Commit (heuristic)	✅ Claimed >90%	⚠️ Framework-level	⚠️ Derived
Token/cost tracking	✅ Exact	✅ Traces	✅ Via API	⚠️ Estimated	✅ Spend	✅ Cost pillar	✅ Per-tool spend
Session transcripts	✅	⚠️ Agent traces	❌	❌	❌	❌	❌
DORA/process metrics	❌	✅	✅	⚠️	✅	✅	✅
Surveys	❌	✅	❌	❌	❌	✅	⚠️
No per-dev install	❌	✅	✅	✅	✅	✅	✅
Open source	✅ CLI	❌	❌	❌	❌	❌	❌

(Swarmia, LinearB, Faros, and Cortex omitted for width; see the Market Map table above for their attribution levels.)

Picking the Right Tool

"We need to prove AI ROI to the board next quarter" → Exceeds AI or Milestone. Fastest executive-ready reports; Milestone if you're enterprise-scale.

"We want the measurement framework the industry is standardizing on" → DX. Its AI Measurement Framework is what Dropbox- and Booking.com-scale orgs deploy.

"We want to know what code AI actually wrote, without per-dev installs" → Span. Model-based detection plus agent traces is the closest thing to truth at SaaS convenience.

"We want comprehensive engineering metrics, AI included" → GitClear. Deepest code analysis with real AI tracking; bootstrapped stability.

"We want ground-truth AI attribution with full privacy" → Oobo. Capture-time anchors; accept the local-install and tiny-vendor tradeoffs.

"We already run an SEI platform" → Jellyfish (broadest AI Impact), LinearB (labeling + automation), or Swarmia (surveys + habits) — your incumbent probably has more AI measurement than it did in April.

"We're platform engineering with an IDP" → Cortex or Faros AI for aggregation; treat their AI analytics as a bonus, not the product.

The First-Party Squeeze

GitHub's Copilot metrics went GA in February 2026 and added per-user AI-adoption cohorts in May.^[13] Vendors whose pitch was "a dashboard for Copilot usage" are being commoditized from below. The defensible ground is what GitHub concedes it doesn't do: connecting AI usage to engineering outcomes — quality, rework, delivery, spend across many tools. Every tool in this report is racing to that ground.

The Tembo Angle

This category is directly relevant to AI agent orchestration. As Tembo orchestrates coding agents across repos and tasks, the question "what did each agent session produce, and was it good?" is core infrastructure. The approaches here suggest two complementary paths:

Capture-time metadata (like Oobo's anchors) should be built into the orchestration layer — when Tembo runs an agent session, it should automatically record the session-to-commit mapping
Post-hoc quality analysis (like GitClear/Span) can validate that agent-generated code meets quality standards

The winning move is probably both: ground-truth capture for attribution, post-hoc analysis for quality assurance.
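As a sketch of how the two paths could fit together in an orchestration layer: the orchestrator records the session-to-commit mapping as it runs, and a later quality pass is joined back to sessions through that mapping. `AttributionLedger`, `sessions_failing_quality`, and the pass/fail quality input are hypothetical names for illustration, not a Tembo, Oobo, GitClear, or Span API.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class AttributionLedger:
    """Hypothetical capture-time ledger: commit SHA -> agent session ID."""
    _by_commit: Dict[str, str] = field(default_factory=dict)

    def record(self, session_id: str, commit_sha: str) -> None:
        """Called by the orchestrator when an agent session produces a commit."""
        self._by_commit[commit_sha] = session_id

    def session_for(self, commit_sha: str) -> Optional[str]:
        """Return the session that produced a commit, or None if unattributed."""
        return self._by_commit.get(commit_sha)


def sessions_failing_quality(
    ledger: AttributionLedger, quality_by_commit: Dict[str, bool]
) -> Set[str]:
    """Join post-hoc quality results (True = passed) back to agent sessions.

    Commits with no ledger entry, such as human-written ones, are skipped.
    """
    failing: Set[str] = set()
    for commit_sha, passed in quality_by_commit.items():
        session_id = ledger.session_for(commit_sha)
        if session_id is not None and not passed:
            failing.add(session_id)
    return failing
```

The point of the design is that attribution never has to be guessed: the quality analysis can stay heuristic while the question of which session produced a commit is answered by a lookup.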

Bottom Line

The category grew up in ten weeks. DX brought a reference framework and enterprise gravity; Span brought the first credible detection model; the incumbents (Jellyfish, LinearB, Swarmia, Faros) shipped real AI features and erased the April edition's "incumbents can't do this" framing.

The market will still consolidate — every engineering analytics platform now claims AI measurement, and GitHub is commoditizing the bottom of the stack. The differentiated positions are clear: Oobo owns ground truth, Span owns detection, DX owns the framework, GitClear owns code-quality depth, and Exceeds/Milestone own the executive narrative. Choose based on what your org actually needs to answer.

Research by Ry Walker Research

Disclosure: Author is CEO of Tembo, which builds coding-agent orchestration; attribution tooling is adjacent to Tembo's interests.

Sources