Key takeaways
- The category now splits three ways on attribution: capture-at-commit-time (Oobo), model-based detection (Span's span-detect-1), and post-hoc heuristics/derived signals (everyone else)
- DX's AI Measurement Framework has become the category's reference vocabulary, while Span shipped the only proprietary AI-code detection model — both were missing from the April version of this report
- The incumbents moved fast: LinearB now ships labeling-based AI attribution (the April "none" rating no longer holds), and Jellyfish's AI Impact covers assistants, agents, and AI code-review tools with spend tracking
- GitHub's first-party Copilot metrics (GA Feb 2026, adoption cohorts May 2026) are squeezing the usage-dashboard value prop — outcome correlation is where the independents live now
FAQ
Why do teams need AI engineering intelligence?
As AI writes more code, engineering leaders need to know: Is AI actually improving productivity? Is AI-generated code maintaining quality? Which teams are adopting AI tools effectively? Traditional git analytics can't answer these questions.
What's the difference between capture-time, model-based, and post-hoc attribution?
Capture-time tools (Oobo) record which AI session produced which code at commit time — ground truth. Model-based detection (Span) classifies code chunks with a trained detector. Post-hoc tools (GitClear, Exceeds AI) infer AI involvement from diffs and patterns; survey/derived approaches (DX, Jellyfish) combine vendor APIs with developer self-reports.
Which tool should an enterprise start with?
DX if you want the measurement framework large enterprises are standardizing on. Exceeds AI or Milestone for the fastest executive ROI story. GitClear for deep code-quality metrics. Span for native detection without per-developer installs. Oobo for ground-truth attribution if you can manage local installs.
The Problem
AI coding tools changed how software gets written. Cursor, Claude Code, Copilot, Codex — engineers are shipping more code faster. But engineering leaders are left asking basic questions: How much of our code is AI-generated? Is it any good? Are we getting ROI on our AI tool spend?
Traditional engineering analytics platforms were built for a world where humans wrote all the code. They measure velocity, cycle time, and DORA metrics — useful, but blind to the AI contribution layer. A category of tools has emerged to fill this gap, and since April it has both expanded (DX, Span, Jellyfish join this report) and professionalized (the incumbents shipped real AI features).
The Market Map
The landscape splits along two axes: data collection method (how they learn about AI involvement) and primary audience (developers vs. engineering leadership vs. C-suite).
| Tool | Data Source | AI Attribution | Deployment | Primary Audience | Stage |
|---|---|---|---|---|---|
| Oobo | Local capture at commit time | ✅ Ground-truth (session linking, per-line) | CLI per developer + hosted | AI-heavy teams | v1.0 (May 2026), Techstars |
| Span | Git + tickets + agent traces | ✅ Model-based (span-detect-1, chunk-level) + line-level traces | SaaS | Mid-to-large orgs | $25M Seed+A (Nov 2025) |
| GitClear | Git history analysis | ✅ Line-level Diff Delta + AI-vendor APIs | SaaS / on-prem | Developers + managers | Bootstrapped, est. 2018 |
| Exceeds AI | PR/commit diff analysis | ✅ Heuristic multi-signal | SaaS (GitHub/GitLab app) | Engineering leadership | Seed $4.6M (Venrock) |
| Milestone | Git + PM + org + AI tool APIs | ✅ Claimed >90% accuracy | SaaS / on-prem | C-suite / VPE | $10M seed (Nov 2025) |
| DX | System data + surveys | ✅ Framework-level (utilization, impact, cost) | SaaS | Enterprise leadership | Growth (ICONIQ, 300+ customers) |
| Jellyfish | Git + PM + vendor APIs | ⚠️ Derived signals + per-tool spend | SaaS | Executives + finance | Series C, $114M+ total |
| LinearB | Git + Jira + CI/CD | ⚠️ Labeling-based (gitStream) + AI dashboards | SaaS | Managers | Series B, $71M total |
| Swarmia | Git + Jira + DX surveys | ⚠️ PR-level detection, no code-level split | SaaS (GitHub app) | Team leads + VPE | Series A $11M (Jun 2025) |
| Faros AI | 100+ integrations aggregated | ⚠️ Cohort/baseline, % AI per repo | SaaS (enterprise) | C-suite / VPE | Series A $36M |
| Cortex | Service catalog + SDLC data | ⚠️ Copilot-only usage label | SaaS (enterprise) | Platform engineering | Series C $60M |
Corrections from the April edition: LinearB's "no AI attribution" rating no longer holds — it ships labeling-based attribution via gitStream plus AI adoption dashboards. Faros AI is Series A ($36M disclosed), not Series B. Milestone's $10M round is a seed, not a Series A.
The Three Approaches
Capture at Commit Time
Oobo intercepts commits as they happen and records which AI session contributed — ground-truth attribution with session transcripts, exact token counts, and line-level blame.[1]
Pros: Most accurate data possible. Cons: Requires every developer to install a local CLI; still a solo-developer project.
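To make the capture-time idea concrete, here is a minimal sketch of what a session-to-commit record could look like. It is not Oobo's actual schema: `SessionAnchor`, `CommitRecord`, and all of their fields are hypothetical names chosen for illustration, and the sample values are invented.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SessionAnchor:
    """One AI session's contribution to a commit (hypothetical schema)."""
    session_id: str
    tool: str
    tokens: int
    # Inclusive (start, end) ranges over the lines the commit added.
    line_ranges: tuple[tuple[int, int], ...]


@dataclass
class CommitRecord:
    """Attribution recorded when the commit is created, not inferred later."""
    sha: str
    total_lines: int
    anchors: list[SessionAnchor] = field(default_factory=list)

    def ai_lines(self) -> set[int]:
        """Line numbers covered by at least one recorded AI session."""
        lines: set[int] = set()
        for anchor in self.anchors:
            for start, end in anchor.line_ranges:
                lines.update(range(start, end + 1))
        return lines

    def ai_share(self) -> float:
        """Fraction of the commit's lines attributed to AI sessions."""
        if self.total_lines == 0:
            return 0.0
        return len(self.ai_lines()) / self.total_lines


# Invented example: two sessions with overlapping line ranges.
record = CommitRecord(
    sha="abc123",
    total_lines=40,
    anchors=[
        SessionAnchor("s-1", "claude-code", 1200, ((1, 10),)),
        SessionAnchor("s-2", "cursor", 800, ((8, 15), (30, 34))),
    ],
)
print(record.ai_share())
```

Because the line sets are unioned, lines touched by two sessions are counted once. The point of the sketch is that nothing here is estimated: the share follows directly from what was recorded at commit time.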
Model-Based Detection
Span trained a proprietary detector (span-detect-1) that classifies AI-generated code at the chunk level across any tool, supplemented since June 2026 by line-level attribution from agent traces.[2]
Pros: No per-developer install, tool-agnostic, much closer to code-level truth than heuristics. Cons: Detection models have error bars; accuracy claims are vendor-reported.
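What "error bars" can mean for a reported AI share is easiest to see with a standard prevalence correction (the Rogan-Gladen estimator). The sketch below is not Span's method, and the sensitivity and specificity values are made-up inputs, since no accuracy figures for span-detect-1 are given here.

```python
def corrected_ai_share(observed_share: float, sensitivity: float, specificity: float) -> float:
    """Estimate the true AI share from the share a detector flagged.

    observed_share: fraction of chunks the detector flagged as AI.
    sensitivity:    P(flagged | AI-written).
    specificity:    P(not flagged | human-written).
    """
    denominator = sensitivity + specificity - 1.0
    if denominator <= 0:
        raise ValueError("detector must perform better than chance")
    estimate = (observed_share + specificity - 1.0) / denominator
    # Clamp: sampling noise can push the raw estimate outside [0, 1].
    return min(1.0, max(0.0, estimate))


# Hypothetical detector: 90% sensitivity, 95% specificity, 40% of chunks flagged.
print(round(corrected_ai_share(0.40, 0.90, 0.95), 3))
```

With these assumed rates, a flagged share of 40% corresponds to an estimated true share of about 41%. The gap widens quickly as specificity drops, which is why vendor-reported accuracy matters when comparing detection-based numbers.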
Analyze After the Fact
GitClear, Exceeds AI, and Milestone analyze diffs, commit patterns, and metadata; DX, Jellyfish, Swarmia, Faros, LinearB, and Cortex combine vendor APIs, derived signals, and surveys.[3][4][5]
Pros: 15-minute setup, works retroactively. Cons: Attribution is inferred, not observed.
Tool Profiles
Oobo — Ground-Truth Attribution
- Git decorator enriching commits with anchor metadata: linked AI sessions, tokens, cost, per-line blame; 15 tool integrations[1]
- $20–200/member/month + free OSS CLI; v1.0 shipped May 2026
- Still a solo-developer project with minimal community traction — the best data model in the category, unproven as a company
Span — Native Detection
- span-detect-1 model classifies AI code chunk-level across all tools; AI Effectiveness suite (June 2026) adds line-level agent-trace attribution[2]
- $25M from Alt Capital, Craft Ventures et al.; customers include Ramp, Vanta, Carvana, Intercom
- The strongest answer to "what did AI actually write?" without per-developer installs
GitClear — Deep Code Analytics + AI Research
- 2018-vintage, bootstrapped; Diff Delta line-level analysis plus direct Copilot/Cursor/Claude Code API integrations[3]
- Free Starter; Pro $14.95–Enterprise $34.95/dev/mo (annual); on-prem available
- Its annual AI code-quality research (duplication, churn) is the most-cited in the field — and also contested by academic work
Exceeds AI — AI-First Analytics for Managers
- Heuristic multi-signal detection; benchmarks now claim 356K+ engineers and 53.9B lines analyzed[4]
- Pricing moved to per-manager-seat ($49/mo); $4.6M Venrock seed; Wayfair and GoodRx logos
- Fastest path to an executive ROI narrative; attribution remains heuristic
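Per-manager-seat pricing changes the cost math relative to per-developer pricing. The arithmetic below uses the list prices quoted in this report; the organization size (100 developers, 12 managers) is a hypothetical, and the sketch assumes every counted person takes a paid seat, with no free tiers or discounts.

```python
def annual_cost(monthly_price: float, seats: int) -> float:
    """List-price annual cost for a per-seat monthly price."""
    return round(monthly_price * seats * 12, 2)


# Hypothetical organization size.
DEVELOPERS, MANAGERS = 100, 12

# (low, high) monthly list prices and the seat count each model bills on.
plans = {
    "GitClear (per dev)": ((14.95, 34.95), DEVELOPERS),
    "LinearB (per contributor)": ((29.0, 59.0), DEVELOPERS),
    "Exceeds AI (per manager seat)": ((49.0, 49.0), MANAGERS),
}

costs = {
    name: (annual_cost(low, seats), annual_cost(high, seats))
    for name, ((low, high), seats) in plans.items()
}
for name, (low, high) in costs.items():
    print(f"{name}: ${low:,.0f} to ${high:,.0f} per year")
```

Under these assumptions the per-manager model comes out several times cheaper at list price, but the result depends entirely on the developer-to-manager ratio, so rerun it with your own headcount.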
Milestone — GenAI ROI for the C-Suite
- "GenAI data lake" joining git, PM, org, and AI-tool data; claimed >90% attribution accuracy[5]
- $10M seed (Heavybit, Hanaco, Atlassian Ventures); GitHub's COO is a reference customer
- Enterprise-only and demo-led; deliberately declines smaller customers
DX — The Measurement Framework
- AI Measurement Framework (utilization, impact, cost) has become the category's reference vocabulary; Agent Ops tooling for agent fleets[6][7]
- 300+ enterprise customers — Dropbox, Adyen, Vanguard, Booking.com (3,500+ engineers measured)
- Survey-plus-systems approach measures outcomes, not line-level attribution
Jellyfish — The Incumbent's AI Impact Product
- Broadest multi-tool coverage: assistants (Copilot, Cursor, Claude Code, Gemini), agents (Devin, Jules), and AI code-review tools (CodeRabbit, Graphite, Greptile), with per-tool spend[8]
- Series C-era software engineering intelligence (SEI) platform, $114M+ raised, 20M PRs analyzed
- Attribution is derived/correlational, not model-based
LinearB — Corrected: It Has AI Attribution Now
- gitStream labeling attributes AI-assisted PRs; AI adoption and code-review dashboards shipped[9]
- Series B, $71M total; pricing $29–59/contributor/mo (annual), 45-day trial
- Labeling-based approach requires workflow adoption to be accurate
Swarmia — Team Habits + DX Surveys
- PR-level AI detection (Copilot/Cursor/Claude Code) added to its git + Jira + survey core[10]
- Series A $11M (June 2025); transparent per-dev pricing
- Still no code-level AI/human split — process and sentiment remain its strengths
Faros AI — Enterprise Aggregation
- 100+ integrations; AI impact via cohort/baseline comparisons and % AI-generated per repo[11]
- Series A, $36M disclosed; founders ex-Salesforce Einstein
- Breadth over depth: no per-line attribution; enterprise sales cycle
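A cohort/baseline comparison of the kind described above can be sketched in a few lines. This is an illustration, not Faros AI's implementation: the `PullRequest` fields, the choice of median cycle time as the metric, and the sample data are all invented.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class PullRequest:
    author: str
    cycle_time_hours: float


def cohort_delta(prs: list[PullRequest], ai_adopters: set[str]) -> float:
    """Median cycle time of the AI-adopter cohort minus the baseline cohort."""
    adopter_times = [p.cycle_time_hours for p in prs if p.author in ai_adopters]
    baseline_times = [p.cycle_time_hours for p in prs if p.author not in ai_adopters]
    if not adopter_times or not baseline_times:
        raise ValueError("both cohorts need at least one pull request")
    return median(adopter_times) - median(baseline_times)


# Invented sample: "ana" and "bo" use AI tools, "cy" and "di" do not.
prs = [
    PullRequest("ana", 10.0), PullRequest("ana", 14.0), PullRequest("bo", 12.0),
    PullRequest("cy", 20.0), PullRequest("cy", 18.0), PullRequest("di", 30.0),
]
delta = cohort_delta(prs, ai_adopters={"ana", "bo"})
print(delta)
```

A negative delta means the adopter cohort merges faster. That is a correlation only: it does not control for team, seniority, or task mix, which is the core limitation of cohort-based measurement compared with per-line attribution.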
Cortex — Platform Engineering Lens
- Internal developer portal (IDP) with catalog and scorecards, plus a Copilot-only AI usage label[12]
- Series C ($60M, Sept 2024)
- The narrowest AI analytics in this report — include it when the IDP is the point
Feature Comparison
| Feature | Oobo | Span | GitClear | Exceeds | Milestone | DX | Jellyfish |
|---|---|---|---|---|---|---|---|
| AI code attribution | ✅ Line (ground truth) | ✅ Chunk/line (model) | ✅ Line (Diff Delta) | ✅ Commit (heuristic) | ✅ Claimed >90% | ⚠️ Framework-level | ⚠️ Derived |
| Token/cost tracking | ✅ Exact | ✅ Traces | ✅ Via API | ⚠️ Estimated | ✅ Spend | ✅ Cost pillar | ✅ Per-tool spend |
| Session transcripts | ✅ | ⚠️ Agent traces | ❌ | ❌ | ❌ | ❌ | ❌ |
| DORA/process metrics | ❌ | ✅ | ✅ | ⚠️ | ✅ | ✅ | ✅ |
| Surveys | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | ⚠️ |
| No per-dev install | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Open source | ✅ CLI | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
(Swarmia, LinearB, Faros, and Cortex omitted for width; see the Market Map table above for their attribution levels.)
Picking the Right Tool
"We need to prove AI ROI to the board next quarter" → Exceeds AI or Milestone. Fastest executive-ready reports; Milestone if you're enterprise-scale.
"We want the measurement framework the industry is standardizing on" → DX. Its AI Measurement Framework is what Dropbox- and Booking.com-scale orgs deploy.
"We want to know what code AI actually wrote, without per-dev installs" → Span. Model-based detection plus agent traces is the closest thing to truth at SaaS convenience.
"We want comprehensive engineering metrics, AI included" → GitClear. Deepest code analysis with real AI tracking; bootstrapped stability.
"We want ground-truth AI attribution with full privacy" → Oobo. Capture-time anchors; accept the local-install and tiny-vendor tradeoffs.
"We already run an SEI platform" → Jellyfish (broadest AI Impact), LinearB (labeling + automation), or Swarmia (surveys + habits) — your incumbent probably has more AI measurement than it did in April.
"We're platform engineering with an IDP" → Cortex or Faros AI for aggregation; treat their AI analytics as a bonus, not the product.
The First-Party Squeeze
GitHub's Copilot metrics went GA in February 2026 and added per-user AI-adoption cohorts in May.[13] Vendors whose pitch was "a dashboard for Copilot usage" are being commoditized from below. The defensible ground is what GitHub concedes it doesn't do: connecting AI usage to engineering outcomes — quality, rework, delivery, spend across many tools. Every tool in this report is racing to that ground.
The Tembo Angle
This category is directly relevant to AI agent orchestration. As Tembo orchestrates coding agents across repos and tasks, the question "what did each agent session produce, and was it good?" is core infrastructure. The approaches here suggest two complementary paths:
- Capture-time metadata (like Oobo's anchors) should be built into the orchestration layer — when Tembo runs an agent session, it should automatically record the session-to-commit mapping
- Post-hoc quality analysis (like GitClear/Span) can validate that agent-generated code meets quality standards
The winning move is probably both: ground-truth capture for attribution, post-hoc analysis for quality assurance.
Bottom Line
The category grew up in ten weeks. DX brought a reference framework and enterprise gravity; Span brought the first credible detection model; the incumbents (Jellyfish, LinearB, Swarmia, Faros) shipped real AI features and erased the April edition's "incumbents can't do this" framing.
The market will still consolidate — every engineering analytics platform now claims AI measurement, and GitHub is commoditizing the bottom of the stack. The differentiated positions are clear: Oobo owns ground truth, Span owns detection, DX owns the framework, GitClear owns code-quality depth, and Exceeds/Milestone own the executive narrative. Choose based on what your org actually needs to answer.
Research by Ry Walker Research · methodology
Disclosure: Author is CEO of Tembo, which builds coding-agent orchestration; attribution tooling is adjacent to Tembo's interests.
Sources
- [1] Oobo — Official Website
- [2] Span — AI-Native Engineering Intelligence
- [3] GitClear — Software Engineering Intelligence
- [4] Exceeds AI — AI Engineering Analytics
- [5] Milestone — GenAI ROI Platform
- [6] DX — Developer Intelligence Platform
- [7] DX: Introducing the AI Measurement Framework
- [8] Jellyfish AI Impact
- [9] LinearB — Software Delivery Management
- [10] Swarmia — Engineering Intelligence
- [11] Faros AI — Engineering Intelligence Platform
- [12] Cortex — Internal Developer Portal
- [13] GitHub Changelog: Copilot Usage Metrics API Adds AI Adoption Cohorts