We've been arguing about how to measure developer productivity for decades. Lines of code. Story points. Commits. Each metric proposed, debated, and ultimately found wanting. The SPACE framework is one of the more useful attempts to broaden the conversation beyond raw output.[1] Industry benchmarks like DORA push teams toward outcomes and flow rather than vanity activity metrics.[2] And now AI coding assistants are about to blow this whole conversation wide open by generating, for the first time, a granular log of the actual coding process. I don't have a clean answer here. Nobody does. But the questions are worth taking seriously, because the dashboards are coming whether we want them or not.
I broke this argument into five atomic posts. Read them in any order:
- The GTM vs. R&D Measurement Gap — Sales gets revenue, engineering gets vibes. The asymmetry is real and it's a leadership problem.
- The AI Coding Tool Wrinkle — Every prompt, accept, reject, and iteration is now a logged event (see the schema sketch after this list). New data is not better data.
- Top Performer Analysis — The interesting use of AI tool telemetry isn't ranking. It's studying how your best engineers actually work.
- The Gaming Problem Never Goes Away — Goodhart's law applies to AI metrics too. The fix is metric design, not detection.
- What Actually Helps Developer Performance — A portfolio of imperfect signals, used to develop people rather than rank them.
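To make the "logged event" claim concrete, here's a minimal sketch of what an AI assistant's event stream might look like. Every name and field below is an assumption for illustration; no vendor ships exactly this schema.

```python
from dataclasses import dataclass
from enum import Enum

# Hypothetical schema -- every name below is illustrative,
# not any vendor's actual telemetry format.
class EventType(Enum):
    PROMPT = "prompt"
    SUGGESTION_SHOWN = "suggestion_shown"
    ACCEPT = "accept"
    REJECT = "reject"
    ITERATE = "iterate"  # developer edits the suggestion and re-prompts

@dataclass
class AssistantEvent:
    session_id: str
    developer_id: str      # pseudonymous in any sane deployment
    event: EventType
    timestamp_ms: int
    tokens_suggested: int = 0
    tokens_accepted: int = 0

# A single prompt/accept cycle emits several rows like these.
events = [
    AssistantEvent("s1", "dev-42", EventType.PROMPT, 1_700_000_000_000),
    AssistantEvent("s1", "dev-42", EventType.SUGGESTION_SHOWN,
                   1_700_000_000_900, tokens_suggested=120),
    AssistantEvent("s1", "dev-42", EventType.ACCEPT,
                   1_700_000_001_400, tokens_accepted=95),
]
```

Note what the log can't tell you: whether those 95 accepted tokens were correct, or whether a rejection was discernment or distraction. It's a record of process, not a measure of value, which is the whole wrinkle.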
The thread underneath all of these: developer performance is genuinely hard to measure, AI tools don't fix that, and the companies that handle this with care will compound advantages over the ones that ship adoption leaderboards. I don't love thinking about measurement. Most engineers don't. We got into this work to build things, not to be quantified. But the question isn't going away, and pretending it will is how engineering leaders cede the ground to people with worse instincts.
The honest posture: measure to develop, not to rank. Use a portfolio so no single signal owns the decision. Decouple development data from compensation data. Treat AI tool telemetry as a microscope, not a scoreboard. Watch what happens when a metric "moves cleanly" — that's almost always Goodhart in action. And take seriously that the best engineers have options. Whatever measurement system you build, they're the ones who'll vote with their feet when it goes wrong.
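Here's a minimal sketch of the portfolio idea in Python. The signal names, the cap, and the "flat" threshold are all assumptions I'm making for illustration; the structural point is that each signal is normalized against team history, capped so no single signal can own the decision, and a suspiciously clean move gets flagged for a human instead of scored.

```python
import statistics

# Illustrative only: the signal names, the cap, and the "flat" threshold
# below are assumptions for this sketch, not a recommended metric set.
SIGNAL_CAP = 1.5   # max |z-score| any single signal may contribute
FLAT_BAND = 0.25   # |z| below this counts as "didn't really move"

def portfolio_view(current: dict[str, float],
                   history: dict[str, list[float]]) -> dict[str, float]:
    """Normalize each signal against team history and cap its influence."""
    view = {}
    for name, value in current.items():
        mu = statistics.mean(history[name])
        sigma = statistics.stdev(history[name])
        z = (value - mu) / sigma if sigma else 0.0
        view[name] = max(-SIGNAL_CAP, min(SIGNAL_CAP, z))
    return view

def goodhart_flags(view: dict[str, float]) -> list[str]:
    """One signal pinned at the cap while the rest sit flat is a smell, not a win."""
    pinned = [n for n, z in view.items() if abs(z) >= SIGNAL_CAP]
    flat = [n for n, z in view.items() if abs(z) < FLAT_BAND]
    return pinned if pinned and len(pinned) + len(flat) == len(view) else []

history = {"cycle_time_days": [4.1, 3.8, 4.5, 4.0],
           "ai_accept_rate": [0.31, 0.35, 0.28, 0.33]}
current = {"cycle_time_days": 4.1, "ai_accept_rate": 0.71}

view = portfolio_view(current, history)
print(goodhart_flags(view))  # ['ai_accept_rate']: moved cleanly while all else sat flat
```

The output isn't a score; it's a prompt for a conversation. That's the decoupling in practice: the same data that would be poison in a compensation review is useful in a coaching one.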
Would I object to a dashboard ranking my dev team for AI adoption? Yes — and so should you. But the underlying instinct (we should understand how our team is performing) is right. The work is in the gap between that instinct and the lazy dashboard that pretends to satisfy it.
— Ry
Related Essays
AI-First Software Development: Redefining How We Build Software
The AI-First Software Development Manifesto: 11 principles for treating AI as a true development partner, not a fancy autocomplete.
Why "Good Enough" Code Wins
AI-assisted development has changed the economics of code quality. Teams shipping "good enough" code are moving faster than craft perfectionists.
Rise of the Agents: An AI Coding Ecosystem Map
A visual guide to the emerging AI agent ecosystem — from foundation lab tools to enterprise in-house agents, and everything in between.
Key takeaways
- GTM teams have revenue; engineering has no equivalently clean metric.
- AI tools add signals but increase gaming risk.
- The goal is learning, not punishment.
FAQ
Why is dev performance hard to measure?
Work is creative and context-dependent. Outputs vary and quality is hard to quantify.
How could AI make it worse?
It introduces noisy micro-metrics that can be gamed. That can distort behavior without improving outcomes.