3 min read · By Ry Walker

Measuring Developer Performance (And Why AI Might Make It Worse)


We've been arguing about how to measure developer productivity for decades. Lines of code. Story points. Commits. Each metric proposed, debated, and ultimately found wanting. The SPACE framework is one of the more useful attempts to broaden the conversation beyond raw output.[1] Industry benchmarks like DORA push teams toward outcomes and flow rather than vanity activity metrics.[2] And now AI coding assistants are about to blow this whole conversation wide open by generating, for the first time, a granular log of the actual coding process. I don't have a clean answer here. Nobody does. But the questions are worth taking seriously, because the dashboards are coming whether we want them or not.

I broke this argument into five atomic posts; read them in any order.

The thread underneath all of these: developer performance is genuinely hard to measure, AI tools don't fix that, and the companies that handle this with care will compound advantages over the ones that ship adoption leaderboards. I don't love thinking about measurement. Most engineers don't. We got into this work to build things, not to be quantified. But the question isn't going away, and pretending it will is how engineering leaders cede the ground to people with worse instincts.

The honest posture: measure to develop, not to rank. Use a portfolio so no single signal owns the decision. Decouple development data from compensation data. Treat AI tool telemetry as a microscope, not a scoreboard. Watch what happens when a metric "moves cleanly" — that's almost always Goodhart in action. And take seriously that the best engineers have options. Whatever measurement system you build, they're the ones who'll vote with their feet when it goes wrong.

Would I object to a dashboard ranking my dev team for AI adoption? Yes — and so should you. But the underlying instinct (we should understand how our team is performing) is right. The work is in the gap between that instinct and the lazy dashboard that pretends to satisfy it.

— Ry

Key takeaways

  • Engineering metrics differ from GTM metrics.
  • AI tools add signals but increase gaming risk.
  • The goal is learning, not punishment.

FAQ

Why is dev performance hard to measure?

Because the work is creative and context-dependent: outputs vary widely between tasks, and quality is hard to quantify with any single number.

How could AI make it worse?

AI tooling introduces a flood of noisy micro-metrics (suggestions accepted, completions generated) that are easy to game. Optimizing for them can distort behavior without improving actual outcomes.