2 min read · By Ry Walker

The Gaming Problem Never Goes Away

Goodhart's law shows up the moment you tie a metric to compensation. "When a measure becomes a target, it ceases to be a good measure." This is a law, not a tendency. It applies to lines of code, story points, deployment frequency, and yes — every AI tool metric anyone is excited about right now.

If you start tracking prompts per week, engineers will write more prompts. If you start tracking acceptance rate, they'll accept more suggestions and quietly rewrite them after. If you start tracking AI-generated lines of code, they'll find a way to inflate that too. None of this requires malice. It requires only that the people being measured are smart and care about their reviews. Both of those are true by construction in any team worth having.

You can detect the most obvious gaming with a script. Throwaway prompts. Auto-accepted suggestions immediately overwritten. Suspicious bursts before performance review season. Fine. That catches the lazy gamers. It doesn't catch the thoughtful ones, and it certainly doesn't address the underlying problem: the metric drove the wrong behavior in the first place.
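The kind of script this paragraph has in mind might look like the following sketch. The event schema and thresholds here are hypothetical, not from any real telemetry product; the point is that these checks only catch the lazy patterns.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical event shape -- real AI tool telemetry schemas will differ.
@dataclass
class SuggestionEvent:
    accepted_at: datetime
    overwritten_at: Optional[datetime]  # when the accepted code was replaced, if ever
    prompt_chars: int                   # length of the prompt that produced it

def flag_throwaway_prompts(events, min_chars=15):
    """Prompts too short to be real work are likely padding a counter."""
    return [e for e in events if e.prompt_chars < min_chars]

def flag_accept_then_rewrite(events, window=timedelta(minutes=5)):
    """Suggestions accepted and then overwritten almost immediately
    inflate acceptance rate without contributing any code."""
    return [
        e for e in events
        if e.overwritten_at is not None
        and e.overwritten_at - e.accepted_at < window
    ]
```

A thoughtful gamer clears both checks trivially: write a plausible-length prompt, wait six minutes before rewriting. The detection script is an arms race the measurer loses.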

The fundamental tension hasn't moved an inch with AI: things that are easy to measure aren't necessarily things that matter, and things that matter aren't necessarily easy to measure. AI tools don't solve this. They just give us new things to measure and new ways to get it wrong. I've argued elsewhere that AI coding tool telemetry is a microscope, not a diagnosis — and a microscope pointed at a comp decision is a thing nobody should want.

The right response isn't more sophisticated detection. It's metric design that anticipates gaming. Use a portfolio of signals so no single one is worth gaming. Decouple development metrics from compensation metrics — let the same data inform learning conversations without showing up on a perf review rubric. Treat any single number that moves "too cleanly" with suspicion. Most of all, take seriously that what actually helps developer performance is rarely a number on a dashboard.
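One way to make "no single signal is worth gaming" concrete is to aggregate with a function that barely responds to inflating any one input. A minimal illustration, assuming each signal has already been normalized to [0, 1]:

```python
def portfolio_score(signals: dict) -> float:
    """Aggregate a portfolio of normalized signals by taking the minimum.

    The aggregate rises only when every signal rises, so pumping one
    metric (say, prompt count) moves nothing unless it was already the
    weakest signal -- and then only up to the next-weakest one.
    """
    if not signals:
        raise ValueError("need at least one signal")
    return min(signals.values())
```

The minimum is a deliberately blunt choice for illustration; the design principle is what matters, not this particular aggregator.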

If the only thing standing between your team and gamed metrics is a detection script, the metric is already broken. Build it like you expect smart people to optimize against it — because they will.

Key takeaways

  • Every metric tied to comp eventually gets gamed.
  • AI tool metrics are not immune — they may be easier to game.
  • The fix is not better detection, it is better metric design.

FAQ

Won't engineers just inflate their AI prompt counts?

Yes. Anything that can be counted will be optimized, and anything tied to compensation will be optimized harder. AI tool telemetry is not magically immune.

Can you script around gaming?

A little. You can detect obvious patterns. But the deeper problem isn't caught by scripts — it's that the metric drove the wrong behavior in the first place.