This post came out of some notes I wrote up when trying to persuade some other parts of our organisation that measuring quality as “number of open bugs” was a really poor idea – specifically that (a) it’s not a metric for quality and (b) it strongly encourages short-termist behaviour that lowers long-term quality (prioritising immediate functional bugs over maintainability, extensibility, reducing regression counts, diagnosability, all those other “ilities”). I’ve walked that path on so many projects that I’m keen to try to stop others following down the track…
What is quality – what are we trying to measure?
Before we think about how to measure quality, it helps to ask what quality is. The best definition I’ve come across originated with Jerry Weinberg:
Quality is value to some person (who matters)
So for our products, who matters, and what do they care about?
From a quickish brainstorm meeting a year or few back, an incomplete set is:
- Who matters: Our customers, their customers, support, solutions teams, development teams, test teams, product management, trials/demo teams, Sales, our C-suite.
- What matters to them (not all things matter to all people): Function set, working function, reliability, price, usability, diagnostics, debuggability, maintainability, extensibility, documentation, standards compliance, integration, interop, configurability, simplicity/complexity, “IT”/Sexiness/Wow factor, supportability.
Looking at all those, it seems that quality is a pretty hard thing to pin down or measure. We do like measuring things though, and what we can do is measure some proxies for quality. Note – these proxies aren’t “quality” – they’re things that seem to align well with something that matters to someone (so part of quality).
So what can we set out to measure that would do better?
Easy(ish) to measure proxies for quality (“quality metrics”)
My goal with this list is to think of proxies that are easy to measure, so that you get cheap quantitative measures of the quality of the product. I’m not suggesting that you should use all of them, but you could definitely measure them and use at least (say) half a dozen when defining a “quality bar” for a product or solution. In particular, given the tendency everyone has under tight timescales to over-focus on the proxies, it’s a good idea to use a whole bunch of them, so that you don’t make poor short-term decisions while believing that you’re not affecting quality.
I’ve also marked ones that you could measure internally before you ship – as those let you get some kind of measurable handle on quality before the product goes out the door.
- Function (set and that it works). Proxies:
  - (External) Functional issues (number and severity) raised by customers
  - (Internal) Functional issues (number and severity) raised by ST or solution teams
  - (External) % of shipped enhancements with follow-on enhancement requests / complaints that we’ve delivered the wrong thing
- Simplicity/Complexity. Extensibility. Proxies:
  - (External/Internal) Average elapsed time from issue raised to fix.
  - (External/Internal) Average number of fix attempts per bug. (% of the time that a fix resolves the issue.)
  - (Internal) Average number of dead-on-arrival entries into ST – or – Average # of solution team acceptance tests failing when product team SLA tests pass
  - (Internal) Average sprint velocity for each team (sprint velocity is also about the people, not just the product, but it’s not clear to me that they’re completely disentanglable in scrum)?
- Diagnostics. Proxies:
  - (Internal / External) Average number of repros required before fix is understood (customers/support, solution/product, within product team)
  - (Internal) Average number of misdiagnosed issues going from Solution or ST team to wrong product or dev team
- Documentation. Proxies:
  - (Internal / External) Number of solution failures (solution teams and customer teams) resolved as misconfiguration.
  - (Internal / External) Average time to bring up new system/solution.
  - (Internal / External) Number of issues resolved with docs fix.
- Integration/Installation/Upgrade. Proxies:
  - (External) Elapsed time from new release to #customers deployed.
  - (Internal / External) Average time to install a new system / product / solution
  - (Internal / External) Average downtime when upgrading a system / product / solution, including how impacting the downtime is (e.g. entire solution down for 2 hours vs. reduction in capacity for 1 hour, …)
- Supportability/Cost to us. Proxies:
  - (External) Cost to support per $revenue (resource days).
- Wow factor/It/Sexiness. Proxies:
  - (External) Customer ratings
  - (External) User install/uninstall rates.
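To give a feel for how cheap some of these are to compute, here is a rough Python sketch that derives a handful of the proxies above from issue-tracker records and checks them against a “quality bar”. Everything in it is an assumption for illustration: the `Issue` fields, the `proxies` and `below_bar` helpers, the sample data and the thresholds are all made up, and a real tracker export will look different.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean


@dataclass
class Issue:
    """Hypothetical issue-tracker record; the field names are assumptions."""
    raised: datetime
    fixed: datetime
    fix_attempts: int   # fixes delivered before the issue was actually resolved
    repros_needed: int  # repros required before the fix was understood
    resolution: str     # e.g. "code", "docs", "misconfiguration"


def proxies(issues):
    """Compute a few of the listed proxies for a non-empty batch of issues."""
    return {
        "mean_days_to_fix": mean(
            (i.fixed - i.raised).total_seconds() / 86400 for i in issues
        ),
        "mean_fix_attempts": mean(i.fix_attempts for i in issues),
        "first_fix_success_rate": sum(i.fix_attempts == 1 for i in issues) / len(issues),
        "mean_repros": mean(i.repros_needed for i in issues),
        "misconfiguration_count": sum(i.resolution == "misconfiguration" for i in issues),
        "docs_fix_count": sum(i.resolution == "docs" for i in issues),
    }


def below_bar(measured, bar):
    """Return the names of proxies that miss the bar.

    `bar` maps a proxy name to (limit, kind), where kind is "max" if the
    value must not exceed the limit and "min" if it must not fall below it.
    """
    misses = []
    for name, (limit, kind) in bar.items():
        value = measured[name]
        if (kind == "max" and value > limit) or (kind == "min" and value < limit):
            misses.append(name)
    return misses


# Made-up sample data, purely to exercise the functions.
issues = [
    Issue(datetime(2024, 1, 1), datetime(2024, 1, 3), 1, 1, "code"),
    Issue(datetime(2024, 1, 2), datetime(2024, 1, 8), 3, 2, "code"),
    Issue(datetime(2024, 1, 5), datetime(2024, 1, 6), 1, 1, "misconfiguration"),
    Issue(datetime(2024, 1, 7), datetime(2024, 1, 10), 1, 0, "docs"),
]

# Illustrative thresholds only; a real bar would be agreed per product.
quality_bar = {
    "mean_days_to_fix": (5, "max"),
    "mean_fix_attempts": (1.2, "max"),
    "first_fix_success_rate": (0.8, "min"),
}

measured = proxies(issues)
print(measured)
print("Missing the bar on:", below_bar(measured, quality_bar))
```

The point of checking several proxies together is the one made above: a release that looks fine on time-to-fix can still miss the bar on fix attempts, which a single bug count would never show.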
There’s obviously loads more here – both more areas and more metrics. But even if you just look at all these different measures of quality, and the serious impacts that they can have on your ability to deliver and support products, it’s blindingly obvious that we need to measure more than just number of bugs when gauging quality! And it’s also obvious that it’s pretty cheap to measure a bunch of these.
A final caveat though – while measuring a bunch of these is clearly better than just measuring bugs or field defects – I’m pretty sure there’s even more we could do here in terms of thinking about, talking about, and understanding quality without just using numbers. After all, this quality bar is easy to measure (it’s 98m long), but this quality bar is better.