My last post responded to Michael Bolton’s “Why Pass vs. Fail Rates Are Unethical”. Michael argued that calculating the ratio of passing tests to failing tests is irresponsible, unethical, unprofessional, unscientific and inhumane. I think this is an example of a growing problem in the rhetoric of context-driven testing: it considers too little the value of tailoring what we do to the project’s context. Instead, too often, I see a moralistic insistence on adoption of preferred practices or rejection of practices that we don’t like.

I think it’s easy to convert any disagreement about policy or practice into a disagreement about ethics. I think this is characteristic of movements that are maturing into orthodox rigidity. Unfortunately, I think that’s fundamentally incompatible with a contextualist approach. My post advocated for dialing the rhetoric back, for a stronger distinction between disagreeing with someone and morally condemning them.

Michael responded with a restatement that I think is even more extreme.

I think the best way to answer this is with a series of posts (and perhaps some discussion) rather than one excessively long screed.

(Added 3/22/12) Michael and I plan to discuss this soon. My next post will be informed by that discussion.

The core messages of this first post are fairly simple:

Executives are Entitled and Empowered to Choose their Metrics

Several years ago, I had a long talk about metrics with Hung Quoc Nguyen. Hung runs LogiGear, a successful test lab. He was describing to me some of the metrics that his clients expected. I didn’t like some of these metrics and I asked why he was willing to provide them. Hung explained that he’d discussed this with several executives. They understood that the metrics were imperfect. But they felt that they needed ways to summarize what the organization knew about projects. They felt they needed ways to compare progress, costs, priorities, and risks. They felt they needed ways to organize the information so that they could compare several projects or groups at the same time. And they felt they needed to compare what was happening now to what had happened in the past. Hung then made three points:

  1. These are perfectly legitimate management goals.
  2. Quantification (metrics) is probably necessary to achieve these goals.
  3. The fact that there is no collection of metrics that will do this perfectly (or even terribly well) doesn’t eliminate the need. Without a better alternative, managers will do the best they can with what they’ve got.

Hung concluded that his clients were within their rights to ask for this type of information and that he should provide it to them.

If I remember correctly, Hung also gently chided me for being a bit of a perfectionist. It’s easy to refuse to provide something that isn’t perfect. But that’s not helpful when the perfect isn’t available. He also suggested that when it comes to testers or consultants offering a “better alternative”, every executive has both the right and the responsibility to decide which alternative is the better one for her or his situation.

By this point, I had joined Florida Tech and wasn’t consulting to clients who needed metrics, so I had the luxury of letting this discussion settle in my mind for a while before acting on it.

Finance Metrics Illustrate the Executives’ Context

A few years later, I started studying quantitative finance. I am particularly interested in the relationship between model evaluation in quantitative finance and exploratory testing. I also have a strong personal interest–I apply what I learn to managing my family’s investments.

The biggest surprise for me was how poor a set of core business metrics the investors have to work with. I’m thinking of the numbers in balance sheets, statements of cash flow, and income statements, and the added details in most quarterly and most annual investment reports. These paint an incomplete, often inaccurate picture of the company. The numbers are so subject to manipulation, and present such an incomplete view, that it can be hard to tell whether a company was actually profitable last year or how much their assets are actually worth.

Investors often supplement these numbers with qualitative information about the company (information that may or may not present a more trustworthy picture than the numbers). However, despite the flaws of the metrics, most investors pay careful attention to financial reports.

I suppose I should have expected these problems. My only formal studies of financial metrics (courses on accounting for lawyers and commercial law) encouraged a strong sense of skepticism. And of course, I’ve seen plenty of problems with engineering metrics.

But it was still a surprise that people actually rely on these numbers. People invest enormous amounts of money on the basis of these metrics.

It would be easy to rant against using these numbers. They are imperfect. They can be misleading. Sometimes severely, infuriatingly, expensively misleading. So we could gather together and have a nice chant that using these numbers would be irresponsible, unethical, unprofessional, unscientific and inhumane.

But in the absence of better data, when I make financial decisions (literally, every day), these numbers guide my decisions. It’s not that I like them. It’s that I don’t have better alternatives to them.

If someone insisted that I ignore the financial statistics, that using them would be irresponsible, unethical, unprofessional, unscientific, and inhumane, I would be more likely to lose respect for that person than to stop using the data.

Teaching Metrics

I teach software metrics at Florida Tech. These days, I start the course with chapters from Tockey’s Return on Software: Maximizing the Return on Your Software Investment. We study financial statistics and estimate the future cost of a hypothetical project. The students see a fair bit of uncertainty. (They experience a fair bit of uncertainty–it can be a difficult experience.) I do this to help my students gain a broader view of their context.

When an executive asks them for software engineering metrics, they are being asked to provide imperfect metrics to managers who are swimming in a sea of imperfect metrics.

It is important (I think very important) to pay attention to the validity of our metrics. It is important to improve them, to find ways to mitigate the risks of using them, and to advise our clients about the characteristics and risks of the data/statistics we supply to them. I think it’s important to use metrics in ways that don’t abuse people. There are ethical issues here, but I think the blanket condemnation of metrics like pass/fail ratios does not begin to address the ethical issues.

The Principles

In the context-driven principles, we wrote (more precisely, I think, I wrote) “Metrics that are not valid are dangerous.” I still mostly (*) agree with these words, but I think it is too easy to extend the statement into a position that is dogmatic and counterproductive. If I were writing the Principles today, I would reword this statement in a way that acknowledges the difficulty of the problem and the importance of the context.

(*) The phrase “Metrics that are not valid” is inaccurately absolute. It is not proper to describe a metric as simply valid or not valid (see Trochim and Shadish, Cook & Campbell, for example). Rather, we should talk about metrics as more valid or less valid (shades of gray). The wording “not valid” was a simplification at the time, and in retrospect, should be seen as an oversimplification.