AST Comments on Online Exams

AST Comments on Online Exams

AST was recently asked for an opinion about online examinations for certification bodies, specifically online Bar Exams.  Our conclusion is that to properly serve their technical and social purposes, online examinations must be administered in a fair and unbiased...
Call for Volunteers – AST Webinar Series

Call for Volunteers – AST Webinar Series

Part of the role of the Association for Software is creating a flow of information for the software testing community. For the past year, we have been running an experiment in webinars. We quickly learned a lesson; board members can not run programs that benefit the...

First timer at STPCon 2017 in Phoenix

In few hours I will be flying to US for STPCon 2017 which will be held in Phoenix, Arizona between 14 – 17 March.  Obviously I am excited by the opportunity of speaking and delivering a workshop at the conference.  If you are around, I would …

Agile hasn’t failed you..

Agile has been touted by people (who understand it) as an approach, a value centre, a mindset and philosophy. I wrote a short spiel on Agile some time ago in an effort to clarify how I see it.

For quite a while I have been observing some interesting posts and discussions going on LinkedIn.

One of the topics that I recently noticed was Agile failing organisations. If we regard Agile as a tool that a team or organisation might choose to use, then perhaps we can understand the failure of Agile for that organisation, as we might understand and critique the failure of any other tool the organisation might use. A hammer can certainly fail a carpenter if it breaks during carpentry work. But, if the carpenter does not know how to use a hammer, it is not the hammer’s fault, or is it? (Just so you know, this analogy does not represent Agile as a tool).
Let’s not jump too prematurely to any conclusions. Instead, let’s try to cognitively analyse if there is a problem here. Jerry Weinberg’s Rule of Three* states that if you can’t think of at least three different interpretations of what you have received, you haven’t really thought enough about what it might mean. Another version of this rule that my friend Jari Laakso suggested was, “If you can’t think of three things that might go wrong with your plans, then there’s something wrong with your thinking.”

When someone says that Agile has failed them (in other words, their Agile way of working was not successful), the actual problem might have been:
  1. We don’t know enough about Agile and we tried to “do Agile” rather than “be Agile”.
  2. We thought we knew about Agile and we implemented it the way we knew it. What we did didn’t work..
  3. We thought Agile was predominantly about specific practices and conventions: using post-it notes, having daily standups, having sprints and not much else. We couldn’t deliver anything..
In some contexts, any (or all) of these cases may have been a key contributor to the failure of Agile.

What troubles me is that many people who blame an approach or a methodology, do not in fact try to first understand that approach or methodology.** There was a mention of waterfall methodology somewhere and most people in the discussion did not know about waterfall’s origin. Someone mentioned Winston Royce and disappointingly it turned out that even that person had selective take of the paper and decided to conveniently forget about the last few sections of Royce’s paper which are very important.
More often than not, Agile methodologies are implemented incorrectly. Some implementers don’t realize that there are Agile values and principles (Jari reminded me about ScrumButs). Some have not taken time to look at and understand the Agile manifesto. I have done this experiment of asking anyone who mentions Agile whether they have actually read the manifesto. A large number of those had not. Many of those who had read the manifesto, did not try to  understand it well. Sadly those who understood it, could not implement what an approach as outlined by the Manifesto, because their organisations weren’t ready.
It is indeed often easier to blame a methodology or an approach. Agile (mindset) adoption and implementations of related frameworks can fail for many reasons. What is important is to investigate what went wrong and whether that could be avoided. Even more important is to understand an organisation’s culture and whether the organisation and the approach are good fit for each other. Jerry says in his second rule of consulting, “No Matter how it looks at first, it’s always a people problem.” A good Agile coach might be able to help bring a mindset change if not the culture change.

So, as often the case may be, Agile hasn’t failed you, you may have failed Agile.

Acknowledgements: Thanks to Paul Szymkowiak and Jari Laakso for their feedback and recommendations.

Quality Index : Fooled by Measurement

STPCon published this article on their website. I am happy to share it on my blog as well. 

The software industry seems to have an obsession with metrics and measurement. We want to quantify everything. Once upon a time everything was about counting lines of code (KLOC). Managers ran around asking, “How many lines of code have you written? How many bugs per KLOC are there? What is the size of project in KLOC?” etc. Then we started counting everything else that was left from quantifying KLOC and managers started asking, “How many requirements are there? What is the number of test cases? How many bugs did you find? What is the defect density? How many test cases are passed? What is the requirement coverage?” etc.

The obsession with quantification is often an influence from the manufacturing industry where you can count things that are physical and are visible to eyes. However, counting things in the software industry appears to have helped consultants who sell the premise that charts, graphs and measures based on invalid constructs are meaningful. The problem is often the misinterpretation of these metrics.
At one of my former workplaces, the testing team used to generate a report which had a metric called “Quality Index” (QI). The objective with this measure, that the team explained to me, was to have some indication of quality and performance. The managers needed an indicator to assess development performance (for example, are there any issues with understanding, communications, requirements, process, and so forth). The QI measure was considered a yardstick. That is, every time a new build was tested, QI could tell managers how good (or bad) the build was.
However, there is a problem with using yardsticks. They can’t be used for measuring something that is subjected to interpretation or is subjective in nature. Often they are good heuristics and can be useful as a first order metric, but mostly for physical measurement only.
A metric, like quality index, may be used as an indicator to ask, “is there a problem here?” In that case it becomes a heuristic. Because it is fallible. It may help you find a solution, but it may never guarantee one. Using heuristics for assessing your key employees performance is dangerous. You may lose their trust and respect and they may eventually leave (unless that is what you actually want).
Metrics are a powerful tool but they can always be misinterpreted and can be skewed to show favorable (or otherwise) results. Without context they are very much meaningless. Quantitative measurement can lead to a false sense of control. It creates an illusion that we can understand and control something because we can count it. Someone mentioned in an online forum about quality that if you can’t measure it, you can’t have it. I guess this person was referring to Tom DeMarco who wrote in his book “Controlling Software Projects, Management Measurement & Estimation, (1982), p. 3.” that – you can’t control what you can’t measure. In a recent discussion Michael Bolton reminded me that few years ago Tom renounced from the opinion that he has held. His recent views can be read here.
As I mentioned earlier, metrics are often used as a sales tool by consultants to gain more business. I was recently invited for dinner at a 5-star hotel by the testing group of a large bank in Australia. I wasn’t aware that the dinner was actually hosted by the bank’s testing vendor, a huge I.T. outsourcing firm. This vendor’s introductory presentation included the data of efficiency they brought to their banking client. These details included automating 3500 test cases, reducing test preparation time by 70% and so forth. As James Christie said in his blog post, “100 is bigger than 10. 10,000 is pretty impressive, and 100,000 is satisfyingly humongous. You might not really understand what’s happening, but when you face up to senior management and tell them that you’re managing thousands of things, well, they’ve got to be impressed.” This vendor certainly impressed their naïve client.

What is this beast called Quality index?

The QI index that was used by the teams I worked with had this definition:
The QI has been defined as a measure of defect density, such that the percentage of defects as a proportion of the total number of test cases executed is defined.

This is a measure of company’s software quality delivery to testing as opposed to company production quality.

Lower QI is better.
The report used to have statements like:
171 test cases executed successfully and 93 defects detected, providing a Quality Index (QI) = 54% (this is within the 1 – Unsatisfactory level).
There were graphs like the one below which explained what and how the quality of the build has been:

“So what’s the problem here?”, you may ask.
This seems to be a valid question, especially when you have been told about these indices and were presented with data that seemed accurate. We see such indices on TV every day where some eminent economist is presenting his view on the economy and predicting which way the markets will go, and later convincingly explains why the markets did not go the way he predicted. The simplest answer is that no one, including Nobel Prize winning economists, can predict the future. Humans simply do not have the ability to predict. You may say that you can predict that you will read the next word on this post – but even that is unpredictable. What you would actually mean is, “I predict that I might be able to read the next word provided the boss doesn’t call right at that moment, or the monitor doesn’t lose power or the sky doesn’t fall or..!” The list can go on.
I studied Statistics as one of the subjects during my Masters degree. While that study did not make me an expert in statistics, it did improve my knowledge of the subject though. And I think it will also help us examine what is the problem with this quality index. Let’s start by looking at definitions.

What is “quality”?

Jerry Weinberg defines quality as “value to some person(s)”. James Bach and Michael Bolton added ‘…who matter.’ to this definition. So the definition that I like is, “Quality is value to some person(s) who matter”.
Michael Bolton suggests that decisions about quality are always political and emotional; made by people with the power to make them; made with the desire to appear rational and yet ultimately based on how those people feel.
Let’s have a look at the overall definition once again. “The QI at Company X has been defined as a measure of defect density, such that the percentage of defects as a proportion of the total number of test cases executed is defined.”
What catches our attention is the term “defect”. Although I prefer calling them bugs.

Is there a point in defining defect density?

James Bach says that a bug is anything that threatens the value of a product – something that bugs someone, whose opinion matters (this last part was added by Michael Bolton). This definition automatically makes the benefits that someone may be seeking from quality index highly questionable. Like predicting the future, humans do not have the ability to find and explore all bugs that might be there in a system. Michael Bolton notes that “the idea of a “bug” is subject to the Relative Rule meaning a bug is not a thing that exists in the world; it doesn’t have a tangible form. However, a bug is a relationship between the product and some person. A bug is a threat to the value of the product to some person. The notion of a bug might be shared among many people, or it might be exclusive to some person.” So, there may be little point quantifying something that does not have a physical existence. Once people start counting bugs, they start falling in love with them. They feel that they own these conceptual non-physical things. Psychology defines this as reification, the perception of an object as having more spatial information than is actually present. Is it worthwhile defining density of an abstract concept or of something that is subject to relative rule?

The wild world of test cases

The definition of QI also talks about deriving a percentage as a ratio of number of test cases. What is your definition of a test case? My team stopped writing lengthy, step-to-step test cases or scripts in a deliberate move away from the idea that testers should develop test cases based on a requirements document. What I have observed is that many testers take a requirements document and create a number of test cases for each requirement like positive, negative, sedative, nonsense-itive and so on. These testers wrongly believe that these test cases provide complete coverage1 for the requirements. They create a Requirement Traceability Matrix (RTM) which is usually a table with test cases on one axis and requirements on the other. If all requirements have a mapped test case, then coverage is complete. The managers believe that these detailed test cases help their tester perform complete testing. Managers get upset when coverage metrics are not there or the RTM is missing.
What they don’t realize is that when they say test, what they really mean is check. Then, a single requirement may mean more than one assumption, proposition or assertion. A business analyst who writes business stakeholders requirements may interpret them entirely differently than the stakeholder herself. A developer may interpret them differently and similarly a tester may interpret and intersect the requirements in a very different fashion not understood or agreed by others. Hence writing a test case per requirement or multiple test cases per requirement sounds completely incorrect. If that is incorrect, then the ratio based on an incorrect number would be wrong too. And therefore, that makes the concept of Quality Index meaningless.
So what do you think of this claim now:
This is a measure of Company X’s software quality delivery to testing as opposed to Company X production quality.

Lower QI is better.
The QI as it is defined is not a measure of anything of value about the software or its quality – and is easily gamed (e.g. just split test cases into smaller test cases and hence immediately reduce the QI for exactly the same piece of software under test with exactly the same number of defects!). The QI itself even tells you that the testing it relates to (and the way it is being measured) is not great, by saying “Lower QI is better” – this means you are striving to make test cases that don’t find defects, why would you do that? Ethically we should not.
As skilled craftsmen, we should not waste time counting and showing percentages when we could best spend our time talking to our stakeholders about the things we see, the things that interest us, things that looks suspicious, the risks we observe, and the overall quality as we perceive it.
So, the next time you are counting test cases or bugs, working on a RTM or looking at percentages in reports, ask yourself, “Am I simply counting and giving statistics or am I helping our organization to deliver a quality product with the lowest risk of failure?”