Giving the good advice


In my previous post I tried to explain that software testing, as it happens in practice, cannot be represented as a logical and orderly (coherent) sequence of events. There are too many factors—specific circumstances—that influence the decision making process that guides testing. The study of testing is the study of this decision making process. It is driven by detailed information about the circumstances. And yet, in many publications about software testing, these details are disregarded. What remains is a representation of testing that is orderly, classified and coherent; a representation that often lacks the details (facts) by which we can verify the selected approach, method or model.

One such representation—which I mentioned in my comment on the previous post—is the catalog; a list of systematically arranged items. Usually, such a list contains items that are abstracted from reality. Items can be resources (such as software testing tools), skills, characteristics, methods, practices, preferences and many other things. The list can have many purposes; it can serve, for example, as a checklist, an overview, a heuristic, a process description or a guideline. In software testing we have a wide diversity of catalogs and my argument is that we focus too often on achieving an orderly, classified and coherent representation, while forgetting about reality.

As an example I would like to discuss a catalog that is presented in the EuroSTAR blog post G(r)ood testing 25: Tips for how to boost Unit testing as a Functional Tester. The catalog that is offered is called ‘Tips for boosting Unit testing’. In order to understand what the list looks like, a part of it is presented below.

Tips for boosting Unit testing

  • Focus on doing useful tests (tests deliver information) rather than discussing whether the tests are unit tests or functional tests
  • Ask the developer for Help (or offer him to help?), rather than telling him he needs to do UT
  • Pair the tester and developer together while defining the tests
  • Demonstrate the damage of not doing UT – E.g. showing that late found defects slow everyone down
  • Organize coding Dojo’s where you put unit testing on the agenda
  • Introduce test design techniques to the developers
  • Get an understanding about the build and deployment process

I am currently involved in setting up unit testing in an Agile team, so the topic of unit testing—and putting unit testing on the agenda—is familiar to me. When I read through the list that is mentioned in the blog post, at first glance I feel that the advice that is presented is sensible and practical. I even applied some of the tips that are on the list. It appears to contain solid and practical advice to someone wanting to make more of unit testing.

But is it really solid and practical advice? The fact that it looks very familiar to me might suggest that it is folklore, which is defined by Merriam-Webster as “an often unsupported notion, story, or saying that is widely circulated.” Could it be that these tips are widely circulated and therefore look very familiar to me? This could be an interesting area for further investigation. And then the next question: is it good advice? Some state that “most advice is terrible”. This statement is made in one of the first links that turned up when I searched for the phrase “good or bad advice” in Google. It is interesting, because it suggests that this list of tips for unit testing that we are looking at is likely to be terrible. But it also suggests that we should probably look further and come to a more nuanced perspective of what advice means. In order to get to this nuanced perspective we need to study the nature of the advice, its context, the persons involved, etc.

In order to evaluate the advice that is given to us, we need to know more about it. The blog post reveals quite a number of things. The list of tips was the outcome of a brainstorm session that was held during the 21st testing retreat in Château de Labusquière à Montadet in France. We learn that senior testers (twelve in total) from various countries were present. The fact that all testers were senior suggests that the advice has been derived from years of experience in software testing. With the means at hand it is not possible to find out if that experience encompasses unit testing, so we have to assume this. But all in all, the word ‘senior’ suggests that the advice should be good, since it must have been tested in practice extensively.

There are other aspects of the advice that can be learned from the text. For example, a motivation for giving the advice is stated (to help testers in their struggle) and in a very general sense a couple of situations are described in which the advice might be of use. But apart from the fact that the advice is likely to come from experience, we have no other indications that the advice is good. Moreover, we lack information by which we can verify whether the advice is good or not. Sure, one can apply each of the tips to the best of one’s abilities, but if this leads to (horrible) failure, how does this reflect on the quality of the advice? Perhaps the advice was bad indeed, but the failure to deliver may also have been caused by shortcomings on the tester’s part. Perhaps she did not understand the advice or she lacked the skills to apply it. Furthermore, there may have been circumstances adverse to implementing the advice. Perhaps the timing was wrong, the order in which it was applied was incorrect or certain preconditions were not met.

By now it should be obvious that we lack two things. Firstly, the criteria by which we can evaluate the success or failure of the advice that is given are not present. And secondly, we lack information about the context in which the advice is applicable and what is needed to apply it. Without these criteria it is possible to give any sort of advice. If I wanted, for example, to get from Utrecht in the Netherlands to New York, the advice might be to get on a plane. This advice contains a huge amount of implicit assumptions that are essential to the success of the advice. In other words: the advice is useless. Perhaps the same can, for example, be said of the advice to “introduce test design techniques to the developers.” I can see how test design techniques help us make an informed decision about domain coverage, but aren’t test design techniques usually based on detailed functional specifications? What is there to cover when we are coding and learning about the functioning of the application in parallel? And how do we know what technique to apply if we know little about risk? And if a test design technique tells us that we should write, say, thirty unit tests, how would I deal with the boredom of writing those tests? How would I handle the frustration of the developer who wrote these elaborate design-technique-based tests and has to throw them out three sprints later because of new insights with regards to the functionality? And which tests should I write that are not based on test design techniques? And what should they be based on?
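As an aside, and purely as a sketch of my own (the blog post contains no code): if “introduce test design techniques to the developers” means anything at the unit level, it might look like the boundary value analysis below. The validate_age function, its 18–65 domain and the use of Python with pytest are all assumptions for the sake of illustration. Whether writing these six cases is worth more than one well-chosen question to the developer is exactly the kind of thing the list of tips does not tell us.

    # A minimal sketch, not taken from the blog post: boundary value analysis
    # applied at the unit level. The function and the 18..65 domain are
    # hypothetical; pytest is assumed as the test runner.
    import pytest

    def validate_age(age: int) -> bool:
        """Hypothetical unit under test: ages 18 through 65 are accepted."""
        return 18 <= age <= 65

    # One case just outside, on, and just inside each boundary of the domain.
    @pytest.mark.parametrize("age, accepted", [
        (17, False), (18, True), (19, True),   # lower boundary
        (64, True), (65, True), (66, False),   # upper boundary
    ])
    def test_validate_age_boundaries(age, accepted):
        assert validate_age(age) is accepted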

As an afterthought, it may be a nice exercise in critical evaluation to add tips to the list and see if the list gets better or deteriorates because of it. If creativity is needed just look up the Celestial Emporium of Benevolent Knowledge, which is a brightly shining example of a catalog.

Never in a straight line


The theme of the seventh annual peer conference of the Dutch Exploratory Workshop on Testing (DEWT7) is lessons learned in software testing. In the light of that theme I want to share a lesson recently learned.

Broadly stated, the lesson learned is that nearly any effort in software testing develops in a non-linear way. This may seem like stating the obvious, but I find that it contrasts with the way software testing is portrayed in many presentations, books and articles. It is likely that, due to the limitations of the medium, decisions must be made to focus on some key areas and leave out seemingly trivial details. When describing or explaining testing to other people, we may be inclined to create coherent narratives in which a theme is gradually developed, following logical steps.

Over the last couple of months I came to realize something that I’ve been experiencing for a longer time: the reality of testing is not a coherent narrative. Rather, it is a series of insights based on a mixture of (intellectual) effort and will, craftsmanship, conflicts and emotions, personality and personal interests and, last but certainly not least, circumstance, including chance and serendipity. The study aimed at the core of testing is the study of the decision making process that the software tester goes through.

My particular experience is one of balancing many aspects of the software development process in order to move towards a better view of the quality of the software. I spent six full weeks refactoring an old (semi) automated regression test suite in order to be able to produce test data in a more consistent manner. As expected, there was not enough time to complete this refactoring. Other priorities were pressing, so I got involved in the team’s effort to build a web service and assisted in setting up unit testing. My personal interest in setting up unit testing evolved out of my conviction that the distribution of automated tests as shown in Cohn’s Test Automation Pyramid is basically a sound one. The drive to make more of unit testing was further fueled by a presentation by J.B. Rainsberger (Integrated Tests Are A Scam). I used unit testing to stimulate the team’s thinking about coverage. I was willing to follow through on setting up a crisp and sound automation strategy, but having set some wheels in motion I had to catch up with the business domain. With four developers in the team mainly focusing on code, I felt (was made to feel) that my added value to the team was in learning as much as needed about why we were building the software product. To look outward instead of inward. And this is where I am at the moment, employing heuristics such as FEW HICCUPS and CRUSSPIC STMPL (PDF) to investigate the context. It turns out that my investment in the old automated regression test suite to churn out production-like data is now starting to prove its worth. Luck or foresight?
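To make the test data point a little more tangible, here is a minimal sketch, with hypothetical names, of the kind of refactoring I mean: a single builder with production-like defaults lets the suite churn out consistent data, while each test overrides only the detail it cares about. The Policy record and its fields are invented for illustration; the actual suite and domain are not shown in this post.

    # A sketch only, not the actual regression suite: a hypothetical Policy
    # record with production-like defaults and a builder for consistent data.
    from dataclasses import dataclass, replace

    @dataclass(frozen=True)
    class Policy:
        holder: str = "J. Jansen"
        product: str = "household"
        premium: float = 12.50
        active: bool = True

    def a_policy(**overrides) -> Policy:
        """Builder: consistent, production-like defaults; explicit overrides per test."""
        return replace(Policy(), **overrides)

    # Each test states only what matters to it.
    cancelled_policy = a_policy(active=False)
    expensive_policy = a_policy(premium=250.0)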

All this time a test strategy (a single one?) is under development. Actually, there have been long (and I mean long) discussions about the test approach within the team. I could have ‘mandated’ a testing strategy from my position as the person in the team with the most experience in testing. Instead I decided to provide a little guidance here and there, but to keep away from a formal plan. Currently the test strategy is evolving ‘by example’, which I believe is the most efficient way and also the way that keeps everyone involved and invested.

The evolution of the understanding of the quality of the software product is not a straight path. Be skeptical of anything or anyone telling you that testing is a series of more or less formalized steps leading to a more or less fixed outcome. Consider that the evolution of the understanding of quality is impacted by many factors.


On the Value of Test Cases


Something is rotten in the state of Denmark.

William Shakespeare – Hamlet

Over the period of a couple of weeks, I was able to observe the usage of test cases in a software development project. The creation of test cases started at the moment when the functional specifications were declared to be relatively crystallized. The cases were detailed in specific steps and entered into a test management tool, in this case HP Quality Center. They would be reviewed and, in due time, executed, and the results would be reported to project management.

During these weeks after the finalization of the functional specifications, not a lot of software was actually built, so the testers involved in the project saw the perfect chance to prepare for the coming release by typing their test cases. They believed that they had been given a blissful moment before the storm, in which they would strengthen their approach and do as much of the preparatory work as they could, in order to be ready when the first wave of software would hit. Unfortunately, preparation, to these testers, meant the detailed specification of test cases for software changes that still had to be developed, a system that was partly unknown or unexplored by them, and functional specifications that proved to be less than ready.

There is no need to guess what happened next. When eventually software started coming down the line, the technical implementation of the changes was not quite as expected, the functional specifications had changed, and the project priorities and scope had shifted because of new demands. It was like the testers had shored up defenses to combat an army of foot soldiers carrying spears and they were now, much to their surprise, facing cannons of the Howitzer type. Needless to say, the defenders were scattered and forced to flee.

It is easy to blame our software development methods for these situations. One might argue that this project has characteristics of a typical waterfall project and that the waterfall model of software development invites failure. Such was argued in the 1970s (PDF, opens in new window). But instead of blaming the project we could ask ourselves why we prepare for software development the way we do. My point is that by pouring a huge amount of energy into trying to fixate our experiments in test cases (and rid them of meaning — but that’s another point), we willingly and knowingly move ourselves into a spot where we know we will be hurt the most when something unexpected happens (see Nassim Nicholas Taleb’s Black Swan for reference). Secondly, I think we seriously need to reassess the value of drawing up test cases as a method of preparation for the investigation of software. There are dozens of other ways to prepare for the investigation of software. For one, I think, even doing nothing beats defining elaborate and specific test cases, mainly because the former approach causes less damage. It goes without saying that I do not advocate doing nothing in the preparation for the investigation of software.

As a side note, among these dozens of other ways of preparing for the investigation of software, we can name the investigation of the requirements, the investigation of comparable products, having conversations with stakeholders, having conversations with domain experts or users, the investigation of the current software product, the investigation of the history of the product, the reading of manuals, etc. An excellent list can be found in Rikard Edgren’s Little Black Book on Test Design (PDF, opens in new window). If you’re a professional software tester, this list is not new to you. What it intends to say is that testers need to study in order to keep up.

Yet the fact remains that the creation of test cases as the best way to prepare for the investigation of software still seems to be what is passed on to testers starting a career in software testing. This is what is propagated in the testing courses offered by the ISTQB or, in the Netherlands, by TMap. This approach should have perished long ago for two reasons. On the one hand, and I’ve seen this happen, it falsely lures the tester into thinking that once we’re done specifying our test cases, we have exhausted and therefore finalized our tests. It strengthens the fallacy that the brain is only engaged during the test case creation ‘phase’ of the project. We’re done testing when the cases are complete and what remains is to run them, obviously the most uninspiring part of testing.

The second thing I’ve seen happening is that test case specification draws the inquiring mind away from what it does best, namely to challenge the assumptions that are in the software and the assumptions that are made by the people involved in creating the (software) system — including ourselves. Test case creation is a particular activity that forces the train of thought down a narrowing track of confirmation of requirements or acceptance criteria, specifically at a time when we should be widening our perspectives. By its focus on the confirmation of what we know about the software, it takes the focus away from what is unknown. Test case creation stands in the way of critical thinking and skepticism. It goes against the grain of experimentation, in which we build mental models of the subject we want to test and iteratively develop our models through interaction with the subject under test.
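To illustrate the contrast, here is a sketch of my own (nothing from the project; the function, the numbers and the Python/hypothesis setup are all assumptions). The first test merely confirms one case lifted from a requirement; the second states an expectation about the whole input domain and lets the tool hunt for a counterexample, which is closer to an experiment that can challenge an assumption than to a confirmation of what we already know.

    # Hypothetical unit under test: discounted price in whole cents.
    from hypothesis import given, strategies as st

    def apply_discount(price_cents: int, percent: int) -> int:
        return price_cents * (100 - percent) // 100

    def test_ten_percent_off():
        # Confirmation: one scripted case taken straight from the requirement.
        assert apply_discount(10_000, 10) == 9_000

    @given(price_cents=st.integers(0, 1_000_000), percent=st.integers(0, 100))
    def test_discount_never_increases_the_price(price_cents, percent):
        # Experiment: challenge an assumption across the whole input domain.
        assert apply_discount(price_cents, percent) <= price_cents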

If there is one thing that I was forced to look at again during the last couple of weeks — during which I was preparing for the testing of software changes — it was the art of reasoning and asking meaningful questions. Though I feel confident when asking questions, and though I pay a lot of attention to the reasoning that got me to asking exactly that particular set of questions, I also still feel that I need to be constantly aware that there are questions I didn’t ask that could lead down entirely different avenues. It is possible to ask only those questions that strengthen your assumptions, even if you’re not consciously looking for confirmation. And very much so, it is possible that answers are misleading.

So for the sake of better testing, take your ISTQB syllabus and — by any means other than burning — remove the part on test cases. Replace it with anything by Bacon, Descartes or Dewey.

“Criticism is the examination and test of propositions of any kind which are offered for acceptance, in order to find out whether they correspond to reality or not. The critical faculty is a product of education and training. It is a mental habit and power. It is a prime condition of human welfare that men and women should be trained in it. It is our only guarantee against delusion, deception, superstition, and misapprehension of ourselves and our earthly circumstances. Education is good just so far as it produces well-developed critical faculty. A teacher of any subject who insists on accuracy and a rational control of all processes and methods, and who holds everything open to unlimited verification and revision, is cultivating that method as a habit in the pupils. Men educated in it cannot be stampeded. They are slow to believe. They can hold things as possible or probable in all degrees, without certainty and without pain. They can wait for evidence and weigh evidence. They can resist appeals to their dearest prejudices. Education in the critical faculty is the only education of which it can be truly said that it makes good citizens.”

William Graham Sumner – Folkways: A Study of Mores, Manners, Customs and Morals

Not a Conference on Test Strategy


A response to this blog post was written by Colin Cherry on his weblog. His article is entitled (In Response to DEWT5) – What Has a Test Strategy Ever Done for Us?


On page one, line two of my notes of the 5th peer conference of the Dutch Exploratory Workshop on Testing — the theme was test strategy — the following is noted:

Test (strategy) is dead!

And scribbled in the sideline:

Among a conference of 24 professionals there seems to be no agreement at all on what test strategy is.

In putting together a talk for DEWT5 I struggled to find examples of me creating and handling a test strategy. In retrospect, perhaps this struggle was not as much caused by a lack of strategizing on my part, as it was caused by my inability to recognize a test strategy as such.

Still I find it utterly fascinating that in the field of study that we call ‘software testing’ — which has been in existence since (roughly) the 1960s — we are at a total loss when we try to define even the most basic terms of our craft. During the conference it turned out that there are many ways to think of a strategy. During the open season after the first talk, by the very brave Marjana Shammi, a discussion between the delegates turned into an attempt to come to a common understanding of the concept of test strategy. Luckily this attempt was nipped in the bud by DEWT5 organizers Ruud Cox and Philip Hoeben.

For the rest of the conference we decided to put aside the nagging question of what we mean when we call something a test strategy, and just take the experience reports at face value. In hindsight, I think this was a heroic decision, and it proved to be right because the conference blossomed with colourful takes on strategy. In particular, Richard Bradshaw‘s persistent refusal to call his way of working — presented during his experience report — a ‘strategy’ now stands out not so much as an act of defiance as an act of sensibility.

A definition of test strategy that reflects Richard’s point of view, and that was mentioned in other experience reports as well, is that a strategy is “the things (that shape what) I do”.

And yet I couldn’t help overturning the stone one more time during lunch on Sunday with Joep Schuurkes and Maaret Pyhäjärvi. Why is it that we are in a field of study that is apparently in such a mess that even seasoned professionals among themselves are unable to find agreement on definitions and terms? I proposed that the field of surgery, for example, will have very specific and exact definitions of, say, the way to cut through human tissue. Why don’t we have such a common language?

Maaret offered as an answer that there may have been a time in our field of study when the words ‘test strategy’ meant the same thing to a relatively large number of people. At least we have books that testify to a test strategy in a confident and detailed way. The fact that the participants of the fifth conference of the Dutch Exploratory Workshop on Testing in 2015 are unable to describe ‘strategy’ in a common way perhaps reflects the development of the craft since then.

The Tower of Babel by Pieter Bruegel the Elder (1563)

As a personal thought I would like to add to this that we should not necessarily think of our craft as a thing that progresses (constantly). It goes through upheavals that are powerful enough to destroy it, or to change it utterly. It may turn out that DEWT5 happened in the middle of one of these upheavals; one that forced us to rethink the existence of a common language. The biblical tale of the tower of Babel suggests that without a common language, humans are unable to work together and build greater things. Perhaps the challenge of working together and sharing knowledge without having access to a common language is what context-driven testing is trying to solve by adhering to experience reports. ISTQB and ISO 29119 are trying to fix the very same problem by declaring the language and forcing it upon the testing community. This is a blunt, political move, but, like the reaction from the context-driven community, it is also an attempt to survive.

With regards to my ‘surgery’ analogy, Joep suggested that surgeons deal with physical things and as such, they have the possibility to offer a physical representation of the definition. Software testing deals with the intangible, and as such our definitions are, forever, abstractions. If we want to look for analogies in other domains then perhaps the field of philosophy is closer to software testing. And in philosophy the struggle with definitions is never ending; it runs through the heart of this field. Maybe it is something we just need to accept.

The Cheeseburger Standard


Last evening I picked up Peopleware by Tom DeMarco and Timothy Lister. It has been, for quite some time now, a book that I open when I want an entertaining view on software development that states some home truths about the game. I opened up chapter 2, read a couple of pages and was struck by how much this text relates to the ISO 29119 debate.

Chapter 2 – entitled ‘Make a cheeseburger, sell a cheeseburger’ – starts off as follows (in all quotations, emphasis mine).

Development is inherently different from production.

I am still, and always will be, greatly in awe of the fact that quite a number of people in the software industry are blissfully unaware of this and go full Taylor on everything that moves.

But managers of development and allied efforts often allow their thinking to be shaped by a management philosophy derived entirely from a production environment. Imagine for the moment that you’re the manager of the local fast food franchise. It makes perfect sense for you to take any or all of the following efficient production measures:

  • Squeeze out error. Make the machine (the human machine) run as smoothly as possible.
  • Take a hard line about people goofing off on the job.
  • Treat workers as interchangeable pieces of the machine.
  • Optimize the steady state. (Don’t even think about how the operation got up to speed, or what it would take to close it down.)
  • Standardize procedure. Do everything by the book.
  • Eliminate experimentation – that’s what the folks at the head-quarters are paid for.

These would be reasonable approaches if you were in the fast food business (or any production environment), but you’re not. The “make a cheeseburger, sell a cheeseburger” mentality can be fatal in your development area. It can only serve to damp your people’s spirit and focus their attention away from the real problems at hand. This style of management will be directly at odds with the work.

To manage thinking workers effectively, you need to take measures nearly opposite those listed above.

And further on, on the subject of making errors.

Fostering an atmosphere that doesn’t allow for error simply makes people defensive. They don’t try things that may turn out badly. You encourage this defensiveness when you try to systematize the process, when you impose rigid methodologies so that staff members are not allowed to make any of the key strategic decisions lest they make them incorrectly. The average level of technology may be modestly improved by any steps you take to inhibit error. The team sociology, however, can suffer grievously.

Further on, on the subject of the steady state.

Steady-state production thinking is particularly ill-suited to project work. We tend to forget that a project’s entire purpose in life is to put itself out of business. The only steady state in the life of a project is rigor mortis. Unless you’re riding herd on a canceled or about-to-be-canceled project, the entire focus of project management ought to be the dynamics of the development effort. Yet the way we assess people’s value to a new project is often based on their steady-state characteristics: how much code they can write or how much documentation they can produce. We pay far too little attention to how well each of them fits into the effort as a whole.

And lastly, on the subject of doing without thinking.

If you are charged with getting a task done, what proportion of your time ought to be dedicated to actually doing the task? Not one hundred percent. There ought to be some provision for brainstorming, investigation, new methods, figuring out how to avoid doing some of the subtasks, reading, training and just goofing off.

The steady-state cheeseburger mentality barely even pays lip service to the idea of thinking on the job. Its every inclination is to push the effort into one hundred percent do-mode.

The dilemma in software testing that is characterized by ISO 29119 is whether we regard software testing as a factory process or as an act of human investigation. As the quotations from Peopleware showed, this dilemma is far from new. The ISO 29119 people may strongly doubt the experiences written down by Tom DeMarco and Timothy Lister as much as we (context-driven) testers doubt the way of working that is imposed by the Central Committee Working Group 26. I choose to believe that software testing is an act of investigation because from what I’ve experienced so far, the reality of software development looks much like it is described by DeMarco and Lister. If, however, the reality of software development is the exact opposite of what is described by DeMarco and Lister and the factory approach does indeed lead to better software each and every time, then I think the backers of the ISO 29119 standard should come forward, refute the evidence of software development as a human act, and convince us by showing us the reality of software development as they experience it.

References

Tom DeMarco, Timothy Lister (1999). Peopleware. Dorset House Publishing Company.

Communication Between the Hominids


How do we build the theories that describe what we think testing is? How do we evaluate them?

Five minutes into a presentation I attended at the Dutch TestNet Spring Event, the speaker recklessly confronted the audience with the following phrase.

communication between the disciplines

For me that was a clear call to run for the exit. The title of the talk was Test Improvement is Something You Do in the Workplace and I attended it hoping that I would learn a thing or two from hearing another tester’s perspective on how to improve testing. The phrase ‘communication between the disciplines’, however, ignited my fear that this talk was not going to be about humans. When the speaker announced that we would do an exercise and consequently checklists were handed out, I was dead sure.

Later in the evening I reflected on my moment of frustration and on why the word ‘discipline’ startled me. If you quickly substitute ‘the disciplines’ with ‘the people on the project’, which is probably what you did already without even noticing it, then there is nothing wrong with that phrase. But we should notice that ‘communication between the disciplines’ actually means something different.

According to my Oxford Paperback Dictionary & Thesaurus a discipline is a branch of academic study. A discipline has a field of study, is likely to have a paradigm and will have ways of doing research. Here is a taxonomy of academic disciplines (PDF).

The concept ‘discipline’ is an abstraction, and the use of the word discipline to indicate people doing different tasks on a software project is indicative of a particular point of view. It shows how a theory of software testing chooses to identify and classify entities in its realm. In this case it is a theory that is based on the use of ‘discipline’ as a classification mechanism. ‘Discipline’, in this theory, serves as a mechanism that abstracts from the realm of software testing exactly those aspects that serve a purpose to the theory. Exclusively, or most preferably, the elements that form the concept of a discipline are those that lend ultimate support to this theory of software testing.

This means that this particular theory of software testing decides to regard the humans doing particular tasks in a software project not from the perspective of them being human, but from the perspective of them working in a profession that originates from an academic field of study. The theory states that the latter perspective is by far more useful; it accounts for the phenomena that occur when doing testing in an excessively superior way.

I was inclined to dismiss this point of view right away. But I think further investigation is warranted. If this theory speaks of ‘disciplines’ rather than ‘people’ then there should be in the literature relating to this theory an examination of the disciplines that interact with software testing, and for each of these disciplines a clarification of how aspects of the discipline are relevant to the theory and how other perspectives are not. I’m assuming there are case studies or field studies too.

As of yet, however, I have been unable to find solid evidence that the ‘disciplines’ perspective trumps the ‘human’ perspective when it comes to communicating with other people on the project. Since conclusive evidence is lacking,  the speaker in the presentation mentioned above would be required to at least add a disclaimer to his ‘disciplines’ perspective and inform his audience that he is using a highly contestable abstraction. As you can guess, he did not say a word about it and I reacted too slowly to question his reasoning. Frankly, I was too infuriated.

In my current project I have five software developers. In theory their work is the subject of investigation of the following academic field of study.

Physical Sciences and Mathematics: Computer Sciences: Software Engineering

When this team creates software there are discussions on almost every aspect of software engineering. There are different points of view on what should be in the definition of done, how we should write our unit tests, how far refactoring should go, what should be documented where, what should be in code comments, what should be in scope for the acceptance tests, what tooling we should use, how we set up test automation, what should be the level of detail of our use cases, how we set up test environments and what purpose they should serve, how we set up data and how we should deal with incidents and interruptions. Behind each of these considerations there is a wealth of rationales, most of them probably not based on mathematical calculations, but on human emotions.

According to the ‘disciplines’ perspective I should be communicating with each of the developers alike, as members of an academic field of study. In practice this will probably get me blank stares across the board. The thing that will help me in my communication with my fellow units is to know that they have very valid human reasons or sentiments to act in a certain kind of way. To make progress (to improve) is to appeal to these sentiments.

From this experience and a couple of others, I would say a typical software development workplace contains mostly hominids of the genus Homo. If we are looking to improve our testing, perhaps we should therefore start ‘communicating between the humans’ and concentrate our precious resources and intellect on the study of aspects of human behavior in software development, as did Gerald Weinberg, Tom DeMarco and Timothy Lister, and Alistair Cockburn.

On the Testing of Normative Theories


While I was writing a piece on a newly created Dutch testing approach, I took a closer look at a couple of models in testing. In particular I tried to assess the Tester Freedom Scale by Jonathan Bach and the Heuristic Test Strategy Model by James Bach. To me, both these models are descriptive theories, which means that they try to capture and explain some of the phenomena in software testing.

In the case of the newly created Dutch testing approach, Jonathan Bach’s descriptive theory was modified into a normative theory. A normative theory is a theory that states: “If the situation is such and such, THOU SHALT do this and that (in order to achieve a satisfactory result).” I do not know if it is possible to turn a descriptive theory into a normative theory, but I think it is a dramatic switch, which, if it happens, should probably be accompanied by a thorough scientific investigation. Such an investigation will probably take into account the data on which the original theory was based, and it will also explain why the proposed directives are the best solution in all the situations that are captured in the descriptive theory.

The fact that a normative theory – perhaps even more than a descriptive theory – should be tested thoroughly in order to produce valid instructions is probably the reason why most of the normative theories we know in testing are watered down to ‘best practices’ in real life. If we take, for example, the theory of TMap – the most prominent normative theory of testing in the Netherlands – we see that in practice, its instructions are weakened to a toolbox of practices that are applied at the best judgement of the tester. We have come to learn that the instructions that are written in the book had better be interpreted, adapted and reshaped in order for them to make some sense.

While this ‘weakening down’ of the normative theory is hailed as an innovation by the proponents of the methodology, it signifies, in fact, that the proposed theory of testing has been invalidated. It means that “Things fall apart; / the centre cannot hold“. At the core of the theory there is a definition of testing that will probably not stand thorough scientific investigation, if ever we should come to test the theories that form our craft. An example of such a definition can be found in the normative theory created by the International Software Testing Qualifications Board, as displayed below.

The process consisting of all lifecycle activities, both static and dynamic, concerned with planning, preparation and evaluation of software products and related work products to determine that they satisfy specified requirements, to demonstrate that they are fit for purpose and to detect defects.

Two things are striking enough to mention. The first one is that we do not consciously and willingly test our theories, nor do we invite rigorous testing. We are unable to provide evidence for, or challenge, the fundamental theoretical assumptions by which one right way to approach software testing is proposed.

The second observation that intrigues me is that software testing produces normative theories at all. Yet here we are, wrestling with overarching methodologies instead of focusing on the practice and ways to describe what actually happens. If we feel the need for theories of testing, we should be focusing on descriptive theories. In an article on his weblog, Markus Gaertner expresses exactly this, in the following words.

Anyway, I think we should stop teaching testers practices that they might or might not need. Instead we should focus more on teaching testers how to evaluate their situation, and make useful improvements to their work where it’s appropriate.

I wholeheartedly welcome Gaertner’s sentiments. They parallel Philip Johnson-Laird’s handling of theories on mental modelling, to which he devotes a number of chapters of his book Mental Models. Johnson-Laird tests the theories one by one on a couple of criteria, the most important of which is that the theories have to account for the phenomena that are observed. Which brings me back to the Bach brothers and their descriptive models. I hope they will be tested rigorously.