The moment when it clicks

Standard

There are moments when something suddenly clicks. Something that appeared to be veiled and impossible to understand suddenly becomes intelligible and clear. It is a very astonishing moment that occasionally happens when we study something. My understanding of my own learning of a particular subject—whether it be a tool or a domain—is that it happens gradually. I add pieces to the puzzle and over time a more complete picture evolves. It can be a tedious affair. Sudden insight, as if passing through a door that unexpectedly opens, does not happen to me a lot.

Yet last week I had such a moment. I had been toying around with Kibana during the last couple of weeks, but without a lot of success. We use Kibana to sift through the logging that is generated in the production environment. We try to gather relevant statistics, signals through the aggregation of the data that is logged. I personally think the tester should familiarize himself with the usage of logs to analyze what is going on in production. The data gathered can inform testing, can tell him about the actual usage of the product and can reveal risk and help him direct his testing.

So, since our team uses Kibana (Kibana 3, to be precise), I felt like I had no excuse to dodge that bullet. I probably could have gotten away with avoiding looking at the logging. In my team there are at least two engineers who regularly look at the dashboards and I could have left it up to them to monitor the production environment and perhaps do some requests for me. But I personally wanted to get more out of monitoring and so I had to try to tackle the Elastic Stack.

For weeks I struggled with the Kibana dashboard. The queries and filtering seemed counter-intuitive and the results almost random. The creation of rows and panels (the layout of the dashboard) baffled me. It was my first encounter with Log4j and Tomcat logging and my inexperience with many of the parts of the Elastic Stack caused frustration. I would spend a couple of hours creating some queries but never ended up with the right result. The Elastic query DSL just failed to make a logical connection in my head. I looked up tutorials and some instructions videos on Youtube, but I did not advance. It was like knocking at the same door all the time to find it shut tight.

And last week the door suddenly opened. In the matter of an hour I went from hitting keys in frustration to freely and joyfully playing around with the tool. I do not think there is a single thing that unlocked the door, but in retrospect there are some things that helped. I’d like to offer a quick examination of those things.

First off, last week, I set myself a small, well-defined Kibana task, caused by the following. My team uses a Grafana dashboard to keep track of the errors that are generated in the production environment. The dashboard is shown on a wide screen television that is on all the time. Errors appear on our dashboard but it seems that we pay only marginal attention to them. The lack of interest that I noticed is a common one. It is the same lack of interest that can be observed when putting the results of flaky automated tests on a dashboard. Over time, the lack of trust in the results of these tests causes a kind of boredom, the shutting out of the false alarm. Since the Grafana dashboard does not facilitate the splitting up of the errors by root cause but Kibana does, my only task was to split up the errors by root cause and therewith increase our insight in the errors. This task was within my reach. The fact that there were some examples, created by other teams, readily available also helped.

Second, I finally took the time to notice the things that were going on in the Kibana dashboard. I should have paid attention to them long ago, but I think my frustration got in the way. For example; it is pretty easy to create a query in Kibana that will run indefinitely. Setting the scope of the query to a large number of days can do that for you. It will leave you guessing endlessly about the flakiness of your query unless you notice the tiny, tiny progress indicator running in the right upper corner of the panel.

Also, different panels of the dashboard will react differently to the results of the query. The table panel, which shows a paginated table of records matching your query, can show results pretty quickly, but a graph potentially takes a lot of time to build up. This seems downright obvious and yet understanding this dynamic takes away a lot of the frustration of working with a Kibana dashboard. It is a delicate tool and you have to think through each query in terms of performance.

Thirdly, I think determination also contributed to the click moment. I desperately wanted to win the battle against Kibana and I wanted to take away some of fuziness of the dashboard. Last week I noticed a difference between the number of errors as shown in the Grafana dashboard and the number of errors (for the same time period) as gathered from Kibana. So there was a bug in our dashboard. Then I knew for certain that Kibana can serve as a testing tool. Once I was fully aware of its potential, I knew there was only one way forward.

 

 

 

Never in a straight line

Standard

The theme of the seventh annual peer conference of the Dutch Exploratory Workshop on Testing (DEWT7) is lessons learned in software testing. In the light of that theme I want to share a lesson recently learned.

Broadly stated, the lesson learned is that nearly any effort in software testing develops in a non-linear way. This may seem like a wide open door, but I find that it contrasts with the way software testing is portrayed in many presentations, books and articles. It is likely that due to the limitations of the medium, decisions must be made to focus on some key areas and leave out seemingly trivial details. When describing or explaining testing to other people, we may be inclined to create coherent narratives in which a theme is gradually developed, following logical steps.

Over the last couple of months I came to realize something that I’ve been experiencing for a longer time; the reality of testing is not a coherent narrative. Rather; it is a series of insights based on a mixture of (intellectual) effort and will, craftsmanship, conflicts and emotions, personality and personal interests and, last but not certainly least, circumstance, among which chance and serendipity. The study aimed at the core of testing is the study of the decision making process that the software tester goes through.

My particular experience is one of balancing many aspects of the software development process in order to move towards a better view of the quality of the software. I spent six full weeks refactoring an old (semi) automated regression test suite in order to be able to produce test data in a more consistent manner. As expected, there was not enough time to complete this refactoring. Other priorities were pressing, so I got involved in the team’s effort to build a web service and assist in setting up unit testing. My personal interest in setting up unit testing evolved out of my conviction that the distribution of automated tests as shown in Cohn’s Test Automation Pyramid is basically a sound one. The drive to make more of unit testing was further fueled by a presentation by J.B. Rainsberger (Integrated Tests Are A Scam). I used unit testing to stimulate the team’s thinking about coverage. I was willing to follow through on setting up a crisp and sound automation strategy, but having set some wheels in motion I had to catch up with the business domain. With four developers in the team mainly focusing on code, I felt (was made to feel) that my added value to the team was in learning as much as needed about why we were building the software product. To look outward instead of inward. And this is were I am at the moment, employing heuristics such as FEW HICCUPS and CRUSSPIC STMPL (PDF) to investigate the context. It turns out that my investment in the old automated regression test suite to churn out production-like data is now starting to prove its worth. Luck or foresight?

All this time a test strategy (a single one?) is under development. Actually, there have been long (and I mean long) discussions about the test approach within the team. I could have ‘mandated’ a testing strategy from my position as being the person in the team who has the most experience in testing. Instead I decided to provide a little guidance here and there but to keep away from a formal plan. Currently the test strategy is evolving ‘by example’, which I believe is the most efficient way and also the way that keeps everyone involved and invested.

The evolution of the understanding of the quality of the software product is not a straight path. Be skeptical of anything or anyone telling you that testing is a series of more or less formalized steps leading to a more or less fixed outcome. Consider that the evolution of the understanding of quality is impacted by many factors.

 

 

Solving a Ten Thousand Piece Puzzle

Standard

On the third of March a meeting was organized by Improve Quality Services (my employer) and the Federation of Agile Testers in Utrecht, the Netherlands. The evening featured James Bach as speaker and his talk focused on the paper A Context-Driven Approach to Automation in Testing, which was written by him and Michael Bolton. My favorite part of the evening was the part during which James tested some functionality of an application and explains his way of working. He provided such a demonstration a year ago when introducing the test autopsy.

The exercise

inkscape_logoThis time around the subject under test was the distribution function of the open source drawing tool Inkscape and the focus was on the usage of tools to test this functionality. It must be said that Inkscape lends itself to the usage of tools because it stores all the images that are generated using this tool in the Scalable Vector Graphics (SVG) format, which is an open standard developed by the World Wide Web Consortium (W3C). This greatly increases the intrinsic testability (links opens PDF) of the product, as we will see.

The SVG format is described in XML and as such, the image is a text file that can be analyzed using different tools. It is also possible to create text files in the SVG format that can then be opened and rendered in Inkscape. As such, the possibilities for creating images by generating the XML script using code are virtually limitless. Before the start of the presentation James had created a drawing containing 10,000 squares. He created this drawing using some script (I am not sure he mentioned in which language this script was written). My initial reaction to James showing the drawing that he generated was one of astonishment. I was impressed by his idea of testing this functionality with 10,000 squares, by the drawing itself and the fact that it was generated using a script.

Impressed by complexity

Looking back, my amazement may have been caused my lack of experience with Inkscape and the SVG format. But it also reminded me that it is easy to be impressed by something new and especially if this new thing seems to be complex. I believe that, in testing, if you really want to impress people — for all the wrong reasons — all you need to do is to present to them a certain subject as being complex. People will revere you because you are the only one who seems to understand the subject. The exercise, as James walked us through it, seemed complex to me and this is what triggered me to investigate it.

So why use ten thousand squares?

I am sure it was not James’ intention to impress us, so then the question is: why would he use ten thousand squares? Actually this question occurred to me halfway through doing the exercise myself, when tinkering with the distribution function. Distributing, for example, 3 squares is easy; it does not require a file generated with a script. Furthermore, it is easy to draw conclusions from the distribution of 3 squares. Equal distribution can be ascertained visually (by looking at the picture), with a reasonable degree of certainty. So if equal distribution functions correctly with 3 squares, why would there be a problem with 10,000 squares? I am assuming that the distribution algorithm does not function differently based on the amount of input. I mean, why would it? So, taking this assumption into account, using 10,000 squares during testing does only the following things:

  1. It complicates testing, because it is no longer possible to ascertain equal distribution visually.
  2. Because of this, it forces the tester to use tools to generate the picture and to analyze the results.
  3. It complicates testing, because the loading of the large SVG file and the distribution function take a significant amount of time.
  4. It tests the performance of the distribution function in Inkscape.

Now the testing of the performance is not something I want to do as a part of this test. But it seems that working with 10,000 squares adds something meaningful to the exercise. A distributed image generated from 10,000 squares does not allow for a quick visual check and therefore simulates a degree of complexity that requires ingenuity and the use of tools if we want to check the functioning of distribution. Working with large data sets and having to distill meaning from a large set is, I believe, a problem that testers often face. So, as an exercise, it is interesting to see how this can be handled.

A deep dive into the matter

Some of the tools I use

  • Inkscape (for viewing and manipulating images)
  • Python (for writing scripts)
  • Kate (for editing scripts and viewing text files)
  • KSnapshot (for creating screen shots)
  • Google (for looking up examples & info)
  • R (for statistical analysis)

In an attempt to reproduce James’ exercise, I create a script to generate this drawing myself. In order to do so, I need to find out a little bit about the SVG standard. Then I create an Inkscape drawing containing one square, in order to find out the XML format of the square. Now I have an SVG file that I can manipulate so I have enough to start scripting. I install Python on my old Lubuntu netbook which is easy to do. I never did much programming in Python before. I could have written the script in PHP or Java, which are the two programming languages about which I know a fair amount, but it seems to me that Python is fairly light-weight and suitable for the job. It can be run from the command line without compilation, which contributes to its ease of use.

So I write a Python script that creates an SVG file with 10,000 squares in it. Part of the script is displayed below. I look up most of the Python code by Googling it and copy-pasting from examples, so the code is not written well, but it works. I can run the script from the command line and it generates the file in the blink of an eye. The file size of the file is just about 2.4 Mb, which is fairly large for a text file and when I open it using Inkscape, the program becomes unresponsive for a couple of seconds. Apparently the program has some difficulty generating the drawing, which is understandable, given that the file is large and the netbook on which I run the application is limited in both processing power and internal memory (2 Gb). Yet, the file opens without errors and shows a nice grid of 10,000 squares.

Python script for creating the squares

with open('many_squares.svg', 'w') as f:
    f.write(begin)
    
    x=0
    y=0
    offset = 12
    number_of_squares = 10000

    while number_of_squares > 0:
        square = '''<rect
        style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        id="rect3336"
        width="10"
        height="10"
        x="%d"
        y="%d" />''' % (x,y)
        if x + offset > 800 :
            y = y + offset
            x=0
        else:
            x = x + offset
        f.write(square)
        number_of_squares = number_of_squares -1
    f.write(end)

Which results in the following picture.

ten thousand squares grid

The regular grid of 10,000 squares created with the Python script

ten thousand squares grid close up

A close up of the grid created with the Python script

 

inkscape - align and distribute

The Inkscape distribution functions

I now have a grid of 10,000 squares with which I am trying to reproduce James’ exercise. The thing that I run into is that Inkscape has a number of distribution options. I am not sure which distribution James applied, so I try a couple. None of them however show as a final result the image that James showed during his presentation – as far as I can remember it was an oval shape with a higher density of objects near the edges. Initially it seems strange that I am unable to reproduce this, but through tinkering with the distribution function, I conclude that the fact that I am unable to reproduce James’ distributed image probably depends on the input. The grid that I create with the script contains identical squares of 10 by 10 pixels, evenly spaced (12 pixels apart) along the x and the y axes. It may differ in many aspects (for example, size, spacing and placement of the objects) from the input that James created.

Developing an expected result

I apply the Inkscape distribution functionality (distribute centers horizontally and vertically) to my drawing containing the 10,000 squares and the result is as shown below. The resulting picture looks somewhat chaotic to me. I cannot identify a pattern and even if I could identify a pattern, I would not be sure if this pattern is what I should be seeing. There seem to be some lines running through the picture, which seems odd. But in order to check the distribution properly, I need to develop an expected result, using oracles, by which I can check if the distribution is correct.

ten thousand squares distributed

The entire distributed drawing

ten thousand squares distributed close up

A close up of the distributed drawing (it kind of looks like art)

 

I do several things to arrive at a description of what distribution means. First I consult the Inkscape manual with regards to the distributions functions that I used. The description is as follows.

Distribute centers equidistanly horizontally or vertically

Apart from the spelling mistake in the manual, the word that I want to investigate is ‘equidistant’. It means — according to Merriam-Webster —

of equal distance : located at the same distance

Distance is complex concept. The Wikipedia page on distance is a nice starting point for the subject. I simplify the concept of distance to suit my exercise by assuming a couple of things. My definition is as follows: Distance is the space between two points, expressed as the physical length of the shortest possible path through space between these points that could be taken if there were no obstacles. In short, the distance is the length of the path in a straight line between two points.

There are other things I need to consider. The space in which the drawing is made is two dimensional. This might seem obvious, but it is important to realize that every single point in the picture can be identified with a two dimensional Cartesian coordinate system. In short, every point has a x and a y coordinate (which we already saw when generating the SVG file) and this realization greatly helps me when I try to analyze the picture.  Another question I need to answer is which two points I use. This is tricky, because in my exercise, I used the center of the square as reference point for distribution. Since I am dealing with squares, width and height are equal and since all the squares in my drawing have the same width and height, I simplified my problem in such a way that I can use the x an y coordinates of the top left corner (which can be found in the SVG file) of each square for further analysis. There is no need for me to calculate the center of each object and do my analysis on those coordinates.

And lastly I need to clarify what distribution means. It turns out that there are at least two ways to distribute things. I came across an excellent example in a Stack Exchange question. In this question the distinction is made between spreading out evenly and spacing evenly. To spread out evenly means that the centers of all objects are distributed evenly across the space. To space evenly means that the distance between the objects is the same for all objects. The picture below clarifies this.

Types of distribution

Types of distribution (source: Stack Exchange)

In my special case — I am working exclusively with squares that are all the same size — to spread out evenly means to space evenly. So the distinction, while relevant when talking about distribution, matters less to me. Aside from the investigation described above, I spoke with several co workers about this exercise and they gave me some useful feedback on how I should regard distribution.

To make a long story short, my expected result is as follows.

Given that all the objects in the drawing are squares of equal size, if the centers of all the squares are distributed equally along the x axis then I can analyze the x coordinates of the top left of all squares. If the x coordinates are sorted in ascending order, I should find that difference between one x coordinate and the x coordinate immediately following it, is the same for all x coordinates. The same should go for the y coordinates (vertical distribution).

This is what I’m looking for in the drawing with the distributed squares.

Some experiments in R

In order to do some analysis, I need the x and y coordinates of the top left corner of all the squares in the drawing. It turns out to be fairly easy to distill these values from the SVG file using Python. Again, I create a Python script by learning from examples found on the internet. The script, as displayed below, subtracts from the SVG file the x and y coordinates of the top left corner of each square and then writes these coordinates to an comma separated file (csv). The csv file is very suitable as input for R.

Python script for generating the csv file containing the coordinates

svg = open("many_squares_distr.svg", "r")

coordinates = []
for line in svg:
    print line
    if line.find(' x=') <> -1 or line.find(' y=') <> -1: # line containing x or y coordinate found
        positions = []
        for pos, char in enumerate(line):
            if char == '"':
                positions.append(pos)
        print line[positions[0]+1:positions[1]]
        if line.find('x=') <> -1 :
            x = line[positions[0]+1:positions[1]]
        if line.find('y=') <> -1 :
            y = line[positions[0]+1:positions[1]]
            new_coordinate = [x,y]
            coordinates.append(new_coordinate)

with open('coordinates.csv', 'w') as f:
    point = 1
    line = line = 'X,Y\n'
    f.write(line)
    for row in coordinates:
        line = '%s,%s\n' % (row[0],row[1])
        f.write(line)
        point = point + 1

Now we come to the part that is, for me, the toughest part of the exercise on which I consequently spent the most time. The language that I use for the analysis of the data is R. This is the language that James also used in his exercise. The language is entirely new to me. Furthermore, R is a language for statistical computation and I am not a hero at statistics. What I know about statistics dates back to a university course that I took some twenty years ago. So you’ll have to bear with the simplicity of my attempts.

It is not difficult to load the csv file into R. It can be done using this command.

coor <- read.csv(file="coordinates.csv",head=TRUE,sep=",")

A graph of this dataset can be created (plotted).

plot(co)

Resulting in the picture below.

plotted x and y
After that, I am creating a new dataset that only contains the x values, using the command below.

xco = coor[,1]

And then I sort the x values ascending.

xcor <- sort(xco)

Then I use the following command

plot(xcor)

to create a graph of the result as displayed below.

plotted sorted x coordinates

The final result is a relatively perfectly straight line because (as I expected) for each data point the (x) value is increased by the same amount, resulting in a lineair function. As a result this satisfies my needs, so this is where I stop testing. I could have created, using a Python script, a dataset containing all the differences between the consecutive x coordinates and I could have checked the distribution of these differences with R. I leave this for another time.

Afterthoughts

One of the questions you might ask is if I really tested the distribution functionality. My answer would be a downright ‘No’. I used the distribution functionality in an exercise, but the goal of the exercise was not to test the functionality. The goal was to see what tools can do in a complex situation. If I had really investigated the distribution functionality, I would have created a coverage outline and I would certainly have tried different kinds of input. Also I would have had to take a more in depth look at the concept of distribution.

One of the results of this exercise is that I know a little bit more about scripting, about the language R and about vector images. Also, I learned that the skills related to software testing are manifold and that it is not easy to describe them. I particularly liked describing how I arrived at my definition of the expected result, which meant investigating different sources and drawing conclusions from that investigation. I feel that the software tester should be able to do such an investigation and to build the evidence of testing on it. I also learned again that complexity is a many headed monster that often roams freely in software development. Testers need to master the tools that can tame that beast.

Some exploration took place in the form of ‘tinkering’ with the distribution function of Inkscape. This helped me build a mental model of distribution. Furthermore I toyed with R on a ‘trail and error’ basis, in order to find out how it works.

Why we do experience reports

Standard

Not so long ago a workshop was held at Improve Quality Services. The theme of the workshop was ‘Test strategy’ and the participants were asked to present a testing strategy that they recently used in a work situation. Mind you that participants were asked to present, as in to show and explain, their test strategy; not to do an experience report on it. Several strategies were presented and the differences were notable. Very broadly the following reports were discussed.

  • A description of the organization (people and processes) around the software system.
  • A description of the way testing was embedded in the development lifecycle.
  • A description of the testing principles that were shared by the testers.
  • A description of the test approach, aimed at communicating with integrating parties.
  • A description of the test approach, aimed at communicating with management.
  • A description of the approach and the actual testing of a screen.
  • A description of the approach and the actual testing of a database trigger.

I am not in a position to say whether any of these testing strategies were right or wrong and I am certain my judgement is irrelevant. I furthermore doubt that any of these strategies can be judged without further investigation, and I am sure that each of these contains elements that are great and elements that can be improved upon. It is not my aim to comment on this. However, as the evening went on, I felt a growing frustration about a number of things.

Considering the open mindedness and critical thinking abilities of the people that were in the room (all of them had, for example, taken the Rapid Software Testing course) there was a remarkably low number of comments on the strategies as presented. Aside from the occasional remark, the presentations were largely taken at face value. Now the fact that were not a lot of remarks can still imply many things about the evening itself, the organization of the meeting, the mental conditions of the people present and so on. Still, I like to think that the setup was conducive to feedback and learning and so I’d like to focus on the presentations themselves to see why they didn’t invite comment, or at the very least, why I did not feel inclined to comment.

My first issue during the evening was that most of the presentations did not discuss what happened when the rubber actually hit the road. If the proof of the pudding is in the eating, we hardly discussed how the strategy went down. Whether it cracked under the first strain or whether it was able to stay the course. If there is a way to evaluate a testing strategy (to put it to the test) it is to carefully note what happens to it when it is applied. Evaluation was the part that was largely missing from our presentations.

You see this kind of thing quite a lot at (testing) conferences. The speaker presents an approach, a framework, a general theory or general solution without getting into how this solution came into being or how it developed when it was actually applied in practice. The mere presentation of a theory does not lend itself to criticism. My reaction to this form of presentation is to shrug and move on with my business. I am unable to criticize it from my specific context, because usually the presented approach is not very specific about context so I cannot check whether my context applies. The only other form of criticism available to me is then to reason about the approach in an abstract way, either by checking the internal logic of the theory, or by comparing the theory to other theories in the same domain and to see if it is consistent. This is not ideal and certainly to do this in the timespan of the 40 minutes of a conference talk, without having access to reference material, is a tall order.

I had this feeling when a couple of months ago, at my work, an ATDD (Acceptance Test Driven Development) framework was presented as the new Agile way of working. The thing I can remember from that presentation is that there was a single image with bits and pieces and connections explaining the framework. The rest is a blur. I never heard anything of it since.

So the question is: what do we need to do to open up our theories to evaluation and investigation?

My second issue with the presented strategies was initially about distance. Quite a number of the strategies that were presented seemed to be distanced from the subject under test (SUT). By the subject under test I mean the actual software that is to be tested. And by distance, I mean that there were strategies that did not primarily discuss the subject under test. I was absolutely puzzled to see that some of the presented strategies did not discuss the execution of that strategy. As I stated above, the proof of the strategy should be in its execution. Discussing strategy without execution just didn’t make sense to me. But looking back at this experience I think I wrestled with the purpose of offering up the strategies to (more or less) public scrutiny. At least one or two presentations discussed not so much the test strategy itself but the communication of the strategy with management, integrating parties or the test department. This focuses on an entirely different aspect of testing, being the communication about the test strategy in order to reach a common ground or to align people along a common goal. The purpose of the presentation is not to scrutinize the test strategy, but to invite an examination of the way it was communicated. This purpose should be clear from the start. Otherwise the ensuing discussion is partly consumed by determining that purpose (which may be a waste of time) or, if there is no quest for the purpose, the discussion follows a winding path that has a good chance of not leading the audience, nor the presenter, anywhere at all.

The third thing that bothered me was that the displayed strategies rarely, if ever, discussed people. They discussed roles, but not people. That is the reason why, in my very short presentation, I decided (on the spur of the moment) to hardly pay any attention to the actual strategy that I selected and focused on the characteristics of the individuals in my team. There are two aphorisms that I had in mind while doing this: “Not matter what the problem is, it is always a people problem” and “Culture eats strategy for breakfast”. It appears to me that no matter how excellent your plan is, the result of its execution largely depends on the people that are aligned to play a part in this strategy. Wherever the strategy leaves room for personal interpretation, there will be interpretation. And basically, no strategy will ever be executed in the same way twice, not even with the same group of people, because the decisions people make will differ from time to time, influenced by many factors. If this is true, and I think it is, then I wonder why the human factor is not present in a more explicit and defined way in our testing strategy and in our reports in general. We seem prejudiced (primed?) to talk about processes and artifacts, and to fear the description of flesh and bone. This is a general remark on the evaluation of the context. If a report displays people as ‘puppet A’ and ‘puppet B’ then this is a sure sign of trouble. I know this from experience because our famed Dutch testing approach TMap Next exclusively discusses cardboard figures as a replacement for humans.

In conclusion; for an experience report to be open to evaluation and investigation and for a meaningful discussion to ensue, it should contain at least these three things.

  • The purpose (research question) of the report should be clear,
  • the context of the report should be described (including the hominids!) and
  • the results of applying the approach should be presented.

Hopefully I have been able to clarify these demands by sharing my feelings above. The discussion loops back to the usage of experience reports within the peer conferences as organized by the Dutch Exploratory Workshop on Testing. The way we look at an experience report is evolving and the road towards a better understanding of what we do (as a workshop) and how we do it, has been a very meaningful one.

Making Sense of the Legacy Database, Introduction

Standard

For the past 3 years I spent a lot of time digging through relational databases. Most of these databases that I looked at could be considered legacy databases. In one case it was an Oracle database, in another a Sybase database. They are legacy systems in the sense that the technology with which they have been built has been around for quite a while. Heck, relational databases were conceived in the 1970’s (link opens PDF), so from the perspective of computer history the concept of the relational database is pretty old.

The databases I worked with can be considered ‘legacy’ in another way. At least one of them has been in existence, and thus has been under maintenance, for 15 years. So, in terms of a software life cycle this database is relatively old. It is a monolith that has been crafted and honed by generations of software developers, so to speak.

And there is a third way in which these databases could be considered ‘legacy’. It appears that for at least a couple of these systems, someone was able to make the documentation about the inner workings of the database disappear almost entirely. And not only that; over time the people who knew the database intimately moved away from the company. So what remains is a database that is largely undocumented, with very few people who can tell you the intricacies of its existence. Clearly this is an exceptional situation… or maybe not.

Now, both knowledge and joy can be gained from studying the database. It is reasonable to assume that the database is a reflection (a model, if you will) of the world as the company sees it. In it is stored knowledge about elements of the world outside and the relationships between these elements. As a way of gaining deep knowledge about how its builders classify and structure the world, studying the database is fine starting point.

Indeed, one of the goals I have when I study the database, is to get to know how knowledge is classified. I study in order to gain knowledge about the functional aspects and the meaning of the system. Some may regard the database as a technical thing. Those sentiments may be strengthened by the fact that knowing how to write SQL queries is usually considered to be a ‘technical’ skill, one that ‘functional’ testers stay away from. That is a very damaging misconception, withholding a fine research instrument from the hands of the budding tester. SQL is not anything anyone should ever be afraid of.

But gaining knowledge about functional aspects is just one object of the study. Another one is that mastering a database allows you to verify what information is stored and whether it is stored correctly or not. This, clearly, allows you to check (test) the functioning of the system. Mastering the database also allows you to manipulate data for the purpose of testing and to create new data sets. And finally, it allows you conjure up answers to questions such as “Show me the top 100 books that were sold last year to customers from the Netherlands above the age of 45” much more quickly than through any GUI. So it can be used for analytical purposes.

The question I would like to answer is how do we analyze the database. What methods do we employ to get to the core of knowledge that may be spread out over hundreds of tables. There are a few methods that make use of the typical features of a relational database. Other methods are more general in nature. In other blog posts I would like to dig deeper into these methods of investigation. But for now I would like to leave you with a quick overview and a description of the first method.

Organization

When you are going to analyze a database, you will be writing SQL queries. Many of them, probably. If you are going to place all of your queries in a single file, that will become chaos rather quickly.  Bad organization will be a really heavy dead weight during the whole of your investigation. I like to organize my queries in folders, in which case these folders represent broad classifications. For examples, I create a folder containing all queries relating to orders, another one relating to customers and another one relating to financial data. Usually, this division really helps you to quickly zoom in on a specific area.

Within a folder there may be many different files containing queries. Again I try to group queries by ‘functional’ area. So I may create a file containing queries relating to ‘cancelled orders’ or ‘orders that are waiting to be processed’, or ‘orders from returning customers’. I first tried to group queries in files based on the table that was queried (such as a single file containing all queries on the ‘order’ table) but for some reason a ‘functional’ classification is easier to comprehend.

One of integrated development tools I worked with – Oracle SQL Developer – has a plugin for Subversion, so it allows you to keep and maintain a repository and a version history of the files containing your queries and to share it easily with other team members. I learned the hard way that keeping your laboriously built up set of queries solely on a single internal hard disk is not such a good idea.

Like in other programming languages, in SQL it is possible to add comments to your queries. I found out that comments are an essential part of your investigation. In SQL queries comments are usually written with two hyphens (–) at the beginning. Queries can quickly become long and complex. If your files that contain the queries contain only SQL statements and nothing else, this will seriously hamper your investigation. You will need to read each query again and again in order to find out what it meant. Reading a SQL query that joins many tables can be really tough, if you want to find exactly what information you are pulling from the database. Also, queries may look very similar but may differ on a slight detail that you will only find when you read and conceptualize the whole thing. And last but not least, since writing queries takes time and mental effort, your do not want to write duplicate queries. The only way to prevent you from doing things many times over is to have an easy way to detect if you already ‘have a query for that’.

So comments are essential as a means of identification or indexing. I like to go over my files with queries from time to time, ‘grooming’ and improving the comments and the queries themselves.

So far for the organization of queries. Below are the other topics that I like to cover as methods for the investigation of the legacy database. I will get back to these in posts to come.

  • Structuring the query
  • Testing your queries
  • What to do with the data model (if it exists)
  • Searching for distinct values
  • Paying attention to numbers
  • Scanning data & data patterns
  • Using dates
  • Complex joins
  • Emptiness (null values)
  • Querying the data model
  • Comparing the query results with the application
  • Looking for names
  • Use of junction tables
  • Use of lookup tables
  • Database tools and integrated development environments

On the Value of Test Cases

Standard

Something is rotten in the state of Denmark.

William Shakespeare – Hamlet

Over the period of a couple of weeks, I was able to observe the usage of test cases in a software development project. The creation of test cases was started at the moment when the functional specifications were declared to be relatively crystallized. The cases were detailed in specific steps and entered into a test management tool, in this case HP Quality Center. They’d be reviewed, and in due time executed and the results would be reported to the project management.

During these weeks after the finalization of the functional specifications, not a lot of software was actually built, so the testers involved in the project saw the perfect chance to prepare for the coming release by typing their test cases. They believed that they had been given a blissful moment before the storm, in which they would strengthen their approach and do as much of the preparatory work as they could, in order to be ready when the first wave of software would hit. Unfortunately, preparation, to these testers, meant the detailed specification of test cases for software changes that still had to be developed, a system that was partly unknown or unexplored by them, and functional specifications that proved to be less than ready.

There is no need to guess what happened next. When eventually software started coming down the line, the technical implementation of the changes was not quite as expected, the functional specifications had changed, and the project priorities and scope had shifted because of new demands. It was like the testers had shored up defenses to combat an army of foot soldiers carrying spears and they were now, much to their surprise, facing cannons of the Howitzer type. Needless to say, the defenders were scattered and forced to flee.

It is easy to blame our software development methods for these situations. One might argue that this project has characteristics of a typical waterfall project and that the waterfall model of software development invites failure. Such was argued in the 1970s (PDF, opens in new window). But instead of blaming the project we could ask ourselves why we prepare for software development the way we do. My point is that by pouring an huge amount of energy into trying to fixate our experiments in test cases (and rid them of meaning — but that’s another point), we willingly and knowingly move ourselves into a spot where we know we will be hurt the most when something unexpected happens (see Nassim Nicholas Taleb’s Black Swan for reference). Second of all, I think we seriously need to reassess the value of drawing up test cases as a method of preparation for the investigation of software. There are dozens of other ways to prepare for the investigation of software. For one, I think, even doing nothing beats defining elaborate and specific test cases, mainly because the former approach causes less damage. It goes without saying that I do not advocate doing nothing in the preparation for the investigation of software.

As a side note, among these dozens of other ways of preparing for the investigation of software, we can name the investigation of the requirements, the investigation of comparable products, having conversations with stake holders, having conversations with domain experts or users, the investigation of the current software product, the investigation of the history of the product, the reading of manuals etc… An excellent list can be found in Rikard Edgren’s Little Black Book on Test Design (PDF, opens in new window). If you’re a professional software tester, this list is not new to you. What it intends to say is that testers need to study in order to keep up.

Yet the fact remains that the creation of test cases as the best way to prepare for the investigation of software still seems to be what is passed on to testers starting a career in software testing. This is what is propagated in the testing courses offered by the ISTQB or, in the Netherlands, by TMap. This approach should have perished long ago for two reasons. On the one hand, and I’ve seen this happen, it falsely lures the tester in thinking that once we’re done specifying our test cases, we have exhausted and therefore finalized our tests. It strengthens the fallacy that the brain is only engaged during the test case creation ‘phase’ of the project. We’re done testing when the cases are complete and what remains is to run them, obviously the most uninspiring part of testing.

The second thing I’ve seen happening is that test case specification draws the inquiring mind away from what it does best, namely to challenge the assumptions that are in the software and the assumptions that are made by the people involved in creating the (software) system — including ourselves. Test case creation is a particular activity that forces the train of thought down a narrowing track of confirmation of requirements or acceptance criteria, specifically at a time when we should be widening our perspectives. By its focus on the confirmation of what we know about the software, it takes the focus away from what is unknown. Test case creation stands in the way of critical thinking and skepticism. It goes against the grain of experimentation, in which we build mental models of the subject we want to test and iteratively develop our models through interaction with the subject under test.

mcl82If there is one thing that I was forced to look at again during the last couple of weeks — during which I was preparing for the testing of software changes — it was the art of reasoning and asking meaningful questions. Though I feel confident when asking questions, and though I pay a lot of attention to the reasoning that got me to asking exactly that particular set of questions, I also still feel that I need to be constantly aware that there are questions I didn’t ask that could lead down entirely different avenues. It is possible to ask only those questions that strengthen your assumptions, even if your not consciously looking for confirmation. And very much so, it is possible that answers are misleading.

So for the sake of better testing, take your ISTQB syllabus and — by any means other than burning — remove the part on test cases. Replace it with anything by Bacon, Descartes or Dewey.

“Criticism is the examination and test of propositions of any kind which are offered for acceptance, in order to find out whether they correspond to reality or not. The critical faculty is a product of education and training. It is a mental habit and power. It is a prime condition of human welfare that men and women should be trained in it. It is our only guarantee against delusion, deception, superstition, and misapprehension of ourselves and our earthly circumstances. Education is good just so far as it produces well-developed critical faculty. A teacher of any subject who insists on accuracy and a rational control of all processes and methods, and who holds everything open to unlimited verification and revision, is cultivating that method as a habit in the pupils. Men educated in it cannot be stampeded. They are slow to believe. They can hold things as possible or probable in all degrees, without certainty and without pain. They can wait for evidence and weigh evidence. They can resist appeals to their dearest prejudices. Education in the critical faculty is the only education of which it can be truly said that it makes good citizens.”

William Graham Sumner – Folkways: A Study of Mores, Manners, Customs and Morals

On Performing an Autopsy

Standard

On Tuesday the 3rd of March 2015, a Quality Boost! Meetup was held by Improve Quality Services and InTraffic in Nieuwegein, the Netherlands. The evening was organized around a session by James Bach, who performed a ‘testing autopsy’ — or ‘testopsy’. Huib Schoots facilitated the questions and the discussion and Ruud Cox created a sketch note. James’ aim was to test a product for ten minutes, narrate his train of thought during that session, and afterwards discuss what happened. He chose this approach in order to be able to do a close examination of what happens during testing.

The definition of an autopsy, according to Merriam-Webster, is as follows.

a critical examination, evaluation, or assessment of someone or something past

On narration and obsession-based testing

The definition above describes pretty accurately what James was trying to do with the testing session. By making explicit the thoughts that guided him during the testing of the application, he made them available for examination. He told the audience about narration — the ability to tell a story — and how important it is for a tester to explain what he is doing and why he is doing it. There are many reasons why narration is important; for example because you want to explain to your team mates what you did. But James’ main reason for narration in this session was to be able to teach us about testing and about the particular skills that are involved in testing.

James said he does not recommend spelling out a testing session word for word. He showed us an example of a report that he created when he was challenged at the Let’s Test conference to test a volume control for a television. In the report he explains everything that is related to his testing. The report contains mental notes, records of conversations, sketches and models and revisions thereof, graphs, experiments and also pathways that eventually proved to be dead ends. The report contains a huge amount of material, but only part of that material would be useful in a practical report out to, for example, management. The full narration of a testing session has its uses, but you’d have to be pretty obsessed with testing to create such an elaborate report. Therefore James dubbed it obsession-based testing.

A very detailed report of testing can serve at least two purposes that were mentioned during the evening.

  • It can be used for teaching testing and to have a discussion about it. The Quality Boost! Meetup that I attended was an example of such usage.
  • It can be used to investigate the skills that are involved in testing. James recently received a detailed test report from Ruud Cox that matched his own obsession-based report. Ruud used the report that he created to find more about the mental models that testers use when testing.

On survey testing

The tested appplication
The tool that James tested during his session is Raw. Raw is an open web app to create custom vector-based visualizations.

The subject under test was an online tool that can be used to generate — among other things — Voronoi diagrams. The Voronoi diagram is a mathematical diagram in which a plane is partitioned into regions based on distance to points in a specific subset of the plane. Through the tool it was possible to provide a data set as input, based on which the tool would generate the diagram. James had prepared some data sets in Excel in advance and during the ten minute session he ran the data sets through the tool and with the audience he examined the diagrams that were generated. This way we all got to know a little bit more about Voronoi diagrams and about how we could detect if the diagrams that were shown were more or less correct.

The type of testing James performed during this particular session is what he himself described as survey testing; a way of learning about the product as fast as possible. He did not focus particularly on, for example, the user interface or on the handling of erroneous data. He just wanted to get to know the application. Later on in the evening, when asked what method he used to explore an application in such a survey, James mentioned the Lévy flight; a random walk that appears to resemble his own type of search. This scanning pattern is made up of long, shallow investigations and short, deep investigations, after which the long, shallow walk is resumed. It seems to be a pattern that is used by animals looking for food (though scientific studies in this direction have been contested), or even by human hunter-gatherers (PDF).

A Lévy flight

A Lévy flight

 On sense making

Because his aim was to learn about the product by examining it through testing, he called his investigation an act of sense making. To make sense of a software product we need a number of skills. Sense making is something we all have to do in software testing. If the application under test does not make sense to us, it will be very hard to test it. Yet sense making is a difficult art. During the evening we discussed how sense making depends on you being able to handle your emotions about complexity. When faced with a complex problem it is not uncommon to become frustrated or to panic. As testers we have to deal with these emotions in order to progress and get closer to the problem. It may take time to get to the core of the problem and it is possible that we make mistakes. In order to make sense of a situation we have to allow for these phenomena. Other tools that help in making sense are guideword heuristics that aid us in remembering what we know.

On breaking down complexity and using a simplified data oracle

In order to make sense of an application or a system, we usually need to break down the complexity of this application. In our craft it is not very helpful to be in awe of, or afraid of complexity; we need to have ways to tackle it. James mentioned how systems thinking (and particularly Gerald Weinberg’s An Introduction to General Systems Thinking) helped him to handle complexity.

Some Voronoi diagrams
Below are displayed some Voronoi diagrams that were generated during the testing session, using data from the following Excel sheet: voronoi data.
As you can see all diagrams except the second display regular patterns that can be checked quite easily for correctness. The titles of the diagrams correspond with the titles of the data in the Excel sheet.
1) Diagonal
voronoi2
2) Randomvoronoi5
3) Cartesian Plane w/o Diagonals
voronoi4
4) Widening Spiral
voronoi1

The trick is to break complexity down into simple parts, to find the underlying simplicity of a complex system. There are many ways to find this underlying simplicity. One way is to break down the system until you have parts that you are able to understand. Another way is what James showed during his testing session. When you look at Voronoi diagrams, this subject matter may be considered complex by many, especially by those who do no have a background in mathematics. James tackled the problem by preparing sets of data by which it would be easy to predict what the generated diagram would look like. As James puts it:

to choose input data and configuration parameters that will result in output that is highly patterned or otherwise easy to evaluate by eye.

By simplifying the data you throw at the problem, you are better able to predict what the observed result should look like. James calls this a simplified data oracle. He used, for example, his own tool for generating pairs (I believe it was ALLPAIRS, but I am not 100% sure) to generate a simple set of combinations that would serve as input data (figure 3). Also he used his knowledge of mathematics to generate data that would display a spiraling Voronoi pattern. And indeed, a spiraling pattern was displayed (see figure 4).

On flow

A couple of loosely connected things were said about the flow of testing — or what James called the ‘tempo of testing’ — during the session. The flow of testing is impacted by the aim of testing. Testing is a deliberate art, but there is room for spontaneity to guide your testing. The balance you strike between deliberation and spontaneity (serendipity?) impacts the flow of testing. Also, a session may be interrupted, or you may want to interrupt your session at certain moments. Furthermore we talked about alteration, switching back and forth between different ideas, different parts of the application or between the application and the requirements.

On skills

In order to generate the test data for the testing of the application above, James used the following Excel sheet: voronoi data (click to download). He briefly discussed his usage of Excel and mentioned that being skilled with Excel can be a huge advantage for software testers. It is an extremely versatile tool that can be used to generate data, analyze data, gather statistics or draw up reports. I have personally used Excel, for example, to quickly analyze differences between the structures of large database tables in different test environments. Easily learned functions can help you a lot to generate insight into larger data sets.

James furthermore related a story about an assignment in which he was asked to evaluate the process that was used by a group of testers to investigate bugs. When he first asked the testers how they investigated bugs he was presented with a pretty generic four step process, such as identify > isolate > reproduce > retest or something similar. But when he investigated further and worked with the testers for some time, he learned that the testers used quite a large number of skills to judge their problems and come up with solutions. The generic process that they described when first questioned about what they did, diverted the attention from the core skills that they possessed but perhaps were unable to identify and name. Narration, as mentioned above, serves to identify and understand the skills that you use.

On acquiring skills

There are many ways to acquire the skills that are needed for testing. One way is to acquire a skill— for example a tool, a technique, or a programming language — by developing it on the job, while you’re doing the work. However, sometimes we need to have the knowledge beforehand and do not want to spend the time on the job for learning a tool or a language. For such situations James recommends creating a problem for yourself so that you can practice the tool or the technique. He showed that he is currently working on learning the programming language R this way. James reminded me of my own work for my website Testing References and having to learn (object-oriented) PHP, CSS, MySQL and the use of Eclipse for software development. This prepared me for learning other programming languages that I can use in projects. Also, I recently bought a Raspberry Pi and I am looking to do something with a NoSQL database (particularly MongoDB) on that machine, just for fun. During the evening James mentioned Apache Hadoop as a possible point of interest.

So far for this summary of the Quality Boost! Meetup with James Bach. I want to thank James Bach and Ruud Cox for providing me with additional material. I hope you enjoyed reading it.