The Battle of the Browser Window

Automating tests for a user interface (UI) is certainly not the easiest way of implementing test automation. When deciding to automate against a UI, one needs to overcome a number of obstacles. For example, the type of platform on which the UI runs can affect the performance of the test execution. And various aspects of size—such as the desktop resolution or the size of an application window—can seriously affect automated test execution. In this article I would like to discuss the size of the application window of a UI that runs in a browser.

The browser used for UI test automation is Google Chrome, and the automated tests are created using Java 8 and Selenium WebDriver. The tests are developed in a Windows 7 environment. Chrome is the only browser that is used for test automation. Firefox was tried but not pursued as an alternative because Chrome worked well enough. Two headless browsers were tried as well: Headless Chrome and PhantomJS. Both failed to handle overlays in the application correctly and were therefore abandoned.

In the project that I am working on, every developer has his own virtual development environment. For the development of the page objects that interact with the pages of the application, the developer uses the Chrome browser on a desktop. This desktop has a resolution of 1920 x 1080 pixels. The Chrome browser is always started maximized, and since browser toolbars and scrollbars consume some of the desktop size, we can assume that the browser window size in which the tests are developed is roughly 1900 x 980 pixels. It is not within the scope of the project to develop tests against a smaller browser window. The fact that every developer runs tests against a window size of roughly 1900 x 980 pixels means that the code in the page objects is optimized for this size. The code of the testing framework expects the HTML elements that we want to access to be inside the browser window. Once an element is outside the window, one needs an extra line of code to scroll to that element. There are several ways to do this; one is sketched below. If the automation code is run in a smaller window, an element that was inside the larger window might suddenly cause the automation code to fail, throwing an ‘element is not clickable’ WebDriver exception. So running the automated UI tests on a window size that is smaller than the size in which the tests are developed may require code changes.
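One common way to do that extra scroll—a minimal sketch, using the JavascriptExecutor interface that WebDriver provides—is to scroll the element into view before interacting with it:

import org.openqa.selenium.JavascriptExecutor;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;

public class ScrollHelper {

    // A sketch: scrolls the browser window until the element is in view,
    // which avoids the 'element is not clickable' exception in small windows.
    public static void scrollToElement(WebDriver driver, WebElement element) {
        ((JavascriptExecutor) driver).executeScript("arguments[0].scrollIntoView(true);", element);
    }
}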

Questions about resolution and size

As long as tests are only run on the developer’s personal development environment, there are no problems with resolution and size. The developer has full control over both. But as soon as tests are run on a different environment, we need to pay special attention to the resolution and the size of the browser window. This is the case when UI tests are, for example, run automatically on a daily basis on a Windows server. The tests are scheduled in a Jenkins job and are executed using Maven. But do we know the resolution and size when we run UI tests on the server?

There is no easy answer to this question. In order to have a resolution, one needs a desktop. On Windows, a desktop is tied to a user account. So we need to establish which user account is used to run the tests. On Windows, Jenkins runs as a Windows service. The Windows service runs by default under the LocalSystem account, which is a very specific account. It has extensive privileges, and there are opinions that Jenkins should not be running under this account for security reasons. It also appears that there is no Windows desktop associated with the LocalSystem account. I was unable to find conclusive evidence for whether or not the LocalSystem account has a desktop. The UI tests that run on the server are capable of starting a browser, so there must be a desktop, but it is unclear where that desktop comes from and how it can be controlled. What is clear is that the resolution of the desktop and the size of the browser are considerably smaller than on the development environment. This causes tests to fail with the exception I mentioned before. To make sure that the window size is the root cause of failure, we want to determine the exact sizes of the desktop and the browser.

Determining desktop resolution and window size

Once UI tests are run through Jenkins it is not possible to observe the browser while the tests are running. But there are at least two ways to verify the size of the browser window. One is to take a screenshot using WebDriver, as sketched below. The other is to gather the information programmatically and write it to the application log.
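Taking a screenshot is straightforward with the standard WebDriver API. A minimal sketch:

import java.io.File;
import org.openqa.selenium.OutputType;
import org.openqa.selenium.TakesScreenshot;
import org.openqa.selenium.WebDriver;

public class ScreenshotTaker {

    // A sketch: casts the driver to TakesScreenshot and returns the screenshot
    // as a (temporary) file; the image shows the actual size of the window.
    public static File takeScreenshot(WebDriver driver) {
        return ((TakesScreenshot) driver).getScreenshotAs(OutputType.FILE);
    }
}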

Using the Java Abstract Window Toolkit (AWT) it is possible to get the screen resolution. In the code example below the width and height are logged in the application log using the method logScreenResolution. The browser window size can easily be established using Selenium WebDriver. In the example below the size is logged in the method logWindowSize.

import java.awt.Dimension;
import java.awt.Toolkit;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

public class Driver {
    private static final Logger LOG = LoggerFactory.getLogger(Driver.class);
    private WebDriver driver;

    public Driver() {
        // assign to the field, not to a local variable
        this.driver = new ChromeDriver();
    }

    private static void logScreenResolution() {
        // AWT reports the resolution of the desktop the JVM is running on
        Dimension screenSize = Toolkit.getDefaultToolkit().getScreenSize();
        double width = screenSize.getWidth();
        double height = screenSize.getHeight();
        LOG.info("Desktop resolution: {} x {}", width, height);
    }

    private void logWindowSize() {
        // Selenium reports the outer size of the browser window
        LOG.info("Browser window size: {}", driver.manage().window().getSize());
    }
}

Using these methods I consistently encounter a desktop resolution far smaller than the desired 1920 x 1080. So it appears that whatever desktop is conjured up on the server is too small for running the UI tests. Furthermore, I observe that the browser window size is consistently slightly larger than the desktop resolution. It seems odd that a browser window can be larger than the desktop, but I ascribe the difference to the non-identical ways of measuring.

I spend some time digging through Google search results looking for a fix for the desktop resolution problem on Windows—some way to control the desktop—but no solution comes up. One option would be to create a new Windows user on the server, configure this user (give it the proper desktop) and have the Jenkins Windows service run under this account. However, I decide not to explore this option.

Setting the browser window size

There are ways to control the window size of the browser. But it appears that there is no way to create a browser window that is larger than the desktop resolution. In other words, a browser window will always be confined within the dimensions of the desktop. Given that precondition, it is not very relevant to look at setting the browser size unless you specifically want a smaller window. But since it is a possibility, I am including these options. For Chrome there are two ways to set the window size: one using ChromeOptions, the other using the setSize method of Selenium WebDriver. I think both ways can be used interchangeably. Note that there are also two ways to maximize the browser window. In the example below the window is maximized after the size is set. This is not very relevant, since maximizing will create a browser window that fills the entire desktop and will override the set window size. But for the sake of showing both ways to manipulate the window size, they are done within the same method.

import org.openqa.selenium.Dimension;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;

public class Driver {

    public static WebDriver getChromeDriverUsingOptions() {
        ChromeOptions options = new ChromeOptions();
        // set the window size through Chrome's command-line arguments
        options.addArguments("--window-size=1400,1000");
        options.addArguments("--start-maximized");
        return new ChromeDriver(options);
    }

    public static WebDriver getChromeDriver() {
        WebDriver driver = new ChromeDriver();
        // set the window size through the WebDriver API
        driver.manage().window().setSize(new Dimension(1400, 1000));
        driver.manage().window().maximize();
        return driver;
    }
}

The reliable desktop

Since the desktop resolution seems to be the variable that needs to be controlled in order to get to a proper window size, the quest is on for a reliable desktop. One of the ways to get a reliable and controllable desktop on the Jenkins Windows server is to run, on that server, a container that has such a desktop. And this can be done using Docker. Docker is a containerization platform that allows for the easy creation and running of lightweight (Linux) environments on a server. I will not go into the intricacies of installing Docker on Windows; it is beside the point of this article. In order to run a Docker container you need an image: a package that contains everything one needs to run an application. Selenium has developed several Docker images that can be downloaded and run instantly. The project is called Docker Selenium, and it sports an image in which a Chrome browser can be run. This image is called Standalone Chrome. Actually, the image runs Chromium—the open source variety of the Chrome browser—but for the project that I work on this is sufficient.

The question is whether the desktop of this image can be controlled. When we look at the base image of Standalone Chrome, we see that it is based on Ubuntu 16.04. Ubuntu 16.04 has a desktop, but the desktop is not used by Chromium. Instead, the X Virtual Framebuffer (Xvfb) is used to run Chromium. Xvfb is a virtual framebuffer, which means that it acts as a graphical interface on systems that have no display hardware and no graphical interface. Within Xvfb there is no actual desktop, so there is no desktop resolution. Therefore it is possible to define a browser window of any size. This definitely solves the problem and brings an end to the battle of the browser window.
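As a sketch of how this looks from the test code—assuming a Standalone Chrome container is running with its port 4444 mapped to the host, and with the size of the virtual framebuffer set through the SCREEN_WIDTH and SCREEN_HEIGHT environment variables that the Docker Selenium images support—the tests connect to the browser in the container through a RemoteWebDriver:

import java.net.MalformedURLException;
import java.net.URL;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.remote.DesiredCapabilities;
import org.openqa.selenium.remote.RemoteWebDriver;

public class RemoteDriver {

    // A sketch: connects to the Selenium server inside the Standalone Chrome
    // container. The URL is an assumption and depends on the port mapping.
    public static WebDriver getRemoteChromeDriver() throws MalformedURLException {
        return new RemoteWebDriver(
                new URL("http://localhost:4444/wd/hub"),
                DesiredCapabilities.chrome());
    }
}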

Conclusion

Having to use a browser for the testing of an application can be a nuisance in many ways. One of the problems is that a browser requires a graphical user interface. When tests are run on a Windows server it can be difficult to control the properties of the graphical user interface, such as the resolution of the desktop and the size of the browser window. Running a browser on a headless interface takes away the need to control a desktop. There is a variety of headless browsers, such as Headless Chrome or PhantomJS. But difficulties may be encountered when interacting with the application through a headless browser. The ultimate solution is to run a regular browser on a headless desktop. This way, the graphical user interface is provided by a virtual framebuffer. There are limitations with regard to the operating system (Linux) and the type of browser that can be used. But the combination of Chromium, Linux, Xvfb and—if desired—Docker can create a reliable and controllable environment for UI testing. Ironically, the struggle with the desktop resolution ends by eliminating the desktop.

By desktop resolution I mean the dimensions of the desktop, not the pixel density. So ‘desktop resolution’ is a misnomer, but it is used to make a clear distinction between the size of the desktop and the size of the browser window.

Untangling Maven dependencies

The Maven Project Object Model (POM) of a Java project contains references to all the libraries that are used in that particular project. In Maven these libraries are called dependencies, and Maven handles the dirty work of downloading the dependency packages, managing them and including them in your project. It is a formidable tool, but it can confront you with tricky puzzles from time to time.

In many Java test automation projects you’ll find references to Selenium libraries, because Selenium is a popular tool for GUI test automation. In the project that I worked on as a test automation engineer, some Selenium libraries were used. My Maven dependency challenge began when I wanted to start using Headless Chrome for UI automation. Headless Chrome is a handy browser because it does not require a display, so it is possible to run UI tests without an actual UI. It takes away some of the problems of using normal browsers for GUI automation, such as sluggishness and the need to be precise about the browser window size.
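As a minimal sketch of what that looks like in Selenium (assuming Chrome 59 or later is installed), headless mode is switched on with a single Chrome argument:

import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;

public class HeadlessDriver {

    // A sketch: the --headless flag starts Chrome without a visible UI
    public static WebDriver getHeadlessChromeDriver() {
        ChromeOptions options = new ChromeOptions();
        options.addArguments("--headless");
        return new ChromeDriver(options);
    }
}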

The starting point

I join the test automation project when it is already in full swing. As such, there is an existing Maven POM file which contains references to some Selenium libraries. There is, for example, a reference to the Selenium Chrome Driver library. This is the client library that is used in the Java code. The dependency is included in the Maven POM as shown below.

<dependency>
   <groupId>org.seleniumhq.selenium</groupId>
   <artifactId>selenium-chrome-driver</artifactId>
   <version>3.0.1</version>
</dependency>

As you can see, the version of the library is 3.0.1, which was released in October 2016.

In order to run Headless Chrome, it appears that at least version 59 of the Chrome browser is needed. This version of Chrome was released in May 2017, so in order to avoid compatibility conflicts I think it best to upgrade the version of the Selenium Chrome driver library as well. In retrospect I cannot find evidence that Chrome version 59 requires a later version of the Selenium Chrome driver library, but this does not occur to me at that moment.

Making a mess

As we saw above, the Selenium Chrome driver library can be included as a separate dependency in the Maven POM. But it is not necessary to include this driver library and the other driver libraries (such as those for Firefox and Internet Explorer) separately, since they are also compile dependencies of the encompassing Selenium Java library. So by including the Selenium Java library as a dependency I would be able to get rid of all the separate Selenium dependencies, which makes the POM a whole lot simpler. I decide to try this approach and copy the dependency from the POM of another test automation project. Here is the code that I copied.

<dependency>
   <groupId>org.seleniumhq.selenium</groupId>
   <artifactId>selenium-java</artifactId>
   <version>3.4.0</version>
</dependency>

As you notice, the version of the library is 3.4.0. This is by no means the latest version, but at the time I forget to look up what the latest version is. I am really satisfied having reduced the number of Selenium dependencies from five to one and run the ‘mvn clean install’ command to build the code. To my amazement, the code does not compile!

Untangling the web of dependencies

The compiler tells me that a certain method used in the Java code is not recognized anymore. The method is JsonObject.keySet(), which is used in a part of the code that processes JSON files. The only thing I changed in the project was the Selenium dependencies, so I am flabbergasted by the fact that suddenly the compiler is tripped up by a JSON error. The two are entirely unrelated. The import statements of the class in which the keySet() method is used show that the Google Gson library is somehow imported. This means that the Google Gson library should be mentioned in the Maven POM file. I look at the POM file. There is no Google Gson dependency! I roll back the changes that I made to the POM file. Still no Google Gson dependency! Where does this dependency come from?

Then I remember that Maven does a lot of the heavy lifting for you. Libraries can have compile dependencies, and Maven manages those compile dependencies for you. We saw that the Selenium Java library has all the Selenium drivers as compile dependencies and that declaring a dependency on the Selenium Java library relieves us from having to import these driver libraries separately. So the Google Gson library must be a compile dependency of a library that is actually mentioned in the POM file!
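Maven itself can also make such transitive dependencies visible. A quick sketch (assuming a standard Maven installation): the dependency:tree goal of the Maven dependency plugin prints each declared dependency together with the transitive dependencies it pulls in, optionally filtered to a single library.

mvn dependency:tree -Dincludes=com.google.code.gson

This shows which declared dependency drags in Gson, and in which version.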

I use IntelliJ as an IDE, and IntelliJ has a useful array of plugins. One of the plugins is the Maven Helper. This plugin allows you to dig deeper into the web of dependencies that is set up in the POM. It has a handy little search box that allows you to search through the dependencies and their compile time dependencies. Using this tool I find that the Google Gson library is actually a compile time dependency of the Firefox driver library. It is a bonus—so to say—that comes with the Firefox driver package. See the screenshot below for the search using the Maven Helper plugin.

Furthermore, I notice that the version of the Firefox driver library that I use needs version 2.8.2 of the Google Gson library. Then I roll forward the changes that I made in the POM file and see that version 3.4.0 of the Selenium Java library that I introduced only has version 2.8.0 of the Google Gson library as a compile dependency. Could it be that the keySet() method that was not recognized after I made the changes to the POM was introduced in version 2.8.1 or version 2.8.2 of the Google Gson library? Luckily the Google Gson Change Log provides the answer: the keySet() method was introduced in version 2.8.1. So I cannot use version 3.4.0 of the Selenium Java library, unless I declare a separate dependency for the Google Gson library in the POM. But that would lead to a dependency conflict.
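Such a separate dependency would look like the snippet below (a sketch; as said, I decide against it, because it would pin a Gson version next to the one Selenium expects).

<dependency>
   <groupId>com.google.code.gson</groupId>
   <artifactId>gson</artifactId>
   <version>2.8.2</version>
</dependency>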

Fixing it

Then I look again at the version of the Selenium Java library. Is 3.4.0 really the latest version, or is there a later version that includes version 2.8.2 of the Google Gson library as a compile dependency? A quick search in the Maven repository tells me that version 3.11.0 is actually the latest version and that this version has version 2.8.2 of the Google Gson library as a compile dependency. I change the version of the Selenium Java library in the POM and run ‘mvn clean install’ again. This time the code builds successfully.

Used tools

  • IntelliJ
  • IntelliJ Maven Helper plugin
  • Maven repository browser
  • Google Search

Pointing to Pyramids

When we look at the collection of test automation pyramids that have been published over the last couple of years, it is hard to get a clear picture of the model's (original) purpose and the significance of its variations. It appears that every test automation pyramid we encounter is a new model in itself, due to sometimes slight and sometimes fundamental adjustments. There is no doubt about its popularity as a model in software testing. But its development and use are troubled by the rather reckless treatment of its lineage.

There are many reasons why the field of software testing easily loses track of what has been produced in the past. One of them is that it is hard to find good references. Finding references requires the study of literature, and this is something that is often taken for granted. As an example of how the field of software testing obfuscates rather than clarifies the history, the evolution and the use of its models, I would like to take a few lines from an article that was recently published in the magazine Tea-time with Testers.

Up front I must mention that Tea-time with Testers is a magazine that is offered for free to the testing community. It is a platform for those who desire to contribute to the field. I think we should appreciate any initiative that tries to improve the state of testing, and that we should respect the people, such as Lalitkumar Bhamare (editor of Tea-time with Testers), who invest their time in these initiatives. Furthermore, we should respect the writers who invest their time in sharing their experiences with the community.

In the January edition of Tea-time with Testers there is an article entitled The Agile Testing Pyramid: Not so much about Tools But more about People And Culture. In the article a reference is made to Mike Cohn’s test automation pyramid. The article itself is an experience report of the implementation of a test automation strategy in accordance with the test automation pyramid. There are some interesting conclusions and on the whole it is nice to read about the application of the pyramid in practice.

It is not my goal to criticize the article. Rather, I want to use the way it references the test automation pyramid as an example of how inaccuracies in referencing can obscure the intentions of a model. I could have taken another article or even a book about software testing to show that same lack of accuracy, or sometimes even the total lack of acknowledgement of the existence of a predecessor.

At the end of the article the following is stated:

The Agile Testing Pyramid is an agile test automation concept developed by Mike Cohn.

And…

For more information: Succeeding with Agile: Software Development Using Scrum. Mike Cohn, Pearson Education, 2010

The Agile Testing Pyramid as shown in the article

The picture beside the text in the article shows the test automation pyramid with the three layers—unit, service and UI—as developed by Mike Cohn. The pyramid that is shown in the article strongly resembles the pyramid that is published in the book Succeeding with Agile, which was published by Addison-Wesley in November 2009, ISBN 0321579364. It is highly likely that the (2010) reference in the article actually intends to point to this book, but we cannot be one hundred percent sure. Here is why…

The article states that Mike Cohn’s concept is called the Agile Testing Pyramid. But in the text of Succeeding with Agile, there is no mention of an Agile Testing Pyramid. Cohn calls his concept the test automation pyramid, throughout the entire book. So can we assume that there has been a slight inaccuracy, that the naming of the pyramids (remember there are many of them) has gotten mixed up, but that the article still intends to point to Cohn’s 2009 pyramid? Maybe we can.

There are in fact quite a number of articles and blogs that refer to Cohn’s 2009 pyramid as the Agile Testing Pyramid, so it is not uncommon to make this mistake. There is even a book called Agile Swift by Godfrey Nolan that refers to Cohn’s 2009 pyramid as the Agile Testing Pyramid and displays an image of a totally different pyramid. Any awkwardness in this area, however, can be avoided by looking up the pyramid in Cohn’s book and referring to his work correctly.

The test automation pyramid by Mike Cohn, 2009

The question we should ask ourselves is whether test automation pyramid means something different from agile testing pyramid. It seems that the testing community believes that these names can be used interchangeably. And practitioners in the field throw other names into the mix, such as the testing pyramid, the software testing pyramid, the test pyramid, or the agile test automation pyramid. But if we do not use the name that was assigned to the model by its author, how can we be sure we indicate that particular model? We can, for example, reference the source. Without that, the name Agile Testing Pyramid can mean just about anything. Luckily the article mentions the source of the pyramid. But as we saw, the reference is slightly incorrect. We already suspected that the authors intended to point to the 2009 book. If there is any remaining doubt at all, it is removed by the fact that a picture of a model closely resembling Cohn’s 2009 model is shown in the article. These three facts combined, though they each have their flaws, point to the intended model. So, for the article that lies before us, we need this three-step method for establishing that we have the correct model in mind.

  • Indicate the name of the model.
  • Make a reference to the publication of the model.
  • Add a picture of the model.

Had the name and the reference been correct, the first two steps would have sufficed. And yet we are lucky that the authors of the article spent some effort trying to guide us to the intended model. Sometimes we are not so lucky. Sometimes Cohn’s 2009 pyramid is referred to as Cohn’s pyramid. The fact that we know of at least two disparate pyramids that were published by Mike Cohn (one in 2004 and one in 2009) means that referring to Cohn’s pyramid is not enough to point to the intended model.

Another delightful example of creating mysteries using references is the attempt to point to Cohn’s 2009 pyramid by referring to it as the original pyramid. This phrase is used in the text of the article too.

The original agile testing pyramid knows three levels: unit tests, services and UI tests

The other original pyramid: the test automation pyramid by Lisa Crispin and Janet Gregory, 2009

At this point in the article we already know which model is intended by the original agile testing pyramid. Nevertheless, the phrase is wrong. If we define original as earliest, then Cohn’s 2009 pyramid is not the original. There are at least a couple of test automation pyramids that were published before November 2009. None of these pyramids seem to have been popular enough to leave a lasting impression on mainstream testing. But to designate Cohn’s 2009 pyramid as the original would be distorting the truth. Even if we leave out the lesser known models, there is one contender for the title of original testing pyramid: the model that was published by Lisa Crispin and Janet Gregory in the popular book Agile Testing in January 2009.

One of the root causes of pyramid mayhem in software testing is that there is a huge amount of uncertainty about what came first and what came in what particular order. Because of this it is nearly impossible to discern any pattern or direction in the development of the test automation pyramid as a concept. If we take the Utopian view of the development of software testing, we hope that it will go along the lines of the scientific method. This way a model evolves from a certain initial model, and newer versions are created by testing, building on, extending and refining that initial model. In reality what we have is a Cambrian explosion of models, because we do not know which models exist and how they have been tried. Each sloppy reference just stirs the soup and makes it a little bit murkier. It is because of this that the following situations may happen in practice.

  • The author is blissfully unaware of any test automation pyramid. It seems to him that his idea is new to the field of software testing. Must. Publish.
  • The author has heard about the existence of a test automation pyramid. He googles for it and after having skimmed the first three search results decides that his idea is new and fresh and the world needs to know about it.
  • The author has heard about the existence of a test automation pyramid. He googles for it and reads the first search result. The pyramid in the first search result was published a couple of years ago and does not reference any older pyramid. He decides that his pyramid is an improvement over the pyramid shown in the first search result and decides to publish, referencing the first search result.
  • The author has a bit more knowledge of the field of software testing. He has read the book Agile Testing by Crispin and Gregory. He knows there is a pyramid in that book. He publishes a piece about the development of the Agile pyramid and references the book as containing the original Agile pyramid.
  • The author has found Mike Cohn’s 2009 test automation pyramid. He references the pyramid in his own work and places beside it a picture of a totally different pyramid. I have actually found two examples of this, so there are probably more.

The moment when it clicks

There are moments when something suddenly clicks. Something that appeared to be veiled and impossible to understand suddenly becomes intelligible and clear. It is an astonishing moment that occasionally happens when we study something. My understanding of my own learning of a particular subject—whether it be a tool or a domain—is that it happens gradually. I add pieces to the puzzle and over time a more complete picture evolves. It can be a tedious affair. Sudden insight, as if passing through a door that unexpectedly opens, does not happen to me a lot.

Yet last week I had such a moment. I had been toying around with Kibana during the last couple of weeks, but without a lot of success. We use Kibana to sift through the logging that is generated in the production environment. We try to gather relevant statistics—signals—through the aggregation of the logged data. I personally think the tester should familiarize himself with the usage of logs to analyze what is going on in production. The data gathered can inform testing, tell him about the actual usage of the product, reveal risk and help him direct his testing.

So, since our team uses Kibana (Kibana 3, to be precise), I felt like I had no excuse to dodge that bullet. I probably could have gotten away with not looking at the logging. In my team there are at least two engineers who regularly look at the dashboards, and I could have left it up to them to monitor the production environment and perhaps run some queries for me. But I personally wanted to get more out of monitoring, and so I had to try to tackle the Elastic Stack.

For weeks I struggled with the Kibana dashboard. The queries and filtering seemed counter-intuitive and the results almost random. The creation of rows and panels (the layout of the dashboard) baffled me. It was my first encounter with Log4j and Tomcat logging, and my inexperience with many of the parts of the Elastic Stack caused frustration. I would spend a couple of hours creating some queries but never ended up with the right result. The Elastic query DSL just failed to make a logical connection in my head. I looked up tutorials and some instruction videos on YouTube, but I did not advance. It was like knocking at the same door all the time, only to find it shut tight.

And last week the door suddenly opened. In the space of an hour I went from hitting keys in frustration to freely and joyfully playing around with the tool. I do not think there is a single thing that unlocked the door, but in retrospect there are some things that helped. I’d like to offer a quick examination of those things.

First off, last week I set myself a small, well-defined Kibana task, prompted by the following. My team uses a Grafana dashboard to keep track of the errors that are generated in the production environment. The dashboard is shown on a wide-screen television that is on all the time. Errors appear on our dashboard, but it seems that we pay only marginal attention to them. The lack of interest that I noticed is a common one. It is the same lack of interest that can be observed when putting the results of flaky automated tests on a dashboard. Over time, the lack of trust in the results of these tests causes a kind of boredom, the shutting out of the false alarm. Since the Grafana dashboard does not facilitate splitting up the errors by root cause but Kibana does, my only task was to split up the errors by root cause and thereby increase our insight into the errors. This task was within my reach. The fact that there were some examples, created by other teams, readily available also helped.

Second, I finally took the time to notice the things that were going on in the Kibana dashboard. I should have paid attention to them long ago, but I think my frustration got in the way. For example: it is pretty easy to create a query in Kibana that will run indefinitely. Setting the scope of the query to a large number of days can do that for you. It will leave you guessing endlessly about the flakiness of your query, unless you notice the tiny, tiny progress indicator running in the upper right corner of the panel.

Also, different panels of the dashboard react differently to the results of the query. The table panel, which shows a paginated table of records matching your query, can show results pretty quickly, but a graph potentially takes a lot of time to build up. This seems downright obvious, and yet understanding this dynamic takes away a lot of the frustration of working with a Kibana dashboard. It is a delicate tool, and you have to think through each query in terms of performance.

Thirdly, I think determination also contributed to the click moment. I desperately wanted to win the battle against Kibana and take away some of the fuzziness of the dashboard. Last week I noticed a difference between the number of errors as shown in the Grafana dashboard and the number of errors (for the same time period) as gathered from Kibana. So there was a bug in our dashboard. Then I knew for certain that Kibana can serve as a testing tool. Once I was fully aware of its potential, I knew there was only one way forward.

Giving the good advice

In my previous post I tried to explain that software testing, as it happens in practice, cannot be represented as a logical and orderly (coherent) sequence of events. There are too many factors—specific circumstances—that influence the decision making process that guides testing. The study of testing is the study of this decision making process. It is driven by detailed information about the circumstances. And yet, in many publications about software testing, these details are disregarded. What remains is a representation of testing that is orderly, classified and coherent; a representation that often lacks the details (facts) by which we can verify the selected approach, method or model.

One such representation—which I mentioned in my comment on the previous post—is the catalog; a list of systematically arranged items. Usually, such a list contains items that are abstracted from reality. Items can be resources (such as software testing tools), skills, characteristics, methods, practices, preferences and many other things. The list can have many purposes; it can serve, for example, as a checklist, an overview, a heuristic, a process description or a guideline. In software testing we have a wide diversity of catalogs and my argument is that we focus too often on achieving an orderly, classified and coherent representation, while forgetting about reality.

As an example I would like to discuss a catalog that is presented in the EuroSTAR blog post G(r)ood testing 25: Tips for how to boost Unit testing as a Functional Tester. The catalog that is offered is called ‘Tips for boosting Unit testing’. In order to understand what the list looks like, a part of it is presented below.

Tips for boosting Unit testing

  • Focus on doing useful tests (tests deliver information) rather than discussing whether the tests are unit tests or functional tests
  • Ask the developer for Help (or offer him to help?), rather than telling him he needs to do UT
  • Pair the tester and developer together while defining the tests
  • Demonstrate the damage of not doing UT – E.g. showing that late found defects slow everyone down
  • Organize coding Dojo’s where you put unit testing on the agenda
  • Introduce test design techniques to the developers
  • Get an understanding about the build and deployment process

I am currently involved in setting up unit testing in an Agile team, so the topic of unit testing—and putting unit testing on the agenda—is familiar to me. When I read through the list that is mentioned in the blog post, at first glance I feel that the advice that is presented is sensible and practical. I even applied some of the tips that are on the list. It appears to contain solid and practical advice for someone wanting to make more of unit testing.

But is it really solid and practical advice? The fact that it looks very familiar to me might suggest that it is folklore, which is defined by Merriam-Webster as “an often unsupported notion, story, or saying that is widely circulated.” Could it be that these tips are widely circulated and therefore look very familiar to me? This could be an interesting area for further investigation. And then the next question: is it good advice? Some state that “most advice is terrible”. This statement is made in one of the first links that turned up when I searched for the phrase “good or bad advice” in Google. It is interesting, because it suggests that this list of tips for unit testing that we are looking at is likely to be terrible. But it also suggests that we should probably look further and come to a more nuanced perspective of what advice means. In order to get to this nuanced perspective we need to study the nature of the advice, its context, the persons involved, et cetera.

In order to evaluate the advice that is given to us, we need to know more about it. The blog post reveals quite a number of things. The list of tips was the outcome of a brainstorm session that was held during the 21st testing retreat in Château de Labusquière in Montadet, France. We learn that senior testers (twelve in total) from various countries were present. The fact that all testers were senior suggests that the advice has been derived from years of experience in software testing. With the means at hand it is not possible to find out if that experience encompasses unit testing, so we have to assume it does. But all in all, the word ‘senior’ suggests that the advice should be good, since it must have been tested extensively in practice.

There are other aspects of the advice that can be learned from the text. For example, a motivation for giving the advice is stated (to help testers in their struggle), and in a very general sense a couple of situations are described in which the advice might be of use. But apart from the fact that the advice is likely to come from experience, we have no other indications that the advice is good. Moreover, we lack information by which we can verify whether the advice is good or not. Sure, one can apply each of the tips to the best of one’s abilities, but if this leads to (horrible) failure, how does this reflect on the quality of the advice? Perhaps the advice was bad indeed, but the failure to deliver may also have been caused by shortcomings on the tester’s part. Perhaps she did not understand the advice or she lacked the skills to apply it. Furthermore, there may have been circumstances adverse to implementing the advice. Perhaps the timing was wrong, the order in which it was applied was incorrect or certain preconditions were not met.

By now it should be obvious that we lack two things. Firstly, the criteria by which we can evaluate the success or failure of the advice that is given are not present. And secondly, we lack information about the context in which the advice is applicable and what is needed to apply it. Without these criteria it is possible to give any sort of advice. If I wanted, for example, to get from Utrecht in the Netherlands to New York, the advice might be to get on a plane. This advice contains a huge amount of implicit assumptions that are essential to its success. In other words: the advice is useless. Perhaps the same can, for example, be said of the advice to “introduce test design techniques to the developers.” I can see how test design techniques help us make an informed decision about domain coverage, but aren’t test design techniques usually based on detailed functional specifications? What is there to cover when we are coding and learning about the functioning of the application in parallel? And how do we know which technique to apply if we know little about risk? And if a test design technique tells us that we should write, say, thirty unit tests, how would I deal with the boredom of writing those tests? How would I handle the frustration of the developer who wrote these elaborate design-technique-based tests and has to throw them out three sprints later because of new insights with regard to the functionality? And which tests should I write that are not based on test design techniques? And what should they be based on?

As an afterthought, it may be a nice exercise in critical evaluation to add tips to the list and see if the list gets better or deteriorates because of it. If creativity is needed just look up the Celestial Emporium of Benevolent Knowledge, which is a brightly shining example of a catalog.

Never in a straight line

The theme of the seventh annual peer conference of the Dutch Exploratory Workshop on Testing (DEWT7) is lessons learned in software testing. In the light of that theme I want to share a lesson recently learned.

Broadly stated, the lesson learned is that nearly any effort in software testing develops in a non-linear way. This may seem like a wide open door, but I find that it contrasts with the way software testing is portrayed in many presentations, books and articles. It is likely that due to the limitations of the medium, decisions must be made to focus on some key areas and leave out seemingly trivial details. When describing or explaining testing to other people, we may be inclined to create coherent narratives in which a theme is gradually developed, following logical steps.

Over the last couple of months I came to realize something that I have been experiencing for a longer time: the reality of testing is not a coherent narrative. Rather, it is a series of insights based on a mixture of (intellectual) effort and will, craftsmanship, conflicts and emotions, personality and personal interests and, last but certainly not least, circumstance, among which chance and serendipity. The study aimed at the core of testing is the study of the decision making process that the software tester goes through.

My particular experience is one of balancing many aspects of the software development process in order to move towards a better view of the quality of the software. I spent six full weeks refactoring an old (semi) automated regression test suite in order to be able to produce test data in a more consistent manner. As expected, there was not enough time to complete this refactoring. Other priorities were pressing, so I got involved in the team’s effort to build a web service and assisted in setting up unit testing. My personal interest in setting up unit testing evolved out of my conviction that the distribution of automated tests as shown in Cohn’s Test Automation Pyramid is basically a sound one. The drive to make more of unit testing was further fueled by a presentation by J.B. Rainsberger (Integrated Tests Are A Scam). I used unit testing to stimulate the team’s thinking about coverage. I was willing to follow through on setting up a crisp and sound automation strategy, but having set some wheels in motion I had to catch up with the business domain. With four developers in the team mainly focusing on code, I felt (was made to feel) that my added value to the team was in learning as much as needed about why we were building the software product. To look outward instead of inward. And this is where I am at the moment, employing heuristics such as FEW HICCUPPS and CRUSSPIC STMPL (PDF) to investigate the context. It turns out that my investment in the old automated regression test suite to churn out production-like data is now starting to prove its worth. Luck or foresight?

All this time a test strategy (a single one?) is under development. Actually, there have been long (and I mean long) discussions about the test approach within the team. I could have ‘mandated’ a testing strategy from my position as the person in the team with the most experience in testing. Instead I decided to provide a little guidance here and there, but to keep away from a formal plan. Currently the test strategy is evolving ‘by example’, which I believe is the most efficient way and also the way that keeps everyone involved and invested.

The evolution of the understanding of the quality of the software product is not a straight path. Be skeptical of anything or anyone telling you that testing is a series of more or less formalized steps leading to a more or less fixed outcome. Consider that the evolution of the understanding of quality is impacted by many factors.

Solving a Ten Thousand Piece Puzzle

On the third of March a meeting was organized by Improve Quality Services (my employer) and the Federation of Agile Testers in Utrecht, the Netherlands. The evening featured James Bach as speaker, and his talk focused on the paper A Context-Driven Approach to Automation in Testing, which was written by him and Michael Bolton. My favorite part of the evening was the part during which James tested some functionality of an application and explained his way of working. He provided a similar demonstration a year ago when introducing the test autopsy.

The exercise

This time around, the subject under test was the distribution function of the open source drawing tool Inkscape, and the focus was on the usage of tools to test this functionality. It must be said that Inkscape lends itself to the usage of tools, because it stores all the images that are generated with it in the Scalable Vector Graphics (SVG) format, which is an open standard developed by the World Wide Web Consortium (W3C). This greatly increases the intrinsic testability (link opens PDF) of the product, as we will see.

The SVG format is described in XML and as such, the image is a text file that can be analyzed using different tools. It is also possible to create text files in the SVG format that can then be opened and rendered in Inkscape. As such, the possibilities for creating images by generating the XML using code are virtually limitless. Before the start of the presentation James had created a drawing containing 10,000 squares. He created this drawing using some script (I am not sure he mentioned in which language it was written). My initial reaction to James showing the drawing that he generated was one of astonishment. I was impressed by his idea of testing this functionality with 10,000 squares, by the drawing itself and by the fact that it was generated using a script.

Impressed by complexity

Looking back, my amazement may have been caused by my lack of experience with Inkscape and the SVG format. But it also reminded me that it is easy to be impressed by something new, especially if this new thing seems to be complex. I believe that, in testing, if you really want to impress people — for all the wrong reasons — all you need to do is present a certain subject to them as being complex. People will revere you because you are the only one who seems to understand the subject. The exercise, as James walked us through it, seemed complex to me, and this is what triggered me to investigate it.

So why use ten thousand squares?

I am sure it was not James’ intention to impress us, so then the question is: why would he use ten thousand squares? Actually, this question occurred to me halfway through doing the exercise myself, when tinkering with the distribution function. Distributing, for example, 3 squares is easy; it does not require a file generated with a script. Furthermore, it is easy to draw conclusions from the distribution of 3 squares. Equal distribution can be ascertained visually (by looking at the picture) with a reasonable degree of certainty. So if equal distribution functions correctly with 3 squares, why would there be a problem with 10,000 squares? I am assuming that the distribution algorithm does not function differently based on the amount of input. I mean, why would it? So, taking this assumption into account, using 10,000 squares during testing does only the following things:

  1. It complicates testing, because it is no longer possible to ascertain equal distribution visually.
  2. Because of this, it forces the tester to use tools to generate the picture and to analyze the results.
  3. It complicates testing, because the loading of the large SVG file and the distribution function take a significant amount of time.
  4. It tests the performance of the distribution function in Inkscape.

Now the testing of the performance is not something I want to do as a part of this test. But it seems that working with 10,000 squares adds something meaningful to the exercise. A distributed image generated from 10,000 squares does not allow for a quick visual check and therefore simulates a degree of complexity that requires ingenuity and the use of tools if we want to check the functioning of distribution. Working with large data sets and having to distill meaning from a large set is, I believe, a problem that testers often face. So, as an exercise, it is interesting to see how this can be handled.

A deep dive into the matter

Some of the tools I use

  • Inkscape (for viewing and manipulating images)
  • Python (for writing scripts)
  • Kate (for editing scripts and viewing text files)
  • KSnapshot (for creating screen shots)
  • Google (for looking up examples & info)
  • R (for statistical analysis)

In an attempt to reproduce James’ exercise, I create a script to generate this drawing myself. In order to do so, I need to find out a little bit about the SVG standard. Then I create an Inkscape drawing containing one square, in order to find out the XML format of the square. Now I have an SVG file that I can manipulate, so I have enough to start scripting. I install Python on my old Lubuntu netbook, which is easy to do. I never did much programming in Python before. I could have written the script in PHP or Java, the two programming languages about which I know a fair amount, but it seems to me that Python is fairly lightweight and suitable for the job. It can be run from the command line without compilation, which contributes to its ease of use.

So I write a Python script that creates an SVG file with 10,000 squares in it. Part of the script is displayed below. I look up most of the Python code by Googling it and copy-pasting from examples, so the code is not written well, but it works. I can run the script from the command line and it generates the file in the blink of an eye. The file is just about 2.4 MB, which is fairly large for a text file, and when I open it using Inkscape, the program becomes unresponsive for a couple of seconds. Apparently the program has some difficulty rendering the drawing, which is understandable, given that the file is large and the netbook on which I run the application is limited in both processing power and internal memory (2 GB). Yet, the file opens without errors and shows a nice grid of 10,000 squares.

Python script for creating the squares

# Minimal SVG header and footer; assumed here, since the original script
# defines these elsewhere. The document size is a guess.
begin = '<svg xmlns="http://www.w3.org/2000/svg" width="850" height="13000">'
end = '</svg>'

with open('many_squares.svg', 'w') as f:
    f.write(begin)

    x = 0
    y = 0
    offset = 12
    number_of_squares = 10000

    while number_of_squares > 0:
        # one 10 x 10 square at position (x, y)
        square = '''<rect
        style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        id="rect3336"
        width="10"
        height="10"
        x="%d"
        y="%d" />''' % (x, y)
        # wrap to the next row after roughly 800 pixels
        if x + offset > 800:
            y = y + offset
            x = 0
        else:
            x = x + offset
        f.write(square)
        number_of_squares = number_of_squares - 1
    f.write(end)

Which results in the following picture.

The regular grid of 10,000 squares created with the Python script

A close up of the grid created with the Python script


The Inkscape distribution functions

I now have a grid of 10,000 squares with which I am trying to reproduce James’ exercise. The thing that I run into is that Inkscape has a number of distribution options. I am not sure which distribution James applied, so I try a couple. However, none of them produce as a final result the image that James showed during his presentation – as far as I can remember it was an oval shape with a higher density of objects near the edges. Initially it seems strange that I am unable to reproduce this, but through tinkering with the distribution function I conclude that this probably depends on the input. The grid that I create with the script contains identical squares of 10 by 10 pixels, evenly spaced (12 pixels apart) along the x and the y axes. It may differ in many aspects (for example, size, spacing and placement of the objects) from the input that James created.

Developing an expected result

I apply the Inkscape distribution functionality (distribute centers horizontally and vertically) to my drawing containing the 10,000 squares, and the result is as shown below. The resulting picture looks somewhat chaotic to me. I cannot identify a pattern, and even if I could, I would not be sure if this pattern is what I should be seeing. There seem to be some lines running through the picture, which seems odd. But in order to check the distribution properly, I need to develop an expected result, using oracles, by which I can check whether the distribution is correct.

The entire distributed drawing

A close up of the distributed drawing (it kind of looks like art)

I do several things to arrive at a description of what distribution means. First I consult the Inkscape manual with regard to the distribution functions that I used. The description is as follows.

Distribute centers equidistanly horizontally or vertically

Apart from the spelling mistake in the manual, the word that I want to investigate is ‘equidistant’. It means — according to Merriam-Webster —

of equal distance : located at the same distance

Distance is a complex concept. The Wikipedia page on distance is a nice starting point for the subject. I simplify the concept of distance to suit my exercise by assuming a couple of things. My definition is as follows: distance is the space between two points, expressed as the physical length of the shortest possible path through space between these points that could be taken if there were no obstacles. In short, the distance is the length of the path in a straight line between two points.
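In the two-dimensional Cartesian coordinates used in the remainder of this exercise, this is simply the Euclidean distance between two points:

distance((x1, y1), (x2, y2)) = sqrt((x2 - x1)^2 + (y2 - y1)^2)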

There are other things I need to consider. The space in which the drawing is made is two-dimensional. This might seem obvious, but it is important to realize that every single point in the picture can be identified with a two-dimensional Cartesian coordinate system. In short, every point has an x and a y coordinate (which we already saw when generating the SVG file), and this realization greatly helps me when I try to analyze the picture. Another question I need to answer is which two points I use. This is tricky, because in my exercise I used the center of each square as the reference point for distribution. Since I am dealing with squares, width and height are equal, and since all the squares in my drawing have the same width and height, I have simplified my problem in such a way that I can use the x and y coordinates of the top left corner of each square (which can be found in the SVG file) for further analysis. There is no need for me to calculate the center of each object and do my analysis on those coordinates.

And lastly I need to clarify what distribution means. It turns out that there are at least two ways to distribute things. I came across an excellent example in a Stack Exchange question. In this question the distinction is made between spreading out evenly and spacing evenly. To spread out evenly means that the centers of all objects are distributed evenly across the space. To space evenly means that the distance between the objects is the same for all objects. The picture below clarifies this.

Types of distribution (source: Stack Exchange)

In my special case — I am working exclusively with squares that are all the same size — to spread out evenly means to space evenly. So the distinction, while relevant when talking about distribution, matters less to me. Aside from the investigation described above, I spoke with several coworkers about this exercise, and they gave me some useful feedback on how I should regard distribution.

To make a long story short, my expected result is as follows.

Given that all the objects in the drawing are squares of equal size, if the centers of all the squares are distributed equally along the x axis, then I can analyze the x coordinates of the top left corners of all squares. If the x coordinates are sorted in ascending order, I should find that the difference between one x coordinate and the x coordinate immediately following it is the same for all x coordinates. The same should go for the y coordinates (vertical distribution).

This is what I’m looking for in the drawing with the distributed squares.

Some experiments in R

In order to do some analysis, I need the x and y coordinates of the top left corner of all the squares in the drawing. It turns out to be fairly easy to distill these values from the SVG file using Python. Again, I create a Python script by learning from examples found on the internet. The script, as displayed below, extracts from the SVG file the x and y coordinates of the top left corner of each square and then writes these coordinates to a comma-separated values (csv) file. The csv file is very suitable as input for R.

Python script for generating the csv file containing the coordinates

# read the distributed drawing and collect the x and y attribute values
coordinates = []
with open("many_squares_distr.svg", "r") as svg:
    for line in svg:
        if line.find(' x=') != -1 or line.find(' y=') != -1:  # line contains an x or y coordinate
            # record the positions of the double quotes around the attribute value
            positions = []
            for pos, char in enumerate(line):
                if char == '"':
                    positions.append(pos)
            value = line[positions[0]+1:positions[1]]
            if line.find('x=') != -1:
                x = value
            if line.find('y=') != -1:
                # the y attribute follows the x attribute, so the pair is complete
                y = value
                coordinates.append([x, y])

# write the coordinates to a csv file, one point per row
with open('coordinates.csv', 'w') as f:
    f.write('X,Y\n')
    for row in coordinates:
        f.write('%s,%s\n' % (row[0], row[1]))
Now we come to the part that is, for me, the toughest part of the exercise on which I consequently spent the most time. The language that I use for the analysis of the data is R. This is the language that James also used in his exercise. The language is entirely new to me. Furthermore, R is a language for statistical computation and I am not a hero at statistics. What I know about statistics dates back to a university course that I took some twenty years ago. So you’ll have to bear with the simplicity of my attempts.

It is not difficult to load the csv file into R. It can be done using this command.

coor <- read.csv(file="coordinates.csv",head=TRUE,sep=",")

A graph of this dataset can be created (plotted).

plot(coor)

Resulting in the picture below.

The plotted x and y coordinates of all squares

After that, I create a new dataset that only contains the x values, using the command below.

xco = coor[,1]

And then I sort the x values ascending.

xcor <- sort(xco)

Then I use the following command

plot(xcor)

to create a graph of the result as displayed below.

The plotted sorted x coordinates

The final result is a practically straight line because (as I expected) for each data point the (x) value increases by the same amount, resulting in a linear function. This satisfies my needs, so this is where I stop testing. I could have created, using a Python script, a dataset containing all the differences between the consecutive x coordinates, and I could have checked the distribution of these differences with R. I leave this for another time.
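As a quick sketch of that follow-up check (using the sorted vector xcor from above): R’s built-in diff function computes the differences between consecutive values, so no separate Python script would be needed.

dx <- diff(xcor)
summary(dx)

If the centers are distributed equally, all differences should be (nearly) the same.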

Afterthoughts

One of the questions you might ask is whether I really tested the distribution functionality. My answer would be a downright ‘No’. I used the distribution functionality in an exercise, but the goal of the exercise was not to test the functionality. The goal was to see what tools can do in a complex situation. If I had really investigated the distribution functionality, I would have created a coverage outline and I would certainly have tried different kinds of input. Also, I would have had to take a more in-depth look at the concept of distribution.

One of the results of this exercise is that I know a little bit more about scripting, about the language R and about vector images. Also, I learned that the skills related to software testing are manifold and that it is not easy to describe them. I particularly liked describing how I arrived at my definition of the expected result, which meant investigating different sources and drawing conclusions from that investigation. I feel that the software tester should be able to do such an investigation and to build the evidence of testing on it. I also learned again that complexity is a many-headed monster that often roams freely in software development. Testers need to master the tools that can tame that beast.

Some exploration took place in the form of ‘tinkering’ with the distribution function of Inkscape. This helped me build a mental model of distribution. Furthermore, I toyed with R on a ‘trial and error’ basis, in order to find out how it works.