Tweaking automated checks - part three

29/01/2016 19:26
Generating random data
 
The next part of my tweaks to writing checks is drawn from a talk by Nicole Rauch and Marc Philipps at the XPdays Germany 2010. I'm pretty bad at coming up with random data, and I don't like doing it. It also clutters my code when all I really want is to set up some useful data. So I'd rather invest a little time in writing something that does this for me and hides the complexity that I don't need to know about when reading a check.
 
In an earlier post I mentioned test data generation as part of my test approach. This time I will elaborate a bit, using ScalaTest with ScalaCheck as an example of how it can look. The concepts that I explain here work with any QuickCheck-based framework that I have encountered so far, even if the syntax differs.
 
Suppose you are testing some code around two classes, Product and Client, that look somewhat like this:
 
case class Product(id: Long, started: Long, name: String)

case class Client(group: String, name: String, id: Long, products: List[Product])
 
So every time I want to use Clients in my checks, I need to create one and give it a list of Products, which I have to create as well unless I keep that list empty. That looks like a lot of work, with a high risk of clashing with my "significance needs".
 
So instead of doing that every time, I write myself generators that take care of it:

import org.scalacheck.Gen

// Basic generators: a random Long for ids and a small non-negative Int
def genLong = Gen.choose(0L, 99999L)
def genLowPosInt = Gen.choose(0, 100)

// Combines the basic generators into a generator for Products
val product: Gen[Product] =
    for {
      id <- genLong
      started <- genLowPosInt
      name <- Gen.alphaStr
    } yield Product(id, started, name)

// Builds Clients, each with a random (possibly empty) list of Products
val client: Gen[Client] =
    for {
      group <- Gen.alphaStr
      name <- Gen.alphaStr
      id <- genLong
      products <- Gen.listOf(product)
    } yield Client(group, name, id, products)

genLong and genLowPosInt are generators that choose a number randomly, whereas product and client are more elaborate generators that combine the basic ones. Referring to client now gives me a generator for Clients, not yet the Clients themselves. I personally use generators in two different ways:
 
 
1. Sampling 
 
If I have a check that is expensive to run and I just need some randomized data, I sample the generator directly:
 
val randomClient = client.sample.get
 
By sampling I get a brand new Client every time (sample returns an Option, hence the get). Every generator inside this generator is triggered and gives me a new value. The values are then pieced together into a new Client that is as random as the generators involved.
 
Now I can use this random data in my checks and do the significant things with it. For example, I swap the list of products for an empty one in one check and vary the group in another, while the rest of the Client stays random.
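
To make that a bit more concrete, here is a minimal sketch of how pinning only the significant part could look. It assumes that Product and Client are case classes (so that copy is available); the object name SamplingSketch and the helper randomClient are made up for this illustration.

```scala
import org.scalacheck.Gen

object SamplingSketch {
  // Assumption: both classes are case classes, which gives us `copy` for free.
  final case class Product(id: Long, started: Long, name: String)
  final case class Client(group: String, name: String, id: Long, products: List[Product])

  val product: Gen[Product] =
    for {
      id      <- Gen.choose(0L, 99999L)
      started <- Gen.choose(0L, 100L)
      name    <- Gen.alphaStr
    } yield Product(id, started, name)

  val client: Gen[Client] =
    for {
      group    <- Gen.alphaStr
      name     <- Gen.alphaStr
      id       <- Gen.choose(0L, 99999L)
      products <- Gen.listOf(product)
    } yield Client(group, name, id, products)

  // Hypothetical helper: `sample` returns an Option, so fail loudly
  // instead of calling `.get` blindly.
  def randomClient(): Client =
    client.sample.getOrElse(sys.error("client generator produced no value"))

  // Only the significant part is pinned; everything else stays random.
  val clientWithoutProducts: Client = randomClient().copy(products = Nil)
  val clientInGroupA: Client        = randomClient().copy(group = "A")
}
```

Every call to randomClient() gives a fresh Client, so the two values above share nothing except the way they were generated.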
 
If I need a setup of properties for a whole series of checks, I can also make the generator more specific. For example, I can ensure that the list of products is always empty, always has at least one element, or has precisely three.
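
A sketch of such tailored generators could look like the following. The helper clientWith and the three generator names are my own invention for this example; Gen.const, Gen.nonEmptyListOf and Gen.listOfN are ScalaCheck combinators. The classes are again assumed to be case classes.

```scala
import org.scalacheck.Gen

object SpecificGenerators {
  final case class Product(id: Long, started: Long, name: String)
  final case class Client(group: String, name: String, id: Long, products: List[Product])

  val product: Gen[Product] =
    for {
      id      <- Gen.choose(0L, 99999L)
      started <- Gen.choose(0L, 100L)
      name    <- Gen.alphaStr
    } yield Product(id, started, name)

  // Hypothetical helper: builds a client generator around whatever
  // product-list generator is passed in.
  def clientWith(products: Gen[List[Product]]): Gen[Client] =
    for {
      group <- Gen.alphaStr
      name  <- Gen.alphaStr
      id    <- Gen.choose(0L, 99999L)
      ps    <- products
    } yield Client(group, name, id, ps)

  // Never any products
  val clientWithoutProducts: Gen[Client] = clientWith(Gen.const(List.empty[Product]))
  // Always at least one product
  val clientWithProducts: Gen[Client] = clientWith(Gen.nonEmptyListOf(product))
  // Precisely three products
  val clientWithThreeProducts: Gen[Client] = clientWith(Gen.listOfN(3, product))
}
```

Passing the list generator in as a parameter keeps the rest of the Client random while the check states clearly which property of the data it relies on.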
 
If you need to know which data made your check fail, you need to be careful here. Although the QuickCheck frameworks I have seen support shrinking, the information about the seed with which the data was generated is lost when sampling. To know with what data your check fails, you'll need to get that information into your error message somehow. More on that topic in the last post of this series.
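
One possible way to get the sampled data into the error message is a small wrapper around the check. This is only a sketch and not necessarily what the last post of this series does; withSample and SampleReporting are hypothetical names, and the wrapper uses nothing beyond Gen.sample and plain exceptions.

```scala
import org.scalacheck.Gen
import scala.util.control.NonFatal

object SampleReporting {
  // Hypothetical helper: samples once, runs the check, and puts the sampled
  // value into the failure message if the check throws.
  def withSample[T](gen: Gen[T])(check: T => Unit): Unit = {
    val value = gen.sample.getOrElse(sys.error("generator produced no value"))
    try check(value)
    catch {
      case NonFatal(e) =>
        // Keep the original failure as the cause so no detail is lost.
        throw new AssertionError(s"Check failed for generated value: $value", e)
    }
  }
}
```

A check would then wrap its body, for example withSample(client) { c => ... }, and a failure would name the exact Client that triggered it.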
 
 
2. Crowding
 
For unit tests, as they are cheap, I often don't sample a single case from the generator but use the property-based testing functionality that was in every QuickCheck-based library I have seen so far. You feed the check with the generator instead of the values and let it run x times with different property setups. In Scala that looks like this:

"Clients" should "have products" in {
  forAll(client) { (c: Client) =>
    c should haveProducts
  }
}

For those who haven't looked into ScalaTest yet: the first line basically just names the test and turns it into one. The second line is where the magic happens: forAll(client) causes the check to run multiple times. Unless configured otherwise, that is 100 times, each time with a different Client from the client generator. The rest of the code reads basically as "for every generated Client c, c should haveProducts". haveProducts is a wrapper function that lets me write readable English; it checks that the list of products is not empty.
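
The same idea also works without the ScalaTest wrapper. The following sketch uses ScalaCheck's own Prop.forAll and Test.check instead of the ScalaTest forAll from the check above, and it assumes a generator that always produces at least one product. The names CrowdingSketch, clientWithProducts and clientsHaveProducts are made up for the example.

```scala
import org.scalacheck.{Gen, Prop, Test}

object CrowdingSketch {
  final case class Product(id: Long, started: Long, name: String)
  final case class Client(group: String, name: String, id: Long, products: List[Product])

  val product: Gen[Product] =
    for {
      id      <- Gen.choose(0L, 99999L)
      started <- Gen.choose(0L, 100L)
      name    <- Gen.alphaStr
    } yield Product(id, started, name)

  // Assumption: for this check the clients are generated with at least one product.
  val clientWithProducts: Gen[Client] =
    for {
      group    <- Gen.alphaStr
      name     <- Gen.alphaStr
      id       <- Gen.choose(0L, 99999L)
      products <- Gen.nonEmptyListOf(product)
    } yield Client(group, name, id, products)

  // The property: every generated client has at least one product.
  val clientsHaveProducts: Prop =
    Prop.forAll(clientWithProducts) { (c: Client) => c.products.nonEmpty }

  // Require 100 successful runs, each with a freshly generated client.
  val result: Test.Result =
    Test.check(Test.Parameters.default.withMinSuccessfulTests(100), clientsHaveProducts)
}
```

With a generator built on Gen.listOf instead of Gen.nonEmptyListOf, the same property can be falsified, because empty product lists are then among the generated values.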
 
This way you can run the check multiple times without writing new checks for every value combination. Sometimes I find a bug just by writing such a check, because the generators also produce null and empty values unless you explicitly tailor them otherwise. The code therefore undergoes a second round of thinking about its corner cases. And instead of trying to check all the important cases at once, you check different values, controlled by properties, with every run. Over time you cover much more and more varied ground than by picking just a few examples. In my cases that is usually good enough.
 
Disclaimer: I call this kind of testing crowding, but don't expect this to be any kind of official name that you can look up. I just don't know a better way to describe it in contrast to sampling.
 
Feel free to comment. The next and last part of this series is here. If you want to go back to the second post within this series, you'll find it here.
