Wed 22 Aug '07
I recently bought Beautiful Code. In chapter 7, “Beautiful Tests”, Alberto Savoia writes about using randomized tests to easily create a wide range of inputs for a system under test.
I really like and use his idea of randomized tests. But there is one very important hint missing in the chapter:
Initialize the random number generator for every test to a fixed seed!
Why? Tests should be repeatable! Otherwise you can never be sure whether you have fixed a bug from the previous test run, because you don’t know whether the same test data was generated.
Why for every test? Again: tests should be repeatable! If the random number generator is initialized only once for the whole test suite, the behavior changes for all but the very first test case as soon as you run a single test in isolation or reorder the tests.
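Here is a minimal sketch of the idea in JUnit; the seed value and the sort-based example are my own illustration, not code from the book:

```java
import java.util.Arrays;
import java.util.Random;
import org.junit.Before;
import org.junit.Test;
import static org.junit.Assert.assertTrue;

public class RandomizedSortTest {
    private Random random;

    @Before
    public void seedGenerator() {
        // Re-seed before EVERY test, not once per suite: each test
        // then sees the same "random" data on every run, no matter
        // whether it runs alone or in a different order.
        random = new Random(42L); // fixed, arbitrarily chosen seed
    }

    @Test
    public void sortedOutputIsMonotonic() {
        int[] data = new int[1000];
        for (int i = 0; i < data.length; i++) {
            data[i] = random.nextInt();
        }
        Arrays.sort(data); // stand-in for the real system under test
        for (int i = 1; i < data.length; i++) {
            assertTrue(data[i - 1] <= data[i]);
        }
    }
}
```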
Yep, that’s the first requirement for reproducibility. I haven’t read the book, so I don’t know what it covers, but here are a few more points from my experience:
1. Changing any ordering in a random test will change your randomness. The only way to guard against this is to set a functional coverage point so that you know the random stimulus actually hit the scenario you are trying to repeat. The corollary is that threaded random tests are virtually impossible to repeat, unless you are using non-interruptible threads (i.e., cooperative threads).
2. Random testing needs to be correlated with code coverage or functional coverage, so it can be measured. In other words, you need to know that your random stimulus is hitting all the potential cases (see the sketch after this list). To be even more rigorous, you should start with zero coverage, run your random or saved-off random test suite, and then see what percentage of coverage you hit (hopefully you can get it to 100%).
2.1. In utopia, your random stimulus generator gets feedback based on the current coverage percentage, and then dynamically tweaks its randomization to target the missing coverage pieces. I’ve never seen this before, but you can always wish.
3. Depending on how your random stimulus works, you may be creating stimulus that is impossible in the real world, so getting your code to correctly interpret it may be a waste of time. As a corollary, debugging random stimulus failures is usually much harder than debugging directed tests, because the randomness generates very bizarre cases. However, this is often a good thing, as human-created directed tests are often boring and repetitive and usually do not yield many bugs.
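A back-of-the-envelope sketch of the functional-coverage idea from point 2; the bin names, the stimulus loop, and the reporting are invented for illustration:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

// Tiny functional-coverage collector: declare the scenarios the
// random stimulus is supposed to hit, count the hits, and report
// the coverage percentage at the end of the run.
public class CoverageDemo {
    private static final Map<String, Integer> bins =
            new LinkedHashMap<String, Integer>();

    static void declare(String bin) { bins.put(bin, 0); }
    static void hit(String bin)     { bins.put(bin, bins.get(bin) + 1); }

    public static void main(String[] args) {
        declare("empty input");
        declare("single element");
        declare("duplicates");

        Random random = new Random(42L); // fixed seed, as above
        for (int i = 0; i < 10000; i++) {
            int length = random.nextInt(8);
            int[] input = new int[length];
            for (int j = 0; j < length; j++) input[j] = random.nextInt(3);
            if (length == 0) hit("empty input");
            if (length == 1) hit("single element");
            if (hasDuplicates(input)) hit("duplicates");
            // ... drive the system under test with input ...
        }

        int covered = 0;
        for (Map.Entry<String, Integer> e : bins.entrySet()) {
            if (e.getValue() > 0) covered++;
            System.out.println(e.getKey() + ": " + e.getValue() + " hits");
        }
        System.out.println(100 * covered / bins.size() + "% of bins covered");
    }

    private static boolean hasDuplicates(int[] a) {
        for (int i = 0; i < a.length; i++)
            for (int j = i + 1; j < a.length; j++)
                if (a[i] == a[j]) return true;
        return false;
    }
}
```

A real flow would feed those percentages back into the decision of when the random suite is “done”.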
Or, at the very least, log the random generator’s seed so that if a test fails you can recreate the failure.
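Something like this; the “test.seed” property name is made up, and any mechanism for passing a seed back in would do:

```java
import java.util.Random;

public class LoggedSeed {
    public static void main(String[] args) {
        // Use a fresh seed unless one was supplied to reproduce a
        // failure; "test.seed" is a hypothetical property name.
        long seed = Long.getLong("test.seed", System.currentTimeMillis());
        System.err.println("random seed = " + seed
                + " (rerun with -Dtest.seed=" + seed + " to reproduce)");
        Random random = new Random(seed);
        // ... generate the test data from 'random' as usual ...
    }
}
```

That way every failure report already carries the one piece of information needed to replay the exact run.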