Sunday, October 19, 2008

When Code Coverage Is Necessary But Not Sufficient

In "Can unit testing be a waste?", pbielicki argues that it is far more efficient to test at the highest possible level of abstraction and rely on your code coverage tools to ensure that all code paths have been tested. That might work in his world, but just a couple of days after reading that article I ran into a situation where developers had been doing just that and had missed a bug completely.

The C++ code implemented an algorithm from an open specification for encoding an identity string into a binary format for transmission to a remote piece of equipment. The identity string is a globally unique fifteen-digit decimal number. The string is broken up into several separate fields, and each field is encoded from decimal into binary before being stuffed into a protocol packet destined for the remote equipment. There are several encoding algorithms, selected by the length of the specific field, which ranges from one to three decimal digits.

Looking at the couple of dozen lines of C++ code in the code base, I thought "It would be reasonable to assume that if any valid identity string were to be encoded into binary, then decoded back into a decimal string, the result would match exactly the original fifteen-digit string." So, knowing exactly nothing yet about the specific algorithms from the spec, I wrote a little unit test to do precisely that.

How many of the possible values did I test for each encoding algorithm? All of them, of course. On a 2.8 gigahertz Pentium 4, cycles are easy to come by.
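
A test like that can be sketched in a few lines. This is only a minimal illustration, not the code from the project: encodeField3 and decodeField3 are hypothetical stand-ins for a three-digit field codec (here simply a 10-bit binary number), and countRoundTripFailures is a name invented for this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical codec for a three-digit decimal field (0..999), stored as a
// 10-bit binary number. An assumption for illustration only; this is NOT
// the algorithm from the specification discussed in the post.
std::uint16_t encodeField3(unsigned value) {
    return static_cast<std::uint16_t>(value & 0x3FFu);
}

unsigned decodeField3(std::uint16_t bits) {
    return bits & 0x3FFu;
}

// Encode then decode every value in [0, limit) and count the mismatches.
template <typename Encode, typename Decode>
unsigned countRoundTripFailures(unsigned limit, Encode encode, Decode decode) {
    unsigned failures = 0;
    for (unsigned value = 0; value < limit; ++value) {
        if (decode(encode(value)) != value) {
            ++failures;
        }
    }
    return failures;
}

// The unit test: all 1000 possible field values must survive the round trip.
void testField3RoundTrip() {
    assert(countRoundTripFailures(1000, encodeField3, decodeField3) == 0);
}
```

The test needs no knowledge of the encoding itself. It only relies on the property that decoding an encoded value gives back the original, which is why it can be written before reading the spec.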

What did I find? Twenty percent of the possible values failed this simple unit test.

I am pleased to report, however, that code coverage was one hundred percent. In fact, testing any single input value for any of the algorithms would have yielded code coverage of one hundred percent. Given that eighty percent of the possible values passed unit testing, it would be easy, in fact likely, to miss this bug by testing just a few selected values. Conceivably you could test a lot of values and still not see any failures, if you just happened to stay within the eighty percent that worked.
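
To make the point concrete, here is a deliberately broken codec invented for this sketch; it is not the bug I found, and encodeField2 and decodeField2 are hypothetical names. The code has no branches, so a single call executes every line, yet one fifth of the inputs fail the round trip.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, deliberately buggy codec for a two-digit field (0..99).
// It packs the tens and ones digits side by side, but gives the ones
// digit only 3 bits, so the digits 8 and 9 spill into the tens digit.
std::uint8_t encodeField2(unsigned value) {
    unsigned tens = value / 10;
    unsigned ones = value % 10;
    return static_cast<std::uint8_t>((tens << 3) | ones);  // bug: needs 4 bits
}

unsigned decodeField2(std::uint8_t bits) {
    return (bits >> 3) * 10u + (bits & 0x7u);
}

bool roundTrips(unsigned value) {
    return decodeField2(encodeField2(value)) == value;
}

// A spot check like this passes and already touches every line of the codec.
void spotCheck() {
    assert(roundTrips(42));
}

// The exhaustive check counts every value in 0..99 that fails the round trip.
unsigned countField2Failures() {
    unsigned failures = 0;
    for (unsigned value = 0; value < 100; ++value) {
        if (!roundTrips(value)) {
            ++failures;
        }
    }
    return failures;
}
```

Here spotCheck passes with full line coverage, while countField2Failures reports every value whose last digit is 8 or 9. A coverage tool cannot tell those two outcomes apart.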

It wasn't a matter of code coverage. It was a matter of input range coverage.

I have written unit tests that ran for many minutes, testing a huge range of input values. For an algorithm that did time and date calculations, I had a unit test that ran for days. Fortunately, I didn't have to run it very often. But when I did run it, and it completed successfully, I was pretty darn sure that the underlying code worked.

Cycles are cheap. When dealing with what are fundamentally mathematical algorithms, there is no reason not to test a lot of values. When in doubt, test all of them.