Unit Test Data Gotchas

You write unit tests, right? Unit tests are the first line of defense against bugs creeping into our code over time. But have you thought about the data you use in them? Writing tests is a great first step, but if you don’t consider the data those tests run against, you may have a false sense of security. Ignore the scenarios below and you could have tests that provide 100% code coverage but still let bugs slip through.

Invalid Data

Checking that our code handles strange values is one of the first things we typically learn to add to our tests. These tend to be the values we use when a bug is discovered, or we’re checking some of the conditions we have in our code. Some of the values you might typically test for are:

Null. The bane of our existence and the source of many bugs.
Blank strings. This includes zero-length strings as well as strings containing only whitespace.
Zero.
Negative values.
Large values, both positive and negative.
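
To make that list concrete, here is a minimal sketch in plain Java. The QuantityParser class, its parse and rejects methods, and the MAX_QUANTITY limit are all hypothetical names made up for this illustration. The point is only the set of inputs you would feed to code like this.

```java
// Hypothetical helper, invented for illustration only.
final class QuantityParser {
  // Assumed upper bound; a real limit would come from your own requirements.
  static final int MAX_QUANTITY = 1_000;

  /** Parses a quantity in the range 0..MAX_QUANTITY and rejects everything else. */
  static int parse(String text) {
    if (text == null || text.trim().isEmpty()) {
      throw new IllegalArgumentException("quantity is required");
    }
    long value;
    try {
      value = Long.parseLong(text.trim());
    } catch (NumberFormatException e) {
      throw new IllegalArgumentException("not a number: " + text, e);
    }
    if (value < 0 || value > MAX_QUANTITY) {
      throw new IllegalArgumentException("out of range: " + value);
    }
    return (int) value;
  }

  /** Returns true when parse rejects the input with an IllegalArgumentException. */
  static boolean rejects(String text) {
    try {
      parse(text);
      return false;
    } catch (IllegalArgumentException e) {
      return true;
    }
  }
}
```

A test for something like this would probably try null, "", "   ", "0", "-1", and a value far too large to fit in a long, in addition to an ordinary valid number.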

Off By One

“There are two hard things in computer science: cache invalidation, naming things, and off-by-one errors.” — Jeff Atwood

If you’re not paying attention, off-by-one errors can easily slip by you. Here are some common cases to consider:

Array or List index. Include tests that reference the last element and the first element.
String manipulation. If you’re looking for something in a string, include conditions where that character or substring is at the beginning or end of the input string.
Range checks. When you check whether a value is greater than or less than (or equal to) another, test with values right at the boundary and values just on either side of it. For example, if you have (x < 5), test with 4, 5 and 6 as inputs.
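
The three cases above might look something like this in plain Java. BoundaryExamples and its methods are hypothetical helpers written just for this sketch; only the (x < 5) check comes from the example above.

```java
import java.util.List;

// Hypothetical helpers, invented for illustration only.
final class BoundaryExamples {
  static final int LIMIT = 5;

  /** The range check from the text: true only for values strictly below 5. */
  static boolean isBelowLimit(int x) {
    return x < LIMIT;
  }

  /** Returns the last element; writing size() instead of size() - 1 is the classic slip. */
  static <T> T last(List<T> items) {
    return items.get(items.size() - 1);
  }

  /** Returns everything after the first marker, or "" when the marker is absent. */
  static String afterMarker(String input, char marker) {
    int index = input.indexOf(marker);
    return index < 0 ? "" : input.substring(index + 1);
  }
}
```

Useful inputs here would be 4, 5 and 6 for isBelowLimit, a one-element list for last, and strings such as ":abc" and "abc:" for afterMarker, so the marker sits at the very start and the very end.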

Another common variation is the fence post error. If I need to build a fence 24 feet long with 8-foot sections, how many fence posts do I need? If you answered four, congratulations! But it’s easy to make a mistake in this type of problem if you’re not paying attention. So set up your data to help you catch this case. If I asked for a fence 8 feet long, it’s easier to see that the answer is one more than the number of sections: one section, two posts.
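
As a rough sketch, the fence calculation could be written like this in Java. FencePosts and postsNeeded are hypothetical names, and rounding a partial section up to a whole one is my own assumption rather than part of the problem as stated.

```java
// Hypothetical helper, invented for illustration only.
final class FencePosts {
  /**
   * Number of posts for a straight fence: one per section, plus one to close the end.
   * Assumes both lengths are positive; a partial section is rounded up to a whole one.
   */
  static int postsNeeded(int fenceLength, int sectionLength) {
    int sections = (fenceLength + sectionLength - 1) / sectionLength; // ceiling division
    return sections + 1;
  }
}
```

With the numbers from the text, postsNeeded(24, 8) gives 4 and postsNeeded(8, 8) gives 2. A version that forgot the "+ 1" would return 3 and 1, and the 8-foot case makes that mistake hard to miss.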

Unique Values

One of the more insidious bugs is where everything appears to work, until real-world data hits your code. Consider the following class.

public class TestDataExample {
  private final String fooValue;
  private final String barValue;

  public TestDataExample(String fooValue, String barValue) {
    this.fooValue = fooValue;
    this.barValue = fooValue;
  }

  public String getFooValue() { return fooValue; }
  public String getBarValue() { return barValue; }
}

Now suppose you wrote the following test for this class:

import spock.lang.Specification
class TestDataExampleTest extends Specification {
  def 'Test Data Example'() {
    when:
      def obj = new TestDataExample('abc', 'abc')
    then:
      obj.fooValue == 'abc'
      obj.barValue == 'abc'
  }
}

You run the test, it passes, and there’s 100% coverage of the class. Life is good.

By now you have probably spotted the bug. The constructor of TestDataExample assigns fooValue to both of its fields. The test didn’t catch this because it uses the same value for both constructor arguments.

The main take-away here is to make sure you use unique values for each of your inputs. This is a simple example, but imagine you’re writing code that parses a search result. If you don’t give every field in that response a unique value, how do you know that it got to the right place when you mapped it to your object?
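
Here is a small sketch of that idea in plain Java rather than Spock. BuggyExample is a copy of the class above with the bug left in, renamed so the snippet stands alone. UniqueValueCheck and roundTrips are hypothetical names made up for the illustration.

```java
// Copy of the article's class, bug included, renamed so this sketch is self-contained.
final class BuggyExample {
  private final String fooValue;
  private final String barValue;

  BuggyExample(String fooValue, String barValue) {
    this.fooValue = fooValue;
    this.barValue = fooValue; // the bug: barValue is never used
  }

  String getFooValue() { return fooValue; }
  String getBarValue() { return barValue; }
}

// Hypothetical check, invented for illustration only.
final class UniqueValueCheck {
  /** Returns true when both getters hand back exactly the values passed in. */
  static boolean roundTrips(String foo, String bar) {
    BuggyExample obj = new BuggyExample(foo, bar);
    return obj.getFooValue().equals(foo) && obj.getBarValue().equals(bar);
  }
}
```

Calling roundTrips("abc", "abc") returns true, so the bug stays hidden. Calling roundTrips("foo-1", "bar-2") returns false, which is the failure you want your test to report.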

Data, Not Just Coverage

If you’re writing unit tests for your code, give yourself a pat on the back. You’re ahead of a lot of developers. If your tests cover a large percentage of the code, that’s even better!

Just remember that code coverage isn’t the end of the story. As you can see, it’s possible to have bugs even when there are tests that cover 100% of the code.

Think about the invalid data you could pass. Look at the code to see where the boundary conditions are. Make sure you’re using unique values for each input. Keep these things in mind and you’ll have tests that are much more robust and are much more likely to find bugs before they get anywhere near the release train.

Question: What test data do you find most useful in your unit tests?

David Hay

Building developers one line of code at a time
