Friday, September 11, 2009

Integration Tests Are a Scam?

I recently listened to a talk given by J. B. Rainsberger (author of JUnit Recipes) with the title Integration Tests Are a Scam (summary notes here). If the idea seems crazy, blame the fact that he's from Canada ;) These are some quick thoughts I had, I may expand on them later.

Here's some definitions he gives:
Basic Correctness
"Given the myth of perfect technology, do we compute the right answer?"
Myth of Perfect Technology
"Assuming we can use an arbitrary large amount of memory, for an arbitrary amount of time, on a Turing machine for spherical people[...]"
Integration Tests
"...any test whose result (pass or fail) depends on the correctness of the implementation of more than one piece of non-trivial behavior."
"You should never need to write an integration test to show basic correctness." He believes our largest problems lie in basic correctness. After we get this right, then we can worry about issues of performance, security, etc. The question of basic correctness is where he focuses his efforts. (He paraphrases a quote I believe based on the Pareto Principle).

Downsides of integration testing:
  • Intergration tests are slow
  • Integration tests don't tell you where the failure occurred (may be difficult to find even with debugger, assuming TDD hasn't caused you to forget how to use one)
  • In order to have enough tests at the integration level to test thoroughly, the number of tests that need to be written increases combinatorially, based on code paths
  • There is a lot of duplication in test setup

Now, it should be noted that he is not talking about acceptance tests. He says that acceptance tests tend to be end-to-end, and that is OK. But end-to-end tests should not be used for developer tests. He is also not altogether against integration tests for finding bugs, he just doesn't want them permanently added to the project. Bugs found through an integration test should create new object tests. "I don’t doubt the necessity of integration tests. I depend on them to solve difficult system-level problems. By contrast, I routinely see teams using them to detect unexpected consequences, and I don’t think we need them for that purpose. I prefer to use them to confirm an uneasy feeling that an unintended consequence lurks."

Instead, he recommends 'collaboration tests' (commonly called 'interaction tests') and 'contract tests'. By collaboration tests, he means to stubbing out or mocking the collaborators to isolate functionality and make sure all the ways it can interact with collaborators behave as expected. This is 1/2 of the work (and actually the easier 1/2). You've checked if you've asked the right questions and able provide an answer for all the responses.
The missing piece (that commonly causes people to rely on integration tests) is a misunderstanding between the interaction of piece in question and its collaborators.

The second 1/2 is 'contract tests'. The first of the two checks on the other side of the interface is whether the collaborator able to provide a response when "the star" (Class in Test CIT) asks for it (is it implemented? can it handle the request in the first place?). The second is whether the the collaborator responds in the way the CIT is expecting. "A contract test is a test that verifies whether the implementation respects the contract of the interface it implements." There should be a contract test for every case we send the collaborator and every case the collaborator might send back. Again this will using stubbing and mocking. The advantage of this approach is that you know when you have enough tests (two for each behavior). I've tried to diagram the idea thusly:

He claims that if you ask these questions between every two services and focus on basic correctness, we can be "arbitrarily confident" in the correctness. The number of tests increases additively instead of combinatorially and is easier to maintain, with less duplication, and faster to run. If something goes wrong, you are either missing a collaboration test or missing a contract test or the tests do not agree. This makes troubleshooting easier. As of yet, there is no automated way of testing that every collaboration test has a matching contract test.

When I saw the title of the talk, I initially reacted rather violently against the notion. I'm still not sure if I'm 100% behind it, but I think there are some good points raised about integration tests and their utility. However, as Dan Fabulich points out in a reply to a response Rainsberger gave to a comment about a Mars rover failure, figuring out that you are missing a test may not come easily.
"The ability to notice things" is high magic. If you have that, you can find/fix any bug without any tests... why don't we all just "notice" our mistakes when writing production code? In this case you're just using intuition to notice a missing test, but that's no easier than noticing bugs.

As you know, I share your view that integration tests are tricky, in the sense that writing one tempts you into writing two, where instead you should be writing more isolated unit tests. But unit tests have the opposite problem: once you have some unit tests, it's too easy to assume that no more testing is necessary, because your unit tests have covered everything. By exaggerating the power of unit tests and the weakness of integration tests, you may be doing more harm than good.

Imagine you're actually coding this. You just finished writing testDetachingWhileLanded and testDetachingWhileNotLanded. (It was at this point in your story that you first began to "notice" that a test was missing.) You go back over the code and find you have 100% branch coverage of the example application. Your unit tests LOOK good enough, to a superficial eye, to an ordinary mortal. But you're still missing a critical test. How are you supposed to just "notice" this?

More generally, how are you supposed to build a habit or process that notices missing tests *in general*?

I've got just the habit: write all the unit tests you can think of, and then, if you're not sure you've got enough unit tests, do an integration test. You don't even necessarily have to automate it; just try it out once, in real life, to see if it works. If your code doesn't work, that will help you find more unit tests to write. If it does work, don't integration-test every path; you were just double-checking the quality of your unit tests, after all."
<edit>
While I wouldn't go so far as calling it 'magic', finding all the edge cases can be difficult and may require a fair amount of knowledge about the collaborator. Rainsberger later commented that his method of ensuring every condition is tested is
Every time I stub a method, I say, "I have to write a test that expects the return value I've just stubbed." I use only basic logic there: if A depends on B returning x, then I have to know that B can return x, so I have to write a test for that.

Every time I mock a method, I say, "I have to write a test that tries to invoke that method with the parameters I just expected." Again, I use only basic logic there: if A causes B to invoke c(d, e, f) then I have to know that I've tested what happens when B invokes c(d, e, f), so I have to write a test for that.

Dan Fabulich suggests adding either "Every time I stub a method that can raise an exception, I have to stub it again with a test that expects the exception" or "Every time I stub a method to return X, I also have to write a test where the stub returns Y. And Z. For all possible return values of the method." Of course, it's impossible (or at least very difficult) to be sure you've gotten all edge cases.

My takeaway from all this is that integration tests are overused, often perhaps as a half-baked attempt to remedy poor unit tests (even though the two different tests try to solve different problems). While I'm not quite ready to do away with integration tests entirely (I think they provide a useful documentation of examples of use without going into the nitty gritty details of a unit test and make a nice supplement to unit tests), I think one should recognize their place: performance testing, and as general review. NOT for finding bugs or ensuring changes didn't break anything and certainly not for finding where they occurred. One should add them as a separate module that is only built when requested, or using something like the FailSafe plugin for Maven.


</edit>

One idea that he mentions early on in the talk is the idea of having only one assert per test. This is something I'm occasionally guilty of (especially if the method being tested does several things). This should be a testing smell that may indicate the need for some refactoring.

He also mentions what first got him interested in TDD, which I thought was one of the most compelling reasons I've heard so far to use TDD. When you don't use TDD you have a seemingly endlessly depressing cycle of writing tests, fixing bugs, writing more tests, and so on...how do you know when you're finished? When you do TDD, it has a bit more definitive ending point:
  • Think about what you want to do
  • Think about how to test it
  • Write a small test. Think about the desired API
  • Write just enough code to fail the test
  • Run and watch the test fail. (The test-runner, if you're using something like JUnit, shows the "Red Bar"). Now you know that your test is going to be executed
  • Write just enough code to pass the test (and pass all your previous tests)
  • Run and watch all of the tests pass. (The test-runner, if you're using JUnit, etc., shows the "Green Bar"). If it doesn't pass, you did something wrong, fix it now since it's got to be something you just wrote
  • If you have any duplicate logic, or inexpressive code, refactor to remove duplication and increase expressiveness -- this includes reducing coupling and increasing cohesion
  • Run the tests again, you should still have the Green Bar. If you get the Red Bar, then you made a mistake in your refactoring. Fix it now and re-run
  • Repeat the steps above until you can't find any more tests that drive writing new code
(from the C2 wiki)
I like that. This would help address my previously mentioned fear of knowing when you've tested everything. (Though I'm sure it's not foolproof).