There is hopefully no need for me to discuss the benefits of unit testing, but with more and more shops adopting agile, “scrum-but” or other TDD-like processes, people are starting to realise that it’s not all plain sailing. One of the major headaches related to unit testing is the cost of maintaining the tests themselves. Unless you get smart about your testing approach, a growing number of unit tests may become a drag over time, seriously increasing the cost of any change and becoming a hindrance rather than a help.
In order to be successful with any form of TDD it’s worthwhile to consider some basic rules. With the notable exceptions of posts by Szczepan Faber and Michael Feathers, Google returns rather poor results on the subject, so I decided to give it another go. What you are (hopefully) about to read is the result of the learning I have received with other #Fellows in the “school of hard knocks”. We have learned from our, sometimes grave, mistakes, so be smart and do not repeat them.
Robert C. Martin, in his Clean Code, mentions the following FIRST rules for unit testing. According to these rules, unit tests should be:
- Fast, as otherwise no-one will want to run them
- Independent, as otherwise they will affect each other’s results
- Repeatable in any environment
- Self-validating (either failing or passing)
- written in a Timely manner (together with the code being tested, not in some yet undetermined “future”)
The rules are by all means valuable, but when it comes to the practice of programming we need to get a bit more detailed. So let’s start with the basics.
Design for testing
The prerequisite to successful unit tests is the “testability” of the code, and for the code to be testable you need to be able to inject it with mock implementations of its dependencies. In practice this means that database, network, file, UI etc access has to be hidden behind some form of abstraction (an interface). This gives us the possibility of using a mocking framework to construct those dependencies and cheaply test various combinations of test inputs. The other obvious benefit is that your tests become independent of difficult-to-maintain external “variables”, as there is no need to deploy files, databases etc. When it comes to UI dependencies (using UI controls etc), avoid them in your testable code, as they may introduce thread affinity which cannot always be guaranteed by the environment/test runner.
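As a rough illustration of the injection idea, here is a minimal sketch. It is written in Python with the standard unittest.mock module purely for brevity (the tools named in this post are .NET ones), and ReportService, load_orders and the order data are hypothetical names invented for the example.

```python
import unittest
from unittest.mock import Mock


class ReportService:
    """Hypothetical class under test; it only knows the store's interface."""

    def __init__(self, order_store):
        # The dependency is injected rather than constructed here,
        # so a test can pass in a mock instead of a real database.
        self._order_store = order_store

    def total_revenue(self, customer_id):
        orders = self._order_store.load_orders(customer_id)
        return sum(order["amount"] for order in orders)


class ReportServiceTests(unittest.TestCase):
    def test_total_revenue_sums_order_amounts(self):
        # Arrange: the mock stands in for the database-backed store.
        store = Mock()
        store.load_orders.return_value = [{"amount": 10}, {"amount": 32}]
        service = ReportService(store)

        # Act
        total = service.total_revenue(customer_id=7)

        # Assert
        self.assertEqual(total, 42, "revenue should be the sum of order amounts")
        store.load_orders.assert_called_once_with(7)
```

No database, file or network is touched, so the test stays fast and repeatable in any environment.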
A testable architecture naturally gravitates towards SOLID design principles, as individual classes tend to have a single responsibility and have every chance of remaining “solid” regardless of the project timescales and the turns in direction it may take.
Do not repeat yourself
The primary sin of unit testing is repetition, and it is easy to see how it happens: you start with one test of your class Foo. Then you add another test scenario, so now both tests have to get an instance of Foo from somewhere, and they will obviously “new()” the Foo. Then you have more methods to test and more lines of new Foo(). And one day you need to add a parameter to Foo’s constructor, and suddenly all those pesky tests need to be reworked. You may argue that it should be easy enough to add an overload, but sometimes fabrication of objects gets more complicated: you need to set properties, call some methods on them etc. For this reason my advice is: Do not Repeat Yourself (stay DRY). Use factory methods, builders etc in order to avoid the costs of refactoring which will inevitably catch up with you one day. Funnily enough, John Rayner of this shire recently produced a very nice automocking container which helps in instantiating types with a large number of dependencies.
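A factory method for tests might look something like the sketch below. It is in Python for brevity; the make_foo helper, Foo’s constructor parameters and the default values are all assumptions made up for the illustration.

```python
import unittest


class Foo:
    """Hypothetical class under test."""

    def __init__(self, name, retries):
        self.name = name
        self.retries = retries

    def describe(self):
        return f"{self.name} (retries={self.retries})"


def make_foo(**overrides):
    """The single place where tests construct a Foo.

    If Foo's constructor gains a parameter, only the defaults here
    need to change, not every test. The default values are arbitrary.
    """
    args = {"name": "default", "retries": 3}
    args.update(overrides)
    return Foo(**args)


class FooTests(unittest.TestCase):
    def test_describe_includes_name(self):
        foo = make_foo(name="alpha")
        self.assertIn("alpha", foo.describe())

    def test_describe_includes_retry_count(self):
        foo = make_foo(retries=5)
        self.assertIn("retries=5", foo.describe())
```

Each test states only the argument it cares about, which also helps to keep its intent visible.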
I find it quite interesting that developers are often willing to accept substantially lower coding standards when it comes to unit tests, often copy-pasting entire test cases only to change one line within them. Before you realise it, your test code base becomes a tangled, repetitive mess and introducing any change becomes a serious cost and challenge. If you ever come across a situation where you need to test similar cases over and over again, refactor out a common method, use the template method pattern or row-based testing as implemented in NUnit etc, but please, do not repeat yourself!
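Row-based testing can be approximated in most frameworks. The sketch below uses subTest from Python’s standard unittest module as a stand-in for NUnit’s row tests; the is_leap_year function and the table of cases are hypothetical and exist only to have something to test.

```python
import unittest


def is_leap_year(year):
    """Hypothetical function under test (Gregorian leap year rule)."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


class LeapYearTests(unittest.TestCase):
    # One row per scenario: (input, expected result).
    CASES = [
        (2000, True),
        (1900, False),
        (2024, True),
        (2023, False),
    ]

    def test_is_leap_year(self):
        for year, expected in self.CASES:
            # subTest reports each failing row separately
            # instead of stopping at the first failure.
            with self.subTest(year=year):
                self.assertEqual(is_leap_year(year), expected)
```

Adding a scenario then means adding one row rather than copy-pasting a whole test method.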
Keep the intent clear
How many times have you come across a test method called “Test1”? Not a very helpful name, is it? The purpose of a test should be crystal clear, as otherwise the original intent may become very difficult to decipher over time. And if the intent ever needs to be deciphered, it adds to the cost of change. I once worked on a project where the cost of refactoring tests was substantially higher than the cost of changing the code itself, and other than violation of the DRY rule, the primary reason was the difficulty of “rebuilding” the tests so that they tested the original behaviour exactly as intended. To make the intent of a test clear I try to follow these rules:
- Have a consistent naming convention for tests and test methods: say Ctor_WithInvalidArgs_ThrowsException(). This naming convention outlines what is being tested, how it is being tested and what is expected. Or better yet…
- Consider using NBehave to specify your expectations and assumptions. A side effect of using NBehave is that your expectations and assumptions become crystal clear
- Follow the “triple A” structure for test methods (Arrange, Act, Assert) and make it obvious in the test code what you are doing
- Test one thing (method, property, behaviour) per test method
- Add messages to asserts. Asserting that (a == b) won’t tell you much when the expectation gets violated.
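Put together, several of these rules might look like the following sketch. It is in Python for brevity, the Account class is hypothetical, and the method names follow the what_how_expected convention adapted to Python’s naming style.

```python
import unittest


class Account:
    """Hypothetical class under test."""

    def __init__(self, opening_balance):
        if opening_balance < 0:
            raise ValueError("opening balance must not be negative")
        self.balance = opening_balance

    def deposit(self, amount):
        self.balance += amount


class AccountTests(unittest.TestCase):
    # Name pattern: <what is tested>_<under which conditions>_<expected outcome>
    def test_ctor_with_negative_balance_raises_value_error(self):
        with self.assertRaises(ValueError):
            Account(-1)

    def test_deposit_with_positive_amount_increases_balance(self):
        # Arrange
        account = Account(100)

        # Act
        account.deposit(25)

        # Assert: one behaviour, with a message explaining the expectation.
        self.assertEqual(
            account.balance, 125, "deposit should add the amount to the balance"
        )
```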
Do not share static state
In order to stick with the Independent rule, it’s good practice to give up on shared static state, as it makes it very easy to introduce frustrating dependencies between tests. Other than this, static variables are either difficult or impossible to mock, or lead to seriously hacked code (a singleton with a setter etc). For these reasons any statics, singletons etc have to be avoided for the sake of testability. If you think of this restriction as harsh, think again: there is always a way to replace your singleton with another pattern (say a factory returning a shared instance) which at the end of the day produces the same results, yet is far easier to test and more flexible in the long term.
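One possible shape for “a factory returning a shared instance” is sketched below in Python. All of the names (ClockFactory, SystemClock, FixedClock, Session) are hypothetical; the point is only that the shared object reaches the code through an injected factory rather than through a static.

```python
import time


class SystemClock:
    def now(self):
        return time.time()


class FixedClock:
    """Test double whose time is set by the test."""

    def __init__(self, value):
        self.value = value

    def now(self):
        return self.value


class ClockFactory:
    """Hands out one shared clock; whoever builds the factory picks it."""

    def __init__(self, clock=None):
        self._clock = clock if clock is not None else SystemClock()

    def get(self):
        # Always the same instance, so production code still shares state,
        # but no global or static variable is involved.
        return self._clock


class Session:
    def __init__(self, clock_factory, lifetime_seconds):
        self._clock = clock_factory.get()
        self._expires_at = self._clock.now() + lifetime_seconds

    def is_expired(self):
        return self._clock.now() >= self._expires_at
```

A test can then build its own ClockFactory(FixedClock(1000.0)) and move time forward by assigning to the clock’s value, without affecting any other test.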
I sometimes come across a phenomenon which I call the mega-test: this usually involves running a process end-to-end and producing some output file (usually tens of MBs in size) which is then compared to a “master” file with the expectation that it has to be identical. This is probably the worst ever way to test software, and the only circumstance in which it could be acceptable practice is when refactoring a system which has no unit tests whatsoever. The problem with tests like this is that when they fail, investigating the cause of failure is very expensive, as you will have to trawl through tons of code to get to the bottom of the problem. A test failure should make you smarter and you should be able to say what went wrong pretty much immediately. Do not get me wrong, I am not saying here that you should not have “integration tests”. But the fact remains that the majority of the testing work should be done through unit testing of individual components rather than slow and laborious “end to end” testing which often requires difficult preparation, setup and deployment.
Avoid strict mocks
This is another constant source of grief, as strict mocks do not allow your code to deviate from predetermined expectations: a strict mock will accept the calls you set it up to expect, but anything other than this will fail. If the code under test changes a little and does a bit less or more with the mock, the test will fail and you will need to revisit it, adding extra expectations. Do not get me wrong here, strict mocks are a valuable tool, but be very careful how you use them and ask yourself: what exactly are you testing?
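The difference can be sketched as follows. Python’s unittest.mock has no strict mode as such, so comparing mock_calls against an exact list is used here as an approximation of a strict mock; register_user and its collaborators are hypothetical.

```python
from unittest.mock import Mock, call


def register_user(name, mailer, audit_log):
    """Hypothetical code under test."""
    audit_log.record("register", name)
    mailer.send_welcome(name)


def loose_check():
    mailer, audit_log = Mock(), Mock()
    register_user("ann", mailer, audit_log)
    # Only the behaviour of interest is asserted; any other
    # interaction with the mocks is tolerated.
    mailer.send_welcome.assert_called_once_with("ann")


def strict_style_check():
    mailer, audit_log = Mock(), Mock()
    register_user("ann", mailer, audit_log)
    # Approximates a strict mock: the full list of calls must match,
    # so one extra (harmless) call in register_user breaks this test.
    assert mailer.mock_calls == [call.send_welcome("ann")]
    assert audit_log.mock_calls == [call.record("register", "ann")]
```

If register_user later logs a second audit entry, loose_check keeps passing while strict_style_check has to be revisited, even though the welcome email behaviour has not changed.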
Maintain symmetry
Symmetry between the code being tested and its unit tests is all about a 1:1 relationship between the two: you have one test fixture per class, one test project per assembly etc. The reason for this symmetry is that if you do change a class, there is (hopefully) only one test fixture which you need to execute to get a good idea of whether the code still works. This setup makes life easier: if you run your tests constantly, you do not have to run all of them all the time. It also becomes easier to package code together with its tests in case you want to share it with someone.
Szczepan Faber mentions that testing class hierarchies is difficult and I agree with him, but my approach to resolving this problem is slightly different. If you maintain symmetry, every type (including base and derived types) will have a corresponding test fixture, which means that you will not get into a situation where you test the base through tests designed for the derived class. Such a situation is rather uncomfortable, as you will potentially have the same functionality tested multiple times or, worse yet, when the derived type gets scrapped, your base class will be left “uncovered”. Maintaining the symmetry will help you alleviate this issue.
Maintaining the symmetry may be tricky in the case of abstract types which cannot be instantiated, but in this case you can either derive a special “test only” type or use one of the already derived types hidden behind a factory of sorts.
Sometimes in large projects it becomes necessary to follow a certain convention in your unit tests, e.g. it sometimes makes sense to run the tests within a transaction scope, or to follow certain Setup/Teardown standards. In such cases you may want to derive all test fixtures from a common base and, to enforce the rule, write a test which inspects all test fixtures through reflection, making sure that the test classes do indeed use the common base class. I have used this approach in the past to isolate all integration tests from each other through transaction scope.
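A convention-enforcing test of this kind could be sketched as below. This is a Python approximation, not the code from the project mentioned above: TransactionalTestCase is a hypothetical base whose “transaction” is just a flag, and fixtures_missing_base inspects a namespace of classes in place of .NET reflection over an assembly.

```python
import unittest


class TransactionalTestCase(unittest.TestCase):
    """Hypothetical common base; a real one would open and roll back a transaction."""

    def setUp(self):
        self.transaction_open = True

    def tearDown(self):
        self.transaction_open = False


class OrderTests(TransactionalTestCase):
    def test_runs_inside_transaction(self):
        self.assertTrue(self.transaction_open)


def fixtures_missing_base(namespace):
    """Return sorted names of TestCase subclasses that skip the common base.

    `namespace` is a mapping of names to objects, e.g. vars(some_module).
    """
    return sorted(
        name
        for name, obj in namespace.items()
        if isinstance(obj, type)
        and issubclass(obj, unittest.TestCase)
        and obj is not unittest.TestCase
        and not issubclass(obj, TransactionalTestCase)
    )


class ConventionTests(TransactionalTestCase):
    def test_all_fixtures_use_common_base(self):
        # globals() covers the fixtures defined in this module only.
        offenders = fixtures_missing_base(globals())
        self.assertEqual(offenders, [], "fixtures must derive from TransactionalTestCase")
```

In a larger code base the same function would be applied to every test module, so a fixture that forgets the base class fails the build rather than silently leaking state.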
Do not get too clever!
The most frustrating test failures are related to someone being unusually clever and writing a test which, for example, sets a value held in a private field using reflection because the code cannot be tested otherwise. Then you happen to rename this field in good faith and everything builds, you do not see any reason to run the tests, you check in your code and the integration build bombs out. Not a pretty sight, and it’s an evil practice, so please do not try to get overly smart with your testing.