What is code coverage and how do YOU measure it?
I was asked this question regarding our automated-testing code coverage. It seems to me that, outside of automated tools, it is more art than science. Are there any real-world examples of how to use code coverage?
Code coverage is a measurement of how many lines/blocks/arcs of your code are executed while the automated tests are running.
Code coverage is collected by using a specialized tool to instrument the binaries so that they emit tracing calls, and then running a full set of automated tests against the instrumented product. A good tool will give you not only the percentage of the code that is executed, but will also allow you to drill into the data and see exactly which lines of code were executed during a particular test.
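Conceptually, the instrumentation boils down to planting hit counters at interesting points in the code. The hand-written Java sketch below is only an illustration of that idea (the class, probe ids and counter are made up); real tools inject equivalent probes into the binaries or bytecode for you and map them back to source lines.

import java.util.BitSet;

public class InstrumentedExample {
    // One flag per probe point; a real tool maps probe ids back to source lines.
    static final BitSet HITS = new BitSet();

    static void coverageHit(int probeId) {
        HITS.set(probeId);
    }

    static int abs(int x) {
        coverageHit(0);         // probe: method entered
        if (x < 0) {
            coverageHit(1);     // probe: negative branch taken
            return -x;
        }
        coverageHit(2);         // probe: non-negative branch taken
        return x;
    }

    public static void main(String[] args) {
        abs(5);                 // stand-in for the automated test run
        // Probe 1 never fires, so the negative branch would be reported as uncovered.
        System.out.println("probes hit: " + HITS);
    }
}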
Our team uses Magellan - an in-house set of code coverage tools. If you are a .NET shop, Visual Studio has integrated tools to collect code coverage. You can also roll some custom tools, like this article describes.
If you are a C++ shop, Intel has some tools that run on Windows and Linux, though I haven't used them. I've also heard there's the gcov tool for GCC, but I don't know anything about it and can't give you a link.
As to how we use it - code coverage is one of our exit criteria for each milestone. We actually have three code coverage metrics: coverage from unit tests (from the development team), from scenario tests (from the test team), and combined coverage.
BTW, while code coverage is a good metric of how much testing you are doing, it is not necessarily a good metric of how well you are testing your product. There are other metrics you should use along with code coverage to ensure the quality.
Code coverage basically tells you how much of your code is covered under tests. For example, if you have 90% code coverage, it means 10% of the code is not covered under tests.
I know you might be thinking that if 90% of the code is covered, it's good enough, but you have to look from a different angle. What is stopping you from getting 100% code coverage?
A good example would be this:
if(customer.IsOldCustomer())
{
}
else
{
}
Now, in the code above there are two paths/branches. If you always hit the "if" branch, you are not covering the "else" part, and that will show up in the code coverage results. This is good, because now you know what is not covered and can write a test to cover the "else" part. Without code coverage, you are just sitting on a time bomb, waiting for it to explode.
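As a hedged sketch of how you would close that gap, assuming a hypothetical Customer class and JUnit (the names and the discount rule are invented for illustration), one test per branch is enough to bring the block to full branch coverage:

import static org.junit.Assert.assertEquals;

import org.junit.Test;

// Hypothetical class under test, only for illustration.
class Customer {
    private final int yearsActive;
    Customer(int yearsActive) { this.yearsActive = yearsActive; }
    boolean isOldCustomer() { return yearsActive >= 5; }
    double discount() { return isOldCustomer() ? 0.10 : 0.0; }
}

public class CustomerDiscountTest {
    @Test
    public void oldCustomerGetsDiscount() {      // exercises the "if" branch
        assertEquals(0.10, new Customer(7).discount(), 1e-9);
    }

    @Test
    public void newCustomerGetsNoDiscount() {    // exercises the "else" branch
        assertEquals(0.0, new Customer(1).discount(), 1e-9);
    }
}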
NCover is a good tool to measure code coverage.
Just remember, having "100% code coverage" doesn't mean everything is tested completely: while it means every line of code is executed, it doesn't mean each line is tested under every (common) situation.
I would use code-coverage to highlight bits of code that I should probably write tests for. For example, if whatever code-coverage tool shows myImportantFunction() isn't executed while running my current unit-tests, they should probably be improved.
Basically, 100% code-coverage doesn't mean your code is perfect. Use it as a guide to write more comprehensive (unit-)tests.
Adding a few points to complement many of the previous answers:
Code coverage means how well your test set covers your source code, i.e. to what extent the source code is exercised by the set of test cases.
As mentioned in the answers above, there are various coverage criteria, like paths, conditions, functions, statements, etc. Some additional criteria worth covering are:
Condition coverage: every boolean sub-expression is evaluated to both true and false.
Decision coverage: not just evaluating boolean expressions to true and false once, but also exercising every corresponding if/else-if/else body (a small sketch contrasting condition and decision coverage follows this list).
Loop coverage: has every loop been executed zero times, exactly once, and more than once? Also, if there is an assumed maximum iteration count then, where feasible, test at that maximum and at one more than the maximum.
Entry and exit coverage: test every possible call and every possible return value.
Parameter value coverage (PVC): check whether all possible values for a parameter are tested. For example, a string could commonly be any of these: a) null, b) empty, c) whitespace (space, tabs, new line), d) a valid string, e) an invalid string, f) a single-byte string, g) a double-byte string. Failure to test each possible parameter value may leave a bug. Testing only one of these could still result in 100% code coverage, since each line is covered, but because only one of the seven options is tested, that means only roughly 14% coverage of parameter values.
Inheritance coverage: in object-oriented code, when a method returns a derived object through a base-class reference, the cases where sibling derived objects are returned should also be tested.
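To make the first two criteria concrete, here is a small made-up Java example; the comments spell out which inputs decision coverage requires compared with condition coverage:

public class ShippingRules {

    // The whole expression is one decision; isMember and total > 100 are its conditions.
    static boolean freeShipping(boolean isMember, double total) {
        return isMember && total > 100;
    }

    public static void main(String[] args) {
        // Decision coverage only needs the overall result to be both true and false:
        //   (true, 150)  -> decision is true
        //   (false, 150) -> decision is false
        // Condition coverage additionally needs each sub-condition to take both
        // values, so a third input such as (true, 50) is needed to make
        // "total > 100" evaluate to false as well.
        System.out.println(freeShipping(true, 150));   // true
        System.out.println(freeShipping(false, 150));  // false
        System.out.println(freeShipping(true, 50));    // false
    }
}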
Note: static code analysis will find unreachable or dangling code, i.e. code not reached by any function call, along with other static checks. But even if static code analysis reports that 100% of the code is reachable, it says nothing about whether your test set actually exercises all of the possible coverage.
Code coverage has been explained well in the previous answers. So this is more of an answer to the second part of the question.
We've used three tools to determine code coverage.
JTest - a proprietary tool built over JUnit. (It generates unit tests as well.)
Cobertura - an open source code coverage tool that can easily be coupled with JUnit tests to generate reports.
Emma - another open source tool; this one we've used for a slightly different purpose than unit testing. It has been used to generate coverage reports when the web application is accessed by end users. Coupled with web testing tools (for example Canoo), this can give you very useful coverage reports which tell you how much code is covered during typical end-user usage.
We use these tools to
Check that developers have written good unit tests
Ensure that all code is traversed during black-box testing
Code coverage is simply a measure of the code that is tested. There are a variety of coverage criteria that can be measured, but typically it is the various paths, conditions, functions, and statements within a program that make up the total coverage. The code coverage metric is then just the percentage of those criteria that your tests execute.
As far as how I go about tracking unit test coverage on my projects, I use coverage analysis tools to keep track.
For Perl there's the excellent Devel::Cover module which I regularly use on my modules.
If the build and installation are managed by Module::Build, you can simply run ./Build testcover to get a nice HTML report that tells you the coverage per sub, line and condition, with nice colors making it easy to see which code paths have not been covered.
Code coverage has been explained well in the previous answers, so this is just some tool-related knowledge: if you are working on the iOS and OS X platforms, Xcode provides the facility to measure and monitor code coverage.
Reference Links:
https://developer.apple.com/library/archive/documentation/DeveloperTools/Conceptual/testing_with_xcode/chapters/07-code_coverage.html
https://medium.com/zendesk-engineering/code-coverage-and-xcode-6b2fb8756a51
Both are helpful links for learning and exploring code coverage with Xcode.
The purpose of code coverage testing is to figure out how much code is being tested. Code coverage tools generate a report which shows how much of the application code has been run. Code coverage is measured as a percentage; the closer to 100%, the better. It is an example of a white-box test. Here are some open source tools for code coverage testing:
Simplecov - For Ruby
Coverlet - For .NET
Cobertura - For Java
Coverage.py - For Python
Jest - For JavaScript
For PHP you should take a look at Sebastian Bergmann's php-code-coverage repository on GitHub:
Provides collection, processing, and rendering functionality for PHP code coverage information.
https://github.com/sebastianbergmann/php-code-coverage
What code coverage IS NOT
To truly understand what code coverage is, it is very important to understand what it is not.
A couple of answers/comments here and on related questions have alluded to this:
Franci Penov
BTW, while code coverage is a good metric of how much testing you are doing, it is not necessarily a good metric of how well you are testing your product.
steve
Just because every line of your code is run at some point in your tests, it doesn't mean you have tested every possible scenario that the code could be run under. If you just had a function that took x and returned x/x and you ran the test using my_func(2) you would have 100% coverage (as the function's code will have been run) but you've missed a huge issue when 0 is the parameter. I.e. you haven't tested all necessary scenarios even with 100% coverage.
KeithS:
However, the flip side of coverage is actually twofold: first, a test that adds coverage for coverage's sake is useless; every test must prove that code works as expected in some novel situation. Also, "coverage" is not "exercise"; your test suites may execute every line of code in the SUT, but they may not prove that a line of logic works in every situation.
No one says it more succinctly and to the point than Mark Simpson:
Code coverage tells you what you definitely haven't tested, not what you have.
An Illustrative Example
I spent some time writing a reply to a feature request that Istanbul (a Javascript test coverage tool) "Change definition of coverage to require more than 1 hit" per line. No one will ever see it there, so I thought it might be useful to reuse the gist of it here:
A coverage tool CANNOT prove that your code is tested adequately. All it can do is tell you that you provided some kind of coverage for every line of code in your codebase, but even then it doesn't prove the coverage means anything, because a test might execute a line of code without making any assertions on its results. Only you as a developer can decide the actual semantically unique input variations and boundary conditions that need to be covered by tests and ensure that the test logic does in fact make the right assertions.
For example, say you have the following Javascript function. A single test that asserts an input of (1, 1) returns 1 would give you 100% line coverage. What does that prove?
function max(a, b) {
  return a > b ? a : b
}
Putting aside for a moment the semantically poor coverage of this test, the 100% line coverage is rather misleading too, as it doesn't provide 100% branch coverage. That's easily seen by splitting the branches onto different lines and rerunning the line coverage report:
function max(a, b) {
  if (a > b) {
    return a
  } else {
    return b
  }
}
or even
function max(a, b) {
  return a > b ?
    a :
    b
}
What this tells us is that the "coverage" metric depends too much on the implementation, whereas ideally testing should be black box. And even then it's a judgement call.
For example, would the following three input cases constitute complete testing of the max function?
(2, 1)
(1, 2)
(1, 1)
You'd get 100% line and 100% branch coverage for the above implementations. But what about non-number inputs? Ok, so you add two more input cases:
(null, 1)
(1, null)
which forces you to update the implementation:
function max(a, b) {
  if (typeof a !== 'number' || typeof b !== 'number') {
    return undefined
  }
  return a > b ? a : b
}
Looking good. You have 100% line and branch coverage, and you've covered invalid inputs.
But is that enough? What about negative numbers?
The ideal of 100% blackbox coverage is a fantasy
In my opinion, in this situation, given the simple nature of this function, testing negative-number cases is overkill. If the situation were different, say the function only existed because we needed to implement some tricky algorithm or optimization that may or may not work as expected for negative numbers, then I'd add more input cases, including negative numbers.
Often, you only discover corner cases because you have hundreds or thousands of users; only through their using your software in unexpected ways, or in conditions and software environments you could not foresee or reproduce, are such rare cases exposed. And often those rare cases are artifacts of the nature of your implementation, not something you'd arrive at from analysis of an idealized abstraction of the buggy code's interfaces.
I think what that shows is the ideal of 100% blackbox coverage is a bit of a fantasy. You would waste a lot of time writing unnecessary tests if you treated everything as an idealized black box. In the example above, I know the implementation uses a simple and reliable non-number check and then uses the native Javascript logic to compare values (a > b), and that it would be silly to do anything more complex. Knowing that, I'm not going to test passing in negative numbers, floats, strings, objects, etc.
At the end of the day, you have to be practical and use good judgement, and that judgement usually cannot ignore knowing something about the nature of what's in the black box, or at least the assumptions made inside the black box.
All this said, I don't have a CS degree. What's the equivalent of IANAL for programmer advice?
Related
I have a WCF service which interacts with a database, the file system and a few external web services, then creates the result, XML-serializes it and finally returns it.
I'd like to write tests for this solution and I'm thinking about how to do it (it all uses dependency injection and design by contract).
There are 3 main approaches I can take.
1) I can pick the smallest units of code/methods and write tests for them: pick one class and isolate it from its dependencies (other classes, etc.). Although this guarantees quality, it takes a lot of time to write the tests, and that's slow.
2) Only make the interactions with external systems mockable and write some tests that cover the main scenarios from when the request is made until the response is serialized and returned. This tests all the interactions between my classes but mocks all external resource access.
3) I can set up a test environment where the interactions with external web services, file access and database access actually happen, then write the tests end to end. This requires environment setup and depends on all the other systems being up and running.
About #1, I see no point in investing the time/money/energy in writing tests for every single method or piece of code that I have. It seems like a waste of time.
About #3, since it depends on external resources/systems, it's hard to set up and keep running.
#2 sounds like the best option to me, since it tests what should be tested: only my system and all its classes, with all other external systems mocked.
So basically, my conclusion after some years of experience with unit tests is that writing unit tests is a waste to be avoided, and that isolated system tests instead give the best return on investment.
Even if I were going to write the tests first (TDD) and then the production code, I still think #2 would be best.
What's your view on this? Would you write small unit tests for your application? Would you consider it a good practice and the best use of time/budget/energy?
If you want to talk about quality, you should have all 3:
Unit tests to ensure your code does what you think it does, expose any edge cases and help with regression. You (developer) should write such tests.
Integration tests to verify correctness of entire process, whether components talk to each other correctly and so on. And again, you as a developer write such tests.
System-wide tests in production-like environment (with some limitations naturally - you might not have access to client database, but you should have its exact copy on your local machines). Those tests are usually written by dedicated testers (often in programming languages different from application code), but of course can be written by you.
The second and third types of tests (integration and system) take far too much effort to exercise the edge cases of smaller components; that is what you usually want unit tests for. You need integration tests because something might fail when hooking up individually tested, verified and correct modules. And of course system testing is what you do daily during development, or what you have assigned people (manual testers) do.
Going for only a selected type of test from the list might work up to a point, but it is far from a complete solution or quality software.
All 3 are important, and they target different test types: a matrix of unit/integration/system categories with positive and negative testing in each category.
For code coverage, unit testing will yield the highest percentage, followed by integration and then system testing.
You also need to consider whether the purpose of the test is validation (does it meet the final user\customer requirements, i.e. value) or verification (is it written to specification, i.e. correct).
In summary, the answer is "it depends". I would recommend following the SEI CMMI model for Verification and Validation (i.e. testing), which begins with the goals (value) of each activity and then subjects that activity to measures that ultimately allow the whole process to be continuously improved. In this way you have isolated the What and Why from the How, and you will be able to answer time- and value-type questions for your given environment (which could be a life-support system or a tweet-of-the-day app for your favorite aunt).
Summary: #2 (integration testing) seems most logical, but you shouldn't hesitate to use a variety of tests to achieve the best coverage for pieces of your codebase that need it most. Shooting for having tests for "everything" is not a worthy goal.
Long version
There is a school of thought out there where devs are convinced that adopting unit\integration\system tests means striving for every single chunk of code being tested. It's either no test coverage at all, or committing to testing "everything". This binary thinking always makes adopting any kind of testing strategy seem very expensive.
The truth is, forcing every single line of code\function\module to be tested is about as sound as writing all your code to be as fast as possible. It takes too much time and effort, and most of it nets very little return. Another truth is that you can never achieve true 100% coverage in a non-trivial project.
Testing is not a goal unto itself. It's a means to achieve other things: final product quality, maintainability, interoperability, and so on, all while expending the least amount of effort possible.
With that in mind, step back and evaluate your particular circumstances. Why do you want to "write tests for this solution"? Are you unhappy with the overall quality of the project today? Have you experienced high regression rates? Are you perhaps unsure about how some module works (and more importantly, what bugs it might have)? Regardless of what your exact goal is, you should be able to select pieces that pose particular challenges and focus your attention on them. Depending on what those pieces are, an appropriate testing approach can be selected.
If you have a particularly tricky function or a class, consider unit testing them. If you're faced with a complicated architecture with multiple, hard to understand interactions, consider writing integration tests to establish a clean baseline for your trickiest scenarios and to better understand where the problems are coming from (you'll probably flush out some bugs along the way). System testing can help if your concerns are not addressed in more localized tests.
Based on the information you provided for your particular scenario, external-facing unit testing\integration testing (#2) looks most promising. It seems like you have a lot of external dependencies, so I'd guess this is where most of the complexity hides. Comprehensive unit testing (#1) is a superset of #2, with all the extra internal stuff carrying questionable value. #3 (full system testing) will probably not allow you to test external edge cases\error conditions as well as you would like.
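As a minimal sketch of what approach #2 can look like, in Java terms (the interfaces, service and values are all hypothetical, and Mockito is just one way to stub the external edges; a .NET shop would reach for Moq or similar), the real internal wiring of the service runs while only the boundaries are faked:

import static org.junit.Assert.assertEquals;
import static org.mockito.Mockito.*;

import org.junit.Test;

// Hypothetical boundary interfaces: only the edges of the system get mocked.
interface ExchangeRateClient { double rateFor(String currency); }
interface QuoteRepository { double basePrice(String productId); }

// Hypothetical service under test: real internal logic, injected dependencies.
class QuoteService {
    private final ExchangeRateClient rates;
    private final QuoteRepository repo;
    QuoteService(ExchangeRateClient rates, QuoteRepository repo) {
        this.rates = rates;
        this.repo = repo;
    }
    double quote(String productId, String currency) {
        return repo.basePrice(productId) * rates.rateFor(currency);
    }
}

public class QuoteServiceTest {
    @Test
    public void quotesArePricedInTheRequestedCurrency() {
        ExchangeRateClient rates = mock(ExchangeRateClient.class);
        QuoteRepository repo = mock(QuoteRepository.class);
        when(rates.rateFor("EUR")).thenReturn(0.9);
        when(repo.basePrice("widget")).thenReturn(100.0);

        QuoteService service = new QuoteService(rates, repo);

        // Exercises the real service logic from request to result,
        // with only the external resources replaced by mocks.
        assertEquals(90.0, service.quote("widget", "EUR"), 1e-9);
    }
}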
I have a function which saves photos (stored in a database; the app gives the user the option to save them to a directory) to a given directory. Now, this was not working correctly and I just fixed it. Should I write a unit test or an integration test for the function?
For your case, you want to write an integration test to cover the scenario you mention. I have a full post on this topic. However, here's a summarized version specific to your question:
In his book The Art of Unit Testing, Roy Osherove describes a key principle: a unit test must be "trustworthy". On the surface, this seems fairly obvious. However, this underlying principle highlights some of the key differences between a unit test and an integration test.
With a trustworthy test, you must be able to trust the results 100% of the time. If the test fails, you want to be certain that the code is broken and must be fixed. You shouldn't have to ask things like "Was the database down?", "Was the connection string OK?", "Was the stored procedure modified?". If you find yourself asking these questions, it shows that you aren't able to trust the results and you likely have a poorly designed "unit test".
As your scenario describes a situation with similar multiple dependencies, you want to cover it with an integration test. Again, for more details, see my full post here as well.
Good luck!
Integration tests and unit tests have different scopes and purposes:
Unit tests test small pieces of code (like a function) in isolation from the rest of the program, ideally covering all possible edge cases (like exceptions, null parameters, etc.)
Integration tests test an entire application from a use case point of view. They can never cover all edge cases, but they can catch problems with the interaction between parts of the code and the glue code that joins them together which unit tests often miss
For a single function, you can really only have a unit test, and you should. But you could also have an integration test that shows that when the user presses a certain button, a photo is written into the directory and can be opened in the program as well.
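A minimal sketch of the unit-test side, assuming a hypothetical PhotoSaver that writes photo bytes into a target directory (JUnit 4's TemporaryFolder rule keeps the test isolated from the real file system layout):

import static org.junit.Assert.*;

import java.io.File;
import java.nio.file.Files;

import org.junit.Rule;
import org.junit.Test;
import org.junit.rules.TemporaryFolder;

// Hypothetical class under test: saves photo bytes into a target directory.
class PhotoSaver {
    void save(byte[] photoBytes, File directory, String fileName) throws Exception {
        Files.write(new File(directory, fileName).toPath(), photoBytes);
    }
}

public class PhotoSaverTest {
    @Rule
    public TemporaryFolder tempDir = new TemporaryFolder();

    @Test
    public void savesPhotoIntoGivenDirectory() throws Exception {
        byte[] photo = {1, 2, 3};

        new PhotoSaver().save(photo, tempDir.getRoot(), "cat.jpg");

        File written = new File(tempDir.getRoot(), "cat.jpg");
        assertTrue(written.exists());
        assertArrayEquals(photo, Files.readAllBytes(written.toPath()));
    }
}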
Integration tests help you to validate if your software is working properly.
Unit tests help you to find why your software is breaking.
Unit tests to some extent also contribute to the first goal. Plus they have a couple of advantages:
It's generally way cheaper to write and run a unit test with a much smaller scope.
It's easier to get coverage for the combinatoric explosion of states of your components using unit tests than with an integration test. Say you have a setup involving three components, each of which has 3 different states. Integration testing the entire setup would involve checking 3 * 3 * 3 = 27 conditions; unit testing the individual components would require testing only 3 + 3 + 3 = 9 conditions. (This is oversimplified, but you will hopefully see the point.)
Because of this, unit tests are generally more popular than integration tests. However, you really cannot do without integration tests. Integration tests should be the cornerstone used for acceptance of your software. Having unit tests only just proves that you have a bunch of stuff doing something. An integration test proves that you have working software.
Some people would call a test for a DAO an integration test; others would say it's a unit test.
Whatever you call it, I'd say you should have a unit test for all the DAO functionality and an integration test for the front-to-back behavior embodied in the use case that says "give the user the option to save to the file system." I'd have integration tests for both scenarios, since it sounds like both are possible in your system.
I think it depends on the source of your problem.
If the function itself may have problems in different scenarios, you can write unit tests that exercise those scenarios against your function.
If the integration of your function with other parts of your program may cause problems, you should think of an integration test.
Sometimes a function like yours needs some external resources to do its job; it's not a bad idea to have some unit tests that check what happens if some of those resources are not available.
Let's assume one joins a project near the end of its development cycle. The project has been passed across many teams and has been an overall free-for-all, with no testing whatsoever taking place the whole time. The other members of this team have no knowledge of testing (shame!), and unit testing each method seems infeasible at this point.
What would the recommended strategy for testing a product be at this point, besides usability testing? Is this normally the point where you're stuck with manual point-and-click expected output/actual output work?
I typically take a bottom-up approach to testing, but I think in this case you want to go top-down. Test the biggest components you can wrap unit-tests around and see how they fail. Those failures should point you towards what sub-components need tests of their own. You'll have a pretty spotty test suite when this is done, but it's a start.
If you have the budget for it, get a test automation suite. HP/Mercury QuickTest is the leader in this space, but is very expensive. The idea is that you record test cases like macros by driving your GUI through use cases. You fill out inputs on a form (web, .NET, Swing, pretty much any sort of GUI) and the engine learns the form elements' names. Then you can check for expected output on the GUI and in the db. After that you can plug in a table or spreadsheet of various test inputs, including invalid cases where it should fail, and run it through hundreds of scenarios if you like. Once the tests are recorded, you can also edit the generated scripts to customize them. It builds a neat report for you in the end showing you exactly what failed.
There are also some cheap and free GUI automation testing suites that do pretty much the same thing but with fewer features. In general, the more expensive the suite, the less manual customization is necessary. Check out this list: http://www.testingfaqs.org/t-gui.html
I think this is where good Quality Assurance testing would come in. Write out old-fashioned test cases and hand them out to multiple people on the team to test.
What would the recommended strategy for testing a product be at this point, besides usability testing?
I'd recommend code inspection, by someone/people who know (or who can develop) the product's functional specification.
An extreme, purist way would be to say that, because it "has been an overall free-for-all with no testing whatsoever", therefore one can't trust any of it: not the existing testing, nor the code, nor the developers, nor the development process, nor management, nothing about the project. Furthermore, testing doesn't add quality to software (quality has to be built-in, part of the development process). The only way to have a quality product is to build a quality product; this product had no quality in its build, and therefore one needs to rebuild it:
Treat the existing source code as a throw-away prototype or documentation
Build a new product piece-by-piece, optionally incorporating suitable fragments (if any) of the old source code.
But doing code inspection (and correcting defects found via code inspection) might be quicker. That would be in addition to functional testing.
Whether you'll want to not only test it but also spend the extra time and effort to develop automated tests depends on whether you'll want to maintain the software (i.e., change it in any way in the future and then retest it).
You'll also need:
Either:
Knowledge of the functional specification (and non-functional specification)
Developers and/or QA people with a clue
Or:
A small, simple product
Patient, forgiving end-users
Continuing technical support after the product is delivered
One technique that I incorporate into my development practice when entering a project at this time in the lifecycle is to add unit tests as defects are reported (by QA or end users). You won't get full code coverage of the existing code base, but at least this way future development can be driven and documented by tests. Also this way you should be assured that your tests fail before working on the implementation. If you write the test and it doesn't fail, the test is faulty.
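A hedged sketch of that pattern (the ticket ID, class and behaviour are invented; the point is that the test is written straight from the defect report, fails before the fix, and then stays behind as a regression guard):

import static org.junit.Assert.assertEquals;

import org.junit.Test;

// Hypothetical legacy class that the defect report was filed against.
class OrderTotals {
    // Fixed as part of BUG-1042: the discount used to be ignored.
    double total(double subtotal, double discount) {
        return subtotal - discount;
    }
}

public class OrderTotalsRegressionTest {
    // Named after the ticket so a failure points straight back at the report.
    @Test
    public void bug1042_discountIsSubtractedFromSubtotal() {
        assertEquals(90.0, new OrderTotals().total(100.0, 10.0), 1e-9);
    }
}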
Additionally, as you add new functionality to the system, start it with tests so that at least those sub-systems are tested. As the new systems interact with existing ones, try adding tests around the old boundary layers and work your way in over time. While these won't be unit tests, such integration tests are better than nothing.
Refactoring is yet another prime target for testing. Refactoring without tests is like walking a tight rope without a net. You may get to the other side successfully, but is the risk worth the reward?
I am working on some code coverage for my applications. Now, I know that code coverage is an activity linked to the type of tests that you create and the language for which you wish to do the code coverage.
My question is: Is there any possible way to do some generic code coverage? Like in, can we have a set of features/test cases, which can be run (along with a lot more specific tests for the application under test) to get the code coverage for say 10% or more of the code?
More like, if I wish to build a framework for code coverage, what is the best possible way to go about making a generic one? Is it possible to have some functionality automated or generalized?
I'm not sure that generic coverage tools are the holy grail, for a couple of reasons:
Coverage is not a goal, it's an instrument. It tells you which parts of the code are not entirely hit by a test. It does not say anything about how good the tests are.
Generated tests cannot guess the semantics of your code. Frameworks that generate tests for you can only deduce meaning from reading your code, which in essence could be wrong, because the whole point of unit testing is to check whether the code actually behaves the way you intended it to.
Because the automated framework will generate artificial coverage, you can never tell whether a piece of code is tested with a proper unit test or only superficially exercised by a framework. I'd rather have untested code show up as uncovered, so I can fix that.
What you could do (and I've done ;-) ) is write a generic test for testing Java beans. Using reflection, you can test a Java bean against the Sun spec of a Java bean: assert that equals and hashCode are either both implemented or neither of them, see that the getter actually returns the value you pushed in with the setter, and check whether all properties have getters and setters.
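A rough sketch of that generic bean check using java.beans introspection and JUnit (the sample bean and the handful of sample values are placeholders; a real version would handle more property types and the equals/hashCode check as well):

import static org.junit.Assert.assertEquals;

import java.beans.BeanInfo;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.lang.reflect.Method;

import org.junit.Test;

// Hypothetical bean used only to demonstrate the generic check.
class PersonBean {
    private String name;
    private int age;
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public int getAge() { return age; }
    public void setAge(int age) { this.age = age; }
}

public class GenericBeanTest {

    @Test
    public void gettersReturnWhatSettersStored() throws Exception {
        assertGettersMatchSetters(new PersonBean());
    }

    // Generic helper: for every property with both accessors, push a sample
    // value through the setter and assert the getter hands it back.
    static void assertGettersMatchSetters(Object bean) throws Exception {
        BeanInfo info = Introspector.getBeanInfo(bean.getClass(), Object.class);
        for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
            Method getter = pd.getReadMethod();
            Method setter = pd.getWriteMethod();
            if (getter == null || setter == null) {
                continue; // read-only or write-only property; not checked here
            }
            Object sample = sampleValueFor(pd.getPropertyType());
            if (sample == null) {
                continue; // no sample value known for this type; extend as needed
            }
            setter.invoke(bean, sample);
            assertEquals("property " + pd.getName(), sample, getter.invoke(bean));
        }
    }

    static Object sampleValueFor(Class<?> type) {
        if (type == String.class) return "sample";
        if (type == int.class || type == Integer.class) return 42;
        if (type == boolean.class || type == Boolean.class) return Boolean.TRUE;
        return null;
    }
}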
You can do the same basic trick for anything that implements "comparable" for instance.
It's easy to do, easy to maintain, and it forces you to have clean beans. As for the rest of the unit tests, I try to focus on getting the important parts tested first and thoroughly.
Coverage can give a false sense of security. Common sense can not be automated.
This is usually achieved by combining static code analysis (Coverity, Klocwork or their free analogs) with dynamic analysis, i.e. running tests against an instrumented application (profiler + memory checker). Unfortunately, it is hard to automate the test algorithms; most tools are a kind of "recorder", able to record traffic/keys/signals (depending on the domain) and replay them, with minimal changes/substitutions like session ID, user, etc.
During software development, there may be bugs in the codebase which are known issues. These bugs will cause the regression/unit tests to fail, if the tests have been written well.
There is constant debate in our teams about how failing tests should be managed:
Comment out failing test cases with a REVISIT or TODO comment.
Advantage: We will always know when a new defect has been introduced, and not one we are already aware of.
Disadvantage: May forget to REVISIT the commented-out test case, meaning that the defect could slip through the cracks.
Leave the test cases failing.
Advantage: Will not forget to fix the defects, as the script failures will constantly remind you that a defect is present.
Disadvantage: Difficult to detect when a new defect is introduced, due to failure noise.
I'd like to explore what the best practices are in this regard. Personally, I think a tri-state solution is the best for determining whether a script is passing. For example when you run a script, you could see the following:
Percentage passed: 75%
Percentage failed (expected): 20%
Percentage failed (unexpected): 5%
You would basically mark any test cases which you expect to fail (due to some defect) with some metadata. This ensures you still see the failure result at the end of the test, but immediately know if there is a new failure which you weren't expecting. This appears to take the best parts of the 2 proposals above.
Does anyone have any best practices for managing this?
I would leave your test cases in. In my experience, commenting out code with something like
// TODO: fix test case
is akin to doing:
// HAHA: you'll never revisit me
In all seriousness, as you get closer to shipping, the desire to revisit TODO's in code tends to fade, especially with things like unit tests because you are concentrating on fixing other parts of the code.
Leave the tests in, perhaps with your "tri-state" solution. However, I would strongly encourage fixing those cases ASAP. My problem with constant reminders is that after people see them, they tend to gloss over them and say "oh yeah, we get those errors all the time..."
Case in point -- in some of our code, we have introduced the idea of "skippable asserts" -- asserts which are there to let you know there is a problem, but allow our testers to move past them on into the rest of the code. We've come to find out that QA started saying things like "oh yeah, we get that assert all the time and we were told it was skippable" and bugs didn't get reported.
I guess what I'm suggesting is that there is another alternative, which is to fix the bugs that your test cases find immediately. There may be practical reasons not to do so, but getting in that habit now could be more beneficial in the long run.
Fix the bug right away.
If it's too complex to do right away, it's probably too large a unit for unit testing.
Lose the unit test, and put the defect in your bug database. That way it has visibility, can be prioritized, etc.
I generally work in Perl and Perl's Test::* modules allow you to insert TODO blocks:
TODO: {
    local $TODO = "This has not been implemented yet.";
    # Tests expected to fail go here
}
In the detailed output of the test run, the message in $TODO is appended to the pass/fail report for each test in the TODO block, so as to explain why it was expected to fail. For the summary of test results, all TODO tests are treated as having succeeded, but, if any actually return a successful result, the summary will also count those up and report the number of tests which unexpectedly succeeded.
My recommendation, then, would be to find a testing tool which has similar capabilities. (Or just use Perl for your testing, even if the code being tested is in another language...)
We did the following: Put a hierarchy on the tests.
Example: You have to test 3 things.
Test the login (login, retrieve the user name, get the "last login date" or something familiar etc.)
Test the database retrieval (search for a given "schnitzelmitkartoffelsalat" - tag, search the latest tags)
Test web services (connect, get the version number, retrieve simple data, retrieve detailed data, change data)
Every testing point has subpoints, as stated in brackets. We split these hierarchically. Take the last example:
3. Connect to a web service
...
3.1. Get the version number
...
3.2. Data:
3.2.1. Get the version number
3.2.2. Retrieve simple data
3.2.3. Retrieve detailed data
3.2.4. Change data
If a point fails (while developing), give one exact error message, e.g. "3.2.2 failed". The test unit will then not execute the tests for 3.2.3 and 3.2.4. This way you get one exact error message ("3.2.2 failed"), leaving the programmer to solve that problem first instead of also chasing 3.2.3 and 3.2.4, which could not work anyway.
That helped a lot to clarify the problem and to make clear what has to be done first.
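One way to get that behaviour in Java is TestNG's dependsOnMethods; the minimal sketch below mirrors the hierarchy above (the method bodies are placeholders), and a failed prerequisite makes the dependent tests show up as skipped rather than failed:

import org.testng.annotations.Test;

public class WebServiceHierarchyTest {

    @Test
    public void connect() {
        // 3. connect to the web service; real checks go here
    }

    @Test(dependsOnMethods = "connect")
    public void getVersionNumber() {
        // 3.1 runs only if connect() passed
    }

    @Test(dependsOnMethods = "getVersionNumber")
    public void retrieveSimpleData() {
        // 3.2.2 runs only if getVersionNumber() passed
    }

    @Test(dependsOnMethods = "retrieveSimpleData")
    public void retrieveDetailedData() {
        // 3.2.3 is skipped automatically if retrieveSimpleData() failed
    }
}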
I tend to leave these in, with an Ignore attribute (this is using NUnit) - the test is mentioned in the test run output, so it's visible, hopefully meaning we won't forget it. Consider adding the issue/ticket ID in the "ignore" message. That way it will be resolved when the underlying problem is considered to be ripe - it'd be nice to fix failing tests right away, but sometimes small bugs have to wait until the time is right.
I've considered the Explicit attribute, which has the advantage of being able to be run without a recompile, but it doesn't take a "reason" argument, and in the version of NUnit we run, the test doesn't show up in the output as unrun.
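For comparison, a JUnit 4 sketch of the same idea (the ticket ID is invented); the reason string passed to @Ignore shows up in the runner output, so the known failure stays visible without adding noise to every run:

import org.junit.Ignore;
import org.junit.Test;

public class KnownDefectTest {

    // Parked with the ticket ID until the underlying bug is fixed.
    @Ignore("BUG-987: known rounding defect, see the tracker before re-enabling")
    @Test
    public void negativeBalancesRoundCorrectly() {
        // assertions against the currently-buggy code would go here
    }
}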
I think you need a TODO watcher that extracts the "TODO" comments from the code base. The TODO is your test metadata: it sits one line in front of the known-failure message and is very easy to correlate.
TODOs are good. Use them. Actively manage them by actually putting them into the backlog on a regular basis.
#5 on Joel's "12 Steps to Better Code" is fixing bugs before you write new code:
When you have a bug in your code that you see the first time you try to run it, you will be able to fix it in no time at all, because all the code is still fresh in your mind.
If you find a bug in some code that you wrote a few days ago, it will take you a while to hunt it down, but when you reread the code you wrote, you'll remember everything and you'll be able to fix the bug in a reasonable amount of time.
But if you find a bug in code that you wrote a few months ago, you'll probably have forgotten a lot of things about that code, and it's much harder to fix. By that time you may be fixing somebody else's code, and they may be in Aruba on vacation, in which case, fixing the bug is like science: you have to be slow, methodical, and meticulous, and you can't be sure how long it will take to discover the cure.
And if you find a bug in code that has already shipped, you're going to incur incredible expense getting it fixed.
But if you really want to ignore failing tests, use the [Ignore] attribute or its equivalent in whatever test framework you use. In MbUnit's HTML output, ignored tests are displayed in yellow, compared to the red of failing tests. This lets you easily notice a newly-failing test, but you won't lose track of the known-failing tests.