How do you handle unit/regression tests which are expected to fail during development? - automated-tests

During software development, there may be bugs in the codebase which are known issues. These bugs will cause the regression/unit tests to fail, if the tests have been written well.
There is constant debate in our teams about how failing tests should be managed:
Comment out failing test cases with a REVISIT or TODO comment.
Advantage: We will always know when a new defect has been introduced, and not one we are already aware of.
Disadvantage: May forget to REVISIT the commented-out test case, meaning that the defect could slip through the cracks.
Leave the test cases failing.
Advantage: Will not forget to fix the defects, as the script failures will constantly reminding you that a defect is present.
Disadvantage: Difficult to detect when a new defect is introduced, due to failure noise.
I'd like to explore what the best practices are in this regard. Personally, I think a tri-state solution is the best for determining whether a script is passing. For example when you run a script, you could see the following:
Percentage passed: 75%
Percentage failed (expected): 20%
Percentage failed (unexpected): 5%
You would basically mark any test cases which you expect to fail (due to some defect) with some metadata. This ensures you still see the failure result at the end of the test, but immediately know if there is a new failure which you weren't expecting. This appears to take the best parts of the 2 proposals above.
Does anyone have any best practices for managing this?

I would leave your test cases in. In my experience, commenting out code with something like
// TODO: fix test case
is akin to doing:
// HAHA: you'll never revisit me
In all seriousness, as you get closer to shipping, the desire to revisit TODO's in code tends to fade, especially with things like unit tests because you are concentrating on fixing other parts of the code.
Leave the tests in perhaps with your "tri-state" solution. Howeveer, I would strongly encourage fixing those cases ASAP. My problem with constant reminders is that after people see them, they tend to gloss over them and say "oh yeah, we get those errors all the time..."
Case in point -- in some of our code, we have introduced the idea of "skippable asserts" -- asserts which are there to let you know there is a problem, but allow our testers to move past them on into the rest of the code. We've come to find out that QA started saying things like "oh yeah, we get that assert all the time and we were told it was skippable" and bugs didn't get reported.
I guess what I'm suggesting is that there is another alternative, which is to fix the bugs that your test cases find immediately. There may be practical reasons not to do so, but getting in that habit now could be more beneficial in the long run.

Fix the bug right away.
If it's too complex to do right away, it's probably too large a unit for unit testing.
Lose the unit test, and put the defect in your bug database. That way it has visibility, can be prioritized, etc.

I generally work in Perl and Perl's Test::* modules allow you to insert TODO blocks:
TODO: {
local $TODO = "This has not been implemented yet."
# Tests expected to fail go here
}
In the detailed output of the test run, the message in $TODO is appended to the pass/fail report for each test in the TODO block, so as to explain why it was expected to fail. For the summary of test results, all TODO tests are treated as having succeeded, but, if any actually return a successful result, the summary will also count those up and report the number of tests which unexpectedly succeeded.
My recommendation, then, would be to find a testing tool which has similar capabilities. (Or just use Perl for your testing, even if the code being tested is in another language...)

We did the following: Put a hierarchy on the tests.
Example: You have to test 3 things.
Test the login (login, retrieve the user name, get the "last login date" or something familiar etc.)
Test the database retrieval (search for a given "schnitzelmitkartoffelsalat" - tag, search the latest tags)
Test web services (connect, get the version number, retrieve simple data, retrieve detailed data, change data)
Every testing point has subpoints, as stated in brackets. We split these hierarchical. Take the last example:
3. Connect to a web service
...
3.1. Get the version number
...
3.2. Data:
3.2.1. Get the version number
3.2.2. Retrieve simple data
3.2.3. Retrieve detailed data
3.2.4. Change data
If a point fails (while developing) give one exact error message. I.e. 3.2.2. failed. Then the testing unit will not execute the tests for 3.2.3. and 3.2.4. . This way you get one (exact) error message: "3.2.2 failed". Thus leaving the programmer to solve that problem (first) and not handle 3.2.3. and 3.2.4. because this would not work out.
That helped a lot to clarify the problem and to make clear what has to be done at first.

I tend to leave these in, with an Ignore attribute (this is using NUnit) - the test is mentioned in the test run output, so it's visible, hopefully meaning we won't forget it. Consider adding the issue/ticket ID in the "ignore" message. That way it will be resolved when the underlying problem is considered to be ripe - it'd be nice to fix failing tests right away, but sometimes small bugs have to wait until the time is right.
I've considered the Explicit attribute, which has the advantage of being able to be run without a recompile, but it doesn't take a "reason" argument, and in the version of NUnit we run, the test doesn't show up in the output as unrun.

I think you need a TODO watcher that produces the "TODO" comments from the code base. The TODO is your test metadata. It's one line in front of the known failure message and very easy to correlate.
TODO's are good. Use them. Actively management them by actually putting them into the backlog on a regular basis.

#5 on Joel's "12 Steps to Better Code" is fixing bugs before you write new code:
When you have a bug in your code that you see the first time you try to run it, you will be able to fix it in no time at all, because all the code is still fresh in your mind.
If you find a bug in some code that you wrote a few days ago, it will take you a while to hunt it down, but when you reread the code you wrote, you'll remember everything and you'll be able to fix the bug in a reasonable amount of time.
But if you find a bug in code that you wrote a few months ago, you'll probably have forgotten a lot of things about that code, and it's much harder to fix. By that time you may be fixing somebody else's code, and they may be in Aruba on vacation, in which case, fixing the bug is like science: you have to be slow, methodical, and meticulous, and you can't be sure how long it will take to discover the cure.
And if you find a bug in code that has already shipped, you're going to incur incredible expense getting it fixed.
But if you really want to ignore failing tests, use the [Ignore] attribute or its equivalent in whatever test framework you use. In MbUnit's HTML output, ignored tests are displayed in yellow, compared to the red of failing tests. This lets you easily notice a newly-failing test, but you won't lose track of the known-failing tests.

Related

Does non-deterministic nature of property-based testing hurt build repeatability?

I am learning FP and got introduced to the concept of property-based testing and for someone from OOP world PBT looks both useful and dangerous. It does check a lot of options, but what if there is one (or some) options that fail, but they didn't fail during your first let's say Jenkins build. Then next time you run the build the test may or may not fail, doesn't it kill the entire idea of repeatable builds?
I see that some people explored options to make the tests deterministic, but then if such test doesn't catch an error it will never catch it.
So what's better approach here? Do we sacrifice build repeatability to eventually uncover a bug or do we take the risk of never uncovering it, but get our repeatability back?
(I hope that I properly understood the concept of PBT, but if I didn't I would appreciate if somebody could point out my misconceptions)
Doing a lot of property-based testing I don’t see indeterminism as a big problem. I basically experience three types of it:
A property is really indeterministic b/c some external factor - e.g. timeout, delay, db config - makes it so. Those flaky tests also show up in example-based testing and should be eliminated by making the external factor deterministic.
A property fails rarely because the triggering condition is only sometimes met by pseudo random data generation. Most PBT libraries have ways to reproduce those failing runs, eg by re-using the random seed of the failing test run or even remembering the exact constellation in a database of some sort. Those failures reveal problems and are one of the reasons why we’re doing random test cases generation in the first place.
Coverage assertions („this condition will be hit in at least 5 percent of all cases“) may fail from time to time even though they are generally true. This can be mitigated by raising the number of tries. Some libs, eg quickcheck, do their own calculation of how many tries are needed to prove/disprove coverage assumptions and thereby mostly eliminate those false positives.
The important thing is to always follow up on flaky failures and find the bug, the indeterministic external factor or the wrong assumption in the property‘s invariant. When you do that, sporadic failures will occur less and less often. My personal experience is mostly with jqwik but other people have been telling me similar stories.
You can have both non-determinism and reproducible builds by generating the randomness outside the build process. You could generate it during development or during external testing.
One example would be to seed your property based tests, and to automatically modify this seed on commit. You're still making a tradeoff. A developer could be alerted of a bug unrelated to what they're working on, and you lose some test capacity since the tests might change less often.
You can tip the tradeoff further in the deterministic direction by making the seed change less often. You could for example have one seed for each program component or file, and only change it when a related file is committed.
A different approach would be to not change the seed during development at all. You would instead have automatic QA doing periodic or continuous testing with random seeds and use them to generate bug reports/issues that can be dealt with when convenient.
johanneslink's analysis of non-determinism is spot on.
There's one thing I would like to add: non-determinism is not only a rare and small cost, it's also beneficial. If the first run of your test suite is successful, insisting on determinism means insisting that future runs (of the same suite against the same system) will find zero bugs.
Usually most test suites contain many independent tests of many independent system parts, and commits rarely change large parts of the system. So even across commits, most tests test exactly the same thing before and after, where once again determinism guarantees that you will find zero bugs.
Allowing for randomness means every run has at least a chance of discovering a bug.
That of course raises the question of regression tests. I think the standard argument is something like this: to maximize value per effort you should focus your testing on the most bug-prone parts of the code. Having observed a bug in the past provides evidence about which part of the code is buggy (and which kind of bug it's likely to have). You should use that evidence to guide your testing effort. (Often with a laser-like focus on one concrete bug.)
I think this is a very reasonable argument. I also think there's more than one way of making good use of the evidence provided by bugs.
For example, you might write a generator which produces data of the same kind and shape as the data which triggered the bug the first time, and/or which is tailor made to trigger the bug.
And/or, you might want to write tests verifying specifically those properties that were violated by the buggy behavior.
If you want to judge how good these tests are, I recommend running them a couple of times (on normally sized input batches). If they trigger the bug every time, it's likely to do so in the future also.
Here's a (hopefully thought-)provoking question: is it worse to release software which has a bug it has had before, or release software with new bugs? In other words: is catching past bugs more important than catching new ones—or do do it primarily because it's easier?
If you think we do it in part because it's easier, then I don't think it matters that re-catching the bug is probabilistic: what you should really care about is something like the average bug-catching abilities of property testing—its benefits elsewhere should outweigh the fairly small chance that an old bug squeaks through, even though it got caught in (say) 5 consecutive runs of the tests when you evaluated your regression tests.
Now, if you can't reliably generate random inputs that trigger the bug even though you understand the bug just fine, or the generator which does it is large and complicated and thus costly to maintain, hand-picking a regression example seems like a perfectly reasonable choice.

How should multi-part Codecept.js scenarios be organized?

What is the preferred (or just a good) pattern for a multi-part Codecept.js scenario, such as this:
Select file to upload.
Clear selection.
Select file to upload after having cleared selection.
I can do this in a single scenario and use I.say to delineate the parts, but it feels like I should be able to write these as independent scenarios and use .only on part 2, for example, and have part 1 run prior to part 2, because it depends on it.
I would also want to skip parts 2 and 3 if part 1 fails when running the whole suite.
I like thinking about behaviour in terms of capabilities. I can see that you have a couple here:
Uploading files
Correcting mistakes while uploading files.
So I would expect these to be in two scenarios:
One where you actually upload the file
One where you correct mistakes you make.
A lot of people say there should only be one "When" in scenarios, but that doesn't take into account interactions with people (including your past mistaken self) or time. In this situation, it's the whole process of correcting the file upload that provides the value. I can't see any value in the intermediate steps, so I'd leave them all in one scenario.
If there's any different behaviour associated with different contexts (eg: you've already got too many files uploaded) or outcomes (eg: your file system doesn't have room) or rules (eg: your status means you qualify for super-fast upload) then I would expect those to be new scenarios. If you start getting to the point where there are a lot of scenarios associated with file uploads and different things that happen to them, that might be a good time to separate this scenario out. Right now I can't see any reason to do that.
Re failing the first part: if you're doing BDD right, you'll be talking through not just the behaviour of your system, but the behaviour of individual bits of code too. That should help produce a good design which minimizes the chances of having bugs. Really good BDD teams produce scenarios that hardly ever catch bugs!
The scenarios act as living documentation, rather than regression tests; helping future devs understand the value of the code and get it right, rather than nailing it down to stop them getting it wrong.
So I wouldn't worry about it failing. If it's doing that a lot, you've got a different problem. Code it as if it's going to be passing most of the time, and make sure it's readable and comprehensible. As long as you can see when it fails and work it out, even if that takes a little bit of time, it'll be fine.
Having said that, I'd be surprised if Codecept doesn't have at least an option to stop on failure. Most BDD tools don't continue a scenario after a failed step; it would be an odd thing to do.
As far as I know, you're not be able to set priority for execution in codeceptjs. Better make one test. it is also will be more flexible if you will need to add or delete some part.

Proper Testing Methodology

I've got a bit of an issue. I'm writing and continually developing an ASP.NET MVC application. The problem is, everytime I update one small part of our site (not even anything to do with our database), it seems to break about 4 other things in other parts of the site. I've been doing my best to anticipate it, but in the end, I know there are better ways out there to test, and I'm just wondering what the general consensus is for best practices?
Thanks!
In short - not organized answer!
Have a plan.
Know what the change is.
Document what you are going to test and why.
Maintain a test suite (Regression) to execute after all major changes.
Keep improving your test cases and adding the ones you missed
(there should be a test case associated with all your bugs.
This will ensure the same bug doesn't get introduced due to regression.
And like you said have a proper structure, ensure process implementation, have better code reviews (catch the bugs before they get submitted) and keep reading a lot about all this.
This answer is open for editing to make it better.

Good method to handle large amounts of unused or deprecated feature implemenations in code

Currently we are facing following Problem in our Application:
Around 40 % of the Code that is in the Application is never used. That means the Code would be there, and maybe functional, but the Frontend Feature has been shut off so the Users cannot reach the Functionality anymore or other Methods are replacing the old, now deprecated Methods.
What i am currently doing is removing all the old code while not trying to break anything, manually.
The Question is:
Would you remove the old Code, or hope that it may awake some time ... zombie - like
Do you think that it is worth the Effort to remove the Code (less work to find stuff in the clutter, better test coverage, easier for other people to find their way)
Should we keep the Code somewhere, as a reference? ( We are using Version Control, but i find it is pretty hard to find old code in the Revision Jungle ... any tips for that? )
Do you have arguments for convincing the team / management / developers that wrote said code?
Reasons to not deprecate and then delete Code?
TL;DR: Delete unused code or leave it as it is? Discuss!
u
If you are certain that the code is unused, definitely delete it. I assume you have a version control system, so if you ever need it again, you can still find the code back.
Deleting the unused code will make the project easier to maintain, and your team probably will end up saving time in the long run (nobody will re-read the code to try and understand what it was used for, nobody will end up changing said code thinking it may still be used...)
However, if your code contains a public API that is distributed, you will probably want to mark the classes/methods deprecated for some time before effectively deleting the code, so the callers have some time to adapt (or inform you of the issue).
Would you remove the old Code, or hope that it may awake some time ... zombie - like
I'd definitely remove it. I hate having to work out if functions ever get called.
Do you think that it is worth the Effort to remove the Code (less work to find stuff in the clutter, better test coverage, easier for other people to find their way)
Yes, definitely worth the effort.
Should we keep the Code somewhere, as a reference?
Um, you are using version control softwarev, aren't you?

How to ensure quality checkins with continuous build systems?

I religiously go through all of my code before I check it in and do a diff of the before and after code and read through it and make sure I undersatnd the changes. Usually I end up having to add comments, amend variable names, amend algorithms, amend code, retest things, discuss with other developers about their code, add new bugs/issues, but I very rarely end up doing the checkin immediately.
I do notice however that alot of developers these days seem to check their code in and think that when the build breaks that it is enough, then they go back and look at their changes. This is one of the things about continuous build systems that I definitely do not like, in that sometimes I think developers stop thinking about their code enough.
What best practices are there for ensuring only quality code goes into continuous build systems?
I do notice however that alot of developers these days seem to check their code in and think that when the build breaks that it is enough, then they go back and look at their changes.
In my opinion, using the continuous build for "verification" purpose is indeed a bad practice, developers should always try to not commit bad code that can affect the team and interrupt the work (the why is so obvious that if you don't get that, just look for another job). So, if your CI engine doesn't offer Pre-tested commit (like TeamCity, Team Foundation Server as I just saw, maybe Hudson one day, etc), you should always sync/build/sync (and rebuild if necessary) locally before to commit. Not doing this is laziness and not respectful of the team.
What best practices are there for ensuring only quality code goes into continuous build systems?
Just in case, remind why breaking the build is bad: bad code potentially affects the whole team and interrupts the work (sigh).
If you can get tool support (see the mentioned solutions above), get it.
If not, document the right workflow: 1. sync 2. build locally 3. sync 4. back to 2 if required 5. commit. Make this visible so that there is no excuses.
If this is an option, use something like (or similar to) the Hudson's Continuous Integration Game plugin. This can make things more fun.
I've seen people using a light financial "penalty" for broken build but I don't really like this idea. First, we should be able to behave as responsible adults. Second, the obtained result was that people started to delay commits (which at the end was the opposite of the expected benefit).
Team Foundation Server has something called pre checkin validation.

Resources