Dealing with Selenium tests that fail occasionally during automated deployment - automated-tests

We have a C#/ASP .Net web application that is built and deployed by the build server (Jenkins). One of the build steps before the automated deployment is ensuring that all automated tests pass -- including functional tests we have using Selenium 2 WebDriver and NUnit.
The problem: Sometimes these tests fail randomly. They will succeed for 100 builds and then one just fails. They fail for various reasons -- a .Click() event is just ignored, element can't be found, IE has a bad day, etc. We have an AJAX heavy web app and so we depend heavily on WebDriverWaits but we always take this into account while writing tests, and like I said the tests do pass most of the time.
What are some ways to avoid or fix this problem? A couple that came to my mind:
Accept a certain number of failures (seems like a bad idea)
Rerun test failures?

I don't like either of the suggestions that you mention, but I admit to having used them occasionally. The best thing to do is to make sure that when there is a seemingly "random" failure to do everything you can to get all of the data about why it really failed. Was it an environment issue? Did some other process on the machine interfere with the tests? Was it a timing issue that only appears when the site loads excruciatingly slow, or blazing fast?
One thing that you might try is soak testing your automated tests. Run each one 100+ times on the same build and same environment (so you can rule those out as potential failure points) and find the ones that fail occasionally. See if they fail in the same place or in different places. Generally, when you go through this exercise you'll find some tests that really are a little bit flaky and you can remove them from the daily run until they are fixed. You could even include a soak as a check-in criteria for any automated test case.
Another useful thing I have found that helped me get to the bottom of some of the seemingly random failures was taking screenshots on failure. Often you can see that other windows or dialogs were popped up causing the browsers not to be able to be in the forefront, etc.

Of the two, I would prefer to rerun test failures, or rather, on test failure, retry the tests.
If you accept a certain number of test failures, then you get into problems about which tests are allowed to fail. You would have to have two sets of tests, some which are allowed to fail, some which are not.
For rerunning, I'm no expert on testing with NUnit, but you could have the tests themselves manage the retry. In JUnit, you can introduce a rule so that if a test fails, it would retry a maximum of 3 times. This would probably avoid most of the problems you're having. I don't know how to do this in NUnit, but see my answer to How to Re-run failed JUnit tests immediately?. This will give you the general idea.

Related

Continuous integration and Automated testing strategy

I’m working in a company that uses Continuous Integration (TeamCity). Every time someone does a check in the CI software starts a build and runs all the Unit/Automated Test . The problem is that we got more than 7000 unit tests + 756 automated tests (used to test the JavaScript as we got a very complex UI logic for making calculation etc). As you can imaging every time someone does a check in the all process takes more than 2 hours to go through all the steps (build-unitest-automated test) so that I need to wait that much before I can get a result to understand if my check in has broken perhaps an automated test or a unit test. Worst situation is when more than one people check in something so that TeamCity start queue up the build and before I can get a valid result (udated) I can wait up to half day ! what strategy should we adopt to speed up a bit this process? Is it a best practice run all the automated tests even against a little change?
I would look at breaking up your test suite in two ways - with the goal of making it so you and your team can check in, go get a cup of coffee and have some meaningful feedback from team city when you get back to your desk.
decide what you really want to test on every commit, move the remaining tests to a suite that runs at a scheduled interval (hourly, nightly - whatever works for you).
If the set of tests agreed upon to run every commit is still large - break that set up and distribute across multiple nodes running in parallel.
You may also want to beef up your CI machine, depending on the nature of your stuff have the working directory for the tests live in tmpfs (RAM disk).
I'm going to talk in theory, I have yet to put it into practice but CI is on my goals to have up and humming by the end of the summer.
From statements I've seen made by the people that have earned the most respect from me in developers the most common element for CI I've seen in regards to the testing strategy is to split your tests into Long Running and Short Running.
Then you would want to configure that standard check ins kick off the short running test for basic validation of the solution. Then on the nightly builds, and for deployment builds is the only time you NEED to run the full test suite to give your validation of regression tests.
Aside/Alternate answer: Seeing as I haven't setup CI for myself yet, I had never understood the TeamCity business model that they making the pricing based on build agents. Now I understand why multiple build agents really start to matter if your test suite takes that long, being able to run 5 builds at once becomes much more important. So one option could be to just spend more money and stick a band-aid on the bullet hole for now.
Continuous Integration works best with a distributed version control system like Git or Mercurial.
Every developer can check in often into their local repository without triggering the whole integration and UI testing ceremony all the time.
Once a feature is finished locally, it can be checked in to the central repository. Thus the CI server runs all the time-consuming tests only when new features and/or fixes have been added.
Have you considered using pre-tested commits? If you run a remote run build (without committing in to VCS), you can be sure that you didn't break anything in VCS (just because you didn't commit yet). And you can continue working without problems. If the build is successful, you can commit your change (even if you made some other changes in the same files) - if you commit via TeamCity's plugin it will commit exactly the code you sent to the server for the testing.
This way, you don't have to wait until TeamCity's build has finished to continue working with it.
Disclaimer: I'm a TeamCity developer.

free automatic regression test tool for asp.net applications?

We are working in a small team. We often had problems like developer1 did some changes in stored procedure or function and it affected work of developer2. Such issues are traced out by chance later. Please guide me how such issues can be stopped. Is there a free tool that we can run to test such issues?
Slowly introduce unit tests, focused integration tests and full system tests.
For all of those use a .net unit test framework to do it. It'll be what you do in the test what makes it be any of the above scenarios. Make sure to keep each of those 3 type of tests separately, as those will have a big difference on the speed it takes to execute them.
For the unit test framework I suggest NUnit but there are others, one that I've found interesting but never made the jump is xUnit.net.
For full system tests I suggest to run them in the unit test framework using WatiN. You could also go with Selenium RC.
We often had problems like developer1 did some changes in stored procedure or function and it affected work of developer2. Such issues are traced out by chance later.
For that specific type of scenario I strongly suggest focused integration tests. Full system tests might catch such scenario, but it will still left you to figure out why it broke.
Instead focus the test in the very specific db access code that makes the call to the procedure. By adding scenarios in there that reveal all the expectations developer2 had from said procedure when (s)he wrote the related .net code, regression issues with that integration code can be revealed very quickly and be dealt with very effectively. Also note that developer1 can easily run the focused integration tests that involve that procedure or area of the database many times / which is a lot more likely to happen than doing the same with full system tests.
You can do either automated unit testing using tools such as NUnit or automated black-box testing using tools such as Selenium. Note that both options (even with free tools) may need significant investment in terms of time and efforts. Typically, unit test cases are created by developers them selves while for automated black box testing, a separate team of QA is utilized - this is mostly because unit test cases are generally written in languages such as C#, VB.NET while automated black-box testing tools typically utilize scripting languages.

How can we improve our deployment and build systems?

We have 4 different environments:
Staging
Dev
User Acceptance
Live
We use TFS, pull down the latest code and code away.
When they finish a feature, the developers individually upload their changes to Staging. If the site is stable (determined by really loose testing), we upload changes to Dev, then UserAcceptance and then live.
We are not using builds/tags in our source control at all.
What should I tell management? They don't seem to think there is an issue as far as I can tell.
If it would be good for you, you could become the Continuous Integration champion of your company. You could do some research on a good process for CI with TFS, write up a proposed solution, evangelize it to your fellow developers and direct managers, revise it with their input and pitch it to management. Or you could just sit there and do nothing.
I've been in management for a long time. I always appreciate someone who identifies an issue and proposes a well thought-out solution.
Whose management? And how far removed are they from you?
I.e. If you are just a pleb developer and your managers are the senior developers then find another job. If you are a Senior developer and your managers are the CIO types, i.e. actually running the business... then it is your job to change it.
Tell them that if you were using a key feature of very expensive software they spent a lot of money on, it would be trivial to tell what code got pushed out when. That would mean in the event of a subtle bug getting introduced that gets passed user acceptance testing, it would be a matter of diffing the two versions to figure out what changed.
One of the most important parts of using TAGS is so you can rollback to a specific point in time. Think of it as an image backup. If something bad gets deployed you can safely assume you can "roll" back to a previous working version.
Also, developers can quickly grab a TAG (dev, prod or whatever) and deploy to their development PC...a feature I use all the time to debug production problems.
So you need someone to tell the other developers that they must label their code every time a build is done and increment a version counter. Why can't you do that?
You also need to tell management that you believe the level of testing done is not sufficient. This is not a unique problem for an organisation and they'll probably say they already know. No harm in mentioning it though rather than waiting for a major problem to arrive.
As far as individuals doing builds or automated build processes this depends on whether you really need this based on how many developers there are and how often you do builds.
What is the problem? As you said, you can't tell if management see the problem. Perhaps they don't! Tell them what you see as the current problem and what you would recommend to fix the problem. The problem has to of the nature of "our current process has failed 3 out of 10 times and implementing this new process would reduce those failures to 1 out of 10 times".
Management needs to see improvements in terms of: reduced costs, icreased profits, reduced time, reduced use of resources. "Because it's widely used best practice" isn't going to be enough. Neither is, "because it makes my job easier".
Management often isn't aware of a problem because everyone is too afraid to say anything or assumes they can't possibly fail to see the problem. But your world is a different world than theirs.
I see at least two big problems:
1) Developers loading changes up themselves. All changes should come from source control. Do you encounter times where someone made a change that went to production but never got into source control and then was accidentally removed on the next deploy? How much time (money) was spent trying to figure out what went wrong there?
2) Lack of a clear promotion model. It seems like you guys are moving changes between environments rather than "builds". The key distinction is that if two changes work great in UAT because of how they interact, if only one change is promoted to production it could break there. Promoting consistent code - whether by labeling it or by just zipping up the whole web application and promoting the zip file - should cause fewer problems.
I work on the continuous integration and deployment solution, AnthillPro. How we address this with TFS is to retrieve the new code from TFS based on a date-time stamp (of when someone pressed the "Deliver to Stage" button).
This gives you most (all?) the traceability you would have of using tags, without actually having to go around tagging things. The system just records the time stamp, and every push of the code through the testing environments is tied to a known snapshot of code. We also have customers who lay down tags as part of the build process. As the first poster mentioned - CI is a good thing - less work, more traceability.
If you already have TFS, then you are almost there.
The place I'm at was using TFS for source control only. We have a similar setup with Dev/Stage/Prod. I took it upon myself to get a build server installed. Once that was done I added in the ability to auto deploy to dev for one of my projects and told a couple of the other guys about it. Initially the reception was luke warm.
Later I added TFS Deployer to the mix and have it set to auto deploy the good dev build to stage.
During this time the main group of developers were constantly fighting the "Did you get latest before deploying to Stage or Production?" questions; my stuff was working without a hitch. Believe me, management and the other devs noticed.
Now (6 months into it), we have a written rule that you aren't even allowed to use the Publish command in visual studio. EVERYTHING goes through the CI build and deployments. When moving to prod, our production group pulls the appropriate copy off of the build server. I even trained our QA group on how to do web testing and we're slowly integrating automated tests into the whole shebang.
The point of this ramble is that it took awhile. But more importantly it only happened because I was willing to just run with it and show results.
I suggest you do the same. Start using it, then show the benefits to get everyone else on board.

Continuous Integration vs. Nightly Builds

Reading this post has left me wondering; are nightly builds ever better for a situation than continuous integration? The consensus of the answers seems to be pretty lopsided in favor of continuous integration, is that evangelism or is there really no reason to use nightly builds when continuous integration is an option?
If you're really doing continuous integration with all available tests, nightly builds would be redundant, since the last thing checked in that day would already have been tested.
On the other hand, if your CI regime only involves running a subset of all available tests, for example because some of your tests take a long time to run, then you can use nightlies additionally to run all tests. This'll let you catch many bugs early, and if you can't catch them early, you can at least catch them overnight.
I don't know, though, if that's technically still CI, since you're only doing a "partial" build each time, by ignoring some of the tests.
In our organization, nightly builds and CI builds have two distinct purposes. The CI build is a 'latest code' build in which the unit tests are run against the last check in as you would expect. We also run several code metrics on the CI build.
For nightly builds, however, we only incorporate source code that has been through the peer review process and is deemed ready for testing.
This way, the nightly build always contains build that is 'feature ready' for testing, while the CI build contains features that while functional (to the extent that the unit tests pass) may not be ready to send the to the test group.
The test groups writes new CRs exclusively from one of the nightly builds as opposed to the CI build, although those are also available for informal exploratory type testing.
Yes, if you have a process you want attached to a build, but it is resource heavy. For example, on my team we run JTest during the nightly build. We can't run it during the day because:
It requires a lot of resources, which may not be available
It takes 4 hours to complete each time
If you have a nice robust CI process in place a "nightly" is still useful.
As mentioned, a "nightly" build can do exhaustive tests and perhaps some high-level system tests. End-to-end stuff.
The concept of a "nightly" build is easily understood by everyone in the organization. If you have trouble communicating CI builds out to other groups (for example, a QA group that doesn't have the same handle on Agile that the Dev group might) a "nightly" is a powerful and simple concept.
If your nightly is a separate set of resources, it can be managed separately and used to cut "gold" images with some claims to software integrity. For example, developer writes code, some trusted build system that dev can't touch builds it, QA tests the gold build and signs it. In such a situation, the nightly build functions like a production build system.
Just some thoughts.
In my professional opinion, the only reason to use nightly builds is when the build process takes so long that it can't complete in a "reasonable" amount of time.
For example, if your build process takes 5 hours to complete there is really no reason to do a build on check in.
Beyond that, there is so much value in knowing as soon as possible when a build fails that it overrides other concerns.
It depends on the purpose and length of each of your builds. Basically, you should identify what you are trying to learn from CI and decide if it is worth while spending the resources on running multiple builds.
We used continuous integration at my last job for a few different purposes.
First, we used it to make sure that the repository and thus the developers always had a version of the code that compiled. Few things are worse for team members than having to manage another person's broken changes through commenting, uncommenting, reverting and merging, because one person checked in bad code. For this, we had a build that ran instantaneously with no tests or other validation so we knew as soon as possible if the code was safe to update. Builds usually took about ten minutes and the machine was probably running around 50% on a normal workday. No documentation was generated here, just a quiet pass or a loud fail siren.
Second, we wanted to know as soon as possible whether or not any rules were broken. The quicker that you find a broken rule, the easier it is to fix. For this purpose, we had a separate machine that ran a full build and validation of the code. This machine was running 12-14 hours a day continuously on a normal workday. Email status of the build was sent out describing broken unit tests, code compliance, etc.
We stopped there as far as automatically triggered builds go. A nightly build on top of that seemed a bit extreme for us. But I suppose if you wanted to have a snapshot build archived daily, you may want to schedule a third build with the extra steps required for that. Though, we did have another build that wrapped up and archived our QA deployment artifacts for quick and easy deployment, but we only ever manually triggered that one.
I think the other posts cover the common reasons, like having a build process that takes "too long" or having to run only a subset of tests during the CI build. But there's another reason which is political.
In some organizations the official builds are handled by a minimally responsive build/infrastructure/release management/SCM team. In these cases you might put them in charge of the nightly build and then run the CI build out of development. This avoids a fight because their build remains "the official build" and your CI build give you the feedback you need.
We have both continuous integration and nightly builds in place. They serve two different purposes.
Our continuous integration mechanism builds the software and runs unit tests under the continuous integration suite.
Our nightly build tags the source under version control, builds the software, runs the unit tests under nightly build suite. The software built here is then used in various system tests and stress tests.
I think that one of the main differentiator for nightly build is system tests.

What is a good CI build-process

What constitutes a good CI build-process?
We use CI, but is deployment to production even a realistic CI goal when you have dependencies on several services that should be deployed too and other apps may depend on these too.
Is a good good CI build process good enough when its automated to QA and manual from there?
Well "it depends" :)
We use our CI system to:
build & unit test
deploy to single box, run intergration tests and code analisys
deploy to lab environment
run acceptance tests in prod-like system
drop builds that pass to code drop for prod deployment
This is for a greenfield project of about a dozen services and databases deployed to 20+ servers, that also had dependencies on half a dozen other 'external' services.
Using a CI tool to deploy your product to a production environment as a realistic goal? again... "it depends"
Why would you want to do this?
if you have the process you can roll changes (and roll back) faster and more often
less chance for human error
you can test the same deployment strategy in a test environment before going to production and catch issues earlier
Some technical things you have to address before you can answer this:
what is the uptime requirements for your system -- Are you allowed to have downtime or does it need to be up 24/7?
do you have change control processes in place that require human intervention/approval?
is your deployment robust enough for any component to roll back to a known-good state if a deployment fails?
is your system designed to handle different versions of services or clients in case one or several component deployments fails (and you have the above rollback to last known good)?
does the process have the smarts to handle a partial deployment where a component cannot handle mixed versions of its dependencies/clients?
how are you handing database deployment/upgrades?
do you have monitoring in place so you know when something goes wrong?
Here are a couple of recent related links about automation and building the tools you need.
When it comes down to it the more complex your system the more difficult it is do automate everything, but that does not mean it is not a worthy goal, it just takes a lot more effort and willpower to get it done -- everything from knowing the difficulties you're going to face, the problems you have to account for (failure will happen), the political challenges of building infrastructure (vs. more product features).
Now heres the big secret... the technical challenges are challenging but not impossible... the political challenges may be insurmountable. Everything about this costs money whether its dev time or buying 3rd party solutions. So really, can you build the $1K, $10K, $100K, or $1M solution?
Whatever solution you go for make sure the automation is robust first, complete second... i.e. make sure you have as robust a solution as you can for getting deployment to a test environment rather than a fragile solution that deploys to production.
CI is not intended as a deployment mechanism. It is good to have your CI execute any automated deployment to a QA/Test server, to ensure those aspects of your build work, but I would not use a CI system like Cruise Control or Bamboo as the means of deployment.
CI is for building the codebase periodically to automate execution of automated tests, verification of the codebase via static analysis and other checks of that nature.
Be sure you understand the idea behind a CI build. CI stands for Continuous Integration and CI builds are really intended to be throw-away builds that are performed when a developer checks code in to the source control system (or at some specified interval) to ensure that the newest changes do not break the code base (hence the idea of continuously integrating the changes to the code base).
To that end, the technology used for the actual build server process is largely irrelevant compared to what actually happens during the build. As #pdavis mentioned, the CI build should compile the code base, execute some code analysis (FxCop, StyleCop, Lint, etc.), execute unit tests and code coverage, and execute any other custom analysis you want performed that should impact the concept of a "successful" or "failed" build.
Having a CI build automatically deploy to an environment really doesn't fall under the control of a build server. That being said, you can always create a separate project that runs on the build server that handles the deployment when it detects certain conditions (such as a build completes successfuly), but that should always be done as a completely independent thing.
I am starting on a new project at work that I am really looking forward to. We are still in the initial design stage and have just recently completed the Logical System Architecture. We have ordered new servers for the testing and staging environments and are setting up a Continuous Integration (CI) build system based on Cruise Control (http://cruisecontrol.sourceforge.net/) and MSBuild (http://msdn2.microsoft.com/en-us/library/wea2sca5.aspx) which is basically an improved port of ANT. It appears that Visual Studio 2005 project and solution files are all now in MSBuild format. Cruise Control will be automatically pulling the source from Visual Source Safe (ok, it isn't Subversion but we can deal), compiling it, and then running it through fxCop (http://www.gotdotnet.com/Team/FxCop/), nUnit (http://www.nunit.org/), nCover (http://ncover.org/site/), and last but not lease Simian (http://www.redhillconsulting.com.au/products/simian/). Cruise Control has a pretty good website interface for displaying all of the compiled results from the various tools and can even display code changes from one build to the next. It also keeps track of all builds in a build history. I'm looking forward to the test driven development and think that this type of approach combined with nUnit/nCover should give us a pretty good idea before we roll out changes that we haven't broken anything. There are also plans to incorporate some type of automated user interface testing once we are far enough along in the project. Depending on the tool, this should be just a matter of installing the tool on the build server and calling it from Cruise Control. Sweet.
A good CI process will have full or nearly-full unit test coverage. Unit tests test classes and methods, vs. integration tests, which test multiple parts of the system. When you set up your CI builds, have them automate the unit tests. That way, the CI builds can run multiple times per day. We have ours set to run every 2 hours.
You can have longer running builds that run once per day. These can use other services and run integration tests.
I was watching a ThoughtWorks presentation (creators of Cruise Control) and they actually addressed this issue. Their answer is that NO deployment is too complex to test. Why? Because otherwise, your customers become your testers, which is exactly where you don't want to be.
If you have a complex deployment structure, set up a visualization server. Have it pretend to be all the systems you need to talk to. They can always start in a known good state, because you can reset to a clean image.
To answer your initial question, a good process is one which enables communication between the repository and the developers. If the repository is in a bad state (non-compiling code, failed tests, etc.), the developers know about it as soon as possible, so that they can correct it.
The later a bug is discovered, the costlier it is to fix. So bugs should be discovered as early as possible. This is the motivation behind CI.
A good CI should ensure catching as many bugs as possible. The whole application comprises of code (often in multiple languages), Database schema, deployment files etc. Errors in any of these can cause bugs - so the CI should try to exercise as many of them as possible.
CI does not replace a proper QA discipline. Also, CI need not be very comprehensive on day one of the project. One can start with a simple CI process that does basic compilation & unit testing initially. As you discover more classes of bugs in QA, you should adapt the CI process to try to catch future occurrences of those bugs. It can also involve static code-analysis checks, so that you can implement consistent coding and design ideals across the codebase.

Resources