Random/Intermittent Service Unavailable - IIS 7.5 - ASP.NET

We recently deployed a new ASP.NET site to our web servers to replace our old Classic ASP site (both servers run Windows 2008 R2 with IIS 7.5). They sit behind a Load Balancer.
This one .NET webform application is used for approximately 30 clients (each with their own URL. client1.mysite.biz, client2.mysite.biz etc...)
Our original plan was to deploy our new application into 3 "WebSites", each with its own app pool, and BIND the clients to the relevant Website.
When binding we bound to both Http and Https for the URL (we have certificates for each of the sites)
INITIAL PROBLEM:
We noticed that after we bound more than half the sites and tested, we were suddenly being greeted with "Service Unavailable. Service is Temporarily Unavailable" (NO NUMBER just the words) every time. We unbound everything and tried again (meticulously testing each time we bound a site). Each time after binding a certain number of sites the same thing happened.
We ran out of down time and went to Plan B. We put the whole thing in the "Default Website" as a virtual directory (No bindings) (This is how the Classic ASP site was setup)
OUR PROBLEM NOW:
Occasionally we get the same dreaded white screen with "Service Unavailable. Service is Temporarily Unavailable" (NO NUMBER just the words).
It seems to happen randomly (not load or time dependent as far as we can tell). If using AJAX it simply is caught in the "Error" portion of the AJAX code but I believe it is the same problem. The error occurs INSTANTLY when it does happen. If the user attempts to repeat the action that caused the problem everything is fine (they are not logged out and they proceed on their way).
However this is happening MULTIPLE times a day and it's across ALL of our sites (not just this new one).
One more item of great importance. This appears to be happening to ALL of our sites (Virtual Directories and custom WebSites on BOTH of our web servers). That seems to rule out a "bad" server (both are in the cloud did I mention?) and it also "seems" to rule out App Pool settings but what do I know?
About our IIS servers: We have multiple application pools running multiple different instances of websites (different code). Some are testing sites. Some are using Classic ASP and others are using ASP.NET.
What we've tried: We scoured the web looking for answers and have edited our machine.config file to increase all manner of things such as "Threads, Max-Connections etc...". We've edited our App Pool settings by increasing our Queue Length and turning on ALL the logs.
Anyone seen anything like this before? My theory is it has something to do with the bindings and the frequency of the error is increased for each binding I initiate but that is difficult to test when it happens on my production servers only.

We have finally solved this problem. As mentioned previously, we noticed that the IIS logs contained a sc-win32-status 64 error when we experienced the Service Unavailable problem in the browser when (and only when) our site was using the Load Balancer.
To help look into this further, we did a network capture of the traffic on the Load Balancer while testing. We reproduced the random Service Unavailable problem, saw the associated win32-status 64 error in the IIS logs, and identified the specific packet of traffic on the network capture for this event.
Using Wireshark, we followed the TCP stream and noticed that the TCP connection was reset by the Load Balancer immediately after this packet. We reproduced the problem three times and every time there was a TCP reset immediately afterwards.
Walking backwards through the TCP stream, we noticed in all three instances a packet for HTTP/1.1 200 (application/octet-stream) and, prior to that, a request to download a document (e.g. .pdf, .xlsx or .docx) from one of our sites. The server that contains all our documents is not a web server and does not have the IIS role active. The document server has no way to define the content/media type for the document being downloaded, hence the generic (application/octet-stream) packet in the network capture. The Load Balancer treated the request for a document as potentially malicious and reset the TCP connection as soon as another request was made on it. To fix the problem, we added a content type library function to our application using this post as a guide. Sorted!
In Summary:
1. A document was requested from our document server via our web application
2. The document was sent back to the user with a generic content type = application/octet-stream
3. The Load Balancer flagged this activity as potentially malicious
4. Another request was made within this TCP connection
5. The Load Balancer reset the TCP connection
6. This resulted in a Service Unavailable
Lesson Learned:
Always define your content/media types if you are serving content from a non web server or a web server running an IIS version less than 7 (Heaven forbid).
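The original fix lived in the ASP.NET application and the linked post is not reproduced here, so the following is only a minimal sketch of the idea in Python: look up an explicit media type from the file extension and fall back to the generic type only when nothing matches. The table contents and the function name `content_type_for` are assumptions for illustration.

```python
import os

# Hypothetical extension-to-media-type table; extend it for the file types you serve.
CONTENT_TYPES = {
    ".pdf": "application/pdf",
    ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
}


def content_type_for(filename):
    """Return an explicit media type for a file name, or the generic type if unknown."""
    extension = os.path.splitext(filename)[1].lower()
    return CONTENT_TYPES.get(extension, "application/octet-stream")
```

The application would then send the returned value as the Content-Type header of the download response instead of letting it default to application/octet-stream.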

A UC Certificate was originally meant for Microsoft Exchange, but it can also be used to cover multiple domains. We use one and it covers about 60+ domains (actually 4 or 5 domains with lots of subdomains). We also apply the certificate to a load balancer and two web servers, and we have multiple sites. As far as I can tell, the certificate operates as expected: you can view it from any of the 60+ domains. One odd thing about our setup is that the IIS UI won't let you bind the same certificate to more than one site, so we had to use the appcmd command line interface to bind multiple sites to the same certificate.

After looking more closely at our IIS logs it appears that there is indeed something that coincides with this behavior. We get an error of 200 0 64 which is the sc-win32-status 64: "the specified network name is no longer available".
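For anyone hunting for the same pattern, here is a small sketch of how the log could be scanned for these entries. It assumes the W3C extended log format, where a `#Fields:` header line names the space-separated columns; the sample lines and the function name `find_win32_status` are made up for illustration and are not taken from our logs.

```python
def find_win32_status(lines, wanted="64"):
    """Yield log entries (as dicts) whose sc-win32-status column equals `wanted`."""
    fields = []
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # The header names the columns for the data lines that follow it.
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or not fields:
            continue
        entry = dict(zip(fields, line.split()))
        if entry.get("sc-win32-status") == wanted:
            yield entry


# Illustrative sample only; real logs usually carry more columns.
SAMPLE_LOG = [
    "#Fields: date time cs-method cs-uri-stem c-ip sc-status sc-substatus sc-win32-status",
    "2014-01-01 10:00:00 GET /popup.aspx 10.0.0.5 200 0 0",
    "2014-01-01 10:00:01 GET /popup.aspx 10.0.0.5 200 0 64",
]

for entry in find_win32_status(SAMPLE_LOG):
    print(entry["date"], entry["time"], entry["cs-uri-stem"])
```

Matching the timestamps of these entries against the moments users saw the error is what let us tie the two together.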
Now our 2 IIS servers are hosted in the cloud on Sungard, and we are using a load balancer that they setup for us. It was our theory that the load balancer was "losing" the proper session id of the user when this 64 error occurs and has no idea where it was supposed to be.
We ran some controlled tests. One group we took OFF the load balancer and sent them directly to one of the servers and another group used the load balancer but made sure to connect to the same server. Both teams conducted the tests of trying to reproduce the error (which is to say we clicked a popup on the site over and over).
The results were interesting. The group that was NOT on the load balancer NEVER received the "Service Unavailable" error! BUT the logs indicated they were getting 64 errors 45 times. The group that WAS on the load balancer was able to produce the "Service Unavailable" message twice and the logs confirmed that there were exactly 2 instances of the 64 error that coincided to the exact moment that the errors were observed.
So what does this mean?
1.) The load balancer has some setting ("Sticky Sessions"?) that isn't keeping the sessions pinned correctly (but we can't find the right settings; it's not even our load balancer, it's SunGard's). Anyone have any advice on these settings for ASP.NET?
2.) 64 errors are a part of web life? We gave more CPU power to one of our virtual IIS servers and received fewer 64 errors. This is all I can come up with. We've sunk too much time and money into trying to solve this, but it appears that I at least have the option of taking people off the load balancer and routing them directly to one server or the other, and in addition I can beef up the server to handle more traffic and reduce the 64 errors.

Related

SMB - Server never responds to Session Setup Request

I am having very strange network problems. I am on a domain where a few servers are located on a different subnet. I can ping these servers, dns look them up and remote desktop to them by IP-address. I however cannot find them when using:
net view \\server
or
when I try to access them via Windows Explorer.
The person next to me who has an identical machine and is on the same subnet has no problems, as a matter of fact, I am the only one in a 50 person company having this problem!
This wouldn't be so much of a problem except for the fact that my machine cannot use web services located on these servers, neither via HTTP or NET.TCP.
After trying everything I could find on the internet and some more (added a new network card, reset policies, etc.), I finally got Wireshark to see what is going on. When doing net view \\server I notice that the server never responds to "Session Setup Request", but it did respond to "Negotiate Protocol Request". So what could possibly cause the server never to respond to the Session Setup Request?
Here is the server side capture (Not same session)
OK I found out what this was by comparing my tcpip registry (HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters) with a machine that worked. What I noticed is that I had the following 2 entries
EnablePMTUBHDetect 0
EnablePMTUDiscovery 1
but the other machine didn't. By deleting these entries, everything started working!
This, however, is very strange, because these happen to be the default values for these registry entries, so I do not understand why having them present causes such a problem.

My Azure Website has an odd "HTTP success" pattern in the (Monitor) portal

I have a website hosted in Azure Websites as a Basic tier website.
I'm currently in the development stage, yet the site is live and accessible by the outside world (at least at a basic level), so I wanted to better understand the monitoring features in the Azure management portal.
When I looked at the monitoring tab inside the portal, I see an odd pattern for HTTP success. Looking at the past 60 minutes (which I personally have not been active on), the HTTP successes are very cyclic, with 80 connections, then 0, then 40, then 0, then repeat.
Does anyone have any pointers on how I can figure out what the 80 and 40 connections are? I certainly don't have any timed events in my code, so there shouldn't be any calls being made unless a person is actually hitting the site.
UPDATE:
I setup a staging server and blocked all incoming traffic except my own IP. So the same code running, just without access from the outside world. And the HTTP success appears only when I hit the server myself (as expected). This suggests that my site is being hit by an outside bot maybe? Does anyone know how to protect against this? Or at least diagnose if the requests are not legitimate, etc?
I'd say it's this setting that causes the traffic:
Always On. By default, websites are unloaded if they are idle for some period of time. This lets the system conserve resources. In Basic or Standard mode, you can enable Always On to keep the site loaded all the time. If your site runs continuous web jobs, you should enable Always On, or the web jobs may not run reliably
http://azure.microsoft.com/en-us/documentation/articles/web-sites-configure/
It's just a keep alive to avoid cold starts every time you or someone else visit your site.
Here's another reference that describes this behavior:
What the always-on feature does is simply ping your site every now and then, to keep the application pool up and running.
And Scott Gu says:
One of the other useful Web Site features that we are introducing today is a feature we call “Always On”. When Always On is enabled on a site, Windows Azure will automatically ping your Web Site regularly to ensure that the Web Site is always active and in a warm/running state. This is useful to ensure that a site is always responsive (and that the app domain or worker process has not paged out due to lack of external HTTP requests).
About the traffic in general: First of all, the requests could really only come from Microsoft, since any traffic pattern like this will quickly be automatically detected and blocked when using Azure Websites - you cannot set up a keep alive like this yourself. Second, no modern bot whatsoever would ping a specific page with that kind of regularity, since it's all too obvious. Any modern datacenter security appliance would catch that kind of traffic and block/ignore/nullroute it.
As for your question regarding protection and security: Microsoft cannot protect your code from yourself. However, everything at the perimeter is managed and handled by Microsoft. That's one of the USP features of Azure - Firewall, Load Balancing, Spoofing, Anti-bot and DDOS protection etc. There will of course always be security concerns regarding any publicly exposed service but you can stay focused on your application while Microsoft manages the rest.
When running Azure Websites, you're in the hands of Microsoft regarding security outside of your application scope. That's a great thing, but if you really like to be able to use other security measures you'll have to set up a virtual machine instead and run your site from there.
You may want to first understand what these requests are. Enable web server logging for the website in the Azure Management portal and download the IIS logs for your website after seeing this pattern. Then check the URL, client IP address and user agent fields of the requests to identify whether they really come from search bots. Based on what you observe, you can block some IPs statically, use dynamic IP restrictions, or configure URL Rewrite to block requests with specific patterns in the request or request headers.
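As a rough sketch of that log check, the snippet below tallies requests per client IP and user agent. It assumes the downloaded logs are in the W3C extended format with `c-ip` and `cs(User-Agent)` columns (IIS writes spaces inside the user agent as `+`, so splitting on whitespace is safe); the sample lines and the name `count_clients` are invented for illustration.

```python
from collections import Counter


def count_clients(lines):
    """Count requests per (client IP, user agent) pair in W3C extended log lines."""
    fields = []
    counts = Counter()
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # Column names for the data lines that follow.
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or not fields:
            continue
        entry = dict(zip(fields, line.split()))
        counts[(entry.get("c-ip", "-"), entry.get("cs(User-Agent)", "-"))] += 1
    return counts


# Illustrative sample only.
SAMPLE_LOG = [
    "#Fields: date time cs-uri-stem c-ip cs(User-Agent) sc-status",
    "2014-06-01 12:00:00 / 203.0.113.7 ExampleBot/1.0 200",
    "2014-06-01 12:05:00 / 203.0.113.7 ExampleBot/1.0 200",
    "2014-06-01 12:06:00 /about 198.51.100.2 Mozilla/5.0+(Windows+NT+6.1) 200",
]

for (ip, agent), hits in count_clients(SAMPLE_LOG).most_common():
    print(hits, ip, agent)
```

A pair that shows up at a fixed interval with a large count is the one worth looking into first.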
EDIT
This is how you can block search bots - http://moz.com/ugc/blocking-bots-based-on-useragent
You can configure URL Rewrite locally on an IIS server in the way described in the above article and then copy the configuration generated in the web.config, or connect to the Azure website directly using IIS Manager as described in http://azure.microsoft.com/blog/2014/02/28/remote-administration-of-windows-azure-websites-using-iis-manager/ and configure the URL Rewrite rule there.

Workaround if the Application is Down

We have deployed an application on the server.
Problem is, sometimes the application will be down due to some issue (Ex: While Downloading huge volume of data into Excel).
The application will be up after manually restarting the IIS.
We are creating a new application, so we are not working to fix this issue.
As a workaround, we are trying to build an exe with the below requirement:
Ping the application deployed on the server and find out whether the application is up or down, If the application is down, restart IIS.
Is it possible to ping a local website on the IIS? Is there any other way to do a temporary fix?
Hmmm, that kind of stability isn't good. However, you're interested in monitoring a URL and determining whether it is active...
TBH, I'm sure there are a few monitoring applications knocking around, some even free if that's your thing, that will recognise specific ports and utilise appropriate protocols such as HTTP. But if you fancy having a go yourself, you could always use HttpWebRequest to mock up a request to the server and hopefully it will respond in a timely manner. Typically, if you're just touching the server, you can use a 'HEAD' request, so you receive just the header data rather than all the data. Check out this example.
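The answer above talks about HttpWebRequest in .NET; purely as a sketch of the same idea, here is the HEAD probe written in Python with the standard library. The names `is_site_up` and `check_and_restart`, the 5 second timeout and the rule that any answer below HTTP 500 counts as "up" are all assumptions for illustration.

```python
import urllib.error
import urllib.request


def is_site_up(url, timeout=5.0):
    """Return True if a HEAD request to `url` is answered with a status below 500."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 500
    except urllib.error.HTTPError as error:
        # The server answered, but with an error status (4xx or 5xx).
        return error.code < 500
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout and similar.
        return False


def check_and_restart(url, restart, probe=is_site_up):
    """Run `restart` if the probe says the site is down; return True if it was run."""
    if probe(url):
        return False
    restart()
    return True
```

On the server itself, `restart` could be something like `lambda: subprocess.run(["iisreset", "/restart"], check=True)` run from an elevated scheduled task; that wiring is an assumption and is left out of the sketch so it can be tried safely elsewhere.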

Round robin load balancing options for a single client

We have a biztalk server that makes frequent calls to a web service that we also host.
The web service is hosted on 4 servers with a DNS load balancer sitting between them. The theory is that each subsequent call to the service will round robin the servers and balance the load.
However this does not work presumably because the result of the DNS lookup is cached for a small amount of time on the client. The result is that we get a flood of requests to each server before it moves on to the next.
Is that presumption correct and what are the alternative options here?
A bit more googling has suggested that I can disable client side caching for DNS: http://support.microsoft.com/kb/318803
...however this states the default cache time is 1 day which is not consistent with my experiences
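To illustrate the presumption in the question: a client that resolves the name once and reuses the cached answer keeps hitting one server, whereas a client that asks for all addresses and rotates through them itself spreads consecutive calls. The sketch below shows that rotation; it is not something any of the posters proposed, and `resolve_all` and `round_robin` are hypothetical helper names.

```python
import itertools
import socket


def resolve_all(host, port=80):
    """Return every distinct address the resolver reports for `host`, in order."""
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        address = sockaddr[0]
        if address not in addresses:
            addresses.append(address)
    return addresses


def round_robin(addresses):
    """Return an iterator that hands out the addresses one after another, forever."""
    return itertools.cycle(addresses)


# Example with fixed addresses standing in for the four web service hosts.
servers = round_robin(["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"])
first_five = [next(servers) for _ in range(5)]
```

Doing this in the caller moves the balancing decision out of DNS entirely, which is essentially what a dedicated load balancer does on the caller's behalf.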
You need to load balance at a lower level with NLB Clustering on Windows or LVS on Linux (or other equivalent piece of software). If you are letting clients to the web service keep an HTTP connection open for longer than a single request/response you still might not get the granularity of load balancing you are looking for so you might have to reconfigure your application servers if that is the case.
The solution we finally decided to go with was Application Request Routing which is an IIS extension. In tests this has shown to do what we want and is far easier for us (as developers) to get up and running as compared to a hardware load balancer.
http://www.iis.net/download/ApplicationRequestRouting

Redirecting http traffic to another server temporarily

Assume you have one box (dedicated server) that's on 24 7 and several other boxes that are user machines that have unused bandwidth. Assume you want to host several web pages. How can the dedicated server redirect http traffic to the user machines. It is desirable that the address field in the web browser still displays the right address, and not an ip. Ie. I don't want to redirect to another web page, I want to tell the web browser that it should request the same web page from a different server. I have been browsing through the 3xx codes, and I don't think they are made for anything like this.
It should work some what along these lines:
1. Dedicated server is online all the time.
2. User machine starts and tells the dedicated server that it's online.
(several other user machines can do similarly)
3. Web browser looks up domain name and finds out that it points to dedicated server.
4. Web browser requests page.
5. Dedicated server tells web browser to repeat request to user machine
Is it possible to use some kind of redirect, and preferably tell the browser to keep sending further requests to user machine. The user machine can close down at almost any point of time, but it is assumed that the user machine will wait for ongoing transactions to finish, no closing the server program in the middle of a get or something.
What you want is called a Proxy server or load balancer that would sit in front of your web server.
The web browser would always talk to the load balancer, and the load balancer would forward the request to one of several back-end servers. No redirect is needed on the client side, as the client always thinks it is just talking to the load balancer.
ETA:
Looking at your various comments and re-reading the question, I think I misunderstood what you wanted to do. I was thinking that all the machines serving content would be on the same network, but now I see that you are looking for something more like a p2p web server setup.
If that's the case, using DNS and HTTP 30x redirects would probably be what you need. It would probably look something like this:
Your "master" server would serve as an entry point for the app, and would have a well known host name, e.g. "www.myapp.com".
Whenever a new "user" machine came online, it would register itself with the master server, and the master server would create or update a DNS entry for that user machine, e.g. "user123.myapp.com".
If a request came to the master server for a given page, e.g. "www.myapp.com/index.htm", it would do a 302 redirect to one of the user machines based on whatever DNS entry it had created for that machine - e.g. redirect them to "user123.myapp.com/index.htm".
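The three steps above can be sketched roughly as follows. This only models the bookkeeping and the choice of redirect target; creating the DNS entries and serving HTTP are left out, and the class name `Master`, its methods and the simple rotation between machines are assumptions for illustration.

```python
class Master:
    """Tracks which user machines are online and picks a redirect target per request."""

    def __init__(self, domain):
        self.domain = domain  # e.g. "myapp.com"
        self.hosts = []       # names of user machines currently online
        self._next = 0        # rotation counter

    def register(self, name):
        """Called when a user machine comes online (its DNS entry is assumed to exist)."""
        if name not in self.hosts:
            self.hosts.append(name)

    def unregister(self, name):
        """Called when a user machine announces that it is going offline."""
        if name in self.hosts:
            self.hosts.remove(name)

    def redirect_for(self, path):
        """Return (302, location) for a request, or None if the master must serve it."""
        if not self.hosts:
            return None
        host = self.hosts[self._next % len(self.hosts)]
        self._next += 1
        return 302, "http://%s.%s%s" % (host, self.domain, path)


master = Master("myapp.com")
master.register("user123")
print(master.redirect_for("/index.htm"))
```

Returning None when no user machine is registered keeps the master usable as a fallback, which matters given how freely the user machines come and go.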
Some problems I see with this approach:
First, Once a user gets redirected to a user machine, if the user machine went offline it would seem like the app was dead. You could avoid this by having all the links on every page specifically point to "www.myapp.com" instead of using relative links, but then every single request has to be routed through the "master server" which would be relatively inefficient.
You could potentially solve this by changing the DNS entry for a user machine when it goes offline to point back to the master server, but that wouldn't work without an extremely short TTL.
Another issue you'll have is tracking sessions. You probably wouldn't be able to use sessions very effectively with this setup without a shared session state server of some sort accessible by all the user machines. Although cookies should still work.
In networking, load balancing is a technique to distribute workload evenly across two or more computers, network links, CPUs, hard drives, or other resources, in order to get optimal resource utilization, maximize throughput, minimize response time, and avoid overload. Using multiple components with load balancing, instead of a single component, may increase reliability through redundancy. The load balancing service is usually provided by a dedicated program or hardware device (such as a multilayer switch or a DNS server).
and more interesting stuff in here
apart from load balancing you will need to set up more or less similar environment on the "users machines"
This sounds like 1 part proxy, 1 part load balancer, and about 100 parts disaster.
If I had to guess, I'd say you're trying to build some type of relatively anonymous torrent... But I may be wrong. If I'm right, HTTP is entirely the wrong protocol for something like this.
You could use dns, off the top of my head, you could setup a hostname for each machine that is going to serve users:
www in A xxx.xxx.xxx.xxx # ip address of machine 1
www in A xxx.xxx.xxx.xxx # ip address of machine 2
www in A xxx.xxx.xxx.xxx # ip address of machine 3
Then as others come online, you could add them to the dns entries:
www in A xxx.xxx.xxx.xxx # ip address of machine 4
Only problem is you'll have to lower the time to live (TTL) on each record (I think the default is 86400 - 1 day).
If a machine goes down, you'll have to remove its dns entry, though I do think this is the least intensive way of adding capacity to any website. Jeff Atwood has more info here: is round robin dns good enough?

Resources