I have a VB.NET/Vue website hosted on an internal IIS 8.5 Windows 2012R2 Server. Our company has about 30 users using the site at any given time. The users are experiencing random delays throughout the day and on some days there's no delays (site works great most of the time). What I'm looking for is any suggestions on where to start looking to solve the issue. Here's what I've found so far.
User goes to site and initiates an api request from the UI
User sees a loading icon for anywhere up to a minute or so while the request returns
The request eventually reaches the server after some time and executes really fast within milliseconds and returns the response to the user
By this time, many users have already refreshed the page making new requests that succeed on page load. For the users that are patient and wait for the response, it eventually returns the response.
Here's some screenshots:
So to sum everything up, there are several users experiencing delays on a daily basis.
Some days we don’t have any delays, but most days we have several users experiencing multiple delays of several seconds to 30 seconds to 1 minute.
I’ve found all this using LogRocket and NewRelic and what is happening is all these requests are completing within milliseconds, but the request doesn’t seem to reach the server for some period of time.
I’ve been monitoring the CPU/Memory/Network on these servers and it all seems fine to me during when these issues occur.
It seems that the problem lies between the users computer and whatever hardware/software exists before reaching the web server.
Update here... Found that the problem is occurring on the users computer in all these instances. Using google chrome's performance api, I was able to track timing info for these requests and found that the problem is in the fetchStart. So whatever is happening here is the cause of the issue.
Example below:
entryType: resource
startTime: 1119531.820000033
duration: 56882.43999995757
initiatorType: xmlhttprequest
nextHopProtocol: http/1.1
workerStart: 0
redirectStart: 0
redirectEnd: 0
fetchStart: 1119531.820000033
domainLookupStart: 1176401.0199999902
domainLookupEnd: 1176402.2699999623
connectStart: 1176402.2699999623
connectEnd: 1176404.8350000521
secureConnectionStart: 1176403.6700000288
requestStart: 1176404.8549999716
responseStart: 1176413.5300000198
responseEnd: 1176414.2599999905
transferSize: 15145
encodedBodySize: 14884
decodedBodySize: 14884
serverTiming: []
workerTiming: []
fetchStart is at 1119531.820000033, then requestStart is at 1176404.8549999716 so the problem is something between fetchStart and requestStart. Still looking into what is causing this.
In 2022, we are experiencing something very similar with a small fraction of our customers. There is a significant gap between the timing api requestStart and the startTime. This gap can be up to 8 minutes -- I admire the patience of customers waiting that long. The wait periods are also close to multiples of a minute.
In our case, it appears that there is a (transparent?) proxy between those browsers and our server infrastructure which appears to be triggering the problem. In particular, it forces a downgrade of HTTP/2 to HTTP/1.1. Whitelisting our website in that proxy does solve the problem. This isn't a very satisfactory solution, but it does make the customer happier!
[UPDATE]
In our case, it turned out that we were sending a Content-Length header with a non-zero value on a 304 response. This is technically invalid and it caused problems with the proxy. This happened because of the Django CommonMiddleware which always puts a content-length header on responses. The solution was to add a new piece of middleware that strips out the content-length (and content) on a 304 response.
It turned out that the content was already being stripped by our nginx frontend, but it is better not to generate it in the first place.
And what was the content? -- in our case, it was the 4 characters 'null'!
Related
When I check (https://www.readonlinenewspaper.com) site speed using PageSpeed Insights.
I am not able to see and results and get an error message like below:
Lighthouse returned an error: FAILED_DOCUMENT_REQUEST. Lighthouse was unable to reliably load the page you requested. Make sure you are testing the correct URL and that the server is properly responding to all requests. (Details: net::ERR_CONNECTION_FAILED)
It is probably caused by one of two things
1. The site just takes too long to load.
Your page takes well over 40 seconds to load (on a high speed desktop connection, albeit in the UK and I am guessing this is somewhere else due to the long delay on requests.) so Page Speed Insights thinks it is broken as the page never completes loading within its timeout period.
Your country flags are the main cause of this, you should instead consider a CSS image sprite, or inline SVGs as the total of 438 requests on your page is so high you will never get good performance (generally only 8 requests can be made at once so that means you have over 50 round trips to your server for resources.)
If each set of eight resources takes 200ms to complete that is 10 seconds of latency (dead time waiting for a response) on its own, for me they were taking 800 to 1000 ms each!
This is particularly slow so perhaps there is something wrong with your hosting configuration or website setup? (You aren't storing the flag URLs in the database and looking them up one at a time in a loop by any chance are you?).
2. Hotjar
For some reason Page Speed Insights doesn't seem to play well with hotjar.
It is something to do with websockets but I never got to the bottom of it I just know that this is a problem I see often when people use hotjar and it is related to web sockets (maybe something to do with the wss:// protocol or their implementation).
Try disabling hotjar and run the test and see if it works then (perhaps test on another page when investigating this as it is only the homepage that is unbearably slow to load because of the flags as per point one).
p.s. the resource online-newspapers-banner-02.jpg is not being loaded over HTTPS so fix that, nothing to do with your question I just noticed the site was showing as "not secure" and I think that is the cause.
We have a web page that calls a stored procedure. The stored procedure takes ~ 5 minutes to run. When called from ASP.NET, it times out at ~ 2 minutes and 40 seconds with an HTTP execution timeout error.
I tried setting an HTTP timeout property in my web.config file as:
<httpRuntime executionTimeout="600">
But it didn't help.
Any ideas appreciated. thanks
You should not create a web application with a page that could require such a long response time from the server. As a general rule, anything that you know will take longer than 10 seconds or so should be done as an asynchronous process. You've probably seen websites that display a "please wait" screen for long running processes, most times these pages work by delegating the long-running job to a background process or message queue, then polling until the job either completes successfully or errors out.
I know this may seem like a tall order if you've not done it before, but it really is the professional way to handle the scenario you're faced with. In some cases, your clients may be working from networks with proxy servers set up to abort the HTTP request regardless of what you've set your timeouts to.
This is a dated link, and I believe the .NET framework has introduced other ways of doing this, but I actually still use the following approach today in certain scenarios.
http://www.devx.com/asp/Article/29617
At a customer of ours, candidates take tests with our software. If their test is finished, some calculations are done on the server. Now, sometimes, 200 candidates can end their test at the same time, so 200 calculations are done concurrent. The calculations all seem to go fine, but some calls to the IIS7 server get back a http error...
In Flex, this is the error:
code = "NetConnection.Call.Failed"
description = "HTTP: Status 200"
details = "http://servername/weborb.aspx"
level = "error"
Isn't Status 200 OK? So what's wrong here? Is it even a IIS7 problem? Of the 200 candidates 20 got this message. When restarting their test, everything worked well.
I have found this on the subject, but I wonder if this has anything to do with my problem (next week our customer will do some stresstests and I'll already asked them to test test if solution in this post works).
Some questions:
Can it be that IIS7 blocks certain http calls when load is to much?
How can you know that IIS7 blocked those calls because of too much load?
Is it possible to configure these things?
Technically, in the future I would like to queue the calculations, but for now, there isn't time nor budget for that.
Application: Flex, WebORB, ASP.NET, IIS7 en SQLSERVER2008. Server is Windows Server 2008.
This problem seems very familiar to me. We have a bunch of flex widgets which are connected to one server-side and sometimes it also returns "Netconnection.Call.Failed". For us, it seems that the IIS(and MSSql behind) cannot process all the requests in time, hence some of them are timed out.
Try to check how much time each request/all requests take, then check your timeout setting.
There are plenty of things you can do to fine tune the performance of both your server and IIS.
To answer your questions:
A maximum concurrent connections limit (plus other settings) in IIS 7 can be configured by selecting your website in IIS Manager and selecting 'Advanced Settings' in the Actions Pane on the right. Though by default this is a number much higher than 200.
Looking in the IIS log files, specifically the return status codes can give you an indication of what went wrong. Equally the Windows event log should also tell you of any exceptions that have occurred.
I suggest you turn on load balancing between instances of IIS, or consider using nginx for load balancing.
also set the limit of 200 User higher. Since in IIS, each user connect to your application is count as 1 instance of user, at some point you will use up 200 user slot. This is the default setting and you can set it to much higher number.
Also set your time out to a higher number.
Also look at Comet if you trying to call consistent result like live data (stock, weather, chat, shoutbox)
Technically, in the future I would like to queue the calculations, but for now, there isn't time nor budget for that.
A queue isn't that hard to put together with a batch-processing script running off Windows' scheduled tasks. Just dump results into a SQL DB, or if you're really lazy, insert rows in SQL with a serialized array, then have them "come back" to see their results. "Please wait, your results are still processing."
It'd take you less time than waiting around on SO for a silver-bullet answer in my opinion.
This may be the most mysterious problem I've ever encountered.
We have an IIS7 install with 3 Web Sites on it, each with it's own Application Pool. Once a day, for about an hour, a specific one of them goes down.
What I mean by "goes down" is:
It stops responding to requests for dynamic pages (ex. default.aspx) but will serve static files fine (logo.png).
Wireshark tells me that these dynamic page requests are actually return HTTP 500 Internal Server errors, but in the browser, I don't see an error. I just see the browser spinning.
If I log on locally to the box and surf around everything runs fine. All the pages pull up, so the database is being queried. It all seems perfectly normal.
There are no errors in the event log.
There are no errors recorded that have been captured by our internal (Application-level) error logging.
The basic IIS log file, which I thought logged every request, shows no record of these requests coming in.
And, if I restart the App Pool for the Web Site, everything comes back immediately. Or, if I just wait an hour or so, it comes back.
So, I've ruled out:
DNS issues, since I have no problem terminal servicing into the box by hostname.
Database issues, since the site works fine when I'm local to the box and surfing around
HTTP firewall issues, since I'm seeing the requests in wireshark, and am even getting images to serve up.
I have to assume it's a problem with my application, but IIS doesn't even show that these requests ever happened, and nothing in IIS or my app is logging errors.
It also doesn't even go down at the same time each day. This started at night (#midnight) and seems that it's gradually started moving it's daily time by an hour or so, until the point now where it hit at 9AM.
Any clues you might have for further troubleshooting would be greatly appreciated.
Tom
I'd fire up performance monitor and look for requests and exceptions being thrown. Not a whole lot of value in my answer but it might started pointing you in the right direction.
Actually, check the event logs first, see if something is throwing errors. Also, check memory usage and paging.
I'm encountering a situation where it takes a long time for ASP.NET to generate reply with the web page (more than 2 hours). It due to the codebehind running for a while (very long, slow loop).
Browser (both IE & Firefox) stops waiting for the reply (after about an hour) and gives generic cannot display webpage error (similar to what you would see if you'd try to navige to non-existing server).
At the same time asp.net app keeps going (I can see it in debugger) and eventually completes.
Why does this happen? Are there any settings in web.config to influence this? I'm hoping there's a timeout setting that I'm missing that's causing this.
Maybe a settings in IE or Firefox? But I think they wait while the server is keeping connection alive.
I'm experiencing this even when I launch app in debug mode (with compilation debug="true") on my local machine from VS (so it's not running on IIS, but on ASP.NET Dev Server).
I know it's bad that it takes so long to generate the page, but it doesn't matter at this stage. Speeding it up would take a lot of extra work and the delay doesn't really matter. This is used internally.
I realize I can redesign around this issue running logic to a background process and getting notified when it's done through AJAX, or pull it to a desktop app or service or whatever. Something along those lines will be done eventually, but that's not what I'm asking about right now.
Sounds like you're using IE and it is timing out while waiting for a response from the server.
You can find a technet article to adjust this limit:
http://support.microsoft.com/kb/181050
CAUSE
By design, Internet Explorer imposes a
time-out limit for the server to
return data. The time-out limit is
five minutes for versions 4.0 and 4.01
and is 60 minutes for versions 5.x, 6,
and 7. As a result, Internet Explorer
does not wait endlessly for the server
to come back with data when the server
has a problem. Back to the top
RESOLUTION
In general, if a page does not return within a few
minutes, many users perceive that a
problem has occurred and stop the
process. Therefore, design your server
processes to return data within 5
minutes so that users do not have to
wait for an extensive period of time.
The entire paradigm of the Web is of request/response. Not request, wait two hours, response!
If the work takes so long to do, then have the page request trigger the work, and then not wait for it. Put the long-running code into a Windows service, and have the service listen to an MSMQ queue (or use WCF with an MSMQ endpoint). Have the page send requests for work to this queue. The service will read a request, maybe start up a new thread to process it, then write a response to another queue, file, or whatever.
The same page, or a different, "progress" page can poll the response queue or file for responses, and update the user, assuming the user still cares after two hours.
For something that takes this long, I would figure out a way to kick it off via AJAX and then periodically check on it's status. The background process should update some status variable on a regular basis and store it's data in the cache or session when complete. When it completes and the browser detects this (via AJAX), have the browser do a real postback (or get by changing location.href), pick up the saved data, and generate the page.
I have a process that can take a few minutes so I spin off a separate thread and send the result via ftp. If an error occures in the process I send myself an error message including the stack trace. You may want to consider sending the results via email or some other place then the browser and use a thread as well.