IIS file download hangs/timeouts - sc-win32-status = 64 - http

Any thoughts on why I might be getting tons of "hangs" when trying to download a file via HTTP, based on the following?
Server is IIS 6
File being downloaded is a binary file, rather than a web page
Several clients hang, including the TrueUpdate and FlexNet web updating packages, as well as a custom .NET app that just does basic HttpWebRequest/HttpWebResponse logic and downloads using a response stream
IIS log file signature when success is 200 0 0 (sc-status sc-substatus sc-win32-status)
For failure, error signature is 200 0 64
sc-win32-status of 64 is "the specified network name is no longer available"
I can point Firefox at the URL and download successfully every time (perhaps some retry logic is happening under the hood)
At this point, it seems like either there's something funky with my server that it's throwing these errors, or that this is just normal network behavior and I need to use (or write) a client that is more resilient to the failures.
Any thoughts?

Perhaps your issue was a low level networking issue with the ISP as you speculated in your reply comment. I am experiencing a similar problem with IIS and some mysterious 200 0 64 lines appearing in the log file, which is how I found this post. For the record, this is my understanding of sc-win32-status=64; I hope someone can correct me if I'm wrong.
sc-win32-status 64 means “The specified network name is no longer available.”
After IIS has sent the final response to the client, it waits for an ACK message from the client.
Sometimes clients will reset the connection instead of sending the final ACK back to server. This is not a graceful connection close, so IIS logs the “64” code to indicate an interruption.
Many clients will reset the connection when they are done with it, to free up the socket instead of leaving it in TIME_WAIT/CLOSE_WAIT.
Proxies may have a tendency to do this more often than individual clients.
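To see how often the 200 0 64 signature actually shows up, the log can be tallied by status triple. This is only a minimal sketch: it assumes a W3C extended log with a #Fields: directive that includes sc-status, sc-substatus and sc-win32-status, and the sample entries (dates, URL) are made up for illustration.

```python
from collections import Counter

WANTED = ("sc-status", "sc-substatus", "sc-win32-status")


def count_status_triples(lines):
    """Count (sc-status, sc-substatus, sc-win32-status) triples in a W3C log."""
    fields = []
    counts = Counter()
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # The #Fields directive names the columns of the entries that follow.
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or not fields:
            continue
        values = line.split()
        if len(values) != len(fields):
            continue  # skip malformed or truncated entries
        row = dict(zip(fields, values))
        if all(name in row for name in WANTED):
            counts[tuple(row[name] for name in WANTED)] += 1
    return counts


# Made-up sample entries, for illustration only.
sample_log = """\
#Software: Microsoft Internet Information Services 6.0
#Fields: date time cs-method cs-uri-stem sc-status sc-substatus sc-win32-status
2010-01-01 00:00:01 GET /files/update.bin 200 0 0
2010-01-01 00:00:05 GET /files/update.bin 200 0 64
2010-01-01 00:00:09 GET /files/update.bin 200 0 64
""".splitlines()

counts = count_status_triples(sample_log)
for triple, n in counts.most_common():
    print(" ".join(triple), n)
```

If the 64s cluster around particular client IPs or user agents (add c-ip or cs(User-Agent) to the tally if your log has those fields), that points toward specific clients or proxies resetting the connection rather than a server fault.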

I spent two weeks investigating this issue. In my case, random requests were intermittently being terminated prematurely, which produced IIS log entries with status code 200 but a win32-status of 64.
Our infrastructure includes two Windows IIS servers behind two NetScaler load balancers in HA mode.
In my particular case, the problem was that the NetScaler had a feature called "Integrated Caching" turned on (http://support.citrix.com/proddocs/topic/ns-optimization-10-5-map/ns-IC-gen-wrapper-10-con.html).
After disabling this feature, the request interruptions ceased and the site operated normally. I'm not sure how or why this was causing a problem, but there it is.
If you use a proxy or a load balancer, do some investigation of what features they have turned on. For me the cause was something between the client and the server interrupting the requests.
I hope that this explanation will at least save someone else's time.

Check the headers from the server, especially Content-Type and Content-Length. It's possible that your clients don't recognize the format of the binary file and hang while waiting for bytes that never come, or that they close the underlying TCP connection, which may cause IIS to log the win32 status 64.

Spent three days on this.
It was the timeout that was set to 4 seconds (curl php request).
Solution was to increase the timeout setting:
//curl_setopt($ch, CURLOPT_TIMEOUT, 4); // times out after 4s
curl_setopt($ch, CURLOPT_TIMEOUT, 60); // times out after 60s

You will have to use Wireshark or Network Monitor to gather more data on this problem, I think.

I suggest you put Fiddler in between your server and your download client. This should reveal the differences between Firefox and the other clients.

Description of all sc-win32-status codes for reference
https://learn.microsoft.com/en-us/windows/win32/debug/system-error-codes--0-499-
ERROR_NETNAME_DELETED
64 (0x40)
The specified network name is no longer available.

Load balancing TCP traffic using Apache Camel with Netty leads to transaction failures

I am new to Apache Camel and Netty and this is my first project. I am trying to use Camel with the Netty component to load balance heavy traffic in a back end load test scenario. This is the setup I have right now:
from("netty:tcp:\\this-ip:9445?defaultCodec=false&sync=true").loadBalance().roundRobin().to("netty:tcp:\\backend1:9445?defaultCodec=false&sync=true,netty:tcp:\\backend2:9445?defaultCodec=false&sync=true")
The issue is that the client system sending TCP traffic to Camel receives responses with unexpected buffer sizes. When I send multiple requests one after the other I see no issues and the buffer size is as expected. But when I run multiple users sending similar requests to Camel on the same port, I intermittently see unexpected buffer sizes, ranging from 0 bytes to more than the expected number of bytes. I tried playing around with multiple options mentioned on the Camel-Netty page, like:
Increasing backlog
keepAlive
buffersizes
timeouts
poolSizes
workerCount
synchronous
stream caching (did not work)
disabled useOriginalMessage for performance
System level TCP parameters, etc. among others.
I have yet to resolve the issue, and I am not sure whether I'm fundamentally missing something. I looked at the encoders/decoders and wondered whether they could be the issue, but I don't understand why a load balancer needs to encode/decode messages. I have worked with other load balancers that only require endpoint configuration, so I am assuming that Camel does not require this either. Am I right? Please note that the issue is not with my client/backend: I ran a 2000-user load test from my client directly to the backend with less than 1% failures, but I see a large number of failures (though there are successes too) with Camel. I have the following questions:
1. Is this a valid use case for Apache Camel with Netty? Should I be looking at Mina or others?
2. Can I try to route TCP traffic to JMS or other components and then finally to the TCP endpoint?
3. Do I need encoders/decoders or should this configuration work?
4. Should I continue with this approach or try some other load balancer?
Please let me know if you have any other suggestions. TIA.
Edit1:
I also tried the same approach with netty4 and mina components. The route looks similar to the one in netty. The route with netty4 is as follows:
from("netty4:tcp:\\this-ip:9445?defaultCodec=false&sync=true").to("netty4:tcp:\\backend1:9445?defaultCodec=false&sync=true")
I read a few posts which had the same issue but did not find any solution relevant to my issue.
Edit2:
I increased the receive timeout at my client and immediately saw the rate of mismatched buffer lengths fall to less than 1%. However, the response time for each transaction is much higher with Camel than without it, almost 10 times higher. Can you help me reduce the response time for each transaction? The message received back at my client varies from 5000 to 20000 bytes. Here is my latest route:
from("netty:tcp://this-ip:9445?sync=true&allowDefaultCodec=false&workerCount=20&requestTimeout=30000")
.threads(20)
.loadBalance()
.roundRobin()
.to("netty:tcp://backend-1:9445?sync=true&allowDefaultCodec=false","netty:tcp://backend-2:9445?sync=true&allowDefaultCodec=false")
I also used certain performance enhancements like:
context.setAllowUseOriginalMessage(false);
context.disableJMX();
context.setMessageHistory(false);
context.setLazyLoadTypeConverters(true);
Can you point me in the right direction about how I can reduce the individual transaction times?
For netty4 component there is no parameter called defaultCodec. It is called allowDefaultCodec. http://camel.apache.org/netty4.html
Also, try something like this first.
from("netty4:tcp:\\this-ip:9445?textline=true&sync=true").to("netty4:tcp:\\backend1:9445?textline=true&sync=true")
The above means the data being sent is normal text. If you are sending bytes or something else, you will need to provide decoding/encoding for Netty to handle the data.
And a side note. Before running the Camel route, test manually to send test messages via a standard tcp tool like sockettest to verify that everything works. Then implement the same via Camel. You can find sockettest here http://sockettest.sourceforge.net/ .
I finally solved the issue with the same route settings as above. The problem was that the request and response delimiter was not configured properly. As a result, it was either closing the connection too early, which led to unexpected buffer sizes, or waiting too long even after the entire buffer had been received, which led to high response times.
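For anyone wondering why a delimiter matters at all: TCP delivers a byte stream, not messages, so something has to decide where one request or response ends. The following is a rough sketch of delimiter-based framing in plain Python. It is not Camel or Netty code, and the FrameReader name and the "\r\n" delimiter are assumptions for illustration, since the post does not say which delimiter was used.

```python
def split_frames(buffer, delimiter=b"\n"):
    """Split accumulated bytes into complete frames plus leftover bytes."""
    *frames, remainder = buffer.split(delimiter)
    return frames, remainder


class FrameReader:
    """Accumulates TCP chunks and yields only complete, delimited messages."""

    def __init__(self, delimiter=b"\n"):
        self.delimiter = delimiter
        self.pending = b""  # bytes received so far that do not end a frame

    def feed(self, chunk):
        frames, self.pending = split_frames(self.pending + chunk, self.delimiter)
        return frames


# Hypothetical delimiter; chunks arrive split at arbitrary points.
reader = FrameReader(b"\r\n")
first = reader.feed(b"abc\r")          # delimiter not complete yet
second = reader.feed(b"\ndef\r\ngh")   # completes two frames, "gh" stays pending
print(first, second, reader.pending)
```

If the receiver uses the wrong delimiter, it either cuts a message short (too few bytes) or keeps waiting for a terminator that never arrives until a timeout fires, which matches both symptoms described above.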

Why is the initial connection time for a HTTP request so long?

My web app sits behind Nginx. Occasionally, loading my web page takes more than 10 seconds. I used Chrome DevTools to track the timing, and it looks like this:
The weird thing is, when the page loads slowly, the initial connection time is always 11 seconds long. And after this slow request, subsequent loading of the same page becomes very fast.
What could cause this?
P.S. If this is caused by a resource limitation on my server, can I see some errors/warnings in some system log?
The initial connection refers to the time taken to perform the initial TCP handshake and negotiate SSL/TLS (where applicable). The slowness could be caused by congestion, where the server has hit a limit and can't respond to new connections while existing ones are pending. You could look into some performance enhancements in your Nginx configuration.
Use the dig command to check how your domain name resolves.
If the answer section returns multiple addresses, check that each of these IPs is valid.
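As a complement to dig, here is a small sketch that times name resolution separately from the TCP handshake and tries every address returned, so a single dead IP among several stands out. The function name is made up for illustration, it does not measure TLS negotiation, and the demo connects to a throwaway local listener so that it runs without network access.

```python
import socket
import time


def time_dns_and_connect(host, port, timeout=15.0):
    """Return (dns_seconds, [(ip, connect_seconds, error_or_None), ...])."""
    t0 = time.monotonic()
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    dns_seconds = time.monotonic() - t0

    results = []
    for family, socktype, proto, _canonname, sockaddr in infos:
        t1 = time.monotonic()
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)  # TCP handshake only, no TLS
            error = None
        except OSError as exc:
            error = str(exc)
        results.append((sockaddr[0], time.monotonic() - t1, error))
    return dns_seconds, results


# Demo against a local listener; replace with your own host and port.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]

dns_seconds, results = time_dns_and_connect("127.0.0.1", port)
listener.close()

print(f"dns: {dns_seconds:.3f}s")
for ip, seconds, error in results:
    print(ip, f"{seconds:.3f}s", error or "ok")
```

Against a real site you would pass your domain name and port 443. One address that consistently takes many seconds or fails while the others connect quickly would explain a delay that only appears on some page loads.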
You should remove any references to files that do not exist. I had the same issue at a customer where an image returning 404 caused the problem, as it delayed the loading of other files.

When a browser says that an http request is aborted what has actually happened?

On some occasions an HTTP request appears to be aborted by the browser. In Firebug or a similar tool, the status column, where it might normally say, for example, 200 OK, says "aborted" (in red). When this occurs in Internet Explorer the user may see an IE-generated message "Internet Explorer cannot display this page".
What has happened here?
I don't think it is a timeout issue as this occurs in quite a short time frame and I believe that I can get a successful response (e.g. a 200) when the response takes longer.
And it isn't to do with the server; the request is aborted by the browser. It isn't that we have had a server error back. (E.g. 500).
Also; the same request (to the same URL with the same method) usually works. So it isn't something to do say with SSL being misconfigured.
I am assuming that this is something to do with internet connectivity. But I don't know enough about networking / the internet to know what that really means.
So. The specific question is; what cases could cause this error?
This can happen when the browser is using an outdated SSL/TLS version and requests a resource that requires a secure connection.
The server, your browser or any machine (or operating system) in between can drop the underlying TCP connection for any reason (timeouts, digging machines, intrusion detection).
You won't get a server error in those situations, because either the server didn't receive your request, or it received it but took too long to process it, or it sent its (proper) response but the response wasn't fully transmitted.
This can happen when a POST is fired during a GET (for example, during the download of an image), or when an image tag has no src attribute.

Different server & browser HTTP status codes

I have a small Python web application running on nginx with unicorn. The web application refreshes its page automatically every minute.
Every day, around the same hour, the browser reports a 504 Gateway Time-out error and, obviously, the application stops refreshing.
I checked it with both Chrome and Firefox on two different client machines and two different server machines, and found that it happens almost every day at the same time (a different time for each web server).
The weird thing is that looking at the web server access log I identify these calls and they are reported with 200 OK status code.
Could it be that the browser reports a different error code than the server due to connection issues? Any ideas how I should keep investigating it?
We found out that our server did indeed have a maintenance procedure which blocked access to it. Although it finished the request after a while, the browser "gave up" and returned a timeout error. Once the maintenance procedure was canceled, the issue was resolved.
Yes, the server is able to serve the page OK, so it returns 200, but the client cannot finish the connection.
It could be that a part of your infrastructure (a firewall?) is choosing to update or something, although the odds of this happening at the exact same time as your request are slim unless it's a long-running request or a gateway outage.

fsc.exe is very slow because it tries to access crl.microsoft.com

When I run the F# compiler, fsc.exe, on our build server, it takes ages (~20 sec) to run even when there are no input files. After some investigation I found out that this is because the application tries to access crl.microsoft.com (probably to check whether some certificates have been revoked). However, the account under which it runs doesn't have access to the Internet. And because our routers/firewalls/whatever just drop the SYN packets, fsc.exe tries several times before giving up.
The only solution that comes to mind is to point crl.microsoft.com to 127.0.0.1 in the hosts file, but that's a pretty nasty solution. Moreover, I'll need fsc.exe on our production box, where I can't do such things. Any other ideas?
Thanks
I've come across this myself. Here are some links to better descriptions and some alternatives:
http://www.eggheadcafe.com/software/aspnet/29381925/code-signing-performance-problems-with-certificate-revocation-chec.aspx
I dug this up from an old MS KB for Exchange when we hit it. We just got the DNS server to reply as stated (this might be the solution for your production box).
MS Support KB
"The CRL check is timing out because it never receives a response. If a router were to send a “no route to host” ICMP packet or similar error instead of just dropping the packets, the CRL check would fail right away, and the service would start. You can add an entry to crl.microsoft.com in the hosts file or on the DNS server and send the packets to a legitimate location on the network, such as 127.0.0.1, which will reject the connection..."
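The difference the KB describes (a rejected connection fails right away, while dropped packets have to time out) can be checked with a short probe. This is only a sketch: the function name is made up, and the demo assumes nothing is listening on a loopback port that was free a moment earlier.

```python
import socket
import time


def time_connect_attempt(host, port, timeout=5.0):
    """Try one TCP connection and return (outcome, elapsed_seconds)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            outcome = "connected"
    except ConnectionRefusedError:
        outcome = "refused"      # peer answered with a reset: fails fast
    except socket.timeout:
        outcome = "timed out"    # no answer at all, e.g. SYN packets dropped
    except OSError as exc:
        outcome = f"failed: {exc}"
    return outcome, time.monotonic() - start


# Find a loopback port with (presumably) no listener on it.
probe = socket.socket()
probe.bind(("127.0.0.1", 0))
free_port = probe.getsockname()[1]
probe.close()

outcome, elapsed = time_connect_attempt("127.0.0.1", free_port)
print(outcome, f"{elapsed:.3f}s")
```

Running the same probe from the build account against crl.microsoft.com on port 80 should show the other case: if the firewall silently drops the SYN packets, the attempt only returns after the full timeout.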
