Problem
In our webapp (Angular + JAX-RS REST backend running on WebLogic + IIS reverse proxy) we have one REST endpoint which returns an XLSX file as a download (application/octet-stream). These XLSX files can be huge (up to the XLSX limit of 1M rows).
After some time, on slow connections, the download fails (ERR_CONNECTION_RESET in Chrome devtools).
The exact time when this happens varies:
Some days after 4-6 minutes, other days after 10-12 minutes. No clear pattern.
Fast(er) downloads work fine and are always successful. I have seen downloads of hundreds of MBs finish successfully in (e.g.) 8 minutes, but others fail at (e.g.) 11 minutes.
The problem is that I do not understand why the download fails or why the connection is reset. Any pointers or tips on how to test and debug this problem are welcome.
As far as I understand, ERR_CONNECTION_RESET just means that something reset the connection. Looking at the response headers gives no indication of who reset it.
Question
How can I understand why the download fails and who resets the connection?
The logfiles do not state which component resets the connection.
Setup
The webapp is deployed on WebLogic 12.2 on the internal network.
IIS 8.5 acts as a reverse proxy making the webapp accessible on the internet.
Details
When I download without IIS as reverse proxy (from our internal network, but with a speed limit in Chrome devtools), the download is always successful. I've had downloads at 50 kb/s which finished fine in 2 hours.
We cannot find any setting in IIS which influences this behaviour, and since the precise time varies, I am hesitant to conclude definitively that IIS causes the connection reset.
The WebLogic (exception) logs state that writing to the OutputStream fails because of a closed connection. There are no exceptions or log entries indicating that WebLogic itself closed the connection.
Using other download speeds makes no difference; there is no direct relation between speed and the time of the connection reset.
The download is never stalled.
VPN connection does not seem a factor, people with and without VPN experience the same problem.
Changing the proxy is unfortunately not an immediate option; this is a large corporate environment, and without understanding and knowing precisely whether IIS is the problem, that is not going to happen.
WebLogic exception
Caused by: java.net.SocketException: Socket closed
at weblogic.socket.NIOOutputStream.convertToSocketException(NIOOutputStream.java:250) ~[com.oracle.weblogic.server.muxers.jar:12.2.1.4]
at weblogic.socket.NIOOutputStream.access$600(NIOOutputStream.java:33) ~[com.oracle.weblogic.server.muxers.jar:12.2.1.4]
at weblogic.socket.NIOOutputStream$BlockingWriter.flush(NIOOutputStream.java:482) ~[com.oracle.weblogic.server.muxers.jar:12.2.1.4]
at weblogic.socket.NIOOutputStream$BlockingWriter.write(NIOOutputStream.java:334) ~[com.oracle.weblogic.server.muxers.jar:12.2.1.4]
at weblogic.socket.NIOOutputStream.write(NIOOutputStream.java:220) ~[com.oracle.weblogic.server.muxers.jar:12.2.1.4]
at weblogic.socket.JSSEFilterImpl.writeToNetwork(JSSEFilterImpl.java:829) ~[com.oracle.weblogic.server.muxers.jar:12.2.1.4]
at weblogic.socket.JSSEFilterImpl.wrapAndWrite(JSSEFilterImpl.java:789) ~[com.oracle.weblogic.server.muxers.jar:12.2.1.4]
at weblogic.socket.JSSEFilterImpl.write(JSSEFilterImpl.java:503) ~[com.oracle.weblogic.server.muxers.jar:12.2.1.4]
at weblogic.socket.JSSESocket$JSSEOutputStream.write(JSSESocket.java:154) ~[com.oracle.weblogic.server.muxers.jar:12.2.1.4]
at weblogic.servlet.internal.ChunkOutput.writeChunkTransfer(ChunkOutput.java:628) ~[com.oracle.weblogic.servlet.jar:12.2.1.4]
at weblogic.servlet.internal.ChunkOutput.writeChunks(ChunkOutput.java:590) ~[com.oracle.weblogic.servlet.jar:12.2.1.4]
at weblogic.servlet.internal.ChunkOutput.flush(ChunkOutput.java:474) ~[com.oracle.weblogic.servlet.jar:12.2.1.4]
at weblogic.servlet.internal.ChunkOutput$3.checkForFlush(ChunkOutput.java:760) ~[com.oracle.weblogic.servlet.jar:12.2.1.4]
at weblogic.servlet.internal.ChunkOutput.write(ChunkOutput.java:373) ~[com.oracle.weblogic.servlet.jar:12.2.1.4]
at weblogic.servlet.internal.ChunkOutputWrapper.write(ChunkOutputWrapper.java:165) ~[com.oracle.weblogic.servlet.jar:12.2.1.4]
at weblogic.servlet.internal.ServletOutputStreamImpl.write(ServletOutputStreamImpl.java:186) ~[com.oracle.weblogic.servlet.jar:12.2.1.4]
at org.glassfish.jersey.servlet.internal.ResponseWriter$NonCloseableOutputStreamWrapper.write(ResponseWriter.java:325) ~[org.glassfish.jersey.containers.jersey-container-servlet-core.jar:?]
at org.glassfish.jersey.message.internal.CommittingOutputStream.write(CommittingOutputStream.java:229) ~[org.glassfish.jersey.core.jersey-common.jar:?]
at org.glassfish.jersey.message.internal.WriterInterceptorExecutor$UnCloseableOutputStream.write(WriterInterceptorExecutor.java:299) ~[org.glassfish.jersey.core.jersey-common.jar:?]
at java.util.zip.DeflaterOutputStream.deflate(DeflaterOutputStream.java:253) ~[?:1.8.0_261]
at java.util.zip.DeflaterOutputStream.write(DeflaterOutputStream.java:211) ~[?:1.8.0_261]
at java.util.zip.ZipOutputStream.write(ZipOutputStream.java:331) ~[?:1.8.0_261]
at org.apache.poi.util.IOUtils.copy(IOUtils.java:317) ~[org.apache.poi-poi-3.17.jar:3.17]
at org.apache.poi.xssf.streaming.SXSSFWorkbook.copyStreamAndInjectWorksheet(SXSSFWorkbook.java:501) ~[org.apache.poi-poi-ooxml-3.17.jar:3.17]
at org.apache.poi.xssf.streaming.SXSSFWorkbook.injectData(SXSSFWorkbook.java:391) ~[org.apache.poi-poi-ooxml-3.17.jar:3.17]
at org.apache.poi.xssf.streaming.SXSSFWorkbook.write(SXSSFWorkbook.java:936) ~[org.apache.poi-poi-ooxml-3.17.jar:3.17]
...
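For context, the stack trace shows Jersey streaming a POI SXSSF workbook directly into the servlet output stream. Below is a minimal sketch of what such an endpoint typically looks like; the class name, path and cell contents are made up for illustration and are not the actual code:

import java.io.OutputStream;

import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.core.Response;
import javax.ws.rs.core.StreamingOutput;

import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;

@Path("/resources/export")
public class ExportResource {

    @GET
    @Path("/FOO")
    @Produces("application/octet-stream")
    public Response exportFoo() {
        StreamingOutput body = (OutputStream out) -> {
            // SXSSF keeps only a sliding window of rows in memory and spills the rest
            // to temp files, so the workbook is streamed straight to the client.
            try (SXSSFWorkbook workbook = new SXSSFWorkbook(100)) {
                Sheet sheet = workbook.createSheet("export");
                for (int i = 0; i < 1_000_000; i++) {
                    Row row = sheet.createRow(i);
                    row.createCell(0).setCellValue("row " + i);
                }
                // write() is where the stack trace above fails once the connection
                // has already been reset by an intermediary.
                workbook.write(out);
            }
        };
        return Response.ok(body)
                .header("Content-Disposition", "attachment; filename=\"export.xlsx\"")
                .build();
    }
}

The point of the sketch is only to show that the response is produced while it is being sent, so any reset by the client, the VPN, IIS or another intermediary surfaces server-side as the SocketException above.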
IIS logs
The only line I could find relevant to this problem is:
1.2.3.4, -, 9/15/2020, 9:20:14, W3SVC3, HSTWEB, 2.3.4.5, 561236, 1813, 9658662, 500, 0, GET, /api/resources/export/FOO, sorting=1,
Related
I get a Timer_ConnectionIdle message in the error logs of the HTTPERR folder in system32/LogFiles.
Sometimes the web page also returns Service Unavailable or Connection Refused.
What is the problem?
How can I solve that?
These are two different issues that you are talking about.
Timer_ConnectionIdle is not something you need to be worried about. It is HTTP.SYS's way of telling you that a client it established a connection with did not disconnect, because there is always a chance the client will want to use the connection again. HTTP.SYS usually waits about 2 minutes before terminating such an idle connection, and that is when you get this message in the HTTPERR logs.
Now, coming to the Service Unavailable and connection timeout errors: this is something that you need to take note of. Check the event logs from the time of the issue and see if you find anything there.
If you are unable to find anything in the event logs, my next question would be to identify what is done in order to overcome the issue. Do you recycle the application pool to get the application up and running? Do you reset IIS? If you do any of the above, then please capture a full user dump of the w3wp process using Debug Diag at the time of the issue (before performing an iisreset or application pool recycle). Analyzing the dump will tell you exactly what's going wrong.
Feel free to follow up with any questions you have.
This question already has answers here: Does asp.net lifecycle continue if I close the browser in the middle of processing? (3 answers)
Closed 6 years ago.
Context: We run our ASP.NET site behind an AWS load balancer which times requests out after 60 seconds. The first ever request to our site may take longer than this while caches warm up, etc. There are obvious improvements we could make to this strategy, but that's beside the point for this question.
Assuming that the connection to our IIS instance is closed after 60 seconds, what happens to the execution of that request, in terms of my code being run?
Does it
continue even though there's no one to send the eventual response to?
kill the process - possibly during some disk IO operation?
run for the server's configured timeout value?
do something smarter?
This actually depends on your code.
If you have a loop or other work that takes too long, it keeps running; if it then tries to send some data over the connection and finds that the connection is closed, you get an exception. In other words, if no data is sent to the client (so nothing ever detects that the connection is lost), your code will keep running to the end.
Whether and when a runaway loop is shut down depends on the application pool configuration. There you can set the maximum running time, i.e. how long the pool waits before it kills your non-responsive process.
HttpResponse.IsClientConnected tells you whether the client is still connected; you can use it to check for a disconnect and abort a long-running process.
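There is no direct Servlet API equivalent of IsClientConnected. A rough Java analog of the same idea (a hypothetical helper, not code from either question) is to flush periodically and treat the resulting IOException as the client having disconnected:

import java.io.IOException;
import java.io.OutputStream;

import javax.servlet.http.HttpServletResponse;

public final class ClientAwareWriter {

    // Writes chunks to the response and stops as soon as the client is gone.
    // The Servlet API has no IsClientConnected, so we flush periodically and
    // treat the resulting IOException as "the client disconnected".
    public static void writeChunks(HttpServletResponse response, Iterable<byte[]> chunks) {
        try {
            OutputStream out = response.getOutputStream();
            for (byte[] chunk : chunks) {
                out.write(chunk);
                out.flush(); // a reset connection surfaces here as an IOException
            }
        } catch (IOException clientGone) {
            // Abort the long-running work instead of producing data nobody will read.
        }
    }
}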
Using QuickFixN, if I restart my trading application I am occasionally unable to log on, getting an "An existing connection was forcibly closed by the remote host" error.
The QuickFix engine retries the connection every 30 seconds, but always gets the same error.
If I close my application and re-open, it will connect correctly.
Speaking to my broker, it seems that they are rejecting my logins because they did not recognize my connection as having been closed the first time. The second time around, forcing the application to close tears down the TCP connection, which is why logins work the third time.
So my question is: is there a way to close and re-open the TCP connection without restarting the application?
Sounds like the problem is kinda on their end. Since the problem happens when you don't formally log out (e.g. a crash or abnormal termination), that means their implementation apparently doesn't recognize the TCP termination.
At a higher-than-TCP layer, their FIX engine should somewhat compensate. If a few heartbeat durations occur after your disconnect, their implementation should realize you're not there anymore, since you're not responding to heartbeats.
So, neither their low-layer TCP handlers nor their FIX engine are able to set the right flag somewhere in their system that says you've gone offline. That's weird. I don't see what you can do about that, aside from intentionally doing a startup/shutdown to kludge their state flag for you.
I'm usually really hesitant to blame the other side (especially because I run the QF/n project), but that's where I'm at with the information provided.
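That said, the intentional startup/shutdown mentioned above does not have to mean restarting the whole application: stopping and restarting the engine's initiator tears the TCP socket down and re-establishes it in-process. A hedged sketch using the Java engine (QuickFIX/J) purely as an illustration; QuickFIX/n's initiator exposes a similar Stop()/Start() pair:

import quickfix.Application;
import quickfix.ConfigError;
import quickfix.DefaultMessageFactory;
import quickfix.FileStoreFactory;
import quickfix.SessionSettings;
import quickfix.SocketInitiator;

public final class FixReconnector {

    private final SocketInitiator initiator;

    public FixReconnector(Application app, SessionSettings settings) throws ConfigError {
        this.initiator = new SocketInitiator(
                app, new FileStoreFactory(settings), settings, new DefaultMessageFactory());
    }

    public void connect() throws ConfigError {
        initiator.start();  // opens the TCP connection and attempts logon
    }

    // Tears down the TCP socket and re-establishes it without restarting the process.
    public void bounce() throws ConfigError {
        initiator.stop();   // logs out and closes the socket, so the counterparty sees a clean disconnect
        initiator.start();  // fresh TCP connection and a new logon attempt
    }
}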
I created a web application; here is the architecture:
Tomcat7 deployed on Amazon EC2
Granite DS
nginx redirecting HTTPS through to the Tomcat7 port 8181
Flex application that uses RemoteObject on a secure Channel.
Occasionally, maybe when a request takes too long, the execution of a RemoteObject call in Flex triggers this error:
faultCode:Channel.Call.Failed faultString:'error' faultDetail:'NetConnection.Call.Failed: HTTP: Status 504'
But most of the time, the response of the RemoteObject is correct.
Could you tell me if nginx could be blocking something? Or if BlazeDS has a timeout? Any clues?
Thank you very much
We've had this issue for a long time... the problem is we haven't been able to find a repeatable way to force it to disconnect.
Here is the most comprehensive list of things to try that I've been able to find:
http://www.bopit.in.th/2009/10/14/flex-channel-connect-failed-error-netconnection-call-failed-http-status-200/
We've tried a couple of those solutions and it seems like we're getting less client disconnects.
There also may be a problem with AVG's linkscanner hijacking the request as it leaves the browser, and then losing it somewhere. We had one machine in our shop that would disconnect when using IE, and since uninstalling AVG, it's never happened on that machine again.
Another thing you could check is the socket timeout:
NetConnection.Call.Failed happening sporadically in Flex3/Tomcat/BlazeDS/Spring
and here is a thread on adobe forum about the issue:
http://forums.adobe.com/thread/552133
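If the 504 is coming from nginx itself, the usual suspects are its upstream proxy timeouts, which default to 60 seconds. A hedged sketch of the directives involved; the location block, upstream address and values are only illustrative, not taken from the poster's configuration:

location / {
    proxy_pass https://127.0.0.1:8181;

    # nginx's proxy timeouts default to 60s; a RemoteObject call that takes
    # longer than this is answered with 504 Gateway Time-out.
    proxy_connect_timeout 60s;
    proxy_send_timeout    300s;
    proxy_read_timeout    300s;
}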
Any thoughts on why I might be getting tons of "hangs" when trying to download a file via HTTP, based on the following?
Server is IIS 6
File being downloaded is a binary file, rather than a web page
Several clients hang, including the TrueUpdate and FlexNet web-updating packages, as well as a custom .NET app that just does basic HttpWebRequest/HttpWebResponse logic and downloads using a response stream
IIS log file signature when success is 200 0 0 (sc-status sc-substatus sc-win32-status)
For failures, the signature is 200 0 64
sc-win32-status of 64 is "the specified network name is no longer available"
I can point Firefox at the URL and download successfully every time (perhaps some retry logic is happening under the hood)
At this point, it seems like either there's something funky with my server that it's throwing these errors, or that this is just normal network behavior and I need to use (or write) a client that is more resilient to the failures.
Any thoughts?
Perhaps your issue was a low-level networking issue with the ISP, as you speculated in your reply comment. I am experiencing a similar problem with IIS and some mysterious 200 0 64 lines appearing in the log file, which is how I found this post. For the record, this is my understanding of sc-win32-status=64; I hope someone can correct me if I'm wrong.
sc-win32-status 64 means "The specified network name is no longer available."
After IIS has sent the final response to the client, it waits for an ACK message from the client.
Sometimes clients will reset the connection instead of sending the final ACK back to the server. This is not a graceful connection close, so IIS logs the "64" code to indicate an interruption.
Many clients will reset the connection when they are done with it, to free up the socket instead of leaving it in TIME_WAIT/CLOSE_WAIT.
Proxies may have a tendency to do this more often than individual clients.
I've spent two weeks investigating this issue. For me I had the scenario in which intermittent random requests were being prematurely terminated. This was resulting in IIS logs with status code 200, but with a win32-status of 64.
Our infrastructure includes two Windows IIS servers behind two NetScaler load balancers in HA mode.
In my particular case, the problem was that the NetScaler had a feature called "Integrated Caching" turned on (http://support.citrix.com/proddocs/topic/ns-optimization-10-5-map/ns-IC-gen-wrapper-10-con.html).
After disabling this feature, the request interruptions ceased and the site operated normally. I'm not sure how or why this was causing a problem, but there it is.
If you use a proxy or a load balancer, do some investigation of what features they have turned on. For me the cause was something between the client and the server interrupting the requests.
I hope that this explanation will at least save someone else's time.
Check the headers from the server, especially Content-Type and Content-Length. It's possible that your clients don't recognize the format of the binary file and hang while waiting for bytes that never come, or that they close the underlying TCP connection, which may cause IIS to log the win32 status 64.
Spent three days on this.
It was a timeout that was set to 4 seconds (PHP curl request).
The solution was to increase the timeout setting:
//curl_setopt($ch, CURLOPT_TIMEOUT, 4); // times out after 4s
curl_setopt($ch, CURLOPT_TIMEOUT, 60); // times out after 60s
You will have to use Wireshark or Network Monitor to gather more data on this problem, I think.
I suggest you put Fiddler in between your server and your download client. This should reveal the differences between Firefox and the other clients.
Description of all sc-win32-status codes for reference
https://learn.microsoft.com/en-us/windows/win32/debug/system-error-codes--0-499-
ERROR_NETNAME_DELETED
64 (0x40)
The specified network name is no longer available.