network error (Tcp error) - tcp

I am inside a network where I need proxy settings to access the internet.
I have a weird problem.
The internet is working fine.
But it is one particular instance when i get this error:
Network Error (tcp_error)
A communication error occurred: "Operation timed out"
The Web Server may be down, too busy, or experiencing other problems preventing it from responding to requests. You may wish to try again at a later time.
For assistance, contact your network support team.
This happens when I use hadoop in local mode.
I can access the UI interface. I can see the jobs running. but when I try to see the logs of each task.. i am not able to access those logs.
UI--> job-->map--> task--> all <-- this is where the error is..
Any clues?
THanks

Not sure about exactly what your tcp action is, or about Hadoop or your proxy setup, but if you can reliably repeat the error, and the timeout error happens at approximately the same time each time you test, and that time is on the order of minutes, my guess would be that you've got a true processing delay (perhaps caused by blocking somewhere) at the server, but not necessarily.

Related

Continuously running send pipeline instance

An instance of a BizTalk send pipeline has started to run continuously. On 09/12/2021 an attempt was made to send a file via SFTP, which retried several times but ultimately failed due to a network issue. The error from the event logs is:
The adapter failed to transmit message going to send port "Deliver Outgoing - SFTP" with URL "sftp://xxx.xxxxxx.co.nz:22/To_****/%SourceFileName%". It will be retransmitted after the retry interval specified for this Send Port. Details:"WinSCP.SessionRemoteException: Network error: Software caused connection abort.
For some reason BizTalk made another send attempt at 1:49pm on 10/12/2021 which succeeded as confirmed by the administrator of the SFTP site. Despite this, BizTalk continued making intermittent send attempts and the pipeline instance is still running. The same file has been sent 4 times to the SFTP server.
The pipeline instance in theory should have suspended at 9:47pm on 09/12/2021. I have been able to confirm definitively whether anybody resumed it, but it seems unlikely at this stage. In any case, after sending successfully the pipeline instance should have terminated and should not be re-executing intermittently.
Does anybody know what could account for this behaviour? This is occurring on BTS2020 with CU2 applied.
I've sent messages over SFTP where the WinSCP interpretation of the date-modified attribute doesn't work with a specific type of SFTP server.
With the WinSCP GUI a dialogue box appears and you can disregard this error, but this option isn't available with BizTalk's GUI. This error appears when a file with the same filename already exists on the server and is supposed to be overwritten.
My solution was to create a pipeline component that removed %SourceFileName% on the server. The pipeline component (just like WinSCP GUI) can disregard the modified-date.

Timer_ConnectionIdle IIS

I get Timer_ConnectionIdle message from error logs of httperror folder in system32/logfiles.
And sometimes the web page return service unavailable or connection refused.
What is the problem?
How can I solve that?
Two Different issues that you are talking about.
Timer connection idle is not something you need to be worried about. It is HTTP.SYS's way of telling you that the client with which it established a connection did not disconnect because there is always a chance that the client would want to establish the connection again. I think it usually waits for 2 minutes before terminating the connection and that is when you get this message in the HTTPERR logs.
Now coming to Service Unavailable and Connection Timeout errors, this is something that you need to take note of. Check for event logs during the time of issue and see if you find anything there.
If you are unable to find anything in the event logs, my next question would be to identify what is done in order to overcome the issue ? Do you recycle the application pool to get the application up and running ? Do you reset IIS ? If you do any of the above, then please capture a full user dump of w3wp process using debug diag during the time of issue (before performing an iisreset or application pool recycle). Analyzing the dump will tell you exactly whats going wrong.
Feel free to follow up with any questions you have.

Sandbox error after multiple flex clients from one http session

Dealing with a Flex client that needs to connect through BlazeDS to a Java backend. Currently, we have a requirement for fifty-plus clients to be attached at any given time. We need to test the load of this requirement against the server to see if we are going to have any performance issues.
So, I have written a client emulator that will act like real clients and connect to the server though Blazeds. The emulator was to make this test require less hardware for I would connect 50 client from one machine (or two 25, basically minimize hardware needs). Instead of having fifty different machines to run clients on. The issue I am running into is a limitation of emulated clients allowed from one session. There seems to be a five client limit. This goes for both IExplorer and FireFox browsers. The problem is with the JMS subscriptions. The JMS topic seem to get connected but never subscribed.
I played around with some settings on the BlazeDS server side to no avail.
- max-streaming-connections-per-session
- max-streaming-clients
After the sixth connection I start getting a
16:18:21.578 [ERROR] com.ray.sv.flex.util.SocketLogTarget SocketLogTarget failed with SecurityError: [SecurityErrorEvent type="securityError" bubbles=false cancelable=false eventPhase=2 text="Error #2048: Security sandbox violation: http://xxx.xxx.xxx.xxx:8080/ClientEmulator/clientemulator.swf cannot load data from 127.0.0.1:1337."]
Very strange, but what is stranger is that client needs to recover from server disconnects. Which was a totally different issue all together with the remote object calls and subscribing timing issues, kept getting Duplicate Session Id errors. But now I have that ironed out, but I see the same issue when the client fails a connection five times.
The sixth time always fails with the same error. And this is with only one client connected.
Has anyone seen this same issue?
Thanks for your time.

how to close and reopen TCP connection in QuickFix

Using QuickFixN, if I restart my trading application I occasionally am unable to logon, getting a "an existing connection was forcibly closed by the remote host" error.
The QuickFix engine retries to connect every 30secs, but always gets the same error.
If I close my application and re-open, it will connect correctly.
Speaking to my broker, it seems that they are rejecting my logins because they did not recognize my connection as being closed first time. 2nd time around, me forcing the application to close will tear-down the TCP connection, meaning that 3rd time logins work.
So my question is: is there a way to close and re-open the TCP connection without restarting the application?
Sounds like the problem is kinda on their end. Since the problem happens when you don't formally log out (e.g crash or abnormal termination), that means that their implementation apparently doesn't recognize the TCP termination.
At a higher-than-TCP layer, their FIX engine should somewhat compensate. If a few heartbeat durations occur after your disconnect, their implementation should realize you're not there anymore, since you're not responding to heartbeats.
So, neither their low-layer TCP handlers nor their FIX engine are able to set the right flag somewhere in their system that says you've gone offline. That's weird. I don't see what you can do about that, aside from intentionally doing a startup/shutdown to kludge their state flag for you.
I'm usually really hesitant to blame the other side (especially because I run the QF/n project), but that's where I'm at with the information provided.

TcpListener stops accepting or accepts broken connections

We currently experience a problem with a self-written server application running on Windows (occurs on different versions). The server listens at a TCP port, accepts connections, exchanges some data and then closes the connections again. There are about 100 clients that connect from time to time.
Sometimes the server stops to work: Log files show that connections are still accepted, but that at the first read attempt a socket error (10054 - Connection reset by peer) occurs. I don't think it is a client issue because it suddenly stops working for all clients.
Now we found out, that the same problem occurs with our old server software, that is even written in another programming language. So it doesn't seem to be an error in our program - I think it has to be some kind of OS / firewall issue? Of course, firewalls have been deactivated, which didn't solve the issue yet.
Any ideas where to look into? Wireshark logs will follow soon..
Excerpt from the log (Timestamp, Thread Id, message)
11:37:56.137 T#3960 Connection from 10.21.13.3
11:37:56.138 T#3960 Client Exception: Socket Error # 10054
Connection reset by peer.
11:37:56.138 T#3960 ClientDisconnected
11:38:00.294 T#4144 Connection from 10.21.13.3
You can see that the exception occurs almost at the same time as the connection is accepted, in this case the client reconnects after a few seconds.
A "stateful" firewall or NAT keeps track of connections, and ought to send RSTs for connectiosn it doesn't know about. If the firewall loses track of connections for some reason, then you'll probably see random connections being reset.
Our router at work does this — it forgets about connections when the PPP connection dies, which is remarkably unhelpful when it rains and the DSL restart takes a bit too long. However, instead of resetting connections, it just drops packets (even more unhelpful!).
Sounds like a firewall or routing issue - maybe stale connections get disconnected after a timeout period. Are you using a ping/keepalive inside your protocol.
Otherwise you may ask Wireshark to see what is going on.
First, thanks for many hints - I'm afraid the problem was a completely different one which you couldn't possibly solve by reading my question.
The server application uses log4net, configured with a log file an ImmediateFlush = true. If every log statement is directly written into the file and multiple socket connections occur this slows down the whole application.
The server needed about a minute to really accept the connection. This was far more than the timeout on clientside. So in the log there was only shown "accepted" followed by "disconnected" - even the log was delayed!
Sorry for the inconvenience...
Have you tried changing the backlog and then see how much time or how many clients are served before this problem occurs
You don't say what Windows versions you're using for the server, but you should be aware that the Windows TCP/IP stack behaves differently in server and client OSes. There are limits on how many simultaneous incoming connections a client OS will allow, and they are significantly lower than you might expect.
What do the logs look like from the client side?
Since the error is stating that the client is dropping the connection; if you see the same error on the client side then it is a firewall or proxy that is dropping the connection (both side seeing the opposite side dropping the connection is indicative of a proxy/firewall).
If the error is not present on the client side; then I would say that your client side is where you will see the actual error.

Resources