Calling send() on a TCP socket that has already been dropped by the client causes what appears to be a memory access violation: when I run a server application I made and bombard it with requests from a browser, it crashes after serving roughly 7 to 11 requests. Specifically, it accepts the connections, sits for up to 10 seconds or so, and then Windows throws up the "This program has stopped working..." message. No such crash happens if I remove the send() calls, which leads me to believe that Microsoft's send() does not safely handle a socket being closed from the other end.
I am aware there are various ways to check whether the socket has in fact been closed, but I don't want to check then send, because there's still a chance a client could cut out between checking and sending.
Edit: I noticed "close() socket directly after send(): unsafe?" in the "Similar Questions" box, and although it doesn't quite fit my situation, I am now wondering if calling close() quickly after send() could be contributing to the problem.
If this is the case, a check-then-close approach would work, since it does not suffer from the race described above. However, I don't know how to check whether calling closesocket() would be safe.
Edit: I would also be fine with a way to detect that send() has in fact broken and prevent the entire application from crashing.
Edit: I thought I'd finally update this question, considering I figured out the issue a while ago and there may be curious people stumbling across this. As it turns out, the issue had nothing to do with the send function or anything else related to sockets. In fact, the problem was something incredibly stupid I was doing: calling free on invalid pointers (NULL and junk-data addresses alike). A couple of years ago I had finally updated my compiler from a very outdated version I was originally using, and I suppose the very outdated standard library implementation was what allowed me to get away with such a cringe-worthy practice, and it seems that what I saw as an issue with send was a side-effect of that.
I have been programming in WinSock for over a decade, and have never seen or heard of send() throwing an exception on failure (in fact, no Win32 API function throws an exception on failure). If the socket is closed, an appropriate error code is reported. So something else is going on in your code. Maybe the pointer you pass to the buf parameter is not pointing at a valid memory block, or maybe the value you pass to the len parameter is beyond the bounds of buf.
Like @RemyLebeau, I have been programming in WinSock for over a decade, in my case well over two decades, and I have never seen this either.
Microsoft's send() handles sending on a connection that has already been closed by the other end by returning SOCKET_ERROR (-1), with WSAGetLastError() reporting WSAECONNRESET. The exception is a connection lost abnormally (network failure, etc.): in that case WinSock does not know the connection is gone, and send() happily keeps buffering outbound data until the socket's send buffer fills up or the socket times out internally, at which point failures are reported.
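In code, that check looks roughly like the sketch below. This is only an illustration of checking send()'s return value as described above; the helper name and the cleanup policy are mine, not part of any API.

```c
#include <winsock2.h>
#pragma comment(lib, "ws2_32.lib")

/* Send a buffer and report whether the peer has gone away.
   Returns 0 on success, -1 if the connection was reset/aborted,
   and the WinSock error code for any other failure. */
static int send_checked(SOCKET sock, const char *buf, int len)
{
    int sent = send(sock, buf, len, 0);
    if (sent != SOCKET_ERROR)
        return 0;

    int err = WSAGetLastError();
    if (err == WSAECONNRESET || err == WSAECONNABORTED) {
        closesocket(sock);   /* peer closed/reset the connection; just clean up */
        return -1;
    }
    return err;              /* some other failure (e.g. WSAEWOULDBLOCK) */
}
```

No exception, no access violation: a dead peer shows up purely as an error code.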
The send/close question you refer to contains nothing about memory access errors, and in any case calling close() after send() can't possibly cause the prior send() to misbehave, unless you have managed to get time running backwards.
You have a bug in your code somewhere.
I am using iOS but I am asking for networking in general. What does it mean to cancel a network request? Is there a message sent to the server or does the server acknowledge the socket being disconnected?
Since you mention using NSURLSessionTask to make the request, cancel() means urlSession(_:task:didCompleteWithError:) will be sent to the task's delegate, passing the error code NSURLErrorCancelled (-999) in NSURLErrorDomain.
It is possible that the cancellation only reaches the task after the request has already been fully processed. So it's up to you to act accordingly once NSURLErrorDomain reports NSURLErrorCancelled, which marks your intention to cancel; typically you will want to throw away any data received since the last request.
The server may well receive a complete request even though your client no longer accepts answers. Or the request sequence may be incomplete, in which case the server cannot correctly recognize what was intended and works through the request until it fails because of incomplete or badly formatted request data.
When your receiver callback is torn down due to canceling, you simply don't parse any answer from the server; if you could still parse server data, that would mean your task is still running. Any result arriving after cancel() should be treated as possibly incomplete, misleading, or invalid. This is why an NSURLErrorCancelled error is set in NSURLErrorDomain: you want to know the status before you assume any received data is of value to you.
By the way, NSURLErrorCancelled is also reported when NSURLSessionAuthChallengeCancelAuthenticationChallenge marks a server as untrusted. So it's actually the same procedure: you decide whether any received data is something you want to trust.
If a socket is disconnected, there is no connection at all: no data passing through, nothing to receive, nothing to request from. Errors cannot be exchanged in either direction; server and client are simply disconnected.
Canceling a request does not imply that the socket stops working.
It just means the data received since the last request is to be treated as invalid.
Why is this?
Because you can construct your own sockets, ignoring the ErrorDomain machinery, with a completely different request pattern.
It also means that if the client errors, crashes, or cancels, nothing is sent; you simply do not accept any answer as valid, even if it was delivered through the socket.
For these reasons there are protocols that define what a message should look like and what should happen when it is incomplete or needs validation, in a pattern that validates any data that was sent: TCP, UDP, the JavaScript WebSocket with its handshake and ongoing "dataflow", even OSC, and lots of other protocols.
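For comparison, here is how a server sees an actual disconnect at the raw socket level, independent of NSURLSession. This is a C sketch for illustration only; a canceled task may or may not produce either outcome, depending on whether the underlying connection is reused or torn down.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Sketch: distinguishing "client closed" from "client reset" when reading. */
static void read_from_client(int fd)
{
    char buf[4096];
    ssize_t n = recv(fd, buf, sizeof buf, 0);

    if (n == 0) {
        /* orderly close: the client sent a FIN, nothing more will arrive */
    } else if (n < 0 && errno == ECONNRESET) {
        /* abortive close: the client sent an RST */
    } else if (n > 0) {
        /* normal data: handle the request bytes */
    }
}
```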
In the Raft paper, there is a situation described by the figure.
entry2 may be committed after server1 restarts.
My question is:
If entry2 was requested by mistake, the client's request fails because server1 failed. The client may therefore believe that the mistaken operation was not applied to the state machine, when in fact it is applied after server1 restarts, as in figure (e).
With Raft, and any other transactional system based on unreliable communication, there is always the possibility that a client's request may return an "undefined" result if the network fails at just the wrong time.
This problem is inherent; see Two Generals' Problem.
Here "undefined" means that the client does not know whether or not the transaction was actually committed. The only way to tell is to open a new transaction and look and see.
In software this is often reported as a "retryable" exception.
A practical way to deal with this is to (a) always retry transactions when getting a retryable exception, and (b) ensure client transactions are always idempotent.
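As a sketch of (a) and (b) together, in C for illustration: submit_txn(), the RETRYABLE code, and the idempotency key are hypothetical placeholders for whatever your client library provides, not anything defined by Raft itself.

```c
#include <stdio.h>

/* Hypothetical client call: returns 0 on commit, RETRYABLE if the outcome
   is unknown (network failure, leader change), another negative value on a
   definite failure.  You supply the real implementation. */
#define RETRYABLE (-2)
int submit_txn(const char *idempotency_key, const char *payload);

/* Retry the same transaction with the same idempotency key until the
   cluster reports a definite outcome.  Because the key is reused, a retry
   of a request that actually committed is applied only once. */
int submit_with_retry(const char *key, const char *payload, int max_attempts)
{
    for (int attempt = 0; attempt < max_attempts; attempt++) {
        int rc = submit_txn(key, payload);
        if (rc != RETRYABLE)
            return rc;        /* committed, or definitely failed */
        fprintf(stderr, "outcome unknown, retrying (attempt %d)\n", attempt + 1);
    }
    return RETRYABLE;         /* still undefined after all retries */
}
```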
I created a simple persistent socket connection for our game using TcpClient and NetworkStream. There's no problem connecting, sending messages, and disconnecting normally (quitting app/server shuts down/etc).
However, I'm having some problems where, in certain cases, the client isn't detecting a disconnection from the server. The easiest way I have of testing this is to pull out the network cable on the wifi box, or set the phone to airplane mode, but it's happened in the middle of a game on what should otherwise be a stable wifi.
Going through the docs for NetworkStream etc., they say that the only way to detect a disconnection is to try to write to the socket. Fair enough, except that when I try, the write passes as if nothing is wrong. I can write multiple messages like this, and everything seems fine. It's only when I plug the cable back in that it sees that it's disconnected (are all the messages buffered?).
The TcpClient is set to NoDelay, and there's a Flush() called after every write anyway.
What I've tried:
Writing a message to the NetworkStream - no joy
CanWrite, Connected, etc all return true
TcpClient.Client.Poll( 1000, SelectMode.SelectWrite ); - returns true
TcpClient.Client.Poll( 1000, SelectMode.SelectRead ) && TcpClient.Client.Available == 0 - returns true
TcpClient.Client.Receive(buffer, SocketFlags.Peek) == 0 - when connected, blocks for about 10-20s, then returns true. When no server, blocks forever(?)
NetworkStream.Write() - doesn't throw an error
NetworkStream.BeginWrite() - doesn't throw an error (not even when calling EndWrite())
Setting a WriteTimeout - had no effect
Having a specific time where we haven't received a message from the server (normally there's a keep-alive) - I had this, but removed it, as we were getting a lot of false-positives due to lag etc (some clients would see between 10-20s of lag)
So am I doing something wrong here? Is there any way to get the NetworkStream to throw an error (like it should) when writing to a socket that should be disconnected?
I've no problem with a keep-alive (the default case is the server will notify the client that it hasn't received anything in a while, and the client will send a heartbeat), but at the minute, according to the NetworkStream everything's hunky-dory.
It's for a game, so ideally the detection should be quick enough (as the user can still move through the game until they need to make a server call, some of which will block the UI, so the game seems broken).
I'm using Unity, so it's .Net 2.0
is to pull out the network cable on the wifi box
That's a good test. If you do that the remote party is not notified. How could it possibly find out? It can't.
when I try, the write passes as if nothing is wrong
Writes can be (and are) buffered. They eventually enter a black hole... No reply comes back. The only way to detect this is a timeout.
So am I doing something wrong here?
You have tried a lot of things but fundamentally you cannot find out about disconnects if no reply comes back telling you that. Use a timeout.
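For completeness, this is roughly what a socket-level timeout looks like. The sketch is C with Linux option names, purely for illustration; the question is .NET, where you would do the equivalent through Socket options or an application-level keep-alive timer, and the interval values here are arbitrary.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/* Sketch: enable TCP keepalive with short probe intervals so a peer that
   has silently vanished is detected within roughly idle + cnt*intvl seconds. */
static int enable_keepalive(int fd)
{
    int on = 1, idle = 5, intvl = 2, cnt = 3;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) < 0)
        return -1;
    setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE,  &idle,  sizeof idle);   /* seconds idle before first probe */
    setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof intvl);  /* seconds between probes */
    setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT,   &cnt,   sizeof cnt);    /* probes before declaring dead */
    return 0;
}
```

Either way, the detection comes from a deadline expiring, not from the write itself failing.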
I'm using SignalR with Redis as a message bus on a server that sits behind an Nginx proxy for load balancing. I used SignalR's PersistentConnection class to write a simple chat program that broadcasts messages to users belonging to the same certain group. Users are added to a group in OnConnectedAsync, removed in OnDisconnectAsync, and the user-to-group mapping is deterministic.
Currently, the client side falls back to long polling for whatever reason (I'm not entirely sure why). Whenever the client sets up a new connection after waiting for and receiving a response, seemingly at random the server will sometimes respond to the new connection immediately with the previous response, despite there having been only one POST.
The message IDs tend to differ by exactly one (the smaller ID coming first), with the rest of the response remaining the same. I logged some debug info and am quite positive that my override of OnReceivedAsync is sending one response per request. I tried the same implementation without the Redis message bus and got the same problem. Running locally (with long polling), however, yielded good results, so I suspect the problem is with the way the message bus buffers messages to refresh clients that might not be caught up, combined with some weird timing in the cutting/setting up of connections with the Nginx load balancer, but beyond that I am very much at a loss.
Any help would be appreciated.
EDIT: Further investigation reveals that duplication occurs at somewhat regular intervals of approximately 20-30 seconds. I'm led to believe that the message expiration in the message bus might have something to do with the bug.
EDIT: Bug can be seen here: http://tinyurl.com/9q5t3va
The server is simply broadcasting a counter being sent by the client. You will notice some responses are duplicated every 20 or so.
Reducing the number of worker processes in the IIS (6.0) Server Manager from 2 to 1 solved the problem.
I have an application that consists of two processes (let's call them A and B), connected to each other through Unix domain sockets. Most of the time it works fine, but some users report the following behavior:
A sends a request to B. This works. A now starts reading the reply from B.
B sends a reply to A. The corresponding write() call returns an EPIPE error, and as a result B calls close() on the socket. However, A did not close() the socket, nor did it crash.
A's read() call returns 0, indicating end-of-file. A thinks that B prematurely closed the connection.
Users have also reported variations of this behavior, e.g.:
A sends a request to B. This works partially, but before the entire request is sent A's write() call returns EPIPE, and as a result A calls close() on the socket. However, B did not close() the socket, nor did it crash.
B reads a partial request and then suddenly gets an EOF.
The problem is I cannot reproduce this behavior locally at all. I've tried OS X and Linux. The users are on a variety of systems, mostly OS X and Linux.
Things that I've already tried and considered:
Double close() bugs (close() is called twice on the same file descriptor): probably not, as that would result in EBADF errors, but I haven't seen any.
Increasing the maximum file descriptor limit. One user reported that this worked for him, the rest reported that it did not.
What else can possibly cause behavior like this? I know for certain that neither A nor B close() the socket prematurely, and I know for certain that neither of them have crashed because both A and B were able to report the error. It is as if the kernel suddenly decided to pull the plug from the socket for some reason.
Perhaps you could try strace as described in: http://modperlbook.org/html/6-9-1-Detecting-Aborted-Connections.html
I assume that your problem is related to the one described here: http://blog.netherlabs.nl/articles/2009/01/18/the-ultimate-so_linger-page-or-why-is-my-tcp-not-reliable
Unfortunately I'm having a similar problem myself but couldn't manage to get it fixed with the given advice. However, perhaps that SO_LINGER approach will work for you.
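For reference, the pattern that article recommends boils down to something like the following sketch in C (graceful_close is just an illustrative name, and real code would bound the drain loop with a timeout):

```c
#include <sys/socket.h>
#include <unistd.h>

/* Sketch of the "lingering close": signal that we are done writing, then
   drain until the peer closes, so data we sent is not discarded by an
   early close(). */
static void graceful_close(int fd)
{
    char buf[256];

    shutdown(fd, SHUT_WR);                  /* send our FIN, keep reading */
    while (read(fd, buf, sizeof buf) > 0)   /* wait for the peer's EOF */
        ;                                   /* discard any trailing data */
    close(fd);
}
```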
shutdown() may have been called on one of the socket endpoints.
If either side may fork and execute a child process, ensure that the FD_CLOEXEC (close-on-exec) flag is set on the socket file descriptor if you did not intend for it to be inherited by the child. Otherwise the child process could (accidentally or otherwise) be manipulating your socket connection.
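Setting that flag looks something like this (the helper name is mine; on recent systems you can also pass SOCK_CLOEXEC to socket() directly):

```c
#include <fcntl.h>

/* Sketch: mark a socket descriptor close-on-exec so a forked/exec'd child
   cannot accidentally keep or interfere with the connection. */
static int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
}
```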
I would also check that there's no sneaky firewall in the middle. It's possible an intermediate forwarding node on the route sends an RST. The best way to track that down is of course a packet sniffer (or its GUI cousin).