Will the behavior "Committing entries from previous terms" in Raft cause unexpected results?

In the Raft paper, there is a situation described by the figure.
Entry2 may be committed after server1 restarts.
My question is:
If entry2 was requested by mistake, the client's request fails because of the failure of server1. Thus, the client may think the mistaken operation was not applied to the state machine, when in fact it is applied after server1 restarts, as in figure (e).

With Raft, as with any other transactional system built on unreliable communication, there is always the possibility that a client's request returns an "undefined" result if the network fails at just the wrong time.
This problem is inherent; see Two Generals' Problem.
Here "undefined" means that the client does not know whether or not the transaction was actually committed. The only way to tell is to open a new transaction and look and see.
In software this is often reported as a "retryable" exception.
A practical way to deal with this is to (a) always retry transactions when getting a retryable exception, and (b) ensure client transactions are always idempotent.
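A minimal Go sketch of that pattern, under stated assumptions: commit is a placeholder for the client call into the replicated state machine, ErrRetryable is a hypothetical sentinel for "outcome unknown", and the idempotency key is what lets the cluster recognise a retried request it may already have applied, so retrying is safe.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// ErrRetryable is a hypothetical sentinel meaning "the outcome is unknown, try again".
var ErrRetryable = errors.New("retryable: commit outcome unknown")

// commit stands in for sending (idempotencyKey, op) to the cluster leader and
// waiting for the reply. The key lets the cluster deduplicate a retry of a
// request it may already have applied.
func commit(idempotencyKey, op string) error {
	// ... real client call would go here ...
	return ErrRetryable // placeholder
}

// commitWithRetry retries only on retryable errors, with a simple backoff.
func commitWithRetry(idempotencyKey, op string, attempts int) error {
	var err error
	for i := 0; i < attempts; i++ {
		err = commit(idempotencyKey, op)
		if err == nil || !errors.Is(err, ErrRetryable) {
			return err // success, or a definite (non-retryable) failure
		}
		time.Sleep(time.Duration(i+1) * 100 * time.Millisecond)
	}
	return fmt.Errorf("gave up after %d attempts: %w", attempts, err)
}

func main() {
	fmt.Println(commitWithRetry("req-42", "x=1", 3))
}
```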

Related

handle server shutdown while serving http request

Scenario: the server is in the middle of processing an HTTP request when it shuts down. The code may have executed up to any of several points. How are such cases typically handled? A typical example: some downstream HTTP calls had to be made as part of handling the incoming HTTP request. How do you find out whether those calls were made or not when the shutdown occurred? I assume it's not possible to persist every action in the code flow. Suggestions and views are welcome.
There are two kinds of shutdowns to consider here.
There are graceful shutdowns: when the execution environment politely asks your process to stop (e.g. systemd sends a SIGTERM) and expects it to exit on its own. If your process doesn’t exit within a few seconds, the environment proceeds to kill the process in a more forceful way.
A typical way to handle a graceful shutdown is:
listen for the signal from the environment
when you receive the signal, stop accepting new requests...
...and then wait for all current requests to finish
Exactly how you do this depends on your platform/framework. For instance, Go’s standard net/http library provides a Server.Shutdown method.
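In Go, for example, that sequence looks roughly like this (a sketch only; the port and the 10-second grace period are arbitrary choices):

```go
package main

import (
	"context"
	"log"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	srv := &http.Server{Addr: ":8080"}

	// Serve requests in the background.
	go func() {
		if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
			log.Fatalf("listen: %v", err)
		}
	}()

	// Listen for the signal from the environment.
	stop := make(chan os.Signal, 1)
	signal.Notify(stop, syscall.SIGTERM, os.Interrupt)
	<-stop

	// Stop accepting new requests and wait (up to a deadline) for
	// in-flight requests to finish.
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	if err := srv.Shutdown(ctx); err != nil {
		log.Printf("forced shutdown: %v", err)
	}
}
```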
In a typical system, most shutdowns will be graceful. For example, when you need to restart your process to deploy a new version of code, you do a graceful shutdown.
There can also be unexpected shutdowns: e.g. when you suddenly lose power or network connectivity (a disconnected server is usually as good as a dead one). Such faults are harder to deal with. There’s an entire body of research dedicated to making distributed systems robust to arbitrary faults. In the simple case, when your server only writes to a single database, you can open a transaction at the beginning of a request and commit it before returning the response. This will guarantee that either all the changes are saved to the database or none of them are. But if you call multiple downstream services as part of one upstream HTTP request, you need to coordinate them, for example, with a saga.
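A sketch of the single-database case using Go's database/sql (the handler, table names, and SQL are hypothetical, and a real program would also register a driver): either every write made for the request commits, or none of them do.

```go
package server

import (
	"database/sql"
	"net/http"
)

// handleOrder wraps all of the request's writes in one transaction, so an
// unexpected shutdown mid-request leaves the database either fully updated
// or untouched. db is a *sql.DB opened elsewhere; the tables and the "?"
// placeholder style are hypothetical and driver-dependent.
func handleOrder(w http.ResponseWriter, r *http.Request, db *sql.DB) {
	tx, err := db.BeginTx(r.Context(), nil)
	if err != nil {
		http.Error(w, "internal error", http.StatusInternalServerError)
		return
	}
	defer tx.Rollback() // no-op if Commit succeeds

	if _, err := tx.Exec(`INSERT INTO orders (item) VALUES (?)`, "book"); err != nil {
		http.Error(w, "internal error", http.StatusInternalServerError)
		return
	}
	if _, err := tx.Exec(`UPDATE inventory SET stock = stock - 1 WHERE item = ?`, "book"); err != nil {
		http.Error(w, "internal error", http.StatusInternalServerError)
		return
	}

	if err := tx.Commit(); err != nil {
		http.Error(w, "internal error", http.StatusInternalServerError)
		return
	}
	w.WriteHeader(http.StatusOK)
}
```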
For some applications, it may be OK to ignore unexpected shutdowns and simply deal with any inconsistencies manually if/when they arise. This depends on your application.

Timeout vs no response from server, how can I separate these?

This question is regarding a bot of mine whose primary focus is scraping.
The path is mapped out correctly and it does what it needs to do.
Rate limits have been tested and I am certain they are not a factor; when we did hit them, we received actual responses.
However, the webpage(s) I am trying to scrape seem to have built in a kind of weird/unfamiliar security measure, something I haven't come across before. So I am wondering how it works and how to deal with it appropriately.
While the scraper/bot is doing its thing, sending requests and getting responses, at random times it will encounter what I suspect is a security measure: there are simply no responses back from the server, not a 4xx error or anything at all.
At first sight the proxies just appear dead, but that's not it, because they are not. The proxies work just fine, and I can browse the page through them manually with no issues.
The server just stops giving responses.
Now to find a workaround for this, I would need to be able to tell the difference between a timeout (for my proxies) and a no response. They appear the same, but are not.
Does anyone have insight into this problem? Maybe there is a clever way to separate the two that I am not aware of.
Now to find a workaround for this, I would need to be able to tell the difference between a timeout (for my proxies) and a no response. They appear the same, but are not.
A timeout occurs when the server does not respond within a specific time. No response means that the server either closes the connection before the timeout occurs, or closes the connection after the timeout has occurred without sending anything back.
The first case is easy to detect, because the connection is closed before the timeout. If instead you want to detect that the server will close the connection without a response only after your current timeout, your only option is to extend the timeout. Nothing from the server will indicate in advance that it is going to close the connection without a response at some future time.
And since your only connection is to the proxy, there is no real way to tell whether the problem is at the proxy or at the server. Your only hope might be to set the timeout you use while waiting for the proxy larger than the timeout the proxy uses while waiting for the server. That way you may get a response from the proxy indicating that its connection to the server timed out.
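A minimal Go sketch of what that looks like on a single connection (the target host and the 10-second deadline are placeholders): a deadline expiring surfaces as a timeout error, while the server closing the connection without sending anything surfaces as io.EOF (or a reset error, if the close was abortive).

```go
package main

import (
	"errors"
	"io"
	"log"
	"net"
	"time"
)

func main() {
	conn, err := net.Dial("tcp", "example.com:80") // placeholder target
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// Send a minimal request so the server has something to answer.
	if _, err := conn.Write([]byte("GET / HTTP/1.0\r\nHost: example.com\r\n\r\n")); err != nil {
		log.Fatal(err)
	}

	conn.SetReadDeadline(time.Now().Add(10 * time.Second))

	buf := make([]byte, 4096)
	_, err = conn.Read(buf)

	var nerr net.Error
	switch {
	case err == nil:
		log.Println("got data")
	case errors.As(err, &nerr) && nerr.Timeout():
		log.Println("read timed out: no data and no close within the deadline")
	case errors.Is(err, io.EOF):
		log.Println("server closed the connection without sending anything")
	default:
		log.Printf("other error (e.g. connection reset): %v", err)
	}
}
```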
They appear the same, but are not.
They are the same. There is no difference. A read timeout means that data didn't arrive within the timeout period. For whatever reason. TCP doesn't know, and can't tell you. At the C level, recv() returned -1 with errno == EAGAIN/EWOULDBLOCK. That's all the information there is.
What you are asking is tantamount to 'data didn't arrive: where didn't it arrive from?' It's not a meaningful question.

winsock2: send() fails if socket is dead [not really]

Calling send() on a TCP socket which has already been dropped by the client causes what appears to be a memory access violation: when I run a server application I made and bombard it with requests from a browser, it crashes after serving between about 7 and 11 requests. Specifically, it accepts the connections, sits for up to 10 seconds or so, and then Windows throws up the "This program has stopped working..." message. No such crash happens if I remove the send() calls, leading me to believe that Microsoft's send() does not safely handle a socket being closed from the other end.
I am aware there are various ways to check whether the socket has in fact been closed, but I don't want to check then send, because there's still a chance a client could cut out between checking and sending.
Edit: I noticed close() socket directly after send(): unsafe? in the "Similar Questions" box, and although it doesn't quite fit my situation, I am now wondering if calling close() quickly after send() could be contributing to the problem.
If this is the case, a solution involving checking then closing would work as it does not have the implication stated above. However, I am unaware of how to check whether closesocket() would be safe.
Edit: I would also be fine with a way to detect that send() has in fact broken and prevent the entire application from crashing.
Edit: I thought I'd finally update this question, considering I figured out the issue a while ago and there may be curious people stumbling across this. As it turns out, the issue had nothing to do with the send function or anything else related to sockets. In fact, the problem was something incredibly stupid I was doing: calling free on invalid pointers (NULL and junk-data addresses alike). A couple of years ago I had finally updated my compiler from a very outdated version I was originally using, and I suppose the very outdated standard library implementation was what allowed me to get away with such a cringe-worthy practice, and it seems that what I saw as an issue with send was a side-effect of that.
I have been programming in WinSock for over a decade, and have never seen or heard of send() throwing an exception on failure (in fact, no Win32 API function throws an exception on failure). If the socket is closed, an appropriate error code is reported. So something else is going on in your code. Maybe the pointer you pass to the buf parameter is not pointing at a valid memory block, or maybe the value you pass to the len parameter is beyond the bounds of buf.
Like #RemyLebeau, I have been programming in Winsock for over a decade, in my case well over two decades, and I have never seen this either.
Microsoft's send() handles sending to a connection that has already been closed by the other end by returning SOCKET_ERROR (-1), with WSAGetLastError() returning WSAECONNRESET. The exception is a connection lost abnormally (network failure, etc.), in which case WinSock does not know the connection is gone, and send() happily keeps buffering outbound data until the socket's buffer fills up or the connection times out internally, at which point failures are reported.
The send/close question you refer to contains nothing about memory access errors, and in any case calling close() after send() can't possibly cause the prior send() to misbehave, unless you have managed to get time running backwards.
You have a bug in your code somewhere.
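Not Winsock, but for illustration here is the same discipline sketched in Go: a write on a connection the client has already dropped comes back as an ordinary error value to check, never as a crash.

```go
package server

import (
	"log"
	"net"
)

// sendReply checks the write's error instead of assuming success. When the
// client has already gone away, the error is typically "broken pipe" or
// "connection reset by peer"; the process keeps running either way.
func sendReply(conn net.Conn, reply []byte) {
	if _, err := conn.Write(reply); err != nil {
		log.Printf("client gone, closing connection: %v", err)
		conn.Close()
	}
}
```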

SignalR duplicating responses

I'm using SignalR with Redis as a message bus on a server that sits behind an Nginx proxy for load balancing. I used SignalR's PersistentConnection class to write a simple chat program that broadcasts messages to users belonging to the same certain group. Users are added to a group in OnConnectedAsync, removed in OnDisconnectAsync, and the user-to-group mapping is deterministic.
Currently, the client side falls back to long polling for whatever reason (I'm not entirely sure why), and whenever the client sets up a new connection after waiting for and receiving a response, seemingly at random, the server will sometimes respond to the new connection immediately with the previous response, despite there having only been one POST.
The message IDs tend to differ by exactly one (the smaller ID coming first), with the rest of the response remaining the same. I logged some debug info and am quite positive that my override of OnReceivedAsync is sending one response per request. I tried the same implementation without the Redis message bus and got the same problem. Running locally (with long polling), however, yielded good results, so I suspect the problem might be in the way the message bus buffers messages to refresh clients who might not be caught up, combined with some weird timing in the cutting/setting up of connections with the Nginx load balancer; but beyond that, I am very much at a loss.
Any help would be appreciated.
EDIT: Further investigation reveals that duplication occurs at somewhat regular intervals of approximately 20-30 seconds. I'm led to believe that the message expiration in the message bus might have something to do with the bug.
EDIT: Bug can be seen here: http://tinyurl.com/9q5t3va
The server is simply broadcasting a counter being sent by the client. You will notice some responses are duplicated every 20 or so.
Reducing the number of worker processes in the IIS (6.0) Server Manager from 2 to 1 solved the problem.

What can cause a spontaneous EPIPE error without either end calling close() or crashing?

I have an application that consists of two processes (let's call them A and B), connected to each other through Unix domain sockets. Most of the time it works fine, but some users report the following behavior:
A sends a request to B. This works. A now starts reading the reply from B.
B sends a reply to A. The corresponding write() call returns an EPIPE error, and as a result B closes the socket. However, A did not call close() on the socket, nor did it crash.
A's read() call returns 0, indicating end-of-file. A thinks that B prematurely closed the connection.
Users have also reported variations of this behavior, e.g.:
A sends a request to B. This works partially, but before the entire request is sent A's write() call returns EPIPE, and as a result A closes the socket. However, B did not call close() on the socket, nor did it crash.
B reads a partial request and then suddenly gets an EOF.
The problem is I cannot reproduce this behavior locally at all. I've tried OS X and Linux. The users are on a variety of systems, mostly OS X and Linux.
Things that I've already tried and considered:
Double close() bugs (close() is called twice on the same file descriptor): probably not as that would result in EBADF errors, but I haven't seen them.
Increasing the maximum file descriptor limit. One user reported that this worked for him, the rest reported that it did not.
What else can possibly cause behavior like this? I know for certain that neither A nor B close() the socket prematurely, and I know for certain that neither of them have crashed because both A and B were able to report the error. It is as if the kernel suddenly decided to pull the plug from the socket for some reason.
Perhaps you could try strace as described in: http://modperlbook.org/html/6-9-1-Detecting-Aborted-Connections.html
I assume that your problem is related to the one described here: http://blog.netherlabs.nl/articles/2009/01/18/the-ultimate-so_linger-page-or-why-is-my-tcp-not-reliable
Unfortunately I'm having a similar problem myself and couldn't manage to get it fixed with the advice given. However, perhaps that SO_LINGER approach works for you.
shutdown() may have been called on one of the socket endpoints.
If either side may fork and execute a child process, ensure that the FD_CLOEXEC (close-on-exec) flag is set on the socket file descriptor if you did not intend for it to be inherited by the child. Otherwise the child process could (accidentally or otherwise) be manipulating your socket connection.
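For illustration, a Go sketch of setting that flag explicitly on an inherited descriptor (the fd number is hypothetical; Go already marks sockets it creates as close-on-exec, so this mainly matters for descriptors handed down from a parent process):

```go
package main

import (
	"log"
	"net"
	"os"
	"syscall"
)

func main() {
	// Hypothetical: fd 3 is a Unix-domain socket inherited from a parent process.
	const inheritedFD = 3
	syscall.CloseOnExec(inheritedFD) // sets FD_CLOEXEC via fcntl, so exec'd children won't inherit it

	// Wrap it as a net.Conn to use it normally afterwards.
	f := os.NewFile(uintptr(inheritedFD), "inherited-socket")
	conn, err := net.FileConn(f)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	log.Println("socket is now close-on-exec")
}
```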
I would also check that there's no sneaky firewall in the middle. It's possible an intermediate forwarding node on the route sends an RST. The best way to track that down is of course the packet sniffer (or its GUI cousin.)
