Raft: followers always reply upon an AppendEntriesRequest? TLA specification is not clear? - raft

I am implementing the Raft protocol in Rust, reading the specification given by the original author.
The description of HandleAppendEntriesRequest says that followers, upon an AppendEntries message sometimes reply with an AppendEntriesResponse with success=false/true, but they are cases where there is no response at all, e.g., when a new entry is appended to the log (line 382) or when there is a conflict (line 374). This will require a second message (like a heartbeat) from the Leader to achieve a reply from the follower confirming that the entry was inserted successfully. I understand from the Raft Visualization that followers always reply if they are not stopped and don't wait for second messages.
Am I understanding incorrectly the TLA+ specification?

Related

How to keep order when consuming async messages (such as SQS or any other messaging service)

I've encountered this problem a few times and now I wonder what the industry best practice is, the context is, we have a data store which aggregates pieces of information taken from multiple micro-services, the way the data comes to us is through messages broadcasted by every source when there is a change
The problem is how to guarantee that our data will be eventually consistent and that the updates were applied in the order they were meant to be received. For example, Let's say we have an entity User
User {
display_name : String,
email: String,
bio: String
}
And we are listening changes on those users to keep "display_name" updated in our data store, the messages come in a format such as
{
event: "UserCreated",
id: 1000,
display_name: "MyNewUser"
}
{
event: "UserChanged",
id: 1000,
display_name: "MyNewUser2"
}
There is a scenario where "UserChanged" reaches our listeners before "UserCreated" therefore our code won't be able to find user with id 1000 and fail both transactions. This is where a mechanism to sort those two is desired, we have considered:
Timestamps: The problem with timestamps is that although we know the last time we read an update we don't know how many events happened between the last event seen and the one we are currently processing
Sequence numbers: This is slightly better but if a sequence is lost then we won't update our storage unless we relax the rules a little bit, we could say that after some time if a sequence hasn't been seen then proceed with the rest of operations
If anyone knows common design patterns that tackle this sort of issue would be great to know, also open to suggestions on perhaps data modeling, etc. Bottomline, I'm pretty sure this is a common software problem that has been solved many times before
Thanks a lot for the help!
My first thought here would be to jump directly to a sequence numbers-based approach, but this works when you got 1 to 1 communication, like in TCP orientated communications. In your case, there is many to one, so without a coordination between the senders, it would be challenging to implement this approach correctly (ex. 2 senders can use the same sequence number).
Yes, losing the messages would be problematic, but I don't think that's the case of SQS or other cloud-based message queues (of course, it depends on the scale you're working on), because they're known for data duplication instead of data loss (AFAIK).
One idea I can think of right now is to add a new layer between the senders and the consumer, which will orchestrate the events. It can be the consumer itself, but it can be another service in front of it, let's call it orchestrator.
The orchestrator is connected with each senders (individually) via 2 queues:
The first queue is used to get the actual events from the sender
The second queue is used to signal back to the sender an ACK-like event (the message has been received, validated and successfully passed downstream to the consumer (or consumed directly)).
The way it works is the following:
The orchestrator gets the event from the sender A
It tries to execute a validation-like operation specific to the message (update on an inexisting user), the operation fails, so it sends a N-ACK message back to sender A, signaling that its message was not able to be processed successfully. Sender A will try to resend the message after some time.
In the mean time, it gets the "create user" message from sender B, the message get passed downstream to the consumer
Finally, it will get the message from sender A (after some retries).
This solution ensures message ordering in a pretty basic way, without keeping the events in memory, but rather in the queues. It may work, but it depends on a lot of factors, like number of events, number of senders, etc.

How to protect a private message from Replay attack?

What if I send a message with a private key
My message is "Today is party at 7"
Devil copied my encrypted text with signature
After some days the devil sent the message to the same guy I sent.
The message is not changed,my friend still got the same message of party at 7 and it is digitally signed by my private key.
What should I do to prevent this type of scenario?
Replay attacks are most commonly prevented by adding an extra piece of unique information that is not part of the message. Here is a common solution:
Add to the message a timestamp and a random value.
If the message is older than some age (a minute, an hour, a day, depending on how messages are delivered), it generates a "message too old" error
If the message is within that time frame, make sure you have never seen that random value before. If you have, then it is a repeat and can be simply ignored.
By adding a timestamp, you bound how long you have to keep track of "what you've seen before."
Another general approach is to make all messages idempotent. This means that applying the same message multiple times is not problematic. Systems like git have this quality. Building idempotent systems is somewhat tricky, and not easily achieved for all problems, but is a powerful solution when possible. An example of making something idempotent is to say "at this point in time, X had value Y." You can apply that message repeatedly without causing any problems (either because it updates the same record in exactly the same way, or because you ignore all points in time older than the latest value you have).
Addressing the replay attack problem happens to also solve several other problems, which is nice. Messaging systems face a fundamental problem that no message can be guaranteed to be delivered exactly once. You can guarantee at most once, or at least once, but never exactly once. (Study the Two Generals' Problem for a common way of thinking through this. It is arguable that you can't actually promise "at least once" because the systems may never be connected, but we typically ignore that corner case). Idempotent systems are very nice because they are highly tolerant of "at least once" solutions.

Handling Race Conditions / Concurrency in Network Protocol Design

I am looking for possible techniques to gracefully handle race conditions in network protocol design. I find that in some cases, it is particularly hard to synchronize two nodes to enter a specific protocol state. Here is an example protocol with such a problem.
Let's say A and B are in an ESTABLISHED state and exchange data. All messages sent by A or B use a monotonically increasing sequence number, such that A can know the order of the messages sent by B, and A can know the order of the messages sent by B. At any time in this state, either A or B can send a ACTION_1 message to the other, in order to enter a different state where a strictly sequential exchange of message needs to happen:
send ACTION_1
recv ACTION_2
send ACTION_3
However, it is possible that both A and B send the ACTION_1 message at the same time, causing both of them to receive an ACTION_1 message, while they would expect to receive an ACTION_2 message as a result of sending ACTION_1.
Here are a few possible ways this could be handled:
1) change state after sending ACTION_1 to ACTION_1_SENT. If we receive ACTION_1 in this state, we detect the race condition, and proceed to arbitrate who gets to start the sequence. However, I have no idea how to fairly arbitrate this. Since both ends are likely going to detect the race condition at about the same time, any action that follows will be prone to other similar race conditions, such as sending ACTION_1 again.
2) Duplicate the entire sequence of messages. If we receive ACTION_1 in the ACTION_1_SENT state, we include the data of the other ACTION_1 message in the ACTION_2 message, etc. This can only work if there is no need to decide who is the "owner" of the action, since both ends will end up doing the same action to each other.
3) Use absolute time stamps, but then, accurate time synchronization is not an easy thing at all.
4) Use lamport clocks, but from what I understood these are only useful for events that are causally related. Since in this case the ACTION_1 messages are not causally related, I don't see how it could help solve the problem of figuring out which one happened first to discard the second one.
5) Use some predefined way of discarding one of the two messages on receipt by both ends. However, I cannot find a way to do this that is unflawed. A naive idea would be to include a random number on both sides, and select the message with the highest number as the "winner", discarding the one with the lowest number. However, we have a tie if both numbers are equal, and then we need another way to recover from this. A possible improvement would be to deal with arbitration once at connection time and repeat similar sequence until one of the two "wins", marking it as favourite. Every time a tie happens, the favourite wins.
Does anybody have further ideas on how to handle this?
EDIT:
Here is the current solution I came up with. Since I couldn't find 100% safe way to prevent ties, I decided to have my protocol elect a "favorite" during the connection sequence. Electing this favorite requires breaking possible ties, but in this case the protocol will allow for trying multiple times to elect the favorite until a consensus is reached. After the favorite is elected, all further ties are resolved by favoring the elected favorite. This isolates the problem of possible ties to a single part of the protocol.
As for fairness in the election process, I wrote something rather simple based on two values sent in each of the client/server packets. In this case, this number is a sequence number starting at a random value, but they could be anything as long as those numbers are fairly random to be fair.
When the client and server have to resolve a conflict, they both call this function with the send (their value) and the recv (the other value) values. The favorite calls this function with the favorite parameter set to TRUE. This function is guaranteed to give the opposite result on both ends, such that it is possible to break the tie without retransmitting a new message.
BOOL ResolveConflict(BOOL favorite, UINT32 sendVal, UINT32 recvVal)
{
BOOL winner;
int sendDiff;
int recvDiff;
UINT32 xorVal;
xorVal = sendVal ^ recvVal;
sendDiff = (xorVal < sendVal) ? sendVal - xorVal : xorVal - sendVal;
recvDiff = (xorVal < recvVal) ? recvVal - xorVal : xorVal - recvVal;
if (sendDiff != recvDiff)
winner = (sendDiff < recvDiff) ? TRUE : FALSE; /* closest value to xorVal wins */
else
winner = favorite; /* break tie, make favorite win */
return winner;
}
Let's say that both ends enter the ACTION_1_SENT state after sending the ACTION_1 message. Both will receive the ACTION_1 message in the ACTION_1_SENT state, but only one will win. The loser accepts the ACTION_1 message and enters the ACTION_1_RCVD state, while the winner discards the incoming ACTION_1 message. The rest of the sequence continues as if the loser had never sent ACTION_1 in a race condition with the winner.
Let me know what you think, and how this could be further improved.
To me, this whole idea that this ACTION_1 - ACTION_2 - ACTION_3 handshake must occur in sequence with no other message intervening is very onerous, and not at all in line with the reality of networks (or distributed systems in general). The complexity of some of your proposed solutions give reason to step back and rethink.
There are all kinds of complicating factors when dealing with systems distributed over a network: packets which don't arrive, arrive late, arrive out of order, arrive duplicated, clocks which are out of sync, clocks which go backwards sometimes, nodes which crash/reboot, etc. etc. You would like your protocol to be robust under any of these adverse conditions, and you would like to know with certainty that it is robust. That means making it simple enough that you can think through all the possible cases that may occur.
It also means abandoning the idea that there will always be "one true state" shared by all nodes, and the idea that you can make things happen in a very controlled, precise, "clockwork" sequence. You want to design for the case where the nodes do not agree on their shared state, and make the system self-healing under that condition. You also must assume that any possible message may occur in any order at all.
In this case, the problem is claiming "ownership" of a shared clipboard. Here's a basic question you need to think through first:
If all the nodes involved cannot communicate at some point in time, should a node which is trying to claim ownership just go ahead and behave as if it is the owner? (This means the system doesn't freeze when the network is down, but it means you will have multiple "owners" at times, and there will be divergent changes to the clipboard which have to be merged or otherwise "fixed up" later.)
Or, should no node ever assume it is the owner unless it receives confirmation from all other nodes? (This means the system will freeze sometimes, or just respond very slowly, but you will never have weird situations with divergent changes.)
If your answer is #1: don't focus so much on the protocol for claiming ownership. Come up with something simple which reduces the chances that two nodes will both become "owner" at the same time, but be very explicit that there can be more than one owner. Put more effort into the procedure for resolving divergence when it does happen. Think that part through extra carefully and make sure that the multiple owners will always converge. There should be no case where they can get stuck in an infinite loop trying to converge but failing.
If your answer is #2: here be dragons! You are trying to do something which buts up against some fundamental limitations.
Be very explicit that there is a state where a node is "seeking ownership", but has not obtained it yet.
When a node is seeking ownership, I would say that it should send a request to all other nodes, at intervals (in case another one misses the first request). Put a unique identifier on each such request, which is repeated in the reply (so delayed replies are not misinterpreted as applying to a request sent later).
To become owner, a node should receive a positive reply from all other nodes within a certain period of time. During that wait period, it should refuse to grant ownership to any other node. On the other hand, if a node has agreed to grant ownership to another node, it should not request ownership for another period of time (which must be somewhat longer).
If a node thinks it is owner, it should notify the others, and repeat the notification periodically.
You need to deal with the situation where two nodes both try to seek ownership at the same time, and both NAK (refuse ownership to) each other. You have to avoid a situation where they keep timing out, retrying, and then NAKing each other again (meaning that nobody would ever get ownership).
You could use exponential backoff, or you could make a simple tie-breaking rule (it doesn't have to be fair, since this should be a rare occurrence). Give each node a priority (you will have to figure out how to derive the priorities), and say that if a node which is seeking ownership receives a request for ownership from a higher-priority node, it will immediately stop seeking ownership and grant it to the high-priority node instead.
This will not result in more than one node becoming owner, because if the high-priority node had previously ACKed the request sent by the low-priority node, it would not send a request of its own until enough time had passed that it was sure its previous ACK was no longer valid.
You also have to consider what happens if a node becomes owner, and then "goes dark" -- stops responding. At what point are other nodes allowed to assume that ownership is "up for grabs" again? This is a very sticky issue, and I suspect you will not find any solution which eliminates the possibility of having multiple owners at the same time.
Probably, all the nodes will need to "ping" each other from time to time. (Not referring to an ICMP echo, but something built in to your own protocol.) If the clipboard owner can't reach the others for some period of time, it must assume that it is no longer owner. And if the others can't reach the owner for a longer period of time, they can assume that ownership is available and can be requested.
Here is a simplified answer for the protocol of interest here.
In this case, there is only a client and a server, communicating over TCP. The goal of the protocol is to two system clipboards. The regular state when outside of a particular sequence is simply "CLIPBOARD_ESTABLISHED".
Whenever one of the two systems pastes something onto its clipboard, it sends a ClipboardFormatListReq message, and transitions to the CLIPBOARD_FORMAT_LIST_REQ_SENT state. This message contains a sequence number that is incremented when sending the ClipboardFormatListReq message. Under normal circumstances, no race condition occurs and a ClipboardFormatListRsp message is sent back to acknowledge the new sequence number and owner. The list contained in the request is used to expose clipboard data formats offered by the owner, and any of these formats can be requested by an application on the remote system.
When an application requests one of the data formats from the clipboard owner, a ClipboardFormatDataReq message is sent with the sequence number, and format id from the list, the state is changed to CLIPBOARD_FORMAT_DATA_REQ_SENT. Under normal circumstances, there is no change of clipboard ownership during that time, and the data is returned in the ClipboardFormatDataRsp message. A timer should be used to timeout if no response is sent fast enough from the other system, and abort the sequence if it takes too long.
Now, for the special cases:
If we receive ClipboardFormatListReq in the CLIPBOARD_FORMAT_LIST_REQ_SENT state, it means both systems are trying to gain ownership at the same time. Only one owner should be selected, in this case, we can keep it simple an elect the client as the default winner. With the client as the default owner, the server should respond to the client with ClipboardFormatListRsp consider the client as the new owner.
If we receive ClipboardFormatDataReq in the CLIPBOARD_FORMAT_LIST_REQ_SENT state, it means we have just received a request for data from the previous list of data formats, since we have just sent a request to become the new owner with a new list of data formats. We can respond with a failure right away, and sequence numbers will not match.
Etc, etc. The main issue I was trying to solve here is fast recovery from such states, with going into a loop of retrying until it works. The main issue with immediate retrial is that it is going to happen with timing likely to cause new race conditions. We can solve the issue by expecting such inconsistent states as long as we can move back to proper protocol states when detecting them. The other part of the problem is with electing a "winner" that will have its request accepted without resending new messages. A default winner can be elected by default, such as the client or the server, or some sort of random voting system can be implemented with a default favorite to break ties.

Are HTTP Status code descriptions used?

Is the description component of a HTTP status code used? For example, in the HTTP response '200 OK', is the OK (i.e. the description) ever used? Or is it just for humans to read?
No, the "reason phrase" is purely there for humans to read. Nothing should be using it programmatically - especially because in HTTP/2, it's been eliminated:
HTTP/2 does not define a way to carry the version or reason phrase that is included in an HTTP/1.1 status line.
The status code is often used. For example, if you are using AJAX, you will most likely check the HTTP status code before using the data returned.
However the description is just for humans as computers recognize the status code and what it entails without the description.
The description is a standard message associated with the code itself to get a quick description of the code. It is used for a diagnostic, along with its full description wich can be found on the https status code list. So, it is not "used" as i think you're implying, but indeed only used for diagnostic by humans
The data sections of messages Error, Forward and redirection responses may be used to contain human-readable diagnostic information.

How to handle passing different types of serialized messages on a network

I'm currently sitting with the problem of passing messages that might contain different data over a network. I have created a prototype of my game, and now I'm busy implementing networking for my game.
I want to send different types of messages, as I think it would be silly to constantly send all the information every network-tick and I would rather send different messages that contain different data. What would be the best way to distinguish what message is received on the receiving side?
Currently I have a system where I prepend a string which distinguishes a certain type of message. My message is then sent through my own message parser class where it determines the type, and deserializes it to the correct type.
What I would like to know is if there is a better way of doing this? It seems like it should be a fairly common problem and so there must be a more trivial solution, unless I'm already doing it the trivial way.
Thanks!
I have read again carefully your question, and now I do not understand what is your problem, you say Currently I have a system where I prepend a string which distinguishes a certain type of message. My message is then sent through my own message parser class where it determines the type, and deserializes it to the correct type.
Looks OK, you may reduce the size of your message with my answer below horizontal line but the principle stays identical.
This the right way for asynchronous communication, but if you do synchrone you know that when you send A message you will receive B answer, so you do not have to prepend with a string which distinguishes the message, but you have to take care not sending another message before having the answer from the previous ...
So if you know how is formatted the answer you do not need any identification bytes, for example you know that the first four bytes is an integer, then a float on eight bytes, etc ...
Use boost::serialization, typically you save your structures, even with pointers, within a dumb bytes buffer, send that buffer over your network, and the other side de-serialize.
This example shows how Boost.Serialization can be used with asio to encode and decode structures for transmission over a socket.
Even if it is using boost::asio you could extract only the serialization part easily.

Resources