How do I know which UDP packets I have already received? - networking

I am making a game using a client-server model with UDP. Here's how I have implemented it so far:
All packets include a sequence number and a flag specifying whether they are "important".
Important message types require acknowledgement and will be re-sent after a delay if no acknowledgement is received.
Most message types are "unimportant" - that is, they do not require an acknowledgement, and if such a message is received with an older sequence number than the latest, it is dropped.
My dilemma is this: if an "important" message arrives twice, I only want to process it once. But how will I know that I have already received it, without keeping an ever-expanding list in memory?
Ideas
1) Just remember the last X "important" messages received - the likelihood of receiving a VERY old message is slim (not ideal as it's not 100% reliable).
2) Use TCP for "important" messages (not ideal due to the complications and overhead involved in managing 2 protocols simultaneously).
3) Have a separate sequence number for "important" messages and ensure that these are always received in order, so only the most recent message needs to be remembered (I'm leaning towards this).
Any other ideas?

Use #3. The fact that you are ACK-ing the important messages provides the mechanism to ensure they are received in order, i.e. don't ACK an out-of-sequence one, and just remember the sequence number of the last one you ACK-ed.
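A minimal receiver-side sketch of that rule, assuming sequence numbers start at 1; process_message and send_ack are hypothetical helpers, not anything from the question:

#include <stdint.h>
#include <stddef.h>

void process_message(const void *payload, size_t len);   /* hypothetical */
void send_ack(uint32_t seq);                              /* hypothetical */

static uint32_t last_acked_seq = 0;   /* seq of the last "important" message we ACKed */

void on_important_message(uint32_t seq, const void *payload, size_t len)
{
    if (seq == last_acked_seq + 1) {
        process_message(payload, len);   /* first time we see this one */
        last_acked_seq = seq;
        send_ack(seq);
    } else if (seq <= last_acked_seq) {
        send_ack(seq);                   /* duplicate: re-ACK so the sender stops resending */
    }
    /* seq > last_acked_seq + 1: out of sequence; don't ACK, the sender will
       retransmit and the missing message will arrive first. */
}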

Have a separate sequence number for "important" messages (starting from zero), and the following variables:
a variable min_recv, indicating that you have received all "important" messages from 0 to min_recv (exclusive);
a list of the "important" sequence numbers that you have already received.
Each time you receive an "important" message, store its sequence number in the list; then check whether you can compact the list:
while list contains `min_recv`:
    remove `min_recv` from list
    increment `min_recv`
In this way you consume minimal memory: even when out-of-order important messages arrive and the list starts to grow, the missing message will eventually arrive (it is retransmitted if lost) and the list will be emptied again.
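For what it's worth, a small C sketch of that bookkeeping, using a fixed-size unsorted array for the list (PENDING_MAX is an assumed bound; the retransmission of lost messages is what keeps the backlog small):

#include <stdint.h>
#include <stddef.h>

#define PENDING_MAX 256                  /* assumed bound on the out-of-order backlog */

static uint32_t min_recv = 0;            /* all "important" seqs below this were received */
static uint32_t pending[PENDING_MAX];    /* received seqs at or above min_recv */
static size_t   pending_count = 0;

/* Remove seq from the pending list if present; returns 1 if it was there. */
static int pending_take(uint32_t seq)
{
    for (size_t i = 0; i < pending_count; i++) {
        if (pending[i] == seq) {
            pending[i] = pending[--pending_count];   /* order does not matter */
            return 1;
        }
    }
    return 0;
}

/* Call for every received "important" message; returns 1 if it is new, 0 if duplicate. */
int on_important(uint32_t seq)
{
    if (seq < min_recv)
        return 0;                        /* already compacted away: duplicate */
    for (size_t i = 0; i < pending_count; i++)
        if (pending[i] == seq)
            return 0;                    /* still in the list: duplicate */

    if (pending_count < PENDING_MAX)     /* a real implementation would grow the list */
        pending[pending_count++] = seq;

    /* Compact: advance min_recv while the next expected seq is in the list. */
    while (pending_take(min_recv))
        min_recv++;

    return 1;
}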

Related

Handling Race Conditions / Concurrency in Network Protocol Design

I am looking for possible techniques to gracefully handle race conditions in network protocol design. I find that in some cases, it is particularly hard to synchronize two nodes to enter a specific protocol state. Here is an example protocol with such a problem.
Let's say A and B are in an ESTABLISHED state and exchange data. All messages sent by A or B carry a monotonically increasing sequence number, so that A knows the order of the messages sent by B, and B knows the order of the messages sent by A. At any time in this state, either A or B can send an ACTION_1 message to the other, in order to enter a different state where a strictly sequential exchange of messages needs to happen:
send ACTION_1
recv ACTION_2
send ACTION_3
However, it is possible that both A and B send the ACTION_1 message at the same time, causing both of them to receive an ACTION_1 message, while they would expect to receive an ACTION_2 message as a result of sending ACTION_1.
Here are a few possible ways this could be handled:
1) Change state to ACTION_1_SENT after sending ACTION_1. If we receive ACTION_1 in this state, we detect the race condition and proceed to arbitrate who gets to start the sequence. However, I have no idea how to arbitrate this fairly. Since both ends are likely to detect the race condition at about the same time, any action that follows is prone to other similar race conditions, such as sending ACTION_1 again.
2) Duplicate the entire sequence of messages. If we receive ACTION_1 in the ACTION_1_SENT state, we include the data of the other ACTION_1 message in the ACTION_2 message, etc. This can only work if there is no need to decide who is the "owner" of the action, since both ends will end up doing the same action to each other.
3) Use absolute timestamps, but accurate time synchronization is not an easy thing at all.
4) Use Lamport clocks, but from what I understood these are only useful for events that are causally related. Since in this case the ACTION_1 messages are not causally related, I don't see how they could help figure out which one happened first so that the second one can be discarded.
5) Use some predefined way of discarding one of the two messages on receipt by both ends. However, I cannot find a way to do this that is unflawed. A naive idea would be to include a random number on both sides and select the message with the higher number as the "winner", discarding the one with the lower number. However, we have a tie if both numbers are equal, and then we need another way to recover. A possible improvement would be to perform this arbitration once at connection time, repeating a similar exchange until one of the two sides "wins", and mark that side as the favourite. Every time a tie happens afterwards, the favourite wins.
Does anybody have further ideas on how to handle this?
EDIT:
Here is the current solution I came up with. Since I couldn't find a 100% safe way to prevent ties, I decided to have my protocol elect a "favorite" during the connection sequence. Electing this favorite requires breaking possible ties, but in this case the protocol allows trying multiple times until a consensus is reached. After the favorite is elected, all further ties are resolved by favoring it. This isolates the problem of possible ties to a single part of the protocol.
As for fairness in the election process, I wrote something rather simple based on two values sent in the client and server packets. In this case each value is a sequence number starting at a random value, but they could be anything, as long as the numbers are reasonably random for the scheme to be fair.
When the client and server have to resolve a conflict, each calls this function with its own value as sendVal and the other side's value as recvVal. The favorite calls it with the favorite parameter set to TRUE. The function is guaranteed to give opposite results on the two ends, so the tie can be broken without exchanging any further messages.
BOOL ResolveConflict(BOOL favorite, UINT32 sendVal, UINT32 recvVal)
{
    BOOL winner;
    UINT32 sendDiff;
    UINT32 recvDiff;
    UINT32 xorVal;

    /* Both ends compute the same xorVal, since XOR is commutative. */
    xorVal = sendVal ^ recvVal;

    /* Distance of each value from xorVal, kept unsigned to avoid overflow. */
    sendDiff = (xorVal < sendVal) ? sendVal - xorVal : xorVal - sendVal;
    recvDiff = (xorVal < recvVal) ? recvVal - xorVal : xorVal - recvVal;

    if (sendDiff != recvDiff)
        winner = (sendDiff < recvDiff) ? TRUE : FALSE; /* closest value to xorVal wins */
    else
        winner = favorite; /* break the tie: the favorite wins */

    return winner;
}
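For illustration, here is how the two ends might call it with made-up values; each side passes its own value first, and only the elected favorite passes TRUE:

/* Hypothetical values that were exchanged in the conflicting messages. */
UINT32 clientVal = 0x1234ABCD;   /* value the client sent */
UINT32 serverVal = 0x0F0F0F0F;   /* value the server sent */

/* Assume the client was elected favorite during the connection sequence. */
BOOL clientWins = ResolveConflict(TRUE, clientVal, serverVal);   /* on the client */
BOOL serverWins = ResolveConflict(FALSE, serverVal, clientVal);  /* on the server */

/* Exactly one of clientWins/serverWins is TRUE, because both sides compute the
   same xorVal and each side's sendDiff is the other side's recvDiff. */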
Let's say that both ends enter the ACTION_1_SENT state after sending the ACTION_1 message. Both will receive the ACTION_1 message in the ACTION_1_SENT state, but only one will win. The loser accepts the ACTION_1 message and enters the ACTION_1_RCVD state, while the winner discards the incoming ACTION_1 message. The rest of the sequence continues as if the loser had never sent ACTION_1 in a race condition with the winner.
Let me know what you think, and how this could be further improved.
To me, this whole idea that the ACTION_1 - ACTION_2 - ACTION_3 handshake must occur in sequence with no other message intervening is very onerous, and not at all in line with the reality of networks (or distributed systems in general). The complexity of some of your proposed solutions gives reason to step back and rethink.
There are all kinds of complicating factors when dealing with systems distributed over a network: packets which don't arrive, arrive late, arrive out of order, arrive duplicated, clocks which are out of sync, clocks which go backwards sometimes, nodes which crash/reboot, etc. etc. You would like your protocol to be robust under any of these adverse conditions, and you would like to know with certainty that it is robust. That means making it simple enough that you can think through all the possible cases that may occur.
It also means abandoning the idea that there will always be "one true state" shared by all nodes, and the idea that you can make things happen in a very controlled, precise, "clockwork" sequence. You want to design for the case where the nodes do not agree on their shared state, and make the system self-healing under that condition. You also must assume that any possible message may occur in any order at all.
In this case, the problem is claiming "ownership" of a shared clipboard. Here's a basic question you need to think through first:
1) If all the nodes involved cannot communicate at some point in time, should a node which is trying to claim ownership just go ahead and behave as if it is the owner? (This means the system doesn't freeze when the network is down, but it means you will have multiple "owners" at times, and there will be divergent changes to the clipboard which have to be merged or otherwise "fixed up" later.)
2) Or, should no node ever assume it is the owner unless it receives confirmation from all other nodes? (This means the system will freeze sometimes, or just respond very slowly, but you will never have weird situations with divergent changes.)
If your answer is #1: don't focus so much on the protocol for claiming ownership. Come up with something simple which reduces the chances that two nodes will both become "owner" at the same time, but be very explicit that there can be more than one owner. Put more effort into the procedure for resolving divergence when it does happen. Think that part through extra carefully and make sure that the multiple owners will always converge. There should be no case where they can get stuck in an infinite loop trying to converge but failing.
If your answer is #2: here be dragons! You are trying to do something which butts up against some fundamental limitations.
Be very explicit that there is a state where a node is "seeking ownership", but has not obtained it yet.
When a node is seeking ownership, I would say that it should send a request to all other nodes, at intervals (in case another one misses the first request). Put a unique identifier on each such request, which is repeated in the reply (so delayed replies are not misinterpreted as applying to a request sent later).
To become owner, a node should receive a positive reply from all other nodes within a certain period of time. During that wait period, it should refuse to grant ownership to any other node. On the other hand, if a node has agreed to grant ownership to another node, it should not request ownership for another period of time (which must be somewhat longer).
If a node thinks it is owner, it should notify the others, and repeat the notification periodically.
You need to deal with the situation where two nodes both try to seek ownership at the same time, and both NAK (refuse ownership to) each other. You have to avoid a situation where they keep timing out, retrying, and then NAKing each other again (meaning that nobody would ever get ownership).
You could use exponential backoff, or you could make a simple tie-breaking rule (it doesn't have to be fair, since this should be a rare occurrence). Give each node a priority (you will have to figure out how to derive the priorities), and say that if a node which is seeking ownership receives a request for ownership from a higher-priority node, it will immediately stop seeking ownership and grant it to the high-priority node instead.
This will not result in more than one node becoming owner, because if the high-priority node had previously ACKed the request sent by the low-priority node, it would not send a request of its own until enough time had passed that it was sure its previous ACK was no longer valid.
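A rough sketch of that priority rule, with hypothetical send_ack/send_nak helpers and the assumption that a lower number means higher priority:

#include <stdint.h>

typedef enum { IDLE, SEEKING, OWNER, GRANTED } node_state_t;

void send_ack(void);   /* hypothetical: grant ownership to the requester */
void send_nak(void);   /* hypothetical: refuse ownership to the requester */

/* Called when an ownership request arrives from a peer. */
void on_ownership_request(node_state_t *state, uint32_t my_prio, uint32_t peer_prio)
{
    switch (*state) {
    case SEEKING:
        if (peer_prio < my_prio) {       /* peer outranks us: yield immediately */
            send_ack();
            *state = GRANTED;            /* stop seeking for a while */
        } else {
            send_nak();                  /* we outrank the peer; it will yield */
        }
        break;
    case OWNER:
        send_nak();                      /* we already own it */
        break;
    default:                             /* IDLE or GRANTED */
        send_ack();
        *state = GRANTED;                /* refrain from requesting for a period */
        break;
    }
}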
You also have to consider what happens if a node becomes owner, and then "goes dark" -- stops responding. At what point are other nodes allowed to assume that ownership is "up for grabs" again? This is a very sticky issue, and I suspect you will not find any solution which eliminates the possibility of having multiple owners at the same time.
Probably, all the nodes will need to "ping" each other from time to time. (Not referring to an ICMP echo, but something built into your own protocol.) If the clipboard owner can't reach the others for some period of time, it must assume that it is no longer the owner. And if the others can't reach the owner for a longer period of time, they can assume that ownership is available and can be requested.
Here is a simplified answer for the protocol of interest here.
In this case, there is only a client and a server, communicating over TCP. The goal of the protocol is to share a clipboard between the two systems. The regular state, outside of any particular sequence, is simply "CLIPBOARD_ESTABLISHED".
Whenever one of the two systems pastes something onto its clipboard, it sends a ClipboardFormatListReq message, and transitions to the CLIPBOARD_FORMAT_LIST_REQ_SENT state. This message contains a sequence number that is incremented when sending the ClipboardFormatListReq message. Under normal circumstances, no race condition occurs and a ClipboardFormatListRsp message is sent back to acknowledge the new sequence number and owner. The list contained in the request is used to expose clipboard data formats offered by the owner, and any of these formats can be requested by an application on the remote system.
When an application requests one of the data formats from the clipboard owner, a ClipboardFormatDataReq message is sent with the sequence number and a format id from the list, and the state is changed to CLIPBOARD_FORMAT_DATA_REQ_SENT. Under normal circumstances, there is no change of clipboard ownership during that time, and the data is returned in the ClipboardFormatDataRsp message. A timer should be used to detect when no response arrives fast enough from the other system, and to abort the sequence if it takes too long.
Now, for the special cases:
If we receive ClipboardFormatListReq in the CLIPBOARD_FORMAT_LIST_REQ_SENT state, it means both systems are trying to gain ownership at the same time. Only one owner should be selected; in this case, we can keep it simple and elect the client as the default winner. With the client as the default owner, the server should respond to the client with ClipboardFormatListRsp and consider the client as the new owner.
If we receive ClipboardFormatDataReq in the CLIPBOARD_FORMAT_LIST_REQ_SENT state, it means we have just received a request for data from the previous list of data formats, since we have just sent a request to become the new owner with a new list of data formats. We can respond with a failure right away, since the sequence numbers will not match.
Etc, etc. The main issue I was trying to solve here is fast recovery from such states, without going into a loop of retrying until it works. The problem with immediate retries is that they tend to happen with timing that causes new race conditions. We can solve the issue by expecting such inconsistent states, as long as we can move back to proper protocol states when detecting them. The other part of the problem is electing a "winner" that will have its request accepted without resending new messages. A default winner can be chosen, such as the client or the server, or some sort of random voting system can be implemented with a default favorite to break ties.
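To make the recovery concrete, here is a sketch of the receive path for ClipboardFormatListReq, with the client as the default winner; the state names follow the description above, and the helper function is a placeholder, not part of any real API:

#include <stdint.h>

typedef enum {
    CLIPBOARD_ESTABLISHED,
    CLIPBOARD_FORMAT_LIST_REQ_SENT,
    CLIPBOARD_FORMAT_DATA_REQ_SENT
} clipboard_state_t;

void send_format_list_rsp(uint32_t seq);   /* placeholder */

/* Called when a ClipboardFormatListReq arrives. is_client is nonzero on the
   client side; the client is the default winner of an ownership race. */
void on_format_list_req(clipboard_state_t *state, int is_client, uint32_t seq)
{
    if (*state == CLIPBOARD_FORMAT_LIST_REQ_SENT && is_client) {
        /* Race detected: both sides sent a request. We are the client, so we
           win by convention; discard the peer's request and keep waiting for
           the ClipboardFormatListRsp to our own request. */
        return;
    }
    /* Normal case, or we are the server losing the race: accept the peer as
       the new clipboard owner and acknowledge its format list. */
    send_format_list_rsp(seq);
    *state = CLIPBOARD_ESTABLISHED;
}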

Handling messages over TCP

I'm trying to send and receive messages over TCP, with the size of each message prepended before the message itself.
Say the first three bytes are the length, and the message follows:
As a small example:
005Hello003Hey002Hi
I'll be using this method for large messages too, but the receive buffer has a constant size, say 200 bytes. So there is a chance that a complete message may not be received in one read, e.g. instead of 005Hello I get 005He, or even the length itself may arrive incomplete, e.g. I get only 2 bytes of the length.
So, to get over this problem, I'll need to wait for the next read and append it to the incomplete message, etc.
My question is: am I the only one having these difficulties of appending reads to each other, prepending lengths, etc. to make messages complete, or is this really how we usually need to handle individual messages over TCP? Or is there a better way?
What you're seeing is 100% normal TCP behavior. It is completely expected that you'll loop receiving bytes until you get a "message" (whatever that means in your context). It's part of the work of going from a low-level TCP byte stream to a higher-level concept like "message".
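For example, the usual "read exactly N bytes" loop over a POSIX socket looks something like this (recv_full is just an illustrative helper name); you would call it once for the 3-byte length prefix and again for the message body:

#include <sys/types.h>
#include <sys/socket.h>
#include <stddef.h>

/* Read exactly len bytes into buf; returns 0 on success, -1 on error or if
   the peer closed the connection before len bytes arrived. */
int recv_full(int fd, void *buf, size_t len)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = recv(fd, p, len, 0);
        if (n <= 0)                      /* 0 = connection closed, <0 = error */
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}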
And "usr" is right above. There are higher level abstractions that you may have available. If they're appropriate, use them to avoid reinventing the wheel.
So there is a chance that a complete message may not be received in one read, e.g. instead of 005Hello I get 005He, or even the length itself may arrive incomplete, e.g. I get only 2 bytes of the length.
Yes. TCP gives you at least one byte per read, that's all.
Or is this really how we usually need to handle individual messages over TCP? Or is there a better way?
Try using higher-level primitives. For example, BinaryReader allows you to read exactly N bytes (it will internally loop). StreamReader lets you forget this peculiarity of TCP as well.
Even better is to use higher-level abstractions such as HTTP (request/response pattern, very common), protobuf as a serialization format, or web services, which automate pretty much all transport-layer concerns.
Don't do TCP if you can avoid it.
So, to get over this problem, I'll need to wait for the next read and append it to the incomplete message, etc.
Yep, this is how things are done in socket-level code. For each socket you would like to allocate a buffer at least the same size as the kernel socket receive buffer, so that you can read the entire kernel buffer in one read/recv/recvmsg call. Reading from one socket in a loop may starve other sockets in your application (this is one reason epoll is level-triggered by default: edge-triggered mode forces application writers to read in a loop).
The first incomplete message is always kept at the beginning of the buffer, and reading from the socket continues at the next free byte, so new data is automatically appended to the incomplete message.
Once reading is done, normally a higher level callback is called with pointers to all the read data in the buffer. That callback should consume all complete messages in the buffer and return how many bytes it has consumed (which may be 0 if there is only an incomplete message). The buffer management code should then memmove the remaining unconsumed bytes (if any) to the beginning of the buffer. Alternatively, a ring buffer can be used to avoid moving those unconsumed bytes, but then the higher level code must be able to cope with ring-buffer iterators, which it may not be ready to do. Hence keeping the buffer linear may be the most convenient option.
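A condensed sketch of that buffer handling, assuming an application-supplied on_messages callback that returns how many bytes it consumed (BUF_SIZE is an assumed size):

#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <stddef.h>

#define BUF_SIZE 65536                   /* assumed >= the kernel receive buffer */

static char   buf[BUF_SIZE];
static size_t used = 0;                  /* bytes currently held in buf */

/* Application callback: consumes all complete messages and returns how many
   bytes it used (0 if only an incomplete message is present). */
size_t on_messages(const char *data, size_t len);

int read_socket(int fd)
{
    /* Note: a buffer completely filled by one incomplete message is not handled here. */
    ssize_t n = recv(fd, buf + used, sizeof(buf) - used, 0);
    if (n <= 0)
        return -1;                       /* error or connection closed */
    used += (size_t)n;

    size_t consumed = on_messages(buf, used);

    /* Keep the trailing incomplete message (if any) at the start of the buffer. */
    if (consumed > 0 && consumed < used)
        memmove(buf, buf + consumed, used - consumed);
    used -= consumed;
    return 0;
}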

RPC semantics what exactly is the purpose

I was going through the RPC semantics, at-least-once and at-most-once; how do they work?
I couldn't understand how they are implemented.
In both cases, the goal is to invoke the function once. However, the difference is in their failure modes. In "at-least-once", the system will retry on failure until it knows that the function was successfully invoked, while "at-most-once" will not attempt a retry (or will ensure that there is a negative acknowledgement of the invocation before retrying).
As to how these are implemented, this can vary, but the pseudo-code might look like this:
At least once:
request_received = false
while not request_received:
    send RPC
    wait for acknowledgement with timeout
    if acknowledgement received and acknowledgement.is_successful:
        request_received = true
At most once:
request_sent = false
while not request_sent:
    send RPC
    request_sent = true
    wait for acknowledgement with timeout
    if acknowledgement received and not acknowledgement.is_successful:
        request_sent = false
An example case where you want "at-most-once" would be something like payments (you wouldn't want to accidentally bill someone's credit card twice), whereas an example case for "at-least-once" would be something like updating a database with a particular value (if you happen to write the same value to the database twice in a row, that really isn't going to have any effect on anything). You almost always want "at-least-once" for idempotent operations: non-mutating ones, or ones like that database write where repeating them has no additional effect. By contrast, most mutating operations (or at least ones that incrementally mutate the state and thus depend on the current/prior state when the mutation is applied) need "at-most-once".
I should add that it is fairly common to implement "at most once" semantics on top of an "at least once" system by including an identifier in the body of the RPC that uniquely identifies it and by ensuring on the server that each ID seen by the system is processed only once. You can think of the sequence numbers in TCP packets (ensuring the packets are delivered once and in order) as a special case of this pattern. This approach, however, can be somewhat challenging to implement correctly on distributed systems where retries of the same RPC could arrive at two separate computers running the same server software. (One technique for dealing with this is to record the transaction where the RPC is received, but then to aggregate and deduplicate these records using a centralized system before redistributing the requests inside the system for further processing; another technique is to opportunistically process the RPC, but to reconcile/restore/rollback state when synchronization between the servers eventually detects this duplication... this approach would probably not fly for payments, but it can be useful in other situations like forum posts).
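A sketch of the single-server version of that idea: each RPC carries a (hypothetical) unique request ID, and the server remembers recently seen IDs so a retried request is acknowledged but not executed twice; the handler and reply functions are placeholders:

#include <stdint.h>
#include <stddef.h>

#define SEEN_MAX 1024                    /* assumed retention window for request IDs */

static uint64_t seen_ids[SEEN_MAX];
static size_t   seen_count = 0;          /* valid entries, up to SEEN_MAX */
static size_t   seen_next  = 0;          /* next slot to overwrite (ring) */

/* Returns 1 the first time an ID is observed, 0 for a duplicate. */
static int first_time(uint64_t request_id)
{
    for (size_t i = 0; i < seen_count; i++)
        if (seen_ids[i] == request_id)
            return 0;
    seen_ids[seen_next] = request_id;    /* the oldest entry is eventually forgotten */
    seen_next = (seen_next + 1) % SEEN_MAX;
    if (seen_count < SEEN_MAX)
        seen_count++;
    return 1;
}

void execute_rpc(const void *body, size_t len);    /* hypothetical handler */
void send_reply(uint64_t request_id);              /* hypothetical acknowledgement */

/* At-most-once dispatch on top of an at-least-once transport: always reply
   (so the client's retry loop terminates), but execute only once per ID. */
void handle_rpc(uint64_t request_id, const void *body, size_t len)
{
    if (first_time(request_id))
        execute_rpc(body, len);
    send_reply(request_id);
}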

TCP: multiple messages in a row

Is it within the TCP standard that multiple messages, sent from server to client in a row, will be received by the client in the same order (and that the bytes of one message will not be scattered among other messages)?
TCP provides an in-order byte stream delivery service. The bytes won't arrive in another order but the number of writes need not be equal to the number of reads.
You will never read bytes in another order than that in which they were sent
You can make no assumptions on "messages". TCP doesn't know about messages, only bytes (see above). Both the sender and the receiver can coalesce and split such "messages"
TCP uses a sequence number to identify each byte of data. The sequence number identifies the order of the bytes sent from each computer so that the data can be reconstructed in order, regardless of any fragmentation, disordering, or packet loss that may occur during transmission.
I agree with #cnicutar.
How are you deserializing the objects? I suspect the problem lies there.
For example, if your messages are ABCD followed 200 ms later by PQR, they may appear as:
ABC followed by DPQR,
or ABCDPQR,
or even AB followed by CD followed by PQ followed by R.
Basically, you cannot make assumptions based on the time at which the data is received.
The deserialization logic should know the object boundaries within a stream of bytes. This information should be encoded into the stream by the serialization logic.
If you are using Java, you can use ObjectInputStream & ObjectOutputStream and not be bothered about serialization issues.
J2ME Polish has a good serialization utility that can be easily ported to other platforms. I have used it myself in a live environment.

Data Error Checking

I've got a bit of an odd question. A friend of mine and I thought it would be funny to make a serial port kind of communication between computers using sound. Basically, computers emit a series of beeps to send data, and listen for beeps over a microphone to receive data. In short, the world's most annoying serial port. I have all of the basics worked out. I can filter out sounds of only one frequency and I have sent data from one computer to another. Although the transmission is fairly error free, being affected only by very loud noises, some issues still exist. My question is, what are some good ways to check the data for errors and, more importantly, recover from these errors.
My serial communication is very standard once you get past the fact that it uses sound waves. I use one start bit, 8 data bits, and one stop bit in every frame. I have already considered Cyclic Redundancy Checks, and I plan to factor this into my error checking, but CRCs don't account for some of the more insidious issues. For example, consider sending two bytes of data. You send the first one, and it is received correctly, but just after the stop bit of the first byte and before the start bit of the next, a large book falls on the floor, which the receiver interprets as a start bit; now the true start bit is read as part of the data, and the receiver could be reading garbage for many bytes to come. Eventually, a pause in the data could get things back on track.
That isn't the worst of it though. Bits can be dropped too, and most error checking schemes I can think of rely on receiving a certain number of bytes. What happens when the receiver keeps waiting for bytes that may not come?
So, you can see the complexity of this question. If you can direct me to any resources, or just give me a few tips, I would greatly appreciate your help.
A CRC is just a part of the solution. You can check for bad data, but then you have to do something about it. The transmitter has to re-send the data, and it needs to be told to do that. A protocol.
The starting point is that you split up the data into packets. A common approach is a start byte that indicates the start of the packet, followed by a packet number, followed by a length byte that indicates the length of the packet, then the data bytes and the CRC. The receiver sends an ACK or NAK back to indicate success.
This solves several problems:
you don't care about a bad start bit anymore; the pause you need to recover is always there
you start a timer when you receive the first bit or byte, and declare failure if the timer expires before the entire packet is received
the packet number helps you recover from bad ACK/NAK returns: the transmitter times out and resends the packet, and you can detect the duplicate
RFC 916 describes such a protocol in detail. I never heard of anybody actually implementing it (other than me). Works pretty well.
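As a sketch of that packet format, here is one way the receiver could validate a frame and spot a retransmitted duplicate; the start byte, CRC-8 polynomial, and layout are illustrative choices, not taken from RFC 916:

#include <stdint.h>
#include <stddef.h>

#define START_BYTE 0x7E                  /* assumed packet delimiter */

/* Bitwise CRC-8, polynomial 0x07, over the packet number, length and data. */
static uint8_t crc8(const uint8_t *data, size_t len)
{
    uint8_t crc = 0;
    while (len--) {
        crc ^= *data++;
        for (int i = 0; i < 8; i++)
            crc = (crc & 0x80) ? (uint8_t)((crc << 1) ^ 0x07) : (uint8_t)(crc << 1);
    }
    return crc;
}

/* Frame layout: START_BYTE, packet number, length, data bytes..., CRC.
   Returns 1 if the packet is valid and new (process it and ACK), 0 if it
   should be NAKed (corrupt) or merely re-ACKed (duplicate packet number). */
int check_packet(const uint8_t *pkt, size_t pkt_len, uint8_t *last_seen)
{
    if (pkt_len < 4 || pkt[0] != START_BYTE)
        return 0;                                    /* malformed */
    uint8_t num = pkt[1];
    uint8_t len = pkt[2];
    if (pkt_len != (size_t)len + 4)
        return 0;                                    /* truncated or oversized */
    if (crc8(pkt + 1, (size_t)len + 2) != pkt[3 + len])
        return 0;                                    /* failed the CRC */
    if (num == *last_seen)
        return 0;                                    /* duplicate: our ACK was probably lost */
    *last_seen = num;
    return 1;
}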
