How are UDP messages received, exactly? [closed] - networking

So anywhere I read about UDP, people say this:
Messages can be received out of order
It's possible a message never arrives at all
The first one isn't clear to me. This is what can happen with TCP:
I send `1234`, client receives `12` and then `34`
So no problem; just prepend the message length and it's all good. After all, an integer is always 4 bytes, so even if the client receives the prepended length in two goes, it will know to keep reading until it has at least 4 bytes to know the message length.
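For concreteness, here's roughly the framing I mean (just a sketch; the helper names and the 4-byte big-endian length prefix are my own choices, not anything mandated by TCP):

```python
import socket
import struct

def send_msg(sock: socket.socket, payload: bytes) -> None:
    # Prepend a 4-byte big-endian length so the receiver knows
    # where this message ends, regardless of how TCP splits it.
    sock.sendall(struct.pack(">I", len(payload)) + payload)

def recv_exact(sock: socket.socket, n: int) -> bytes:
    # Keep calling recv() until we have exactly n bytes;
    # TCP may deliver them in any number of chunks.
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed mid-message")
        buf += chunk
    return buf

def recv_msg(sock: socket.socket) -> bytes:
    (length,) = struct.unpack(">I", recv_exact(sock, 4))
    return recv_exact(sock, length)
```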
Anyway, back to UDP: what's the deal when people say 'packets can be received out of order'?
A) Send `1234`, client receives `34` and then `12`
B) Send `1234` and `5678`, client receives `5678` and then `1234`
If it's A, I don't see how I can make UDP work for me at all. How would the client ever know what's what?

It's entirely possible that a network has many paths to reach a given point, so one datagram could take one route to reach the other end and another could take a different path. Given this, the last packet sent could arrive before another packet. UDP takes no measures to correct this, as there's no notion of a connection or of in-order delivery.
At this point it depends on how you send your data. For UDP, each send() or similar call sends one UDP datagram, and recv() receives one datagram. Datagrams can be reordered with respect to other datagrams, or disappear entirely, but data cannot be reordered or dropped within a datagram: you either receive exactly the message that was sent, or you don't receive it at all.
If you need datagrams/messages to arrive in order, you need to add a sequence number to your packets, queue and reorder them at the receiving end.
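A minimal sketch of that approach (the 4-byte sequence-number header and the reordering buffer are illustrative choices layered on top of plain UDP, not part of the protocol):

```python
import socket
import struct

MAX_DGRAM = 65507  # largest possible UDP payload over IPv4

def send_numbered(sock: socket.socket, addr, messages) -> None:
    # Tag each datagram with a 4-byte big-endian sequence number.
    for seq, payload in enumerate(messages):
        sock.sendto(struct.pack(">I", seq) + payload, addr)

def recv_in_order(sock: socket.socket, count: int) -> list:
    # Buffer datagrams keyed by sequence number, then emit them in order.
    # A real protocol would also time out on gaps (lost datagrams);
    # this sketch blocks forever if one never arrives.
    buffered = {}
    while len(buffered) < count:
        datagram, _ = sock.recvfrom(MAX_DGRAM)
        (seq,) = struct.unpack(">I", datagram[:4])
        buffered[seq] = datagram[4:]
    return [buffered[i] for i in sorted(buffered)]
```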

The usual metaphor is:
TCP is a telephone conversation: words arrive in the same order as they were spoken
UDP is sending a series of letters by mail: the letters may get lost, and those that do arrive can arrive in any order.
TCP also involves a connection: if the telephone line is disrupted by a thunderstorm, the connection breaks and has to be established again (you need to dial again).
UDP is connectionless and unreliable: if the mailman is hit by a truck, some letters may be lost. Some letters could also be picked up and delivered by other mailmen. Letters can even be dropped on the floor if your mailbox is full, or for no reason at all.

Related

Parsing TCP Packets back together

I am working on a tool that takes a pcap file (from wireshark in this case), and attempts to parse out data from the TCP packets.
Now in this case, I only care about the data in one direction. So my logic was to sort each Wireshark-captured packet into a list keyed by protocol-destIP-sourceIP-destPort-sourcePort.
So from this point, I now have a list of only packets for one direction on a particular port.
From there I just want to be able to walk through the bodies of the TCP payloads in order. Is it as simple as going in order by sequence numbers?
Would I simply take the first sequence number captured, add the payload size to it, and expect that to be the next TCP packet sent? Is there more to this that I am missing?
I was noticing that when sorting the packets this way, eventually I would come to a sequence number that doesn't make sense. I guess I could just assume that is the start of the next stream? I know it becomes more difficult if I have to consider traffic going back and forth... but in this case I only want to watch packets in one direction.
Wireshark captures packets on the wire. It might happen that packets don't arrive in sequential order, that packets are corrupt (bad checksum), that there are duplicate packets, and so on; and this is ignoring possible attacks designed to confuse the analysis. The TCP stack will take care of all these problems so that the application gets the right data, but Wireshark works outside the TCP stack. Thus, while in most cases your simple procedure will likely work (assuming that you at least check the TCP flags for the start and end of the connection), it might fail in some cases.
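For illustration, a rough version of that one-direction reassembly using scapy might look like this (it assumes a clean capture: no loss, no sequence-number wraparound, and retransmissions that simply overwrite each other):

```python
from scapy.all import rdpcap, IP, TCP

def one_direction_payload(pcap_path, src, dst, sport, dport):
    # Collect the payload of every segment flowing src:sport -> dst:dport.
    segments = {}
    for pkt in rdpcap(pcap_path):
        if IP not in pkt or TCP not in pkt:
            continue
        if (pkt[IP].src, pkt[IP].dst, pkt[TCP].sport, pkt[TCP].dport) \
                != (src, dst, sport, dport):
            continue
        data = bytes(pkt[TCP].payload)
        if data:
            # Keyed by sequence number, so retransmits overwrite duplicates.
            segments[pkt[TCP].seq] = data
    # Concatenate in sequence order. A robust tool would also check that
    # each segment starts exactly where the previous one ended.
    return b"".join(segments[seq] for seq in sorted(segments))
```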

What indicates the end of a TCP stream?

I'm attempting to make a low-level packet reader, sort of like wireshark, for learning purposes and fun.
Getting into HTTP parsing, I quickly found that there are these TCP streams, and that the data doesn't all come in one packet.
EDIT: I've been told that what I consider to be the stream is not the right term. David Schwartz referred to it as a Query, and EJP as a Sequence of Segments.
So I did a little reading on TCP, and I've read everywhere about the sequence and acknowledgment numbers, how they increase and wrap around etc..
Connection establishment and termination I already had knowledge of.
To be honest, I've never been too good at interpretation or at using Google, but I've been at this for a bit, and I just can't seem to understand how the server/client ends a TCP stream.
I'm asking this because, after sending the totality of the HTTP request, the server immediately replies with the page. I've been going back and forth through the packets in Wireshark and I can't seem to find what delimits this stream (nothing indicating length in the first packet, nor any difference between the penultimate and last packets).
Is there no actual delimiter in TCP, and you just get the total length from the HTTP headers' Content-Length?
As seen here, the first 3 packets are connection establishment.
The 3 after those are the sending of the HTTP request (the first being an ACK (the selected one), and the last 2 being PSH & ACK).
The two after those are ACKs from the server, followed by the entirety of the response.
The last one is my ACK and the connection is never closed for the time I "recorded".
Thank you.
The end of a TCP stream is marked by a segment whose FIN bit is set.
However, that isn't the answer to your real question, which is about the end of an HTTP request or response. That is determined by:
The Content-Length, if present, or
The final chunk in a chunked transfer encoding, if used, or
The FIN, if there is a body; otherwise, the blank line after the headers, as pointed out by @Barmar below.
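In code, those three cases look roughly like this (a hand-rolled sketch, not a real client: `read_body`, `recv_exact`, and `read_line` are my own helpers, `headers` is assumed to be a dict with lowercase keys, and real clients also handle trailers and malformed framing):

```python
def recv_exact(sock, n: int) -> bytes:
    # Read exactly n bytes, looping over recv().
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed mid-body")
        buf += chunk
    return buf

def read_line(sock) -> bytes:
    # Read one CRLF-terminated line, byte by byte (slow but simple).
    line = b""
    while not line.endswith(b"\r\n"):
        line += recv_exact(sock, 1)
    return line[:-2]

def read_body(sock, headers: dict) -> bytes:
    # Case 1: Content-Length gives the exact body size.
    if "content-length" in headers:
        return recv_exact(sock, int(headers["content-length"]))
    # Case 2: chunked encoding -- hex-sized chunks until a zero-size chunk.
    if headers.get("transfer-encoding") == "chunked":
        body = b""
        while (size := int(read_line(sock), 16)) > 0:
            body += recv_exact(sock, size)
            read_line(sock)        # consume the CRLF after each chunk
        read_line(sock)            # consume the line after the zero chunk
        return body
    # Case 3: no framing info -- the body ends at the FIN,
    # i.e. when recv() starts returning b"".
    body = b""
    while chunk := sock.recv(4096):
        body += chunk
    return body
```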

Filtering UDP packets accidentally sent to my port

I am designing a simple protocol on top of UDP, and now I've realized that someone else can send a packet to the port I'm listening on. Such a packet will obviously be invalid for my application (I'm not concerned about security right now).
Is there a way to filter out these invalid messages? I was thinking about adding a fixed magic number at the beginning of each packet, but how large should it be? Are 16 bits enough?
I believe the typical solution is to require a handshake (which could include a "long enough" magic number) at the beginning of the session. Of course, the "session" is something your protocol needs to keep track of; UDP does not have the concept. Keeping a list of (IP, port, last-packet-receive-time) for all current sessions should do it: then you can drop all packets from peers that have not done the handshake beforehand. This not only prevents random unknown app traffic from breaking your application, but will also prevent multiple legitimate peers from screwing up each other's traffic.
Additionally, you could add either a session ID or an increasing packet sequence number (with allowances for missing packets) to your packets if you need to ensure that the peer agrees with you on which session this is.
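A rough sketch of that scheme (the magic number, the handshake marker, and the timeout are all arbitrary illustrative choices):

```python
import struct
import time

MAGIC = 0x7E57C0DE           # arbitrary app-specific magic number
HANDSHAKE = b"H"             # arbitrary marker for a session-opening packet
SESSION_TIMEOUT = 30.0       # seconds of silence before a peer is forgotten

sessions = {}                # (ip, port) -> last packet receive time

def accept_datagram(data: bytes, peer: tuple):
    """Return the payload if this datagram is valid, else None (drop it)."""
    now = time.time()
    # Forget peers that have gone quiet.
    for p in [p for p, t in sessions.items() if now - t > SESSION_TIMEOUT]:
        del sessions[p]
    # Every packet must lead with our magic number, or it is some
    # other application's traffic that strayed onto our port.
    if len(data) < 4 or struct.unpack(">I", data[:4])[0] != MAGIC:
        return None
    # Unknown peers may only open a session with a handshake packet.
    if peer not in sessions and data[4:5] != HANDSHAKE:
        return None
    sessions[peer] = now
    return data[4:]          # payload with the magic number stripped
```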
Probably. Java uses 32 bits for .class files (0xCAFEBABE). But you only have around 534 bytes of payload in a practical UDP datagram, so you might need to save a couple.

TCP/IP 3-way handshake

Why is data not transferred during the 3rd part of TCP 3-way handshake?
e.g.
(A to B)SYN
(B to A)ACK+SYN
(A to B) ACK ... why can't data be transferred along with this ACK?
I've always believed it was to keep the session establishment phase separate from the data transfer phase, so that no real data is transferred until both ends of the session have agreed on the sequence numbers and session options, especially since arriving packets may be from a totally different, previous session that just happens to have the same endpoints.
However, on further investigation, I'm not entirely certain that transmitting data with the handshake packets is disallowed. The section on TCP connection establishment in my Internetworking with TCP/IP1 book contains the following snippet:
Because of the protocol design, it is possible to send data along with the initial sequence numbers in the handshake segments. In such cases, the TCP software must hold the data until the handshake completes. Once a connection has been established, the TCP software can release data being held and deliver it to a waiting application program quickly.
Since it's certainly possible to construct a TCP packet with SYN (or ACK) and data, this may well be allowed. I've never seen it happen in the wild but, then again, I've never seen a hairy-eared dwarf lemur in the wild either, though I'm assured they exist.
It may be that it's the sockets software that prevents data going out before the session is fully established but TCP appears to consider it valid. It appears you can send data with a SYN-ACK packet (phase 2 of the connection establishment) since you have the other end's sequence number and options. Likewise, sending data with the phase 3 ACK packet appears to be possible as well.
The reason the TCP software holds on to the data until the handshake is fully complete is probably the one mentioned above: only once both ends have agreed on the sequence numbers can you be sure that the data is not from a previous session.
1 Internetworking with TCP/IP, Volume 1: Principles, Protocols, and Architecture, 3rd edition, Douglas E. Comer, ISBN 0-13-216987-8.
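If you want to see that such a segment is at least constructible, a quick scapy sketch builds a SYN that carries data (the address and payload are placeholders; whether the peer's stack accepts the data is a separate question):

```python
from scapy.all import IP, TCP, Raw

# Build (not send) a SYN segment that also carries application data.
# Destination, port, and payload are placeholders for illustration only.
syn_with_data = (
    IP(dst="192.0.2.1")
    / TCP(dport=80, flags="S", seq=1000)
    / Raw(load=b"early data")
)
syn_with_data.show()  # dumps the layers, showing the payload riding on the SYN
```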

What factors other than latency and bandwidth affect network speed?

I have noticed that viewing images or websites hosted on US servers (I'm in Europe) is considerably slower. The main reason would be the latency due to the distance.
But if 1 packet takes n milliseconds to be received, can't this be alleviated by sending more packets simultaneously?
Does this actually happen, or are the packets sent one by one? And if it does, what determines how many packets can be sent simultaneously (something to do with the cable, I guess)?
But if 1 packet takes n milliseconds to be received, can't this be alleviated by sending more packets simultaneously?
Not in a boundless way, by TCP/IP standards, because there are algorithms that determine how much can be in flight and not yet acknowledged, to avoid overloading the whole network.
Does this actually happen, or are the packets sent one by one?
TCP can and does keep up to a certain amount of packets and data "in flight".
And if it does, what determines how many packets can be sent simultaneously (something to do with the cable, I guess)?
What cable? The same standards apply whether you're on cabled, wireless, or mixed sequences of connections (remember, your packet goes through many routers on its way to the destination, and the sequence of routers can differ among packets).
You can start your study of TCP at, e.g., Wikipedia. Your specific questions concern congestion control algorithms and standards; Wikipedia will give you pointers to all the relevant algorithms and RFCs, but the whole picture won't do you much good if you start studying at that spot without a lot of other understanding of TCP (e.g., its flow control concepts).
Wikipedia and similar encyclopedia/tutorial sites can only give you a summary of the summary, while RFCs are not designed to be readable or understandable to non-experts. If you care about TCP, I'd recommend starting your study with Stevens' immortal trilogy of books (though there are many other valid ones, Stevens' are by far my personal favorites).
The issue is parallelism.
Latency does not directly affect your pipe's throughput. For instance, a dump truck driving across the country has terrible latency, but wonderful throughput if you stuff it full of 2 TB tapes.
The problem is that your web browser can't start asking for things until it knows what to ask for. So, when you load a web page with ten images, you have to wait until the img tags arrive before you can send the requests for them. So everything is perceptibly slower, not because your connection is saturated, but because there's down time between making one request and the next.
A prefetcher helps alleviate this issue.
As far as "multiple packets at a time" are concerned, a single TCP connection will have many packets in transit at once, as specified by the window scaling algorithm the ends are using. But that only helps with one connection at a time...
TCP uses what's called a sliding window: basically, the amount of buffer space, X, that the receiver has to reassemble out-of-order packets. The sender can send X bytes past the last acknowledged byte (sequence number N, say). This way you can fill the pipe between sender and receiver with X unacknowledged bytes, under the assumption that the packets will likely get there, and if not, the receiver will let you know by not acknowledging the missing packets. On each response packet the receiver sends a cumulative acknowledgment, saying "I've got all the bytes up to byte X." This lets it acknowledge multiple packets at once.
Imagine a client sending 3 packets, X, Y, and Z, starting at sequence number N. Due to routing, Y arrives first, then Z, and then X. Y and Z will be buffered at the destination's stack, and when X arrives the receiver will ACK N plus the cumulative lengths of X, Y, and Z. This will bump the start of the sliding window, allowing the client to send additional packets.
It's possible with selective acknowledgment to ACK portions of the sliding window and ask the sender to retransmit just the lost portions. In the classic scheme, if Y was lost, the sender would have to resend Y and Z. Selective acknowledgment means the sender can resend just Y. Take a look at the Wikipedia page.
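Here's a toy illustration of that X/Y/Z scenario: a receiver that buffers out-of-order segments and returns a cumulative ACK (the class and the numbers are invented for the example, not any real stack's API):

```python
class Receiver:
    """Toy model of the receive side of a sliding window (illustrative only)."""

    def __init__(self, initial_seq: int):
        self.expected = initial_seq   # next byte number we still need
        self.out_of_order = {}        # buffered segments: start seq -> data

    def receive(self, seq: int, data: bytes) -> int:
        self.out_of_order[seq] = data
        # Consume every segment that is now contiguous with what we have.
        while self.expected in self.out_of_order:
            self.expected += len(self.out_of_order.pop(self.expected))
        return self.expected          # cumulative ACK: all bytes before this arrived

rx = Receiver(initial_seq=100)        # X=100..103, Y=104..107, Z=108..109
print(rx.receive(104, b"YYYY"))       # 100: Y arrived first, buffered
print(rx.receive(108, b"ZZ"))         # 100: Z buffered too, still waiting on X
print(rx.receive(100, b"XXXX"))       # 110: X fills the gap; one ACK covers all three
```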
Regarding speed, one thing that may slow you down is DNS. That adds an additional round-trip, if the IP isn't cached, before you can even request the image in question. If it's not a common site this may be the case.
TCP/IP Illustrated, Volume 1, by Richard Stevens is tremendous if you want to learn more. The title sounds funny, but the packet diagrams and annotated arrows from one host to the other really make this stuff easier to understand. It's one of those books that you can learn from and then end up keeping as a reference. It's one of my 3 go-to books for networking projects.
The TCP protocol will try to send more and more packets at a time, up to a certain amount (I think), until it starts noticing that they're being dropped (packets die in router/switch land, for example when queues overflow or their Time To Live expires), and then it throttles back. This way it determines the size of the window (bandwidth) that it can use. If the sending host notices a lot of dropped packets from its end, then the receiving host is just going to see it as a slow connection. The server could very well be blasting you with data; you're just not seeing it.
I guess parallel packet transmission is also possible (yes, there can be a limit on the number of packets sent at a time). You can find more information about packet transmission under these topics: message switching, packet switching, circuit switching, and virtual-circuit packet switching.