I need to send and receive very large data using udp. Unfortunately udp provides 8192 bytes per diagram, so there is need to divide data into smaller pieces.
I'm using Qt and QUdpSocket. There is a QByteArray with length of 921600 I want to send to client. I want to send 8192 bytes each time.
What is the fast way to split a QByteArray?
You should never need to explicitly split the data, just step through it 8 KB at a time. Typically the functions that write data to a socket (including QUdpSocket::writeDatagram, it seems) accept a pointer to the first byte and a byte count, so you can just provide a pointer into the array.
Do note that sending 8 KB datagrams is quite aggressive; they will very likely be fragmented at the IP layer which can affect delivery speed and reliability negatively.
Research the concept of "path MTU", and try to use that for the sends, it might be faster although it will result in more datagrams.
Actually the length field on a UDP header is 16 bits so a UDP datagram can be up to ~65k (minus the size of the header stuff).
However, as unwind pointed out, it will most likely be fragmented along the way to fit the smallest MTU on the path to the destination.
8192 bytes is the default receive buffer size for Windows operating systems. The MTU is likely 1500 bytes if you are using Ethernet. Any UDP datagram larger than that will be fragmented. If your datagram encounters a device along the route with an even smaller MTU, it will be fragmented yet again.
You can use the QByteArray.mid(int start, int len) method (see documentation here) to get a QByteArray of length len starting from start.
Just make len your datagram size and start with 0*len, 1*len, 2*len, ... until everything is sent.
Related
While experimenting with a UDP server that runs on esp32, I found that the size of the received packet is limited to 1500 bytes: 20 (IP header) + 8 (UDP header) + 1472 (data), (although in theory UDP as if can support packets data up to 64K). This means that in order to transfer a larger amount of data, the client must split it into several chunks and send them one after the other, and on the server side, this data will need to be restored. I think that the overhead of such a solution will be quite high. I also know that Toit provides TCP/IP connection. Naturally, the packet size is also limited in the case of TCP/IP. This is 64K (65535 bytes). Does Toit have any additional restrictions on the TCP/IP connection, or 64K value is fact also for Toit?
As described in this question/answer, it's a matter of avoiding packet fragmentation. Sending packages above this size will force the system to split them up into multiple fragments of size MTU, with each of them being individually unreliable. As memory is already very limited on embedded systems, sending large (> MTU) packages where all fragments has to arrive before it can be processed, can be very unfortunate for the overall application behavior as it can time out or go out-of-memory.
Instead the application should look at a streaming pipeline (perhaps even TCP to handle the unreliable aspects as well).
As TCP/IP is a streaming protocol, any sized "packages" can be sent, as they are automatically split into fragments of size MTU. Note that the data is received in "random"-sized packages, though the order of the bytes is fully preserved.
When transferring, say, 1GB worth of data over the internet this data is split into packets, each packet containing a small piece of data and aach of these packets are part of a frame.
Eg. Windows reports that you are transferring the file in 100kb/s over a TCP connection, but this appears to be the amount of data from the file being transferred per second, and does not seem to include the ip or tcp header, or ethernet frame.
What is the actual amount of traffic on the network needed to transfer at this speed? Or is that data actually already included in the transfer speed, but just small enough that it makes no significant difference?
Also, IP supports up to 1500 byte / packet (I think?), but what is the common size of data packets when loading, say, an HD image on reddit?
Sorry for the rather basic questions I probably should have figured out myself by now...
It depends on where you look at the transfer rate:
Task Manager will report all the transferred bytes (i.e. the sum of all the packets including their headers).
A file transfer program will report the transmitted payload.
Task Manager
If you look at Task Manger / Network, you can see the transmitted bytes together with the number of transmitted packets (unicast or non-unicast).
That data comes from the network driver (or at least something close to it), so it makes sense to report the total amount of data here (otherwise each packet would need to be inspected to calculate the payload).
There is also a graph showing the transfer rate. Those numbers could easily be compared with the reported numbers in file transfer software.
File transfer program
A file transfer program on the other hand, does not know the details about the packets being created in the lower layers (those could be any size). So the only option here is to report the amount of transmitted payload data / part of the file, which also makes more sense to the user.
Network packets
On normal networks (there could also be jumbo frames), a TCP-packet (full ethernet frame) is around 1500 bytes when fully loaded (on my system (IPv4) the packets are 1514 bytes with a total header size of 54 bytes -- 14 for the Ethernet header, 20 for the IP header and 20 for the TCP header). Those could be split in smaller packets along the way in the network, but in most cases they won't.
Transfer rate
When transferring a file (or other large datastream), on average 2 full packets (1514 bytes) will be sent each time, and 1 small packet (54 bytes) is received (the [ACK] packet). In this optimal case we have 2 x 1460 payload, and 2 x 54 bytes of overhead on the sending side + 54 bytes on the receiving side. When comparing to the maximum transfer rate of the internet connection, we also have to consider some latency.
Not all transmissions are optimal:
There could be packets that never arrived or where the checksum was wrong, so a retransmit would be needed.
In some cases data could be sent in smaller parts, causing a higher overhead/payload ratio (but with small chunks Nagle's algorithm could take care of that).
Certain software could be reading the file contents into small buffers (say 4096 bytes). Those could then be split in 2 x 1460 and 1 x 1176, introducing some extra overhead.
Conclusion
It's hard to tell or calculate the exact ratio transferred_bytes/payload. It depends on the quality of the internet connection (lost packets, retransmits), the software or API calls used to transfer the data, and even the underlying network (small frames vs jumbo frames for example).
A typical full-size TCP/IPv4 packet on the Internet is of size 1500B (maximum transmission unit (MTU)), of which (minimum) 20B are of TCP header and (minimum) 20B are of IPv4. This MTU was chosen to be compatible with Ethernet. Furthermore, there are application headers (e.g., HTTP for web, SIP/RTP/RTCP for voice call, etc.) included in this packet. The minimum MTU is 576B for IPv4 and 1280B for IPv6. One can see MTU on Linux with ifconfig command.
The best way to figure these values is by using a pcap tool/network analyzer such as Wireshark. Also refer to wiki pages or a good networking book for headers and fields of the protocols.
I'm pretty sure that the reported transer rate doen't include all the headers and overhead of the different layers in the protocal stack, since the reported thruput usually comes from some user-space application which would only get the actual data from the network stream object. It would need to do additional work to find out about all the headers and frames and other overhead that occurred in the different layers and affected the actual physical transmission.
I've been doing some work with C# Networking using UDP. I'm getting on fine but need the answer to a couple of fundamental questions I'm having problems testing:
Currently I'm sending data in a ~16000 byte datagrams, which according to wireshark is getting split into several 1500 byte packets (because of max packet size limits) and then reassembled at the other end.
Am I right in understanding the datagram will be received complete at the other end OR not at all. IE it's an all or nothing thing. There is no chance of ending up with a fragmented datagram due to packet loss?
Therefore, I only need to ACK per datagram, rather than ensuring my datagrams are < 1500 bytes and ACK each one?
I've looked in a lot of places but there seems to be a lot of confusion between the differences between datagrams and the underlying packets...
Thanks for you help!
There is no chance of ending up with a fragmented datagram due to packet loss?
I believe that's true: that fragmentation and fragment reassembly is handled by the protocol layer below UDP, i.e. that it's handled by the "IP" layer, which will error if it fails to reassemble the packet-fragments into a datagram (for example, search for "fragment" in RFC 792).
http://www.pcvr.nl/tcpip/udp_user.htm#11_5 says,
"The IP layer at the destination performs the reassembly. The goal is to make fragmentation and reassembly transparent to the transport layer (TCP and UDP), which it is, except for possible performance degradation."
As you may now 16 bit UDP length field indicates that you can send a total of 65535 bytes. However, the data can be theoretically (sizeof(IP Header) + sizeof(UDP Header)) = 65535-(20+8) = 65507 bytes.
But this does not mean that all applications that are using UDP will send this amount of data as an example DNS packets limits to 512 bytes. This is because you don't get any ACK packets from server. This is one reason that packets may get lost in the network (packet transmission problems and loss). Secondly intermediate nodes may encapsulate datagrams inside of another protocol, as an example IPSEC or other protocols do that.
For UDP there is no ACK packets, so in your case if underlying application uses UDP you should not see any ACK packets. Secondly, some of the server limit their sizes to the max UDP packets depending on the application, so if you have data transfer from client to server you should see same bytes e.g 512 bytes. going and coming back in wireshark. Mostly, source makes the request and destination sends X bytes UDP datagrams back.
These links may be good for your questions:
Wireshark UDP analysis
RFC 1122 (states that 576 is the minimum maximum reassembly buffer size)
Am I right in understanding the datagram will be received complete at the other end OR not at all. IE it's an all or nothing thing. There is no chance of ending up with a fragmented datagram due to packet loss?
That is correct.
Therefore, I only need to ACK per datagram, rather than ensuring my datagrams are < 1500 bytes and ACK each one?
I don't understand this question. You need to ACK each datagram regardless of its size, and you should make them < 1500 bytes so they won't get fragmented. Otherwise you may never be able to transmit any specific datagrams at all, if it repeatedly gets fragmented and a fragment repeatedly gets lost.
What is the maximum packet size for a TCP connection or how can I get the maximum packet size?
The absolute limitation on TCP packet size is 64K (65535 bytes), but in practicality this is far larger than the size of any packet you will see, because the lower layers (e.g. ethernet) have lower packet sizes.
The MTU (Maximum Transmission Unit) for Ethernet, for instance, is 1500 bytes. Some types of networks (like Token Ring) have larger MTUs, and some types have smaller MTUs, but the values are fixed for each physical technology.
This is an excellent question and I run in to this a lot at work actually. There are a lot of "technically correct" answers such as 65k and 1500. I've done a lot of work writing network interfaces and using 65k is silly, and 1500 can also get you in to big trouble. My work goes on a lot of different hardware / platforms / routers, and to be honest the place I start is 1400 bytes. If you NEED more than 1400 you can start to inch your way up, you can probably go to 1450 and sometimes to 1480'ish? If you need more than that then of course you need to split in to 2 packets, of which there are several obvious ways of doing..
The problem is that you're talking about creating a data packet and writing it out via TCP, but of course there's header data tacked on and so forth, so you have "baggage" that puts you to 1500 or beyond.. and also a lot of hardware has lower limits.
If you "push it" you can get some really weird things going on. Truncated data, obviously, or dropped data I've seen rarely. Corrupted data also rarely but certainly does happen.
At the application level, the application uses TCP as a stream oriented protocol. TCP in turn has segments and abstracts away the details of working with unreliable IP packets.
TCP deals with segments instead of packets. Each TCP segment has a sequence number which is contained inside a TCP header.
The actual data sent in a TCP segment is variable.
There is a value for getsockopt that is supported on some OS that you can use called TCP_MAXSEG which retrieves the maximum TCP segment size (MSS). It is not supported on all OS though.
I'm not sure exactly what you're trying to do but if you want to reduce the buffer size that's used you could also look into: SO_SNDBUF and SO_RCVBUF.
There're no packets in TCP API.
There're packets in underlying protocols often, like when TCP is done over IP, which you have no interest in, because they have nothing to do with the user except for very delicate performance optimizations which you are probably not interested in (according to the question's formulation).
If you ask what is a maximum number of bytes you can send() in one API call, then this is implementation and settings dependent. You would usually call send() for chunks of up to several kilobytes, and be always ready for the system to refuse to accept it totally or partially, in which case you will have to manually manage splitting into smaller chunks to feed your data into the TCP send() API.
According to http://en.wikipedia.org/wiki/Maximum_segment_size, the default largest size for a IPV4 packet on a network is 536 octets (bytes of size 8 bits). See RFC 879
Generally, this will be dependent on the interface the connection is using. You can probably use an ioctl() to get the MTU, and if it is ethernet, you can usually get the maximum packet size by subtracting the size of the hardware header from that, which is 14 for ethernet with no VLAN.
This is only the case if the MTU is at least that large across the network. TCP may use path MTU discovery to reduce your effective MTU.
The question is, why do you care?
If you are with Linux machines, "ifconfig eth0 mtu 9000 up" is the command to set the MTU for an interface. However, I have to say, big MTU has some downsides if the network transmission is not so stable, and it may use more kernel space memories.
One solution can be to set socket option TCP_MAXSEG (http://linux.die.net/man/7/tcp) to a value that is "safe" with underlying network (e.g. set to 1400 to be safe on ethernet) and then use a large buffer in send system call.
This way there can be less system calls which are expensive.
Kernel will split the data to match MSS.
This way you can avoid truncated data and your application doesn't have to worry about small buffers.
It seems most web sites out on the internet use 1460 bytes for the value of MTU. Sometimes it's 1452 and if you are on a VPN it will drop even more for the IPSec headers.
The default window size varies quite a bit up to a max of 65535 bytes. I use http://tcpcheck.com to look at my own source IP values and to check what other Internet vendors are using.
The packet size for a TCP setting in IP protocol(Ip4). For this field(TL), 16 bits are allocated, accordingly the max size of packet is 65535 bytes: IP protocol details
How do you get the maximum number of bytes that can be passed to a sendto(..) call for a socket opened as a UDP port?
Use getsockopt(). This site has a good breakdown of the usage and options you can retrieve.
In Windows, you can do:
int optlen = sizeof(int);
int optval;
getsockopt(socket, SOL_SOCKET, SO_MAX_MSG_SIZE, (int *)&optval, &optlen);
For Linux, according to the UDP man page, the kernel will use MTU discovery (it will check what the maximum UDP packet size is between here and the destination, and pick that), or if MTU discovery is off, it'll set the maximum size to the interface MTU and fragment anything larger. If you're sending over Ethernet, the typical MTU is 1500 bytes.
On Mac OS X there are different values for sending (SO_SNDBUF) and receiving (SO_RCVBUF).
This is the size of the send buffer (man getsockopt):
getsockopt(sock, SOL_SOCKET, SO_SNDBUF, (int *)&optval, &optlen);
Trying to send a bigger message (on Leopard 9216 octets on UDP sent via the local loopback) will result in "Message too long / EMSGSIZE".
As UDP is not connection oriented there's no way to indicate that two packets belong together. As a result you're limited by the maximum size of a single IP packet (65535). The data you can send is somewhat less that that, because the IP packet size also includes the IP header (usually 20 bytes) and the UDP header (8 bytes).
Note that this IP packet can be fragmented to fit in smaller packets (eg. ~1500 bytes for ethernet).
I'm not aware of any OS restricting this further.
Bonus
SO_MAX_MSG_SIZE of UDP packet
IPv4: 65,507 bytes
IPv6: 65,527 bytes