add latency with tc without restricting bandwidth - networking

I'm trying to use tc to add latency to responses from a webserver in order to simulate a WAN.
I found a few related posts and tried out the command:
tc qdisc add dev eth0 root netem delay 100ms
I am using a 10G NIC to make a high volume of requests equalling about 3Gbps. After using tc to add latency, I see a massive drop off in throughput, and the latency of responses gets closer to about 3 seconds.
Am I missing something in the above command, it it limiting the rate / throughput in addition to adding latency?
N.B tc qdisc returns the following:
qdisc netem 8005: dev eth0 root refcnt 72 limit 1000 delay 100.0ms 10.0ms rate 10000Mbit

Firstly, I think tc cannot process packets at such high data rates. I experienced drop in throughput as well when I was playing with it few years ago. I used both 10GbE and 40GbE.
Unfortunately, I have no access to such hardware now.
I would suggest you to check the buffer sizes as you are emulating a delay of 100ms. packets are getting dropped somewhere and affecting your throughput. The increased latency can be because of the packet making it to the destination after being dropped many times (small buffer size) or being queued for a ver long time (very large buffer size)

Related

adding random noise for sending few Mbyte data using netem

I am sending a 30Mbyte file from one machine to another(both raspi 4) using scp. I tried to add noise to the network using netem on the client.
sudo tc qdisc change dev wlan0 root netem delay 50ms 30ms distribution normal
and it takes 07:17 minutes (or rate of 70.6KB/s)
However, when I increase the mean of the added noise normal distribution using
sudo tc qdisc change dev wlan0 root netem delay 500ms 30ms distribution normal
it only takes 00:33 minutes (or rate of 920.1KB/s). Logically it has to take at least the same time as the former distribution, however, it is way faster. I repeated it multiple times, and I got the same result.
I was wondering if I am using the tc in a correct way.

SMP affinity vs XPS on paired queues and TX queue selection control

I have a solarflare nic with paired rx and tx queues (8 sets, 8 core machine real machine, not hyperthreading, running ubuntu) and each set shares an IRQ number. I have used smp_affinity to set which irqs are processed by which core. Does this ensure that the transmit (tx) interrupts are also handled by the same core. How will this work with xps?
For instance, lets say the irq# is 115, set to core 2 (via smp_affinity). Say the nic chooses tx-2 for outgoing tcp packets, which also happens to have 115 irq number. If I have an xps setting saying tx-2 should be accessible by cpu 4, then which one takes precedence - xps or smp_affinity?
Also is there a way to see/set which tx queue is being used for a particular app/tcp connection? I have an app that receives udp data, processes it and sends tcp packets, in a very latency sensitive environment. I wish to handle the tx interrupts on the outgoing on the same cpu (or one on the same numa node) as the app creating this traffic, however, I have no idea how to find which tx queue is being used by this app for this purpose. While the receive side has indirection tables to set up rules, I do not know if there is a way to set the tx-queue selection and therefore pin it to a set of dedicated cpus.
You can tell the application the preferred CPU by setting the cpu affinity (taskset) or numa node affinity, and you can also set the IRQ affinities (in /proc/irq/270/node, or by using the old intel script floating around 'set_irq_affinity.sh' which is on github). This won't completely guarantee which irq / cpu is being used, but it will give you a good head start on it. If all that fails, to improve latency you might want to enable packet steering in the rxqueue so you get the packets in quicker to the correct cpu (/sys/class/net//queues/rx-#/rps_cpus and tx-#/xps-cpus). There is also the irqbalance program and more....it is a broad subject and i am just learning much of it myself.

netem loopback interface reordering packets

I have two apps communicating over UDP on the same host and I would like to send packets with varying delays (jitter) but no out of order packets. I have this rule for loopback interface:
sudo tc qdisc add dev lo root handle 1: netem delay 10ms 100ms
This seems to create the jitter successfully; however, there are out of order packets.. Basically I would like to recieve the packets on the receiver side in the order that they are sent from the sender, with just varying delay, i.e. with jitter.
I tried some basic reorder commands.. when I use reorder 100%, it does the reorder but there is no jitter in this case. If I use reorder command with anything less than 100%, then there is out of order packets.
It says here that if execute the following command, the packets will stay in order:
sudo tc qdisc add dev lo parent 1:1 pfifo limit 1000
But I still get out of order packets. Any help is much appreciated.
(§1) According to the official documentation - delay section this code
# tc qdisc change dev eth0 root netem delay 100ms 10ms.
... causes the added delay to be 100ms ± 10m
In your code the second ms command line argument is greater than the first.
(§2) Additionally, under the packet re-ordering section this code
# tc qdisc change dev eth0 root netem delay 100ms 75ms
... will cause some reordering. If the first packet gets a random delay of 100ms (100ms base - 0ms jitter) and the second packet is sent 1ms later and gets a delay of 50ms (100ms base - 50ms jitter); the second packet will be sent first.
Educated guess: (didn't test)
Switch the position of your last two arguments from
sudo tc qdisc add dev lo root handle 1: netem delay 10ms 100ms
to
sudo tc qdisc add dev lo root handle 1: netem delay 100ms 10ms
Although according to (§2) it is still possible that your packets can get reordered if you send them back-to-back in under 20ms: 1st packet gets 100+10=110ms delay, 2nd packet that you send 1ms later gets 100-10=90ms delay; 2nd packet will arrive before 1st one.

Ethernet device MTU setting implications in packet ingress path

What is the behavior of an ethernet device in the packet ingress path?
If the sender is sending a larger-than-MTU frame, then:
1) does the receiver's device drop it directly in hardware,
2) or accept it and send it up for the kernel's IP stack to handle it?
3) when is ICMP frag-required sent?
4) does it make a difference if the ethernet device is say on a intermediate router vs a end host?
It is impossible to answer 1), 2) definitively for all devices and networking stacks. The Ethernet standard defines an MTU of 1500 bytes so that is all you can rely on and in general you should expect frames with larger MTUs to be dropped.
However in reality it is likely in an end host that if the network interface hardware doesn't drop the oversized frame (which is typically referred to as giant) then it will make it's way up the software stack and be processed. Even though the stack may not drop the oversized frame due to it being over the MTU it may still get dropped for other reasons, e.g. due to an internal queue exhaustion.
Although the maximum MTU of an Ethernet frame has remained constant the maximum size of an Ethernet frame has grown over time to encompass features like 802.1Q VLAN single and double tags. MPLS further increases the frame size to include label stacks. This means intermediate switches are usually tolerant of frames that exceed the interface MTU by some amount. One major vendor effectively tolerates a max MTU of 2000 bytes by default in their current switches. Older switches may be less tolerant.
To get a definitive answer you'll need to do some research into the specific hardware and software that you care about.

Maximum packet size for a TCP connection

What is the maximum packet size for a TCP connection or how can I get the maximum packet size?
The absolute limitation on TCP packet size is 64K (65535 bytes), but in practicality this is far larger than the size of any packet you will see, because the lower layers (e.g. ethernet) have lower packet sizes.
The MTU (Maximum Transmission Unit) for Ethernet, for instance, is 1500 bytes. Some types of networks (like Token Ring) have larger MTUs, and some types have smaller MTUs, but the values are fixed for each physical technology.
This is an excellent question and I run in to this a lot at work actually. There are a lot of "technically correct" answers such as 65k and 1500. I've done a lot of work writing network interfaces and using 65k is silly, and 1500 can also get you in to big trouble. My work goes on a lot of different hardware / platforms / routers, and to be honest the place I start is 1400 bytes. If you NEED more than 1400 you can start to inch your way up, you can probably go to 1450 and sometimes to 1480'ish? If you need more than that then of course you need to split in to 2 packets, of which there are several obvious ways of doing..
The problem is that you're talking about creating a data packet and writing it out via TCP, but of course there's header data tacked on and so forth, so you have "baggage" that puts you to 1500 or beyond.. and also a lot of hardware has lower limits.
If you "push it" you can get some really weird things going on. Truncated data, obviously, or dropped data I've seen rarely. Corrupted data also rarely but certainly does happen.
At the application level, the application uses TCP as a stream oriented protocol. TCP in turn has segments and abstracts away the details of working with unreliable IP packets.
TCP deals with segments instead of packets. Each TCP segment has a sequence number which is contained inside a TCP header.
The actual data sent in a TCP segment is variable.
There is a value for getsockopt that is supported on some OS that you can use called TCP_MAXSEG which retrieves the maximum TCP segment size (MSS). It is not supported on all OS though.
I'm not sure exactly what you're trying to do but if you want to reduce the buffer size that's used you could also look into: SO_SNDBUF and SO_RCVBUF.
There're no packets in TCP API.
There're packets in underlying protocols often, like when TCP is done over IP, which you have no interest in, because they have nothing to do with the user except for very delicate performance optimizations which you are probably not interested in (according to the question's formulation).
If you ask what is a maximum number of bytes you can send() in one API call, then this is implementation and settings dependent. You would usually call send() for chunks of up to several kilobytes, and be always ready for the system to refuse to accept it totally or partially, in which case you will have to manually manage splitting into smaller chunks to feed your data into the TCP send() API.
According to http://en.wikipedia.org/wiki/Maximum_segment_size, the default largest size for a IPV4 packet on a network is 536 octets (bytes of size 8 bits). See RFC 879
Generally, this will be dependent on the interface the connection is using. You can probably use an ioctl() to get the MTU, and if it is ethernet, you can usually get the maximum packet size by subtracting the size of the hardware header from that, which is 14 for ethernet with no VLAN.
This is only the case if the MTU is at least that large across the network. TCP may use path MTU discovery to reduce your effective MTU.
The question is, why do you care?
If you are with Linux machines, "ifconfig eth0 mtu 9000 up" is the command to set the MTU for an interface. However, I have to say, big MTU has some downsides if the network transmission is not so stable, and it may use more kernel space memories.
One solution can be to set socket option TCP_MAXSEG (http://linux.die.net/man/7/tcp) to a value that is "safe" with underlying network (e.g. set to 1400 to be safe on ethernet) and then use a large buffer in send system call.
This way there can be less system calls which are expensive.
Kernel will split the data to match MSS.
This way you can avoid truncated data and your application doesn't have to worry about small buffers.
It seems most web sites out on the internet use 1460 bytes for the value of MTU. Sometimes it's 1452 and if you are on a VPN it will drop even more for the IPSec headers.
The default window size varies quite a bit up to a max of 65535 bytes. I use http://tcpcheck.com to look at my own source IP values and to check what other Internet vendors are using.
The packet size for a TCP setting in IP protocol(Ip4). For this field(TL), 16 bits are allocated, accordingly the max size of packet is 65535 bytes: IP protocol details

Resources