Packet loss issue - networking

My internet download speed is 8 Mbps, upload speed is 17.6 Mbps, ping is 2 ms, and jitter is also 2 ms. The problem is that when I run a packet loss check with this command: ping -n 100 8.8.8.8, it reports 100% packet loss, but it should be at or under 1%. What should I do to get it under 1%?

Probably ICMP (which ping uses by default, ICMP ECHO_REQUEST (type 8) to be specific) is blocked by a firewall somewhere along the path, which is why you see 100% packet loss.
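A quick way to confirm that it is an ICMP filter rather than real loss (assuming Windows, since -n is the Windows ping syntax, and assuming PowerShell is available) is to trace the path and test the same host over TCP:

tracert 8.8.8.8
Test-NetConnection -ComputerName 8.8.8.8 -Port 53

If the TCP test succeeds while every echo request times out, the path itself is fine and only ICMP is being dropped, so the 100% figure says nothing about real packet loss.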

Related

How to debug packets silently dropped by Oracle Linux 7 or the NIC driver?

Our real-time application is detecting lost/dropped packets during network spikes.
Our technology uses UDP multicast, where all our consumers subscribe to the multicast groups. Our servers have SolarFlare 40G 7142 cards. We've tuned the servers to have 128M IPv4 send and receive buffers, increased the reassembly memory, and more. During network spikes we see increased "packet reassemblies failed" and "packets dropped after timeout" counts from netstat -s. All other network statistics from the NIC and kernel look clean (0 discards, 0 bad bytes, 0 bad headers, etc.). Given the NIC statistics, we aren't sure why we are experiencing packet reassembly failures.
We have packet captures that capture the entire network as well as a packet capture on a mirror port of a test consumer server. We have verified that the packet captures do NOT show lost packets.
We suspect packets are getting silently dropped between the NIC and application level. Are there ways to get additional statistics from the NIC or the kernel that aren't reported by ethtool or netstat? I noticed that SolarFlare has the SolarCapture tool to perform packet capture at the NIC level and bypass the kernel. That requires a license and I'm not sure we have that.
Setup
Servers:
OS: Oracle Linux 7.3
NIC: SolarFlare SFN7142Q 40 GB cards (driver 4.12.2.1014)
Topology:
Spine Switches connecting to multiple leaf switches (40G switches)
8x producer applications on 1 leaf switch - all connecting at 40G
Multiple consumer servers on other leaf switches
sysctl
net.ipv4.ipfrag_high_thresh = 4194304
net.ipv4.ipfrag_low_thresh = 3145728
net.ipv4.ipfrag_max_dist = 2048
net.ipv4.ipfrag_secret_interval = 600
net.ipv4.ipfrag_time = 30
net.ipv4.tcp_rmem = 16777216 16777216 16777216
net.ipv4.tcp_wmem = 16777216 16777216 16777216
net.ipv4.udp_mem = 3861918 5149227 7723836
net.ipv4.udp_rmem_min = 134217728
net.ipv4.udp_wmem_min = 134217728
net.core.netdev_max_backlog = 100000
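
For the drop-hunting question above, a few counters often reveal loss between the NIC and the socket layer beyond the headline netstat -s numbers. The interface name below is a placeholder and the exact ethtool counter names vary by driver, so treat this as a sketch rather than a recipe:

ethtool -S ens1f0 | grep -i drop              # per-queue/driver drop counters (the sfc driver exposes its own)
cat /proc/net/softnet_stat                    # second column per CPU = packets dropped from the kernel backlog
netstat -su                                   # UDP RcvbufErrors = socket receive buffer overruns
perf record -e skb:kfree_skb -a -- sleep 10   # record where the kernel frees (drops) skbs during a spike; inspect with perf script

If RcvbufErrors or the softnet drop column climbs during spikes while the NIC counters stay at zero, the loss is happening in the kernel or at the socket rather than on the wire, which would be consistent with the clean packet captures.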

Why does TCP over VXLAN in Mininet stop sending after switching the tunnel?

(topology diagram)
This is my experimental setup in Mininet. VM1 and VM2 are separate VirtualBox VM instances running on my computer, connected by a bridged adapter, and S1 and S2 are connected with VXLAN forwarding.
I then used D-ITG on H1 and H2 to generate traffic. I send TCP traffic from H1 to H2 and use Wireshark to capture it. During a 10-second TCP flow, I used a Python script that changes the tunnel ID of the first rule on S1 from 100 to 200.
If the packet rate and payload size are small enough, the TCP session does not seem to be affected, but when I start sending around 100 packets/sec, each with a 64-byte payload, TCP stops sending after receiving a duplicate ACK. Here are the Wireshark captures:
(Wireshark captures: wireshark1, wireshark2)
On the link between H1 and S1 I received ICMP destination unreachable (fragmentation needed).
After the two errors, TCP stopped sending. I understand that the "previous segment not captured" is caused by the fact that when I alter the rule on S1 there is some downtime and packets are dropped by the switch. However, I don't understand why TCP does not retransmit.
This does not happen if I reduce the packet rate or the payload size, or if I use UDP. Is this an issue with the TCP stack, or maybe with D-ITG? Or is it an issue with the sequence numbers? Is there a point past which packets that were never ACKed will no longer be retransmitted?
This problem has been bothering me for a while, so I hope someone here can maybe provide some clarification. Thanks a lot for reading XD.
I suspected it might be a problem with the Mininet NICs, so I tried disabling TCP segmentation offload (TSO), and it worked much better. I suppose the virtual NICs in Mininet inside a VM cannot handle the large amount of traffic generated by D-ITG, so TSO may overload the NIC and cause segmentation errors.
This is just my speculation, but disabling TSO did help in my case. Additional input is welcome!
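
For reference, offloads on a Mininet host interface can be toggled with ethtool; the interface name below follows the usual Mininet naming (h1-eth0) and may differ in your topology:

ethtool -k h1-eth0 | grep -i segmentation   # show the current offload settings
ethtool -K h1-eth0 tso off gso off          # turn off TCP segmentation offload (and generic segmentation offload)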

Why "ping" is so big despite vast speed transmission?

"Ping" measures response delay of any host. And it use ICMP protocol and Network card should process it at hardware low level.
Why "ping" to some host is measured by 50 - 100 milliseconds, if speed internet of downloading file using TCP to same host is > 10 MiB.
If "ping" > 100 ms, then <10 packets per second can be transferred, and internet speed using TCP should be < 10 KiB.
Is "Ping" the logarithmic measurement ?
"Ping" should be less than 1 milli-second

How to set the ssthresh value in TCP

I am trying to trigger the TCP slow start congestion algorithm on my Raspberry Pi device. As documented in RFC 2581, this requires the ssthresh value to be greater than the congestion window (cwnd). So I changed /sys/module/tcp_cubic/parameters/initial_ssthresh to 65000 (with sudo nano), while cwnd was set to 10 (checked with ss -i). After these settings I tried to send a big packet of 19000 bytes from the Raspberry Pi. According to slow start, it first needs to send 2 packets to the destination device, then 4, then 8, etc.
But that's not what happens on the Raspberry Pi: it sends 10 packets. Did I do something wrong? In that case, how can I start the slow start algorithm?
Thanks
When cwnd is less than ssthresh, the connection is in slow start. When cwnd becomes greater than ssthresh, the connection goes into congestion avoidance.
What you're seeing is that newer versions of Linux set the initial congestion window to 10. Before 10 became the default, the initial congestion window was 3, and you could change it through an ip route command. I haven't tried it, but I'm guessing you can do the opposite here.
Long story short, your machine is doing slow start. It is just starting with a larger initial congestion window.
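
If you do want to see the classic doubling, ip route accepts an initcwnd option per route; the gateway and interface below are placeholders, so check ip route show for your actual default route first:

ip route show                                                  # note the existing default route
ip route change default via 192.168.1.1 dev eth0 initcwnd 2    # new connections start with cwnd = 2

With initcwnd 2 and an initial_ssthresh of 65000, a fresh connection should then grow 2, 4, 8, ... per round trip while in slow start.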

Why are MTU sizes checked by "netsh interface ip show subinterfaces" and "ping google.com -l 1472 -f" different?

I am checking the MTU size of a USB tethering connection, but I got different results with different commands...
By using "netsh interface ip show subinterfaces" I get the following results (Local Area Connection 8 is the tethering connection):
C:\Users\Chris>netsh interface ip show subinterfaces
MTU MediaSenseState Bytes In Bytes Out Interface
4294967295 1 0 1350760 Loopback Pseudo-Interface 1
1500 2 3756376356 10363121083 Wireless Network Connection
1500 5 0 0 Local Area Connection
1500 1 178477 238360 Local Area Connection 8
But by using "ping google.com -l 1472 -f" I got the following results:
C:\Users\Chris>ping google.com -l 1472 -f
Pinging google.com [216.58.220.142] with 1472 bytes of data:
Reply from 192.168.42.129: Packet needs to be fragmented but DF set.
Packet needs to be fragmented but DF set.
Packet needs to be fragmented but DF set.
Packet needs to be fragmented but DF set.
The tethering connection is the only working connection when I check. If the MTU size is 1500, the ping method should work... I am a little bit confused here. Could anyone tell me what the difference between the two methods is?
netsh shows the MTU of the interface itself. But your ping command sends packets through your interface out into the wider Internet. Somewhere along the path between your interface and google.com there is a link with an MTU smaller than 1500 bytes (1472 bytes of payload plus 28 bytes of ICMP and IP headers). This is called the path MTU.
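
You can narrow down the actual path MTU by lowering the payload until the don't-fragment ping succeeds; the sizes below are just example probes, and the path MTU is the largest working payload plus 28 bytes of headers:

ping google.com -l 1400 -f
ping google.com -l 1432 -f

If 1400 gets a reply and 1472 does not, bisect between them. Tethered and tunneled links commonly have an MTU below 1500, which is why the interface MTU and the usable path MTU disagree.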
