Packet loss showing at point of entry onto network - what could cause? - networking

A traffic source (server) with a 1gigabit NIC is attached to a 1gigabit port of a Cisco switch.
I mirror this traffic (SPAN) to a separate gigabit port on the same switch and then capture this traffic on a high throughput capture device (riverbed shark).
Wireshark analysis of the capture shows that there is a degree of packet loss - around 0.1% of TCP segments are being lost (based on sequence number analysis).
Given that this is the first point on the network for this traffic, what can cause this loss?
The throughput is not anywhere near 1gigabit, there are no port errors (which might indicate a dodgy patch lead).
In Richard Stevens' TCP/IP Illustrated, he mentions 'local congestion', where the TCP stack produces data faster than the underlying local queues can be emptied.
Could this be what I am seeing?
If so, is there a way to confirm it on an AIX box?
(Stevens' example used the Linux 'tc' command on a ppp0 device to demonstrate drops at the lower level.)

The loss can be anywhere along the network path.
If there is loss between two hosts, you should be seeing DUP ACKs. You need to see which side is sending the DUP ACKs. This would be the host that isn't receiving all the packets. (When a segment is not seen, the receiver sends a DUP ACK to ask for it again.)
There may be congestion somewhere else along the path. Look for output drops or CRC errors on interfaces.
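To make the DUP ACK check concrete, here is a minimal Python sketch. It assumes the capture has already been exported to simple (sender, ack number, payload length) records for a single TCP connection; that record layout and the name count_dup_acks are illustrative assumptions, not a Wireshark API. Wireshark's own duplicate ACK analysis also looks at window size and flags, which this sketch ignores.

```python
from collections import defaultdict

def count_dup_acks(records):
    """Count duplicate ACKs per sender.

    records: iterable of (sender, ack_number, payload_len) tuples in capture
    order, all from one TCP connection (hypothetical layout).
    A segment is counted as a duplicate ACK when it carries no payload and
    repeats the previous ACK number sent by the same host.
    """
    last_ack = {}
    dup_counts = defaultdict(int)
    for sender, ack, payload_len in records:
        if payload_len == 0 and last_ack.get(sender) == ack:
            dup_counts[sender] += 1
        last_ack[sender] = ack
    return dict(dup_counts)

# The server sends three data segments; the second one is lost, so the
# client keeps acknowledging 1461.
sample = [
    ("server", 1, 1460),
    ("client", 1461, 0),
    ("server", 1, 1460),
    ("client", 1461, 0),
    ("server", 1, 1460),
    ("client", 1461, 0),
]
print(count_dup_acks(sample))
```

The host with the non-zero count is the one that is missing data, so the loss lies somewhere between the other host and that point.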

Related

why does TCP over VXLAN in mininet stop sending after switching tunnel?

[Image: topology]
This is my experimental setup in Mininet. VM1 and VM2 are separate Virtualbox VM instances running on my computer connected by Bridged adapter, and S1 and S2 are connected with vxlan forwarding.
Then I used D-ITG on H1 and H2 to generate traffic. I send TCP traffic from H1 to H2 and use wireshark to capture. During a 10sec TCP flow, I used a python script that changes the tunnel id of the first rule on S1 from 100 to 200.
If the packet/sec rate and payload size are small enough, the TCP session does not seem to be affected, but when I start sending around 100 packets/sec, each with a payload of 64 bytes, TCP stops sending after receiving a dup ACK. Here is the wireshark capture:
[Image: wireshark1]
[Image: wireshark2]
On the link between H1 and S1 I received ICMP destination unreachable (fragmentation needed).
After the two errors, TCP stopped sending. I understand that the "previous segment not captured" is caused by the fact that when I alter the S1 routing table, there is some down time and packets are dropped by the switch. However, I don't understand why TCP does not initiate retransmission.
This does not happen if I reduce the packet rate or the payload to a smaller amount, or if I use UDP. Is this an issue with the TCP stack, or maybe D-ITG? Or maybe it is an issue with the sequence numbers? Is there a range beyond which much earlier packets that were never ACKed will not be retransmitted?
This problem has been bothering me for a while, so I hope someone here can maybe provide some clarification. Thanks a lot for reading XD.
I suspected it may be a problem with the mininet NICs, so I tried disabling TCP segmentation offload (TSO), and it worked much better. I suppose that the virtual NICs in mininet in a VM could not handle the large amount of traffic generated by D-ITG, so TSO may overload the NIC and cause segmentation errors.
This is just my speculation, but disabling TSO did help my case. Additional input is welcomed!
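For reference, TSO is usually switched off per interface with ethtool. The sketch below only builds and optionally runs that command from Python; the interface name h1-eth0 is an assumption based on Mininet's usual naming, and running it requires root and the ethtool binary on the host in question.

```python
import subprocess

def tso_off_command(interface):
    """Build the ethtool argv that turns off TCP segmentation offload."""
    return ["ethtool", "-K", interface, "tso", "off"]

def disable_tso(interface):
    """Run the command; needs root privileges and ethtool installed."""
    subprocess.run(tso_off_command(interface), check=True)

# Show the command for a hypothetical Mininet host interface.
print(" ".join(tso_off_command("h1-eth0")))
```

Inside a Mininet script the same command string could be passed to the host object's cmd() method instead of using subprocess.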

Detect faulty physical links with ping

I have a question about detecting physical problems on a link with ping.
If we have a fiber or cable with a problem that generates CRC errors on frames (visible in switch or router interface statistics), it's possible that all pings pass because of the small default ICMP packet size and the statistically lower chance of hitting an error. First, can you confirm this?
My second question: if I ping with a large size such as 65000 bytes, one ping will generate approximately 65000 / 1500 (MTU) = 43 frames as IP fragments. Since the entire IP packet is normally lost if one IP fragment is lost, is the probability of seeing packet loss clearly higher with a large ping? Is this assumption true?
The overall question is: can we detect a physical problem on a link more easily with large pings?
A link problem is a layer 1 or 2 problem. ping is a layer 3 tool, if you use it for diagnosis you might get completely unexpected results. Port counters are much more precise in diagnosing link problems.
That said, it's quite possible that packet loss for small ping packets is low while real traffic is impacted more severely.
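The statistical intuition can be sketched in Python. This is a simplified model, assuming every frame is lost independently with the same probability and looking at one direction only; the per-frame loss value 0.001 is a made-up illustration, and on a link with bit errors a full-size frame is itself more likely to be hit than a small one.

```python
import math

IP_HEADER = 20    # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo header

def fragment_count(icmp_payload, mtu=1500):
    """Number of IPv4 fragments needed for one echo request."""
    # Fragment data must be a multiple of 8 bytes (except the last one).
    per_fragment = (mtu - IP_HEADER) // 8 * 8
    return math.ceil((icmp_payload + ICMP_HEADER) / per_fragment)

def ping_loss_probability(frame_loss, fragments):
    """Chance that at least one fragment of a packet is lost (one direction)."""
    return 1 - (1 - frame_loss) ** fragments

for payload in (56, 65000):
    n = fragment_count(payload)
    p = ping_loss_probability(0.001, n)
    print(f"payload={payload} fragments={n} loss={p:.4f}")
```

Under these assumptions a 65000-byte ping is far more likely to fail than a default-size one, which matches the intuition in the question, although interface counters remain the more direct evidence.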
In addition to cable problems - that you'll need to repair - and a statistically random loss of packets there are also some configuration problems that can lead to CRC errors.
Most common in 10/100 Mbit networks is a duplex mismatch where one side uses half-duplex (HDX) transmission with CSMA/CD while the other one uses full-duplex (FDX) - once real data is transmitted, the HDX side will detect collisions, late collisions and possibly jabber while the FDX side will detect FCS errors. Throughput is very low, but ping with its low bandwidth usually works.
Duplex mismatches happen most often when one side is forced to full duplex, thus deactivating auto-negotiation and the other side defaults to half duplex.

Capturing data packets in closed LAN

In my college lab, all the PCs are connected via a hub. I want to capture data packets using Wireshark, but it only displays the interface of my own PC. How can I capture the packets of other PCs?
I've tried all the interfaces, and I can't get it to work.
Odds are you're connected to a switch rather than a hub. The problem there is that only packets intended for your network card's hardware (MAC) address and broadcast packets will be sent to your PC. The switch remembers the hardware address of devices plugged into it and performs packet forwarding based on those addresses. This vastly increases the potential bandwidth of your network segment, but makes snooping on other traffic more difficult. You will need to perform what's called ARP cache poisoning. Basically you need to trick every other computer connected to the switch to send its traffic to you rather than its true destination. You will then need to forward those packets not actually for you onto the correct destination otherwise it will take down the entire segment you're on and people will get nosy.
This type of redirection is possible, but it seems like you'll need to do quite a bit more research and understand exactly what is going on before attempting it. To get started, look into the Address Resolution Protocol; understand what a "layer 2" switch is doing; find out how to inject and reroute packets on the network; think about the consequences of getting caught.
If you're serious about moving forward, check out http://www.admin-magazine.com/Articles/Arp-Cache-Poisoning-and-Packet-Sniffing for some starting tips.

Understanding [TCP ACKed unseen segment] [TCP Previous segment not captured]

We are doing some load testing on our servers. I'm using tshark to capture data to a pcap file, then loading the pcap into the Wireshark GUI and going to Analyze -> Expert Info to see what errors or warnings show up.
I'm seeing various things that I don't completely understand yet.
Under Warnings I have:
779 Warnings for TCP: ACKed segment that wasn't captured (common at capture start)
446 TCP: Previous segment not captured (common at capture start)
An example is :
40292 0.000 xxx xxx TCP 90 [TCP ACKed unseen segment] [TCP Previous segment not captured] 11210 > 37586 [PSH, ACK] Seq=3812 Ack=28611 Win=768 Len=24 TSval=199317872 TSecr=4506547
We also ran the pcap file through a command that prints the data in columns on the command line.
Command:
tshark -i 1 -w file.pcap -c 500000
We saw a few entries in the tcp.analysis.lost_segment column, but not many.
Can anyone explain what might be going on? Is tshark unable to keep up with writing the data, is it some other issue, or is it a false positive?
That very well may be a false positive. Like the warning message says, it is common for a capture to start in the middle of a tcp session. In those cases it does not have that information. If you are really missing acks then it is time to start looking upstream from your host for where they are disappearing. It is possible that tshark can not keep up with the data and so it is dropping some metrics. At the end of your capture it will tell you if the "kernel dropped packet" and how many. By default tshark disables dns lookup, tcpdump does not. If you use tcpdump you need to pass in the "-n" switch. If you are having a disk IO issue then you can do something like write to memory /dev/shm. BUT be careful because if your captures get very large then you can cause your machine to start swapping.
My bet is that you have some very long running tcp sessions and when you start your capture you are simply missing some parts of the tcp session due to that. Having said that, here are some of the things that I have seen cause duplicate/missing acks.
Switches - (very unlikely but sometimes they get in a sick state)
Routers - more likely than switches, but not much
Firewall - More likely than routers. Things to look for here are resource exhaustion (license, cpu, etc)
Client side filtering software - antivirus, malware detection etc.
Another cause of "TCP ACKed Unseen" is the number of packets that may get dropped in a capture. If I run an unfiltered capture for all traffic on a busy interface, I will sometimes see a large number of 'dropped' packets after stopping tshark.
On the last capture I did when I saw this, I had 2893204 packets captured, but once I hit Ctrl-C, I got an "87581 packets dropped" message. That's a 3% loss, so when Wireshark opens the capture, it's likely to be missing packets and to report "unseen" packets.
As I mentioned, I captured a really busy interface with no capture filter, so tshark had to handle every packet. When I use a capture filter to remove some of the noise, I no longer get the error.
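The 3% figure can be checked with a couple of lines of Python. This sketch assumes the dropped packets are counted in addition to the captured ones, so the total seen on the interface is the sum of the two; the function name is illustrative.

```python
def capture_drop_rate(captured, dropped):
    """Fraction of packets on the interface that never reached the capture file."""
    return dropped / (captured + dropped)

rate = capture_drop_rate(2893204, 87581)
print(f"{rate:.1%}")
```

A non-trivial drop rate like this is enough to explain scattered "unseen segment" warnings without any loss on the network itself.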
[Image: ACKed unseen sample]
Hi guys! Just some observations from what I just found in my capture:
On many occasions, the packet capture reports “ACKed segment that wasn't captured” on the client side. This indicates that the client PC has sent a data packet and the server has acknowledged receipt of that packet, but the packet capture made on the client does not include the packet sent by the client.
Initially, I thought it indicated a failure of the PC to record a packet it sends into the capture because “e.g., machine which is running Wireshark is slow” (https://osqa-ask.wireshark.org/questions/25593/tcp-previous-segment-not-captured-is-that-a-connectivity-issue)
However, I then noticed that every time I see this “ACKed segment that wasn't captured” alert, I can also see a record of an “invalid” packet sent by the client PC.
In the capture example above, frame 67795 sends an ACK for 10384.
Even though Wireshark reports Bogus IP length (0), frame 67795 is reported to have length 13194.
Frame 67800 sends an ACK for 23524.
10384 + 13194 = 23578
23578 – 23524 = 54
54 is in fact the length of the Ethernet / IP / TCP headers (14 for Ethernet, 20 for IP, 20 for TCP).
So in fact, the frame 67796 does represent a large TCP packet (13194 bytes) which the operating system tried to put on the wire.
The NIC driver will fragment it into smaller 1500-byte pieces in order to transmit it over the network.
But Wireshark running on my PC fails to understand that it is a valid packet and parse it. I believe Wireshark running on a 2012 Windows server reads these captures correctly.
So after all, these “Bogus IP length” and “ACKed segment that wasn't captured” alerts were in fact false positives in my case.
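The header arithmetic in this answer can be written as a small Python check. It treats 10384 as the sequence number at which the large frame's data starts, which is how the calculation uses it, and it assumes plain 20-byte IP and TCP headers with no options; the function name is illustrative.

```python
ETH_HEADER = 14  # bytes
IP_HEADER = 20   # bytes, no IP options assumed
TCP_HEADER = 20  # bytes, no TCP options assumed

def expected_ack(seq, frame_len):
    """ACK number the peer should send after receiving this frame's payload."""
    payload = frame_len - (ETH_HEADER + IP_HEADER + TCP_HEADER)
    return seq + payload

# Values taken from the capture described in this answer.
print(expected_ack(10384, 13194))
```

If the result matches the ACK seen in the later frame, the oversized frame was a real segment handed to the NIC for offload rather than a corrupted packet.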
I just came across this and would like to share my observation of TCP ACKed unseen segment. In my case the client was trying to initiate a connection on the same source port and destination port it had used previously, so the server was confused and replied with the old TCP SEQ number, in effect saying it hadn't seen this new packet.

how can you count the number of packet losses in a file transfer?

One of my networks course projects has to do with 802.11 protocol.
My partner and I thought about exploring the "hidden terminal" problem by simulating it.
We've set up a private network. We have 2 wireless terminals that will attempt to send a file to a 3rd terminal that is connected to the router via ethernet. RTS/CTS will be disabled.
To compare results, we'd like to measure the number of packet collisions that occurred during the transfer, so as to conclude that they are due to RTS being disabled.
We've read that it is impossible to measure packet collisions, as a collision is basically noise. We'll have to make do with counting the packets that didn't receive an "ACK". Basically, the number of retransmissions.
How can we do that?
I suggested that instead of sending a file, we could make the 2 wireless terminals ping the 3rd terminal continually. The ping feature automatically counts the ping packets that didn't receive the "pong". Do you think it's a viable approach?
Thank you very much.
No, you'll get incorrect results. Ping is an application, i.e. working at application (highest) level of the network. 802.11 protocol operates at MAC layer - there are at least 2 layers separating between ping and 802.11. Whatever retransmissions happen at MAC layer - they are hidden by the layers above it. You'll see failure in ping only if all the retransmissions initiated by lower levels have failed.
You need to work on the same level that you're investigating - in your case it's the MAC layer. You can use a sniffer (google for it) to get the statistics you want.
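As a rough illustration of counting at the MAC layer: 802.11 frames carry a Retry flag in the Frame Control field, and a sniffer capture taken in monitor mode exposes it. The Python sketch below assumes you already have the raw 802.11 frames as bytes starting at the Frame Control field (any radiotap or similar capture header already stripped); the sample frames are made up for illustration.

```python
RETRY_FLAG = 0x08  # Retry bit in the flags byte (second byte of Frame Control)

def is_retry(frame):
    """frame: bytes of an 802.11 MAC frame starting at Frame Control."""
    return len(frame) >= 2 and bool(frame[1] & RETRY_FLAG)

def count_retries(frames):
    """Number of frames that are MAC-layer retransmissions."""
    return sum(1 for frame in frames if is_retry(frame))

# Three hypothetical data frames (first byte 0x08 = data type):
# flags 0x01 = To DS, flags 0x09 = To DS + Retry.
frames = [bytes([0x08, 0x01]), bytes([0x08, 0x09]), bytes([0x08, 0x09])]
print(count_retries(frames))
```

Comparing this count between runs with RTS/CTS enabled and disabled gives the retransmission figure the question is after, without relying on ping.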
