We are load testing our servers. I'm using tshark to capture traffic to a pcap file, then loading the pcap into the Wireshark GUI and going to Analyze -> Expert Info to see what errors or warnings show up.
I'm seeing various things that I do not completely understand yet.
Under Warnings I have:
779 Warnings for TCP: ACKed segment that wasn't captured (common at capture start)
446 TCP: Previous segment not captured (common at capture start)
An example is:
40292 0.000 xxx xxx TCP 90 [TCP ACKed unseen segment] [TCP Previous segment not captured] 11210 > 37586 [PSH, ACK] Seq=3812 Ack=28611 Win=768 Len=24 TSval=199317872 TSecr=4506547
We also ran the pcap file through a nice command that prints a column of data on the command line.
Command:
tshark -i 1 -w file.pcap -c 500000
We basically just saw a few entries in the tcp.analysis.lost_segment column, but not many.
Can anyone enlighten me as to what might be going on? Is tshark not able to keep up with writing the data, is it some other issue, or is it a false positive?
That may very well be a false positive. As the warning message says, it is common for a capture to start in the middle of a TCP session, in which case Wireshark does not have the earlier segments. If you really are missing ACKs, then it is time to start looking upstream from your host for where they are disappearing. It is also possible that tshark cannot keep up with the data and is dropping some packets. At the end of your capture it will tell you whether the kernel dropped packets, and how many. By default tshark disables DNS lookup; tcpdump does not, so if you use tcpdump you need to pass the "-n" switch. If you have a disk I/O issue, you can write the capture to memory, for example under /dev/shm. BUT be careful: if your captures get very large, you can cause your machine to start swapping.
My bet is that you have some very long-running TCP sessions, and when you start your capture you are simply missing the earlier parts of those sessions. Having said that, here are some of the things that I have seen cause duplicate/missing ACKs:
Switches - (very unlikely but sometimes they get in a sick state)
Routers - more likely than switches, but not much
Firewall - more likely than routers. Things to look for here are resource exhaustion (license, CPU, etc.)
Client side filtering software - antivirus, malware detection etc.
Another cause of "TCP ACKed Unseen" is packets being dropped during the capture. If I run an unfiltered capture of all traffic on a busy interface, I will sometimes see a large number of 'dropped' packets after stopping tshark.
On the last capture I did when I saw this, I had 2893204 packets captured, but once I hit Ctrl-C, I got an "87581 packets dropped" message. That's a 3% loss, so when Wireshark opens the capture, it's likely to be missing packets and to report "unseen" segments.
As I mentioned, I captured a really busy interface with no capture filter, so tshark had to sort through all the packets. When I use a capture filter to remove some of the noise, I no longer get the error.
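To put the numbers above in perspective, here is a minimal sketch that turns the captured and dropped counts into a percentage. The function name `drop_rate` is made up for illustration, and it assumes the loss is measured against the captured count, as in the post.

```python
def drop_rate(captured: int, dropped: int) -> float:
    """Return dropped packets as a percentage of the captured packets."""
    if captured <= 0:
        raise ValueError("captured must be a positive packet count")
    return 100.0 * dropped / captured


# Counts taken from the capture described above.
rate = drop_rate(captured=2893204, dropped=87581)
print(f"{rate:.2f}% of the captured packet count was reported as dropped")
```

Anything above a fraction of a percent is probably enough for Wireshark's TCP analysis to flag segments it never saw.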
Hi guys! Just some observations from what I just found in my capture:
On many occasions, the packet capture reports “ACKed segment that wasn't captured” on the client side. This flags the condition where the client PC has sent a data packet and the server has acknowledged receipt of that packet, but the packet capture made on the client does not include the packet sent by the client.
Initially, I thought it indicated a failure of the PC to record into the capture a packet it had sent because, “e.g., machine which is running Wireshark is slow” (https://osqa-ask.wireshark.org/questions/25593/tcp-previous-segment-not-captured-is-that-a-connectivity-issue).
However, I then noticed that every time I see this “ACKed segment that wasn't captured” alert, I can see a record of an “invalid” packet sent by the client PC.
In the capture example above, frame 67795 sends an ACK for 10384.
Even though Wireshark reports Bogus IP length (0), frame 67795 is reported to have length 13194.
Frame 67800 sends an ACK for 23524.
10384 + 13194 = 23578
23578 - 23524 = 54
54 is in fact the length of the Ethernet / IP / TCP headers (14 for Ethernet, 20 for IP, 20 for TCP).
So in fact, frame 67796 does represent a large TCP packet (13194 bytes) which the operating system tried to put on the wire.
The NIC driver will split it into smaller 1500-byte pieces in order to transmit it over the network.
But Wireshark running on my PC fails to understand that it is a valid packet and parse it. I believe Wireshark running on a Windows 2012 server reads these captures correctly.
So after all, these “Bogus IP length” and “ACKed segment that wasn't captured” alerts were in fact false positives in my case.
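The arithmetic above can be checked with a short sketch. The constant and function names are made up for illustration, and the header sizes assume plain Ethernet with IPv4 and TCP headers that carry no options, as in the post.

```python
ETHERNET_HEADER = 14  # bytes, no VLAN tag assumed
IPV4_HEADER = 20      # bytes, no IP options assumed
TCP_HEADER = 20       # bytes, no TCP options assumed
HEADERS = ETHERNET_HEADER + IPV4_HEADER + TCP_HEADER


def expected_ack(prev_ack: int, frame_len: int, header_len: int = HEADERS) -> int:
    """ACK number the peer should send once it has received the frame's payload."""
    payload_len = frame_len - header_len
    return prev_ack + payload_len


# Values quoted in the post: previous ACK 10384, oversized frame of 13194 bytes.
ack = expected_ack(prev_ack=10384, frame_len=13194)
print(HEADERS, ack)
```

If the computed value matches the ACK seen in the later frame (23524 here), the "unseen" data was carried by the oversized frame after all.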
I just came across this and would like to share my observation of TCP ACKed unseen segment. In my case the client was trying to initiate a connection using the same source port and destination port it had used previously, so the server was confused and replied with the old TCP sequence number, in effect saying "I haven't seen this new packet".
This question was posted on StackExchange - NetworkEngineering. People suggested that I post it here. Here is my original post.
I am trying to write a client that initiates a TCP connection and sends some data. On the server side, I am using nc listening on a certain port. I am able to complete the 3-way handshake, and netstat shows that the connection is established. However, after my client starts sending data, it never gets an ACK.
The client is implemented on top of DPDK, and thus bypasses the kernel stack. The server binds to a different NIC. The two NICs are directly connected. The TCP part is handled by my own code. Due to my lack of knowledge, the implementation is greatly simplified, in the sense that I set a lot of fields, such as the window size, to fixed numbers.
As a newbie in networking, I have no clue what could be going wrong, and so I am not sure what information I should provide to help you identify the problem. Here is a screenshot from Wireshark. My client is at 192.168.0.10:12345 and my server listens at 192.168.0.42:3456. No ACK is sent from the server side after packet 6.
Also, the reason for the incorrect FCS is that I had to pad zeros to the SYN and the first ACK packets, so that they are at least 64 bytes, which is a requirement from the client NIC.
I did a comparison between packets from my client and packets from nc client. It seems that for the first data packet, the only real difference is that mine does not have any TCP options, while the nc one carries a timestamp. Could this be the problem?
Please let me know if you spot anything that may cause this no-ACK issue.
Do long BPF filters slow down tcpdump?
I replay a packet trace where all the packets have ttl=k and wait for ICMP messages back. What I've been noticing is that if I use the following filter (on eth0):
(ip and ip[8]=$k and src host $myAddress) or (icmp and dst host $myAddress and icmp[0]=11)
...I always miss 20-30 packets among the sent packets, whereas if I just do:
ip
... and then do the exact above filtering offline on the capture file, I find all the packets I had sent.
Is this a known behaviour?
If tcpdump is not fast enough to pull captured packets off the queue, the kernel can drop some of them.
Look at the "XXXX packets dropped by kernel" message at the end of the dump to see whether any of them were actually lost.
Make sure to add the -n option to the command line. This avoids DNS resolution and will speed things up a little (depending on your network).
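As a rough illustration of that check, the sketch below parses the summary lines tcpdump prints when it exits. The sample text and its numbers are made up, and `parse_tcpdump_summary` is a hypothetical helper; it assumes the usual "N packets captured / received by filter / dropped by kernel" wording.

```python
import re

# Matches lines such as "25 packets dropped by kernel".
SUMMARY_RE = re.compile(
    r"^(\d+) packets? (captured|received by filter|dropped by kernel)$",
    re.MULTILINE,
)


def parse_tcpdump_summary(text: str) -> dict:
    """Map each summary label to its packet count."""
    return {label: int(count) for count, label in SUMMARY_RE.findall(text)}


# Hypothetical summary text, for illustration only.
sample = """1000 packets captured
1025 packets received by filter
25 packets dropped by kernel
"""

summary = parse_tcpdump_summary(sample)
if summary.get("dropped by kernel", 0) > 0:
    print("capture is incomplete:", summary["dropped by kernel"], "packets dropped")
```

A non-zero "dropped by kernel" count means the capture file itself is missing packets, so any loss seen in offline analysis may be a capture artifact rather than loss on the network.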
A traffic source (server) with a 1 gigabit NIC is attached to a 1 gigabit port of a Cisco switch.
I mirror this traffic (SPAN) to a separate gigabit port on the same switch and then capture it on a high-throughput capture device (Riverbed Shark).
Wireshark analysis of the capture shows that there is a degree of packet loss - around 0.1% of TCP segments are being lost (based on sequence number analysis).
Given that this is the first point on the network for this traffic, what can cause this loss?
The throughput is nowhere near 1 gigabit, and there are no port errors (which might indicate a dodgy patch lead).
In his TCP/IP Illustrated book, W. Richard Stevens mentions 'local congestion', where the TCP stack is producing data at a rate faster than the underlying local queues can be emptied.
Could this be what I am seeing?
If so, is there a way to confirm it on an AIX box?
(Stevens' example used the Linux 'tc' command on a ppp0 device to demonstrate drops at the lower level.)
The loss can be anywhere along the network path.
If there is loss between two hosts, you should be seeing DUP ACKs. You need to see which side is sending the DUP ACKs. This would be the host that isn't receiving all the packets. (When a packet is not seen, the receiver sends a DUP ACK to ask for the packet again.)
There may be congestion somewhere else along the path. Look for output drops or CRC errors on interfaces.
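A minimal sketch of the "which side is sending the DUP ACKs" check follows. It assumes the segments have already been exported as (sender, ACK number, payload length) tuples, which is a made-up format, and it uses a simplified definition of a duplicate ACK that ignores window updates and SACK.

```python
from collections import Counter


def count_dup_acks(segments):
    """Count, per sender, pure ACKs that repeat that sender's previous ACK number."""
    last_ack = {}
    dups = Counter()
    for sender, ack, payload_len in segments:
        # A duplicate ACK carries no data and repeats the previous ACK number.
        if payload_len == 0 and last_ack.get(sender) == ack:
            dups[sender] += 1
        last_ack[sender] = ack
    return dups


# Hypothetical trace: host B keeps acknowledging 2000, host A is sending data.
trace = [
    ("B", 1000, 0),
    ("B", 2000, 0),
    ("B", 2000, 0),
    ("B", 2000, 0),
    ("A", 500, 1460),
    ("A", 500, 1460),
]
dups = count_dup_acks(trace)
print(dups.most_common())
```

The host with the high count is the one not receiving all the packets, so the loss is somewhere on the path towards it.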
PSH is a way to send data via TCP. Besides that, I can find very little info on how to implement it properly.
Here is what interests me:
Let's say the server window is 8000 bytes, and I send 2 requests of 150 and 600 bytes. Do I get some sort of confirmation that the data has been received? Can I somehow trigger a confirmation?
I've seen some ACK packets which do not have PSH set but do contain some sort of payload data (Wireshark marks it as "TCP segment data"). Is this data passed on to the user, and if it is, why do we need the PSH flag?
TCP PSH generally doesn't 'work' at all. Berkeley-derived TCP implementations completely ignore it.
Source: W.R. Stevens, TCP/IP Illustrated, vol I: 20.5 PUSH Flag.
@Arsen: Answering the second part of your question, "why do we need PSH flag?"
The PSH flag in the TCP header informs the receiving host that the data should be pushed up to the receiving application immediately.
We are using the PSH flag to exchange timestamp values between two servers.
I am assuming that if we set the PSH flag, the packet won't wait in the receive buffer; it will be delivered directly to the receiver.
The data doesn't sit waiting in the receive buffer anyhow.
TCP apps must go out of their way to have the TCP layer bulk up a few packets and deliver full data buffers.
In fact it's somewhat frustrating to see applications allocate 64KB buffers to receive data and then see them getting a gazillion 1480/1472 byte messages.
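To illustrate what "going out of their way" can look like, here is a minimal sketch of a receive loop that keeps reading until a full buffer has arrived. `recv_exact` is a made-up helper name, and the local `socket.socketpair()` only stands in for a real TCP connection.

```python
import socket


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Keep calling recv() until exactly n bytes have been collected."""
    chunks = []
    remaining = n
    while remaining > 0:
        # recv() returns whatever is available, which may be far less than asked for.
        chunk = sock.recv(min(remaining, 65536))
        if not chunk:
            raise ConnectionError("peer closed before all bytes arrived")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


# Two small writes, as in the question above (150 and 600 bytes).
sender, receiver = socket.socketpair()
sender.sendall(b"x" * 150)
sender.sendall(b"y" * 600)
data = recv_exact(receiver, 750)
sender.close()
receiver.close()
print(len(data))
```

The application, not the PSH flag, decides here how much data makes up one logical message.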
What would be a good list of failure scenarios for testing a reliable UDP layer? I have thought of the cases below:
Drop data packets
Drop ACK/NAK packets
Send packets out of sequence
Drop initial handshake packets
Drop close / shutdown packets
Duplicate packets
Please help me identify other cases that a reliable UDP layer needs to handle.
The list you've given sounds pretty good. Also think about:
Very delayed packets (where most packets come through fine, but one or two are delayed by several minutes);
Very delayed duplicates (where the original came through quickly, but the duplicate arrived after several minutes delay);
Silent dropping of all packets above a certain size (both unidirectional and bidirectional cases);
Highly variable delays;
Sequence number wrapping tests.
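Several of these cases can be exercised with a small fault injector placed between the sender and the receiver under test. The sketch below is one possible shape for it, not a standard tool: `impair` and `seq_before` are made-up names, the probabilities are arbitrary, and delays are left out for brevity. `seq_before` compares sequence numbers modulo 2^32, which is the property a wrap test would check.

```python
import random


def impair(packets, drop=0.1, duplicate=0.1, reorder=0.1, seed=0):
    """Return a copy of packets with random drops, duplicates and adjacent swaps."""
    rng = random.Random(seed)  # seeded so that a failing test run can be replayed
    out = []
    for pkt in packets:
        if rng.random() < drop:
            continue  # simulate loss
        out.append(pkt)
        if rng.random() < duplicate:
            out.append(pkt)  # simulate a duplicate
    for i in range(len(out) - 1):
        if rng.random() < reorder:
            out[i], out[i + 1] = out[i + 1], out[i]  # simulate reordering
    return out


def seq_before(a: int, b: int, bits: int = 32) -> bool:
    """True if sequence number a precedes b, allowing for wrap-around."""
    mod = 1 << bits
    return a != b and ((b - a) % mod) < (mod >> 1)


damaged = impair(list(range(20)), seed=1)
print(damaged)
print(seq_before(0xFFFFFFF0, 5))  # a number just before the wrap precedes one just after it
```

Keeping the seed in the test log makes it possible to reproduce the exact sequence of faults that triggered a failure.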
Have you tried intentionally corrupting packets in transit?
Also, have you considered a scenario where only one-way communication is possible? In this case, the sending host thinks that the send failed, but the receiving end successfully processes the message. For instance:
1. Host A sends a message to host B
2. B successfully receives the message and replies with an ACK
3. The ACK gets dropped in the network
4. A waits for the timeout and re-sends the message (repeats steps 1-3)
5. Host A exceeds its retry count and thinks the send failed, but host B has in fact processed the message
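The scenario above can be sketched as a small in-memory simulation. The `Receiver` class and `send_with_retries` function are made up for illustration, and the network is reduced to a callback that says whether each ACK gets through. The receiver de-duplicates by message ID, which is one common way to keep the retries from being processed more than once.

```python
class Receiver:
    """Host B: processes each message ID once and always replies with an ACK."""

    def __init__(self):
        self.seen = set()
        self.processed = []

    def on_message(self, msg_id, payload):
        if msg_id not in self.seen:  # ignore retransmissions of processed messages
            self.seen.add(msg_id)
            self.processed.append(payload)
        return ("ACK", msg_id)


def send_with_retries(receiver, msg_id, payload, ack_delivered, max_retries=3):
    """Host A: resend until an ACK arrives or the retry count is exhausted."""
    for attempt in range(max_retries):
        receiver.on_message(msg_id, payload)  # the data packet always arrives
        if ack_delivered(attempt):            # the ACK may be lost on the way back
            return True
    return False


host_b = Receiver()
# Every ACK is dropped, so only one-way communication is possible.
send_ok = send_with_retries(host_b, 1, "hello", ack_delivered=lambda attempt: False)
print(send_ok, host_b.processed)
```

Host A reports failure even though host B holds the message, so a test for this case should check both sides rather than only the sender's return value.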
I had thought that UDP is a connectionless and unreliable protocol and that it does not require any specific transport handshake between hosts, and hence that there is no such thing as a reliable UDP protocol.