I am seeing in the network traces, that my client is sending the following ACK frame to the DC:
Frame: Number = 87641, Captured Frame Length = 54, MediaType = ETHERNET
+ Ethernet: Etype = Internet IP (IPv4),DestinationAddress:[AA-AA-AA-AA-AA-AA],SourceAddress:[AA-AA-AA-AA-AA-AA]
+ Ipv4: Src = XXX.XXX.XXX.XXX(CLIENT), Dest = YYY.YYY.YYY.YYY, Next Protocol = TCP, Packet ID = 26829, Total IP Length = 40
- Tcp: [Bad CheckSum]Flags=...A...., SrcPort=14229, DstPort=LDAP(389), PayloadLen=0, Seq=4175359382, Ack=1985975784, Win=1357
- SrcPort: 14229
- DstPort: LDAP(389)
- SequenceNumber: 4175359382 (0xF8DEED96)
- AcknowledgementNumber: 1985975784 (0x765F95E8)
+ DataOffset: 80 (0x50)
+ Flags: ...A....
- Window: 1357
- Checksum: 0xDEAD, Bad
- UrgentPointer: 0 (0x0)
Is the Checksum 0xDEAD an indication of something?
TCP packet checksums have no special meaning, even if their hex representations spell an English word. If the packets with a bad check-sum are a common occurrence, you should investigate the source of the packets... often running TCP offload on your NIC will break the checksums passed to the sniffer (if TCP Offload is on the NIC doing the sniffing)... however, as a general rule, bad TCP checksums could also indicate a faulty driver, faulty software, or possibly malicious activity...
As a side note, faulty ethernet frame checksums may indicate a bad NIC, bad cable, or possible malicious activity as well.
Related
I am a student just beginning to dive into the network stack, so please forgive any misconceptions.
If you are using TCP as a transport layer protocol, and sending payloads that happen to be greater than the MTU lower down in the stack, the Network Layer protocol (IPv4 for example) will fragment the payload into separate IP datagrams. What happens if a single one of these "fragmented" IP datagrams is dropped?
Example
Sending 4000 byte datagram with MTU of 1500 bytes
[MF: 1 Offset: 0 Length: 1500]
[MF: 1 Offset: 185 Length: 1500] <--------- Dropped/Lost
[MF: 0 Offset: 370 Length: 1040]
Which would happen?
Only IP datagram #2 get re-transmitted
The entire 4000 byte datagram be fragmented again and entirely re-transmitted after TCP determines the datagram was lost
I was originally thinking that the individual fragment would be re-transmitted, however given that this is below the reliability ensured by TCP in the transport layer, I am now thinking that the entire TCP datagram is just dropped/re-transmitted (which seems inefficient).
As a follow up, is there any reason that this "fragmentation" occurs below the transport layer (specifically below TCP)? I feel like this could be handled most of the time by ensuring that packets are broken up into a generally acceptable MTU size above/at the transport layer. Upon some research I found some information on how TCP should have nothing to do with packetizing things, which I did not quite understand.
The entire 4000 byte datagram be fragmented again and entirely re-transmitted after TCP determines the datagram was lost. Although TCP should break it up into MTU fit-able chunks (aka segments) in the first place.
I'm working on a server/client socket application that is using Linux TUN interface.
Server gets packets directly from TUN interface and pass them to clients and clients put received packets directly in the TUN interface.
<Server_TUN---><---Server---><---Clients---><---Client_TUN--->
Sometimes the packets from Server_TUN need to be fragmented in IP layer before transmitting to a client.
So at the server I read a packet from TUN, start fragmenting it in the IP layer and send them via socket to clients.
When the fragmentation logic was implemented, the solution did not work well.
After starting Wireshark on Client_TUN I noticed for all incoming fragmented packets I get TCP Checksum error.
At the given screenshot, frame number 154 is claimed to be reassembled in in 155.
But TCP checksum is claimed to be incorrect!
At server side, I keep tcp data intact and for the given example, while you see the reverse in Wireshark, I've split a packet with 1452 bytes (including IP header) and 30 bytes (Including IP header)
I've also checked the TCP checksum value at the server and its exactly is 0x935e and while I did not think that Checksum offloading matters for incoming packets, I checked offloading at the client and it was off.
$ sudo ethtool -k tun0 | grep ": on"
scatter-gather: on
tx-scatter-gather: on
tx-scatter-gather-fraglist: on
generic-segmentation-offload: on
generic-receive-offload: on
tx-vlan-offload: on
tx-vlan-stag-hw-insert: on
Despite that, because of the solution is not working now, I don't think its caused by offload effect.
Do you have any idea why TCP checksum could be incorrect for fragmented packets?
Hopefully I found the issue. It was my mistake. Some tcp data was missing when I was coping buffers. I was tracing on the indexes and lengths but because of the changes in data, checksum value was calculating differently in the client side.
I'm injecting ICMP "Fragmentation needed, DF bit set" into the server and ideally server should start sending packets with the size mentioned in the field 'next-hop MTU' in ICMP. But this is not working.
Here is the server code:
#!/usr/bin/env python
import socket # Import socket module
import time
import os
range= [1,2,3,4,5,6,7,8,9]
s = socket.socket() # Create a socket object
host = '192.168.0.17' # Get local machine name
port = 12349 # Reserve a port for your service.
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
s.bind((host, port)) # Bind to the port
rand_string = os.urandom(1600)
s.listen(5) # Now wait for client connection.
while True:
c, addr = s.accept() # Establish connection with client.
print 'Got connection from', addr
for i in range:
c.sendall(rand_string)
time.sleep(5)
c.close()
Here is the client code:
#!/usr/bin/python # This is client.py file
import socket # Import socket module
s = socket.socket() # Create a socket object
host = '192.168.0.17' # Get local machine name
port = 12348 # Reserve a port for your service.
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
s.connect((host, port))
while 1:
print s.recv(1024)
s.close()
Scapy to inject ICMP:
###[ IP ]###
version= 4
ihl= None
tos= 0x0
len= None
id= 1
flags= DF
frag= 0
ttl= 64
proto= ip
chksum= None
src= 192.168.0.45
dst= 192.168.0.17
\options\
###[ ICMP ]###
type= dest-unreach
code= fragmentation-needed
chksum= None
unused= 1300
Send(ip/icmp)
Unused field shows as next-hop MTU in wireshark. Is server smart enough to check that DF Bit was not set when it was communicating with client and it is still receiving ICMP "Fragmentation needed, DF bit set" message? If it is not then why is server not reducing its packet size from 1500 to 1300?
First of all, let's answer your first question (is ICMP sent over TCP?).
ICMP runs directly over IP, as specified in RFC 792:
ICMP messages are sent using the basic IP header.
This can be a bit confusing as ICMP is classified as a network layer protocol rather than a transport layer protocol but it makes sense when taking into account that it's merely an addition to IP to carry error, routing and control messages and data. Thus, it can't rely on the TCP layer to transfer itself since the TCP layer depends on the IP layer which ICMP helps to manage and troubleshoot.
Now, let's deal with your second question (How does TCP come to know about the MTU if ICMP isn't sent over TCP?). I've tried to answer this question to the best of my understanding, with reliance on official specifications, but perhaps the best approach would be to analyze some open source network stack implementation in order to see what's really going on...
The TCP layer may come to know of the path's MTU value even though the ICMP message is not layered upon TCP. It's up to the implementation of OS the network stack to notify the TCP layer of the MTU so it can then use this value to update its MSS value.
RFC 1122 requires that the ICMP message includes the IP header as well as the first 8 bytes of the problematic datagram that triggered that ICMP message:
Every ICMP error message includes the Internet header and at least the first 8 data octets of the datagram that triggered the error; more than 8 octets MAY be sent; this header and data MUST be unchanged from the received datagram.
In those cases where the Internet layer is required to pass an ICMP error message to the transport layer, the IP protocol number MUST be extracted from the original header and used to select the appropriate transport protocol entity to handle the error.
This illustrates how the OS can pinpoint the TCP connection whose MSS should be updated, as these 8 bytes include the source and destination ports.
RFC 1122 also states that there MUST be a mechanism by which the transport layer can learn the maximum transport-layer message size that may be sent for a given {source, destination, TOS} triplet. Therefore, I assume that once an ICMP Fragmentation needed and DF set error message is received, the MTU value is somehow made available to the TCP layer that can use it to update its MSS value.
Furthermore, I think that the application layer that instantiated the TCP connection and taking use of it may handle such messages as well and fragment the packets at a higher level. The application may open a socket that expects ICMP messages and act accordingly when such are received. However, fragmenting packets at the application layer is totally transparent to the TCP & IP layers. Note that most applications would allow the TCP & IP layers to handle this situation by themselves.
However, once an ICMP Fragmentation needed and DF set error message is received by a host, its behavior as dictated by the lower layers is not conclusive.
RFC 5927, section 2.2 refers to RFC 1122, section 4.2.3.9 which states that TCP should abort the connection when an ICMP Fragmentation needed and DF set error message is passed up from the IP layer, since it signifies a hard error condition. The RFC states that the host should implement this behavior, but it is not a must (section 4.2.5). This RFC also states in section 3.2.2.1 that a Destination Unreachable message that is received MUST be reported to the TCP layer. Implementing both of these would result in the destruction of a TCP connection when an ICMP Fragmentation needed and DF set error message is received on that connection, which doesn't make any sense, and is clearly not the desired behavior.
On the other hand, RFC 1191 states this in regard to the required behavior:
RFC 1191 does not outline a specific behavior that is expected from the sending
host, because different applications may have different requirements, and
different implementation architectures may favor different strategies [This
leaves a room for this method-OA].
The only required behavior is that a host must attempt to avoid sending more
messages with the same PMTU value in the near future. A host can either
cease setting the Don't Fragment bit in the IP header (and allow
fragmentation by the routers in the way) or reduce the datagram size. The
better strategy would be to lower the message size because fragmentation
will cause more traffic and consume more Internet resources.
For conclusion, I think that the specification is not definitive in regard to the required behavior from a host upon receipt of an ICMP Fragmentation needed and DF set error message. My guess is that both layers (IP & TCP) are notified of the message in order to update their MTU & MSS values, respectively and that one of them takes upon the responsibility of retransmitting the problematic packet in smaller chunks.
Lastly, regarding your implementation, I think that for full compliance with RFC 1122, you should update the ICMP message to include the IP header of the problematic packet, as well as its next 8 bytes (though you may include more than just the first 8 bytes). Moreover, you should verify that the ICMP message is received before the corresponding ACK for the packet to which that ICMP message refers. In fact, just in order to be on the safe side, I would abolish that ACK altogether.
Here is a sample implementation of how the ICMP message should be built. If sending the ICMP message as a response to one of the TCP packets fails, I suggest you try sending the ICMP message before even receiving the TCP packet to which it relates at first, in order to assure it is received before the ACK. Only if that fails as well, try abolishing the ACK altogether.
The way i understand it, the host receives a "ICMP Fragmentation needed and DF set" but the message can come from a intermediate device(router) in the path, thus the host cant directly matched the icmp response with a current session, the icmp only contains the destination ip and mtu limit.
The host then adds a entry to the routing table for the destination ip that records the route and mtu with a expiry of 10min.
This can be observed on linux by asking for the specific route with ip route get x.x.x.x after doing a tracepath or ping that triggers the icmp response.
$ ip route get 10.x.y.z
10.z.y.z via 10.a.b.1 dev eth0 src 10.a.b.100
cache expires 598sec mtu 1300
I have problem with identifying communication established by TCP.
I have to identify first completed communication, for example first complete http communication.
I have dump .pcap file with capture. I know that communication should start by three way handshake ( SYN, SYN - ACK, ACK ) and then closing of communication by double FIN flag from both side.
But I have a lot of communication in that dump file.
So here is the question. Which things i need to remember to match exact one communication ?
I thought about source IP, destination IP, protocol, maybe port but i am not sure.
Thank you for every advice.
And sorry for my english.
You stated that you need:
To identify a particular conversation
To identify the first completed conversation
You can identify a particular TCP or UDP conversation by filtering for
the 5-tuple of the connection:
Source IP
Source Port
Destination IP
Destination Port
Transport (TCP or UDP)
As Shane mentioned, this is protocol dependent e.g. ICMP does not have the concept of
ports like TCP and UDP do.
A libpcap filter like the following would work for TCP and UDP:
tcp and host 1.1.1.1 and port 53523 and dst ip 1.1.1.2 and port 80
Apply it with tcpdump:
$ tcpdump -nnr myfile.pcap 'tcp and host 1.1.1.1 and port 53523 and dst ip 1.1.1.2 and port 80'
To identify the first completed connection you will have to follow the timestamps.
Using a tool like Bro to read a PCAP would yield the answer as it will list each connection
attempt seen (complete or incomplete):
$ bro -r myfile.pcap
$ bro-cut -d < conn.log | head -1
2014-03-14T10:00:09-0500 CPnl844qkZabYchIL7 1.1.1.1 57596 1.1.1.2 80 tcp http 0.271392 248 7775 SF F ShADadfF 14 1240 20 16606 (empty) US US
Use the flag data for TCP to judge whether there was a successful handshake and tear down.
For other protocols you can make judgements based on byte counts, sent and received.
Identifying the first completed communication is highly protocol specific. You are on the right track with your filters. If your protocol is a commonly used one there are plug ins called protocol analyzers and filters that can locate "conversations" for you from a pcap data stream. If you know approximate start time and end time that would help narrow it down too.
I've been doing some work with C# Networking using UDP. I'm getting on fine but need the answer to a couple of fundamental questions I'm having problems testing:
Currently I'm sending data in a ~16000 byte datagrams, which according to wireshark is getting split into several 1500 byte packets (because of max packet size limits) and then reassembled at the other end.
Am I right in understanding the datagram will be received complete at the other end OR not at all. IE it's an all or nothing thing. There is no chance of ending up with a fragmented datagram due to packet loss?
Therefore, I only need to ACK per datagram, rather than ensuring my datagrams are < 1500 bytes and ACK each one?
I've looked in a lot of places but there seems to be a lot of confusion between the differences between datagrams and the underlying packets...
Thanks for you help!
There is no chance of ending up with a fragmented datagram due to packet loss?
I believe that's true: that fragmentation and fragment reassembly is handled by the protocol layer below UDP, i.e. that it's handled by the "IP" layer, which will error if it fails to reassemble the packet-fragments into a datagram (for example, search for "fragment" in RFC 792).
http://www.pcvr.nl/tcpip/udp_user.htm#11_5 says,
"The IP layer at the destination performs the reassembly. The goal is to make fragmentation and reassembly transparent to the transport layer (TCP and UDP), which it is, except for possible performance degradation."
As you may now 16 bit UDP length field indicates that you can send a total of 65535 bytes. However, the data can be theoretically (sizeof(IP Header) + sizeof(UDP Header)) = 65535-(20+8) = 65507 bytes.
But this does not mean that all applications that are using UDP will send this amount of data as an example DNS packets limits to 512 bytes. This is because you don't get any ACK packets from server. This is one reason that packets may get lost in the network (packet transmission problems and loss). Secondly intermediate nodes may encapsulate datagrams inside of another protocol, as an example IPSEC or other protocols do that.
For UDP there is no ACK packets, so in your case if underlying application uses UDP you should not see any ACK packets. Secondly, some of the server limit their sizes to the max UDP packets depending on the application, so if you have data transfer from client to server you should see same bytes e.g 512 bytes. going and coming back in wireshark. Mostly, source makes the request and destination sends X bytes UDP datagrams back.
These links may be good for your questions:
Wireshark UDP analysis
RFC 1122 (states that 576 is the minimum maximum reassembly buffer size)
Am I right in understanding the datagram will be received complete at the other end OR not at all. IE it's an all or nothing thing. There is no chance of ending up with a fragmented datagram due to packet loss?
That is correct.
Therefore, I only need to ACK per datagram, rather than ensuring my datagrams are < 1500 bytes and ACK each one?
I don't understand this question. You need to ACK each datagram regardless of its size, and you should make them < 1500 bytes so they won't get fragmented. Otherwise you may never be able to transmit any specific datagrams at all, if it repeatedly gets fragmented and a fragment repeatedly gets lost.