Receiving TCP segments bigger than MTU with libpcap - http

Hello fellow network adventurers,
I'm implementing a network attack that ARP-spoofs a gateway and a victim, filters the HTTP data, and reassembles the web pages in my browser (also known as webspy).
However, I'm having some issues with libpcap. When I receive the packets with TCP segments containing the HTTP data, some of them are bigger than the MTU! Like 1922, 2878 and even 4909 bytes.
At first, I thought that these were reassembled HTTP packets, handed over by the kernel. But, according to this post, libpcap doesn't reassemble packets, so it won't give me an entire, well-formed packet with the whole HTTP response to a given request.
For testing, I printed all the packets that are bigger than the MTU. All of them contained normal data (CSS, JS, HTML, images, ...).
So what the hell is going on? What are these big guys? I've been struggling with this for a few days.
BONUS QUESTION: Will I really need to reassemble all this HTTP data myself?

However, I'm having some issues with libpcap. When I receive the packets with TCP segments containing the HTTP data, some of them are bigger than the MTU! Like 1922, 2878 and even 4909 bytes.
Your network adapter may be acting as a TCP offload engine, reassembling multiple incoming TCP segments and handing one reassembled segment to the host. At least on Linux, the networking stack might be performing Large Receive Offload, and if that's done before handing packets to "taps" (the PF_PACKET sockets used by libpcap on Linux), you'd get the reassembled segments.
For your program, this shouldn't be an issue, given that...
Will I really need to reassemble all this HTTP data myself?
...you will need to reassemble all the components of an HTTP request or reply yourself.
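As a rough illustration of what doing the reassembly yourself involves, here is a minimal Python sketch. It assumes the TCP payloads of one direction of one connection have already been put in order, and that every message announces its body size with a Content-Length header; chunked transfer encoding and responses that end when the connection closes are not handled. The function name split_http_messages is made up for this example.

```python
def split_http_messages(chunks):
    """Join in-order TCP payload chunks and cut them into HTTP messages.

    Assumption: every message has a Content-Length header or no body.
    Chunk sizes do not matter, so coalesced segments larger than the MTU
    are treated exactly like small ones.
    """
    buf = b"".join(chunks)
    messages = []
    while buf:
        head, sep, rest = buf.partition(b"\r\n\r\n")
        if not sep:
            break  # header block is not complete yet
        length = 0
        for line in head.split(b"\r\n")[1:]:  # skip the start line
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"content-length":
                length = int(value.strip())
        if len(rest) < length:
            break  # body is not complete yet
        messages.append((head, rest[:length]))
        buf = rest[length:]
    return messages
```

Because the function only looks at the byte stream, it does not care whether a response arrived as many 1460-byte segments or as a few large coalesced ones.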

Related

Reassembly of segments

I am working on an application that intercepts various kinds of traffic. Recently I have been receiving out-of-order segments. This traffic is over TCP. The SIP header is split across multiple segments. I am trying to find an established procedure for reassembling packets that arrive out of order so that I can display them in my application. To clarify, the data is segmented by TCP. By receiving out of order, I mean:
SIP INVITE header first half received later, second-half earlier.
TCP seq and ack are such that the segment received later is expected to be received first.
I would greatly appreciate any leads towards established protocols to implement this.
I suspect you may need to look deeper into your architecture, as TCP is designed to deliver data to the application in order.
One thing in particular to check is whether you are using multiple TCP connections in some way, maybe to boost bandwidth - this could allow out of order delivery if different packets could take different TCP connections, but within a TCP connection delivery should still be in order.
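For an application that sees the raw segments rather than the delivered byte stream, putting them back in order comes down to sorting by sequence number. The following is a minimal Python sketch of that idea, not an established library API: the class name StreamReassembler is hypothetical, and it assumes one direction of one connection, a known initial sequence number, no sequence-number wraparound, and retransmissions that carry identical bytes.

```python
class StreamReassembler:
    """Reorder TCP payloads of one direction of one connection.

    Assumptions: `isn` is the sequence number of the SYN, sequence numbers
    do not wrap during the capture, and retransmitted bytes are identical.
    """

    def __init__(self, isn):
        self.next_seq = isn + 1   # the first data byte follows the SYN
        self.pending = {}         # seq -> payload that arrived ahead of its turn
        self.data = bytearray()   # contiguous bytes reassembled so far

    def add(self, seq, payload):
        """Record one segment and return all contiguous bytes known so far."""
        if len(payload) > len(self.pending.get(seq, b"")):
            self.pending[seq] = payload
        for s in sorted(self.pending):
            if s > self.next_seq:
                break  # gap: an earlier segment is still missing
            p = self.pending.pop(s)
            skip = self.next_seq - s  # bytes already delivered (overlap)
            if skip < len(p):
                self.data += p[skip:]
                self.next_seq = s + len(p)
        return bytes(self.data)
```

With the SIP example above, feeding the second half first returns nothing, and feeding the first half afterwards returns the complete header.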

UDP numbered segments?

My firewall textbook says: "UDP breaks a message into numbered segments so that it can be transmitted."
My understanding was UDP had no sequence or other numbering scheme? That data was broken into packets and sent out with no ordered reconstruction on the other end, at least on this level. Am I missing something?
The book is just wrong here. The relevant section says:
User Datagram Protocol (UDP)—This protocol is similar to TCP in that it handles the addressing of a message. UDP breaks a message into numbered segments so that it can be transmitted. It then reassembles the message when it reaches the destination computer.
UDP does not include any mechanism to segment or reassemble messages; each message is sent as a single UDP datagram. If you look at the UDP "packet" (technically datagram) structure on page 108, there's no segment number or anything like that.
Mind you, segmentation can happen at other layers, either above or below UDP:
IP packets can be fragmented if they're too big for a network link's MTU (maximum transmission unit). This can happen to IP packets that contain UDP, TCP, or whatever. This is actually relevant for firewalls because creative fragmentation can sometimes be used to bypass packet filtering rules.
Some protocols that run on top of UDP also use something like numbered segments. For example, TFTP (trivial file transfer protocol) breaks files into "blocks", and transmits a block number in the header for each block. (And the receiver responds acknowledging the block number it's received -- it's like a drastically simplified version of TCP.) But this is part of the TFTP protocol, not part of UDP.
QUIC is another example of a protocol that runs over UDP and supports segmentation (and multiple connections, and...), and each packet contains a packet number. But again it's part of the QUIC protocol, not UDP.
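To make the point concrete that numbering lives in the protocol above UDP, here is a minimal Python sketch of application-level block numbering in the spirit of TFTP. It is not the real TFTP packet format (which also carries an opcode and uses acknowledgements); the function names and the 2-byte header layout are assumptions made for this example.

```python
import struct

BLOCK_SIZE = 512  # TFTP's default block size, reused here for illustration


def split_into_blocks(message, block_size=BLOCK_SIZE):
    """Return UDP payloads: a 2-byte big-endian block number, then the data."""
    blocks = []
    starts = range(0, len(message), block_size)
    for number, start in enumerate(starts, start=1):
        header = struct.pack("!H", number)
        blocks.append(header + message[start:start + block_size])
    return blocks


def join_blocks(payloads):
    """Rebuild the message from payloads received in any order."""
    numbered = {}
    for payload in payloads:
        (number,) = struct.unpack("!H", payload[:2])
        numbered[number] = payload[2:]  # a duplicate simply overwrites
    return b"".join(numbered[n] for n in sorted(numbered))
```

Each returned payload would be sent as its own UDP datagram; UDP itself never sees the block numbers.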

UDP - Optional Checksum

From what I have read about UDP, it has no error handling, no checking for things like sequence of data sent/received, no checking for duplicate packets, no checking for corrupt packets and obviously no guarantee that the packets sent are even received...
So with that in mind, why on earth is there actually an option to use checksums in UDP?? Because surely if you want to make sure the data being sent is received in the correct order (and not corrupt and so on) then you would use TCP...
UDP packets include a field for a 16-bit checksum (a ones' complement sum, not a CRC), which the receiving operating system uses to check for packet corruption. If the checksum is present and fails, the packet is silently discarded. It is up to the application to notice that the packet disappeared and take corrective action.
UDP checksums are enabled by default on all modern operating systems. It is possible to disable UDP checksums in IPv4, either at the socket or OS level. Doing so would reduce the CPU overhead of processing each packet at both the sender and receiver. This might be desirable if, for example, the application were calculating its own checksum separately. Without any checksum, there would be no guarantee that the bytes received are the same as the bytes sent.
The task of UDP is to transport datagrams, which are "network data packets". For UDP, every data packet is a transmission of its own. If you send 3 packets, those are three independent transmissions for UDP. Whether the content of these 3 packets somehow belongs together or whether they are three individual requests (think of DNS requests, where every request is sent as its own UDP packet), UDP doesn't know and doesn't care. All that UDP guarantees is that a packet is either transmitted as a whole or not at all; either the entire packet arrives or the entire packet is lost, you will never see "half of a packet" arriving. So if you just want to send a bunch of data packets, you use UDP.
The task of TCP, on the other hand, is to transport a stream of data. It's not about packets. It's about a stream of bytes somehow making it from one host to another. How this happens, e.g. how TCP is breaking the data stream into chunks and sending these chunks over the network and ensuring that no data is lost and all data is in order, is up to TCP. All that TCP guarantees is that the bytes will arrive correctly and in order at the other side, unless the TCP connection is lost, in which case the stream ends abruptly somewhere in the middle but all data, that arrived up to that point, did arrive correctly and in correct order. So despite TCP also working with packets, the transmission behaves like a stream that has no internal "data units". When sending 80 bytes over TCP, there may be one packet with 80 bytes or 10 packets with each 8 bytes or anything in between, you cannot know and you don't have to.
But just because you use UDP doesn't mean you don't care about data corruption in UDP packets. Keep in mind that corruption may not just affect your data, it may also affect the UDP header itself. If only a single bit flips, the UDP packet may end up with an incorrect destination port. So they added a checksum which ensures that neither the UDP header nor the data payload has been corrupted, but made it optional, so it's up to you whether you want to use it or not. If used, corrupt packets are dropped and thus behave like lost packets. If your code takes care of lost packets, it will automatically take care of corrupt packets, too.
With IPv6 though, the checksum was dropped from the IP header, which means that IP header corruption is no longer detected by IP itself. But this was seen as a small problem, as most layer 2 protocols have their own mechanism to detect corrupt data (e.g. Ethernet and Wi-Fi frames carry their own check sequence) and the checksums of UDP/TCP also cover some of the IP header fields, so even without layer 2 error checking, the recipient would notice if the IP addresses in the header have been corrupted along the way and drop the packet. As a consequence, the UDP checksum is no longer optional with IPv6.
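For reference, the UDP checksum over IPv4 can be sketched in a few lines of Python. This is a minimal illustration of the ones' complement sum over the pseudo-header (source address, destination address, protocol 17, UDP length), the UDP header and the payload; the function names are made up for this example, and in practice the operating system or the network card computes the value for you.

```python
import socket
import struct


def internet_checksum(data):
    """16-bit ones' complement of the ones' complement sum of `data`."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:   # fold the carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def udp_checksum(src_ip, dst_ip, src_port, dst_port, payload):
    """Checksum for a UDP datagram carried over IPv4."""
    length = 8 + len(payload)  # UDP header plus data
    pseudo = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
              + struct.pack("!BBH", 0, 17, length))
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)
    checksum = internet_checksum(pseudo + header + payload)
    # In IPv4 a transmitted 0 means "no checksum", so a computed 0 is sent as 0xFFFF.
    return checksum or 0xFFFF
```

A receiver runs the same sum over the pseudo-header and the datagram as received, checksum field included; an undamaged datagram yields 0. Because the pseudo-header contains the IP addresses, this is also how the UDP checksum covers part of the IP header.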

UDP Networking Fundamentals

I've been doing some work with C# Networking using UDP. I'm getting on fine but need the answer to a couple of fundamental questions I'm having problems testing:
Currently I'm sending data in ~16000-byte datagrams, which according to Wireshark are getting split into several 1500-byte packets (because of max packet size limits) and then reassembled at the other end.
Am I right in understanding that the datagram will be received complete at the other end OR not at all? I.e. it's an all-or-nothing thing. There is no chance of ending up with a fragmented datagram due to packet loss?
Therefore, I only need to ACK per datagram, rather than ensuring my datagrams are < 1500 bytes and ACKing each one?
I've looked in a lot of places but there seems to be a lot of confusion about the differences between datagrams and the underlying packets...
Thanks for your help!
There is no chance of ending up with a fragmented datagram due to packet loss?
I believe that's true: that fragmentation and fragment reassembly is handled by the protocol layer below UDP, i.e. that it's handled by the "IP" layer, which will error if it fails to reassemble the packet-fragments into a datagram (for example, search for "fragment" in RFC 792).
http://www.pcvr.nl/tcpip/udp_user.htm#11_5 says,
"The IP layer at the destination performs the reassembly. The goal is to make fragmentation and reassembly transparent to the transport layer (TCP and UDP), which it is, except for possible performance degradation."
As you may know, the 16-bit UDP length field means a datagram can be at most 65535 bytes in total. The data itself, however, is theoretically limited to 65535 - (sizeof(IP header) + sizeof(UDP header)) = 65535 - (20 + 8) = 65507 bytes.
But this does not mean that every application using UDP sends that much data; DNS, for example, traditionally limits its UDP messages to 512 bytes. One reason is that you don't get any ACK packets from the server, so packets lost in the network (packet transmission problems and loss) go unnoticed by the protocol. Another is that intermediate nodes may encapsulate datagrams inside another protocol, as IPsec and other protocols do.
With UDP there are no ACK packets, so in your case, if the underlying application uses UDP, you should not see any ACK packets. Also, some servers limit the maximum UDP packet size depending on the application, so if you transfer data from client to server you should see the same sizes (e.g. 512 bytes) going and coming back in Wireshark. Mostly, the source makes the request and the destination sends X-byte UDP datagrams back.
These links may be good for your questions:
Wireshark UDP analysis
RFC 1122 (states that 576 is the minimum maximum reassembly buffer size)
Am I right in understanding that the datagram will be received complete at the other end OR not at all? I.e. it's an all-or-nothing thing. There is no chance of ending up with a fragmented datagram due to packet loss?
That is correct.
Therefore, I only need to ACK per datagram, rather than ensuring my datagrams are < 1500 bytes and ACKing each one?
I don't understand this question. You need to ACK each datagram regardless of its size, and you should make them < 1500 bytes so they won't get fragmented. Otherwise you may never be able to transmit a particular datagram at all, if it repeatedly gets fragmented and a fragment repeatedly gets lost.
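The cost of fragmentation can be estimated with a little arithmetic. The Python sketch below assumes IPv4 without options (20-byte header), an 8-byte UDP header, a 1500-byte MTU and independent packet loss; the function names are hypothetical. Under those assumptions a single unfragmented datagram can carry at most 1500 - 20 - 8 = 1472 bytes of data.

```python
import math

IP_HEADER = 20   # IPv4 header without options (assumption)
UDP_HEADER = 8


def fragment_count(udp_payload, mtu=1500):
    """Number of IPv4 packets needed to carry one UDP datagram."""
    # Every fragment except the last carries a multiple of 8 data bytes.
    per_fragment = (mtu - IP_HEADER) // 8 * 8
    return math.ceil((UDP_HEADER + udp_payload) / per_fragment)


def delivery_probability(udp_payload, packet_loss, mtu=1500):
    """Chance that every fragment arrives, assuming independent losses."""
    return (1 - packet_loss) ** fragment_count(udp_payload, mtu)
```

For the 16000-byte datagrams in the question, fragment_count(16000) gives 11 packets, and with 1% packet loss the whole datagram arrives only about 90% of the time, since losing any one fragment loses all of it.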

how to reassemble tcp segment?

I'm now developing a project using WinPcap. As far as I know, the packets being sniffed are usually fragmented.
How do I reassemble these TCP segments? Any ideas, suggestions or tutorials available?
I assume this is the only way I can view the HTTP header.
Thanks!
TCP is a byte stream protocol.
The sequence of bytes sent by your HTTP application is encapsulated in TCP data segments, and the byte stream is recreated before the data is delivered to the application on the other side.
Since you are accessing the TCP data segments using WinPcap, you need to go to the data portion of the segment. The TCP header has a fixed part of 20 bytes plus an optional part, so you need to determine its actual length for each segment.
The length of the data part of the TCP segment is determined by subtracting the TCP header length (obtained from a field in the TCP segment) and the IP header length (from a field in the IP datagram that encapsulates the TCP segment) from the total length (obtained from another field in the IP datagram).
So now you have the total length and the length of the data part within the segment, and therefore you know the offset where the HTTP request data starts.
The offset, counted from the start of the IP datagram, is (total length - length of data part), or equivalently (length of IP header + length of TCP header).
I have not used WinPcap, so you will have to find out how to get these fields using the API.
IP datagrams may also be fragmented further, but I am expecting that this API hands you only reassembled datagrams. You are good to go!
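The header arithmetic described above can be sketched in Python as follows. This is a minimal illustration rather than WinPcap API usage: the function name tcp_payload is made up, and it assumes the buffer starts at the IPv4 header (any link-layer header such as Ethernet's 14 bytes already stripped), that the packet carries TCP, and that it is not an IP fragment.

```python
import struct


def tcp_payload(ip_packet):
    """Return (sequence number, payload) for raw bytes starting at the IPv4 header."""
    ip_header_len = (ip_packet[0] & 0x0F) * 4            # IHL field, in 32-bit words
    total_len = struct.unpack("!H", ip_packet[2:4])[0]   # length of the whole IP datagram
    tcp = ip_packet[ip_header_len:total_len]             # drops any link-layer padding
    seq = struct.unpack("!I", tcp[4:8])[0]
    tcp_header_len = (tcp[12] >> 4) * 4                  # data offset field, in 32-bit words
    return seq, tcp[tcp_header_len:]
```

The returned sequence number is what you would then use to put the payloads of a connection in order.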
There is no such thing as a TCP fragment. The IP protocol has fragments. TCP is a stream protocol. You can assemble the stream in its intended order by following the sequence numbers of both sides. Every TCP packet goes to the IP level and can be fragmented there. You can assemble each packet by collecting all of the fragments and following the fragment offset from the header.
All of the information you need is in the headers. The Wikipedia articles are quite useful in explaining what each field is:
http://en.wikipedia.org/wiki/TCP_header#Packet_structure
http://en.wikipedia.org/wiki/IPv4#Header
PcapPlusPlus offers this capability out of the box for all major OSes (including Windows). Please check out the TcpReassembly example to see working code, and the API documentation to understand how to use the TCP reassembly feature.
Depending on whose traffic you're attempting to passively reassemble, you may run into some TCP obfuscation techniques designed to confuse people trying to do exactly what you're trying to do. Check out this paper on different operating system reassembly behaviors.
libtins provides classes to perform TCP stream reassembly in a very high level way, so you don't have to worry about TCP internals to do so.

Resources