Understanding tshark output - networking

I am trying to understand the output of network data captured by tshark using the following command
sudo tshark -i any 'tcp port 80' -V -c 800 -R 'http contains <filter_argument>' > <desired_file_location>
Accordingly, I get some packets in the output, each starting with a line something like this:
Frame 5: 1843 bytes on wire (14744 bits), 1843 bytes captured (14744 bits) on interface 0
I have some basic questions regarding a packet:
Is a frame and a packet the same thing (used interchangeably)?
Does a packet logically represent one request (in my case, an HTTP request)? If not, can a request span multiple packets, or can a packet contain multiple requests? A more basic question would be: what does a packet represent?
I see a lot of information being captured in the request. Is there a way, using tshark, to capture just the HTTP headers and the HTTP request body? Basically, my motive for this whole exercise is to capture all these requests so I can replay them later.
Any pointers in order to answer these doubts will be really helpful.

You've asked several questions. Here are some answers.
Are frames and packets the same things?
No. Technically, when you are looking at network data and that data includes the Layer 2 frame header, you are looking at a frame. The IP packet inside of that frame is just data from Layer 2's point of view. When you look at the IP datagram (or strip off the frame header), you are now looking at a packet.
Ultimately, I tell people that you should know the difference and try to use the terms properly, but in practice it's not an extremely important distinction.
Does a packet represent a single request?
This really depends. With HTTP 1.0 and 1.1, you could look at it this way, although if the client has a significant amount of POST data to send, there is no reason the request can't span multiple packets. It is better to think of a single "connection" or "session" as a single request/response. (This is not necessarily strictly true with HTTP 1.1, but it is generally true.)
With HTTP 2.0, this is by design not true. A single connection or session is used to handle multiple data streams (requests/responses).
How can I get at the request headers?
This is far too lengthy for me to answer here. The simplest thing to do, most likely, is to fire up Wireshark, go into the filter bar, and type "http." As soon as you hit the dot, you will see a list of all of the different sub-elements that you can look at. You can use these in tshark with the '-Y' option, and you can additionally specify the columns that you would like to display (so you can add and remove columns, effectively).
An alternative way to see this information is to use the filter expression button to bring up the protocols selector. If you scroll down to HTTP, you can select it and then see all of the fields that are available.
When looking through these, realize that some of the fields are at the top level rather than within request or response. For example, content-length appears as a field under http rather than as http.request.content_length. This is because content-length is a field common to all requests and responses.
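For example, a command along these lines (the field names can be checked against your build with tshark -G fields, but these ones are standard) prints just the host, method, and URI of each HTTP request:
sudo tshark -i any -Y 'http.request' -T fields -e http.host -e http.request.method -e http.request.uri
Adding further -e options such as http.request.line or http.file_data pulls in the raw header lines and the message body, which is closer to what you would need for replaying requests.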

Related

Generating a multipart/byterange response without scanning the parts ahead of sending

I would like to generate a multipart byte range response. Is there a way for me to do it without scanning each segment I am about to send out, since I need to generate multipart boundary strings?
For example, I can have a user request a byte range that would have me fetch and scan 2 GB of data, which in my case involves loading that data into my (slow) VM as strings and so forth. Ideally I would like to simply state in the response that a part has a length of a certain number of bytes, and be done with it. Is there any tooling that could provide me with this option? I see that many developers just grab a UUID as the boundary and are probably willing to risk the tiny probability that it will appear somewhere within the part; is that risk small enough that it is worth taking?
To explain in more detail: scanning the parts ahead of time (before generating the response) is not really feasible in my case since I need to fetch them via HTTP from an upstream service. This means that I effectively have to prefetch the entire part first to compute a non-matching multipart boundary, and only then can I splice that part into the response.
Assuming the data can be arbitrary, I don’t see how you could guarantee absence of collisions without scanning the data.
If the format of the data is very limited (like... base 64 encoded?), you may be able to pick a boundary that is known to be an illegal sequence of bytes in that format.
Even if your boundary does collide with the data, the boundary would have to be followed by headers such as Content-Range for the client to misparse it, which is even more improbable, so the client is likely to treat the collision as an error rather than consume the wrong data.
Major Web servers use very simple strategies. Apache grabs 8 random bytes at startup and renders them in hexadecimal. nginx uses a sequential counter left-padded with zeroes.
UUIDs are designed to avoid collisions with other UUIDs, not with arbitrary data. A UUID is no more likely to be a good boundary than a completely random string of the same length. Moreover, some UUID variants include information that you may not want to disclose, such as your machine’s MAC address.
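For illustration, here is a rough Python sketch of that kind of strategy (the helper and its arguments are invented for this example; the boundary is just random bytes rendered as hex, roughly what Apache does). The point is that the Content-Length of the whole response is pure arithmetic over the part sizes, so nothing needs to be scanned:

import os

def build_byteranges_response(parts, total_size):
    # parts is a list of (first_byte, last_byte, body_bytes) tuples.
    boundary = os.urandom(8).hex()  # random bytes rendered in hexadecimal
    pieces = []
    for first, last, body in parts:
        pieces.append((
            "--%s\r\n"
            "Content-Type: application/octet-stream\r\n"
            "Content-Range: bytes %d-%d/%d\r\n\r\n" % (boundary, first, last, total_size)
        ).encode("ascii"))
        pieces.append(body)
        pieces.append(b"\r\n")
    pieces.append(("--%s--\r\n" % boundary).encode("ascii"))
    payload = b"".join(pieces)
    headers = {
        "Content-Type": "multipart/byteranges; boundary=%s" % boundary,
        "Content-Length": str(len(payload)),  # computable from the part lengths alone
    }
    return headers, payload

In a streaming implementation you would write the per-part headers and then pipe each upstream part straight through, since the length of every piece is known up front.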
Ideally I would like to simply state in the response that a part has a length of a certain number of bytes, and be done with it. Is there any tooling that could provide me with this option?
Maybe you can avoid supporting multiple ranges and simply tell the clients to request each range separately. In that case, you don’t use the multipart format, so there is no problem.
If you do want to send multiple ranges in one response, then RFC 7233 requires the multipart format, which requires the boundary string.
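As an aside, on the client side the request-each-range-separately approach is just one Range request per part. A minimal Python sketch (the URL and offsets are placeholders):

import urllib.request

def fetch_range(url, first, last):
    # One request per range: each response is a 206 with a single Content-Range,
    # so no multipart boundary is involved at all.
    req = urllib.request.Request(url, headers={"Range": "bytes=%d-%d" % (first, last)})
    with urllib.request.urlopen(req) as resp:
        return resp.read()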
You can, of course, invent your own mechanism instead of that of RFC 7233. In that case:
You cannot use 206 (Partial Content). You must use 200 (OK) or some other applicable status code.
You cannot use the multipart/byteranges media type. You must come up with your own media type.
You cannot use the Range request header.
Because a 200 (OK) response to a GET request is supposed to carry a (full) representation of the resource, you must do one of the following:
encode the requested ranges in the URL; or
use something like POST instead of GET; or
use a custom, non-standard status code instead of 200 (OK); or
(not sure if this is a correct approach) use media type parameters, send them in Accept, and add Accept to Vary.
The chunked transfer coding may be useful, but you cannot rely on it alone, because it is a property of the connection, not of the payload.

Dissector for TCP Option

I am new to writing dissectors in Lua and I have two quick questions. I have a packet whose TCP options are MSS, SACK, Timestamps, NOP, Window Scale, and an unknown option. I am basically trying to dissect the unknown section of the TCP options field. I am aware that I will have to use a chained dissector.
The first question: while using the chained dissector to parse the TCP options, do I have to parse all of the options from the beginning? For example, will I need to parse MSS, SACK, and so on, and only then parse the unknown section, or is there a direct way for me to jump to the unknown section?
The second question: I have seen the code for many custom protocol dissectors, and if I need to dissect a protocol which runs over (for example) TCP, then I have to include the following:
-- load the tcp.port table
tcp_table = DissectorTable.get("tcp.port")
-- register our protocol to handle tcp port
tcp_table:add(port, myproto_tcp_proto)
My question is, is there any way for me to jump to the middle of the protocol? For example, in my case I want to parse the TCP options. Can I directly call tcp.options so that the parser starts dissecting from where the options start?
A TCP option is a "uint8_t type; uint8_t len; uint8_t *data" structure.
I usually give the commonly used ones a name, for example getSack(), getMss().
For the others, keep them in an array (with a maximum size of something like 20).
For your second question, you mean you don't care about the TCP header itself, right? If so, just move your pointer 20 bytes further (past the fixed TCP header) to get access to the TCP options.
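To illustrate that type/len/data walk outside of Wireshark, here is a minimal Python sketch (the variable segment is a stand-in for the raw bytes of the TCP header; offsets follow the standard TCP header layout):

def parse_tcp_options(segment):
    header_len = (segment[12] >> 4) * 4   # data offset, converted from 32-bit words to bytes
    options = segment[20:header_len]      # options sit between the fixed 20-byte header and the payload
    found = []
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == 0:                     # End of Option List
            break
        if kind == 1:                     # NOP is a single byte with no length field
            i += 1
            continue
        length = options[i + 1]           # length covers the kind and length bytes too
        if length < 2:                    # malformed option; stop rather than loop forever
            break
        found.append((kind, options[i + 2:i + length]))
        i += length
    return found

From the returned list you can pick out the unknown option kind directly, without giving the well-known options any special treatment.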

How to manage multi-packet sends with gsocket?

I have a question regarding TCP/IP socket networking. Basically it is this: are there any parts of TCP/IP that I can leverage to help manage multi-packet sends? For example, I want to send a 100 mb binary file, which would take something like 70-80 TCP packets. Meanwhile I have a relatively fast polling receive on the other side. Would my receiver have to receive each packet individually and "stitch" together the data packet by packet, looking for some size to be reached (it can look at the opcode and determine the size), or is there some way to tell TCP, "hey, I'm sending 100 mb here, let them know when it is finished"?
I am using glib's low level socket library (gsocket).
When using a binary encoding like, say, protocol buffers, you would wrap the actual payload by prepending a header that includes the information necessary to decode the payload on the other end.
Say, 8 bytes where the first four signify the type of the encoded message and the second four indicate the length of the entire message.
On the receiving side you then read this header, which is part of the TCP payload, to determine the message type and length. This lets you combine multiple messages in one payload, or split messages across packets and reliably recombine them.
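A minimal sketch of that framing in Python (plain sockets here rather than gsocket, and the 4+4 byte header is just the example layout from above):

import struct

HEADER = struct.Struct("!II")  # message type + body length, both unsigned 32-bit, network byte order

def send_message(sock, msg_type, body):
    # TCP may split this across many packets, but the receiver doesn't need to care.
    sock.sendall(HEADER.pack(msg_type, len(body)) + body)

def recv_exactly(sock, n):
    # TCP is a byte stream: keep reading until exactly n bytes have arrived.
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection mid-message")
        buf.extend(chunk)
    return bytes(buf)

def recv_message(sock):
    msg_type, length = HEADER.unpack(recv_exactly(sock, HEADER.size))
    return msg_type, recv_exactly(sock, length)

The same idea carries over to gsocket: the framing lives entirely in your application data, so the only primitives you need are "send all these bytes" and "read exactly n bytes".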

Split CRLF between TCP payloads

I'm currently writing a low-level HTTP parser and have run into the following issue:
I am receiving HTTP data on a packet-by-packet basis, i.e. TCP payloads one at a time. When parsing this data, I am following the HTTP protocol standard of searching for CRLF to delineate header lines and chunk data (in the case of chunked encoding), and for the double CRLF that separates the headers from the body.
My question is: do I need to worry about the possibility of CRLF being split between two TCP packet payloads? For example, the HTTP header will finish with CRLFCRLF. Is it possible that two subsequent TCP packets will have CR, and then LFCRLF?
I am assuming that yes; this is a case to worry about, since the application (HTTP) and TCP layers are rather independent of each other.
Any insight into this would be highly appreciated, thank you!
Yes, it is possible for the CRLF to be split across different TCP packets. Just think about the possibility that a single HTTP header ends exactly one byte past what fits into a TCP segment. In that case, there is only room for the CR, but not for the LF.
So no matter how tricky your code gets, it must be able to handle this kind of split.
What language are you working in? Does it not have some form of buffered read functionality for the socket, so you don't have this issue?
The short answer to your question is yes, theoretically you do have to worry about it, because it is possible the packets would arrive like that. It is very unlikely, because most HTTP endpoints will tend to send the header in one packet and the body in subsequent packets. This is less by convention and more by the nature of the way most socket-based programs/languages work.
One thing to bear in mind is that while the protocol standards are quite clear about the CRLF separation, many people who implement HTTP (clients in particular, but servers to some degree as well) don't know or care what they are doing and will not obey the rules. They tend to separate lines with LF only - particularly the blank line between the head and the body; I have seen more code with this problem than I could quickly count. While this is technically a protocol violation, most servers/clients will accept this behaviour and work around it, so you will need to as well.
If you can't do some kind of buffered read, there is some good news. All you need to do is read one packet at a time into memory and append its data to the data from the previous packet(s). Every time you have read a packet, scan the accumulated data for a double CRLF sequence; if you don't find it, read the next packet, and so on until you find the end of the head. Memory usage stays relatively small, because the head of a request shouldn't ever be more than 5-6 KB, which, given that an Ethernet-sized TCP segment carries roughly 1450 bytes of payload, means you shouldn't ever need to hold more than 4 or 5 packets in memory to cope with it.
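If it helps, here is a minimal Python sketch of that accumulate-and-scan loop (a blocking socket is assumed; a real parser would also cap the header size and tolerate bare LF, as discussed above):

def read_http_head(sock):
    buf = bytearray()
    # Keep appending received data until the end-of-headers marker shows up,
    # even if the CR and the LF arrive in different TCP segments.
    while b"\r\n\r\n" not in buf:
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("connection closed before the headers were complete")
        buf.extend(chunk)
    head, _, leftover = bytes(buf).partition(b"\r\n\r\n")
    return head, leftover  # leftover is the start of the body that has already arrived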

How can I use the Packet Structure from an RFC and apply it to my socket program?

Here's an example 'Packet Structure' image: http://freesoft.org/CIE/Course/Section3/7.htm
Let's say I had a small Python program that listened on port X, captured that packet, and saved it to the variable 'data'.
How would I pull out the packet information from data? For example, say I wanted to read the 'version', is it just:
print data[0:4] ?
How would I get the Source IP Address?
I've been doing more socket coding lately and have run into quite a few of these 'packet structure' images. I've yet to figure out how to apply them to my code :/
Note that your example shows an IP header - if you are simply using sockets, you will not see this information (it has already been digested by the system's IP and TCP stacks).
If you want to capture raw data, look into using libpcap, which will allow raw packets. You can also use tcpdump to produce a file with raw packets.
As for the structures: yes, your command would read the first 4 bytes if your data were a string. You would likely want to render those bytes as hex (or as integers for the normal representation), or you will see "garbage" characters instead.
For more powerful unpacking, use the struct module which comes with Python.
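For example, a rough sketch for the IPv4 header in the diagram you linked (assuming data holds the raw packet bytes obtained via libpcap or a raw socket):

import socket
import struct

def parse_ipv4_header(data):
    # The fixed 20-byte IPv4 header, fields as laid out in RFC 791.
    (version_ihl, tos, total_length, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", data[:20])
    version = version_ihl >> 4           # 'version' is only the top 4 bits of the first byte
    header_len = (version_ihl & 0x0F) * 4
    return version, header_len, socket.inet_ntoa(src), socket.inet_ntoa(dst)

Note that data[0:4] would give you the first four bytes (version, IHL, TOS, and total length together), not the version by itself; the version is just the high nibble of the very first byte.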
