I am new to writing dissectors in Lua and I had two quick questions. I have a packet which has the TCP Options as MSS, TCP SACK, TimeStamps, NOP, Window Scale, Unknown. I am basically trying to dissect the unknown section in the TCP Options field. I am aware that I will have to use the chained dissector.
The first question: when using a chained dissector to parse the TCP Options, do I have to parse all the options from the beginning? For example, will I need to parse MSS, TCP SACK, and so on before finally parsing the Unknown section, or is there a direct way to jump to the Unknown section?
The second question: I have seen the code for many custom protocol dissectors, and if I need to dissect a protocol which follows (for example) TCP, then I will have to include the following:
-- load the tcp.port table
tcp_table = DissectorTable.get("tcp.port")
-- register our protocol to handle tcp port
tcp_table:add(port,myproto_tcp_proto)
My question is, is there any way for me to jump to the middle of the protocol? For example, in my case I want to parse the TCP Options. Can I directly call tcp.options so that the parser starts dissecting from where the options start?
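Each TCP option has the structure "uint8_t type; uint8_t len; uint8_t* data". The exceptions are End of Option List (type 0) and NOP (type 1), which are a single byte with no len field. Because the options have variable length, you have to walk them from the start to find where a given option begins.
I usually give the commonly used ones a name. For example getSack(), getMss().
For the others, keep them in an array (with a maximum size like 20).
For your second question, you mean you don't care about the TCP header, right? If so, just move your pointer 20 bytes past the start of the TCP header (the fixed part of the header) to get access to the TCP options.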
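The walk described above can be sketched in Python as follows. This is only an illustration of the option layout, not a Wireshark Lua dissector; the function name parse_tcp_options is made up for this example, and it assumes the input starts at the first byte of the TCP header.

```python
def parse_tcp_options(tcp_header: bytes):
    """Return a list of (kind, data) tuples for the options in a TCP header."""
    # Data offset: upper 4 bits of byte 12, counted in 32-bit words.
    header_len = (tcp_header[12] >> 4) * 4
    # Options sit between the 20-byte fixed header and the end of the header.
    options = tcp_header[20:header_len]
    result = []
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == 0:            # End of Option List: stop parsing
            break
        if kind == 1:            # NOP: a single byte with no length field
            result.append((kind, b""))
            i += 1
            continue
        if i + 1 >= len(options):
            break                # truncated option, no length byte
        length = options[i + 1]  # length covers kind, len, and data
        if length < 2 or i + length > len(options):
            break                # malformed length, stop rather than guess
        result.append((kind, options[i + 2:i + length]))
        i += length
    return result
```

Filtering the returned list for a kind you do not recognise gives you the "Unknown" option without writing a handler for each of the known ones.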
I am trying to understand the output of network data captured by tshark using the following command
sudo tshark -i any 'tcp port 80' -V -c 800 -R 'http contains <filter_argument>' > <desired_file_location>
Accordingly, I get some packets in output each starting with a line something like this:
Frame 5: 1843 bytes on wire (14744 bits), 1843 bytes captured (14744 bits) on interface 0
I have some basic questions regarding a packet:
Is a frame and a packet the same thing (used interchangeably)?
Does a packet logically represent 1 request (in my case HTTP request)? If not, can a request span across multiple packets, or can a packet contain multiple requests? A more basic question will be what does a packet represent?
I see a lot of information being captured in the request. Is there a way using tshark to capture just the HTTP headers and the HTTP request body? Basically, my motive for this whole exercise is to capture all these requests in order to replay them later.
Any pointers in order to answer these doubts will be really helpful.
You've asked several questions. Here are some answers.
Are frames and packets the same things?
No. Technically, when you are looking at network data and that data includes the Layer 2 frame header, you are looking at a frame. The IP packet inside of that frame is just data from Layer 2's point of view. When you look at the IP datagram (or strip off the frame header), you are now looking at a packet.
Ultimately, I tell people that you should know the difference and try to use the terms properly, but in practice it's not an extremely important distinction.
Does a packet represent a single request?
This really depends. With HTTP 1.0 and 1.1, you could look at it this way, though there's no reason that, if the client has a significant amount of POST data to send, the request can't span multiple packets. It is better to think of a single "connection" or "session" as a single request/response. (This is not necessarily strictly true with HTTP 1.1, but it is generally true)
With HTTP 2.0, this is by design not true. A single connection or session is used to handle multiple data streams (requests/responses).
How can I get at the request headers?
This is far too lengthy for me to answer here. The simplest thing to do, most likely, is to fire up Wireshark, go into the filter bar and type "http." As soon as you hit the dot, you will see a list of all of the different sub-elements that you can look at. You can use these in tshark with the '-Y' option, and you can additionally specify the columns that you would like to display (so you can add and remove columns, effectively).
An alternative way to see this information is to use the filter expression button to bring up the protocols selector. If you scroll down to HTTP, you can select it and then see all of the fields that are available.
When looking through these, realize that some of the fields are in the top-level rather than within request or response. For example, content-length appears as a field under http rather than http.request.content_length. This is because content-length is a field common to all requests and responses.
I was trying to read some messages from a TCP connection with a Redis client (a terminal just running redis-cli). However, the Read method in the net package requires me to pass in a slice as an argument. Whenever I give it a slice with no length, the connection crashes and the Go program halts. I am not sure beforehand what length my byte messages are going to be. So unless I specify some slice that is ridiculously large, the connection will always close, and that seems wasteful. I was wondering, is it possible to keep a connection open without having to know the length of the message beforehand? I would love a solution to my specific problem, but I feel that this question is more general. Why do I need to know the length beforehand? Can't the library just give me a slice of the correct size?
Or what other solution do people suggest?
Not knowing the message size is precisely the reason you must specify the Read size (this goes for any networking library, not just Go). TCP is a stream protocol. As far as the TCP protocol is concerned, the message continues until the connection is closed.
If you know you're going to read until EOF, use ioutil.ReadAll.
Calling Read isn't guaranteed to get you everything you're expecting. It may return less, it may return more, depending on how much data you've received. Libraries that do IO typically read and write through a "buffer"; you would have your "read buffer", which is a pre-allocated slice of bytes (up to 32k is common), and you re-use that slice each time you want to read from the network. This is why IO functions return the number of bytes, so you know how much of the buffer was filled by the last operation. If the buffer was filled, or you're still expecting more data, you just call Read again.
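The same reusable-buffer pattern, sketched in Python rather than Go to show the idea in a few lines. The function name read_until_closed and the 32 KiB default are choices made for this example only; the equivalent Go loop would call conn.Read(buf) and use buf[:n].

```python
import socket

def read_until_closed(conn: socket.socket, bufsize: int = 32 * 1024) -> bytes:
    """Read from a stream socket until the peer closes it, reusing one buffer."""
    buf = bytearray(bufsize)       # allocated once, reused for every read
    view = memoryview(buf)
    received = bytearray()
    while True:
        n = conn.recv_into(view)   # n is how many bytes of buf were filled
        if n == 0:                 # 0 bytes means the peer closed (EOF)
            break
        received += view[:n]       # only the first n bytes are valid data
    return bytes(received)
```

For a protocol such as the one Redis speaks, you would not wait for the connection to close; you would instead inspect the accumulated bytes after each read and stop once a complete reply has arrived.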
A bit late but...
One of the questions was how to determine the message size. The answer given by JimB was that TCP is a streaming protocol, so there is no real end.
I believe this answer is incorrect. TCP divides up a bitstream into sequential packets. Each packet has an IP header and a TCP header. The IP header of each packet contains a field for the length of that packet. You would have to do some math to subtract out the IP and TCP header lengths to arrive at the actual data length.
In addition, the maximum length of a message can be specified in the TCP header.
Thus you can provide a buffer of sufficient length for your read operation. However, you have to read the packet header information first. You probably should not accept a TCP connection if the max message size is longer than you are willing to accept.
Normally the sender would terminate the connection with a FIN packet, not an EOF character.
EOF in the read operation will most likely indicate that a packet was not fully transmitted within the allotted time.
I'm currently writing a low-level HTTP parser and have run into the following issue:
I am receiving HTTP data on a packet-by-packet basis, i.e. TCP payloads one at a time. When parsing this data, I am using the HTTP protocol standards of searching for CRLF to delineate header lines, chunk data (in the case of chunked-encoding), and the dual CRLF to delineate header from body.
My question is: do I need to worry about the possibility of CRLF being split between two TCP packet payloads? For example, the HTTP header will finish with CRLFCRLF. Is it possible that two subsequent TCP packets will have CR, and then LFCRLF?
I am assuming that yes; this is a case to worry about, since the application (HTTP) and TCP layers are rather independent of each other.
Any insight into this would be highly appreciated, thank you!
Yes, it is possible that the CRLF gets split across different TCP packets. Just think about the possibility that a single HTTP header is exactly one byte longer than the TCP maximum segment size (MSS). In that case, there is only room for the CR, but not for the LF.
So no matter how tricky your code will get, it must be able to handle this case of splitting.
What language are you working in? Does it not have some form of buffered read functionality for the socket, so you don't have this issue?
The short answer to your question is yes, theoretically you do have to worry about it, because it is possible the packets would arrive like that. It is very unlikely, because most HTTP endpoints will tend to send the header in one packet and the body in subsequent packets. This is less by convention and more by the nature of the way most socket-based programs/languages work.
One thing to bear in mind is that while the protocol standards are quite clear about the CRLF separation, many people who implement HTTP (clients in particular, but to some degree servers as well) don't know or care what they are doing and will not obey the rules. They will tend to separate lines with LF only, particularly the blank line between the head and the body. I have seen more code with this problem than I could quickly count. While this is technically a protocol violation, most servers and clients accept this behaviour and work around it, so you will need to as well.
If you can't use some kind of buffered read functionality, there is some good news. All you need to do is read one packet at a time into memory and append its data to that of the previous packet(s). Every time you have read a packet, scan your data for a double CRLF sequence. If you don't find it, read the next packet, and so on until you find the end of the head. Memory usage will be relatively small, because the head of any request shouldn't ever be more than 5-6KB, which, given an Ethernet payload of around 1450 bytes per packet, means you shouldn't ever need to load more than 4 or 5 packets into memory to cope with it.
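A minimal sketch of that accumulate-and-scan approach, in Python for brevity. The class name HeaderBuffer and its feed method are invented for this example. It handles only the strict CRLFCRLF terminator, so tolerating bare LF line endings would need an extra check.

```python
class HeaderBuffer:
    """Accumulates TCP payloads until the end of the HTTP head is seen."""

    def __init__(self):
        self._data = bytearray()

    def feed(self, payload: bytes):
        """Append one payload; return (head, rest) once CRLFCRLF is found, else None."""
        # Resume the search 3 bytes before the new data, so a terminator
        # split across two payloads (e.g. CR | LF CR LF) is still found.
        start = max(0, len(self._data) - 3)
        self._data += payload
        end = self._data.find(b"\r\n\r\n", start)
        if end == -1:
            return None                      # head not complete yet
        head = bytes(self._data[:end])       # header lines, without the blank line
        rest = bytes(self._data[end + 4:])   # any body bytes already received
        return head, rest
```

Feeding b"GET / HTTP/1.1\r\nHost: x\r" and then b"\n\r\nbody" returns None for the first call and the complete head plus b"body" for the second, which is exactly the split case from the question.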
Are these codes the same behaviour for the remote:
a:
socket.write("aaaa");
socket.waitForBytesWrite(3000);
socket.write("b");
b:
socket.write("aaaa");
socket.write("b");
I know the first code will get "aaaab" but..
I don't know if the second codes would result in "aabaa" or something else.
They are equivalent (as in, the remote end should receive the data in the same order). In your second case, if the socket has not finished sending its current chunk of data, the new data will be appended to the end of the internal buffer for later writing.
This assumes, of course, that you're using TCP. If you use UDP, there are no guarantees that the packets will arrive in the order you send them.
What type of socket are you using? TCP or UDP?
If you use TCP socket:
First and second lines will result in "aaaab".
If you are using UDP:
Under very bad conditions, both the first and the second snippet can result in "aaaab" or "baaaa". The code below is a better way to ensure the order of the UDP packets:
socket.write("aaaa");
if (socket.waitForBytesWrite(3000))
socket.write("b");
Here's an example 'Packet Structure' image: http://freesoft.org/CIE/Course/Section3/7.htm
Let's say I had a small Python program that listened on port X, captured that packet, and saved it to the variable 'data'.
How would I pull out the packet information from data? For example, say I wanted to read the 'version', is it just:
print data[0:4] ?
How would I get the Source IP Address?
I've been doing more socket coding lately and have run into quite a few of these 'packet structure' images. I have yet to figure out how to apply them to my code :/
Note that your example shows an IP header. If you are simply using sockets, you will not see this information (it has already been digested by the system's IP and TCP stacks).
If you want to capture raw data, look into using libpcap, which will allow raw packets. You can also use tcpdump to produce a file with raw packets.
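As for structures, data[0:4] gives you the first 4 bytes if your data is a string. Note that the version field is only the upper 4 bits of the first byte, so you need to shift that byte rather than slice the string. You would likely want to display the bytes as hex (or as integers for the normal representation), or you will see "garbage" characters instead.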
For more powerful unpacking, use the struct module which comes with python.
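As a rough sketch of what that looks like, the following unpacks the fixed 20-byte IPv4 header with struct. It assumes data begins at the first byte of the IP header (as it would from a raw capture, not from an ordinary TCP socket), and the function name parse_ipv4_header is made up for this example.

```python
import socket
import struct

def parse_ipv4_header(data: bytes) -> dict:
    """Unpack the fixed 20-byte part of an IPv4 header."""
    # "!" = network byte order; B = 1 byte, H = 2 bytes, 4s = 4 raw bytes.
    (ver_ihl, tos, total_len, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", data[:20])
    return {
        "version": ver_ihl >> 4,                  # upper 4 bits of byte 0
        "header_length": (ver_ihl & 0x0F) * 4,    # lower 4 bits, in 32-bit words
        "total_length": total_len,
        "ttl": ttl,
        "protocol": proto,                        # e.g. 6 = TCP, 17 = UDP
        "src": socket.inet_ntoa(src),             # dotted-quad source address
        "dst": socket.inet_ntoa(dst),
    }
```

Each letter in the format string corresponds to one box in the packet-structure diagram, which is how those images map onto code. Fields narrower than a byte, such as version, are unpacked together with their neighbour and then separated with shifts and masks.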