Why can a router use layer 4 information in an ECMP hash? - networking

A router is a layer 3 device, but it can use layer 4 information (such as destination port, source port, and protocol) in its ECMP hash function.
Why?

Because it cheats and looks at that information. Nothing is stopping it.

The reason is that you want every packet in a given flow (defined by source address, destination address, protocol, source port, and destination port) to take the same path so that the packets arrive in order. Otherwise the application might have issues trying to re-sort them, because different ECMP paths have different delays.
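As a rough illustration of the idea, here is a minimal Python sketch of hashing the 5-tuple to choose a next hop. The function name `ecmp_next_hop` and the use of SHA-256 are assumptions made for the example; real routers typically use a cheap hardware hash (CRC or XOR based), not a cryptographic one.

```python
import hashlib
import ipaddress
import struct


def ecmp_next_hop(src_ip, dst_ip, protocol, src_port, dst_port, next_hops):
    """Pick one of several equal-cost next hops from a hash of the flow 5-tuple.

    Hypothetical helper: SHA-256 is only a stand-in for the router's hash.
    """
    key = (
        ipaddress.ip_address(src_ip).packed
        + ipaddress.ip_address(dst_ip).packed
        + struct.pack("!BHH", protocol, src_port, dst_port)
    )
    digest = hashlib.sha256(key).digest()
    # Same 5-tuple -> same digest -> same index, so a flow never changes path.
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]


if __name__ == "__main__":
    paths = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
    # Protocol 6 is TCP; every packet of this flow maps to the same path.
    print(ecmp_next_hop("192.0.2.10", "198.51.100.7", 6, 40000, 443, paths))
```

Because the ports are part of the key, two connections between the same pair of hosts can land on different paths, while packets within one connection stay together.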

Related

Understanding format of RIB dumps from Oregon Route-views

I am working on a project in which I need to analyse the rib-dumps from the Oregon Routeviews Project.
I download the .bz2 file from here for a specific time and date for a specific node. These files are generated every 2 hours.
Then I unzipped and parsed using a zebra parser.
In the end, I get a text file with almost a million entries in the following format:
194.33.63.0/24 58511 8468 31493 31493
There are also a lot of entries with the same last number but a different IP prefix at the beginning.
For example:
194.28.28.0/22 58511 31500 50911
194.28.28.0/23 58511 31133 50911
My inference is that these numbers are Autonomous System numbers and that they somehow denote BGP hops, but I am not clear how they relate to the IP address at the start of each line. And what exactly is the source/destination AS?
I really think you should go and do some reading on how BGP works and on what the routeing information carried by the BGP messages you are looking at actually means.
To get you started...
...a route in BGP speak is a prefix and some attributes. Key among the attributes are the next-hop and the AS-Path. In announcing a route to a BGP peer (neighbour) the BGP router is saying that it can reach the prefix and if packets with destinations in the prefix are forwarded to the next-hop, they will be forwarded on towards their destination. The AS-PATH lists the ASes through which packets are (expected to) travel on their way to the destination.
So what you are seeing is reachable prefixes and the AS-PATH attribute for each one. I'm guessing you left out the next-hop (for eBGP, that will generally be the/an address of the BGP router which is advertising the route -- but in any case all eBGP routes will generally have the same next-hop).
The AS-PATH can be read from left to right: the first AS is the one from whom the route was learnt, the last AS is the one that contains the prefix. Packets forwarded to the next-hop are (currently) expected to travel through those ASes, in that order, on their way to their destination. So the first AS would be the source -- the immediate source of the route. The last AS can be called the destination, but is also known as the origin -- the origin of the route.
[Technically, the AS-Path should be read from right to left, and lists the ASes which the route has traversed thus far. Most of the time that's the same as reading left to right for packets traversing the network towards their destination.]
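To make the reading of a line concrete, here is a small Python sketch that splits one entry into its prefix and AS-PATH. The function name `parse_rib_line` is hypothetical, and the sketch assumes the "prefix followed by plain AS numbers" format shown in the question (no AS_SET entries).

```python
def parse_rib_line(line):
    """Split 'prefix AS1 AS2 ... ASn' into its parts (assumed input format)."""
    fields = line.split()
    prefix = fields[0]
    as_path = [int(asn) for asn in fields[1:]]

    # Collapse AS-path prepending (the same AS repeated back to back).
    distinct = []
    for asn in as_path:
        if not distinct or distinct[-1] != asn:
            distinct.append(asn)

    return {
        "prefix": prefix,
        "as_path": as_path,
        "peer_as": as_path[0],     # leftmost: AS the route was learnt from
        "origin_as": as_path[-1],  # rightmost: AS that originates the prefix
        "distinct_path": distinct,
    }


if __name__ == "__main__":
    print(parse_rib_line("194.33.63.0/24 58511 8468 31493 31493"))
```

For the first example line, the leftmost AS (58511) is the peer the route was learnt from and the rightmost (31493) is the origin; the repeated 31493 is the origin prepending itself.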
In your example, AS 50911 is the origin (or destination) and AS 58511 is the source.
The prefix 194.28.28.0/22 therefore belongs to the origin, AS 50911.
I think you are confused about /22 and /23. 194.28.28.0/23 is not a different IP address; it is the same network address with a different prefix length (/23 instead of /22). Autonomous systems register their address blocks, with prefix lengths, in the IRR. A less specific prefix such as /22 covers more end nodes (1024 addresses); a more specific prefix such as /23 covers fewer (512 addresses). You should read up on prefix lengths.
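The relationship between the two prefixes can be checked with Python's standard `ipaddress` module. This is only an illustrative sketch; the helper name `most_specific` is made up for the example.

```python
import ipaddress


def most_specific(address, prefixes):
    """Return the longest prefix containing the address, or None."""
    addr = ipaddress.ip_address(address)
    matches = [ipaddress.ip_network(p) for p in prefixes]
    matches = [n for n in matches if addr in n]
    if not matches:
        return None
    # Longest-prefix match: the largest prefix length wins.
    return str(max(matches, key=lambda n: n.prefixlen))


net22 = ipaddress.ip_network("194.28.28.0/22")
net23 = ipaddress.ip_network("194.28.28.0/23")

if __name__ == "__main__":
    print(net22.num_addresses, net23.num_addresses)  # 1024 512
    print(net23.subnet_of(net22))                    # True
    routes = ["194.28.28.0/22", "194.28.28.0/23"]
    print(most_specific("194.28.28.5", routes))      # 194.28.28.0/23
    print(most_specific("194.28.30.1", routes))      # 194.28.28.0/22
```

An address in the first half of the /22 matches both entries and follows the more specific /23; an address in the second half matches only the /22.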

Network layer Lan network

When we send packets from one router to another at the network layer and the packet size is greater than the MTU (maximum transmission unit) of the router, we have to fragment the packet. My question is: suppose we need to add padding bits to the last fragment. Where do we add the padding bits (at the LSB or the MSB end), and how does the destination router differentiate between packet bits and padding bits?
I want you to consider the following things first:
The limit on the maximum size of an IP datagram is imposed by the data link protocol.
IP is the highest layer protocol that is implemented at both routers and hosts.
Reassembly of the original datagram is done only at the destination host. This spares the routers in the network core the extra work.
I will use the information from the following image to help you get to the answer with an example.
Here the initial length of the packet is 2400 bytes (including the 20-byte header, so 2380 bytes of payload), which needs to be fragmented according to the MTU limit of 1000 bytes.
There are only 13 bits available for the fragment offset, and the offset is given in units of eight bytes. This is why the data fields in the first and second fragments are 976 bytes each (the largest multiple of 8 that does not exceed 1000 - 20 = 980 bytes). This gives the first and second fragments a total size of 996 bytes each. The last fragment contains the remaining 428 bytes of payload (448 bytes in total).
The offsets are 0, 976/8 = 122, and 1952/8 = 244.
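The arithmetic above can be reproduced with a short Python sketch. The function name `fragment` is hypothetical, and it assumes a fixed 20-byte IPv4 header with no options.

```python
def fragment(total_length, mtu, header_len=20):
    """Compute IPv4 fragment sizes and offsets (assumes a 20-byte header)."""
    payload = total_length - header_len
    # Non-last fragments must carry a multiple of 8 data bytes.
    max_data = (mtu - header_len) // 8 * 8
    fragments = []
    sent = 0
    while sent < payload:
        data_len = min(max_data, payload - sent)
        fragments.append({
            "offset": sent // 8,              # in units of 8 bytes
            "data_len": data_len,
            "total_len": data_len + header_len,
            "more_fragments": int(sent + data_len < payload),
        })
        sent += data_len
    return fragments


if __name__ == "__main__":
    for frag in fragment(2400, 1000):
        print(frag)
    # offsets 0, 122, 244; data lengths 976, 976, 428; MF flags 1, 1, 0
```

The last fragment simply carries whatever is left (428 bytes here), with its own total length field and the more-fragments flag cleared.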
When these fragments reach the destination host, reassembly needs to be done. The host uses the identification, flags, and fragment offset fields for this task. To determine which fragments belong to which datagram, the host uses the source address, destination address, and identification, which together identify the datagram uniquely. The offset values and the more-fragments bit are used to determine whether all fragments have arrived.
Answer to your question
Only the non-last fragments need a payload that is a multiple of 8 bytes. Because the offset is expressed in units of 8 bytes, this is what lets the host work out where the next fragment's data starts. Nothing follows the last fragment, so its payload does not need to be a multiple of 8 and no padding is required; the total length field in its header gives its exact size. The host checks the more-fragments flag to identify the last fragment.
A bit of additional information: it is not the responsibility of the network layer to guarantee delivery of the datagram. If it finds that one or more fragments have not arrived, it simply discards the whole datagram. The transport layer, which works above the network layer, will take care of this, if it is TCP, by having the source retransmit the data.
Reference: Computer Networking-A Top Down Approach, James F. Kurose, Keith W. Ross (Fifth Edition)
You don't need to add any padding bits. All the bits are simply pushed on down the route until the full frame has been sent.

Dissector for TCP Option

I am new to writing dissectors in Lua and I have two quick questions. I have a packet whose TCP options are MSS, TCP SACK, Timestamps, NOP, Window Scale, and Unknown. I am trying to dissect the unknown section of the TCP Options field. I am aware that I will have to use a chained dissector.
The first question: when using the chained dissector to parse the TCP options, do I have to parse all the options from the beginning? For example, will I need to parse MSS, TCP SACK, and so on before finally parsing the Unknown section, or is there a direct way to jump to the Unknown section?
The second question: I have seen the code for many custom protocol dissectors, and if I need to dissect a protocol which follows (for example) TCP, then I have to include the following:
-- load the tcp.port table
tcp_table = DissectorTable.get("tcp.port")
-- register our protocol to handle tcp port
tcp_table:add(port,myproto_tcp_proto)
My question is, is there any way for me to jump into the middle of the protocol? For example, in my case I want to parse the TCP options. Can I directly call tcp.options so that the parser starts dissecting from where the options start?
A TCP option has the structure "uint8_t type; uint8_t len; uint8_t* data" (the single-byte End of Option List and NOP options, kinds 0 and 1, have no len or data).
I usually give the commonly used ones a name, for example getSack() and getMss().
For the others, keep them in an array (with a maximum size of, say, 20).
For your second question, you mean you don't care about the TCP header, right? If so, just move your pointer 20 bytes forward, past the fixed TCP header, to reach the TCP options.
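As a language-neutral illustration of that layout (in Python, not a Wireshark Lua dissector), here is a minimal sketch that walks the options of a raw TCP segment. The function name `parse_tcp_options` is hypothetical. It also shows why the options have to be walked from the start: each option only tells you its own length, so there is no index that lets you jump straight to a later one.

```python
def parse_tcp_options(tcp_segment):
    """Return [(kind, data_bytes), ...] for a raw TCP segment (header first)."""
    # Upper 4 bits of byte 12 give the header length in 32-bit words.
    header_len = (tcp_segment[12] >> 4) * 4
    options = tcp_segment[20:header_len]  # options follow the fixed 20 bytes

    parsed = []
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == 0:          # End of Option List: single byte, stop
            break
        if kind == 1:          # NOP: single byte of padding, skip it
            i += 1
            continue
        if i + 1 >= len(options):
            break              # truncated option
        length = options[i + 1]  # counts the kind and length bytes too
        if length < 2 or i + length > len(options):
            break              # malformed option
        parsed.append((kind, bytes(options[i + 2:i + length])))
        i += length
    return parsed


if __name__ == "__main__":
    header = bytearray(20)
    header[12] = 0x70  # data offset 7 words = 28 bytes, so 8 bytes of options
    # MSS (kind 2), NOP, then an arbitrary unrecognised kind (253) for the demo.
    opts = bytes([2, 4, 0x05, 0xB4, 1, 253, 3, 0xAA])
    print(parse_tcp_options(bytes(header) + opts + b"payload"))
```

Unrecognised kinds come out as plain `(kind, data)` pairs, which is the "keep the others in an array" approach described above.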

Split CRLF between TCP payloads

I'm currently writing a low-level HTTP parser and have run into the following issue:
I am receiving HTTP data on a packet-by-packet basis, i.e. TCP payloads one at a time. When parsing this data, I am using the HTTP protocol standards of searching for CRLF to delineate header lines, chunk data (in the case of chunked-encoding), and the dual CRLF to delineate header from body.
My question is: do I need to worry about the possibility of CRLF being split between two TCP packet payloads? For example, the HTTP header will finish with CRLFCRLF. Is it possible that two subsequent TCP packets will have CR, and then LFCRLF?
I am assuming that yes; this is a case to worry about, since the application (HTTP) and TCP layers are rather independent of each other.
Any insight into this would be highly appreciated, thank you!
Yes, it is possible that the CRLF gets split across different TCP packets. Just think about the possibility that a single HTTP header is exactly one byte longer than the TCP maximum segment size (MSS). In that case, there is only room for the CR, but not for the LF.
So no matter how tricky your code will get, it must be able to handle this case of splitting.
What language are you working in? Does it not have some form of buffered read functionality for the socket, so you don't have this issue?
The short answer to your question is yes, theoretically you do have to worry about it, because it is possible the packets would arrive like that. It is very unlikely, because most HTTP endpoints will tend to send the header in one packet and the body in subsequent packets. This is less by convention and more by the nature of the way most socket-based programs/languages work.
One thing to bear in mind is that while the protocol standards are quite clear about the CRLF separation, many people who implement HTTP (clients in particular, but to some degree servers as well) don't know or care what they are doing and will not obey the rules. They will tend to separate lines with LF only, particularly the blank line between the head and the body; I have seen more code with this problem than I could quickly count. While this is technically a protocol violation, most servers and clients will accept this behaviour and work around it, so you will need to as well.
If you can't do some kind of buffered read, there is some good news. All you need to do is read a packet at a time into memory and append the data to that of the previous packet(s). Every time you have read a packet, scan your data for a double CRLF sequence. If you don't find it, read the next packet, and so on until you find the end of the head. This will use relatively little memory, because the head of a request shouldn't normally be more than 5-6KB, which, given a typical TCP payload of around 1450 bytes per packet on Ethernet, means you shouldn't need to hold more than 4 or 5 packets in memory to cope with it.
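The accumulate-and-scan approach can be sketched in a few lines of Python. The class name `HeaderBuffer` is made up for the example, and the sketch only looks for the strict CRLFCRLF terminator; tolerating bare LF, as mentioned above, would need an extra check.

```python
class HeaderBuffer:
    """Accumulate TCP payloads until the end of the HTTP head is seen."""

    TERMINATOR = b"\r\n\r\n"

    def __init__(self):
        self._buf = bytearray()

    def feed(self, payload):
        """Add one payload; return (head, rest) once complete, else None."""
        self._buf += payload
        # Searching the joined buffer finds the terminator even when it
        # straddles two (or more) payloads.
        end = self._buf.find(self.TERMINATOR)
        if end == -1:
            return None
        head = bytes(self._buf[:end])
        rest = bytes(self._buf[end + len(self.TERMINATOR):])
        return head, rest


if __name__ == "__main__":
    buf = HeaderBuffer()
    # The terminator is split as CRLF CR | LF across two payloads.
    print(buf.feed(b"GET / HTTP/1.1\r\nHost: example.com\r\n\r"))  # None
    print(buf.feed(b"\nbody"))
```

Rescanning the whole buffer on every payload is wasteful for large inputs, but for a head of a few kilobytes it keeps the code simple.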

64/66b encoding

There are a few things I don't understand about 64/66bit encoding, and failed to find the answers to on the web. Any help/links would be greatly appreciated:
i) how is the start of a frame recognised? I don't think it can be by the initial 10/01 bits called the preamble on wikipedia because you cannot tell them apart (if an idle link is 0, then 0000 10 and 000 01 0 look rather similar). I expect the end of a frame is indicated by a control word, with the rest of the bits perhaps used for the CRC?
ii) how do the scramblers synchronise, and how do they avoid scrambling the same packet the same way? Or to put this another way, why is not possible for a malicious user to induce substantial packet loss by carefully choosing a bad message?
iii) this might have been answered in ii), but if a packet is sent to a switch, and then onto another host, is it scrambled the same way both times?
Once again, many thanks in advance
Layers
First of all the OSI model needs to be clear.
The Ethernet frame belongs to the data link layer, while 64b/66b encoding is part of the physical layer (more precisely, the PCS sublayer of the physical layer).
The physical layer doesn't know anything about the start of a frame. It sees only data. (The start of an Ethernet frame is just data bytes, which contain the preamble.)
64b/66b encoding
Now let's assume that the link is up and running.
In this case the idle link is not full of '0's (otherwise the link would not be self-synchronous). Idle messages (idle characters and/or synchronization blocks, i.e. control information) are sent over the idle link. Control information is encoded with the 0b10 preamble (the 2-bit sync header). This is why the emitted spectrum and power dissipation don't depend on whether the link is idle or not.
So the start of a new frame works as follows:
The link sends idle information (with the 0b10 preamble).
The upper layer (data link layer) sends the frame (in 64-bit chunks of data) to the physical layer.
The physical layer sends the data (with the 0b01 preamble) over the link.
(Note that the physical layer frequently inserts control (sync) symbols into the raw frame even during a data burst.)
Synchronization
Before data transmission, a 64b/66b encoded lane must be initialized. This lane initialization includes block synchronization. Xilinx's Aurora specification (P34) is an example of link initialization.
Briefly, the receiver tries to match the sync character at different bit positions, and when it matches multiple times in a row it reports link-up.
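A toy Python sketch of that search is shown below. It is not the state machine from any specification: the function name `find_block_lock` and the `min_blocks` threshold are assumptions, and real hardware slides the candidate position one bit at a time while counting valid headers on a live stream. The idea is the same, though: a valid 2-bit header is always 01 or 10, so at the correct alignment the two bits differ in every block.

```python
def find_block_lock(bits, block_len=66, min_blocks=3):
    """Return the bit offset at which every block starts with 01 or 10.

    Simplified illustration: tries all alignments on a captured bit list.
    """
    for offset in range(block_len):
        starts = range(offset, len(bits) - 1, block_len)
        if len(starts) < min_blocks:
            continue
        # A valid sync header has two different bits (01 or 10).
        if all(bits[s] != bits[s + 1] for s in starts):
            return offset
    return None


if __name__ == "__main__":
    data_block = [0, 1] + [1] * 64     # 01 header + payload (unscrambled toy data)
    control_block = [1, 0] + [0] * 64  # 10 header + payload
    stream = [1] * 5 + (data_block + control_block) * 2  # 5 junk bits first
    print(find_block_lock(stream))  # 5
```

With scrambled payloads a wrong alignment occasionally sees a valid-looking header by chance, which is why a receiver requires many consecutive matches before declaring lock.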
Note that 64b/66b encoding uses a self-synchronous scrambler. This is why the scrambler itself doesn't need to know anything about where we are in the data stream. If you run a self-synchronous descrambler long enough, it produces the decoded bit stream.
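The self-synchronising property can be demonstrated with a bit-level Python sketch. It uses the polynomial 1 + x^39 + x^58, which to my knowledge is the one 10GBASE-R specifies; the function names and the bit-list representation are assumptions for the example, and wire bit ordering is not modelled.

```python
MASK58 = (1 << 58) - 1


def scramble(bits, state=0):
    """Self-synchronous scrambler, polynomial 1 + x^39 + x^58 (bit-serial)."""
    out = []
    for b in bits:
        s = b ^ ((state >> 38) & 1) ^ ((state >> 57) & 1)
        out.append(s)
        # The shift register holds the last 58 *scrambled* bits.
        state = ((state << 1) | s) & MASK58
    return out, state


def descramble(bits, state=0):
    """Matching descrambler; its register is fed by the received bits."""
    out = []
    for s in bits:
        out.append(s ^ ((state >> 38) & 1) ^ ((state >> 57) & 1))
        state = ((state << 1) | s) & MASK58
    return out, state


if __name__ == "__main__":
    data = [(i * i + i // 3) % 2 for i in range(200)]
    tx, _ = scramble(data, state=0x123456789ABCDE & MASK58)
    # The receiver starts from the wrong state on purpose.
    rx, _ = descramble(tx, state=0)
    print(rx[58:] == data[58:])  # True: in sync after 58 received bits
```

Since the descrambler's register is filled with received bits only, after 58 bits it holds exactly what the scrambler's register held, whatever state either side started in. The same sketch also shows that identical data scrambled from two different starting states produces different output.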
Maliciousness
Note that 64b/66b encoding is not encryption. The scrambling won't protect you from eavesdropping or tampering. (Encryption should be placed at a higher level of the OSI model.)
Same packet multiple times
Because the scrambler is in a different state when you send the same packet a second time, the two encoded packets will differ. (Theoretically we can create packets that set the scrambler's shift register back, but we would also need to account for the control symbols, so in practice this is impossible.)

Resources