Computer Networking - Bit stuffing - networking

In bit stuffing, why is a non-information bit always added after five consecutive 1 bits? Is there a reason behind that?

Here is some information from tutorialspoint:
Bit-Stuffing: A pattern of bits of arbitrary length is stuffed in the message to differentiate from the delimiter.
The flag field is some fixed sequence of binary values like 01111110. Now the payload can also have similar pattern, but the machine on the network can get confused and misinterpret that payload data as the flag field (indicating end of frame). So, to avoid the machine getting confused, some bits are stuffed into the payload (especially at points where payload data looks like the flag) so as to differentiate it from flag.
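As a minimal sketch of the idea in the quote above, assume the HDLC-style flag 01111110 and the usual rule of inserting a 0 after every run of five consecutive 1s. Under that rule the stuffed payload can never contain six 1s in a row, so it can never look like the flag. The function names (`stuff`, `unstuff`, `frame`) are illustrative, and bits are kept as strings of "0"/"1" purely for readability.

```python
FLAG = "01111110"  # flag value taken from the quoted text


def stuff(bits: str) -> str:
    """Insert a 0 after every run of five consecutive 1s."""
    out = []
    run = 0
    for b in bits:
        out.append(b)
        if b == "1":
            run += 1
            if run == 5:
                out.append("0")  # stuffed, non-information bit
                run = 0
        else:
            run = 0
    return "".join(out)


def unstuff(bits: str) -> str:
    """Remove the 0 that follows every run of five consecutive 1s."""
    out = []
    run = 0
    i = 0
    while i < len(bits):
        b = bits[i]
        out.append(b)
        if b == "1":
            run += 1
            if run == 5:
                i += 1  # skip the stuffed 0 that must follow
                run = 0
        else:
            run = 0
        i += 1
    return "".join(out)


def frame(payload_bits: str) -> str:
    """Wrap a stuffed payload between two flags."""
    return FLAG + stuff(payload_bits) + FLAG
```

With this rule, a payload that happens to equal the flag (`01111110`) is sent as `011111010`, and the receiver recovers the original by dropping the stuffed bit.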

Related

What Are The Reasons For Bit Shifting A Float Before Sending It Via A Network

I work with Unity and C#. When making multiplayer games, I've been told that for float values such as positions, I should apply a bit shift operator to them before sending and reverse the operation on receipt. I have been told this not only allows for larger number values but also maintains floating-point precision that might otherwise be lost. However, I do not wish to run this operation every time I receive a packet unless I have to, though the bottlenecks seem to be in the actual parsing of the received bytes, especially without message framing and when moving from string to byte array. (But that's another story!)
My questions are:
Are these valid reason to undergo the operation? Are they accurate statements?
If not should I be running bit shift ops on my floats?
If I should, what are the real reasons to do it?
Any additional information would be most appreciated.
One of the resources I'm referring to:
The main reason for converting to and from network byte order is to avoid problems caused by endianness: it ensures that each byte of a multi-byte value (long, int, but also float) is read and written in a way that gives the same result regardless of architecture. This issue can theoretically be ignored if you are sure you are exchanging data between systems with the same endianness, but that's a rather bad idea from the very beginning, as you are simply creating unneeded technical debt and keeping unjustified exceptions in the code ("It all works BUT on the same endianness only. What can go wrong?").
Depending on your app architecture, you can convert the packet payload/data once when you receive it and then use that version further in the code. Also note that you need to encode the data again prior to sending it out.
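To make the byte-order point concrete, here is a small sketch in Python (not C#) using the standard `struct` module. The `!` prefix selects network (big-endian) byte order, so the same bytes come out on any host architecture. The function names and the three-float position layout are assumptions for illustration only.

```python
import struct


def pack_position(x: float, y: float, z: float) -> bytes:
    # "!" = network (big-endian) byte order, "3f" = three 32-bit IEEE 754 floats
    return struct.pack("!3f", x, y, z)


def unpack_position(data: bytes) -> tuple:
    # Decode with the same explicit byte order used by the sender
    return struct.unpack("!3f", data)
```

Because the byte order is stated explicitly on both sides, sender and receiver agree on the layout without either side needing to know the other's native endianness.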

What is the simplest compression technique for a single network packet?

I need a simple compression method for a single network packet,
simple in the sense of a technique that uses the least computation.
Thanks!
lz4 compresses and decompresses very fast. zlib can compress better, but not quite as fast. The "least computation" would be to not compress at all.
The "PPP Predictor Compression Protocol" is one of the lowest-computation algorithms available for single-packet compression.
Source code is available in RFC1978.
The decompressor guesses what the next byte is in the current context.
If it guesses correctly, the next bit from the compressed text is "1";
If it guesses incorrectly, the next bit from the compressed text is "0" and the next byte from the compressed text is passed through literally (and the guess table for this context is updated so next time it guesses this literal byte).
The compressor attempts to compress the data field of the packet.
Even if half of the guesses are wrong, the compressed data field will end up smaller than the plaintext data, and the compressed flag for that packet is set to 1, and the compressed data is sent in the packet.
If too many more guesses are wrong, however, the "compressed" data ends up the same or even longer than the plaintext -- so the compressor instead sets the compressed flag for that packet to 0, and simply sends the raw plaintext in the packet.
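The guess-table scheme described above can be sketched roughly as follows. This is a simplified illustration in the spirit of RFC 1978, not a wire-compatible implementation: the hash update, the grouping of eight guess bits into one flag byte, and the 3-byte packet header (compressed flag plus 16-bit original length) are assumptions made for this example.

```python
TABLE_SIZE = 1 << 16  # one guessed byte per 16-bit context


def _next_hash(h: int, byte: int) -> int:
    # Mix the newest byte into a 16-bit context of recent bytes
    return ((h << 4) ^ byte) & 0xFFFF


def compress(data: bytes) -> bytes:
    table = bytearray(TABLE_SIZE)
    h = 0
    out = bytearray()
    for start in range(0, len(data), 8):
        flags = 0
        literals = bytearray()
        for bit, byte in enumerate(data[start:start + 8]):
            if table[h] == byte:
                flags |= 1 << bit      # guess was right: emit only a 1 bit
            else:
                table[h] = byte        # guess was wrong: learn and emit literal
                literals.append(byte)
            h = _next_hash(h, byte)
        out.append(flags)
        out += literals
    return bytes(out)


def decompress(blob: bytes, length: int) -> bytes:
    table = bytearray(TABLE_SIZE)
    h = 0
    out = bytearray()
    pos = 0
    while len(out) < length:
        flags = blob[pos]
        pos += 1
        for bit in range(min(8, length - len(out))):
            if flags & (1 << bit):
                byte = table[h]        # correct guess: take it from the table
            else:
                byte = blob[pos]       # wrong guess: read the literal byte
                pos += 1
                table[h] = byte
            out.append(byte)
            h = _next_hash(h, byte)
    return bytes(out)


def encode_packet(data: bytes) -> bytes:
    # Header: compressed flag (1 byte) + original length (2 bytes, so < 65536)
    packed = compress(data)
    header_len = len(data).to_bytes(2, "big")
    if len(packed) < len(data):
        return b"\x01" + header_len + packed
    return b"\x00" + header_len + data  # compression did not help: send raw


def decode_packet(packet: bytes) -> bytes:
    flag = packet[0]
    length = int.from_bytes(packet[1:3], "big")
    body = packet[3:]
    return decompress(body, length) if flag else body[:length]
```

Repetitive data such as `b"abc" * 40` is sent with the compressed flag set, while data with no repeated context, such as `bytes(range(256))`, falls back to the raw form.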
There are two basic types of compression: loss-less and lossy. Loss-less means that if you have two algorithms c(msg) which is the compression algorithm and d(msg) which is the decompression algorithm then
msg == d(c(msg))
Of course then, this implies that a lossy compression would be:
msg != d(c(msg))
With some information, lossy is ok. This is typically how sound is handled. You can lose some bits without any noticeable loss. MP3 compression works this way, for example. Lossy algorithms are usually specific to the type of information that you are compressing.
So, it really depends upon the data that you are transmitting. I assume that you are speaking strictly about the payload and not any of the addressing fields and that you are interested in loss-less compression. The simplest would be run length encoding (RLE). In RLE, you basically find duplicate successive values and you replace the values with a flag followed by a count followed by the value to repeat. You should only do this if the length of the run is greater than (or equal to) the length of the tuple
(sizeof(flag)+sizeof(count)+sizeof(value)).
This can work really well if each data value uses fewer than 8 bits (for example, 7-bit ASCII), in which case you can use the 8th bit as the flag. So for instance if you had three 'A's "AAA" then in hex that would be 414141. You can encode that to C103. In this case, the max run would be 255 (FF) before you would have to start the compression sequence again if there were more than 255 characters of the same value. So in this case 3 bytes becomes 2 bytes. In the best case, 255 7-bit values of the same value would then be 2 characters instead of 255.
A simple state machine could be used to handle the RLE.
See http://en.wikipedia.org/wiki/Run-length_encoding
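A minimal sketch of the high-bit RLE variant described above, assuming every input byte is a 7-bit value. A run is written as the value with bit 7 set, followed by a one-byte count. The `MIN_RUN` threshold of 3 is an assumption chosen so that an encoded run is always shorter than the run it replaces.

```python
MIN_RUN = 3    # a 2-byte (value|flag, count) pair only saves space from 3 up
MAX_RUN = 255  # the count is a single byte


def rle_encode(data: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(data):
        value = data[i]
        if value > 0x7F:
            raise ValueError("input must contain only 7-bit values")
        # Measure the run of identical bytes starting at i
        run = 1
        while i + run < len(data) and data[i + run] == value and run < MAX_RUN:
            run += 1
        if run >= MIN_RUN:
            out += bytes([value | 0x80, run])  # flag bit set, then count
        else:
            out += bytes([value]) * run        # short run: copy literally
        i += run
    return bytes(out)


def rle_decode(blob: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(blob):
        b = blob[i]
        if b & 0x80:                           # flagged: next byte is the count
            out += bytes([b & 0x7F]) * blob[i + 1]
            i += 2
        else:
            out.append(b)
            i += 1
    return bytes(out)
```

For the example in the text, `rle_encode(b"AAA")` yields the two bytes C1 03.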

Is there a good way to frame a protocol so data corruption can be detected in every case?

Background: I've spent a while working with a variety of device interfaces and have seen a lot of protocols, many serial and UDP in which data integrity is handled at the application protocol level. I've been seeking to improve my receive routine handling of protocols in general, and considering the "ideal" design of a protocol.
My question is: is there any protocol framing scheme out there that can definitively identify corrupt data in all cases? For example, consider the standard framing scheme of many protocols:
Field: Length in bytes
<SOH>: 1
<other framing information>: arbitrary, but fixed for a given protocol
<length>: 1 or 2
<data payload etc.>: based on length field (above)
<checksum/CRC>: 1 or 2
<ETX>: 1
For the vast majority of cases, this works fine. When you receive some data, you search for the SOH (or whatever your start byte sequence is), move forward a fixed number of bytes to your length field, and then move that number of bytes (plus or minus some fixed offset) to the end of the packet to your CRC, and if that checks out you know you have a valid packet. If you don't have enough bytes in your input buffer to find an SOH or to have a CRC based on the length field, then you wait until you receive enough to check the CRC. Disregarding CRC collisions (not much we can do about that), this guarantees that your packet is well formed and uncorrupted.
However, if the length field itself is corrupt and has a high value (which I'm running into), then you can't check the (corrupt) packet's CRC until you fill up your input buffer with enough bytes to meet the corrupt length field's requirement.
So is there a deterministic way to get around this, either in the receive handler or in the protocol design itself? I can set a maximum packet length or a timeout to flush my receive buffer in the receive handler, which should solve the problem on a practical level, but I'm still wondering if there's a "pure" theoretical solution that works for the general case and doesn't require setting implementation-specific maximum lengths or timeouts.
Thanks!
The reason why all protocols I know of, including those handling "streaming" data, chop up the datastream in smaller transmission units each with their own checks on board is exactly to avoid the problems you describe. Probably the fundamental flaw in your protocol design is that the blocks are too big.
The accepted answer of this SO question contains a good explanation and a link to a very interesting (but rather heavy on math) paper about this subject.
So in short, you should stick to smaller transmission units not only because of practical programming-related arguments but also because of the message length's role in determining the security offered by your CRC.
One way would be to encode the length parameter so that it would be easily detected to be corrupted, and save you from reading in the large buffer to check the CRC.
For example, the XModem protocol embeds an 8-bit packet number followed by its one's complement.
Applying the same idea to your length field could mean doubling its size, but it's an option.
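As a rough sketch of that suggestion, the frame below carries a one-byte length followed by its one's complement, so the receiver can reject most corrupt lengths as soon as the 3-byte header has arrived instead of waiting for a bogus number of payload bytes. The layout (SOH, length, complemented length, payload, 2-byte CRC, ETX) and the function names are assumptions for illustration. The CRC comes from Python's `binascii.crc_hqx`. This does not detect every possible corruption, since a length and its complement can in principle be corrupted consistently, so a maximum length or timeout is still worth keeping.

```python
import binascii

SOH, ETX = 0x01, 0x03
HEADER_LEN = 3   # SOH + length + ~length
TRAILER_LEN = 3  # 2-byte CRC + ETX


def build_frame(payload: bytes) -> bytes:
    if len(payload) > 255:
        raise ValueError("payload too long for a 1-byte length field")
    length = len(payload)
    header = bytes([SOH, length, length ^ 0xFF])  # length and its complement
    crc = binascii.crc_hqx(payload, 0xFFFF)
    return header + payload + crc.to_bytes(2, "big") + bytes([ETX])


def extract_frame(buf: bytes):
    """Return (payload, rest). payload is None if no complete valid frame yet."""
    while True:
        start = buf.find(bytes([SOH]))
        if start < 0:
            return None, b""           # no start byte at all: discard noise
        buf = buf[start:]
        if len(buf) < HEADER_LEN:
            return None, buf           # wait for the rest of the header
        length, check = buf[1], buf[2]
        if length ^ check != 0xFF:
            buf = buf[1:]              # corrupt header: resync immediately
            continue
        total = HEADER_LEN + length + TRAILER_LEN
        if len(buf) < total:
            return None, buf           # header looks sane: wait for the body
        payload = buf[HEADER_LEN:HEADER_LEN + length]
        crc = int.from_bytes(buf[HEADER_LEN + length:total - 1], "big")
        if crc == binascii.crc_hqx(payload, 0xFFFF) and buf[total - 1] == ETX:
            return payload, buf[total:]
        buf = buf[1:]                  # bad CRC or ETX: skip this SOH and retry
```

A frame whose length byte has been corrupted to a large value is dropped as soon as its header is seen, and parsing continues with whatever follows it in the buffer.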

How to determine the length of an Ethernet II frame?

The Ethernet II frame format does not contain a length field, and I'd like to understand how the end of a frame can be detected without it.
Unfortunately, I have no idea of physics, but the following sounds reasonable to me: we assume that Layer 1 (Physical Layer) provides us with a way of transmitting raw bits in such a way that it is possible to distinguish between the situation where bits are being sent and the situation where nothing is sent (if digital data was coded into analog signals via phase modulation, this would be true, for example, but I don't know if this is really what's done). In this case, an Ethernet card could simply wait until a certain time interval occurs where no more bits are being transmitted, and then decide that the frame transmission has to be finished.
Is this really what's happening?
If yes: where can I find these things, and what are common values for the length of this "certain time interval"? Why does IEEE 802.3 have a length field?
If not: how is it done instead?
Thank you for your help!
Hanno
Your assumption is right. The length field inside the frame is not needed for layer1.
Layer1 uses other means to detect the end of a frame which vary depending on the type of physical layer.
With 10Base-T, a frame is followed by a TP_IDL waveform. The lack of further Manchester-coded data bits can be detected.
With 100Base-T, a frame is ended with an End of Stream Delimiter bit pattern that cannot occur in payload data (because of its 4B/5B encoding).
You can find a rough description here, for example:
http://ww1.microchip.com/downloads/en/AppNotes/01120a.pdf "Ethernet Theory of Operation"

Packet data structure?

I'm designing a game server and I have never done anything like this before. I was just wondering what a good structure for a packet would be data-wise? I am using TCP if it matters. Here's an example, and what I was considering using as of now:
(each value in brackets is a byte)
[Packet length][Action ID][Number of Parameters]
[Parameter 1 data length as int][Parameter 1 data type][Parameter 1 data (multi byte)]
[Parameter 2 data length as int][Parameter 2 data type][Parameter 2 data (multi byte)]
[Parameter n data length as int][Parameter n data type][Parameter n data (multi byte)]
Like I said, I really have never done anything like this before so what I have above could be complete bull, which is why I'm asking ;). Also, is passing the total packet length even necessary?
Passing the total packet length is a good idea. It might cost two more bytes, but you can peek and wait for the socket to have a full packet ready to read before receiving. That makes the code easier.
Overall, I agree with brazzy: a language-supplied serialization mechanism is preferable to any self-made one.
Other than that (I think you are using a C-ish language without serialization), I would put the packet ID as the first data on the packet data structure. IMHO that's some sort of convention because the first data member of a struct is always at position 0 and any struct can be downcast to that, identifying otherwise anonymous data.
Your compiler may or may not produce packed structures, but that way you can allocate a buffer, read the packet in, and then cast it to the appropriate structure depending on the first data member. If you are out of luck and it does not produce packed structures, be sure to have a serialization method for each struct that will construct it from the (obviously non-destination) memory.
Endianness is a factor, particularly in C-like languages. Be sure to make clear that packets always use the same endianness, or that you can identify a different endianness based on a signature or something similar. An odd thing that's very cool: C# and .NET seem to always hold data in little-endian convention when you access them as discussed in this post here. I found that out when porting such an application to Mono on a SUN. Cool, but if you have that setup you should use the serialization means of C# anyway.
Other than that, your setup looks very okay!
Start by considering a much simpler basic wrapper: Tag, Length, Value (TLV). Your basic packet will look then like this:
[Tag] [Length] [Value]
Tag is a packet identifier (like your action ID).
Length is the packet length. You may need this to tell whether you have the full packet. It will also let you figure out how long the value portion is.
Value contains the actual data. The format of this can be anything.
In your case above, the value data contains a further series of TLV structures (parameter type, length, value). You don't actually need to send the number of parameters, as you can work it from the data length and walking the data.
As others have said, I would put the packet ID (Tag) first. Unless you have cross-platform concerns, I would consider wrapping your application's serialised object in a TLV and sending it across the wire like that. If you make a mistake or want to change later, you can always create a new tag with a different structure.
See Wikipedia for more details on TLV.
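A minimal TLV sketch, assuming a 1-byte tag and a 2-byte big-endian length (the field sizes, function names, and tag values are illustrative choices, not something the question fixes). The decoder walks the buffer, which is also how the number of parameters can be recovered without sending it.

```python
import struct

HEADER = struct.Struct("!BH")  # assumed: 1-byte tag, 2-byte big-endian length


def tlv_encode(tag: int, value: bytes) -> bytes:
    return HEADER.pack(tag, len(value)) + value


def tlv_decode_all(data: bytes) -> list:
    """Walk back-to-back TLVs and return [(tag, value), ...]."""
    items = []
    offset = 0
    while offset < len(data):
        tag, length = HEADER.unpack_from(data, offset)
        offset += HEADER.size
        value = data[offset:offset + length]
        if len(value) != length:
            raise ValueError("truncated TLV value")
        items.append((tag, value))
        offset += length
    return items


# Hypothetical tags for illustration
ACTION_MOVE, TYPE_INT, TYPE_STR = 1, 10, 11

# The outer value is itself a series of parameter TLVs
params = tlv_encode(TYPE_INT, struct.pack("!i", 42)) + tlv_encode(TYPE_STR, b"bob")
packet = tlv_encode(ACTION_MOVE, params)
```

Decoding `packet` with `tlv_decode_all` gives one outer item, and decoding that item's value gives the two parameters, so the parameter count is simply the length of the inner list.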
To avoid reinventing the wheel, any serialization protocol will work for on the wire data (e.g. XML, JSON), and you might consider looking at BEEP for the basic protocol framework.
BEEP is summed up well in its FAQ document as 'kind of a "best hits" album of the tricks used by experienced application protocol designers since the early 80's.'
There's no reason to make something as complicated as that. I see that you have an action ID, so I suppose there would be a fixed number of actions.
For each action, you would define a data structure, and then you would put each one of those values in the structure. To send it over the wire, you just allocate sum(sizeof(struct.i)) bytes, where the sum runs over every element i in your structure. So your packet would look like this:
[action ID][item 1 (sizeof(item 1) bytes)][item 2 (sizeof(item 2) bytes)]...[item n (sizeof(item n) bytes)]
The idea is, you already know the size and type of each variable on each side of the connection, so you don't need to send that information.
For strings, you can just throw 'em in in a null terminated form, and then when you 'know' to look for a string based on your packet type, start reading and looking for a null.
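The fixed-layout approach might look like the sketch below, where both sides share one layout per action ID and nothing but the values goes on the wire. The two actions, their fields, and the big-endian 16-bit coordinates are made up for illustration.

```python
import struct

# Hypothetical action IDs known to both client and server
ACTION_MOVE = 1
ACTION_CHAT = 2

# Layout for MOVE: action ID (1 byte), x and y (signed 16-bit, big-endian)
MOVE_LAYOUT = struct.Struct("!Bhh")


def pack_move(x: int, y: int) -> bytes:
    return MOVE_LAYOUT.pack(ACTION_MOVE, x, y)


def pack_chat(text: str) -> bytes:
    # Strings go in null-terminated form, as suggested above
    return bytes([ACTION_CHAT]) + text.encode("utf-8") + b"\x00"


def unpack(packet: bytes) -> tuple:
    action = packet[0]  # the action ID tells us which layout follows
    if action == ACTION_MOVE:
        _, x, y = MOVE_LAYOUT.unpack(packet)
        return ("move", x, y)
    if action == ACTION_CHAT:
        end = packet.index(b"\x00", 1)  # read up to the terminating null
        return ("chat", packet[1:end].decode("utf-8"))
    raise ValueError(f"unknown action ID {action}")
```

A move to (3, -4) then takes five bytes on the wire, with no per-field type or length information.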
--
Another option would be to use '\r\n' to delineate your variables. That would require some overhead, and you would have to use text, rather than binary values, for numbers. But that way you could just use readline to read each variable. Your packets would look like this:
[action ID]
[item 1 (as text)]
...
[item n (as text)]
--
Finally, simply serializing objects and passing them down the wire is a good way to do this too, with the least amount of code to write. Remember that you don't want to prematurely optimize, and that includes network traffic as well. If it turns out you need to squeeze out a little bit more performance later on you can go back and figure out a more efficient mechanism.
And check out Google's protocol buffers, which are supposedly an extremely fast way to serialize data in a platform-neutral way, kind of like a binary XML, but without nested elements. There's also JSON, which is another platform-neutral encoding. Using protocol buffers or JSON would mean you wouldn't have to worry about how to specifically encode the messages.
Do you want the server to support multiple clients written in different languages? If not, it's probably not necessary to specify the structure exactly; instead use whatever facility for serializing data your language offers, simply to reduce the potential for errors.
If you do need the structure to be portable, the above looks OK, though you should specify stuff like endianness and text encoding as well in that case.

Resources