Why are protocol headers like TCP or UDP normally written using hex representation when filling in the particular fields of the header? Is there any specific advantage?
Depending upon the fields in question (flags such as SYN, FIN, ACK, RST, URG, PSH, ...), it is easiest to set the fields using bit-shifting arithmetic (0x1 << TCP_OFFSET_SYN) and to OR (|) or AND (&) the result with the existing field. Bit shifts just go easier with hexadecimal than with decimal, and hex is often more convenient to read than octal.
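For instance, a rough C sketch of that style (the flag names and bit positions here are illustrative, not taken from any particular header file):

    /* Illustrative only: setting and testing TCP-style flag bits with shifts
     * and masks. The hex constants line up with the bit positions, which is
     * why hex tends to read better than decimal here. */
    #include <stdio.h>
    #include <stdint.h>

    #define TCP_FLAG_FIN  (0x1 << 0)   /* 0x01 */
    #define TCP_FLAG_SYN  (0x1 << 1)   /* 0x02 */
    #define TCP_FLAG_RST  (0x1 << 2)   /* 0x04 */
    #define TCP_FLAG_PSH  (0x1 << 3)   /* 0x08 */
    #define TCP_FLAG_ACK  (0x1 << 4)   /* 0x10 */
    #define TCP_FLAG_URG  (0x1 << 5)   /* 0x20 */

    int main(void)
    {
        uint8_t flags = 0;

        flags |= TCP_FLAG_SYN | TCP_FLAG_ACK;   /* set SYN and ACK */
        flags &= ~TCP_FLAG_SYN;                 /* clear SYN again */

        if (flags & TCP_FLAG_ACK)
            printf("ACK set, flags = 0x%02X\n", flags);  /* prints 0x10 */
        return 0;
    }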
It boils down to, whoever wrote the code you're reading probably thought hex was more understandable than decimal in that instance, but this is obviously subjective. Your opinion may vary. :)
Well, that's just for convenience. In reality, if you inspect the actual packet, it's all binary data, which is shown as hex for the sake of readability.
I'm a bit confused about how people represent binary data and how it is sent over networks. I will explain through Wikipedia's example (shown here: https://imgur.com/a/POELH). I have my binary data encoded as Base64, and I am sending the text TWFu. So I am sending T, then W, then F, and finally u. But to send T, a character, I will need one byte, like I've always been told: one character sent over a network is one byte.
Because now I've come to think that if I encode 24 bits (3 bytes), I will be sending 4 characters, but to send those 4 characters I need the same number of bytes as characters??
So when sending "Man" (unencoded, requiring 3 bytes) vs "TWFu" (encoded, requiring 4 bytes) over the network, as in the example above, is the sequence of bits sent over the network the same? Because the last time I used a socket to send data, it just asked for a string input, never a text + encoding input.
Synopsis: "How" is an agreement. "Raw" is common.
Data is sent in whichever way the sender and receiver agree. There are many protocols that are standard agreements. Protocols operate at many levels. A very common pair that covers two levels is TCP/IP. Many higher-level protocols are layered on top of them. (A higher-level protocol may or may not depend on specific underlying protocols.) HTTP and SMTP are very common higher-level protocols, often with SSL sandwiched in between.
Sometimes the layers or the software that implements them is called a stack. There is also the reference (or conceptual) OSI Model. The key point about it is that it provides a language to talk about different layers. The layers it defines may or may not map to any specific stack.
Your question is too vague to answer directly. With HTTP, "raw" binary data is transferred all the time. The HTTP headers can give the length of the body in octets, and the body follows the header. As part of the agreement between the sender and receiver, the header might give meta-data about the binary data using MIME headers. For example, your gravatar is sent with headers including:
content-length:871
content-type:image/png
That's enough for the receiver to know that the sender claims that it is a PNG graphic of 871 bytes. The receiver will read the header and then read 871 bytes for the body and then assume that what follows is another HTTP header.
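As a rough sketch of what the receiver's side of that agreement looks like (a deliberately naive parse for illustration, not a real HTTP parser; real headers are case-insensitive and need more careful handling):

    /* Naive sketch: pull the Content-Length value out of a received header
     * block so we know how many body octets to read next. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    static long content_length(const char *headers)
    {
        const char *p = strstr(headers, "content-length:");
        return p ? strtol(p + strlen("content-length:"), NULL, 10) : -1;
    }

    int main(void)
    {
        const char *hdr =
            "HTTP/1.1 200 OK\r\n"
            "content-type:image/png\r\n"
            "content-length:871\r\n"
            "\r\n";

        long body_len = content_length(hdr);
        printf("expecting %ld octets of body\n", body_len);   /* 871 */
        /* A real receiver would now read exactly body_len bytes from the
         * socket, then treat whatever follows as the next HTTP header. */
        return 0;
    }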
Some protocols use synchronization methods other than bodies with pre-declared sizes. They might be entirely text-based and use a syntax that allows only certain characters. They can be extended by a nested agreement to use something like Base64 to represent binary data as text.
Some layers might provide data compression of sufficient density that expansion by higher layers, such as Base64, is not a great concern. See HTTP Compression, for example.
If you want to see HTTP in action, hit F12 and go to the Network tab. If you want to see other protocols active on your computer, try Wireshark, Microsoft Message Analyzer, Fiddler or similar tools.
Base64 is a method for encoding arbitrary 8-bit data in a purely 7-bit channel. As much as the internet is based on the principle of 8-bit bytes, for text mode it's presumed to be 7-bit ASCII unless otherwise specified.
If you're sending that data Base64-encoded then you'll literally send TWFu. Many text-based protocols use Base64 out of convenience: it's an established standard and it's efficient enough for most applications.
The foundation of the internet, IP, is a protocol based on 8-bit bytes. When sending binary data you can make full use of all 8 bits, but if you're working with a text-mode protocol, of which there are many, you're generally stuck using 7-bit ASCII unless the protocol has a way of specifying which character set or encoding you're using.
If you have the option to switch to a "binary" transfer then you can side-step the need for Base64. If you're working with a 7-bit ASCII protocol then you're probably going to need Base64.
Note this isn't the only method for encoding arbitrary binary data as text. There's also quoted-printable, as used in email, and URI encoding for URLs. These are more efficient in cases where escaping is exceptional, but far less efficient if it's required for every character.
If you know you're dealing with 7-bit text only there's no need for base-64 encoding.
However, if you'd need to send
Man
Boy
over a purely 7-bit text channel, you couldn't send it literally with the line breaks. Instead, you'd send it encoded in Base64:
TWFuDQpCb3kNCg==
which has encoded line breaks but doesn't use incompatible characters. Of course, the receiver needs to know that you're sending encoded text - either implied by the protocol or explicitly marked in some way.
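To make the mechanics concrete, here is a minimal Base64 encoder sketch (standard alphabet, no production hardening) that produces exactly that string from the bytes of "Man", CRLF, "Boy", CRLF:

    #include <stdio.h>
    #include <string.h>

    static const char B64[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    /* Encode len bytes from src into dst as a NUL-terminated string.
     * dst must have room for 4 * ((len + 2) / 3) + 1 characters. */
    static void base64_encode(const unsigned char *src, size_t len, char *dst)
    {
        size_t i;
        for (i = 0; i + 2 < len; i += 3) {           /* full 3-byte groups */
            unsigned v = (src[i] << 16) | (src[i + 1] << 8) | src[i + 2];
            *dst++ = B64[(v >> 18) & 0x3F];
            *dst++ = B64[(v >> 12) & 0x3F];
            *dst++ = B64[(v >> 6) & 0x3F];
            *dst++ = B64[v & 0x3F];
        }
        if (len - i == 1) {                          /* one byte left: "xx==" */
            unsigned v = src[i] << 16;
            *dst++ = B64[(v >> 18) & 0x3F];
            *dst++ = B64[(v >> 12) & 0x3F];
            *dst++ = '=';
            *dst++ = '=';
        } else if (len - i == 2) {                   /* two bytes left: "xxx=" */
            unsigned v = (src[i] << 16) | (src[i + 1] << 8);
            *dst++ = B64[(v >> 18) & 0x3F];
            *dst++ = B64[(v >> 12) & 0x3F];
            *dst++ = B64[(v >> 6) & 0x3F];
            *dst++ = '=';
        }
        *dst = '\0';
    }

    int main(void)
    {
        const char *msg = "Man\r\nBoy\r\n";
        char out[64];
        base64_encode((const unsigned char *)msg, strlen(msg), out);
        printf("%s\n", out);   /* TWFuDQpCb3kNCg== */
        return 0;
    }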
This answer to the question "What is base 64 encoding used for?" says:
When you have some binary data that you want to ship across a network, you generally don't do it by just streaming the bits and bytes over the wire in a raw format. Why? because some media are made for streaming text. You never know -- some protocols may interpret your binary data as control characters (like a modem), or your binary data could be screwed up because the underlying protocol might think that you've entered a special character combination (like how FTP translates line endings).
I've used sockets in Java a hundred times to send binary data over networks. And as far as I know it is very common to send binary data over networks, especially if you have big data. I don't see why some devices could interpret binary data wrongly, since it contains the TCP header etc.
SOAP MTOM also sends binary data over networks.
Am I misunderstanding something? I'm confused, because this post has many upvotes and is accepted.
The answer you link to isn't incorrect, it just fails to explicitly mention some examples. The answer is in the quote as well:
because some media are made for streaming text
Sockets deal in bytes, they don't care what they transport. It is the higher-level protocols, or the message formats they transport, that do.
It's when this binary data is wrapped in envelopes of such protocols or formats that they can wreak havoc. A less than (<) character in image bytes is perfectly valid, but when used in an XML message, it will break the XML. Other characters, like control characters, can have an influence on how further data is to be interpreted by a protocol handler.
So base64 is used to wrap binary data in a safe-for-transport way where that would otherwise not be safe.
I would like to receive and send bytes that have special meanings in ASCII, like End of Text, End of Transmission, etc., but I am not sure if that is allowed. Can it break my communication? Sending and receiving look like reading from and writing to a file, which is why I doubt I can use these specific values directly. I use Windows.
EDIT: I have tested it and there is no problem with any value. All ASCII control characters can be sent via RS-232; neither reading nor writing causes unexpected behaviour.
RS-232 is a very binary protocol. It does not even assume 8-bit bytes, let alone ASCII. The fact that on Windows you use file functions does not matter either. Those too do not assume text data, although they do assume 8-bit bytes.
RS-232 nodes do not interpret the data except in software flow control mode (XOn/XOff). You use this mode only if both parties agree and agree on the values of XOn and XOff.
The values are historically based on the ASCII DC1 and DC3 characters, but the only thing that matters is their values: 0x11 and 0x13.
If you haven't set up software flow control, all values are passed through as-is.
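For example, on Windows you can make that explicit when you open the port (a rough sketch with the port name hard-coded and error handling mostly omitted):

    /* Rough Windows sketch: open a serial port and explicitly disable
     * software flow control, so bytes like 0x03 (ETX), 0x04 (EOT),
     * 0x11 (DC1/XOn) and 0x13 (DC3/XOff) pass through untouched. */
    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        HANDLE h = CreateFileA("\\\\.\\COM1", GENERIC_READ | GENERIC_WRITE,
                               0, NULL, OPEN_EXISTING, 0, NULL);
        if (h == INVALID_HANDLE_VALUE) {
            printf("could not open COM1\n");
            return 1;
        }

        DCB dcb;
        ZeroMemory(&dcb, sizeof(dcb));
        dcb.DCBlength = sizeof(dcb);
        GetCommState(h, &dcb);
        dcb.fOutX = FALSE;    /* do not emit XOn/XOff for flow control */
        dcb.fInX  = FALSE;    /* do not interpret incoming XOn/XOff */
        SetCommState(h, &dcb);

        /* STX, ETX, EOT, DC1, DC3 -- all sent as plain byte values */
        const unsigned char ctrl[] = { 0x02, 0x03, 0x04, 0x11, 0x13 };
        DWORD written = 0;
        WriteFile(h, ctrl, sizeof(ctrl), &written, NULL);

        CloseHandle(h);
        return 0;
    }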
I need a simple compression method for a single network packet, simple in the sense of a technique that uses the least computation.
Thanks!
lz4 compresses and decompresses very fast. zlib can compress better, but not quite as fast. The "least computation" would be to not compress at all.
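For what it's worth, a small zlib sketch at the fastest setting might look like this (the payload is made up; link with -lz):

    #include <stdio.h>
    #include <zlib.h>

    int main(void)
    {
        const unsigned char payload[] =
            "AAAAAAAAAAAAAAAABBBBBBBBCCCCCCCCAAAAAAAAAAAAAAAA";
        unsigned char packed[128];
        uLongf packed_len = sizeof(packed);

        /* Z_BEST_SPEED (level 1) trades compression ratio for less CPU. */
        int rc = compress2(packed, &packed_len, payload, sizeof(payload),
                           Z_BEST_SPEED);
        if (rc != Z_OK) {
            printf("compress2 failed: %d\n", rc);
            return 1;
        }
        printf("%lu -> %lu bytes\n", (unsigned long)sizeof(payload),
               (unsigned long)packed_len);
        return 0;
    }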
The "PPP Predictor Compression Protocol" is one of the lowest-computation algorithms available for single-packet compression.
Source code is available in RFC1978.
The decompressor guesses what the next byte is in the current context.
If it guesses correctly, the next bit from the compressed text is "1";
If it guesses incorrectly, the next bit from the compressed text is "0" and the next byte from the compressed text is passed through literally (and the guess table for this context is updated so next time it guesses this literal byte).
The compressor attempts to compress the data field of the packet.
Even if half of the guesses are wrong, the compressed data field will end up smaller than the plaintext data, and the compressed flag for that packet is set to 1, and the compressed data is sent in the packet.
If too many guesses are wrong, however, the "compressed" data ends up the same size or even longer than the plaintext -- so the compressor instead sets the compressed flag for that packet to 0 and simply sends the raw plaintext in the packet.
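A compressor sketch loosely following the pseudo-code in RFC 1978 (the hash and table size come from the RFC; the framing, the compressed-flag handling and the decompressor are left out):

    #include <stdio.h>

    static unsigned char  guess_table[65536];   /* starts zeroed, like the RFC */
    static unsigned short hash_state;

    #define HASH(x) (hash_state = (unsigned short)((hash_state << 4) ^ (x)))

    /* Returns the compressed length. The caller compares it with len and
     * falls back to sending the raw packet if compression did not help. */
    static int predictor_compress(const unsigned char *src, int len,
                                  unsigned char *dst)
    {
        unsigned char *out = dst;
        while (len > 0) {
            unsigned char *flag_pos = out++;    /* reserve the flag byte */
            unsigned char flags = 0;
            for (int bit = 0; bit < 8 && len > 0; bit++, src++, len--) {
                if (guess_table[hash_state] == *src) {
                    flags |= (unsigned char)(1 << bit); /* right: just one bit */
                } else {
                    guess_table[hash_state] = *src;     /* learn for next time */
                    *out++ = *src;                      /* wrong: emit literal */
                }
                HASH(*src);
            }
            *flag_pos = flags;
        }
        return (int)(out - dst);
    }

    int main(void)
    {
        const unsigned char pkt[] = "the quick brown fox the quick brown fox";
        unsigned char buf[sizeof(pkt) + sizeof(pkt) / 8 + 2];
        int n = predictor_compress(pkt, (int)sizeof(pkt), buf);
        printf("%lu -> %d bytes\n", (unsigned long)sizeof(pkt), n);
        return 0;
    }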
There are two basic types of compression: loss-less and lossy. Loss-less means that if you have two algorithms c(msg) which is the compression algorithm and d(msg) which is the decompression algorithm then
msg == d(c(msg))
Of course then, this implies that a lossy compression would be:
msg != d(c(msg))
With some information, lossy is ok. This is typically how sound is handled. You can lose some bits without any noticeable loss. MP3 compression works this way, for example. Lossy algorithms are usually specific to the type of information that you are compressing.
So, it really depends upon the data that you are transmitting. I assume that you are speaking strictly about the payload and not any of the addressing fields and that you are interested in loss-less compression. The simplest would be run length encoding (RLE). In RLE, you basically find duplicate successive values and you replace the values with a flag followed by a count followed by the value to repeat. You should only do this if the length of the run is greater (or equal) to the length of the tuple
(sizeof(flag) + sizeof(count) + sizeof(value)).
This can work really well if the data values are less than 8 bits wide, in which case you can use the 8th bit as the flag. So for instance, if you had three 'A's, "AAA", then in hex that would be 41 41 41. You can encode that as C1 03. In this case, the maximum run would be 255 (FF) before you would have to start a new compression sequence if there were more than 255 characters of the same value. So in this case 3 bytes become 2 bytes. In the best case, 255 identical 7-bit values would become 2 bytes instead of 255.
A simple state machine could be used to handle the RLE.
See http://en.wikipedia.org/wiki/Run-length_encoding
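A sketch of that flag-in-the-high-bit scheme (it assumes the input really is 7-bit data, so the 8th bit is free to mark a run):

    #include <stdio.h>

    /* A run of N identical bytes (3 <= N <= 255) becomes two bytes:
     * (value | 0x80) followed by the count; anything else is copied as-is. */
    static int rle_encode(const unsigned char *src, int len, unsigned char *dst)
    {
        unsigned char *out = dst;
        int i = 0;
        while (i < len) {
            int run = 1;
            while (i + run < len && src[i + run] == src[i] && run < 255)
                run++;
            if (run >= 3) {                    /* long enough to save space */
                *out++ = src[i] | 0x80;        /* flag bit + value */
                *out++ = (unsigned char)run;   /* count */
            } else {
                for (int k = 0; k < run; k++)
                    *out++ = src[i];           /* short runs stay literal */
            }
            i += run;
        }
        return (int)(out - dst);
    }

    int main(void)
    {
        const unsigned char msg[] = "AAAB";
        unsigned char out[16];
        int n = rle_encode(msg, 4, out);       /* encode the 4 data bytes */

        for (int k = 0; k < n; k++)
            printf("%02X ", out[k]);           /* prints: C1 03 42 */
        printf("\n");
        return 0;
    }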
Here is a question I've been trying to solve for quite some time. It does not pertain to a particular language, although it's not really relevant for those that have a VM that specifies endianness. Like 99.9999% of people who use sockets to send data using TCP/IP, I know that the protocol specifies an endianness for the transmission elements, like the destination address, port and such. The thing I don't know is whether it requires the payload to be in a specific format to prevent incompatibilities.
For example, let's say I develop a protocol that is not a presentation layer, and that, due to the immense dominance that little-endian devices have nowadays, I decide to make it little endian (for example, the positions of the players and such are transmitted in little-endian order), say a network module for a game engine, where latency matters and byte conversion would cost a noticeable amount of time. Of course the address, port and all of the protocol-related data would be specified in big endian, as is mandatory; I'm talking about the payload, and only that.
Would that protocol work out of the box (translating the contents as necessary, of course, once the transmission is received) on a big-endian machine? Or would the checksums of the IP protocol, or something of the kind, be computed wrongly because the data is in a different order, given that the programmer has no control over them unless raw sockets are used?
Since the whole explanation can be misleading, feel free to ask for clarifications.
Thank you very much.
The thing I don't know is if it requires the payload to be in a specific format to prevent incompatibilities.
It doesn't, and it doesn't have a way of telling. To TCP it's just a byte-stream. It is up to the application protocol to decide endian-ness, and it is up to the implementors at each end to implement it correctly. There is a convention to use big-endian, but there's no compulsion.
Application-layer protocols dictate their own endianness. However, by convention, multi-byte integer values should be sent in network byte order (big endian) for consistency across platforms, such as by using the platform-provided hton...() (host-to-network) and ntoh...() (network-to-host) functions in your code. On little-endian systems, they do the necessary byte swapping; on big-endian systems, they are no-ops. The functions provide an abstraction layer so code does not have to worry about that.
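A minimal sketch of that convention (POSIX headers assumed; on Windows the same functions live in winsock2.h):

    /* Put a 32-bit value into network byte order before sending and convert
     * it back on receipt. On big-endian hosts htonl()/ntohl() are no-ops;
     * on little-endian hosts they byte-swap. */
    #include <stdio.h>
    #include <stdint.h>
    #include <arpa/inet.h>

    int main(void)
    {
        uint32_t player_x = 0x12345678;       /* host byte order */

        uint32_t wire = htonl(player_x);      /* what goes into the payload */
        /* ... memcpy(&buf[offset], &wire, sizeof wire); send(sock, buf, ...); */

        uint32_t back = ntohl(wire);          /* what the receiver does */
        printf("sent 0x%08X, got back 0x%08X\n",
               (unsigned)player_x, (unsigned)back);
        return 0;
    }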