I'm reading up on network technology, but there's something that's got me scratching my head. I've read that a popular encoding for sending data across Ethernet is the 8B/10B code used by Gigabit Ethernet.
I've read how the data is packaged up in "frames" which in turn package up "packets" of the data the application needs. Here's where it gets fuzzy. When I write a page of HTML, I set the encoding to Unicode. I understand that that page is packaged in the packet (formatted using the HTTP protocol, etc.)
If the HTML is in Unicode, but the Ethernet encoding is 8B/10B, how do the two encodings coexist? Is the message part of the packet in Unicode while the rest of the frame is 8B/10B?
Thanks for any help!
They really don't have much to do with each other. Ethernet is a "lower level" protocol than the HTTP over which your HTML is sent.
The HTML itself is simply data, and Unicode is a way of encoding characters as bits/bytes.
In contrast, Ethernet is a communications protocol for transferring bits/bytes/packets on a link between devices.
See here: http://en.wikipedia.org/wiki/OSI_model
Ethernet in the OSI 7-layer model is basically layer 2, the data link layer. HTTP and your HTML character encoding live in the "data" layers above layer 4 (which is basically TCP). The abstraction at each layer means that each layer only has to worry about its own job. Layers 4 and below are responsible for getting your data from point A to point B. Ethernet is part of the "getting data from point A to point B" problem. The layers above that are for figuring out what to do with that data. Your Unicode encoding is a "what to do with that data" question.
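To make the separation concrete, here is a minimal Python sketch (using a local socketpair so it is self-contained; a real client would connect to a web server instead). The application turns Unicode text into bytes, and everything below the socket - TCP, IP, Ethernet and its line coding - only ever sees those bytes, never "characters":

    import socket

    html = "<p>héllo</p>"              # Unicode text (note the é)
    payload = html.encode("utf-8")     # the application's job: characters -> bytes

    # socketpair() gives two connected endpoints so the example is self-contained;
    # a real client would connect to a web server instead.
    sender, receiver = socket.socketpair()
    sender.sendall(payload)            # the transport just moves bytes
    data = receiver.recv(1024)         # ...and delivers the same bytes
    print(data.decode("utf-8"))        # the receiving application re-applies Unicode
    sender.close()
    receiver.close()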
Related
What I read everywhere is that HTTP is a text-based protocol and HTTP/2 is a binary protocol. Lots of articles online suggest that HTTP/2's binary framing is more compact and more efficient to process.
Where exactly in the HTTP workflow does the text-based protocol add overhead? At the application layer, we would always need to serialize the data (text) into binary anyway to transfer on the wire. So, essentially, aren't we transferring the data in binary with both HTTP and HTTP/2?
Where exactly in the HTTP/2 workflow does the binary protocol bring in that compactness and processing efficiency?
At the application layer, we would always need to serialize the data (text) into binary anyway to transfer on the wire.
True, but up to HTTP/1.1, the data was written to the underlying layer (e.g. the TCP layer) as text. In HTTP/2 the data is encoded into binary, packed into frames and handed to the underlying layer.
A text format doesn't fit neatly into frames: the parser has to scan for delimiters, and a character sequence may get split across frame boundaries. Encoding the data in binary and splitting it into length-prefixed frames is much easier to process.
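As a rough illustration of why binary framing is easy to process, here is a Python sketch of a length-prefixed frame loosely modelled on the HTTP/2 frame header (24-bit length, 8-bit type, 8-bit flags, 31-bit stream id). The receiver never scans for delimiters; it reads a fixed 9-byte header and then exactly "length" payload bytes:

    import struct

    def make_frame(frame_type: int, flags: int, stream_id: int, payload: bytes) -> bytes:
        # 9-byte header: 24-bit length, 8-bit type, 8-bit flags, 31-bit stream id
        length = len(payload)
        header = struct.pack(">BHBBI",
                             (length >> 16) & 0xFF, length & 0xFFFF,
                             frame_type, flags, stream_id & 0x7FFFFFFF)
        return header + payload

    def parse_frame(buf: bytes):
        hi, lo, frame_type, flags, stream_id = struct.unpack(">BHBBI", buf[:9])
        length = (hi << 16) | lo
        return frame_type, flags, stream_id, buf[9:9 + length]

    frame = make_frame(frame_type=0x0, flags=0x1, stream_id=1, payload=b"hello")
    print(parse_frame(frame))          # (0, 1, 1, b'hello')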
When a computer X sends data through a network to computer Y, the data goes down through the OSI layers. That part I understand. But once the data is put on the medium as electrical signals, how does computer Y know what to reassemble, given that the headers and trailers of the OSI data model no longer exist as such once the data is on the electrical medium at layer 1?
The physical layer is just 1's and 0's as you say - the trick is that there is a pattern that tells the receiver that this is the start of a packet. This is usually referred to as 'framing'.
Once the receiver knows that, it simply reads in as many bits as it needs for the layer 2 header, and then it has that, and so on.
The headers are clear in typical OSI or networking diagrams, e.g. (https://www.ciscopress.com/articles/article.asp?p=2738463):
So the way the first two layers work on the receiver is:
layer 1 just recognises whether the signal is a one or a zero and creates the stream of ones and zeros.
layer 2 reads this stream and when it recognises the start pattern it then knows the following bits are the header and so on, and hence it can identify the frames.
You can see examples of start and stop patterns online e.g. (http://sinauonline.50webs.com/Cisco/Cisco%20Exploration%20Sem1Chap7.html):
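As an illustration, here is a toy Python sketch of that layer-1/layer-2 hand-off: the receiver scans the incoming bit stream for the Ethernet preamble (seven 10101010 octets) followed by the start frame delimiter (10101011), and everything after that is treated as the layer 2 header. The bit strings here are purely illustrative:

    # 7 preamble octets of 10101010 followed by the start frame delimiter 10101011
    PREAMBLE = "10101010" * 7
    SFD = "10101011"
    START = PREAMBLE + SFD

    def find_frame_start(bits: str) -> int:
        """Return the index of the first bit after the SFD, or -1 if not found."""
        i = bits.find(START)
        return -1 if i == -1 else i + len(START)

    # some idle bits, then a frame whose first header byte is 0xFF (purely illustrative)
    stream = "000000" + START + "11111111" + "01010000"
    start = find_frame_start(stream)
    print(stream[start:start + 8])     # 11111111 - the first 8 bits of the layer 2 header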
I'm a bit confused about how people represent binary data, and how it is sent over networks. I will explain through Wikipedia's example. Shown here <- https://imgur.com/a/POELH -> So I have my binary data encoded as base64, and I am sending the text TWFu. So I am sending T, then W, then F and finally u. But to send T, a character, I will need one byte, like I've always been told: one character sent over a network is one byte.
Because now I've come to think that if I encode 24 bits (3 bytes), I will be sending 4 characters, but to send those 4 characters I need as many bytes as there are characters??
So when sending "Man" (unencoded, requiring 3 bytes) vs "TWFu" (encoded, requiring 4 bytes) over the network, as in the example above, is the same sequence of bits sent in both cases? Because the last time I used a socket to send data, it just asked for a string input, never a text + encoding input.
Synopsis: "How" is an agreement. "Raw" is common.
Data is sent in whichever way the sender and receiver agree. There are many protocols that are standard agreements. Protocols operate at many levels. A very common pair that covers two levels is TCP/IP. Many higher-level protocols are layered on top of them. (A higher-level protocol may or may not depend on specific underlying protocols.) HTTP and SMTP are very common higher-level protocols, often with SSL sandwiched in between.
Sometimes the layers or the software that implements them is called a stack. There is also the reference (or conceptual) OSI Model. The key point about it is that it provides a language to talk about different layers. The layers it defines may or may not map to any specific stack.
Your question is too vague to answer directly. With HTTP, "raw" binary data is transferred all the time. The HTTP headers can give the length of the body in octets, and the body follows the header. As part of the agreement between the sender and receiver, the header might give metadata about the binary data using MIME headers. For example, your Gravatar image is sent with headers including:
content-length:871
content-type:image/png
That's enough for the receiver to know that the sender claims that it is a PNG graphic of 871 bytes. The receiver will read the header and then read 871 bytes for the body and then assume that what follows is another HTTP header.
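A rough Python sketch of that receive logic (greatly simplified - no chunked transfer, keep-alive handling, etc.; "sock" is assumed to be an already-connected socket):

    def read_http_response(sock):
        buf = b""
        while b"\r\n\r\n" not in buf:              # the header ends at the blank line
            buf += sock.recv(4096)
        header, _, body = buf.partition(b"\r\n\r\n")

        length = 0
        for line in header.split(b"\r\n")[1:]:     # skip the status line
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"content-length":
                length = int(value)

        while len(body) < length:                  # read exactly `length` body octets
            body += sock.recv(4096)
        return header, body[:length]               # body may be arbitrary binary data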
Some protocols use synchronization methods other than bodies with pre-declared sizes. They might be entirely text-based and use a syntax that allows only certain characters. They can be extended by a nesting agreement to use something like Base64 to represent binary data as text.
Some layers might provide data compression of sufficient density that expansion by higher layers, such as Base64, is not a great concern. See HTTP Compression, for example.
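For example (a small Python sketch; the data and numbers are illustrative), compressing before Base64-encoding usually more than pays for the roughly 33% Base64 expansion on compressible data:

    import base64, gzip, json

    data = json.dumps({"numbers": list(range(1000))}).encode("ascii")
    plain_b64 = base64.b64encode(data)
    gzip_b64 = base64.b64encode(gzip.compress(data))

    print(len(data), len(plain_b64), len(gzip_b64))
    # the compressed-then-encoded form ends up much smaller than the raw Base64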
If you want to see HTTP in action, hit F12 and go to the Network tab. If you want to see other protocols active on your computer, try Wireshark, Microsoft Message Analyzer, Fiddler or similar.
Base64 is a method for encoding arbitrary 8-bit data in a purely 7-bit channel. As much as the internet is based on the principle of 8-bit bytes, for text mode it's presumed to be 7-bit ASCII unless otherwise specified.
If you're sending that data Base64 encoded then you'll literally send TWFu. Many text-based protocols use Base64 out of convenience: it's an established standard and it's efficient enough for most applications.
The foundation of the internet, IP, is a protocol based on 8-bit bytes. When sending binary data you can make full use of all 8 bits, but if you're working with a text-mode protocol, of which there are many, you're generally stuck using 7-bit ASCII unless the protocol has a way of specifying which character set or encoding you're using.
If you have the option to switch to a "binary" transfer then you can side-step the need for Base64. If you're working with a 7-bit ASCII protocol then you're probably going to need Base64.
Note this isn't the only method for encoding arbitrary binary data. There's also quoted-printable, as used in email, and percent (URI) encoding for URLs. These are more efficient in cases where escaping is exceptional, but far less efficient if it's required for nearly every character.
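A quick way to see the trade-off, using Python's standard library (the sample inputs are arbitrary): for mostly-text input, quoted-printable and percent encoding stay close to the original size, while for dense binary input Base64 comes out smaller:

    import base64, quopri, urllib.parse

    mostly_text = b"hello world and more text"     # escaping is the exception
    mostly_binary = bytes(range(32))                # escaping is needed for almost every byte

    for label, data in (("mostly text", mostly_text), ("mostly binary", mostly_binary)):
        print(label,
              "base64:", len(base64.b64encode(data)),
              "quoted-printable:", len(quopri.encodestring(data)),
              "percent/URI:", len(urllib.parse.quote_from_bytes(data)))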
If you know you're dealing with 7-bit text only there's no need for base-64 encoding.
However, if you needed to send
Man
Boy
over a purely 7-bit text channel, you couldn't send it literally with the line breaks. Instead, you'd send it encoded in base64:
TWFuDQpCb3kNCg==
which has encoded line breaks but doesn't use incompatible characters. Of course, the receiver needs to know that you're sending encoded text - either implied by the protocol or explicitly marked in some way.
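You can reproduce that with Python's standard library:

    import base64

    raw = b"Man\r\nBoy\r\n"                        # two lines with CRLF line endings
    encoded = base64.b64encode(raw)
    print(encoded)                                  # b'TWFuDQpCb3kNCg=='
    print(base64.b64decode(encoded) == raw)         # True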
This answer to the question "What is base 64 encoding used for?" says:
When you have some binary data that you want to ship across a network, you generally don't do it by just streaming the bits and bytes over the wire in a raw format. Why? because some media are made for streaming text. You never know -- some protocols may interpret your binary data as control characters (like a modem), or your binary data could be screwed up because the underlying protocol might think that you've entered a special character combination (like how FTP translates line endings).
I've used sockets in Java a hundred times to send binary data over networks. And as far as I know it is very common to send binary data over networks, especially if you have big data. I don't see why some devices would interpret binary data wrongly, since it is wrapped in TCP headers etc.
SOAP MTOM also sends binary data over networks.
Am I misunderstanding something? I'm confused, because this post has many upvotes and is accepted.
The answer you link to isn't incorrect, it just fails to explicitly mention some examples. The answer is in the quote as well:
because some media are made for streaming text
Sockets deal in bytes, they don't care what they transport. It is the higher-level protocols, or the message formats they transport, that do.
It's when this binary data is wrapped in envelopes of such protocols or formats that they can wreak havoc. A less than (<) character in image bytes is perfectly valid, but when used in an XML message, it will break the XML. Other characters, like control characters, can have an influence on how further data is to be interpreted by a protocol handler.
So base64 is used to wrap binary data in a safe-for-transport way where that would otherwise not be safe.
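A small Python sketch of exactly that situation (the "image" bytes are made up): dropping raw bytes containing '<' into an XML document breaks the parser, while the base64-wrapped version parses and round-trips cleanly:

    import base64
    import xml.etree.ElementTree as ET

    blob = b"\x89PNG<not really an image>\x00\xff"  # made-up binary payload

    broken = "<data>" + blob.decode("latin-1") + "</data>"
    try:
        ET.fromstring(broken)
    except ET.ParseError as err:
        print("raw bytes break the XML:", err)

    safe = "<data>" + base64.b64encode(blob).decode("ascii") + "</data>"
    element = ET.fromstring(safe)
    print(base64.b64decode(element.text) == blob)   # True: round-trips cleanly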
There are a few things I don't understand about 64/66bit encoding, and failed to find the answers to on the web. Any help/links would be greatly appreciated:
i) how is the start of a frame recognised? I don't think it can be by the initial 10/01 bits, called the preamble on Wikipedia, because you cannot tell them apart (if an idle link is 0, then 0000 10 and 000 01 0 look rather similar). I expect the end of a frame is indicated by a control word, with the rest of the bits perhaps used for the CRC?
ii) how do the scramblers synchronise, and how do they avoid scrambling the same packet the same way? Or to put this another way, why is it not possible for a malicious user to induce substantial packet loss by carefully choosing a bad message?
iii) this might have been answered in ii), but if a packet is sent to a switch, and then onto another host, is it scrambled the same way both times?
Once again, many thanks in advance
Layers
First of all, the OSI model needs to be clear.
The Ethernet frame belongs to the data link layer, while the 64b/66b encoding is part of the physical layer (more precisely, the PCS sublayer of the physical layer).
The physical layer doesn't know anything about the start of a frame. It sees only data. (The start of an Ethernet frame is just data bytes, which contain the preamble.)
64b/66b encoding
Now let's assume that the link is up and running.
In this case the idle link is not full of '0's. (If it were, the link wouldn't be self-synchronous.) Idle messages (idle characters and/or synchronization blocks, i.e. control information) are sent over the idle link, the control information being encoded with a 0b10 sync header. (This is why the emitted spectrum and power dissipation don't depend on whether the link is idle or not.)
So the start of a new frame works as follows:
The link sends idle information (with a 0b10 sync header).
The upper layer (data link layer) sends the frame (in 64-bit chunks of data) to the physical layer.
The physical layer sends the data (with a 0b01 sync header) over the link.
(Note that the physical layer regularly inserts control (sync) symbols into the raw stream even during a data burst.)
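To make the block format concrete, here is an illustrative Python sketch of how a receiver would classify a 66-bit block by its 2-bit sync header: 0b01 means 64 bits of pure data, 0b10 means the payload starts with an 8-bit block type field (control/idle/start/end), and 0b00/0b11 are invalid, which guarantees a transition in every block and gives some error detection. (The 0x1e block type shown is the all-control/idle type from the 802.3 block type table, quoted from memory.)

    def classify_block(block66: str) -> str:
        assert len(block66) == 66
        sync, payload = block66[:2], block66[2:]
        if sync == "01":
            return "data block: 8 data octets"
        if sync == "10":
            return "control block, type field 0x%02x" % int(payload[:8], 2)
        return "invalid sync header (error)"

    print(classify_block("01" + "0" * 64))               # a data block
    print(classify_block("10" + "00011110" + "0" * 56))  # 0x1e: all-control (idle) block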
Synchronization
Before data transmission, a 64b/66b encoded lane must be initialized. This initialization includes lane initialization and block synchronization. Xilinx's Aurora specification (p. 34) gives an example of link initialization.
Briefly, the receiver tries to match the sync header at different bit positions, and when it matches enough times in a row it reports link-up.
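A very rough sketch of the idea behind block lock (the real 802.3 state machine requires a run of consecutive valid sync headers and drops lock again after too many invalid ones; the numbers here are arbitrary):

    def find_block_alignment(bits: str, blocks_to_check: int = 16) -> int:
        # Try every possible bit offset; accept the first one where every 66th
        # bit pair is a legal sync header ('01' data or '10' control).
        for offset in range(66):
            headers = [bits[offset + 66 * k : offset + 66 * k + 2]
                       for k in range(blocks_to_check)]
            if len(headers[-1]) == 2 and all(h in ("01", "10") for h in headers):
                return offset
        return -1

    # Build a stream of alternating data/control blocks (all-zero payloads keep
    # the example deterministic) and shift it by two junk bits.
    blocks = ("01" + "0" * 64 + "10" + "0" * 64) * 10
    stream = "00" + blocks
    print(find_block_alignment(stream))                  # 2 - lock found at offset 2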
Note that the 64b/66b encoding uses a self-synchronous scrambler. This is why the scrambler (itself) doesn't need to know anything about where we are in the data stream: if you run a self-synchronous descrambler long enough, it produces the decoded bit stream.
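Here is a minimal sketch of such a scrambler/descrambler using the 64b/66b polynomial 1 + x^39 + x^58 (a bit-list representation, chosen for clarity rather than speed). It also shows the point made further down: the same payload scrambled from a different register state produces different bits on the wire, yet a descrambler seeded with the "wrong" state still recovers the payload once its 58-bit register has filled with received bits:

    def scramble(bits, state=None):
        state = list(state) if state else [0] * 58       # 58-bit shift register
        out = []
        for d in bits:
            s = d ^ state[38] ^ state[57]                # taps for x^39 and x^58
            out.append(s)
            state = [s] + state[:-1]                     # shift in the *scrambled* bit
        return out

    def descramble(bits, state=None):
        state = list(state) if state else [0] * 58
        out = []
        for s in bits:
            out.append(s ^ state[38] ^ state[57])
            state = [s] + state[:-1]                     # shift in the *received* bit
        return out

    data = [1, 0, 1, 1, 0, 0, 1, 0] * 25                 # arbitrary 200-bit payload

    # same payload, different scrambler state -> different bits on the wire
    tx_a = scramble(data, [0] * 58)
    tx_b = scramble(data, [1] * 58)
    print(tx_a == tx_b)                                  # False

    # a descrambler seeded "wrong" still recovers the payload once its
    # register has filled with 58 received bits
    rx = descramble(tx_b, [0] * 58)
    print(rx[58:] == data[58:])                          # True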
Maliciousness
Note that 64b/66b encoding is not encryption. This scrambling won't protect you from eavesdropping or tampering. (Encryption should be placed at a higher level of the OSI model.)
Same packet multiple times
Because the scrambler is in a different state/seed when you send the same packet a second time, the two encoded packets will differ. (Theoretically one could craft packets that drive the scrambler's shift register back to a known state, but the inserted control symbols also have to be taken into account, so in practice this is not feasible.)