What is the difference between baud rate and bit rate? - microcontroller

I am really having a hard time understanding the difference. Some say they are the same, while others say there is a slight difference. What's the difference, exactly? I would like it if you explained it with an analogy.

Bits per second is straightforward. It is exactly what it sounds like. If I have 1000 bits and am sending them at 1000 bps, it will take exactly one second to transmit them.
Baud is symbols per second. If these symbols — the indivisible elements of your data encoding — are not bits, the baud rate will be lower than the bit rate by the factor of bits per symbol. That is, if there are 4 bits per symbol, the baud rate will be ¼ that of the bit rate.
This confusion arose because the early analog telephone modems weren't very complicated, so bps was equal to baud. That is, each symbol encoded one bit. Later, to make modems faster, communications engineers invented increasingly clever ways to send more bits per symbol.¹
Analogy
System 1, bits: Imagine a communication system with a telescope on the near side of a valley and a guy on the far side holding up one hand or the other. Call his left hand "0" and his right hand "1," and you have a system for communicating one binary digit — one bit — at a time.
System 2, baud: Now imagine that the guy on the far side of the valley is holding up playing cards instead of his bare hands. He is using a subset of the cards, ace through 8 in each suit, for a total of 32 cards. Each card — each symbol — encodes 5 bits: 00000 through 11111 in binary.²
Analysis
The System 2 guy can convey 5 bits of information per card in the same time it takes the System 1 guy to convey one bit by revealing one of his bare hands.
You see how the analogy seems to break down: finding a particular card in a deck and showing it takes longer than simply deciding to show your left or right hand. But, that just provides an opportunity to extend the analogy profitably.
A communications system with many bits per symbol faces a similar difficulty, because the encoding schemes required to send multiple bits per symbol are much more complicated than those that send only one bit at a time. To extend the analogy, then, the guy showing playing cards could have several people behind him sharing the work of finding the next card in the deck, handing him cards as fast as he can show them. The helpers are analogous to the more powerful processors required to produce the many-bits-per-baud encoding schemes.
That is to say, by using more processing power, System 2 can send data 5 times faster than the more primitive System 1.
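To make the card-as-symbol idea concrete, here is a minimal sketch of one possible 5-bit encoding. The 4-suit × 8-rank layout follows the analogy above, but the particular bit assignment and names are my own illustrative choices:

```cpp
// One possible 5-bits-per-card mapping: 4 suits x 8 ranks = 32 symbols.
// The bit layout (top 2 bits = suit, bottom 3 bits = rank) is an assumption
// for illustration, not part of the original analogy.
#include <cstdint>
#include <iostream>

const char* suits[] = {"clubs", "diamonds", "hearts", "spades"};
const char* ranks[] = {"ace", "2", "3", "4", "5", "6", "7", "8"};

void showCard(uint8_t symbol) {            // symbol is a 5-bit value, 0..31
    uint8_t suit = (symbol >> 3) & 0x03;   // top 2 bits select the suit
    uint8_t rank = symbol & 0x07;          // bottom 3 bits select the rank
    std::cout << ranks[rank] << " of " << suits[suit] << '\n';
}

int main() {
    showCard(0b10110);   // one card (one symbol) conveys 5 bits at once
}
```

Each call to showCard moves 5 bits across the valley in one symbol time, which is exactly why the baud rate can be a fifth of the bit rate.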
Historical Vignette
What shall we do with our 5-bit code? It seems natural to an English speaker to use 26 of the 32 available code points for the English alphabet. We can use the remaining 6 code points for a space character and a small set of control codes and symbols.
Or, we could just use Baudot code, a 5-bit code invented by Émile Baudot, after whom the unit "baud" was coined.³
Footnotes and Digressions:
¹ For example, the V.34 standard defined a 3,429 baud mode at 8.4 bits per symbol to achieve 28.8 kbit/sec throughput.
That standard only talks about the POTS side of the modem. The RS-232 side remains a 1 bit per symbol system, so you could also correctly call it a 28.8k baud modem. Confusing, but technically correct.
² I've purposely kept things simple here.
One thing you might think about is whether the absence of a playing card conveys information. If it does, that implies the existence of some clock or latch signal, so that you can tell the information-carrying absence of a card from the gap between the display of two cards.
Also, what do you do with the cards left over in a poker deck, 9 through King, and the Jokers? One idea would be to use them as special flags to carry metadata. For example, you'll need a way to indicate a short trailing block. If you need to send 128 bits of information, you're going to need to show 26 cards. The first 25 cards convey 5×25=125 bits, with the 26th card conveying the trailing 3 bits. You need some way to signal that the last two bits in the symbol should be disregarded.
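As a quick check of the arithmetic in that footnote, here is a small sketch (the 128-bit message and 5-bit symbols are from the text above; the variable names are mine):

```cpp
// How many 5-bit cards does a 128-bit message need, and how much padding
// is left in the final card? (Figures from the footnote above.)
#include <iostream>

int main() {
    const int bitsPerSymbol = 5;
    const int messageBits   = 128;

    int fullSymbols  = messageBits / bitsPerSymbol;          // 25 cards = 125 bits
    int trailingBits = messageBits % bitsPerSymbol;          // 3 bits left over
    int totalSymbols = fullSymbols + (trailingBits ? 1 : 0); // 26 cards in all
    int paddingBits  = trailingBits ? bitsPerSymbol - trailingBits : 0; // 2 bits

    std::cout << totalSymbols << " cards; the last one carries " << trailingBits
              << " bits, so " << paddingBits << " padding bits must be flagged\n";
}
```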
³ This is why the early analog telephone modems were specified in terms of baud instead of bps: communications engineers had been using that terminology since the telegraph days. They weren't trying to confuse bps and baud; it was simply a fact, in their minds, that these modems were transmitting one bit per symbol.

I don't understand why everyone is making this so complicated in their answers.
I'll just leave this here.
So above would be:
Signal Unit: 4 bits
Baud Rate [Signal Units per second]: 1000 Bd (baud)
Bit Rate [Baud Rate * Signal Unit]: 4000 bps (bits per second)
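A minimal sketch of the relationship in that list (the 4-bit signal unit and 1000 Bd figures come from the lines above; the function name is my own):

```cpp
// bit rate = baud rate * bits per signal unit (symbol)
#include <iostream>

int bitRate(int baudRate, int bitsPerSymbol) {
    return baudRate * bitsPerSymbol;
}

int main() {
    std::cout << bitRate(1000, 4) << " bps\n";   // 1000 Bd x 4 bits = 4000 bps
}
```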
Bit rate and baud rate are two terms often used in data communication. Bit rate is simply the number of bits (i.e., 0s and 1s) transmitted per unit time, while baud rate is the number of signal units transmitted per unit time that are needed to represent those bits.

Bit rate:-
Bit rate is nothing but the number of bits transmitted per second. For example, if the bit rate is 1000 bps, then 1000 bits, i.e. 0s or 1s, are transmitted per second.
Baud rate:-
Baud rate is the number of times per second the signal changes its state. When the signal is binary, the baud rate and bit rate are the same.

According to What’s The Difference Between Bit Rate And Baud Rate?:
Bit Rate
The speed of the data is expressed in bits per second (bits/s or bps). The data rate R is a function of the duration of the bit, or bit time, TB (Fig. 1, again):
R = 1/TB
This rate is also called channel capacity C. If the bit time is 10 ns, the data rate equals:
R = 1/(10 × 10^-9) = 100 million bits/s
This is usually expressed as 100 Mbits/s.
Baud Rate
The term “baud” originates from the French engineer Emile Baudot, who invented the 5-bit teletype code. Baud rate refers to the number of signal or symbol changes that occur per second. A symbol is one of several voltage, frequency, or phase changes.
NRZ binary has two symbols, one for each bit 0 or 1, that represent voltage levels. In this case, the baud or symbol rate is the same as the bit rate. However, it's possible to have more than two symbols per transmission interval, whereby each symbol represents multiple bits. With more than two symbols, data is transmitted using modulation techniques.
When the transmission medium can't handle the baseband data, modulation enters the picture. Of course, this is true of wireless. Baseband binary signals can't be transmitted directly; rather, the data is modulated on to a radio carrier for transmission. Some cable connections even use modulation to increase the data rate, which is referred to as “broadband transmission.”
By using multiple symbols, multiple bits can be transmitted per symbol. For example, if the symbol rate is 4800 baud and each symbol represents two bits, that translates into an overall bit rate of 9600 bits/s. Normally the number of symbols is some power of two. If N is the number of bits per symbol, then the number of required symbols is S = 2^N. Thus, the gross bit rate is:
R = baud rate × log2(S) = baud rate × 3.32 × log10(S)
If the baud rate is 4800 and there are two bits per symbol, the number of symbols is 2^2 = 4. The bit rate is:
R = 4800 × 3.32 × log10(4) = 4800 × 2 = 9600 bits/s
If there's only one bit per symbol, as is the case with binary NRZ, the bit and baud rates remain the same.

First, something I think is necessary to know:
It is the symbol that is transferred on a physical channel, not the bit. A symbol is the physical signal transferred over the physical medium to convey the data bits. A symbol can be one of several voltage, frequency, or phase changes. The symbol is determined by the physical nature of the medium, while a bit is a logical concept.
If you want to transfer data bits, you must do it by sending symbols over the medium. Baud rate describes how fast symbols change over a medium, i.e. the rate of physical state changes on the medium.
If we use only 2 symbols to transfer binary data, meaning one symbol for 0 and another symbol for 1, that leads to baud rate = bit rate. And this is how it worked in the old days.
If we are lucky enough to find a way to encode more bits into a symbol, we can achieve a higher bit rate with the same baud rate. This is when baud rate < bit rate. It doesn't mean the transfer speed is slowed down; it actually means the transfer efficiency/speed is increased.
And the communicating parties have to agree on how bits are represented by each physical symbol. This is where the modulation protocols come in.
But the ability to send multiple bits per symbol doesn't come free. The transmitter and receiver become more complex depending on the modulation method, and more processing power is required.
Finally, I'd like to make an analogy:
Suppose I stand on the roof of my house and you stand on your roof. There's a rope between you and me. I want to send some apples to you through a basket down the rope.
The basket is the symbol. The apple is the data bits.
If the basket is small (a physical limitation of the symbol), I may only send one apple per basket. This is when baud/basket rate = bit/apple rate.
If the basket is big, I can send more apples per basket. This is when baud rate < bit rate. I can send all the apples with fewer baskets, but it takes me more effort (processing power) to put several apples into a basket than to put in just one. If the basket rate stays the same, the more apples I put in each basket, the less time the whole transfer takes.
Here are some related threads:
How can I be sure that a multi-bit-per-symbol encoding schema exists?
What is difference between the terms bit rate,baud rate and data rate?

Bit rate: the number of bits (0 or 1 for a binary signal) transmitted per second.
Baud rate: the number of symbols per second.
A symbol consists of n bits, so:
Baud rate = (bit rate)/n
So the baud rate is always less than or equal to the bit rate. They are equal when the signal is binary (one bit per symbol).

Baud rate is mostly used in telecommunications and electronics, representing symbols per second or pulses per second, whereas bit rate is simply bits per second. To keep it simple, the major difference is that a symbol may contain more than 1 bit, say n bits, which makes the baud rate n times smaller than the bit rate.
Suppose a serial-communication scheme uses 8 bits as one symbol to represent the information. If the symbol rate is 4800 baud, that translates into an overall bit rate of 38,400 bits/s. The same holds in wireless communication, where multiple bits per symbol are used for modulation to achieve broadband transmission instead of simple baseband transmission.
Hope this helps.

Bits per second is what it sounds like: the rate at which ones and zeros are transmitted, called bits per second (bit/s). However, it should not be confused with bytes per second, abbreviated as bytes/s, Bps, or B/s.
Raw throughput values are normally given in bits per second, but many software applications report transfer rates in bytes per second.
So, the standard unit for bit throughput is the bit per second, which is commonly abbreviated bit/s, bps, or b/s.
Baud is a unit of measure of the changes, or transitions, that occur in a signal in each second.
For example, if the signal changes from a one value to a zero value (or vice versa) one hundred times per second, that is a rate of 100 baud.
One measures data (the throughput of the channel); the other measures transitions (the signalling rate).
For example, if you look at modern modems, they use advanced modulation techniques that encode more than one bit of data into each transition.
Thanks.

The bit rate is a measure of the number of bits that are transmitted per unit of time.
The baud rate, which is also known as symbol rate, measures the number of symbols that are transmitted per unit of time.
A symbol typically consists of a fixed number of bits, depending on how the symbol is defined (for example, 8-bit or 9-bit data). The baud rate is measured in symbols per second.
Take an example where the ASCII character 'R' is transmitted over a serial channel once every second.
The binary equivalent is 01010010.
So in this case, the baud rate is 1(one symbol transmitted per second) and the bit rate is 8 (eight bits are transmitted per second).
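A tiny sketch of that per-character example, treating the whole 8-bit character as one symbol exactly as this answer does (the variable names are mine):

```cpp
// Sending the ASCII character 'R' once per second: 1 symbol/s in this
// answer's framing, but 8 bits/s, since 'R' = 01010010 is 8 bits long.
#include <bitset>
#include <iostream>

int main() {
    const char symbol        = 'R';
    const int  bitsPerSymbol = 8;
    const int  symbolsPerSec = 1;

    std::cout << "sending " << std::bitset<8>(symbol) << " once a second: "
              << symbolsPerSec << " symbol/s, "
              << symbolsPerSec * bitsPerSymbol << " bits/s\n";
}
```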

This topic is confusing because there are 3 terms in use when people think there are just 2, namely:
"bit rate": units are bits per second
"baud": units are symbols per second
"Baud rate": units are bits per second
"Baud rate" is really a marketing term rather than an engineering term. "Baud rate" was used by modem manufactures in a similar way to megapixels is used for digital cameras. So the higher the "Baud rate" the better the modem was perceived to be.
The engineering unit "baud" is already a rate (symbols per second) which distinguishes it from the "Baud rate" term. However, you can see from the answers that people are confusing these 2 terms together such as baud/sec which is wrong.
From an engineering point of view, I recommend people use the term "bit rate" for "RS-232" and consign to history the term "Baud rate". Use the term "baud" for modulation schemes but avoid it for "RS-232".
In other words, "bit rate" and "Baud rate" are the same thing which means how many bits are transmitted along a wire in one second. Note that bits per second (bps) is the low-level line rate and not the information data rate because asynchronous "RS-232" has start and stop bits that frame the 8 data bits of information so bps includes all bits transmitted.

Bit rate is a measure of the number of data bits (that's 0's and 1's) transmitted in one second. A figure of 2400 bits per second means 2400 zeros or ones can be transmitted in one second, hence the abbreviation 'bps'.
Baud rate by definition means the number of times a signal in a communications channel changes state. For example, a 2400 baud rate means that the channel can change states up to 2400 times per second. When I say 'change state' I mean that it can change from 0 to 1 up to 2400 times per second. If you think about this, it's pretty much similar to the bit rate, which in the above example was 2400 bps.
Whether you can transmit 2400 zeros or ones in one second (bit rate), or change the state of a digital signal up to 2400 times per second (baud rate), it's the same thing.

Serial Data Speed:
Data rate (bps) = 1/Tb
Tb is the time duration of 1 bit
If the bit duration is 2 ms, then the data rate is 1/(2 × 10^-3) = 500 bps.
Baud rate:
Baud rate is defined as the number of signalling elements (symbols) in a given unit of time (say 1 second), or the number of times the signal changes its state. When the signal is binary, the baud rate and bit rate are the same.
Bit rate:- Bit rate is nothing but the number of bits transmitted per second. For example, if the bit rate is 1000 bps, then 1000 bits, i.e. 0s or 1s, are transmitted per second.
There are a few other terms similar to this (i.e. serial speed, bit rate, baud rate, USB transfer rate), and I guess (?) the values printed on the Serial Monitor relate to serial speed, baud rate and USB transfer rate. Bit rate isn't another term, please correct me if I am wrong: the Serial Monitor prints values at intervals of time, and each value is definitely a set of bits, so the number of bits in the values printed per unit time would be the bit rate.

The replies here are misleading. What they say is true, but nobody mentions that for a UART (which is how this question was tagged) a symbol is a single bit, not a whole character.
For example, 115200/8N1 is 11,520 bytes per second, since a single ASCII character takes 1 start bit plus 8 data bits plus 1 stop bit, i.e. 10 symbols.

As correctly pointed out in the other replies, the bitrate is the amount of logical (or "abstract high level") information transferred in a given time, while baud rate is the number of symbols (more or less "signal changes") in the physical line in a given time.
While it is easy to understand that if a transmitted symbol carries 4 bits of information, then the bitrate is four times the baud rate, things get blurred in the case of, for example, an RS-232 serial line.
The classic serial line works on bytes (well, "frames"), not bits. There is no way to transmit fewer than 8 bits (i.e. a byte), because the serial line defines a "frame" (I assume frames with 8 data bits, no parity, 1 start bit and 1 stop bit); and this is usually fine, because devices (computers) most likely work on bytes, not single bits.
Given that, when a device sends a byte, i.e. 8 bits, the physical line transmits 10 symbols, because to the original data composed of 8 bits, 2 more are added (start and stop bits, needed for synchronization). Some confusion can arise because the symbols transmitted on the physical line are also called "bits", but they are really symbols (MARK and SPACE, actually).
So on that classic RS-232 line (in the case of an "8N1" frame) the bitrate is actually 8/10 of the baudrate. If we add a parity bit, the ratio lowers further and becomes 8/11.
The number of bits or symbols per second translates directly to the duration of them (bits or symbols). What does it mean for an engineer designing a system? It means that if he is designing a line filter to protect the line or reduce the noise, he should consider the duration (or frequency) of the symbols transmitted on that line. For a baudrate of 1000 baud, he knows that the frequency of the signal is 1 kHz, and that a symbol has a duration of 1 ms. Fine. But if he has to calculate how much time is needed to transfer a file from one device to another, say a file of 1000 bytes, he must consider the bitrate, not the baudrate! Because the devices, at a higher level, do not even see the start and stop bits; they are only a burden which slows down the communication (but they are useful for error checking).
To take it to the extreme, imagine that a serial frame is just one data bit long. For every bit transmitted by a device, three symbols would travel on the physical line. And if a parity bit were added, then four symbols would travel: the bitrate would be 1/4 of the baudrate. And if we add a second stop bit, the bitrate goes down to 1/5 of the baudrate!
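A short sketch of the 8N1 accounting described above (the 1 start + 8 data + 1 stop frame layout is from this answer, the 115200 figure echoes the earlier UART reply, and the variable names are mine):

```cpp
// Effective throughput of an asynchronous 8N1 serial link: each 8-bit byte
// travels inside a 10-symbol frame (start + 8 data + stop).
#include <iostream>

int main() {
    const double baudRate     = 115200;  // line symbols ("bits") per second
    const int    frameSymbols = 10;      // 1 start + 8 data + 1 stop
    const int    dataBits     = 8;

    double bytesPerSecond = baudRate / frameSymbols;            // 11520 B/s
    double dataBitRate    = baudRate * dataBits / frameSymbols; // 92160 bps

    std::cout << bytesPerSecond << " bytes/s of payload, "
              << dataBitRate << " data bits/s\n";
}
```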

Related

Arduino accelerometer MPU-6050

This might sound like a very silly question, so I apologize if this is something very simple but I just cannot get my head around it. I am trying to understand what the data provides in terms of real time information, for example, the MPU-6050:
Gyroscope - is a 16 bit data register with a range from (0 <-> 65535)
There is a selection of ranges (±250, ±500, ±1000, and ±2000°/sec)
If the range is set to ±250°/sec, is the reading 360/65535 = 0.0054 resolution?
What does °/sec mean, if the sensor does not move and reads zero and then turned quickly does it mean it will be reading the angle at the set range? For example, if the range was set to ±2000°/sec and it was moved 200° would the read move from 0 to (2/65535 *200) and keep sending this value once the sensor stopped moving?
Accelerometer - is a 16 bit data register with a range from (0 <-> 65535)
There is a selection of ranges (±2g, ±4g, ±8g and ±16g)
If the sensor is not moving and lying completely flat, will the reading be 0?
If the sensor is shocked at 2g, will the max reading be 65535 (if set to ±2g, with a resolution of 2/65535)?
If the sensor is shocked at 16g, will the max reading be 65535 (if set to ±16g, with a resolution of 16/65535)?
There are two main documents regarding the MPU6050, and those are the datasheet and the register map.
The gyro measurements are stored in the GYRO_XOUT, GYRO_YOUT, GYRO_ZOUT parameters, as you can see in the register map document, page 31. Each parameter is stored as a two's-complement signed 16-bit value split into two 8-bit registers: GYRO_xOUT_L and _H.
On the same page, you can see the sensitivity for each full-scale range. For example, if your FSR is ±250º/sec and you want to measure 1º/sec, the GYRO_xOUT parameter should read 131 counts.
The accelerometer-related registers can be seen in the same document, page 29. The idea is the same: two 8-bit registers forming a two's-complement signed 16-bit value, and sensitivity values for each FSR.
Regarding your question in comments, if you rotate the device 125º in a second, at constant rotation speed, you should read 16375 in the rotation registers during the movement. This value comes from 131 counts/(º/sec) * 125º/sec = 16375 counts.
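A small sketch of that conversion (the 131 LSB per º/s sensitivity for the ±250 º/s range is quoted above from the register map; the function names are mine):

```cpp
// Convert between raw gyro counts and angular rate for the MPU-6050's
// +/-250 deg/s full-scale range (sensitivity: 131 counts per deg/s).
#include <cstdint>
#include <iostream>

const double SENSITIVITY_250DPS = 131.0;   // counts per (deg/s)

double  countsToDps(int16_t raw) { return raw / SENSITIVITY_250DPS; }
int16_t dpsToCounts(double dps)  { return (int16_t)(dps * SENSITIVITY_250DPS); }

int main() {
    std::cout << dpsToCounts(125.0) << " counts\n";   // 16375, as stated above
    std::cout << countsToDps(16375) << " deg/s\n";    // back to ~125 deg/s
}
```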

Generate signals with 0.1Hz resolutions using AD9833 via Arduino Uno

I would like to generate a frequency with a resolution of 0.1 Hz over the range 0.0 up to 1000.0 Hz (for example, 23.1 Hz, 100.5 Hz and 999.7 Hz). I have found that using the AD9833 we can generate the signal I require, but the notes are a bit confusing to me.
The specification can be obtained HERE.
I need your kind assistance: can we write the Arduino code to, let's say, generate a signal of 123.4 Hz entered via the Serial monitor and have it displayed as such on the oscilloscope?
Thank you.
Looking at the notes, it appears that programming this chip will be non-trivial. If you don't require frequencies all the way down to 0 Hz, this job can be done much more easily with a standard Windows sound card. (Sound cards are AC-coupled, so won't go below a few Hz.) For one example, my Daqarta software can generate frequencies (with any waveform you want) at a resolution better than 0.001 Hz. The maximum frequency will be a bit less than half the sound card's sample rate... typically 20 kHz at the default 48000 Hz sample rate.
You don't have to buy Daqarta to get this capability; the Generator function will continue to work after the trial period... free, forever.
UPDATE: You don't mention what sort of waveforms you need, but note that if you can use square waves you may be able to do the whole job with the Arduino alone. The idea is to set up a timer to produce interrupts at some desired sample rate. On each interrupt you add a step value to an accumulator, and send the MSB of the accumulator to an output pin. You control the output frequency by changing the step value. This is essentially a 1-bit version of the phase accumulator approach used by the AD9833 (and by the Daqarta Generator). The frequency resolution is controlled by the sample rate and the size of the accumulator. You can easily get much better than 0.1 Hz resolution.
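Since the UPDATE only sketches the idea in prose, here is a minimal sketch of that 1-bit phase-accumulator approach, assuming an AVR-based Uno (16 MHz) with Timer1 in CTC mode; the output pin, the 20 kHz update rate and the 32-bit accumulator width are my own illustrative choices, not taken from the answer:

```cpp
#include <Arduino.h>

const uint32_t SAMPLE_RATE = 20000UL;   // phase-accumulator updates per second
const uint8_t  OUT_PIN     = 8;         // square-wave output pin (arbitrary)

volatile uint32_t phaseAcc  = 0;        // 32-bit phase accumulator
volatile uint32_t phaseStep = 0;        // added on every timer interrupt

// step = f_out * 2^32 / SAMPLE_RATE, so the frequency resolution is
// SAMPLE_RATE / 2^32, i.e. a few microhertz -- far finer than 0.1 Hz.
void setFrequency(double hz) {
  phaseStep = (uint32_t)(hz * 4294967296.0 / SAMPLE_RATE);
}

void setup() {
  pinMode(OUT_PIN, OUTPUT);
  noInterrupts();
  TCCR1A = 0;                            // Timer1: CTC mode, no prescaler
  TCCR1B = _BV(WGM12) | _BV(CS10);
  OCR1A  = F_CPU / SAMPLE_RATE - 1;      // 16 MHz / 20 kHz - 1 = 799
  TIMSK1 = _BV(OCIE1A);                  // enable compare-match interrupt
  interrupts();
  setFrequency(123.4);                   // e.g. the 123.4 Hz from the question
}

ISR(TIMER1_COMPA_vect) {
  phaseAcc += phaseStep;                 // wrap-around is intentional
  digitalWrite(OUT_PIN, (phaseAcc & 0x80000000UL) ? HIGH : LOW);  // output MSB
}

void loop() {}
```

The output is only a square wave, and digitalWrite() inside the ISR is on the slow side, but it shows the principle: the step value alone sets the frequency, so reading a number from the Serial monitor and passing it to setFrequency() would cover the 0 to 1000 Hz range asked for.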
Best regards,

Determining a formula for a packet switching network?

Let's say we have a packet of length L bits. It is transmitted from system A through three links to system B. The three links are connected by two packet switches. d_i, s_i and R_i are the length, propagation speed and transmission rate of each link i in the example network. Each packet switch delays each packet by d_proc (processing time).
Let's also say that there are no queuing delays; how would I go about writing a formula for the end-to-end delay of a packet of length L on this theoretical network?
This is what I have so far:
End-to-End Delay = L/R_1 + L/R_2 + L/R_3 + d_1/s_1 + d_2/s_2 + d_3/s_3 + 2(d_proc)
Is this correct? If not, what is the correct formula, and why?
Yes, your formula is correct, assuming the processing time of each switch is the same. Also, when calculating the actual delay, be sure to use consistent units: bits and bits/s for packet size and transmission rate, and meters and meters/s for propagation. Note that if the switches are connected by fiber-optic links, you will have to divide the speed of light by the refractive index of the fiber in your calculations.
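A small sketch of that formula under the stated assumptions (three links, two switches with equal processing delay, no queuing); the link lengths, rates and speeds below are made-up placeholder values, not from the question:

```cpp
// End-to-end delay = sum of transmission delays (L / R_i)
//                  + sum of propagation delays (d_i / s_i)
//                  + 2 * d_proc                (two switches, no queuing)
#include <iostream>

int main() {
    const double L      = 8000;               // packet length, bits
    const double R[3]   = {2e6, 2e6, 2e6};    // transmission rates, bits/s
    const double d[3]   = {5000, 4000, 1000}; // link lengths, m
    const double s[3]   = {2e8, 2e8, 2e8};    // propagation speeds, m/s
    const double d_proc = 3e-3;               // per-switch processing delay, s

    double delay = 2 * d_proc;
    for (int i = 0; i < 3; ++i)
        delay += L / R[i] + d[i] / s[i];

    std::cout << "end-to-end delay: " << delay * 1000 << " ms\n";  // ~18.05 ms
}
```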

Sampling / Quantization / PCM networks

Suppose an analog audio signal is sampled 16,000 times per second, and each sample is quantized into one of 1024 levels. What would be the resulting bit rate of the PCM digital audio signal?
That's a question from the Top-Down Approach book; I answered it but just want to make sure it is correct.
My answer is:
1024 = 2^10
so PCM bit rate = 10 × 16,000 = 160,000 bps
Is that correct?
Software often makes a trade-off between time and space. Your answer is correct; however, to write software you typically read/write data in storage units of bytes (8 bits). Since your answer says 10 bits, your code would use two bytes (16 bits) per sample. So the file consumption rate would be 16 × 16,000 = 256,000 bits per second (32,000 bytes per second). This is mono, so stereo would double it. Writing software that actually stores 10 bits per sample instead of 16 would shift this time/space trade-off in the direction of increased computational time (and code complexity) to save storage space.
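A quick sketch of that calculation (the 16,000 samples/s and 1024 levels are from the question; the 16-bit storage figure is from the reply above):

```cpp
// PCM bit rate = sample rate * bits per sample,
// where bits per sample = ceil(log2(quantization levels)).
#include <cmath>
#include <iostream>

int main() {
    const double sampleRate = 16000;   // samples per second
    const int    levels     = 1024;    // quantization levels

    int bitsPerSample = (int)std::ceil(std::log2(levels));   // 10 bits
    double bitRate    = sampleRate * bitsPerSample;          // 160,000 bps

    // If each sample is stored in a whole 16-bit word, as the reply notes:
    double storedRate = sampleRate * 16;                     // 256,000 bps

    std::cout << bitRate << " bps of PCM data, "
              << storedRate << " bps when stored 16 bits per sample\n";
}
```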

Why are 8 and 256 such important numbers in computer sciences?

I don't know very well about RAM and HDD architecture, or how electronics deals with chunks of memory, but this always triggered my curiosity:
Why did we choose to stop at 8 bits for the smallest element in a computer value?
My question may look very dumb, because the answer seems obvious, but I'm not very sure...
Is it because 2^3 allows it to fit perfectly when addressing memory?
Are electronics especially designed to store chunks of 8 bits? If yes, why not use wider words?
Is it because it divides 32, 64 and 128, so that processor words can be made up of several of those units?
Is it just convenient to have 256 values in such a tiny space?
What do you think?
My question is a little too metaphysical, but I want to make sure it's just an historical reason and not a technological or mathematical reason.
For the anecdote, I was also thinking about the ASCII standard, in which most of the first characters are useless with stuff like UTF-8, I'm also trying to think about some tinier and faster character encoding...
Historically, bytes haven't always been 8-bit in size (for that matter, computers don't have to be binary either, but non-binary computing has seen much less action in practice). It is for this reason that IETF and ISO standards often use the term octet - they don't use byte because they don't want to assume it means 8-bits when it doesn't.
Indeed, when byte was coined it was defined as a 1-6 bit unit. Byte-sizes in use throughout history include 7, 9, 36 and machines with variable-sized bytes.
8 was a mixture of commercial success, it being a convenient enough number for the people thinking about it (which would have fed into each other) and no doubt other reasons I'm completely ignorant of.
The ASCII standard you mention assumes a 7-bit byte, and was based on earlier 6-bit communication standards.
Edit: It may be worth adding to this, as some are insisting that those who say bytes are always octets are confusing bytes with words.
An octet is a name given to a unit of 8 bits (from the Latin for eight). If you are using a computer (or at a higher abstraction level, a programming language) where bytes are 8-bit, then this is easy to do, otherwise you need some conversion code (or conversion in hardware). The concept of octet comes up more in networking standards than in local computing, because in being architecture-neutral it allows for the creation of standards that can be used in communicating between machines with different byte sizes, hence its use in IETF and ISO standards (incidentally, ISO/IEC 10646 uses octet where the Unicode Standard uses byte for what is essentially - with some minor extra restrictions on the latter part - the same standard, though the Unicode Standard does detail that they mean octet by byte even though bytes may be different sizes on different machines). The concept of octet exists precisely because 8-bit bytes are common (hence the choice of using them as the basis of such standards) but not universal (hence the need for another word to avoid ambiguity).
Historically, a byte was the size used to store a character, a matter which in turn builds on practices, standards and de-facto standards which pre-date computers used for telex and other communication methods, starting perhaps with Baudot in 1870 (I don't know of any earlier, but am open to corrections).
This is reflected by the fact that in C and C++ the unit for storing a byte is called char whose size in bits is defined by CHAR_BIT in the standard limits.h header. Different machines would use 5,6,7,8,9 or more bits to define a character. These days of course we define characters as 21-bit and use different encodings to store them in 8-, 16- or 32-bit units, (and non-Unicode authorised ways like UTF-7 for other sizes) but historically that was the way it was.
In languages which aim to be more consistent across machines, rather than reflecting the machine architecture, byte tends to be fixed in the language, and these days this generally means it is defined in the language as 8-bit. Given the point in history when they were made, and that most machines now have 8-bit bytes, the distinction is largely moot, though it's not impossible to implement a compiler, run-time, etc. for such languages on machines with different sized bytes, just not as easy.
A word is the "natural" size for a given computer. This is less clearly defined, because it affects a few overlapping concerns that would generally coïncide, but might not. Most registers on a machine will be this size, but some might not. The largest address size would typically be a word, though this may not be the case (the Z80 had an 8-bit byte and a 1-byte word, but allowed some doubling of registers to give some 16-bit support including 16-bit addressing).
Again we see here the approach of C and C++, where int is defined in terms of the word size and long is defined to take advantage of a processor which has a "long word" concept should such exist, though possibly being identical in a given case to int. The minimum and maximum values are again in the limits.h header. (Indeed, as time has gone on, int may be defined as smaller than the natural word-size, as a combination of consistency with what is common elsewhere, reduction in memory usage for an array of ints, and probably other concerns I don't know of.)
Java and .NET languages take the approach of defining int and long as fixed across all architectures, making the differences an issue for the runtime (particularly the JITter) to deal with. Notably though, even in .NET the size of a pointer (in unsafe code) will vary depending on the architecture to be the underlying word size, rather than a language-imposed word size.
Hence, octet, byte and word are all very independent of each other, despite the relationship of octet == byte and word being a whole number of bytes (and a whole binary-round number like 2, 4, 8 etc.) being common today.
Not all bytes are 8 bits. Some are 7, some 9, some other values entirely. The reason 8 is important is that, in most modern computers, it is the standard number of bits in a byte. As Nikola mentioned, a bit is the actual smallest unit (a single binary value, true or false).
As Will mentioned, this article http://en.wikipedia.org/wiki/Byte describes the byte and its variable-sized history in some more detail.
The general reasoning behind why 8, 256, and other numbers are important is that they are powers of 2, and computers run using a base-2 (binary) system of switches.
ASCII encoding required 7 bits, and EBCDIC required 8 bits. Extended ASCII codes (such as ANSI character sets) used the 8th bit to expand the character set with graphics, accented characters and other symbols. Some architectures made use of proprietary encodings; a good example of this is the DEC PDP-10, which had a 36-bit machine word. Some operating systems on this architecture used packed encodings that stored 6 characters in a machine word for various purposes such as file names.
By the 1970s, the success of the D.G. Nova and DEC PDP-11, which were 16-bit architectures, and IBM mainframes with 32-bit machine words was pushing the industry towards an 8-bit character by default. The 8-bit microprocessors of the late 1970s were developed in this environment, and this became a de facto standard, particularly as off-the-shelf peripheral chips such as UARTs, ROM chips and FDC chips were being built as 8-bit devices.
By the latter part of the 1970s the industry settled on 8 bits as a de facto standard, and architectures such as the PDP-8 with its 12-bit machine word became somewhat marginalised (although the PDP-8 ISA and derivatives still appear in embedded system products). 16- and 32-bit microprocessor designs such as the Intel 80x86 and MC68K families followed.
Since computers work with binary numbers, all powers of two are important.
8-bit numbers are able to represent 256 (2^8) distinct values, enough for all the characters of English and quite a few extra ones. That made the numbers 8 and 256 quite important.
The fact that many CPUs process data in 8-bit units (many used to and still do) helped a lot.
Other important powers of two you might have heard about are 1024 (2^10=1k) and 65536 (2^16=65k).
Computers are built upon digital electronics, and digital electronics works with states. One fragment can have 2 states, 1 or 0 (if the voltage is above some level then it is 1, if not then it is zero). To represent that behaviour the binary system was introduced (well, not introduced but widely accepted).
So we come to the bit. The bit is the smallest fragment in the binary system. It can take only 2 states, 1 or 0, and it represents the atomic fragment of the whole system.
To make our lives easy the byte (8 bits) was introduced. To give you an analogy: we don't express weight in grams, even though that is the base measure of weight; we use kilograms, because they are easier to use and to understand. One kilogram is 1000 grams, which can be expressed as 10 to the power of 3. So when we go back to the binary system and use the same kind of power, we get 8 (2 to the power of 3 is 8). That was done because using bits alone was overly complicated in everyday computing.
That held on, so further in the future, when we realized that 8 bits was again too small and becoming complicated to use, we added +1 to the power (2 to the power of 4 is 16), then again 2^5 is 32, and so on; 256 is just 2 to the power of 8.
So your answer is: we follow the binary system because of the architecture of computers, and we go up in the power to get values we can handle easily every day, and that is how you got from a bit to a byte (8 bits) and so on!
(2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, and so on) (2^x, x=1,2,3,4,5,6,7,8,9,10 and so on)
The important number here is binary 0 or 1. All your other questions are related to this.
Claude Shannon and George Boole did the fundamental work on what we now call information theory and Boolean arithmetic. In short, this is the basis of how a digital switch, with only the ability to represent 0 OFF and 1 ON can represent more complex information, such as numbers, logic and a jpg photo. Binary is the basis of computers as we know them currently, but other number base computers or analog computers are completely possible.
In human decimal arithmetic, the powers of ten have significance: 10, 100, 1000, 10,000 each seem important and useful. Once you have a computer based on binary, powers of 2 likewise become important. 2^8 = 256 is enough for an alphabet, punctuation and control characters. (More importantly, 2^7 is enough for an alphabet, punctuation and control characters, and 2^8 has enough room for those ASCII characters and a check bit.)
We normally count in base 10, a single digit can have one of ten different values. Computer technology is based on switches (microscopic) which can be either on or off. If one of these represents a digit, that digit can be either 1 or 0. This is base 2.
It follows from there that computers work with numbers that are built up as a series of 2 value digits.
1 digit: 2 values
2 digits: 4 values
3 digits: 8 values, etc.
When processors are designed, they have to pick a size that the processor will be optimized to work with. To the CPU, this is considered a "word". Early CPUs were based on word sizes of four bits and soon after 8 bits (1 byte). Today, CPUs are mostly designed to operate on 32-bit and 64-bit words. But really, the two-state "switch" is why all computer numbers tend to be powers of 2.
I believe the main reason has to do with the original design of the IBM PC. The Intel 8080 CPU was the first precursor to the 8086 which would later be used in the IBM PC. It had 8-bit registers. Thus, a whole ecosystem of applications was developed around the 8-bit metaphor. In order to retain backward compatibility, Intel designed all subsequent architectures to retain 8-bit registers. Thus, the 8086 and all x86 CPUs after that kept their 8-bit registers for backwards compatibility, even though they added new 16-bit and 32-bit registers over the years.
The other reason I can think of is 8 bits is perfect for fitting a basic Latin character set. You cannot fit it into 4 bits, but you can in 8. Thus, you get the whole 256-value ASCII charset. It is also the smallest power of 2 for which you have enough bits into which you can fit a character set. Of course, these days most character sets are actually 16-bit wide (i.e. Unicode).
Charles Petzold wrote an interesting book called Code that covers exactly this question. See chapter 15, Bytes and Hex.
Quotes from that chapter:
Eight-bit values are inputs to the adders, latches and data selectors, and also outputs from these units. Eight-bit values are also defined by switches and displayed by lightbulbs. The data path in these circuits is thus said to be 8 bits wide. But why 8 bits? Why not 6 or 7 or 9 or 10?
... there's really no reason why it had to be built that way. Eight bits just seemed at the time to be a convenient amount, a nice biteful of bits, if you will.
... For a while, a byte meant simply the number of bits in a particular data path. But by the mid-1960s, in connection with the development of IBM's System/360 (their large complex of business computers), the word came to mean a group of 8 bits.
... One reason IBM gravitated toward 8-bit bytes was the ease in storing numbers in a format known as BCD. But as we'll see in the chapters ahead, quite by coincidence a byte is ideal for storing text because most written languages around the world (with the exception of the ideographs used in Chinese, Japanese and Korean) can be represented with fewer than 256 characters.
Historical reasons, I suppose. 8 is a power of 2; a 4-bit unit (2^2) gives only 2^4 = 16 values, far too little for most purposes, and 16-bit (the next power of two) hardware came much later.
But the main reason, I suspect, is the fact that they had 8-bit microprocessors, then 16-bit microprocessors, whose words could very well be represented as 2 octets, and so on. You know, historical cruft and backward compatibility, etc.
Another, similarly pragmatic reason against "scaling down": if we'd, say, use 4 bits as one word, we would basically get only half the throughput compared with 8 bits, aside from overflowing much faster.
You can always squeeze e.g. 2 numbers in the range 0..15 in one octet... you just have to extract them by hand. But unless you have, like, gazillions of data sets to keep in memory side-by-side, this isn't worth the effort.
