Im really confused over here. I am a ai programmer working on a game that is designed to detect beats in songs and some more. I have no previous knowledge about audio and just reading through whatever material i can find. While i got fft working and stuff I simply don't understand the way samples are transferred to different frequencies. Question 1, what does each frequency stands for. For the algorithm i got. I can transfer for example 1024 samples into 512 outcomes. So are they a description of the strength of each spectrum at the current second? it doesn't really make sense since what i remember is that there are 20,000hz in a 44.1khz audio recording. So how does 512 spectrum samples explain what is happening in that moment? Question 2, from what i read, its a number that represent the sound wave at this moment. However i read that by squaring both left channel and right channel, and add them together and you will get the current power level. Both these seems incoherent to my understanding, and i am really buff led so please explain away.
DFT output
the output is complex representation of phasor (Re,Im,Frequency) of basis function (usually sin wave). First item is DC offset so skip it. All the others are multiples of the same fundamental frequency (sampling rate/N). The output is symmetric (if the input is real only) so use just first half of results. Often power spectrum is used
Amplitude=sqrt(Re^2+Im^2)
which is the amplitude of basis function. If phase is needed then
phase=atan2(Im,Re)
beware DFT results are strongly dependent on the input signal shape,frequency and phase shift to your basis functions. That causes the output to vibrate/oscillate around the correct value and produce wide peaks instead of sharp ones for singular frequencies not to mention aliasing.
frequencies
if you got 44100Hz then the max output frequency is half of it that means the biggest frequency present in data is 22050Hz. The DFFT however does not contain this frequency so if you ignore the mirrored second half of results then:
for 4 samples DFT outputs frequencies are { -,11025 } Hz
for 8 samples frequencies are: { -,5512.5,11025,16537.5 } Hz
The output frequency is linear to its address from start so if you got N=512 samples
do DFFT on it
obtain first N/2=256 results
i-th sample represents frequency f=i*samplerate/N Hz
where i={ 1,...,(N/2)-1} ... skipping i=0
the image shows one of mine utility apps tighted together with
2-channel sound generator (top left)
2-channel oscilloscope (top right)
2-channel spectral analyzer (bottom) ... switched to linear frequency scale to make obvious what I mean in above text
zoom the image to see the settings ... I made it as close to the real devices as I could.
Here DCT and DFT comparison:
Here the DFT output dependency on input signal frequency aliasing by sampling rate
more channels
Summing power of channels is more safe. If you just add the channels then you could miss some data. For example let left channel is playing 1 Khz sin wave and the right exact opposite so if you just sum them then the result is zero but you can hear the sound .... (if you are not exactly in the middle between speakers). If you analyze each channel independently then you need to calculate DFFT for each channel but if you use power sum of channels (or abs sum) then you can obtain the frequencies for all channels at once , of coarse you need to scale the amplitudes ...
[Notes]
Bigger the N nicer the result (less aliasing artifacts and closer to the max frequency). For specific frequencies detection are FIR filter detectors more precise and faster.
Strongly recommend to read DFT and all sublinks there and also this plotting real time Data on (qwt) Oscillocope
I am really having hard time understanding the difference. Some say they are same, while others say there is a slight difference. What's the difference, exactly? I would like it if you explained with some analogy.
Bits per second is straightforward. It is exactly what it sounds like. If I have 1000 bits and am sending them at 1000 bps, it will take exactly one second to transmit them.
Baud is symbols per second. If these symbols — the indivisible elements of your data encoding — are not bits, the baud rate will be lower than the bit rate by the factor of bits per symbol. That is, if there are 4 bits per symbol, the baud rate will be ¼ that of the bit rate.
This confusion arose because the early analog telephone modems weren't very complicated, so bps was equal to baud. That is, each symbol encoded one bit. Later, to make modems faster, communications engineers invented increasingly clever ways to send more bits per symbol.¹
Analogy
System 1, bits: Imagine a communication system with a telescope on the near side of a valley and a guy on the far side holding up one hand or the other. Call his left hand "0" and his right hand "1," and you have a system for communicating one binary digit — one bit — at a time.
System 2, baud: Now imagine that the guy on the far side of the valley is holding up playing cards instead of his bare hands. He is using a subset of the cards, ace through 8 in each suit, for a total of 32 cards. Each card — each symbol — encodes 5 bits: 00000 through 11111 in binary.²
Analysis
The System 2 guy can convey 5 bits of information per card in the same time it takes the System 1 guy to convey one bit by revealing one of his bare hands.
You see how the analogy seems to break down: finding a particular card in a deck and showing it takes longer than simply deciding to show your left or right hand. But, that just provides an opportunity to extend the analogy profitably.
A communications system with many bits per symbol faces a similar difficulty, because the encoding schemes required to send multiple bits per symbol are much more complicated than those that send only one bit at a time. To extend the analogy, then, the guy showing playing cards could have several people behind him sharing the work of finding the next card in the deck, handing him cards as fast as he can show them. The helpers are analogous to the more powerful processors required to produce the many-bits-per-baud encoding schemes.
That is to say, by using more processing power, System 2 can send data 5 times faster than the more primitive System 1.
Historical Vignette
What shall we do with our 5-bit code? It seems natural to an English speaker to use 26 of the 32 available code points for the English alphabet. We can use the remaining 6 code points for a space character and a small set of control codes and symbols.
Or, we could just use Baudot code, a 5-bit code invented by Émile Baudot, after whom the unit "baud" was coined.³
Footnotes and Digressions:
For example, the V.34 standard defined a 3,429 baud mode at 8.4 bits per symbol to achieve 28.8 kbit/sec throughput.
That standard only talks about the POTS side of the modem. The RS-232 side remains a 1 bit per symbol system, so you could also correctly call it a 28.8k baud modem. Confusing, but technically correct.
I've purposely kept things simple here.
One thing you might think about is whether the absence of a playing card conveys information. If it does, that implies the existence of some clock or latch signal, so that you can tell the information-carrying absence of a card from the gap between the display of two cards.
Also, what do you do with the cards left over in a poker deck, 9 through King, and the Jokers? One idea would be to use them as special flags to carry metadata. For example, you'll need a way to indicate a short trailing block. If you need to send 128 bits of information, you're going to need to show 26 cards. The first 25 cards convey 5×25=125 bits, with the 26th card conveying the trailing 3 bits. You need some way to signal that the last two bits in the symbol should be disregarded.
This is why the early analog telephone modems were specified in terms of baud instead of bps: communications engineers had been using that terminology since the telegraph days. They weren't trying to confuse bps and baud; it was simply a fact, in their minds, that these modems were transmitting one bit per symbol.
I don't understand why everyone is making this complicated (answers).
I'll just leave this here.
So above would be:
Signal Unit: 4 bits
Baud Rate [Signal Units per second]: 1000 Bd (baud)
Bit Rate [Baud Rate * Signal Unit]: 4000 bps (bits per second)
Bit rate and Baud rate, these two terms are often used in data
communication. Bit rate is simply the number of bits (i.e., 0’s and
1’s) transmitted per unit time. While Baud rate is the number of
signal units transmitted per unit time that is needed to represent
those bits.
Bit rate:-
Bit rate is nothing but number of bits transmitted per second.For example if Bit rate is 1000bps then 1000 bits are i.e. 0s or 1s transmitted per second.
Baud rate:-
It means number of time signal changes its state.When the signal is binary then baud rate and bit rate are same.
According to What’s The Difference Between Bit Rate And Baud Rate?:
Bit Rate
The speed of the data is expressed in bits per second (bits/s or bps).
The data rate R is a function of the duration of the bit or bit time
(TB) (Fig. 1, again):
R = 1/TB
Rate is also called channel capacity C. If the bit time is 10 ns, the
data rate equals:
R = 1/10 x 10–9 = 100 million bits/s
This is usually expressed as 100 Mbits/s.
Baud Rate
The term “baud” originates from the French engineer Emile Baudot, who
invented the 5-bit teletype code. Baud rate refers to the number of
signal or symbol changes that occur per second. A symbol is one of
several voltage, frequency, or phase changes.
NRZ binary has two symbols, one for each bit 0 or 1, that represent
voltage levels. In this case, the baud or symbol rate is the same as
the bit rate. However, it’s possible to have more than two symbols per
transmission interval, whereby each symbol represents multiple bits.
With more than two symbols, data is transmitted using modulation
techniques.
When the transmission medium can’t handle the baseband data,
modulation enters the picture. Of course, this is true of wireless.
Baseband binary signals can’t be transmitted directly; rather, the
data is modulated on to a radio carrier for transmission. Some cable
connections even use modulation to increase the data rate, which is
referred to as “broadband transmission.”
By using multiple symbols, multiple bits can be transmitted per
symbol. For example, if the symbol rate is 4800 baud and each symbol
represents two bits, that translates into an overall bit rate of 9600
bits/s. Normally the number of symbols is some power of two. If N is
the number of bits per symbol, then the number of required symbols is
S = 2^N. Thus, the gross bit rate is:
R = baud rate x log2S = baud rate x 3.32 log10S
If the baud rate is 4800 and there are two bits per symbol, the number
of symbols is 2^2 = 4. The bit rate is:
R = 4800 x 3.32 log(4) = 4800 x 2 = 9600 bits/s
If there’s only one bit per symbol, as is the case with binary NRZ,
the bit and baud rates remain the same.
First something I think necessary to know:
It is symbol that is transferred on a physical channel. Not bit. Symbol is the physical signals that is transferred over the physical medium to convey the data bits. A symbol can be one of several voltage, frequency, or phase changes. Symbol is decided by the physical nature of the medium. While bit is a logical concept.
If you want to transfer data bits, you must do it by sending symbols over the medium. Baud rate describes how fast symbols change over a medium. I.e. it describes the rate of physical state changes over the medium.
If we use only 2 symbols to transfer binary data, which means one symbol for 0 and another symbol for 1, that will lead to baud rate = bit rate. And this is how it works in the old days.
If we are lucky enough to find a way to encode more bits into a symbol, we can achieve higher bit rate with the same baud rate. And this is when the baud rate < bit rate. This doesn't mean the transfer speed is slowed down. It actually means the transfer efficiency/speed is increased.
And the communicating parties have to agree on how bits are represented by each physical symbol. This is where the modulation protocols come in.
But the ability of sending multiple bits per symbol doesn't come free. The transmitter and receiver will be complex depending on the modulation methods. And more processing power is required.
Finally, I'd like to make an analogy:
Suppose I stand on the roof of my house and you stand on your roof. There's a rope between you and me. I want to send some apples to you through a basket down the rope.
The basket is the symbol. The apple is the data bits.
If the basket is small (a physical limitation of the symbol), I may only send one apple per basket. This is when baud/basket rate = bit/apple rate.
If the basket is big, I can send more apples per basket. This is when baud rate < bit rate. I can send all the apples with less baskets. But it takes me more effort (processing power) to put more apples into the basket than put just one apple. If the basket rate remains the same, the more apples I put in one basket, the less time it takes.
Here are some related threads:
How can I be sure that a multi-bit-per-symbol encoding schema exists?
What is difference between the terms bit rate,baud rate and data rate?
bit rate : no of bits(0 or 1 for binary signal) transmitted per second.
baud rate : no of symbols per second.
A symbol consists of 'n' number of bits.
Baud rate = (bit rate)/n
So baud rate is always less than or equal to bit rate.It is equal when signal is binary.
Baud rate is mostly used in telecommunication and electronics, representing symbol per second or pulses per second, whereas bit rate is simply bit per second. To be simple, the major difference is that symbol may contain more than 1 bit, say n bits, which makes baud rate n times smaller than bit rate.
Suppose a situation where we need to represent a serial-communication signal, we will use 8-bit as one symbol to represent the info. If the symbol rate is 4800 baud, then that translates into an overall bit rate of 38400 bits/s. This could also be true for wireless communication area where you will need multiple bits for purpose of modulation to achieve broadband transmission, instead of simple baseline transmission.
Hope this helps.
Bit per second is what is means - rate of data transmission of ones and zeros per second are used.This is called bit per second(bit/s. However, it should not be confused with bytes per second, abbreviated as bytes/s, Bps, or B/s.
Raw throughput values are normally given in bits per second, but many software applications report transfer rates in bytes per second.
So, the standard unit for bit throughput is the bit per second, which is commonly abbreviated bit/s, bps, or b/s.
Baud is a unit of measure of changes , or transitions , that occurs in a signal in each second.
For example if the signal changes from one value to a zero value(or vice versa) one hundred times per second, that is a rate of 100 baud.
The other one measures data(the throughput of channel), and the other ones measures transitions(called signalling rates).
For example if you look at modern modems they use advanced modulation techniques that encoded more than one bit of data into each transition.
Thanks.
The bit rate is a measure of the number of bits that are transmitted per unit of time.
The baud rate, which is also known as symbol rate, measures the number of symbols that are transmitted per unit of time.
A symbol typically consists of a fixed number of bits depending on what the symbol is defined as(for example 8bit or 9bit data). The baud rate is measured in symbols per second.
Take an example, where an ascii character 'R' is transmitted over a serial channel every one second.
The binary equivalent is 01010010.
So in this case, the baud rate is 1(one symbol transmitted per second) and the bit rate is 8 (eight bits are transmitted per second).
This topic is confusing because there are 3 terms in use when people think there are just 2, namely:
"bit rate": units are bits per second
"baud": units are symbols per second
"Baud rate": units are bits per second
"Baud rate" is really a marketing term rather than an engineering term. "Baud rate" was used by modem manufactures in a similar way to megapixels is used for digital cameras. So the higher the "Baud rate" the better the modem was perceived to be.
The engineering unit "baud" is already a rate (symbols per second) which distinguishes it from the "Baud rate" term. However, you can see from the answers that people are confusing these 2 terms together such as baud/sec which is wrong.
From an engineering point of view, I recommend people use the term "bit rate" for "RS-232" and consign to history the term "Baud rate". Use the term "baud" for modulation schemes but avoid it for "RS-232".
In other words, "bit rate" and "Baud rate" are the same thing which means how many bits are transmitted along a wire in one second. Note that bits per second (bps) is the low-level line rate and not the information data rate because asynchronous "RS-232" has start and stop bits that frame the 8 data bits of information so bps includes all bits transmitted.
Bit rate is a measure of the number of data bits (that's 0's and 1's) transmitted in one second. A figure of 2400 bits per second means 2400 zeros or ones can be transmitted in one second, hence the abbreviation 'bps'.
Baud rate by definition means the number of times a signal in a communications channel changes state. For example, a 2400 baud rate means that the channel can change states up to 2400 times per second. When I say 'change state' I mean that it can change from 0 to 1 up to 2400 times per second. If you think about this, it's pretty much similar to the bit rate, which in the above example was 2400 bps.
Whether you can transmit 2400 zeros or ones in one second (bit rate), or change the state of a digital signal up to 2400 times per second (baud rate), it the same thing.
Serial Data Speed:
Data rate (bps) = 1/Tb
Tb is the time duration of 1 bit
If the bit duration is 2ms then data rate is 1/2x10-3 , which is about 500 bps.
Baud rate:
Baud rate is defined as no. of signalling elements(symbols) in a given unit of time (say 1 sec) or it means number of time signal changes its state.When the signal is binary then baud rate and bit rate are same.
Bit rate:- Bit rate is nothing but number of bits transmitted per second.For example if Bit rate is 1000 bps then 1000 bits are i.e. 0s or 1s transmitted per second.
There are few other terms similar to this (i.e serial speed, bit rate, baud rate, USB transfer rate),and i guess(?) the values that are printed on serial monitor relates to serial speed, baud rate and USB transfer rate. Bit rate isn't an another term, please correct me if i am wrong, because serial monitor prints some values at an interval of time and value is definitely a set of bits. so if one value is printed i can say no of bits present in the respective value which gets printed on serial monitor per unit time will be the bit rate.
Replies here are misleading. Saying true, but no one tell that for UART a symbol is not a single character but a single bit and this way the question was tagged.
For example 115200/8n1 is 11520 bytes per second as a single ASCII character is a 1 start bit plus 8 data bit plus 1 stop bit.
As correctly pointed out in the other replies, the bitrate is the amount of logical (or "abstract high level") information transferred in a given time, while baud rate is the number of symbols (more or less "signal changes") in the physical line in a given time.
While it is easy to understand that if a transmitted symbol carries 4 bits of information, then the bitrate is four time the baud rate, things get blurred in case of, for example, a RS-232 serial line.
The classic serial line works on bytes (well, "frames"), not bits. There is no way to transmit fewer that 8 bits (i.e. a byte), because the serial line defines a "frame" (I assume frames with 8 data bits, no parity, 1 start bit and 1 stop bit); and this is usually OK, because devices (computers) work probably on bytes, not single bits.
Given that, when a device sends a byte, i.e. 8 bits, the physical lines transmits 10 symbols, because to the original data composed of 8 bits, 2 more are added (start and stop bits, they are needed for synchronization). Some confusion can arise because the symbols transmitted on the physical line are also called "bits", but they are really symbols (MARK and SPACE, actually).
So in that classic RS-232 (in case of "8N1" frame) the bitrate is actually 8/10 of the baudrate. If we add the parity bit, the ratio lowers further and becomes 8/11.
The number of bits or symbols per second translates directly to the duration of them (bits or symbols). What does it mean for an engineer designing a system? It means that if he is designing a line filter to protect the line or reduce the noise, he should take the duration (or frequency) of the symbols transmitted on that line. For a baudrate of 1000 baud, he knows that the frequency of the signal is 1 KHz, and that a symbol has a duration of 1 ms. Fine. But if he has to calculate how much time is needed to transfer a file from a device to another, say a file of 1000 bytes, he must consider the bitrate, not the baudrate! Because the devices, at higher level, do not even see the start and stop bits, they are only a burden which slow down the communication (but they are useful for error checking).
To take it to the extreme case, imagine that a serial frame is just a bit long. For every bit transmitted by a device, three symbols would travel in the physical line. And if a parity were added, then four symbol would travel: the bitrate would be 1/4 of the baudrate. And if we add a second stop bit, the bitrate goes down to 1/5 of the baudrate!
I'm designing a real time Audio Analyser to be embedded on a FPGA chip. The finished system will read in a live audio stream and output frequency and amplitude pairs for the X most prevalent frequencies.
I've managed to implement the FFT so far, but it's current output is just the real and imaginary parts for each window, and what I want to know is, how do I convert this into the frequency and amplitude pairs?
I've been doing some reading on the FFT, and I see how they can be turned into a magnitude and phase relationship but I need a format that someone without a knowledge of complex mathematics could read!
Thanks
Thanks for these quick responses!
The output from the FFT I'm getting at the moment is a continuous stream of real and imaginary pairs. I'm not sure whether to break these up into packets of the same size as my input packets (64 values), and treat them as an array, or deal with them individually.
The sample rate, I have no problem with. As I configured the FFT myself, I know that it's running off the global clock of 50MHz. As for the Array Index (if the output is an array of course...), I have no idea.
If we say that the output is a series of One-Dimensional arrays of 64 complex values:
1) How do I find the array index [i]?
2) Will each array return a single frequency part, or a number of them?
Thankyou so much for all your help! I'd be lost without it.
Well, the bad news is, there's no way around needing to understand complex numbers. The good news is, just because they're called complex numbers doesn't mean they're, y'know, complicated. So first, check out the wikipedia page, and for an audio application I'd say, read down to about section 3.2, maybe skipping the section on square roots: http://en.wikipedia.org/wiki/Complex_number
What that's telling you is that if you have a complex number, a + bi, you can picture it as living in the x,y plane at location (a,b). To get the magnitude and phase, all you have to do is find two quantities:
The distance from the origin of the plane, which is the magnitude, and
The angle from the x-axis, which is the phase.
The magnitude is simple enough: sqrt(a^2 + b^2).
The phase is equally simple: atan2(b,a).
The FFT result will give you an array of complex values. The twice the magnitude (square root of sum of the complex components squared) of each array element is an amplitude. Or do a log magnitude if you want a dB scale. The array index will give you the center of the frequency bin with that amplitude. You need to know the sample rate and length to get the frequency of each array element or bin.
f[i] = i * sampleRate / fftLength
for the first half of the array (the other half is just duplicate information in the form of complex conjugates for real audio input).
The frequency of each FFT result bin may be different from any actual spectral frequencies present in the audio signal, due to windowing or so-called spectral leakage. Look up frequency estimation methods for the details.
we have a particle detector hard-wired to use 16-bit and 8-bit buffers. Every now and then, there are certain [predicted] peaks of particle fluxes passing through it; that's okay. What is not okay is that these fluxes usually reach magnitudes above the capacity of the buffers to store them; thus, overflows occur. On a chart, they look like the flux suddenly drops and begins growing again. Can you propose a [mostly] accurate method of detecting points of data suffering from an overflow?
P.S. The detector is physically inaccessible, so fixing it the 'right way' by replacing the buffers doesn't seem to be an option.
Update: Some clarifications as requested. We use python at the data processing facility; the technology used in the detector itself is pretty obscure (treat it as if it was developed by a completely unrelated third party), but it is definitely unsophisticated, i.e. not running a 'real' OS, just some low-level stuff to record the detector readings and to respond to remote commands like power cycle. Memory corruption and other problems are not an issue right now. The overflows occur simply because the designer of the detector used 16-bit buffers for counting the particle flux, and sometimes the flux exceeds 65535 particles per second.
Update 2: As several readers have pointed out, the intended solution would have something to do with analyzing the flux profile to detect sharp declines (e.g. by an order of magnitude) in an attempt to separate them from normal fluctuations. Another problem arises: can restorations (points where the original flux drops below the overflowing level) be detected by simply running the correction program against the reverted (by the x axis) flux profile?
int32[] unwrap(int16[] x)
{
// this is pseudocode
int32[] y = new int32[x.length];
y[0] = x[0];
for (i = 1:x.length-1)
{
y[i] = y[i-1] + sign_extend(x[i]-x[i-1]);
// works fine as long as the "real" value of x[i] and x[i-1]
// differ by less than 1/2 of the span of allowable values
// of x's storage type (=32768 in the case of int16)
// Otherwise there is ambiguity.
}
return y;
}
int32 sign_extend(int16 x)
{
return (int32)x; // works properly in Java and in most C compilers
}
// exercise for the reader to write similar code to unwrap 8-bit arrays
// to a 16-bit or 32-bit array
Of course, ideally you'd fix the detector software to max out at 65535 to prevent wraparound of the sort that is causing your grief. I understand that this isn't always possible, or at least isn't always possible to do quickly.
When the particle flux exceeds 65535, does it do so quickly, or does the flux gradually increase and then gradually decrease? This makes a difference in what algorithm you might use to detect this. For example, if the flux goes up slowly enough:
true flux measurement
5000 5000
10000 10000
30000 30000
50000 50000
70000 4465
90000 24465
60000 60000
30000 30000
10000 10000
then you'll tend to have a large negative drop at times when you have overflowed. A much larger negative drop than you'll have at any other time. This can serve as a signal that you've overflowed. To find the end of the overflow time period, you could look for a large jump to a value not too far from 65535.
All of this depends on the maximum true flux that is possible and on how rapidly the flux rises and falls. For example, is it possible to get more than 128k counts in one measurement period? Is it possible for one measurement to be 5000 and the next measurement to be 50000? If the data is not well-behaved enough, you may be able to make only statistical judgment about when you have overflowed.
Your question needs to provide more information about your implementation - what language/framework are you using?
Data overflows in software (which is what I think you're talking about) are bad practice and should be avoided. While you are seeing (strange data output) is only one side effect that is possible when experiencing data overflows, but it is merely the tip of the iceberg of the sorts of issues you can see.
You could quite easily experience more serious issues like memory corruption, which can cause programs to crash loudly, or worse, obscurely.
Is there any validation you can do to prevent the overflows from occurring in the first place?
I really don't think you can fix it without fixing the underlying buffers. How are you supposed to tell the difference between the sequences of values (0, 1, 2, 1, 0) and (0, 1, 65538, 1, 0)? You can't.
How about using an HMM where the hidden state is whether you are in an overflow and the emissions are observed particle flux?
The tricky part would be coming up with the probability models for the transitions (which will basically encode the time-scale of peaks) and for the emissions (which you can build if you know how the flux behaves and how overflow affects measurement). These are domain-specific questions, so there probably aren't ready-made solutions out there.
But one you have the model, everything else---fitting your data, quantifying uncertainty, simulation, etc.---is routine.
You can only do this if the actual jumps between successive values are much smaller than 65536. Otherwise, an overflow-induced valley artifact is indistinguishable from a real valley, you can only guess. You can try to match overflows to corresponding restorations, by simultaneously analysing a signal from the right and the left (assuming that there is a recognizable base line).
Other than that, all you can do is to adjust your experiment by repeating it with different original particle flows, so that real valleys will not move, but artifact ones move to the point of overflow.