I'm a beginner in the field of audio codecs and finding it hard to understand how the sampling rate, bit rate, and other parameters affect encoding/decoding (the audio format), audio quality, and file size.
I read that constant bit rate is better than variable bit rate, but how do I know what bit rate is right to encode a file as small as possible without compromising quality? I'm focusing specifically on audio codecs for the present.
I have heard about Opus, SILK, G.722, and Speex, but I don't know which one I should use to get better quality and a smaller file size. Also, what parameters should I set for these codecs so they work effectively for me?
Can anyone guide me on this?
Thanks in advance
If you think of the original analog music as a sound wave, then converting it to digital means approximating that wave as digital bits. The sampling rate is how many points on that wave you are taking per unit time, so the higher the sampling rate, the closer you are to the original sound. A lower sampling rate means higher compression but lower audio quality.
Similarly the bit rate is effectively 'how much' information you're encoding at each point so again, lower bit rate means higher compression but lower audio quality.
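To make the relationship between those parameters and file size concrete, here is a minimal sketch of the arithmetic (the numbers are illustrative; real files also carry container overhead):

```python
# Rough size arithmetic:
#   uncompressed PCM size = sample_rate * bit_depth * channels * duration / 8
#   compressed size       = bit_rate * duration / 8

def pcm_size_bytes(sample_rate_hz, bit_depth, channels, seconds):
    """Size of raw (uncompressed) PCM audio."""
    return sample_rate_hz * bit_depth * channels * seconds // 8

def compressed_size_bytes(bit_rate_bps, seconds):
    """Size of an encoded stream at a given average bit rate."""
    return bit_rate_bps * seconds // 8

# One minute of CD-quality stereo PCM vs. the same minute encoded at 128 kbit/s:
print(pcm_size_bytes(44_100, 16, 2, 60))    # ~10.6 MB
print(compressed_size_bytes(128_000, 60))   # ~0.96 MB
```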
Compression algorithms generally use psychoacoustics to try to determine what information can be lost with the least amount of audible difference. In some sections of a track this may be more or less than in others, so using a variable bit rate enables you to achieve higher compression without a 'big' audible drop in quality.
It's well explained here: Link
I don't know the details of those codecs, but generally what you should use and what parameters you should pass depend on what you're trying to achieve and for what purpose. For portable use, where audio quality might not be paramount, you might want to pass lower values to achieve smaller file sizes - for audiophile speakers you probably want to pass the maximum.
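If you end up experimenting with Opus, here is a minimal sketch of driving the encoder from Python by shelling out to ffmpeg (this assumes an ffmpeg build with libopus is on the PATH; the bitrate value is only a starting point to listen to and adjust):

```python
# Minimal sketch: encode a WAV file to Opus at a target average bit rate by
# shelling out to ffmpeg (assumes an ffmpeg build with libopus on the PATH).
import subprocess

def encode_opus(src_wav, dst_opus, bitrate="24k"):
    # -b:a sets the target average bit rate; libopus uses VBR by default.
    subprocess.run(
        ["ffmpeg", "-y", "-i", src_wav, "-c:a", "libopus", "-b:a", bitrate, dst_opus],
        check=True,
    )

# For speech, something in the 16k-32k range is a common starting point;
# listen to the result and adjust.
encode_opus("speech.wav", "speech.opus", bitrate="24k")
```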
Well, I want to ask if the ADXL345 can be used to detect an earthquake based on its magnitude/intensity level. To give more context, I want to use an accelerometer to create a device that can detect the intensity/magnitude level of an earthquake.
I have absolutely no experience in this field, but it looks useful and fascinating.
Questions are:
Is this device able to detect medium-scale earthquakes?
If yes, has anybody done it and is willing to share their experience?
If not, is there any guide that explains the algorithms, calculations, and mechanical plans?
That sensor is not suitable. It has 13-bit resolution at a ±16 g full range, which gives a sensitivity of roughly 0.004 g per LSB. In order to detect an earthquake directly below you, you need to resolve approximately a few milli-g (e.g. see here), even less for earthquakes with an epicentre elsewhere.
You want a sensor that is more sensitive by a factor of about 100, and probably with more resolution (a better ADC) too.
(And you should have been able to do this quick google-search analysis yourself ;) )
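To see why the resolution is the problem, here is a minimal sketch of the arithmetic (the scale factor and threshold come from the discussion above and are purely illustrative):

```python
# Minimal sketch: convert ADXL345-style raw counts to g and compare against a
# trigger threshold. Values are illustrative only.
G_PER_LSB = 32.0 / 8192    # +/-16 g full range over 13 bits -> ~0.004 g per count
TRIGGER_G = 0.003          # "a few milli-g" of ground acceleration

def shaking_detected(raw_counts):
    return abs(raw_counts * G_PER_LSB) > TRIGGER_G

# A single count (~4 mg) already exceeds a 3 mg threshold, so one count of
# sensor noise would "trigger" - which is why this part lacks the resolution.
print(shaking_detected(1))    # True
```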
An accelerometer reading tells you nothing about the actual magnitude of the quake itself; it tells you the strength of shaking at your location. Combining location and amplitude will give you a 'weighted' measurement, but that's still useless without a calibration curve. Without knowing what acceleration, at a certain distance, corresponds to what magnitude, you will be unable to tell what the magnitude is. You can certainly conclude that your measured earthquake has a median amplitude of, say, 2000% of a non-earthquake reading, but you won't be able to turn it into a Richter measurement. To do that you'd need to take some data during earthquakes of known magnitude and then work out how acceleration, distance and magnitude are related for your device. You could alternatively use a scale like the Shindo (just Google it).
I want to convert sound from a mic to binary and match it against a database (a kind of voice identification program), but I have no idea how to get sound from the mic directly so that I can convert it to binary. Is it even possible? Please guide me.
See this:
http://www.dotnetspider.com/resources/4967-How-record-voice-from-microphone.aspx
You're not going to be able to identify voices by doing a binary comparison on sound data. Because of minor variations in just about everything, the binary of a particular sound will not be identical to an imitation of that sound unless it is literally the same file. You'll need to do some signal processing to do a fuzzy comparison of the data. You can read about signal processing on Wikipedia.
You will probably find it easier to use a third party library to process the sound for you. Something like this might be a good start.
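To give a feel for what a 'fuzzy comparison' looks like in code, here is a toy sketch (NumPy assumed; real voice matching needs far more than this, but it shows why near-matches matter more than exact bytes):

```python
# Toy illustration of a "fuzzy" comparison: normalized correlation stays near
# 1.0 for similar signals even when they are not byte-identical.
import numpy as np

def similarity(a, b):
    a = (a - a.mean()) / (a.std() + 1e-12)
    b = (b - b.mean()) / (b.std() + 1e-12)
    n = min(len(a), len(b))
    return float(np.dot(a[:n], b[:n]) / n)

t = np.linspace(0, 1, 8000)
tone = np.sin(2 * np.pi * 440 * t)
noisy_tone = tone + 0.1 * np.random.randn(len(t))   # same sound, slightly noisy

print(similarity(tone, noisy_tone))               # close to 1.0
print(similarity(tone, np.random.randn(len(t))))  # close to 0.0
```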
You're looking at two very distinct problems here.
The first is pretty technical: getting sound from the microphone into a digital waveform. How you do this exactly depends on the OS and API you're using (on Windows, you're probably looking at DirectX audio or, if available, ASIO). Typically, this is how you'd proceed (a rough sketch in code follows these steps):
Set up a recording buffer for the microphone, with suitable parameters (number of channels, physical input on the sound card, sample rate, bit depth, buffer size)
Start the recording. This usually involves pointing the sound library to a callback function to process the recorded buffer.
In the callback, read the buffer, convert it to a suitable format, and append it to the audio file of your choice. (You could also record to RAM only, but longer recordings may exceed available memory.)
Store the recorded audio in a suitable database field (some kind of binary blob)
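As a rough illustration of those steps in Python, here is a minimal sketch using the third-party PyAudio package (the parameter values are only examples, and a real recorder would append chunks to disk rather than hold everything in memory):

```python
# Minimal sketch: record a few seconds from the default microphone and write a
# WAV file. Assumes the third-party PyAudio package is installed.
import wave
import pyaudio

RATE, CHANNELS, CHUNK, SECONDS = 16000, 1, 1024, 5

pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paInt16, channels=CHANNELS, rate=RATE,
                 input=True, frames_per_buffer=CHUNK)

frames = [stream.read(CHUNK) for _ in range(RATE * SECONDS // CHUNK)]

stream.stop_stream()
stream.close()
pa.terminate()

with wave.open("recording.wav", "wb") as wf:
    wf.setnchannels(CHANNELS)
    wf.setsampwidth(2)        # 16-bit samples = 2 bytes
    wf.setframerate(RATE)
    wf.writeframes(b"".join(frames))
```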
This is the easy part though; the harder part is matching a chunk of audio data against other chunks. A naïve approach would be to try to find exact matches, but that won't help you much, because the chance that you find one is practically zero - recording equipment, even the best, introduces a bit of random noise, and recording setups vary slightly whether you want them to or not, so even if you had someone say something twice, perfectly identically, you'd still see differences in the recorded audio.
What you need to do, then, is find certain typical characteristics of the waveform. Things you could look for are:
Overall amplitude shape
Base frequencies
Selected harmonics (formants)
Extracting these is non-trivial and involves pretty severe math; and then you'll have to condense them into some sort of fingerprint, and find a way to compare them with some fuzziness (so that a near-match is good enough, rather than requiring exact matches). Finding the right parameters and comparison algorithms isn't easy, and it takes a lot of tweaking and testing; your best bet is to go find a library that does this for you.
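As a very rough sketch of what "condense into a fingerprint and compare with some fuzziness" might look like (a coarse log-spectrum plus correlation, NumPy assumed; real systems use much more robust features such as MFCCs and proper time alignment):

```python
# Very rough sketch: reduce a signal to a coarse spectral "fingerprint" and
# compare two fingerprints with correlation.
import numpy as np

def fingerprint(signal, bands=32):
    spectrum = np.abs(np.fft.rfft(signal))
    # Collapse the spectrum into a fixed number of bands (log energy per band).
    chunks = np.array_split(spectrum, bands)
    return np.log1p(np.array([c.sum() for c in chunks]))

def match_score(fp_a, fp_b):
    return float(np.corrcoef(fp_a, fp_b)[0, 1])   # near 1.0 = very similar

# Usage: score = match_score(fingerprint(recording_a), fingerprint(recording_b))
```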
Imagine that you had all the supercomputers in the world at your disposal for the next 10 years. Your task was to compress 10 full-length movies losslessly, as much as possible. Another criterion was that a normal computer should be able to decompress them on the fly, and should not need to devote much of its hard drive to installing the decompression software.
My question is, how much more compression could you achieve than the best alternatives today? 1%, 5%, 50%? More specifically: is there a theoretical limit to compression, given a fixed dictionary size (if it is called that for video compression as well)?
The limits of compression are dictated by the randomness of the source. Welcome to the study of information theory! See data compression.
There is a theoretical limit: I suggest reading this article on information theory and the pigeonhole principle. It seems to sum up the issue in a very easy-to-understand way.
If you have a fixed catalogue of all the movies you were ever going to compress, you could just send an ID for the movie and have the "decompression" step look up the data with that index. So compression could be to a fixed size of log2(N) bits, where N is the number of movies.
I suspect the practical lower bound is rather higher than this.
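For the ten movies in the question, that index works out to:

```python
import math
# An index into a fixed catalogue of N items needs ceil(log2(N)) bits.
print(math.ceil(math.log2(10)))   # 4 bits to pick one of 10 movies
```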
Do you really mean lossless? Most of today's video compression is lossy, I thought.
It is important to restate these limits in light of recent developments in information theory, and to make explicit the hypotheses under which each limit holds.
In information theory, three fundamental hypotheses are traditionally used:
The information is defined by the entropy function H(X).
The information that identifies the source is known to both the encoder and the decoder.
The source and its isomorphisms are considered, which means we can decode one symbol at a time.
The first limit, the most famous, was defined by Shannon and holds when all three hypotheses are true:
NH(X)
where H(X) is the entropy of the source X and N is the length of the message.
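As a small illustration, here is a sketch of estimating that limit from the empirical symbol frequencies of a message (standard-library Python; the sample string is arbitrary):

```python
# Minimal sketch: estimate the Shannon limit N*H(X) from the empirical symbol
# frequencies of a message of length N.
from collections import Counter
from math import log2

def entropy_bits_per_symbol(message):
    counts = Counter(message)
    n = len(message)
    return -sum((c / n) * log2(c / n) for c in counts.values())

msg = "abracadabra"
H = entropy_bits_per_symbol(msg)
print(H, len(msg) * H)   # bits per symbol, and the N*H(X) lower bound in bits
```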
Second limit: we remove the second hypothesis, so the decoder does not know the source:
NH(X) + source information
Third limit: we also remove the third hypothesis. In this case, Set Shaping Theory (SST) is used, a new method in information theory. This theory studies one-to-one functions f that transform a set of strings into a set of equal size made up of longer strings. With this method, we get the following limit:
N₂H(Y) + source information ≈ NH(X)
with f(X) = Y and N₂ > N.
In practice, we obtain a compression gain equivalent to the information needed to describe the source; that information represents the inefficiency of the entropy coding.
In this case, however, it is not possible to decode one symbol at a time (the code is not instantaneous); the message must be decoded in full before the original message is obtained.
Important progress has been made in this area: the theory has been applied to a concrete case of data compression, "Practical applications of Set Shaping Theory in Huffman coding".
Another interesting aspect is that the authors shared the code for the function that performs the transform described in Set Shaping Theory. The file is available on the MATLAB File Exchange: https://www.mathworks.com/matlabcentral/fileexchange/115590-test-sst-huffman-coding?s_tid=FX_rc1_behav
I'm writing software that demonstrates a video-on-demand service. One of the features is something similar to IIS Smooth Streaming - I want to adjust quality to the client's bandwidth. My idea is to split a single movie into many parts, let's say 2 seconds each, in different qualities, and then send them to the client and play them. The point is that, for example, the first part can be in very high quality and the second in really poor quality (if the bandwidth seems to be poor). The question is - do you know of any software that allows me to cut movies precisely? For example, ffmpeg splits movies in a way that makes the join visible and really annoying (seconds are the measure of precision). I use Qt + Phonon as the player, if it matters. Or maybe you know a better way to provide such a feature without splitting the movie into parts?
Are you sure ffmpeg's precision is in seconds? Here's an excerpt from the man page:
-t duration
Restrict the transcoded/captured video sequence to the duration specified in seconds. "hh:mm:ss[.xxx]" syntax is also supported.
-ss position
Seek to given time position in seconds. "hh:mm:ss[.xxx]" syntax is also supported.
-itsoffset offset
Set the input time offset in seconds. "[-]hh:mm:ss[.xxx]" syntax is also supported. This option affects all the input files that follow it. The offset is added to the timestamps of the input files. Specifying a positive offset means that the corresponding streams are delayed by 'offset' seconds.
Looks like it supports up to millisecond precision, and since most video is not 1000+ frames per second, this should be more than enough precision to accurately seek through any video stream.
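If you want to experiment, here is a minimal sketch that cuts fixed-length segments by shelling out to ffmpeg from Python (assumes ffmpeg is on the PATH; re-encoding, rather than stream-copying with `-c copy`, is what lets the cuts land at exact times instead of snapping to keyframes):

```python
# Minimal sketch: cut a movie into fixed-length segments with ffmpeg's segment
# muxer. Without -c copy, ffmpeg re-encodes, so cuts need not fall on keyframes.
import subprocess

def split_into_segments(src, seconds=2, pattern="part_%03d.mp4"):
    subprocess.run(
        ["ffmpeg", "-y", "-i", src,
         "-f", "segment", "-segment_time", str(seconds),
         "-reset_timestamps", "1",
         pattern],
        check=True,
    )

split_into_segments("movie.mp4", seconds=2)
```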
Are you sure this is a good idea? Checking the bandwidth and switching out clips every two seconds seems like it will only allow you to buffer two seconds into the future at any given point, and unless the client has some Godly connection, it will appear extremely jumpy.
And what about playback, if the user replays the video? Would it recalculate the quality as it replays, or do you build the video file while streaming?
I am not experienced in the field of streaming video, but it seems what I see most often is that the provider has several different quality versions of their video (from extremely low to HD), and they test the user's bandwidth and then stream at an appropriate quality.
(I apologize if I misunderstood the question.)
Say you have a conference room and meetings take place at arbitrary, impromptu times. You would like to keep an audio record of all meetings. In order to make it as easy to use as possible, no action would be required on the part of meeting attendees; they just know that when they have a meeting in a specific room, they will have a record of it.
Obviously just recording nonstop would be inefficient as it would be a waste of data storage and a pain to sift through.
I figure there are two basic ways to go about it.
Recording simply starts and stops according to sound level thresholds.
Recording is continuous, but split into X-minute blocks. Blocks found to contain no content are discarded.
I like the second way better because I feel there is less risk of losing data due to late starts or failed triggers.
I would like to implement in Python, and on Windows if possible.
Implementation suggestions?
Bonus considerations that probably deserve their own questions:
best audio format and compression for this purpose
any way of determining how many speakers are present, assuming identification is unrealistic
This is one of those projects where the path is going to be defined more by what's on hand for ready reuse.
You'll probably find it easier to record continuously and save the data off in chunks (for example, hour-long pieces).
The format is going to depend on what you have in the way of recording tools and audio-processing libraries. You may even find that you use two: one format, like PCM-encoded WAV, for recording and processing, and compressed MP3 for storage.
Once you have an audio stream, you'll need to access it in PCM form (a list of amplitude values). A simple averaging approach will probably be good enough to detect when there is a conversation. Typical tuning attributes (see the sketch after this list):
* Average energy level to trigger
* Amount of time you need to be at the energy level or below to identify stop and start (I recommend two different values)
* Size of analysis window for averaging
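Here is a minimal sketch of that windowed-average trigger, assuming the audio is already available as a NumPy array of PCM amplitude values (all the numeric values are illustrative and would need tuning for a real room):

```python
# Minimal sketch of the windowed-energy trigger described above. `samples` is
# PCM amplitude data as a NumPy array; the numbers are illustrative only.
import numpy as np

RATE = 16000
WINDOW = RATE // 2            # 0.5 s analysis window
START_RMS = 500               # energy level that starts a "speech" region
STOP_RMS = 200                # lower level that ends it (two different values)
HOLD_WINDOWS = 6              # ~3 s of quiet before we call it "stopped"

def speech_regions(samples):
    regions, active, quiet, start = [], False, 0, 0
    for i in range(0, len(samples) - WINDOW, WINDOW):
        rms = np.sqrt(np.mean(samples[i:i + WINDOW].astype(np.float64) ** 2))
        if not active and rms >= START_RMS:
            active, start, quiet = True, i, 0
        elif active:
            quiet = quiet + 1 if rms < STOP_RMS else 0
            if quiet >= HOLD_WINDOWS:
                regions.append((start, i))
                active = False
    if active:
        regions.append((start, len(samples)))
    return regions   # list of (start_sample, end_sample) pairs worth keeping
```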
As for number of participants, unless you find a library that does this, I don't see an easy solution. I've used speech recognition engines before and also done a reasonable amount of audio processing and I haven't seen any 'easy' ways to do this. If you were to look, search out universities doing speech analysis research. You may find some prototypes you can modify to give your software some clues.
I think you'll have difficulty doing this entirely in Python. You're talking about doing frequency/amplitude analysis of MP3 files. You would have to open up the file and look for a volume threshold, then cut out the portions that go below that threshold. Figuring out how many speakers are present would require very advanced signal processing.
A cursory Google search turned up nothing for me. You might have better luck looking for an off-the-shelf solution.
As an aside- there may be legal complications to having a recorder running 24/7 without letting people know.