Way to encode 8-bit-per-sample audio into 2 bits per sample - wav

If ADPCM can compress 16-bit-per-sample audio to 4 bits per sample, is there a way to compress 8-bit-per-sample audio to 2 bits per sample?

The G.726 standard supersedes G.721 and G.723, merging them into a single standard, and adds a 2-bit ADPCM mode to the 3-, 4-, and 5-bit modes from the older standards. These are all very simple and fast to encode/decode. There appears to be no file format for the 2-bit version, but there is a widely re-used open-source Sun library to encode/decode the formats; SpanDSP is just one library that includes the Sun code. These take 16-bit samples as input, but it is trivial to convert 8-bit to 16-bit.
If you want to hear the 2-bit mode you may have to write your own converter that calls into the library.
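A minimal sketch of such a converter, assuming SpanDSP's public g726.h interface (g726_init()/g726_encode()/g726_free() and the G726_ENCODING_LINEAR and G726_PACKING_LEFT constants; check the header in your SpanDSP version, as names can differ between releases):

```cpp
#include <cstdint>
#include <vector>
#include <spandsp.h>

// Expand unsigned 8-bit PCM (as stored in 8-bit WAV) to signed 16-bit.
static int16_t u8_to_s16(uint8_t u) { return (int16_t)((u - 128) << 8); }

// Encode 8-bit PCM to 2-bit G.726: 16000 bit/s at 8 kHz = 2 bits per sample.
std::vector<uint8_t> encode_2bit_g726(const std::vector<uint8_t> &pcm8)
{
    std::vector<int16_t> pcm16(pcm8.size());
    for (size_t i = 0; i < pcm8.size(); i++)
        pcm16[i] = u8_to_s16(pcm8[i]);

    g726_state_t *st = g726_init(NULL, 16000, G726_ENCODING_LINEAR,
                                 G726_PACKING_LEFT);

    std::vector<uint8_t> out(pcm8.size() / 4 + 1);  // 4 samples per output byte
    int bytes = g726_encode(st, out.data(), pcm16.data(), (int)pcm16.size());
    out.resize(bytes);
    g726_free(st);
    return out;
}
```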
There are also ADPCM specifications from long ago, like "ADPCM Creative Technology", that support low bit rates and small sample sizes.
See also the SoX documentation about various old compression schemes.
The number of bits per sample is not strictly related to the dynamic range or number of bits in the output. For example, the https://en.wikipedia.org/wiki/Direct_Stream_Digital format used in Super Audio CD achieves high quality with only 1 bit per sample, but at a 2.8224 MHz sample rate.
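As an illustration of that principle (a toy, not the actual SACD encoder, which uses much higher-order modulators), a first-order sigma-delta loop shows how a 1-bit stream can track a waveform once it is heavily oversampled:

```cpp
#include <vector>

// Toy first-order sigma-delta modulator. Feed it an oversampled signal in
// [-1, 1]; the local average of the +/-1 output bits tracks the input.
std::vector<int> sigma_delta_1bit(const std::vector<double> &x)
{
    std::vector<int> bits(x.size());
    double integrator = 0.0;
    double feedback = 0.0;             // previous output mapped to +/-1
    for (size_t i = 0; i < x.size(); i++) {
        integrator += x[i] - feedback; // accumulate the quantisation error
        bits[i] = (integrator >= 0.0) ? 1 : 0;
        feedback = bits[i] ? 1.0 : -1.0;
    }
    return bits;
}
```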

As far as I know, the ADPCM compression standard needs 4 bits per sample even if the original uncompressed audio has 8-bit samples. Hence there is NO way to encode audio at 2 bits per sample with ADPCM.
EDIT: I am specifically referring to G.726, which is one of the widely supported speech compression standards in WAV. Personally, I am not aware of a freely available G.727 codec. FFmpeg is one of the libraries with extensive support for audio codecs. You can see the audio codec list they support at https://www.ffmpeg.org/general.html#Audio-Codecs. In the list, I do see support for other ADPCM formats, which may be worth exploring.


What's the difference between G.729 codec variations

What is the difference between
G.729
G.729A
G.729AB
And if I have codecs set up for G.729 in Asterisk, does this mean that G.729A and G.729AB will work?
Thanks
Asterisk supports G.729A only, so you have no other options.
G.729B is not compatible with G.729A; G.729AB means both variants are supported by the switch.
G.729A is a reduced-complexity extension of G.729: it needs less CPU at the cost of slightly lower speech quality, and it is compatible with G.729.
Some of its features are:
Sampling frequency 8 kHz/16-bit (80 samples for 10 ms frames)
Fixed bit rate (8 kbit/s, 10 ms frames)
Fixed frame size (10 bytes for 10 ms frame)
Algorithmic delay is 15 ms per frame, with 5 ms look-ahead delay
G.729b:
Not compatible with G.729 or G.729a
Has a silence compression method that enables a voice activity detection (VAD) module, which detects whether the signal contains speech.
It includes a discontinuous transmission (DTX) module, which decides on updating the background-noise parameters for non-speech (noise) frames.
Uses 2-byte Silence Insertion Descriptor (SID) frames, transmitted to initiate comfort noise generation (CNG). If transmission stops and the link goes quiet because there is no speech, the receiving side might assume that the link has been cut. By inserting comfort noise, analog hiss is simulated digitally during silence to assure the receiver that the link is active and operational.
Read more at its Wikipedia article.

Video Processor MFT and deinterlacing

MSDN Video Processor MFT mentions that the MFT can be used to deinterlace interlaced video.
I set the output media type to the same as the input, with MF_MT_INTERLACE_MODE set to progressive on the output media type.
But the output samples are still interlaced.
I can't test the Video Processor MFT because it needs Windows 8/10, but I will say two things:
The documentation says it is GPU-accelerated, but does not say whether it falls back to software processing. So, if it is only GPU-accelerated and your GPU does not support deinterlacing, that could explain why your frames are still interlaced. You can check DXVAHD_PROCESSOR_CAPS.
For correct deinterlacing, each sample needs some of these attributes set: MFSampleExtension_Interlaced, MFSampleExtension_BottomFieldFirst, MFSampleExtension_RepeatFirstField, and so on (see Sample Attributes). So check whether the parser/decoder sets those values correctly. If it does not, the Video Processor MFT will not be able to deinterlace.
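A quick way to verify the second point is to read those attributes off the decoder's output samples. This uses the standard IMFSample/IMFAttributes calls (an absent attribute comes back as MF_E_ATTRIBUTENOTFOUND, which here is treated as progressive):

```cpp
#include <mfapi.h>
#include <mfidl.h>

// Returns true if the upstream parser/decoder flagged the sample as interlaced.
bool SampleLooksInterlaced(IMFSample *sample)
{
    UINT32 interlaced = 0;
    if (SUCCEEDED(sample->GetUINT32(MFSampleExtension_Interlaced, &interlaced)))
        return interlaced != 0;
    return false;  // attribute not set: sample claims to be progressive
}
```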

Output words in speaker with arduino

I want to generate voice on an Arduino using code. I can generate simple tones and music, but I need to output words like "right", "left", etc. through a speaker attached to the Arduino. I found some methods using WAV files, but they require an external memory card reader. Is there a method that uses only the Arduino and a speaker?
Typical recorded sound (such as WAV files) requires much larger amounts of memory than is available on-chip on an Arduino.
It is possible to use an encoding and data rate that minimises the memory requirement, at the expense of audio quality. For example, generally acceptable speech-band audio can be obtained using non-linear (companded) 8-bit PCM at a 3 kHz sample rate. If that is then differentially coded into 4-bit samples (so that each sample is not the PCM code, but the difference in level from the previous sample), you can get about 1 second of audio in 1.5 KB. You would have to do some off-line processing of the original audio to encode it in this manner before storing the resulting data in the Arduino flash memory. You will also have to implement the necessary decoding and linearisation, as sketched below.
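A minimal decode sketch for that scheme (the names audio_deltas and expand_lut and the packing order are hypothetical, and the output stage - PWM or a DAC driven from a timer interrupt - is omitted):

```cpp
#include <Arduino.h>
#include <avr/pgmspace.h>

// Hypothetical stored data: packed 4-bit deltas, plus a 256-entry table that
// expands a companded 8-bit code back to a linear 16-bit level.
extern const uint8_t audio_deltas[] PROGMEM;
extern const int16_t expand_lut[256] PROGMEM;

// Reconstruct sample 'idx'; 'code' carries the running companded 8-bit code.
int16_t next_sample(uint16_t idx, uint8_t &code)
{
    uint8_t pair  = pgm_read_byte(&audio_deltas[idx / 2]);
    int8_t  delta = (idx & 1) ? (pair & 0x0F) : (pair >> 4);
    if (delta > 7) delta -= 16;       // sign-extend the 4-bit delta
    code += delta;                    // undo the differential coding
    return (int16_t)pgm_read_word(&expand_lut[code]);  // linearise
}
```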
Another possibility is to use synthesised rather than recorded speech. This technique uses recorded phonemes (components of speech) rather than whole words, and you then build words from these components. The results are generally somewhat robotic and unnatural (modern speech synthesis can in fact be very convincing, but not with the resources available on an Arduino - think 1980s Speak & Spell).
Although it can be rather efficient, phoneme speech synthesis requires different phoneme sets for different natural languages. For a limited vocabulary, it may be possible to encode only the subset of phonemes actually used.
You can hear a recording of the kind of speech that can be generated by a simple phoneme speech generator at http://nsd.dyndns.org/speech/. That page discusses a 1980s GI SP0256 speech chip driven by an Arduino, rather than speech generated by the Arduino itself, but it gives you an idea of what might be achieved - the SP0256 managed with just 2 KB of ROM, so the Arduino could probably implement something similar directly. The difficulty is perhaps in obtaining the necessary phoneme set. You could possibly record your own and encode them as above. Each word or phrase would then simply be a list of phonemes and delays to be output, as in the sketch below.
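As a sketch of that last idea (the table contents, phoneme numbers, and play_phoneme() are all hypothetical), each word reduces to a tiny table walked by a replay loop:

```cpp
#include <Arduino.h>

void play_phoneme(uint8_t p);  // hypothetical: replays one stored phoneme

// A word as a list of (phoneme index, pause in ms) pairs, 255-terminated.
struct PhonemeStep { uint8_t phoneme; uint8_t pause_ms; };

const PhonemeStep WORD_LEFT[] = { {20, 0}, {7, 0}, {13, 0}, {17, 40}, {255, 0} };

void say(const PhonemeStep *word)
{
    for (; word->phoneme != 255; word++) {
        play_phoneme(word->phoneme);  // output one speech component
        delay(word->pause_ms);        // gap before the next phoneme
    }
}
```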
The eSpeak project might be a good place to start - it is probably too large for Arduino, and the whole text to speech translation unnecessary, but it converts text to phonemes, so you could do that part off-line (on a PC), then load the phonemes and the replay code to the Arduino. It may still be too large of course.

Which codec is best, and what should its parameter values be?

I'm a beginner in the field of audio codecs and am finding it hard to understand how sampling rate, bit rate, and other parameters affect encoding/decoding (the audio format), audio quality, and file size.
I read that constant bit rate is better than variable bit rate, but how do I know what bit rate is right to encode the file as small as possible without compromising quality? I'm specifically focusing on audio codecs for the present.
I have heard about Opus, SILK, G.722, and Speex, but don't know which one I should use to get better quality and a smaller file size. Also, what parameters should I set for these codecs so they work effectively for me?
Can anyone guide on this?
Thanks in advance
If you think of the original analog music as a sound wave, then converting it to digital means approximating that wave as digital bits. The sampling rate is how many points on that wave you take per unit time, so the higher the sampling rate, the closer you are to the original sound. A lower sampling rate means higher compression but lower audio quality.
Similarly, the bit rate is effectively 'how much' information you're encoding at each point, so again, a lower bit rate means higher compression but lower audio quality.
Compression algorithms generally use psychoacoustics to try to determine what information can be lost with the least audible difference. In some sections of a track this may be more or less than in others, so using a variable bit rate lets you achieve higher compression without a 'big' audible drop in quality.
It's well explained here: Link
I don't know the details of those codecs, but generally what you should use and what parameters you should pass depend on what you're trying to achieve and for what purpose. For portable use, where audio quality might not be paramount, you might pass lower values to achieve smaller file sizes; for audiophile speakers you probably want to pass the maximum.
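For instance, with Opus the knobs above map directly onto encoder controls. A minimal sketch using the standard libopus API (the 16 kHz / 16 kbit/s figures are just plausible speech settings picked for illustration, not a recommendation):

```cpp
#include <opus/opus.h>

// Create an Opus encoder tuned for speech at a modest bitrate.
OpusEncoder *make_speech_encoder()
{
    int err = 0;
    // 16 kHz mono input; VoIP mode biases the codec toward speech clarity.
    OpusEncoder *enc = opus_encoder_create(16000, 1, OPUS_APPLICATION_VOIP, &err);
    if (err != OPUS_OK)
        return nullptr;

    opus_encoder_ctl(enc, OPUS_SET_BITRATE(16000));  // ~16 kbit/s target
    opus_encoder_ctl(enc, OPUS_SET_VBR(1));          // let the rate vary per frame
    return enc;
}
```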

Using multiple QR codes to encode a binary image

I'm increasingly looking at using QR codes to transmit binary information, such as images, since it seems whenever I demo my app, it's happening in situations where the WiFi or 3G/4G just doesn't work.
I'm wondering if it's possible to split a binary file up into multiple parts to be encoded by a series of QR codes?
Would this be as simple as splitting up a text file, or would some sort of complex data coherency check be required?
Yes, you could convert any arbitrary file into a series of QR codes, something like Books2Barcodes.
The standard way of encoding data too big to fit in one QR code is the "Structured Append" feature of the QR code standard.
Alas, I hear that most QR encoders and decoders - such as ZXing - do not yet support generating or reading a series of barcodes that use the structured append feature.
QR codes already have pretty strong internal error correction. If you are lucky, splitting up your file with the "split" utility into pieces small enough to fit into an easily readable QR code, then later scanning them in (hopefully) the right order and using "cat" to re-assemble them, might be adequate for your application.
You surely can store a lot of data in a QR code; it can store up to 2953 bytes of data, which is nearly twice the size of a standard TCP/IP packet on an Ethernet network, so it's pretty powerful.
You will need to define a header for each QR code that describes its position in the stream required to rebuild the data. It'll be something like "filename, chunk 12 of 96", though encoded in something better than plain text. (Eight bytes for the filename and one byte each for the chunk number and the total number of chunks - a maximum of 256 QR codes - makes a simple ten-byte header, still leaving 2943 bytes per code.)
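A sketch of that split step (the ten-byte layout is this answer's suggestion, not any standard; with a plain one-byte count the practical maximum is 255 chunks unless you adopt a zero-means-256 convention):

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <vector>

#pragma pack(push, 1)
struct ChunkHeader {
    char    name[8];   // file name, truncated or zero-padded to 8 bytes
    uint8_t index;     // this chunk's number (0-based)
    uint8_t total;     // total number of chunks
};
#pragma pack(pop)

// Split 'data' into payloads of at most 2943 bytes, each prefixed by a header,
// so every chunk fits the 2953-byte QR capacity mentioned above.
std::vector<std::vector<uint8_t>> make_chunks(const char *name,
                                              const std::vector<uint8_t> &data)
{
    const size_t payload = 2943;
    const size_t total = (data.size() + payload - 1) / payload;
    std::vector<std::vector<uint8_t>> chunks;
    for (size_t i = 0; i < total; i++) {
        ChunkHeader h{};
        std::strncpy(h.name, name, sizeof h.name);
        h.index = (uint8_t)i;
        h.total = (uint8_t)total;
        const size_t off = i * payload;
        const size_t n = std::min(payload, data.size() - off);
        std::vector<uint8_t> c(sizeof h + n);
        std::memcpy(c.data(), &h, sizeof h);
        std::memcpy(c.data() + sizeof h, data.data() + off, n);
        chunks.push_back(std::move(c));
    }
    return chunks;
}
```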
You will probably also want to use some form of forward error correction such as erasure codes to encode sufficient redundant data to allow for mis-reads of either individual QR codes or entire missing QR codes to be transparently handled well. While you may be able to take an existing library, such as for Reed-Solomon codes to provide the ability to fix mis-reads within a QR code, handling missing QR codes entirely may take significantly more effort on your part.
Using erasure codes will of course reduce the amount of data you can transmit -- instead of all 753,408 bytes (256 * 2943), you will only have 512k or 384k or even less available to your final images -- depending upon what code rate you choose.
I think it is theoretically possible, and as simple as splitting up a text file. However, you probably need to design some kind of header to know that the data is multi-part and to make sure the different parts can be merged together correctly regardless of the order of scanning.
I am assuming that the QR reader library returns raw binary data, and that it will be your job to convert it into whatever form you want.
If you want automated creation and transmission, see
gre/qrloop: Encode a big binary blob to a loop of QR codes
maxg0/displaysocket.js: DisplaySocket.js - a JavaScript library for sending data from one device to another via QR codes using only a display and a camera
Note - I haven't used either.
See also: How can I publish data from a private network without adding a bidirectional link to another network - Security StackExchange
