Does the ALSA utility arecord interpolate captured samples?

I am working on some code to record audio using the ALSA API. I configure, then capture, then write the captured samples out to a wav file. I configure ALSA for S32_LE, signed 32 bit samples. The data I am capturing has the 1st byte of every 4-byte sample equal to 0x00, so it appears that my hardware is effectively only capturing 24 bit samples. This seems reasonable to me for a number of reasons.
However, when I play back the wav file, the sound is distorted. The audio is there, I can tell the samples are mostly correct, but the capture is very distorted, so something is wrong.
When I use arecord using the same configuration I used for my code, the recording is perfect, no distortion. So I know the hardware is good (and my code is bad).
This is what I do not understand. When I compare the wav files, that is, the wav file my code generated and the wav file arecord generated, the wav file headers are exactly the same (except for the chunk length values of course). So my wav file generation appears correct.
However, the sample data in the arecord capture does NOT have a 0x00 in each sample word like my code captures. It appears that arecord is indeed capturing 32 bit samples from my sound card. But when my code uses the same configuration for 32 bit samples, every sample has the 1st byte of 0x00.
Am I missing some ALSA configuration options?
My code uses snd_pcm_hw_params_get_sbits() to retrieve the effective sample word size and it indeed returns 32 for the number of sample bits. I have studied the source code for the ALSA utilities aplay/arecord and I cannot find any clues as to whether arecord is changing the captured samples in any way.
In summary, why does arecord capture 32 bit samples from hardware that will only give me 24 bit samples?
Thanks,
-Andres

Related

Trying to interface with an Aanderaa RCM9

I have an older Aanderaa RCM9 (https://epic.awi.de/id/eprint/45145/1/RCM9.pdf) that is missing its Data Storage Unit and its reader. They don't produce these anymore, nor do they service the model. It would be a shame to toss an otherwise nice piece of equipment, so I thought I would try to get a serial feed from the terminal or DSU output and log it on an Arduino with an SD card. I have tried connecting with a TTL-RS232 converter, and there seems to be a consistent Tx from the instrument. It comes in batches, but reads out in CoolTerm as "............". I've tried different terminal configurations and connections, but that's the best I get. Here's how it looks inside: https://imgur.com/a/xxCPUlQ
Any thoughts??
I am afraid that the output is the old Aanderaa PDC4 serial format, where long pulses (81 ms) represent zeros and short pulses (27 ms) represent ones in a 10-bit binary word framed in a 4-second window.
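If the output really is pulse-width coded like this, a UART will never frame it correctly, which would explain the dots in the terminal. A rough sketch of the decoding step, assuming you can measure the pulse widths yourself (for example with a timer interrupt on the Arduino): the 54 ms threshold is simply the midpoint of the two nominal widths, and the most-significant-bit-first order is an assumption that should be checked against the PDC4 documentation.

```python
# Nominal PDC4 pulse widths quoted above, in milliseconds.
LONG_MS = 81.0   # long pulse  -> bit 0
SHORT_MS = 27.0  # short pulse -> bit 1
THRESHOLD_MS = (LONG_MS + SHORT_MS) / 2  # 54 ms midpoint (assumed decision point)
BITS_PER_WORD = 10


def decode_pdc4_word(pulse_widths_ms):
    """Turn ten measured pulse widths into one integer.

    Bit order is ASSUMED to be most significant bit first.
    """
    if len(pulse_widths_ms) != BITS_PER_WORD:
        raise ValueError("expected exactly 10 pulses per word")
    value = 0
    for width in pulse_widths_ms:
        bit = 0 if width > THRESHOLD_MS else 1
        value = (value << 1) | bit
    return value


# Hypothetical measurement with a little timing jitter.
example = [26.5, 80.2, 81.9, 27.3, 27.0, 79.8, 26.1, 82.4, 27.7, 80.0]
print(decode_pdc4_word(example))  # bits 1001101010 -> 618
```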

What is the purpose of the “wasted bits-per-sample” in the FLAC audio format?

I am looking into implementing a FLAC decoder. One part of the specification of the SUBFRAME_HEADER is unclear to me.
<1+k> 'Wasted bits-per-sample flag':
0 : no wasted bits-per-sample in source subblock, k=0
1 : k wasted bits-per-sample in source subblock, k-1 follows, unary coded; e.g. k=3 => 001 follows, k=7 => 0000001 follows.
(Here the “<1+k>” designates the size of the field/block.)
This is the only place in the specification that the value k is mentioned. What is its purpose, and how should it be interpreted? I don't find the term “wasted bits-per-sample” to be very meaningful. The hyphenation implies to me that it is not referring to “wasted bits”, but rather referring to “wasted values of bits-per-sample”; however, I don't understand why such a quantity is useful information.
Certain file formats, like AIFF, store 14-bit audio as left-justified 16-bit audio padded with zeros. The FLAC format compresses this by setting the sample size in bits to 16 but setting wasted bits to 2. When a subframe has bits per sample set to 16 and wasted bits set to 2, the rest of the subframe is to be decoded as 14-bit and has to be padded back to 16 bits.
In other words, the flag says: this audio stream claims a bitdepth of 16, but the least significant k bits are 0 everywhere, i.e. they are not used (wasted). So this subframe is coded as (16-k) bits, and you have to add k bits of padding to get the original back.
Besides its use to efficiently compress audio that has samples with a bitdepth other than 8, 16 or 24 bits padded to whole byte samplesizes, some tools use it to do some form of lossy compression, by selectively decreasing the bitdepth of audio. This has also been done with some DVD-Audio. See this forum thread for more information
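The strip-and-pad idea can be sketched in a few lines. This is only an illustration of the arithmetic, not the actual FLAC bitstream handling; the function names are made up for the example.

```python
def count_wasted_bits(samples):
    """Return k, the number of low-order bits that are zero in every sample."""
    combined = 0
    for s in samples:
        combined |= s  # a bit is set here if it is set in any sample
    if combined == 0:
        return 0  # all-zero block: treated as "no wasted bits" in this sketch
    k = 0
    while (combined >> k) & 1 == 0:
        k += 1
    return k


def strip_wasted_bits(samples, k):
    """Encoder side: drop the k zero bits before coding the subframe."""
    return [s >> k for s in samples]


def restore_wasted_bits(samples, k):
    """Decoder side: pad k zero bits back on to recover the original."""
    return [s << k for s in samples]


# 14-bit values stored left-justified in 16-bit samples.
original = [4, -8, 4000, -32768]
k = count_wasted_bits(original)
coded = strip_wasted_bits(original, k)
print(k, coded)                                   # 2 [1, -2, 1000, -8192]
print(restore_wasted_bits(coded, k) == original)  # True
```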

What does sound data look like?

I read here how sounds are represented with numbers in a computer.
And I figured out that the usual representation is that we get 44,100 numbers in [-32767, 32767] per second.
Then, as I imagine it, there has to be one big one-column matrix, right?
I'm an R user, so speaking in R, 3 seconds of sound data would be,
s <- 3
sound <- matrix(0, ncol = 1, nrow = 44100 * s)
nrow(sound)
#> [1] 132300
a one-column matrix with 132,300 rows.
Is this really the case?
I want an analogous picture in my head. Say, in the case of a 256 * 256 picture,
if we split that picture into RGB, we get 3 matrices, each 256 * 256.
And in the case of sound, we get one long, long column? Thinking about it again, it's not even a matrix after all. It's a column.
Am I right? I can't find any similar dataset searching the Internet.
Any advice is welcome. Thanks.
The raw format that is created early in that linked question could look a lot like a single dimension array. And probably the signal that is sent to the speaker to make the sound could be represented similarly.
But you're unlikely to find a file on your computer that looks like that for several reasons:
Sound can be stored at different bit depths, that is, how many bits are used for each 'number'. CD Audio tracks have a 16 bit depth, but you could have 8 or 32 bits etc. In a straight stream of these numbers you somehow need to know how far to read to reach the next number, so that information needs to be saved somewhere.
Sample rate can vary. If you've got a sequence of numbers representing an audio signal, then you need to know how long each number lasts for.
Most sounds are more complex. Instead of a single source, you have stereo, or 5 channels, or whatever, so the system needs to be able to store and decode multiple pieces of information for the sounds you want to hear at a particular time.
Much of sound is repetitive, and so can often benefit from compression.
So most sounds are stored in a compressed format that includes wrapper information about how to decode it. The wrapper information includes how to decode the different audio channels, what sort of compression was used etc.
The closest you're likely to find are a .wav file (Windows) or .aiff (Mac). But even these include some metadata (sample rate and bit depth to start).
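To make the "column of numbers plus a little metadata" picture concrete, here is a small sketch using Python's standard wave module. It builds 3 seconds of a 440 Hz tone (the tone is just an arbitrary example signal), writes it as a mono 16-bit WAV into memory, and reads back both the header fields and the samples.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 44100  # numbers per second
SECONDS = 3

# The "long column": one signed 16-bit number per sample.
samples = [
    int(32767 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE))
    for n in range(SAMPLE_RATE * SECONDS)
]

# Write the column into an in-memory WAV file.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav:
    wav.setnchannels(1)          # mono: a single column
    wav.setsampwidth(2)          # 2 bytes = 16 bit depth
    wav.setframerate(SAMPLE_RATE)
    wav.writeframes(struct.pack("<%dh" % len(samples), *samples))

# Read it back: the header carries the metadata, the rest is the numbers.
buffer.seek(0)
with wave.open(buffer, "rb") as wav:
    info = (wav.getnchannels(), wav.getsampwidth(),
            wav.getframerate(), wav.getnframes())
    raw = wav.readframes(wav.getnframes())

decoded = list(struct.unpack("<%dh" % info[3], raw))
print(info)          # (1, 2, 44100, 132300)
print(len(decoded))  # 132300
```

With stereo, each frame would hold two numbers (left and right), so the data would look more like a two-column matrix.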

How can this CRC (Cyclic Redundancy Check) calculation be solved?

I want to send data to a TCP 105 circuit.
The following hex command is OK to send data 123:
7F30001103 313233 45D4
Here, 313233 is hex representation of 123 and 45D4 is the CRC value.
I'm having trouble obtaining this 45D4 when calculating the CRC. After searching the web for a long time, I keep getting other CRC values from different standards, but those CRC values are not accepted by my LED display circuit.
Please help me understand how to get 45D4 from 7F30001103313233.
Thanks in advance.
The command matches an algorithm called CRC-16/CMS.
$ reveng -w 16 -s 7f30001103313233d445
width=16 poly=0x8005 init=0xffff refin=false refout=false xorout=0x0000 check=0xaee7 name="CRC-16/CMS"
This is only probably the correct algorithm, since you've given just one codeword (and since I've assumed that the CRC has been byte-swapped).
To generate code that computes this CRC, see Mark Adler's crcany tool, for instance.
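For a quick check without extra tools, the parameters reported by reveng can be turned into a plain bit-by-bit routine. This is a minimal sketch rather than generated or optimised code; the function name is made up, and appending the CRC low byte first follows the byte-swap assumption above.

```python
def crc16_cms(data: bytes) -> int:
    """CRC-16/CMS: poly 0x8005, init 0xFFFF, no reflection, no final XOR."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte << 8              # feed the next byte in at the top
        for _ in range(8):
            if crc & 0x8000:          # top bit set: shift and apply the polynomial
                crc = ((crc << 1) ^ 0x8005) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


message = bytes.fromhex("7F30001103313233")
crc = crc16_cms(message)
frame = message + crc.to_bytes(2, "little")  # low byte first (byte-swapped)

print(hex(crc))             # 0xd445
print(frame.hex().upper())  # 7F3000110331323345D4
```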

Storing a BMP image in a QR code

I'm trying to create (or, if I've somehow missed it in my research, find) an algorithm to encode/decode a bmp image into/from a QR code format. I've been using a guide (Thonky) to try to understand the basics of QR codes and I'm still not sure how to go about this problem, specifically:
Should I encode the data as binary or would numeric be more reasonable (assuming each pixel will have a max. value of 255)?
I've searched for information on the structured append capabilities of QR codes but haven't found much detail beyond the fact that it's supported by QR codes -- how could I implement/utilize this functionality?
And, of course, if there are any tips/suggestions to better store an image as binary data, I'm very open to suggestions!
Thanks for your time,
Sean
I'm not sure you'll be able to achieve that, as the amount of information a QR Code can hold is quite limited.
First of all, you'll probably want to store your image as raw bytes, as the other formats (numeric and alphanumeric) are designed to hold text/numbers and would provide less space to store your image. Let's assume you choose the biggest possible QR Code (version 40), with the smallest level of error correction, which can hold up to 2953 bytes of binary information (see here).
First option, as you suggest, you store the image as a bitmap. This format allows no compression at all and requires (in the case of an RGB image without alpha channel) 3 bytes per pixel. If we take into account the file header size (14 to 54 bytes), and ignore the padding (each row of image data must be padded to a length being a multiple of 4), that allows you to store roughly 2900/3 = 966 pixels. If we consider a square image, this represents a 31x31 bitmap, which is small even for a thumbnail image (for example, my avatar at the end of this post is 32x32 pixels).
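The estimate can be checked with a few lines of arithmetic. This sketch assumes the common 54-byte header (14-byte file header plus a 40-byte info header) and 24-bit RGB pixels, and it also shows what happens when the row padding is not ignored.

```python
QR_V40_L_BYTES = 2953   # binary capacity of a version 40, level L QR Code
BMP_HEADER_BYTES = 54   # assumed: 14-byte file header + 40-byte info header
BYTES_PER_PIXEL = 3     # 24-bit RGB, no alpha channel


def bmp_size(width, height, pad_rows=True):
    """File size in bytes of an uncompressed 24-bit BMP."""
    row = width * BYTES_PER_PIXEL
    if pad_rows:
        row = (row + 3) // 4 * 4  # each row is padded to a multiple of 4 bytes
    return BMP_HEADER_BYTES + row * height


def largest_square(capacity, pad_rows):
    """Largest side length n such that an n x n BMP fits in `capacity` bytes."""
    side = 0
    while bmp_size(side + 1, side + 1, pad_rows) <= capacity:
        side += 1
    return side


print(largest_square(QR_V40_L_BYTES, pad_rows=False))  # 31
print(largest_square(QR_V40_L_BYTES, pad_rows=True))   # 30
```

Under these assumptions the padding costs one pixel per side: a 31x31 file grows to 3030 bytes once each 93-byte row is padded to 96 bytes, so only 30x30 actually fits.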
Second option, you use JPEG to encode your image. This format has the advantage of using a compression algorithm that can reduce the file size. This time there is no exact formula to get the size of an image fitting in 2.9kB, but I tried using a few square images and downsizing them until they fit in this size, keeping a good (93) quality factor: this gives an average of about 60x60 pixel images. (On such small images, it's normal not to see an incredible compression factor between jpeg and bmp, as the file header in a jpeg file is far larger than in a bmp file: about 500 bytes). This is better than bitmap, but remains quite small.
Finally, even if you succeed in encoding your image in this QR Code, you will encounter another problem: a QR Code this big is very, very hard to scan successfully. As a matter of fact, this QR Code will have a size of 177x177 modules (a "module" being a small white or black square). Assuming you scan it using a smartphone providing so-called "HD" frames (1280x720 pixels), each module will have a maximum size on the frame of about 4 pixels (720/177 ≈ 4.07). If you take into account the camera noise, the aliasing and the blur due to the fact that the user is never perfectly still when scanning, the quality of the input frames will make it very hard for any QR Code decoding algorithm to successfully read the QR Code (don't forget we set its error correction level to low at the beginning of this!).
Even though it's not very good news, I hope this helps you!
There is indeed a way to encode information on several (up to 16) QR Codes, using a special header in your QR Codes called "Structured append". The best source of information you can use is the norm about QR Codes (ISO 18004:2006); it's possible (but not necessarily easy) to find it for free on the web.
The relevant part (section 9) of this norm says:
"Up to 16 QR Code symbols may be appended in a structured format. If a symbol is part of a Structured Append message, it is indicated by a header block in the first three symbol character positions.
The Structured Append Mode Indicator 0011 is placed in the four most significant bit positions in the first symbol character.
This is immediately followed by two Structured Append codewords, spread over the four least significant bits of the first symbol character, the second symbol character and the four most significant bits of the third symbol character. The first codeword is the symbol sequence indicator. The second codeword is the parity data and is identical in all symbols in the message, enabling it to be verified that all symbols read form part of the same Structured Append message. This header is immediately followed by the data codewords for the symbol commencing with the first Mode Indicator."
Nevertheless, I'm not sure most QR Code scanners can handle this, as it's quite an advanced feature.
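As a rough sketch of that 20-bit header: the quoted passage only gives the overall layout, so two details here are assumptions based on my reading of the standard and should be verified against it. The sequence indicator is taken to hold the symbol's 0-based position in its upper 4 bits and the total number of symbols minus one in its lower 4 bits, and the parity byte is taken to be the XOR of all bytes of the complete original message. The function name is made up for the example.

```python
from functools import reduce


def structured_append_header(position, total, full_message: bytes) -> str:
    """Return the 20 header bits of one symbol as a string of '0'/'1'.

    position: 0-based index of this symbol (ASSUMED encoding)
    total:    number of symbols in the message, 1..16
    """
    if not 1 <= total <= 16:
        raise ValueError("a Structured Append message has 1 to 16 symbols")
    if not 0 <= position < total:
        raise ValueError("position must be in range(total)")
    mode_indicator = 0b0011                       # Structured Append
    sequence = (position << 4) | (total - 1)      # ASSUMED bit layout
    parity = reduce(lambda a, b: a ^ b, full_message, 0)  # ASSUMED: XOR of all bytes
    return f"{mode_indicator:04b}{sequence:08b}{parity:08b}"


# First of three symbols carrying the message b"ABC" between them.
header = structured_append_header(0, 3, b"ABC")
print(header)       # 00110000001001000000
print(len(header))  # 20
```

Because the parity is computed over the whole message, it is the same in every symbol, which is what lets a reader confirm that the symbols belong together.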
You can define a fixed image size and strip the jpg header down to just the vital information, which can save up to 480 bytes of a normal ~500 byte header.
I used this method to store people's photos for a small club's ID cards; images of about 64x64 pixels are enough.
