How to determine the sample count in an ADPCM WAV file?

I see no such field in the canonical WAV structure, but maybe it's possible to use existing fields for that?
I know that we can easily calculate the sample count for a PCM stream (raw_sound_data_size / (bits_per_sample / 8)), but what should I do with ADPCM?

Generally, Subchunk2Size is the size of the data in bytes, and bitsPerSample is the number of bits in one sample. So the number of samples (per channel) should be:
samples = Subchunk2Size / channels / (bitsPerSample / 8)
That's true for uncompressed data.
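As a quick sketch of that formula for the uncompressed case (the function name is hypothetical). Rounding bitsPerSample up to whole bytes is an assumption that matches how PCM WAV stores, for example, 12-bit samples in 2-byte containers:

```cpp
#include <cstdint>

// samples = Subchunk2Size / channels / (bitsPerSample / 8), uncompressed PCM only.
// Assumption: sample sizes that are not a multiple of 8 bits occupy whole bytes
// (e.g. 12-bit samples take 2 bytes).
std::uint32_t pcmSampleCount(std::uint32_t subchunk2Size,
                             std::uint16_t channels,
                             std::uint16_t bitsPerSample) {
    const std::uint32_t bytesPerSample = (bitsPerSample + 7u) / 8u;
    return subchunk2Size / (channels * bytesPerSample);
}
```

For example, a 176400-byte data chunk of 16-bit stereo gives 44100 samples per channel.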
ADPCM data is saved in "blocks". Each block has three parts: the header, the data, and padding. Together the three are <nBlockAlign> bytes.
Header
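// Layout description, not compilable C: nChannels comes from the fmt chunk,
// and each "int" field is a 16-bit little-endian value, so the header is 7 * nChannels bytes.
typedef struct adpcmblockheader_tag {
BYTE bPredictor[nChannels]; // coefficient-set index per channel
int iDelta[nChannels];      // initial step size per channel
int iSamp1[nChannels];      // stored sample per channel
int iSamp2[nChannels];      // stored sample per channel
} ADPCMBLOCKHEADER;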
Data
The data is a bit string parsed in groups of (wBitsPerSample * nChannels).
Padding
Bit padding is used to round the block off to an exact byte length.
More info about decoding ADPCM format can be found here
Unfortunately, it seems there is no way to find the exact sample count without enumerating all blocks.
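Enumerating the blocks can at least be done without decoding them. The sketch below assumes the Microsoft ADPCM layout shown above with 16-bit "int" fields (7 header bytes per channel), and that iSamp1/iSamp2 are two complete samples per channel, hence the "+ 2". It walks the data chunk in nBlockAlign steps and treats a shorter final block the same way; the function names are hypothetical.

```cpp
#include <cstdint>

// Samples per channel in one block of `blockBytes` bytes (assumed MS ADPCM layout).
std::uint64_t samplesInBlock(std::uint64_t blockBytes, unsigned channels,
                             unsigned bitsPerSample) {
    const std::uint64_t header = 7ull * channels;  // 1 + 2 + 2 + 2 bytes per channel
    if (blockBytes < header) return 0;             // truncated block without full header
    // Data bits grouped in (bitsPerSample * channels), plus the 2 header samples.
    return (blockBytes - header) * 8 / (bitsPerSample * channels) + 2;
}

// Walk the data chunk block by block; the last block may be shorter than nBlockAlign.
std::uint64_t adpcmSampleCount(std::uint64_t dataSize, std::uint64_t blockAlign,
                               unsigned channels, unsigned bitsPerSample) {
    if (blockAlign == 0) return 0;
    std::uint64_t total = 0;
    for (std::uint64_t offset = 0; offset < dataSize; offset += blockAlign) {
        const std::uint64_t remaining = dataSize - offset;
        const std::uint64_t bytes = remaining < blockAlign ? remaining : blockAlign;
        total += samplesInBlock(bytes, channels, bitsPerSample);
    }
    return total;
}
```

With nBlockAlign = 256, mono, 4 bits per sample, each full block holds (256 - 7) * 8 / 4 + 2 = 500 samples.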

Related

What is the purpose of the “wasted bits-per-sample” in the FLAC audio format?

I am looking into implementing a FLAC decoder. One part of the specification of the SUBFRAME_HEADER is unclear to me.
<1+k> 'Wasted bits-per-sample flag':
0 : no wasted bits-per-sample in source subblock, k=0
1 : k wasted bits-per-sample in source subblock, k-1 follows, unary coded; e.g. k=3 => 001 follows, k=7 => 0000001 follows.
(Here the “<1+k>” designates the size of the field/block.)
This is the only place in the specification that the value k is mentioned. What is its purpose, and how should it be interpreted? I don't find the term “wasted bits-per-sample” to be very meaningful. The hyphenation implies to me that it is not referring to “wasted bits”, but rather referring to “wasted values of bits-per-sample”; however, I don't understand why such a quantity is useful information.
Certain file formats, like AIFF, store 14-bit audio as left-justified 16-bit audio padded with zeros. FLAC compresses this by setting the sample size to 16 bits but setting the wasted bits to 2. When a subframe has bits per sample set to 16 but wasted bits set to 2, the rest of the subframe is decoded as 14-bit audio and has to be padded back to 16 bits.
In other words, the flag says: this audio stream claims a bit depth of 16, but the least significant k bits are 0 everywhere, i.e. they are unused (wasted). So this subframe is coded with (16 - k) bits, and you have to shift each decoded sample left by k bits (appending k zero bits) to get the original back.
Besides its use for efficiently compressing audio whose bit depth is not 8, 16 or 24 bits but which is padded to whole-byte sample sizes, some tools use it for a form of lossy compression by selectively decreasing the bit depth of the audio. This has also been done with some DVD-Audio. See this forum thread for more information.
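A minimal sketch of reading the <1+k> field and undoing the padding. BitReader, parseWastedBits and the other names are hypothetical helpers, not part of libFLAC; the sketch assumes the reader is already positioned at the wasted-bits flag and reads bits MSB-first, as FLAC does.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical MSB-first bit reader over a byte buffer.
class BitReader {
public:
    explicit BitReader(std::vector<std::uint8_t> data) : data_(std::move(data)) {}
    unsigned readBit() {
        const unsigned bit = (data_.at(pos_ / 8) >> (7 - pos_ % 8)) & 1u;
        ++pos_;
        return bit;
    }
private:
    std::vector<std::uint8_t> data_;
    std::size_t pos_ = 0;
};

// <1+k> field: flag bit, then k-1 in unary (k-1 zeros followed by a 1).
unsigned readWastedBits(BitReader& br) {
    if (br.readBit() == 0) return 0;    // no wasted bits, k = 0
    unsigned zeros = 0;
    while (br.readBit() == 0) ++zeros;  // counts the unary-coded k-1
    return zeros + 1;                   // k
}

unsigned parseWastedBits(const std::vector<std::uint8_t>& bytes) {
    BitReader br(bytes);
    return readWastedBits(br);
}

// After decoding the subframe at (bitsPerSample - k) bits, append k zero bits.
std::int64_t restoreWasted(std::int32_t decoded, unsigned k) {
    return static_cast<std::int64_t>(decoded) * (std::int64_t{1} << k);
}
```

The byte 0x90 starts with the bits 1 001, i.e. the flag followed by the spec's k=3 example.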

How to store negative numbers in EEPROM (Arduino IDE)?

I am trying to find a straightforward way to store negative values in EEPROM, integer values ranging from -20 to 20. I have been using EEPROM.write and EEPROM.read functions to store strings one character at a time, but I am having trouble with negative numbers. I figure I only need one byte for this value.
It's just a matter of number representation. You just have to use the correct data types when you print or use the value:
Version 1: int8_t data = EEPROM.read(addr);
Version 2:
byte data = EEPROM.read(addr);
Serial.print((int8_t)data);
EEPROM.write can be used directly with an int8_t: EEPROM.write(addr, int8_value);
Or, if you want an int, the put/get methods can be used for it (they even work for structs containing only POD types).
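The same two's-complement round trip can be sketched in plain C++ off the board. The helper names are hypothetical; on the Arduino, the byte produced by toEepromByte is what EEPROM.write stores and what EEPROM.read gives back. The uint8_t to int8_t conversion is exactly defined since C++20 and behaves the same way on two's-complement compilers before that.

```cpp
#include <cstdint>

// Hypothetical helpers: the byte handed to EEPROM.write and read back by EEPROM.read.
std::uint8_t toEepromByte(std::int8_t value) {
    return static_cast<std::uint8_t>(value);  // two's complement: -20 becomes 236 (0xEC)
}

std::int8_t fromEepromByte(std::uint8_t stored) {
    return static_cast<std::int8_t>(stored);  // 236 is reinterpreted as -20
}
```

Every value from -20 to 20 fits in one byte this way.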

Huffman code length

I want to build a Huffman tree and assign a code to each of the 256 byte values based on their frequencies. For my application I need a hash table to get the code for a byte in constant time. But in the worst case the tree may be so unbalanced that certain bytes get a very long code (up to 255 bits). That makes maintaining a hash table very difficult. The code requires high performance, so storing the codes as strings won't work. How can I resolve this?
Why would you need a hash table for 256 values? Simply have a 256-entry table where you directly index the code for each byte.
Each code is at most 32 bytes long, so just have a table of 256 entries, each with a fixed size of 33 bytes: 256 * 33 = 8448 bytes. The first of the 33 bytes holds the length of the code in bits, and the remaining bytes hold the code, of which you only use the requisite number of bits.
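A minimal sketch of that fixed-size table, assuming codes are stored MSB-first. CodeEntry, setCode and codeBit are hypothetical names, and the '0'/'1' string passed to setCode is only for illustration; a real encoder would fill the bits while walking the tree.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>

// One entry per byte value: code length in bits plus up to 256 code bits.
struct CodeEntry {
    std::uint8_t length = 0;              // Huffman codes here are at most 255 bits
    std::array<std::uint8_t, 32> bits{};  // MSB-first; only `length` bits are used
};

using CodeTable = std::array<CodeEntry, 256>;  // indexed directly by the byte value

// Store a code given as '0'/'1' characters (illustration only).
void setCode(CodeTable& table, std::uint8_t symbol, const std::string& code) {
    CodeEntry& e = table[symbol];
    e = CodeEntry{};
    e.length = static_cast<std::uint8_t>(code.size());
    for (std::size_t i = 0; i < code.size(); ++i) {
        if (code[i] == '1') {
            e.bits[i / 8] |= static_cast<std::uint8_t>(0x80u >> (i % 8));
        }
    }
}

// Bit i (0-based, MSB-first) of the code for `symbol`: a constant-time lookup.
bool codeBit(const CodeTable& table, std::uint8_t symbol, std::size_t i) {
    return ((table[symbol].bits[i / 8] >> (7 - i % 8)) & 1u) != 0;
}
```

On common compilers each entry is exactly 33 bytes with no padding, so the whole table is the 8448 bytes computed above.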

Understanding the BPP inside DICOM images

I've been working with DICOM files for a few days, using FO-DICOM.
I'm using a set of DICOM files for my tests, and I've been printing the "Photometric Interpretation" and "Samples Per Pixel" values to get a better understanding of what kind of images I'm working with.
The result was "MONOCHROME2" for the Photometric Interpretation, and "1" for the Samples Per Pixel.
What I understood by reading part 3 of the standard is that MONOCHROME2 represents a grayscale image in which the minimum value is displayed as black.
But what exactly is Samples Per Pixel? I thought it represented the number of bytes (not bits) per pixel (it would be logical to have 8 bits per pixel for a grayscale image, right?)
But my problem is that my images actually seem to have 32 bpp.
I'm working with 512*512 pixels images, and I converted them into byte arrays. So I was expecting arrays of 512*512=262144 bytes.
But I get arrays of 1048630 bytes (which is a bit more than 4*262144)
Does someone have an explanation?
EDIT:
Here are some of my data values:
PhotometricInterpretation=MONOCHROME2
SamplePerPixel=1
BitsAllocated=16
BitsStored=12
HighBit=11
PixelRepresentation=0
NumberOfFrames=0
The attribute (0028,0002) SamplesPerPixel is only relevant for color images and tells you the number of planes (color components) present in the image (e.g. 3 for RGB), so for an RGB image you have
PhotometricInterpretation=RGB
SamplesPerPixel=3
typically with 8 bits per sample (I will revisit bits per pixel below). As long as PhotometricInterpretation is MONOCHROME1 or MONOCHROME2, you can expect SamplesPerPixel to be 1 and nothing else.
What you do have to take into consideration is the number of bits per pixel:
BitsAllocated (0028,0100)
BitsStored (0028,0101)
HighBit (0028,0102)
These tell you how many bits are used to encode a pixel value (BitsAllocated) and which of these bits really contain grayscale information (BitsStored, HighBit). HighBit is zero-based and usually, but not necessarily, equal to BitsStored - 1.
An example to illustrate this: for CT images, it is very common to express gray values in Hounsfield units, which range from -1000 to +3000. These are represented by 12 bits stored with 2-byte alignment, so
BitsAllocated (0028,0100) = 16
BitsStored (0028,0101) = 12
HighBit (0028,0102) = 11
Another degree of freedom is PixelRepresentation, which tells you whether the pixel data is encoded as unsigned (0) or two's complement (1). I have seen both for CT images; however, signed pixel data is rather unusual for image types other than CT.
In your example, I would assume that BitsAllocated == 32, or (less likely) that you have a dataset containing multiple images ('frames'), i.e. NumberOfFrames (0028,0008) > 1. If NumberOfFrames is absent, you can safely assume there is only one frame.
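To make the roles of these attributes concrete, here is a sketch that extracts the stored value from one pixel. It assumes BitsAllocated = 16 and that the 16-bit word has already been read in the dataset's byte order; storedValue is a hypothetical helper, not an FO-DICOM or DCMTK function.

```cpp
#include <cstdint>

// Assumes BitsAllocated == 16, 1 <= bitsStored <= 16 and highBit <= 15.
int storedValue(std::uint16_t raw, int bitsStored, int highBit, int pixelRepresentation) {
    const int lowBit = highBit - bitsStored + 1;         // lowest bit that holds data
    const std::uint32_t mask = (1u << bitsStored) - 1u;  // keep only BitsStored bits
    const std::uint32_t v = (static_cast<std::uint32_t>(raw) >> lowBit) & mask;
    if (pixelRepresentation == 1 && (v & (1u << (bitsStored - 1))) != 0) {
        // Two's complement: sign-extend from bitsStored bits.
        return static_cast<int>(v) - (1 << bitsStored);
    }
    return static_cast<int>(v);
}
```

For the CT example (16/12/11), any bits above bit 11 are ignored, and with PixelRepresentation = 1 the stored value 0x0FFF means -1.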
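A quick way to check which case you are in is to compute the expected size of uncompressed pixel data from Rows (0028,0010), Columns (0028,0011) and the attributes above. This sketch treats a NumberOfFrames of 0 like an absent one, which is an assumption, and ignores the padding of Pixel Data to an even length; expectedPixelBytes is a hypothetical name.

```cpp
#include <cstddef>

// Expected byte size of native (uncompressed) Pixel Data.
std::size_t expectedPixelBytes(std::size_t rows, std::size_t columns,
                               std::size_t samplesPerPixel, std::size_t bitsAllocated,
                               std::size_t numberOfFrames) {
    if (numberOfFrames == 0) numberOfFrames = 1;  // assumption: 0 treated as "absent"
    return rows * columns * samplesPerPixel * numberOfFrames * bitsAllocated / 8;
}
```

With the values from the question (512 x 512, 1 sample, 16 bits allocated, one frame) this gives 524288 bytes; 32 bits allocated would give 1048576.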
I have over-simplified a bit here, especially about color images but I think this is complicated enough ;-). Basically, DICOM offers any thinkable degree of freedom to encode pixel data and describe the encoding in the header.
I think I recommended DCMTK to you in a recent post. Its DicomImage class features a nice interface (getInterData()) which takes care of all that and provides the pixel data read from a DICOM file in a normalized format.
[EDIT]: Feel free to post a DICOM dump of your dataset here, I would have a look at it and tell you how to interpret the pixel data.

Good Idea/Bad Idea: Using Qt's QSet on very large dataset?

Is it a bad idea to use QSet to keep track of a very large set of fairly large strings? Each string is 54 characters (108 bytes). The set may contain thousands of entries (I'm not sure on the exact number yet). The QSet will only be used for insertion and membership query.
If it is a bad idea, I'm definitely open to suggestions. My 54 character strings are composed of only 6 different characters (e.g. "AAAAAAAAABBBBBBBBBCCCCCCCCCDDDDDDDDDEEEEEEEEEFFFFFFFFF"). This seems like a good candidate for compression, perhaps? Any other suggestions are welcome.
Realize that with a built-in set you may get some path-level compression, depending on the nature of your data and on the container's implementation.
Look at some information on radix trees, digital search trees, red-black trees, etc. You'll see that you don't need to store each and every string, but rather the patterns. For instance, let's simplify your problem: we have only 3 characters that can appear a maximum of 2 times each, and each string is 6 characters long. Three possible strings are:
AABBCC, AABCBC, and AACBCB
With these examples, we could get away with a maximum of 6 + 3 + 4 = 13 nodes instead of the full 18 nodes. That's not substantial, but I don't know what you're doing either. As with any type of compression, the more your prefix patterns are reused, the more compression you get.
Edit:
The numbers 13 and 18 come from the path-level compression. For instance, in plain C (for the sake of discussion), if I implement my string storage class as a wrapper around an array, I would probably just have an array of character pointers, each referencing a spot in memory that contains a pattern. In the example above, this would take 18 characters (6 * 3 = 18). Adding the size of the array (let's say sizeof(char*) is 4), the array takes 3 * 4 = 12 bytes, so 12 + 18 = 30 bytes in total to store our patterns.
If I instead store the patterns in a sort of digital search tree, I make a small tradeoff. The nodes in my tree are going to be larger than 1 byte apiece (1 byte for the character in the node plus 4 bytes for the "next" pointer, so 5 bytes apiece). The first pattern we store is AABBCC. This is 6 nodes in the tree. Next is AABCBC. We reuse the path AAB from the first pattern and need only 3 additional nodes for CBC. The last pattern is AACBCB. We reuse AA and need 4 new nodes for CBCB. This gives a total of 13 nodes * 5 bytes = 65 bytes of storage, which is more than the 30 bytes of the flat array. However, if you have a lot of long, repeating patterns in the prefix of your data, you'll see some prefix path-level compression.
If this isn't the case for you, I would look into Huffman or LZW compression. LZW, for example, builds a dictionary of patterns with integer IDs tied to them: when you compress, you build the dictionary and create an integer ID for each pattern in your text, then replace the patterns in your text with those IDs. When decompressing, you do the opposite. I don't have the time to describe these algorithms in more detail, so you'll need to look them up.
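The node counting above can be reproduced with a minimal trie (digital search tree). This sketch uses std::map children and counts nodes, not bytes; since all keys have the same length, membership is simply "the full path exists", so no end-of-word marker is needed. TrieNode, trieInsert and countTrieNodes are hypothetical names.

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

struct TrieNode {
    std::map<char, std::unique_ptr<TrieNode>> children;
};

// Inserts s and returns how many new nodes had to be created (shared prefixes are reused).
int trieInsert(TrieNode& root, const std::string& s) {
    int created = 0;
    TrieNode* node = &root;
    for (char c : s) {
        auto& child = node->children[c];
        if (!child) {
            child = std::make_unique<TrieNode>();
            ++created;
        }
        node = child.get();
    }
    return created;
}

// Total number of nodes needed to store all the words.
int countTrieNodes(const std::vector<std::string>& words) {
    TrieNode root;
    int total = 0;
    for (const auto& w : words) total += trieInsert(root, w);
    return total;
}
```

Inserting AABBCC, AABCBC and AACBCB creates 6, 3 and 4 new nodes, 13 in total.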
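For the LZW option, a minimal encoder sketch: the dictionary starts with the six single characters and grows with each new pattern it sees. It assumes the input contains only characters from `alphabet`; lzwCompress is a hypothetical name, and a real implementation would also pack the codes into a bit stream and need a matching decoder.

```cpp
#include <map>
#include <string>
#include <vector>

// Minimal LZW encoder; every input character must occur in `alphabet`.
std::vector<int> lzwCompress(const std::string& input, const std::string& alphabet) {
    std::map<std::string, int> dict;
    int nextCode = 0;
    for (char c : alphabet) dict[std::string(1, c)] = nextCode++;  // e.g. A=0 ... F=5

    std::vector<int> out;
    std::string w;                      // longest pattern matched so far
    for (char c : input) {
        std::string wc = w + c;
        if (dict.count(wc) != 0) {
            w = wc;                     // keep extending the current pattern
        } else {
            out.push_back(dict.at(w));  // emit the ID of the known pattern
            dict[wc] = nextCode++;      // remember the new, longer pattern
            w = std::string(1, c);
        }
    }
    if (!w.empty()) out.push_back(dict.at(w));
    return out;
}
```

For example, lzwCompress("AAAA", "ABCDEF") yields {0, 6, 0}: "A", then the newly added "AA" (ID 6), then "A".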
It's a tradeoff in simplicity/time. If your data will allow it, take the shorter method and just use the built-in container. If not, you will need something more tailored to your data.
I don't think you'd have any additional problems using QSet over another sort of container, such as std::set, a map, or a vector. If you are wondering about running out of memory, that probably depends on how many thousands of the strings you need to store, and if there was a way to encode them more concisely. (For example, if the characters always occur in the same order but vary in relative lengths, store the length for each character rather than all of the characters.) However, even 50,000 of these strings is only around 5 MB, and 500,000 of them is only 50 MB to store, discounting storage overhead, which is a moderate amount of memory on modern machines.
QSet does sound like a good idea. It's basically just a hash table, and it can optimize its bucket size dynamically. Perfect.
Another suggestion for compressing the key:
Treat it as a base-6 number string (think A=0, B=1, ..., F=5) and convert it into binary (an int):
#include <QByteArray>
QByteArray ba("112");                     // "BBC" written as base-6 digits
int num = ba.toInt(nullptr, 6 /*base*/);  // num == 44 (1*36 + 1*6 + 2)
Since 6^3 = 216 < 2^8 = 256, we can represent every 3 characters of your string with a 1-byte int (or char) and build a byte array from those. That would cut the size of the key down from 54 bytes to 18 bytes.
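The same idea without Qt, packing three characters per byte. packBase6 is a hypothetical helper that assumes the string only contains A to F and that its length is a multiple of 3:

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Three base-6 digits per byte: values 0..215, since 6^3 = 216 <= 256.
std::vector<std::uint8_t> packBase6(const std::string& s) {
    if (s.size() % 3 != 0) throw std::invalid_argument("length must be a multiple of 3");
    std::vector<std::uint8_t> out;
    out.reserve(s.size() / 3);
    for (std::size_t i = 0; i < s.size(); i += 3) {
        unsigned v = 0;
        for (std::size_t j = 0; j < 3; ++j) {
            const int d = s[i + j] - 'A';  // A=0, B=1, ..., F=5
            if (d < 0 || d > 5) throw std::invalid_argument("character outside A..F");
            v = v * 6 + static_cast<unsigned>(d);
        }
        out.push_back(static_cast<std::uint8_t>(v));
    }
    return out;
}
```

packBase6("BBC") yields the single byte 44, matching the toInt example, and a 54-character key becomes 18 bytes.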
From your earlier comment: "In my strings, there will always be 54 characters, and there will always be 9 of each character. The order is the only thing that changes."
Don't store raw strings then. Since only 6 characters are used, you could compress each string into a compact code and make a QSet of those. A trivial encoding maps the six characters to the values 0 to 5; since the character set is known beforehand (and has only those 6 characters), each character fits in 3 bits, so a 54-character string packs into 162 bits (21 bytes).
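A sketch of that 3-bit packing: each character becomes a 3-bit code (A=0 ... F=5), with 21 codes per 64-bit word so that no code straddles a word boundary, giving three words (24 bytes) per 54-character key. PackedKey, pack3 and unpack3 are hypothetical names; to use the result as a QSet key you would still need a qHash overload for it (or copy the 24 bytes into a QByteArray).

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

using PackedKey = std::array<std::uint64_t, 3>;  // 3 words * 21 codes >= 54 characters

PackedKey pack3(const std::string& s) {
    if (s.size() != 54) throw std::invalid_argument("expected 54 characters");
    PackedKey key{};
    for (std::size_t i = 0; i < s.size(); ++i) {
        const int code = s[i] - 'A';  // A=0 ... F=5, fits in 3 bits
        if (code < 0 || code > 5) throw std::invalid_argument("character outside A..F");
        key[i / 21] |= static_cast<std::uint64_t>(code) << ((i % 21) * 3);
    }
    return key;
}

std::string unpack3(const PackedKey& key) {
    std::string s(54, 'A');
    for (std::size_t i = 0; i < 54; ++i) {
        s[i] = static_cast<char>('A' + ((key[i / 21] >> ((i % 21) * 3)) & 0x7u));
    }
    return s;
}
```

Because the mapping is reversible, two different strings always produce different keys.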
