Understanding the BPP inside DICOM images

I've been working with DICOM files for a few days now, using FO-DICOM.
I'm using a set of DICOM files for my tests, and I've been printing the "Photometric Interpretation" and the "Samples Per Pixel" values to get a better understanding of what kind of images I'm working with.
The result was "MONOCHROME2" for the Photometric Interpretation and "1" for the Samples Per Pixel.
What I understood by reading Part 3 of the standard is that MONOCHROME2 represents a grayscale, starting from black for its minimum values.
But what exactly is Samples Per Pixel? I thought it represented the number of bytes (and not bits) per pixel (it would be logical to have 8 bits per pixel for a grayscale, right?).
But my problem here is that my images actually seem to have 32 bpp.
I'm working with 512*512-pixel images, and I converted them into byte arrays, so I was expecting arrays of 512*512 = 262144 bytes.
But I get arrays of 1048630 bytes (which is a bit more than 4*262144).
Does anyone have an explanation?
EDIT:
Here is some of my data:
PhotometricInterpretation=MONOCHROME2
SamplePerPixel=1
BitsAllocated=16
BitsStored=12
HighBit=11
PixelRepresentation=0
NumberOfFrames=0

The attribute (0028,0002) SamplesPerPixel refers to color images only and tells you the number of planes present in the image (e.g. 3 for RGB), so for a color image you would have
PhotometricInterpretation=RGB
SamplesPerPixel=3
with 8 bits per sample (I will come back to bits per pixel below). As long as PhotometricInterpretation is MONOCHROME1 or MONOCHROME2, you can expect SamplesPerPixel to be 1 and nothing else.
What you do have to take into consideration is the number of bits per pixel:
BitsAllocated (0028,0100)
BitsStored (0028,0101)
HighBit (0028,0102)
These tell you how many bits are used to encode a pixel value (BitsAllocated) and which of these bits actually contain grayscale information (BitsStored, HighBit). HighBit is zero-based and is usually, but not necessarily, equal to BitsStored - 1.
An example to illustrate this: for CT images, it is very common to express gray values in Hounsfield units, which range from -1000 to +3000. These are represented by 12 bits stored with a 2-byte alignment, so:
BitsAllocated (0028,0100) = 16
BitsStored (0028,0101) = 12
HighBit (0028,0102) = 11
Another degree of freedom is PixelRepresentation (0028,0103), which tells you whether the pixel data is encoded unsigned (0) or in two's complement (1). I have seen both for CT images; signed pixel data is rather unusual for image types other than CT.
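To make these bit fields concrete, here is a minimal sketch (plain Python, using the CT example values above; the helper name is mine, not part of any library) of how one raw 16-bit sample would be masked and, if PixelRepresentation is 1, sign-extended:

def decode_sample(raw, bits_stored=12, high_bit=11, pixel_representation=0):
    """Extract the stored bits from one raw 16-bit sample and apply the sign if needed."""
    low_bit = high_bit - bits_stored + 1
    value = (raw >> low_bit) & ((1 << bits_stored) - 1)     # keep only the stored bits
    if pixel_representation == 1 and value & (1 << (bits_stored - 1)):
        value -= 1 << bits_stored                           # two's-complement sign extension
    return value

print(decode_sample(0x0FFF))                          # 4095 when unsigned
print(decode_sample(0x0FFF, pixel_representation=1))  # -1 when signed (two's complement)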
In your example, I would assume that BitsAllocated == 32, or (not very likely) that you have a dataset containing multiple images ('frames'), so NumberOfFrames (0028,0008) > 1. If Number of Frames is absent, you can safely assume you have only one frame.
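As a rough sanity check on the size mismatch in the question, here is a sketch in plain Python, using the values from the question's EDIT and assuming Rows = Columns = 512 as stated:

rows, columns = 512, 512
samples_per_pixel = 1      # (0028,0002)
bits_allocated = 16        # (0028,0100)
number_of_frames = 1       # (0028,0008); treat absent or 0 as 1

expected = rows * columns * samples_per_pixel * (bits_allocated // 8) * number_of_frames
print(expected)            # 524288 bytes for a single 512x512 frame at 16 bits allocated
# The reported 1048630 bytes is roughly twice that plus a small remainder, which fits
# the possibilities discussed above (a second frame, or 32 bits allocated, plus some
# non-pixel data included in the buffer).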
I have over-simplified a bit here, especially about color images, but I think this is complicated enough ;-). Basically, DICOM offers every thinkable degree of freedom to encode pixel data and to describe the encoding in the header.
I think I recommended in a recent post that you have a look at DCMTK. The DicomImage class features a nice interface (getInterData()) which takes care of all that and provides the pixel data read from a DICOM file in a normalized format.
[EDIT]: Feel free to post a DICOM dump of your dataset here; I would have a look at it and tell you how to interpret the pixel data.

Related

What does sound data look like?

I read here how sounds are represented as numbers in a computer.
And I figured out that the usual representation is that we get 44,100 numbers in the range [-32767, 32767] per second.
So, in my imagination, there's got to be a big one-column matrix, right?
I'm an R user, so speaking in R, 3 seconds of sound data would be:
s <- 3
sound <- matrix(0, ncol = 1, nrow = 44100 * s)
nrow(sound)
#> [1] 132300
a one-column matrix with 132,300 rows.
Is this really the case?
I want an analogous picture in my head. Say, in the case of a 256 * 256 picture,
if we treat that picture as RGB, we get 3 matrices, each 256 * 256.
And in the case of sound, we get one long, long column? As I think about this again, it's not even a matrix after all; it's a column.
Am I right? I can't find any similar dataset searching the Internet.
Any advice will be welcome. Thanks.
The raw format that is created early on in that linked question could look a lot like a single-dimension array, and the signal that is sent to the speaker to make the sound could probably be represented similarly.
But you're unlikely to find a file on your computer that looks like that, for several reasons:
Sound can be stored at different bit depths - that is, how many bits are used for each 'number'. CD audio tracks have a 16-bit depth, but you could have 8 or 32 bits, etc. In a straight stream of these numbers you need some way of knowing how far to read to get to the next number, so that information needs to be saved somewhere.
Sample rate can vary. If you've got a sequence of numbers representing an audio signal, then you need to know how long each number lasts for.
Mostly, sounds are more complex. Instead of a single source you have stereo, or 5-channel, or whatever, so the system needs to be able to store/decode multiple pieces of information for the sounds you want to hear at a particular time.
Much of sound is repetitive, and so can often benefit from compression.
So most sounds are stored in a compressed format that includes wrapper information about how to decode them. The wrapper information includes how to decode the different audio channels, what sort of compression was used, etc.
The closest you're likely to find is a .wav file (Windows) or .aiff (Mac). But even these include some metadata (sample rate and bit depth, to start).
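To make the "long column plus wrapper metadata" idea concrete, here is a minimal sketch (Python, standard library only; the 440 Hz tone and the file name are just placeholders):

import math
import struct
import wave

sample_rate = 44100                 # samples per second
seconds = 3
n_samples = sample_rate * seconds   # 132300, matching the R example above

# The raw signal really is just a long sequence of numbers in [-32767, 32767].
samples = [int(32767 * math.sin(2 * math.pi * 440 * t / sample_rate))
           for t in range(n_samples)]

with wave.open("tone.wav", "wb") as f:
    f.setnchannels(1)               # mono: one "column"; stereo would interleave two
    f.setsampwidth(2)               # bit depth: 2 bytes = 16 bits per sample
    f.setframerate(sample_rate)
    f.writeframes(b"".join(struct.pack("<h", s) for s in samples))

The samples list is exactly the one long column from the R example; the wave module only wraps it in a small header recording the channel count, bit depth and sample rate.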

Encoding DNA strand in Binary

Hey guys I have the following question:
Suppose we are working with strands of DNA, each strand consisting of
a sequence of 10 nucleotides. Each nucleotide can be any one of four
different types: A, G, T or C. How many bits does it take to encode a
DNA strand?
Here is my approach to it and I want to know if that is correct.
We have 10 spots, and each spot can have 4 different symbols. This means we need to be able to represent 4^10 combinations using our binary digits.
4^10 = 1048576.
We will then find the log base 2 of that. What do you guys think of my approach?
Each nucleotide (aka base-pair) takes two bits (one of four states -> 2 bits of information). 10 base-pairs thus take 20 bits. Reasoning that way is easier than computing log2(4^10), but gives the same answer.
It would be fewer bits of information if there were combinations that couldn't appear, e.g. some codons (sequences of three base-pairs) that never occur. But ten independent 2-bit pieces of information sum to 20 bits.
If some sequences appear more frequently than others, and a variable-length representation is viable, then Huffman coding or other compression schemes could save bits most of the time. This might be good in a file format, but is unlikely to be good in memory while you're working with the data.
Densely packing your data into an array of 2-bit fields makes it slower to access a single base-pair, but comparing a whole chunk for equality with another chunk is still efficient (memcmp).
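For illustration, a minimal sketch (Python) of that 2-bit packing; the A/C/G/T-to-bits mapping here is arbitrary, not any standard encoding:

CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BASE = "ACGT"

def pack(strand):
    """Pack a DNA strand into an integer, 2 bits per nucleotide."""
    bits = 0
    for nt in strand:
        bits = (bits << 2) | CODE[nt]
    return bits

def unpack(bits, length):
    """Recover a strand of the given length from its packed form."""
    return "".join(BASE[(bits >> 2 * i) & 0b11] for i in reversed(range(length)))

packed = pack("AGTCAGTCAG")        # 10 nucleotides -> fits in 20 bits
assert packed < 2**20
assert unpack(packed, 10) == "AGTCAGTCAG"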
20 bits is unfortunately just slightly too large for a 16-bit integer (which computers are good at). Storing the strands in an array of 32-bit zero-extended values wastes a lot of space. On hardware with good unaligned support, storing 24-bit zero-extended values is OK (do a 32-bit load and mask the high 8 bits; storing is even less convenient, though: probably a 16-bit store plus an 8-bit store, or else load the old value, merge the high 8 bits, and do a 32-bit store, but that's not atomic).
This is a similar problem to storing codons (groups of three base-pairs that code for an amino acid): 6 bits of information doesn't fill a byte. Only wasting 2 of every 8 bits isn't that bad, though.
Amino-acid sequences (where you don't care about mutations between different codons that still code for the same AA) have about 20 symbols per position, which means a symbol doesn't quite fit into a 4-bit nibble.
I used to work for the phylogenetics research group at Dalhousie, so I've sometimes thought about having a look at DNA-sequence software to see if I could improve on how they internally store sequence data. I never got around to it, though. The real CPU intensive work happens in finding a maximum-likelihood evolutionary tree after you've already calculated a matrix of the evolutionary distance between every pair of input sequences. So actual sequence comparison isn't the bottleneck.
Do the maths:
4^10 = (2^2)^10 = 2^20
Answer: 20 bits

What is the "Law of the Eight"?

While studying this document on the Evolution of JPEG, I came across "the law of the eight" in Section 7.3 of that document.
Despite the introduction of other block sizes from 1 to 16 with the SmartScale extension, beyond the fixed size 8 in the original JPEG standard, the fact remains that the block size of 8 will still be the default value, and that all other-size DCTs are scaled in reference to the standard 8x8 DCT.
The “Law of the Eight” explains, why the size 8 is the right default and reference value for the DCT size.
My question is
What exactly is this "law of the eight"?
Historically, was a study performed that evaluated numerous sample images to arrive at the conclusion that an 8x8 image block contains enough redundant data to support compression techniques using the DCT? With very large image sizes like 8M (4Kx4K) fast becoming the norm in most digital images/videos, is this assumption still valid?
Another historical reason to limit the macro-block to 8x8 would have been the computationally prohibitive image-data size for larger macro-blocks. With modern super-scalar architectures (e.g. CUDA) that restriction no longer applies.
Earlier similar questions exist - 1, 2 and 3. But none of them give any details/links/references to this mysterious, fundamental "law of the eight".
1. References/excerpts/details of the original study will be highly appreciated, as I would like to repeat it with a modern data-set of very large images to test the validity of 8x8 macro-blocks being optimal.
2. In case a similar study has recently been carried out, references to it are welcome too.
3. I do understand that SmartScale is controversial. Without any clear potential benefits 1, at best it is comparable with other backward-compliant extensions of the JPEG standard 2. My goal is to understand whether the original reasons behind choosing 8x8 as the DCT block size (in the JPEG image compression standard) are still relevant, hence I need to know what the law of the eight is.
My understanding is that the Law of the Eight is simply a humorous reference to the fact that the Baseline JPEG algorithm prescribed 8x8 as its only block size.
P.S. In other words, "the Law of the Eight" is a way to explain why "all other-size DCTs are scaled in reference to the 8x8 DCT" by bringing in the historical perspective: the lack of support for any other size in the original standard and its de facto implementations.
The next question to ask is: why eight? (Note that despite being a valid question, this is not the subject of the present discussion, which would still be relevant even if another value had been picked historically, e.g. a "Law of the Ten" or "Law of the Thirty-Two".) The answer to that one is: because the computational complexity of the problem grows as O(N^2) (unless FCT-class algorithms are employed, which grow more slowly, as O(N log N), but are harder to implement on the primitive hardware of embedded platforms, hence their limited applicability), so larger block sizes quickly become impractical. That is why 8x8 was chosen: small enough to be practical on a wide range of platforms, but large enough to allow not-too-coarse control of the quantization levels for different frequencies.
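As a back-of-the-envelope illustration of that complexity argument, here is a Python sketch counting multiply-adds for a naive, non-FCT 2-D DCT, which is one way of reading the O(N^2)-per-pixel growth:

# Naive 2-D DCT of an N x N block: N*N output coefficients, each a sum over
# N*N inputs, so the per-pixel cost grows as N^2.
for n in (4, 8, 16, 32):
    ops_per_block = (n * n) ** 2
    ops_per_pixel = ops_per_block // (n * n)
    print(f"N={n:2d}: {ops_per_block:>8d} multiply-adds per block, {ops_per_pixel:>4d} per pixel")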
Since the standard has clearly scratched an itch, a whole ecosphere soon grew around it, including implementations optimized for 8x8 as their sole supported block size. Once the ecosphere was in place, it became impossible to change the block size without breaking existing implementations. As that was highly undesirable, any tweaks to DCT/quantization parameters had to remain compatible with 8x8-only decoders. I believe this consideration must be what's referred to as the "Law of the Eight".
While I am not an expert, I don't see how larger block sizes would help. First, the dynamic range of values in one block will increase on average, requiring more bits to represent them. Second, the relative quantization of frequencies ranging from "the whole block" down to "a single pixel" has to stay the same (it is dictated by human perception bias, after all); the quantization would just get a bit smoother, and for the same compression level the potential quality increase would likely be unnoticeable.

Storing a BMP image in a QR code

I'm trying to create (or, if I've somehow missed it in my research, find) an algorithm to encode/decode a bmp image into/from a QR code format. I've been using a guide (Thonky) to try to understand the basics of QR codes and I'm still not sure how to go about this problem, specifically:
Should I encode the data as binary or would numeric be more reasonable (assuming each pixel will have a max. value of 255)?
I've searched for information on the structured append capabilities of QR codes but haven't found much detail beyond the fact that it's supported by QR codes -- how could I implement/utilize this functionality?
And, of course, if there are any tips/suggestions to better store an image as binary data, I'm very open to suggestions!
Thanks for your time,
Sean
I'm not sure you'll be able to achieve that, as the amount of information a QR Code can hold is quite limited.
First of all, you'll probably want to store your image as raw bytes, as the other formats (numeric and alphanumeric) are designed to hold text/numbers and would provide less space to store your image. Let's assume you choose the biggest possible QR Code (version 40), with the smallest level of error correction, which can hold up to 2953 bytes of binary information (see here).
First option: as you suggest, you store the image as a bitmap. This format allows no compression at all and requires (in the case of an RGB image without an alpha channel) 3 bytes per pixel. If we take into account the file header size (14 to 54 bytes) and ignore the padding (each row of image data must be padded to a length that is a multiple of 4), that allows you to store roughly 2900/3 = 966 pixels. If we consider a square image, this represents a 31x31 bitmap, which is small even for a thumbnail (for example, my avatar at the end of this post is 32x32 pixels).
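For what it's worth, here is a small sketch (Python) of that calculation, taking the 2953-byte capacity quoted above at face value and this time including the row padding:

QR_CAPACITY = 2953      # bytes of binary data, version 40, error correction level L
BMP_HEADER = 54         # BITMAPFILEHEADER + BITMAPINFOHEADER

def bmp_size(side):
    """Size in bytes of an uncompressed 24-bit BMP of side x side pixels."""
    row = side * 3                      # 3 bytes per pixel (RGB, no alpha)
    padded_row = (row + 3) // 4 * 4     # each row padded to a multiple of 4 bytes
    return BMP_HEADER + padded_row * side

side = 1
while bmp_size(side + 1) <= QR_CAPACITY:
    side += 1
print(side, bmp_size(side))
# -> 30 2814: with row padding included, a 30x30 image is the largest 24-bit BMP
#    that fits, in line with the rough 31x31 estimate above.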
Second option: you use JPEG to encode your image. This format has the advantage of using a compression algorithm that can reduce the file size. This time there is no exact formula to get the size of an image fitting in 2.9 kB, but I tried using a few square images and downsizing them until they fit this size, keeping a good quality factor (93): this gives an average of about 60x60-pixel images. (On such small images, it's normal not to see an incredible compression factor between JPEG and BMP, as the file header in a JPEG file is far larger than in a BMP file: about 500 bytes.) This is better than a bitmap, but remains quite small.
Finally, even if you succeed in encoding your image in this QR Code, you will encounter another problem: a QR Code this big is very, very hard to scan successfully. As a matter of fact, this QR Code will have a size of 177x177 modules (a "module" being a small white or black square). Assuming you scan it using a smartphone providing so-called "HD" frames (1280x720 pixels), each module will have a maximum size on the frame of about 4 pixels. If you take into account the camera noise, the aliasing, and the blur due to the fact that the user is never perfectly still when scanning, the quality of the input frames will make it very hard for any QR Code decoding algorithm to successfully read the QR Code (don't forget we set its error correction level to low at the beginning of this!).
Even though it's not very good news, I hope this helps you!
There is indeed a way to encode information across several (up to 16) QR Codes, using a special header in your QR Codes called "Structured Append". The best source of information you can use is the QR Code standard (ISO/IEC 18004:2006); it's possible (but not necessarily easy) to find it for free on the web.
The relevant part (section 9) of this standard says:
"Up to 16 QR Code symbols may be appended in a structured format. If a symbol is part of a Structured Append message, it is indicated by a header block in the first three symbol character positions.
The Structured Append Mode Indicator 0011 is placed in the four most significant bit positions in the first symbol character.
This is immediately followed by two Structured Append codewords, spread over the four least significant bits of the first symbol character, the second symbol character and the four most significant bits of the third symbol character. The first codeword is the symbol sequence indicator. The second codeword is the parity data and is identical in all symbols in the message, enabling it to be verified that all symbols read form part of the same Structured Append message. This header is immediately followed by the data codewords for the symbol commencing with the first Mode Indicator."
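For illustration only, here is a rough sketch (Python) of how such a header might be assembled; the split of the symbol sequence indicator into a 4-bit position and a 4-bit total count, and the parity codeword being the XOR of all bytes of the complete message, are my reading of ISO/IEC 18004 and should be verified against the standard before use:

from functools import reduce

def structured_append_header(position, total, message):
    """20-bit header: 4-bit mode indicator + 8-bit symbol sequence indicator + 8-bit parity."""
    assert 1 <= total <= 16 and 0 <= position < total
    mode = 0b0011                                    # Structured Append mode indicator
    sequence = (position << 4) | (total - 1)         # assumed layout: position, then count - 1
    parity = reduce(lambda a, b: a ^ b, message, 0)  # assumed: XOR over all bytes of the whole message
    return (mode << 16) | (sequence << 8) | parity

payload = b"example payload that would be split across several symbols"
print(f"{structured_append_header(0, 3, payload):020b}")  # header for the first of three symbols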
Nevertheless, I'm not sure most QR Code scanners can handle this, as it's quite an advanced feature.
You can define a fixed image size and strip the JPG header down to just the vital information about the image, which can save up to 480 bytes of a ~500-byte normal header.
I used this method to store people's photos for a small club's ID cards; images of about 64x64 pixels were enough.

Maximizing Stored Information (Entropy?)

So I'm not sure if this question belongs here or maybe on Math Overflow. In any case, my question is about information theory.
Let's say I have a 16 bit word. There are 65,536 unique configurations of 1's and 0's in that word. What each one of those configurations represents is unimportant, as depending on your notation (two's complement vs. signed magnitude, etc.) the same configuration can mean different things.
What I'm wondering is are there any techniques to store more information than that in a 16 bit word?
My original ideas were things like odd/even parity, but then I realized that's already determined by the configuration... i.e. there is no extra information encoded in that. I'm beginning to wonder whether any such thing exists.
EDIT For example, let's say some magical computer (thinking quantum or something here) could understand 0, 1, and a. Then obviously we have 3^16 configurations and can now store more than the numbers [0 - 65,535]. Are there any other properties of a 16 bit word that you can mess with in order to encode extra information in your bit stream?
EDIT2 I am really struggling to put this into words. Right now, when I look at a 16 bit word in the computer, the property which conveys information to me is the relative ordering of the individual 1's and 0's. Is there another property or way of looking at a 16 bit word which would allow more than 2^16 unique "configurations"? (Note it would no longer be a configuration, but 2^16 xxxx's, where xxxx is a noun describing an instance of that property.) The only thing I can really think of is something like looking at the number of 1-to-0 transitions rather than whether each bit is actually a 1 or 0. Now transitions do not yield more than 2^16 combinations, because they are ultimately solely dependent on the configuration of 1's and 0's. I'm looking for properties that would derive from the configuration of 1's and 0's AND something else, thus resulting in MORE than 2^16. Does anyone even know what this would be called if it did exist?
EDIT3 Ok, I got it. My question boils down to this: how do we prove that the configuration of 1's and 0's in a word completely defines it? I.e., how do we prove that you need no other information besides the bitmap to show equality between two 16 bit words?
FINAL EDIT
I have an example... If instead of looking at the presence of 1's and 0's we look at the transitions between bits, we can store 2^16 alphabet characters. If the bit to the left is the same, treat it as a 1; if it transitions, treat it as a 0. Using the 16 bit word as a circularly-linked-list-type structure, where each link represents a 0/1, we basically form a 16 bit word out of the transitions between bits. That is exactly the kind of example I was looking for, but it results in 2^16, nothing better. I am convinced that you cannot do better and am marking the correct answer =(
The amount of information in a particular configuration of 16 0/1s is determined by the probability of this configuration (this is called self-information). This can be bigger than 16 bits if the configuration is less likely than 1/(2^16), but that means that some other configurations are more likely than 1/(2^16) and so will contain less information than 16 bits.
To take into account all the possible configurations, you have to use the expected value of self-information (called entropy) of individual configurations. This value will reach its maximum when the probabilities of all configurations are equal (that is 1/(2^16)) and then it will be exactly 16 bits.
So the answer is no, you cannot store more than 16 bits of information in 16 0/1s.
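Here is a minimal sketch (Python) of that argument, computing the entropy of a distribution over the 2^16 configurations:

import math

def entropy(probabilities):
    """Expected self-information, -sum(p * log2(p)), in bits."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

uniform = [1 / 2**16] * 2**16
print(entropy(uniform))       # 16.0 bits: the maximum, reached only for equal probabilities

skewed = [0.5] + [0.5 / (2**16 - 1)] * (2**16 - 1)
print(entropy(skewed))        # about 9 bits: one very likely configuration drags the average down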
See
http://en.wikipedia.org/wiki/Information_theory
http://en.wikipedia.org/wiki/Self-information
EDIT It is important to realize that a bit does not stand for a 0 or a 1; it is a unit of information, namely -log_2 P(w), where P(w) is the probability of a particular configuration.
You cannot store more than 2 states in one digit of a semiconductor device. You answered it yourself: the only way more information could be fitted into 16 digits is if each digit could take on more possible values.
