How to calculate the maximum of data bits for each QR code? - qr-code

I have some information for QR version 40 (177×177 modules) with error correction level L (7% error correction):
Version: 40
Error Correction Level: L
Data bits: 23,648
Numeric Mode: 7089
Alphanumeric Mode: 4296
Byte Mode: 2953
I don’t know about these points:
Does 1 module equal 1 bit?
How to calculate the maximum number of data bits in a QR code type? e.g. why do we have 23,648 for data bits?
How to convert data bits to Numeric/Alphanumeric in a QR code type? e.g. why do we have 7,089 for Numeric and 4,296 for Alphanumeric?
Thanks all!

The derivation of the numbers to which you refer is a result of several distinct steps performed when generating the symbol described in detail by ISO/IEC 18004.
Any formula for the data capacity will necessarily be awkward and unenlightening, since many of the parameters that determine the structure of QR Code symbols were chosen by hand; implementations must therefore generally resort to tables of constants for these non-computed values.
How to derive the number of usable data bits
Essentially the total number of data modules for a chosen symbol version would be the total symbol area less any function pattern modules and format/version information modules:
DataModules = Rows × Columns − ( FinderModules + AlignmentModules + TimingPatternModules ) − ( FormatInformationModules + VersionInformationModules )
The values of these parameters are constants defined per symbol version.
Some of these data modules are then allocated to error correction purposes as defined by the chosen error correction level. What remains is the usable data capacity of the symbol found by treating each remaining module as a single bit:
UsableDataBits = DataModules − ErrorCorrectionBits
How to derive the character capacity for each mode
Encoding of the input data begins with a 4-bit mode indicator followed by a character count value whose length depends on the version of the symbol and the mode. Then the data is encoded according to the rules for the particular mode resulting in the following data compaction:
Numeric: groups of 3 characters into 10 bits; a remaining 2 characters into 7 bits; a remaining single character into 4 bits.
Alphanumeric: groups of 2 characters into 11 bits; a remaining single character into 6 bits.
Byte: each character into 8 bits.
Kanji: each wide character into 13 bits.
Although it does not affect the symbol capacity, for completeness I'll point out that a 4-bit terminator pattern is appended which may be truncated or omitted if there is insufficient capacity in the symbol. Any remaining data bits are then filled with a padding pattern.
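For illustration, here is a small Python sketch (my own restatement of the compaction rules above, not code from the standard; the function name is made up) that computes how many data bits each mode needs for a given number of input characters:

def data_bits(mode: str, n: int) -> int:
    """Bits needed for n characters of payload, excluding the
    mode indicator, character count and terminator."""
    if mode == "numeric":        # 3 digits -> 10 bits, remainders -> 4 or 7 bits
        return 10 * (n // 3) + (0, 4, 7)[n % 3]
    if mode == "alphanumeric":   # 2 chars -> 11 bits, remainder -> 6 bits
        return 11 * (n // 2) + 6 * (n % 2)
    if mode == "byte":           # 1 char -> 8 bits
        return 8 * n
    if mode == "kanji":          # 1 wide char -> 13 bits
        return 13 * n
    raise ValueError(mode)

print(data_bits("alphanumeric", 4296))  # 23628
print(data_bits("numeric", 7089))       # 23630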
Worked Example
Given a version 40 symbol with error correction level L.
The size is 177×177 = 31329 modules
There are three 8×8 finder patterns (192 modules), forty-six 5×5 alignment patterns (1150 modules) and 272 timing modules, totalling 1614 function pattern modules.
There are also 31 format information modules and 36 version information modules, totalling 67 modules.
DataModules = 31329 − 1614 − 67 = 29648
Error correction level L dictates that there shall be 750 8-bit error correction codewords (6000 bits):
UsableDataBits = 29648 − 6000 = 23648
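As a sanity check, the arithmetic above is easy to reproduce in Python (the constants are the per-version values quoted above, not computed from the standard's tables):

modules_total = 177 * 177                      # version 40 symbol area: 31329
function_modules = 192 + 1150 + 272            # finder + alignment + timing
format_version_modules = 31 + 36               # format + version information
data_modules = modules_total - function_modules - format_version_modules
print(data_modules)                            # 29648

ecc_bits = 750 * 8                             # 750 error correction codewords at level L
usable_data_bits = data_modules - ecc_bits
print(usable_data_bits)                        # 23648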
The character count lengths for a version 40 symbol are specified as follows:
Numeric: 14 bits.
Alphanumeric: 13 bits.
Byte: 16 bits.
Kanji: 12 bits.
Consider alphanumeric encoding. From the derived UsableDataBits figure of 23648 bits available we take 4 bits for the mode indicator and 13 bits for the character count, leaving just 23631 for the actual alphanumeric data (and the truncatable terminator and padding).
You quoted 4296 as the alphanumeric capacity of a version 40-L QR Code symbol. Now 4296 alphanumeric characters becomes exactly 2148 groups of two characters each converted to 11 bits, producing 23628 data bits which is just inside our symbol capacity. However 4297 characters would produce 2148 groups with one remainder character that would be encoded into 6 bits, which produces 23628 + 6 bits overall – exceeding the 23631 bits available. So 4296 characters is clearly the correct alphanumeric capacity of a type 40-L QR Code.
Similarly for numeric encoding we have 23648−4−14 = 23630 bits available. Your quoted 7089 is exactly 2363 groups of three characters each converted to 10 bits, producing 23630 bits – exactly filling the bits available. Clearly any further characters would not fit so we have found our limit.
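Putting the pieces together, a small self-contained Python sketch of this capacity search (the helper names are mine; the constants 23648, 13 and 14 are the version 40-L values derived above) reproduces both figures:

def alnum_bits(n):    # 2 chars -> 11 bits, remainder -> 6 bits
    return 11 * (n // 2) + 6 * (n % 2)

def numeric_bits(n):  # 3 digits -> 10 bits, remainders -> 4 or 7 bits
    return 10 * (n // 3) + (0, 4, 7)[n % 3]

def capacity(bits_for, usable_bits, count_len):
    """Largest n such that mode indicator + character count + data still fit."""
    n = 0
    while 4 + count_len + bits_for(n + 1) <= usable_bits:
        n += 1
    return n

print(capacity(alnum_bits, 23648, 13))    # 4296
print(capacity(numeric_bits, 23648, 14))  # 7089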
Caveat
Whilst the character capacity can be derived using the above procedure, in practice QR Code permits encoding the input using multiple modes within a single symbol, and a decent QR Code generator will switch between modes as often as necessary to optimise the overall data density. This makes the whole business of considering the capacity limits much less useful for open applications, since they only describe the pathological case.

Related

Mathematical precision at 19th decimal place and beyond

I have the same set of data and am running the same code, but sometimes I get different results at the 19th decimal place and beyond. Although this is not a great concern to me for numbers less than 0.0001, it makes me wonder whether the 19th decimal place is Raku's limit of precision.
Word 104 differ:
0.04948872986571077 19 chars
0.04948872986571079 19 chars
Word 105 differ:
0.004052062278212545 20 chars
0.0040520622782125445 21 chars
TL;DR See the doc's outstanding Numerics page.
(I had forgotten about that page before I wrote the following answer. Consider this answer at best a brief summary of a few aspects of that page.)
There are two aspects to this. Internal precision and printing precision.
100% internal precision until RAM is exhausted
Raku supports arbitrary precision number types. Quoting Wikipedia's relevant page:
digits of precision are limited only by the available memory of the host system
You can direct Raku to use one of its arbitrary precision types.[1] If you do so it will retain 100% precision until it runs out of RAM.
Arbitrary precision type: Int
Corresponding type checking[2]: my Int $foo ...
Example value of that type: 66174449004242214902112876935633591964790957800362273

Arbitrary precision type: FatRat
Corresponding type checking[2]: my FatRat $foo ...
Example value of that type: 66174449004242214902112876935633591964790957800362273 / 13234889800848443102075932929798260216894990083844716
Thus you can get arbitrary internal precision for integers and fractions (including arbitrary precision decimals).
Limited internal precision
If you do not direct Raku to use an arbitrary precision number type it will do its best but may ultimately switch to limited precision. For example, Raku will give up on 100% precision if a formula you use calculates a Rat and the number's denominator exceeds 64 bits.[1]
Raku's fallback limited-precision number type is Num:
On most platforms, [a Num is] an IEEE 754 64-bit floating point number, aka "double precision".
Quoting the Wikipedia page for that standard:
Floating point is used ... when a wider range is needed ... even if at the cost of precision.
The 53-bit significand precision gives from 15 to 17 significant decimal digits precision (2^−53 ≈ 1.11 × 10^−16).
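On typical platforms Python's float is also an IEEE 754 binary64, so a quick Python (not Raku) sketch illustrates the same limit:

import sys

# A binary64 "double" carries a 53-bit significand, which corresponds to
# roughly 15-17 significant decimal digits.
print(sys.float_info.dig)      # 15 -- decimal digits guaranteed to round-trip
print(sys.float_info.epsilon)  # 2.220446049250313e-16

# Past that precision, decimal arithmetic is only approximated:
print(0.1 + 0.2)               # 0.30000000000000004
print(0.1 + 0.2 == 0.3)        # False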
Printing precision
Separate from internal precision is stringification of numbers.
(It was at this stage that I remembered the doc page on Numerics linked at the start of this answer.)
Quoting Printing rationals:
Keep in mind that output routines like say or put ... may choose to display a Num as an Int or a Rat number. For a more definitive string to output, use the raku method or [for a rational number] .nude
Footnotes
[1] You control the type of a numeric expression via the types of individual numbers in the expression, and the types of the results of numeric operations, which in turn depend on the types of the numbers. Examples:
1 + 2 is 3, an Int, because both 1 and 2 are Ints, and a + b is an Int if both a and b are Ints;
1 / 2 is not an Int even though both 1 and 2 are individually Ints, but is instead 1/2 aka 0.5, a Rat.
1 + 4 / 2 will print out as 3, but the 3 is internally a Rat, not an Int, due to Numeric infectiousness.
[2] All that enforcement does is generate a run-time error if you try to assign or bind a value that is not of the numeric type you've specified as the variable's type constraint. Enforcement doesn't mean that Raku will convert numbers for you. You have to write your formulae to ensure the result you get is what you want.[1] You can use coercion -- but coercion cannot regain precision that's already been lost.

Representing decimal numbers in binary

How do I represent integer numbers, for example 23647, in two bytes, where one byte contains the last two digits (47) and the other contains the rest of the digits (236)?
There are several ways to do this.
One way is to use Binary Coded Decimal (BCD). This encodes the decimal digits individually, rather than the number as a whole, into binary. The packed form puts two decimal digits into a byte. However, your example value 23647 has five decimal digits and will not fit into two bytes in BCD; this method only fits values up to 9999.
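As a hedged illustration (the function name is made up for this sketch), packing two decimal digits per byte looks like this in Python, and shows why 23647 needs three BCD bytes rather than two:

def to_packed_bcd(value: int) -> bytes:
    """Pack a non-negative integer as packed BCD, two decimal digits per byte."""
    digits = str(value)
    if len(digits) % 2:                  # pad to an even number of digits
        digits = "0" + digits
    return bytes(int(digits[i]) << 4 | int(digits[i + 1])
                 for i in range(0, len(digits), 2))

print(to_packed_bcd(9999).hex())   # '9999'   -- the largest value that fits in 2 bytes
print(to_packed_bcd(23647).hex())  # '023647' -- five digits need 3 bytes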
Another way is to put each of your two parts in binary and place each part into a byte. You can do integer division by 100 to get the upper part, so in Python you could use
upperbyte = 23647 // 100  # 236
Then the lower part can be gotten by the modulus operation:
lowerbyte = 23647 % 100  # 47
Python will directly convert the results into binary and store them that way. You can do all this in one step in Python and many other languages:
upperbyte, lowerbyte = divmod(23647, 100)  # (236, 47)
You are guaranteed that the lowerbyte value fits, but if the given value is too large the upperbyte value may not actually fit into a byte. All this assumes that the value is positive, since negative values would complicate things.
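If it helps, here is a small sketch of packing the two parts into an actual two-byte value, with the range check mentioned above (the bytes() packing step is my addition, not part of the answer itself):

value = 23647
upperbyte, lowerbyte = divmod(value, 100)   # (236, 47)
if upperbyte > 255:
    raise ValueError("upper part does not fit into one byte")
packed = bytes([upperbyte, lowerbyte])      # b'\xec/'  (0xEC = 236, 0x2F = 47)
print(packed.hex())                         # 'ec2f'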
(This following answer was for a previous version of the question, which was to fit a floating-point number like 36.47 into two bytes, one byte for the integer part and another byte for the fractional part.)
One way to do that is to "shift" the number so you consider those two bytes to be a single integer.
Take your value (36.47), multiply it by 256 (the number of values that fit into one byte), round it to the nearest integer, convert that to binary. The bottom 8 bits of that value are the "decimal numbers" and the next 8 bits are the "integer value." If there are any other bits still remaining, your number was too large and there is an overflow condition.
This assumes you want to handle only non-negative values. Handling negatives complicates things somewhat. The final result is only an approximation to your starting value, but that is the best you can do.
Doing those calculations on 36.47 gives the binary integer
10010001111000
So the "decimal byte" is 01111000 and the "integer byte" is 100100 or 00100100 when filled out to 8 bits. This represents the float number 36.46875 exactly and your desired value 36.47 approximately.

How to do Division of two fixed point 64 bits variables in Synthesizable Verilog?

I'm implementing a math equation in Verilog, in a combinational scheme (assign = ...). So far the synthesis tool (Quartus II) has been able to add, subtract and multiply 32-bit unsigned numbers easily using the operators +, - and * respectively.
However, one of the final steps of the equation is to divide two 64-bit unsigned fixed-point variables. The reason for such a large 64-bit width is that I'm allocating 16 bits for the integer part and 48 bits for the fraction (although the computer does everything in binary and doesn't care about fractions, I would be able to separate the fraction from the integer at the end).
The problem is that the operator / is useless here, since it auto-invokes the LPM_divide library, whose output gives me only the integer part, disregarding the fraction, and in the wrong position (the least significant bit).
For example:
b1000111010000001_000000000000000000000000000000000000000000000000 / b1000111010000001_000000000000000000000000000000000000000000000000
should be 1, it gives me
b0000000000000000_000000000000000000000000000000000000000000000001
So, how can I implement this division in synthesizable Verilog? What methods or algorithms should I follow? I'd like it to be fast, maybe fully combinational.
I'd like to keep the 16 integer / 24 fraction bits from the user's point of view. Thanks in advance.
First, assume you multiply two fixed-point numbers.
Let's call them X and Y, the first containing Xf fractional bits and the second Yf fractional bits.
If you multiply those numbers as integers, the least significant Xf+Yf bits of the integer result can be treated as the fractional bits of the resulting fixed-point number (and you still multiply them as plain integers).
Similarly, if you divide a number with Sf fractional bits by a number with Df fractional bits, the resulting integer can be treated as a fixed-point number having Sf−Df fractional bits; hence your example, where the resulting integer is 1.
Thus, if you need 48 fractional bits from your division of one 16.48 number by another 16.48 number, append another 48 zeroed fractional bits to the dividend, then divide the resulting 64+48 = 112-bit number by the other 64-bit number, treating both as integers (and using LPM_divide). The least significant 48 bits of the result will then be what you need: the resulting fixed-point number's 48 fractional bits.
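The bit arithmetic is easy to model outside the HDL; here is a Python sketch of the same scaling idea (a model for checking values, not synthesizable code; fixed_div is a made-up helper name):

FRAC = 48                                # fractional bits of the 16.48 format

def fixed_div(x: int, y: int) -> int:
    """Divide two 16.48 fixed-point values stored as plain integers.
    Pre-shifting the dividend by FRAC keeps 48 fractional bits in the quotient."""
    return (x << FRAC) // y              # 112-bit dividend / 64-bit divisor

one = 1 << FRAC                          # 1.0 in 16.48
x = 0b1000111010000001 << FRAC           # the example value from the question
print(fixed_div(x, x) == one)            # True: x / x == 1.0
print(fixed_div(one, 2 * one) == one >> 1)  # True: 1.0 / 2.0 == 0.5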

What's the significance of the bit group size in base64 or base32 encoding (RFC 4648)?

Why would they choose to use a 24-bit or 40-bit (that's really odd) bit group/word size for base 64 and base 32 respectively?
Specifically, can someone explain why the least common multiple is significant?
lcm(log2(64), 8) = 24
lcm(log2(32), 8) = 40
Base 64 encoding basically involves taking a stream of 8-bit bytes and transforming it to a stream of 6-bit characters that can be represented by printable ASCII characters.
Taking a single byte at a time means you have one 6-bit character with 2 bits left over.
Taking two bytes (16 bits) means you have two 6-bit characters with 4 bits left over.
Taking three bytes (24 bits) means you have 24 bits that split exactly into four 6-bit characters with no bits left over.
So the lcm of the byte size and the character size is naturally the group size you need to split your input into.
6-bit characters are chosen because this is the largest size for which all values can be represented by printable ASCII characters; if you went up to 7 bits you would need non-printing characters.
The argument for base 32 is similar, but now you are using 5-bit characters, so the lcm of 8 and 5 is the word size. This character size allows case-insensitive printable characters, whereas 6-bit characters require differentiating between upper and lower case.
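A quick Python check of both the lcm arithmetic and the resulting group sizes (math.lcm needs Python 3.9+):

import base64
from math import lcm

print(lcm(6, 8), lcm(5, 8))          # 24 40 -- bits per input group

print(base64.b64encode(b"abc"))      # b'YWJj'     3 bytes -> 4 chars, no padding
print(base64.b64encode(b"ab"))       # b'YWI='     2 bytes -> padding needed
print(base64.b32encode(b"abcde"))    # b'MFRGGZDF' 5 bytes -> 8 chars, no padding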

Y = base64(X) where X is integer - is Y alphanumeric?

Additional details:
X is any positive integer 6 digits or less.
X is left-padded with zeros to maintain a width of 6.
Please explain your answer :)
(This might be better in the Math site, but figured it involves programming functions)
The diagram from the German Wikipedia article on Base64 is very helpful.
You see that 6 consecutive bits from the original bytes generate a Base64 value. To generate + or / (codes 62 and 63), you'd need the bitstrings 111110 and 111111, so at least 5 consecutive bits set.
However, look at the ASCII codes for 0...9:
00110000
00110001
00110010
00110011
00110100
00110101
00110110
00110111
00111000
00111001
No matter how you concatenate six of those, there won't be more than 3 consecutive bits set. So it's not possible to generate a Base64 string that contains + or / this way, Y will always be alphanumeric.
EDIT: In fact, you can even rule other Base64 values out like 000010 (C), so this leads to nice follow-up questions/puzzles like "How many of the 64 values are possible at all?".
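Since the input space is tiny (10^6 strings), the claim is also easy to verify exhaustively; here is a Python sketch of my own check, not part of the answer above:

import base64

seen = set()
for n in range(1_000_000):
    encoded = base64.b64encode(f"{n:06d}".encode("ascii")).decode("ascii")
    seen.update(encoded)

print("+" in seen or "/" in seen)   # False -- Y is always alphanumeric
print(len(seen))                    # also answers "how many of the 64 values occur?"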
