Reed Solomon error correction for QR-Code - qr-code

As a QR-Code uses Reed-Solomon for error correction, am I right, that with certain levels of corruption, a QR-Code reader could theoretically return wrong results?
If yes, are there other levels of integrity checks (checksums etc.) that would prevent that?

You can do a web search for "QR Code ISO", to find a pdf version of the document. I found one here:
https://www.swisseduc.ch/informatik/theoretische_informatik/qr_codes/docs/qr_standard.pdf
There are multiple strengths of error correction in the standard, and to avoid mis-correction, in some cases, some of the "parity" bytes are only used for error detection, and not for error correction. This is shown in table 13 in the pdf file linked to above. The ones marked with a "b" are cases where some of the parity bytes are used only for error detection. For example, the very first entry in table 13 shows (26,19,2)b, which means 26 total bytes, 19 data bytes, and 2 byte correction, which means of the 26-19 = 7 parity bytes, 4 are used for correction (each corrected byte requires 2 parity bytes unless hardware can flag "erasures"), and 3 are used for detection only.
If the error correction calculates an invalid location (one that is "outside" the range of valid locations), that will be flagged as a detected error. If the number of unique calculated locations is less than than the number of assumed errors used to calculate those locations (duplicate or non-existing root) that will be flagged as a detected error. For higher levels of error correction, the odds of all the calculated locations being valid for bad data is so small that none of the parity bytes are used for error detection only. These cases don't have the "b" in their table 13 entries.
The choices made for the various levels of error correction result in a very small chance of a bad result, but it's always possible.
are there other levels of integrity checks (checksums etc.) that would prevent that?
A QR-Code reader could flags bytes where any of the bits were not clearly 0 or 1 (like a shade of grey on a black / white code) as potential "erasures", which would lower the odds of a bad result. I don't know if this is done.
When generating a QR-Code, a mask is chosen to even out the ratio of light and dark areas in a code, and after correction, if there is evidence that the wrong mask was chosen, that could be flagged as a detected error, but I'm not sure if the "best" mask is always chosen when a code is printed, so I don't know if a check for "best" mask is used.

Related

What’s the significance of the number “536870916” in computer science

I’ve noticed in many games there are a lot of errors regarding the number 536870916. For example, in one game that’s coded in Lua, the maximum number you can damage an enemy is 536870916, which is undocumented. I noticed other errors regarding this number when I googled it, for example:
“Random crash "Failed to allocate 536870916 bytes and will now terminate"”
Does anyone happen to know why this is?
There's nothing all that special about 536870916. It just happens to be very close to a power of 2: 229 = 536870912.
536870912 bytes is 512MiB, or 0.5GiB. It's a reasonable memory limit to configure for an application, so numbers going slightly above it are bound to appear in crash reports.
If you search numbers 536870912-536870916 on Google you'll see a diminishing number of results:
536870912: 47,500,000 results
536870913: 7,920,000 results
536870914: 36,300 results
536870915: 7,720 results
536870916: 8,380 results
Another source where you might see 536870916 is when numbers are used as bit sets to store flags. Sometimes error codes are stored like this. In binary, 536870916 has only 2 bits set, which makes it a union of two flags.

1's complement checksum even bit errors

I am trying to wrap my head around 1's complement checksum error detection as is used in UDP.
My understanding with simplified example for an UDP-like 1's complement checksum error checking algorithm operating on 8 bit words (I know UDP uses 16 bit words):
Sum all 8 bit words of data, carry the MSB rollover to the LSB.
Take 1's complement of this sum, set checksum, send datagram
Receiver adds with carry rollover all received 8 bit words of data in the incoming datagram, adds checksum.
If sum = 0xFF, no errors. Else, error occurred, throw away packet.
It is obvious that this algorithm can detect 1 bit errors and by extension any odd-numbered bit errors. If just one bit in an 8-bit data word is corrupted, the sum + checksum will never equal 0xFF. A plain and simple example would be A = 00000000, B = 00000001, then ~(A + B) = 11111110. If A(receiver) = 00000001, B(reciever) = 00000001, the sum + checksum would be 0x00 != 0xFF
My question is:
It's not too clear to me if this can detect 2 bit errors. My intuition says no, and a simple example is taking A = 00000001, B = 00000000, then sum + checksum would be 0xFF, but there are two total errors in A and B from sender to receiver. If the 2 bit error occurred in the same word, theres a chance it could be detected, but it doesn't seem guaranteed.
How robust is UDP error checking? Does it work for even numbers of bit errors?
Some even-bit changes can be detected, some can't.
Any error that changes the sum will be detected. So a 2-bit error that changes the sum will be detected, but a 2-bit error that does not change the sum will not be detected.
A 2-bit error in a single word (single byte in your simplified example) will change the value of that word, which will change the sum, and therefore will always be detected. Most 2-bit errors across different words will be detected, but a 2-bit error that changes the same bit in different directions (one 0->1, the other 1->0) in different words will not change the sum -- the change in value created by one of the changed bits will be cancelled out by the equal-but-opposite change in value created by the other changed bit -- and therefore that error will not be detected.
Because this checksum is simply an addition, it will also fail to detect the insertion or removal of words whose arithmetic value is zero (and since this is a one's complement computation, that means words whose content is all 0s or all 1s).
It will also fail to detect transpositions of words, (because a+b gives the same sum as b+a), or more generally it will fail to detect errors that add the same amount to one word as they subtract from the other (because a+b gives the same sum as (a+n)+(b-n), e.g. 3+3=4+2=5+1). You could consider the transposition and cancelling-error cases to be made up of multiple pairs of same-bit changes.

LDPC: Hard_decision decoding algorithm

I am working with Bit-flipping decoding [Hard_decision] for one bit flip.
I have followed for below H_matrix :"For the bit-flipping algorithm the messages passed along the Tanner graph edges are also binary: a bit node sends a message declaring if it is a one or a zero, and each check node sends a message to each connected bit node, declaring what value the bit is based on the information available to the check node.
The check node determines that its parity-check equation is satisfied if the modulo-2 sum of the incoming bit values is zero. If the majority of the messages received by a bit node are different from its received value the bit node changes (flips) its current value. This process is repeated until all of the parity-check equations are satisfied.
Correct codeword = [10010101]
H matrix[4][8] = {{0,1,0,1,1,0,0,1},{1,1,1,0,0,1,0,0},{0,0,1,0,0,1,1,1}, {1,0,0,1,1,0,1,0}};
ReceivedCodeWord[8]={1,0,1,1,0,1,0,1}; //Error codeword
I need to get [10010101] but instead i am getting [10010001] for ReceivedCodeWord[8] = {1,0,1,1,0,1,0,1}.
But for other possible ReceivedCodeWord i am getting correct.
e.g. ReceivedCodeWord [00010101] i am getting correct codeword [10010101]
ReceivedCodeWord [11010101] i am getting correct codeword [10010101].
Doubt: why for ReceivedCodeWord {1,0,1,1,0,1,0,1} i am getting [10010001], its totally wrong. Please explain me.
Here's a link
Thanks
This is caused by the liner block code you used. Firstly, this code isn't a LDPC code, since the H martix doesn't satisfy the row-column constraint (e.g., the 3-th colmun and the 6-th column have 1s in two same poistions).
(1) When {1,0,1,1,0,1,0,1} is received , the check sums are {0,1,1,0}. If you use multiple bit flipping decoding algorithm, which means more than one bit can be flipped during one iteration, the 3-th variable node and 6-th variable node will be flipped together since both of them connect two unsatisfied check nodes, then the result will be {1,0,0,1,0,0,0,1} after first iteration. During the second iteration, the check sums are still {0,1,1,0}, so the 3-th variable node and 6-th variable node will be flipped again, this process will be recurrent in later iterations.
(2) The minimum Hamming distance of this code is 2, so this code can dectect 1 erros and correct 0.5 errors. Even you use single bit flipping decoding algorithm, it will has 50% probability to decode the received codeword {1,0,1,1,0,1,0,1} as another vaild codeword {1,0,1,1,0,0,0,1}.

What is the name for encoding/encrypting with noise padding?

I want code to render n bits with n + x bits, non-sequentially. I'd Google it but my Google-fu isn't working because I don't know the term for it.
For example, the input value in the first column (2 bits) might be encoded as any of the output values in the comma-delimited second column (4 bits) below:
0 1,2,7,9
1 3,8,12,13
2 0,4,6,11
3 5,10,14,15
My goal is to take a list of integer IDs, and transform them in a way they can still be used for persistent URLs, but that can't be iterated/enumerated sequentially, and where a client cannot determine programmatically if a URL in a search result set has been visited previously without visiting it again.
I would term this process "encoding". You'll see something similar done to permit the use of communications channels that have special symbols that are not permitted in data. Examples: uuencoding and base64 encoding.
That said, you still need to (and appear at first blush to have) ensure that there is only one correct de-code; and accept the increase in size of the output (in the case above, the output will be double the size, bit-for-bit as the input).
I think you'd be better off encrypting the number with a cheap cypher + a constant secret key stored on your server(s), adding a random character or four at the end, and a cheap checksum, and simply reject any responses that don't have a valid checksum.
<encrypt(secret)>
<integer>+<random nonsense>
</encrypt>
+
<checksum()>
<integer>+<random nonsense>
</checksum>
Then decrypt the first part (remember, cheap == fast), validate the ciphertext using the checksum, throw off the random nonsense, and use the integer you stored.
There are probably some cryptographic no-no's here, but let's face it, the cost of this algorithm being broken is a touch on the low side.

error correction code upper bound

If I want to send a d-bit packet and add another r bits for error correction code (d>r)
how many errors I can find and correct at most?
You have 2^d different kinds of packets of length d bits you want to send. Adding your r bits to them makes them into codewords of length d+r, so now you have 2^d possible codewords you could send. The receiver could get 2^(d+r) different received words(codewords with possible errors). The question then becomes, how do you map those 2^(d+r) received words to the 2^d codewords?
This comes down to the minimum distance of the code. That is, for each pair of codewords, find the number of bits where they differ, then take the smallest of those values.
Let's say you had a minimum distance of 3. You received a word and you notice that it isn't one of the codewords. That is, there's an error. So, for the lack of a better decoding algorithm, you flip the first bit, and see if its a codeword. If it isn't you flip it back and flip the next one. Eventually, you get a codeword. Since all codewords differ in 3 positions, you know this codeword is the "closest" to the received word, since you would have to flip 2 bits in the received word to get to another codeword. If you didn't get a codeword from flipping just one bit at a time, you can't figure out where the errors are, since there are multiple codewords you could get to by flipping two bits, but you know there are at least two errors.
This leads to the general principle that for a minimum distance md, you can detect md-1 errors and correct floor((md-1)/2) errors. Calculating the minimum distance depends on the details of how you generate the codewords, otherwise known as the code. There are various bounds you can use to figure out an upper limit on md based on d and (d+r).
Paul mentioned the Hamming Code, which is a good example. It achieves the Hamming bound. For the (7,4) Hamming code, you have 4 bit messages and 7 bit codewords, and you achieve a minimum distance of 3. Obviously*, you are never going to get a minimum distance greater than the number of bits you are adding so this is the very best you can do. Don't get too used to this though. The Hamming code is one of the few examples of a non-trivial perfect code, and most of those have a minimum distance that is less than the number of bits you add.
*It's not really obvious, but I'm pretty sure it's true for non-trivial error correcting codes. Adding one parity bit gets you a minimum distance of two, allowing you to detect an error. The code consisting of {000,111} gets you a minimum distance of 3 by adding just 2 bits, but it's trivial.
You should probably read the wikipedia page on this:
http://en.wikipedia.org/wiki/Error_detection_and_correction
It sounds like you specifically want a Hamming Code:
http://en.wikipedia.org/wiki/Hamming_code#General_algorithm
Using that scheme, you can look up some example values from the linked table.

Resources