Reed Solomon Decoding - Error Correction - Syndromes Calculation - qr-code

I am implementing Reed-Solomon decoding for QR codes in C++. I have implemented the main part of decoding and error detection so far, following ISO/IEC 18004:2006.
As described in Annex B (Error correction decoding steps), the syndromes S(i)
are calculated as S(i) = R(a^i). Assume the High error correction level, so we have 9 data codewords and 17 error correction codewords, for a total of 26 codewords in a version 1 QR code. So I assume that the polynomial R(x) shown on page 76 of ISO/IEC 18004:2006 is the sequence of
data codewords and error correction codewords, each attached to the corresponding power of x (j = 0...25). So S(i) = R(a^i), where i = 0...15 for the High error correction level. But when I run my code on a QR code matrix with no errors, where I expect all syndromes to be zero, I get non-zero syndromes. Have I misunderstood something about syndrome calculation under Galois field arithmetic in Reed-Solomon decoding?

After looking at QR code references: for version 1, level H, with 9 data bytes and 17 error correction bytes, using the generator polynomial g(x) = (x-1)(x-a)(x-a^2)...(x-a^16), you should be using syndromes S(i) = R(a^i) for i = 0 to 16. In the no-error case, all 17 syndromes should be zero.
There's a decent wiki article for Reed Solomon error correction:
http://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_correction
The wiki article contains a link to a Nasa tech brief RSECC tutorial:
http://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/19900019023.pdf
Below is a link to C source code for a console program that demonstrates RSECC methods for an 8-bit field (the user chooses from 29 possible fields). I use Microsoft compilers or Visual Studio to compile it and Windows to run it, but it should work on most systems.
Note - I updated the ecc demo program to handle erasures in addition to errors, in case it could be useful. I also added code to calculate the error value polynomial Omega in case the Euclid method is not used. The link is the same as before:
http://rcgldr.net/misc/eccdemo8.zip
Update based on the questions in comments:
My question about which GF(2^8) is used:
GF(2^8) is based on the 9-bit polynomial
x^8 + x^4 + x^3 + x^2 + 1 = hex 11d
The primitive element is x + 0 (hex 2).
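To make the syndrome check concrete, here is a minimal C++ sketch of syndrome calculation over this field. It is my own addition, not from the original answer; the helper names gf_mul and compute_syndromes are assumptions for the example, using the hex 11d field polynomial and primitive element 2 described above.

#include <cstdint>
#include <vector>

// Multiply two elements of GF(2^8), reducing by x^8 + x^4 + x^3 + x^2 + 1 (hex 11d).
static uint8_t gf_mul(uint8_t a, uint8_t b) {
    uint16_t product = 0, aa = a;
    while (b) {
        if (b & 1) product ^= aa;
        aa <<= 1;
        if (aa & 0x100) aa ^= 0x11d;   // reduce modulo the field polynomial
        b >>= 1;
    }
    return (uint8_t)product;
}

// S(i) = R(a^i), i = 0 .. necc-1, where r holds the received codewords with
// the highest power of x first (Horner evaluation).
static std::vector<uint8_t> compute_syndromes(const std::vector<uint8_t>& r, int necc) {
    std::vector<uint8_t> s(necc);
    uint8_t alpha_i = 1;                         // a^0
    for (int i = 0; i < necc; ++i) {
        uint8_t sum = 0;
        for (uint8_t c : r)
            sum = gf_mul(sum, alpha_i) ^ c;      // sum = sum * a^i + c
        s[i] = sum;
        alpha_i = gf_mul(alpha_i, 2);            // advance to a^(i+1)
    }
    return s;
}

For a version 1, level H codeword of 26 bytes (data codewords first, then error correction codewords, highest power of x first), necc would be 17, and all 17 syndromes should come back zero for an error-free read.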
Looking up QR code references, different generator polynomials are used depending on the correction level: L (low), M (medium), Q (quality), H (high).
Question about decoding using matrices: the Sklar paper shows decoding using linear equations and matrix inversion. This procedure has to assume a maximum error case t, which will be floor(e / 2), where e is the number of error correction bytes (also called parity bytes or redundant bytes). If the determinant is zero, then try t-1 errors; if that's zero, try t-2 errors, and so on, until the determinant is non-zero or t is reduced to zero.
The Euclid or Berlekamp Massey decoding methods will automatically determine the number of errors.
In all cases, if there are more than t errors, there's some chance that a mis-correction will occur, depending on the odds of producing t locations none of which are out of range. If any of the t locations found by error correction is out of range, then an uncorrectable error has been detected.
Update #2
I did a quick overview of the ISO document.
The generator polynomial is (x - 1)(x - 2)(x - 2^2)..., so the syndromes to check are S(0) to S(n-1), as you mentioned before; in the case of zero errors, all syndromes S(0) to S(n-1) should be zero.
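For reference, here is a sketch (my own, not from the answer) of building that generator polynomial by multiplying out the (x - a^i) factors in GF(2^8). Addition and subtraction are both XOR in this field, so (x - a^i) and (x + a^i) are the same; gf_mul is the same helper as in the earlier sketch, repeated so this one stands alone.

#include <cstdint>
#include <vector>

// GF(2^8) multiply, reducing by the 9-bit polynomial hex 11d.
static uint8_t gf_mul(uint8_t a, uint8_t b) {
    uint16_t product = 0, aa = a;
    while (b) {
        if (b & 1) product ^= aa;
        aa <<= 1;
        if (aa & 0x100) aa ^= 0x11d;
        b >>= 1;
    }
    return (uint8_t)product;
}

// Multiply out g(x) = (x - 1)(x - 2)(x - 2^2)... for necc factors.
// g[0] is the coefficient of the highest power of x.
static std::vector<uint8_t> make_generator(int necc) {
    std::vector<uint8_t> g{1};                     // start with g(x) = 1
    uint8_t root = 1;                              // a^0
    for (int i = 0; i < necc; ++i) {
        std::vector<uint8_t> next(g.size() + 1, 0);
        for (size_t j = 0; j < g.size(); ++j) {
            next[j] ^= g[j];                       // g(x) * x
            next[j + 1] ^= gf_mul(g[j], root);     // g(x) * a^i (minus = XOR)
        }
        g = next;
        root = gf_mul(root, 2);                    // next root a^(i+1)
    }
    return g;
}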
The ISO document uses the term codewords to refer to bytes (or symbols), but in most ecc articles, the term codeword refers to an array of bytes including data and error correction bytes, and the error correction bytes are often called parity bytes, redundant bytes or remainder bytes. So keep this in mind if reading other ecc articles.
Page 37 of the ISO document mentions "erasures" and "errors", which is RSECC terminology. "Erasures" refer to bad (or potentially bad) data bytes at known locations, detected outside of RSECC. "Errors" refer to bad bytes not detected outside of RSECC, and only determined during RSECC decoding. The document then notes that there are no invalid data bit patterns, which would imply that there is no "erasure" detection. It then adds to the confusion by showing an equation that includes erasure and error counts.
If you're curious, the Nasa pdf file on RSECC explains erasure handling starting at page 86, but I don't think this applies to QR codes.
http://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/19900019023.pdf
Getting back to the ISO document, it uses p to denote the number of error correction bytes used for misdecode protection, as opposed to being used for correction. This is shown in table 9 on page 38. For version 1, which seems to be what you're using, reinterpreting:
level   data bytes   ecc bytes used    ecc bytes used for           correction
                     for correction    misdecode protection (p)     capability
  L         19             4                     3                  2/26 ~  7.69%
  M         16             8                     2                  4/26 ~ 15.38%
  Q         13            12                     1                  6/26 ~ 23.08%
  H          9            16                     1                  8/26 ~ 30.77%
Given that the table shows the expected correction capability is met without the use of erasures, even if erasures could be detected, they are not needed.
With GF(2^8), there are 255 (not 256) possible error locations that can be generated by RSECC decoding, but in version 1, there are only 26 valid locations. Any generated location outside of the 26 valid locations would be a detection of an uncorrectable error. So for level L, the 3 p bytes translate into odds of miscorrection of 1/(2^24), and the location range multiplies this by (26/255)^2, for a probability of ~6.20E-10. For level H, the 1 p byte translates into odds of miscorrection of 1/(2^8), and the location range multiplies this by (26/255)^8, for a probability of ~4.56E-11.
Note that for version 2, p = 0 for levels M, Q, and H, relying on the location range (44/255)^(8, 11, or 14) for miscorrection probabilities of 7.87E-7, 4.04E-9, and 2.07E-11.
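These probabilities are straightforward to recompute; here is a quick sketch (my own, not part of the original answer) that reproduces the figures above:

#include <cmath>
#include <cstdio>

int main() {
    // Version 1: p bytes of misdecode protection plus location-range filtering.
    printf("v1 L: %.2e\n", pow(2.0, -24) * pow(26.0 / 255, 2));  // ~6.20e-10
    printf("v1 H: %.2e\n", pow(2.0, -8)  * pow(26.0 / 255, 8));  // ~4.56e-11
    // Version 2: p = 0, so only the location range (44 of 255) filters miscorrections.
    printf("v2 M: %.2e\n", pow(44.0 / 255, 8));                  // ~7.87e-7
    printf("v2 Q: %.2e\n", pow(44.0 / 255, 11));                 // ~4.04e-9
    printf("v2 H: %.2e\n", pow(44.0 / 255, 14));                 // ~2.07e-11
    return 0;
}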

Related

Learning Cyclical Redundancy Check Failure Cases

I am trying to understand how likely a Cyclic Redundancy Check is to fail given a particular divisor P(x). I am specifically interested in failures resulting from an odd number of bit flips in my message; an example follows.
Some prerequisite info:
CRC is a very commonly used way to detect errors in computer networks.
P(x), G(x), R(x), and T(x) are all polynomials under binary field arithmetic (i.e., all coefficients are mod 2: 0 or 1).
P(x) is the polynomial that we are given and that we will divide by.
G(x) is the message that we want to send.
R(x) is the remainder of G(x)/P(x), or just G(x) mod P(x).
T(x) is our sent data, (x^k)G(x) + R(x), where k is the degree of P(x).
E(x) is an error pattern. It is XORed with T(x) to get T'(x).
T'(x) is our received data, potentially with errors.
When T'(x) is received, if T'(x) mod P(x) = 0, then it is said to be error-free. It may not actually be error-free.
Proof:
Assume E(x) has an odd number of terms and has x + 1 as a factor.
Then E(x) = (x + 1)Q(x) for some Q(x).
Evaluate E(x) at x = 1: E(1) = 1, since there is an odd number of terms.
But (x + 1)Q(x) at x = 1 is (1 + 1)Q(1) = 0.
Therefore, E(x) with an odd number of terms cannot have x + 1 as a factor.
Example:
Say my P(x) = x^7 + 1 = 10000001
Let G(x) = x^7 + x^6 + x^3 + x + 1 = 11001011
So T(x) = 110010111001010
When E(x) = 011111011111111
T'(x) = E(x) XOR T(x) = 101101100110101
T'(x) mod P(x) = 0, a failure.
I simulated the results on a particular message (T(x)), namely 11001011, and found the CRC to fail on 42 of the 16384 possible odd-parity bit-flip patterns that I attempted. Failure means that T'(x) mod P(x) = 0.
I expected odd-parity bit errors to be caught, based on the above proof.
Is the proof wrong, or am I doing my example calculation wrong?
What I really want to know is: given P(x) = x^7 + 1, what are the odds that any general message with an odd number of bit flips will be erroneous but not be caught as being erroneous?
Sorry this is so long-winded, but I just want to make sure everything is super clear.
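No answer is included in this excerpt, but the simulation is easy to reproduce mechanically. Since T'(x) mod P(x) = (T(x) mod P(x)) XOR (E(x) mod P(x)), and assuming the transmitted T(x) is a multiple of P(x) (the usual CRC construction), an undetected error requires E(x) mod P(x) = 0. Here is a small brute-force sketch (my own, not from the post) that counts odd-weight 15-bit error patterns divisible by P(x) = x^7 + 1; by the proof above, the count should come out to zero, so a nonzero count would point to a bug in the simulation rather than in the proof.

#include <cstdio>

// Remainder of a 15-bit error polynomial divided by P(x) = x^7 + 1,
// with bit i of 'e' holding the coefficient of x^i, arithmetic mod 2.
static unsigned mod_p(unsigned e) {
    const unsigned p = 0x81;                  // binary 10000001 = x^7 + 1
    for (int shift = 7; shift >= 0; --shift)
        if (e & (1u << (shift + 7)))          // leading term x^(shift+7) present
            e ^= p << shift;                  // subtract (XOR) x^shift * P(x)
    return e;                                 // remainder has degree < 7
}

static int odd_weight(unsigned x) {           // 1 if x has an odd number of set bits
    int parity = 0;
    while (x) { parity ^= 1; x &= x - 1; }
    return parity;
}

int main() {
    int undetected = 0, patterns = 0;
    for (unsigned e = 1; e < (1u << 15); ++e) {
        if (!odd_weight(e)) continue;         // odd number of bit flips only
        ++patterns;
        if (mod_p(e) == 0) ++undetected;      // divisible by P(x): CRC would pass
    }
    printf("%d undetected of %d odd-weight patterns\n", undetected, patterns);
    return 0;
}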

Single-bit Error Detection through CRC(Cyclic Redundancy Check)

I was going through some problems related to single-bit error detection with CRC generators, and was trying to analyse which generators detect single-bit errors and which don't.
Suppose I have the CRC generator polynomial x^4 + x^2. Now I want to know whether it guarantees the detection of a single-bit error or not.
According to references 1 and 2, I am concluding some points:
1) If k = 1, 2, 3 for error polynomial x^k, then the remainders will be x, x^2, x^3 respectively in the case of polynomial division by the generator polynomial x^4 + x^2. According to the references, if the generator has more than one term and the coefficient of x^0 is 1, then all single-bit errors can be caught. But it does not say that if the coefficient of x^0 is not 1, then a single-bit error can't be detected. It says: "In a cyclic code, those e(x) errors that are divisible by g(x) are not caught."
2) I have to check the remainder of E(x)/g(x), where E(x) (suppose it is x^k, with k = 1, 2, 3, ...) is the error polynomial and g(x) is the generator polynomial. If the remainder is zero, then I can't detect the error, and when it is non-zero, I can detect it.
So, according to me, the generator polynomial x^4 + x^2 guarantees the detection of a single-bit error, based on the above 2 points. Please confirm whether I am right or not.
if the coefficient of x^0 is not 1, then a single-bit error can't be detected?
If the coefficient of x^0 is not 1, it is the same as shifting the CRC polynomial left by 1 (or more) bits (multiplying by some power of x). Shifting a CRC polynomial left by 1 or more bits won't affect its ability to detect errors; it just appends 1 or more zero bits to the end of codewords.
generator polynomial x^4 + x^2 guarantees the detection of a single-bit error
Correct. x^4 + x^2 is x^2 + 1 shifted left two bits: x^4 + x^2 = (x^2)(x^2 + 1) = (x^2)(x + 1)(x + 1), and since x^2 + 1 can detect any single-bit error, so can x^4 + x^2. Also, with the (x + 1) factors (two of them), it adds an even parity check and can detect any odd number of bit errors.
In general, all CRC polynomials can detect a single-bit error regardless of message length. All CRC polynomials have a "cyclic" period: if you use the CRC polynomial as the basis for a linear feedback shift register, and the initial value is 000...0001, then after some fixed number of cycles, it will cycle back to 000...0001. The simplest failure for a CRC is a 2-bit error where the 2 bits are separated by a distance equal to the cyclic period. Say the period is 255 for an 8-bit CRC (9-bit polynomial); then a 2-bit error, one at bit[0] and one at bit[255], will result in a CRC of 0 and fail to be detected. This can't happen with a single-bit error: it just continues to go through the cycles, none of which include the value 0. If the period is n cycles, then no 2-bit error can go undetected if the number of bits in the message + CRC is <= n. All CRC polynomials that are a product of any polynomial times (x + 1) can detect any odd number of bit errors (since x + 1 essentially adds an even parity check).
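To illustrate the cyclic period, here is a small sketch (my own, not from the answer) that steps a Galois-configuration LFSR for a 9-bit polynomial and counts cycles until the state returns to 1. With the hex 11d polynomial from the QR code answer above, which is primitive, it should report a period of 255.

#include <cstdio>

int main() {
    const unsigned poly = 0x1d;   // low 8 bits of the 9-bit polynomial hex 11d
    unsigned state = 1;           // start at 000...0001
    unsigned period = 0;
    do {
        // Galois-configuration LFSR step: shift left, XOR in the
        // polynomial when the bit shifted out is 1.
        unsigned msb = state & 0x80;
        state = (state << 1) & 0xff;
        if (msb) state ^= poly;
        ++period;
    } while (state != 1);
    printf("period = %u\n", period);   // 255 for a primitive 9-bit polynomial
    return 0;
}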
Shifting a CRC polynomial left by z bits means that every codeword will have z trailing zero bits. There are cases where this is done. Say you have a fast 32-bit CRC algorithm. To use that algorithm for a 16-bit CRC, the 17-bit CRC polynomial is shifted left 16 bits so that the least significant non-zero term is x^16. After computing with the 32-bit CRC algorithm, the 32-bit CRC is shifted right 16 bits to produce the 16-bit CRC.

Data link layer - CRC what does divide by 1 + x mean?

Can someone please explain what this part about CRC codes from Tanenbaum's Computer Networks means?
If there has been a single-bit error, E(x) = x^i, where i determines which bit is in error. If G(x) contains two or more terms, it will never divide into E(x), so all single-bit errors will be detected.
And
If there have been two isolated single-bit errors, E(x) = x^i + x^j, where i > j. Alternatively, this can be written as E(x) = x^j (x^(i-j) + 1). If we assume that G(x) is not divisible by x, a sufficient condition for all double errors to be detected is that G(x) does not divide x^k + 1 for any k up to the maximum value of i - j (i.e., up to the maximum frame length). Simple, low-degree polynomials that give protection to long frames are known. For example, x^15 + x^14 + 1 will not divide x^k + 1 for any value of k below 32,768.
Please explain in simple terms so I can understand it a bit better. Examples are appreciated. Thanks in advance!
A message is a sequence of bits. You can convert any sequence of bits into a polynomial by making each bit the coefficient of 1, x, x^2, etc., starting with the first bit. So 100101 becomes 1 + x^3 + x^5.
You can make these polynomials useful by considering their coefficients to be members of the simplest finite field, GF(2), which consists only of the elements 0 and 1. In GF(2), addition is the exclusive-or operation and multiplication is the and operation.
Now you can do all the things you did with polynomials in high school, but with the coefficients over GF(2). So 1 + x added to x + x^2 becomes 1 + x^2. 1 + x times 1 + x also becomes 1 + x^2. (Work it out.)
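As a quick check of that arithmetic, here is a tiny sketch (my own addition) that treats bit masks as polynomials over GF(2), with XOR as addition and carry-less shift-and-XOR as multiplication:

#include <cstdio>

// Carry-less (GF(2)) polynomial multiplication of two bit masks,
// where bit i is the coefficient of x^i.
static unsigned gf2_poly_mul(unsigned a, unsigned b) {
    unsigned product = 0;
    for (int i = 0; b >> i; ++i)
        if ((b >> i) & 1) product ^= a << i;
    return product;
}

int main() {
    unsigned p = 0b011;                            // 1 + x
    unsigned q = 0b110;                            // x + x^2
    printf("sum:     %#x\n", p ^ q);               // 0b101 = 1 + x^2
    printf("product: %#x\n", gf2_poly_mul(p, p));  // (1 + x)^2 = 0b101 = 1 + x^2
    return 0;
}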
Cyclic Redundancy Checks (CRCs) are derived from this approach to binary message arithmetic: a message converted to a polynomial is divided by a special constant polynomial whose degree is the number of bits in the CRC. The coefficients of the remainder of that polynomial division are the CRC of that message.
Read Ross Williams's CRC tutorial for more. (Real CRCs are not just that remainder, but you'll see.)
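To make the division concrete, here is a short sketch (not from the answer) that computes the remainder of a message divided by a generator polynomial over GF(2) one bit at a time. This is the raw remainder a simple CRC is built from (no reflection, no initial or final XOR); the CRC-16 polynomial x^16 + x^15 + x^2 + 1 is used purely as an example.

#include <cstdint>
#include <cstdio>

// Bit-at-a-time polynomial division over GF(2).
// 'poly' is the generator with its top x^n term dropped
// (e.g. 0x8005 for x^16 + x^15 + x^2 + 1, n = 16).
// Returns msg(x) * x^n mod generator -- the raw CRC remainder.
static uint32_t crc_remainder(const uint8_t* data, int len,
                              uint32_t poly, int n) {
    uint32_t rem = 0;
    const uint32_t mask = (n == 32) ? 0xffffffffu : ((1u << n) - 1);
    for (int i = 0; i < len; ++i) {
        for (int b = 7; b >= 0; --b) {
            uint32_t bit = (data[i] >> b) & 1;          // next message bit
            uint32_t feedback = ((rem >> (n - 1)) & 1) ^ bit;
            rem = (rem << 1) & mask;
            if (feedback) rem ^= poly;                  // subtract (XOR) generator
        }
    }
    return rem;
}

int main() {
    uint8_t msg[] = { 0xC4 };   // example message bits 11000100
    printf("remainder = 0x%04x\n", crc_remainder(msg, 1, 0x8005, 16));
    return 0;
}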

Machine Arithmetic and Smearing: addition of a large and small number

So to 10000 one will add the value 1/10000, 10000 times. Logically this gives 10001.
However, due to smearing, which stems from storage limitations, this does not occur. The result is 10000.999999992928.
I have located where the smearing occurs, which is in the second addition:
1: 10000.0001
2: 10000.000199999999
3: 10000.000299999998
4: 10000.000399999997
etc...
However, grasping why the smearing occurred is where the struggle lies.
I wrote code to generate the floating-point binary representations, to see whether the smearing occurred there.
So 10000 = 10011100010000, or 1.001110001 * 2**13, while
0.0001 = 0.00000000000001101001... or
1.1010001101101110001011101011000111000100001100101101 * 2**(-14),
so 10000.0001 = 10011100010000.00000000000001101001...
Now the smearing occurs in the next addition. Does it have to do with mantissa size? Why does it only occur in this step? Just interested to know. I am going to add up all the 1/10000 terms first and then add the total to 10000 to avoid the smearing.
The small "smearing" error for a single addition can be computed exactly as
a = 10000; b = 0.0001
err = ((a + b) - a) - b
print("err =", err)
# err = -7.07223084891e-13
The rounding error of an addition is of size (abs(a)+abs(b))*mu/2, where mu is the machine epsilon (about 2.2e-16 for 64-bit doubles), or around 1e4 * 1e-16 = 1e-12, which nicely fits the computed result.
In general you also have to test the expression ((a+b)-b)-a, but one of the two is always zero; here it is the latter.
And indeed this single-step error, accumulated over all the steps, already gives the observed result; secondary errors, relating to the slow increase of the sum as the first term of each addition, have a much lower impact.
print(err * 10000)
# -7.072230848908026e-09
print(10001 + err * 10000)
# 10000.999999992928
The main problem is that 1/10000, i.e., 0.0001, cannot be encoded exactly as a machine float value (see the IEEE 754 standard), since 10000 is not a power of 2. Likewise 1/10 = 0.1 cannot be encoded as a machine float, so you will experience phenomena like 0.1 + 0.1 + 0.1 > 0.3.
When computing with double precision (64 bit) the following holds:
1.0001 - 1 < 0.0001
10000.0001 + 9999*0.0001 == 10001
So I assume you are computing with single precision (32 bit)?
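To see the effect end to end, here is a short C++ sketch (my own, not from the thread) of the experiment in double precision: it accumulates 1/10000 into 10000 one term at a time, then repeats the sum with the small terms accumulated first, as the questioner proposed.

#include <cstdio>

int main() {
    // Naive order: repeatedly add the small value to the large one.
    double naive = 10000.0;
    for (int i = 0; i < 10000; ++i)
        naive += 0.0001;

    // Reordered: accumulate the small values first, then add the large one.
    double small_sum = 0.0;
    for (int i = 0; i < 10000; ++i)
        small_sum += 0.0001;
    double reordered = 10000.0 + small_sum;

    printf("naive:     %.12f\n", naive);      // questioner observed 10000.999999992928
    printf("reordered: %.12f\n", reordered);  // much closer to 10001
    return 0;
}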

why does pgeom n parameter value need to be one less than I expected?

There is an example question in the book Head First Statistics:
20% of cereal packets contain a free toy. What’s the probability
you’ll need to open fewer than 4 cereal packets before finding your
first toy?
The worked solution is given as:
P(X ≤ 3)
= 1 - q^r
= 1 - 0.8^3
= 1 - 0.512
= 0.488
I would have expected to use the following R statement:
> pgeom(3, 0.2)
[1] 0.5904
But as you can see the answer isn't as expected. The correct value for the n parameter is 2 as can be seen below:
> pgeom(2, 0.2)
[1] 0.488
Can someone explain why this is the case and where I am thinking about this incorrectly?
I just ran into this. My textbook and pgeom use different density functions. From the documentation, pgeom uses p(x) = p*(1-p)^x; my book uses p(x) = p*(1-p)^(x-1). Presumably Head First uses the latter formula too.
The question says "fewer than 4".
So, if you consider a maximum of 3 tries, then at most 2 failures occur before you get the free toy, and from the documentation:
pgeom(q, prob, lower.tail = TRUE, log.p = FALSE), where
q: "vector of quantiles representing the number of failures in a sequence of Bernoulli trials before success occurs"
The first parameter in pgeom is the number of failed trials.
Your question asks for the probability of finding a toy in fewer than 4 packets. So you can find a toy in packet 1, 2, or 3. That means 0, 1, or 2 failed trials can occur, but not 3 (with 3 failures, the first toy would not arrive until at least the 4th packet).
P.S.: I'm taking an intro-level stats course and faced a similar confusion. This is what I've convinced myself of.
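To make the two conventions explicit, here is a small sketch (my own) showing that summing R's "number of failures" density p*(1-p)^k over k = 0..2 matches the book's 1 - 0.8^3:

#include <cmath>
#include <cstdio>

int main() {
    const double p = 0.2;     // chance a packet contains a toy

    // R's pgeom convention: X = number of failures before the first success.
    // P(X <= 2) = sum over k = 0, 1, 2 of p * (1-p)^k.
    double cdf_failures = 0.0;
    for (int k = 0; k <= 2; ++k)
        cdf_failures += p * pow(1.0 - p, k);

    // The book's convention: probability the first toy appears within 3 packets.
    double cdf_trials = 1.0 - pow(1.0 - p, 3);

    printf("sum of p*(1-p)^k, k=0..2 : %.4f\n", cdf_failures);  // 0.4880
    printf("1 - 0.8^3                : %.4f\n", cdf_trials);    // 0.4880
    return 0;
}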
