I've been trying to implement the algorithm for CRC32 calculation as described here:
http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/fast-crc-computation-generic-polynomials-pclmulqdq-paper.pdf; and I'm confused about Step 3, the reduction from 128 bits to 64 bits. Hopefully someone can clarify the steps for me:
Multiply the upper 64 bits of the remaining 128 bits with the constant K5, result is 96 bits
Multiply the upper 64 bits of the 96 bits with the constant K6, result is 64 bits
Do these results need to be XORed with the lower 64 bits of the starting 128 bits, following the pattern of the previous folds? Figure 8 in the paper doesn't specify, and I am confused by the alignment of the data in the figure.
It appears that figure 8 shows the final 128 bits (working remainder xor last 128 bits of buffer data) followed by 32 bits of appended zeros, since crc32 = (msg(x) • x^32) % p(x). So you see a total of 160 bits as 64|32|32|32.
My assumption is that the upper 64 bits are multiplied by K5 producing a 96 bit product. That product is then xor'ed to the lower 96 bits of the 160 bit entity (remember the lower 32 bits start off as 32 bits of appended zeros).
Then the upper 32 bits (not 64) of the lower 96 bits are multiplied by K6 producing a 64 bit product which is xor'ed to the lower 64 bits of the 160 bit entity.
Then the Barrett algorithm is used to produce a 32 bit CRC from the lower 64 bits of the 160 bit entity (where the lower 32 bits were originally appended zeros).
To explain the Barrett algorithm, consider the 64 bits as a dividend and the CRC polynomial as a divisor. Then remainder = dividend - (⌊ dividend / divisor ⌋ · divisor). Rather than actually divide, pclmulqdq is used: all of the multiplies and divides here are carry-less (GF(2)) operations, and ⌊ dividend / divisor ⌋ = (dividend · ⌊ x^64 / divisor ⌋) >> 64, where ⌊ x^64 / divisor ⌋ is a precomputed constant.
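To make that concrete, here is a minimal plain-C++ sketch of the Barrett step (bit loops standing in for pclmulqdq, and the non-reflected CRC-32 polynomial 0x04C11DB7 as an example; the helper names are mine, not from the paper):

```cpp
#include <cstdint>
#include <cstdio>

// Carry-less multiply of two 64-bit polynomials over GF(2); hi:lo is the
// 128-bit product. A real implementation would use the pclmulqdq intrinsic.
static void clmul64(uint64_t a, uint64_t b, uint64_t &hi, uint64_t &lo) {
    hi = lo = 0;
    for (int i = 0; i < 64; ++i)
        if ((b >> i) & 1) {
            lo ^= a << i;
            if (i) hi ^= a >> (64 - i);
        }
}

// Barrett constant mu = floor(x^64 / P) over GF(2), where P is the 33-bit CRC
// polynomial (implicit x^32 term plus the low 32 bits passed in).
static uint64_t barrett_mu(uint32_t poly_low32) {
    const uint64_t P = (1ULL << 32) | poly_low32;
    uint64_t rem = 0, quotient = 0;
    for (int j = 64; j >= 0; --j) {
        rem <<= 1;
        if (j == 64) rem |= 1;            // the dividend x^64 has only bit 64 set
        if (rem & (1ULL << 32)) {         // remainder reached degree 32: subtract P
            rem ^= P;
            quotient |= 1ULL << j;        // only reached for j <= 32
        }
    }
    return quotient;
}

// Barrett reduction of a 64-bit value T modulo the CRC polynomial:
// Q = floor(T*mu / x^64), remainder = T xor Q*P, all multiplies carry-less.
static uint32_t barrett_reduce(uint64_t T, uint32_t poly_low32, uint64_t mu) {
    uint64_t hi, lo;
    clmul64(T, mu, hi, lo);               // T * mu; the high 64 bits are the quotient
    uint64_t Q = hi;
    clmul64(Q, (1ULL << 32) | poly_low32, hi, lo);   // Q * P fits in 64 bits
    return (uint32_t)(T ^ lo);            // subtract Q*P; only the low 32 bits remain
}

// Plain bit-by-bit reduction, used only to check the Barrett result.
static uint32_t mod_bitwise(uint64_t T, uint32_t poly_low32) {
    const uint64_t P = (1ULL << 32) | poly_low32;
    for (int i = 63; i >= 32; --i)
        if (T & (1ULL << i))
            T ^= P << (i - 32);
    return (uint32_t)T;
}

int main() {
    const uint32_t POLY = 0x04C11DB7;     // CRC-32 polynomial, low 32 bits
    const uint64_t mu = barrett_mu(POLY);
    const uint64_t T = 0x123456789ABCDEF0ULL;   // arbitrary 64-bit test value
    printf("barrett %08x  bitwise %08x\n",
           (unsigned)barrett_reduce(T, POLY, mu), (unsigned)mod_bitwise(T, POLY));
    return 0;
}
```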
From the literature I have read, BERT-base has 12 encoder layers and 12 attention heads, while BERT-large has 24 encoder layers and 16 attention heads.
Why does BERT-large have 16 attention heads?
The number of attention heads is independent of the number of (encoder) layers.
However, there is an inherent tie between the hidden size of each model (768 for bert-base, 1024 for bert-large) and the number of heads, which is explained in the original Transformer paper.
Essentially, the choice made by the authors is that the self-attention block size (d_k) equals the hidden dimension (d_hidden), divided by the number of heads (h), or formally
d_k = d_hidden / h
Since the standard choice seems to be d_k = 64, we can infer the final size from our parameters:
h = d_hidden / d_k = 1024 / 64 = 16
which is exactly the value you are looking at in bert-large.
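As a quick sanity check, a tiny sketch of the same arithmetic (the d_k = 64 convention is the assumption carried over from above):

```cpp
#include <cstdio>

int main() {
    // h = d_hidden / d_k with the conventional per-head size d_k = 64.
    const int d_k = 64;
    printf("bert-base:  h = %d\n",  768 / d_k);   // 12 heads
    printf("bert-large: h = %d\n", 1024 / d_k);   // 16 heads
    return 0;
}
```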
I was going through some problems related to single-bit error detection with CRC generators and was trying to analyse which generators detect single-bit errors and which don't.
Suppose I have the CRC generator polynomial x^4 + x^2. I want to know whether it guarantees the detection of a single-bit error or not.
According to references 1 and 2, I conclude the following:
1) If k = 1, 2, 3 for the error polynomial x^k, then the remainders after division by the generator polynomial x^4 + x^2 are x, x^2, x^3 respectively. According to the references, if the generator has more than one term and the coefficient of x^0 is 1, then all single-bit errors can be caught. But they do not say that if the coefficient of x^0 is not 1, a single-bit error can't be detected; they only say that "In a cyclic code, those e(x) errors that are divisible by g(x) are not caught."
2) I have to check the remainder of E(x)/g(x), where E(x) (say x^k, with k = 1, 2, 3, ...) is the error polynomial and g(x) is the generator polynomial. If the remainder is zero I can't detect the error, and when it is non-zero I can detect it.
So, based on the above two points, it seems to me that the generator polynomial x^4 + x^2 guarantees the detection of single-bit errors. Please confirm whether I am right or not.
If the coefficient of x^0 is not 1, then single-bit errors can't be detected?
If the coefficient of x^0 is not 1, it is the same as shifting the CRC polynomial left by 1 (or more) bits (multiplying by some power of x). Shifting a CRC polynomial left by 1 or more bits won't affect its ability to detect errors; it just appends 1 or more zero bits to the end of codewords.
generator polynomial x^4 + x^2 guarantees the detection of single-bit errors
Correct. x^4 + x^2 is x^2 + 1 shifted left two bits: x^4 + x^2 = (x^2)(x^2 + 1) = (x^2)(x + 1)(x + 1), and since x^2 + 1 can detect any single-bit error, so can x^4 + x^2. Also, with the (x + 1) factor (two of them), it adds an even parity check and can detect any odd number of bit errors.
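If you want to check that numerically, here is a small sketch (my own helper, not from any reference) that reduces x^k modulo x^4 + x^2 over GF(2) for a range of k; every remainder comes out non-zero:

```cpp
#include <cstdint>
#include <cstdio>

// Remainder of x^k divided by g(x) over GF(2), with polynomials stored as
// bit masks (bit i = coefficient of x^i). g = 0b10100 is x^4 + x^2.
static uint32_t xk_mod_g(int k, uint32_t g, int deg_g) {
    uint32_t rem = 1;                      // start with x^0
    for (int i = 0; i < k; ++i) {
        rem <<= 1;                         // multiply by x
        if (rem & (1u << deg_g))           // degree reached deg_g: subtract g
            rem ^= g;
    }
    return rem;
}

int main() {
    const uint32_t g = 0b10100;            // x^4 + x^2
    for (int k = 1; k <= 20; ++k)
        printf("x^%-2d mod (x^4 + x^2) = 0x%x\n", k, (unsigned)xk_mod_g(k, g, 4));
    // Every remainder printed is non-zero, so any single-bit error is detected.
    return 0;
}
```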
In general, all CRC polynomials can detect a single-bit error regardless of message length. All CRC polynomials have a "cyclic" period: if you use the CRC polynomial as the basis for a linear feedback shift register (LFSR) and the initial value is 000...0001, then after some fixed number of cycles it will cycle back to 000...0001. The simplest failure for a CRC is a 2-bit error where the 2 bits are separated by a distance equal to the cyclic period. Say the period is 255 for an 8-bit CRC (9-bit polynomial); then a 2-bit error, one at bit[0] and one at bit[255], will result in a CRC of 0 and fail to be detected. This can't happen with a single-bit error: it just continues to go through the cycles, none of which include the value 0. If the period is n cycles, then no 2-bit error can fail if the number of bits in the message + CRC is <= n. All CRC polynomials that are a product of any polynomial times (x + 1) can detect any odd number of bit errors (since x + 1 essentially adds an even parity check).
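Here is a hedged sketch of that "cyclic period" idea, using a Galois-configuration LFSR seeded with 1; the 9-bit QR-code polynomial 0x11D is just a convenient example of an 8-bit CRC whose period is 255:

```cpp
#include <cstdint>
#include <cstdio>

// Galois LFSR for an n-bit CRC polynomial (implicit x^n term, low bits in
// poly). Starting from state 1, count shifts until the state is 1 again;
// that count is the "cyclic period" described above.
static unsigned lfsr_period(uint32_t poly, int n) {
    const uint32_t mask = (1u << n) - 1;   // assumes n < 32 for this sketch
    uint32_t state = 1;
    unsigned count = 0;
    do {
        uint32_t msb = state & (1u << (n - 1));
        state = (state << 1) & mask;
        if (msb) state ^= poly;            // reduce: x^n == low bits of poly
        ++count;
    } while (state != 1);
    return count;
}

int main() {
    // Example: the 9-bit polynomial 0x11D (x^8+x^4+x^3+x^2+1) used by QR codes
    // is primitive, so this prints 255 = 2^8 - 1.
    printf("period = %u\n", lfsr_period(0x1D, 8));
    return 0;
}
```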
Shifting a CRC polynomial left by z bits means that every codeword will have z trailing zero bits. There are cases where this is done. Say you have a fast 32-bit CRC algorithm. To use that algorithm for a 16-bit CRC, the 17-bit CRC polynomial is shifted left 16 bits so that the least significant non-zero term is x^16. After computing with the 32-bit CRC algorithm, the 32-bit CRC is shifted right 16 bits to produce the 16-bit CRC.
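A small sketch of that trick, assuming an MSB-first CRC with zero initial value, no reflection and no final xor, and using the CRC-16-CCITT polynomial 0x1021 purely as an example:

```cpp
#include <cstdint>
#include <cstdio>

// Bitwise MSB-first CRC with an n-bit register (n <= 32), polynomial's low n
// bits in poly, zero initial value and no final xor -- a sketch, not a tuned
// routine.
static uint32_t crc_bitwise(const uint8_t *data, size_t len, uint32_t poly, int n) {
    const uint32_t top  = 1u << (n - 1);
    const uint32_t mask = (n == 32) ? 0xFFFFFFFFu : ((1u << n) - 1);
    uint32_t crc = 0;
    for (size_t i = 0; i < len; ++i) {
        crc ^= (uint32_t)data[i] << (n - 8);
        for (int b = 0; b < 8; ++b)
            crc = (crc & top) ? ((crc << 1) ^ poly) & mask : (crc << 1) & mask;
    }
    return crc;
}

int main() {
    const char *msg = "123456789";
    const uint8_t *p = (const uint8_t *)msg;
    const uint32_t P16 = 0x1021;              // CRC-16-CCITT polynomial, low 16 bits
    // 16-bit CRC computed directly, and via a 32-bit register with the
    // polynomial shifted left 16 bits, then shifted back down: both match.
    uint32_t direct  = crc_bitwise(p, 9, P16, 16);
    uint32_t shifted = crc_bitwise(p, 9, P16 << 16, 32) >> 16;
    printf("direct %04x  shifted %04x\n", (unsigned)direct, (unsigned)shifted);
    return 0;
}
```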
I am implementing Reed-Solomon decoding for QR codes using C++.
I have implemented the main part of decoding and error detection so far. I have followed the ISO/IEC 18004:2006 manual.
As I have seen in Annex B (Error correction decoding steps), the syndromes S(i)
are calculated as S(i) = R(a^i). Let's assume we have the High error correction level, so we have 9 data codewords and 17 error correction codewords, which gives a total of 26 codewords in QR code version 1. So I assume that the polynomial R(x) shown on page 76 of the ISO/IEC 18004:2006 manual is the sequence of
data codewords and error correction codewords, each paired with the correct power of x. So S(i) = R(a^i), where i = 0...15 and the coefficient index j runs 0...25 for the High error correction level. But when I run my code on a whole QR code matrix with no errors, where I expect all syndromes to be zero, I get non-zero syndromes. Have I misunderstood something about syndrome calculation under Galois field arithmetic in Reed-Solomon decoding?
After looking at QR code references: for version 1, level H, with 9 data bytes and 17 error correction bytes, and generator polynomial g(x) = (x-1)(x-a)(x-a^2)...(x-a^16), you should be using syndromes S(i) = R(a^i) for i = 0 to 16. In the no-error case, all 17 syndromes should be zero.
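To illustrate, here is a hedged C++ sketch of that syndrome check for GF(2^8) with the QR-code field polynomial 0x11D and a = 2: it builds g(x), encodes 9 placeholder data bytes (not a real QR block), and checks that all 17 syndromes of the resulting codeword are zero. The helper names and data are mine, not from the ISO document:

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// GF(2^8) with the QR-code field polynomial 0x11D and primitive element 2.
static uint8_t gexp[512], glog[256];

static void gf_init() {
    int x = 1;
    for (int i = 0; i < 255; ++i) {
        gexp[i] = (uint8_t)x;
        glog[x] = (uint8_t)i;
        x <<= 1;
        if (x & 0x100) x ^= 0x11D;              // reduce when the x^8 term appears
    }
    for (int i = 255; i < 512; ++i) gexp[i] = gexp[i - 255];
}

static uint8_t gf_mul(uint8_t a, uint8_t b) {
    return (a && b) ? gexp[glog[a] + glog[b]] : 0;
}

// Evaluate a polynomial (index 0 = highest-degree coefficient) at x (Horner).
static uint8_t poly_eval(const std::vector<uint8_t> &p, uint8_t x) {
    uint8_t y = 0;
    for (uint8_t c : p) y = (uint8_t)(gf_mul(y, x) ^ c);
    return y;
}

int main() {
    gf_init();
    const int ndata = 9, necc = 17;                 // version 1, level H

    // g(x) = (x - 1)(x - a)(x - a^2)...(x - a^16), a = 2.
    std::vector<uint8_t> g = {1};
    for (int i = 0; i < necc; ++i) {
        std::vector<uint8_t> ng(g.size() + 1, 0);
        for (size_t j = 0; j < g.size(); ++j) {
            ng[j] ^= g[j];                          // x * g(x)
            ng[j + 1] ^= gf_mul(g[j], gexp[i]);     // a^i * g(x)
        }
        g = ng;
    }

    // Systematic encode: remainder of data(x) * x^17 divided by g(x).
    std::vector<uint8_t> data = {1, 2, 3, 4, 5, 6, 7, 8, 9};  // placeholder bytes
    std::vector<uint8_t> work(data);
    work.resize(ndata + necc, 0);
    for (int i = 0; i < ndata; ++i) {
        uint8_t coef = work[i];
        if (coef)
            for (int j = 1; j <= necc; ++j)
                work[i + j] ^= gf_mul(g[j], coef);
    }

    // Codeword R(x): data codewords followed by the 17 remainder bytes.
    std::vector<uint8_t> codeword = data;
    codeword.insert(codeword.end(), work.begin() + ndata, work.end());

    // Syndromes S(i) = R(a^i), i = 0..16: all zero for an error-free codeword.
    for (int i = 0; i < necc; ++i)
        printf("S(%d) = %d\n", i, poly_eval(codeword, gexp[i]));
    return 0;
}
```

Flipping any byte of the codeword before the syndrome loop should then make at least one syndrome non-zero.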
There's a decent wiki article for Reed Solomon error correction:
http://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_correction
The wiki article contains a link to a Nasa tech brief RSECC tutorial:
http://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/19900019023.pdf
Here is a link to C source code for a console program that demonstrates RSECC methods for an 8-bit field (the user chooses from 29 possible fields). I use Microsoft compilers or Visual Studio to compile it and Windows to run it, but it should work on most systems.
Note: I updated the ecc demo program to handle erasures in addition to errors, just in case it could be useful. I also added code to calculate the error value polynomial Omega in case the Euclid method is not used. The link is the same as before:
http://rcgldr.net/misc/eccdemo8.zip
Update based on the questions in comments:
Regarding my question about which GF(2^8) is used:
GF(2^8) is based on the 9-bit polynomial
x^8 + x^4 + x^3 + x^2 + 1 = hex 11d
and the primitive element is x + 0 (hex 2).
Looking up QR code references, different generator polynomials are used depending on the correction level: L (low), M (medium), Q (quartile), H (high).
Question about decoding using matrices. The Sklar paper shows decoding using linear equations and matrix inversion. This procedure has to assume a maximum error case of t errors, where t = floor(e / 2) and e is the number of error correction bytes (also called parity bytes or redundant bytes). If the determinant is zero, then try t-1 errors; if that is still zero, try t-2 errors, and so on, until the determinant is non-zero or t is reduced to zero.
The Euclid or Berlekamp Massey decoding methods will automatically determine the number of errors.
In all cases, if there are more than t errors, there is some chance that a miscorrection will occur, depending on the odds that all t locations produced happen to be in range. If any of the t locations found by error correction are out of range, then an uncorrectable error has been detected.
Update #2
I did a quick overview of the ISO document.
The generator polynomial is (x - 1) (x - 2) (x - 2^2) ..., so the syndromes to check are S(0) to S(n-1) as you mentioned before, and in the case of zero errors, then all syndromes S(0) to S(n-1) should be zero.
The ISO document uses the term codewords to refer to bytes (or symbols), but in most ecc articles, the term codeword refers to an array of bytes including data and error correction bytes, and the error correction bytes are often called parity bytes, redundant bytes or remainder bytes. So keep this in mind if reading other ecc articles.
Page 37 of the ISO document mentions "erasures" and "errors", which is RSECC terminology. "Erasures" refer to bad (or potentially bad) data bytes at known locations, detected outside of RSECC. "Errors" refer to bad bytes not detected outside of RSECC, and only determined during RSECC decoding. The document then notes that there are no invalid data bit patterns, which would imply that there is no "erasure" detection. It then adds to the confusion by showing an equation that includes erasure and error counts.
If you're curious, the Nasa pdf file on RSECC explains erasure handling starting at page 86, but I don't think this applies to QR codes.
http://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/19900019023.pdf
Getting back to the ISO document, it uses p to denote the number of error correction bytes used for misdecode protection, as opposed to being used for correction. This is shown in Table 9 on page 38. For version 1, which seems to be what you're using, reinterpreting:
error correction level
| number of data bytes
| | number of ecc bytes used for correction
| | | number of ecc bytes used for misdecode protection (p)
| | | | correction capability
L 19 4 3 2/26 ~ 07.69%
M 16 8 2 4/26 ~ 15.38%
Q 13 12 1 6/26 ~ 23.08%
H 9 16 1 8/26 ~ 30.77%
Given that this table shows that the expected correction capability is met without the usage of erasures, then even if erasures could be detected, they are not needed.
With GF(2^8), there are 255 (not 256) possible error locations that can be generated by RSECC decoding, but in version 1 there are only 26 valid locations. Any generated location outside of the 26 valid locations is the detection of an uncorrectable error. So for level L, the 3 p bytes translate into odds of miscorrection of 1/(2^24), and the location range multiplies this by (26/255)^2, for a probability of ~6.20E-10. For level H, the 1 p byte translates into odds of miscorrection of 1/(2^8), and the location range multiplies this by (26/255)^8, for a probability of ~4.56E-11.
Note that for version 2, p = 0 for levels M, Q, H, relying on the location range (44/255)^(8 or 11 or 14) for miscorrection probability of 7.87E-7, 4.04E-9, 2.07E-11.
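For reference, a tiny sketch that just reproduces those probability estimates:

```cpp
#include <cmath>
#include <cstdio>

int main() {
    // Version 1: p unused ecc bytes give a 1/2^(8p) factor, and the location
    // range gives (26/255)^t for t corrected locations.
    printf("v1 L: %.2e\n", pow(2.0, -24) * pow(26.0 / 255.0, 2));  // ~6.20e-10
    printf("v1 H: %.2e\n", pow(2.0, -8)  * pow(26.0 / 255.0, 8));  // ~4.56e-11
    // Version 2, p = 0: only the location-range factor, 44 valid locations.
    printf("v2 M: %.2e  Q: %.2e  H: %.2e\n",
           pow(44.0 / 255.0, 8), pow(44.0 / 255.0, 11), pow(44.0 / 255.0, 14));
    return 0;
}
```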
I am trying to create the generator polynomial for 7 error correction codewords. I don't understand how the coefficients are calculated. The QR code specification says to use byte-wise modulo 100011101 arithmetic (where 100011101 is a binary number equivalent to 285 in decimal). This means that when a number is 256 or larger, it should be XORed with 285.
In other words:
2^8 = 256, and 256 xor 285 = 29. OK. But how can I calculate 5334?
5334 xor 285 = 5579, which is still bigger than 256.
The answer is 122. I don't understand how we find 122. Thank you so much.
Think of the numbers as polynomials from F2[X]. That means the number 1 is represented by 1, the number 2 by x, and the number 3 by x + 1.
Number 5334 is represented by p_5334 = x^12+x^10+x^7+x^6+x^4+x^2+x^1
Number 285 is represented by p_285 = x^8+x^4+x^3+x^2+1
You need to get the polynomial p_5334 mod p_285.
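A minimal sketch of that reduction, with polynomials stored as plain integers (bit i = coefficient of x^i); the helper name is mine:

```cpp
#include <cstdio>

// Reduce polynomial a modulo polynomial m over GF(2), with polynomials
// stored as integers (bit i = coefficient of x^i).
static unsigned poly_mod(unsigned a, unsigned m) {
    int deg_m = 0;                          // degree of m = position of its top bit
    for (unsigned t = m; t > 1; t >>= 1) ++deg_m;
    for (int i = 31; i >= deg_m; --i)
        if (a & (1u << i))
            a ^= m << (i - deg_m);          // cancel the x^i term
    return a;
}

int main() {
    // 256 = x^8 and 285 = x^8 + x^4 + x^3 + x^2 + 1: the reduction gives 29,
    // as in the question. Apply the same routine to 5334 to see its reduction.
    printf("256  mod 285 = %u\n", poly_mod(256, 285));
    printf("5334 mod 285 = %u\n", poly_mod(5334, 285));
    return 0;
}
```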
How can I perform
x mod y (e.g. 89^3 mod 3127)
on this calculator?
I have a cryptography exam tomorrow and I can't figure out how to do the mod part on the calculator that I have.
This is the encrypting part of RSA algorithm.
Any ideas?
I doubt your calculator has a modulus function. Here's a decent algorithm that works:
Compute 89^3 = 704 969. Write this down or store the result somewhere.
Now reduce modulo n. To do this, compute result / modulus and ignore the decimal, e.g. 704 969 / 3127 ≈ 225.
Multiply that number by the modulus and subtract it from the original result, e.g. 704 969 - 225*3127 = 1394.
If the original exponentiation is so large that it overflows your calculator, you can compute a smaller exponent and do the above reduction modulo n multiple times. For example, if you're asked to compute 89^10, you can instead compute 89^5, reduce that modulo n, square the result to get 89^10, and reduce the squared value modulo n as well.
A key point is that at pretty much any point in the computation process, you can reduce the value modulo n and still arrive at the same figure. Your professor may throw a curveball at you like this - or they may not. Still, better to be prepared.
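If you later want to check your hand computation, here is a small sketch that automates the same idea of reducing modulo n after every multiplication (square-and-multiply), using the 89^3 mod 3127 example from above:

```cpp
#include <cstdint>
#include <cstdio>

// Modular exponentiation that reduces after every multiplication
// (square-and-multiply). Assumes mod fits in 32 bits so the 64-bit
// intermediate products cannot overflow.
static uint64_t pow_mod(uint64_t base, uint64_t exp, uint64_t mod) {
    uint64_t result = 1;
    base %= mod;
    while (exp) {
        if (exp & 1) result = (result * base) % mod;   // reduce right away
        base = (base * base) % mod;                    // square, then reduce
        exp >>= 1;
    }
    return result;
}

int main() {
    // 89^3 mod 3127: matches the hand computation 704969 - 225*3127 = 1394.
    printf("89^3 mod 3127 = %llu\n", (unsigned long long)pow_mod(89, 3, 3127));
    return 0;
}
```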