LDPC: Hard_decision decoding algorithm - vector

I am working with Bit-flipping decoding [Hard_decision] for one bit flip.
I have followed for below H_matrix :"For the bit-flipping algorithm the messages passed along the Tanner graph edges are also binary: a bit node sends a message declaring if it is a one or a zero, and each check node sends a message to each connected bit node, declaring what value the bit is based on the information available to the check node.
The check node determines that its parity-check equation is satisfied if the modulo-2 sum of the incoming bit values is zero. If the majority of the messages received by a bit node are different from its received value the bit node changes (flips) its current value. This process is repeated until all of the parity-check equations are satisfied.
Correct codeword = [10010101]
H matrix[4][8] = {{0,1,0,1,1,0,0,1},{1,1,1,0,0,1,0,0},{0,0,1,0,0,1,1,1}, {1,0,0,1,1,0,1,0}};
ReceivedCodeWord[8]={1,0,1,1,0,1,0,1}; //Error codeword
I need to get [10010101] but instead i am getting [10010001] for ReceivedCodeWord[8] = {1,0,1,1,0,1,0,1}.
But for other possible ReceivedCodeWord i am getting correct.
e.g. ReceivedCodeWord [00010101] i am getting correct codeword [10010101]
ReceivedCodeWord [11010101] i am getting correct codeword [10010101].
Doubt: why for ReceivedCodeWord {1,0,1,1,0,1,0,1} i am getting [10010001], its totally wrong. Please explain me.
Here's a link
Thanks

This is caused by the liner block code you used. Firstly, this code isn't a LDPC code, since the H martix doesn't satisfy the row-column constraint (e.g., the 3-th colmun and the 6-th column have 1s in two same poistions).
(1) When {1,0,1,1,0,1,0,1} is received , the check sums are {0,1,1,0}. If you use multiple bit flipping decoding algorithm, which means more than one bit can be flipped during one iteration, the 3-th variable node and 6-th variable node will be flipped together since both of them connect two unsatisfied check nodes, then the result will be {1,0,0,1,0,0,0,1} after first iteration. During the second iteration, the check sums are still {0,1,1,0}, so the 3-th variable node and 6-th variable node will be flipped again, this process will be recurrent in later iterations.
(2) The minimum Hamming distance of this code is 2, so this code can dectect 1 erros and correct 0.5 errors. Even you use single bit flipping decoding algorithm, it will has 50% probability to decode the received codeword {1,0,1,1,0,1,0,1} as another vaild codeword {1,0,1,1,0,0,0,1}.

Related

Seed limits in R

Consensus on set.seed in R is that that it effectively generates a long sequence of pseudo-random numbers, pre-determined by the seed. Then the first call you make to this sequence (with the first non-deterministic function you use) takes the first batch from that sequence, the second call takes the next batch, so forth.
I am wondering what the limits to this are. Specifically, what happens when you get to the end of that long sequence? Let's say, after setting a seed, you then sample from the first 100 integers repeatedly. Would there come a point where you start generating the same samples (in the same order) as you were seeing at the beginning? How long would this take? (Does it depend on the seed?) If not, how would reaching the 'end' of the sequence and presumably circling back to the beginning manifest?
The ?RNGkind help page in R gives more details on the default random number generator, the "Mersenne Twister" algorithm:
"Mersenne-Twister": From Matsumoto and Nishimura (1998); code
updated in 2002. A twisted GFSR with period 2^19937 - 1 and
equidistribution in 623 consecutive dimensions (over the
whole period). The ‘seed’ is a 624-dimensional set of 32-bit
integers plus a current position in that set.
As stated there, the "period" (the length of time it takes to get back to the beginning and start repeating values is 2^19937-1, or approximately 10^(19937/log2(10)) = 10^6001.
If the size of your "batches" happened to line up exactly with the period, then you would indeed start getting the same batches again.
I'm not sure how many pseudorandom samples R uses to pick a sample of size 1 from a set. Ideally it would be only 1 (so your "batch size" would be 1), but it might be more depending on the generality/complexity of the sampling algorithm.
I know that runif() translates more or less directly from the PRNG, so a sequence of runif() calls would indeed repeat exactly.

1's complement checksum even bit errors

I am trying to wrap my head around 1's complement checksum error detection as is used in UDP.
My understanding with simplified example for an UDP-like 1's complement checksum error checking algorithm operating on 8 bit words (I know UDP uses 16 bit words):
Sum all 8 bit words of data, carry the MSB rollover to the LSB.
Take 1's complement of this sum, set checksum, send datagram
Receiver adds with carry rollover all received 8 bit words of data in the incoming datagram, adds checksum.
If sum = 0xFF, no errors. Else, error occurred, throw away packet.
It is obvious that this algorithm can detect 1 bit errors and by extension any odd-numbered bit errors. If just one bit in an 8-bit data word is corrupted, the sum + checksum will never equal 0xFF. A plain and simple example would be A = 00000000, B = 00000001, then ~(A + B) = 11111110. If A(receiver) = 00000001, B(reciever) = 00000001, the sum + checksum would be 0x00 != 0xFF
My question is:
It's not too clear to me if this can detect 2 bit errors. My intuition says no, and a simple example is taking A = 00000001, B = 00000000, then sum + checksum would be 0xFF, but there are two total errors in A and B from sender to receiver. If the 2 bit error occurred in the same word, theres a chance it could be detected, but it doesn't seem guaranteed.
How robust is UDP error checking? Does it work for even numbers of bit errors?
Some even-bit changes can be detected, some can't.
Any error that changes the sum will be detected. So a 2-bit error that changes the sum will be detected, but a 2-bit error that does not change the sum will not be detected.
A 2-bit error in a single word (single byte in your simplified example) will change the value of that word, which will change the sum, and therefore will always be detected. Most 2-bit errors across different words will be detected, but a 2-bit error that changes the same bit in different directions (one 0->1, the other 1->0) in different words will not change the sum -- the change in value created by one of the changed bits will be cancelled out by the equal-but-opposite change in value created by the other changed bit -- and therefore that error will not be detected.
Because this checksum is simply an addition, it will also fail to detect the insertion or removal of words whose arithmetic value is zero (and since this is a one's complement computation, that means words whose content is all 0s or all 1s).
It will also fail to detect transpositions of words, (because a+b gives the same sum as b+a), or more generally it will fail to detect errors that add the same amount to one word as they subtract from the other (because a+b gives the same sum as (a+n)+(b-n), e.g. 3+3=4+2=5+1). You could consider the transposition and cancelling-error cases to be made up of multiple pairs of same-bit changes.

What is the name for encoding/encrypting with noise padding?

I want code to render n bits with n + x bits, non-sequentially. I'd Google it but my Google-fu isn't working because I don't know the term for it.
For example, the input value in the first column (2 bits) might be encoded as any of the output values in the comma-delimited second column (4 bits) below:
0 1,2,7,9
1 3,8,12,13
2 0,4,6,11
3 5,10,14,15
My goal is to take a list of integer IDs, and transform them in a way they can still be used for persistent URLs, but that can't be iterated/enumerated sequentially, and where a client cannot determine programmatically if a URL in a search result set has been visited previously without visiting it again.
I would term this process "encoding". You'll see something similar done to permit the use of communications channels that have special symbols that are not permitted in data. Examples: uuencoding and base64 encoding.
That said, you still need to (and appear at first blush to have) ensure that there is only one correct de-code; and accept the increase in size of the output (in the case above, the output will be double the size, bit-for-bit as the input).
I think you'd be better off encrypting the number with a cheap cypher + a constant secret key stored on your server(s), adding a random character or four at the end, and a cheap checksum, and simply reject any responses that don't have a valid checksum.
<encrypt(secret)>
<integer>+<random nonsense>
</encrypt>
+
<checksum()>
<integer>+<random nonsense>
</checksum>
Then decrypt the first part (remember, cheap == fast), validate the ciphertext using the checksum, throw off the random nonsense, and use the integer you stored.
There are probably some cryptographic no-no's here, but let's face it, the cost of this algorithm being broken is a touch on the low side.

What is the cost of deleting a value from a hashtable?

Now I have this question where I was asked the cost of deleting a value from a hash table when we used linear probing while the insertion process.
What I could figure out from reading various stuff on the internet is that it has to do something with the load factor. Though I am not sure, but I read a relation between the load factor and no of probes required and it is No of probes = 1 / (1-LF).
So I believe the cost has to be dependent on the probe sequence. But then another thought ruins everything.
What if the element was inserted in p probes and now I am trying to delete this element. But before this I had already deleted few elements having the same hash code and were a part of insertion in probes less than p.
In this case I reach to a stage where I see a slot empty in the hash table but I am not sure if the element I am trying to delete is already deleted or is at some other location as a result of probing.
I also found that once I delete an element I must mark this slot with some special indicator to inform that it is available, but this doesn't solve my problem of being uncertain about the element which I am willing to delete.
Could anyone please suggest how to find the cost in such cases?
Is the approach going to vary if it is non-linear probing?
The standard approach is "lookup the element, mark as deleted". Marking obviously has O(1) cost, so the total operation cost is the same as just lookup: O(1) expected. It can be as high as O(n) in degenerate cases (e.g. all elements have the same hash). O(1) expected is all we can say theoretically.
About the load factor. The higher the load factor (ratio of number of occupied buckets to the total number), the larger is the expected factor (but this doesn't change the theoretical O cost). Note that in this case load factor includes number of both present in the table elements plus the number of buckets that got marked as deleted previously.
Other probing kinds (e.g. quadratic) don't change the theoretical cost, but may alter the expected constant factor or its variance. If you look at "fallback" sequences, in linear ordering the sequences of different buckets overlap. This means that if for some bucket the sequence is long, for adjacent buckets it will also be long. E.g.: if buckets 4 to 10 are occupied, sequence for bucket #4 is 7 bucket long (4, 5, 6, ..., 10), for #5 it's 6 and so on. For quadratic probing this is not the case.
However, linear probing has the benefit of better memory-cache behavior, since you check memory cells close to each other. In practice, though, for quadratic probing fallback sequences are rarely long enough for this to matter.
Finally, in linear probing case, it is possible to work without deleted mark, but for this you'd have to complicate deleting procedure considerably (still O(1) expected, though, but with much higher constant factor). Whether it is worth it has to be decided with actual profiling; for example, this simplifies inserting somewhat and lookup a bit. For a C++ implementation this would have the downside that erase() would invalidate iterators, though.

error correction code upper bound

If I want to send a d-bit packet and add another r bits for error correction code (d>r)
how many errors I can find and correct at most?
You have 2^d different kinds of packets of length d bits you want to send. Adding your r bits to them makes them into codewords of length d+r, so now you have 2^d possible codewords you could send. The receiver could get 2^(d+r) different received words(codewords with possible errors). The question then becomes, how do you map those 2^(d+r) received words to the 2^d codewords?
This comes down to the minimum distance of the code. That is, for each pair of codewords, find the number of bits where they differ, then take the smallest of those values.
Let's say you had a minimum distance of 3. You received a word and you notice that it isn't one of the codewords. That is, there's an error. So, for the lack of a better decoding algorithm, you flip the first bit, and see if its a codeword. If it isn't you flip it back and flip the next one. Eventually, you get a codeword. Since all codewords differ in 3 positions, you know this codeword is the "closest" to the received word, since you would have to flip 2 bits in the received word to get to another codeword. If you didn't get a codeword from flipping just one bit at a time, you can't figure out where the errors are, since there are multiple codewords you could get to by flipping two bits, but you know there are at least two errors.
This leads to the general principle that for a minimum distance md, you can detect md-1 errors and correct floor((md-1)/2) errors. Calculating the minimum distance depends on the details of how you generate the codewords, otherwise known as the code. There are various bounds you can use to figure out an upper limit on md based on d and (d+r).
Paul mentioned the Hamming Code, which is a good example. It achieves the Hamming bound. For the (7,4) Hamming code, you have 4 bit messages and 7 bit codewords, and you achieve a minimum distance of 3. Obviously*, you are never going to get a minimum distance greater than the number of bits you are adding so this is the very best you can do. Don't get too used to this though. The Hamming code is one of the few examples of a non-trivial perfect code, and most of those have a minimum distance that is less than the number of bits you add.
*It's not really obvious, but I'm pretty sure it's true for non-trivial error correcting codes. Adding one parity bit gets you a minimum distance of two, allowing you to detect an error. The code consisting of {000,111} gets you a minimum distance of 3 by adding just 2 bits, but it's trivial.
You should probably read the wikipedia page on this:
http://en.wikipedia.org/wiki/Error_detection_and_correction
It sounds like you specifically want a Hamming Code:
http://en.wikipedia.org/wiki/Hamming_code#General_algorithm
Using that scheme, you can look up some example values from the linked table.

Resources