Statistical probability of N contiguous true-bits in a sequence of bits? - math

Let's assume I have an N-bit stream of generated bits. (In my case 64 kilobits.)
What's the probability of finding a sequence of X "all true" bits within a stream of N bits, where X = 2 to 16, N = 16 to 1,000,000, and X < N?
For example:
If N=16 and X=5, what's the likelihood of finding 11111 within a 16-bit number?
Like this pseudo-code:
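int N = 1 << 16;  // 64K values: every 16-bit number
int X = 5;        // length of the run of 1s to look for
int Count = 0;
for (int i = 0; i < N; i++) {
    // ContiguousBitsDiscovered(i, X): number of runs of exactly X ones in i
    int ThisCount = ContiguousBitsDiscovered(i, X);
    Count += ThisCount;
}
return Count;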
That is, if we ran an integer in a loop from 0 to 64K-1, how many times would 11111 appear within those numbers?
Extra rule: 1111110000000000 doesn't count, because it has 6 true values in a row, not 5. So:
1111110000000000 = 0x // because it's 6 contiguous true bits, not 5.
1111100000000000 = 1x
0111110000000000 = 1x
0011111000000000 = 1x
1111101111100000 = 2x
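A minimal Python sketch of the loop above, to make the counting rule concrete. `ContiguousBitsDiscovered` isn't defined in the question, so `contiguous_bits_discovered` below is a guess at what it does, written to agree with the examples just given (a run only counts if it has exactly X ones). Brute force like this is only feasible for small N.

```python
def contiguous_bits_discovered(value: int, x: int) -> int:
    """Count runs of exactly x consecutive 1-bits in value; longer runs don't count."""
    count = 0
    run = 0                      # length of the current run of 1s
    while value:
        if value & 1:
            run += 1
        else:
            if run == x:         # a run just ended and had exactly x ones
                count += 1
            run = 0
        value >>= 1
    if run == x:                 # the topmost run ends at the most significant 1
        count += 1
    return count


def total_runs(n_bits: int, x: int) -> int:
    """The question's loop: add up the counts over every n_bits-bit number."""
    return sum(contiguous_bits_discovered(i, x) for i in range(1 << n_bits))
```

`total_runs(16, 5)` walks all 65,536 values; for N = 65,536 bits the same enumeration would need 2^65,536 iterations, which is the whole problem.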
I'm trying to do some work involving physically-based random-number generation, and detecting "how random" the numbers are. That's what this is for.
...
This would be easy to solve if N were about 32 or less: I could just "run a loop" from 0 to 2^32-1 (about 4 billion values), then count how many contiguous bits were detected once the loop was completed. Then I could store the number and use it later.
Considering that X ranges from 2 to 16, I'd literally only need to store 15 numbers, each less than 32 bits! (if N=32)!
BUT in my case N = 65,536, so I'd need to run a loop for 2^65,536 iterations. Basically impossible :)
There's no way to "experimentally calculate the values for a given X, if N = 65,536". So I need maths, basically.

Fix X and N, obviously with X < N. You have 2^N possible combinations of 0's and 1's in your bit number, and N-X+1 possible positions for a block of X 1's (in this part I'm only looking for 1's together) in your bit number. Consider for example N = 5 and X = 2: 01011 is a possible valid bit number, and once the last two characters (the last two 1's) and the 0 in front of them are fixed, you have 2^2 possible combinations for the rest of the number. Then you have two cases:
Border case: your block of X 1's is at one of the two ends, so only one neighboring 0 is fixed, and you have (2^(N-X-1))*2 possible combinations.
Inner case: the block has a 0 on both sides, which leaves (2^(N-X-2))*(N-X-1) possible combinations.
So, the probability is (border + inner)/2^N.
Examples:
1) N = 3, X = 2: the probability is 2/2^3
2) N = 4, X = 2: the probability is 5/16
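The border/inner formula can be checked against brute force for small N. In this sketch (`andrex_count` and `brute_force_occurrences` are illustrative names), the brute force counts every run of exactly X ones in every N-bit string, so a string with two separate runs contributes twice. The two agree, which fits the derivation: there are 2 border positions with 2^(N-X-1) strings each and N-X-1 inner positions with 2^(N-X-2) strings each. Divided by 2^N, the formula is therefore the expected number of runs per string (the quantity the question's Count loop totals) rather than the probability of seeing at least one run.

```python
def andrex_count(n: int, x: int) -> int:
    """Border + inner count from the answer above (requires x < n)."""
    border = 2 * 2 ** (n - x - 1)                               # run touches either end
    inner = (n - x - 1) * 2 ** (n - x - 2) if n - x >= 2 else 0  # run has a 0 on both sides
    return border + inner


def brute_force_occurrences(n: int, x: int) -> int:
    """Total number of runs of exactly x ones over all n-bit strings."""
    total = 0
    for v in range(1 << n):
        bits = format(v, f"0{n}b")
        total += sum(1 for run in bits.split("0") if len(run) == x)
    return total
```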

A bit brute force, but I'd do something like this to avoid getting mired in statistics theory:
Multiply the probabilities (1 bit = 0.5, 2 bits = 0.5*0.5, etc.) while looping.
Keep track of each X, and when you have the product of X bits, flip it and continue.
Start with a small example (N = 5, X = 1 to 5) to make sure you get the edge cases right, and compare it to a brute-force approach.
This can probably be expressed as something like Sum over n = 1..65536 of (Sum over x = 1..16 of 0.5^x), but edge cases need to be taken into account (e.g. 7 bits doesn't fit, so discard that probability), which gives me a bit of a headache. :-)
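One way to make "multiply the probabilities while looping, keeping track of each X" concrete is a small dynamic program over the length of the current run of 1s. This is a swapped-in technique, not the answer's exact recipe: a sketch assuming independent fair bits (probability 0.5 each), which returns the probability that an N-bit stream contains at least one run of exactly X ones (so 1111110000000000 doesn't count for X = 5). It takes about N*(X+2) steps, so N = 65,536 is practical.

```python
def prob_exact_run(n: int, x: int) -> float:
    """P(a random n-bit string contains at least one run of exactly x ones)."""
    # p[r]: probability that no exact-x run has finished yet and the current
    # trailing run of ones has length r; r == x + 1 stands for "longer than x"
    p = [0.0] * (x + 2)
    p[0] = 1.0
    for _ in range(n):
        nxt = [0.0] * (x + 2)
        for r, w in enumerate(p):
            if r != x:
                nxt[0] += 0.5 * w              # a 0 ends a run that wasn't exactly x
            # a 0 right after exactly x ones completes a run: that mass leaves the table
            nxt[min(r + 1, x + 1)] += 0.5 * w  # a 1 extends the current run
        p = nxt
    # a run of exactly x ending at the last bit also counts as found
    return 1.0 - sum(w for r, w in enumerate(p) if r != x)
```

`prob_exact_run(65536, 5)` is the question's case; for small N it can be cross-checked against brute-force enumeration.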

Andrex's answer is plain wrong, as it counts some combinations several times.
For example, consider the case N=3, X=1. The combination 101 happens only 1/2^3 of the time, but the border calculation counts it twice: once as the sequence starting with 10 and again as the sequence ending with 01.
His calculation gives a (1+4)/8 probability, whereas there are only 4 unique sequences that have at least one run of a single 1 (as opposed to cases such as 011):
001
010
100
101
and so the probability is 4/8.
To count the number of unique sequences you need to account for sequences that can appear multiple times in the same number. This can happen whenever X is smaller than N/2. Not sure how you can count them, though.
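The unique sequences can be counted exactly with run-length bookkeeping (a dynamic program, not something from the answers here): count the N-bit strings in which no run of exactly X ones ever completes, and subtract that from 2^N. A sketch with an illustrative function name; for N=3, X=1 it gives the 4 sequences listed above. Python's big integers keep it exact, but for very large N the counts have tens of thousands of digits, so working with probabilities (multiplying by 0.5 per bit) is more practical there.

```python
def count_strings_with_exact_run(n: int, x: int) -> int:
    """How many n-bit strings contain at least one run of exactly x ones."""
    # ways[r]: prefixes in which no run of exactly x ones has completed yet and
    # whose trailing run of ones has length r (r == x + 1 means "longer than x")
    ways = [0] * (x + 2)
    ways[0] = 1
    for _ in range(n):
        nxt = [0] * (x + 2)
        for r, w in enumerate(ways):
            if r != x:
                nxt[0] += w                  # appending 0 ends a run that wasn't exactly x
            # appending 0 right after exactly x ones completes a run: those prefixes drop out
            nxt[min(r + 1, x + 1)] += w      # appending 1 extends the run
        ways = nxt
    # prefixes still ending in exactly x ones complete a run at the last bit
    without = sum(w for r, w in enumerate(ways) if r != x)
    return 2 ** n - without
```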

Related

Check whether a number can be expressed as sum of x powers of two

Is there a bit trick to check whether a number can be expressed as sum of x powers of 2?
Example: For x=3 n=21, the numbers are 16, 4, and 1. If n=30, it should be false, because there are no 3 powers of two to represent 30.
For a number n …
… the minimum x is the number of 1-bits in n. This number is called popcount(n).
Example: The binary number 0b1011 needs at least popcount(0b1011)=3 powers of two to be summed up (0b1000+0b0010+0b0001).
… the maximum x is n. Because 1 is a power of two you can add 1 n times to get n.
Now comes the hard question. What if x is between popcount(n) and n?
As it turns out, all of these x are possible. To build a sum of x powers of two …
start at the shortest sum (the binary representation of n)
If you have fewer than x addends, split any addend that is bigger than 1 into two equal addends, increasing the number of addends by one. This can be done until you arrive at x=n.
Example: Can 11=0b1011 be expressed as a sum of x=7 powers of two?
Yes, because popcount(n)=3 <= x=7 <= n=11.
To build a sum with x=7 powers of two we use
11 = 0b1011 = 0b1000+0b10+0b1 | only 3 addends, so split one
= (0b100+0b100)+0b10+0b1 | only 4 addends, so split another one
= ((0b10+0b10)+0b100)+0b10+0b1 | only 5 addends, so split another one
= (((0b1+0b1)+0b10)+0b100)+0b10+0b1 | only 6 addends, so split another one
= (((0b1+0b1)+(0b1+0b1))+0b100)+0b10+0b1 | 7 addends, done
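The splitting procedure is easy to mechanize. A sketch in Python (the name `split_into_powers` is illustrative): start from the set bits of n and keep halving an addend larger than 1 until there are x addends. For n = 11, x = 7 it ends with the same addends as the worked example, 1+1+1+1+4+2+1.

```python
def split_into_powers(n: int, x: int) -> list[int]:
    """Write n as a sum of exactly x powers of two by repeated splitting."""
    if not bin(n).count("1") <= x <= n:
        raise ValueError("need popcount(n) <= x <= n")
    # shortest sum: one addend per set bit of n, largest first
    addends = [1 << i for i in range(n.bit_length()) if (n >> i) & 1][::-1]
    while len(addends) < x:
        # take the first addend bigger than 1 and replace it by two halves
        i = next(i for i, a in enumerate(addends) if a > 1)
        half = addends[i] // 2
        addends[i:i + 1] = [half, half]
    return addends
```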
Implementation
To implement the check »can n be expressed as a sum of x powers of two?« use
isSumOfXPowersOfTwo(n, x) {
    return x <= n && x >= popcount(n);
}
For efficient bit-twiddling implementations of popcount see this question. Some processors even have an instruction for that.
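In Python the same check is a one-liner. `bin(n).count("1")` is a portable popcount; on Python 3.10+ `int.bit_count()` does the same thing. The function names below are illustrative.

```python
def popcount(n: int) -> int:
    """Number of 1-bits in a non-negative integer."""
    return bin(n).count("1")


def is_sum_of_x_powers_of_two(n: int, x: int) -> bool:
    """True if n can be written as a sum of exactly x powers of two."""
    return popcount(n) <= x <= n
```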

How to find the number of ways that a postage of n cents can be made with 4, 6 and 10 cents?

For example,
n = 4 (4x1) 1 way
n = 10 (4x1, 6x1) (10x1) 2 ways
Is there an equation that can express the number of ways?
You have used the recurrence-relation tag, and yes, it is possible to use a recurrence to calculate the number of ways.
P(N) = P(N-10) + P(N-6) + P(N-4)
P(0) = 1, P(N) = 0 for N < 0
Explanation: you can get the sum N from an (N-10)-cent sum plus a 10-cent coin, and so on.
For large values of N the plain recursive algorithm will take too long, so one could build a dynamic programming algorithm to speed up the calculations (DP reuses the values already calculated for smaller sums).
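A bottom-up version of P(N) in Python, as a sketch (the name `ordered_ways` is illustrative). Note that this recurrence counts orderings separately: 4+6 and 6+4 are different paths to 10, so it returns 3 for n = 10, whereas the question lists 2 ways.

```python
def ordered_ways(n: int, coins=(4, 6, 10)) -> int:
    """P(N) = sum of P(N - c) over the coins, with P(0) = 1 and P(<0) = 0."""
    p = [0] * (n + 1)
    p[0] = 1
    for total in range(1, n + 1):
        # every way to reach total ends with some last coin c
        p[total] = sum(p[total - c] for c in coins if c <= total)
    return p[n]
```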
Suppose you have a list of denominations. In your case it is A = [4,6,10]. So suppose you have the following things:
A = [4,6,10]
Length of list A = N
Sum = K
The problem can be written as:
# Given the list of denominations, its length and the sum:
P(A,N,K) = 1 if K = 0,
           0 if K < 0 or N = 0,
           P(A,N-1,K) + P(A,N,K-A[N]) otherwise   # A[N] -> Nth element of list (1-indexed); keeping N in the second term lets A[N] be used again
As we can see, sub-problems get reused, so DP will work wonderfully.
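A bottom-up coin-change table in the spirit of P(A, N, K), as a sketch assuming each denomination may be used any number of times (the name `unordered_ways` is illustrative). Processing one denomination at a time means every combination is built in a fixed coin order, so 4+6 and 6+4 are counted once, matching the question's examples.

```python
def unordered_ways(k: int, coins=(4, 6, 10)) -> int:
    """Number of multisets of coins summing to k (each denomination reusable)."""
    ways = [0] * (k + 1)
    ways[0] = 1
    for c in coins:                      # one denomination at a time...
        for total in range(c, k + 1):    # ...so each combination is counted once
            ways[total] += ways[total - c]
    return ways[k]
```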

What is the fastest way to find if a large integer is power of ten?

I could just use division and modulus in a loop, but this is slow for really large integers. The number is stored in base two, and may be as large as 2^8192. I only need to know if it is a power of ten, so I figure there may be a shortcut (other than using a lookup table).
If your number x is a power of ten then
x = 10^y
for some integer y, which means that
x = (2^y)(5^y)
So, shift the integer right until there are no more trailing zeroes (this should be a very low-cost operation) and count the number of bits shifted (call this k). Now check whether the remaining number is 5^k. If it is, then your original number is a power of 10; otherwise, it's not. Since 2 and 5 are both prime, this will always work.
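A Python sketch of the shift-and-compare test (Python integers are arbitrary precision, so values around 2^8192 are fine; `is_power_of_ten` is an illustrative name). `x & -x` isolates the lowest set bit, so its bit length minus one is the number of trailing zero bits k.

```python
def is_power_of_ten(x: int) -> bool:
    """True if x == 10**y for some integer y >= 0, using x == 2**k * 5**k."""
    if x <= 0:
        return False
    k = (x & -x).bit_length() - 1    # number of trailing zero bits
    return (x >> k) == 5 ** k        # what remains must be exactly 5**k
```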
Let's say that X is your input value, and we start with the assumption.
X = 10 ^ Something
Where Something is an Integer.
So we say the following:
log10(X) = Something.
So if X is a power of 10, then Something will be an Integer.
Example
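int x = 10000;
double test = Math.log10(x);
// Only for values that fit in a double: rounding can misreport numbers very
// close to a power of ten, and 2^8192 is far outside the range of a double.
if (test == (int) test)
    System.out.println("Is a power of 10");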

How do I generate 2 random prime numbers that, when multiplied, yield a number with X bits? (X given as argument)

I lack the math skills to make this function.
Basically, I want to return 2 random prime numbers that, when multiplied, yield a number with X bits, X given as argument.
for example:
if I say my X is 3 then a possible solution would be:
p = 2 and q = 3 because 2 * 3 = 6 (110 has 3 bits).
A problem with this statement is that it starts by asking for two "random" prime numbers. Without any explicit statement of the distribution of the required random primes, we are already stuck. (This is the beginning of a classic paradox, where we are asked to generate a "random" integer.)
But suppose that we change the statement to finding any two arbitrary primes, that yield the desired product with a given number of bits x. The answer is trivial.
The set of numbers that have exactly x bits in their binary representation is the closed set of integers [2^(x-1), 2^x-1], i.e. the half-open interval [2^(x-1), 2^x).
Choose an arbitrary prime number that is less than or equal to (2^x-1)/2. Call it p1.
Next, choose a second prime number that lies in the interval [2^(x-1)/p1, (2^x-1)/p1]. Call it p2.
It must be true that p1*p2 will be in the desired interval.
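The recipe translates into a short Python sketch. Assumptions not in the answer: p1 is drawn uniformly from the primes up to (2^x-1)/2 (the answer leaves the distribution open), p2 uniformly from the primes in the allowed interval, and primality is checked by trial division, which is only reasonable for small x (real key generation uses probabilistic tests and far larger sizes). If no prime falls in p2's interval, it draws another p1; p1 = 2 always has a partner for x >= 3 by Bertrand's postulate, so a workable p1 always exists.

```python
import random


def is_prime(n: int) -> bool:
    """Trial division: fine for the small sizes used here, far too slow for real key sizes."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def two_primes_with_bits(x: int, rng=random) -> tuple[int, int]:
    """Return primes p1, p2 whose product has exactly x bits (x >= 3)."""
    lo, hi = 1 << (x - 1), (1 << x) - 1          # x-bit numbers: [2^(x-1), 2^x - 1]
    p1_choices = [p for p in range(2, hi // 2 + 1) if is_prime(p)]
    while True:
        p1 = rng.choice(p1_choices)
        # p2 must satisfy lo <= p1 * p2 <= hi
        p2_low = -(-lo // p1)                      # ceil(lo / p1)
        p2_choices = [q for q in range(p2_low, hi // p1 + 1) if is_prime(q)]
        if p2_choices:                             # otherwise draw another p1
            return p1, rng.choice(p2_choices)
```

For example, `two_primes_with_bits(10, random.Random(0))` returns a reproducible pair whose product lies in [512, 1023].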
For example, given x = 10, so the product must lie in the interval [512,1023], the set of integers with exactly 10 bits. (Note, there are apparently 147 such numbers in that interval, with exactly two prime factors.)
Step 1:
Choose p1 as any prime no larger than 1023/2 = 511.5. I'll pick p1 = 137. Then the second prime factor must be a prime that lies in the interval
[512 1023]/137
ans =
3.7372 7.4672
thus either 5 or 7.
dec2bin(137*[5 7])
ans =
1010101101
1110111111
If you know the number of bits x, you can generate a random number r with 2^(x-1) <= r < 2^x. Then take the square root of r and find the closest primes on either side of it. Multiplying them together will, in most cases, get you a number in the correct range. If it's too high, you can take the two primes directly below it instead.
pseudocode:
x = bits
primelist[] = makeprimelist()
r = randnum between 2^(x-1) and 2^x - 1
n = findposition(primelist, sqrt(r))   // index of the largest prime <= sqrt(r)
do
    result = primelist[n]*primelist[n+1]
    n--
while result >= 2^x
Note that numbers generated this way will always have '1' as the most significant bit, so it would be possible to generate a random number of x-1 bits and just tack the 1 onto the front.

Geometrical progression with any number row

I can have any number row which consists of 2 to 10 numbers, and from this row I have to get a geometric progression.
For example:
Given number row: 125 5 625 I have to get answer 5. Row: 128 8 512 I have to get answer 4.
Can you give me a hand? I don't ask for a program, just a hint, I want to understand it by myself and write a code by myself, but damn, I have been thinking the whole day and couldn't figure this out.
Thank you.
DON'T WRITE THE WHOLE PROGRAM!
Guys, you don't get it, I can't simply make a division. I actually have to get the geometric progression and show all the numbers. In the 128 8 512 row all numbers would be: 8 32 128 512
Seth's answer is the right one. I'm leaving this answer here to help elaborate on why the answer to 128 8 512 is 4 because people seem to be having trouble with that.
A geometric progression's elements can be written in the form c*b^n, where b is the number you're looking for (b is also necessarily greater than 1), c is a constant and n is an arbitrary non-negative integer.
So the best bet is to start with the smallest number, factorize it and look at all possible solutions to writing it in the c*b^n form, then using that b on the remaining numbers. Return the largest result that works.
So for your examples:
125 5 625
Start with 5. 5 is prime, so it can be written in only one way: 5 = 1*5^1. So your b is 5. You can stop now, assuming you know the row is in fact geometric. If you need to determine whether it's geometric then test that b on the remaining numbers.
128 8 512
8 can be written in more than one way: 8 = 1*8^1, 8 = 2*2^2, 8 = 2*4^1, 8 = 4*2^1. So you have three possible values for b, with a few different options for c. Try the biggest first. 8 doesn't work. Try 4. It works! 128 = 2*4^3 and 512 = 2*4^4. So b is 4 and c is 2.
3 15 375
This one is a bit mean because the first number is prime but isn't b, it's c. So if your first b-candidate doesn't work on the remaining numbers, you have to look at the next smallest number and decompose it. So here you'd decompose 15: 15 = 15*?^0 (degenerate case), 15 = 3*5^1, 15 = 5*3^1, 15 = 1*15^1. The answer is 5, and 3 = 3*5^0, so it works out.
Edit: I think this should be correct now.
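A Python sketch of this approach for small positive integers (names are illustrative): for the smallest number m, try every way of writing m = c*b^n with n >= 1, test that c and b fit the whole row, keep the largest working b, and fall back to the next smallest number if nothing fits (the 3 15 375 case). `progression_terms` then lists the full progression the question asks for. The divisor search is brute force, so it is only meant for modest values.

```python
def is_power_of(value: int, b: int) -> bool:
    """True if value == b**n for some n >= 0 (value >= 1, b >= 2)."""
    while value % b == 0:
        value //= b
    return value == 1


def fits(row, b: int, c: int) -> bool:
    """Every number in the row has the form c * b**n."""
    return all(v % c == 0 and is_power_of(v // c, b) for v in row)


def largest_ratio(row):
    """Largest b such that the row is c * b**n for a single c; None if nothing fits."""
    for m in sorted(row):                  # smallest number first, then the next one
        best = None
        for b in range(2, m + 1):          # brute-force decomposition of m
            power = b
            while m % power == 0:          # m = c * b**n with n >= 1
                if fits(row, b, m // power):
                    best = b               # b only grows, so this ends as the largest
                    break
                power *= b
        if best is not None:
            return best
    return None


def progression_terms(row, b: int):
    """All terms from the smallest to the largest number, stepping by b."""
    terms = [min(row)]
    while terms[-1] < max(row):
        terms.append(terms[-1] * b)
    return terms
```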
This algorithm does not rely on factoring, only on the Euclidean Algorithm and a close variant thereof. This makes it slightly more mathematically sophisticated than a solution that uses factoring, but it will be MUCH faster. If you understand the Euclidean Algorithm and logarithms, the math should not be a problem.
(1) Sort the set of numbers. You have numbers of the form ab^{n1} < ... < ab^{nk}.
Example: (3 * 2, 3*2^5, 3*2^7, 3*2^13)
(2) Form a new list whose nth element is the (n+1)st element of the sorted list divided by the nth. You now have b^{n2 - n1}, b^{n3 - n2}, ..., b^{nk - n(k-1)}.
(Continued) Example: (2^4, 2^2, 2^6)
Define d_i = n_(i+1) - n_i (do not program this -- you couldn't even if you wanted to, since the n_i are unknown -- this is just to explain how the program works).
(Continued) Example: d_1 = 4, d_2 = 2, d_3 = 6
Note that in our example problem, we're free to take either (a = 3, b = 2) or (a = 3/2, b = 4). The bottom line is any power of the "real" b that divides all entries in the list from step (2) is a correct answer. It follows that we can raise b to any power that divides all the d_i (in this case any power that divides 4, 2, and 6). The problem is we know neither b nor the d_i. But if we let m = gcd(d_1, ... d_(k-1)), then we CAN find b^m, which is sufficient.
NOTE: Given b^i and b^j, we can find b^gcd(i, j) using:
log(b^i) / log(b^j) = (i log b) / (j log b) = i/j
This permits us to use a modified version of the Euclidean Algorithm to find b^gcd(i, j). The "action" is all in the exponents: addition has been replaced by multiplication, multiplication with exponentiation, and (consequently) quotients with logarithms:
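import math
def power_remainder(a, b):
    # q = floor(log(a) / log(b)); correct the float estimate with exact integer checks
    q = int(math.log(a) / math.log(b))
    while b ** (q + 1) <= a:
        q += 1
    while q > 0 and b ** q > a:
        q -= 1
    return a // (b ** q)   # integer division keeps the result exact
def power_gcd(a, b):
    # Euclid's algorithm on the exponents: r**gcd(i, j) for a = r**i, b = r**j
    while b != 1:
        a, b = b, power_remainder(a, b)
    return a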
(3) Since all the elements of the original set differ by powers of r = b^gcd(d_1, ..., d_(k-1)), they are all of the form cr^n, as desired. However, c may not be an integer. Let me know if this is a problem.
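Steps (1) to (3) put together, as a Python sketch (`common_ratio` is an illustrative name; it assumes the row really has the form c*b^n with distinct positive integers). It sorts the row, takes the quotients of neighbors, and folds them with `power_gcd`. The helpers are repeated so the block runs on its own, with the log-based quotient nudged by exact integer comparisons so floating-point rounding can't shift it.

```python
import math


def power_remainder(a: int, b: int) -> int:
    """a / b**q with q = floor(log a / log b), for a, b powers of a common base."""
    q = int(math.log(a) / math.log(b))
    # nudge the float estimate so rounding can't shift q by one
    while b ** (q + 1) <= a:
        q += 1
    while q > 0 and b ** q > a:
        q -= 1
    return a // b ** q


def power_gcd(a: int, b: int) -> int:
    """Euclid's algorithm on exponents: r**gcd(i, j) for a = r**i, b = r**j."""
    while b != 1:
        a, b = b, power_remainder(a, b)
    return a


def common_ratio(row) -> int:
    """b**gcd(d_1, ..., d_(k-1)) for a row of distinct positive integers c*b**n."""
    nums = sorted(row)                                          # step (1)
    quotients = [hi // lo for lo, hi in zip(nums, nums[1:])]    # step (2): b**d_i
    r = quotients[0]
    for q in quotients[1:]:
        r = power_gcd(r, q)                                     # step (3)
    return r
```

On the example above, `common_ratio([6, 96, 384, 24576])` folds the quotients 16, 4 and 64 down to 4, i.e. b^2 with b = 2.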
The simplest approach would be to factorize the numbers and find the greatest number they have in common. But be careful, factorization has an exponential complexity so it might stop working if you get big numbers in the row.
What you want is to know the Greatest Common Divisor of all numbers in a row.
One method is to check whether they can all be divided by the smallest number in the row.
If not, try half the smallest number in the row.
Then keep going down until you find a number that divides them all or your divisor equals 1.
Seth's answer is not correct: applying that solution does not solve the row 128 8 2048, for example (2*4^x). You get:
8 128 2048 =>
16 16 =>
GCD = 16
It is true that the solution is a factor of this result, but you will need to factor it and check one by one which is the correct answer. In this case you will need to check the factors in reverse order, 16, 8, 4, 2, until you see that 4 matches all the conditions.
