Calculating the right number of bits in a bloom filter - math

I'm trying to make a configurable bloom filter. In the constructor you set the predicted necessary capacity of the filter (n), the desired error rate (p), and a list of hash functions (of size k).
According to Wikipedia, the following relation holds (m being the number of bits):
p = (1 - k * n / m) ** k
Since I get p, n and k as parameters, I need to solve for m; I get the following:
m = k * n / (1 - p ** (1 / k))
However, there are a few things that make me think I did something wrong. For starters, p ** (1 / k) will tend towards 1 for a large enough k, which means the whole fraction is ill defined (because you can conceivably divide by 0).
Another thing you may notice is that as p (the allowed maximum error rate) grows, so does m, which is totally backwards.
Where did I go wrong?

You solved your equation correctly, but you started from the wrong formula. Note that Wikipedia states:
The probability of all of them being 1, which would cause
the algorithm to erroneously claim that the element is in
the set, is often given as:
p ~= (1 - (1 - 1 / m) ** (k * n)) ** k ~= (1 - Exp(-k * n / m)) ** k
This is very different from what you've stated:
p = (1 - k * n / m) ** k
So what you really want to start with is
p = (1 - (1 - 1 / m) ** (k * n)) ** k
I worked this out to be
(1 - 1 / m) ** (k * n) = 1 - p ** (1 / k)
1 - 1 / m = (1 - p ** (1 / k)) ** (1 / (k * n))
m - 1 = m * (1 - p ** (1 / k)) ** (1 / (k * n))
m - m * (1 - p ** (1 / k)) ** (1 / (k * n)) = 1
m * (1 - (1 - p ** (1 / k)) ** (1 / (k * n))) = 1
m = 1 / (1 - (1 - p ** (1 / k)) ** (1 / (k * n)))
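As a quick sanity check of the result above, here is a minimal Python sketch. The function names are illustrative only, and `bits_approx` is obtained by solving the exponential form quoted from Wikipedia for m, which is a rearrangement added here rather than part of the derivation above.

```python
import math

def bits_exact(n, p, k):
    """m solved from p = (1 - (1 - 1/m) ** (k*n)) ** k."""
    return 1.0 / (1.0 - (1.0 - p ** (1.0 / k)) ** (1.0 / (k * n)))

def bits_approx(n, p, k):
    """m solved from the approximation p = (1 - exp(-k*n/m)) ** k."""
    return -k * n / math.log(1.0 - p ** (1.0 / k))

def false_positive_rate(m, n, k):
    """Plug m back into the formula to recover p."""
    return (1.0 - (1.0 - 1.0 / m) ** (k * n)) ** k

if __name__ == "__main__":
    n, p, k = 1000, 0.01, 7
    m = bits_exact(n, p, k)
    print(math.ceil(m), math.ceil(bits_approx(n, p, k)))
    print(false_positive_rate(m, n, k))
```

With this formula m shrinks as the allowed error rate p grows, which is the behaviour the question expected. In a real filter you would round m up to a whole number of bits.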

Related

Which one is faster? O(2^n) or O(n!)

I'm studying algorithm complexity and I am trying to figure out this one question that runs in my mind- is O(n!) faster than O(2^n) or is it the opposite way around?
O(2^n) is 2 * 2 * 2 * ... where O(n!) is 1 * 2 * 3 * 4 * ...
O(n!) will quickly grow much larger - so O(2^n) is faster.
For example: 2^10 = 1024 and 10! = 3628800
You can try working with Stirling's approximation for n!
https://en.wikipedia.org/wiki/Stirling%27s_approximation
n! = (n / e)^n * sqrt(2 * Pi * n) * (1 + O(1 / n))
Now, let's compare
O(n!) <=> O(2^n)
To find out which symbol (<, = or >) is the right one, let's compute the limit
lim (n! / 2^n) =
n -> +inf
lim (n / e)^n * sqrt(2 * pi * n) / 2^n >=
n -> +inf
lim n^n / (2 * e)^n >= // when n > 4 * e
n -> +inf
lim (4 * e)^n / (2 * e)^n =
n -> +inf
lim 2^n = +inf
n -> +inf
So
lim (n! / 2^n) = +inf
n -> +inf
which means that O(n!) > O(2^n)
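The limit can also be illustrated numerically. The short Python sketch below (the helper name `ratio` is just for illustration) prints n! / 2^n for a few values of n. Each step multiplies the ratio by (n + 1) / 2, so it grows without bound once n exceeds 1.

```python
import math

def ratio(n):
    """Return n! / 2**n as a float."""
    return math.factorial(n) / 2 ** n

if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5, 10, 20):
        print(n, ratio(n))
```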

Solve sigma example

I have a sigma example:
And I don't have any idea how to solve it. Can you help me with the code, please?
(Code pascal, java or c++)
Expanding the inner term, you get m^3 - 3m^2n + 3mn^2 - n^3, which yields a double summation of m^5, -3m^4n, 3m^3n^2 and -m^2n^3. These summations are separable, meaning that they are the product of a sum on m of a power of m and a sum on n of a power of n.
You can evaluate these sums by means of the Faulhaber formulas up to degree five, which are polynomial expressions. Evaluate them by Horner's method.
int F1(int n) { return (n + 1) * n / 2; }
int F2(int n) { return ((2 * n + 3) * n + 1) * n / 6; }
int F3(int n) { return ((n + 2) * n + 1) * n * n / 4; }
...
int S= F5(20) * 30 - 3 * F4(20) * F1(30) + 3 * F3(20) * F2(30) - F2(20) * F3(30);
Using the obvious method of summation, the inner loop will evaluate 30 cubes of a difference, for a total of 60 additions and 60 multiplications, and the outer loop will repeat this 20 times, with extra multiplications and additions, for a total of 1220 + and 1240 *.
Compare to the above method, performing 18 +, 30 * and 7 divisions in total (independently of the values of m and n).
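Here is a hedged Python sketch that checks the closed form against the obvious double loop. Several things are assumptions, because the original expression is not shown: the summand is taken to be m^2 * (m - n)^3 (inferred from the expansion above), both indices are assumed to start at 1, and the upper bounds 20 and 30 are read off the final line of the code. F4 and F5 are the standard Faulhaber closed forms, filled in here because the answer elides them.

```python
def F1(n): return (n + 1) * n // 2
def F2(n): return ((2 * n + 3) * n + 1) * n // 6
def F3(n): return ((n + 2) * n + 1) * n * n // 4
# Standard closed forms for sums of 4th and 5th powers (not from the answer).
def F4(n): return n * (n + 1) * (2 * n + 1) * (3 * n * n + 3 * n - 1) // 30
def F5(n): return n * n * (n + 1) * (n + 1) * (2 * n * n + 2 * n - 1) // 12

def closed_form(M, N):
    """Sum over m=1..M, n=1..N of m^2 * (m - n)^3 via separable sums."""
    return (F5(M) * N - 3 * F4(M) * F1(N)
            + 3 * F3(M) * F2(N) - F2(M) * F3(N))

def brute_force(M, N):
    """The same sum evaluated term by term (assumed summand and bounds)."""
    return sum(m * m * (m - n) ** 3
               for m in range(1, M + 1) for n in range(1, N + 1))

if __name__ == "__main__":
    print(closed_form(20, 30), brute_force(20, 30))
```

Python integers do not overflow, so the sketch sidesteps the question of whether `int` is wide enough for larger bounds in C++ or Java.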

How to find n in ((r^n-1)/(r-1))%p = s, if p is prime?

I thought of reducing it to this, but couldn't come up to any conclusion.
((r^n-1)/(r-1))%p == ((r^n-1)*(invmod(r-1,p)))%p.
It's also given that n should lie in [1, p) if possible, and that the values r^i for i in [1, p) are all distinct and cover every number in [1, p).
Please help !
I will assume in this answer that we are talking about r^(n-1)
x % p = s
means that there exists an integer m such that
x = p * m + s
since the % is periodic and divides numbers into modulo classes. This means that
(r ^ (n - 1)) / (r - 1) = p * m + s
where m is an arbitrary integer number. This means that
r ^ (n - 1) = (p * m + s) * (r - 1)
Since all the numbers are positive, we can turn this into a logarithmic formula:
ln (r ^ (n - 1)) = ln ((p * m + s) * (r - 1))
Since an exponent inside a logarithm can be pulled out as a factor, we can do some further modifications:
(n - 1) * ln(r) = ln ((p * m + s) * (r - 1))
so
n * ln(r) = ln ((p * m + s) * (r - 1)) + ln(r)
therefore
n * ln(r) = ln((p * m + s) * r * (r - 1))
Finally:
n = ln((p * m + s) * r * (r - 1)) / ln(r)
We can further refine this if needed:
n = log(r, (p * m + s) * r * (r - 1))
So
n = log(r, r) + log(r, (p * m + s) * (r - 1))
which is
n = 1 + log(r, (p * m + s) * (r - 1))
You will need to analyze the problem space, knowing that n, r and s are in the interval [1, p) and m is an arbitrary integer. So the question is: which integer values of m allow all three values to lie in the desired interval, and what are the resulting values of n? This is a longer analysis which is outside the scope of a short SO answer, but I think you should be OK from here. If not, ask another question about the point where you get stuck and let me know about it.
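For comparison, here is a sketch of a different approach: it follows the question's own reduction with a modular inverse instead of the logarithm rearrangement above. Reading the division as multiplication by invmod(r - 1, p), the equation becomes r^n = s * (r - 1) + 1 (mod p), which is a discrete logarithm problem. The code solves it with baby-step giant-step, a standard technique swapped in here and not part of the answer. The function names are hypothetical, and the sketch assumes p is prime, r is not congruent to 1 mod p, and Python 3.8+ for `pow(x, -1, p)`.

```python
from math import isqrt

def discrete_log(r, target, p):
    """Smallest n >= 0 with r**n % p == target found by the search, or None."""
    step = isqrt(p) + 1
    baby = {}
    value = 1
    for j in range(step):              # baby steps: r**j for j < step
        baby.setdefault(value, j)
        value = value * r % p
    factor = pow(value, -1, p)         # r**(-step) mod p
    gamma = target % p
    for i in range(step):              # giant steps: target * r**(-i*step)
        if gamma in baby:
            return i * step + baby[gamma]
        gamma = gamma * factor % p
    return None

def solve_geometric_sum(r, s, p):
    """Find n in [1, p) with (r**n - 1) * invmod(r - 1, p) % p == s."""
    target = (s * (r - 1) + 1) % p
    if target == 0:                    # r**n is never 0 mod a prime p
        return None
    n = discrete_log(r, target, p)
    if n == 0:                         # r**0 == r**(p-1) by Fermat's little theorem
        n = p - 1
    return n

if __name__ == "__main__":
    print(solve_geometric_sum(3, 5, 7))
```

The search takes roughly sqrt(p) multiplications and stores sqrt(p) table entries, which is practical for moderately large p.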

Mutual Information in a Binary Erasure Channel

Imagine a Binary Erasure Channel as depicted on Wikipedia.
One equation describing the mutual information is the following:
I(x;y)
= H(x) - H(x|y)
= H(x) - p(y=0) • 0 - p(y=?) • H(x) -p(y=1) • 0
Why is it "p(y=?) • H(x)" and not "p(y=?) • H(x|y=?)"?
It can be proved using Bayes' theorem.
The channel:
x                y
       1-f
0 -----------> 0
   \
    \  f
     +-------> ?
    /
   /   1-f
1 -----------> 1
Let input distribution be P(x) = {p(x=0)=g; p(x=1)=1-g}
Then:
p(x=0/y=?) = p(y=?/x=0) * p(x=0) / p(y=?)
p(x=0/y=?) = (f * g) / (f * g + f * (1 - g)) = g;
p(x=1/y=?) = p(y=?/x=1) * p(x=1) / p(y=?)
p(x=1/y=?) = (f * (1 - g)) / (f * g + f * (1 - g)) = 1 - g;
As a result:
p(x=0/y=?) = p(x=0)
p(x=1/y=?) = p(x=1)
From the definitions of entropy and conditional entropy:
H(X) = p(x=0) * log(1 / p(x=0)) + p(x=1) * log(1 / p(x=1))
H(X/y=?) = p(x=0/y=?) * log(1 / p(x=0/y=?)) + p(x=1/y=?) * log(1 / p(x=1/y=?))
So:
H(X) = H(X/y=?)
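The argument can be checked numerically. The following Python sketch uses illustrative function names and the same symbols f (erasure probability) and g (probability of x=0). It computes the posterior given an erasure via Bayes' theorem and then the mutual information, which should come out as (1 - f) * H(X).

```python
import math

def entropy(dist):
    """Shannon entropy in bits of a discrete distribution."""
    return sum(q * math.log2(1.0 / q) for q in dist if q > 0)

def posterior_given_erasure(f, g):
    """[p(x=0/y=?), p(x=1/y=?)] from Bayes' theorem."""
    p_erasure = f * g + f * (1 - g)    # p(y=?), which equals f
    return [f * g / p_erasure, f * (1 - g) / p_erasure]

def mutual_information(f, g):
    """I(x;y) = H(x) - p(y=?) * H(x/y=?); the y=0 and y=1 terms are zero."""
    h_x = entropy([g, 1 - g])
    h_x_given_erasure = entropy(posterior_given_erasure(f, g))
    return h_x - f * h_x_given_erasure

if __name__ == "__main__":
    f, g = 0.2, 0.3
    print(posterior_given_erasure(f, g))
    print(mutual_information(f, g), (1 - f) * entropy([g, 1 - g]))
```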

Interpolation considering acceleration [closed]

I don't really know if what I want to do is considered interpolation but I'll try to explain.
Currently, when I want to go from point A to point B (for simplicity, consider only a 1-coordinate space) in time T, I compute the position using the linear interpolation formula:
P(t) = A + (B-A) * (t / T), T != 0
This works fine in most cases, but I want to consider acceleration and braking like this:
first x% of the time it will be acceleration from vi speed to v speed
next y% of the time it will be constant v speed
last z% of the time it will be deceleration to reach vf speed at t = T
How can I compute P(t), t in [0, T] considering acceleration and braking?
Consider we have the following points in time:
t0 = 0 is the beginning of the movement
ta is the point when acceleration ends
td is the point when deceleration begins
T is the end of the movement
Then we have three segments of the movement: [t0, ta], (ta, td], (td, T]. Each can be specified separately. For the acceleration / deceleration we need to calculate the acceleration aa and the deceleration ad as follows:
aa = (v - vi) / (ta - t0)
ad = (vf - v) / (T - td)
According to your question, all values are given.
Then the movement can be expressed as:
P(t) :=
    if(t < ta)
        1 / 2 * aa * t^2 + vi * t + A
    else if(t < td)
        v * (t - ta) + 1 / 2 * aa * ta^2 + vi * ta + A
        // this is the length of the first part
    else
        1 / 2 * ad * (t - td)^2 + v * (t - td)
        + v * (td - ta) + 1 / 2 * aa * ta^2 + vi * ta + A
        // those are the lengths of the first two parts
If we precompute the lengths of the parts as
s1 := 1 / 2 * aa * ta^2 + vi * ta + A
s2 := v * (td - ta)
then the formula becomes a bit shorter:
P(t) :=
    if(t < ta)
        1 / 2 * aa * t^2 + vi * t + A
    else if(t < td)
        v * (t - ta) + s1
    else
        1 / 2 * ad * (t - td)^2 + v * (t - td) + s1 + s2
Here is an example plot:
However, it is very likely that the movement does not hit B at T unless you choose suitable values. That is because the problem is over-specified. You can, for example, calculate v from B instead of specifying it.
Edit
The calculation of v to reach a specific B is:
v = (2 * A - 2 * B - td * vf + T * vf + ta * vi) / (ta - td - T)
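Putting the pieces together, here is a minimal Python sketch of the piecewise formula with v computed from B. The function names are illustrative, and it assumes t0 = 0 and 0 < ta <= td < T so that neither division is by zero.

```python
def cruise_speed(A, B, T, ta, td, vi, vf):
    """Constant-phase speed v chosen so that the movement reaches B at t = T."""
    return (2 * A - 2 * B - td * vf + T * vf + ta * vi) / (ta - td - T)

def position(t, A, T, ta, td, vi, v, vf):
    """P(t) for t in [0, T]: accelerate, cruise at v, then decelerate."""
    aa = (v - vi) / ta                       # acceleration (t0 = 0 assumed)
    ad = (vf - v) / (T - td)                 # deceleration
    s1 = 0.5 * aa * ta ** 2 + vi * ta + A    # position at t = ta
    s2 = v * (td - ta)                       # distance covered while cruising
    if t < ta:
        return 0.5 * aa * t ** 2 + vi * t + A
    if t < td:
        return v * (t - ta) + s1
    return 0.5 * ad * (t - td) ** 2 + v * (t - td) + s1 + s2

if __name__ == "__main__":
    A, B, T, ta, td, vi, vf = 0.0, 10.0, 4.0, 1.0, 3.0, 0.0, 0.0
    v = cruise_speed(A, B, T, ta, td, vi, vf)
    for t in (0.0, 1.0, 2.0, 3.0, 4.0):
        print(t, position(t, A, T, ta, td, vi, v, vf))
```

To express the phases as percentages x, y, z of T as in the question, one option is ta = x * T / 100 and td = (x + y) * T / 100.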

Resources