Tracking average number of results per query over time

Tracking average number of results per query over time - math

This is perhaps more of a match question, but asking here due to the constraints of the logging apparatus.
An API accepts a batch of query terms, and returns N results per query. N is a non-negative integer.
For example:
Request 1 contains 100 query terms, the first 3 of which return 2 matches each.
[0] = 2 results,
[1] = 2,
[2] = 2,
[3] = 0,
...
[99] = 0
So the average results per query = (2 results x 3 queries) / 100 queries = .06, which I log.
The problem I can't figure out is how to accurately average in subsequent requests. For example a request 2 contains a single query with returns a single result, so 1/1 = 1 result per query.
My average results per query now looks like this: (.06 + 1) / 2 = .53, which I do not think is accurate. How can I weight the first request to obtain an accurate datapoint which can then be used in an equation to track average results per query over a time?

You could weigh them by number of query terms? Then it would be (100 * .06 + 1 * 1) / (100 + 1) = .069.
Or simply (6 + 1) / (100 + 1) = .069. But it depends upon what you consider the 'correct' average of course.
r = nResults1 + nResults2 + ...
= nQueryTerms1 * avgResultsPerQuery1 + nQueryTerms2 * avgResPerQuery2 + ...
q = nQueryTerms1 + nQueryTerms2 + ...
avg = r / q

Related

Run Time Estimate/Big O Notation for Nested For Loop

I'm having trouble understanding the meaning of a function f(x) representing the number of operations performed in some code.
int sum = 0; // + 1
for (int i = 0; i < n; i++)
for (int j = 1; j <= i; j++)
sum = sum + 1; // n * (n + 1) / 2
(Please note that there is no 2 in the numerator on that last comment, but there is in the function below.)
Then my notes say that f(x) = 2n(n + 1) / 2 + 1 = O(n^2)
I understand that because there are two for loops, that whatever f(x) is, it will be = O(n^2), but why is the time estimate what it is? How does the j<= i give you n*(n+1)? What about the 2 in the denominator?

Think about, across the entire execution of this code, how many times the inner loop will run. Notice that
when i = 0, it runs zero times;
when i = 1, it runs one time;
when i = 2, it runs two times;
when i = 3, it runs three times;
...; and
when i = n - 1, it runs n - 1 times.
This means that the total number of times the innermost loop runs is given by
0 + 1 + 2 + 3 + 4 + ... + (n - 1)
This is a famous summation and it solves to
0 + 1 + 2 + 3 + 4 + ... + (n - 1) = n(n - 1) / 2
This is the n - 1st triangular number and it's worth committing this to memory.
The number given - n(n + 1) / 2 - seems to be incorrect, but it's pretty close to the true number. I think they assumed the loop would run 1 + 2 + 3 + ... + n times rather than 0 + 1 + 2 + ... + n - 1 times.
From this it's a bit easier to see where the O(n2) term comes from. Notice that n(n - 1) / 2 = n2 / 2 - n / 2, so in big-O land where we drop constants and low-order terms we're left with n2.

Find row of pyramid based on index?

Given a pyramid like:
0
1 2
3 4 5
6 7 8 9
...
and given the index of the pyramid i where i represents the ith number of the pyramid, is there a way to find the index of the row to which the ith element belongs? (e.g. if i = 6,7,8,9, it is in the 3rd row, starting from row 0)

There's a connection between the row numbers and the triangular numbers. The nth triangular number, denoted Tn, is given by Tn = n(n-1)/2. The first couple triangular numbers are 0, 1, 3, 6, 10, 15, etc., and if you'll notice, the starts of each row are given by the nth triangular number (the fact that they come from this triangle is where this name comes from.)
So really, the goal here is to determine the largest n such that Tn ≤ i. Without doing any clever math, you could solve this in time O(√n) by just computing T0, T1, T2, etc. until you find something bigger than i. Even better, you could binary search for it in time O(log n) by computing T1, T2, T4, T8, etc. until you overshoot, then binary searching on the range you found.
Alternatively, we could try to solve for this directly. Suppose we want to find the choice of n such that
n(n + 1) / 2 = i
Expanding, we get
n2 / 2 + n / 2 = i.
Equivalently,
n2 / 2 + n / 2 - i = 0,
or, more easily:
n2 + n - 2i = 0.
Now we use the quadratic formula:
n = (-1 &pm; √(1 + 8i)) / 2
The negative root we can ignore, so the value of n we want is
n = (-1 + √(1 + 8i)) / 2.
This number won't necessarily be an integer, so to find the row you want, we just round down:
row = ⌊(-1 + √(1 + 8i)) / 2⌋.
In code:
int row = int((-1 + sqrt(1 + 8 * i)) / 2);
Let's confirm that this works by testing it out a bit. Where does 9 go? Well, we have
(-1 + √(1 + 72)) / 2 = (-1 + √73) / 2 = 3.77
Rounding down, we see it goes in row 3 - which is correct!
Trying another one, where does 55 go? Well,
(-1 + √(1 + 440)) / 2 = (√441 - 1) / 2 = 10
So it should go in row 10. The tenth triangular number is T10 = 55, so in fact, 55 starts off that row. Looks like it works!

I get row = math.floor (√(2i + 0.25) - 0.5) where i is your number
Essentially the same as the guy above but I reduced n2 + n to (n + 0.5)2 - 0.25

I think ith element belongs nth row where n is number of n(n+1)/2 <= i < (n+1)(n+2)/2
For example, if i = 6, then n = 3 because n(n+1)/2 <= 6
and if i = 8, then n = 3 because n(n+1)/2 <= 8

why powerset gives 2^N time complexity?

The following is a recursive function for generating powerset
void powerset(int[] items, int s, Stack<Integer> res) {
System.out.println(res);
for(int i = s; i < items.length; i++) {
res.push(items[i]);
powerset(items, s+1, res);
res.pop();
}
}
I don't really understand why this would take O(2^N). Where's that 2 coming from ?
Why T(N) = T(N-1) + T(N-2) + T(N-3) + .... + T(1) + T(0) solves to O(2^n). Can someone explains why ?

We are doubling the number of operations we do every time we decide to add another element to the original array.
For example, let us say we only have the empty set {}. What happens to the power set if we want to add {a}? We would then have 2 sets: {}, {a}. What if we wanted to add {b}? We would then have 4 sets: {}, {a}, {b}, {ab}.
Notice 2^n also implies a doubling nature.
2^1 = 2, 2^2 = 4, 2^3 = 8, ...

Below is more generic explanation.
Note that generating power set is basically generating combinations.
(nCr is number of combinations can be made by taking r items from total n items)
formula: nCr = n!/((n-r)! * r!)
Example:Power Set for {1,2,3} is {{}, {1}, {2}, {3}, {1,2}, {2,3}, {1,3} {1,2,3}} = 8 = 2^3
1) 3C0 = #combinations possible taking 0 items from 3 = 3! / ((3-0)! * 0!) = 1
2) 3C1 = #combinations possible taking 1 items from 3 = 3! / ((3-1)! * 1!) = 3
3) 3C2 = #combinations possible taking 2 items from 3 = 3! / ((3-2)! * 2!) = 3
4) 3C3 = #combinations possible taking 3 items from 3 = 3! / ((3-3)! * 3!) = 1
if you add above 4 it comes out 1 + 3 + 3 + 1 = 8 = 2^3. So basically it turns out to be 2^n possible sets in a power set of n items.
So in an algorithm if you are generating a power set with all these combinations, then its going to take time proportional to 2^n. And so the time complexity is 2^n.

Something like this
T(1)=T(0);
T(2)=T(1)+T(0)=2T(0);
T(3)=T(2)+T(1)+T(0)=2T(2);
Thus we have
T(N)=2T(N-1)=4T(N-2)=... = 2^(N-1)T(1), which is O(2^N)

Shriganesh Shintre's answer is pretty good, but you can simplify it even more:
Assume we have a set S:
{a1, a2, ..., aN}
Now we can write a subset s where each item in the set can have the value 1 (included) or 0 (excluded). Now we can see that the number of possible sets s is a product of:
2 * 2 * ... * 2 or 2^N

I could explain this in several mathematical ways to you the first one :
Consider one element like a each subset have 2 option about a either they have it or not so we must have $ 2^n $ subset and since you need to call function for create every subset you need to call this function $ 2^n $.
another solution:
This solution is with this recursion and it produce you equation , let me define T(0) = 2 for first a set with one element we have T(1) = 2, you just call the function and it ends here. now suppose that for every sets with k < n elements we have this formula
T(k) = T(k-1) + T(k-2) + ... + T(1) + T(0) (I name it * formula)
I want to prove that for k = n this equation is true.
consider every subsets that have the first element ( like what you do at began of your algorithm and you push first element) now we have n-1 elements so it take T(n-1) to find every subsets that have the first element. so far we have :
T(n) = T(n-1) + T(subsets that dont have the first element) (I name it ** formula)
at the end of your for loop you remove the first element now we have every subsets that dont have the first element like I said in (**) and again you have n-1 elements so we have :
T(subsets that dont have the first element) = T(n-1) (I name it * formula)
from formula (*) and (*) we have :
T(subsets that dont have the first element) = T(n-1) = T(n-2) + ... + T(1) + T(0) (I name it **** formula)
And now we have what you want from the first, from formula() and (**) we have :
T(n) = T(n-1) + T(n-2) + ... + T(1) + T(0)
And also we have T(n) = T(n-1) + T(n-1) = 2 * T(n-1) so T(n) = $2^n $

Separate a number into fixed part

What is the algorithm to find the number way to separate number M to p part and no two part equal.
Example:
M = 5, P = 2
they are (1,4) (2,3). If P = 3 then no partition availabe, i.e
not (1,2,2) because there two 2 in partition.

In the expanded product
(1+x)(1+x2)(1+x3)...(1+xn)
find the coefficient of x^n. This gives the number of any possibility to represent n as sum of different numbers, i.e., a variable number of terms.
You want the number of possibilites to have
n = i1+i2+...+iP with i1 < i2 < ... < iP
which can be realized by setting
i1=j1, i2=i1+j2=j1+j2, ...
iP=iP-1+jP=j1+j2+...+jP with all jk > 0
so that the original task is the same as counting all the ways that one can solve
n = P * j1+(P-1) * j2+...+1 * jP with all jk > 0, but unrelated among each other.
The corresponding generator function is the product of the geometric series of the powers of x, omitting the constant term,
(x+x2+x3+...) * (x2+x4+x6+...) * (x3+x6+x9+...) * ... * (xP+x2*P+x3*P+...)
= xP*(P+1)/2 * (1+x+x2+...) * (1+x2+x4+...) * (1+x3+x6+...) * ... * (1+xP+x2*P+...)
Clearly, one needs n >= P*(P+1)/2 to get any solution at all. For P=3 that bound is n >= 6, so that n=5 has indeed no solutions in that case.
Algorithm
count = new double[N]
for k=0..N-1 do count[k] = 1
for j=2..P do
for k=j..N-1 do
count[k] += count[k-j]
Then count[k] contains the number of combinations for n=P*(P+1)/2+k.

How to find the range for a given number, interval and start value?

Provided the below values
start value = 1
End Value = 20
Interval = 5
I have been provided a number 6. I have to find the range of numbers in which the number 6 falls say now the answer is 6-10.
If the given number is greater than the end value then return the same number.
Is there any formula so that i can generate the range for the number?
UPDATE
I tried the below solution, But it is not working if the range interval is changed,
$end_value = $start_value + $range_interval;
// we blindly return the last term if value is greater than max value
if ($input_num > $end_value) {
return '>' . $end_value;
}
// we also find if its a first value
if ($input_num <= $end_value && $value >= $start_value) {
return $start_value . '-' . $end_value;
}
// logic to find the range for a given integer
$dived_value = $input_num/$end_value;
// round the value to get the exact match
$rounded_value = ceil($dived_value);
$upper_bound_range = $rounded_value*$end_value;
$lower_bound_range = $upper_bound_range - $end_value;
return $lower_bound_range . '-'. $upper_bound_range;

In (c-style) pseudocode:
// Integer division assumed
rangeNumber = (yourNumber - startValue) / rangeLength;
lower_bound_range = startValue + rangeNumber*rangeLength;
upper_bound_range = lower_bound_range + rangeLength-1;
For your input:
rangeNumber = (6-1)/5 = 1
lower_bound_range = 1 + 5*1 = 6
upper_bound_range = 10
and so range is [6, 10]

The answer depends on whether you talk about integers or floats. Since all your example numbers are integers, I assume you talk about those. I further assume that all your intervals contain the same number of integers, in your example 5, namely 1...5, 6...10, 11...15, and 16...20. Note that 0 is not contained in the 1st interval (otherwise the 1st interval had 6 numbers).
In this case the answer is easy.
Let be:
s the start value that is not contained in the 1st interval,
i the interval size, i.e. the number of integers that it contains,
p the provided number to which an interval should be assigned,
b the 1st integer in this interval, and
e the last integer in this interval.
Then:
b = s + (p-s-1)\i * i + 1 (here, "\" means integer division, i.e. without remainder)
e = b + i - 1
In your example:
s = 0, i = 5, p = 6, thus
b = 0 + (6-0-1)\5 * 5 + 1 = 6
e = 6 + 5 - 1 = 10

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

Tracking average number of results per query over time - math

Related

Run Time Estimate/Big O Notation for Nested For Loop

Find row of pyramid based on index?

why powerset gives 2^N time complexity?

Separate a number into fixed part

How to find the range for a given number, interval and start value?

Categories

Resources