Average of 3 numbers using integers - math

I would like to compute the average of three numbers, like:
d = int(round((a + b + c) / 3.0))
Where a, b, c, d are integers.
Is it possible to get the same result using just integers?
I'm interested in this because of performance reasons, I assume doing the math using integers should be faster than using floats.
The example above converts the integers to floats, calculates the result, rounds it and converts back to integer. Is it possible to avoid the int <-> float conversions?

Given the requirements for 1, 1, 2 -> 1; 1, 2, 2 -> 2 then this can be done using integer division.
Using // for the integer division and n for number of elements.
average = ( a+ b + c + .... + n//2 ) // n
ie sum up all the values and then add a number to deal with rounding.
As noted in #Henrik's answer this assumes that all numbers are positive.

(a + b + c + 1) / 3
Explanation: if (a + b + c) % 3 == 1, it's rounded down; if (a + b + c) % 3 == 2, it's rounded up.
At least this should work for a + b + c >= 0. You may need to treat negative values separately.

A variant of Mahmoud's answer using only one division:
d = (((a+b+c) * 10) + 15) / 30

It should be as simple as that:
d = (((a+b+c) * 10) / 3 + 5) / 10;

Related

How to get the mid of 2 int32 number elegantly without overflow?

In some conditions like Segment Tree or Binary Search, I have to get the average value of 2 number.
An easy solution is mid = (a + b) / 2. But when it comes to a and b have the same plus-minus signs, The a + b thing will overflow. For example, 1234567 + 2147483647, -1234567 + (-2147483647) in int32 type.
I searched online and got to know mid = (b - a) / 2 + a, it can avoid the situation above, but still not perfect. When it comes to a and b have the different plus-minus signs, mid = (a + b) / 2 won't overflow but (b - a) / 2 + a will. For example, -2147483648 + 2147483647 in int32 type.
To thoroughly solve this problem, I've written the code in pic below. I divide the 2 situations by the plus-minus signs. (I use some bit operations to improve its efficiency.) But it is way too complex for such a simple problem, right?
Is there some elegant solution to this problem?
I tried to divide the problem to 2 situations and solve them respectively.
But I still want a more elegant solution.
Got the answer!
mid = (a & b) + ((a ^ b) >> 1)
a & b keeps the same bits that a and b have in common, they need not to be distributed averagely. Like when you find the average value of 102 and 110, you don't need to calculate the 100 they have in common. You can just keep that, and deal with the 2 and 10 part, distribute them averagely to 2 number. As (102 + 110) / 2 = (2 * 100 + 2 + 10) / 2 = 100 + (2 + 10) / 2 = 100 + 6 = 106.
(a ^ b) >> 1 deals with the "2 and 10 part", it gets all the bits that a and b don't have in common, and divide it by 2.
Adds up 2 parts above so we get the average value of a and b. Not a strict proof, though.

Concatenation of binary representation of first n positive integers in O(logn) time complexity

I came across this question in a coding competition. Given a number n, concatenate the binary representation of first n positive integers and return the decimal value of the resultant number formed. Since the answer can be large return answer modulo 10^9+7.
N can be as large as 10^9.
Eg:- n=4. Number formed=11011100(1=1,10=2,11=3,100=4). Decimal value of 11011100=220.
I found a stack overflow answer to this question but the problem is that it only contains a O(n) solution.
Link:- concatenate binary of first N integers and return decimal value
Since n can be up to 10^9 we need to come up with solution that is better than O(n).
Here's some Python code that provides a fast solution; it uses the same ideas as in Abhinav Mathur's post. It requires Python >= 3.8, but it doesn't use anything particularly fancy from Python, and could easily be translated into another language. You'd need to write algorithms for modular exponentiation and modular inverse if they're not already available in the target language.
First, for testing purposes, let's define the slow and obvious version:
# Modulus that results are reduced by,
M = 10 ** 9 + 7
def slow_binary_concat(n):
"""
Concatenate binary representations of 1 through n (inclusive).
Reinterpret the resulting binary string as an integer.
"""
concatenation = "".join(format(k, "b") for k in range(n + 1))
return int(concatenation, 2) % M
Checking that we get the expected result:
>>> slow_binary_concat(4)
220
>>> slow_binary_concat(10)
462911642
Now we'll write a faster version. First, we split the range [1, n) into subintervals such that within each subinterval, all numbers have the same length in binary. For example, the range [1, 10) would be split into four subintervals: [1, 2), [2, 4), [4, 8) and [8, 10). Here's a function to do that splitting:
def split_by_bit_length(n):
"""
Split the numbers in [1, n) by bit-length.
Produces triples (a, b, 2**k). Each triple represents a subinterval
[a, b) of [1, n), with a < b, all of whose elements has bit-length k.
"""
a = 1
while n > a:
b = 2 * a
yield (a, min(n, b), b)
a = b
Example output:
>>> list(split_by_bit_length(10))
[(1, 2, 2), (2, 4, 4), (4, 8, 8), (8, 10, 16)]
Now for each subinterval, the value of the concatenation of all numbers in that subinterval is represented by a fairly simple mathematical sum, which can be computed in exact form. Here's a function to compute that sum modulo M:
def subinterval_concat(a, b, l):
"""
Concatenation of values in [a, b), all of which have the same bit-length k.
l is 2**k.
Equivalently, sum(i * l**(b - 1 - i)) for i in range(a, b)) modulo M.
"""
n = b - a
inv = pow(l - 1, -1, M)
q = (pow(l, n, M) - 1) * inv
return (a * q + (q - n) * inv) % M
I won't go into the evaluation of the sum here: it's a bit off-topic for this site, and it's hard to express without a good way to render formulas. If you want the details, that's a topic for https://math.stackexchange.com, or a page of fairly simple algebra.
Finally, we want to put all the intervals together. Here's a function to do that.
def fast_binary_concat(n):
"""
Fast version of slow_binary_concat.
"""
acc = 0
for a, b, l in split_by_bit_length(n + 1):
acc = (acc * pow(l, b - a, M) + subinterval_concat(a, b, l)) % M
return acc
A comparison with the slow version shows that we get the same results:
>>> fast_binary_concat(4)
220
>>> fast_binary_concat(10)
462911642
But the fast version can easily be evaluated for much larger inputs, where using the slow version would be infeasible:
>>> fast_binary_concat(10**9)
827129560
>>> fast_binary_concat(10**18)
945204784
You just have to note a simple pattern. Taking up your example for n=4, let's gradually build the solution starting from n=1.
1 -> 1 #1
2 -> 2^2(1) + 2 #6
3 -> 2^2[2^2(1)+2] + 3 #27
4 -> 2^3{2^2[2^2(1)+2]+3} + 4 #220
If you expand the coefficients of each term for n=4, you'll get the coefficients as:
1 -> (2^3)*(2^2)*(2^2)
2 -> (2^3)*(2^2)
3 -> (2^3)
4 -> (2^0)
Let the N be total number of bits in the string representation of our required number, and D(x) be the number of bits in x. The coefficients can then be written as
1 -> 2^(N-D(1))
2 -> 2^(N-D(1)-D(2))
3 -> 2^(N-D(1)-D(2)-D(3))
... and so on
Since the value of D(x) will be the same for all x between range (2^t, 2^(t+1)-1) for some given t, you can break the problem into such ranges and solve for each range using mathematics (not iteration). Since the number of such ranges will be log2(Given N), this should work in the given time limit.
As an example, the various ranges become:
1. 1 (D(x) = 1)
2. 2-3 (D(x) = 2)
3. 4-7 (D(x) = 3)
4. 8-15 (D(x) = 4)

Find row of pyramid based on index?

Given a pyramid like:
0
1 2
3 4 5
6 7 8 9
...
and given the index of the pyramid i where i represents the ith number of the pyramid, is there a way to find the index of the row to which the ith element belongs? (e.g. if i = 6,7,8,9, it is in the 3rd row, starting from row 0)
There's a connection between the row numbers and the triangular numbers. The nth triangular number, denoted Tn, is given by Tn = n(n-1)/2. The first couple triangular numbers are 0, 1, 3, 6, 10, 15, etc., and if you'll notice, the starts of each row are given by the nth triangular number (the fact that they come from this triangle is where this name comes from.)
So really, the goal here is to determine the largest n such that Tn ≤ i. Without doing any clever math, you could solve this in time O(√n) by just computing T0, T1, T2, etc. until you find something bigger than i. Even better, you could binary search for it in time O(log n) by computing T1, T2, T4, T8, etc. until you overshoot, then binary searching on the range you found.
Alternatively, we could try to solve for this directly. Suppose we want to find the choice of n such that
n(n + 1) / 2 = i
Expanding, we get
n2 / 2 + n / 2 = i.
Equivalently,
n2 / 2 + n / 2 - i = 0,
or, more easily:
n2 + n - 2i = 0.
Now we use the quadratic formula:
n = (-1 &pm; √(1 + 8i)) / 2
The negative root we can ignore, so the value of n we want is
n = (-1 + √(1 + 8i)) / 2.
This number won't necessarily be an integer, so to find the row you want, we just round down:
row = ⌊(-1 + √(1 + 8i)) / 2⌋.
In code:
int row = int((-1 + sqrt(1 + 8 * i)) / 2);
Let's confirm that this works by testing it out a bit. Where does 9 go? Well, we have
(-1 + √(1 + 72)) / 2 = (-1 + √73) / 2 = 3.77
Rounding down, we see it goes in row 3 - which is correct!
Trying another one, where does 55 go? Well,
(-1 + √(1 + 440)) / 2 = (√441 - 1) / 2 = 10
So it should go in row 10. The tenth triangular number is T10 = 55, so in fact, 55 starts off that row. Looks like it works!
I get row = math.floor (√(2i + 0.25) - 0.5) where i is your number
Essentially the same as the guy above but I reduced n2 + n to (n + 0.5)2 - 0.25
I think ith element belongs nth row where n is number of n(n+1)/2 <= i < (n+1)(n+2)/2
For example, if i = 6, then n = 3 because n(n+1)/2 <= 6
and if i = 8, then n = 3 because n(n+1)/2 <= 8

Calculating the level of insertion based on the size of the tree

If I have a graph structure that looks like the following
a level-1
b c level-2
c d e level-3
e f g h level-4
...... level-n
a points to b and c
b points to c and d
c points to d and e
and so on
how can i calculate the n from the size(number of existing nodes) of the graph/tree?
The number of nodes present if the height is h is given by
1 + 2 + 3 + ... + h = h(h + 1) / 2
This means that one simple option would be to take the total number of nodes n and do a simple binary search to find the right value of h that such that h(h + 1) / 2 = n.
Alternatively, since n = h(h + 1) / 2, you can note that
n = h(h + 1) / 2
2n = h2 + h
0 = h2 + h - 2n
Now you have a quadratic equation (in h) that you can solve to directly get back the value of h. The solution is
h = (-1 ± √(1 + 8n)) / 2
If you take the minus branch, you'll get back a negative number, so you should take the positive branch and compute
(-1 + √(1 + 8n)) / 2
to directly get back h.
Hope this helps!

When will this Recurrence Relation repeat

I have this recurrence formula:
P(n) = ( P(n-1) + 2^(n/2) ) % (X)
s.t. P(1) = 2;
where n/2 is computer integer division i.e. floor of x/2
Since i am taking mod X, this relation should repeat at least with in X outputs.
but it can start repeating before that.
How to find this value?
It needn't repeat within x terms, consider x = 3:
P(1) = 2
P(2) = (P(1) + 2^(2/2)) % 3 = 4 % 3 = 1
P(3) = (P(2) + 2^(3/2)) % 3 = (1 + 2) % 3 = 0
P(4) = (P(3) + 2^(4/2)) % 3 = 4 % 3 = 1
P(5) = (P(4) + 2^(5/2)) % 3 = (1 + 4) % 3 = 2
P(6) = (P(5) + 2^(6/2)) % 3 = (2 + 8) % 3 = 1
P(7) = (P(6) + 2^(7/2)) % 3 = (1 + 8) % 3 = 0
P(8) = (P(7) + 2^(8/2)) % 3 = 16 % 3 = 1
P(9) = (P(8) + 2^(9/2)) % 3 = (1 + 16) % 3 = 2
P(10) = (P(9) + 2^(10/2)) % 3 = (2 + 32) % 3 = 1
P(11) = (P(10) + 2^(11/2)) % 3 = (1 + 32) % 3 = 0
P(12) = (P(11) + 2^(12/2)) % 3 = (0 + 64) % 3 = 1
and you see that the period is 4.
Generally (suppose X is odd, it's a bit more involved for even X), let k be the period of 2 modulo X, i.e. k > 0, 2^k % X = 1, and k is minimal with these properties (see below).
Consider all arithmetic modulo X. Then
n
P(n) = 2 + ∑ 2^(j/2)
j=2
It is easier to see when we separately consider odd and even n:
m m
P(2*m+1) = 2 + 2 * ∑ 2^i = 2 * ∑ 2^i = 2*(2^(m+1) - 1) = 2^((n+2)/2) + 2^((n+1)/2) - 2
i=1 i=0
since each 2^j appears twice, for j = 2*i and j = 2*i+1. For even n = 2*m, there's one summand 2^m missing, so
P(2*m) = 2^(m+1) + 2^m - 2 = 2^((n+2)/2) + 2^((n+1)/2) - 2
and we see that the length of the period is 2*k, since the changing parts 2^((n+1)/2) and 2^((n+2)/2) have that period. The period immediately begins, there is no pre-period part (there can be a pre-period for even X).
Now k <= φ(X) by Euler's generalisation of Fermat's theorem, so the period is at most 2 * φ(X).
(φ is Euler's totient function, i.e. φ(n) is the number of integers 1 <= k <= n with gcd(n,k) = 1.)
What makes it possible that the period is longer than X is that P(n+1) is not completely determined by P(n), the value of n also plays a role in determining P(n+1), in this case the dependence is simple, each power of 2 being used twice in succession doubles the period of the pure powers of 2.
Consider the sequence a[k] = (2^k) % X for odd X > 1. It has the simple recurrence
a[0] = 1
a[k+1] = (2 * a[k]) % X
so each value completely determines the next, thus the entire following part of the sequence. (Since X is assumed odd, it also determines the previous value [if k > 0] and thus the entire previous part of the sequence. With H = (X+1)/2, we have a[k-1] = (H * a[k]) % X.)
Hence if the sequence assumes one value twice (and since there are only X possible values, that must happen within the first X+1 values), at indices i and j = i+p > i, say, the sequence repeats and we have a[k+p] = a[k] for all k >= i. For odd X, we can go back in the sequence, therefore a[k+p] = a[k] also holds for 0 <= k < i. Thus the first value that occurs twice in the sequence is a[0] = 1.
Let p be the smallest positive integer with a[p] = 1. Then p is the length of the smallest period of the sequence a, and a[k] = 1 if and only if k is a multiple of p, thus the set of periods of a is the set of multiples of p. Euler's theorem says that a[φ(X)] = 1, from that we can conclude that p is a divisor of φ(X), in particular p <= φ(X) < X.
Now back to the original sequence.
P(n) = 2 + a[1] + a[1] + a[2] + a[2] + ... + a[n/2]
= a[0] + a[0] + a[1] + a[1] + a[2] + a[2] + ... + a[n/2]
Since each a[k] is used twice in succession, it is natural to examine the subsequences for even and odd indices separately,
E[m] = P(2*m)
O[m] = P(2*m+1)
then the transition from one value to the next is more regular. For the even indices we find
E[m+1] = E[m] + a[m] + a[m+1] = E[m] + 3*a[m]
and for the odd indices
O[m+1] = O[m] + a[m+1] + a[m+1] = O[m] + 2*a[m+1]
Now if we ignore the modulus for the moment, both E and O are geometric sums, so there's an easy closed formula for the terms. They have been given above (in slightly different form),
E[m] = 3 * 2^m - 2 = 3 * a[m] - 2
O[m] = 2 * 2^(m+1) - 2 = 2 * a[m+1] - 2 = a[m+2] - 2
So we see that O has the same (minimal) period as a, namely p, and E also has that period. Unless maybe if X is divisible by 3, that is also the minimal (positive) period of E (if X is divisible by 3, the minimal positive period of E could be a proper divisor of p, for X = 3 e.g., E is constant).
Thus we see that 2*p is a period of the sequence P obtained by interlacing E and O.
It remains to be seen that 2*p is the minimal positive period of P. Let m be the minimal positive period. Then m is a divisor of 2*p.
Suppose m were odd, m = 2*j+1. Then
P(1) = P(m+1) = P(2*m+1)
P(2) = P(m+2) = P(2*m+2)
and consequently
P(2) - P(1) = P(m+2) - P(m+1) = P(2*m+2) - P(2*m+1)
But P(2) - P(1) = a[1] and
P(m+2) - P(m+1) = a[(m+2)/2] = a[j+1]
P(2*m+2) - P(2*m+1) = a[(2*m+2)/2] = a[m+1] = a[2*j+2]
So we must have a[1] = a[j+1], hence j is a period of a, and a[j+1] = a[2*j+2], hence j+1 is a period of a too. But that means that 1 is a period of a, which implies X = 1, a contradiction.
Therefore m is even, m = 2*j. But then j is a period of O (and of E), thus a multiple of p. On the other hand, m <= 2*p implies j <= p, and the only (positive) multiple of p satisfying that inequality is p itself, hence j = p, m = 2*p.

Resources