Partial sums of harmonic series - math

Formula: H_n = 1 + 1/2 + 1/3 + ... + 1/n
I was told by my math teacher that it is impossible to calculate, from the formula above, the n that is necessary for the sum to exceed 40 (sum > 40), and to know that sum to 50 decimal places.
(In short: what is the first n for which sum > 40, and what is that sum to 50 decimal places?)
I tried writing a C++ program for this, but realized after a ton of optimizations that it would take far too long.

H_n is bounded below by ln n + gamma, where gamma is the Euler-Mascheroni constant (http://en.wikipedia.org/wiki/Euler%E2%80%93Mascheroni_constant). So you can start by finding n such that ln n + gamma = 40. Solving, you get ln n = 40 - gamma, so n = e^(40-gamma), which is straightforward to calculate. Once you know the ballpark, you can use a binary search with more accurate over- and underestimates for H_n (see the asymptotic expansion at http://en.wikipedia.org/wiki/Harmonic_number#Calculation; there are many references that can provide more detail).
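The approach described above can be sketched in Python with the standard `decimal` module. This is a minimal sketch under stated assumptions, not a definitive implementation: the function names `harmonic` and `first_n_exceeding` are made up for illustration, gamma is hard-coded to 60 decimal places, and H_n is approximated by the first few terms of the asymptotic expansion, which is only accurate for large n.

```python
from decimal import Decimal, getcontext

getcontext().prec = 60  # working precision in significant digits

# Euler-Mascheroni constant to 60 decimal places (hard-coded assumption).
GAMMA = Decimal("0.577215664901532860606512090082402431042159335939923598805767")


def harmonic(n):
    """Approximate H_n by ln n + gamma + 1/(2n) - 1/(12n^2) + 1/(120n^4)."""
    n = Decimal(n)
    return n.ln() + GAMMA + 1 / (2 * n) - 1 / (12 * n**2) + 1 / (120 * n**4)


def first_n_exceeding(target):
    """Smallest n with harmonic(n) > target, by doubling and then bisection.

    Assumes target is large enough that the approximation is trustworthy.
    """
    lo, hi = 1, 2
    while harmonic(hi) <= target:
        lo, hi = hi, hi * 2
    # Invariant: harmonic(lo) <= target < harmonic(hi)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if harmonic(mid) > target:
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    n = first_n_exceeding(40)
    print(n)
    print(harmonic(n))
```

The search needs only about a hundred evaluations of `harmonic`, so it finishes almost instantly, in contrast to summing the series term by term.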

Why would that be impossible? It's
40.00000000000000000202186036912232961108532260403356
Steps to get there:
Ask Wolfram Alpha for the number n where the sum equals 40.
You'll get something around 1.32159290357566702732792368 x 10^17. Pick the next higher integer.
Compute the sum for n = 132159290357566703.
Click on "More digits" until satisfied.

Related

Generate N random integers that are sampled from a uniform distribution and sum to M in R [duplicate]

In some code I want to choose n random numbers in [0,1) which sum to 1.
I do so by choosing the numbers independently in [0,1) and normalizing them by dividing each one by the total sum:
numbers = [random() for i in range(n)]
numbers = [n/sum(numbers) for n in numbers]
My "problem" is that the distribution I get out is quite skewed. Choosing a million numbers, not a single one gets over 1/2. With some effort I've calculated the pdf, and it's not nice.
Here is the weird looking pdf I get for 5 variables:
Do you have an idea for a nice algorithm to choose the numbers, that result in a more uniform or simple distribution?
You are looking to partition the distance from 0 to 1.
Choose n - 1 numbers from 0 to 1, sort them and determine the distances between each of them.
This will partition the space 0 to 1, which should yield the occasional large result which you aren't getting.
Even so, for large values of n, you can generally expect your max value to decrease as well, just not as quickly as your method.
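The partitioning idea described above could look roughly like this in Python. It is only a sketch; the function name `random_partition` is made up for illustration.

```python
import random


def random_partition(n):
    """Return n non-negative numbers summing to 1, from n - 1 random cut points."""
    cuts = sorted(random.random() for _ in range(n - 1))
    edges = [0.0] + cuts + [1.0]
    # The gaps between consecutive edges are the n parts.
    return [right - left for left, right in zip(edges, edges[1:])]


if __name__ == "__main__":
    parts = random_partition(5)
    print(parts, sum(parts))
```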
You might be interested in the Dirichlet distribution, which is used to generate quantities that sum to 1 if you're looking for probabilities. There's also a section on how to generate them using gamma distributions here.
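As a rough illustration of the gamma-based construction, the sketch below draws independent gamma variates with the standard library's `random.gammavariate` and normalizes them. The function name `dirichlet_sample` and the choice of a single shared `alpha` are assumptions made for this example; with `alpha = 1.0` the result is uniform over all n-tuples of non-negative numbers that sum to 1.

```python
import random


def dirichlet_sample(n, alpha=1.0):
    """Draw one sample from a symmetric Dirichlet(alpha, ..., alpha) distribution."""
    draws = [random.gammavariate(alpha, 1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]


if __name__ == "__main__":
    sample = dirichlet_sample(5)
    print(sample, sum(sample))
```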
Another way to get n random numbers which sum up to 1:
import random

def create_norm_arr(n, remaining=1.0):
    random_numbers = []
    for _ in range(n - 1):
        r = random.random()  # get a random number in [0, 1)
        r = r * remaining
        remaining -= r
        random_numbers.append(r)
    random_numbers.append(remaining)
    return random_numbers

random_numbers = create_norm_arr(5)
print(random_numbers)
print(sum(random_numbers))
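This makes higher numbers more likely, especially for the earlier entries, because each draw takes a uniform fraction of whatever remains.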

People - Apple Puzzle [Inspired by client-puzzle protocol]

I am learning a client-puzzle protocol and I have a question about finding the number of possible solutions. Instead of going into the dry protocol facts, here is a scenario:
Let's say I have x people and y apples:
Each person must have at least 1 apple
Each person can have at most z apples.
Is there a formula to calculate the number of scenarios?
Example:
4 people [x], 6 apples [y], 15 MAX apples [z]
No. of scenarios calculated by hand: 10.
If my number is very huge, I hope to calculate it using a formula.
Thank you for any help.
Your problem is equivalent to "find the number of ways you can get x by adding together z numbers, each of which lies between min and max." (Note that here x is the total, i.e. the apples, and z is the number of terms, i.e. the people.) Sample Python implementation:
def possible_sums(x, z, min, max):
    # no solution if even the smallest or largest possible total misses x
    if min*z > x or max*z < x:
        return 0
    if z == 1:
        if x >= min and x <= max:
            return 1
        else:
            return 0
    total = 0
    # iterate from min, up to and including max
    for i in range(min, max+1):
        total += possible_sums(x-i, z-1, min, max)
    return total

print(possible_sums(6, 4, 1, 15))
Result:
10
This function can become quite expensive when called with large numbers, but runtime can be improved with memoization. How this can be accomplished depends on the language, but the conventional Python approach is to store previously calculated values in a dictionary.
def memoize(fn):
    results = {}  # cache keyed by the argument tuple
    def f(*args):
        if args not in results:
            results[args] = fn(*args)
        return results[args]
    return f

@memoize
def possible_sums(x, z, min, max):
    # rest of code goes here
Now print possible_sums(60, 40, 1, 150), which would have taken a very long time to calculate, returns 2794563003870330 in an instant.
There are ways to do this mathematically. It is similar to asking how many ways there are to roll a total of 10 on 3 6-sided dice (x=3, y=10, z=6). You can implement this in a few different ways.
One approach is to use inclusion-exclusion. The number of ways to write y as a sum of x positive numbers with no maximum is (y-1 choose x-1) by the stars-and-bars argument. You can also count the ways to write y as a sum of x positive numbers so that a particular set of s of them are each at least z+1: this is 0 if y-x-sz is negative, and (y-1-sz choose x-1) if it is nonnegative. Inclusion-exclusion then gives the count as the sum, over all s >= 0 with y-x-sz >= 0, of (-1)^s (x choose s)(y-1-sz choose x-1).
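A direct transcription of that sum into Python might look like the sketch below. The function name `count_distributions` is an assumption made for illustration, and the argument order follows the question (x people, y apples, at most z apples each).

```python
from math import comb


def count_distributions(x, y, z):
    """Ways to give y apples to x people, each getting between 1 and z apples."""
    total = 0
    s = 0
    # s counts how many people are forced over the maximum in inclusion-exclusion.
    while s <= x and y - x - s * z >= 0:
        total += (-1) ** s * comb(x, s) * comb(y - 1 - s * z, x - 1)
        s += 1
    return total


if __name__ == "__main__":
    print(count_distributions(4, 6, 15))  # the example from the question
```

For the question's example (4 people, 6 apples, maximum 15) only the s = 0 term contributes, giving (5 choose 3) = 10, which matches the count found by hand.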
You can use generating functions. You can let powers of some variable, say t, hold the total, and the coefficients say how many combinations there are with that total. Then you are asking for the coefficient of t^y in (t+t^2+...+t^z)^x. You can compute this in a few ways.
One approach is with dynamic programming, computing coefficients of (t+t^2+...+t^z)^k for k up to x. The naive approach is probably fast enough: You can compute this for k=1, 2, 3, ..., x. It is a bit faster to use something like repeated squaring, e.g., to compute the 87th power, you could expand 87 in binary as 64+16+4+2+1=0b1010111 (written as a binary literal). You could compute the 1st, 2nd, 4th, 16th, and 64th powers by squaring and multiply these, or you could compute the 0b1, 0b10, 0b101, 0b1010, 0b10101, 0b101011, and 0b1010111 powers by squaring and multiplying to save a little space.
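The naive dynamic-programming version can be sketched as follows. The name `count_by_dp` is made up for illustration; the list `coeffs` holds the coefficients of (t+t^2+...+t^z)^k up to t^y, and each pass of the outer loop multiplies by one more factor.

```python
def count_by_dp(x, y, z):
    """Coefficient of t^y in (t + t^2 + ... + t^z)^x, computed one factor at a time."""
    coeffs = [1] + [0] * y  # the polynomial 1, i.e. the 0th power
    for _ in range(x):
        new = [0] * (y + 1)
        for total, ways in enumerate(coeffs):
            if ways == 0:
                continue
            for apples in range(1, z + 1):
                if total + apples > y:
                    break  # higher powers of t cannot contribute to t^y
                new[total + apples] += ways
        coeffs = new
    return coeffs[y]


if __name__ == "__main__":
    print(count_by_dp(4, 6, 15))
```

This takes on the order of x*y*z additions, so it suits moderate sizes; the repeated-squaring variant mentioned above reduces the number of polynomial multiplications.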
Another approach is to use the binomial theorem twice.
(t+t^2+...+t^z)^x = t^x ((1-t^z)/(1-t))^x
= t^x (1-t^z)^x (1-t)^-x.
The binomial theorem with exponent x lets us rewrite (1-t^z)^x as a sum of (-1)^s (x choose s) t^(sz) where s ranges from 0 to x. With exponent -x, it lets us rewrite (1-t)^-x as an infinite sum of (r+x-1 choose x-1) t^r over nonnegative r. Then we can pick out the finite set of terms which contribute to the coefficient of t^y (r = y-x-sz), and we get the same sum as by inclusion-exclusion above.
For example, suppose we have x=1000, y=1100, z=30. Here s ranges from 0 to 3, and the value is approximately 1.29 x 10^144.

How to implement fuzzy minimum function via fuzzy maximum

I know that I can represent a fuzzy max via a power function (I need it in a neural network), i.e.
def max(p: Double)(a: Double, b: Double) =
    pow(pow(a, p) + pow(b, p), 1 / p)
// assumption: a >= 0 and b >= 0
It becomes the maximum as p -> infinity and the sum when p = 1.
I'm not sure how to correctly implement a fuzzy minimum.
If you are willing to replace "sum" with "harmonic sum" for the p=1 case, you can use
1/(pow(pow(a,-p) + pow(b,-p),1/p))
This converges to min(a,b) as p goes to infinity.
For p=1 it's 1/(1/a + 1/b), which is related to the harmonic mean but without the factor of 2. Just like in your original formula, a+b is related to the arithmetic mean but without the factor of 2.
However, note that both of these formulas (yours and mine) converge much more slowly to the limit as p goes to infinity, for cases where a and b are closer together.
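To make the behaviour concrete, here is a small Python sketch of both formulas (Python is used only for illustration; the names `fuzzy_max` and `fuzzy_min` are made up, and a and b are assumed to be strictly positive).

```python
def fuzzy_max(p, a, b):
    """Power-function maximum: a + b at p = 1, approaches max(a, b) as p grows."""
    return (a**p + b**p) ** (1.0 / p)


def fuzzy_min(p, a, b):
    """Its counterpart: 1/(1/a + 1/b) at p = 1, approaches min(a, b) as p grows."""
    return 1.0 / (a**-p + b**-p) ** (1.0 / p)


if __name__ == "__main__":
    for p in (1, 5, 50):
        print(p, fuzzy_min(p, 2.0, 3.0), fuzzy_min(p, 2.0, 2.0))
```

When a = b the minimum formula reduces to a * 2^(-1/p), so the gap to the true minimum shrinks only like 1/p, which is the slow convergence mentioned above.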

How to calculate n log n = c

I have a homework problem for my algorithms class asking me to calculate the maximum size of a problem that can be solved in a given number of operations using an O(n log n) algorithm (i.e. n log n = c). I was able to get an answer by approximating, but is there a clean way to get an exact answer?
There is no closed-form formula for this equation in terms of elementary functions. Taking log to be the natural logarithm, you can transform the equation:
n log n = c
log(n^n) = c
n^n = exp(c)
Then, this equation has a solution of the form:
n = exp(W(c))
where W is the Lambert W function (see especially "Example 2"). It has been proved that W cannot be expressed using elementary operations.
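For illustration, W can be evaluated numerically with a few Newton steps on w*exp(w) = c. The sketch below is an assumption-laden example rather than a library routine: the names `lambert_w` and `solve_n_ln_n` are made up, only c > 0 and the principal branch are handled, and the logarithm is natural.

```python
import math


def lambert_w(c, tol=1e-12):
    """Principal branch of the Lambert W function for c > 0, via Newton's method."""
    w = math.log(1.0 + c)  # starting guess, always at or above the root
    for _ in range(100):
        ew = math.exp(w)
        step = (w * ew - c) / (ew * (w + 1.0))
        w -= step
        if abs(step) < tol * (1.0 + abs(w)):
            break
    return w


def solve_n_ln_n(c):
    """Solve n * ln(n) = c for n, using n = exp(W(c))."""
    return math.exp(lambert_w(c))


if __name__ == "__main__":
    n = solve_n_ln_n(1000.0)
    print(n, n * math.log(n))
```

For a base-2 logarithm, n log2 n = c is the same as n ln n = c ln 2, so `solve_n_ln_n(c * math.log(2))` would apply.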
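However, f(n) = n*log(n) is increasing for n >= 1, so you can simply use bisection (here in Python, with a base-2 logarithm):
import math

def nlogn(c):
    # bisect on [lower, upper]; assumes the solution lies below 10e10
    lower = 0.0
    upper = 10e10
    while True:
        middle = (lower+upper)/2
        # stop once the interval cannot be split any further in floating point
        if lower == middle or middle == upper:
            return middle
        if middle*math.log(middle, 2) > c:
            upper = middle
        else:
            lower = middle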
The O notation only gives you the dominant term in the equation, i.e. the performance of your O(n log n) algorithm could actually be better represented by c = (n log n) + n + 53.
This means that without knowing the exact nature of the performance of your algorithm you wouldn't be able to calculate the exact number of operations required to process a given amount of data.
But it is possible to calculate that the maximum number of operations required to process a data set of size n is more than a certain number, or conversely that the biggest problem set that can be solved, using that algorithm and that number of operations, is smaller than a certain number.
The O notation is useful for comparing two algorithms, e.g. an O(n^2) algorithm is asymptotically faster than an O(n^3) algorithm.
see Wikipedia for more info.
some help with logs

Generating sorted random ints without the sort? O(n)

Just been looking at a code golf question about generating a sorted list of 100 random integers. What popped into my head, however, was the idea that you could generate instead a list of positive deltas, and just keep adding them to a running total, thus:
deltas: 1 3 2 7 2
ints: 1 4 6 13 15
In fact, you would use floats, then normalise to fit some upper limit, and round, but the effect is the same.
Although it wouldn't make for shorter code, it would certainly be faster without the sort step. But the thing I have no real handle on is this: Would the resulting distribution of integers be the same as generating 100 random integers from a uniformly distributed probability density function?
Edit: A sample script:
import random,sys
running = 0
max = 1000
deltas = [random.random() for i in range(0,11)]
floats = []
for d in deltas:
    running += d
    floats.append(running)
upper = floats.pop()
ints = [int(round(f/upper*max)) for f in floats]
print(ints)
Whose output (fair dice roll) was:
[24, 71, 133, 261, 308, 347, 499, 543, 722, 852]
UPDATE: Alok's answer and Dan Dyer's comment point out that using an exponential distribution for the deltas would give a uniform distribution of integers.
So you are asking if the numbers generated in this way are going to be uniformly distributed.
You are generating a series:
y_j = sum_{i=0}^{j} (x_i / A)
where A is the sum of all x_i, and x_i is the list of (positive) deltas.
This can be done iff the x_i are exponentially distributed (with any fixed mean). So, if the x_i are uniformly distributed, the resulting y_j will not be uniformly distributed.
Having said that, it's fairly easy to generate exponential xi values.
One example would be:
sum := 0
for I = 1 to N do:
    X[I] = sum = sum - ln(RAND)
sum = sum - ln(RAND)
for I = 1 to N do:
    X[I] = X[I]/sum
and you will have your random numbers sorted in the range [0, 1).
Reference: Generating Sorted Lists of Random Numbers. The paper has other (faster) algorithms as well.
Of course, this generates floating-point numbers. For uniform distribution of integers, you can replace sum above by sum/RANGE in the last step (i.e., the R.H.S becomes X[I]*RANGE/sum, and then round the numbers to the nearest integer).
A uniform distribution has an upper and a lower bound. If you use your proposed method, and your deltas happen to be chosen large enough that you run into the upper bound before you have generated all your numbers, what would your algorithm do next?
Having said that, you may want to investigate the Poisson process: the interval times between random events occurring with a given average frequency follow an exponential distribution.
If you take the number range of being 1 to 1000, and you have to use 100 of these numbers, the delta will have to be as a minimum 10, otherwise you can not reach the 1000 mark. How about some working to demonstrate it in action...
The chance of any given number in an evenly distributed random selection is 100/1000, i.e. 1/10. No shock there, so take that as the basis.
Assuming you start using a delta and that delta is just 10.
The odds of getting the number 1 is 1/10 - seems fine.
The odds of getting the number 2 is 1/10 + (1/10 * 1/10) (because you could hit 2 deltas of 1 in a row, or just hit a 2 as the first delta.)
The odds of getting the number 3 is 1/10 + (1/10 * 1/10 * 1/10) + (1/10 * 1/10) + (1/10 * 1/10)
The first case was a delta of 3, the second was hitting 3 deltas of 1 in a row, the third case would be a delta of 1 followed by a 2, and the fourth case was a delta of 2 followed by a 1.
For the sake of my fingers typing, we won't generate the combinations that hit 5.
Immediately the first few numbers have a greater percentage chance than the straight random.
This could be altered by changing the delta value so the fractions are all different, but I do not believe you could find a delta that produced identical odds.
To give an analogy that might just sink in: if you consider your delta as just 6 and you run that twice, it is the equivalent of throwing 2 dice. Each of the deltas is independent, but you know that 7 has a higher chance of being selected than 2.
I think it will be extremely similar but the extremes will be different because of the normalization. For example, 100 numbers chosen at random between 1 and 100 could all be 1. However, 100 numbers created using your system could all have deltas of 0.01 but when you normalize them you'll scale them up to be in the range 1 -> 100 which will mean you'll never get that strange possibility of a set of very low numbers.
Alok's answer and Dan Dyer's comment point out that using an exponential distribution for the deltas would give a uniform distribution of integers.
So the new version of the code sample in the question would be:
import random,sys
running = 0
max = 1000
deltas = [random.expovariate(1.0) for i in range(0,11)]
floats = []
for d in deltas:
    running += d
    floats.append(running)
upper = floats.pop()
ints = [int(round(f/upper*max)) for f in floats]
print(ints)
Note the use of random.expovariate(1.0), a Python exponential distribution random number generator (very useful!). Its argument is the rate lambd, i.e. 1 divided by the desired mean, so here the mean is 1.0; but since the script normalises against the last number in the sequence, the mean itself doesn't matter.
Output (fair dice roll):
[11, 43, 148, 212, 249, 458, 539, 725, 779, 871]
Q: Would the resulting distribution of integers be the same as generating 100 random integers from a uniformly distributed probability density function?
A: Each delta will be uniformly distributed. The central limit theorem tells us that the distribution of a sum of a large number of such deviates (since they have a finite mean and variance) will tend to the normal distribution. Hence the later deviates in your sequence will not be uniformly distributed.
So the short answer is "no". Afraid I cannot give a simple solution without doing algebra I don't have time to do today!
The reference (1979) in Alok's answer is interesting. It gives an algorithm for generating the uniform order statistics not by addition but by successive multiplication:
max = 1.
for i = N downto 1 do
    out[i] = max = max * RAND^(1/i)
where RAND is uniform on [0,1). This way you don't have to normalize at the end, and in fact don't even have to store the numbers in an array; you could use this as an iterator.
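The iterator idea could be sketched in Python as a generator. The name `sorted_uniforms_desc` is made up for illustration, and `random.random()` stands in for RAND; the values come out largest first, mirroring the `downto` loop above.

```python
import random


def sorted_uniforms_desc(n):
    """Yield n uniform order statistics on [0, 1] in descending order."""
    current = 1.0
    for i in range(n, 0, -1):
        # The largest of i uniforms on [0, current] is current * U^(1/i).
        current *= random.random() ** (1.0 / i)
        yield current


if __name__ == "__main__":
    values = list(sorted_uniforms_desc(10))
    values.reverse()  # ascending order, if that is what is needed
    print(values)
```

Because each value depends only on the previous one, nothing has to be stored unless ascending order is required.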
The book The Exponential Distribution: Theory, Methods and Applications, by N. Balakrishnan and Asit P. Basu, gives another derivation of this algorithm on page 22 and credits Malmquist (1950).
You can do it in two passes;
in the first pass, generate deltas between 0 and (MAX_RAND/n)
in the second pass, normalise the random numbers to be within bounds
Still O(n), with good locality of reference.

Resources