Improving performance using rand() - julia

I'm running a Monte Carlo simulation of the Ising model. Overall the code is pretty efficient. The function below is called around 10 million times:
function stepFlip(sim::Ising)
    i = rand(1:sim.n)
    j = rand(1:sim.n)
    dE = dEnergy(i, j, sim)
    if dE < 0 || rand() < exp(-dE/sim.T)
        sim.spins[i,j] *= -1
    end
end
Is there a way for me to optimize the random number generation, which is what takes most of the execution time? I know it could be faster to pre-generate all of them and just read them as I call stepFlip, but at that point, is it worth it to allocate all that memory if my goal is performance?
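For concreteness, the pre-generation idea I have in mind would look roughly like this (an untested sketch; runSteps! is just an illustrative name and I haven't benchmarked it):
# Untested sketch: draw the random numbers for a whole batch of flips up front,
# then consume them in the same accept/reject loop as stepFlip.
function runSteps!(sim::Ising, nsteps::Int)
    is = rand(1:sim.n, nsteps)   # pre-drawn row indices
    js = rand(1:sim.n, nsteps)   # pre-drawn column indices
    us = rand(nsteps)            # pre-drawn uniforms for the Metropolis test
    for t in 1:nsteps
        i, j = is[t], js[t]
        dE = dEnergy(i, j, sim)
        if dE < 0 || us[t] < exp(-dE / sim.T)
            sim.spins[i, j] *= -1
        end
    end
end
For 10^7 steps those three buffers come to roughly 240 MB (two Int64 vectors plus one Float64 vector), so drawing in chunks of, say, 10^5 steps would keep the memory overhead negligible while still amortizing the calls to rand.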

Related

How do I optimize the speed of my Sage reducibility algorithm?

Suppose I have the polynomial f(x) = x^n + x + a. I set a value for n, and want 0 <= a <= A, where A is some other value I set. This means I will have a total of A different polynomials, since a can be any value between 0 and A.
Using Sage, I'm finding the number of these A polynomials that are reducible. For example, suppose I set n=5 and A=10^7. That would tell me how many of these 10^7 polynomials of degree 5 are reducible. I've done this using a loop, which works for low values of A. But for the large values I need (i.e. A=10^7), it's taking an extremely long and impractical amount of time. The code is below. Could someone please help me meaningfully optimize this?
x = polygen(QQ)
n = 5
A = 10^7
count = 0
for i in range(A):
    p_pol = x^n + x + i
    if not p_pol.is_irreducible():
        count = count + 1
        print(i)
print('Count:' + str(count))
One small, but in this case pretty meaningless, optimization is to replace range(A) with xrange(A). The former creates a list of all the integers from 0 to A - 1, which is a waste of time and space. xrange(A) will just produce the integers one by one and discard them when you're done. Sage 9.0 will be based on Python 3 by default, where range is equivalent to Python 2's xrange.
Let's do a little experiment, though. Another small optimization is to pre-define the part of your polynomial that is constant in each iteration:
x = polygen(QQ)
n = 5
A = 10^7
base = x^n + x
Now just as a general test, let's see how long it takes in a few cases to add an integer to the polynomial and then compute its irreducibility:
sage: (base + 1).is_irreducible()
False
sage: %timeit (base + 1).is_irreducible()
1000 loops, best of 3: 744 µs per loop
sage: (base + 3).is_irreducible()
True
sage: %timeit (base + 3).is_irreducible()
1000 loops, best of 3: 387 µs per loop
So it seems that in the cases where the polynomial is irreducible (which will be the majority) the check is a little faster, so let's say it takes about 387 µs per call on average. Then:
sage: 0.000387 * 10^7 / 60
64.5000000000000
So this will still take a little over an hour, on average (on my machine).
One thing you can do to speed things up is parallelize it, if you have many CPU cores. For example:
x = polygen(QQ)
A = 10^7
def is_irreducible(i, base=(x^5 + x)):
    return (base + i).is_irreducible()

from multiprocessing import Pool
pool = Pool()

# Number of reducible polynomials = total minus the number of irreducible ones
A - sum(pool.map(is_irreducible, xrange(A)))
That will in principle give you the same result. However, the speed-up you'll get will be at best on the order of the number of CPUs you have (typically a little less). Sage also comes with some parallelization helpers, but I tend to find them a bit lacking for the case of speeding up small calculations over a large range of values (they can be used for this, but it requires some care, such as manually batching your inputs; I'm not crazy about it...).
Beyond that, you may need to use some mathematical intuition to try to reduce the problem space.

Monte-Carlo Simulation for the sum of dice

I am very new to programming so I apologise in advance for my lack of knowledge.
I want to find the probability of obtaining the sum k when throwing m dice. I am not looking for a direct answer, I just want to ask if I am on the right track and what I can improve.
I begin with a function that calculates the sum of an array of m dice:
function dicesum(m)
    j = rand((1:6), m)
    sum(j)
end
Now I am trying specific values to see if I can find a pattern (but without much luck). I have tried m = 2 (two dice). What I am trying to do is to write a function which checks whether the sum of the two dice is k and, if it is, calculates the probability. My attempt is very naive, but I am hoping someone can point me in the right direction:
m = 2
x, y = rand(1:6), rand(1:6)
z = x + y
if z == dicesum(m)
    Probability = ??/6^m
end
I want to somehow find the number of 'elements' in dicesum(2) in order to calculate the probability. For example, consider the case when dicesum(2) = 8. With two dice, the possible outcomes are (2,6), (6,2), (5,3), (3,5), and (4,4), so the probability is 5/36.
I understand that the general case is far more complicated, but I just want an idea of how to begin with this problem. Thanks in advance for any help.
If I understand correctly, you want to use simulation to approximate the probability of obtaining a sum of k when rolling m dice. What I recommend is creating a function that takes k and m as arguments and repeats the simulation a large number of times. The following might help you get started:
function Simulate(m, k, Nsim=10^4)
    #Initialize the counter
    cnt = 0
    #Repeat the experiment Nsim times
    for sim in 1:Nsim
        #Simulate roll of m dice
        s = sum(rand(1:6, m))
        #Increment counter if sum matches k
        if s == k
            cnt += 1
        end
    end
    #Return the estimated probability
    return cnt/Nsim
end
prob = Simulate(3,4)
The estimate is approximately .0131.
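For comparison, the exact probability can be computed by brute-force enumeration of all 6^m outcomes; here is a small sanity check (exact_prob is just an illustrative name, not part of the answer above):
# Exact probability by enumerating every one of the 6^m equally likely outcomes
# (fine for small m; purely a cross-check of the simulation above).
function exact_prob(m, k)
    outcomes = Iterators.product(fill(1:6, m)...)
    return count(t -> sum(t) == k, outcomes) / 6^m
end

exact_prob(3, 4)   # 3/216 ≈ 0.0139, so the simulated 0.0131 is in the right neighbourhood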
You can also perform the simulation in a vectorized style, as shown below. It's less efficient in terms of memory allocation because it creates a vector s of length Nsim, whereas the loop code uses only a single integer counter, cnt. Unnecessary memory allocation can sometimes cause performance issues. In this case it turns out that the vectorized code is about twice as fast, although usually loops are a bit faster. Someone more familiar with the internals of Julia might be able to offer an explanation.
using Statistics

function Simulate1(m, k, Nsim=10^4)
    #Simulate roll of m dice Nsim times (one row per simulation)
    s = sum(rand(1:6, Nsim, m), dims=2)
    #Relative frequency of matches
    prob = mean(s .== k)
    return prob
end
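If you want to reproduce the timing comparison on your own machine, the BenchmarkTools package (assuming you have it installed) gives more stable numbers than a single @time call:
using BenchmarkTools

@btime Simulate(3, 4)    # loop version
@btime Simulate1(3, 4)   # vectorized version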

Why does the elapsed time increase as the number of cores increases?

I am doing multi-core computing in R.
Here are the code and the outputs for each computation. Why does the elapsed time increase as the number of cores increases? This is really counter-intuitive; I would expect the elapsed time to decrease as the number of cores increases. Is there any way to fix this?
Here is the code:
library(parallel)
detectCores()
system.time(pvec(1:1e7, sqrt, mc.cores = 1))
system.time(pvec(1:1e7, sqrt, mc.cores = 4))
system.time(pvec(1:1e7, sqrt, mc.cores = 8))
Thank you.
Suppose that your data is divided into N parts and that each part is processed in T seconds. On a single-core architecture you expect all of the work to be done in N x T seconds. You might also hope that, on an N-core machine, the work would be done in about T seconds. However, in parallel computing there is a communication overhead paid for each core (initializing, passing data from the main process to the child, calculating, passing the result back, and finalizing). Let this communication overhead be C seconds and, for simplicity, assume it is the same for every core. Then, on an N-core machine, the calculation takes about
T + N x C
seconds, in which the T part is the calculation and the N x C part is the total communication. Comparing this to the single-core machine, the inequality
(N x T) > (T + N x C)
must be satisfied for the parallel version to save time, at least under these assumptions. Simplifying the inequality, we get
C < (N x T - T) / N
so if the constant communication time C is not less than (N x T - T) / N, there is no gain from running the computation in parallel.
In your example, the time needed for process creation, communication, and the calculation itself is bigger than the single-core computation time, because sqrt is too cheap a function per element.
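As a back-of-the-envelope illustration of that inequality (the numbers below are made up, and the arithmetic is written as a small Julia snippet rather than R):
# Made-up numbers, purely to illustrate the overhead model above.
N, T, C = 8, 0.25, 0.5       # 8 chunks, 0.25 s of work each, 0.5 s overhead per core
serial    = N * T            # 2.0 s on one core
parallel  = T + N * C        # 4.25 s on 8 cores -- slower than serial
breakeven = (N * T - T) / N  # ~0.22 s: C must stay below this for parallelism to pay off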

How to determine how long a recursive fibonacci algorithm will take for larger values if I have the time for the smaller values?

I have used the time library and timed how long the recursive algorithm takes to calculate the Fibonacci numbers up to 50. Given those numbers, is there a formula I can use to determine how long it would have potentially taken to calculate fib(100)?
Times for smaller values:
Fib(40): 0.316 sec
Fib(80): 2.3 years
Fib(100): ???
This depends very much on the algorithm in use. The direct (closed-form) computation takes constant time. The recursive computation without memoization is exponential, with a base of phi. Add memoization to this, and it drops to linear time.
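For reference, a memoized version might look like the following Julia sketch (not from the original question, just to illustrate the linear-time variant):
# Memoized Fibonacci: each value is computed once, so the cost is linear in n.
function fib_memo(n, memo = Dict{Int,BigInt}(1 => big(1), 2 => big(1)))
    get!(memo, n) do
        fib_memo(n - 1, memo) + fib_memo(n - 2, memo)
    end
end

fib_memo(100)   # returns immediately, unlike the naive recursion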
The only one that could fit your data is the exponential time. Doing the basic math ...
(2.3 years / 0.316 sec) ** (1.0/40)
gives us
base = 1.6181589...
Gee, look at that! Less than one part in 10^4 more than phi!
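Spelling that arithmetic out with the unit conversion made explicit (a quick sanity check, here written in Julia):
# The same calculation as above, with 2.3 years converted to seconds.
t40  = 0.316                    # measured: seconds for Fib(40)
t80  = 2.3 * 365 * 24 * 3600    # quoted: 2.3 years for Fib(80), in seconds
base = (t80 / t40)^(1 / 40)     # ≈ 1.61816
phi  = (1 + sqrt(5)) / 2        # ≈ 1.61803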
Let t(n) be the time to compute Fib(n).
We can support the hypothesis that
t(n) = phi * t(n-1)
Therefore,
t(100) = phi^(100-80) * t(80)
I trust you can finish from here.

Proving worst case running time of QuickSort

I am trying to perform asymptotic analysis on the following recursive function, which computes a power of a number efficiently. I am having trouble determining the recurrence equation because there are different equations for when the power is even and when it is odd, and I am unsure how to handle this. I understand that the running time is Θ(log n), so any advice on how to arrive at that result would be appreciated.
Recursive-Power(x, n):
    if n == 1
        return x
    if n is even
        y = Recursive-Power(x, n/2)
        return y*y
    else
        y = Recursive-Power(x, (n-1)/2)
        return y*y*x
In either case (n even or odd), the following recurrence holds:
T(n) = T(floor(n/2)) + Θ(1)
where floor(x) is the biggest integer not greater than x.
Since the floor has no influence on the asymptotic result, the recurrence is informally written as:
T(n) = T(n/2) + Θ(1)
You have guessed the asymptotic bound correctly. The result can be proved using the substitution method or the Master theorem; that part is left as an exercise for you.
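To see the Θ(log n) behaviour empirically, here is a quick Julia transcription of the pseudocode with a call counter bolted on (the counter and the names are mine, purely for illustration):
# Count how many recursive calls Recursive-Power makes for various n.
function recursive_power(x, n, calls = Ref(0))
    calls[] += 1
    n == 1 && return x
    if iseven(n)
        y = recursive_power(x, n ÷ 2, calls)
        return y * y
    else
        y = recursive_power(x, (n - 1) ÷ 2, calls)
        return y * y * x
    end
end

for n in (10, 100, 10^4, 10^6)
    c = Ref(0)
    recursive_power(1.0, n, c)
    println("n = $n  ->  $(c[]) calls  (log2(n) ≈ $(round(log2(n), digits=1)))")
end
The call count comes out as floor(log2(n)) + 1, which is exactly the Θ(log n) behaviour the recurrence predicts.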
