Fixing the seed via srand() produces different results from rand() - Julia

In my Julia 0.5 script I use srand(1234) to get the same results from rand() each time I re-run the script. However, I get different results. What am I doing wrong?

As @Dan Getz mentioned in the comments, this is likely because you have some code that calls random functions without you knowing about it.
If you call the same rand() function with the same seed set, you get the same results as expected:
julia> for i in 1:3
           srand(1)
           println(rand())
       end
0.23603334566204692
0.23603334566204692
0.23603334566204692
However, if your script contains another call to rand that may or may not be executed, then your random number generator will be at a different stage when you reach the rand() call you are investigating. Here's an example to illustrate this:
julia> for i in 1:3
           srand(1)
           if i == 2
               rand()
           end
           println(rand())
       end
0.23603334566204692
0.34651701419196046
0.23603334566204692
Notice how in the second iteration of the loop there's an extra rand() call that offsets the random number generator and results in a different value.

In addition to the answer given by @niczky12, I would recommend that you define your own generator and use that for better reproducibility. That way you always keep control of "your" generator, and calls to other functions (perhaps not in your control) that use the global one will not affect the random numbers you obtain.
For example, creating a MersenneTwister with seed 1234:
rng = MersenneTwister(1234)
Then you simply pass this generator to your rand calls:
julia> rng = MersenneTwister(1234);
julia> rand(rng)
0.5908446386657102
julia> rand(rng, 2, 3)
2×3 Array{Float64,2}:
 0.766797  0.460085  0.854147
 0.566237  0.794026  0.200586
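As a quick sketch of why this helps: re-creating the seeded generator reproduces the same draws even if the global RNG is used in between (the intermediate rand(2) call below is just for illustration):
rng = MersenneTwister(1234)
a = rand(rng, 3)

rand(2)                      # some unrelated use of the global RNG

rng = MersenneTwister(1234)  # reset only "your" generator
a == rand(rng, 3)            # true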


Failure to report number that is too small

I did the following calculations in Julia
using Distributions
z = LinRange(-0.09025000000000001,0.19025000000000003,5)
d = Normal.(0.05*(1-0.95) .+ 0.95.*z .- 0.0051^2/2, 0.0051 .* (similar(z) .*0 .+1))
minimum(cdf.(d, (z[3]+z[2])/2))
The problem I have is that the last line sometimes gives me the correct result, 4.418051841202834e-239, and sometimes reports the error DomainError with NaN: Normal: the condition σ >= zero(σ) is not satisfied. I think this is because 4.418051841202834e-239 is too small. But I was wondering why my code can give me different results.
In addition to points mentioned by others, here are a few more:
Firstly, don't use LinRange when numerical accuracy matters; that is what the range function is for. LinRange can be used when numerical precision is less important, since it is faster. From the docstring of range:
Special care is taken to ensure intermediate values are computed rationally. To avoid this induced overhead, see the LinRange constructor.
Example:
julia> LinRange(-0.09025000000000001,0.19025000000000003,5) .- range(-0.09025000000000001,0.19025000000000003,5)
0.0:-3.469446951953614e-18:-1.3877787807814457e-17
Secondly, this is a pretty terrible way to create a vector of a certain value:
0.0051 .* (similar(z) .*0 .+1)
Others have mentioned ones, etc., but I think it's better to use fill:
fill(0.0051, size(z))
which directly fills the array with the right value. Perhaps one should use convert(eltype(z), 0.0051) inside fill.
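For example, a one-liner along those lines might look like:
fill(convert(eltype(z), 0.0051), size(z))  # filled array matching the element type of z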
Thirdly, don't create this vector at all! You use broadcasting, so just use the scalar value:
d = Normal.(0.05*(1-0.95) .+ 0.95.*z .- 0.0051^2/2, 0.0051) # look! just a scalar!
This is how broadcasting works: it expands singleton dimensions implicitly to match the other arguments (without actually allocating that memory).
Much of the point of broadcasting is that you don't need to create that sort of 'dummy array' anymore. If you find yourself doing that, give it another thought; constant-valued arrays are inherently wasteful, and you shouldn't need to create them.
There are two problems:
Noted by @Dan Getz: similar does not initialize the values, and quite often unused areas of memory hold values corresponding to NaN. In that case multiplication by 0 does not help, since NaN * 0 is still NaN. Instead you want ones(eltype(z), size(z)).
You need to use higher precision than Float64. BigFloat is one way to go; just remember to call setprecision(BigFloat, 128) so you actually control how many bits you use (a BigFloat sketch follows the DoubleFloats example below). However, a much more time-efficient solution (if you run computations at scale) is to use a dedicated package such as DoubleFloats.
Sample corrected code using DoubleFloats below:
julia> using DoubleFloats, Distributions

julia> z = LinRange(df64"-0.09025000000000001",df64"0.19025000000000003",5)
5-element LinRange{Double64, Int64}:
-0.09025000000000001,-0.020125,0.05000000000000001,0.12012500000000002,0.19025000000000003
julia> d = Normal.(0.05*(1-0.95) .+ 0.95.*z .- 0.0051^2/2, 0.0051 .* ones(eltype(z),size(z)))
5-element Vector{Normal{Double64}}:
Normal{Double64}(μ=-0.083250505, σ=0.0051)
Normal{Double64}(μ=-0.016631754999999998, σ=0.0051)
Normal{Double64}(μ=0.049986995000000006, σ=0.0051)
Normal{Double64}(μ=0.11660574500000001, σ=0.0051)
Normal{Double64}(μ=0.18322449500000001, σ=0.0051)
julia> minimum(cdf.(d, (z[3]+z[2])/2))
4.418051841203009e-239
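For comparison, here is a minimal BigFloat sketch of the same computation (assuming Normal and cdf from Distributions.jl accept BigFloat parameters, which they do in recent versions; the setprecision call controls the number of significand bits):
using Distributions

setprecision(BigFloat, 128)

z = range(big"-0.09025000000000001", big"0.19025000000000003", length = 5)
d = Normal.(0.05 * (1 - 0.95) .+ 0.95 .* z .- big"0.0051"^2 / 2, big"0.0051")
minimum(cdf.(d, (z[3] + z[2]) / 2))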
The problem in the code is similar(z), which produces a vector with undefined entries and is used without initialization. Use ones(length(z)) instead.

Correct way to generate Poisson-distributed random numbers in Julia GPU code?

For a stochastic solver that will run on a GPU, I'm currently trying to draw Poisson-distributed random numbers. I will need one number for each entry of a large array. The array lives in device memory and will also be deterministically updated afterwards. The problem I'm facing is that the mean of the distribution depends on the old value of the entry. Therefore, I would naively have to do something like:
CUDA.rand_poisson!(lambda=array*constant)
or:
array = CUDA.rand_poisson(lambda=array*constant)
Neither of these works, which does not really surprise me, but maybe I just need a better understanding of broadcasting?
Then I tried writing a kernel which looks like this:
function cu_draw_rho!(rho::CuDeviceVector{FloatType}, λ::FloatType)
    idx = (blockIdx().x - 1i32) * blockDim().x + threadIdx().x
    stride = gridDim().x * blockDim().x
    @inbounds for i = idx:stride:length(rho)
        l = rho[i] * λ
        # 1. variant
        rho[i] > 0.0f0 && (rho[i] = FloatType(CUDA.rand_poisson(UInt32, 1; lambda = l)))
        # 2. variant
        rho[i] > 0.0f0 && (rho[i] = FloatType(rand(Poisson(lambda = l))))
    end
    return
end
And many slight variations of the above. I get tons of errors about dynamic function calls, which I connect to the fact that I'm calling functions that are meant for arrays from my kernels. The second variant, using rand(), works only without the Poisson argument (which uses the Distributions package, I guess?).
What is the correct way to do this?
You may want CURAND.jl, which provides curand_poisson.
using CURAND
n = 10
lambda = .5
curand_poisson(n, lambda)

Scope of Random Seed setting

I want to make a function that always returns the same numbers if I pass a parameter asking for a deterministic response, and that gives a requested number of pseudorandom numbers otherwise. Unfortunately the only way I can figure out how to do it resets the global random seed, which is not desirable.
Is there a way I can set the random number seed for one draw of pseudorandom numbers without affecting the global seed or the existing progression along that seed's pseudorandom number sequence?
Example Case
using Random
function get_random(n::Int, deterministic::Bool)
    if deterministic
        Random.seed!(1234)
        return rand(n)
    else
        return rand(n)
    end
end
Random.seed!(4321)
# This and the next get_random(5,false) should give the same response
# if the Random.seed!(1234) were confined to the function scope.
get_random(5,false)
Random.seed!(4321)
get_random(5,true)
get_random(5,false)
The simplest solution is to use a newly allocated RNG, like this:
using Random
function get_random(n::Int, deterministic::Bool)
    if deterministic
        m = MersenneTwister(1234)
        return rand(m, n)
    else
        return rand(n)
    end
end
In general I tend not to use the global RNG in simulations at all, as this gives me better control of the process.
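For example, a sketch of that style, where the simulate function is just a made-up placeholder that takes its RNG explicitly:
using Random

function simulate(rng::AbstractRNG, n::Int)
    return cumsum(randn(rng, n))   # toy "simulation" driven only by the passed-in rng
end

rng = MersenneTwister(1234)
simulate(rng, 5)                   # reproducible regardless of the global seed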

Evaluation in definition

I am sorry about the title, but I couldn't find a better one.
Let's define
function test(n)
    print("test executed")
    return n
end
f(n) = test(n)
Every time we call f we get
f(5)
test executed
5
Is there a way to tell Julia to evaluate test once in the definition of f?
I expect that this is probably not going to be possible, in which case I have a slightly different question. If ar = [1, 2, :x, -2, 2*:x], is there any way to define f(x) to be the sum of ar, i.e. f(x) = 3*x + 1?
If you want to compile based on type information, you can use @generated functions. But it seems like you want to compile based on the runtime values of the input. In this case, you might want to do memoization. There is a library, Memoize.jl, that provides a macro for memoizing functions.
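For instance, a small sketch with Memoize.jl (assuming its standard @memoize macro; the cache is keyed on the argument value):
using Memoize

@memoize function test(n)
    println("test executed")
    return n
end

f(n) = test(n)

f(5)   # prints "test executed" and returns 5
f(5)   # returns 5 from the cache; nothing is printed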

Parallelization over user defined types with destructive function in Julia

I have the following code:
mutable struct T
    a::Vector{Int}
end
N = 100
len = 10
Ts = [T(fill(i, len)) for i in 1:N]
for i = 1:N
    destructive_function!(Ts[i])
end
destructive_function! changes a in each element of Ts.
Is there any way to parallelize the for loop? SharedArray does not seem to be available for user-defined types.
In this case it would be simpler to use threads rather than processes to parallelize the loop:
Threads.@threads for i = 1:N
    destructive_function!(Ts[i])
end
Just make sure that you set the JULIA_NUM_THREADS environment variable (or pass the --threads option on Julia 1.5 and later) before starting Julia. You can check the number of threads you are running on with the Threads.nthreads() function.
If you want to use processes rather than threads, then it is probably simplest to make destructive_function! return the modified value and then run pmap(destructive_function!, Ts) to collect the modified values in a new array (but this would not modify the Ts in place).
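A sketch of that process-based variant, with a made-up destructive_function! that just increments the vector:
using Distributed
addprocs(4)                      # start worker processes

@everywhere mutable struct T
    a::Vector{Int}
end

@everywhere function destructive_function!(t::T)
    t.a .+= 1                    # placeholder mutation for illustration
    return t                     # return the modified value so pmap can collect it
end

Ts = [T(fill(i, 10)) for i in 1:100]
Ts = pmap(destructive_function!, Ts)   # new array of modified copies; the original Ts is not mutated in place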
