Writing into shared arrays within a @distributed for loop in Julia

I have an ODE that I need to solve over a wide range of parameters.
Previously I used MATLAB's parfor to divide the parameter range between multiple workers. I am new to Julia and now need to do the same thing in Julia. Here is the code that I am using:
using DifferentialEquations, SharedArrays, Distributed, Plots
function SingleBubble(du,u,p,t)
    du[1] = @. u[2]
    du[2] = @. ((-0.5*u[2]^2)*(3-u[2]/(p[4]))+(1+(1-3*p[7])*u[2]/p[4])*((p[6]-p[5])/p[2]+2*p[1]/(p[2]*p[8]))*(p[8]/u[1])^(3*p[7])-2*p[1]/(p[2]*u[1])-4*p[3]*u[2]/(p[2]*u[1])-(1+u[2]/p[4])*(p[6]-p[5]+p[10]*sin(2*pi*p[9]*t))/p[2]-p[10]*u[1]*cos(2*pi*p[9]*t)*2*pi*p[9]/(p[2]*p[4]))/((1-u[2]/p[4])*u[1]+4*p[3]/(p[2]*p[4]))
end
R0=2e-6
f=2e6
u0=[R0,0]
LN=1000
RS = SharedArray(zeros(LN))
P = SharedArray(zeros(LN))
bif = SharedArray(zeros(LN,6))
@distributed for i = 1:LN
    ps = 1e3 + i*1e3
    tspan = (0, 60/f)
    p = [0.0725,998,1e-3,1481,0,1.01e5,7/5,R0,f,ps]
    prob = ODEProblem(SingleBubble,u0,tspan,p)
    sol = solve(prob,Tsit5(),alg_hints=:stiff,saveat=0.01/f,reltol=1e-8,abstol=1e-8)
    RS[i] = maximum(sol[1,5000:6000]/R0)
    P[i] = ps
    for j = 1:6
        nn = 5500 + (j-1)*100
        bif[i,j] = sol[1,nn]/R0
    end
end
plotly()
scatter(P/1e3,bif,shape=:circle,ms=0.5,label="")#,ma=0.6,mc=:black,mz=1,label="")
When using one worker, the for loop is executed as a normal single-threaded loop and it works fine. However, when I use addprocs(n) to add n more workers, nothing gets written into the SharedArrays RS, P and bif. I appreciate any guidance anyone may provide.

These changes are required to make your program work with multiple workers and display the results you need:
Whatever packages and functions are used inside the @distributed loop must be made available to all the processes using @everywhere, as explained here. In your case that means the DifferentialEquations and SharedArrays packages as well as the SingleBubble() function (see the ordering note and snippet below).
You need to draw the plot only after all the workers have finished their tasks. For this, use @sync along with @distributed.
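One detail the points above assume: the worker processes must already exist when the @everywhere lines run, so addprocs(n) (with n chosen for your machine) has to come first, for example:
using Distributed
addprocs(4)                 # add workers before any @everywhere definitions run
@everywhere using DifferentialEquations, SharedArrays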
With these changes, your code would look like:
using Distributed, Plots
@everywhere using DifferentialEquations, SharedArrays
@everywhere function SingleBubble(du,u,p,t)
    du[1] = @. u[2]
    du[2] = @. ((-0.5*u[2]^2)*(3-u[2]/(p[4]))+(1+(1-3*p[7])*u[2]/p[4])*((p[6]-p[5])/p[2]+2*p[1]/(p[2]*p[8]))*(p[8]/u[1])^(3*p[7])-2*p[1]/(p[2]*u[1])-4*p[3]*u[2]/(p[2]*u[1])-(1+u[2]/p[4])*(p[6]-p[5]+p[10]*sin(2*pi*p[9]*t))/p[2]-p[10]*u[1]*cos(2*pi*p[9]*t)*2*pi*p[9]/(p[2]*p[4]))/((1-u[2]/p[4])*u[1]+4*p[3]/(p[2]*p[4]))
end
R0=2e-6
f=2e6
u0=[R0,0]
LN=1000
RS = SharedArray(zeros(LN))
P = SharedArray(zeros(LN))
bif = SharedArray(zeros(LN,6))
@sync @distributed for i = 1:LN
    ps = 1e3 + i*1e3
    tspan = (0, 60/f)
    p = [0.0725,998,1e-3,1481,0,1.01e5,7/5,R0,f,ps]
    prob = ODEProblem(SingleBubble,u0,tspan,p)
    sol = solve(prob,Tsit5(),alg_hints=:stiff,saveat=0.01/f,reltol=1e-8,abstol=1e-8)
    RS[i] = maximum(sol[1,5000:6000]/R0)
    P[i] = ps
    for j = 1:6
        nn = 5500 + (j-1)*100
        bif[i,j] = sol[1,nn]/R0
    end
end
plotly()
scatter(P/1e3,bif,shape=:circle,ms=0.5,label="")#,ma=0.6,mc=:black,mz=1,label="")
Output using multiple workers:

Related

Parallel processing using DataFrames in Julia

I am referring to the example in the documentation for parallel loops and trying to adapt it for my use case. In each independent iteration, I get a DataFrame as a result, which I need to combine across all iterations using vcat(). This is a simplified version of my attempt so far:
using DataFrames, Distributed
function test()
    if length(workers()) < length(Sys.cpu_info())
        addprocs(length(Sys.cpu_info()); exeflags="--project=" * Base.active_project())
    end
    nheads = @distributed (vcat) for i = 1:20
        DataFrame(a=[Int(rand(Bool))])
    end
end
But on running test(), I get the error:
ERROR: On worker 2: UndefVarError: DataFrame not defined
What do I need to do to correct this?
Your using DataFrames ... statement on the first line only applies to the main process, so your worker processes haven't imported the required libraries.
To fix this, you should add the @everywhere macro to the first line. That asks all the processes to import those libraries.
Edit
I just noticed that you call addprocs inside the function, so the suggestion above wouldn't work as-is. Here's a working version:
using Distributed
addprocs(length(Sys.cpu_info()))
#everywhere using DataFrames
function test()
    nheads = @distributed (vcat) for i = 1:20
        DataFrame(a=[Int(rand(Bool))])
    end
end
test()
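For reference, a quick way to check the result (a small sketch; the values vary because they are random):
df = test()
size(df)      # (20, 1): one row per iteration, vcat'ed across the workers
first(df, 3)  # peek at the first few rows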

Julia @distributed: subsequent code runs before all workers finish

I have been banging my head against a wall for a few days over this code:
using Distributed
using SharedArrays
# Dimension size
M=10;
N=100;
z_ijw = zeros(Float64,M,N,M)
z_ijw_tmp = SharedArray{Float64}(M*M*N)
i2s = CartesianIndices(z_ijw)
@distributed for iall = 1:(M*M*N)
    # get index
    i = i2s[iall][1]
    j = i2s[iall][2]
    w = i2s[iall][3]
    # Assign function value
    z_ijw_tmp[iall] = sqrt(i+j+w) # Any random function would do
end
# Print the last element of the array
println(z_ijw_tmp[end])
println(z_ijw_tmp[end])
println(z_ijw_tmp[end])
The first printed number is always 0; the second is either 0 or 10.95... (the square root of 120, which is correct); the third is either 0 or 10.95 (if the second was 0).
So it appears that the print code (on the main process?) is allowed to run before all the workers finish. Is there any way for the print code to run properly the first time (without a wait command)?
Without the multiple println calls, I thought it was a problem with scope and spent a few days reading about it @.@
@distributed with a reducer function, e.g. @distributed (+), is synced, whereas @distributed without a reducer function is started asynchronously.
Putting a @sync in front of your @distributed should make the code behave the way you want it to.
This is also noted in the documentation here:
Note that without a reducer function, @distributed executes asynchronously, i.e. it spawns independent tasks on all available workers and returns immediately without waiting for completion. To wait for completion, prefix the call with @sync
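A minimal sketch of that fix applied to the loop above (the worker count of 4 is arbitrary; adjust for your machine):
using Distributed
addprocs(4)                        # workers must exist before @everywhere
@everywhere using SharedArrays     # so the workers can use the SharedArray

M = 10; N = 100
z_ijw_tmp = SharedArray{Float64}(M*M*N)
i2s = CartesianIndices((M, N, M))

# @sync blocks the main process until every worker has written its chunk
@sync @distributed for iall = 1:(M*M*N)
    i, j, w = Tuple(i2s[iall])
    z_ijw_tmp[iall] = sqrt(i + j + w)   # same placeholder function as above
end

println(z_ijw_tmp[end])                 # prints sqrt(120) ≈ 10.954 on the first try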

Scilab pointer function

I am working on converting the MATLAB code included here to Scilab.
The @ symbol is used in MATLAB as a function handle pointing to the function tst_callback.
Scilab does not like this, however. Is there a Scilab equivalent for the @?
function test
    sysIDgui(@tst_callback)
end
function tst_callback()
    disp("Hello Ron")
endfunction
What you are trying to do is pass a function as an argument to another function. In Scilab, you don't need any special syntax for this.
Try it yourself. Define these two functions:
function y = applyFunction(f,x)
    y = f(x);
endfunction

function y = double(x)
    y = x * 2;
endfunction
Then test it on the console:
--> applyFunction(double,7)
ans =
14.
Note: the main usage of @ in MATLAB is to create anonymous functions (see the documentation), i.e. functions that are not defined in a separate file. As for Scilab, there is no way to create anonymous functions.

Why does array += (without @.) produce so much memory allocation?

I don't understand why the += operation for arrays produces so much memory allocation, but the allocations go away when @. is applied.
function loop()
    a = randn(10)
    total = similar(a)
    for i = 1:1000
        total += a
    end
end
function loopdot()
    a = randn(10)
    total = similar(a)
    for i = 1:1000
        @. total += a
    end
end
loop()
loopdot()
Profile.clear_malloc_data()
loop()
loopdot()
produces
160000 total += a
and
0 @. total += a
total += a is the same as total = total + a, which is a vectorized operation like:
out = similar(a)
for i in eachindex(a)
    out[i] = total[i] + a[i]
end
total = out
since internally it's
total = +(total,a)
This is just like MATLAB, Python, or R and thus has a temporary array that is allocated for the vectorized operation, and then the = sets the reference of total to this new array. This is why vectorized operations are slow vs traditional low level loops and is one of the main reasons why using something like NumPy can be faster than Python but cannot fully reach C (because of these temporaries!).
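One way to see the temporary directly is to measure allocations for the two forms (a small sketch; the helper names are made up for illustration):
# `+=` rebinds the local variable to a freshly allocated result array
step_vectorized(total, a) = (total += a; nothing)
# the fused broadcast writes into the existing array, so no temporary
step_fused!(total, a) = (@. total += a; nothing)

a = randn(10); t = similar(a)
step_vectorized(t, a); step_fused!(t, a)    # warm up so compilation isn't measured
@allocated step_vectorized(t, a)            # > 0: one temporary array per call
@allocated step_fused!(t, a)                # 0: updates t in place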
@. total += a is the same as total .= total .+ a. This blog post explains that in Julia there is semantic dot fusion via anonymous functions, and thus corresponds to doing the following:
# Build an anonymous kernel function for the fused operation
f! = (i, out, b, c) -> (out[i] = b[i] + c[i])
# Now loop it; since it's `.=`, no temporary is made
for i in eachindex(total)
    f!(i, total, total, a)
end
which updates total in-place without creating a temporary array.
Fusion in Julia happens semantically: this conversion of dotted operations into an anonymous function plus a broadcast! call (which is essentially the loop written above) is done at parse time, and the anonymous function is compiled so that the result is efficient. This is very useful for other reasons as well. By overloading broadcast! on a generic f!, packages like GPUArrays.jl automatically build efficient single kernels which do in-place updates on the GPU. This is as opposed to MATLAB, Python, and R, where different vectorized functions are considered different function calls and thus must compute a return, hence the temporary array.
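To make the "anonymous function plus a broadcast! call" idea concrete, here is a rough hand-written equivalent (not the exact lowering; Meta.@lower shows the real thing):
a = randn(10)
total = similar(a)

broadcast!(+, total, total, a)   # same effect as  @. total += a  (in-place, fused)
Meta.@lower @. total += a        # inspect what the parser actually produces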

Measuring CPU time using Scilab

I'm new to working with Scilab, and I'm trying to run the code below, but when I do, it keeps showing me this error:
test3(1000) //Line that I type to run the code
!--error 4 //First error
Undefined variable: cputime
at line 2 of function test3 called by:
I ran it using MATLAB and it worked, but I can't figure out how to make it run in Scilab.
The sample code, as typed in the Scilab editor, is below.
function test3(n)
    t = cputime;
    for (j = 1:n)
        x(j) = sin(j);
    end
    disp(cputime - t);
Typing help cputime in the Scilab console will reveal that this is not a Scilab function. The near-equivalent Scilab function is timer(), but its behavior is a bit different:
cputime in Matlab measures time since Matlab started
timer() measures time since the last call to timer()
Here is your function rewritten in Scilab:
function test3(n)
    timer()
    for j = 1:n
        x(j) = sin(j)
    end
    disp(timer())
endfunction
Note that Scilab functions must end with endfunction, and that the semicolons here are optional: inside a function, Scilab suppresses line-by-line output by default.
For completeness, I'll mention tic() and toc(), which work just like Matlab's tic and toc, measuring real-world time of computation.
