Parallel processing using DataFrames in Julia

Parallel processing using DataFrames in Julia - julia

I am referring to the example in the documentation for processing Parallel loops and trying to adapt it for my use case. In each independent iteration in my case, I get a DataFrame as a result which I need to finally combine across all iterations using vcat(). This is a simplified version of my attempt so far:
using DataFrames, Distributed
function test()
if length(workers()) < length(Sys.cpu_info())
addprocs(length(Sys.cpu_info()); exeflags="--project=" * Base.active_project())
end
nheads = #distributed (vcat) for i = 1:20
DataFrame(a=[Int(rand(Bool))])
end
end
But on running test(), I get the error:
ERROR: On worker 2: UndefVarError: DataFrame not defined
What do I need to do to correct this?

Your using DataFrames ... statement on the first line only applies to the main "thread". So your worker threads didn't import the required libraries.
To fix this, you should add the keyword #everywhere on the first line. That would ask all the processes to import those libraries.
Edit
Just noticed that you did addprocs in the function. Then my suggestion wouldn't work. Here's a working version:
using Distributed
addprocs(length(Sys.cpu_info()))
#everywhere using DataFrames
function test()
nheads = #distributed (vcat) for i = 1:20
DataFrame(a=[Int(rand(Bool))])
end
end
test()

Related

writing into shared arrays within a distributed for loop in JULIA

I have an ODE that I need to solver over a wide range of parameters.
Previously I have used MATLAB's parfor to divide the parameter ranges between multiple threads. I am new to Julia and need to do the same thing in Julia now. Here is the code that I am using
using DifferentialEquations, SharedArrays, Distributed, Plots
function SingleBubble(du,u,p,t)
du[1]=#. u[2]
du[2]=#. ((-0.5*u[2]^2)*(3-u[2]/(p[4]))+(1+(1-3*p[7])*u[2]/p[4])*((p[6]-p[5])/p[2]+2*p[1]/(p[2]*p[8]))*(p[8]/u[1])^(3*p[7])-2*p[1]/(p[2]*u[1])-4*p[3]*u[2]/(p[2]*u[1])-(1+u[2]/p[4])*(p[6]-p[5]+p[10]*sin(2*pi*p[9]*t))/p[2]-p[10]*u[1]*cos(2*pi*p[9]*t)*2*pi*p[9]/(p[2]*p[4]))/((1-u[2]/p[4])*u[1]+4*p[3]/(p[2]*p[4]))
end
R0=2e-6
f=2e6
u0=[R0,0]
LN=1000
RS = SharedArray(zeros(LN))
P = SharedArray(zeros(LN))
bif = SharedArray(zeros(LN,6))
#distributed for i= 1:LN
ps=1e3+i*1e3
tspan=(0,60/f)
p=[0.0725,998,1e-3,1481,0,1.01e5,7/5,R0,f,ps]
prob = ODEProblem(SingleBubble,u0,tspan,p)
sol=solve(prob,Tsit5(),alg_hints=:stiff,saveat=0.01/f,reltol=1e-8,abstol=1e-8)
RS[i]=maximum((sol[1,5000:6000])/R0)
P[i]=ps
for j=1:6
nn=5500+(j-1)*100;
bif[i,j]=(sol[1,nn]/R0);
end
end
plotly()
scatter(P/1e3,bif,shape=:circle,ms=0.5,label="")#,ma=0.6,mc=:black,mz=1,label="")
When using one worker, the for loop is basically executed as a normal single threaded loop and it works fine. However, when I am using addprocs(n) to add n more workers, nothing gets written into the SharedArrays RS, P and bif. I appreciate any guidance anyone may provide.

These changes are required to make your program work with multiple workers and display the results you need:
Whatever packages and functions are used under #distributed loop must be made available in all the processes using #everywhere as explained here. So, in your case it would be DifferentialEquations and SharedArrays packages as well as the SingleBubble() function.
You need to draw the plot only after all the workers have finished their tasks. For this, you would need to use #sync along with #distributed.
With these changes, your code would look like:
using Distributed, Plots
#everywhere using DifferentialEquations, SharedArrays
#everywhere function SingleBubble(du,u,p,t)
du[1]=#. u[2]
du[2]=#. ((-0.5*u[2]^2)*(3-u[2]/(p[4]))+(1+(1-3*p[7])*u[2]/p[4])*((p[6]-p[5])/p[2]+2*p[1]/(p[2]*p[8]))*(p[8]/u[1])^(3*p[7])-2*p[1]/(p[2]*u[1])-4*p[3]*u[2]/(p[2]*u[1])-(1+u[2]/p[4])*(p[6]-p[5]+p[10]*sin(2*pi*p[9]*t))/p[2]-p[10]*u[1]*cos(2*pi*p[9]*t)*2*pi*p[9]/(p[2]*p[4]))/((1-u[2]/p[4])*u[1]+4*p[3]/(p[2]*p[4]))
end
R0=2e-6
f=2e6
u0=[R0,0]
LN=1000
RS = SharedArray(zeros(LN))
P = SharedArray(zeros(LN))
bif = SharedArray(zeros(LN,6))
#sync #distributed for i= 1:LN
ps=1e3+i*1e3
tspan=(0,60/f)
p=[0.0725,998,1e-3,1481,0,1.01e5,7/5,R0,f,ps]
prob = ODEProblem(SingleBubble,u0,tspan,p)
sol=solve(prob,Tsit5(),alg_hints=:stiff,saveat=0.01/f,reltol=1e-8,abstol=1e-8)
RS[i]=maximum((sol[1,5000:6000])/R0)
P[i]=ps
for j=1:6
nn=5500+(j-1)*100;
bif[i,j]=(sol[1,nn]/R0);
end
end
plotly()
scatter(P/1e3,bif,shape=:circle,ms=0.5,label="")#,ma=0.6,mc=:black,mz=1,label="")
Output using multiple workers:

Why is a Julia function name (without arguments) silently ignored?

I have a script that defines a function, and later intended to call the function but forgot to add the parentheses, like this:
function myfunc()
println("inside myfunc")
end
myfunc # This line is silently ignored. The function isn't invoked and there's no error message.
After a while I did figure out that I was missing the parentheses, but since Julia didn't give me an error, I'm wondering what that line is actually doing? I'm assuming that it must be doing something with the myfunc statement, but I don't know Julia well enough to understand what is happening.
I tried --depwarn=yes but don't see a julia command line switch to increase the warning level or verbosity. Please let me know if one exists.
For background context, the reason this came up is that I'm trying to translate a Bash script to Julia, and there are numerous locations where an argument-less function is defined and then invoked, and in Bash you don't need parentheses after the function name to invoke it.
The script is being run from command line (julia stub.jl) and I'm using Julia 1.0.3 on macOS.

It doesn't silently ignore the function. Calling myfunc in an interactive session will show you what happens: the call returns the function object to the console, and thus call's the show method for Function, showing how many methods are currently defined for that function in your workspace.
julia> function myfunc()
println("inside myfunc")
end
myfunc (generic function with 1 method)
julia> myfunc
myfunc (generic function with 1 method)
Since you're calling this in a script, show is never called, and thus you don't see any result. But it doesn't error, because the syntax is valid.
Thanks to DNF for the helpful comment on it being in a script.

It does nothing.
As in c, an expression has a value: in c the expression _ a=1+1; _ has the value _ 2 _ In c, this just fell out of the parser: they wanted to be able to evaluate expressions like _ a==b _
In Julia, it's the result of designing a language where the code you write is handled as a data object of the language. In Julia, the expression "a=1+1" has the value "a=1+1".
In c, the fact that
a=1+1;
is an acceptable line of code means that, accidentally,
a;
is also an acceptable line of code. The same is true in Julia: the compiler expects to see a data value there: any data value you put may be acceptable: even for example the data value that represents the calculated value returned by a function:
myfunc()
or the value that represents the function object itself:
myfunc
As in c, the fact that data values are everywhere in your code just indicates that the syntax allows data values everywhere in your code and that the compiler does nothing with data values that are everywhere in your code.

Running a loop and executing a function after each iteration of loop: r

I'm running a google big query script off of RStudio.
I have one important parameterised variable. Which needs to be replaced with values in a dataframe
health_tags<-read.csv('marker_tags.csv')
health_tags<-tail(tags, 7)
I have built a function which executes my query whilst adding the parameters to my variables.
query_details (MD2_date_start="2018-06-06",
MD2_date_end="2018-07-07",
Sterile_tag="7894")
So "query_details" is a function API call which fills in details for BQ to run. How do I write a looper which replaces the values in "sterile_tag" with the codes found in the health_tags CSV and then run the "query_details" function each time until all iterations have completed.

You can use sapply where column should be the real name of your column:
sapply(health_tags$column, function(x) query_details (MD2_date_start="2018-06-06",
MD2_date_end="2018-07-07",
Sterile_tag=as.character(x)))

How to quit/exit from file included in the terminal

What can I do within a file "example.jl" to exit/return from a call to include() in the command line
julia> include("example.jl")
without existing julia itself. quit() will just terminate julia itself.
Edit: For me this would be useful while interactively developing code, for example to include a test file and return from the execution to the julia prompt when a certain condition is met or do only compile the tests I am currently working on without reorganizing the code to much.

I'm not quite sure what you're looking to do, but it sounds like you might be better off writing your code as a function, and use a return to exit. You could even call the function in the include.

Kristoffer will not love it, but
stop(text="Stop.") = throw(StopException(text))
struct StopException{T}
S::T
end
function Base.showerror(io::IO, ex::StopException, bt; backtrace=true)
Base.with_output_color(get(io, :color, false) ? :green : :nothing, io) do io
showerror(io, ex.S)
end
end
will give a nice, less alarming message than just throwing an error.
julia> stop("Stopped. Reason: Converged.")
ERROR: "Stopped. Reason: Converged."
Source: https://discourse.julialang.org/t/a-julia-equivalent-to-rs-stop/36568/12

You have a latent need for a debugging workflow in Julia. If you use Revise.jl and Rebugger.jl you can do exactly what you are asking for.
You can put in a breakpoint and step into code that is in an included file.
If you include a file from the julia prompt that you want tracked by Revise.jl, you need to use includet(.
The keyboard shortcuts in Rebugger let you iterate and inspect variables and modify code and rerun it from within an included file with real values.
Revise lets you reload functions and modules without needing to restart a julia session to pick up the changes.
https://timholy.github.io/Rebugger.jl/stable/
https://timholy.github.io/Revise.jl/stable/
The combination is very powerful and is described deeply by Tim Holy.
https://www.youtube.com/watch?v=SU0SmQnnGys
https://youtu.be/KuM0AGaN09s?t=515
Note that there are some limitations with Revise, such as it doesn't reset global variables, so if you are using some global count or something, it won't reset it for the next run through or when you go back into it. Also it isn't great with runtests.jl and the Test package. So as you develop with Revise, when you are done, you move it into your runtests.jl.
Also the Juno IDE (Atom + uber-juno package) has good support for code inspection and running line by line and the debugging has gotten some good support lately. I've used Rebugger from the julia prompt more than from the Juno IDE.
Hope that helps.

#DanielArndt is right.
It's just create a dummy function in your include file and put all the code inside (except other functions and variable declaration part that will be place before). So you can use return where you wish. The variables that only are used in the local context can stay inside dummy function. Then it's just call the new function in the end.
Suppose that the previous code is:
function func1(...)
....
end
function func2(...)
....
end
var1 = valor1
var2 = valor2
localVar = valor3
1st code part
# I want exit here!
2nd code part
Your code will look like this:
var1 = valor1
var2 = valor2
function func1(...)
....
end
function func2(...)
....
end
function dummy()
localVar = valor3
1st code part
return # it's the last running line!
2nd code part
end
dummy()
Other possibility is placing the top variables inside a function with a global prefix.
function dummy()
global var1 = valor1
global var2 = valor2
...
end
That global variables can be used inside auxiliary function (static scope) and outside in the REPL
Another variant only declares the variables and its posterior use is free
function dummy()
global var1, var2
...
end

Measuring CPU time using Scilab

I'm new to working with Scilab, and I'm trying to run the code below, but when I try it keeps showing me this error::
test3(1000) //Line that I type to run the code
!--error 4 //First error
Undefined variable: cputime
at line 2 of function test3 called by:
I ran it using MATLAB, and it worked, but I can't figure out how to make it run using Scilab.
For sample code when typed using the Scilab editor, see below.
function test3(n)
t = cputime;
for (j = 1:n)
x(j) = sin(j);
end
disp(cputime - t);

Typing help cputime in the Scilab console will reveal that this is not a Scilab function. The near-equivalent Scilab function is timer(), but its behavior is a bit different:
cputime in Matlab measures time since Matlab started
timer() measures time since the last call to timer()
Here is your function rewritten in Scilab:
function test3(n)
timer()
for j = 1:n
x(j) = sin(j)
end
disp(timer())
endfunction
Note that Scilab functions must end with endfunction, and that semicolons are optional: line-by-line output is suppressed in Scilab by default.
For completeness, I'll mention tic() and toc(), which work just like Matlab's tic and toc, measuring real-world time of computation.

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

Parallel processing using DataFrames in Julia - julia

Related

writing into shared arrays within a distributed for loop in JULIA

Why is a Julia function name (without arguments) silently ignored?

Running a loop and executing a function after each iteration of loop: r

How to quit/exit from file included in the terminal

Measuring CPU time using Scilab

Categories

Resources