Parallel operations over arrays in Julia

What is the parallel (over multiple CPUs) version of this code in Julia?
V = zeros(3)
for i = 1:100000
cc = rand(1:3)
V[cc] += 1
end

This is a direct rewrite of your loop that is threaded, thread-safe, and avoids false sharing:
using Random
using Base.Threads
V = let
mt = Tuple([MersenneTwister() for _ in 1:nthreads()])
Vv = Tuple([zeros(3) for _ in 1:nthreads()])
@threads for i = 1:100000
@inbounds cc = rand(mt[threadid()], 1:3)
@inbounds Vv[threadid()][cc] += 1
end
reduce(+, Vv)
end
However, for such a small job threading will probably not give you much benefit. Also, if you really need performance, the code should probably be restructured a bit, e.g. like this:
function worker(iters, rng)
v = zeros(3)
for i = 1:iters
cc = rand(rng, 1:3)
v[cc] += 1
end
v
end
V = let
mt = Tuple([MersenneTwister() for _ in 1:nthreads()])
Vv = [zeros(3) for _ in 1:nthreads()]
jobs_per_thread = fill(div(100000, nthreads()),nthreads())
for i in 1:100000-sum(jobs_per_thread)
jobs_per_thread[i] += 1
end
@assert sum(jobs_per_thread) == 100000
@threads for i = 1:nthreads()
Vv[threadid()] = worker(jobs_per_thread[threadid()], mt[threadid()])
end
reduce(+, Vv)
end
Also, under Julia 1.3 you will not have to manage MersenneTwister instances manually, as Julia will create a separate PRNG per thread.
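As a rough sketch of what that could look like, assuming Julia 1.3 or newer (where `Threads.@spawn` exists and `rand` without an explicit RNG is safe to call from several threads), each task can keep a private accumulator and the partial counts can be summed at the end. The name `count_draws` and its `ntasks` keyword are made up for this illustration:

```julia
using Base.Threads

# Hypothetical helper: count n random draws from 1:3 using several tasks.
function count_draws(n; ntasks = nthreads())
    # Split n iterations into ntasks nearly equal chunks.
    chunk_sizes = [div(n, ntasks) + (i <= rem(n, ntasks) ? 1 : 0) for i in 1:ntasks]
    tasks = map(chunk_sizes) do iters
        Threads.@spawn begin
            v = zeros(3)              # task-private accumulator, nothing shared
            for _ in 1:iters
                v[rand(1:3)] += 1     # default RNG, no manual MersenneTwister
            end
            v
        end
    end
    return reduce(+, fetch.(tasks))   # combine the partial counts
end

V = count_draws(100_000)
```

Because every task writes only to its own vector, no locking is needed; the only synchronization point is the `fetch` at the end.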

Related

Execution time orders of magnitude longer depending upon global definition location?

I finished writing the following program and began to do some cleanup after the debugging stage:
using BenchmarkTools
function main()
global solution = 0
global a = big"1"
global b = big"1"
global c = big"0"
global total = 0
while a < 100
while b < 100
c = a^b
s = string(c)
total = 0
for i in 1:length(s)
total = total + Int(s[i]) - 48
end
if total > solution
global solution = total
end
global b = b + 1
end
global b = 1
global a = a + 1
end
end
@elapsed begin
main()
end
# run @elapsed twice to ignore compilation overhead
t = @elapsed main()
print("Solution: ", solution)
t = t * 1000;
print("\n\nProgram completed in ", round.(t; sigdigits=5), " milliseconds.")
The runtime for my machine was around 150ms.
I decided to rearrange the globals to better match the typical layout of a program, where globals are defined at the top:
using BenchmarkTools
global solution = 0
global a = big"1"
global b = big"1"
global c = big"0"
global total = 0
function main()
while a < 100
while b < 100
c = a^b
s = string(c)
total = 0
for i in 1:length(s)
total = total + Int(s[i]) - 48
end
if total > solution
global solution = total
end
global b = b + 1
end
global b = 1
global a = a + 1
end
end
@elapsed begin
main()
end
# run @elapsed twice to ignore compilation overhead
t = @elapsed main()
print("Solution: ", solution)
t = t * 1000;
print("\n\nProgram completed in ", round.(t; sigdigits=5), " milliseconds.")
Making that one change for where the globals were defined reduced the runtime on my machine to 0.0042ms.
Why is the runtime so drastically reduced?
Don't use globals.
Don't. Use. Globals. They are bad.
When you define your globals outside the main function, then the second time you run your function, a already equals 100, and main() bails out before doing anything at all.
Global variables are a bad idea, not just in Julia, but in programming in general. You can use them when defining proper constants, like π, and maybe some other specialized cases, but not for things like this.
Let me rewrite your function without globals:
function main_locals()
solution = 0
a = 1
while a < 100
b = 1
c = big(1)
while b < 100
c *= a
s = string(c)
total = sum(Int, s) - 48 * length(s)
solution = max(solution, total)
b += 1
end
a += 1
end
return solution
end
On my laptop this is >20x faster than your version with globals defined inside the function, that is, the version that actually works. The other one doesn't work as it should, so the comparison is not relevant.
Edit: I have even complicated this too much. The only thing you need to do is to remove all the globals from your first function, and return the solution, then it will work fine, and be almost as fast as the code I wrote:
function main_with_globals_removed()
solution = 0
a = big"1"
b = big"1"
c = big"0"
total = 0
while a < 100
while b < 100
c = a^b
s = string(c)
total = 0
for i in 1:length(s)
total = total + Int(s[i]) - 48
end
if total > solution
solution = total
end
b = b + 1
end
b = 1
a = a + 1
end
return solution # remember return!
end
Don't use globals.
In the first case, you are always assigning the globals and possibly changing their types. Hence the compiler needs to do extra work. I assume that the two programs generate different answers after the 2nd run because of the failure to reset the globals…
Globals are discouraged in Julia for performance reasons because of potential type instability.
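The "second run bails out immediately" effect described above can be reproduced in a few lines. This is only an illustrative sketch; `shared_a`, `loop_global` and `loop_local` are made-up names, not part of the original program:

```julia
shared_a = 0   # non-constant global: its value survives between calls

# Counts loop iterations while advancing the global up to 100.
function loop_global()
    global shared_a
    iterations = 0
    while shared_a < 100
        shared_a += 1
        iterations += 1
    end
    return iterations
end

# Same loop, but the counter is a local that starts fresh on every call.
function loop_local()
    a = 0
    iterations = 0
    while a < 100
        a += 1
        iterations += 1
    end
    return iterations
end

first_global, second_global = loop_global(), loop_global()
first_local, second_local = loop_local(), loop_local()
```

The second call to `loop_global` does no work because `shared_a` is already 100, so timing it measures an empty function, whereas `loop_local` repeats the full loop every time.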

Julia UndefVarError on Metaprogramming

I'm trying to write a solver for equations. When I run the code, the x variable appears to be undefined, but it prints out perfectly. What am I missing?
I should give the program some numbers, then operations as macros, and it should create an outer product matrix of the operations applied.
function msu()
print("Insert how many values: ")
quantity = parse(Int64, readline())
values = []
for i in 1:quantity
println("x$i")
num1 = parse(Float64, readline())
push!(values, num1)
end
println(values)
print("How many operations? ")
quantity = parse(Int64, readline())
ops = []
for i in 1:quantity
push!(ops, Meta.parse(readline()))
end
mat = zeros((quantity, quantity))
for i in 1:length(mat)
sum = 0
for j in 1:length(values)
# here begins problems, the following prints are for debugging purpose
print(length(values))
func = Meta.parse("$(ops[convert(Int64, ceil(j / quantity))]) * $(ops[convert(Int64, j % quantity)])")
print(func)
x = values[j]
println(x)
sum += eval(func)
end
mat[i] = sum
end
println(mat)
end
msu()
The original code was in Spanish, if you find any typo it's probably because I skipped a translation.
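One likely cause, offered as a sketch rather than a confirmed diagnosis of the full program: `eval` always evaluates in the module's global scope, so the local `x` assigned inside `msu` is not visible to the evaluated expression, even though `println(x)` inside the function works. A common workaround is to wrap the parsed expression in an anonymous function of `x` and call that. The helper name `make_function` is hypothetical:

```julia
# Hypothetical helper: turn a user-supplied expression in `x` into a callable.
function make_function(expr_string::AbstractString)
    body = Meta.parse(expr_string)
    # eval runs in global scope, so pass x in as an argument of a lambda
    # instead of relying on a local variable named x.
    return eval(:(x -> $body))
end

f = make_function("x^2 + 1")
# invokelatest avoids world-age errors when f is called from inside
# the same function that created it.
result = Base.invokelatest(f, 3.0)
```

Inside `msu`, the same idea would mean building the function from the parsed operation once and then calling it with `values[j]`, rather than calling `eval(func)` and expecting it to see the local `x`.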

ST-HOSVD in Julia

I am trying to implement the ST-HOSVD algorithm in Julia because I could not find a library that contains ST-HOSVD.
See Algorithm 1 on page 7 of this paper:
https://people.cs.kuleuven.be/~nick.vannieuwenhoven/papers/01-STHOSVD.pdf
I cannot reproduce the input (4,4,4,4) tensor from the approximated tensor whose Tucker rank is (2,2,2,2).
I think I have made a mistake in the indices of the matrix or tensor elements, but I could not locate it.
How can I fix it?
If you know of a library for ST-HOSVD, let me know.
ST-HOSVD is a really common way to reduce information. I hope the question helps many Julia users.
using TensorToolbox
function STHOSVD(A, reqrank)
N = ndims(A)
S = copy(A)
Sk = undef
Uk = []
for k = 1:N
if k == 1
Sk = tenmat(S, k)
end
Sk_svd = svd(Sk)
U1 = Sk_svd.U[ :, 1:reqrank[k] ]
V1t = Sk_svd.V[1:reqrank[k], : ]
Sigma1 = diagm( Sk_svd.S[1:reqrank[k]] )
Sk = Sigma1 * V1t
push!(Uk, U1)
end
X = ttm(Sk, Uk[1], 1)
for k=2:N
X = ttm(X, Uk[k], k)
end
return X
end
A = rand(4,4,4,4)
X = STHOSVD(A, [2,2,2,2])
EDIT
Here, Sk = tenmat(S, k) is the mode-k matricization of tensor S.
S∈R^{I_1×I_2×…×I_N}, S_k∈R^{I_k×(Π_{m≠k}^{N} I_m)}
The function is contained in TensorToolbox.jl. See "Basis" in Readme.
The definition of mode-k matricization can be found on page 460 of the paper.
It works.
I followed page 26 of this slide deck.
using TensorToolbox
using LinearAlgebra
using Arpack
function STHOSVD(T, reqrank)
N = ndims(T)
tensor_shape = size(T)
for i = 1 : N
T_i = tenmat(T, i)
if reqrank[i] == tensor_shape[i]
USV = svd(T_i)
else
USV = svds(T_i; nsv=reqrank[i] )[1]
end
T = ttm( T, USV.U * USV.U', i)
end
return T
end
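For readers without TensorToolbox, the same sequentially truncated scheme can be sketched with only `LinearAlgebra`. This is an illustration under stated assumptions, not the package's implementation: `unfold`, `fold`, `mode_product` and `sthosvd_sketch` are hypothetical names, and the unfolding uses one particular column ordering (any fixed ordering works as long as `fold` inverts it).

```julia
using LinearAlgebra

# Mode-k unfolding: bring dimension k to the front, flatten the rest.
function unfold(T::AbstractArray, k::Integer)
    perm = [k; setdiff(1:ndims(T), k)]
    return reshape(permutedims(T, perm), size(T, k), :)
end

# Inverse of `unfold` for a tensor whose final shape is `dims`.
function fold(M::AbstractMatrix, k::Integer, dims::Tuple)
    perm = [k; setdiff(1:length(dims), k)]
    permuted_dims = ntuple(i -> dims[perm[i]], length(dims))
    return permutedims(reshape(M, permuted_dims), invperm(perm))
end

# Mode-k product T x_k U: multiply every mode-k fiber by U.
function mode_product(T, U, k)
    dims = ntuple(i -> i == k ? size(U, 1) : size(T, i), ndims(T))
    return fold(U * unfold(T, k), k, dims)
end

# Sequentially truncated HOSVD: truncate one mode at a time,
# always working on the already-shrunk core.
function sthosvd_sketch(T, ranks)
    core = T
    factors = Matrix{Float64}[]
    for k in 1:ndims(T)
        U = svd(unfold(core, k)).U[:, 1:ranks[k]]
        push!(factors, U)
        core = mode_product(core, U', k)       # shrink mode k
    end
    approx = core
    for k in 1:ndims(T)
        approx = mode_product(approx, factors[k], k)   # expand back
    end
    return approx, core, factors
end

A = rand(4, 4, 4, 4)
full, _, _ = sthosvd_sketch(A, (4, 4, 4, 4))
approx, core, _ = sthosvd_sketch(A, (2, 2, 2, 2))
```

With full ranks the input is recovered up to rounding. With ranks (2,2,2,2) the result is only an approximation: a random tensor generally has full multilinear rank, so exact reproduction of `A` should not be expected from a truncated decomposition.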

Parallelize two (or more) functions in julia

I am trying to solve a wave equation problem (related to my PhD) using the finite difference method. For this, I have translated (line by line) a Fortran code (link below): (https://github.com/geodynamics/seismic_cpml/blob/master/seismic_CPML_2D_anisotropic.f90)
Inside this code, within the time loop, there are four main loops that are independent. In fact, I could arrange them into four functions.
As I have to run this code about a hundred times, it would be nice to speed up the process. In this sense, I am turning my eyes toward parallelization. See below, as an example:
function main()
...some common code...
for time=1:N
function fun1() # I want this function to run parallel...
function fun2() # ..this function to run parallel with 1,3,4
function fun3() # ..This function to run parallel with 2,3,4
function fun4() # ..This function to run parallel with 1,2,3
end
... more code here...
return
end
So,
1) Is it possible to do what I mention before?
2) Will this approach speed up my code?
3) Is there a better way to think this problem?
A minimal working example could be like this:
function fun1(t)
for i=1:1000
for j=1:1000
t+=(0.5)^t+(0.3)^(t-1);
end
end
return t
end
function fun2(t)
for i=1:1000
for j=1:1000
t+=(0.5)^t;
end
end
return t
end
function fun3(r)
for i=1:1000
for j=1:1000
r = (r + rand())/r;
end
end
return r
end
function main()
a = 2;
b = 2.5;
c = 3.0;
for i=1:100
a = fun1(a);
b = fun2(b);
c = fun3(c);
end
return;
end
So, as can be seen, none of the three functions above (fun1, fun2 & fun3) depends on any other, so they can surely run in parallel. Can this be achieved? Will it boost my computational speed?
Edited:
Hi @BogumiłKamiński, I have altered the finite-difference equations in order to implement a "loop" (as you suggested) over the inputs and outputs of my functions. If it is not too much trouble, I would like your opinion on the parallelization design of the code:
Key elements
1) I have packed all inputs in 4 tuples: sig_xy_in and sig_xy_cros_in (for the 2 sigma functions) and vel_vx_in and vel_vy_in (for 2 velocity functions). I then packed the 4 tuples into 2 vectors for "looping" purposes...
2) I packed the 4 functions in 2 vectors for "looping" purposes...
3) I run the first parallel loop and then unpack its output tuple...
4) I run the second parallel loop(for velocities) and then unpack its output tuple...
5) Finally, I pack the output elements back into the input tuples and continue the time loop until it finishes.
...code
l = Threads.SpinLock()
arg_in_sig = [sig_xy_in,sig_xy_cros_in]; # Inputs tuples x sigma funct
arg_in_vel = [vel_vx_in, vel_vy_in]; # Inputs tuples x velocity funct
func_sig = [sig_xy , sig_xy_cros]; # Vector with two sigma functions
func_vel = [vel_vx , vel_vy]; # Vector with two velocity functions
for it = 1:NSTEP # time steps
#------------------------------------------------------------
# Compute sigma functions
#------------------------------------------------------------
Threads.@threads for j in 1:2 # Start parallel run of the two sigma functions
Threads.lock(l);
Threads.unlock(l);
arg_in_sig[j] = func_sig[j](arg_in_sig[j]);
end
# Unpack tuples for sig_xy and sig_xy_cros
# Unpack tuples for sig_xy
sigxx = arg_in_sig[1][1]; # changed by sig_xy
sigyy = arg_in_sig[1][2]; # changed by sig_xy
m_dvx_dx = arg_in_sig[1][3]; # changed by sig_xy
m_dvy_dy = arg_in_sig[1][4]; # changed by sig_xy
vx = arg_in_sig[1][5]; # unchanged by sig_xy
vy = arg_in_sig[1][6]; # unchanged by sig_xy
delx_1 = arg_in_sig[1][7]; # unchanged by sig_xy
dely_1 = arg_in_sig[1][8]; # unchanged by sig_xy
...more unpacking...
# Unpack tuples for sig_xy_cros
sigxy = arg_in_sig[2][1]; # changed by sig_xy_cros
m_dvy_dx = arg_in_sig[2][2]; # changed by sig_xy_cros
m_dvx_dy = arg_in_sig[2][3]; # changed by sig_xy_cros
vx = arg_in_sig[2][4]; # unchanged by sig_xy_cros
vy = arg_in_sig[2][5]; # unchanged by sig_xy_cros
...more unpacking....
#--------------------------------------------------------
# velocity
#--------------------------------------------------------
Threads.@threads for j in 1:2 # Start parallel run of the two velocity functions
Threads.lock(l)
Threads.unlock(l)
arg_in_vel[j] = func_vel[j](arg_in_vel[j])
end
# Unpack tuples for vel_vx
vx = arg_in_vel[1][1]; # changed by vel_vx
m_dsigxx_dx = arg_in_vel[1][2]; # changed by vel_vx
m_dsigxy_dy = arg_in_vel[1][3]; # changed by vel_vx
sigxx = arg_in_vel[1][4]; # unchanged by vel_vx
sigxy = arg_in_vel[1][5];....
# Unpack tuples for vel_vy
vy = arg_in_vel[2][1]; # changed by vel_vy
m_dsigxy_dx = arg_in_vel[2][2]; # changed by vel_vy
m_dsigyy_dy = arg_in_vel[2][3]; # changed by vel_vy
sigxy = arg_in_vel[2][4]; # unchanged by vel_vy
sigyy = arg_in_vel[2][5]; # unchanged by vel_vy
.....
...more unpacking...
# assemble new input variables
sig_xy_in = (sigxx,sigyy,
m_dvx_dx,m_dvy_dy,
vx,vy,....);
sig_xy_cros_in = (sigxy,
m_dvy_dx,m_dvx_dy,
vx,vy,....;
vel_vx_in = (vx,....
vel_vy_in = (vy,.....
end #time loop
Here is a simple way to run your code in multithreading mode:
function fun1(t)
for i=1:1000
for j=1:1000
t+=(0.5)^t+(0.3)^(t-1);
end
end
return t
end
function fun2(t)
for i=1:1000
for j=1:1000
t+=(0.5)^t;
end
end
return t
end
function fun3(r)
for i=1:1000
for j=1:1000
r = (r + rand())/r;
end
end
return r
end
function main()
l = Threads.SpinLock()
a = [2.0, 2.5, 3.0]
f = [fun1, fun2, fun3]
Threads.@threads for i in 1:3
for j in 1:4
Threads.lock(l)
println((thread=Threads.threadid(), iteration=j))
Threads.unlock(l)
a[i] = f[i](a[i])
end
end
return a
end
I have added locking just as an example of how you can do it (in Julia 1.3 you would not have to do this, as IO is thread-safe there).
Also note that prior to Julia 1.3 rand() shares data among threads, so it would not be safe to run these functions if all of them used rand() (again, in Julia 1.3 it would be safe to do so).
To run this code, first set the maximum number of threads you want to use, e.g. like this on Windows: set JULIA_NUM_THREADS=4 (on Linux you should use export). Here is an example run of this code (I have reduced the number of iterations in order to shorten the output):
julia> main()
(thread = 1, iteration = 1)
(thread = 3, iteration = 1)
(thread = 2, iteration = 1)
(thread = 3, iteration = 2)
(thread = 3, iteration = 3)
(thread = 3, iteration = 4)
(thread = 2, iteration = 2)
(thread = 1, iteration = 2)
(thread = 2, iteration = 3)
(thread = 2, iteration = 4)
(thread = 1, iteration = 3)
(thread = 1, iteration = 4)
3-element Array{Float64,1}:
21.40311930108456
21.402807510451463
1.219028489573526
Now one small cautionary note: while it is relatively easy to make code multithreaded in Julia (and in Julia 1.3 it will be even simpler), you have to be careful when you do it, as you have to take care of race conditions.
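As a hedged sketch of the "even simpler" Julia 1.3 style, independent updates can be launched as tasks with `Threads.@spawn` and collected with `fetch` before the next time step. The functions `step_a`, `step_b`, `step_c` are small deterministic stand-ins invented for this example, not the `fun1`/`fun2`/`fun3` from the question:

```julia
using Base.Threads

step_a(t) = t + 0.5^t          # hypothetical stand-in for one independent kernel
step_b(t) = t + 0.3^t          # hypothetical stand-in for another
step_c(t) = (t + 1.0) / t      # hypothetical stand-in for a third

function run_steps(nsteps)
    state = [2.0, 2.5, 3.0]
    kernels = [step_a, step_b, step_c]
    for _ in 1:nsteps
        # Launch each independent update as its own task...
        tasks = [Threads.@spawn f(x) for (f, x) in zip(kernels, state)]
        # ...and wait for all of them before starting the next time step.
        state .= fetch.(tasks)
    end
    return state
end

result = run_steps(5)
```

Each task reads only its own input and the results are written back after all tasks have finished, so there is no shared mutable state inside a time step. Whether this pays off depends on each kernel doing enough work to outweigh the cost of scheduling a task.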

Implementation of Savitzky Golay in Julia

I have come across an implementation of SG-filter in Julia at this link. When I execute the function apply_filter, an error is returned -
UndefVarError: apply_filter not defined
I think this is an implementation for a previous version of Julia (?). I am executing this in Julia 1.0 as of now. I couldn't find documentation about the defined types, which is where I guess the error comes from.
I would like to forewarn users about the function savitzkyGolay in Julia. There is a mismatch with the result from the SciPy implementation (which must have undergone several iterations of checking by the community):
@pyimport scipy.signal as ss
x=[1,2,3,4,5,6,7,8,9,10]
savitzkyGolay(x,5,1)
10-element Array{Float64,1}:
1.6000000000000003
2.200000000000001
3.0
4.0
5.000000000000001
6.000000000000001
7.0
8.0
8.8
9.400000000000002
#Python's scipy implementation
ss.savgol_filter(x,5,1)
10-element Array{Float64,1}:
1.0000000000000007
2.0000000000000004
2.9999999999999996
3.999999999999999
4.999999999999999
5.999999999999999
6.999999999999998
7.999999999999998
8.999999999999996
9.999999999999995
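The two outputs differ only at the two points nearest each end, which points to edge handling rather than to the filter weights themselves. SciPy's `savgol_filter` defaults to `mode='interp'`, which fits a polynomial to the first and last windows, while the Julia numbers above are what one gets when the signal is padded with its first and last value. The following sketch reproduces them under that assumption; `sg_weights` and `sg_smooth_nearest` are hypothetical names, and the second argument is the half-window (2 for a 5-point window):

```julia
using LinearAlgebra

# Smoothing weights for a window of 2m+1 points and polynomial degree d:
# the first row of the pseudo-inverse of the Vandermonde matrix on -m:m.
function sg_weights(m, d)
    V = [float(i)^j for i in -m:m, j in 0:d]
    return pinv(V)[1, :]
end

# Apply the weights after padding both ends with the edge values.
function sg_smooth_nearest(x, m, d)
    w = sg_weights(m, d)
    padded = [fill(x[1], m); x; fill(x[end], m)]
    return [dot(w, padded[i:i+2m]) for i in 1:length(x)]
end

y = sg_smooth_nearest(collect(1.0:10.0), 2, 1)
```

For a straight line, the interior points come back unchanged with either edge treatment; only the first and last `m` points are pulled toward the padded values, which matches the 1.6, 2.2, ..., 8.8, 9.4 pattern in the Julia output.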
If it can help, I have simplified the code.
using Pkg, LinearAlgebra, DSP, Plots
using DelimitedFiles # provides readdlm, used in the usage example
function vandermonde(halfWindow, polyDeg)
x=[1.0*i for i in -halfWindow:halfWindow]
n = polyDeg+1
m = length(x)
V = zeros(m, n)
for i = 1:m
V[i,1] = 1.0
end
for j = 2:n
for i = 1:m
V[i,j] = x[i] * V[i,j-1]
end
end
return V
end
function SG(halfWindow, polyDeg)
V = vandermonde(halfWindow,polyDeg)
Q,R=qr(V)
n = polyDeg+1
m = 2*halfWindow+1
R1 = vcat(R, zeros(m-n,n))
sg = R1\Q'
for i in 1:(polyDeg+1)
sg[i,:] = sg[i,:]*factorial(i-1)
end
return sg'
end
function apply_filter(filter,signal)
halfWindow = round(Int,(length(filter)-1)/2)
padded_signal = [signal[1]*ones(halfWindow);signal;signal[end]*ones(halfWindow)]
filter_cross_signal = conv(filter[end:-1:1], padded_signal)
return filter_cross_signal[2*halfWindow+1:end-2*halfWindow]
end
Here is how I use it :
mean_speed_unfiltered = readdlm("mean_speeds_raw_-2.txt")
sg = SG(500,2); # half-window, polynomial degree
t = 10*10^(-3) # s, duration of the simulation
dt = 0.1/γ; # time step
Nt = convert(Int, round(t/dt)); # number of iterations
#Smooth the mean speed curve:
mean_speeds_smoothed = apply_filter(sg[:,1],mean_speed_unfiltered)
png(plot([j*dt for j=0:Nt] , mean_speeds_smoothed, title = "Smoothed mean speed over time", xlabel = "t (s)"), "Mean_speed_filtered_SG")
derivative_mean_speeds_smoothed = apply_filter(sg[:,2],mean_speed_unfiltered)
plt1 = plot(mean_speeds_smoothed,derivative_mean_speeds_smoothed, title = "derivative mean speed over speed", xlabel = "<v>(t) (s)", ylabel = "d<v(t)>/dt")
png(plt1, "Force_SG_1D2Lasers")
However it seems to me that the code presented in https://gist.github.com/lnacquaroli/c97fbc9a15488607e236b3472bcdf097#file-savitzkygolay-jl-L34 is faster.
