Animating subplots using Plots.jl efficiently - julia

I am trying to create an animation with three subplots (one surface, two heatmaps) in Julia using Plots.jl with the GR backend. By far the slowest part of my code is the generation of these plots, so I am trying to find the most efficient way to do it.
I've tried re-calling the plotting functions inside the animate loop, but that was significantly slower than modifying the plot in place, like so:
using Plots, Profile

function mcve(n)
    A = rand(n,100,100)
    B = rand(n,100,100)
    l = @layout [a b; c]
    p1 = surface(1:100,1:100,A[1,:,:],clims=(0,1),legend=false)
    p2 = heatmap(A[1,:,:],clims=(0,1),aspect_ratio=1,legend=false)
    p3 = heatmap(B[1,:,:],aspect_ratio=1)
    p = plot(p1,p2,p3,layout=l)
    anim = @animate for i = 1:n
        surface!(p[1],1:100,1:100,A[i,:,:])
        heatmap!(p[2],A[i,:,:])
        heatmap!(p[3],B[i,:,:])
    end
    gif(anim,"example.gif")
end

mcve(1)  # warm up / compile before profiling
@profile mcve(10)
Profile.print()
This results in the following trace:
https://pastebin.com/Lv9uCLE5
According to the profiler, nearly half the runtime is spent in a function called "setcharheight", which calls a C library. Is there a way to reduce the number of calls I need to make to it?

I did some experiments, and I found two things that could dramatically speed up the plotting process.
First, rather than redrawing the plots with surface!() and heatmap!(), I simply replaced their :z series data. This is shown by comparing the first and third (and the second and fourth) timings in the example below.
Second, GR.jl's setcharheight is extremely slow, likely because of the ccall involved, which means the cost may be OS dependent. Setting xticks and yticks to false avoids it and yields significant speedups. This is shown by comparing the first and second (and the third and fourth) timings.
using Plots

# Variant 1: redraw the series every frame, ticks on
function mcve(n,A,B)
    l = @layout [a b; c]
    p1 = surface(1:100,1:100,A[1,:,:],clims=(0,1),legend=false)
    p2 = heatmap(A[1,:,:],clims=(0,1),aspect_ratio=1,legend=false)
    p3 = heatmap(B[1,:,:],aspect_ratio=1)
    p = plot(p1,p2,p3,layout=l)
    anim = @animate for i = 1:n
        surface!(p[1],1:100,1:100,A[i,:,:])
        heatmap!(p[2],A[i,:,:])
        heatmap!(p[3],B[i,:,:])
    end
    gif(anim,"example1.gif")
end

# Variant 2: redraw the series every frame, ticks off
function mcve4(n,A,B)
    l = @layout [a b; c]
    p1 = surface(1:100,1:100,A[1,:,:],clims=(0,1),legend=false,xticks=false,yticks=false)
    p2 = heatmap(A[1,:,:],clims=(0,1),aspect_ratio=1,legend=false,xticks=false,yticks=false)
    p3 = heatmap(B[1,:,:],aspect_ratio=1,xticks=false,yticks=false)
    p = plot(p1,p2,p3,layout=l)
    anim = @animate for i = 1:n
        surface!(p[1],1:100,1:100,A[i,:,:],xticks=false,yticks=false)
        heatmap!(p[2],A[i,:,:],xticks=false,yticks=false)
        heatmap!(p[3],B[i,:,:],xticks=false,yticks=false)
    end
    gif(anim,"example4.gif")
end

# Variant 3: replace the :z data in place, ticks off
function mcve2(n,A,B)
    l = @layout [a b; c]
    p1 = surface(1:100,1:100,A[1,:,:],clims=(0,1),legend=false,xticks=false,yticks=false)
    p2 = heatmap(A[1,:,:],clims=(0,1),aspect_ratio=1,legend=false,xticks=false,yticks=false)
    p3 = heatmap(B[1,:,:],aspect_ratio=1,xticks=false,yticks=false)
    p = plot(p1,p2,p3,layout=l)
    anim = @animate for i = 1:n
        # p[s][1] is the first series of subplot s; assigning :z swaps its data
        p[1][1][:z] = A[i,:,:]
        p[2][1][:z] = A[i,:,:]
        p[3][1][:z] = B[i,:,:]
    end
    gif(anim,"example2.gif")
end

# Variant 4: replace the :z data in place, ticks on
function mcve3(n,A,B)
    l = @layout [a b; c]
    p1 = surface(1:100,1:100,A[1,:,:],clims=(0,1),legend=false)
    p2 = heatmap(A[1,:,:],clims=(0,1),aspect_ratio=1,legend=false)
    p3 = heatmap(B[1,:,:],aspect_ratio=1)
    p = plot(p1,p2,p3,layout=l)
    anim = @animate for i = 1:n
        p[1][1][:z] = A[i,:,:]
        p[2][1][:z] = A[i,:,:]
        p[3][1][:z] = B[i,:,:]
    end
    gif(anim,"example3.gif")
end

# Warm up (compile) each variant before timing
A = rand(1,100,100)
B = rand(1,100,100)
mcve(1,A,B)
mcve2(1,A,B)
mcve3(1,A,B)
mcve4(1,A,B)

A = rand(10,100,100)
B = rand(10,100,100)
println("Replot, ticks on")
@time mcve(10,A,B)
println("Replot, ticks off")
@time mcve4(10,A,B)
println(":z replace, ticks on")
@time mcve3(10,A,B)
println(":z replace, ticks off")
@time mcve2(10,A,B)
which results in
Replot, ticks on
 19.347849 seconds (12.78 M allocations: 399.848 MiB, 0.30% gc time)
Replot, ticks off
  6.227432 seconds (8.71 M allocations: 298.890 MiB, 0.88% gc time)
:z replace, ticks on
  8.572728 seconds (5.43 M allocations: 149.359 MiB, 0.24% gc time)
:z replace, ticks off
  1.805316 seconds (1.36 M allocations: 48.450 MiB, 0.40% gc time)

Related

Julia: marker_z takes much time

I'm trying to make a gradation (color-mapped) plot.
using Plots
using LinearAlgebra

L = 60      # size of a matrix
N = 10000   # number of loops
E = zeros(Complex{Float64},N,L)   # set of eigenvalues
IPR = zeros(Complex{Float64},N,L) # indicator for marker_z
Preparing E & IPR
function main()
    cnt = 0
    for i = 1:N
        cnt += 1
        H = rand(Complex{Float64},L,L)
        eigenvalue,eigenvector = eigen(H)
        for j = 1:L
            E[cnt,j] = eigenvalue[j]
            IPR[cnt,j] = abs2(norm(abs2.(eigenvector[:,j])))/(abs2(norm(eigenvector[:,j])))
        end
    end
end
Plotting
function main1()
    plot(real.(E),imag.(E),marker_z=real.(IPR),st=scatter,markercolors=:cool,markerstrokewidth=0,markersize=1,dpi=300)
    plot!(legend=false,xlabel="ReE",ylabel="ImE")
    savefig("test.png")
end
@time main1()
358.794885 seconds (94.30 M allocations: 129.882 GiB, 2.05% gc time)
Compared with a uniform plot, the gradation plot takes far too much time.
function main2()
    plot(real.(E),imag.(E),st=scatter,markercolor=:blue,markerstrokewidth=0,markersize=1,dpi=300)
    plot!(legend=false,xlabel="ReE",ylabel="ImE")
    savefig("test1.png")
end
@time main2()
8.100609 seconds (10.85 M allocations: 508.054 MiB, 0.47% gc time)
Is there a way to make the gradation plot as fast as the uniform one?
I solved the problem myself.
After updating from Julia 1.3.1 to Julia 1.6.3, I confirmed that main1 became faster, as Bill's comments suggested.

How to avoid memory allocation in Julia?

Consider the following simple Julia code operating on four complex matrices:
n = 400
z = eye(Complex{Float64},n)
id = eye(Complex{Float64},n)
fc = map(x -> rand(Complex{Float64}), id)
cr = map(x -> rand(Complex{Float64}), id)
s = 0.1 + 0.1im
@time for j = 1:n
    for i = 1:n
        z[i,j] = id[i,j] - fc[i,j]^s * cr[i,j]
    end
end
The timing shows a few million memory allocations, despite all variables being preallocated:
0.072718 seconds (1.12 M allocations: 34.204 MB, 7.22% gc time)
How can I avoid all those allocations (and GC)?
One of the first tips for performant Julia code is to avoid using global variables. This alone can cut the number of allocations sevenfold. If you must use globals, one way to improve their performance is to declare them const. A const global's type cannot change, though its value can still be reassigned (with a warning).
Consider this modified code, still without functions:
const n = 400
z = Array{Complex{Float64}}(n,n)
const id = eye(Complex{Float64},n)
const fc = map(x -> rand(Complex{Float64}), id)
const cr = map(x -> rand(Complex{Float64}), id)
const s = 0.1 + 0.1im
@time for j = 1:n
    for i = 1:n
        z[i,j] = id[i,j] - fc[i,j]^s * cr[i,j]
    end
end
The timing shows this result:
0.028882 seconds (160.00 k allocations: 4.883 MB)
Not only is the number of allocations 7 times lower, the code also runs 2.2 times faster.
Now let's apply the second tip for high-performance Julia code: write everything in functions. Wrapping the above code in a function z_mat(n):
function z_mat(n)
    z = Array{Complex{Float64}}(n,n)
    id = eye(Complex{Float64},n)
    fc = map(x -> rand(Complex{Float64}), id)
    cr = map(x -> rand(Complex{Float64}), id)
    s = 1.0 + 1.0im
    @time for j = 1:n
        for i = 1:n
            z[i,j] = id[i,j] - fc[i,j]^s * cr[i,j]
        end
    end
end
and running
z_mat(40)
  0.000273 seconds
@time z_mat(400)
  0.027273 seconds
  0.032443 seconds (429 allocations: 9.779 MB)
That is 2610 times fewer allocations than the original code for the whole function, because the loop itself does zero allocations.
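The two tips can also be combined by preallocating the arrays outside and passing everything in as arguments, so the hot loop allocates nothing at all. A minimal sketch (fill_z! is a hypothetical name, not from the original answer):
function fill_z!(z, id, fc, cr, s)
    n = size(z, 1)
    # all inputs are arguments, so no global lookups or boxing in the loop
    @inbounds for j = 1:n, i = 1:n
        z[i,j] = id[i,j] - fc[i,j]^s * cr[i,j]
    end
    return z
end
@time fill_z!(z, id, fc, cr, s)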

Conditional closures in Julia

In many applications of map(f,X), it helps to create closures that, depending on parameters, apply different functions f to the data X.
I can think of at least the following three ways to do this (note that the second, for some reason, does not work; a bug?):
f0(x,y) = x+y
f1(x,y,p) = x+y^p

function g0(power::Bool,X,y)
    if power
        f = x -> f1(x,y,2.0)
    else
        f = x -> f0(x,y)
    end
    map(f,X)
end

function g1(power::Bool,X,y)
    if power
        f(x) = f1(x,y,2.0)
    else
        f(x) = f0(x,y)
    end
    map(f,X)
end

abstract FunType
abstract PowerFun <: FunType
abstract NoPowerFun <: FunType

function g2{S<:FunType}(T::Type{S},X,y)
    f(::Type{PowerFun},x) = f1(x,y,2.0)
    f(::Type{NoPowerFun},x) = f0(x,y)
    map(x -> f(T,x),X)
end

X = 1.0:1000000.0
burnin0 = g0(true,X,4.0) + g0(false,X,4.0);
burnin1 = g1(true,X,4.0) + g1(false,X,4.0);
burnin2 = g2(PowerFun,X,4.0) + g2(NoPowerFun,X,4.0);
@time r0true = g0(true,X,4.0);   # 0.019515 seconds (12 allocations: 7.630 MB)
@time r0false = g0(false,X,4.0); # 0.002984 seconds (12 allocations: 7.630 MB)
@time r1true = g1(true,X,4.0);   # 0.004517 seconds (8 allocations: 7.630 MB, 26.28% gc time)
@time r1false = g1(false,X,4.0); # UndefVarError: f not defined
@time r2true = g2(PowerFun,X,4.0);    # 0.085673 seconds (2.00 M allocations: 38.147 MB, 3.90% gc time)
@time r2false = g2(NoPowerFun,X,4.0); # 0.234087 seconds (2.00 M allocations: 38.147 MB, 60.61% gc time)
What is the optimal way to do this in Julia?
There's no need to use map here at all. Using a closure doesn't make things simpler or faster. Just use "dot-broadcasting" to apply the functions directly:
function g3(X,y,power=1)
    if power != 1
        return f1.(X, y, power) # or simply X .+ y^power
    else
        return f0.(X, y)        # or simply X .+ y
    end
end
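For illustration, a hypothetical call pattern mirroring the setup in the question:
X = 1.0:1000000.0
r_nopow = g3(X, 4.0)      # broadcasts f0, i.e. X .+ 4.0
r_pow   = g3(X, 4.0, 2.0) # broadcasts f1, i.e. X .+ 4.0^2.0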

Julia significantly slower with @parallel

I have this code (a primitive heat-transfer simulation):
function heat(first, second, m)
    @sync @parallel for d = 2:m-1
        for c = 2:m-1
            @inbounds second[c,d] = (first[c,d] + first[c+1,d] + first[c-1,d] + first[c,d+1] + first[c,d-1]) / 5.0
        end
    end
end

m = parse(Int,ARGS[1]) # size of matrix
firstm = SharedArray(Float64, (m,m))
secondm = SharedArray(Float64, (m,m))

for c = 1:m
    for d = 1:m
        if c == m || d == 1
            firstm[c,d] = 100.0
            secondm[c,d] = 100.0
        else
            firstm[c,d] = 0.0
            secondm[c,d] = 0.0
        end
    end
end

@time for i = 0:opak
    heat(firstm, secondm, m)
    firstm, secondm = secondm, firstm
end
This code gives good times when run sequentially, but when I add @parallel it slows down, even when running on a single process. I just need an explanation of why this is happening. Code suggestions are welcome only if they don't change the algorithm of the heat function.
Have a look at http://docs.julialang.org/en/release-0.4/manual/performance-tips/. Contrary to what is advised there, you use global variables a lot. Globals are assumed to be able to change type at any time, so they must be boxed and unboxed every time they are referenced. The question "Julia pi approximation slow" suffers from the same problem. To make your code faster, pass the globals as input arguments to a function, as sketched below.
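A minimal sketch of that advice (run! is a hypothetical name, not from the original answer): move the timed loop into a function that receives everything it touches as arguments.
function run!(firstm, secondm, m, opak)
    for i = 0:opak
        heat(firstm, secondm, m)
        firstm, secondm = secondm, firstm # swaps local bindings only
    end
    return firstm, secondm
end
@time run!(firstm, secondm, m, opak)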
There are some points to consider. One of them is the size of m. If it is small, parallelism adds considerable overhead for little gain:
julia 36967257.jl 4
# Parallel:
0.040434 seconds (4.44 k allocations: 241.606 KB)
# Normal:
0.042141 seconds (29.13 k allocations: 1.308 MB)
For bigger m you could have better results:
julia 36967257.jl 4000
# Parallel:
0.054848 seconds (4.46 k allocations: 241.935 KB)
# Normal:
3.779843 seconds (29.13 k allocations: 1.308 MB)
Plus two remarks:
1/ initialisation could be simplified to:
for c = 1:m, d = 1:m
    if c == m || d == 1
        firstm[c,d] = 100.0
        secondm[c,d] = 100.0
    else
        firstm[c,d] = 0.0
        secondm[c,d] = 0.0
    end
end
2/ your finite-difference scheme does not look stable. Please take a look at linear multistep methods or ADI/Crank-Nicolson.
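For reference (a restatement of the code above, not part of the original answer), the update in heat can be rewritten as
second[c,d] = first[c,d] + (1/5)*(first[c+1,d] + first[c-1,d] + first[c,d+1] + first[c,d-1] - 4*first[c,d])
i.e. a forward-time, central-space step with an effective mesh ratio of 1/5, which is the form such stability analyses apply to.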

Compute sum_i f(i) x(i) x(i)' fast?

I'm trying to compute the summation of f(i) * x(i) * x(i)'
where x(i) is a column vector, x(i)' is the transpose, and f(i) is a scalar. So it's a weighted sum of outer products.
In MATLAB, this can be achieved pretty fast by using bsxfun.
The following code runs in 260 ms on my laptop (MacBook Air 2010)
N = 1e5;
d = 100;
f = randn(N, 1);
x = randn(N, d);
% H = zeros(d, d);
tic;
H = x' * bsxfun(@times, f, x);
toc
I've been trying to make Julia do the same job, but I can't get it to run as fast.
N = int(1e5);
d = 100;
f = randn(N);
x = randn(N, d);

function hess1(x, f)
    N, d = size(x);
    temp = zeros(N, d);
    @simd for kk = 1:N
        @inbounds temp[kk, :] = f[kk] * x[kk, :];
    end
    H = x' * temp;
end

function hess2(x, f)
    N, d = size(x);
    H2 = zeros(d,d);
    @simd for k = 1:N
        @inbounds H2 += f[k] * x[k, :]' * x[k, :];
    end
    return H2
end

function hess3(x, f)
    N, d = size(x);
    H3 = zeros(d,d);
    for k = 1:N
        for k1 = 1:d
            @simd for k2 = 1:d
                @inbounds H3[k1, k2] += x[k, k1] * x[k, k2] * f[k];
            end
        end
    end
    return H3
end
The results are
@time H1 = hess1(x, f);
@time H2 = hess2(x, f);
@time H3 = hess3(x, f);
elapsed time: 0.776116469 seconds (262480224 bytes allocated, 26.49% gc time)
elapsed time: 30.496472345 seconds (16385442496 bytes allocated, 56.07% gc time)
elapsed time: 2.769934563 seconds (80128 bytes allocated)
hess1 is like MATLAB's bsxfun but slower, and hess3 uses no temporary memory but is significantly slower. My best Julia code is 3 times slower than MATLAB.
How can I make this Julia code faster?
IJulia gist: http://nbviewer.ipython.org/gist/memming/669fb8e78af3338ebf6f
Julia version: 0.3.0-rc1
EDIT:
I tested on a more powerful computer (3.5 GHz Intel i7, 4 cores, 256 kB L2, 8 MB L3):
MATLAB R2014a without -singleCompThread: 0.053 s
MATLAB R2014a with -singleCompThread: 0.080 s (@tholy's suggestion)
Julia 0.3.0-rc1
hess1 elapsed time: 0.215406904 seconds (262498648 bytes allocated, 32.74% gc time)
hess2 elapsed time: 10.722578699 seconds (16384080176 bytes allocated, 62.20% gc time)
hess3 elapsed time: 1.065504355 seconds (80176 bytes allocated)
bsxfunstyle elapsed time: 0.063540168 seconds (80081072 bytes allocated, 25.04% gc time) (@IainDunning's solution)
Indeed, using broadcast is much faster and comparable to MATLAB's bsxfun.
You are looking for the broadcast function. Here is the relevant issue discussing the functionality and naming.
I implemented your version as well as a broadcast version; here is what I found:
srand(1988)
N = 100_000
d = 100
f = randn(N, 1)
x = randn(N, d)

function hess1(x, f)
    N, d = size(x);
    temp = zeros(N, d);
    @simd for kk = 1:N
        @inbounds temp[kk, :] = f[kk] * x[kk, :];
    end
    H = x' * temp;
end

function bsxfunstyle(x, f)
    x' * broadcast(*,f,x)
end

# Warmup
hess1(x,f)
bsxfunstyle(x, f)

# For real
println("Hess1")
@time H1 = hess1(x, f)
println("Broadcast")
@time H2 = bsxfunstyle(x, f)

# Check solutions are identical
println(sum(abs(H1-H2)))
with output
Hess1
elapsed time: 0.324256216 seconds (262498648 bytes allocated, 33.95% gc time)
Broadcast
elapsed time: 0.126647594 seconds (80080696 bytes allocated, 20.22% gc time)
0.0
There are several performance issues with your functions:
you are creating temporary arrays with x[kk, :],
you are traversing the matrix by rows while it is stored in column order,
you are using x' (which first transposes the matrix) rather than At_mul_B(x, ...).
A simple modification gives much better performance:
N = 100_000
d = 100
f = randn(N)
x = randn(N, d)

function hess(x, f)
    N, d = size(x);
    temp = zeros(N, d);
    # traverse column by column, with the innermost loop over rows
    @inbounds for k1 = 1:d
        @simd for kk = 1:N
            temp[kk, k1] = f[kk] * x[kk, k1]
        end
    end
    H = At_mul_B(x, temp)
end

@time hess(x, f)
# 0.067636 seconds (9 allocations: 76.371 MB, 11.24% gc time)
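Note that At_mul_B was removed in Julia 1.0; on modern Julia the lazy adjoint does the same job without materializing the transpose:
H = x' * temp # x' is a lazy Adjoint; * dispatches to BLAS without copying x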
