I have a 3 dimensional array
x = rand(6,6,2^10)
I want to multiply each matrix along the third dimension by a vector. Is there a cleaner way to do this than:
y = rand(6,1)
z = zeros(6,1,2^10)
for i in 1:2^10
z[:,:,i] = x[:,:,i] * y
end
If you are working with matrices, it may be appropriate to consider x as a vector of matrices instead of a 3D array. Then you could do
x = [rand(6,6) for _ in 1:2^10]
y = [rand(6)]
z = x .* y
z is now a vector of vectors.
And if z is preallocated, that would be
z .= x .* y
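Note that z .= x .* y reuses the outer vector z, but each product still allocates a new inner vector. A minimal sketch of a fully in-place variant, assuming y is a plain Vector (rather than the one-element wrapper above) and z already holds vectors of the right length; mul! comes from the LinearAlgebra standard library:

```julia
using LinearAlgebra

x = [rand(6, 6) for _ in 1:2^10]
y = rand(6)                        # plain vector: an assumption for this sketch
z = [zeros(6) for _ in 1:2^10]     # preallocated outputs

# mul!(out, A, b) writes A * b into out without allocating a new vector
for i in eachindex(x, z)
    mul!(z[i], x[i], y)
end
```

Whether this is worth the extra loop depends on how often the product is recomputed; for a one-off calculation the broadcast form is probably fine.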
And, if you want it really fast, use vectors of StaticArrays
using StaticArrays
x = [@SMatrix rand(6, 6) for _ in 1:2^10]
y = [@SVector rand(6)]
z = x .* y
That's showing a 10x speedup on my computer, running in 12 μs.
mapslices(i->i*y, x, (1,2)) is maybe "cleaner" but it will be slower.
Read as: apply the function "times by y" to each slice of the first two dimensions.
function tst(x,y)
z = zeros(6,1,2^10)
for i in 1:2^10
z[:,:,i] = x[:,:,i] * y
end
return z
end
tst2(x,y) = mapslices(i->i*y, x, (1,2))
@time tst(x,y);
0.002152 seconds (4.10 k allocations: 624.266 KB)
@time tst2(x,y);
0.005720 seconds (13.36 k allocations: 466.969 KB)
sum(x.*y',2) is a clean short solution.
It also has good speed and memory properties. The trick is to view matrix-vector multiplication as a linear combination of the matrix's columns, scaled by the vector's elements. Instead of forming that linear combination separately for each matrix x[:,:,i], we apply the same scale y[i] to the whole slab x[:,i,:]. In code:
const x = rand(6,6,2^10);
const y = rand(6,1);
function tst(x,y)
z = zeros(6,1,2^10)
for i in 1:2^10
z[:,:,i] = x[:,:,i]*y
end
return z
end
tst2(x,y) = mapslices(i->i*y,x,(1,2))
tst3(x,y) = sum(x.*y',2)
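The positional dimension arguments above come from an older Julia release; from Julia 0.7/1.0 onward the dimension is passed as the dims keyword. A small sketch under that assumption, checking the sum trick against the explicit loop (y is taken as a plain Vector here, and z_loop and z_sum are illustrative names):

```julia
x = rand(6, 6, 2^10)
y = rand(6)

# Reference result: one matrix-vector product per slice
z_loop = zeros(6, 1, 2^10)
for i in axes(x, 3)
    z_loop[:, 1, i] = x[:, :, i] * y
end

# y' is 1×6, so broadcasting scales every x[:, j, :] by y[j];
# summing over dimension 2 then adds up the scaled columns
z_sum = sum(x .* y', dims=2)

println(size(z_sum))
println(z_sum ≈ z_loop)
```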
Benchmarking gives:
julia> using BenchmarkTools
julia> z = tst(x,y); z2 = tst2(x,y); z3 = tst3(x,y);
julia> @benchmark tst(x,y)
BenchmarkTools.Trial:
memory estimate: 688.11 KiB
allocs estimate: 8196
--------------
median time: 759.545 μs (0.00% GC)
samples: 6068
julia> @benchmark tst2(x,y)
BenchmarkTools.Trial:
memory estimate: 426.81 KiB
allocs estimate: 10798
--------------
median time: 1.634 ms (0.00% GC)
samples: 2869
julia> @benchmark tst3(x,y)
BenchmarkTools.Trial:
memory estimate: 336.41 KiB
allocs estimate: 12
--------------
median time: 114.060 μs (0.00% GC)
samples: 10000
So tst3 using sum has better performance (~7x over tst and ~15x over tst2).
Using StaticArrays as suggested by @DNF is also an option, and it would be nice to compare it to the solutions here.
I'm using Julia 1.0. Please consider the following code:
using LinearAlgebra
using Distributions
## create random data
const data = rand(Uniform(-1,2), 100000, 2)
function test_function_1(data)
theta = [1 2]
coefs = theta * data[:,1:2]'
res = coefs' .* data[:,1:2]
return sum(res, dims = 1)'
end
function test_function_2(data)
theta = [1 2]
sum_all = zeros(2)
for i = 1:size(data)[1]
sum_all .= sum_all + (theta * data[i,1:2])[1] * data[i,1:2]
end
return sum_all
end
After running it for the first time, I timed it
julia> @time test_function_1(data)
0.006292 seconds (16 allocations: 5.341 MiB)
2×1 Adjoint{Float64,Array{Float64,2}}:
150958.47189289227
225224.0374366073
julia> @time test_function_2(data)
0.038112 seconds (500.00 k allocations: 45.777 MiB, 15.61% gc time)
2-element Array{Float64,1}:
150958.4718928927
225224.03743660534
test_function_1 is significantly better in both allocations and speed, even though it is the vectorized version. I would expect the devectorized test_function_2 to perform better. Note that both functions compute the same result.
I have a hunch that it's because in test_function_2, I use sum_all .= sum_all + ..., but I'm not sure why that's a problem. Can I get a hint?
So first let me comment how I would write your function if I wanted to use a loop:
function test_function_3(data)
theta = (1, 2)
sum_all = zeros(2)
for row in eachrow(data)
sum_all .+= dot(theta, row) .* row
end
return sum_all
end
Next, here is a benchmark comparison of the three options:
julia> @benchmark test_function_1($data)
BenchmarkTools.Trial:
memory estimate: 5.34 MiB
allocs estimate: 16
--------------
minimum time: 1.953 ms (0.00% GC)
median time: 1.986 ms (0.00% GC)
mean time: 2.122 ms (2.29% GC)
maximum time: 4.347 ms (8.00% GC)
--------------
samples: 2356
evals/sample: 1
julia> @benchmark test_function_2($data)
BenchmarkTools.Trial:
memory estimate: 45.78 MiB
allocs estimate: 500002
--------------
minimum time: 16.316 ms (7.44% GC)
median time: 16.597 ms (7.63% GC)
mean time: 16.845 ms (8.01% GC)
maximum time: 34.050 ms (4.45% GC)
--------------
samples: 297
evals/sample: 1
julia> @benchmark test_function_3($data)
BenchmarkTools.Trial:
memory estimate: 96 bytes
allocs estimate: 1
--------------
minimum time: 777.204 μs (0.00% GC)
median time: 791.458 μs (0.00% GC)
mean time: 799.505 μs (0.00% GC)
maximum time: 1.262 ms (0.00% GC)
--------------
samples: 6253
evals/sample: 1
Next you can go a bit faster if you explicitly implement the dot in the loop:
julia> function test_function_4(data)
theta = (1, 2)
sum_all = zeros(2)
for row in eachrow(data)
@inbounds sum_all .+= (theta[1]*row[1]+theta[2]*row[2]) .* row
end
return sum_all
end
test_function_4 (generic function with 1 method)
julia> @benchmark test_function_4($data)
BenchmarkTools.Trial:
memory estimate: 96 bytes
allocs estimate: 1
--------------
minimum time: 502.367 μs (0.00% GC)
median time: 502.547 μs (0.00% GC)
mean time: 505.446 μs (0.00% GC)
maximum time: 806.631 μs (0.00% GC)
--------------
samples: 9888
evals/sample: 1
To understand the differences let us have a look at this line of your code:
sum_all .= sum_all + (theta * data[i,1:2])[1] * data[i,1:2]
Let us count the memory allocations you do in this expression:
sum_all .=
sum_all
+ # allocation of a new vector as a result of addition
(theta
* # allocation of a new vector as a result of multiplication
data[i,1:2] # allocation of a new vector via getindex
)[1]
* # allocation of a new vector as a result of multiplication
data[i,1:2] # allocation of a new vector via getindex
So you can see that in each iteration of the loop you allocate five times.
Allocations are expensive, and you can see in the benchmarks that you have 500002 allocations in the process:
1 allocation of sum_all
1 allocation of theta
500000 allocations in the loop (5 * 100000)
Additionally you perform indexing like data[i,1:2] which performs
bounds checking, which is also a small cost (but marginal in comparison to allocations).
Now in function test_function_3 I use eachrow(data). This time I also get the rows of the data matrix, but they are returned as views (not new vectors), so no allocation happens inside the loop. Next I use the dot function to avoid the allocation that was earlier caused by a matrix multiplication (I have changed theta to a Tuple from a Matrix, as dot is then a bit faster, but this is secondary). Finally I write sum_all .+= dot(theta, row) .* row, and in this case all operations are broadcasted, so Julia can do broadcast fusion (again, no allocations happen).
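The difference between the allocating and the fused form can be reproduced on a small matrix. This is only a sketch: accumulate_copy and accumulate_fused are illustrative names, not functions from the question, and view(data, i, :) is used in place of eachrow so that it should also run on Julia 1.0 (eachrow arrived in 1.1, as far as I know).

```julia
using LinearAlgebra

data = rand(1000, 2)
theta = (1, 2)

# Allocating form: slicing copies the row, and + and * each build a temporary
function accumulate_copy(data, theta)
    acc = zeros(2)
    for i in axes(data, 1)
        row = data[i, :]                   # getindex with a slice copies
        acc = acc + dot(theta, row) * row  # two new vectors per iteration
    end
    return acc
end

# Fused form: a view instead of a copy, and one fused in-place broadcast
function accumulate_fused(data, theta)
    acc = zeros(2)
    for i in axes(data, 1)
        row = view(data, i, :)             # no copy of the row
        acc .+= dot(theta, row) .* row     # updates acc in place
    end
    return acc
end

println(accumulate_copy(data, theta) ≈ accumulate_fused(data, theta))
```

Wrapping each call in @allocated (after a warm-up call) is one way to see the gap in allocated bytes on your own machine.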
In test_function_4 I just replace dot by an unrolled loop, as we know we have two elements to calculate the dot product for. Actually, if you fully unroll everything and use @simd it gets even faster:
julia> function test_function_5(data)
theta = (1, 2)
s1 = 0.0
s2 = 0.0
@inbounds @simd for i in axes(data, 1)
r1 = data[i, 1]
r2 = data[i, 2]
mul = theta[1]*r1 + theta[2]*r2
s1 += mul * r1
s2 += mul * r2
end
return [s1, s2]
end
test_function_5 (generic function with 1 method)
julia> @benchmark test_function_5($data)
BenchmarkTools.Trial:
memory estimate: 96 bytes
allocs estimate: 1
--------------
minimum time: 22.721 μs (0.00% GC)
median time: 23.146 μs (0.00% GC)
mean time: 24.306 μs (0.00% GC)
maximum time: 100.109 μs (0.00% GC)
--------------
samples: 10000
evals/sample: 1
So you can see that this way you are close to 100x faster than with test_function_1 (about 23 μs against about 2 ms). Still, test_function_3 is already relatively fast and fully generic, so normally I would probably write something like test_function_3, unless I really needed to be super fast and knew that the dimensions of my data were fixed and small.
I have some code which loads a csv file of 2000 2D coordinates, then a function called collision_count counts the number of pairs of coordinates that are closer than a distance d of each other:
using BenchmarkTools
using CSV
using LinearAlgebra
function load_csv()::Array{Float64,2}
df = CSV.read("pos.csv", header=0)
return Matrix(df)'
end
function collision_count(pos::Array{Float64,2}, d::Float64)::Int64
count::Int64 = 0
N::Int64 = size(pos, 2)
for i in 1:N
for j in (i+1):N
@views dist = norm(pos[:,i] - pos[:,j])
count += dist < d
end
end
return count
end
Here are the results:
pos = load_csv()
@benchmark collision_count($pos, 2.0)
BenchmarkTools.Trial:
memory estimate: 366.03 MiB
allocs estimate: 5997000
--------------
minimum time: 152.070 ms (18.80% GC)
median time: 158.915 ms (20.60% GC)
mean time: 158.751 ms (20.61% GC)
maximum time: 181.726 ms (21.98% GC)
--------------
samples: 32
evals/sample: 1
This is about 30x slower than this Python code:
import numpy as np
import scipy.spatial.distance
pos = np.loadtxt('pos.csv',delimiter=',')
def collision_count(pos, d):
pdist = scipy.spatial.distance.pdist(pos)
return np.count_nonzero(pdist < d)
%timeit collision_count(pos, 2)
5.41 ms ± 63 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Any way to make it faster? And what's up with all the allocations?
The fastest I can get trivially is the following:
using Distances
using StaticArrays
using LinearAlgebra
pos = [@SVector rand(2) for _ in 1:2000]
function collision_count(pos::Vector{<:AbstractVector}, d)
count = 0
@inbounds for i in axes(pos, 1) # pos is a Vector, so its points live along dimension 1
for j in (i+1):lastindex(pos)
dist = sqeuclidean(pos[i], pos[j])
count += dist < d*d
end
end
return count
end
There are a variety of changes here, some stylistic, some structural. Starting with style, you may note that I don't annotate types any more restrictively than I need to. Tighter annotations would bring no performance benefit, since Julia is smart enough to infer types for your code.
The biggest structural change is switching from using a matrix to a vector of StaticVectors. The reason for this change is that since points are your scalar type, it makes more sense to have a vector of elements where each element is a point. The next change I made is to use a squared norm, since sqrt operations are expensive. The results speak for themselves:
@benchmark collision_count($pos, .1)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 1.182 ms (0.00% GC)
median time: 1.214 ms (0.00% GC)
mean time: 1.218 ms (0.00% GC)
maximum time: 2.160 ms (0.00% GC)
--------------
samples: 4101
evals/sample: 1
Note that there are n log(n) algorithms that may be faster, but this should be pretty close to optimal for a naive implementation.
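One such approach is a sort-and-sweep: sort the points by their first coordinate, then compare each point only with later points whose first coordinate is less than d away. The sketch below is an illustration of that idea, not necessarily the algorithm the answer had in mind; collision_count_sweep is a hypothetical name, and it assumes a 2×N (or k×N) matrix with standard 1-based indexing.

```julia
function collision_count_sweep(pos::AbstractMatrix, d)
    order = sortperm(view(pos, 1, :))   # column indices sorted by first coordinate
    d2 = d^2
    count = 0
    n = length(order)
    for a in 1:n
        i = order[a]
        for b in (a+1):n
            j = order[b]
            # every later point is at least this far away along the first axis
            pos[1, j] - pos[1, i] >= d && break
            dist2 = 0.0
            for k in axes(pos, 1)
                dist2 += abs2(pos[k, i] - pos[k, j])
            end
            count += dist2 < d2
        end
    end
    return count
end

pos = rand(2, 2000)
println(collision_count_sweep(pos, 0.01))
```

The sort costs O(n log n); the sweep is cheap when d is small relative to the spread of the points and degrades towards the quadratic loop when it is not.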
Here's a solution that doesn't rely on specific knowledge about the dimensionality of the points:
(Edit: I updated the function to make it more robust with respect to indexing. Some AbstractArrays have indices that do not start at 1, so now I use axes and lastindex instead of size.)
function collision_count2(pos::AbstractMatrix, d)
count = 0
@inbounds for i in axes(pos, 2)
for j in (i+1):lastindex(pos, 2)
dist2 = sum(abs2(pos[k, i] - pos[k, j]) for k in axes(pos, 1))
count += dist2 < d^2
end
end
return count
end
Benchmarks:
julia> using BenchmarkTools
julia> @btime collision_count(pos, 0.7) setup=(pos=rand(2, 2000));
533.055 ms (13991005 allocations: 488.01 MiB)
julia> @btime collision_count2(pos, 0.7) setup=(pos=rand(2, 2000));
4.700 ms (0 allocations: 0 bytes)
The speed is actually close to the SVector solution. On the upcoming Julia version 1.5, the difference compared to the OP's code should be much smaller, since views become more efficient.
BTW: drop the type annotations, like these
count::Int64 = 0
N::Int64 = size(pos, 2)
it's just adding visual noise.
I read somewhere that looping rather than vectorized operations perform better in Julia. Yet I am stuck with an R-like indexing of arrays/dataframes, which I do not know how it is implemented. I also do not see a direct solution for improving it when my arrays are large.
arr = zeros(Int64, (10,2))
x = zeros(Int64, (10,2))
arr[:,1] = [1 2 3 4 5 6 7 8 9 10]
arr[:,2] = [2 3 2 3 4 4 5 6 7 5]
for i in 1:10
for j in 1:2
x[i,j]=sum(arr[arr[:,2] .== i, j])
end
end
x
This is just a demonstration of the array, the arr is usually an array with almost ~100K rows.
This works perfectly fine, but I wanted to know if there is a better performing way that I can do this.
On my computer the following is >100x as fast as Chris' view version:
x = zeros(Int, 10, 2)
for j in 1:size(arr, 2), i in 1:size(arr, 1)
x[arr[i, 2], j] += arr[i, j]
end
Edit: So I wrapped everything in functions and timed it using BenchmarkTools:
Original function:
function accum0(arr)
x = similar(arr)
for i in 1:size(arr, 1)
for j in 1:size(arr, 2)
x[i,j]=sum(arr[arr[:,2] .== i, j])
end
end
return x
end
Chris' version:
function accum1(arr)
x = similar(arr)
for j in 1:size(arr, 2), i in 1:size(arr, 1)
x[i,j]=sum(view(arr,view(arr,:,2) .== i, j))
end
return x
end
My version:
function accum(a)
x = zeros(eltype(a), size(a))
for j in 1:size(a, 2), i in 1:size(a, 1)
x[a[i, 2], j] += a[i, j]
end
return x
end
Here are the timings with a much larger matrix:
julia> using BenchmarkTools
julia> const A = hcat(collect(1:2*10^4), rand(1:1000, 2*10^4));
Some warming up, and then:
julia> @benchmark accum0(A)
BenchmarkTools.Trial:
samples: 2
evals/sample: 1
time tolerance: 5.00%
memory tolerance: 1.00%
memory estimate: 6.26 gb
allocs estimate: 1080002
minimum time: 3.64 s (14.60% GC)
median time: 3.83 s (17.94% GC)
mean time: 3.83 s (17.94% GC)
maximum time: 4.02 s (20.97% GC)
julia> @benchmark accum1(A)
BenchmarkTools.Trial:
samples: 2
evals/sample: 1
time tolerance: 5.00%
memory tolerance: 1.00%
memory estimate: 308.98 mb
allocs estimate: 1005068
minimum time: 2.69 s (3.81% GC)
median time: 2.74 s (4.37% GC)
mean time: 2.74 s (4.37% GC)
maximum time: 2.79 s (4.90% GC)
julia> @benchmark accum(A)
BenchmarkTools.Trial:
samples: 10000
evals/sample: 1
time tolerance: 5.00%
memory tolerance: 1.00%
memory estimate: 312.61 kb
allocs estimate: 3
minimum time: 75.60 μs (0.00% GC)
median time: 169.22 μs (0.00% GC)
mean time: 210.90 μs (21.65% GC)
maximum time: 108.19 ms (99.75% GC)
As you can see, the difference is rather dramatic (more than four orders of magnitude). There is little gain from using view, since there is still a lot of allocation. For slightly bigger matrices, neither accum0 nor accum1 returns at all within reasonable time.
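On the small array from the question, the single-pass version can be checked against the mask-based one. A minimal sketch; accum_mask and accum_scatter are illustrative names, and the scatter version assumes every entry of column 2 is a valid row index of the output:

```julia
arr = hcat(1:10, [2, 3, 2, 3, 4, 4, 5, 6, 7, 5])

# Mask-based version from the question: rescans column 2 for every output cell
function accum_mask(a)
    x = zeros(eltype(a), size(a))
    for j in axes(a, 2), i in axes(a, 1)
        x[i, j] = sum(a[a[:, 2] .== i, j])
    end
    return x
end

# Scatter-add version: one pass, column 2 gives the destination row
function accum_scatter(a)
    x = zeros(eltype(a), size(a))
    for j in axes(a, 2), i in axes(a, 1)
        x[a[i, 2], j] += a[i, j]
    end
    return x
end

println(accum_mask(arr) == accum_scatter(arr))
```

The mask version does work proportional to the number of rows for each output cell, which is where the quadratic cost comes from.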
Julia is column-major, so you want to loop down columns, not across rows.
arr = zeros(Int64, (10,2))
out = similar(arr)
x = similar(arr)
arr[:,1] = [1 2 3 4 5 6 7 8 9 10]
arr[:,2] = [2 3 2 3 4 4 5 6 7 5]
for j in 1:2, i in 1:10
x[i,j]=sum(arr[arr[:,2] .== i, j])
end
x
Notice that you stay in the same column each inner iteration, that's more performant (check the performance tips).
Lastly, note that arr[:,2] creates a copy of the 2nd column each time that is done. It would be better to make a "view", i.e. a type which doesn't copy the array, just makes a shell type which looks like a vector, but is still pointing to the same values as arr. This is done with view(arr,:,2). So you can do
arr = zeros(Int64, (10,2))
out = similar(arr)
x = similar(arr)
arr[:,1] = [1 2 3 4 5 6 7 8 9 10]
arr[:,2] = [2 3 2 3 4 4 5 6 7 5]
for j in 1:2, i in 1:10
x[i,j]=sum(view(arr,view(arr,:,2) .== i, j))
end
x
This will only be faster when the matrices are larger.
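The difference between a slice and a view can be seen directly on a tiny array. A minimal sketch (col_copy and col_view are illustrative names):

```julia
arr = [1 2; 3 4; 5 6]

col_copy = arr[:, 2]        # indexing with a slice copies the column
col_view = view(arr, :, 2)  # a view refers to the memory of arr

arr[1, 2] = 20              # modify the parent array

println(col_copy[1])        # the copy keeps the old value
println(col_view[1])        # the view sees the update
```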
Suppose I have an array of tuples:
arr = [(1,2), (3,4), (5,6)]
With python I can do zip(*arr) == [(1, 3, 5), (2, 4, 6)]
What is the equivalent of this in julia?
As an alternative to splatting (since that's pretty slow), you could do something like:
unzip(a) = map(x->getfield.(a, x), fieldnames(eltype(a)))
This is pretty quick.
julia> using BenchmarkTools
julia> a = collect(zip(1:10000, 10000:-1:1));
julia> @benchmark unzip(a)
BenchmarkTools.Trial:
memory estimate: 156.45 KiB
allocs estimate: 6
--------------
minimum time: 25.260 μs (0.00% GC)
median time: 31.997 μs (0.00% GC)
mean time: 48.429 μs (25.03% GC)
maximum time: 36.130 ms (98.67% GC)
--------------
samples: 10000
evals/sample: 1
By comparison, I have yet to see this complete:
@time collect(zip(a...))
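On a small input, the getfield-based unzip can be traced by hand. A minimal sketch (firsts and seconds are illustrative names): for a tuple element type, fieldnames returns the positions (1, 2), and getfield.(a, k) collects the k-th entry of every tuple.

```julia
unzip(a) = map(x -> getfield.(a, x), fieldnames(eltype(a)))

arr = [(1, 2), (3, 4), (5, 6)]
println(fieldnames(eltype(arr)))   # the field positions of the tuple type

firsts, seconds = unzip(arr)
println(firsts)
println(seconds)
```

Unlike Python's zip(*arr), the result here is a tuple of vectors rather than a list of tuples.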
For larger arrays use @ivirshup's solution below.
For smaller arrays, you can use zip and splatting.
You can achieve the same thing in Julia by using the zip() function (docs here). zip() expects many tuples to work with so you have to use the splatting operator ... to supply your arguments. Also in Julia you have to use the collect() function to then transform your iterables into an array (if you want to).
Here are these functions in action:
arr = [(1,2), (3,4), (5,6)]
# without splatting
collect(zip((1,2), (3,4), (5,6)))
# Output is a vector of tuples:
> [(1, 3, 5), (2, 4, 6)]
# same results with splatting
collect(zip(arr...))
> [(1, 3, 5), (2, 4, 6)]
In Julia, use the splat operator ...:
for r in zip(arr...)
println(r)
end
There is also the Unzip.jl package:
julia> using Unzip
julia> unzip([(1,2), (3,4), (5,6)])
([1, 3, 5], [2, 4, 6])
which seems to work a bit faster than the selected answer:
julia> using Unzip, BenchmarkTools
julia> a = collect(zip(1:10000, 10000:-1:1));
julia> unzip_ivirshup(a) = map(x->getfield.(a, x), fieldnames(eltype(a))) ;
julia> @btime unzip_ivirshup($a);
18.439 μs (4 allocations: 156.41 KiB)
julia> @btime unzip($a); # unzip from Unzip.jl is faster
12.798 μs (4 allocations: 156.41 KiB)
julia> unzip(a) == unzip_ivirshup(a) # check output is the same
true
Following up on @ivirshup's answer, I would like to add a version that is still an iterator
unzip(a) = (getfield.(a, x) for x in fieldnames(eltype(a)))
which keeps the result unevaluated until used. It even gives a (very slight) speed improvement when comparing
@benchmark a1, b1 = unzip(a)
BenchmarkTools.Trial:
memory estimate: 156.52 KiB
allocs estimate: 8
--------------
minimum time: 33.185 μs (0.00% GC)
median time: 76.581 μs (0.00% GC)
mean time: 83.808 μs (18.35% GC)
maximum time: 7.679 ms (97.82% GC)
--------------
samples: 10000
evals/sample: 1
vs.
BenchmarkTools.Trial:
memory estimate: 156.52 KiB
allocs estimate: 8
--------------
minimum time: 33.914 μs (0.00% GC)
median time: 39.020 μs (0.00% GC)
mean time: 64.788 μs (16.52% GC)
maximum time: 7.853 ms (98.18% GC)
--------------
samples: 10000
evals/sample: 1
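What "unevaluated until used" means for the generator version can be seen on a small input. A minimal sketch; unzip_lazy is an illustrative name chosen to avoid clashing with the eager unzip:

```julia
unzip_lazy(a) = (getfield.(a, x) for x in fieldnames(eltype(a)))

a = [(1, 2), (3, 4), (5, 6)]
gen = unzip_lazy(a)            # a Base.Generator; no column has been built yet
println(gen isa Base.Generator)

xs, ys = gen                   # destructuring iterates the generator
println(xs)
println(ys)
```

If only the first column is needed, first(gen) builds just that one and leaves the rest untouched.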
I will add a solution based on the following simple macro
"""
@unzip xs, ys, ... = us
will expand the assignment into the following code
xs, ys, ... = map(x -> x[1], us), map(x -> x[2], us), ...
"""
macro unzip(args)
args.head != :(=) && error("Expression needs to be of form `xs, ys, ... = us`")
lhs, rhs = args.args
items = isa(lhs, Symbol) ? [lhs] : lhs.args
rhs_items = [:(map(x -> x[$i], $rhs)) for i in 1:length(items)]
rhs_expand = Expr(:tuple, rhs_items...)
esc(Expr(:(=), lhs, rhs_expand))
end
Since it's just a syntactic expansion, there shouldn't be any performance or type instability issue. Compared to other solutions based on fieldnames, this has the advantage of also working when the array element type is abstract. For example, while
julia> unzip_get_field(a) = map(x->getfield.(a, x), fieldnames(eltype(a)));
julia> unzip_get_field(Any[("a", 3), ("b", 4)])
ERROR: ArgumentError: type does not have a definite number of fields
the macro version still works:
julia> @unzip xs, ys = Any[("a", 3), ("b",4)]
(["a", "b"], [3, 4])
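The expansion the macro produces can also be written as an ordinary function when the number of fields is passed in explicitly. A minimal sketch; unzip_by_index is a hypothetical helper, not part of the answer, and n must be supplied by the caller:

```julia
# One map per position: the i-th output collects x[i] from every element
unzip_by_index(us, n) = ntuple(i -> map(x -> x[i], us), n)

xs, ys = unzip_by_index(Any[("a", 3), ("b", 4)], 2)
println(xs)
println(ys)
```

Like the macro, this indexes each element rather than inspecting the element type, so it should work for abstractly typed arrays as well.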
I am looking for an efficient way to compute the derivatives of a multidimensional array in Julia. To be precise, I would like to have an equivalent of numpy.gradient in Julia. However, the Julia function diff :
works only for 2-dimensional arrays
reduces the size of the array by one along the differentiated dimension
It is straightforward to extend the definition of diff of Julia so it can work on 3-dimensional arrays, e.g. with
function diff3D(A::Array, dim::Integer)
if dim == 1
[A[i+1,j,k] - A[i,j,k] for i=1:size(A,1)-1, j=1:size(A,2), k=1:size(A,3)]
elseif dim == 2
[A[i,j+1,k] - A[i,j,k] for i=1:size(A,1), j=1:size(A,2)-1, k=1:size(A,3)]
elseif dim == 3
[A[i,j,k+1] - A[i,j,k] for i=1:size(A,1), j=1:size(A,2), k=1:size(A,3)-1]
else
throw(ArgumentError("dimension dim must be 1, 2, or 3 got $dim"))
end
end
which would work with e.g.
a = [i*j*k for i in 1:10, j in 1:10, k in 1:20]
However, this does not extend to an arbitrary number of dimensions, and the boundaries are not handled in a way that would let the gradient keep the same dimensions as the original array.
I have some ideas to implement an analogue of numpy's gradient in Julia, but I fear they would be extremely slow and ugly, hence my questions : is there a canonical way to do this in Julia that I missed ? And if there is none, what would be optimal ?
Thanks.
I'm not too familiar with diff, but from what I understand about what it's doing, I've made an n-dimensional implementation that uses Julia features like parametric types and splatting:
function mydiff{T,N}(A::Array{T,N}, dim::Int)
@assert dim <= N
idxs_1 = [1:size(A,i) for i in 1:N]
idxs_2 = copy(idxs_1)
idxs_1[dim] = 1:(size(A,dim)-1)
idxs_2[dim] = 2:size(A,dim)
return A[idxs_2...] - A[idxs_1...]
end
with some sanity checks:
A = rand(3,3)
@assert diff(A,1) == mydiff(A,1) # Base diff vs my impl.
@assert diff(A,2) == mydiff(A,2) # Base diff vs my impl.
A = rand(3,3,3)
@assert diff3D(A,3) == mydiff(A,3) # Your impl. vs my impl.
Note that there are more magical ways to do this, like using code generation to make specialized methods up to a finite dimension, but I think that's probably not needed to get good-enough performance.
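The function{T,N} syntax and the positional diff(A,1) above belong to an older Julia release. A minimal sketch of the same idea for Julia 1.x, assuming the where clause and the dims keyword; mydiff_modern is an illustrative name, and selectdim is used here in place of the splatted index ranges:

```julia
function mydiff_modern(A::AbstractArray{T,N}, dim::Integer) where {T,N}
    1 <= dim <= N || throw(ArgumentError("dim must be in 1:$N, got $dim"))
    n = size(A, dim)
    # selectdim returns views, so only the result array is allocated
    return selectdim(A, dim, 2:n) .- selectdim(A, dim, 1:n-1)
end

A = rand(3, 3)
println(mydiff_modern(A, 1) == diff(A, dims=1))

B = rand(3, 4, 5)
println(size(mydiff_modern(B, 3)))
```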
An even simpler way to do it:
mydiff(A::AbstractArray,dim) = mapslices(diff, A, dim)
Not sure how this would compare in terms of speed though.
Edit: Maybe slightly slower, but this is a more general solution to extending functions to higher-order arrays:
julia> using BenchmarkTools
julia> function mydiff{T,N}(A::Array{T,N}, dim::Int)
@assert dim <= N
idxs_1 = [1:size(A,i) for i in 1:N]
idxs_2 = copy(idxs_1)
idxs_1[dim] = 1:(size(A,dim)-1)
idxs_2[dim] = 2:size(A,dim)
return A[idxs_2...] - A[idxs_1...]
end
mydiff (generic function with 1 method)
julia> X = randn(500,500,500);
julia> @benchmark mydiff($X,3)
BenchmarkTools.Trial:
samples: 3
evals/sample: 1
time tolerance: 5.00%
memory tolerance: 1.00%
memory estimate: 2.79 gb
allocs estimate: 22
minimum time: 2.05 s (15.64% GC)
median time: 2.15 s (14.62% GC)
mean time: 2.16 s (11.05% GC)
maximum time: 2.29 s (3.61% GC)
julia> @benchmark mapslices(diff,$X,3)
BenchmarkTools.Trial:
samples: 2
evals/sample: 1
time tolerance: 5.00%
memory tolerance: 1.00%
memory estimate: 1.99 gb
allocs estimate: 3750056
minimum time: 2.52 s (7.90% GC)
median time: 2.61 s (9.17% GC)
mean time: 2.61 s (9.17% GC)
maximum time: 2.70 s (10.37% GC)
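Neither version above keeps the size of the input, which is what the question asks for. A minimal sketch of a numpy.gradient-like rule along one dimension, assuming unit spacing, standard 1-based indexing, central differences in the interior and one-sided differences at the two boundaries; gradient_along is a hypothetical helper, not a Base function:

```julia
function gradient_along(A::AbstractArray, dim::Integer)
    n = size(A, dim)
    n >= 2 || throw(ArgumentError("need at least 2 points along dimension $dim"))
    G = similar(A, float(eltype(A)))
    # one-sided differences at the two boundaries
    selectdim(G, dim, 1) .= selectdim(A, dim, 2) .- selectdim(A, dim, 1)
    selectdim(G, dim, n) .= selectdim(A, dim, n) .- selectdim(A, dim, n - 1)
    # central differences in the interior
    if n > 2
        selectdim(G, dim, 2:n-1) .= (selectdim(A, dim, 3:n) .- selectdim(A, dim, 1:n-2)) ./ 2
    end
    return G
end

a = [i * j * k for i in 1:10, j in 1:10, k in 1:20]
println(size(gradient_along(a, 3)) == size(a))
```

Calling it once per dimension gives the full set of partial derivatives; non-uniform spacing would need a different stencil.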