Unzip an array of tuples in Julia

Suppose I have an array of tuples:
arr = [(1,2), (3,4), (5,6)]
With python I can do zip(*arr) == [(1, 3, 5), (2, 4, 6)]
What is the equivalent of this in julia?

As an alternative to splatting (since that's pretty slow), you could do something like:
unzip(a) = map(x->getfield.(a, x), fieldnames(eltype(a)))
This is pretty quick.
julia> using BenchmarkTools
julia> a = collect(zip(1:10000, 10000:-1:1));
julia> @benchmark unzip(a)
BenchmarkTools.Trial:
memory estimate: 156.45 KiB
allocs estimate: 6
--------------
minimum time: 25.260 μs (0.00% GC)
median time: 31.997 μs (0.00% GC)
mean time: 48.429 μs (25.03% GC)
maximum time: 36.130 ms (98.67% GC)
--------------
samples: 10000
evals/sample: 1
By comparison, I have yet to see this complete:
@time collect(zip(a...))
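For a quick sanity check of the getfield approach on a small input, here is a minimal sketch. The name unzip_fields is only an illustrative alias for the one-liner above, and it assumes the array has a concrete tuple element type.

```julia
# `unzip_fields` is a hypothetical name; the body is the same one-liner as above.
unzip_fields(a) = map(i -> getfield.(a, i), fieldnames(eltype(a)))

arr = [(1, 2), (3, 4), (5, 6)]
xs, ys = unzip_fields(arr)   # one vector per tuple position

println(xs)  # first elements of every tuple
println(ys)  # second elements of every tuple
```

Because the map runs over a tuple of field indices, the result is a tuple of vectors, so it can be destructured directly.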

For larger arrays use @ivirshup's getfield-based solution.
For smaller arrays, you can use zip and splatting.
You can achieve the same thing in Julia with the zip() function. zip() takes each tuple as a separate argument, so you have to use the splatting operator ... to pass the elements of the array. zip() returns a lazy iterator, so use collect() if you want to materialize the result as an array.
Here are these functions in action:
arr = [(1,2), (3,4), (5,6)]
# without splatting
collect(zip((1,2), (3,4), (5,6)))
# Output is a vector of tuples:
> [(1, 3, 5), (2, 4, 6)]
# same result with splatting
collect(zip(arr...))
> [(1, 3, 5), (2, 4, 6)]

In Julia, use the splat operator ...:
for r in zip(arr...)
println(r)
end

There is also the Unzip.jl package:
julia> using Unzip
julia> unzip([(1,2), (3,4), (5,6)])
([1, 3, 5], [2, 4, 6])
which seems to work a bit faster than the selected answer:
julia> using Unzip, BenchmarkTools
julia> a = collect(zip(1:10000, 10000:-1:1));
julia> unzip_ivirshup(a) = map(x->getfield.(a, x), fieldnames(eltype(a))) ;
julia> @btime unzip_ivirshup($a);
18.439 μs (4 allocations: 156.41 KiB)
julia> @btime unzip($a); # unzip from Unzip.jl is faster
12.798 μs (4 allocations: 156.41 KiB)
julia> unzip(a) == unzip_ivirshup(a) # check output is the same
true

Following up on @ivirshup's answer, I would like to add a version that is still an iterator:
unzip(a) = (getfield.(a, x) for x in fieldnames(eltype(a)))
which keeps the result unevaluated until used. It even gives a (very slight) speed improvement when comparing
@benchmark a1, b1 = unzip(a)
BenchmarkTools.Trial:
memory estimate: 156.52 KiB
allocs estimate: 8
--------------
minimum time: 33.185 μs (0.00% GC)
median time: 76.581 μs (0.00% GC)
mean time: 83.808 μs (18.35% GC)
maximum time: 7.679 ms (97.82% GC)
--------------
samples: 10000
evals/sample: 1
vs.
BenchmarkTools.Trial:
memory estimate: 156.52 KiB
allocs estimate: 8
--------------
minimum time: 33.914 μs (0.00% GC)
median time: 39.020 μs (0.00% GC)
mean time: 64.788 μs (16.52% GC)
maximum time: 7.853 ms (98.18% GC)
--------------
samples: 10000
evals/sample: 1

I will add a solution based on the following simple macro
"""
@unzip xs, ys, ... = us
will expand the assignment into the following code
xs, ys, ... = map(x -> x[1], us), map(x -> x[2], us), ...
"""
macro unzip(args)
    args.head != :(=) && error("Expression needs to be of form `xs, ys, ... = us`")
    lhs, rhs = args.args
    items = isa(lhs, Symbol) ? [lhs] : lhs.args
    rhs_items = [:(map(x -> x[$i], $rhs)) for i in 1:length(items)]
    rhs_expand = Expr(:tuple, rhs_items...)
    esc(Expr(:(=), lhs, rhs_expand))
end
Since it's just a syntactic expansion, there shouldn't be any performance or type instability issue. Compared to other solutions based on fieldnames, this has the advantage of also working when the array element type is abstract. For example, while
julia> unzip_get_field(a) = map(x->getfield.(a, x), fieldnames(eltype(a)));
julia> unzip_get_field(Any[("a", 3), ("b", 4)])
ERROR: ArgumentError: type does not have a definite number of fields
the macro version still works:
julia> @unzip xs, ys = Any[("a", 3), ("b",4)]
(["a", "b"], [3, 4])
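If you prefer a plain function over a macro, the same expansion can be sketched with ntuple. This is only an illustration under the assumption that you pass the number of tuple positions yourself; unzip_n is a hypothetical name, not part of any package.

```julia
# `unzip_n` is a hypothetical helper: `n` is the number of tuple positions to extract.
# It indexes each element instead of asking the element type for its fields,
# so it also works when eltype(us) is abstract (e.g. Any).
unzip_n(us, n) = ntuple(i -> map(x -> x[i], us), n)

labels, counts = unzip_n(Any[("a", 3), ("b", 4)], 2)
println(labels)
println(counts)
```

The trade-off is that n must be supplied by the caller, whereas the macro reads it off the left-hand side of the assignment.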

Related

using broadcasting Julia for converting vector of vectors to matrices

I am a julia newbie, and have a baby assignment to write a function which converts a vector of vectors to a matrix. This was pretty easy to do by iterating over the elements.
However, I have read that broadcasting tends to be more efficient. But I wasn't sure how to do it here, because a .= operation cannot work, as it would read the vector as a 1 by n array, and thus be trying to broadcast on two arrays of different length.
Is there a way to broadcast?
My code is below
function vecvec_to_matrix(vecvec)
dim1 = length(vecvec)
dim2 = length(vecvec[1])
my_array = zeros(Int64, dim1, dim2)
for i in 1:dim1
for j in 1:dim2
my_array[i,j] = vecvec[i][j]
end
end
return my_array
end
If your vectors are short and of fixed size (e.g., a list of points in 3 dimensions), then you should strongly consider using the StaticArrays package and then calling reinterpret. Demo:
julia> using StaticArrays
julia> A = rand(3, 8)
3×8 Array{Float64,2}:
0.153872 0.361708 0.39703 0.405625 0.0881371 0.390133 0.185328 0.585539
0.467841 0.846298 0.884588 0.798848 0.14218 0.156283 0.232487 0.22629
0.390566 0.897737 0.569882 0.491681 0.499163 0.377012 0.140902 0.513979
julia> reinterpret(SVector{3,Float64}, A)
1×8 reinterpret(SArray{Tuple{3},Float64,1,3}, ::Array{Float64,2}):
[0.153872, 0.467841, 0.390566] [0.361708, 0.846298, 0.897737] [0.39703, 0.884588, 0.569882] … [0.390133, 0.156283, 0.377012] [0.185328, 0.232487, 0.140902] [0.585539, 0.22629, 0.513979]
julia> B = vec(copy(ans))
8-element Array{SArray{Tuple{3},Float64,1,3},1}:
[0.1538721224514592, 0.467840786943454, 0.39056612358281706]
[0.3617079493961777, 0.8462982350893753, 0.8977366743282564]
[0.3970299970547111, 0.884587972864584, 0.5698823030478959]
[0.40562472747685074, 0.7988484677138279, 0.49168126614394647]
[0.08813706434793178, 0.14218012559727544, 0.499163319341982]
[0.3901332827772166, 0.15628284837250006, 0.3770117394226711]
[0.18532803309577517, 0.23248748941275688, 0.14090166962667428]
[0.5855387782654986, 0.22628968661452897, 0.5139790762185006]
julia> reshape(reinterpret(Float64, B), (3, 8))
3×8 reshape(reinterpret(Float64, ::Array{SArray{Tuple{3},Float64,1,3},1}), 3, 8) with eltype Float64:
0.153872 0.361708 0.39703 0.405625 0.0881371 0.390133 0.185328 0.585539
0.467841 0.846298 0.884588 0.798848 0.14218 0.156283 0.232487 0.22629
0.390566 0.897737 0.569882 0.491681 0.499163 0.377012 0.140902 0.513979
Your way is intuitive and fast already. You can improve performance with some @inbounds and that's about it. vcat is also fast. I think broadcasting is not necessary in your case.
Here are some benchmarks of the various ways I can think of
function vecvec_to_matrix(vecvec)
dim1 = length(vecvec)
dim2 = length(vecvec[1])
my_array = zeros(Int64, dim1, dim2)
for i in 1:dim1
for j in 1:dim2
my_array[i,j] = vecvec[i][j]
end
end
return my_array
end
function vecvec_to_matrix2(vecvec::AbstractVector{T}) where T <: AbstractVector
dim1 = length(vecvec)
dim2 = length(vecvec[1])
my_array = Array{eltype(vecvec[1]), 2}(undef, dim1, dim2)
@inbounds @fastmath for i in 1:dim1, j in 1:dim2
my_array[i,j] = vecvec[i][j]
end
return my_array
end
function vecvec_to_matrix3(vecvec::AbstractVector{T}) where T <: AbstractVector
dim1 = length(vecvec)
dim2 = length(vecvec[1])
my_array = Array{eltype(vecvec[1]), 2}(undef, dim1, dim2)
Threads.@threads for i in 1:dim1
for j in 1:dim2
my_array[i,j] = vecvec[i][j]
end
end
return my_array
end
using Tullio
function using_tullio(vecvec::AbstractVector{T}) where T <: AbstractVector
dim1 = length(vecvec)
dim2 = length(vecvec[1])
my_array = Array{eltype(vecvec[1]), 2}(undef, dim1, dim2)
@tullio my_array[i, j] = vecvec[i][j]
my_array
end
function using_vcat(vecvec::AbstractVector{T}) where T <: AbstractVector
vcat(vecvec...)
end
using BenchmarkTools
vecvec = [rand(Int, 100) for i in 1:100];
@benchmark vecvec_to_matrix(vecvec)
@benchmark vecvec_to_matrix2(vecvec)
@benchmark vecvec_to_matrix3(vecvec)
@benchmark using_tullio(vecvec)
@benchmark using_vcat(vecvec)
with results
julia> @benchmark vecvec_to_matrix(vecvec)
BenchmarkTools.Trial:
memory estimate: 78.20 KiB
allocs estimate: 2
--------------
minimum time: 12.701 μs (0.00% GC)
median time: 15.001 μs (0.00% GC)
mean time: 24.465 μs (10.98% GC)
maximum time: 3.884 ms (98.30% GC)
--------------
samples: 10000
evals/sample: 1
julia> @benchmark vecvec_to_matrix2(vecvec)
BenchmarkTools.Trial:
memory estimate: 78.20 KiB
allocs estimate: 2
--------------
minimum time: 8.600 μs (0.00% GC)
median time: 9.800 μs (0.00% GC)
mean time: 19.532 μs (12.37% GC)
maximum time: 3.834 ms (98.82% GC)
--------------
samples: 10000
evals/sample: 1
julia> @benchmark vecvec_to_matrix3(vecvec)
BenchmarkTools.Trial:
memory estimate: 83.28 KiB
allocs estimate: 32
--------------
minimum time: 8.399 μs (0.00% GC)
median time: 14.600 μs (0.00% GC)
mean time: 28.178 μs (11.82% GC)
maximum time: 8.269 ms (0.00% GC)
--------------
samples: 10000
evals/sample: 1
julia> @benchmark using_tullio(vecvec)
BenchmarkTools.Trial:
memory estimate: 78.20 KiB
allocs estimate: 2
--------------
minimum time: 8.299 μs (0.00% GC)
median time: 10.101 μs (0.00% GC)
mean time: 19.476 μs (12.15% GC)
maximum time: 3.661 ms (98.74% GC)
--------------
samples: 10000
evals/sample: 1
julia> @benchmark using_vcat(vecvec)
BenchmarkTools.Trial:
memory estimate: 78.20 KiB
allocs estimate: 2
--------------
minimum time: 5.540 μs (0.00% GC)
median time: 7.480 μs (0.00% GC)
mean time: 16.236 μs (15.30% GC)
maximum time: 876.400 μs (97.85% GC)
--------------
samples: 10000
evals/sample: 5
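One caveat when reading the using_vcat timing: vcat(vecvec...) joins the inner vectors end to end into one long vector, so it does not return a matrix. A minimal sketch of a matrix-producing alternative, using only Base functions, could look like this (the small vecvec here is illustrative data, not the benchmark input):

```julia
vecvec = [[1, 2, 3], [4, 5, 6]]

flat = vcat(vecvec...)                   # 6-element Vector, not a matrix
cols = reduce(hcat, vecvec)              # 3×2 Matrix: column i is vecvec[i]
rows = permutedims(cols)                 # 2×3 Matrix: row i is vecvec[i]

println(size(flat), " ", size(cols), " ", size(rows))
```

Here rows[i, j] == vecvec[i][j], which matches the layout produced by the loop versions. Using reduce(hcat, ...) also avoids splatting a long collection.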

What went wrong with my Julia loops/devectorized code

I'm using Julia 1.0. Please consider the following code:
using LinearAlgebra
using Distributions
## create random data
const data = rand(Uniform(-1,2), 100000, 2)
function test_function_1(data)
theta = [1 2]
coefs = theta * data[:,1:2]'
res = coefs' .* data[:,1:2]
return sum(res, dims = 1)'
end
function test_function_2(data)
theta = [1 2]
sum_all = zeros(2)
for i = 1:size(data)[1]
sum_all .= sum_all + (theta * data[i,1:2])[1] * data[i,1:2]
end
return sum_all
end
After running it for the first time, I timed it
julia> @time test_function_1(data)
0.006292 seconds (16 allocations: 5.341 MiB)
2×1 Adjoint{Float64,Array{Float64,2}}:
150958.47189289227
225224.0374366073
julia> @time test_function_2(data)
0.038112 seconds (500.00 k allocations: 45.777 MiB, 15.61% gc time)
2-element Array{Float64,1}:
150958.4718928927
225224.03743660534
test_function_1 is significantly superior, both in allocations and speed, but test_function_1 is not devectorized. I would expect test_function_2 to perform better. Note that both functions do the same.
I have a hunch that it's because in test_function_2, I use sum_all .= sum_all + ..., but I'm not sure why that's a problem. Can I get a hint?
So first let me comment how I would write your function if I wanted to use a loop:
function test_function_3(data)
theta = (1, 2)
sum_all = zeros(2)
for row in eachrow(data)
sum_all .+= dot(theta, row) .* row
end
return sum_all
end
Next, here is a benchmark comparison of the three options:
julia> @benchmark test_function_1($data)
BenchmarkTools.Trial:
memory estimate: 5.34 MiB
allocs estimate: 16
--------------
minimum time: 1.953 ms (0.00% GC)
median time: 1.986 ms (0.00% GC)
mean time: 2.122 ms (2.29% GC)
maximum time: 4.347 ms (8.00% GC)
--------------
samples: 2356
evals/sample: 1
julia> @benchmark test_function_2($data)
BenchmarkTools.Trial:
memory estimate: 45.78 MiB
allocs estimate: 500002
--------------
minimum time: 16.316 ms (7.44% GC)
median time: 16.597 ms (7.63% GC)
mean time: 16.845 ms (8.01% GC)
maximum time: 34.050 ms (4.45% GC)
--------------
samples: 297
evals/sample: 1
julia> @benchmark test_function_3($data)
BenchmarkTools.Trial:
memory estimate: 96 bytes
allocs estimate: 1
--------------
minimum time: 777.204 μs (0.00% GC)
median time: 791.458 μs (0.00% GC)
mean time: 799.505 μs (0.00% GC)
maximum time: 1.262 ms (0.00% GC)
--------------
samples: 6253
evals/sample: 1
Next you can go a bit faster if you explicitly implement the dot in the loop:
julia> function test_function_4(data)
theta = (1, 2)
sum_all = zeros(2)
for row in eachrow(data)
@inbounds sum_all .+= (theta[1]*row[1]+theta[2]*row[2]) .* row
end
return sum_all
end
test_function_4 (generic function with 1 method)
julia> @benchmark test_function_4($data)
BenchmarkTools.Trial:
memory estimate: 96 bytes
allocs estimate: 1
--------------
minimum time: 502.367 μs (0.00% GC)
median time: 502.547 μs (0.00% GC)
mean time: 505.446 μs (0.00% GC)
maximum time: 806.631 μs (0.00% GC)
--------------
samples: 9888
evals/sample: 1
To understand the differences let us have a look at this line of your code:
sum_all .= sum_all + (theta * data[i,1:2])[1] * data[i,1:2]
Let us count the memory allocations you do in this expression:
sum_all .=
sum_all
+ # allocation of a new vector as a result of addition
(theta
* # allocation of a new vector as a result of multiplication
data[i,1:2] # allocation of a new vector via getindex
)[1]
* # allocation of a new vector as a result of multiplication
data[i,1:2] # allocation of a new vector via getindex
So you can see that in each iteration of the loop you allocate five times.
Allocations are expensive. You can see this in the benchmarks, which report 500002 allocations in the process:
1 allocation of sum_all
1 allocation of theta
500000 allocations in the loop (5 * 100000)
Additionally you perform indexing like data[i,1:2] which performs
bounds checking, which is also a small cost (but marginal in comparison to allocations).
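To see the cost of slicing for yourself, you can measure it with @allocated. This is a rough sketch on made-up data; the helper names slice_bytes and view_bytes are hypothetical, and the exact byte counts depend on your Julia version.

```julia
data = rand(1000, 2)

# Wrapping the measurement in functions avoids counting global-scope overhead.
slice_bytes(d, i) = @allocated sum(d[i, 1:2])        # getindex with a range copies the row
view_bytes(d, i)  = @allocated sum(view(d, i, 1:2))  # a view reuses the parent's memory

# Call once so compilation is not included, then measure.
slice_bytes(data, 1); view_bytes(data, 1)
println("slice: ", slice_bytes(data, 1), " bytes, view: ", view_bytes(data, 1), " bytes")
```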
Now in function test_function_3 I use eachrow(data). This time I also get rows of the data matrix, but they are returned as views (not new arrays), so no allocation happens inside the loop. Next I use the dot function to avoid the allocation that was earlier caused by a matrix multiplication (I have changed theta to a Tuple from a Matrix as then dot is a bit faster, but this is secondary). Finally I write sum_all .+= dot(theta, row) .* row, and in this case all operations are broadcasted, so Julia can do broadcast fusion (again, no allocations happen).
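The difference between the two update styles can be sketched in isolation. The function names unfused! and fused! are hypothetical and the inputs are made up; both compute the same values, only the temporaries differ.

```julia
# Un-dotted arithmetic: `c * row` builds a temporary vector, `acc + ...` builds
# another one, and only then is the result copied into acc.
function unfused!(acc, c, row)
    acc .= acc + c * row
    return acc
end

# Fully dotted: the whole right-hand side fuses into a single loop that
# writes straight into acc, with no temporary arrays.
function fused!(acc, c, row)
    acc .+= c .* row
    return acc
end

row = [0.5, 2.0]
a = unfused!(zeros(2), 3.0, row)
b = fused!(zeros(2), 3.0, row)
println(a == b)
```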
In test_function_4 I just replace dot by an unrolled loop, as we know we have two elements to calculate the dot product for. Actually, if you fully unroll everything and use @simd it gets even faster:
julia> function test_function_5(data)
theta = (1, 2)
s1 = 0.0
s2 = 0.0
@inbounds @simd for i in axes(data, 1)
r1 = data[i, 1]
r2 = data[i, 2]
mul = theta[1]*r1 + theta[2]*r2
s1 += mul * r1
s2 += mul * r2
end
return [s1, s2]
end
test_function_5 (generic function with 1 method)
julia> @benchmark test_function_5($data)
BenchmarkTools.Trial:
memory estimate: 96 bytes
allocs estimate: 1
--------------
minimum time: 22.721 μs (0.00% GC)
median time: 23.146 μs (0.00% GC)
mean time: 24.306 μs (0.00% GC)
maximum time: 100.109 μs (0.00% GC)
--------------
samples: 10000
evals/sample: 1
So you can see that this way you are roughly 85x faster than with test_function_1 (comparing median times). Still, test_function_3 is already relatively fast and fully generic, so normally I would write something like test_function_3 unless I really needed maximum speed and knew that the dimensions of my data were fixed and small.

Julia: How to count efficiently the number of missings in a `Vector{Union{T, Missing}}`

Consider
x = rand([missing, rand(Int, 100)...], 1_000_000)
which yields typeof(x) = Array{Union{Missing, Int64},1}.
What's the most efficient way to count the number of missings in x?
The cleanest way is probably just
count(ismissing, x)
Simple, easy to remember, and fast
Since you're asking for the "most efficient" way, let me give some benchmark results. It is slightly faster than @xiaodai's answer, and as fast as a simple loop implementation:
julia> @btime count($ismissing,$x);
278.499 μs (0 allocations: 0 bytes)
julia> @btime mapreduce($ismissing, $+, $x);
293.901 μs (0 allocations: 0 bytes)
julia> @btime count_missing($x)
278.499 μs (0 allocations: 0 bytes)
where
julia> function count_missing(x)
c = 0
@inbounds for i in eachindex(x)
if ismissing(x[i])
c += 1
end
end
return c
end
Abstraction for no cost, just the way you'd want it to be.
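As a tiny worked example on made-up data (not the benchmark vector), count takes any predicate, so the complement is just as short:

```julia
x = [1, missing, 3, missing, 5]     # Vector{Union{Missing, Int64}}

n_missing = count(ismissing, x)     # number of missing entries
n_present = count(!ismissing, x)    # `!f` negates a predicate function

println(n_missing, " missing, ", n_present, " present")
```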
If you know that the number of missing values is less than 4 billion (or less than 65k), you can be several times faster than @crstnbr's answer with the following code:
function count_missing(x, T)
c = zero(T)
for i in 1:length(x)
c += @inbounds ismissing(x[i])
end
return Int(c) #we want to have stable result type
# this could be further combined with a barrier function
# that could check the size of `x` at the runtime
end
Now the benchmarks.
This is the original time on my laptop:
julia> @btime count_missing($x, Int)
227.799 μs (0 allocations: 0 bytes)
9971
Cut the time in half if you know there are fewer than 4 billion matching elements:
julia> @btime count_missing($x, UInt32)
113.899 μs (0 allocations: 0 bytes)
9971
Cut the time by 8x if you know there are fewer than 65k matching elements:
julia> @btime count_missing($x, UInt16)
29.200 μs (0 allocations: 0 bytes)
9971
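The barrier function mentioned in the code comment could be sketched as follows. This is an assumption about what such a wrapper might look like, not the answerer's code: count_missing_typed and count_missing_auto are hypothetical names, and the counter type is chosen from length(x), which is a safe upper bound on the number of missings.

```julia
# Inner kernel: counts with a caller-chosen accumulator type T.
function count_missing_typed(x, ::Type{T}) where {T}
    c = zero(T)
    @inbounds for i in eachindex(x)
        c += ismissing(x[i])     # Bool promotes to T, so c keeps its type
    end
    return Int(c)                # stable result type regardless of T
end

# Barrier: pick the narrowest counter that cannot overflow for this length.
function count_missing_auto(x)
    n = length(x)
    n <= typemax(UInt16) && return count_missing_typed(x, UInt16)
    n <= typemax(UInt32) && return count_missing_typed(x, UInt32)
    return count_missing_typed(x, Int)
end

println(count_missing_auto([1, missing, 2, missing, missing]))
```

Checking the length is more conservative than knowing the true number of missings, so on the 1_000_000-element example it would pick UInt32 rather than UInt16.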
This is an unsafe answer and is not guaranteed to work in the future if Julia changes the memory layout, but it's fun:
x = Vector{Union{Missing, Float64}}(missing, 100_000_000)
x[rand(1:100_000_000, 90_000_000)] .= rand.()
using BenchmarkTools
@benchmark count($ismissing, $x)
# BenchmarkTools.Trial:
# memory estimate: 0 bytes
# allocs estimate: 0
# --------------
# minimum time: 48.468 ms (0.00% GC)
# median time: 51.755 ms (0.00% GC)
# mean time: 66.863 ms (0.00% GC)
# maximum time: 91.449 ms (0.00% GC)
# --------------
# samples: 76
# evals/sample: 1
function unsafe_count_missing(x::Vector{Union{Missing, T}}) where T
@assert isbitstype(T)
l = length(x)
GC.@preserve x begin
y = unsafe_wrap(Vector{UInt8}, Ptr{UInt8}(pointer(x) + sizeof(T)*l), l)
res = reduce(-, y; init = l)
end
res
end
@time count(ismissing, x) == unsafe_count_missing(x)
@benchmark unsafe_count_missing($x)
# BenchmarkTools.Trial:
# memory estimate: 80 bytes
# allocs estimate: 1
# --------------
# minimum time: 9.190 ms (0.00% GC)
# median time: 9.718 ms (0.00% GC)
# mean time: 9.845 ms (0.00% GC)
# maximum time: 15.691 ms (0.00% GC)
# --------------
# samples: 508
# evals/sample: 1

How to repeat individual characters in strings in Julia

This question shows how to repeat individual characters in strings in Python.
>>> s = '123abc'
>>> n = 3
>>> ''.join([c*n for c in s])
'111222333aaabbbccc'
How would you do that in Julia?
EDIT
As a newcomer to Julia I am amazed at what the language has to offer.
For example, I would have thought that the Python code above is about as simple as the code could get in any language. However, as shown by my answer below, the Julia equivalent code join([c^n for c in s]) is arguably simpler, and may be reaching the optimum of simplicity for any language.
On the other hand, @niczky12 has shown that with the addition of the ellipsis operator to the string function, the speed can be substantially increased over what the somewhat simpler join function achieves.
In one case Julia shines for simplicity. In the other case, Julia shines for speed.
To a Python programmer the first case should be almost immediately readable when they notice that c^n is just c*n in Python. When they see the speed increase using the ... ellipsis operator, the extra complexity might not deter them from learning Julia. Readers might be starting to think I hope many Python programmers will take Julia seriously. They would not be wrong.
Thanks to @rickhg12hs for suggesting benchmarking. I have learned a lot.
In addition to the answers above, I found that the string function runs even faster. Here are my benchmarks:
julia> n = 2;
julia> s = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
julia> string((c^n for c in s)...) # proof that it works
"AABBCCDDEEFFGGHHIIJJKKLLMMNNOOPPQQRRSSTTUUVVWWXXYYZZ"
julia> n = 26000;
julia> @benchmark join(c^n for c in s)
BenchmarkTools.Trial:
memory estimate: 1.44 MiB
allocs estimate: 36
--------------
minimum time: 390.616 μs (0.00% GC)
median time: 425.861 μs (0.00% GC)
mean time: 484.638 μs (6.54% GC)
maximum time: 45.006 ms (98.99% GC)
--------------
samples: 10000
evals/sample: 1
julia> @benchmark string((c^n for c in s)...)
BenchmarkTools.Trial:
memory estimate: 1.29 MiB
allocs estimate: 31
--------------
minimum time: 77.480 μs (0.00% GC)
median time: 101.667 μs (0.00% GC)
mean time: 126.455 μs (0.00% GC)
maximum time: 832.524 μs (0.00% GC)
--------------
samples: 10000
evals/sample: 1
As you can see, it's about 4 times faster (by median time) than the join solution proposed by @Julia Learner.
I tested the above on 0.7 but had no deprecation warnings so I'm assuming it works fine on 1.0 too. Even TIO says so.
You can do it with either a Julia comprehension or a generator.
julia> VERSION
v"1.0.0"
julia> s = "123abc"
"123abc"
# n is number of times to repeat each character.
julia> n = 3
3
# Using a Julia comprehension with [...]
julia> join([c^n for c in s])
"111222333aaabbbccc"
# Using a Julia generator without the [...]
julia> join(c^n for c in s)
"111222333aaabbbccc"
For small strings there should be little practical difference in speed.
Edit
TL;DR: In general, the generator is somewhat faster than the comprehension. However, see case 3 for the opposite. The memory estimates were very similar.
@rickhg12hs has suggested it would be nice to have benchmarks.
Using the great BenchmarkTools package, the results are below.
n = the number of times to repeat each character
s = "ABCDEFGHIJKLMNOPQRSTUVWXYZ" in each case
In each case, the comprehension median time, C, is listed first, vs the generator median time, G, second. The times were rounded as seemed appropriate and the original figures are below the numbered summaries. Smaller, of course, is better.
The memory estimates were not very different.
1. n = 26, C=3.8 vs. G=2.8 μs, G faster
julia> using BenchmarkTools
julia> n = 26;
julia> @benchmark join([c^n for c in s])
BenchmarkTools.Trial:
memory estimate: 3.55 KiB
allocs estimate: 39
--------------
minimum time: 3.688 μs (0.00% GC)
median time: 3.849 μs (0.00% GC)
mean time: 4.956 μs (16.27% GC)
maximum time: 5.211 ms (99.85% GC)
--------------
samples: 10000
evals/sample: 8
julia> @benchmark join(c^n for c in s)
BenchmarkTools.Trial:
memory estimate: 3.19 KiB
allocs estimate: 36
--------------
minimum time: 2.661 μs (0.00% GC)
median time: 2.756 μs (0.00% GC)
mean time: 3.622 μs (19.94% GC)
maximum time: 4.638 ms (99.89% GC)
--------------
samples: 10000
evals/sample: 9
2. n = 260, C=10.7 vs. G=8.1 μs, G faster
julia> n = 260;
julia> @benchmark join([c^n for c in s])
BenchmarkTools.Trial:
memory estimate: 19.23 KiB
allocs estimate: 39
--------------
minimum time: 8.125 μs (0.00% GC)
median time: 10.691 μs (0.00% GC)
mean time: 18.559 μs (35.36% GC)
maximum time: 43.930 ms (99.92% GC)
--------------
samples: 10000
evals/sample: 1
julia> @benchmark join(c^n for c in s)
BenchmarkTools.Trial:
memory estimate: 18.88 KiB
allocs estimate: 36
--------------
minimum time: 7.270 μs (0.00% GC)
median time: 8.126 μs (0.00% GC)
mean time: 10.872 μs (18.04% GC)
maximum time: 10.592 ms (99.87% GC)
--------------
samples: 10000
evals/sample: 4
3. n = 2,600, C=63.3 vs. G=63.7 μs, C faster
julia> n = 2600;
julia> @benchmark join([c^n for c in s])
BenchmarkTools.Trial:
memory estimate: 150.16 KiB
allocs estimate: 39
--------------
minimum time: 51.746 μs (0.00% GC)
median time: 63.293 μs (0.00% GC)
mean time: 77.315 μs (2.79% GC)
maximum time: 3.721 ms (96.85% GC)
--------------
samples: 10000
evals/sample: 1
julia> @benchmark join(c^n for c in s)
BenchmarkTools.Trial:
memory estimate: 149.80 KiB
allocs estimate: 36
--------------
minimum time: 47.897 μs (0.00% GC)
median time: 63.720 μs (0.00% GC)
mean time: 88.716 μs (17.58% GC)
maximum time: 42.457 ms (99.83% GC)
--------------
samples: 10000
evals/sample: 1
4. n = 26,000, C=667 vs. G=516 μs, G faster
julia> n = 26000;
julia> @benchmark join([c^n for c in s])
BenchmarkTools.Trial:
memory estimate: 1.44 MiB
allocs estimate: 39
--------------
minimum time: 457.589 μs (0.00% GC)
median time: 666.710 μs (0.00% GC)
mean time: 729.592 μs (10.91% GC)
maximum time: 42.673 ms (98.76% GC)
--------------
samples: 6659
evals/sample: 1
julia> @benchmark join(c^n for c in s)
BenchmarkTools.Trial:
memory estimate: 1.44 MiB
allocs estimate: 36
--------------
minimum time: 475.977 μs (0.00% GC)
median time: 516.176 μs (0.00% GC)
mean time: 659.001 μs (10.36% GC)
maximum time: 42.268 ms (98.41% GC)
--------------
samples: 7548
evals/sample: 1
Code tested in Version 1.0.0 (2018-08-08).
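If you want to avoid building one small string per character, another option is to write the characters into a buffer. This is only a sketch and has not been benchmarked against the versions above; repeat_chars is a hypothetical name.

```julia
# Write each character n times into an in-memory buffer, then take the String.
function repeat_chars(s::AbstractString, n::Integer)
    io = IOBuffer()
    for c in s, _ in 1:n
        print(io, c)
    end
    return String(take!(io))
end

println(repeat_chars("123abc", 3))
```

Iterating a string yields Char values, so this also handles non-ASCII characters.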
When I tried to write map(x -> x^3, "123abc"), I got an error.
julia> map(x -> x^3, "123abc")
ERROR: ArgumentError: map(f, s::AbstractString) requires f to return AbstractChar; try map(f, collect(s)) or a comprehension instead
So here is another way to do it.
julia> map(x -> x^3, collect("123abc"))
6-element Array{String,1}:
"111"
"222"
"333"
"aaa"
"bbb"
"ccc"
julia> join(map(x -> x^3, collect("123abc")))
"111222333aaabbbccc"
Maybe repeat is more convenient.
julia> repeat(collect("123abc"), inner=3)
18-element Array{Char,1}:
'1'
'1'
'1'
'2'
'2'
'2'
'3'
'3'
'3'
'a'
'a'
'a'
'b'
'b'
'b'
'c'
'c'
'c'
julia> join(repeat(collect("123abc"), inner=3))
"111222333aaabbbccc"

Fastest way to draw from a distribution many times

This:
function draw1(n)
return rand(Normal(0,1), Int(n))
end
is somewhat faster than this:
function draw2(n)
result = zeros(Float64, Int(n))
for i=1:Int(n)
result[i] = rand(Normal(0,1))
end
return result
end
Just curious why that is, and whether the explicit loop can be sped up (I tried @inbounds and @simd and didn't get a speedup). Is it the initial allocation of zeros()? I timed that separately at about 0.25 seconds, which doesn't fully account for the difference (plus, doesn't the first way pre-allocate an array under the hood?).
Example:
@time x = draw1(1e08)
1.169986 seconds (6 allocations: 762.940 MiB, 4.53% gc time)
@time y = draw2(1e08)
1.824750 seconds (6 allocations: 762.940 MiB, 3.05% gc time)
Try this implementation:
function draw3(n)
    d = Normal(0,1)
    # The (undef, n) constructor form is required on Julia 0.7 and later
    result = Vector{Float64}(undef, Int(n))
    @inbounds for i=1:Int(n)
        result[i] = rand(d)
    end
    return result
end
What is the difference:
uses @inbounds
creates Normal(0,1) only once
performs faster initialization of result
When I test it, it has essentially the same performance as draw1 (I have not tested it on a 10e8 vector size, though, as I do not have enough memory; if you can run such a @benchmark it would be nice):
julia> using BenchmarkTools
julia> @benchmark draw1(10e5)
BenchmarkTools.Trial:
memory estimate: 7.63 MiB
allocs estimate: 2
--------------
minimum time: 12.296 ms (0.00% GC)
median time: 13.012 ms (0.00% GC)
mean time: 14.510 ms (8.49% GC)
maximum time: 84.253 ms (81.30% GC)
--------------
samples: 345
evals/sample: 1
julia> @benchmark draw2(10e5)
BenchmarkTools.Trial:
memory estimate: 7.63 MiB
allocs estimate: 2
--------------
minimum time: 20.374 ms (0.00% GC)
median time: 21.622 ms (0.00% GC)
mean time: 22.787 ms (5.95% GC)
maximum time: 92.265 ms (77.18% GC)
--------------
samples: 220
evals/sample: 1
julia> @benchmark draw3(10e5)
BenchmarkTools.Trial:
memory estimate: 7.63 MiB
allocs estimate: 2
--------------
minimum time: 12.415 ms (0.00% GC)
median time: 12.956 ms (0.00% GC)
mean time: 14.456 ms (8.67% GC)
maximum time: 84.342 ms (83.74% GC)
--------------
samples: 346
evals/sample: 1
EDIT: actually defining a loop in a separate function (exactly as rand does) gives a bit better performance of draw4 than draw3:
function g!(d, v)
    @inbounds for i=1:length(v)
        v[i] = rand(d)
    end
end
function draw4(n)
    # The (undef, n) constructor form is required on Julia 0.7 and later
    result = Vector{Float64}(undef, Int(n))
    g!(Normal(0,1), result)
    return result
end
A shorter answer is that the built-in implementation is fastest, which is fortunately often the case.
Instead of draw4 above, you could just use the inbuilt
function draw5(n)
    # The (undef, n) constructor form is required on Julia 0.7 and later
    result = Vector{Float64}(undef, Int(n))
    rand!(Normal(0,1), result)
end
Filling an existing vector with something like rand! will always be inbounds.
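For the specific case of a standard normal, the same in-place pattern can be sketched with the standard library alone, using randn! from Random instead of a Distributions object. This swaps in a different function from the one used above, and draw_inplace is a hypothetical name.

```julia
using Random

function draw_inplace(n)
    result = Vector{Float64}(undef, Int(n))  # uninitialized storage, no zero-fill
    randn!(result)                           # overwrite with standard-normal draws
    return result
end

y = draw_inplace(1e3)
println(length(y))
```

For other distributions you would still need the Distributions-based rand! call shown above.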

Resources