I have a function that returns an array. I'd like to map the function to a vector of inputs, and the the output to be a simple concatenation of all the arrays. The function is:
function log_it(r, bzero = 0.25, N = 400)
main = rand(Float16, (N+150));
main[1] = bzero;
for i in 2:N+150
main[i] = *(r, main[i-1], (1-main[i-1]))
end;
y = unique(main[(N+1):(N+150)]);
r_vec = repeat([r], size(y)[1]);
hcat(r_vec, y)
end;
and I can map it fine:
map(log_it, 2.4:0.001:2.405)
but the result is gross:
[2.4 0.58349609375]
[2.401 0.58349609375]
[2.402 0.583984375; 2.402 0.58349609375]
[2.403 0.583984375]
[2.404 0.583984375]
[2.405 0.58447265625; 2.405 0.583984375]
NB, the length of the nested arrays is unbounded - I'm looking for a solution that doesn't depend on knowing the the length of nested arrays in advance.
What I want is something like this:
2.4 0.583496
2.401 0.583496
2.402 0.583984
2.402 0.583496
2.403 0.583984
2.404 0.583984
2.405 0.584473
2.405 0.583984
Which I made using a for loop:
results = Array{Float64, 2}(undef, 0, 2)
for i in 2.4:0.001:2.405
results = cat(results, log_it(i), dims = 1)
end
results
The code works fine, but the for loop takes about four times as long. I also feel like map is the right way to do it and I'm just missing something - either in executing map in such a way that it returns a nice vector of arrays, or in some mutation of the array that will "unnest". I've tried looking through functions like flatten and collect but can't find anything.
Many thanks in advance!
Are you sure you're benchmarking this correctly? Especially with very fast operations benchmarking can sometimes be tricky. As a starting point, I would recommend to ensure you always wrap any code you want to benchmark into a function, and use the BenchmarkTools package to get reliable timings.
There generally shouldn't be a performance penalty for writing loops in Julia, so a 3x increase in runtime for a loop compared to map sounds suspicious.
Here's what I get:
julia> using BenchmarkTools
julia> #btime map(log_it, 2.4:0.001:2.405)
121.426 μs (73 allocations: 14.50 KiB)
julia> function with_loop()
results = Array{Float64, 2}(undef, 0, 2)
for i in 2.4:0.001:2.405
results = cat(results, log_it(i), dims = 1)
end
results
end
julia> #btime with_loop()
173.492 μs (295 allocations: 23.67 KiB)
So the loop is about 50% slower, but that's because you're allocating more.
When you're using map there's usually a more Julia way of expressing what you're doing using broadcasting. This works for any user defined function:
julia> #btime log_it.(2.4:0.001:2.405)
121.434 μs (73 allocations: 14.50 KiB)
Is equivalent to your map expression. What you're looking for I think is just a way to stack all the resulting vectors - you can use vcat and splatting for that:
julia> #btime vcat(log_it.(2.4:0.001:2.405)...)
122.837 μs (77 allocations: 14.84 KiB)
and just to confirm:
julia> vcat(log_it.(2.4:0.001:2.405)...) == with_loop()
true
So using broadcasting and concatenating gives the same result as your loop at the speed and memory cost of your map solution.
Related
I am trying to do a simple function to check the differences between factorial and Stirling's approximation:
using DataFrames
n = 24
df_res = DataFrame(i = BigInt[],
f = BigInt[],
st = BigInt[])
for i in 1:n
fac = factorial(big(i))
sterling = i^i*exp(-i)*sqrt(i)*sqrt(2*pi)
res = DataFrame(i = [i],
f = [fac],
st = [sterling])
df_res = [df_res;res]
end
first(df_res, 24)
The result for sterling when i= 16 and i= 24 is 0!. So, I checked power for both values and the result is 0:
julia> 16^16
0
julia> 24^24
0
I did the same code in R, and there are no issues. What am I doing wrong or what I don't know about Julia and I probably should?
It appears that Julia integers are either 32-bit or 64-bit, depending on your system, according to the Julia documentation for Integers and Floating-Point Numbers. Your exponentiation is overflowing your values, even if they're 64 bits.
Julia looks like it supports Arbitrary Precision Arithmetic, which you'll need to store the large resultant values.
According to the Overflow Section, writing big(n) makes n arbitrary precision.
While the question has been answered at the another post one more thing is worth saying.
Julia is one of very few languages that allows you to define your very own primitive types - so you can be still with fast fixed precision numbers yet handle huge values. There is a package BitIntegers for that:
BitIntegers.#define_integers 512
Now you can do:
julia> Int512(2)^500
3273390607896141870013189696827599152216642046043064789483291368096133796404674554883270092325904157150886684127560071009217256545885393053328527589376
Usually you will get better performance for for even big fixed point arithmetic numbers. For an example:
julia> #btime Int512(2)^500;
174.687 ns (0 allocations: 0 bytes)
julia> #btime big(2)^500;
259.643 ns (9 allocations: 248 bytes)
There is a simple solution to your problem that does not involve using BigInt or any specialized number types, and which is much faster. Simply tweak your mathematical expression slightly.
foo(i) = i^i*exp(-i)*sqrt(i)*sqrt(2*pi) # this is your function
bar(i) = (i / exp(1))^i * sqrt(i) * sqrt(2*pi) # here's a better way
OK, let's test it:
1.7.2> foo(16)
0.0 # oops. not what we wanted
1.7.2> foo(big(16)) # works
2.081411441522312838373895982304611417026205959453251524254923609974529540404514e+13
1.7.2> bar(16) # also works
2.0814114415223137e13
Let's try timing it:
1.7.2> using BenchmarkTools
1.7.2> #btime foo(n) setup=(n=16)
18.136 ns (0 allocations: 0 bytes)
0.0
1.7.2> #btime foo(n) setup=(n=big(16))
4.457 μs (25 allocations: 1.00 KiB) # horribly slow
2.081411441522312838373895982304611417026205959453251524254923609974529540404514e+13
1.7.2> #btime bar(n) setup=(n=16)
99.682 ns (0 allocations: 0 bytes) # pretty fast
2.0814114415223137e13
Edit: It seems like
baz(i) = float(i)^i * exp(-i) * sqrt(i) * sqrt(2*pi)
might be an even better solution, since the numerical values are closer to the original.
I have a concrete Array and want to efficiently construct a similar array of the same dimensions filled with ones. What would be the recommended approach?
Here's a random array to work with:
julia> A = rand(0:1, 10, 5)
10×5 Matrix{Int64}:
or A = rand(0:1., 10, 5) (with a dot on 0. and/or 1.) for a random matrix of floats.
Two approaches are very natural. I could do this:
julia> zero(A) .+ 1
5×10 Matrix{Int64}:
Or I could do it this way:
julia> repeat(ones(size(A)[2])', outer = size(A)[1])
5×10 Matrix{Float64}:
The first approach is more elegant. The second approach feels more clunky and prone to error (accidentally exchanging [1] and [2]), but at the same time it doesn't involve the addition operation and so possibly involves fewer allocations (or maybe not because the compiler is super smart Edit: quick benchmark below suggests the compiler is super smart).
And of course there may be another, better approach.
using BenchmarkTools
A = rand(0:1, 1000, 1000)
#btime zero(A) .+ 1
## 1.609 ms (6 allocations: 15.26 MiB)
#btime repeat(ones(size(A)[2])', outer = size(A)[1])
## 3.032 ms (10 allocations: 7.64 MiB)
Edit 2: Follow-up onBogumił's answer
The following method for a unit-array J, defined for convenience, is efficient:
function J(A::AbstractArray{T,N}) where {T,N}
ones(T, size(A))
end
J(A)
#btime J(A)
## 789.929 μs (2 allocations: 7.63 MiB)
What about:
ones(Int, size(A))
or
fill(1, size(A))
I am struggling with Julia every time that I need to collect data in an array "outside" functions.
If I use push!(element, array) I can collect data in the array, but if the code is inside a loop then the array "grows" each time.
What do you recommend?
I know is quite basic :) but thank you!
I assume the reason why you do not want to use push! is because you have used Matlab before where this sort of operation is painfully slow (to be precise, it is an O(n^2) operation, so doubling n quadruples the runtime). This is not true for Julia's push!, since push! uses the algorithm described here which is only O(n) (so doubling n only doubles the runtime).
You can easily check this experimentally. In Matlab we have
>> n = 100000; tic; a = []; for i = 1:n; a = [a;0]; end; toc
Elapsed time is 2.206152 seconds.
>> n = 200000; tic; a = []; for i = 1:n; a = [a;0]; end; toc
Elapsed time is 8.301130 seconds.
so indeed the runtime quadruples for an n of twice the size. In contrast, in Julia we have
julia> using BenchmarkTools
function myzeros(n)
a = Vector{Int}()
for i = 1:n
push!(a,0)
end
return a
end
#btime myzeros(100_000);
#btime myzeros(200_000);
486.054 μs (17 allocations: 2.00 MiB)
953.982 μs (18 allocations: 3.00 MiB)
so indeed the runtimes only doubles for an n of twice the size.
Long story short: if you know the size of the final array, then preallocating the array is always best (even in Julia). However, if you don't know the final array size, then you can use push! without losing too much performance.
I have two versions of code that seem to do the same thing:
sum = 0
for x in 1:100
sum += x
end
sum = 0
for x in collect(1:100)
sum += x
end
Is there a practical difference between the two approaches?
In Julia, 1:100 returns a particular struct called UnitRange that looks like this:
julia> dump(1:100)
UnitRange{Int64}
start: Int64 1
stop: Int64 100
This is a very compact struct to represent ranges with step 1 and arbitrary (finite) size. UnitRange is subtype of AbstractRange, a type to represent ranges with arbitrary step, subtype of AbstractVector.
The instances of UnitRange dynamically compute their elements whenever the you use getindex (or the syntactic sugar vector[index]). For example, with #less (1:100)[3] you can see this method:
function getindex(v::UnitRange{T}, i::Integer) where {T<:OverflowSafe}
#_inline_meta
val = v.start + (i - 1)
#boundscheck _in_unit_range(v, val, i) || throw_boundserror(v, i)
val % T
end
This is returning the i-th element of the vector by adding i - 1 to the first element (start) of the range. Some functions have optimised methods with UnitRange, or more generally with AbstractRange. For instance, with #less sum(1:100) you can see the following
function sum(r::AbstractRange{<:Real})
l = length(r)
# note that a little care is required to avoid overflow in l*(l-1)/2
return l * first(r) + (iseven(l) ? (step(r) * (l-1)) * (l>>1)
: (step(r) * l) * ((l-1)>>1))
end
This method uses the formula for the sum of an arithmetic progression, which is extremely efficient as it's evaluated in a time independent of the size of the vector.
On the other hand, collect(1:100) returns a plain Vector with one hundred elements 1, 2, 3, ..., 100. The main difference with UnitRange (or other types of AbstractRange) is that getindex(vector::Vector, i) (or vector[i], with vector::Vector) doesn't do any computation but simply accesses the i-th element of the vector. The downside of a Vector over a UnitRange is that generally speaking there aren't efficient methods when working with them as the elements of this container are completely arbitrary, while UnitRange represents a set of numbers with peculiar properties (sorted, constant step, etc...).
If you compare the performance of methods for which UnitRange has super-efficient implementations, this type will win hands down (note the use of interpolation of variables with $(...) when using macros from BenchmarkTools):
julia> using BenchmarkTools
julia> #btime sum($(1:1000_000))
0.012 ns (0 allocations: 0 bytes)
500000500000
julia> #btime sum($(collect(1:1000_000)))
229.979 μs (0 allocations: 0 bytes)
500000500000
Remember that UnitRange comes with the cost of dynamically computing the elements every time you access them with getindex. Consider for example this function:
function test(vec)
sum = zero(eltype(vec))
for idx in eachindex(vec)
sum += vec[idx]
end
return sum
end
Let's benchmark it with a UnitRange and a plain Vector:
julia> #btime test($(1:1000_000))
812.673 μs (0 allocations: 0 bytes)
500000500000
julia> #btime test($(collect(1:1000_000)))
522.828 μs (0 allocations: 0 bytes)
500000500000
In this case the function calling the plain array is faster than the one with a UnitRange because it doesn't have to dynamically compute 1 million elements.
Of course, in these toy examples it'd be more sensible to iterate over all elements of vec rather than its indices, but in real world cases a situation like these may be more sensible. This last example, however, shows that a UnitRange is not necessarily more efficient than a plain array, especially if you need to dynamically compute all of its elements. UnitRanges are more efficient when you can take advantage of specialised methods (like sum) for which the operation can be performed in constant time.
As a file remark, note that if you originally have a UnitRange it's not necessarily a good idea to convert it to a plain Vector to get good performance, especially if you're going to use it only once or very few times, as the conversion to Vector involves itself the dynamic computation of all elements of the range and the allocation of the necessary memory:
julia> #btime collect($(1:1000_000));
422.435 μs (2 allocations: 7.63 MiB)
julia> #btime test(collect($(1:1000_000)))
882.866 μs (2 allocations: 7.63 MiB)
500000500000
In Julia I want to find the column index of a matrix for the maximum value in each row, with the result being a Vector{Int}. Here is how I am doing it currently (Samples has 7 columns and 10,000 rows):
mxindices = [ i[2] for i in findmax(Samples, dims = 2)[2]][:,1]
This works but feels rather clumsy and verbose. Wondered if there was a better way.
Even simpler: Julia has an argmax function and Julia 1.1+ has an eachrow iterator. Thus:
map(argmax, eachrow(x))
Simple, readable, and fast — it matches the performance of Colin's f3 and f4 in my quick tests.
UPDATE: For the sake of completeness, I've added Matt B.'s excellent solution to the test-suite (and I also forced the transpose in f4 to generate a new matrix rather than a lazy view).
Here are some different approaches (yours is the base-case f0):
f0(x) = [ i[2] for i in findmax(x, dims = 2)[2]][:,1]
f1(x) = getindex.(argmax(x, dims=2), 2)
f2(x) = [ argmax(vec(x[n,:])) for n = 1:size(x,1) ]
f3(x) = [ argmax(vec(view(x, n, :))) for n = 1:size(x,1) ]
f4(x) = begin ; xt = Matrix{Float64}(transpose(x)) ; [ argmax(view(xt, :, k)) for k = 1:size(xt,2) ] ; end
f5(x) = map(argmax, eachrow(x))
Using BenchmarkTools we can examine the efficiency of each (I've set x = rand(100, 200)):
julia> #btime f0($x);
76.846 μs (13 allocations: 4.64 KiB)
julia> #btime f1($x);
76.594 μs (11 allocations: 3.75 KiB)
julia> #btime f2($x);
53.433 μs (103 allocations: 177.48 KiB)
julia> #btime f3($x);
43.477 μs (3 allocations: 944 bytes)
julia> #btime f4($x);
73.435 μs (6 allocations: 157.27 KiB)
julia> #btime f5($x);
43.900 μs (4 allocations: 960 bytes)
So Matt's approach is the fairly obvious winner, as it appears to just be a syntactically cleaner version of my f3 (the two probably compile to something very similar, but I think it would be overkill to check that).
I was hoping f4 might have an edge, despite the temporary created via instantiating the transpose, since it could operate on the columns of a matrix rather than the rows (Julia is a column-major language, so operations on columns will always be faster since the elements are synchronous in memory). But it doesn't appear to be enough to overcome the disadvantage of the temporary.
Note, if it is ever the case that you want the full CartesianIndex, that is, both the row and column index of the maximum in each row, then obviously the appropriate solution is just argmax(x, dims=2).
Mapslices function is also a great option for this problem:
julia> Samples = rand(10000, 7);
julia> res = mapslices(row -> findmax(row)[2], Samples, dims=[2])[:,1];
julia> res[1:10]
10-element Array{Int64,1}:
3
1
3
5
4
4
1
4
5
3
Although this is a lot slower than what Colin suggested above, it might be more readable for some people. This is essentially exactly the same code as you started out with, but uses mapslices instead of list comprehensions.