I have a set containing a single item, in this case a string:
b = Set(["A"])
I want to get that single item out. What's the best way of doing this? The only way I can figure out to do it is using a loop:
single_item = ""
for item in b
single_item = item
end
which gets my what I need
julia> single_item
"A"
but I feel like there must be an easier way.
how about
julia> collect(b)[1]
"A"
edit
as the legendary Dan Getz suggested, consider doing
julia> collect(take(b,1))[1]
"A"
if memory is an issue
I suggest first
julia> b = Set(["A"])
Set(ASCIIString["A"])
julia> first(b)
"A"
We can profile this, looking at number of allocations. (since memory allocating is slow). I would ignore the actual timing since this is a single run.
The results shown are the second run of each call. with b declared const.
julia> #time first(b)
0.000003 seconds (4 allocations: 160 bytes)
"A"
julia> #time collect(b)[1]
0.000005 seconds (5 allocations: 240 bytes)
"A"
julia> #time first(next(b,start(b)))
0.000007 seconds (5 allocations: 192 bytes)
"A"
Related
I have a concrete Array and want to efficiently construct a similar array of the same dimensions filled with ones. What would be the recommended approach?
Here's a random array to work with:
julia> A = rand(0:1, 10, 5)
10×5 Matrix{Int64}:
or A = rand(0:1., 10, 5) (with a dot on 0. and/or 1.) for a random matrix of floats.
Two approaches are very natural. I could do this:
julia> zero(A) .+ 1
5×10 Matrix{Int64}:
Or I could do it this way:
julia> repeat(ones(size(A)[2])', outer = size(A)[1])
5×10 Matrix{Float64}:
The first approach is more elegant. The second approach feels more clunky and prone to error (accidentally exchanging [1] and [2]), but at the same time it doesn't involve the addition operation and so possibly involves fewer allocations (or maybe not because the compiler is super smart Edit: quick benchmark below suggests the compiler is super smart).
And of course there may be another, better approach.
using BenchmarkTools
A = rand(0:1, 1000, 1000)
#btime zero(A) .+ 1
## 1.609 ms (6 allocations: 15.26 MiB)
#btime repeat(ones(size(A)[2])', outer = size(A)[1])
## 3.032 ms (10 allocations: 7.64 MiB)
Edit 2: Follow-up onBogumił's answer
The following method for a unit-array J, defined for convenience, is efficient:
function J(A::AbstractArray{T,N}) where {T,N}
ones(T, size(A))
end
J(A)
#btime J(A)
## 789.929 μs (2 allocations: 7.63 MiB)
What about:
ones(Int, size(A))
or
fill(1, size(A))
Let's say I have a vector
v = Any[1,2,3,4]
And I would like to recompute its eltype in such a way that
typeof(v) = Vector{Int}
Is it possible to accomplish this without having to manually concatenate each of the elements in v?
You can't "retype" the existing v, just create a copy of it with the more concrete type1.
Conversion
Assuming you already (statically) know the result type, you have multiple options. Most readable (and IMO, idiomatic) would be
Vector{Int}(v)
which is almost equivalent to
convert(Vector{Int}, v)
except that the latter does not copy if the input types is already the target type. Alternatively:
convert.(Int, v)
which surely copies as well.
What to convert to
If you don't know what the "common type" would be, there are multiple options how to get one that matches. In general, typejoin can be used to find a least upper bound:
mapreduce(typeof, typejoin, v; init=Union{})
The result will most likely be abstract, e.g. Real for an array of Ints and Float64s. So, for numeric types, you might be better off with promote_type:
mapreduce(typeof, promote_type, v; init=Union{}) # or init=Number
This at least gives you Float64 for mixed Ints and Float64s.
But all of this is not really recommended, since it might be fragile, surprising, and is certainly not type stable.
1For certain combinations of types, with compatible binary form, reinterpret will work and return a view with a different type, but this is only possible for bits types, which Any is not. For converting Any[1,2,3] to Int[1,2,3] copying is fundamentally necessary because the two arrays have different layouts in memory: the former is an array of pointers to individually allocated integers objects, whereas the latter stores the Int values inline in contiguous memory.
If you don't know the output type, then consider using a comprehension
foo(v) = [x for x in v]
This is considerably faster on my computer than identity.(v):
julia> v = Any[1,2,3,4];
julia> #btime foo($v);
153.018 ns (2 allocations: 128 bytes)
julia> #btime identity.($v);
293.908 ns (5 allocations: 208 bytes)
julia> #btime foo(v) setup=(v=Any[rand(0:9) for _ in 1:1000]);
1.331 μs (2 allocations: 7.95 KiB)
julia> #btime identity.(v) setup=(v=Any[rand(0:9) for _ in 1:1000]);
25.498 μs (494 allocations: 15.67 KiB)
This is a quick and dirty trick that usually solves the problem
julia> v = Any[1,2,3,4]
4-element Array{Any,1}:
1
2
3
4
julia> identity.(v)
4-element Array{Int64,1}:
1
2
3
4
A programme I am writing has a user-written file containing parameters which are to be read in and implemented within the code. Users should be able to comment their input file by delimiting them with a comment character (I have gone with "#", in convention with Julia) - in parsing the input file, the code will remove these comments. Whilst making minor optimisations to this parser, I noted that instantiating the second variable prior to calling split() made a noticeable difference to the number allocations:
function removecomments1(line::String; dlm::String="#")
str::String = ""
try
str, tmp = split(line, dlm)
catch
str = line
finally
return str
end
end
function removecomments2(line::String; dlm::String="#")
str::String = ""
tmp::SubString{String} = ""
try
str, tmp = split(line, dlm)
catch
str = line
finally
return str
end
end
line = "Hello world # my comment"
#time removecomments1(line)
#time removecomments2(line)
$> 0.016092 seconds (27.31 k allocations: 1.367 MiB)
0.016164 seconds (31.26 k allocations: 1.548 MiB)
My intuition (coming from a C++ background) tells me that initialising both variables should have resulted in an increase in speed as well as minimising further allocations, since the compiler has already been told that a second variable is required as well as its corresponding type, however this doesn't appear to hold. Why would this be the case?
Aside: Are there any more efficient ways of achieving the same result as these functions?
EDIT:
Following a post by Oscar Smith, initialising str as type SubString{String} instead of String has reduced the allocations by around 10%:
$> 0.014811 seconds (24.29 k allocations: 1.246 MiB)
0.015045 seconds (28.25 k allocations: 1.433 MiB)
In your example, the only reason you need the try-catch block is because you're trying to destructure the output of split even though split will return a one element array when the input line has no comments. If you simply extract the first element from the output of split, then you can avoid the try-catch construct, which will save you time and memory:
julia> using BenchmarkTools
julia> removecomments3(line::String; dlm::String = "#") = first(split(line, dlm))
removecomments3 (generic function with 1 method)
julia> #btime removecomments1($line);
198.522 ns (5 allocations: 224 bytes)
julia> #btime removecomments2($line);
208.507 ns (6 allocations: 256 bytes)
julia> #btime removecomments3($line);
147.001 ns (4 allocations: 192 bytes)
In partial answer to your original question, pre-allocation is mainly used for arrays, not for strings or other scalars. For more discussion of when to use pre-allocation, check out this SO post.
To reason about what this is doing, think about what the split function would return if it was written in c++. It would not be copying, but would instead return a char*. As such, all that str::String = "" is doing is making Julia create an extra string object to ignore.
In Julia I want to find the column index of a matrix for the maximum value in each row, with the result being a Vector{Int}. Here is how I am doing it currently (Samples has 7 columns and 10,000 rows):
mxindices = [ i[2] for i in findmax(Samples, dims = 2)[2]][:,1]
This works but feels rather clumsy and verbose. Wondered if there was a better way.
Even simpler: Julia has an argmax function and Julia 1.1+ has an eachrow iterator. Thus:
map(argmax, eachrow(x))
Simple, readable, and fast — it matches the performance of Colin's f3 and f4 in my quick tests.
UPDATE: For the sake of completeness, I've added Matt B.'s excellent solution to the test-suite (and I also forced the transpose in f4 to generate a new matrix rather than a lazy view).
Here are some different approaches (yours is the base-case f0):
f0(x) = [ i[2] for i in findmax(x, dims = 2)[2]][:,1]
f1(x) = getindex.(argmax(x, dims=2), 2)
f2(x) = [ argmax(vec(x[n,:])) for n = 1:size(x,1) ]
f3(x) = [ argmax(vec(view(x, n, :))) for n = 1:size(x,1) ]
f4(x) = begin ; xt = Matrix{Float64}(transpose(x)) ; [ argmax(view(xt, :, k)) for k = 1:size(xt,2) ] ; end
f5(x) = map(argmax, eachrow(x))
Using BenchmarkTools we can examine the efficiency of each (I've set x = rand(100, 200)):
julia> #btime f0($x);
76.846 μs (13 allocations: 4.64 KiB)
julia> #btime f1($x);
76.594 μs (11 allocations: 3.75 KiB)
julia> #btime f2($x);
53.433 μs (103 allocations: 177.48 KiB)
julia> #btime f3($x);
43.477 μs (3 allocations: 944 bytes)
julia> #btime f4($x);
73.435 μs (6 allocations: 157.27 KiB)
julia> #btime f5($x);
43.900 μs (4 allocations: 960 bytes)
So Matt's approach is the fairly obvious winner, as it appears to just be a syntactically cleaner version of my f3 (the two probably compile to something very similar, but I think it would be overkill to check that).
I was hoping f4 might have an edge, despite the temporary created via instantiating the transpose, since it could operate on the columns of a matrix rather than the rows (Julia is a column-major language, so operations on columns will always be faster since the elements are synchronous in memory). But it doesn't appear to be enough to overcome the disadvantage of the temporary.
Note, if it is ever the case that you want the full CartesianIndex, that is, both the row and column index of the maximum in each row, then obviously the appropriate solution is just argmax(x, dims=2).
Mapslices function is also a great option for this problem:
julia> Samples = rand(10000, 7);
julia> res = mapslices(row -> findmax(row)[2], Samples, dims=[2])[:,1];
julia> res[1:10]
10-element Array{Int64,1}:
3
1
3
5
4
4
1
4
5
3
Although this is a lot slower than what Colin suggested above, it might be more readable for some people. This is essentially exactly the same code as you started out with, but uses mapslices instead of list comprehensions.
I'm curious why Julias implementation of matrix addition appears to make a copy. Heres an example:
foo1=rand(1000,1000)
foo2=rand(1000,1000)
foo3=rand(1000,1000)
julia> #time foo1=foo2+foo3;
0.001719 seconds (9 allocations: 7.630 MB)
julia> sizeof(foo1)/10^6
8.0
The amount of memory allocated is roughly the same as the memory required by a matrix of these dimensions.
It looks like in order to process foo2+foo3 memory is allocated to store the result and then foo1 is assigned to it by reference.
Does this mean that for most linear algebra operations we need to call BLAS and LAPACK functions directly to do things in place?
To understand what is going on here, let's consider what foo1 = foo2 + foo3 actually does.
First it evaluates foo2 + foo3. To do this it will allocate a new temporary array to hold the output
Then it will bind the name foo1 to this new temporary array, undoing all effort you put in to pre-allocate the output array.
In short, you see that memory usage is about that of the resultant array because the routine is indeed allocating new memory for an array of that size.
Here are some alternatives:
write a loop
use broadcast!
We could try do do copy!(foo1, foo2+foo3) and then the array you pre-allocated will be filled, but it will still allocate the temporary (see below)
The original version posted here
Here's some code for those 4 cases
julia> function with_loop!(foo1, foo2, foo3)
for i in eachindex(foo2)
foo1[i] = foo2[i] + foo3[i]
end
end
julia> function with_broadcast!(foo1, foo2, foo3)
broadcast!(+, foo1, foo2, foo3)
end
julia> function with_copy!(foo1, foo2, foo3)
copy!(foo1, foo2+foo3)
end
julia> function original(foo1, foo2, foo3)
foo1 = foo2 + foo3
end
Now let's time these functions
julia> for f in [:with_broadcast!, :with_loop!, :with_copy!, :original]
#eval $f(foo1, foo2, foo3) # compile
println("timing $f")
#eval #time $f(foo1, foo2, foo3)
end
timing with_broadcast!
0.001787 seconds (5 allocations: 192 bytes)
timing with_loop!
0.001783 seconds (4 allocations: 160 bytes)
timing with_copy!
0.003604 seconds (9 allocations: 7.630 MB)
timing original
0.002702 seconds (9 allocations: 7.630 MB, 97.91% gc time)
You can see that with_loop! and broadcast! do about the same and both are much faster and more efficient than the others. with_copy! and original are both slower and use more memory.
In general, to do inplace operations I'd recommend starting out by writing a loop
First, read #spencerlyon2's answer. Another approach is to use Dahua Lin's package Devectorize.jl. It defines the #devec macro which automates the translations of vector (array) expressions into looped code.
In this example we will define with_devec!(foo1,foo2,foo3) as follows,
julia> using Devectorize # install with Pkg.add("Devectorize")
julia> with_devec!(foo1,foo2,foo3) = #devec foo1[:]=foo2+foo3
Running the benchmark achieves the 4 allocations results.
You can use axpy! function from the LinearAlgebra package.
using LinearAlgebra
julia> #time BLAS.axpy!(1., foo2, foo3)
0.002187 seconds (4 allocations: 160 bytes)