I am new to Julia. Just a quick question: will using the type Int64 slow down a calculation compared with using Int32?
Like,
i=1::Int64
j=1::Int64
and I want to calculate
i+j
If I define i and j as Int32, will that make i+j faster than if i and j are Int64?
Thanks!
I know that in Fortran, 8-byte integers (INTEGER*8) can be much slower than 4-byte ones (INTEGER*4). Not sure if it is the same in Julia.
The time will depend on the size of the object.
Consider the following function that does some multiplication, addition and comparison of integers:
min200(T) = minimum(x*x+x for x in UnitRange{T}(1:200))
Here are the times using BenchmarkTools:
julia> @btime min200(Int16);
9.409 ns (0 allocations: 0 bytes)
julia> @btime min200(Int32);
14.329 ns (0 allocations: 0 bytes)
julia> @btime min200(Int64);
47.267 ns (0 allocations: 0 bytes)
julia> @btime min200(Int128);
256.160 ns (0 allocations: 0 bytes)
Note that this difference goes all the way down to the assembly code. Let us see how an addition of two integer numbers gets compiled.
julia> @code_native +(Int32(5), Int32(7))
        .text
; ┌ @ int.jl:87 within `+`
        pushq   %rbp
        movq    %rsp, %rbp
        leal    (%rcx,%rdx), %eax
        popq    %rbp
        retq
        nopl    (%rax)
; └
julia> @code_native +(5,7)
        .text
; ┌ @ int.jl:87 within `+`
        pushq   %rbp
        movq    %rsp, %rbp
        leaq    (%rcx,%rdx), %rax
        popq    %rbp
        retq
        nopw    (%rax,%rax)
; └
You can see that the 32-bit addition uses the 32-bit register %eax, while the 64-bit addition in the second listing uses the 64-bit register %rax.
Moreover, when you use features such as @simd or GPU computing via CuArrays, those differences might turn out to be even more significant.
First of all, all of this is architecture dependent. I will be assuming relatively modern 64 bit x86 CPUs for this answer.
For scalar calculations, Int64 and Int32 will be the same speed (Int16/Int8 are roughly the same, though occasionally slightly slower). For vectorized computation, e.g. rand(Int64, 100) .+ rand(Int64, 100), Int64 will be about 2x slower than Int32, because the narrower type gets better cache usage and higher vectorization widths (with AVX2/AVX-512 etc.).
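If you want to check the vectorized case on your own machine, here is a minimal sketch (assuming BenchmarkTools is installed; vector length and exact ratios are arbitrary):
using BenchmarkTools
a32, b32 = rand(Int32, 10_000), rand(Int32, 10_000)
a64, b64 = rand(Int64, 10_000), rand(Int64, 10_000)
# Int32 vectors are half the size, so twice as many elements fit
# per cache line and per SIMD register.
@btime $a32 .+ $b32;
@btime $a64 .+ $b64;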
Let's say I have a vector
v = Any[1,2,3,4]
And I would like to recompute its eltype in such a way that
typeof(v) == Vector{Int}
Is it possible to accomplish this without having to manually concatenate each of the elements in v?
You can't "retype" the existing v, only create a copy of it with the more concrete type¹.
Conversion
Assuming you already (statically) know the result type, you have multiple options. Most readable (and IMO, idiomatic) would be
Vector{Int}(v)
which is almost equivalent to
convert(Vector{Int}, v)
except that the latter does not copy if the input type is already the target type. Alternatively:
convert.(Int, v)
which always copies as well.
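To see the aliasing difference concretely, a quick REPL sketch (=== checks whether two names refer to the very same array):
julia> w = [1, 2, 3];
julia> convert(Vector{Int}, w) === w   # no copy: w already has the target type
true
julia> Vector{Int}(w) === w            # the constructor always copies
false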
What to convert to
If you don't know what the "common type" should be, there are multiple ways to get one that matches. In general, typejoin can be used to find a least upper bound:
mapreduce(typeof, typejoin, v; init=Union{})
The result will most likely be abstract, e.g. Real for an array of Ints and Float64s. So, for numeric types, you might be better off with promote_type:
mapreduce(typeof, promote_type, v; init=Union{}) # or init=Number
This at least gives you Float64 for mixed Ints and Float64s.
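For example, on a mixed array (a quick REPL check of both reductions):
julia> v = Any[1, 2.0, 3];
julia> mapreduce(typeof, typejoin, v; init=Union{})
Real
julia> mapreduce(typeof, promote_type, v; init=Union{})
Float64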
But all of this is not really recommended, since it might be fragile, surprising, and is certainly not type stable.
¹ For certain combinations of types with compatible binary representation, reinterpret will work and return a view with a different type, but this is only possible for bits types, which Any is not. For converting Any[1,2,3] to Int[1,2,3], copying is fundamentally necessary because the two arrays have different layouts in memory: the former is an array of pointers to individually allocated integer objects, whereas the latter stores the Int values inline in contiguous memory.
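For completeness, here is what the reinterpret route looks like when both element types are bits types of the same size (a small sketch; Int64 and UInt64 are both 8 bytes):
x = Int64[1, 2, 3]
y = reinterpret(UInt64, x)  # a view sharing x's memory; no copy
y[1] = 0x2a                 # mutations are visible through x
x[1] == 42                  # true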
If you don't know the output type, then consider using a comprehension
foo(v) = [x for x in v]
This is considerably faster on my computer than identity.(v):
julia> v = Any[1,2,3,4];
julia> @btime foo($v);
153.018 ns (2 allocations: 128 bytes)
julia> @btime identity.($v);
293.908 ns (5 allocations: 208 bytes)
julia> @btime foo(v) setup=(v=Any[rand(0:9) for _ in 1:1000]);
1.331 μs (2 allocations: 7.95 KiB)
julia> @btime identity.(v) setup=(v=Any[rand(0:9) for _ in 1:1000]);
25.498 μs (494 allocations: 15.67 KiB)
This is a quick and dirty trick that usually solves the problem:
julia> v = Any[1,2,3,4]
4-element Array{Any,1}:
1
2
3
4
julia> identity.(v)
4-element Array{Int64,1}:
1
2
3
4
I have two versions of code that seem to do the same thing:
sum = 0
for x in 1:100
    sum += x
end

sum = 0
for x in collect(1:100)
    sum += x
end
Is there a practical difference between the two approaches?
In Julia, 1:100 returns a particular struct called UnitRange that looks like this:
julia> dump(1:100)
UnitRange{Int64}
start: Int64 1
stop: Int64 100
This is a very compact struct to represent ranges with step 1 and arbitrary (finite) size. UnitRange is a subtype of AbstractRange, a type representing ranges with arbitrary step, which is itself a subtype of AbstractVector.
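Both claims are easy to verify in the REPL:
julia> UnitRange{Int} <: AbstractRange <: AbstractVector
true
julia> sizeof(1:100) == sizeof(1:1_000_000_000)   # 16 bytes either way
true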
The instances of UnitRange dynamically compute their elements whenever you use getindex (or the syntactic sugar vector[index]). For example, with @less (1:100)[3] you can see this method:
function getindex(v::UnitRange{T}, i::Integer) where {T<:OverflowSafe}
    @_inline_meta
    val = v.start + (i - 1)
    @boundscheck _in_unit_range(v, val, i) || throw_boundserror(v, i)
    val % T
end
This returns the i-th element of the range by adding i - 1 to its first element (start). Some functions have optimised methods for UnitRange, or more generally for AbstractRange. For instance, with @less sum(1:100) you can see the following:
function sum(r::AbstractRange{<:Real})
    l = length(r)
    # note that a little care is required to avoid overflow in l*(l-1)/2
    return l * first(r) + (iseven(l) ? (step(r) * (l-1)) * (l>>1)
                                     : (step(r) * l) * ((l-1)>>1))
end
This method uses the formula for the sum of an arithmetic progression, which is extremely efficient because it runs in time independent of the length of the range.
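For a unit range starting at 1, this reduces to the familiar n(n+1)/2 formula; a quick sanity check:
julia> n = 100; sum(1:n) == n * (n + 1) ÷ 2
true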
On the other hand, collect(1:100) returns a plain Vector with one hundred elements 1, 2, 3, ..., 100. The main difference with UnitRange (or other types of AbstractRange) is that getindex(vector::Vector, i) (or vector[i], with vector::Vector) doesn't do any computation but simply accesses the i-th element of the vector. The downside of a Vector over a UnitRange is that generally speaking there aren't efficient methods when working with them as the elements of this container are completely arbitrary, while UnitRange represents a set of numbers with peculiar properties (sorted, constant step, etc...).
If you compare the performance of methods for which UnitRange has super-efficient implementations, this type will win hands down (note the use of interpolation of variables with $(...) when using macros from BenchmarkTools):
julia> using BenchmarkTools
julia> @btime sum($(1:1000_000))
0.012 ns (0 allocations: 0 bytes)
500000500000
julia> @btime sum($(collect(1:1000_000)))
229.979 μs (0 allocations: 0 bytes)
500000500000
Remember that UnitRange comes with the cost of dynamically computing the elements every time you access them with getindex. Consider for example this function:
function test(vec)
    sum = zero(eltype(vec))
    for idx in eachindex(vec)
        sum += vec[idx]
    end
    return sum
end
Let's benchmark it with a UnitRange and a plain Vector:
julia> @btime test($(1:1000_000))
812.673 μs (0 allocations: 0 bytes)
500000500000
julia> @btime test($(collect(1:1000_000)))
522.828 μs (0 allocations: 0 bytes)
500000500000
In this case the function calling the plain array is faster than the one with a UnitRange because it doesn't have to dynamically compute 1 million elements.
Of course, in these toy examples it'd be more sensible to iterate over all elements of vec rather than over its indices, but in real-world code such a situation can easily come up. This last example, however, shows that a UnitRange is not necessarily more efficient than a plain array, especially if you need to dynamically compute all of its elements. UnitRanges are more efficient when you can take advantage of specialised methods (like sum) for which the operation can be performed in constant time.
As a final remark, note that if you originally have a UnitRange it's not necessarily a good idea to convert it to a plain Vector to get good performance, especially if you're going to use it only once or a handful of times, as the conversion to Vector itself involves dynamically computing all elements of the range and allocating the necessary memory:
julia> @btime collect($(1:1000_000));
422.435 μs (2 allocations: 7.63 MiB)
julia> @btime test(collect($(1:1000_000)))
882.866 μs (2 allocations: 7.63 MiB)
500000500000
A programme I am writing has a user-written file containing parameters which are to be read in and used within the code. Users should be able to comment their input file by delimiting comments with a comment character (I have gone with "#", in keeping with Julia convention); when parsing the input file, the code removes these comments. Whilst making minor optimisations to this parser, I noted that instantiating the second variable prior to calling split() made a noticeable difference to the number of allocations:
function removecomments1(line::String; dlm::String="#")
    str::String = ""
    try
        str, tmp = split(line, dlm)
    catch
        str = line
    finally
        return str
    end
end

function removecomments2(line::String; dlm::String="#")
    str::String = ""
    tmp::SubString{String} = ""
    try
        str, tmp = split(line, dlm)
    catch
        str = line
    finally
        return str
    end
end
line = "Hello world # my comment"
@time removecomments1(line)
@time removecomments2(line)
$> 0.016092 seconds (27.31 k allocations: 1.367 MiB)
0.016164 seconds (31.26 k allocations: 1.548 MiB)
My intuition (coming from a C++ background) tells me that initialising both variables should have increased speed and minimised further allocations, since the compiler has already been told that a second variable is required along with its corresponding type; however, this doesn't appear to hold. Why would this be the case?
Aside: Are there any more efficient ways of achieving the same result as these functions?
EDIT:
Following a post by Oscar Smith, initialising str as type SubString{String} instead of String has reduced the allocations by around 10%:
$> 0.014811 seconds (24.29 k allocations: 1.246 MiB)
0.015045 seconds (28.25 k allocations: 1.433 MiB)
In your example, the only reason you need the try-catch block is that you're trying to destructure the output of split even though split returns a one-element array when the input line has no comments. If you simply extract the first element from the output of split, then you can avoid the try-catch construct, which will save you time and memory:
julia> using BenchmarkTools
julia> removecomments3(line::String; dlm::String = "#") = first(split(line, dlm))
removecomments3 (generic function with 1 method)
julia> @btime removecomments1($line);
198.522 ns (5 allocations: 224 bytes)
julia> @btime removecomments2($line);
208.507 ns (6 allocations: 256 bytes)
julia> @btime removecomments3($line);
147.001 ns (4 allocations: 192 bytes)
In partial answer to your original question, pre-allocation is mainly used for arrays, not for strings or other scalars. For more discussion of when to use pre-allocation, check out this SO post.
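To illustrate the array case where pre-allocation does pay off, here is a minimal sketch (function names are made up for the example):
# Without pre-allocation: the vector grows (and reallocates) as we push.
function squares_push(n)
    out = Int[]
    for i in 1:n
        push!(out, i^2)
    end
    out
end

# With pre-allocation: the output vector is created once up front.
function squares_prealloc(n)
    out = Vector{Int}(undef, n)  # one allocation
    for i in 1:n
        out[i] = i^2
    end
    out
end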
To reason about what this is doing, think about what the split function would return if it were written in C++. It would not copy, but would instead return a char*. As such, all that str::String = "" is doing is making Julia create an extra string object that is then thrown away.
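You can observe the analogous behaviour in Julia: split returns SubStrings, which reference the bytes of the original string rather than copying them (quick REPL check):
julia> parts = split("Hello world # my comment", "#");
julia> typeof(parts)
Array{SubString{String},1}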
In Julia I want to find the column index of a matrix for the maximum value in each row, with the result being a Vector{Int}. Here is how I am doing it currently (Samples has 7 columns and 10,000 rows):
mxindices = [ i[2] for i in findmax(Samples, dims = 2)[2]][:,1]
This works but feels rather clumsy and verbose. I wondered if there was a better way.
Even simpler: Julia has an argmax function and Julia 1.1+ has an eachrow iterator. Thus:
map(argmax, eachrow(x))
Simple, readable, and fast — it matches the performance of Colin's f3 and f4 in my quick tests.
UPDATE: For the sake of completeness, I've added Matt B.'s excellent solution to the test-suite (and I also forced the transpose in f4 to generate a new matrix rather than a lazy view).
Here are some different approaches (yours is the base-case f0):
f0(x) = [ i[2] for i in findmax(x, dims = 2)[2]][:,1]
f1(x) = getindex.(argmax(x, dims=2), 2)
f2(x) = [ argmax(vec(x[n,:])) for n = 1:size(x,1) ]
f3(x) = [ argmax(vec(view(x, n, :))) for n = 1:size(x,1) ]
f4(x) = begin ; xt = Matrix{Float64}(transpose(x)) ; [ argmax(view(xt, :, k)) for k = 1:size(xt,2) ] ; end
f5(x) = map(argmax, eachrow(x))
Using BenchmarkTools we can examine the efficiency of each (I've set x = rand(100, 200)):
julia> @btime f0($x);
76.846 μs (13 allocations: 4.64 KiB)
julia> @btime f1($x);
76.594 μs (11 allocations: 3.75 KiB)
julia> @btime f2($x);
53.433 μs (103 allocations: 177.48 KiB)
julia> @btime f3($x);
43.477 μs (3 allocations: 944 bytes)
julia> @btime f4($x);
73.435 μs (6 allocations: 157.27 KiB)
julia> @btime f5($x);
43.900 μs (4 allocations: 960 bytes)
So Matt's approach is the fairly obvious winner, as it appears to just be a syntactically cleaner version of my f3 (the two probably compile to something very similar, but I think it would be overkill to check that).
I was hoping f4 might have an edge, despite the temporary created by instantiating the transpose, since it could operate on the columns of a matrix rather than the rows (Julia stores arrays in column-major order, so operations on columns are generally faster because the elements are contiguous in memory). But that isn't enough to overcome the cost of the temporary.
Note, if it is ever the case that you want the full CartesianIndex, that is, both the row and column index of the maximum in each row, then obviously the appropriate solution is just argmax(x, dims=2).
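For instance, on a small matrix (illustrative values):
julia> x = [1 5 2; 9 3 4];
julia> argmax(x, dims=2)
2×1 Array{CartesianIndex{2},2}:
 CartesianIndex(1, 2)
 CartesianIndex(2, 1)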
The mapslices function is also a great option for this problem:
julia> Samples = rand(10000, 7);
julia> res = mapslices(row -> findmax(row)[2], Samples, dims=[2])[:,1];
julia> res[1:10]
10-element Array{Int64,1}:
3
1
3
5
4
4
1
4
5
3
Although this is a lot slower than what Colin suggested above, it might be more readable for some people. It is essentially the same code you started out with, but uses mapslices instead of a comprehension.
I'm having trouble converting an array of numeric strings into an array of corresponding floating point numbers. A (hypothetical) string array is:
arr = ["8264.", "7.1050^-7", "9970.", "2.1090^-6", "5.2378^-7"]
I would like to convert it into:
arr = [8264., 1.0940859076672388e-6, 9970., 0.011364243260505457, 9.246079446497013e-6]
As a novice in Julia, I have no clue how to make the power operator "^" inside the strings do the right thing in the conversion. I highly appreciate your suggestions!
This function parses both forms, with or without the exponent:
function foo(s)
    a = parse.(Float64, split(s, '^'))
    length(a) > 1 && return a[1]^a[2]
    a[1]
end
Somewhat ugly, but gets the job done:
eval.(Meta.parse.(arr))
UPDATE:
Let me elaborate a bit on what this does and why it's maybe not good style.
Meta.parse converts a String into a Julia expression. The dot indicates that we want to broadcast Meta.parse over every string in arr, that is, apply it to every element. Afterwards, we use eval, again broadcast, to evaluate the expressions.
This produces the correct result as it literally takes every string as a Julia "command" and hence knows that ^ indicates a power. However, besides being slow, this is potentially insecure as one could inject arbitrary Julia code.
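A harmless demonstration of the injection risk (any valid Julia inside the strings will execute):
julia> arr_evil = ["8264.", "println(\"arbitrary code runs here!\")"];
julia> eval.(Meta.parse.(arr_evil));
arbitrary code runs here!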
UPDATE:
A safer and faster way to obtain the desired result is to define a short function that does the conversion:
julia> function mystr2float(s)
           !occursin('^', s) && return parse(Float64, s)
           x = parse.(Float64, split(s, '^'))
           return x[1]^x[2]
       end
mystr2float (generic function with 1 method)
julia> mystr2float.(arr)
5-element Array{Float64,1}:
8264.0
1.0940859076672388e-6
9970.0
0.011364243260505457
9.246079446497013e-6
julia> using BenchmarkTools
julia> @btime eval.(Meta.parse.($arr));
651.000 μs (173 allocations: 9.27 KiB)
julia> @btime mystr2float.($arr);
5.567 μs (18 allocations: 1.02 KiB)
UPDATE:
Performance comparison with @dberge's suggestion below:
julia> @btime mystr2float.($arr);
5.516 μs (18 allocations: 1.02 KiB)
julia> @btime foo.($arr);
5.767 μs (24 allocations: 1.47 KiB)