How can I check if a string is empty? - julia

How can I check if a string is empty?
I am currently using the == operator:
julia> x = "";
julia> x == "";
true

Use isempty. It is more explicit and more likely to be optimized for its use case.
For example, on the latest Julia:
julia> using BenchmarkTools
julia> myisempty(x::String) = x == ""
myisempty (generic function with 1 method)
julia> @btime myisempty("")
2.732 ns (0 allocations: 0 bytes)
true
julia> @btime myisempty("bar")
3.001 ns (0 allocations: 0 bytes)
false
julia> @btime isempty("")
1.694 ns (0 allocations: 0 bytes)
true
julia> @btime isempty("bar")
1.594 ns (0 allocations: 0 bytes)
false
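A nice side effect is that isempty is generic, so the same call works for strings, arrays, dictionaries, ranges, and other collections. A small illustrative sketch (not from the original answer):
isempty("")       # true
isempty(Int[])    # true
isempty(Dict())   # true
isempty(1:0)      # true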

Related

In-place opposite (additive inverse) of array

In Julia, one can find conj and conj! for, respectively, the out-of-place and in-place conjugate of a complex-valued object. Surprisingly, I could not find an in-place version of the opposite (additive inverse) of an array.
The main interest is allocation: the in-place version allocates nothing.
Here is some benchmarking.
# using BenchmarkTools
a = rand(ComplexF64,1000,1000);
@btime conj($a);
@btime conj!($a);
@btime -$a;
@btime -1 .* $a;
@btime flipsign.(a,-1);
@btime .-$a;
julia> using BenchmarkTools
julia> a = rand(ComplexF64,1000,1000);
julia> @btime conj($a);
3.594 ms (2 allocations: 15.26 MiB)
julia> @btime conj!($a);
979.401 μs (0 allocations: 0 bytes)
julia> @btime -$a;
3.594 ms (2 allocations: 15.26 MiB)
julia> @btime -1 .* $a;
3.586 ms (2 allocations: 15.26 MiB)
julia> @btime flipsign.(a,-1);
3.588 ms (4 allocations: 15.26 MiB)
julia> @btime .-$a;
3.588 ms (2 allocations: 15.26 MiB)
In all cases except the one with conj! you are also measuring the allocation of the result of the expression. This goes away if you use broadcast assignment:
julia> @btime(@. $a = conj($a));
2.409 ms (0 allocations: 0 bytes)
julia> @btime(@. $a = -($a));
2.386 ms (0 allocations: 0 bytes)
The @. macro is just a shortcut for "put dots everywhere", as in a .= .-(a).
Maybe your confusion arises from thinking about Julia types the wrong way. Unlike (I believe) in Matlab, scalars and arrays are strictly different. conj operates on complex scalars, while conj! is just a helper for arrays of complex numbers, equivalent to a -> a .= conj.(a). In-place operations can naturally only work on arrays (i.e., reference types), not scalars.
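If you want a named in-place negation, a minimal sketch built on the broadcast assignment above could look like this (neg! is just an illustrative name, not a function from Base):
neg!(a::AbstractArray) = (a .= .-a; a)   # in-place additive inverse, no allocations
a = rand(ComplexF64, 1000, 1000);
neg!(a);                                 # overwrites a with -a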

Is there a lazy `filter` in Julia?

In Python one can use if in a list comprehension to filter out elements. Is there a lazy filter equivalent in Julia?
for x in filter(x->x<2, 1:3)
println(x)
end
works and prints only 1, but filter(x->x<2, 1:3) is eager, so it may not be desirable for billions of records.
You can do this just like in Python:
julia> function f()
           for x in (i for i in 1:10^9 if i == 10^9)
               println(x)
           end
       end
f (generic function with 1 method)
julia> @time f()
1000000000
3.293702 seconds (139.87 k allocations: 7.107 MiB)
julia> @time f()
1000000000
3.224707 seconds (11 allocations: 352 bytes)
and you see that it does not allocate. But it is faster to just perform a filter test inside the loop without using a generator:
julia> function g()
           for x in 1:10^9
               x == 10^9 && println(x)
           end
       end
g (generic function with 1 method)
julia> @time g()
1000000000
2.098305 seconds (53.49 k allocations: 2.894 MiB)
julia> @time g()
1000000000
2.094018 seconds (11 allocations: 352 bytes)
Edit: Finally, you can use Iterators.filter:
julia> function h()
           for x in Iterators.filter(==(10^9), 1:10^9)
               println(x)
           end
       end
h (generic function with 1 method)
julia> @time h()
1000000000
0.390966 seconds (127.96 k allocations: 6.599 MiB)
julia> @time h()
1000000000
0.311650 seconds (12 allocations: 688 bytes)
which in this case will be fastest (see also https://docs.julialang.org/en/latest/base/iterators/#Iteration-utilities-1).
You might also want to check out https://github.com/JuliaCollections/IterTools.jl.
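Note that Iterators.filter returns a lazy iterator object rather than a collected array, which is what makes it suitable for billions of records. A small illustrative sketch (not from the original answer):
it = Iterators.filter(x -> x < 2, 1:3)   # lazy iterator; no array is allocated here
collect(it)                              # [1]; elements are only materialized on demand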
EDIT 2
Sometimes Julia is more powerful than you would think. Check this out:
julia> function g2()
           for x in 1:1_000_000_000
               x == 1_000_000_000 && println(x)
           end
       end
g2 (generic function with 1 method)
julia> @time g2()
1000000000
0.029332 seconds (62.91 k allocations: 3.244 MiB)
julia> @time g2()
1000000000
0.000636 seconds (11 allocations: 352 bytes)
and we see that the compiler has essentially compiled away all of the computation.
In essence, in the earlier Iterators.filter example constant propagation kicked in and replaced 10^9 with the literal 1_000_000_000.
Therefore we have to devise a smarter test. Here it goes:
julia> using BenchmarkTools
julia> function f_rand(x)
           s = 0.0
           for v in (v for v in x if 0.1 < v < 0.2)
               s += v
           end
           s
       end
f_rand (generic function with 1 method)
julia> function g_rand(x)
           s = 0.0
           for v in x
               if 0.1 < v < 0.2
                   s += v
               end
           end
           s
       end
g_rand (generic function with 1 method)
julia> function h_rand(x)
           s = 0.0
           for v in Iterators.filter(v -> 0.1 < v < 0.2, x)
               s += v
           end
           s
       end
h_rand (generic function with 1 method)
julia> x = rand(10^6);
julia> @btime f_rand($x)
2.032 ms (0 allocations: 0 bytes)
14922.291597613703
julia> @btime g_rand($x)
1.804 ms (0 allocations: 0 bytes)
14922.291597613703
julia> @btime h_rand($x)
2.035 ms (0 allocations: 0 bytes)
14922.291597613703
And now we get what I was originally expecting (a plain loop with if is the fastest).

Julia JIT compilation, @time and number of allocations

I am just starting to evaluate Julia (version 0.6.0), and I tested how resize! and sizehint! could impact performance, using the @time macro.
The documentation says "# Run once to JIT-compile", but it seems that running once may not be enough if we check the number of allocations.
module Test
    function test(x::Int64; hint::Bool=false, resize::Bool=false)
        A::Array{Int64} = []
        n::Int64 = x
        if resize
            resize!(A, n)
            for i in 1:n
                A[i] = i
            end
        else
            if hint sizehint!(A, n) end
            for i in 1:n
                push!(A, i)
            end
        end
        A[end]
    end
end
import Test
#Test.test(1); # (1)
#Test.test(1, hint=true); # (2)
#Test.test(1, resize=true); # (3)
@time Test.test(10_000_000)
@time Test.test(10_000_000, hint=true)
@time Test.test(10_000_000, resize=true)
I got different results depending on which "JIT-precompile" calls were uncommented:
Result from code above:
0.494120 seconds (11.02 k allocations: 129.706 MiB, 22.77% gc time)
0.141155 seconds (3.43 k allocations: 76.537 MiB, 41.94% gc time)
0.068319 seconds (9 allocations: 76.294 MiB, 76.99% gc time)
If (1) is uncommented:
0.520939 seconds (112 allocations: 129.007 MiB, 21.79% gc time)
0.140845 seconds (3.43 k allocations: 76.537 MiB, 42.35% gc time)
0.068741 seconds (9 allocations: 76.294 MiB, 77.55% gc time)
if (1) && (2) are uncommented:
0.586479 seconds (112 allocations: 129.007 MiB, 19.28% gc time)
0.117521 seconds (9 allocations: 76.294 MiB, 50.56% gc time)
0.068275 seconds (9 allocations: 76.294 MiB, 76.84% gc time)
if (1) && (2) && (3) are uncommented:
0.509668 seconds (112 allocations: 129.007 MiB, 21.61% gc time)
0.112276 seconds (9 allocations: 76.294 MiB, 50.58% gc time)
0.065123 seconds (9 allocations: 76.294 MiB, 76.34% gc time)
if (3) is uncommented:
0.497802 seconds (240 allocations: 129.016 MiB, 22.53% gc time)
0.117035 seconds (11 allocations: 76.294 MiB, 52.56% gc time)
0.067170 seconds (11 allocations: 76.294 MiB, 76.93% gc time)
My questions:
Is it a bug?
If it is not a bug, is there a way to force complete compilation?
No, it is not a bug. As the documentation points out, this is because you were running @time in global scope:
julia> function foo()
           Test.test(1) # warm-up
           @time Test.test(10_000_000)
           @time Test.test(10_000_000, hint=true)
           @time Test.test(10_000_000, resize=true)
       end
foo (generic function with 1 method)
julia> foo()
0.401256 seconds (26 allocations: 129.001 MiB, 47.38% gc time)
0.185094 seconds (6 allocations: 76.294 MiB, 37.13% gc time)
0.034649 seconds (6 allocations: 76.294 MiB, 30.99% gc time)
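Alternatively, BenchmarkTools (already used in other answers on this page) handles the warm-up and repeated sampling for you, so compilation never pollutes the reported time. A minimal sketch, assuming the Test module above:
using BenchmarkTools
@btime Test.test(10_000_000)
@btime Test.test(10_000_000, hint=true)
@btime Test.test(10_000_000, resize=true)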

Generating rectilinear grid coordinates in Julia

In Julia, what's the best way to make an (X, Y) array like this?
0 0
1 0
2 0
3 0
0 1
1 1
2 1
3 1
0 2
1 2
2 2
3 2
0 3
1 3
2 3
3 3
Coordinates are regular and rectilinear but not necessarily integers.
Julia 0.6 includes an efficient product iterator which allows for a fourth solution. Comparing all solutions:
using Base.Iterators
f1(xs, ys) = [[xs[i] for i in 1:length(xs), j in 1:length(ys)][:] [ys[j] for i in 1:length(xs), j in 1:length(ys)][:]]
f2(xs, ys) = hcat(repeat(xs, outer=length(ys)), repeat(ys, inner=length(xs)))
f3(xs, ys) = vcat(([x y] for y in ys for x in xs)...)
f4(xs, ys) = (eltype(xs) == eltype(ys) || error("eltypes must match");
reinterpret(eltype(xs), collect(product(xs, ys)), (2, length(xs)*length(ys)))')
xs = 1:3
ys = 0:4
@show f1(xs, ys) == f2(xs, ys) == f3(xs, ys) == f4(xs, ys)
using BenchmarkTools
@btime f1($xs, $ys)
@btime f2($xs, $ys)
@btime f3($xs, $ys)
@btime f4($xs, $ys)
On my PC, this results in:
f1(xs, ys) == f2(xs, ys) == f3(xs, ys) == f4(xs, ys) = true
548.508 ns (8 allocations: 1.23 KiB)
3.792 μs (49 allocations: 2.45 KiB)
1.916 μs (51 allocations: 3.17 KiB)
353.880 ns (8 allocations: 912 bytes)
For xs = 1:300 and ys=0:400 I get:
f1(xs, ys) == f2(xs, ys) == f3(xs, ys) == f4(xs, ys) = true
1.538 ms (13 allocations: 5.51 MiB)
1.032 ms (1636 allocations: 3.72 MiB)
16.668 ms (360924 allocations: 24.95 MiB)
927.001 μs (10 allocations: 3.67 MiB)
Edit:
By far the fastest method is a direct loop over a preallocated array:
function f5(xs, ys)
    lx, ly = length(xs), length(ys)
    res = Array{Base.promote_eltype(xs, ys), 2}(lx*ly, 2)
    ind = 1
    for y in ys, x in xs
        res[ind, 1] = x
        res[ind, 2] = y
        ind += 1
    end
    res
end
For xs = 1:3 and ys = 0:4, f5 takes 65.339 ns (1 allocation: 336 bytes).
For xs = 1:300 and ys = 0:400, it takes 280.852 μs (2 allocations: 1.84 MiB).
Edit 2:
Including f6 from Dan Getz' comment:
function f6(xs, ys)
    lx, ly = length(xs), length(ys)
    lxly = lx*ly
    res = Array{Base.promote_eltype(xs, ys), 2}(lxly, 2)
    ind = 1
    while ind <= lxly
        @inbounds for x in xs
            res[ind] = x
            ind += 1
        end
    end
    for y in ys
        @inbounds for i = 1:lx
            res[ind] = y
            ind += 1
        end
    end
    res
end
By respecting the column-major order of Julia arrays, it reduces the timings to 47.452 ns (1 allocation: 336 bytes) and 171.709 μs (2 allocations: 1.84 MiB), respectively.
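As a quick usage sketch (assuming the Julia 0.6 definitions above), f6 reproduces the grid from the question and agrees with the other variants:
xs = 0:3; ys = 0:3
grid = f6(xs, ys)    # 16×2 matrix: column 1 cycles through xs, column 2 steps through ys
grid == f1(xs, ys)   # should be true; all variants produce the same layout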
This seems to do the trick. Not sure it is the best solution though. Seems a bit convoluted.
xs = 0:3;
ys = 0:3;
out = [[xs[i] for i in 1:length(xs), j in 1:length(ys)][:] [ys[j] for i in 1:length(xs), j in 1:length(ys)][:]]
Sounds like a job for repeat:
hcat(repeat(0:3, outer=4), repeat(0:3, inner=4))
Note that this is much slower than the array comprehension when xs or ys is small (e.g. lengths of 3 or 30).

Abstract typed array construction JIT performance

In one of my applications, I have to store elements of different subtypes in an array, and I take a big hit from JIT compilation.
Below is a minimal example.
abstract A
immutable B <: A end
immutable C <: A end
b = B()
c = C()
@time getindex(A, b, b)
@time getindex(A, b, c)
@time getindex(A, c, c)
@time getindex(A, c, b)
@time getindex(A, b, c, b)
@time getindex(A, b, c, c);
0.007756 seconds (6.03 k allocations: 276.426 KB)
0.007878 seconds (5.01 k allocations: 223.087 KB)
0.005175 seconds (2.44 k allocations: 128.773 KB)
0.004276 seconds (2.42 k allocations: 127.546 KB)
0.004107 seconds (2.45 k allocations: 129.983 KB)
0.004090 seconds (2.45 k allocations: 129.983 KB)
As you can see, each time I construct the array for a different combination of elements, it has to JIT-compile a new method specialization.
I also tried [...] instead of T[...]; it appeared to be worse.
Restart the kernel and run the following:
b = B()
c = C()
@time Base.vect(b, b)
@time Base.vect(b, c)
@time Base.vect(c, c)
@time Base.vect(c, b)
@time Base.vect(b, c, b)
@time Base.vect(b, c, c);
0.008252 seconds (6.87 k allocations: 312.395 KB)
0.149397 seconds (229.26 k allocations: 12.251 MB)
0.006778 seconds (6.86 k allocations: 312.270 KB)
0.113640 seconds (178.26 k allocations: 9.132 MB, 3.04% gc time)
0.050561 seconds (99.19 k allocations: 5.194 MB)
0.031053 seconds (72.50 k allocations: 3.661 MB)
In my application I face a lot of different subtypes: each element is of type NTuple{N, A} where N can change, so in the end the application was stuck in JIT compilation.
What's the best way to get around it? The only way I can think of is to create a wrapper, say W, and box all my elements into W before putting them into the array, so that the compiler only compiles the array function once.
immutable W
    value::NTuple
end
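A rough sketch of how the wrapper could be used (hypothetical code in the same pre-0.6 syntax as the question; unwrap is just an illustrative helper). Because W itself is a concrete type, a Vector{W} has a fixed element type, so no new method needs to be compiled for each mix of B and C inside the tuples:
ws = W[W((b, b)), W((b, c)), W((c, c, b))]   # always a Vector{W}
unwrap(w::W) = w.value                       # recover the original tuple when needed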
Thanks to @Matt B.: after overloading getindex as in his answer,
c = C()
@time getindex(A, b, b)
@time getindex(A, b, c)
@time getindex(A, c, c)
@time getindex(A, c, b)
@time getindex(A, b, c, b)
@time getindex(A, b, c, c);
0.008493 seconds (6.43 k allocations: 289.646 KB)
0.000867 seconds (463 allocations: 19.012 KB)
0.000005 seconds (5 allocations: 240 bytes)
0.000003 seconds (5 allocations: 240 bytes)
0.004035 seconds (2.37 k allocations: 122.535 KB)
0.000003 seconds (5 allocations: 256 bytes)
Also, I realized that JIT compilation of tuple is actually quite efficient.
@time tuple(1,2)
@time tuple(b, b)
@time tuple(b, c)
@time tuple(c, c)
@time tuple(c, b)
@time tuple(b, c, b)
@time tuple(b, c, c);
@time tuple(b, b)
@time tuple(b, c)
@time tuple(c, c)
@time tuple(c, b)
@time tuple(b, c, b)
@time tuple(b, c, c);
0.000004 seconds (149 allocations: 10.183 KB)
0.000011 seconds (7 allocations: 336 bytes)
0.000008 seconds (7 allocations: 336 bytes)
0.000007 seconds (7 allocations: 336 bytes)
0.000007 seconds (7 allocations: 336 bytes)
0.000005 seconds (7 allocations: 352 bytes)
0.000004 seconds (7 allocations: 352 bytes)
0.000003 seconds (5 allocations: 192 bytes)
0.000004 seconds (5 allocations: 192 bytes)
0.000002 seconds (5 allocations: 192 bytes)
0.000002 seconds (5 allocations: 192 bytes)
0.000002 seconds (5 allocations: 192 bytes)
0.000002 seconds (5 allocations: 192 bytes)
The JIT heuristics here could probably be better tuned in the base library. While Julia does default to generating specialized methods for unique permutations of argument types, there are a few escape hatches you can use to reduce the number of specializations:
Use f(T::Type) instead of f{T}(::Type{T}). Both are well-typed and behave nicely through inference, but the former will only generate one method for all types.
Use the undocumented all-caps g(::ANY) flag instead of g(::Any). It's semantically identical, but ANY will prevent specialization for that argument.
In this case, you probably want to specialize on the type but not the values:
function Base.getindex{T<:A}(::Type{T}, vals::ANY...)
    a = Array(T, length(vals))
    @inbounds for i = 1:length(vals)
        a[i] = vals[i]
    end
    return a
end
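A short usage sketch (assuming the method above has been defined): the bracket syntax now builds the abstractly typed array without compiling a new specialization for every combination of element values.
v = A[b, c, c]      # lowers to getindex(A, b, c, c); returns a 3-element Vector{A}
eltype(v) === A     # true: the element type is the abstract type A itself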
