Generating rectilinear grid coordinates in Julia

In Julia, what's the best way to make an (X, Y) array like this?
0 0
1 0
2 0
3 0
0 1
1 1
2 1
3 1
0 2
1 2
2 2
3 2
0 3
1 3
2 3
3 3
Coordinates are regular and rectilinear but not necessarily integers.

Julia 0.6 includes an efficient product iterator, which allows for a fourth solution. Comparing all solutions:
using Base.Iterators
f1(xs, ys) = [[xs[i] for i in 1:length(xs), j in 1:length(ys)][:] [ys[j] for i in 1:length(xs), j in 1:length(ys)][:]]
f2(xs, ys) = hcat(repeat(xs, outer=length(ys)), repeat(ys, inner=length(xs)))
f3(xs, ys) = vcat(([x y] for y in ys for x in xs)...)
f4(xs, ys) = (eltype(xs) == eltype(ys) || error("eltypes must match");
              reinterpret(eltype(xs), collect(product(xs, ys)), (2, length(xs)*length(ys)))')
xs = 1:3
ys = 0:4
@show f1(xs, ys) == f2(xs, ys) == f3(xs, ys) == f4(xs, ys)
using BenchmarkTools
@btime f1($xs, $ys)
@btime f2($xs, $ys)
@btime f3($xs, $ys)
@btime f4($xs, $ys)
On my PC, this results in:
f1(xs, ys) == f2(xs, ys) == f3(xs, ys) == f4(xs, ys) = true
548.508 ns (8 allocations: 1.23 KiB)
3.792 μs (49 allocations: 2.45 KiB)
1.916 μs (51 allocations: 3.17 KiB)
353.880 ns (8 allocations: 912 bytes)
For xs = 1:300 and ys = 0:400 I get:
f1(xs, ys) == f2(xs, ys) == f3(xs, ys) == f4(xs, ys) = true
1.538 ms (13 allocations: 5.51 MiB)
1.032 ms (1636 allocations: 3.72 MiB)
16.668 ms (360924 allocations: 24.95 MiB)
927.001 μs (10 allocations: 3.67 MiB)
Edit:
By far the fastest method is a direct loop over a preallocated array:
function f5(xs, ys)
    lx, ly = length(xs), length(ys)
    res = Array{Base.promote_eltype(xs, ys), 2}(lx*ly, 2)
    ind = 1
    for y in ys, x in xs
        res[ind, 1] = x
        res[ind, 2] = y
        ind += 1
    end
    res
end
For xs = 1:3 and ys = 0:4, f5 takes 65.339 ns (1 allocation: 336 bytes).
For xs = 1:300 and ys = 0:400, it takes 280.852 μs (2 allocations: 1.84 MiB).
Edit 2:
Including f6 from Dan Getz's comment:
function f6(xs, ys)
    lx, ly = length(xs), length(ys)
    lxly = lx*ly
    res = Array{Base.promote_eltype(xs, ys), 2}(lxly, 2)
    ind = 1
    while ind <= lxly
        @inbounds for x in xs
            res[ind] = x
            ind += 1
        end
    end
    for y in ys
        @inbounds for i = 1:lx
            res[ind] = y
            ind += 1
        end
    end
    res
end
By respecting the column-major order of Julia arrays, it reduces the timings to 47.452 ns (1 allocation: 336 bytes) and 171.709 μs (2 allocations: 1.84 MiB), respectively.
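For reference, the same column-major fill can be written in Julia 1.x syntax with the `undef` array constructor (a sketch; the function name `f6_v1` is mine, not from the original answer):

```julia
# Julia 1.x version of f6 (name f6_v1 is illustrative).
function f6_v1(xs, ys)
    lx, ly = length(xs), length(ys)
    lxly = lx * ly
    res = Matrix{Base.promote_eltype(xs, ys)}(undef, lxly, 2)
    ind = 1
    while ind <= lxly                # first column: xs cycled ly times
        @inbounds for x in xs
            res[ind] = x
            ind += 1
        end
    end
    for y in ys                      # second column: each y repeated lx times
        @inbounds for i = 1:lx
            res[ind] = y
            ind += 1
        end
    end
    res
end

f6_v1(0:3, 0:3)  # 16×2 matrix matching the layout in the question
```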

This seems to do the trick. Not sure it is the best solution, though; it seems a bit convoluted.
xs = 0:3;
ys = 0:3;
out = [[xs[i] for i in 1:length(xs), j in 1:length(ys)][:] [ys[j] for i in 1:length(xs), j in 1:length(ys)][:]]

Sounds like a job for repeat:
hcat(repeat(0:3, outer=4), repeat(0:3, inner=4))
Note that this is much slower than the array comprehension when xs or ys is small (e.g. 3 or 30 elements).
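Since the question says the coordinates are regular but not necessarily integers, note that the same repeat call works with any range, not just integer ones (a quick sketch; the range values here are made up):

```julia
# repeat works equally well with floating-point ranges
xs = 0.0:0.5:1.5                     # 4 non-integer x coordinates
ys = 0.0:0.5:1.5
grid = hcat(repeat(xs, outer=length(ys)), repeat(ys, inner=length(xs)))
size(grid)  # (16, 2)
```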

Related

How can I check if a string is empty?
I am currently using the == operator:
julia> x = "";
julia> x == ""
true
Use isempty. It is more explicit and more likely to be optimized for its use case.
For example, on the latest Julia:
julia> using BenchmarkTools
julia> myisempty(x::String) = x == ""
myisempty (generic function with 1 method)
julia> @btime myisempty("")
2.732 ns (0 allocations: 0 bytes)
true
julia> @btime myisempty("bar")
3.001 ns (0 allocations: 0 bytes)
false
julia> @btime isempty("")
1.694 ns (0 allocations: 0 bytes)
true
julia> @btime isempty("bar")
1.594 ns (0 allocations: 0 bytes)
false

Is there a lazy `filter` in Julia?

In Python one can use if in the list comprehension to filter out elements. In Julia is there a lazy filter equivalent?
for x in filter(x->x<2, 1:3)
println(x)
end
works and prints only 1 but filter(x->x<2, 1:3) is eager and so may not be desirable for billions of records.
You can do this just like in Python:
julia> function f()
           for x in (i for i in 1:10^9 if i == 10^9)
               println(x)
           end
       end
f (generic function with 1 method)
julia> @time f()
1000000000
3.293702 seconds (139.87 k allocations: 7.107 MiB)
julia> @time f()
1000000000
3.224707 seconds (11 allocations: 352 bytes)
and you see that it does not allocate. But it is faster to just perform a filter test inside the loop without using a generator:
julia> function g()
           for x in 1:10^9
               x == 10^9 && println(x)
           end
       end
g (generic function with 1 method)
julia> @time g()
1000000000
2.098305 seconds (53.49 k allocations: 2.894 MiB)
julia> @time g()
1000000000
2.094018 seconds (11 allocations: 352 bytes)
Edit: Finally, you can use Iterators.filter:
julia> function h()
           for x in Iterators.filter(==(10^9), 1:10^9)
               println(x)
           end
       end
h (generic function with 1 method)
julia> @time h()
1000000000
0.390966 seconds (127.96 k allocations: 6.599 MiB)
julia> @time h()
1000000000
0.311650 seconds (12 allocations: 688 bytes)
which in this case will be fastest (see also https://docs.julialang.org/en/latest/base/iterators/#Iteration-utilities-1).
You might also want to check out https://github.com/JuliaCollections/IterTools.jl.
EDIT 2
Sometimes Julia is more powerful than you would think. Check this out:
julia> function g2()
           for x in 1:1_000_000_000
               x == 1_000_000_000 && println(x)
           end
       end
g2 (generic function with 1 method)
julia> @time g2()
1000000000
0.029332 seconds (62.91 k allocations: 3.244 MiB)
julia> @time g2()
1000000000
0.000636 seconds (11 allocations: 352 bytes)
and we see that the compiler has essentially compiled out all our computations.
In essence, constant propagation kicked in and replaced 10^9 with 1_000_000_000 in the earlier Iterators.filter example.
Therefore we have to devise a smarter test. Here it goes:
julia> using BenchmarkTools
julia> function f_rand(x)
           s = 0.0
           for v in (v for v in x if 0.1 < v < 0.2)
               s += v
           end
           s
       end
f_rand (generic function with 1 method)
julia> function g_rand(x)
           s = 0.0
           for v in x
               if 0.1 < v < 0.2
                   s += v
               end
           end
           s
       end
g_rand (generic function with 1 method)
julia> function h_rand(x)
           s = 0.0
           for v in Iterators.filter(v -> 0.1 < v < 0.2, x)
               s += v
           end
           s
       end
h_rand (generic function with 1 method)
julia> x = rand(10^6);
julia> @btime f_rand($x)
2.032 ms (0 allocations: 0 bytes)
14922.291597613703
julia> @btime g_rand($x)
1.804 ms (0 allocations: 0 bytes)
14922.291597613703
julia> @btime h_rand($x)
2.035 ms (0 allocations: 0 bytes)
14922.291597613703
And now we get what I was originally expecting (a plain loop with if is the fastest).

CPS in OCaml: type doesn't check

I'm working on a very simple OCaml exercise on CPS. Lines 8-10 are supposed to convert two recursive calls into one tail call. However, the compiler complains about the type at line 8:
File "tmp.ml", line 8, characters 9-14:
Error: This expression has type int -> int -> (int -> int) -> int
but an expression was expected of type int
I understand that the compiler expects an int at line 8 because line 6 returns an int. But can someone explain why the type of lines 8-10 is not int?
4 let rec f i n k (i:int) (n:int) (k:int->int) :int =
5 if i + n < 0 then
6 k 1
7 else
8 (f i (n-1) (fun v ->
9 f (i-1) n (fun vv->
10 k (v + vv))))
11 in f 1 1 (fun x -> x)
f i n-1 is parsed as (f i n)-1 rather than the f i (n-1) you presumably expect.
Additionally,
let rec f i n k (i:int) (n:int) (k:int->int) :int
means that your function takes 6 arguments: i, n, k, i, n and k. You probably meant to write:
let rec f (i:int) (n:int) (k:int->int) :int

Abstract typed array construction JIT performance

In one of my applications I have to store elements of different subtypes in an array, and I took a big hit from JIT compilation.
Below is a minimal example.
abstract A
immutable B <: A end
immutable C <: A end
b = B()
c = C()
@time getindex(A, b, b)
@time getindex(A, b, c)
@time getindex(A, c, c)
@time getindex(A, c, b)
@time getindex(A, b, c, b)
@time getindex(A, b, c, c);
0.007756 seconds (6.03 k allocations: 276.426 KB)
0.007878 seconds (5.01 k allocations: 223.087 KB)
0.005175 seconds (2.44 k allocations: 128.773 KB)
0.004276 seconds (2.42 k allocations: 127.546 KB)
0.004107 seconds (2.45 k allocations: 129.983 KB)
0.004090 seconds (2.45 k allocations: 129.983 KB)
As you can see, each time I construct the array for a different combination of elements, it has to JIT-compile again.
I also tried [...] instead of T[...]; it appeared worse.
Restart the kernel and run the following:
b = B()
c = C()
@time Base.vect(b, b)
@time Base.vect(b, c)
@time Base.vect(c, c)
@time Base.vect(c, b)
@time Base.vect(b, c, b)
@time Base.vect(b, c, c);
0.008252 seconds (6.87 k allocations: 312.395 KB)
0.149397 seconds (229.26 k allocations: 12.251 MB)
0.006778 seconds (6.86 k allocations: 312.270 KB)
0.113640 seconds (178.26 k allocations: 9.132 MB, 3.04% gc time)
0.050561 seconds (99.19 k allocations: 5.194 MB)
0.031053 seconds (72.50 k allocations: 3.661 MB)
In my application I face many different subtypes: each element is of type NTuple{N, A}, where N can change, so in the end the application was stuck in JIT compilation.
What's the best way to get around this? The only way I can think of is to create a wrapper, say W, and box all my elements into W before they enter the array, so the compiler only compiles the array function once.
immutable W
value::NTuple
end
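A minimal sketch of that wrapper idea in Julia 1.x syntax (struct replaces immutable; the field type Tuple and the example values are mine, purely illustrative):

```julia
# Box heterogeneous tuples in one concrete struct so that array
# construction specializes only once, on the element type W.
struct W
    value::Tuple   # holds an NTuple of any length and element types
end

abstract type A end
struct B <: A end
struct C <: A end

# Every array is a Vector{W}, regardless of what the tuples contain.
arr = [W((B(), C())), W((C(), B(), B()))]
typeof(arr)  # Vector{W}
```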
Thanks to @Matt B.: after overloading his getindex,
c = C()
@time getindex(A, b, b)
@time getindex(A, b, c)
@time getindex(A, c, c)
@time getindex(A, c, b)
@time getindex(A, b, c, b)
@time getindex(A, b, c, c);
0.008493 seconds (6.43 k allocations: 289.646 KB)
0.000867 seconds (463 allocations: 19.012 KB)
0.000005 seconds (5 allocations: 240 bytes)
0.000003 seconds (5 allocations: 240 bytes)
0.004035 seconds (2.37 k allocations: 122.535 KB)
0.000003 seconds (5 allocations: 256 bytes)
Also, I realized the JIT of tuple is actually quite efficient.
@time tuple(1,2)
@time tuple(b, b)
@time tuple(b, c)
@time tuple(c, c)
@time tuple(c, b)
@time tuple(b, c, b)
@time tuple(b, c, c);
@time tuple(b, b)
@time tuple(b, c)
@time tuple(c, c)
@time tuple(c, b)
@time tuple(b, c, b)
@time tuple(b, c, c);
0.000004 seconds (149 allocations: 10.183 KB)
0.000011 seconds (7 allocations: 336 bytes)
0.000008 seconds (7 allocations: 336 bytes)
0.000007 seconds (7 allocations: 336 bytes)
0.000007 seconds (7 allocations: 336 bytes)
0.000005 seconds (7 allocations: 352 bytes)
0.000004 seconds (7 allocations: 352 bytes)
0.000003 seconds (5 allocations: 192 bytes)
0.000004 seconds (5 allocations: 192 bytes)
0.000002 seconds (5 allocations: 192 bytes)
0.000002 seconds (5 allocations: 192 bytes)
0.000002 seconds (5 allocations: 192 bytes)
0.000002 seconds (5 allocations: 192 bytes)
The JIT heuristics here could probably be better tuned in the base library. While Julia does default to generating specialized methods for unique permutations of argument types, there are a few escape hatches you can use to reduce the number of specializations:
Use f(T::Type) instead of f{T}(::Type{T}). Both are well-typed and behave nicely through inference, but the former will only generate one method for all types.
Use the undocumented all-caps g(::ANY) flag instead of g(::Any). It's semantically identical, but ANY will prevent specialization for that argument.
In this case, you probably want to specialize on the type but not the values:
function Base.getindex{T<:A}(::Type{T}, vals::ANY...)
    a = Array(T, length(vals))
    @inbounds for i = 1:length(vals)
        a[i] = vals[i]
    end
    return a
end
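On Julia 1.x the ANY annotation is gone; its replacement is the `@nospecialize` hint. A sketch of the same method in current syntax (my translation, not from the original answer; timings not re-benchmarked):

```julia
# Julia 1.x version of the non-specializing constructor;
# @nospecialize replaces the old ::ANY annotation.
abstract type A end
struct B <: A end
struct C <: A end

function Base.getindex(::Type{T}, vals...) where {T<:A}
    @nospecialize vals          # hint: don't specialize on the value types
    a = Vector{T}(undef, length(vals))
    @inbounds for i = 1:length(vals)
        a[i] = vals[i]
    end
    return a
end

A[B(), C()]  # Vector{A} with two elements
```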

Defining a recursive function as iterative?

I have the following recursive function that I need to convert to iterative in Scheme
(define (f n)
  (if (< n 3)
      n
      (+ (f (- n 1))
         (* 2 (f (- n 2)))
         (* 3 (f (- n 3))))))
My issue is that I'm having difficulty converting it to an iterative process (i.e. making it run in linear time); I can't figure out how to do it.
The function is defined as follows:
f(n) = n if n<3 else f(n-1) + 2f(n-2) + 3f(n-3)
I have tried to calculate it for 5 linearly, like so:
1 + 2 + f(3) + f(4) + f(5)
But in order to calculate, say, f(5) I'd need to refer back to f(4), f(3), f(2), and for f(4) I'd have to refer back to f(3), f(2), f(1).
This is a problem from the SICP book.
In the book, authors have an example of formulating an iterative process for computing the Fibonacci numbers.
(define (fib n)
  (fib-iter 1 0 n))
(define (fib-iter a b count)
  (if (= count 0)
      b
      (fib-iter (+ a b) a (- count 1))))
The point here is to use two parameters a and b to memorize f(n+1) and f(n) during the computation. Something similar can be applied here: we need a, b, c to memorize f(n+2), f(n+1) and f(n).
;; an iterative process implementation
(define (f-i n)
  ;; f2 is f(n+2), f1 is f(n+1), f0 is f(n)
  (define (iterative-f f2 f1 f0 count)
    (cond ((= count 0) f0)
          (else (iterative-f (+ f2 (* f1 2) (* f0 3))
                             f2
                             f1
                             (- count 1)))))
  (iterative-f 2 1 0 n))
