Julia has the very nice feature of giving access to its own syntax tree, which makes it easy to generate new functions programmatically, but evaluating such expressions with eval is much slower than running normal Julia code.
For example:
julia> timing = @time for i in [1:100] tan(pi/2*rand()); end
elapsed time: 1.513e-5 seconds (896 bytes allocated)
julia> timing = @time for i in [1:100] x = pi/2*rand(); eval(:(tan(x))); end
elapsed time: 0.0080231 seconds (23296 bytes allocated)
julia> timing = @time for i in [1:100] eval(:(tan(pi/2*rand()))); end
elapsed time: 0.017245327 seconds (90496 bytes allocated)
Is there a way to give to eval the same speed as the normal Julia code?
EDIT:
I was able to slightly speed up eval using the precompile function, but that is still not enough:
julia> tmp3 = :(sin(x))
:(sin(x))
julia> timing = @time for i in [1:100000] x = pi/2*rand(); eval(tmp3); end
elapsed time: 8.651145772 seconds (13602336 bytes allocated)
julia> precompile(tmp3,(Float64,Float64))
julia> timing = @time for i in [1:100000] x = pi/2*rand(); eval(tmp3); end
elapsed time: 8.611654016 seconds (13600048 bytes allocated)
EDIT2:
@Ivarne suggested that I provide details on my project. Well, I would like to use the meta-programming capabilities of Julia to calculate symbolic derivatives and run them.
I wrote a function derivative(ex::Expr, arg::Symbol) that takes an expression and an argument, and returns a new expression that is the derivative of ex with respect to arg. Unfortunately, the resulting Expr takes too long to evaluate.
EDIT3: as a conclusion, the performance using @eval instead of eval:
julia> timing = @time for i in [1:100000] x = pi/2*rand(); @eval(tmp3); end
elapsed time: 0.005821547 seconds (13600048 bytes allocated)
tmp3 is still :(sin(x))
If you need speed, you shouldn't use eval, because it has to do lots of work to generate optimized fast code every time.
If you want to manipulate expressions, you should look at macros instead. They operate on expressions and return expressions that will be compiled once. See http://docs.julialang.org/en/latest/manual/metaprogramming/.
If you provide some details on your problem, and not only performance testing on eval, it will be easier to point you in the right direction. Making eval in julia faster is a project, not a question for StackOverflow.
Edit:
There is already some of that functionality in Calculus.jl, and I think it will be best if you do something like:
myexpr = :(sin(x))
myexpr_dx = derivative(myexpr)
@eval myfun(x) = $myexpr
@eval myfun_dx(x) = $myexpr_dx
That way you get functions you can call instead of expressions. You can then do performance testing on myfun(x) and myfun_dx(x).
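For concreteness, here is a minimal sketch of that pattern; the derivative expression is written out by hand in place of a real call to derivative (or Calculus.jl), so treat it as a placeholder:

# Sketch: pretend derivative(:(sin(x)), :x) returned :(cos(x))
myexpr    = :(sin(x))
myexpr_dx = :(cos(x))

@eval myfun(x)    = $myexpr       # compiled once, like a hand-written function
@eval myfun_dx(x) = $myexpr_dx

myfun(pi/4)       # ≈ 0.7071
myfun_dx(pi/4)    # ≈ 0.7071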
Related
I wanted to make a macro that creates some code for me. E.g.
I have a vector x = [9,8,7] and I want to use a macro to generate this piece of code vcat(x[1], x[2], x[3]) and run it. And I want it to work for vectors of arbitrary length.
I have made the macro as below
macro some_macro(a)
    quote
        astr = $(string(a))
        s = mapreduce(aa -> string(astr,"[",aa,"],"), string, 1:length($(a)))
        eval(parse(string("vcat(", s[1:(end-1)],")")))
    end
end
x = [7,8,9]
@some_macro x
The above works. But when I try to wrap it inside a function
function some_fn(y)
    @some_macro y
end
some_fn([4,5,6])
It doesn't work and gives error
UndefVarError: y not defined
and it highlights the below as the culprit
s = mapreduce(aa -> string(astr,"[",aa,"],"), string, 1:length($(a)))
Edit
See julia: efficient ways to vcat n arrays
for an advanced example of why I want to do this instead of using the splat operator.
You don't really need macros or generated functions for this. Just use vcat(x...). The three dots are the "splat" operator — it unpacks all the elements of x and passes each as a separate argument to vcat.
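For example (output shown for a recent Julia release; the exact display varies across versions):

julia> x = [[1 2], [3 4], [5 6]];

julia> vcat(x...)          # same as vcat(x[1], x[2], x[3])
3×2 Matrix{Int64}:
 1  2
 3  4
 5  6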
Edit: to more directly answer the question as asked: this cannot be done in a macro. Macros are expanded at parse time, but this transformation requires you to know the length of the array. At global scope and in simple tests it may appear that it's working, but it's only working because the argument is defined at parse time. In a function or in any real use-cases, however, that's not the case. Using eval inside a macro is a major red flag and really shouldn't be done.
Here's a demo. You can create a macro that vcats three arguments safely and easily. Note that you should not construct strings of "code" at all here; you can just construct an array of expressions with the :( ) expression quoting syntax:
julia> macro vcat_three(x)
           args = [:($(esc(x))[$i]) for i in 1:3]
           return :(vcat($(args...)))
       end
@vcat_three (macro with 1 method)

julia> @macroexpand @vcat_three y
:((Main.vcat)(y[1], y[2], y[3]))
julia> f(z) = @vcat_three z

julia> f([[1 2], [3 4], [5 6], [7 8]])
3×2 Array{Int64,2}:
1 2
3 4
5 6
So that works just fine; we esc(x) to get hygiene right and splat the array of expressions directly into the vcat call to generate that argument list at parse time. It's efficient and fast. But now let's try to extend it to support length(x) arguments. Should be simple enough. We'll just need to change 1:3 to 1:n, where n is the length of the array.
julia> macro vcat_n(x)
args = [:($(esc(x))[$i]) for i in 1:length(x)]
return :(vcat($(args...)))
end
#vcat_n (macro with 1 method)
julia> #macroexpand #vcat_n y
ERROR: LoadError: MethodError: no method matching length(::Symbol)
But that doesn't work — x is just a symbol to the macro, and of course length(::Symbol) doesn't mean what we want. It turns out that there's absolutely nothing you can put there that works, simply because Julia cannot know how large x is at compile time.
Your attempt is failing because your macro returns an expression that constructs and evals a string at run-time, and eval does not work in local scopes. Even if that could work, it'd be devastatingly slow… much slower than splatting.
If you want to do this with a more complicated expression, you can splat a generator: vcat((elt[:foo] for elt in x)...).
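For instance, if each element carries its row inside a Dict (the Dict layout and the key :foo here are made up purely to illustrate the pattern):

x = [Dict(:foo => [1 2]), Dict(:foo => [3 4]), Dict(:foo => [5 6])]
vcat((elt[:foo] for elt in x)...)    # 3x2 matrix built from the :foo entries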
FWIW, here is the @generated version I mentioned in the comment:
@generated function vcat_something(x, ::Type{Val{N}}) where N
    ex = Expr(:call, vcat)
    for i = 1:N
        push!(ex.args, :(x[$i]))
    end
    ex
end
julia> vcat_something(x, Val{length(x)})
5-element Array{Float64,1}:
0.670889
0.600377
0.218401
0.0171423
0.0409389
You could also remove the @generated prefix to see what Expr it returns:
julia> vcat_something(x, Val{length(x)})
:((vcat)(x[1], x[2], x[3], x[4], x[5]))
Take a look at the benchmark results below:
julia> using BenchmarkTools
julia> x = rand(100)
julia> @btime some_fn($x)
190.693 ms (11940 allocations: 5.98 MiB)
julia> @btime vcat_something($x, Val{length(x)})
960.385 ns (101 allocations: 2.44 KiB)
The huge performance gap is mainly due to the fact that the body of a @generated function is executed only once, at compile time (after the type-inference stage), for each N that you pass to it. When calling it again with a vector x of the same length N, it won't rerun the for-loop; instead, it directly runs the specialized compiled code/Expr:
julia> x = rand(77); # x with a different length
julia> @time some_fn(x);
0.150887 seconds (7.36 k allocations: 2.811 MiB)
julia> @time some_fn(x);
0.149494 seconds (7.36 k allocations: 2.811 MiB)
julia> @time vcat_something(x, Val{length(x)});
0.061618 seconds (6.25 k allocations: 359.003 KiB)
julia> @time vcat_something(x, Val{length(x)});
0.000023 seconds (82 allocations: 2.078 KiB)
Note that we need to pass the length of x via a value type (Val), since Julia can't get that information at compile time (unlike NTuple, a Vector only carries its element type as a type parameter, not its length).
EDIT:
see Matt's answer for the right and simplest way to solve the problem; I'm going to leave this post here since it's relevant and might be helpful when dealing with the splatting penalty.
Let x::Vector{Vector{T}}. What is the best way to iterate over all the elements of each inner vector (that is, all elements of type T)? The best I can come up with is a double iteration using the single-line notation, i.e.:
for n in eachindex(x), m in eachindex(x[n])
    x[n][m]
end
but I'm wondering if there is a single iterator, perhaps in the Iterators package, designed specifically for this purpose, e.g. for i in some_iterator(x) ; x[i] ; end.
More generally, what about iterating over the inner-most elements of any array of arrays (that is, arrays of any dimension)?
Your way
for n in eachindex(x), m in eachindex(x[n])
    x[n][m]
end
is pretty fast. If you want best speed, use
for n in eachindex(x)
    y = x[n]
    for m in eachindex(y)
        y[m]
    end
end
which avoids dereferencing twice (the first dereference is hard to optimize out because arrays are mutable, and so getindex isn't pure). Alternatively, if you don't need m and n, you could just use
for y in x, z in y
    z
end
which is also fast.
Note that column-major storage is irrelevant, since all arrays here are one-dimensional.
To answer your general question:
If the number of dimensions is a compile-time constant, see Base.Cartesian
If the number of dimensions is not a compile-time constant, use recursion (see the sketch just below this list)
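Here is a minimal recursive sketch (the name visit is my own; it applies f to every non-array leaf, whatever the nesting depth):

visit(f, x::AbstractArray) = foreach(el -> visit(f, el), x)
visit(f, x) = f(x)

x = [[1, 2], [3, [4, 5]]]
visit(println, x)    # prints 1, 2, 3, 4, 5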
And finally, as Dan Getz mentioned in a comment:
using Iterators
for z in chain(x...)
    z
end
also works. This however has a bit of a performance penalty.
I'm wondering if there is a single iterator, perhaps in the Iterators package, designed specifically for this purpose, e.g. for i in some_iterator(x) ; x[i] ; end
Today (in Julia 1.x versions), Iterators.flatten is exactly this.
help?> Iterators.flatten
flatten(iter)
Given an iterator that yields iterators, return an iterator that
yields the elements of those iterators. Put differently, the
elements of the argument iterator are concatenated.
julia> x = [1:5, [π, ℯ, 42], 'a':'e']
3-element Vector{AbstractVector}:
1:5
[3.141592653589793, 2.718281828459045, 42.0]
'a':1:'e'
julia> for el in Iterators.flatten(x)
           print(el, " ")
       end
1 2 3 4 5 3.141592653589793 2.718281828459045 42.0 a b c d e
I'd like to generate identical random numbers in R and Julia. Both languages appear to use the Mersenne-Twister library by default, however in Julia 1.0.0:
julia> using Random
julia> Random.seed!(3)
julia> rand()
0.8116984049958615
Produces 0.811..., while in R:
set.seed(3)
runif(1)
produces 0.168.
Any ideas?
Related SO questions here and here.
My use case for those who are interested: Testing new Julia code that requires random number generation (e.g. statistical bootstrapping) by comparing output to that from equivalent libraries in R.
That is an old problem.
Paul Gilbert addressed the same issue in the late 1990s (!!) when trying to assert that simulations in R (then the newcomer) gave the same result as those in S-Plus (then the incumbent).
His solution, and still the golden approach AFAICT: re-implement in fresh code in both languages, as that is the only way to ensure identical seeding, state, ... and whatever else affects it.
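As a minimal sketch of that idea (my own toy example, not Gilbert's code): a Park-Miller style generator whose arithmetic stays exact in double precision, so the same few lines can be ported to R verbatim. It is only useful for cross-language reproducibility tests, not as a statistically serious RNG.

# Julia side; the R equivalent is:
#   state <- 12345; state <- (16807 * state) %% 2147483647; state / 2147483647
mutable struct MiniRNG
    state::Float64
end

function nextuniform!(r::MiniRNG)
    r.state = mod(16807.0 * r.state, 2147483647.0)   # exact: products stay below 2^53
    return r.state / 2147483647.0
end

rng = MiniRNG(12345.0)
[nextuniform!(rng) for _ in 1:3]    # identical to the R sequence with the same starting state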
Pursuing the RCall suggestion made by @Khashaa, it's clear that you can set the seed and get the random numbers from R.
julia> using RCall
julia> RCall.reval("set.seed(3)")
RCall.NilSxp(16777344,Ptr{Void} @0x0a4b6330)
julia> a = zeros(Float64,20);
julia> unsafe_copy!(pointer(a), RCall.reval("runif(20)").pv, 20)
Ptr{Float64} @0x972f4860
julia> map(x -> @printf("%20.15f\n", x), a);
0.168041526339948
0.807516399072483
0.384942351374775
0.327734317164868
0.602100674761459
0.604394054040313
0.124633444240317
0.294600924244151
0.577609919011593
0.630979274399579
0.512015897547826
0.505023914156482
0.534035353455693
0.557249435689300
0.867919487645850
0.829708693316206
0.111449153395370
0.703688358888030
0.897488264366984
0.279732553754002
and from R:
> options(digits=15)
> set.seed(3)
> runif(20)
[1] 0.168041526339948 0.807516399072483 0.384942351374775 0.327734317164868
[5] 0.602100674761459 0.604394054040313 0.124633444240317 0.294600924244151
[9] 0.577609919011593 0.630979274399579 0.512015897547826 0.505023914156482
[13] 0.534035353455693 0.557249435689300 0.867919487645850 0.829708693316206
[17] 0.111449153395370 0.703688358888030 0.897488264366984 0.279732553754002
** EDIT **
Per the suggestion by @ColinTBowers, here's a simpler/cleaner way to access R random numbers from Julia.
julia> using RCall
julia> reval("set.seed(3)");
julia> a = rcopy("runif(20)");
julia> map(x -> @printf("%20.15f\n", x), a);
0.168041526339948
0.807516399072483
0.384942351374775
0.327734317164868
0.602100674761459
0.604394054040313
0.124633444240317
0.294600924244151
0.577609919011593
0.630979274399579
0.512015897547826
0.505023914156482
0.534035353455693
0.557249435689300
0.867919487645850
0.829708693316206
0.111449153395370
0.703688358888030
0.897488264366984
0.279732553754002
See:
?set.seed
"Mersenne-Twister":
From Matsumoto and Nishimura (1998). A twisted GFSR with period 2^19937 - 1 and equidistribution in 623 consecutive dimensions (over the whole period). The ‘seed’ is a 624-dimensional set of 32-bit integers plus a current position in that set.
And you might see if you can link to the same C code from both languages. If you want to see the list/vector, type:
.Random.seed
In MATLAB/Octave you can issue the command "format long g" and have default numerical output in the REPL formatted like the following:
octave> 95000/0.05
ans = 1900000
Is it possible to get a similar behavior in Julia? Currently with julia
Version 0.3.0-prerelease+3930 (2014-06-28 17:54 UTC)
Commit bdbab62* (6 days old master)
x86_64-redhat-linux
I get the following number format.
julia> 95000/0.05
1.9e6
You can use the @printf macro to format output. It behaves like C's printf, but unlike in C the argument types need not agree with the format specifiers; they are converted as necessary. For example:
julia> using Printf
julia> #printf("Integer Format: %d",95000/0.05);
Integer Format: 1900000
julia> #printf("As a String: %s",95000/0.05);
As a String: 1.9e6
julia> #printf("As a float with column sized larger than needed:%11.2f",95000/0.05);
As a float with column sized larger than needed: 1900000.00
It is possible to use @printf as the default mechanism in the REPL because the REPL is implemented in Julia in Base.REPL, and in particular in the following function:
function display(d::REPLDisplay, ::MIME"text/plain", x)
    io = outstream(d.repl)
    write(io, answer_color(d.repl))
    writemime(io, MIME("text/plain"), x)
    println(io)
end
To modify the way Float64 is displayed, you merely need to redefine writemime for Float64.
julia> 95000/0.05
1.9e6
julia> Base.Multimedia.writemime(stream,::MIME"text/plain",x::Float64)=@printf("%1.2f",x)
writemime (generic function with 13 methods)
julia> 95000/0.05
1900000.00
Apologies if this is rather general, albeit still a coding question.
With a bit of time on my hands I've been trying to learn a bit of Julia. I thought a good start would be to copy the R microbenchmark function - so I could seamlessly compare R and Julia functions.
e.g. this is microbenchmark output for 2 R functions that I am trying to emulate:
Unit: seconds
expr min lq median uq max neval
vectorised(x, y) 0.2058464 0.2165744 0.2610062 0.2612965 0.2805144 5
devectorised(x, y) 9.7923054 9.8095265 9.8097871 9.8606076 10.0144012 5
So thus far in Julia I am trying to write idiomatic and hopefully understandable/terse code. Therefore I replaced a double loop with a list comprehension to create an array of timings, like so:
function timer(fs::Vector{Function}, reps::Integer)
    # funs = length(fs)
    # times = Array(Float64, reps, funs)
    # for funsitr in 1:funs
    #     for repsitr in 1:reps
    #         times[reps, funs] = @elapsed fs[funs]()
    #     end
    # end
    times = [@elapsed fs[funs]() for x=1:reps, funs=1:length(fs)]
    return times
end
This gives an array of timings for each of 2 functions:
julia> test=timer([vec, devec], 10)
10x2 Array{Float64,2}:
0.231621 0.173984
0.237173 0.210059
0.26722 0.174007
0.265869 0.208332
0.266447 0.174051
0.266637 0.208457
0.267824 0.174044
0.26576 0.208687
0.267089 0.174014
0.266926 0.208741
My question (finally) is how do I idiomatically apply a function such as min, max, median across columns (or rows) of an array without using a loop?
I can of course do it easily for this simple case with a loop (similar to the one I crossed out above), but I can't find anything in the docs which is equivalent to, say, apply(array, 1, fun) or even colMeans.
The closest generic sort of function I can think of is
julia> [mean(test[:,col]) for col=1:size(test)[2]]
2-element Array{Any,1}:
 0.260257
 0.191438
... but the syntax really doesn't appeal to me. Is there a more natural way to apply functions across columns or rows of a multidimensional array in Julia?
The function you want is mapslices.
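For example, applied to the test array from the question (positional dims argument as in the Julia version used here; newer Julia spells it mapslices(median, test, dims=1)):

julia> mapslices(median, test, 1)    # one median per column; use 2 to slice by row
1x2 Array{Float64,2}:
 0.266542  0.191191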
Anonymous functions are currently slow in Julia, so I would not use them for benchmarking unless you are benchmarking anonymous functions themselves. Otherwise you will get misleading performance predictions for code that does not use anonymous functions in its performance-critical parts.
I think you want the two-argument version of the reduction functions, like sum(arr, 1) to sum over the first dimension. If there isn't a library function available, you might use reducedim.
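For instance, with the test array above (again using the positional dims argument; newer releases take a dims keyword instead):

sum(test, 1)       # 1x2 array of column sums
mean(test, 1)      # 1x2 array of column means
maximum(test, 2)   # 10x1 array of row maxima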
I think @ivarne has the right answer (and I have ticked it), but I'll just add that I made an apply-like function:
function aaply(fun::Function, dim::Integer, ar::Array)
    if !(1 <= dim <= 2)
        error("rows is 1, columns is 2")
    end
    if dim == 1
        res = [fun(ar[row, :]) for row=1:size(ar)[dim]]
    end
    if dim == 2
        res = [fun(ar[:, col]) for col=1:size(ar)[dim]]
    end
    return res
end
this then gets what I want like so:
julia> aaply(quantile, 2, test)
2-element Array{Any,1}:
[0.231621,0.265787,0.266542,0.267048,0.267824]
[0.173984,0.174021,0.191191,0.20863,0.210059]
where quantile is a built-in that gives min, lq, median, uq, and max, just like microbenchmark.
EDIT: Following the advice here I tested the new function mapslices, which works pretty much like R's apply, and benchmarked it against the function above. Note that mapslices with dim=1 slices by column, whilst test[:,1] is the first column... so the opposite of R, though it has the same indexing?
# nonsense test data big columns
julia> ar=ones(Int64,1000000,4)
1000000x4 Array{Int64,2}:
# built in function
julia> ms()=mapslices(quantile,ar,1)
ms (generic function with 1 method)
# my apply function
julia> aa()=aaply(quantile, 2, ar)
aa (generic function with 1 method)
# compare both functions
julia> aaply(quantile, 2, timer1([ms, aa], 40))
2-element Array{Any,1}:
[0.23566,0.236108,0.236348,0.236735,0.243008]
[0.235401,0.236058,0.236257,0.236686,0.238958]
So the functions are approximately as quick as each other. From reading bits of the Julia mailing list, the developers seem to intend to do some work on this part of the language so that slices are taken by reference rather than making new copies of each slice (column, row, etc.)...