Interpreting Body expression from #code_warntype - julia

If we run:
#code_warntype deepcopy(rand(2))
at the Julia REPL, the output contains flagged values in the Body expression. Specifically, the two Any at the end of:
Body:
begin # deepcopy.jl, line 8:
GenSym(0) = (Base.Array)(Base.Any,32)::Array{Any,1}
return (Base.deepcopy_internal)(x::Array{Float64,1},$(Expr(:new, :((top(getfield))(Base,:ObjectIdDict)::Type{ObjectIdDict}), GenSym(0))))::Any
end::Any
I understand from this question that we usually don't need to worry about flagged values in the Body expression if our primary concern is type instability. So instead, my question is this:
Why does a fairly simple function from Base generate any flagged values in #code_warntype? I'm sure there are good reasons, but I am new at interpreting the output from #code_warntype, and had some trouble understanding the discussion of the Body expression from the official docs.

This is an example of a situation where type inference is unable to figure out the return type of a function. (Note the ::Any on the return value!) It is a problem, not because the computation itself will be slower because of type instability, but because the return type cannot be inferred, and so future computations using the return type will suffer from type instability.
You can see this effect by looking at allocations below:
julia> function f()
y = rand(10)
#time y[1] + y[10]
z = deepcopy(y)
#time z[1] + z[10]
end
f (generic function with 1 method)
julia> f(); # ignore output here on first compile
julia> f();
0.000000 seconds
0.000002 seconds (3 allocations: 48 bytes)
Note that the second operations requires allocations and takes time, because unboxing and dynamic dispatch are involved.
In the current nightly build of what will become 0.5 (which will likely be released within a few months), this has been fixed. Thus
julia> #code_warntype deepcopy(rand(2))
Variables:
#self#::Base.#deepcopy
x::Array{Float64,1}
Body:
begin # deepcopy.jl, line 8:
# meta: location dict.jl Type # dict.jl, line 338:
SSAValue(1) = (Core.ccall)(:jl_alloc_array_1d,(Core.apply_type)(Core.Array,Any,1)::Type{Array{Any,1}},(Core.svec)(Core.Any,Core.Int)::SimpleVector,Array{Any,1},0,32,0)::Array{Any,1}
# meta: pop location
return (Core.typeassert)((Base.deepcopy_internal)(x::Array{Float64,1},$(Expr(:new, :(Base.ObjectIdDict), SSAValue(1))))::Any,Array{Float64,1})::Array{Float64,1}
end::Array{Float64,1}
which has no type instability, and
julia> f()
0.000000 seconds
0.000000 seconds
which has no dynamic dispatch and no allocations.

Related

How to store the output of #time to a variable?

Is possible to store the time displayed with #time in a variable ?
For example the following code
for i in 1:10
#time my_function(i)
end
displays the wall time of my function my_function, but I would like to store the number of milliseconds in an array instead, in order to display it in a plot showing the evolution of the execution time regarding the parameter i.
The simplest is to use #elapsed, e.g.:
julia> [#elapsed rand(5^i) for i in 1:10]
10-element Vector{Float64}:
3.96e-6
4.64e-7
7.55e-7
3.909e-6
4.43e-6
1.5367e-5
7.0791e-5
0.000402877
0.001831287
0.071062595
and if you use BenchmarkTools.jl then there is also #belapsed macro there for more accurate benchmarking than #elapsed.
EDIT:
#time: is printing the time it took to execute, the number of allocations, and the total number of bytes its execution caused to be allocated, before returning the value of the expression. Any time spent garbage collecting (gc) or compiling is shown as a percentage.
#elapsed: discarding the resulting value, instead returning the number of seconds it took to execute as a floating-point number
I would like to add another example using #elapsed begin to show how it can be used to time multiple lines of code:
dt = #elapsed begin
x = 1
y = 2
z = x^2 + y
print(z)
end
Additionally, if this is not for benchmarking code and you just want time as an output you can alternatively use time():
t = time()
x = 1
y = 2
z = x^2 + y
print(z)
dt = time() - t

(Julia 1.x) Increased allocations by initialising variables?

A programme I am writing has a user-written file containing parameters which are to be read in and implemented within the code. Users should be able to comment their input file by delimiting them with a comment character (I have gone with "#", in convention with Julia) - in parsing the input file, the code will remove these comments. Whilst making minor optimisations to this parser, I noted that instantiating the second variable prior to calling split() made a noticeable difference to the number allocations:
function removecomments1(line::String; dlm::String="#")
str::String = ""
try
str, tmp = split(line, dlm)
catch
str = line
finally
return str
end
end
function removecomments2(line::String; dlm::String="#")
str::String = ""
tmp::SubString{String} = ""
try
str, tmp = split(line, dlm)
catch
str = line
finally
return str
end
end
line = "Hello world # my comment"
#time removecomments1(line)
#time removecomments2(line)
$> 0.016092 seconds (27.31 k allocations: 1.367 MiB)
0.016164 seconds (31.26 k allocations: 1.548 MiB)
My intuition (coming from a C++ background) tells me that initialising both variables should have resulted in an increase in speed as well as minimising further allocations, since the compiler has already been told that a second variable is required as well as its corresponding type, however this doesn't appear to hold. Why would this be the case?
Aside: Are there any more efficient ways of achieving the same result as these functions?
EDIT:
Following a post by Oscar Smith, initialising str as type SubString{String} instead of String has reduced the allocations by around 10%:
$> 0.014811 seconds (24.29 k allocations: 1.246 MiB)
0.015045 seconds (28.25 k allocations: 1.433 MiB)
In your example, the only reason you need the try-catch block is because you're trying to destructure the output of split even though split will return a one element array when the input line has no comments. If you simply extract the first element from the output of split, then you can avoid the try-catch construct, which will save you time and memory:
julia> using BenchmarkTools
julia> removecomments3(line::String; dlm::String = "#") = first(split(line, dlm))
removecomments3 (generic function with 1 method)
julia> #btime removecomments1($line);
198.522 ns (5 allocations: 224 bytes)
julia> #btime removecomments2($line);
208.507 ns (6 allocations: 256 bytes)
julia> #btime removecomments3($line);
147.001 ns (4 allocations: 192 bytes)
In partial answer to your original question, pre-allocation is mainly used for arrays, not for strings or other scalars. For more discussion of when to use pre-allocation, check out this SO post.
To reason about what this is doing, think about what the split function would return if it was written in c++. It would not be copying, but would instead return a char*. As such, all that str::String = "" is doing is making Julia create an extra string object to ignore.

Changing datatype UInt64 to Float in Julia time

I am trying to calculate running time of a function in Julia. For example:
time = tic(); 7^12000000; toc()
I want to get the result as float. Type of "time" is Uint64, can anyone help me to convert it to Float64?
Thanks in advance
The issue is that tic and toc got removed in Julia 1.0 (on 0.7 they work but throw a deprecation warning). What I propose below works on Julia 0.6, 0.7 and 1.0.
You can use:
#elapsed macro from Base which returns time taken in seconds as Float64 (which, in particular, returns compilation time and run time on the first call of the benchmarked function but only runt time on consecutive runs as the called function will already be compiled)
#belapsed macro from BenchmarkTools.jl which returns the same but is more sophisticated (see BenchmarkTools.jl for details, but the main difference is that it runs your function many times and reports minimum observed time)
Here is an example:
julia> #elapsed sum(rand(10^6)) # includes compilation time
0.182671045
julia> #elapsed sum(rand(10^6)) # benchmarked functions are already precompiled
0.007848933
julia> using BenchmarkTools
julia> #belapsed sum(rand(10^6)) # minimum time from many runs
0.006249196
Your question is not clear. tic() and toc() do not exist in Julia. Use the macro #time.
julia> #time Float64(UInt(7^12000))
0.000048 seconds (7 allocations: 208 bytes)
6.871777734182465e18

How to represent a performant heterogenous stack in Julia

I would like to implement a simple concatenative language (aka Joy or Factor) as a DSL in Julia and I am troubled how to optimally represent the stack.
The stack, which represents both data and program code, should be able to hold a sequence of items of different types. In the simplest case Ints, Symbols and, recursively again, stacks (to represent quoted code). The program will then heavily use push! and pop! to shuffle values between different such stacks.
One obvious implementation in Julia, which works but runs rather slow, is to use cell arrays. For example, the following Joy stack [ 1 [ 1 2 +] i + ] (which evaluates to [4]) can be implemented in Julia as
stack = Any[:+,:i,Any[:+,2,1],1]. My typical code then looks like this:
x = pop!(callstack)
if isa(x,Int)
push!(x,datastack)
elseif isa(x,Symbol)
do_stuff(x,datastack)
end
This, however, runs really slow and uses huge memory allocations, probably because such code is not typestable (which is a big performance bottleneck in Julia).
Using C, I would represent the stack compactly as an array (or alternatively as a linked list) of a union:
typedef union Stackelem{
int val;
char *sym;
union Stackelem *quote;
} Stackelem;
Stackelem stack[n];
But how can I achieve such a compact representation of the heterogeneous stack in Julia, and how I avoid the type instability?
This is one way, another way would be to represent args with type Vector{Any}:
julia> immutable Exp
head::Symbol
args::Tuple
end
julia> q = Exp(:+, (1, Exp(:-, (3, 4))))
Exp(:+,(1,Exp(:-,(3,4))))
edit: Another way to represent it might be:
immutable QuoteExp{T} ; vec::Vector{T} ; end
typealias ExpTyp Union{QuoteExp, Int, Symbol}
typealias Exp QuoteExp{ExpTyp}
and then you can do the following:
julia> x = Exp(ExpTyp[:+, 1, 2])
QuoteExp{Union{Int64,QuoteExp{T},Symbol}}(Union{Int64,QuoteExp{T},Symbol}[:+,1,2])
julia> x.vec[1]
:+
julia> x.vec[2]
1
julia> x.vec[3]
2
julia> push!(x.vec,:Scott)
4-element Array{Union{Int64,QuoteExp{T},Symbol},1}:
:+
1
2
:Scott
julia> x.vec[4]
:Scott

Julia #evalpoly macro with varargs

I'm trying to grok using Julia's #evalpoly macro. It works when I supply the coefficients manually, but I've been unable to puzzle out how to provide these via an array
julia> VERSION
v"0.3.5"
julia> #evalpoly 0.5 1 2 3 4
3.25
julia> c = [1, 2, 3, 4]
4-element Array{Int64,1}:
1
2
3
4
julia> #evalpoly 0.5 c
ERROR: BoundsError()
julia> #evalpoly 0.5 c...
ERROR: BoundsError()
julia> #evalpoly(0.5, c...)
ERROR: BoundsError()
Can someone point me in the right direction on this?
Added after seeing the great answers to this question
There is one subtlety to that I hadn't seen until I played with some of these answers. The z argument to #evalpoly can be a variable, but the coefficients are expected to be literals
julia> z = 0.5
0.5
julia> #evalpoly z 1 2 3 4
3.25
julia> #evalpoly z c[1] c[2] c[3] c[4]
ERROR: c not defined
Looking at the output of the expansion of this last command, one can see that it is indeed the case that z is assigned to a variable in the expansion but that the coefficients are inserted literally into the code.
julia> macroexpand(:#evalpoly z c[1] c[2] c[3] c[4])
:(if Base.Math.isa(z,Base.Math.Complex)
#291#t = z
#292#x = Base.Math.real(#291#t)
#293#y = Base.Math.imag(#291#t)
#294#r = Base.Math.+(#292#x,#292#x)
#295#s = Base.Math.+(Base.Math.*(#292#x,#292#x),Base.Math.*(#293#y,#293#y))
#296#a2 = c[4]
#297#a1 = Base.Math.+(c[3],Base.Math.*(#294#r,#296#a2))
#298#a0 = Base.Math.+(Base.Math.-(c[2],Base.Math.*(#295#s,#296#a2)),Base.Math.*(#294#r,#297#a1))
Base.Math.+(Base.Math.*(#298#a0,#291#t),Base.Math.-(c[1],Base.Math.*(#295#s,#297#a1)))
else
#299#t = z
Base.Math.+(Base.Math.c[1],Base.Math.*(#299#t,Base.Math.+(Base.Math.c[2],Base.Math.*(#299#t,Base.Math.+(Base.Math.c[3],Base.Math.*(#299#t,Base.Math.c[4]))))))
end)
I don't believe what you are trying to do is possible, because #evalpoly is a macro - that means it generates code at compile-time. What it is generating is a very efficient implementation of Horner's method (in the real number case), but to do so it needs to know the degree of the polynomial. The length of c isn't known at compile time, so it doesn't (and cannot) work, whereas when you provide the coefficients directly it has everything it needs.
The error message isn't very good though, so if you can, you could file an issue on the Julia Github page?
UPDATE: In response to the update to the question, yes, the first argument can be a variable. You can think of it like this:
function dostuff()
z = 0.0
# Do some stuff to z
# Time to evaluate a polynomial!
y = #evalpoly z 1 2 3 4
return y
end
is becoming
function dostuff()
z = 0.0
# Do some stuff to z
# Time to evaluate a polynomial!
y = z + 2z^2 + 3z^3 + 4z^4
return y
end
except, not that, because its using Horners rule, but whatever. The problem is, it can't generate that expression at compile time without knowing the number of coefficients. But it doesn't need to know what z is at all.
Macros in Julia are applied to their arguments. To make this work, you need to ensure that c is expanded before #evalpoly is evaluated. This works:
function f()
c=[1,2,3,4]
#eval #evalpoly 0.5 $(c...)
end
Here, #eval evaluates its argument, and expands $(c...). Later, #evalpoly sees five arguments.
As written, this is probably not efficient since #eval is called every time the function f is called. You need to move the call to #eval outside the function definition:
c=[1,2,3,4]
#eval begin
function f()
#evalpoly 0.5 $(c...)
end
end
This calls #eval when f is defined. Obviously, c must be known at this time. Whenever f is actually called, c is not used any more; it is only used while f is being defined.
Erik and Iain have done a great job of explaining why #evalpoly doesn't work and how to coerce it into working. If you just want to evaluate the polynomial, however, the easiest solution is probably just to use Polynomials.jl:
julia> using Polynomials
c = [1,2,3,4]
polyval(Poly(c), 0.5)
3.25

Resources