I'm playing around with the Collatz conjecture (see below), and I have this function:
function txpo(max_n)
for n ∈ 1:max_n
x = n
while x ≥ n && x != 1
x = iseven(x) ? x÷2 : 3x+1
end
end
end
On purpose I don't return anything or check anything, to make it as fast as possible. And boy, is it fast... - actually too fast, I think.
using BenchmarkTools
#btime txpo(100_000_000_000_000_000_000_000_000_000_000_000_000)
2.337 ns (0 allocations: 0 bytes)
Running through 10^38 while loops takes only 2ns? Julia is fast, but I would be surprised if it was that fast. It seems like the code is not evaluated. Or what am I missing here?
Collatz Conjecture
Take a whole number and build a series by using the following rule: if the number is even divide by 2; if the number is odd multiply by 3 and add 1. Conjecture: every series will end up in the loop 4-2-1-4.
Indeed, the whole function body is optimized out:
julia> function txpo(max_n)
for n ∈ 1:max_n
x = n
while x ≥ n && x != 1
x = iseven(x) ? x÷2 : 3x+1
end
end
end
txpo (generic function with 1 method)
julia> #code_llvm txpo(100000000000)
; # REPL[1]:1 within `txpo`
define void #julia_txpo_179(i64 signext %0) #0 {
top:
; # REPL[1]:5 within `txpo`
ret void
}
If you use 10^38, this only changes the function signature to accept 128-bit integers.
The LLVM IR code is basically the same as this Julia code:
julia_txpo_179(_0::Int64) = nothing
This makes perfect sense because why execute a function that doesn't return anything that depends on its computations and has no side-effects?
Finally, the native code that your CPU executes is literally retq, a.k.a. "return to caller":
julia> #code_native txpo(100000000000)
.section __TEXT,__text,regular,pure_instructions
; ┌ # REPL[1]:5 within `txpo`
retq
nopw %cs:(%rax,%rax)
; └
julia>
Related
I am taking a course in data-structures and trying to replicate the same in Julia.
I am using the below code in Julia -
function factorial(n)
if n <= 1
return 1
else
return n*factorial(n-1)
end
end
It is working fine with number less than or equal to 20, but for 21 and greater I am getting negative value. Same logic is working fine in Python, below is the code -
def factorial(n):
assert n >= 0 and int(n) == n, 'The number must be positive integer only'
if n <= 1:
return 1
else:
return n*factorial(n-1)
can you pleas help me understand what might be the problem?
As 张实唯 mentioned in the comments, you could pass in a BigInt as an input to calculate larger numbers.
To keep your code type stable, return one(n) instead of 1. This will make sure that whatever type was sent as input will be returned, keeping your code type stable.
julia> function factorial(n)
if n <= 1
return one(n)
else
return factorial(n - 1) * n
end
end
factorial (generic function with 1 method)
Outputs
julia> typeof(factorial(10))
Int64
julia> typeof(factorial(BigInt(10)))
BigInt
julia> typeof(factorial(big"100"))
BigInt
julia> factorial(big"100")
93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000
Alternative one-liner
You could also write the above function as a line-liner, using Julia's ternary operator.
factorial(n) = n <= 1 ? one(n) : factorial(n-1) * n
As explained here:
In Julia, exceeding the maximum representable value of a given type results in a wraparound behavior
And the range for Int64 type (used by default on 64-bit machines to store integers) is:
julia> typemin(Int64)
-9223372036854775808
julia> typemax(Int64)
9223372036854775807
Instead use BigInt that implements arbitrary precision integers. You can convert any integer into BigInt using the big function. The downside of this approach is that the function will be slower:
function factorial(n)
if n <= 1
return big(1)
else
return big(n) * factorial(n-1)
end
end
And now you have:
julia> factorial(21)
51090942171709440000
julia> factorial(100)
93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000
EDIT
The example showing that the code of OP is not type stable:
julia> using Test
julia> function factorial(n)
if n <= 1
return 1
else
return n*factorial(n-1)
end
end
factorial (generic function with 1 method)
julia> #inferred factorial(big"21")
ERROR: return type BigInt does not match inferred return type Union{Int64, BigInt}
I'm trying to build a function that will output an expression to be assigned to a new in-memory function. I might be misinterpreting the capability of metaprogramming but, I'm trying to build a function that generates a math series and assigns it to a function such as:
main.jl
function series(iter)
S = ""
for i in 1:iter
a = "x^$i + "
S = S*a
end
return chop(S, tail=3)
end
So, this will build the pattern and I'm temporarily working with it in the repl:
julia> a = Meta.parse(series(4))
:(x ^ 1 + x ^ 2 + x ^ 3 + x ^ 4)
julia> f =eval(Meta.parse(series(4)))
120
julia> f(x) =eval(Meta.parse(series(4)))
ERROR: cannot define function f; it already has a value
Obviously eval isn't what I'm looking for in this case but, is there another function I can use? Or, is this just not a viable way to accomplish the task in Julia?
The actual error you get has to do nothing with metaprogramming, but with the fact that you are reassigning f, which was assigned a value before:
julia> f = 10
10
julia> f(x) = x + 1
ERROR: cannot define function f; it already has a value
Stacktrace:
[1] top-level scope at none:0
[2] top-level scope at REPL[2]:1
It just doesn't like that. Call either of those variables differently.
Now to the conceptual problem. First, what you do here is not "proper" metaprogramming in Julia: why deal with strings and parsing at all? You can work directly on expressions:
julia> function series(N)
S = Expr(:call, :+)
for i in 1:N
push!(S.args, :(x ^ $i))
end
return S
end
series (generic function with 1 method)
julia> series(3)
:(x ^ 1 + x ^ 2 + x ^ 3)
This makes use of the fact that + belongs to the class of expressions that are automatically collected in repeated applications.
Second, you don't call eval at the appropriate place. I assume you meant to say "give me the function of x, with the body being what series(4) returns". Now, while the following works:
julia> f3(x) = eval(series(4))
f3 (generic function with 1 method)
julia> f3(2)
30
it is not ideal, as you newly compile the body every time the function is called. If you do something like that, it is preferred to expand the code once into the body at function definition:
julia> #eval f2(x) = $(series(4))
f2 (generic function with 1 method)
julia> f2(2)
30
You just need to be careful with hygiene here. All depends on the fact that you know that the generated body is formulated in terms of x, and the function argument matches that. In my opinion, the most Julian way of implementing your idea is through a macro:
julia> macro series(N::Int, x)
S = Expr(:call, :+)
for i in 1:N
push!(S.args, :($x ^ $i))
end
return S
end
#series (macro with 1 method)
julia> #macroexpand #series(4, 2)
:(2 ^ 1 + 2 ^ 2 + 2 ^ 3 + 2 ^ 4)
julia> #series(4, 2)
30
No free variables remaining in the output.
Finally, as has been noted in the comments, there's a function (and corresponding macro) evalpoly in Base which generalizes your use case. Note that this function does not use code generation -- it uses a well-designed generated function, which in combination with the optimizations results in code that is usually equal to the macro-generated code.
Another elegant option would be to use the multiple-dispatch mechanism of Julia and dispatch the generated code on type rather than value.
#generated function series2(p::Val{N}, x) where N
S = Expr(:call, :+)
for i in 1:N
push!(S.args, :(x ^ $i))
end
return S
end
Usage
julia> series2(Val(20), 150.5)
3.5778761722367333e43
julia> series2(Val{20}(), 150.5)
3.5778761722367333e43
This task can be accomplished with comprehensions. I need to RTFM...
https://docs.julialang.org/en/v1/manual/arrays/#Generator-Expressions
I want to overwrite a function in Julia using its old definition. It seems the way to do this would be to clone the function and overwrite the original using the copy — something like the following. However, it appears deepcopy(f) just returns a reference to f, so this doesn't work.
f(x) = x
f_old = deepcopy(f)
f(x) = 1 + f_old(x)
How can I clone a function?
Background: I'm interesting in writing a macro #override that allows me to override functions pointwise (or maybe even piecewise).
fib(n::Int) = fib(n-1) + fib(n-2)
#override fib(0) = 1
#override fib(1) = 1
This particular example would be slow and could be made more efficient using #memoize. There may be good reasons not to do this, but there may also be situations in which one does not know a function fully when it is defined and overriding is necessary.
We can do this using IRTools.jl.
(Note, on newer versions of IRTools, you may need to ask for IRTools.Inner.code_ir instead of IRTools.code_ir.)
using IRTools
fib(n::Int) = fib(n-1) + fib(n-2)
const fib_ir = IRTools.code_ir(fib, Tuple{Int})
const fib_old = IRTools.func(fib_ir)
fib(n::Int) = n < 2 ? 1 : fib_old(fib, n)
julia> fib(10)
89
What we did there was captured the intermediate representation of the function fib, and then rebuilt it into a new function which we called fib_old. Then we were free to overwrite the definition of fib in terms of fib_old! Notice that since fib_old was defined as recursively calling fib, not fib_old, there's no stack overflow when we call fib(10).
The other thing to notice is that when we called fib_old, we wrote fib_old(fib, n) instead of fib_old(n). This is due to how IRTools.func works.
According to Mike Innes on Slack:
In Julia IR, all functions take a hidden extra argument that represents the function itself
The reason for this is that closures are structs with fields, which you need access to in the IR
Here's an implementation of your #override macro with a slightly different syntax:
function _get_type_sig(fdef)
d = splitdef(fdef)
types = []
for arg in d[:args]
if arg isa Symbol
push!(types, :Any)
elseif #capture(arg, x_::T_)
push!(types, T)
else
error("whoops!")
end
end
if isempty(d[:whereparams])
:(Tuple{$(types...)})
else
:((Tuple{$(types...)} where {$(d[:whereparams]...)}).body)
end
end
macro override(cond, fdef)
d = splitdef(fdef)
shadowf = gensym()
sig = _get_type_sig(fdef)
f = d[:name]
quote
const $shadowf = IRTools.func(IRTools.code_ir($(d[:name]), $sig))
function $f($(d[:args]...)) where {$(d[:whereparams]...)}
if $cond
$(d[:body])
else
$shadowf($f, $(d[:args]...))
end
end
end |> esc
end
Now one can type
fib(n::Int) = fib(n-1) + fib(n-2)
#override n < 2 fib(n::Int) = 1
julia> fib(10)
89
The best part is that this is nearly as fast (at runtime, not compile time!) as if we had written the conditions into the original function!
n = 15
fib2(n::Int) = n < 2 ? 1 : fib2(n-1) + fib2(n-2)
julia> #btime fib($(Ref(15))[])
4.239 μs (0 allocations: 0 bytes)
89
julia> #btime fib2($(Ref(15))[])
3.022 μs (0 allocations: 0 bytes)
89
I really don't see why you'd want to do this (there must a better way to get what you want!).
Nonetheless, although not exactly equivalent you can get what you want by using anonymous functions:
julia> f = x->x
#3 (generic function with 1 method)
julia> f_old = deepcopy(f)
#3 (generic function with 1 method)
julia> f = x->1+f_old(x)
#5 (generic function with 1 method)
julia> f(4)
5
I am going to implement a program that uses recursion quite a bit. So, before I started to get stack overflows exceptions, I figured it would be nice to have a trampoline implemented and use thunks in case it was needed.
A first try I did was with factorial. Here the code:
callable(f) = !isempty(methods(f))
function trampoline(f, arg1, arg2)
v = f(arg1, arg2)
while callable(v)
v = v()
end
return v
end
function factorial(n, continuation)
if n == 1
continuation(1)
else
(() -> factorial(n-1, (z -> (() -> continuation(n*z)))))
end
end
function cont(x)
x
end
Also, I implemented a naive factorial to check if, as a matter of fact, I would be preventing stack overflows:
function factorial_overflow(n)
if n == 1
1
else
n*factorial_overflow(n-1)
end
end
The results are:
julia> factorial_overflow(140000)
ERROR: StackOverflowError:
#JITing with a small input
julia> trampoline(factorial, 10, cont)
3628800
#Testing
julia> trampoline(factorial, 140000, cont)
0
So, yes, I am avoiding StacksOverflows. And yes, I know the result is nonsense as I am getting integers overflows, but here I just cared about the stack. A production version of course would have that fixed.
(Also, I know for the factorial case there is a built-in, I wouldn't use either of these, I made them for testing my trampoline).
The trampoline version takes a lot of time when running for the first time, and then it gets quick... when computing the same or lower values.
If I did trampoline(factorial, 150000, cont) I will have some compiling time again.
It seems to me (educated guess) that I am JITing many different signatures for factorial: one for every thunk generated.
My question is: can I avoid this?
I think the problem is that every closure is its own type, which is specialized on the captured variables. To avoid this specialization, one can instead use functors, that are not fully specialized:
struct L1
f
n::Int
z::Int
end
(o::L1)() = o.f(o.n*o.z)
struct L2
f
n::Int
end
(o::L2)(z) = L1(o.f, o.n, z)
struct Factorial
f
c
n::Int
end
(o::Factorial)() = o.f(o.n-1, L2(o.c, o.n))
callable(f) = false
callable(f::Union{Factorial, L1, L2}) = true
function myfactorial(n, continuation)
if n == 1
continuation(1)
else
Factorial(myfactorial, continuation, n)
end
end
function cont(x)
x
end
function trampoline(f, arg1, arg2)
v = f(arg1, arg2)
while callable(v)
v = v()
end
return v
end
Note that the function fields are untyped. Now the function run much faster on the first run:
julia> #time trampoline(myfactorial, 10, cont)
0.020673 seconds (4.24 k allocations: 264.427 KiB)
3628800
julia> #time trampoline(myfactorial, 10, cont)
0.000009 seconds (37 allocations: 1.094 KiB)
3628800
julia> #time trampoline(myfactorial, 14000, cont)
0.001277 seconds (55.55 k allocations: 1.489 MiB)
0
julia> #time trampoline(myfactorial, 14000, cont)
0.001197 seconds (55.55 k allocations: 1.489 MiB)
0
I just translated every closure in your code into a corresponding functor. This might not be needed and probably there are be better solutions, but it works and hopefully demonstrates the approach.
Edit:
To make the reason for the slowdown more clear, one can use:
function factorial(n, continuation)
if n == 1
continuation(1)
else
tmp = (z -> (() -> continuation(n*z)))
#show typeof(tmp)
(() -> factorial(n-1, tmp))
end
end
This outputs:
julia> trampoline(factorial, 10, cont)
typeof(tmp) = ##31#34{Int64,#cont}
typeof(tmp) = ##31#34{Int64,##31#34{Int64,#cont}}
typeof(tmp) = ##31#34{Int64,##31#34{Int64,##31#34{Int64,#cont}}}
typeof(tmp) = ##31#34{Int64,##31#34{Int64,##31#34{Int64,##31#34{Int64,#cont}}}}
typeof(tmp) = ##31#34{Int64,##31#34{Int64,##31#34{Int64,##31#34{Int64,##31#34{Int64,#cont}}}}}
typeof(tmp) = ##31#34{Int64,##31#34{Int64,##31#34{Int64,##31#34{Int64,##31#34{Int64,##31#34{Int64,#cont}}}}}}
typeof(tmp) = ##31#34{Int64,##31#34{Int64,##31#34{Int64,##31#34{Int64,##31#34{Int64,##31#34{Int64,##31#34{Int64,#cont}}}}}}}
typeof(tmp) = ##31#34{Int64,##31#34{Int64,##31#34{Int64,##31#34{Int64,##31#34{Int64,##31#34{Int64,##31#34{Int64,##31#34{Int64,#cont}}}}}}}}
typeof(tmp) = ##31#34{Int64,##31#34{Int64,##31#34{Int64,##31#34{Int64,##31#34{Int64,##31#34{Int64,##31#34{Int64,##31#34{Int64,##31#34{Int64,#cont}}}}}}}}}
3628800
tmp is a closure. Its automatically created type ##31#34 looks similar to
struct Tmp{T,F}
n::T
continuation::F
end
The specialization on the type F of the continuation field is the reason for the long compilation times.
By using L2 instead, which is not specialized on the corresponding field f, the continuation argument to factorial has always the type L2 and the problem is avoided.
Does Julia support the reflection just like java?
What I need is something like this:
str = ARGS[1] # str is a string
# invoke the function str()
The Good Way
The recommended way to do this is to convert the function name to a symbol and then look up that symbol in the appropriate namespace:
julia> fn = "time"
"time"
julia> Symbol(fn)
:time
julia> getfield(Main, Symbol(fn))
time (generic function with 2 methods)
julia> getfield(Main, Symbol(fn))()
1.448981716732318e9
You can change Main here to any module to only look at functions in that module. This lets you constrain the set of functions available to only those available in that module. You can use a "bare module" to create a namespace that has only the functions you populate it with, without importing all name from Base by default.
The Bad Way
A different approach that is not recommended but which many people seem to reach for first is to construct a string for code that calls the function and then parse that string and evaluate it. For example:
julia> eval(parse("$fn()")) # NOT RECOMMENDED
1.464877410113412e9
While this is temptingly simple, it's not recommended since it is slow, brittle and dangerous. Parsing and evaling code is inherently much more complicated and thus slower than doing a name lookup in a module – name lookup is essentially just a hash table lookup. In Julia, where code is just-in-time compiled rather than interpreted, eval is much slower and more expensive since it doesn't just involve parsing, but also generating LLVM code, running optimization passes, emitting machine code, and then finally calling a function. Parsing and evaling a string is also brittle since all intended meaning is discarded when code is turned into text. Suppose, for example, someone accidentally provides an empty function name – then the fact that this code is intended to call a function is completely lost by accidental similarity of syntaxes:
julia> fn = ""
""
julia> eval(parse("$fn()"))
()
Oops. That's not what we wanted at all. In this case the behavior is fairly harmless but it could easily be much worse:
julia> fn = "println(\"rm -rf /important/directory\"); time"
"println(\"rm -rf /important/directory\"); time"
julia> eval(parse("$fn()"))
rm -rf /important/directory
1.448981974309033e9
If the user's input is untrusted, this is a massive security hole. Even if you trust the user, it is still possible for them to accidentally provide input that will do something unexpected and bad. The name lookup approach avoids these issues:
julia> getfield(Main, Symbol(fn))()
ERROR: UndefVarError: println("rm -rf /important/directory"); time not defined
in eval(::Module, ::Any) at ./boot.jl:225
in macro expansion at ./REPL.jl:92 [inlined]
in (::Base.REPL.##1#2{Base.REPL.REPLBackend})() at ./event.jl:46
The intent of looking up a name and then calling it as a function is explicit, instead of implicit in the generated string syntax, so at worst one gets an error about a strange name being undefined.
Performance
If you're going to call a dynamically specified function in an inner loop or as part of some recursive computation, you will want to avoid doing a getfield lookup every time you call the function. In this case all you need to do is make a const binding to the dynamically specified function before defining the iterative/recursive procedure that calls it. For example:
fn = "deg2rad" # converts angles in degrees to radians
const f = getfield(Main, Symbol(fn))
function fast(n)
t = 0.0
for i = 1:n
t += f(i)
end
return t
end
julia> #time fast(10^6) # once for JIT compilation
0.010055 seconds (2.97 k allocations: 142.459 KB)
8.72665498661791e9
julia> #time fast(10^6) # now it's fast
0.003055 seconds (6 allocations: 192 bytes)
8.72665498661791e9
julia> #time fast(10^6) # see?
0.002952 seconds (6 allocations: 192 bytes)
8.72665498661791e9
The binding f must be constant for optimal performance, since otherwise the compiler can't know that you won't change f to point at another function at any time (or even something that's not a function), so it has to emit code that looks f up dynamically on every loop iteration – effectively the same thing as if you manually call getfield in the loop. Here, since f is const, the compiler knows f can't change so it can emit fast code that just calls the right function directly. But the compiler can sometimes do even better than that – in this case it actually inlines the implementation of the deg2rad function, which is just a multiplication by pi/180:
julia> #code_llvm fast(100000)
define double #julia_fast_51089(i64) #0 {
top:
%1 = icmp slt i64 %0, 1
br i1 %1, label %L2, label %if.preheader
if.preheader: ; preds = %top
br label %if
L2.loopexit: ; preds = %if
br label %L2
L2: ; preds = %L2.loopexit, %top
%t.0.lcssa = phi double [ 0.000000e+00, %top ], [ %5, %L2.loopexit ]
ret double %t.0.lcssa
if: ; preds = %if.preheader, %if
%t.04 = phi double [ %5, %if ], [ 0.000000e+00, %if.preheader ]
%"#temp#.03" = phi i64 [ %2, %if ], [ 1, %if.preheader ]
%2 = add i64 %"#temp#.03", 1
%3 = sitofp i64 %"#temp#.03" to double
%4 = fmul double %3, 0x3F91DF46A2529D39 ; deg2rad(x) = x*(pi/180)
%5 = fadd double %t.04, %4
%6 = icmp eq i64 %"#temp#.03", %0
br i1 %6, label %L2.loopexit, label %if
}
If you need to do this with many different dynamically specified functions, then you can even pass the function to be called in as an argument:
function fast(f,n)
t = 0.0
for i = 1:n
t += f(i)
end
return t
end
julia> #time fast(getfield(Main, Symbol(fn)), 10^6)
0.007483 seconds (1.70 k allocations: 76.670 KB)
8.72665498661791e9
julia> #time fast(getfield(Main, Symbol(fn)), 10^6)
0.002908 seconds (6 allocations: 192 bytes)
8.72665498661791e9
This generates the same fast code as single-argument fast above, but will generate a new version for every different function f that you call it with.