How to change max recursion depth in Julia?

I was curious how quick and accurate the algorithm from Rosetta Code ( https://rosettacode.org/wiki/Ackermann_function ) could be for the parameters (4,2), but I got a StackOverflowError.
julia> using Memoize
@memoize ack3(m, n) =
m == 0 ? n + 1 :
n == 0 ? ack3(m-1, 1) :
ack3(m-1, ack3(m, n-1))
# WARNING! Next line has to calculate and print number with 19729 digits!
julia> ack3(4,2) # -> StackOverflowError
# has to be -> 2003529930406846464979072351560255750447825475569751419265016973710894059556311
# ...
# 4717124577965048175856395072895337539755822087777506072339445587895905719156733
EDIT:
Oscar Smith is right that trying ack3(4,2) is unrealistic. This is a version translated from Rosetta Code's C++:
module Ackermann
function ackermann(m::UInt, n::UInt)
function ack(m::UInt, n::BigInt)
if m == 0
return n + 1
elseif m == 1
return n + 2
elseif m == 2
return 3 + 2 * n;
elseif m == 3
return 5 + 8 * (BigInt(2) ^ n - 1)
else
if n == 0
return ack(m - 1, BigInt(1))
else
return ack(m - 1, ack(m, n - 1))
end
end
end
return ack(m, BigInt(n))
end
end
julia> import Ackermann; Ackermann.ackermann(UInt(1), UInt(1)); @time(a4_2 = Ackermann.ackermann(UInt(4), UInt(2))); t = "$a4_2"; println("len = $(length(t)) first_digits=$(t[1:20]) last digits=$(t[end-20:end])")
0.000041 seconds (57 allocations: 33.344 KiB)
len = 19729 first_digits=20035299304068464649 last digits=445587895905719156733

Julia itself does not have an internal limit to the stack size, but your operating system does. The exact limits here (and how to change them) will be system dependent. On my Mac (and I assume other POSIX-y systems), I can check and change the stack size of programs that get called by my shell with ulimit:
$ ulimit -s
8192
$ julia -q
julia> f(x) = x > 0 ? f(x-1) : 0 # a simpler recursive function
f (generic function with 1 method)
julia> f(523918)
0
julia> f(523919)
ERROR: StackOverflowError:
Stacktrace:
[1] f(::Int64) at ./REPL[1]:1 (repeats 80000 times)
$ ulimit -s 16384
$ julia -q
julia> f(x) = x > 0 ? f(x-1) : 0
f (generic function with 1 method)
julia> f(1048206)
0
julia> f(1048207)
ERROR: StackOverflowError:
Stacktrace:
[1] f(::Int64) at ./REPL[1]:1 (repeats 80000 times)
I believe the exact number of recursive calls that will fit on your stack will depend upon both your system and the complexity of the function itself (that is, how much each recursive call needs to store on the stack). This is the bare minimum. I have no idea how big you'd need to make the stack limit in order to compute that Ackermann function.
Note that I doubled the stack size and it more than doubled the number of recursive calls — this is because of a constant overhead:
julia> log2(523918)
18.998981503278365
julia> 2^19 - 523918
370
julia> log2(1048206)
19.99949084151746
julia> 2^20 - 1048206
370
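Besides raising the shell limit with ulimit, a deep recursion can also be run inside a task that reserves its own, larger stack. The following is only a minimal sketch: it assumes that the Task constructor accepts a second positional argument giving the stack size to reserve in bytes, and the 256 MiB figure and the recursion depth are arbitrary values chosen for illustration.

```julia
# Same simple recursive function as above.
f(x) = x > 0 ? f(x - 1) : 0

# Assumption: the second argument of Task is the stack size to reserve, in bytes.
# 256 MiB is an arbitrary illustrative value, not a recommendation.
t = Task(() -> f(2_000_000), 256 * 1024^2)
schedule(t)
result = fetch(t)   # throws if the task failed, e.g. because the stack overflowed
println(result)
```

Whether a given depth fits still depends on how much each call stores on the stack, so the reserved size has to be found by experiment.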

Just FYI, even if you change the max recursion depth, you won't get the right answer, because Julia uses 64-bit integers and integer overflow will make the result wrong. To have any hope of getting the right answer, you will have to use big ints. The next problem is that you probably don't want to memoize, as almost all of the computations are not repeated, and you would be computing the function for more than 10^19729 different inputs, which you really do not want to store.
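To illustrate why big ints are needed, here is a small sketch that unrolls the recursion by hand using the closed form from the m == 3 branch of the translated code in the question, 5 + 8 * (2^n - 1), which equals 2^(n + 3) - 3. The helper name a3 is made up for this example.

```julia
# Closed form of the m == 3 branch: A(3, n) = 5 + 8 * (2^n - 1) = 2^(n + 3) - 3.
# big(2) forces BigInt arithmetic, so nothing wraps around.
a3(n) = big(2)^(n + 3) - 3

# A(4, 0) = A(3, 1) and A(4, n) = A(3, A(4, n - 1)), unrolled by hand.
a40 = a3(1)      # 2^4 - 3 = 13
a41 = a3(a40)    # 2^16 - 3 = 65533
a42 = a3(a41)    # 2^65536 - 3

digits_str = string(a42)
println("len = ", length(digits_str))
println("first digits = ", digits_str[1:20])
```

The length and leading digits printed here can be compared with the output shown in the question's edit.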

Related

What is the inefficiency in this cairo code using alloc_locals

The following code:
func pow4(n) -> (m : felt):
alloc_locals
local x
jmp body if n != 0
[ap] = 0; ap++
ret
body:
x = n * n
[ap] = x * x; ap++
ret
end
func main():
pow4(n=5)
ret
end
is declared inefficient in the doc because of non-continuous memory.
I ran it and could not see any hole in the memory table:
Addr Value
-----------
⋮
###
⋮
1:0 2:0
1:1 3:0
1:2 5
1:3 1:2
1:4 0:21
1:5 25
1:6 625
so I don't understand where the problem is. I do see a hole with n=0 though:
Addr Value
-----------
⋮
###
⋮
1:0 2:0
1:1 3:0
1:2 0
1:3 1:2
1:4 0:21
⋮
1:6 0
that I can fix using:
jmp body if n != 0
x = 0
ret
but it's not what's suggested:
Move the instruction alloc_locals.
or Use tempvar instead of local.
You are correct that the inefficiency is due to the memory hole, and correct that it appears only for n=0. Holes do not cause an inefficiency just by existing, but rather, their existence usually means that an equivalent code could have executed using fewer memory cells in total (in this case, 6 instead of 7, in the memory section "1:"). To make it more efficient, one should aim to remove the hole (so that the parts before and after it become continuous), whereas your suggested solution just fills it (which the prover does anyway). So your solution would still use 7 memory cells in the memory segment; and will in fact also use 2 additional cells in the program section (for the x=0 command), so it is actually less efficient than leaving the hole empty: compile and check!
The inefficiency in the original code arises from the local variable x being assigned a memory cell even in the case n=0, despite not being used. To make it more efficient we would simply not want to declare and allocate x in this case.
This can be done by moving the alloc_locals command inside "body", so that it isn't executed (and the locals aren't allocated) in the n=0 case, as in the first suggestion: this saves one memory cell in the n=0 case and doesn't affect the n!=0 case. Note that alloc_locals does not have to appear right at the beginning of the code.
The second suggestion is to make x a tempvar instead of a local (and remove alloc_locals entirely). This again means no local variable is allocated in the n=0 case, thus again only 6 memory cells will be used instead of 7. And in fact, because the code is so simple, using the tempvar is also more efficient than declaring it as a local, at least when used as tempvar x = n * n, as it merges the ap += 1 command (which is what alloc_locals does in this case) with the x = n * n command, rather than running them separately.
Beyond discussing the theory, you should compile each of these options and compare the lengths of the code and memory segments obtained, and see what is really the most efficient and by how much: this is always true when optimizing, you should always check the actual performance.
So, following @dan-carmon's answer and for the sake of completeness, here is a summary of the possible implementations and their memory table "1" when n = 0 and n > 0:
Addr   n = 0   n = 5
--------------------
1:0    2:0     2:0
1:1    3:0     3:0
1:2    0       5
1:3    1:2     1:2
1:4    0:14    0:14
1:5    0       25
1:6            625
Note however that the implementation using tempvar uses 2 slots less than the others in the program table "0".
Implementations tested
func pow4_reuse_slot(n) -> (m : felt):
alloc_locals
local x
jmp body if n != 0
x = 0
ret
body:
x = n * n
[ap] = x * x; ap++
ret
end
func pow4_alloc_in_body(n) -> (m : felt):
jmp body if n != 0
[ap] = 0; ap++
ret
body:
alloc_locals
local x
x = n * n
[ap] = x * x; ap++
ret
end
func pow4_use_tempvar(n) -> (m : felt):
jmp body if n != 0
[ap] = 0; ap++
ret
body:
tempvar x = n * n
[ap] = x * x
ret
end

Julia for loops slower than while?

I'm playing around with the Julia language, and noticed that the small program I wrote was quite slow.
Suspecting that it was somehow related to the for loops, I rewrote it to use while, and got about 15x faster.
I'm sure there's something I'm doing wrong with the ranges etc., but I can't figure out what.
function primes_for()
num_primes = 0
for a = 2:3000000
prime = true
sa = floor(sqrt(a))
for c in 2:sa
if a % c == 0
prime = false
break
end
end
if prime
num_primes += 1
end
end
println("Number of primes is $num_primes")
end
function primes()
num_primes = 0
a = 2
while a < 3000000
prime = true
c = 2
while c * c <= a
if a % c == 0
prime = false
break
end
c += 1
end
if prime
num_primes += 1
end
a += 1
end
println("Number of primes is $num_primes")
end
@time primes_for()
@time primes()
As explained in the comments by @Vincent Yu and @Kelly Bundy, this is because sa = floor(sqrt(a)) creates a float. Then c becomes a float, and a % c is slow.
You can replace floor(sqrt(a)) with floor(Int, sqrt(a)), or preferably, I think, with isqrt(a), which returns
Integer square root: the largest integer m such that m*m <= n.
This avoids the (unlikely) event that floor(Int, sqrt(a)) may round down too far, which could happen if sqrt(x^2) = x - ε due to floating point errors.
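A quick way to see the type difference described above is to inspect what each variant returns and what element type the inner loop range ends up with. This is just an illustrative sketch; the variable names are made up for the example.

```julia
a = 10

sa_float = floor(sqrt(a))   # floor of a Float64 stays a Float64
sa_int   = isqrt(a)         # integer square root stays an Int

println(typeof(sa_float), " vs ", typeof(sa_int))

# The range used by the inner loop inherits the type of its endpoint,
# so c is a Float64 in the first case and an Int in the second.
println(eltype(2:sa_float), " vs ", eltype(2:sa_int))
```

With a float loop variable, a % c has to go through floating-point remainder, which is the slow path mentioned above.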
Edit: Here's a benchmark to demonstrate (note the use of isqrt):
function primes_for2()
num_primes = 0
for a = 2:3000000
prime = true
# sa = floor(sqrt(a))
sa = isqrt(a)
for c in 2:sa
if a % c == 0
prime = false
break
end
end
if prime
num_primes += 1
end
end
println("Number of primes is $num_primes")
end
1.7.0> @time primes_for()
Number of primes is 216816
6.705099 seconds (15 allocations: 480 bytes)
1.7.0> @time primes_for2()
Number of primes is 216816
0.691304 seconds (15 allocations: 480 bytes)
1.7.0> @time primes()
Number of primes is 216816
0.671784 seconds (15 allocations: 480 bytes)
I can note that each call to isqrt on my computer takes approximately 8ns, and that 3000000 times 8ns is 0.024 seconds. A call to regular sqrt is approximately 1ns.
It's not the for/while that makes the speed difference, it's the sqrt. It doesn't help that sqrt returns float, which promotes all the rest of the code around the sqrt output from integers.
Note that @time measures not only the while and for loops, but also the code outside those loops.
If you are benchmarking code, the rest of your code needs to be the same, and removing the sqrt is one of the prime optimizations in this algorithm. It's also possible to remove the c * c in the test, but this is trickier.

What is the dollar-sign prefix in function arguments used for in Julia?

When I searched about the '$' prefix in Julia, all I could find is that it is for string or expression interpolation. For example, here https://docs.julialang.org/en/v1/base/punctuation/. However, I have seen people's code like
add_broadcast!($y_d, $x_d)
as in this tutorial https://cuda.juliagpu.org/stable/tutorials/introduction/. Here the "$" sign cannot be interpolation, can it? There is nothing about such usage in the functions doc either https://docs.julialang.org/en/v1/manual/functions/. So I am very confused. Any idea is appreciated. Thanks!
The $ sign in expressions like the one you have shown is not standard Julia syntax on its own; it typically appears only in expressions passed to macros. This is exactly the case in your example, where the full line is:
@btime add_broadcast!($y_d, $x_d)
which uses the @btime macro from BenchmarkTools.jl. If you go to the Quick Start section there, you can read:
If the expression you want to benchmark depends on external variables, you should use $ to "interpolate" them into the benchmark expression to avoid the problems of benchmarking with globals. Essentially, any interpolated variable $x or expression $(...) is "pre-computed" before benchmarking begins:
So, in short, with @btime you use $ to "interpolate" external variables into the benchmarked expression in order to get correct benchmark results.
The $ sign is used with macros to interpolate also in other packages, e.g. DataFrameMacros.jl.
EDIT:
An example of how not using $ affects execution time when referencing a non-const global variable:
julia> using BenchmarkTools
julia> x = 1
1
julia> @btime (y = 0; for _ in 1:10^6 y += x end; y) # slow and a lot of allocations
22.102 ms (999489 allocations: 15.25 MiB)
1000000
julia> @btime (y = 0; for _ in 1:10^6 y += $x end; y) # loop is optimized out
5.600 ns (0 allocations: 0 bytes)
1000000
julia> const z = 1
1
julia> @btime (y = 0; for _ in 1:10^6 y += z end; y) # loop is optimized out
5.000 ns (0 allocations: 0 bytes)
You can think of it as follows. In the above example not using $ is as-if you have created and run the following function:
function temp1()
y = 0
for _ in 1:10^6
y += x
end
y
end
And you get:
julia> @btime temp1()
22.106 ms (999489 allocations: 15.25 MiB)
1000000
Using $, on the other hand, is as if you had defined x inside the body of the function, like this:
function temp2()
x = 1
y = 0
for _ in 1:10^6
y += x
end
y
end
and now you have:
julia> @btime temp2()
5.000 ns (0 allocations: 0 bytes)
1000000
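The same kind of splicing can be seen with plain quoted expressions, without any package. This is only an analogy for what a macro receives: BenchmarkTools handles $ in its own way, and the variable names here are made up for the illustration.

```julia
x = 1

ex_plain  = :(y + x)     # x stays a symbol that is looked up when the code runs
ex_interp = :(y + $x)    # the current value of x is spliced into the expression

println(ex_plain)
println(ex_interp)

# The third argument of the call is the symbol :x in one case and the value 1 in the other.
println(ex_plain.args[3], " vs ", ex_interp.args[3])
```

In the interpolated expression there is no longer any reference to the global x, which is why the benchmarked code no longer pays for a non-const global lookup.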

Utilizing ndgrid/meshgrid functionality in Julia

I'm trying to find functionality in Julia similar to MATLAB's meshgrid or ndgrid. I know Julia has defined ndgrid in the examples but when I try to use it I get the following error.
UndefVarError: ndgrid not defined
Anyone know either how to get the builtin ndgrid function to work or possibly another function I haven't found or library that provides these methods (the builtin function would be preferred)? I'd rather not write my own in this case.
Thanks!
We prefer to avoid these functions, since they allocate arrays that usually aren't necessary. The values in these arrays have such a regular structure that they don't need to be stored; they can just be computed during iteration. For example, one alternative approach is to write an array comprehension:
julia> [ 10i + j for i=1:5, j=1:5 ]
5×5 Array{Int64,2}:
11 12 13 14 15
21 22 23 24 25
31 32 33 34 35
41 42 43 44 45
51 52 53 54 55
Or, you can write for loops, or iterate over a product iterator:
julia> collect(Iterators.product(1:2, 3:4))
2×2 Array{Tuple{Int64,Int64},2}:
(1, 3) (1, 4)
(2, 3) (2, 4)
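As a small sketch of computing during iteration rather than storing the grid, the same 10i + j values from the comprehension above can be reduced directly over the product iterator, without calling collect:

```julia
# Sum 10i + j over the 5×5 grid without materializing any array.
total = sum(10i + j for (i, j) in Iterators.product(1:5, 1:5))
println(total)

# For comparison, the same reduction over the materialized comprehension.
total_dense = sum([10i + j for i = 1:5, j = 1:5])
println(total == total_dense)
```

Only the second version allocates the 5×5 array; the first one visits the same index pairs lazily.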
I do find sometimes it's convenient to use some function like meshgrid in numpy. It's easy to do it with list comprehension:
function meshgrid(x, y)
X = [i for i in x, j in 1:length(y)]
Y = [j for i in 1:length(x), j in y]
return X, Y
end
e.g.
x = 1:4
y = 1:3
X, Y = meshgrid(x, y)
now
julia> X
4×3 Array{Int64,2}:
1 1 1
2 2 2
3 3 3
4 4 4
julia> Y
4×3 Array{Int64,2}:
1 2 3
1 2 3
1 2 3
1 2 3
However, I did not find this makes the code run faster than using iteration. Here's what I mean:
After defining
x = 1:1000
y = x
X, Y = meshgrid(x, y)
I did benchmark on the following two functions
using Statistics
function fun1()
return mean(sqrt.(X.*X + Y.*Y))
end
function fun2()
sum = 0.0
for i in 1:1000
for j in 1:1000
sum += sqrt(i*i + j*j)
end
end
return sum / (1000*1000)
end
Here are the benchmark results:
julia> @btime fun1()
8.310 ms (19 allocations: 30.52 MiB)
julia> @btime fun2()
1.671 ms (0 allocations: 0 bytes)
The meshgrid method is both significantly slower and uses more memory. Does any Julia expert know why? I understand that Julia is a compiled language, unlike Python, so iteration won't be slower than vectorization, but I don't understand why the vectorized (array) calculation is many times slower than iteration. (For bigger N the difference is even larger.)
Edit: After reading this post, I have the following updated version of the 'meshgrid' method. The idea is to not create a meshgrid beforehand, but to do it in the calculation via Julia's powerful elementwise array operation:
x = collect(1:1000)
y = x'
function fun1v2()
mean(sqrt.(x .* x .+ y .* y))
end
The trick here is the .+ between a size-M column array and a size-N row array, which returns an M-by-N array. It does the 'meshgrid' for you. This function is nearly 3 times faster than fun1, albeit not as fast as fun2.
julia> @btime fun1v2()
3.189 ms (24 allocations: 7.63 MiB)
765.8435104896155
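To make the shape rule concrete, here is a tiny sketch of the same column-against-row broadcast on small inputs (the names xs, ys and G are made up for the example):

```julia
xs = collect(1:3)     # 3-element column vector
ys = collect(1:4)'    # 1×4 row (adjoint of a column vector)

# Broadcasting expands both operands to a common 3×4 shape,
# so G[i, j] == xs[i] + 10 * ys[j] without ever building X and Y.
G = xs .+ 10 .* ys

println(size(G))
println(G[2, 3])
```

The result array is still allocated, which is consistent with fun1v2 using less memory than fun1 but more than the plain loop in fun2.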
Above, @ChrisRackauckas suggests that the "proper way" to do this is with a lazy operator, but he hadn't gotten around to it.
There is now a registered package with a lazy ndgrid in it:
https://github.com/JuliaArrays/LazyGrids.jl
It is more general than the version in
VectorizedRoutines.jl
because it can handle vectors with different types, e.g.,
ndgrid(1:3, Float16[0:2], ["x", "y", "z"]).
There are Literate.jl examples in the docs that show the lazy performance is pretty good.
Of course lazy meshgrid is just one step away:
meshgrid(y,x) = (ndgrid_lazy(x,y)[[2,1]]...,)

Understanding Julia Int overflow behaviour

Coming from a Python / Matlab background, I'd like to understand better how Julia's Int64 overflow behaviour works.
From the documentation:
In Julia, exceeding the maximum representable value of a given type
results in a wraparound behavior.
julia> x = typemax(Int64)
9223372036854775807
julia> x + 1
-9223372036854775808
Now, I did some experiments with numbers obviously larger than typemax(Int64), but the behaviour I see isn't consistent with the documentation. It seems like things don't always just wrap around. Is only a single wraparound allowed?
julia> x = (10^10)^(10^10)
0
julia> x = 10^10^10^10
1 # ??
julia> x = 10^10^10^10^10
10 # I'd expect it to be 1? 1^10 == 1?
julia> x = 10^10^10^10^10^10
10000000000 # But here 10^10 == 10000000000, so now it works?
julia> typemax(Int64) > 10^19
true
julia> typemax(Int64) > -10^19
true
Can anyone shed light on the behaviour I am seeing?
EDIT:
Why does 9 overflow correctly, and 10 doesn't?
julia> 9^(10^14)
-1193713557845704703
julia> 9^(10^15)
4900281449122627585
julia> 10^(10^2)
0
julia> 10^(10^3)
0
Julia 0.5.0 (2016-09-19)
What you are seeing is the result of the PEMDAS order of operations, specifically the parentheses-before-exponentiation part. Because ^ is right-associative, a chain such as 10^10^10 is parsed as 10^(10^10), so these expressions are effectively solved from right to left.
julia> 10^(10^10) #happens to overflow to 0
0
julia> 10^(10^(10^10)) # same as 10 ^ 0
1
julia> 10^(10^(10^(10^(10^10)))) # same as x = 10^(10^(10^(10^(10000000000)))) -> 10^(10^(10^(0))) -> 10^(10^(1)) -> 10^ 10
10000000000
So it's really just a matter of working through the arithmetic. Or realizing you are going to have such large operations that you start using BigInt from the outset.
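The pieces of that arithmetic can be checked one at a time. The sketch below also touches on the 9 versus 10 question from the edit: 10^64 equals 2^64 * 5^64, an exact multiple of 2^64, so any power of 10 with an exponent of 64 or more wraps to exactly 0 in Int64, while powers of 9 are odd and can never wrap to 0.

```julia
# ^ is right-associative, so the chain groups from the right.
println(10^10^10 == 10^(10^10))

# 10^64 is a multiple of 2^64 and wraps to 0; 10^63 is not.
println(10^64)
println(10^63 == typemin(Int64))

# Powers of 9 stay odd under wraparound, so they never collapse to 0.
println(isodd(9^(10^15)))

# Starting from a BigInt avoids the wraparound entirely.
println(length(string(big(10)^100)))
```

Once one link of the chain has wrapped to 0, everything to its left is just small-integer arithmetic, which is where the results 1, 10 and 10000000000 come from.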
