julia> s = "abcdefg"
"abcdefg"
julia> s1 = s[3:4]
"cd"
julia> s2 = match(r"c.", s).match
"cd"
julia> typeof(s)
String
julia> typeof(s1)
String
julia> typeof(s2)
SubString{String}
What functionality does SubString enable? It looks like a container. If so, what other types can it hold? If this is useful, why isn't s1 a SubString?
I found this behavior strange when I had to convert s2 into a plain String to pass it to an f(x::String) function. What is the difference between using String(s2) and string(s2) for that conversion?
SubString{String} is just a view into a String. s1 is not a SubString because indexing with s[3:4] calls getindex, which copies, rather than view (just like with arrays).
It is SubString{String} to avoid copying of data in the string, see e.g.:
julia> using BenchmarkTools
julia> x = "a"^1_000_000;
julia> @btime $x[1:end];
36.000 μs (1 allocation: 976.69 KiB)
julia> @btime @view $x[1:end];
23.046 ns (0 allocations: 0 bytes)
which shows how much difference it makes in both allocations and speed.
In general you should avoid writing s[3:4], as it is not safe indexing code (it is only safe if your string is pure ASCII, which you can check with isascii). String indexing in Julia uses byte indices, not character indices.
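A minimal sketch of why byte indexing is unsafe for non-ASCII strings (the string "αβγd" is a made-up example; each Greek letter occupies two bytes in UTF-8):

```julia
s = "αβγd"          # non-ASCII: 'α', 'β', 'γ' take 2 bytes each in UTF-8
isascii(s)          # false, so numeric slicing like s[2:3] is not safe
# s[2:3] would throw StringIndexError: byte 2 is inside 'α'

c = s[1]            # 'α' (byte index 1 is always a valid character start)
i = nextind(s, 1)   # 3: byte index where the second character starts
sub = s[1:i]        # "αβ": slicing with valid byte indices works
first(s, 2)         # "αβ": counts characters, always safe
```

Using nextind/eachindex (or character-count helpers like first and last) keeps the code correct for any UTF-8 input.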
SubString{String} has String parameter, as there are in general other string types than only String:
julia> using InlineStrings
julia> x = InlineString("abcd")
"abcd"
julia> typeof(x)
String7
julia> y = @view x[1:end]
"abcd"
julia> typeof(y)
SubString{String7}
As Antonello notes in a comment, most likely the f function should simply accept AbstractString, and you would not even notice the problem.
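On the String(s2) vs string(s2) part of the question, a small sketch of the difference: String(s2) is the direct type conversion (it copies the viewed bytes into a new String), while string(s2) is the generic "produce a printed representation" function; for a SubString the two happen to give equal results:

```julia
s2 = SubString("abcdefg", 3, 4)   # "cd", a SubString{String}

a = String(s2)   # direct conversion: copies the viewed bytes into a new String
b = string(s2)   # generic: prints the value into a buffer and returns a String

a == b           # true, both are the String "cd"
typeof(a) == typeof(b) == String   # true
```

String(s2) is the idiomatic choice when all you want is the conversion; string is meant for building printed representations of arbitrary values.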
All this is explained in https://docs.julialang.org/en/v1/manual/strings/.
If you want something more hands-on check out for example chapter 6 of https://www.manning.com/books/julia-for-data-analysis (I do not want to do too much self promotion, but your question is one of the standard questions users ask and I explained all these topics in this chapter to address them).
I want to overwrite a function in Julia using its old definition. It seems the way to do this would be to clone the function and overwrite the original using the copy — something like the following. However, it appears deepcopy(f) just returns a reference to f, so this doesn't work.
f(x) = x
f_old = deepcopy(f)
f(x) = 1 + f_old(x)
How can I clone a function?
Background: I'm interested in writing a macro @override that allows me to override functions pointwise (or maybe even piecewise).
fib(n::Int) = fib(n-1) + fib(n-2)
@override fib(0) = 1
@override fib(1) = 1
This particular example would be slow and could be made more efficient using @memoize. There may be good reasons not to do this, but there may also be situations in which one does not know a function fully when it is defined, and overriding is necessary.
We can do this using IRTools.jl.
(Note, on newer versions of IRTools, you may need to ask for IRTools.Inner.code_ir instead of IRTools.code_ir.)
using IRTools
fib(n::Int) = fib(n-1) + fib(n-2)
const fib_ir = IRTools.code_ir(fib, Tuple{Int})
const fib_old = IRTools.func(fib_ir)
fib(n::Int) = n < 2 ? 1 : fib_old(fib, n)
julia> fib(10)
89
What we did there was capture the intermediate representation of the function fib and then rebuild it into a new function, which we called fib_old. Then we were free to overwrite the definition of fib in terms of fib_old! Notice that since fib_old was defined as recursively calling fib, not fib_old, there's no stack overflow when we call fib(10).
The other thing to notice is that when we called fib_old, we wrote fib_old(fib, n) instead of fib_old(n). This is due to how IRTools.func works.
According to Mike Innes on Slack:
In Julia IR, all functions take a hidden extra argument that represents the function itself
The reason for this is that closures are structs with fields, which you need access to in the IR
Here's an implementation of your @override macro with a slightly different syntax:
using MacroTools: splitdef, @capture   # splitdef and @capture come from MacroTools
using IRTools

function _get_type_sig(fdef)
    d = splitdef(fdef)
    types = []
    for arg in d[:args]
        if arg isa Symbol
            push!(types, :Any)
        elseif @capture(arg, x_::T_)
            push!(types, T)
        else
            error("whoops!")
        end
    end
    if isempty(d[:whereparams])
        :(Tuple{$(types...)})
    else
        :((Tuple{$(types...)} where {$(d[:whereparams]...)}).body)
    end
end

macro override(cond, fdef)
    d = splitdef(fdef)
    shadowf = gensym()
    sig = _get_type_sig(fdef)
    f = d[:name]
    quote
        const $shadowf = IRTools.func(IRTools.code_ir($(d[:name]), $sig))
        function $f($(d[:args]...)) where {$(d[:whereparams]...)}
            if $cond
                $(d[:body])
            else
                $shadowf($f, $(d[:args]...))
            end
        end
    end |> esc
end
Now one can type
fib(n::Int) = fib(n-1) + fib(n-2)
@override n < 2 fib(n::Int) = 1
julia> fib(10)
89
The best part is that this is nearly as fast (at runtime, not compile time!) as if we had written the conditions into the original function!
n = 15
fib2(n::Int) = n < 2 ? 1 : fib2(n-1) + fib2(n-2)
julia> @btime fib($(Ref(15))[])
4.239 μs (0 allocations: 0 bytes)
89
julia> @btime fib2($(Ref(15))[])
3.022 μs (0 allocations: 0 bytes)
89
I really don't see why you'd want to do this (there must be a better way to get what you want!).
Nonetheless, although not exactly equivalent you can get what you want by using anonymous functions:
julia> f = x->x
#3 (generic function with 1 method)
julia> f_old = deepcopy(f)
#3 (generic function with 1 method)
julia> f = x->1+f_old(x)
#5 (generic function with 1 method)
julia> f(4)
5
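Along the same lines, a hypothetical alternative sketch: store the function in a Ref, so that "overriding" becomes an ordinary assignment and the old version can be captured explicitly beforehand (the names here are made up for illustration):

```julia
# Keep the current implementation behind a Ref; redefinition is then an
# explicit assignment rather than a method-table change.
f = Ref{Function}(x -> x)
f_old = f[]                 # snapshot the current implementation
f[] = x -> 1 + f_old(x)     # new implementation built on the snapshot
f[](4)                      # 5
```

The cost is an extra dereference (and a dynamic call through the untyped Function), so this trades performance for flexibility.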
Suppose I have
struct X{T} end
and a function dispatching on X, how can I access T inside the function body if it hasn't been specified in the method signature? I.e.
function foo(x::X)
    # can I get T in here?
end
This is a rephrasing of a question from the julialang slack: https://julialang.slack.com/archives/C6A044SQH/p1568651904113000
To get access, simply fill this form: https://slackinvite.julialang.org
The best way to go about this is to define an accessor function:
getparam(::X{T}) where {T} = T
and then one can do
function foo(x::X)
    T = getparam(x)
    ...
end
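Putting the pieces into one runnable snippet (the function name `describe` is made up for the example):

```julia
struct X{T} end

# Accessor that recovers the type parameter via dispatch.
getparam(::X{T}) where {T} = T

function describe(x::X)
    T = getparam(x)
    "parameter is $T"
end

getparam(X{1}())     # 1 (value parameters work too)
describe(X{Int}())   # "parameter is Int64"
```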
So long as you are not running julia through an interpreter, all the type checks should be elided away at compile time. For instance:
julia> foo(x::X) = getparam(x) + 1
foo (generic function with 1 method)
julia> foo(X{1}())
2
julia> @code_llvm foo(X{1}())
;  @ REPL[24]:1 within `foo'
define i64 @julia_foo_19216() {
top:
  ret i64 2
}
julia> @code_llvm foo(X{2}())
;  @ REPL[24]:1 within `foo'
define i64 @julia_foo_19221() {
top:
  ret i64 3
}
As you can see, the compiler was able to figure out that it can just replace the call foo(X{2}()) with 3 at compile time, with no runtime overhead at all.
As a side note, this should serve to demonstrate why type stability is important. If we had done something like foo(X{rand(Int)}()), the compiler wouldn't have access to the type parameter until it reaches foo at runtime, and would then need to compile a specific method for whatever rand(Int) ended up evaluating to, which is very slow:
julia> @btime foo(X{rand(Int)}())
2.305 ms (1962 allocations: 125.49 KiB)
-3712756042116422157
Oof, that is slooooow! For comparison,
julia> bar(x) = x + 1
bar (generic function with 1 method)
julia> @btime bar(rand(Int))
9.746 ns (0 allocations: 0 bytes)
5990190339309662951
I am going to implement a program that uses recursion quite a bit. So, before I start getting StackOverflow exceptions, I figured it would be nice to have a trampoline implemented and to use thunks in case they were needed.
A first try I did was with factorial. Here the code:
callable(f) = !isempty(methods(f))
function trampoline(f, arg1, arg2)
    v = f(arg1, arg2)
    while callable(v)
        v = v()
    end
    return v
end

function factorial(n, continuation)
    if n == 1
        continuation(1)
    else
        (() -> factorial(n-1, (z -> (() -> continuation(n*z)))))
    end
end

function cont(x)
    x
end
Also, I implemented a naive factorial to check if, as a matter of fact, I would be preventing stack overflows:
function factorial_overflow(n)
    if n == 1
        1
    else
        n*factorial_overflow(n-1)
    end
end
The results are:
julia> factorial_overflow(140000)
ERROR: StackOverflowError:
#JITing with a small input
julia> trampoline(factorial, 10, cont)
3628800
#Testing
julia> trampoline(factorial, 140000, cont)
0
So, yes, I am avoiding StackOverflows. And yes, I know the result is nonsense, as I am getting integer overflows, but here I just cared about the stack. A production version would of course have that fixed.
(Also, I know for the factorial case there is a built-in, I wouldn't use either of these, I made them for testing my trampoline).
The trampoline version takes a long time on its first run, and only then gets quick, and only when computing the same or smaller values.
If I ran trampoline(factorial, 150000, cont), I would pay the compilation time again.
It seems to me (an educated guess) that I am JITing many different signatures for factorial: one for every thunk generated.
My question is: can I avoid this?
I think the problem is that every closure is its own type, which is specialized on the captured variables. To avoid this specialization, one can instead use functors that are not fully specialized:
struct L1
    f
    n::Int
    z::Int
end
(o::L1)() = o.f(o.n*o.z)

struct L2
    f
    n::Int
end
(o::L2)(z) = L1(o.f, o.n, z)

struct Factorial
    f
    c
    n::Int
end
(o::Factorial)() = o.f(o.n-1, L2(o.c, o.n))

callable(f) = false
callable(f::Union{Factorial, L1, L2}) = true

function myfactorial(n, continuation)
    if n == 1
        continuation(1)
    else
        Factorial(myfactorial, continuation, n)
    end
end

function cont(x)
    x
end

function trampoline(f, arg1, arg2)
    v = f(arg1, arg2)
    while callable(v)
        v = v()
    end
    return v
end
Note that the function fields are untyped. Now the function runs much faster on the first run:
julia> @time trampoline(myfactorial, 10, cont)
0.020673 seconds (4.24 k allocations: 264.427 KiB)
3628800
julia> @time trampoline(myfactorial, 10, cont)
0.000009 seconds (37 allocations: 1.094 KiB)
3628800
julia> @time trampoline(myfactorial, 14000, cont)
0.001277 seconds (55.55 k allocations: 1.489 MiB)
0
julia> @time trampoline(myfactorial, 14000, cont)
0.001197 seconds (55.55 k allocations: 1.489 MiB)
0
I just translated every closure in your code into a corresponding functor. This might not be needed, and there are probably better solutions, but it works and hopefully demonstrates the approach.
Edit:
To make the reason for the slowdown more clear, one can use:
function factorial(n, continuation)
    if n == 1
        continuation(1)
    else
        tmp = (z -> (() -> continuation(n*z)))
        @show typeof(tmp)
        (() -> factorial(n-1, tmp))
    end
end
This outputs:
julia> trampoline(factorial, 10, cont)
typeof(tmp) = ##31#34{Int64,#cont}
typeof(tmp) = ##31#34{Int64,##31#34{Int64,#cont}}
typeof(tmp) = ##31#34{Int64,##31#34{Int64,##31#34{Int64,#cont}}}
typeof(tmp) = ##31#34{Int64,##31#34{Int64,##31#34{Int64,##31#34{Int64,#cont}}}}
typeof(tmp) = ##31#34{Int64,##31#34{Int64,##31#34{Int64,##31#34{Int64,##31#34{Int64,#cont}}}}}
typeof(tmp) = ##31#34{Int64,##31#34{Int64,##31#34{Int64,##31#34{Int64,##31#34{Int64,##31#34{Int64,#cont}}}}}}
typeof(tmp) = ##31#34{Int64,##31#34{Int64,##31#34{Int64,##31#34{Int64,##31#34{Int64,##31#34{Int64,##31#34{Int64,#cont}}}}}}}
typeof(tmp) = ##31#34{Int64,##31#34{Int64,##31#34{Int64,##31#34{Int64,##31#34{Int64,##31#34{Int64,##31#34{Int64,##31#34{Int64,#cont}}}}}}}}
typeof(tmp) = ##31#34{Int64,##31#34{Int64,##31#34{Int64,##31#34{Int64,##31#34{Int64,##31#34{Int64,##31#34{Int64,##31#34{Int64,##31#34{Int64,#cont}}}}}}}}}
3628800
tmp is a closure. Its automatically created type ##31#34 looks similar to
struct Tmp{T,F}
    n::T
    continuation::F
end
The specialization on the type F of the continuation field is the reason for the long compilation times.
By using L2 instead, which is not specialized on the corresponding field f, the continuation argument to factorial always has the type L2, and the problem is avoided.
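A minimal standalone sketch of that underlying issue (the names make_closure and Wrap are made up for the example): every nesting of a closure gets a brand-new type, specialized on what it captured, while a struct with an untyped field keeps a single fixed type no matter what it wraps.

```julia
make_closure(f, n) = z -> f(n * z)

struct Wrap        # untyped field `f`, analogous to L2 in the answer
    f
    n::Int
end
(w::Wrap)(z) = w.f(w.n * z)

c1 = make_closure(identity, 2)
c2 = make_closure(c1, 3)       # captures c1
typeof(c1) == typeof(c2)       # false: a fresh closure type per nesting level

w1 = Wrap(identity, 2)
w2 = Wrap(w1, 3)               # wraps w1
typeof(w1) == typeof(w2)       # true: always Wrap, nothing to respecialize
```

Each new closure type forces a fresh compilation of everything it flows through, which is exactly the per-thunk JITing the question observed.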
I have pi approximation code very similar to that on the official page:
function piaprox()
    sum = 1.0
    for i = 2:m-1
        sum = sum + (1.0/(i*i))
    end
end

m = parse(Int, ARGS[1])
opak = parse(Int, ARGS[2])
@time for i = 0:opak
    piaprox()
end
When I try to compare the timings of C and Julia, Julia is significantly slower: almost 38 sec for m = 100000000 (the C time is 0.1608328933 sec). Why is this happening?
julia> m = 100000000

julia> function piaprox()
           sum = 1.0
           for i = 2:m-1
               sum = sum + (1.0/(i*i))
           end
       end
piaprox (generic function with 1 method)

julia> @time piaprox()
28.482094 seconds (600.00 M allocations: 10.431 GB, 3.28% gc time)
I would like to mention two very important paragraphs from Performance Tips section of julia documentation:
Avoid global variables: A global variable might have its value, and therefore its type, change at any point. This makes it difficult for the compiler to optimize code using global variables. Variables should be local, or passed as arguments to functions, whenever possible. ...
The macro @code_warntype (or its function variant code_warntype()) can sometimes be helpful in diagnosing type-related problems.
julia> @code_warntype piaprox();
Variables:
  sum::Any
  #s1::Any
  i::Any
It's clear from the @code_warntype output that the compiler could not infer the types of the local variables in piaprox(). So we try declaring types and removing the global variable:
function piaprox(m::Int)
    sum::Float64 = 1.0
    i::Int = 0
    for i = 2:m-1
        sum = sum + (1.0/(i*i))
    end
end
julia> @time piaprox(100000000)
0.009023 seconds (11.10 k allocations: 399.769 KB)
julia> @code_warntype piaprox(100000000);
Variables:
  m::Int64
  sum::Float64
  i::Int64
  #s1::Int64
EDIT
As @user3662120 commented, the super fast timing above is the result of a mistake: without a return value, LLVM can optimize the for loop away entirely. Adding a return line, the @time result would be:
julia> @time piaprox(100000000)
0.746795 seconds (11.11 k allocations: 400.294 KB, 0.45% gc time)
1.644934057834575
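A sketch of the corrected version with the return value in place (renamed piaprox_fixed here to keep it distinct from the definitions above): the loop now cannot be dropped, and the function actually produces the partial sum of the Basel series.

```julia
function piaprox_fixed(m::Int)
    s = 1.0
    for i in 2:m-1
        s += 1.0 / (i * i)
    end
    return s                 # the return makes the loop's work observable
end

# The partial sum converges to pi^2/6 ≈ 1.6449340668 as m grows.
piaprox_fixed(100_000_000)   # ≈ 1.6449340578
```

Note that once m is a function argument rather than a global, the explicit type annotations on sum and i are unnecessary; type inference handles them.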