Getting the whole AST of the file / complex code - julia

Julia manual states:
Every Julia program starts life as a string:
julia> prog = "1 + 1"
"1 + 1"
I can easily get the AST of the simple expression, or even a function with the help of quote / code_*, or using Meta.parse / Meta.show_sexpr if I have the expression in a string.
The question: Is there any way to get the whole AST of a piece of code, possibly including several atomic expressions? For example, read a source file and convert it to an AST?

If you want to do this from Julia instead of FemtoLisp, you can do
function parse_file(path::AbstractString)
code = read(path, String)
Meta.parse("begin $code end")
end
This takes in a file path, reads it and parses it to a big expression that can be evaluated.
This comes from @NHDaly's answer, here:
https://stackoverflow.com/a/54317201/751061
If you already have your file as a string and don’t want to have to read it again, you can instead do
parse_all(code::AbstractString) = Meta.parse("begin $code end")
It was pointed out on Slack by Nathan Daly and Taine Zhao that this code won't work for modules:
julia> eval(parse_all("module M x = 1 end"))
ERROR: syntax: "module" expression not at top level
Stacktrace:
[1] top-level scope at REPL[50]:1
[2] eval at ./boot.jl:331 [inlined]
[3] eval(::Expr) at ./client.jl:449
[4] |>(::Expr, ::typeof(eval)) at ./operators.jl:823
[5] top-level scope at REPL[50]:1
This can be fixed as follows:
julia> eval_all(ex::Expr) = ex.head == :block ? eval.(ex.args) : eval(ex);
julia> eval_all(parse_all("module M x = 1 end"));
julia> M.x
1
Since the question asker is not convinced that the above code produces a tree, here is a graph representation of the output of parse_all, clearly showing a tree structure.
In case you're curious, those leaves labelled #= none:1 =# are line number nodes, indicating the line on which each following expression takes place.
As suggested in the comments, one can also apply Meta.show_sexpr to an Expr object to get a more "lispy" representation of the AST without all the pretty printing julia does by default:
julia> (Meta.show_sexpr ∘ Meta.parse)("begin x = 1\n y = 2\n z = √(x^2 + y^2)\n end")
(:block,
:(#= none:1 =#),
(:(=), :x, 1),
:(#= none:2 =#),
(:(=), :y, 2),
:(#= none:3 =#),
(:(=), :z, (:call, :√, (:call, :+, (:call, :^, :x, 2), (:call, :^, :y, 2))))
)
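As a possible alternative to wrapping the source in begin ... end, one could parse the top-level expressions one at a time with the two-argument form of Meta.parse, which returns the parsed expression together with the position where parsing should resume. This is only a sketch: the name parse_toplevel is made up for illustration, and collecting the results under a :toplevel head is one choice among several.

```julia
# Hypothetical helper (not part of the answer above): parse every top-level
# expression in `code` and collect the results under a :toplevel head.
function parse_toplevel(code::AbstractString)
    exprs = Any[]
    pos = 1
    while pos <= ncodeunits(code)
        # Meta.parse(str, start) returns (expression, next start position)
        ex, pos = Meta.parse(code, pos)
        # `nothing` means only whitespace or comments were left
        ex === nothing || push!(exprs, ex)
    end
    return Expr(:toplevel, exprs...)
end

ast = parse_toplevel("module Demo\nx = 1\nend\ny = Demo.x + 1\n")
```

Because each argument of a :toplevel expression is evaluated at top level, eval(ast) should accept the module definition without the "not at top level" error shown earlier.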

There's jl-parse-file in the FemtoLisp implementation of the Julia parser. You can call it from the Lisp REPL (julia --lisp), and it returns an S-expression for the whole file. Since Julia's Expr is not much different from Lisp S-expressions, that might be enough for your purposes.
I still wonder how one would access the result of this from within Julia. If I understand correctly, the Lisp functions are not exported from libjulia, so there's no direct way to just use a ccall. But maybe a variant of jl_parse_eval_all can be implemented.

Related

Julia: How to read in and output characters with diacritics?

Processing characters beyond the ASCII range 1-127 can easily crash Julia.
mystring = "A-Za-zÀ-ÿŽž"
for i in 1:length(mystring)
print(i,":::")
print(Int(mystring[i]),"::" )
println( mystring[i] )
end
gives me
1:::65::A
2:::45::-
3:::90::Z
4:::97::a
5:::45::-
6:::122::z
7:::192::À
8:::ERROR: LoadError: StringIndexError("A-Za-zÀ-ÿŽž", 8)
Stacktrace:
[1] string_index_err(::String, ::Int64) at .\strings\string.jl:12
[2] getindex_continued(::String, ::Int64, ::UInt32) at .\strings\string.jl:220
[3] getindex(::String, ::Int64) at .\strings\string.jl:213
[4] top-level scope at R:\_LV\STZ\Web_admin\Languages\Action\Returning\chars.jl:5
[5] include(::String) at .\client.jl:457
[6] top-level scope at REPL[18]:1
It crashes after outputting the first character outside the normal range, rather than during that output, which is mentioned in the answer to String Index Error (Julia)
If declaring the values in Julia one should declare them as Unicode, but I have these characters in my input.
The manual says that Julia looks at the locale, but is there an "everywhere" locale?
Is there some way to handle input and output of these characters in Julia?
I am working on Windows10, but I can switch to Linux if that works better for this.
Use eachindex to get a list of valid indices in your string:
julia> mystring = "A-Za-zÀ-ÿŽž"
"A-Za-zÀ-ÿŽž"
julia> for i in eachindex(mystring)
print(i, ":::")
print(Int(mystring[i]), "::")
println(mystring[i])
end
1:::65::A
2:::45::-
3:::90::Z
4:::97::a
5:::45::-
6:::122::z
7:::192::À
9:::45::-
10:::255::ÿ
12:::381::Ž
14:::382::ž
Your issue is related to the fact that Julia uses byte-indexing of strings, as is explained in the Julia Manual.
For example, the character À takes two bytes; since it sits at index 7, the next valid index is 9, not 8.
In UTF-8, which Julia uses by default, only ASCII characters take one byte; all other characters take 2, 3 or 4 bytes, see https://en.wikipedia.org/wiki/UTF-8#Encoding.
For example for À you get two bytes:
julia> codeunits("À")
2-element Base.CodeUnits{UInt8, String}:
0xc3
0x80
I have also written a post at https://bkamins.github.io/julialang/2020/08/13/strings.html that tries to explain how byte-indexing vs character-indexing works in Julia.
If you have additional questions please comment.
String indices in Julia refer to code units (= bytes for UTF-8), the fixed-width building blocks that are used to encode arbitrary characters (code points). This means that not every index into a String is necessarily a valid index for a character. If you index into a string at such an invalid byte index, an error is thrown.
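To make the difference between characters and code units concrete, here is a small sketch using the string from the question. All functions used are from Base; the variable names are only for illustration.

```julia
s = "A-Za-zÀ-ÿŽž"

nchars = length(s)          # number of characters: 11
nbytes = ncodeunits(s)      # number of UTF-8 code units (bytes): 15

ok_at_7 = isvalid(s, 7)     # true: a character starts at byte 7
ok_at_8 = isvalid(s, 8)     # false: byte 8 is the second byte of À
after_7 = nextind(s, 7)     # 9: the next valid character index after 7

valid = collect(eachindex(s))   # all valid character start indices
```

Checking isvalid(s, i) before indexing is one way to avoid the StringIndexError when an index comes from arithmetic rather than from iteration.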
You can use enumerate to get the value and the number of iteration.
mystring = "A-Za-zÀ-ÿŽž"
for (i, x) in enumerate(mystring)
print(i,":::")
print(Int(x),"::")
println(x)
end
#1:::65::A
#2:::45::-
#3:::90::Z
#4:::97::a
#5:::45::-
#6:::122::z
#7:::192::À
#8:::45::-
#9:::255::ÿ
#10:::381::Ž
#11:::382::ž
In case you need the value and index of the string in bytes you can use pairs.
for (i, x) in pairs(mystring)
print(i,":::")
print(Int(x),"::")
println(x)
end
#1:::65::A
#2:::45::-
#3:::90::Z
#4:::97::a
#5:::45::-
#6:::122::z
#7:::192::À
#9:::45::-
#10:::255::ÿ
#12:::381::Ž
#14:::382::ž
In preparation for de-minimising my MCVE for what I want to do, which involves advancing the string position not just in a for-all loop, I used the information in the post written by Bogumił Kamiński, to come up with this:
mystring = "A-Za-zÀ-ÿŽž"
for i in 1:length(mystring)
print(i,":::")
mychar = mystring[nextind(mystring, 0, i)]
print(Int(mychar), "::")
println( mychar )
end
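If the goal is to advance the position by hand rather than in a plain for loop, a variation on the code above is to keep a byte index and move it with the two-argument nextind, which avoids recounting from the start of the string on every step. The function name walk_chars is hypothetical and only serves this sketch.

```julia
# Hypothetical helper: walk a string by byte index, one character at a time,
# and record each (byte index, character) pair.
function walk_chars(s::AbstractString)
    out = Tuple{Int,Char}[]
    i = firstindex(s)
    while i <= lastindex(s)
        push!(out, (i, s[i]))
        i = nextind(s, i)   # jump to the start of the next character
    end
    return out
end

steps = walk_chars("À-ÿ")
```

Inside the loop the index can be advanced more than once per iteration, for example to skip a character, as long as every move goes through nextind.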

Julia Metaprogramming: Function for Mathematical Series

I'm trying to build a function that will output an expression to be assigned to a new in-memory function. I might be misinterpreting the capability of metaprogramming, but I'm trying to build a function that generates a math series and assigns it to a function such as:
main.jl
function series(iter)
S = ""
for i in 1:iter
a = "x^$i + "
S = S*a
end
return chop(S, tail=3)
end
So, this will build the pattern and I'm temporarily working with it in the repl:
julia> a = Meta.parse(series(4))
:(x ^ 1 + x ^ 2 + x ^ 3 + x ^ 4)
julia> f =eval(Meta.parse(series(4)))
120
julia> f(x) =eval(Meta.parse(series(4)))
ERROR: cannot define function f; it already has a value
Obviously eval isn't what I'm looking for in this case but, is there another function I can use? Or, is this just not a viable way to accomplish the task in Julia?
The actual error you get has nothing to do with metaprogramming, but with the fact that you are reassigning f, which was assigned a value before:
julia> f = 10
10
julia> f(x) = x + 1
ERROR: cannot define function f; it already has a value
Stacktrace:
[1] top-level scope at none:0
[2] top-level scope at REPL[2]:1
Julia does not allow that. Give one of the two a different name.
Now to the conceptual problem. First, what you do here is not "proper" metaprogramming in Julia: why deal with strings and parsing at all? You can work directly on expressions:
julia> function series(N)
S = Expr(:call, :+)
for i in 1:N
push!(S.args, :(x ^ $i))
end
return S
end
series (generic function with 1 method)
julia> series(3)
:(x ^ 1 + x ^ 2 + x ^ 3)
This makes use of the fact that + belongs to the class of expressions that are automatically collected in repeated applications.
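A quick way to see this flattening is to compare how the parser treats a chain of + with a chain of -. The variable names here are only for illustration.

```julia
flat = :(a + b + c)      # one call with three arguments
nested = :(a - b - c)    # two nested binary calls

flat_args = flat.args        # Any[:+, :a, :b, :c]
nested_args = nested.args    # Any[:-, :(a - b), :c]
```

This is why pushing further terms onto the args of a single :+ call produces a valid expression, whereas the same trick with :- would not mean repeated subtraction.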
Second, you don't call eval at the appropriate place. I assume you meant to say "give me the function of x, with the body being what series(4) returns". Now, while the following works:
julia> f3(x) = eval(series(4))
f3 (generic function with 1 method)
julia> f3(2)
30
it is not ideal, as you newly compile the body every time the function is called. If you do something like that, it is preferred to expand the code once into the body at function definition:
julia> @eval f2(x) = $(series(4))
f2 (generic function with 1 method)
julia> f2(2)
30
You just need to be careful with hygiene here. All depends on the fact that you know that the generated body is formulated in terms of x, and the function argument matches that. In my opinion, the most Julian way of implementing your idea is through a macro:
julia> macro series(N::Int, x)
S = Expr(:call, :+)
for i in 1:N
push!(S.args, :($x ^ $i))
end
return S
end
@series (macro with 1 method)
julia> @macroexpand @series(4, 2)
:(2 ^ 1 + 2 ^ 2 + 2 ^ 3 + 2 ^ 4)
julia> @series(4, 2)
30
No free variables remaining in the output.
Finally, as has been noted in the comments, there's a function (and corresponding macro) evalpoly in Base which generalizes your use case. Note that this function does not use code generation -- it uses a well-designed generated function, which in combination with the optimizations results in code that is usually equal to the macro-generated code.
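As a rough illustration of the evalpoly route, the series x + x^2 + ... + x^n is a polynomial with a zero constant term and all other coefficients equal to one. This sketch assumes Julia 1.4 or later, where evalpoly is available in Base; the helper name series_via_evalpoly is made up for this example.

```julia
# Coefficients are listed from the constant term upward:
# 0 + 1*x + 1*x^2 + 1*x^3 + 1*x^4
coeffs = (0, 1, 1, 1, 1)
value = evalpoly(2, coeffs)   # 2 + 4 + 8 + 16 = 30

# Hypothetical convenience wrapper for an arbitrary number of terms
series_via_evalpoly(x, n) = evalpoly(x, (0, ntuple(_ -> 1, n)...))
```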
Another elegant option would be to use the multiple-dispatch mechanism of Julia and dispatch the generated code on type rather than value.
@generated function series2(p::Val{N}, x) where N
S = Expr(:call, :+)
for i in 1:N
push!(S.args, :(x ^ $i))
end
return S
end
Usage
julia> series2(Val(20), 150.5)
3.5778761722367333e43
julia> series2(Val{20}(), 150.5)
3.5778761722367333e43
This task can be accomplished with comprehensions. I need to RTFM...
https://docs.julialang.org/en/v1/manual/arrays/#Generator-Expressions
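For reference, the generator-based version hinted at here could look like the following; the name series_sum is an assumption made for this sketch, not something from the question.

```julia
# Sum of x^1 + x^2 + ... + x^n using a generator expression,
# with no string building, parsing, or eval involved.
series_sum(x, n) = sum(x^i for i in 1:n)

result = series_sum(2, 4)   # 2 + 4 + 8 + 16 = 30
```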

Evaluate expression with local variables

I'm writing a genetic program in order to test the fitness of randomly generated expressions. Shown here is the function to generate the expression as well as the main function. DIV and GT are defined elsewhere in the code:
function create_single_full_tree(depth, fs, ts)
"""
Creates a single AST with full depth
Inputs
depth Current depth of tree. Initially called from main() with max depth
fs Function Set - Array of allowed functions
ts Terminal Set - Array of allowed terminal values
Output
Full AST of typeof()==Expr
"""
# If we are at the bottom
if depth == 1
# End of tree, return function with two terminal nodes
return Expr(:call, fs[rand(1:length(fs))], ts[rand(1:length(ts))], ts[rand(1:length(ts))])
else
# Not end of expression, recursively go back through and create functions for each new node
return Expr(:call, fs[rand(1:length(fs))], create_single_full_tree(depth-1, fs, ts), create_single_full_tree(depth-1, fs, ts))
end
end
function main()
"""
Main function
"""
# Define functional and terminal sets
fs = [:+, :-, :DIV, :GT]
ts = [:x, :v, -1]
# Create the tree
ast = create_single_full_tree(4, fs, ts)
#println(typeof(ast))
#println(ast)
#println(dump(ast))
x = 1
v = 1
eval(ast) # Error out unless x and v are globals
end
main()
I am generating a random expression based on certain allowed functions and variables. As seen in the code, the expression can only have symbols x and v, as well as the value -1. I will need to test the expression with a variety of x and v values; here I am just using x=1 and v=1 to test the code.
The expression is being returned correctly, however, eval() can only be used with global variables, so it will error out when run unless I declare x and v to be global (ERROR: LoadError: UndefVarError: x not defined). I would like to avoid globals if possible. Is there a better way to generate and evaluate these generated expressions with locally defined variables?
Here is an example for generating an (anonymous) function. The result of eval can be called as a function and your variable can be passed as parameters:
myfun = eval(Expr(:->,:x, Expr(:block, Expr(:call,:*,3,:x) )))
myfun(14)
# returns 42
The dump function is very useful to inspect the expression that the parser has created. For two input arguments you would use a tuple, for example, as args[1]:
julia> dump(parse("(x,y) -> 3x + y"))
Expr
head: Symbol ->
args: Array{Any}((2,))
1: Expr
head: Symbol tuple
args: Array{Any}((2,))
1: Symbol x
2: Symbol y
typ: Any
2: Expr
[...]
Does this help?
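Applied to the question's setting, the same idea can be sketched for two arguments: wrap the generated AST in an anonymous function of x and v, eval it once, and then call the result with whatever values are needed. The helper name make_function and the sample AST are assumptions made for illustration. Base.invokelatest is used so that the call also works when it happens inside the same function that did the eval.

```julia
# Hypothetical helper: turn an AST written in terms of x and v into
# a callable two-argument function, compiling it only once.
make_function(ast) = eval(Expr(:->, Expr(:tuple, :x, :v), ast))

# A hand-written stand-in for a randomly generated tree: x + (v - (-1))
ast = Expr(:call, :+, :x, Expr(:call, :-, :v, -1))

f = make_function(ast)
result = Base.invokelatest(f, 1, 1)   # 1 + (1 - (-1)) = 3
```

With this approach x and v are ordinary function arguments, so no globals are needed to test many input values against one tree.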
In the Metaprogramming part of the Julia documentation, there is a sentence under the eval() and effects section which says
Every module has its own eval() function that evaluates expressions in its global scope.
Similarly, the REPL help ?eval will give you, on Julia 0.6.2, the following help:
Evaluate an expression in the given module and return the result. Every Module (except those defined with baremodule) has its own 1-argument definition of eval, which evaluates expressions in that module.
I assume, you are working in the Main module in your example. That's why you need to have the globals defined there. For your problem, you can use macros and interpolate the values of x and y directly inside the macro.
A minimal working example would be:
macro eval_line(a, b, x)
isa(a, Real) || (warn("$a is not a real number."); return :(throw(DomainError())))
isa(b, Real) || (warn("$b is not a real number."); return :(throw(DomainError())))
return :($a * $x + $b) # interpolate the variables
end
Here, the @eval_line macro does the following:
Main> @macroexpand @eval_line(5, 6, 2)
:(5 * 2 + 6)
As you can see, the values of the macro's arguments are interpolated inside the macro and the expression is given to the user accordingly. When the user does not behave,
Main> @macroexpand @eval_line([1,2,3], 7, 8)
WARNING: [1, 2, 3] is not a real number.
:((Main.throw)((Main.DomainError)()))
a user-friendly warning message is provided to the user at parse-time, and a DomainError is thrown at run-time.
Of course, you can do these things within your functions, again by interpolating the variables; you do not need to use macros. However, what you would like to achieve in the end is to combine eval with the output of a function that returns Expr. This is what the macro functionality is for. Finally, you would simply call your macros with an @ sign preceding the macro name:
Main> @eval_line(5, 6, 2)
16
Main> @eval_line([1,2,3], 7, 8)
WARNING: [1, 2, 3] is not a real number.
ERROR: DomainError:
Stacktrace:
[1] eval(::Module, ::Any) at ./boot.jl:235
EDIT 1. You can take this one step further, and create functions accordingly:
macro define_lines(linedefs)
for (name, a, b) in eval(linedefs)
ex = quote
function $(Symbol(name))(x) # interpolate name
return $a * x + $b # interpolate a and b here
end
end
eval(ex) # evaluate the function definition expression in the module
end
end
Then, you can call this macro to create different line definitions in the form of functions to be called later on:
@define_lines([
("identity_line", 1, 0);
("null_line", 0, 0);
("unit_shift", 0, 1)
])
identity_line(5) # returns 5
null_line(5) # returns 0
unit_shift(5) # returns 1
EDIT 2. You can, I guess, achieve what you would like to achieve by using a macro similar to that below:
macro random_oper(depth, fs, ts)
operations = eval(fs)
oper = operations[rand(1:length(operations))]
terminals = eval(ts)
ts = terminals[rand(1:length(terminals), 2)]
ex = :($oper($ts...))
for d in 2:depth
oper = operations[rand(1:length(operations))]
t = terminals[rand(1:length(terminals))]
ex = :($oper($ex, $t))
end
return ex
end
which will give the following, for instance:
Main> @macroexpand @random_oper(1, [+, -, /], [1,2,3])
:((-)([3, 3]...))
Main> @macroexpand @random_oper(2, [+, -, /], [1,2,3])
:((+)((-)([2, 3]...), 3))
Thanks Arda for the thorough response! This helped, but part of me thinks there may be a better way to do this as it seems too roundabout. Since I am writing a genetic program, I will need to create 500 of these ASTs, all with random functions and terminals from a set of allowed functions and terminals (fs and ts in the code). I will also need to test each function with 20 different values of x and v.
In order to accomplish this with the information you have given, I have come up with the following macro:
macro create_function(defs)
for name in eval(defs)
ex = quote
function $(Symbol(name))(x,v)
fs = [:+, :-, :DIV, :GT]
ts = [x,v,-1]
return create_single_full_tree(4, fs, ts)
end
end
eval(ex)
end
end
I can then supply a list of 500 random function names in my main() function, such as ["func1", "func2", "func3", ...], which I can eval with any x and v values in my main function. This has solved my issue; however, this seems to be a very roundabout way of doing this, and may make it difficult to evolve each AST with each iteration.
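One less roundabout possibility, offered only as a sketch, is to skip eval entirely and walk the Expr tree directly, looking up x and v in a dictionary. The bodies of DIV and GT below are placeholders, since the question does not show how they are defined, and the names interpret and FUNCS are likewise made up for this example.

```julia
# Placeholder definitions: the real DIV and GT are not shown in the question.
DIV(a, b) = b == 0 ? 1.0 : a / b
GT(a, b) = a > b ? 1.0 : -1.0

# Map the symbols allowed in the function set to actual functions.
FUNCS = Dict{Symbol,Function}(:+ => (+), :- => (-), :DIV => DIV, :GT => GT)

# Terminals: numbers evaluate to themselves, symbols are looked up in env.
interpret(n::Number, env) = n
interpret(s::Symbol, env) = env[s]

# Function nodes: evaluate the children, then apply the mapped function.
function interpret(ex::Expr, env)
    ex.head == :call || error("unsupported expression head: $(ex.head)")
    f = FUNCS[ex.args[1]]
    return f((interpret(a, env) for a in ex.args[2:end])...)
end

tree = :(x + DIV(v, -1))
value = interpret(tree, Dict(:x => 1, :v => 2))   # 1 + 2 / -1 = -1.0
```

Since the trees stay plain Expr objects, they can still be mutated or recombined between generations, and evaluating one tree on 20 different (x, v) pairs only requires 20 different dictionaries.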

Change Julia prompt to include evaluation numbers

When debugging or running julia code in REPL, I usually see error messages showing ... at ./REPL[161]:12 [inlined].... The number 161 means the 161-th evaluation in REPL, I guess. So my question is could we show this number in julia's prompt, i.e. julia [161]> instead of julia>?
One of the advantages of Julia is its ultra flexibility. This is very easy in Julia 0.7 (nightly version).
julia> repl = Base.active_repl.interface.modes[1]
"Prompt(\"julia> \",...)"
julia> repl.prompt = () -> "julia[$(length(repl.hist.history) - repl.hist.start_idx + 1)] >"
#1 (generic function with 1 method)
julia[3] >
julia[3] >2
2
julia[4] >f = () -> error("e")
#3 (generic function with 1 method)
julia[5] >f()
ERROR: e
Stacktrace:
[1] error at .\error.jl:33 [inlined]
[2] (::getfield(, Symbol("##3#4")))() at .\REPL[4]:1
[3] top-level scope
You just need to put the first two lines into your ~/.juliarc.jl and enjoy.
Since there are several changes in the REPL in Julia 0.7, this code does not work in older versions.
EDIT: Actually, a little more effort is needed to make it work in .juliarc.jl. Try this code:
atreplinit() do repl
repl.interface = Base.REPL.setup_interface(repl)
repl = Base.active_repl.interface.modes[1]
repl.prompt = () -> "julia[$(length(repl.hist.history) - repl.hist.start_idx + 1)] >"
end

Why does julia express this expression in this complex way?

I followed the documentation of julia:
julia> :(a in (1,2,3))
:($(Expr(:in, :a, :((1,2,3)))))
Given that :(a in (1,2,3))==:($(Expr(:in, :a, :((1,2,3))))), why does Julia display this expression in this way? And what exactly does $ mean? It seems to me that $ just evaluates the next expression in a global scope. I found the documentation unclear about this.
The reason :(a in (1,2,3)) is displayed awkwardly as :($(Expr(...))) is that the show function for Expr objects (show_unquoted in show.jl) does not understand the in infix operator and falls back to a generic printing format.
Essentially it is the same as :(1 + 1) except that show_unquoted recognizes + as an infix operator and formats it nicely.
In any case, :(...) and $(...) are inverse operators in some sense, so :($(..thing..)) is exactly like ..thing.., which in this case is Expr(:in,:a,:((1,2,3))).
One can see this weirdness in :(1+1) for example. The output is of Expr type, as typeof(:(1+1))==Expr confirms. It is actually Expr(:+,1,1), but typing Expr(:+,1,1) on the REPL will show :($(Expr(:+,1,1))) - the generic formatting style of Expr typed objects.
Fixing show.jl to handle in could be a nice change. But the issue is harmless and concerns display formatting.
$ is the interpolation operator; Julia uses this notation to interpolate into strings as well as expressions:
julia> a=1;
julia> "test $a" # => "test 1"
julia> :(b+$a) # => :(b + 1)
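To show that $ splices in a value when the quoted expression is built, rather than evaluating anything later, here is a short sketch contrasting an interpolated and a non-interpolated quote. The variable names are only for illustration.

```julia
a = 1
with_interp = :(b + $a)      # the value of `a` is spliced in when the quote is built
without_interp = :(b + a)    # the symbol `a` stays in the expression

# Interpolating a whole sub-expression works the same way
inner = :(x * y)
outer = :(1 + $inner)
```

Changing a afterwards does not affect with_interp, because the expression already contains the literal 1.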
When you type a command in the Julia REPL, it evaluates the command and, if the code does not end with a ; character, prints the result. So this is more about the printing functions, that is, about what is shown in the REPL when a command executes.
So if you want to see the real contents of a variable, one possibility is to use the dump function:
julia> dump(:(a+b))
Expr
head: Symbol call
args: Array(Any,(3,))
1: Symbol +
2: Symbol a
3: Symbol b
typ: Any
julia> dump(:(a in b))
Expr
head: Symbol in
args: Array(Any,(2,))
1: Symbol a
2: Symbol b
typ: Any
It's clear from the tests above that both expressions use the common data structure Expr with head, args and typ, without any $ inside.
Now try to evaluate and print result:
julia> :(a in b)
:($(Expr(:in, :a, :b)))
julia> :(a+b)
:(a + b)
We already know that both commands create the same kind of structure, but the REPL can't show the result of :(a in b) any better than as an Expr wrapped in another Expr, which is why there is a $ inside. But when dealing with :(a+b), the REPL is smarter and understands that this:
Expr
head: Symbol call
args: Array(Any,(3,))
1: Symbol +
2: Symbol a
3: Symbol b
typ: Any
is equal to :(a+b).
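The output above comes from an old Julia version. As a point of comparison, and assuming a Julia 1.x session, the same quotes can be inspected programmatically; there the in form is stored as an ordinary :call expression, just like +. The variable names are only for illustration.

```julia
ex = :(a in (1, 2, 3))

head = ex.head          # :call on Julia 1.x
callee = ex.args[1]     # :in

# Building the Expr by hand gives the same object as quoting
same_plus = :(1 + 1) == Expr(:call, :+, 1, 1)
same_in = :(a in b) == Expr(:call, :in, :a, :b)

# The quoted form evaluates like the plain code would
found = eval(:(2 in (1, 2, 3)))
```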
