Julia: How to read in and output characters with diacritics? - julia

Processing ASCII characters beyond the range 1-127 can easily crash Julia.
mystring = "A-Za-zÀ-ÿŽž"
for i in 1:length(mystring)
print(i,":::")
print(Int(mystring[i]),"::" )
println( mystring[i] )
end
gives me
1:::65::A
2:::45::-
3:::90::Z
4:::97::a
5:::45::-
6:::122::z
7:::192::À
8:::ERROR: LoadError: StringIndexError("A-Za-zÀ-ÿŽž", 8)
Stacktrace:
[1] string_index_err(::String, ::Int64) at .\strings\string.jl:12
[2] getindex_continued(::String, ::Int64, ::UInt32) at .\strings\string.jl:220
[3] getindex(::String, ::Int64) at .\strings\string.jl:213
[4] top-level scope at R:\_LV\STZ\Web_admin\Languages\Action\Returning\chars.jl:5
[5] include(::String) at .\client.jl:457
[6] top-level scope at REPL[18]:1
It crashes after outputting the first character outside the normal range, rather than during that output, which is mentioned in the answer to String Index Error (Julia)
If declaring the values in Julia one should declare them as Unicode, but I have these characters in my input.
The manual says that Julia looks at the locale, but is there an "everywhere" locale?
Is there some way to handle input and output of these characters in Julia?
I am working on Windows10, but I can switch to Linux if that works better for this.

Use eachindex to get a list of valid indices in your string:
julia> mystring = "A-Za-zÀ-ÿŽž"
"A-Za-zÀ-ÿŽž"
julia> for i in eachindex(mystring)
print(i, ":::")
print(Int(mystring[i]), "::")
println(mystring[i])
end
1:::65::A
2:::45::-
3:::90::Z
4:::97::a
5:::45::-
6:::122::z
7:::192::À
9:::45::-
10:::255::ÿ
12:::381::Ž
14:::382::ž
Your issue is related to the fact that Julia uses byte-indexing of strings, as is explained in the Julia Manual.
For example character À takes two bytes, therefore, since its location is 7 the next index is 9 not 8.
In UTF-8 encoding which is used by default by Julia only ASCII characters take one byte, all other characters take 2, 3 or 4 bytes, see https://en.wikipedia.org/wiki/UTF-8#Encoding.
For example for À you get two bytes:
julia> codeunits("À")
2-element Base.CodeUnits{UInt8, String}:
0xc3
0x80
I have also written a post at https://bkamins.github.io/julialang/2020/08/13/strings.html that tries to explain how byte-indexing vs character-indexing works in Julia.
If you have additional questions please comment.

String indices in Julia refer to code units (= bytes for UTF-8), the fixed-width building blocks that are used to encode arbitrary characters (code points). This means that not every index into a String is necessarily a valid index for a character. If you index into a string at such an invalid byte index, an error is thrown.
You can use enumerate to get the value and the number of iteration.
mystring = "A-Za-zÀ-ÿŽž"
for (i, x) in enumerate(mystring)
print(i,":::")
print(Int(x),"::")
println(x)
end
#1:::65::A
#2:::45::-
#3:::90::Z
#4:::97::a
#5:::45::-
#6:::122::z
#7:::192::À
#8:::45::-
#9:::255::ÿ
#10:::381::Ž
#11:::382::ž
In case you need the value and index of the string in bytes you can use pairs.
for (i, x) in pairs(mystring)
print(i,":::")
print(Int(x),"::")
println(x)
end
#1:::65::A
#2:::45::-
#3:::90::Z
#4:::97::a
#5:::45::-
#6:::122::z
#7:::192::À
#9:::45::-
#10:::255::ÿ
#12:::381::Ž
#14:::382::ž

In preparation for de-minimising my MCVE for what I want to do, which involves advancing the string position not just in a for-all loop, I used the information in the post written by Bogumił Kamiński, to come up with this:
mystring = "A-Za-zÀ-ÿŽž"
for i in 1:length(mystring)
print(i,":::")
mychar = mystring[nextind(mystring, 0, i)]
print(Int(mychar), "::")
println( mychar )
end

Related

iterating 2D array in Elixir

I am new to Elixir language and I am having some issues while writing a piece of code.
What I am given is a 2D array like
list1 = [
[1 ,2,3,4,"nil"],
[6,7,8,9,10,],
[11,"nil",13,"nil",15],
[16,17,"nil",19,20] ]
Now, what I've to do is to get all the elements that have values between 10 and 20, so what I'm doing is:
final_list = []
Enum.each(list1, fn row ->
Enum.each(row, &(if (&1 >= 10 and &1 <= 99) do final_list = final_list ++ &1 end))
end
)
Doing this, I'm expecting that I'll get my list of numbers in final_list but I'm getting blank final list with a warning like:
warning: variable "final_list" is unused (there is a variable with the same name in the context, use the pin operator (^) to match on it or prefix this variable with underscore if it is not meant to be used)
iex:5
:ok
and upon printing final_list, it is not updated.
When I try to check whether my code is working properly or not, using IO.puts as:
iex(5)> Enum.each(list1, fn row -> ...(5)> Enum.each(row, &(if (&1 >= 10 and &1 <= 99) do IO.puts(final_list ++ &1) end))
...(5)> end
...(5)> )
The Output is:
10
11
13
15
16
17
19
20
:ok
What could I possibly be doing wrong here? Shouldn't it add the elements to the final_list?
If this is wrong ( probably it is), what should be the possible solution to this?
Any kind of help will be appreciated.
As mentioned in Adam's comments, this is a FAQ and the important thing is the message "warning: variable "final_list" is unused (there is a variable with the same name in the context, use the pin operator (^) to match on it or prefix this variable with underscore if it is not meant to be used)" This message actually indicates a very serious problem.
It tells you that the assignment "final_list = final_list ++ &1" is useless since it just creates a local variable, hiding the external one. Elixir variables are not mutable so you need to reorganize seriously your code.
The simplest way is
final_list =
for sublist <- list1,
n <- sublist,
is_number(n),
n in 10..20,
do: n
Note that every time you write final_list = ..., you actually declare a new variable with the same name, so the final_list you declared inside your anonymous function is not the final_list outside the anonymous function.

InexactError: Int64(::Float64)

I am still learning the language Julia and i have this error. I am writing an mosquito population model and i am trying to run my main function a 100 times. This main function uses many other functions to calculate the subpopulation levels.
# Importing KNMI data
xf = XLSX.readxlsx("C:/Scriptie_mosquitoes/knmi_csv.xlsx")
sh = xf["knmi_csv"]
temperature = sh["B3:B368"]
precip = sh["F3:F368"]
subpopulation_amount = 100
imat_list1 = zeros(100,length(temperature))
imat_list = Array{Float64}(imat_list1)
adul_list1 = zeros(100,length(temperature))
adul_list = Array{Float64}(adul_list1)
egg_list1 = zeros(100,length(temperature))
egg_list = Array{Float64}(egg_list1)
diaegg_list1 = zeros(100,length(temperature))
diaegg_list = Array{Float64}(diaegg_list1)
imat_list[1] = 100.0
adul_list[1] = 1000.0
egg_list[1] = 100.0
diaegg_list[1] = 100.0
for counter = 1:1:subpopulation_amount
u = Distributions.Normal()
temp_change = rand(u)
tempa = temperature .+ temp_change
println(tempa)
e = Distributions.Normal()
precip_change = rand(e)
println("hallo", precip_change)
println(counter,tempa,precip,precip_change)
main(counter,tempa::Array{Float64,2},precip::Array{Any,2},precip_change::Float64,imat_list::Array{Float64,2},adul_list::Array{Float64,2},egg_list::Array{Float64,2},diaegg_list::Array{Float64,2})
end
However i get this error which i tried to fix with all the Float64 stuf. I doesn't work unfortunatly. I hope some of you guys see the problem or can help me with understanding the error message.
ERROR: InexactError: Int64(87.39533010546728)
Stacktrace:
[1] Int64 at .\float.jl:710 [inlined]
[2] convert at .\number.jl:7 [inlined]
[3] setindex! at .\array.jl:825 [inlined]
[4] main(::Int64, ::Array{Float64,2}, ::Array{Any,2}, ::Float64, ::Array{Float64,2}, ::Array{Float64,2}, ::Array{Float64,2}, ::Array{Float64,2}) at .\REPL[905]:19
[5] top-level scope at .\REPL[938]:10
You can check the documentation for InexactError by typing ?InexactError:
help?> InexactError
search: InexactError
InexactError(name::Symbol, T, val)
Cannot exactly convert val to type T in a method of function name.
I think that explains it nicely. There is no Int64 that represents the value 87.39533010546728.
You have a variety of options available. Check their help to learn more about them:
julia> trunc(Int, 87.39533010546728)
87
julia> Int(round(87.39533010546728))
87
julia> Int(floor(87.39533010546728))
87
We do not see the code of main. However it seems that you are using values of one of the Arrays that you have as its argument to use for indexing some vector in your code. And since vector indices need to be integers it fails. Most likely some variable is in wrong place in your main - look around [] operators.
When debugging you could also try to change your Arrays to Int elements and see which change causes the problem to stop. E.g. round.(Int, tempa) etc.
The problem is just what it says: you cannot exactly represent a decimal number (87.39) as an integer.
You need to decide what you want to do here - one option is to just round() your decimal number before converting it to an integer.
It's hard to say from the code you posted where exactly the error occurs, but one potentially less obvious way for this to happen is if you try to index into an array (e.g. my_array[i]), and your calculations lead to i having a non-integer value.

Getting the whole AST of the file / complex code

Julia manual states:
Every Julia program starts life as a string:
julia> prog = "1 + 1"
"1 + 1"
I can easily get the AST of the simple expression, or even a function with the help of quote / code_*, or using Meta.parse / Meta.show_sexpr if I have the expression in a string.
The question: Is there any way to get the whole AST of the codepiece, possibly including several atomic expressions? Like, read the source file and convert it to AST?
If you want to do this from Julia instead of FemtoLisp, you can do
function parse_file(path::AbstractString)
code = read(path, String)
Meta.parse("begin $code end")
end
This takes in a file path, reads it and parses it to a big expression that can be evaluated.
This comes from #NHDaly's answer, here:
https://stackoverflow.com/a/54317201/751061
If you already have your file as a string and don’t want to have to read it again, you can instead do
parse_all(code::AbstractString) = Meta.parse("begin $code end")
It was pointed out on Slack by Nathan Daly and Taine Zhao that this code won't work for modules:
julia> eval(parse_all("module M x = 1 end"))
ERROR: syntax: "module" expression not at top level
Stacktrace:
[1] top-level scope at REPL[50]:1
[2] eval at ./boot.jl:331 [inlined]
[3] eval(::Expr) at ./client.jl:449
[4] |>(::Expr, ::typeof(eval)) at ./operators.jl:823
[5] top-level scope at REPL[50]:1
This can be fixed as follows:
julia> eval_all(ex::Expr) = ex.head == :block ? for e in ex eval_all(e) end : eval(e);
julia> eval_all(ex::Expr) = ex.head == :block ? eval.(ex.args) : eval(e);
julia> eval_all(parse_all("module M x = 1 end"));
julia> M.x
1
Since the question asker is not convinced that the above code produces a tree, here is a graph representation of the output of parse_all, clearly showing a tree structure.
In case you're curious, those leaves labelled #= none:1 =# are line number nodes, indicating the line on which each following expression takes place.
As suggested in the comments, one can also apply Meta.show_sexpr to an Expr object to get a more "lispy" representation of the AST without all the pretty printing julia does by default:
julia> (Meta.show_sexpr ∘ Meta.parse)("begin x = 1\n y = 2\n z = √(x^2 + y^2)\n end")
(:block,
:(#= none:1 =#),
(:(=), :x, 1),
:(#= none:2 =#),
(:(=), :y, 2),
:(#= none:3 =#),
(:(=), :z, (:call, :√, (:call, :+, (:call, :^, :x, 2), (:call, :^, :y, 2))))
)
There's jl-parse-file in the FemtoLisp implementation of the Julia parser. You can call it from the Lisp REPL (julia --lisp), and it returns an S-expression for the whole file. Since Julia's Expr is not much different from Lisp S-expressions, that might be enough for you purposes.
I still wonder how one would access the result of this from within Julia. If I understand correctly, the Lisp functions are not exported from libjulia, so there's no direct way to just use a ccall. But maybe a variant of jl_parse_eval_all can be implemented.

Julia convert UInt32 to UInt8 vector

I have the following code and I need to covert several UInt32 variables to UInt8 vectors so then combine them into a single UInt8 vector.
The goal is to take the record I have decoded from a Pcap file and put it into a format that I can append to the end of an existing Pcap file.
The code below takes output from a previous function and returns a hex output of 4 UInt 32's and a vector of UInt8's for the payload.
function pcap_get_record(s::PcapOffline)
rec = PcapRec()
if (!eof(s.file))
rec.ts_sec = s.is_big ? read(s.file, UInt32) : ntoh(read(s.file, UInt32))
rec.ts_usec = s.is_big ? read(s.file, UInt32) : ntoh(read(s.file, UInt32))
rec.incl_len = s.is_big ? read(s.file, UInt32) : ntoh(read(s.file, UInt32))
rec.orig_len = s.is_big ? read(s.file, UInt32) : ntoh(read(s.file, UInt32))
rec.payload = read(s.file, rec.incl_len)
return rec
end
nothing
end
Thanks
Here you are
julia> reinterpret(UInt8, rand(UInt32, 1)) |> Vector
4-element Array{UInt8,1}:
0x4d
0x54
0x34
0xd3
remember to check the byte order.
Update: So I have solved this and I was overthinking what needed to be done.
I just wrote the UInt variable in their raw form and that did the trick.
write(pcap, rec.orig_len) #this is a UInt32
write(pcap, rec.payload) #this is a UInt8 vector
Original:
I was having a hard time making my previous comment readable.
Thanks for the response. I am not however able to get the reinterpret to work with my UInt32 variable.
a = reinterpret(UInt8, rec.ts_usec) |> Vector
ERROR: bitcast: argument size does not match size of target type
Stacktrace:
[1] reinterpret(::Type{UInt8}, ::UInt32) at .\essentials.jl:370
[2] top-level scope at none:0
typeof(rec.ts_usec)
UInt32
after messing around some more I was able to get this to work but this doesn't seem very efficient.
"Edit" I just found that this wont work since it cuts off any leading zeros in the UInt32. example rec.incl_len = 0x00000516 would come out as "516" instead of "00000516" which is needed.
julia> hex(n) = string(n, base = 16, pad = 2)
julia> a = hex2bytes(hex(rec.ts_sec))
4-element Array{UInt8,1}:
0x5b
0x60
0xa3
0xa1

Evaluate expression with local variables

I'm writing a genetic program in order to test the fitness of randomly generated expressions. Shown here is the function to generate the expression as well a the main function. DIV and GT are defined elsewhere in the code:
function create_single_full_tree(depth, fs, ts)
"""
Creates a single AST with full depth
Inputs
depth Current depth of tree. Initially called from main() with max depth
fs Function Set - Array of allowed functions
ts Terminal Set - Array of allowed terminal values
Output
Full AST of typeof()==Expr
"""
# If we are at the bottom
if depth == 1
# End of tree, return function with two terminal nodes
return Expr(:call, fs[rand(1:length(fs))], ts[rand(1:length(ts))], ts[rand(1:length(ts))])
else
# Not end of expression, recurively go back through and create functions for each new node
return Expr(:call, fs[rand(1:length(fs))], create_single_full_tree(depth-1, fs, ts), create_single_full_tree(depth-1, fs, ts))
end
end
function main()
"""
Main function
"""
# Define functional and terminal sets
fs = [:+, :-, :DIV, :GT]
ts = [:x, :v, -1]
# Create the tree
ast = create_single_full_tree(4, fs, ts)
#println(typeof(ast))
#println(ast)
#println(dump(ast))
x = 1
v = 1
eval(ast) # Error out unless x and v are globals
end
main()
I am generating a random expression based on certain allowed functions and variables. As seen in the code, the expression can only have symbols x and v, as well as the value -1. I will need to test the expression with a variety of x and v values; here I am just using x=1 and v=1 to test the code.
The expression is being returned correctly, however, eval() can only be used with global variables, so it will error out when run unless I declare x and v to be global (ERROR: LoadError: UndefVarError: x not defined). I would like to avoid globals if possible. Is there a better way to generate and evaluate these generated expressions with locally defined variables?
Here is an example for generating an (anonymous) function. The result of eval can be called as a function and your variable can be passed as parameters:
myfun = eval(Expr(:->,:x, Expr(:block, Expr(:call,:*,3,:x) )))
myfun(14)
# returns 42
The dump function is very useful to inspect the expression that the parsers has created. For two input arguments you would use a tuple for example as args[1]:
julia> dump(parse("(x,y) -> 3x + y"))
Expr
head: Symbol ->
args: Array{Any}((2,))
1: Expr
head: Symbol tuple
args: Array{Any}((2,))
1: Symbol x
2: Symbol y
typ: Any
2: Expr
[...]
Does this help?
In the Metaprogramming part of the Julia documentation, there is a sentence under the eval() and effects section which says
Every module has its own eval() function that evaluates expressions in its global scope.
Similarly, the REPL help ?eval will give you, on Julia 0.6.2, the following help:
Evaluate an expression in the given module and return the result. Every Module (except those defined with baremodule) has its own 1-argument definition of eval, which evaluates expressions in that module.
I assume, you are working in the Main module in your example. That's why you need to have the globals defined there. For your problem, you can use macros and interpolate the values of x and y directly inside the macro.
A minimal working example would be:
macro eval_line(a, b, x)
isa(a, Real) || (warn("$a is not a real number."); return :(throw(DomainError())))
isa(b, Real) || (warn("$b is not a real number."); return :(throw(DomainError())))
return :($a * $x + $b) # interpolate the variables
end
Here, #eval_line macro does the following:
Main> #macroexpand #eval_line(5, 6, 2)
:(5 * 2 + 6)
As you can see, the values of macro's arguments are interpolated inside the macro and the expression is given to the user accordingly. When the user does not behave,
Main> #macroexpand #eval_line([1,2,3], 7, 8)
WARNING: [1, 2, 3] is not a real number.
:((Main.throw)((Main.DomainError)()))
a user-friendly warning message is provided to the user at parse-time, and a DomainError is thrown at run-time.
Of course, you can do these things within your functions, again by interpolating the variables --- you do not need to use macros. However, what you would like to achieve in the end is to combine eval with the output of a function that returns Expr. This is what the macro functionality is for. Finally, you would simply call your macros with an # sign preceding the macro name:
Main> #eval_line(5, 6, 2)
16
Main> #eval_line([1,2,3], 7, 8)
WARNING: [1, 2, 3] is not a real number.
ERROR: DomainError:
Stacktrace:
[1] eval(::Module, ::Any) at ./boot.jl:235
EDIT 1. You can take this one step further, and create functions accordingly:
macro define_lines(linedefs)
for (name, a, b) in eval(linedefs)
ex = quote
function $(Symbol(name))(x) # interpolate name
return $a * x + $b # interpolate a and b here
end
end
eval(ex) # evaluate the function definition expression in the module
end
end
Then, you can call this macro to create different line definitions in the form of functions to be called later on:
#define_lines([
("identity_line", 1, 0);
("null_line", 0, 0);
("unit_shift", 0, 1)
])
identity_line(5) # returns 5
null_line(5) # returns 0
unit_shift(5) # returns 1
EDIT 2. You can, I guess, achieve what you would like to achieve by using a macro similar to that below:
macro random_oper(depth, fs, ts)
operations = eval(fs)
oper = operations[rand(1:length(operations))]
terminals = eval(ts)
ts = terminals[rand(1:length(terminals), 2)]
ex = :($oper($ts...))
for d in 2:depth
oper = operations[rand(1:length(operations))]
t = terminals[rand(1:length(terminals))]
ex = :($oper($ex, $t))
end
return ex
end
which will give the following, for instance:
Main> #macroexpand #random_oper(1, [+, -, /], [1,2,3])
:((-)([3, 3]...))
Main> #macroexpand #random_oper(2, [+, -, /], [1,2,3])
:((+)((-)([2, 3]...), 3))
Thanks Arda for the thorough response! This helped, but part of me thinks there may be a better way to do this as it seems too roundabout. Since I am writing a genetic program, I will need to create 500 of these ASTs, all with random functions and terminals from a set of allowed functions and terminals (fs and ts in the code). I will also need to test each function with 20 different values of x and v.
In order to accomplish this with the information you have given, I have come up with the following macro:
macro create_function(defs)
for name in eval(defs)
ex = quote
function $(Symbol(name))(x,v)
fs = [:+, :-, :DIV, :GT]
ts = [x,v,-1]
return create_single_full_tree(4, fs, ts)
end
end
eval(ex)
end
end
I can then supply a list of 500 random function names in my main() function, such as ["func1, func2, func3,.....". Which I can eval with any x and v values in my main function. This has solved my issue, however, this seems to be a very roundabout way of doing this, and may make it difficult to evolve each AST with each iteration.

Resources