Parse an array of strings - julia

I have a 1D array of strings (Array{String,1}) that describes a matrix of floats (see below). I need to parse this matrix. Any slick suggestions?
Julia 1.5
MacOS
Yes, I did read this in from a file. I don't want to read the whole thing in using CSV, because I want to keep the option of reading the entire file using memory I/O, which I don't think CSV supports. Plus, I have some complex lines mixing strings and numbers, or strings and strings, that I need to parse, which kind of rules out DelimitedFiles. The columns are separated by two spaces.
julia> lines[24+member_total:idx-1]
49-element Array{String,1}:
"0.0000000E+00 0.0000000E+00 0.0000000E+00 1.3308000E+01"
"0.0000000E+00 0.0000000E+00 1.9987500E-01 1.3308000E+01"
"0.0000000E+00 0.0000000E+00 1.1998650E+00 1.3308000E+01"
"0.0000000E+00 0.0000000E+00 2.1998550E+00 1.3308000E+01"
"0.0000000E+00 0.0000000E+00 3.1998450E+00 1.3308000E+01"
"0.0000000E+00 0.0000000E+00 4.1998350E+00 1.3308000E+01"
⋮
"0.0000000E+00 0.0000000E+00 5.9699895E+01 1.4000000E-01"
"0.0000000E+00 0.0000000E+00 6.0199890E+01 1.0100000E-01"
"0.0000000E+00 0.0000000E+00 6.0699885E+01 6.2000000E-02"
"0.0000000E+00 0.0000000E+00 6.1199880E+01 2.3000000E-02"
"0.0000000E+00 0.0000000E+00 6.1500000E+01 0.0000000E+00"

I strongly advise against reinventing the wheel with custom-made parsers, since such solutions tend to lack practical robustness in production.
If your file is in a single String, use:
using DelimitedFiles
readdlm(IOBuffer(strs))
If your file is a Vector of Strings, use:
cat(readdlm.(IOBuffer.(strsa))...,dims=1)
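A sketch of a variant, assuming the lines are held in a Vector{String}: instead of building one IOBuffer per line and concatenating the results, the lines can be joined with newlines and handed to readdlm once. Passing Float64 as the element type asks for a numeric matrix; the sample strings are taken from the question and written with the two-space separator it describes.

```julia
using DelimitedFiles

# Sample lines from the question, with two spaces between columns
strs = ["0.0000000E+00  0.0000000E+00  0.0000000E+00  1.3308000E+01",
        "0.0000000E+00  0.0000000E+00  1.9987500E-01  1.3308000E+01",
        "0.0000000E+00  0.0000000E+00  1.1998650E+00  1.3308000E+01"]

# Join the lines into one newline-separated string and parse it in a single call.
# With no delimiter argument, readdlm treats one or more whitespace characters
# as a single column separator.
M = readdlm(IOBuffer(join(strs, '\n')), Float64)
```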
Finally, there is no conflict between memory maps and CSV-style parsing, e.g. with readdlm:
using DelimitedFiles, Mmap
s = open("d.txt") # d.txt contains your lines
# to read and write an existing file, open it in "r+" mode ("w+" would truncate it)
m = Mmap.mmap(s, Vector{UInt8}) # memory mapping of your file
readdlm(IOBuffer(m))
At the same time, you can always reset the stream to the beginning and read the data independently of the memory map:
seek(s,0)
readdlm(s)
seek(s,0) # reset the stream
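A self-contained version of the memory-map route, for illustration only: it writes a small temporary file (a stand-in for d.txt, with made-up contents), maps it with Mmap.mmap, and parses the mapped bytes with readdlm.

```julia
using DelimitedFiles, Mmap

# Write a small stand-in for d.txt to a temporary file
path, io = mktemp()
write(io, "0.0  1.5\n2.0  3.25\n")
close(io)

M = open(path) do s
    bytes = Mmap.mmap(s, Vector{UInt8})   # map the whole file into memory
    readdlm(IOBuffer(bytes), Float64)      # parse the mapped bytes as a matrix
end
```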

strs = ["0.0000000E+00 0.0000000E+00 0.0000000E+00 1.3308000E+01",
"0.0000000E+00 0.0000000E+00 1.9987500E-01 1.3308000E+01",
"0.0000000E+00 0.0000000E+00 1.1998650E+00 1.3308000E+01"]
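mapreduce(vcat, strs) do s
    # split(s) splits on runs of whitespace, so double-space separators also work
    (parse.(Float64, split(s)))'  # the adjoint turns each parsed vector into a row
end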
3×4 Array{Float64,2}:
0.0 0.0 0.0 13.308
0.0 0.0 0.199875 13.308
0.0 0.0 1.19986 13.308

I made a workaround. It's not the slickest thing, but it works:
function rmspaces(line)
    line = replace(line, "\t" => " ")
    # collapse every run of spaces into a single space
    while occursin("  ", line)
        line = replace(line, "  " => " ")
    end
    return line
end
function readmatrix(lines, numcolumns::Int64; type=Float64)
    # Collapse runs of whitespace to one space (note: this modifies `lines` in place)
    for i = 1:length(lines)
        lines[i] = rmspaces(lines[i])
    end
    matrix = zeros(type, length(lines), numcolumns)
    for i = 1:length(lines)
        idx = 1  # previous stop, initially the beginning of the line
        spot = 1 # next column to fill
        for j = 1:length(lines[i])
            if lines[i][j] == ' ' && j > 1 # stop at each space
                number = parse(type, lines[i][idx:j]) # from the last stop to this one
                idx = j # remember this stop
                matrix[i, spot] = number
                spot += 1
            end
        end
        if spot < numcolumns + 1 # If there is no space after the last number,
            # it has not been stored yet, so parse it from the last stop to the
            # end of the line. If it was already stored, spot is larger than the
            # number of columns and this step is skipped.
            number = parse(type, lines[i][idx:end])
            matrix[i, spot] = number
        end
    end
    return matrix
end
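For comparison, a minimal sketch of the same job without the character-by-character scan. split(s) called without a delimiter splits on runs of whitespace (spaces or tabs), so rmspaces is not needed. The helper name parse_matrix is hypothetical, and the sketch assumes every line has the same number of fields, raising an error otherwise.

```julia
# Hypothetical helper: parse whitespace-separated numeric lines into a matrix.
function parse_matrix(lines::AbstractVector{<:AbstractString}, ::Type{T}=Float64) where {T<:Real}
    isempty(lines) && return Matrix{T}(undef, 0, 0)
    rows = split.(lines)                  # split(s) with no delimiter splits on whitespace runs
    ncols = length(first(rows))
    M = Matrix{T}(undef, length(rows), ncols)
    for (i, fields) in enumerate(rows)
        length(fields) == ncols ||
            error("line $i has $(length(fields)) fields, expected $ncols")
        for j in 1:ncols
            M[i, j] = parse(T, fields[j])
        end
    end
    return M
end

M = parse_matrix(["0.0000000E+00  1.9987500E-01", "1.0\t2.5"])
```

Unlike readmatrix, this does not modify its input and does not need the column count to be passed in.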

Related

Function that sets an exponent in string in Julia

I am looking for a function that does the following rendering:
f("2") = 2²
f("15") = 2¹⁵
I tried f(s) = "2\^($s)", but this doesn't produce a valid exponent, since I can't TAB-complete it.
You can try e.g.:
julia> function f(s::AbstractString)
codes = Dict(collect("1234567890") .=> collect("¹²³⁴⁵⁶⁷⁸⁹⁰"))
return "2" * map(c -> codes[c], s)
end
f (generic function with 1 method)
julia> f("2")
"2²"
julia> f("15")
"2¹⁵"
(I have not optimized it for speed, but I hope it is fast enough, with the benefit of the code being easy to read.)
This should be a little faster, and uses replace:
function exp2text(x)
    two = '2'
    exponents = ('⁰', '¹', '²', '³', '⁴', '⁵', '⁶', '⁷', '⁸', '⁹')
    # '⁰':'⁹' cannot be used: the superscript digits are not contiguous in Unicode
    sup = replace(x, '0':'9' => i -> exponents[Int(i) - 48 + 1])
    # Int(i) - 48 is the value of digit i ('0' has code point 48); +1 makes it a 1-based index
    return two * sup
end
In this case, I used the fact that replace can accept a Pair{collection,function}, which does the following for each character char of the string:
if char in collection
    replace char with function(char)
end
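The same idea can be written with Char arithmetic instead of the magic number 48: subtracting '0' from a digit character gives its numeric value. The sketch below uses a hypothetical pow_text helper that takes the exponent as an integer; it is an illustration, not a drop-in replacement for either answer.

```julia
# Superscript digits, indexed by digit value + 1. They cannot be written as a
# Char range because they are not contiguous in Unicode.
const SUPERSCRIPT_DIGITS = ('⁰', '¹', '²', '³', '⁴', '⁵', '⁶', '⁷', '⁸', '⁹')

# Hypothetical helper: render base^exponent as text using superscript digits.
function pow_text(base, exponent::Integer)
    exponent >= 0 || throw(ArgumentError("this sketch only handles non-negative exponents"))
    # c - '0' is the digit's numeric value (an Int), so +1 gives a 1-based tuple index
    sup = map(c -> SUPERSCRIPT_DIGITS[c - '0' + 1], string(exponent))
    return string(base) * sup
end
```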

InexactError: Int64(::Float64)

I am still learning Julia and I get this error. I am writing a mosquito population model and am trying to run my main function 100 times. This main function uses many other functions to calculate the subpopulation levels.
import XLSX, Distributions
# Importing KNMI data
xf = XLSX.readxlsx("C:/Scriptie_mosquitoes/knmi_csv.xlsx")
sh = xf["knmi_csv"]
temperature = sh["B3:B368"]
precip = sh["F3:F368"]
subpopulation_amount = 100
imat_list1 = zeros(100,length(temperature))
imat_list = Array{Float64}(imat_list1)
adul_list1 = zeros(100,length(temperature))
adul_list = Array{Float64}(adul_list1)
egg_list1 = zeros(100,length(temperature))
egg_list = Array{Float64}(egg_list1)
diaegg_list1 = zeros(100,length(temperature))
diaegg_list = Array{Float64}(diaegg_list1)
imat_list[1] = 100.0
adul_list[1] = 1000.0
egg_list[1] = 100.0
diaegg_list[1] = 100.0
for counter = 1:1:subpopulation_amount
u = Distributions.Normal()
temp_change = rand(u)
tempa = temperature .+ temp_change
println(tempa)
e = Distributions.Normal()
precip_change = rand(e)
println("hallo", precip_change)
println(counter,tempa,precip,precip_change)
main(counter,tempa::Array{Float64,2},precip::Array{Any,2},precip_change::Float64,imat_list::Array{Float64,2},adul_list::Array{Float64,2},egg_list::Array{Float64,2},diaegg_list::Array{Float64,2})
end
However, I get the following error, which I tried to fix with all the Float64 annotations. Unfortunately, it doesn't work. I hope some of you can see the problem or help me understand the error message.
ERROR: InexactError: Int64(87.39533010546728)
Stacktrace:
[1] Int64 at .\float.jl:710 [inlined]
[2] convert at .\number.jl:7 [inlined]
[3] setindex! at .\array.jl:825 [inlined]
[4] main(::Int64, ::Array{Float64,2}, ::Array{Any,2}, ::Float64, ::Array{Float64,2}, ::Array{Float64,2}, ::Array{Float64,2}, ::Array{Float64,2}) at .\REPL[905]:19
[5] top-level scope at .\REPL[938]:10
You can check the documentation for InexactError by typing ?InexactError:
help?> InexactError
search: InexactError
InexactError(name::Symbol, T, val)
Cannot exactly convert val to type T in a method of function name.
I think that explains it nicely. There is no Int64 that represents the value 87.39533010546728.
You have a variety of options available. Check their help to learn more about them:
julia> trunc(Int, 87.39533010546728)
87
julia> Int(round(87.39533010546728))
87
julia> Int(floor(87.39533010546728))
87
We cannot see the code of main. However, it seems that you are using values from one of the arrays you pass to it as indices into some vector in your code, and since vector indices need to be integers, it fails. Most likely some variable is in the wrong place in your main; look around the [] operators.
When debugging, you could also try changing your arrays to Int elements and seeing which change makes the problem stop, e.g. round.(Int, tempa), etc.
The problem is just what it says: you cannot exactly represent a decimal number (87.39) as an integer.
You need to decide what you want to do here - one option is to just round() your decimal number before converting it to an integer.
It's hard to say from the code you posted where exactly the error occurs, but one potentially less obvious way for this to happen is if you try to index into an array (e.g. my_array[i]), and your calculations lead to i having a non-integer value.
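The stack trace points at setindex! followed by convert (frames [2] and [3]), which is what happens when a Float64 is stored into an array whose element type is Int64. A minimal reproduction, assuming the culprit is some integer array inside main (for example one created with zeros(Int, n) or an integer literal such as [0, 0, 0]; the arrays in the posted code are Float64), with two possible fixes:

```julia
value = 87.39533010546728

counts = zeros(Int, 3)      # element type Int (Int64 on 64-bit systems)
err = try
    counts[1] = value       # setindex! calls convert(Int, value), which cannot be exact
    nothing
catch e
    e
end
println(typeof(err))        # InexactError

# Option 1: store the value in an array whose element type can hold it
rates = zeros(Float64, 3)
rates[1] = value

# Option 2: convert explicitly, choosing the rounding you want
counts[1] = round(Int, value)
```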

For-loop where index does not start at 1 or 0 and is incremented by 30 instead of 1

I have the following:
include("as_mod.jl")
solvetimes = 50:200
timevector = Array{Float64}(undef,length(solvetimes))
for i in solvetimes
global T
T = i
include("as_dat_large.jl")
m, x, z = build_model(true,true)
setsolver(m, GurobiSolver(MIPGap = 2e-2, TimeLimit = 3600))
solve(m)
timevector[i-49] = getsolvetime(m)
end
plot(solvetimes,log.(timevector),
title = "solvetimes vs T", xlabel = "T", ylabel = "log(t)")
This works great as long as my solvetimes range is incremented by 1. However, I'm interested in an increment of 30, and then it obviously does not work, since timevector[i-49] goes out of bounds. Is there any way of solving this issue? I read about and attempted to use the push! function, but to no avail.
I apologize if my question is not good, but I don't see how to improve it. The question is essentially about for loops where the index does not start at 1 and is not incremented by 1 up to an upper bound, but instead starts at a value other than 0 or 1 and uses a step other than 1.
The : syntax in 50:200 or 50:30:200 creates a range object in Julia. These range objects are not only iterable but also implement getindex, which means that you can access the steps of the range with a[index] syntax, as if it were an array.
julia> solvetimes = 50:30:200 # 50, 80, 110, 140, ...
50:30:200
julia> solvetimes[3]
110
You can solve your problem in several ways.
First, you can introduce an itercount variable to count the number of iterations and know at which index of timevector you will put the solve-time.
solvetimes = 50:30:200 # increment by 30
timevector = Vector{Float64}(undef,length(solvetimes))
itercount = 1
for i in solvetimes
    global itercount # itercount is reassigned inside the loop
    ...
    timevector[itercount] = getsolvetime(m)
    itercount += 1
end
Another way would be to create an empty timevector and push! into it.
solvetimes = 50:30:200 # increment by 30
timevector = Float64[] # an empty Float64 vector
for i in solvetimes
...
push!(timevector, getsolvetime(m)) # push the value `getsolvetime(m)` into `timevector`
end
A push! operation may require Julia to allocate memory and copy data as the array grows, so it might not be very efficient, although that does not really matter for your problem.
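If the number of iterations is known in advance, as it is here, sizehint! can reserve capacity up front so that push! rarely needs to reallocate. A sketch, where fake_solve_time is a hypothetical stand-in for building and solving the model:

```julia
solvetimes = 50:30:200
timevector = Float64[]
sizehint!(timevector, length(solvetimes))   # reserve capacity for all iterations

# Hypothetical stand-in for building the model for T and calling getsolvetime
fake_solve_time(T) = T / 10

for T in solvetimes
    push!(timevector, fake_solve_time(T))
end
```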
Another way would be to iterate from 1 to length of solvetimes. Your loop control variable is still incremented one-by-one but now it represents the index in solvetimes rather than the time point.
solvetimes = 50:30:200 # increment by 30
len = length(solvetimes)
timevector = Vector{Float64}(undef, len)
for i in 1:len
global T
T = solvetimes[i]
...
timevector[i] = getsolvetime(m)
end
With these modifications, the kth value in timevector, timevector[k], is the solve time for solvetimes[k].
You might also find other ways to solve the issue, such as using a Dict.
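One more option, sketched with the same hypothetical fake_solve_time stand-in: enumerate yields (index, value) pairs, so no separate counter or index arithmetic is needed, and map can allocate the result vector directly. If an included script reads T as a global, as as_dat_large.jl appears to in the question, the loop would still need to assign it with global T.

```julia
solvetimes = 50:30:200

# Hypothetical stand-in for building the model for T and calling getsolvetime
fake_solve_time(T) = T / 10

# enumerate yields (position, value) pairs: k is a 1-based index into
# timevector while T takes the actual values 50, 80, 110, ...
timevector = Vector{Float64}(undef, length(solvetimes))
for (k, T) in enumerate(solvetimes)
    timevector[k] = fake_solve_time(T)
end

# Alternatively, let map allocate and fill the result vector
timevector2 = map(fake_solve_time, solvetimes)
```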

Julia: invoke a function by a given string

Does Julia support reflection like Java?
What I need is something like this:
str = ARGS[1] # str is a string
# invoke the function str()
The Good Way
The recommended way to do this is to convert the function name to a symbol and then look up that symbol in the appropriate namespace:
julia> fn = "time"
"time"
julia> Symbol(fn)
:time
julia> getfield(Main, Symbol(fn))
time (generic function with 2 methods)
julia> getfield(Main, Symbol(fn))()
1.448981716732318e9
You can change Main here to any module to only look at functions in that module. This lets you constrain the set of callable functions to only those available in that module. You can use a bare module (baremodule) to create a namespace that has only the functions you populate it with, without importing all names from Base by default.
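A sketch of that idea, using a hypothetical Commands module: only names the module exports (checked with names) can be called, so strings such as "eval" or arbitrary code are rejected with an ArgumentError instead of being looked up. Restricting the lookup to exported names is a design choice of this sketch; it excludes the eval and include functions that every ordinary module defines.

```julia
# Hypothetical module: the only place callers may look up functions by name.
module Commands
export greet, add_one
greet() = "hello"
add_one(x) = x + 1
end

# Look up `name` among the names Commands exports and call it with `args`.
# eval and include exist in every module but are not exported, so they are rejected.
function call_command(name::AbstractString, args...)
    sym = Symbol(name)
    sym in names(Commands) || throw(ArgumentError("unknown command: $name"))
    f = getfield(Commands, sym)
    # names() can also list the module's own name, which is not callable
    f isa Function || throw(ArgumentError("not a function: $name"))
    return f(args...)
end
```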
The Bad Way
A different approach that is not recommended but which many people seem to reach for first is to construct a string for code that calls the function and then parse that string and evaluate it. For example:
julia> eval(Meta.parse("$fn()")) # NOT RECOMMENDED
1.464877410113412e9
While this is temptingly simple, it's not recommended since it is slow, brittle and dangerous. Parsing and evaling code is inherently much more complicated and thus slower than doing a name lookup in a module – name lookup is essentially just a hash table lookup. In Julia, where code is just-in-time compiled rather than interpreted, eval is much slower and more expensive since it doesn't just involve parsing, but also generating LLVM code, running optimization passes, emitting machine code, and then finally calling a function. Parsing and evaling a string is also brittle since all intended meaning is discarded when code is turned into text. Suppose, for example, someone accidentally provides an empty function name – then the fact that this code is intended to call a function is completely lost by accidental similarity of syntaxes:
julia> fn = ""
""
julia> eval(Meta.parse("$fn()"))
()
Oops. That's not what we wanted at all. In this case the behavior is fairly harmless, but it could easily be much worse:
julia> fn = "println(\"rm -rf /important/directory\"); time"
"println(\"rm -rf /important/directory\"); time"
julia> eval(Meta.parse("$fn()"))
rm -rf /important/directory
1.448981974309033e9
If the user's input is untrusted, this is a massive security hole. Even if you trust the user, it is still possible for them to accidentally provide input that will do something unexpected and bad. The name lookup approach avoids these issues:
julia> getfield(Main, Symbol(fn))()
ERROR: UndefVarError: println("rm -rf /important/directory"); time not defined
in eval(::Module, ::Any) at ./boot.jl:225
in macro expansion at ./REPL.jl:92 [inlined]
in (::Base.REPL.##1#2{Base.REPL.REPLBackend})() at ./event.jl:46
The intent of looking up a name and then calling it as a function is explicit, instead of implicit in the generated string syntax, so at worst one gets an error about a strange name being undefined.
Performance
If you're going to call a dynamically specified function in an inner loop or as part of some recursive computation, you will want to avoid doing a getfield lookup every time you call the function. In this case all you need to do is make a const binding to the dynamically specified function before defining the iterative/recursive procedure that calls it. For example:
fn = "deg2rad" # converts angles in degrees to radians
const f = getfield(Main, Symbol(fn))
function fast(n)
t = 0.0
for i = 1:n
t += f(i)
end
return t
end
julia> @time fast(10^6) # once for JIT compilation
0.010055 seconds (2.97 k allocations: 142.459 KB)
8.72665498661791e9
julia> @time fast(10^6) # now it's fast
0.003055 seconds (6 allocations: 192 bytes)
8.72665498661791e9
julia> @time fast(10^6) # see?
0.002952 seconds (6 allocations: 192 bytes)
8.72665498661791e9
The binding f must be constant for optimal performance, since otherwise the compiler can't know that you won't change f to point at another function at any time (or even something that's not a function), so it has to emit code that looks f up dynamically on every loop iteration – effectively the same thing as if you manually call getfield in the loop. Here, since f is const, the compiler knows f can't change so it can emit fast code that just calls the right function directly. But the compiler can sometimes do even better than that – in this case it actually inlines the implementation of the deg2rad function, which is just a multiplication by pi/180:
julia> @code_llvm fast(100000)
define double @julia_fast_51089(i64) #0 {
top:
%1 = icmp slt i64 %0, 1
br i1 %1, label %L2, label %if.preheader
if.preheader: ; preds = %top
br label %if
L2.loopexit: ; preds = %if
br label %L2
L2: ; preds = %L2.loopexit, %top
%t.0.lcssa = phi double [ 0.000000e+00, %top ], [ %5, %L2.loopexit ]
ret double %t.0.lcssa
if: ; preds = %if.preheader, %if
%t.04 = phi double [ %5, %if ], [ 0.000000e+00, %if.preheader ]
%"#temp#.03" = phi i64 [ %2, %if ], [ 1, %if.preheader ]
%2 = add i64 %"#temp#.03", 1
%3 = sitofp i64 %"#temp#.03" to double
%4 = fmul double %3, 0x3F91DF46A2529D39 ; deg2rad(x) = x*(pi/180)
%5 = fadd double %t.04, %4
%6 = icmp eq i64 %"#temp#.03", %0
br i1 %6, label %L2.loopexit, label %if
}
If you need to do this with many different dynamically specified functions, then you can even pass the function to be called in as an argument:
function fast(f,n)
t = 0.0
for i = 1:n
t += f(i)
end
return t
end
julia> @time fast(getfield(Main, Symbol(fn)), 10^6)
0.007483 seconds (1.70 k allocations: 76.670 KB)
8.72665498661791e9
julia> @time fast(getfield(Main, Symbol(fn)), 10^6)
0.002908 seconds (6 allocations: 192 bytes)
8.72665498661791e9
This generates the same fast code as the single-argument fast above, but Julia will compile a new version for every different function f that you call it with.

Getting count of occurrences for X in string

I'm looking for a function like Python's
"foobar, bar, foo".count("foo")
I could not find any function that obviously does this. I'm looking for a single function, or something that is not complete overkill.
Julia-1.0 update:
For single-character count within a string (in general, any single-item count within an iterable), one can use Julia's count function:
julia> count(i->(i=='f'), "foobar, bar, foo")
2
(The first argument is a predicate that returns a ::Bool).
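A few more predicate examples in the same style, on the question's string and a couple of made-up inputs; the last line shows that count works on any iterable, not only strings.

```julia
s = "foobar, bar, foo"

n_f      = count(c -> c == 'f', s)           # characters equal to 'f'
n_vowels = count(c -> c in "aeiou", s)       # any character from a set
n_digits = count(isdigit, "a1b22c333")       # a predicate from Base
n_threes = count(x -> x == 3, [1, 3, 3, 5])  # elements of a vector
```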
For the given example, the following one-liner should do:
julia> length(collect(eachmatch(r"foo", "bar foo baz foo")))
2
Julia-1.7 update:
Starting with Julia 1.7, Base.Fix2 can be used, via ==('f') below, to shorten and sweeten the syntax:
julia> count(==('f'), "foobar, bar, foo")
2
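What about a regexp? (matchall was removed in Julia 1.0; on newer versions use length(collect(eachmatch(r"ba", "foobar, bar, foo"))) instead.)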
julia> length(matchall(r"ba", "foobar, bar, foo"))
2
I think that right now the closest built-in thing to what you're after is the length of a split (minus 1). But it's not difficult to specifically create what you're after.
I could see a searchall being generally useful in Julia's Base, similar to matchall. If you don't care about the actual indices, you could just use a counter instead of growing the idxs array.
function searchall(s, t; overlap::Bool=false)
idxfcn = overlap ? first : last
r = findnext(s, t, firstindex(t))
idxs = typeof(r)[] # Or to only count: n = 0
while r !== nothing
push!(idxs, r) # n += 1
r = findnext(s, t, idxfcn(r) + 1)
end
idxs # return n
end
Adding an answer to this which allows for interpolation:
julia> a = ", , ,";
julia> b = ",";
julia> length(collect(eachmatch(Regex(b), a)))
3
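Actually, this solution breaks for some simple cases due to its use of Regex: characters in b such as . or + are interpreted as regex metacharacters rather than matched literally. Instead, one might find this useful:
"""
    count_flags(s::String, flag::String)
Count the non-overlapping occurrences of `flag` in string `s`.
"""
function count_flags(s::String, flag::String)
    isempty(flag) && return 0
    counter = 0
    r = findfirst(flag, s)
    while r !== nothing
        counter += 1
        # continue searching just after the end of this match
        r = findnext(flag, s, nextind(s, last(r)))
    end
    return counter
end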
Sorry to post another answer instead of commenting on the previous one, but I haven't figured out how to format code blocks in comments :)
If you don't like regexps, maybe a tail-recursive function like this one (using the search() base function as Matt suggests; in Julia 1.x it is findfirst):
# `where` is a reserved word in Julia 1.x, so the string argument is named str
function mycount(what::String, str::String)
    function mycountacc(what::String, str::String, acc::Int)
        res = findfirst(what, str) # search(str, what) before Julia 0.7
        res === nothing ? acc : mycountacc(what, str[nextind(str, last(res)):end], acc + 1)
    end
    what == "" ? 0 : mycountacc(what, str, 0)
end
This is simple and fast (and does not overflow the stack):
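function mycount2(str::String, what::String) # `where` is reserved in Julia 1.x
    numfinds = 0
    starting = 1
    while true
        location = findnext(what, str, starting) # search(str, what, starting) before Julia 0.7
        (location === nothing || isempty(location)) && return numfinds
        numfinds += 1
        starting = nextind(str, last(location))
    end
end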
One-liner (Julia 1.3.1):
julia> sum([1 for i = eachmatch(r"foo", "foobar, bar, foo")])
2
Since Julia 1.3, there has been a count method that does exactly this.
count(
pattern::Union{AbstractChar,AbstractString,AbstractPattern},
string::AbstractString;
overlap::Bool = false,
)
Return the number of matches for pattern in string.
This is equivalent to calling length(findall(pattern, string)) but more
efficient.
If overlap=true, the matching sequences are allowed to overlap indices in the
original string, otherwise they must be from disjoint character ranges.
│ Julia 1.3
│
│ This method requires at least Julia 1.3.
julia> count("foo", "foobar, bar, foo")
2
julia> count("ana", "bananarama")
1
julia> count("ana", "bananarama", overlap=true)
2
