How to convert from char to string in Julia?

I'm trying to convert a word to an array of one-hot encoded arrays using a simple vocabulary. The dictionary I've constructed is keyed off of characters.
vocab = "abc"
char_id = Dict(char => index for (index, char) in enumerate(vocab))
# Dict{Char,Int64} with 3 entries:
# 'a' => 1
# 'c' => 3
# 'b' => 2
function char_to_one_hot(char, char_id, max_length)
one_hot = zeros(max_length)
setindex!(one_hot, 1.0, char_id[char])
end
function word_to_one_hot(word, char_id, max_length)
map((char) -> char_to_one_hot(char, char_id, max_length), split(word, ""))
end
word_to_one_hot(word, char_id, max_length)
Unfortunately, this returns an error because the char_id Dict uses Char keys instead of strings. How can I convert either the dictionary to use strings as keys, or chars to strings, so that the lookup matches?
ERROR: KeyError: key "a" not found
Stacktrace:
[1] getindex at ./dict.jl:467 [inlined]
[2] char_to_one_hot(::SubString{String}, ::Dict{Char,Int64}, ::Int64) at ./REPL[456]:3
[3] #78 at ./REPL[457]:2 [inlined]
[4] iterate at ./generator.jl:47 [inlined]
[5] _collect(::Array{SubString{String},1}, ::Base.Generator{Array{SubString{String},1},var"#78#79"{Dict{Char,Int64},Int64}}, ::Base.EltypeUnknown, ::Base.HasShape{1}) at ./array.jl:699
[6] collect_similar at ./array.jl:628 [inlined]
[7] map at ./abstractarray.jl:2162 [inlined]
[8] word_to_one_hot(::String, ::Dict{Char,Int64}, ::Int64) at ./REPL[457]:2
[9] top-level scope at REPL[458]:1

A string can already be seen as a collection of characters, so you shouldn't need to split the word.
However, map is specialized in such a way that on strings you can only map functions which return chars. And strings are also treated as scalars by the broadcasting system. This leaves us with a few options: a simple for loop or maybe a generator/comprehension.
I think in this case I'd go with the comprehension:
function char_to_one_hot(char, char_id, max_length)
one_hot = zeros(max_length)
setindex!(one_hot, 1.0, char_id[char])
end
function word_to_one_hot(word, char_id, max_length)
[char_to_one_hot(char, char_id, max_length) for char in word]
end
which I think gives what you'd expect:
julia> vocab = "abc"
"abc"
julia> char_id = Dict(char => index for (index, char) in enumerate(vocab))
Dict{Char,Int64} with 3 entries:
'a' => 1
'c' => 3
'b' => 2
julia> word_to_one_hot("acb", char_id, 5)
3-element Array{Array{Float64,1},1}:
[1.0, 0.0, 0.0, 0.0, 0.0]
[0.0, 0.0, 1.0, 0.0, 0.0]
[0.0, 1.0, 0.0, 0.0, 0.0]
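To make the map restriction mentioned earlier concrete, here is a minimal sketch (the variable names are illustrative and not part of the question). It shows map over a String succeeding when the function returns a character, and collect being used to get a Vector{Char} first when it does not.

```julia
word = "acb"

# map over a String works when the function returns a Char
shouted = map(uppercase, word)

# for any other return type, collect the characters into a Vector{Char} first
codes = map(c -> Int(c), collect(word))

# mapping a non-Char function over the String itself throws an ArgumentError
failed = try
    map(c -> Int(c), word)
    false
catch err
    err isa ArgumentError
end
```

The comprehension used in the answer avoids both the error and the intermediate Vector{Char}.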
If you still want to convert between 1-character strings and characters, you can do it this way:
julia> str="a"; first(str)
'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)
julia> chr='a'; string(chr)
"a"

To convert a length-1 string to a char, index the string's first character with [1].
To convert a char to a string, use string().
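As a variation on these two conversions, the sketch below uses first and only instead of indexing. This is an assumption about taste rather than something the answers require: only needs Julia 1.4 or later, and it throws if the string is not exactly one character long, which can catch mistakes early. Indexing with [1] is always valid for a non-empty string, but other integer indices can throw a StringIndexError on non-ASCII text.

```julia
s = "c"

c_first = first(s)   # first character; also works for longer strings
c_only = only(s)     # the single character; throws unless s has exactly one character
back = string(c_only)  # Char -> String
```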
julia> s = "c"
"c"
julia> s[1]
'c': ASCII/Unicode U+0063 (category Ll: Letter, lowercase)
julia> string(s[1])
"c"

Related

How to append to an empty list in Julia?

I want to create an empty list and gradually fill it with tuples. I've tried the following, and each attempt returns an error. My question is: how do I append or add an element to an empty array?
My try:
A = []
A.append((2,5)) # ERROR: type Array has no field append
append(A, (2,5)) # ERROR: UndefVarError: append not defined
B = Vector{Tuple{String, String}}
# same error occurs
You do not actually want to append; you want to push elements into your vector. To do that, use the function push! (the trailing ! indicates that the function modifies one of its input arguments; it is a naming convention only and has no effect on behavior).
I would also recommend creating a typed vector instead of A = [], which is a Vector{Any} with poor performance.
julia> A = Tuple{Int, Int}[]
Tuple{Int64, Int64}[]
julia> push!(A, (2,3))
1-element Vector{Tuple{Int64, Int64}}:
(2, 3)
julia> push!(A, (11,3))
2-element Vector{Tuple{Int64, Int64}}:
(2, 3)
(11, 3)
For the vector of string tuples, do this:
julia> B = Tuple{String, String}[]
Tuple{String, String}[]
julia> push!(B, ("hi", "bye"))
1-element Vector{Tuple{String, String}}:
("hi", "bye")
This line in your code is wrong, btw:
B = Vector{Tuple{String, String}}
It does not create a vector; it binds B to the type itself. To create an instance you can write, for example, one of these:
B = Tuple{String, String}[]
B = Vector{Tuple{String,String}}() # <- parens necessary to construct an instance
It can also be convenient to use the NTuple notation:
julia> NTuple{2, String} === Tuple{String, String}
true
julia> NTuple{3, String} === Tuple{String, String, String}
true
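For contrast with push!, Julia also has append!, which adds every element of a collection rather than a single element. A minimal sketch, with points as an illustrative variable name:

```julia
points = Tuple{Int, Int}[]

push!(points, (2, 5))               # adds the tuple as one element
append!(points, [(3, 7), (4, 9)])   # adds each element of the given collection
```

Passing a bare tuple such as (2, 5) to append! would treat it as a collection of two integers, which is why push! is the right call for adding one tuple.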

Filtering a dictionary in julia

I want to filter a dictionary using the filter() function, but I am having trouble with it. What I wish to accomplish is to return the keys whose values satisfy some condition. However, I am getting a MethodError:
using Agents: AbstractAgent
# Define types
mutable struct Casualty <: AbstractAgent
id::Int
ts::Int
rescued::Bool
function Casualty(id,ts; rescued = false)
new(id,ts,rescued)
end
end
mutable struct Rescuer <: AbstractAgent
id::Int
actions::Int
dist::Float64
function Rescuer(id; action = rand(1:3) , dist = rand(1)[1])
new(id,action,dist)
end
end
cas1 = Casualty(1,2)
cas2 = Casualty(2,3)
resc1 = Rescuer(3)
agents = Dict(1=> cas1, 2 => cas2, 3 => resc1)
Now to filter
filter((k,v) -> v isa Casualty, agents)
# ERROR: MethodError: no method matching (::var"#22#23")(::Pair{Int64, AbstractAgent})
# what I truly wish to achieve is return the key for some condition of the value
filter((k,v) -> k ? v isa Casualty : "pass", agents)
# of course I am not sure how to "pass" using this format
Any idea how I can achieve this? Thanks.
For dictionaries, filter passes each key-value Pair as a single argument, so either destructure the Pair:
julia> dict = Dict(1=>"a", 2=>"b", 3=>"c")
Dict{Int64, String} with 3 entries:
2 => "b"
3 => "c"
1 => "a"
julia> filter(((k,v),) -> k == 1 || v == "c", dict)
Dict{Int64, String} with 2 entries:
3 => "c"
1 => "a"
or, for example, take the Pair as a whole:
julia> filter(p -> first(p) == 1 || last(p) == "c", dict)
Dict{Int64, String} with 2 entries:
3 => "c"
1 => "a"
julia> filter(p -> p[1] == 1 || p[2] == "c", dict)
Dict{Int64, String} with 2 entries:
3 => "c"
1 => "a"
EDIT
Explanation why additional parentheses are needed:
julia> f = (x, y) -> (x, y)
#1 (generic function with 1 method)
julia> g = ((x, y),) -> (x, y)
#3 (generic function with 1 method)
julia> methods(f)
# 1 method for anonymous function "#1":
[1] (::var"#1#2")(x, y) in Main at REPL[1]:1
julia> methods(g)
# 1 method for anonymous function "#3":
[1] (::var"#3#4")(::Any) in Main at REPL[2]:1
julia> f(1, 2)
(1, 2)
julia> f((1, 2))
ERROR: MethodError: no method matching (::var"#1#2")(::Tuple{Int64, Int64})
Closest candidates are:
(::var"#1#2")(::Any, ::Any) at REPL[1]:1
julia> g(1, 2)
ERROR: MethodError: no method matching (::var"#3#4")(::Int64, ::Int64)
Closest candidates are:
(::var"#3#4")(::Any) at REPL[2]:1
julia> g((1, 2))
(1, 2)
As you can see, f takes 2 positional arguments, while g takes one positional argument that gets destructured (i.e. the assumption is that the argument passed to g is iterable and has at least 2 elements).
See also https://docs.julialang.org/en/v1/manual/functions/#Argument-destructuring.
Now comes the tricky part:
julia> h1((x, y)) = (x, y)
h1 (generic function with 1 method)
julia> methods(h1)
# 1 method for generic function "h1":
[1] h1(::Any) in Main at REPL[1]:1
julia> h2 = ((x, y)) -> (x, y)
#1 (generic function with 1 method)
julia> methods(h2)
# 1 method for anonymous function "#1":
[1] (::var"#1#2")(x, y) in Main at REPL[3]:1
In this example h1 is a named function. In this case it is enough to wrap the arguments in extra parentheses to get destructuring behavior. For anonymous functions, because of how the Julia parser works, an extra , is needed; if you omit it, the extra parentheses are ignored.
Now let us check the filter docstring:
filter(f, d::AbstractDict)
Return a copy of d, removing elements for which f is false.
The function f is passed key=>value pairs.
As you can see from this docstring, f is passed a single argument that is a Pair. That is why you need either to use destructuring or to define a single-argument function and extract the key and value inside it.
The right syntax is:
filter(((k,v),) -> v isa Casualty, agents)
which prints
julia> filter(((k,v),) -> v isa Casualty, agents)
Dict{Int64, AbstractAgent} with 2 entries:
2 => Casualty(2, 3, false)
1 => Casualty(1, 2, false)
As for getting only the matching keys, the one idea I have is to pipe the result into keys:
julia> filter(((k,v),) -> v isa Casualty, agents) |> keys
which prints
julia> filter(((k,v),) -> v isa Casualty, agents) |> keys
KeySet for a Dict{Int64, AbstractAgent} with 2 entries. Keys:
2
1
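If only the keys are needed, building the filtered Dict first is optional. The sketch below shows two alternatives: a comprehension with a condition, and findall, which returns keys when applied to a dictionary. The types here are simplified stand-ins for the ones in the question (an assumption made so the example runs without Agents.jl).

```julia
abstract type AbstractAgent end   # stand-in for Agents.AbstractAgent

struct Casualty <: AbstractAgent
    id::Int
end

struct Rescuer <: AbstractAgent
    id::Int
end

agents = Dict{Int, AbstractAgent}(1 => Casualty(1), 2 => Casualty(2), 3 => Rescuer(3))

# comprehension: iterate over pairs, keep the key when the value matches
casualty_ids = [k for (k, v) in agents if v isa Casualty]

# findall on a dictionary applies the predicate to values and returns the keys
casualty_ids_alt = findall(v -> v isa Casualty, agents)
```

Dict iteration order is not guaranteed, so sort the result if a stable order matters.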

Julia & Avro.jl : Issue with tuples

I am trying to get Avro working in Julia and having some real issues. It is important for my application that I use a row-oriented data format to which I can append a hierarchical data structure row by row as they are generated.
Avro seems like a good fit, but I am having issues in Julia. I have things working in a Python test, but I need to be in Julia because the main code is in Julia.
Here are my simplified test examples which show my issue. The first one works, the rest don't. Any help would be appreciated. The second gives the wrong answer. The rest give errors.
import Avro
v1=Dict("RUTHERFORD" => 7, "DURHAM" => 11)
buf=Avro.write(v1)
Avro.read(buf,typeof(v1))
output:
Dict{String, Int64} with 2 entries:
"DURHAM" => 11
"RUTHERFORD" => 7
example 2:
@show v3=Dict((5,2) => 7, (5,4) => 11)
@show typeof(v3)
buf=Avro.write(v3)
Avro.read(buf,typeof(v3))
output:
v3 = Dict((5, 2) => 7, (5, 4) => 11) = Dict((5, 2) => 7, (5, 4) => 11)
typeof(v3) = Dict{Tuple{Int64, Int64}, Int64}
Dict{Tuple{Int64, Int64}, Int64} with 1 entry:
(40, 53) => 11
example 3:
@show v2=Dict(("jcm",2) => 7, ("sem",4) => 11)
@show typeof(v2)
buf=Avro.write(v2)
v2o=Avro.read(buf,typeof(v2))
output:
v2 = Dict(("jcm", 2) => 7, ("sem", 4) => 11) = Dict(("sem", 4) => 11, ("jcm", 2) => 7)
typeof(v2) = Dict{Tuple{String, Int64}, Int64}
MethodError: Cannot `convert` an object of type Char to an object of type String
Closest candidates are:
convert(::Type{String}, ::String) at essentials.jl:210
convert(::Type{T}, ::T) where T<:AbstractString at strings/basic.jl:231
convert(::Type{T}, ::AbstractString) where T<:AbstractString at strings/basic.jl:232
...
Stacktrace:
[1] _totuple
@ ./tuple.jl:316 [inlined]
[2] Tuple{String, Int64}(itr::String)
@ Base ./tuple.jl:303
[3] construct(T::Type, args::String; kw::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ StructTypes ~/.julia/packages/StructTypes/NJXhA/src/StructTypes.jl:310
[4] construct(T::Type, args::String)
@ StructTypes ~/.julia/packages/StructTypes/NJXhA/src/StructTypes.jl:310
[5] construct(::Type{Tuple{String, Int64}}, ptr::Ptr{UInt8}, len::Int64; kw::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ StructTypes ~/.julia/packages/StructTypes/NJXhA/src/StructTypes.jl:435
[6] construct(::Type{Tuple{String, Int64}}, ptr::Ptr{UInt8}, len::Int64)
@ StructTypes ~/.julia/packages/StructTypes/NJXhA/src/StructTypes.jl:435
[7] readvalue(B::Avro.Binary, #unused#::Avro.StringType, #unused#::Type{Tuple{String, Int64}}, buf::Vector{UInt8}, pos::Int64, len::Int64, opts::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ Avro ~/.julia/packages/Avro/JEoRa/src/types/binary.jl:247
[8] readvalue(B::Avro.Binary, MT::Avro.MapType, #unused#::Type{Dict{Tuple{String, Int64}, Int64}}, buf::Vector{UInt8}, pos::Int64, buflen::Int64, opts::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ Avro ~/.julia/packages/Avro/JEoRa/src/types/maps.jl:63
[9] read(buf::Vector{UInt8}, ::Type{Dict{Tuple{String, Int64}, Int64}}; schema::Avro.MapType, jsonencoding::Bool, kw::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ Avro ~/.julia/packages/Avro/JEoRa/src/types/binary.jl:58
[10] read(buf::Vector{UInt8}, ::Type{Dict{Tuple{String, Int64}, Int64}})
@ Avro ~/.julia/packages/Avro/JEoRa/src/types/binary.jl:58
[11] top-level scope
@ In[209]:5
[12] eval
@ ./boot.jl:360 [inlined]
[13] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
@ Base ./loading.jl:1094
Last example:
v=Dict(("RUTHERFORD", "05A", "371619611022065") => 7, ("DURHAM", "28","jcm") => 11)
buf=Avro.write(v)
vo=Avro.read(buf,typeof(v))
output:
MethodError: Cannot `convert` an object of type Char to an object of type String
Closest candidates are:
convert(::Type{String}, ::String) at essentials.jl:210
convert(::Type{T}, ::T) where T<:AbstractString at strings/basic.jl:231
convert(::Type{T}, ::AbstractString) where T<:AbstractString at strings/basic.jl:232
...
Stacktrace:
[1] _totuple
@ ./tuple.jl:316 [inlined]
[2] Tuple{String, String, String}(itr::String)
@ Base ./tuple.jl:303
[3] construct(T::Type, args::String; kw::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ StructTypes ~/.julia/packages/StructTypes/NJXhA/src/StructTypes.jl:310
[4] construct(T::Type, args::String)
@ StructTypes ~/.julia/packages/StructTypes/NJXhA/src/StructTypes.jl:310
[5] construct(::Type{Tuple{String, String, String}}, ptr::Ptr{UInt8}, len::Int64; kw::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ StructTypes ~/.julia/packages/StructTypes/NJXhA/src/StructTypes.jl:435
[6] construct(::Type{Tuple{String, String, String}}, ptr::Ptr{UInt8}, len::Int64)
@ StructTypes ~/.julia/packages/StructTypes/NJXhA/src/StructTypes.jl:435
[7] readvalue(B::Avro.Binary, #unused#::Avro.StringType, #unused#::Type{Tuple{String, String, String}}, buf::Vector{UInt8}, pos::Int64, len::Int64, opts::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ Avro ~/.julia/packages/Avro/JEoRa/src/types/binary.jl:247
[8] readvalue(B::Avro.Binary, MT::Avro.MapType, #unused#::Type{Dict{Tuple{String, String, String}, Int64}}, buf::Vector{UInt8}, pos::Int64, buflen::Int64, opts::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ Avro ~/.julia/packages/Avro/JEoRa/src/types/maps.jl:63
[9] read(buf::Vector{UInt8}, ::Type{Dict{Tuple{String, String, String}, Int64}}; schema::Avro.MapType, jsonencoding::Bool, kw::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ Avro ~/.julia/packages/Avro/JEoRa/src/types/binary.jl:58
[10] read(buf::Vector{UInt8}, ::Type{Dict{Tuple{String, String, String}, Int64}})
@ Avro ~/.julia/packages/Avro/JEoRa/src/types/binary.jl:58
[11] top-level scope
@ In[210]:3
[12] eval
@ ./boot.jl:360 [inlined]
[13] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
@ Base ./loading.jl:1094
What is going wrong?
Avro.jl is unable to properly read from the buffer into a Dict (or, as Avro calls it, a "Map") that uses a Tuple as a key because, according to the Avro specification:
Map keys are assumed to be strings.
This assumption is hard-coded into Avro.jl: no matter what the actual type of the Dict keys is, the code forces the key to be a String. Avro.jl does not check that the key type is actually a subtype of AbstractString, because as long as the value can be converted to a String via the Base.string method, the code will write that string representation to the buffer. That is exactly what happens when you write a Dict with Tuple keys:
v = Dict((1,2) => 3)
buf = Avro.write(v)
Char.(buf)
This decodes the bytes in buf as ASCII/Unicode characters and prints them to the REPL. You should see the string representation of the Tuple (1,2) in there encoded as "(1, 2)":
11-element Vector{Char}:
'\x01': ASCII/Unicode U+0001 (category Cc: Other, control)
'\x10': ASCII/Unicode U+0010 (category Cc: Other, control)
'\f': ASCII/Unicode U+000C (category Cc: Other, control)
'(': ASCII/Unicode U+0028 (category Ps: Punctuation, open)
'1': ASCII/Unicode U+0031 (category Nd: Number, decimal digit)
',': ASCII/Unicode U+002C (category Po: Punctuation, other)
' ': ASCII/Unicode U+0020 (category Zs: Separator, space)
'2': ASCII/Unicode U+0032 (category Nd: Number, decimal digit)
')': ASCII/Unicode U+0029 (category Pe: Punctuation, close)
'\x06': ASCII/Unicode U+0006 (category Cc: Other, control)
'\0': ASCII/Unicode U+0000 (category Cc: Other, control)
The problem arises when you try to read that key back into a Tuple. When reading a key of a Map element, Avro.jl will try to read whatever is in the buffer as a String and stuff it into whatever type the key is. If the type is a Tuple of N types that can be constructed from UInt8 values (eltype(buf)), then the next N UInt8 values in the buffer will be used to create the key:
Avro.read(buf, typeof(v))
# Dict{Tuple{Int64, Int64}, Int64} with 1 entry:
# (40, 49) => 3
Why 40 and 49? Because those are the Int64 representations of the Chars '(' and '1', respectively:
Char(40)
# '(': ASCII/Unicode U+0028 (category Ps: Punctuation, open)
Char(49)
# '1': ASCII/Unicode U+0031 (category Nd: Number, decimal digit)
Note that this is why your second example reads back only one element of the Dict even though two were written. The two-element Tuple that is parsed as the key takes only the first two characters of the string representation, which are '(' and '5' for both keys in your example. A Dict cannot have duplicate keys, so the second value simply overwrites the first.
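The collision can be reproduced without Avro.jl. The sketch below only mimics the effect described here (it is not Avro.jl's code path): it takes the first two characters of each key's string form, converts them to integers, and builds a Dict from the result. misread_key is a hypothetical helper name.

```julia
original = [(5, 2) => 7, (5, 4) => 11]

# mimic the faulty read: keep only the first two characters of the key's text form
misread_key(key) = Tuple(Int.(collect(first(string(key), 2))))

# both keys collapse to the same tuple, so the later value overwrites the earlier one
misread = Dict(misread_key(k) => v for (k, v) in original)
```

Both keys start with "(5", so they map to the code points 40 and 53, matching the single (40, 53) => 11 entry in the question's output.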
How to fix it
Avoid using non-strings as keys
Because the Avro specifications specifically state that the key of a Map is assumed to be a string, you should probably follow the specification and avoid using non-strings as keys. In my opinion, Avro.jl should not let the user write a Dict with keys that are not subtypes of AbstractString. Maybe that's a design choice, or maybe that's a bug, but it might be worth filing an issue on the project page just in case.
Use a custom type as a key
If you really, really want to use something other than a String as a key, Avro.jl will always convert the key to a String when it serializes a Map to a buffer using the Base.string method. During deserialization, if the code recognizes the key as a struct, it will try to pass the serialized String to the struct's constructor. Therefore all you have to do is define a custom struct with a constructor that takes a String and make it do the right thing (and optionally overload the Base.string method). Here's an example:
struct XY
x::Int64
y::Int64
end
function XY(s::String)
# parse the default string representation of an XY value
# very inefficient: for demonstration purposes only
m = match(r"XY\((\d+), (\d+)\)", s)
XY(parse.(Int64, m.captures)...)
end
v2 = Dict(XY(1,2) => 3)
buf2 = Avro.write(v2)
Avro.read(buf2, typeof(v2))
# Dict{XY, Int64} with 1 entry:
# XY(1, 2) => 3
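Before involving Avro.jl, it can help to check that the string constructor really inverts string for the custom type. The following is a self-contained sketch of that round trip. Accepting AbstractString, allowing a leading minus sign, and throwing an ArgumentError on bad input are my additions and not part of the example above.

```julia
struct XY
    x::Int64
    y::Int64
end

# Parse the default text form, e.g. "XY(1, 2)"; throws on unexpected input.
function XY(s::AbstractString)
    m = match(r"XY\((-?\d+), (-?\d+)\)", s)
    m === nothing && throw(ArgumentError("cannot parse XY from: $s"))
    return XY(parse(Int64, m.captures[1]), parse(Int64, m.captures[2]))
end

key = XY(1, 2)
text = string(key)     # the text that would be written as the map key
restored = XY(text)    # what the reader would reconstruct from that text
```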
Write your own Tuple construct method
If you really, really, really want to use a Tuple as a key, you can take advantage of StructTypes.StringType and define your own StructTypes.construct method. Because Avro.jl uses the unsafe pointer version, you're stuck defining the same for your Tuple. Here is an awkward example:
import StructTypes # required to add a method to StructTypes.construct
function StructTypes.construct(::Type{Tuple{Int64,Int64}}, ptr::Ptr{UInt8}, len::Int; kw...)
arr = unsafe_wrap(Vector{UInt8}, ptr, len)
s = join(Char.(arr))
m = findall(r"\d+", s)
(parse(Int64, s[m[1]]), parse(Int64, s[m[2]]))
end
Avro.read(buf, typeof(v))
# Dict{Tuple{Int64, Int64}, Int64} with 1 entry:
# (1, 2) => 3
For the curious: why does Avro.jl get the value right, even if the key is parsed incorrectly?
In Avro's binary encoding scheme, strings are serialized with their lengths stored at the beginning of the string. This allows Avro.jl to pass the known length of the string key to the pointer-based StructTypes.construct method, which passes an Array{UInt8,1} to the Tuple constructor. A fun fact about Julia is that the iterable-based constructor for a Tuple will only read as many elements from the iterable as necessary to construct the Tuple, then stop. Example:
Tuple{Int64, Int64}([1,2,3,4])
# (1, 2)
So Avro.jl passes a 6-element Array{UInt8,1} (['(', '1', ',', ' ', '2', ')']) to the constructor of Tuple{Int64,Int64} which in turn reads only the first two elements, then returns the Tuple for Avro.jl to use as the key of the Map element. Avro.jl then skips ahead to where it knows the string ends (remember: it stores the length of the string in the buffer) and starts reading there for the value of the Map element. Avro.jl knows that value should be an Int64, and it knows how to parse an Int64, so it reads the appropriate value. Neat!
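The length prefix can be read straight out of the 11 bytes listed earlier. Avro's binary encoding stores longs, including string lengths, as zigzag-encoded variable-length integers. The sketch below decodes only single-byte values, which is enough here. zigzag_decode is a hypothetical helper, and the reading of the first two bytes as a negative block count followed by the block's size in bytes is my interpretation of the Avro specification, not something stated in the answer.

```julia
# Decode a zigzag-encoded value that fits in a single byte (no continuation bit).
zigzag_decode(byte::UInt8) = xor(Int64(byte) >>> 1, -(Int64(byte) & 1))

# the 11 bytes shown in the answer for Dict((1, 2) => 3)
buf = UInt8[0x01, 0x10, 0x0c, 0x28, 0x31, 0x2c, 0x20, 0x32, 0x29, 0x06, 0x00]

key_length = zigzag_decode(buf[3])           # length prefix of the key string
key_text = String(buf[4:3 + key_length])     # the key's text form
value = zigzag_decode(buf[4 + key_length])   # the Int64 value stored after the key
```

Under that reading, the first byte decodes to -1 (one entry, with a byte size following), the second to 8 (one length byte, six key bytes, one value byte), and the final 0x00 terminates the map.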

How to append a value to a nested dictionary?

I'm trying to create a list nested within a dictionary and append values to it. In Python, I would have written the following:
samples = {'x' : [1], 'y' : [-1]}
and to append values in a for-loop:
samples['x'].append(new_value)
How can I achieve something equivalent in Julia?
Here it is:
julia> samples = Dict("x" => [1], "y" => [-1])
Dict{String, Vector{Int64}} with 2 entries:
"x" => [1]
"y" => [-1]
julia> push!(samples["x"],4);
julia> samples
Dict{String, Vector{Int64}} with 2 entries:
"x" => [1, 4]
"y" => [-1]
In Julia, one might use Symbols as keys instead of Strings, so it could be samples = Dict(:x => [1], :y => [-1]).
Finally, if you know that the keys are only x and y, you could use a NamedTuple:
julia> samples2 = (x = [1], y = [-1])
(x = [1], y = [-1])
julia> typeof(samples2)
NamedTuple{(:x, :y), Tuple{Vector{Int64}, Vector{Int64}}}
julia> push!(samples2.x, 111);
julia> samples2
(x = [1, 111], y = [-1])
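When a key may not exist yet, indexing with samples[key] throws a KeyError. One common pattern, sketched here as an addition to the answer, is get!, which returns the stored vector or inserts the given default first. The key "z" is illustrative.

```julia
samples = Dict("x" => [1], "y" => [-1])

# existing key: get! returns the stored vector, which push! then mutates
push!(get!(samples, "x", Int[]), 4)

# missing key: get! inserts an empty vector under "z" and returns it
push!(get!(samples, "z", Int[]), 5)
```

The form get!(() -> Int[], samples, key) does the same but only allocates the default when the key is missing.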

A function or a macro for retrieving attributes of annotated strings

I have strings with annotated attributes. You can think of them as XML-document strings, but with custom syntax of annotation.
Attributes in a string are encoded as follows:
#<atr_name>=<num_of_chars>:<atr_value>\n
where
<atr_name> is a name of the attribute
<atr_value> is a value of the attribute
<num_of_chars> is a character length of the <atr_value>
That is, the attribute name is prefixed with # and followed by =, then by a number giving the number of characters in the attribute's value, then by :, then by the attribute's value itself, and finally by a newline character \n.
Here is one example:
julia> string_with_attributes = """
some text
...
#name=6:Azamat
...
#year=4:2016
...
some other text
"""
Now I want to write a function or a macro that would let me write:
julia> string_with_attributes["name"]
"Azamat"
julia> string_with_attributes["year"]
"2016"
julia>
Any ideas on how to do this?
Following @Gnimuc's answer, you could make your own string macro (also known as a non-standard string literal) if that suits your needs, i.e.:
julia> function attr_str(s::S)::Dict{S, S} where {S <: AbstractString}
d = Dict{S, S}()
for i in eachmatch(r"(?<=#)\b.*(?==).*(?=\n)", s)
push!(d, match(r".*(?==)", i.match).match => match(r"(?<=:).*", i.match).match)
end
push!(d, "string" => s)
return d
end
attr_str (generic function with 1 method)
julia> macro attr_str(s::AbstractString)
:(attr_str($s))
end
@attr_str (macro with 1 method)
julia> attr"""
some text
dgdfg:dgdf=ert
#name=6:Azamat
all34)%(*)#:DG:Ko_=ddhaogj;ldg
#year=4:2016
#dkgjdlkdag:dfgdfgd
some other text
"""
Dict{String,String} with 3 entries:
"name" => "Azamat"
"string" => "some text\ndgdfg:dgdf=ert\n#name=6:Azamat\nall34)%(*)#:DG:Ko_=ddhaogj;ldg\n#year=4:2016\n#dkgjdlkdag:dfgdfgd\nsome other text\n"
"year" => "2016"
julia>
Seems like a job for regex:
julia> string_with_attributes = """
some text
dgdfg:dgdf=ert
#name=6:Azamat
all34)%(*)#:DG:Ko_=ddhaogj;ldg
#year=4:2016
#dkgjdlkdag:dfgdfgd
some other text
"""
"some text\ndgdfg:dgdf=ert\n#name=6:Azamat\nall34)%(*)#:DG:Ko_=ddhaogj;ldg\n#year=4:2016\n#dkgjdlkdag:dfgdfgd\nsome other text\n"
julia> s = Dict()
Dict{Any,Any} with 0 entries
julia> for i in eachmatch(r"(?<=#)\b.*(?==).*(?=\n)", string_with_attributes)
push!(s, match(r".*(?==)", i.match).match => match(r"(?<=:).*", i.match).match)
end
julia> s
Dict{Any,Any} with 2 entries:
"name" => "Azamat"
"year" => "2016"
So, it turns out that what I needed was to extend the Base.getindex method from the indexing interface.
Here is the solution that I ended up with:
julia>
function Base.getindex(object::S, attribute::AbstractString) where {S <: AbstractString}
m = match( Regex("#$(attribute)=(\\d*):(.*)\n"), object )
m === nothing && error("$(object) has no attribute with the name $(attribute)")
return m.captures[end]::SubString{S}
end
julia> string_with_attributes = """
some text
dgdfg:dgdf=ert
#name=6:Azamat
all34)%(*)#:DG:Ko_=ddhaogj;ldg
#year=4:2016
#dkgjdlkdag:dfgdfgd
some other text
"""
julia> string_with_attributes["name"]
"Azamat"
julia> string_with_attributes["year"]
"2016"

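The regex-based solutions above read the value up to the end of the line and ignore the declared character count. As a possible alternative, here is a minimal sketch that uses the count, so a value may itself contain newlines. read_attribute is a hypothetical helper name; it assumes the attribute name contains no regex metacharacters, and it is an ordinary function rather than a new Base.getindex method on strings, which avoids changing indexing behavior for every string in the program.

```julia
# Hypothetical helper: find "#name=N:" at the start of a line and return the next N characters.
function read_attribute(text::AbstractString, name::AbstractString)
    # the "m" flag makes ^ match at the start of every line
    m = match(Regex("^#$(name)=(\\d+):", "m"), text)
    m === nothing && return nothing
    n_chars = parse(Int, m.captures[1])
    value_start = m.offset + ncodeunits(m.match)   # index just past the ':'
    return String(first(SubString(text, value_start), n_chars))
end

text = "some text\n#name=6:Azamat\n#year=4:2016\nsome other text\n"
name_value = read_attribute(text, "name")
year_value = read_attribute(text, "year")
```

Returning nothing for a missing attribute is a design choice in this sketch; throwing an error, as the getindex version does, would work equally well.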