How do I organize complex data in julia - julia

I decided to try Julia a few days ago and tried to translate one of my Python projects to Julia. I understand that using the type system is crucial for good performance. However, I have something like this in Python:
class Phonon(object):
# it has an attribute called D which looks like
# D = {'on_site': [D00, D11, D22, D33 ...], 'lead':{'l': [Dl00, Dl01, Dl11], 'r': [Dr00, Dr01, Dr11]},'couple': [D01, D12, D23 ...], 'lead_center':{'l': Dlcl, 'r': Dlcr}}
# all D00, D11, D22 matrices are numpy arrays
If I translate this into Julia, it would be:
type Phonon
D::Dict{ASCIIString, Any}
end
It seems that the compiler cannot get much information about what phonons are. So my question is: how do Julia people organize their complex data?

If I understood you properly, you may want something like this:
type PhononDict{T<:Number}
on_site::Vector{Matrix{T}}
lead::Dict{ASCIIString, Vector{Matrix{T}}}
couple::Vector{Matrix{T}}
lead_center::Dict{ASCIIString, Matrix{T}}
end
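I assumed that the element type of your numpy arrays is a subtype of Number; you can restrict it to something like T<:Union{Int64, Float64} instead.
The key problem is that the values of D mix arrays and dictionaries (lead, for instance, is a Dict), so their only common supertype is Any, which forces D::Dict{ASCIIString, Any}: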
julia> typejoin(Array, Dict)
Any
I suggest changing D into a composite type so that you can pass more information to the compiler. See the manual section on parametric types for more details.
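As a rough illustration of the composite-type approach above, here is a minimal sketch in Julia 1.x syntax (an assumption: the thread itself uses the older `type` and `ASCIIString` spellings, which became `struct`/`mutable struct` and `String`). The name `PhononData` and the 2x2 random matrices are hypothetical and only stand in for the real data.

```julia
# Hypothetical container mirroring the Python dictionary layout.
# Every field has a concrete type once T is known, so the compiler
# can specialize code that works on it.
struct PhononData{T<:Number}
    on_site::Vector{Matrix{T}}
    lead::Dict{String,Vector{Matrix{T}}}
    couple::Vector{Matrix{T}}
    lead_center::Dict{String,Matrix{T}}
end

# Placeholder 2x2 blocks; real code would load the actual matrices.
block() = rand(2, 2)

D = PhononData(
    [block() for _ in 1:3],
    Dict("l" => [block() for _ in 1:3], "r" => [block() for _ in 1:3]),
    [block() for _ in 1:2],
    Dict("l" => block(), "r" => block()),
)

# T is inferred from the arguments, and the field types are concrete.
println(typeof(D))
println(isconcretetype(fieldtype(typeof(D), :lead)))
```

Whether the two-key dictionaries for `lead` and `lead_center` should themselves become small structs with `l` and `r` fields is a design choice; the sketch keeps them as dictionaries to stay close to the original layout.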

Related

Creating custom types in Julia

In Julia, how do I create custom types MyOrderedDictA and MyOrderedDictB such that:
Each has all the functionality of an OrderedDict, and can be passed to any function that accepts AbstractDicts
They are distinct from each other, so that I can take advantage of multiple dispatch.
I suspect\hope this is straightforward, but haven’t been able to figure it out.
Basically, what you have to do is to define your type MyOrderedDictA, wrapping a regular OrderedDict, and forward all functions that one can apply to an OrderedDict to this wrapped dict.
Unfortunately, the AbstractDict interface is (to my knowledge) currently not documented (cf. AbstractArray). You could look at their definition and check which functions are defined for them. Alternatively, there is the more practical approach to just use your MyOrderedDictA and whenever you get an error message, because a function is not defined, you forward this function "on-the-fly".
In any case, using the macro @forward from Lazy.jl you can do something along the lines of the following.
using Lazy
using DataStructures  # provides OrderedDict
struct MyOrderedDictA{T,S} <: AbstractDict{T,S}
    dict::OrderedDict{T,S}
end
# Build the wrapped OrderedDict from arbitrary arguments
# (new is only available inside the struct body, so call the default constructor instead)
MyOrderedDictA{T,S}(args...; kwargs...) where {T,S} = MyOrderedDictA{T,S}(OrderedDict{T,S}(args...; kwargs...))
function MyOrderedDictA(args...; kwargs...)
    d = OrderedDict(args...; kwargs...)
    MyOrderedDictA{keytype(d),valtype(d)}(d)
end
# Forward the core dictionary functions to the wrapped field
@forward MyOrderedDictA.dict (Base.length, Base.iterate, Base.getindex, Base.setindex!)
d = MyOrderedDictA(2=>1, 1=>2)
Others will be better placed to answer this, but a quick take:
For this you will need to look at the OrderedDict implementation, and specifically which methods are defined for OrderedDicts. If you want to be able to pass it to methods accepting AbstractDicts you need to subtype it like struct MyDictA{T, S} <: AbstractDict{T, S}
If you define two structs they will automatically be distinct from each other!? (I might be misunderstanding the question here)
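To make the wrap-and-forward idea from these answers concrete without any package, here is a minimal sketch that wraps a plain `Dict` instead of an `OrderedDict` (an assumption made only to keep the example self-contained). The names `MyDictA`, `MyDictB` and `describe` are hypothetical, and only a handful of functions are forwarded; a real wrapper would likely need more.

```julia
# Two hypothetical wrapper types; both are AbstractDicts but are distinct types.
struct MyDictA{K,V} <: AbstractDict{K,V}
    dict::Dict{K,V}
end

struct MyDictB{K,V} <: AbstractDict{K,V}
    dict::Dict{K,V}
end

# Forward a small core of the dictionary functions by hand.
for T in (:MyDictA, :MyDictB)
    @eval begin
        Base.length(d::$T) = length(d.dict)
        Base.iterate(d::$T, state...) = iterate(d.dict, state...)
        Base.getindex(d::$T, key) = getindex(d.dict, key)
        Base.get(d::$T, key, default) = get(d.dict, key, default)
        Base.setindex!(d::$T, value, key) = (setindex!(d.dict, value, key); d)
    end
end

# Multiple dispatch can now tell the two wrappers apart.
describe(::MyDictA) = "A"
describe(::MyDictB) = "B"

a = MyDictA(Dict("x" => 1))
b = MyDictB(Dict("x" => 1))
a["y"] = 2

println(describe(a), " ", describe(b))
println(length(a), " ", haskey(a, "x"))
```

Generic functions written for `AbstractDict` (such as `haskey` or `keys`) then work through the forwarded methods, which is the "forward on demand" strategy described above.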

Initialise empty\undef Dict while using abstract types

I want to generate a Dict with undef values so I can later loop over the keys and fill in correct values. I can initialise such a Dict using concrete types in the following way and it all works fine:
currencies = ["USD", "AUD", "GBP"]
struct example
num::Float64
end
undef_array = Array{example}(undef,3)
Dict{String,example}(zip(currencies, undef_array))
When my struct has an abstract type however I can still generate the undef array but I cannot create the dict. I get an error "UndefRefError: access to undefined reference"
abstract type abstract_num end
struct example2
num::abstract_num
end
undef_array = Array{example2}(undef,3)
Dict{String,example2}(zip(currencies, undef_array))
Although it is possible to create such a Dict with a concrete array:
struct numnum <: abstract_num
num::Float64
end
def_array = [example2(numnum(5.0)), example2(numnum(6.0)), example2(numnum(4.5))]
Dict{String,example2}(zip(currencies, def_array))
Question
My question is whether it is possible to generate a Dict with undef values of a type that relies on an abstract type. If it is possible, what is the best way to do it?
In your second (not working) example, undef_array is an array whose elements aren't initialized:
julia> undef_array = Array{example2}(undef,3)
3-element Array{example2,1}:
#undef
#undef
#undef
The reason is that example2 has a field of abstract type, so it is not a plain-bits (isbits) type: the array stores references to its elements, and those references start out unassigned. Moreover, abstract_num doesn't have any concrete subtypes yet, so no object of type example2 could be constructed anyway. As a consequence even indexing undef_array[1] gives an UndefRefError and, hence, building the Dict from the zipped array fails as well.
Compare this to the first case where the array elements are (arbitrarily) initialized:
julia> undef_array = Array{example}(undef,3)
3-element Array{example,1}:
example(1.17014136e-315)
example(1.17014144e-315)
example(1.17014152e-315)
and undef_array[1] works just fine.
Having said that, I'm not really sure what you are trying to achieve here. Why not just create a mydict = Dict{String, example2}() and fill it with content when the time comes? (As said above, you would have to define concrete subtypes of abstract_num first.)
For performance reasons you should, in general, avoid creating types with fields of an abstract type.
Try:
a = Dict{String,Union{example2,UndefInitializer}}(currencies .=> undef)
However, for representing missing values the type Missing is usually more appropriate:
b = Dict{String,Union{example2,Missing}}(currencies .=> missing)
Please note that typeof(undef) yields UndefInitializer while typeof(missing) yields Missing - hence the need for Union types in the Dict. The dot (.) you can see above (.=>) is the famous Julia dot operator.
Moreover, I recommend following Julia's naming convention - struct and DataType names should start with a capital letter.
Last but not least, in your first example where concrete type Float64 was given, Julia has allocated the array to some concrete address in memory - beware that it can contain some garbage data (have a look at console log below):
julia> undef_array = Array{example}(undef,3)
3-element Array{example,1}:
example(9.13315366e-316)
example(1.43236026e-315)
example(1.4214423e-316)
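The two suggestions from these answers (start with an empty Dict and fill it later, or use Missing placeholders) can be sketched as follows. The capitalized names `AbstractNum`, `NumNum` and `Example2` are hypothetical renamings of the question's types, following the naming convention mentioned above; the values are made up.

```julia
abstract type AbstractNum end

struct NumNum <: AbstractNum
    num::Float64
end

# Field of abstract type, as in the question (not ideal for performance).
struct Example2
    num::AbstractNum
end

currencies = ["USD", "AUD", "GBP"]

# Option 1: create an empty, typed Dict and fill it when values are known.
filled = Dict{String,Example2}()
for (i, c) in enumerate(currencies)
    filled[c] = Example2(NumNum(Float64(i)))
end

# Option 2: pre-populate all keys with `missing` and overwrite later.
placeholders = Dict{String,Union{Example2,Missing}}(c => missing for c in currencies)
placeholders["USD"] = Example2(NumNum(5.0))

println(length(filled))
println(ismissing(placeholders["GBP"]))
```

Option 2 keeps every key present from the start, at the cost of a Union value type that callers have to check with `ismissing`.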

Immutable dictionary

Is there a way to enforce a dictionary being constant?
I have a function which reads out a file for parameters (and ignores comments) and stores it in a dict:
function getparameters(filename::AbstractString)
f = open(filename,"r")
dict = Dict{AbstractString, AbstractString}()
for ln in eachline(f)
m = match(r"^\s*(?P<key>\w+)\s+(?P<value>[\w+-.]+)", ln)
if m != nothing
dict[m[:key]] = m[:value]
end
end
close(f)
return dict
end
This works just fine. Since I have a lot of parameters, which I will end up using in different places, my idea was to let this dict be global. And as we all know, global variables are not that great, so I wanted to ensure that the dict and its members are immutable.
Is this a good approach? How do I do it? Do I have to do it?
Bonus answerable stuff :)
Is my code even ok? (It is the first thing I did with Julia, and coming from C/C++ and Python I have a tendency to do things differently.) Do I need to check whether the file is actually open? Is my reading of the file "Julia"-like? I could also readall and then use eachmatch. I don't see the "right way to do it" (like in Python).
Why not use an ImmutableDict? It's defined in base but not exported. You use one as follows:
julia> id = Base.ImmutableDict("key1"=>1)
Base.ImmutableDict{String,Int64} with 1 entry:
"key1" => 1
julia> id["key1"]
1
julia> id["key1"] = 2
ERROR: MethodError: no method matching setindex!(::Base.ImmutableDict{String,Int64}, ::Int64, ::String)
in eval(::Module, ::Any) at .\boot.jl:234
in macro expansion at .\REPL.jl:92 [inlined]
in (::Base.REPL.##1#2{Base.REPL.REPLBackend})() at .\event.jl:46
julia> id2 = Base.ImmutableDict(id,"key2"=>2)
Base.ImmutableDict{String,Int64} with 2 entries:
"key2" => 2
"key1" => 1
julia> id.value
1
You may want to define a constructor which takes in an array of pairs (or keys and values) and uses that algorithm to define the whole dict (that's the only way to do so, see the note at the bottom).
Just an added note, the actual internal representation is that each dictionary only contains one key-value pair, and a dictionary. The get method just walks through the dictionaries checking if it has the right key. The reason for this is that arrays are mutable: if you did a naive construction of an immutable type with a mutable field, the field is still mutable and thus while id["key1"]=2 wouldn't work, id.keys[1]=2 would. They get around this by not using a mutable type for holding the values (thus holding only single values) and then also holding an immutable dict. If you wanted to make this work directly on arrays, you could use something like ImmutableArrays.jl but I don't think that you'd get a performance advantage because you'd still have to loop through the array when checking for a key...
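The constructor suggested above, which takes an array of pairs and chains them into one dictionary, might look like the sketch below. The helper name `immutabledict` and the parameter names are hypothetical; the only library calls assumed are the empty constructor `Base.ImmutableDict{K,V}()` and the `Base.ImmutableDict(dict, pair)` form already shown in the REPL session above.

```julia
# Hypothetical helper: fold a vector of pairs into a Base.ImmutableDict,
# adding one link of the underlying linked list per pair.
function immutabledict(pairs::AbstractVector{Pair{K,V}}) where {K,V}
    d = Base.ImmutableDict{K,V}()
    for p in pairs
        d = Base.ImmutableDict(d, p)
    end
    return d
end

# Made-up parameters standing in for the contents of a parameter file.
params = immutabledict(["alpha" => "1.0", "beta" => "2"])

println(params["alpha"])
println(length(params))
```

Lookups walk the list link by link, so this layout suits small parameter sets rather than large tables.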
First off, I am new to Julia (I have been using/learning it for only two weeks). So do not put any confidence in what I am going to say unless it is validated by others.
The dictionary data structure Dict is defined here
julia/base/dict.jl
There is also a data structure called ImmutableDict in that file. However as const variables aren't actually const why would immutable dictionaries be immutable?
The comment states:
ImmutableDict is a Dictionary implemented as an immutable linked list,
which is optimal for small dictionaries that are constructed over many individual insertions
Note that it is not possible to remove a value, although it can be partially overridden and hidden
by inserting a new value with the same key
So let us call what you want to define as a dictionary UnmodifiableDict to avoid confusion. Such object would probably have
a similar data structure as Dict.
a constructor that takes a Dict as input to fill its data structure.
specialization (a new dispatch?) of the method setindex! that is called by the operator [] =
in order to forbid modification of the data structure. This should be the case of all other functions that end with ! and hence modify the data.
As far as I understood, It is only possible to have subtypes of abstract types. Therefore you can't make UnmodifiableDict as a subtype of Dict and only redefine functions such as setindex!
Unfortunately this is a needed restriction for having run-time types and not compile-time types. You can't have such a good performance without a few restrictions.
Bottom line:
The only solution I see is to copy paste the code of the type Dict and its functions, replace Dict by UnmodifiableDict everywhere and modify the functions that end with ! to raise an exception if called.
You may also want to have a look at these threads.
https://groups.google.com/forum/#!topic/julia-users/n-lqjybIO_w
https://github.com/JuliaLang/julia/issues/1974
REVISION
Thanks to Chris Rackauckas for pointing out the error in my earlier response. I'll leave it below as an illustration of what doesn't work. But, Chris is right, the const declaration doesn't actually seem to improve performance when you feed the dictionary into the function. Thus, see Chris' answer for the best resolution to this issue:
D1 = Dict(i => sind(i) for i = 0.0:5:3600);
const D2 = Dict(i => sind(i) for i = 0.0:5:3600);
function test(D)
    for jdx = 1:1000
        # D[2] = 2
        for idx = 0.0:5:3600
            a = D[idx]
        end
    end
end
## Times given after an initial run to allow for compiling
@time test(D1); # 0.017789 seconds (4 allocations: 160 bytes)
@time test(D2); # 0.015075 seconds (4 allocations: 160 bytes)
Old Response
If you want your dictionary to be a constant, you can use:
const MyDict = getparameters( .. )
Update: Keep in mind though that in base Julia, unlike some other languages, it's not that you cannot redefine constants; instead, you just get a warning when doing so.
julia> const a = 2
2
julia> a = 3
WARNING: redefining constant a
3
julia> a
3
It is odd that you don't get the constant redefinition warning when adding a new key-val pair to the dictionary. But, you still see the performance boost from declaring it as a constant:
D1 = Dict(i => sind(i) for i = 0.0:5:3600);
const D2 = Dict(i => sind(i) for i = 0.0:5:3600);
function test1()
    for jdx = 1:1000
        for idx = 0.0:5:3600
            a = D1[idx]   # non-const global: its type is unknown to the compiler
        end
    end
end
function test2()
    for jdx = 1:1000
        for idx = 0.0:5:3600
            a = D2[idx]   # const global: its type is fixed
        end
    end
end
## Times given after an initial run to allow for compiling
@time test1(); # 0.049204 seconds (1.44 M allocations: 22.003 MB, 5.64% gc time)
@time test2(); # 0.013657 seconds (4 allocations: 160 bytes)
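For reference, the three access patterns compared in this answer (non-const global, const global, and passing the dictionary as an argument) can be put side by side as in the sketch below, written for Julia 1.x. The names `params`, `CONST_PARAMS` and the `sum_*` functions are hypothetical, and no timings are claimed here; measuring them with @time or a benchmarking package is left to the reader.

```julia
# Same lookup table stored three ways.
params = Dict(i => sind(i) for i in 0.0:5:3600)             # non-const global
const CONST_PARAMS = Dict(i => sind(i) for i in 0.0:5:3600) # const global

# Reads a non-const global: the compiler cannot assume its type.
sum_global() = sum(params[i] for i in 0.0:5:3600)

# Reads a const global: the binding's type is fixed.
sum_const() = sum(CONST_PARAMS[i] for i in 0.0:5:3600)

# Takes the dictionary as an argument: specialized on the argument type,
# whether or not the caller's variable is const.
sum_arg(d) = sum(d[i] for i in 0.0:5:3600)

println(sum_global() == sum_const() == sum_arg(params))
```

All three compute the same value; the difference discussed in the thread is only in how much the compiler knows about the dictionary's type at each call site.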
To add to the existing answers, if you like immutability and would like to get performant (but still persistent) operations which change and extend the dictionary, check out FunctionalCollections.jl's PersistentHashMap type.
If you want to maximize performance and take maximal advantage of immutability, and you don't plan on doing any operations on the dictionary whatsoever, consider implementing a perfect hash function-based dictionary. In fact, if your dictionary is a compile-time constant, these can even be computed ahead of time (using metaprogramming) and precompiled.

Assert type information onto computed results in Julia

Problem
I read in an array of strings from a file.
julia> file = open("word-pairs.txt");
julia> lines = readlines(file);
But Julia doesn't know that they're strings.
julia> typeof(lines)
Array{Any,1}
Question
Can I tell Julia this somehow?
Is it possible to insert type information onto a computed result?
It would be helpful to know the context where this is an issue, because there might be a better way to express what you need - or there could be a subtle bug somewhere.
Can I tell Julia this somehow?
No, because the readlines function explicitly creates an Any array (a = {}): https://github.com/JuliaLang/julia/blob/master/base/io.jl#L230
Is it possible to insert type information onto a computed result?
You can convert the array:
r = convert(Array{ASCIIString,1}, lines)
Or, create your own readstrings function based on the link above, but using ASCIIString[] for the collection array instead of {}.
Isaiah is right about the limits of readlines. More generally, often you can say
n = length(A)::Int
when generic type inference fails but you can guarantee the type in your particular case.
As of 0.3.4:
julia> typeof(lines)
Array{Union(ASCIIString,UTF8String),1}
I just wanted to warn against:
convert(Array{ASCIIString,1}, lines)
This can fail (for non-ASCII input). I guess that in this case nothing needs to be done, but if a conversion is wanted, this should work:
convert(Array{UTF8String,1}, lines)
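Both techniques from this thread, converting the container and asserting a type with `::`, carry over to current Julia, where `ASCIIString` and `UTF8String` have been merged into `String` (so this sketch assumes Julia 1.x rather than the 0.3-era releases discussed above). The sample data is made up.

```julia
# An untyped array, standing in for the old Array{Any,1} result of readlines.
raw = Any["alpha beta", "gamma delta"]

# Convert the whole container to a concretely typed vector.
lines = convert(Vector{String}, raw)

# Assert the type of a single computed result; this throws a TypeError
# at run time if the value is not of the asserted type.
n = length(lines)::Int
first_line = raw[1]::String

println(typeof(lines))
println(n)
```

The conversion copies the data into a new array, whereas the `::` assertion only checks the value and tells the compiler what to expect afterwards.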

How to create a collection in Julia?

This seems like a really basic question, but can't find the answer. How do I create a collection in Julia? For example, I want to open a text file and parse each line to create an (iterable or otherwise) collection. Obviously I don't know how many elements there are in advance.
I can iterate through the lines like this
I = each_line(open(fileName,"r"))
state = start(I)
while !done(I, state)
(i, state) = next(I, state)
println(i)
end
But I don't know how to put each i into an array or other collection. I tried
map( i -> println(i), each_line(open(fileName,"r") ) )
But got the error
no method map(Function,EachLine)
You could do this:
lines = String[]
for line in each_line(open(fileName))
push!(lines, line)
end
And then lines contains the list of lines. You need the String in the first line to make the array extensible.
Standard collections and supported operations are mainly covered in the standard library documentation.
Specifically, the Deques section covers all of the operations supported by the 1d Array type (vector), including push! and pop! as well as insertion, resizing, etc.
Omar's answer is correct, and I will just add a small qualification: String[] creates a 1d array of Strings. The same constructor syntax may be used for example to create Int[], Float[], or even Any[] vectors. The latter type may hold objects of any type.
Depending on your Julia version, you may also be able to write collect(eachline(open("LICENSE.md"))) or [eachline(open("LICENSE.md"))...]. I think these won't work in 0.1.x versions but will work in newer 0.2 development versions (which are recommended at this point, as 0.2 is on its way soon).
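In current Julia the approaches from these answers look roughly like the sketch below. It assumes Julia 1.x, where `each_line` is spelled `eachline` and accepts a file path directly; the temporary file and its contents are made up so that the example runs on its own.

```julia
# Create a small temporary file to read back (sample data only).
path, io = mktemp()
write(io, "first line\nsecond line\nthird line\n")
close(io)

# Grow a typed vector one line at a time, as in the push! answer.
lines = String[]
for line in eachline(path)
    push!(lines, line)
end

# Equivalent one-liners.
collected = collect(eachline(path))
all_at_once = readlines(path)

println(length(lines))
println(lines == collected == all_at_once)
```

The explicit loop is useful when each line needs parsing before it is stored; otherwise `readlines` is the shortest route.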

Resources