Immutable dictionary - dictionary

Is there a way to enforce a dictionary being constant?
I have a function which reads out a file for parameters (and ignores comments) and stores it in a dict:
function getparameters(filename::AbstractString)
f = open(filename,"r")
dict = Dict{AbstractString, AbstractString}()
for ln in eachline(f)
m = match(r"^\s*(?P<key>\w+)\s+(?P<value>[\w+-.]+)", ln)
if m != nothing
dict[m[:key]] = m[:value]
end
end
close(f)
return dict
end
This works just fine. Since I have a lot of parameters, which I will end up using in different places, my idea was to let this dict be global. And as we all know, global variables are not that great, so I wanted to ensure that the dict and its members are immutable.
Is this a good approach? How do I do it? Do I have to do it?
Bonus answerable stuff :)
Is my code even OK? (It is the first thing I did with Julia, and coming from C/C++ and Python I tend to do things differently.) Do I need to check whether the file is actually open? Is my reading of the file "Julia-like"? I could also use readall and then eachmatch. I don't see the "right way to do it" (like in Python).

Why not use an ImmutableDict? It's defined in base but not exported. You use one as follows:
julia> id = Base.ImmutableDict("key1"=>1)
Base.ImmutableDict{String,Int64} with 1 entry:
"key1" => 1
julia> id["key1"]
1
julia> id["key1"] = 2
ERROR: MethodError: no method matching setindex!(::Base.ImmutableDict{String,Int64}, ::Int64, ::String)
in eval(::Module, ::Any) at .\boot.jl:234
in macro expansion at .\REPL.jl:92 [inlined]
in (::Base.REPL.##1#2{Base.REPL.REPLBackend})() at .\event.jl:46
julia> id2 = Base.ImmutableDict(id,"key2"=>2)
Base.ImmutableDict{String,Int64} with 2 entries:
"key2" => 2
"key1" => 1
julia> id.value
1
You may want to define a constructor which takes in an array of pairs (or keys and values) and uses that algorithm to define the whole dict (that's the only way to do so, see the note at the bottom).
Just an added note, the actual internal representation is that each dictionary only contains one key-value pair, and a dictionary. The get method just walks through the dictionaries checking if it has the right value. The reason for this is because arrays are mutable: if you did a naive construction of an immutable type with a mutable field, the field is still mutable and thus while id["key1"]=2 wouldn't work, id.keys[1]=2 would. They go around this by not using a mutable type for holding the values (thus holding only single values) and then also holding an immutable dict. If you wanted to make this work directly on arrays, you could use something like ImmutableArrays.jl but I don't think that you'd get a performance advantage because you'd still have to loop through the array when checking for a key...
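A minimal sketch of such a constructor, assuming a recent Julia (1.x) in which Base.ImmutableDict{K,V}() builds an empty dictionary and Base.ImmutableDict(d, pair) returns a new dictionary with that pair added in front. The helper name immutabledict is hypothetical; it is not something Base provides:

```julia
# Hypothetical helper: build an ImmutableDict from several pairs by
# repeatedly wrapping the previous dictionary (one linked-list node per pair).
function immutabledict(pairs::Pair{K,V}...) where {K,V}
    d = Base.ImmutableDict{K,V}()          # empty dictionary
    for p in pairs
        d = Base.ImmutableDict(d, p)       # new node pointing at the old dict
    end
    return d
end

params = immutabledict("key1" => 1, "key2" => 2)
println(params["key2"])
```

Lookup still walks the list node by node, so this is probably only sensible for a small number of parameters.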

First off, I am new to Julia (I have only been using/learning it for two weeks). So do not put any confidence in what I am going to say unless it is validated by others.
The dictionary data structure Dict is defined here
julia/base/dict.jl
There is also a data structure called ImmutableDict in that file. However as const variables aren't actually const why would immutable dictionaries be immutable?
The comment states:
ImmutableDict is a Dictionary implemented as an immutable linked list,
which is optimal for small dictionaries that are constructed over many individual insertions
Note that it is not possible to remove a value, although it can be partially overridden and hidden
by inserting a new value with the same key
So let us call what you want to define as a dictionary UnmodifiableDict to avoid confusion. Such object would probably have
a similar data structure as Dict.
a constructor that takes a Dict as input to fill its data structure.
specialization (a new dispatch?) of the method setindex! that is called by the operator [] =
in order to forbid modification of the data structure. This should be the case of all other functions that end with ! and hence modify the data.
As far as I understand, it is only possible to subtype abstract types. Therefore you can't make UnmodifiableDict a subtype of Dict and only redefine functions such as setindex!
Unfortunately this is a needed restriction for having run-time types and not compile-time types. You can't have such a good performance without a few restrictions.
Bottom line:
The only solution I see is to copy paste the code of the type Dict and its functions, replace Dict by UnmodifiableDict everywhere and modify the functions that end with ! to raise an exception if called.
you may also want to have a look at those threads.
https://groups.google.com/forum/#!topic/julia-users/n-lqjybIO_w
https://github.com/JuliaLang/julia/issues/1974
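In current Julia, copying the whole Dict source may not be necessary: a thin wrapper that subtypes AbstractDict (which is abstract, so subtyping is allowed) and forwards only the read-only methods seems to be enough. A rough sketch, assuming Julia 1.x syntax. UnmodifiableDict is the hypothetical name used above, and the wrapped field is still reachable as d.data, so this discourages mutation rather than making it impossible:

```julia
# Read-only wrapper around a Dict. No setindex!, delete!, etc. are defined,
# so the usual mutating calls fail with a MethodError.
struct UnmodifiableDict{K,V} <: AbstractDict{K,V}
    data::Dict{K,V}
    # Copy the input so later changes to the caller's Dict are not visible here.
    UnmodifiableDict(d::Dict{K,V}) where {K,V} = new{K,V}(copy(d))
end

# Forward the read-only part of the dictionary interface.
Base.length(d::UnmodifiableDict) = length(d.data)
Base.iterate(d::UnmodifiableDict, state...) = iterate(d.data, state...)
Base.get(d::UnmodifiableDict, key, default) = get(d.data, key, default)
Base.haskey(d::UnmodifiableDict, key) = haskey(d.data, key)
Base.getindex(d::UnmodifiableDict, key) = d.data[key]

ud = UnmodifiableDict(Dict("tol" => "1e-6"))
println(ud["tol"])
```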

REVISION
Thanks to Chris Rackauckas for pointing out the error in my earlier response. I'll leave it below as an illustration of what doesn't work. But, Chris is right, the const declaration doesn't actually seem to improve performance when you feed the dictionary into the function. Thus, see Chris' answer for the best resolution to this issue:
D1 = [i => sind(i) for i = 0.0:5:3600];
const D2 = [i => sind(i) for i = 0.0:5:3600];
function test(D)
for jdx = 1:1000
# D[2] = 2
for idx = 0.0:5:3600
a = D[idx]
end
end
end
## Times given after an initial run to allow for compiling
@time test(D1); # 0.017789 seconds (4 allocations: 160 bytes)
@time test(D2); # 0.015075 seconds (4 allocations: 160 bytes)
Old Response
If you want your dictionary to be a constant, you can use:
const MyDict = getparameters( .. )
Update: Keep in mind, though, that in base Julia, unlike some other languages, it is not that you cannot redefine constants; you just get a warning when doing so.
julia> const a = 2
2
julia> a = 3
WARNING: redefining constant a
3
julia> a
3
It is odd that you don't get the constant redefinition warning when adding a new key-val pair to the dictionary. But, you still see the performance boost from declaring it as a constant:
D1 = [i => sind(i) for i = 0.0:5:3600];
const D2 = [i => sind(i) for i = 0.0:5:3600];
function test1()
for jdx = 1:1000
for idx = 0.0:5:3600
a = D1[idx]
end
end
end
function test2()
for jdx = 1:1000
for idx = 0.0:5:3600
a = D2[idx]
end
end
end
## Times given after an initial run to allow for compiling
@time test1(); # 0.049204 seconds (1.44 M allocations: 22.003 MB, 5.64% gc time)
@time test2(); # 0.013657 seconds (4 allocations: 160 bytes)
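On the missing warning when adding a key: as far as I understand, const applies to the binding (the name), not to the contents of the object it is bound to, so mutating a const Dict is allowed and silent. A small sketch; PARAMS is a hypothetical name:

```julia
# const fixes which object the name refers to (and its type),
# not the contents of that object.
const PARAMS = Dict("alpha" => "1.0")

PARAMS["beta"] = "2.0"     # no warning: same binding, the Dict itself is mutated
delete!(PARAMS, "alpha")   # also allowed

println(PARAMS)
```

So const alone does not give the immutability asked for in the question; it only keeps the name pointing at the same dictionary.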

To add to the existing answers, if you like immutability and would like to get performant (but still persistent) operations which change and extend the dictionary, check out FunctionalCollections.jl's PersistentHashMap type.
If you want to maximize performance and take maximal advantage of immutability, and you don't plan on doing any operations on the dictionary whatsoever, consider implementing a perfect hash function-based dictionary. In fact, if your dictionary is a compile-time constant, these can even be computed ahead of time (using metaprogramming) and precompiled.

Related

Julia: Even-number datatype for functions

I have about 50 functions which should consume only even positive numbers. Right now I am checking each time with an "if" whether the number put in is even or not:
function grof(x::Int)
if (x % 2) == 0
println("good")
else
throw("x is not an even number!!!!!!!!!!!!! Stupid programmer!")
end
end
Ideally, I would like to have a datatype which produces this automatically, i.e.
function grof(x::EvenInt)
println("good")
end
However, I am not able to produce this datatype on my own since I am unable to understand the documentation. Thanks for your help!
Best, v.
I don't think creating a type is warranted in such a situation: I would simply @assert that the condition is verified at the beginning of the function(s). (Funnily enough, checking whether a number is even is the example that was chosen in the documentation to illustrate the effect of @assert.)
For example:
julia> function grof(x::Int)
@assert iseven(x) "Stupid programmer!"
println("good")
end
grof (generic function with 1 method)
julia> grof(2)
good
julia> grof(3)
ERROR: AssertionError: Stupid programmer!
Stacktrace:
[1] grof(::Int64) at ./REPL[5]:2
[2] top-level scope at REPL[7]:1
EDIT: If you really want to create a type enforcing such a constraint, it is possible. The way to do this would be to
create a type (possibly subtyping one of the Number abstract types; maybe Signed)
define an inner constructor ensuring that such a type cannot hold an odd value
A very simple example to build upon would be along the lines of:
# A wrapper around an even integer value
struct EvenInt
val :: Int
# inner constructor
function EvenInt(val)
@assert iseven(val)
new(val)
end
end
# Accessor to the value of an EvenInt
val(x::EvenInt) = x.val
# A method working only on even numbers
grof(x::EvenInt) = println("good: $(val(x)) is even")
You'd use this like so:
julia> x = EvenInt(42)
EvenInt(42)
julia> grof(x)
good: 42 is even
julia> y = EvenInt(1)
ERROR: AssertionError: iseven(val)
Stacktrace:
[1] EvenInt(::Int64) at ./REPL[1]:5
[2] top-level scope at REPL[6]:1
but note that you can't do anything on EvenInts yet: you need to either unwrap them (using val() in this case), or define operations on them (a task which can be vastly simplified if you make EvenInt a subtype of one of the abstract number types and follow the relevant interface).
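As an illustration of the "define operations on them" route, here is a small sketch that does not subtype any number type and only adds the few arithmetic methods that keep the result even. The choice of operations is my assumption about what would be useful, and the check uses an explicit throw rather than @assert, since the docs note that assertions might be disabled at some optimization levels:

```julia
# Wrapper around an even integer; the inner constructor rejects odd values.
struct EvenInt
    val::Int
    function EvenInt(val::Integer)
        iseven(val) || throw(ArgumentError("$val is not an even number"))
        new(val)
    end
end

# Sums and differences of even numbers are even.
Base.:+(a::EvenInt, b::EvenInt) = EvenInt(a.val + b.val)
Base.:-(a::EvenInt, b::EvenInt) = EvenInt(a.val - b.val)
# An even number times any integer is even.
Base.:*(a::EvenInt, b::Integer) = EvenInt(a.val * b)
Base.:*(a::Integer, b::EvenInt) = b * a

grof(x::EvenInt) = println("good: $(x.val) is even")

grof(EvenInt(4) + 3 * EvenInt(2))
```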
All integers multiplied by two are even, so redefine your function to take half the number it currently takes.
function grof2(halfx::Int)
x=2*halfx
println("good")
end

What does the "Base" keyword mean in Julia?

I saw this example in the Julia language documentation. It uses something called Base. What is this Base?
immutable Squares
count::Int
end
Base.start(::Squares) = 1
Base.next(S::Squares, state) = (state*state, state+1)
Base.done(S::Squares, s) = s > S.count;
Base.eltype(::Type{Squares}) = Int # Note that this is defined for the type
Base.length(S::Squares) = S.count;
Base is a module which defines many of the functions, types and macros used in the Julia language. You can view the files for everything it contains here or call whos(Base) to print a list.
In fact, these functions and types (which include things like sum and Int) are so fundamental to the language that they are included in Julia's top-level scope by default.
This means that we can just use sum instead of Base.sum every time we want to use that particular function. Both names refer to the same thing:
julia> sum === Base.sum
true
julia> @which sum # show where the name is defined
Base
So why, you might ask, is it necessary to write things like Base.start instead of simply start?
The point is that start is just a name. We are free to rebind names in the top-level scope to anything we like. For instance start = 0 will rebind the name 'start' to the integer 0 (so that it no longer refers to Base.start).
Concentrating now on the specific example in docs, if we simply wrote start(::Squares) = 1, then we find that we have created a new function with 1 method:
julia> start
start (generic function with 1 method)
But Julia's iterator interface (invoked using the for loop) requires us to add the new method to Base.start! We haven't done this and so we get an error if we try to iterate:
julia> for i in Squares(7)
println(i)
end
ERROR: MethodError: no method matching start(::Squares)
By updating the Base.start function instead by writing Base.start(::Squares) = 1, the iterator interface can use the method for the Squares type and iteration will work as we expect (as long as Base.done and Base.next are also extended for this type).
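The example in the question uses the older start/next/done protocol. In Julia 1.0 and later the same idea applies, but the three functions were replaced by a single Base.iterate, which still has to be extended with the Base. prefix for the same reason. A sketch of the Squares example in that newer style:

```julia
struct Squares
    count::Int
end

# iterate returns (item, next_state), or nothing when iteration is finished.
Base.iterate(S::Squares, state=1) =
    state > S.count ? nothing : (state * state, state + 1)
Base.eltype(::Type{Squares}) = Int   # defined for the type, not the instance
Base.length(S::Squares) = S.count

for i in Squares(4)
    println(i)
end
```

Writing iterate(S::Squares, state=1) = ... without the Base. prefix would, as with start above, create a new unrelated function in the current module that the for loop never calls.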
I'll grant that for something so fundamental, the explanation is buried a bit far down in the documentation, but http://docs.julialang.org/en/release-0.4/manual/modules/#standard-modules describes this:
There are three important standard modules: Main, Core, and Base.
Base is the standard library (the contents of base/). All modules
implicitly contain using Base, since this is needed in the vast
majority of cases.

julia static field of composite type

In Julia am I allowed to create & use static fields? Let me explain my problem with a simplified example. Let's say we have a type:
type Foo
bar::Dict()
baz::Int
qux::Float64
function Foo(fname,baz_value,qux_value)
dict = JLD.load(fname)["dict"] # It is a simple dictionary loading from a special file
new(dict,baz_value,qux_value)
end
end
Now, as you can see, I load a dictionary from a jld file and store it in the Foo type together with the other two variables, baz and qux. Now, let's say I create 3 objects of type Foo.
vars = [ Foo("mydictfile.jld",38,37.0) for i=1:3]
Here, as you can see, all of the Foo objects load the same dictionary. This is quite a big file (~10GB) and I don't want to load it many times. So,
I simply ask: is there any way in Julia to load it just once so that all 3 of these objects can reach it? (That's why I used the word "static" in the question.)
For such a simple question, my approach might look silly, but as a next step, I will make this Foo type iterable and I need to use this dictionary inside the next(d::Foo, state) function.
EDIT
Actually, I've found a way just now. But I want to ask whether it is correct or not.
Rather than giving the file name to the Foo constructor, if I load the dictionary into a variable before creating the objects and pass the same variable to all of the constructors, I guess all the constructors just create a pointer to the same dictionary rather than creating it again and again. Am I right?
So, the modified version will look like this:
dict = JLD.load("mydictfile.jld")["dict"]
vars = [ Foo(dict,38,37.0) for i=1:3]
By the way, I would still like to hear whether I can do the same thing entirely inside the Foo type (I mean, in its constructor).
You are making the type "too special" by adding the inner constructor. Julia provides default constructors if you do not provide an inner constructor; these just fill in the fields in an object of the new type.
So you can do something like:
immutable Foo{K,V}
bar::Dict{K,V}
baz::Int
qux::Float64
end
dict = JLD.load("mydictfile.jld")["dict"]
vars = [Foo(dict, i, Float64(i+1)) for i in 1:3]
Note that it was a syntax error to include the parentheses after Dict in the type definition.
The {K,V} makes the Foo type parametric, so that you can make different kinds of Foo type, with different Dict types inside, if necessary. Even if you only use it for a single type of Dict, this will give more efficient code, since the type parameters K and V will be inferred when you create the Foo object. See the Julia manual: http://docs.julialang.org/en/release-0.5/manual/performance-tips/#avoid-fields-with-abstract-containers
So now you can try the code without even having the JLD file available (as we do not, for example):
julia> dict = Dict("a" => 1, "b" => 2)
julia> vars = [Foo(dict, i, Float64(i+1)) for i in 1:3]
3-element Array{Foo{String,Int64},1}:
Foo{String,Int64}(Dict("b"=>2,"a"=>1),1,2.0)
Foo{String,Int64}(Dict("b"=>2,"a"=>1),2,3.0)
Foo{String,Int64}(Dict("b"=>2,"a"=>1),3,4.0)
You can see that it is indeed the same dictionary (i.e. only a reference is actually stored in the type object) by modifying one of them and seeing that the others also change, i.e. that they point to the same dictionary object:
julia> vars[1].bar["c"] = 10
10
julia> vars
3-element Array{Foo{String,Int64},1}:
Foo{String,Int64}(Dict("c"=>10,"b"=>2,"a"=>1),1,2.0)
Foo{String,Int64}(Dict("c"=>10,"b"=>2,"a"=>1),2,3.0)
Foo{String,Int64}(Dict("c"=>10,"b"=>2,"a"=>1),3,4.0)
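A more direct way to check the sharing, without mutating anything, is the identity operator ===. A short sketch in current Julia syntax (struct instead of immutable); the small literal Dict stands in for the one loaded from the JLD file:

```julia
struct Foo{K,V}
    bar::Dict{K,V}
    baz::Int
    qux::Float64
end

dict = Dict("a" => 1, "b" => 2)          # stand-in for the loaded dictionary
vars = [Foo(dict, i, Float64(i + 1)) for i in 1:3]

# === is true only if both sides are the very same object in memory.
println(all(f -> f.bar === dict, vars))
```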

In julia functions : passed by reference or value?

In julia how do we know if a type is manipulated by value or by reference?
In java for example (at least for the sdk):
the basic types (those that have names starting with lower case letters, like "int") are manipulated by value
Objects (those that have names starting with capital letters, like "HashMap") and arrays are manipulated by reference
It is therefore easy to know what happens to a type modified inside a function.
I am pretty sure my question is a duplicate but I can't find the dup...
EDIT
This code :
function modifyArray(a::Array{ASCIIString,1})
push!(a, "chocolate")
end
function modifyInt(i::Int)
i += 7
end
myarray = ["alice", "bob"]
modifyArray(myarray)
@show myarray
myint = 1
modifyInt(myint)
@show myint
returns :
myarray = ASCIIString["alice","bob", "chocolate"]
myint = 1
which was a bit confusing to me, and the reason why I submitted this question. The comment of @StefanKarpinski clarified the issue.
My confusion came from the fact that I considered += to be an operator that, like the method push!, modifies the object itself. But it does not.
i += 7 should be seen as i = i + 7 ( a binding to a different object ). Indeed this behavior will be the same for modifyArray if I use for example a = ["chocolate"].
The corresponding terms in Julia are mutable and immutable types:
immutable objects (either bitstypes, such as Int or composite types declared with immutable, such as Complex) cannot be modified once created, and so are passed by copying.
mutable objects (arrays, or composite types declared with type) are passed by reference, so can be modified by calling functions. By convention such functions end with an exclamation mark (e.g., sort!), but this is not enforced by the language.
Note however that an immutable object can contain a mutable object, which can still be modified by a function.
This is explained in more detail in the FAQ.
I think the most rigorous answer is the one in
Julia function argument by reference
Strictly speaking, Julia is not "call-by-reference" but "call-by-value where the value is a
reference" , or "call-by-sharing", as used by most languages such as
python, java, ruby...
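A short sketch of what call-by-sharing means in practice, in current Julia syntax: rebinding the argument name inside a function is invisible to the caller, while mutating the object it refers to is visible. The function and variable names are made up for the illustration:

```julia
# Rebinding: `a` now names a new array; the caller's array is untouched.
function rebind(a::Vector{String})
    a = ["chocolate"]
    return a
end

# Mutation: the array the caller passed in is modified in place.
function mutate!(a::Vector{String})
    push!(a, "chocolate")
    return a
end

people = ["alice", "bob"]
rebind(people)
after_rebind = copy(people)   # snapshot taken before the mutating call
mutate!(people)

println(after_rebind)
println(people)
```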

Strange Dict behavior with keys of a custom type

I have a recursive function which utilizes a global dict to store values already obtained when traversing the tree. However, at least some of the values stored in the dict seem to disappear! This simplified code shows the problem:
type id
level::Int32
x::Int32
end
Vdict = Dict{id,Float64}()
function getV(w::id)
if haskey(Vdict,w)
return Vdict[w]
end
if w.level == 12
return 1.0
end
w.x == -111 && println("dont have: ",w)
local vv = 0.0
for j = -15:15
local wj = id(w.level+1,w.x+j)
vv += getV(wj)
end
Vdict[w] = vv
w.x == -111 && println("just stored: ",w)
vv
end
getV(id(0,0))
The output has many lines like this:
just stored: id(11,-111)
dont have: id(11,-111)
just stored: id(11,-111)
dont have: id(11,-111)
just stored: id(11,-111)
dont have: id(11,-111)
...
Do I have a silly error, or is there a bug in Julia's dict?
By default, custom types come with implementations of equality and hashing by object identity. Since your id type is mutable, Julia is conservative and assumes that you care about distinguishing each instance from another (since they could potentially diverge):
julia> type Id # There's a strong convention to capitalize type names in Julia
level::Int32
x::Int32
end
julia> x = Id(11, -111)
y = Id(11, -111)
x == y
false
julia> x.level = 12; (x,y)
(Id(12,-111),Id(11,-111))
Julia doesn't know whether you care about the object's long-term behavior or its current value.
There are two ways to make this behave as you'd like:
Make your custom type immutable. It looks like you don't need to mutate the contents of Id. The simplest and most straightforward way to solve this is to define it as immutable Id. Now Id(11, -111) is completely indistinguishable from any other construction of Id(11, -111) since its values can never change. As a bonus, you may see better performance, too.
If you do need to mutate the values, you could alternatively define your own implementations of == and Base.hash so they only care about the current value:
import Base: ==  # needed so that the definition below extends Base's ==
==(a::Id, b::Id) = a.level == b.level && a.x == b.x
Base.hash(a::Id, h::UInt) = hash(a.level, hash(a.x, h))
As @StefanKarpinski just pointed out on the mailing list, this isn't the default for mutable values "since it makes it easy to stick something in a dict, then mutate it, and 'lose it'." That is, the object's hash value has changed but the dictionary stored it in a place based upon its old hash value, and now you can no longer access that key/value pair by key lookup. Even if you create a second object with the same original properties as the first, it won't be able to find it, since the dictionary checks equality after finding a hash match. The only way to look up that key is to mutate it back to its original value or to explicitly ask the dictionary to Base.rehash! its contents.
In this case, I highly recommend option 1.
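For reference, a sketch of option 1 in current Julia syntax, where type became mutable struct and immutable became struct. The MutableId name is only there to show the contrast and is not part of the original code:

```julia
mutable struct MutableId     # compared and hashed by object identity
    level::Int32
    x::Int32
end

struct Id                    # immutable: compared and hashed by field values
    level::Int32
    x::Int32
end

println(MutableId(11, -111) == MutableId(11, -111))   # two distinct objects
println(Id(11, -111) == Id(11, -111))                 # indistinguishable values

Vdict = Dict{Id,Float64}()
Vdict[Id(11, -111)] = 1.0
println(haskey(Vdict, Id(11, -111)))   # a freshly built key finds the entry
```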
