Strange Dict behavior with keys of a custom type - dictionary

I have a recursive function which utilizes a global dict to store values already obtained when traversing the tree. However, at least some of the values stored in the dict seem to disappear! This simplified code shows the problem:
type id
level::Int32
x::Int32
end
Vdict = Dict{id,Float64}()
function getV(w::id)
if haskey(Vdict,w)
return Vdict[w]
end
if w.level == 12
return 1.0
end
w.x == -111 && println("dont have: ",w)
local vv = 0.0
for j = -15:15
local wj = id(w.level+1,w.x+j)
vv += getV(wj)
end
Vdict[w] = vv
w.x == -111 && println("just stored: ",w)
vv
end
getV(id(0,0))
The output has many lines like this:
just stored: id(11,-111)
dont have: id(11,-111)
just stored: id(11,-111)
dont have: id(11,-111)
just stored: id(11,-111)
dont have: id(11,-111)
...
Do I have a silly error, or is there a bug in Julia's dict?

By default, custom types come with implementations of equality and hashing by object identity. Since your id type is mutable, Julia is conservative and assumes that you care about distinguishing each instance from another (since they could potentially diverge):
julia> type Id # There's a strong convention to capitalize type names in Julia
level::Int32
x::Int32
end
julia> x = Id(11, -111)
y = Id(11, -111)
x == y
false
julia> x.level = 12; (x,y)
(Id(12,-111),Id(11,-111))
Julia doesn't know whether you care about the object's long-term behavior or its current value.
There are two ways to make this behave as you'd like:
Make your custom type immutable. It looks like you don't need to mutate the contents of Id. The simplest and most straightforward way to solve this is to define it as immutable Id. Now Id(11, -111) is completely indistinguishable from any other construction of Id(11, -111) since its values can never change. As a bonus, you may see better performance, too.
If you do need to mutate the values, you could alternatively define your own implementations of == and Base.hash so they only care about the current value:
==(a::Id, b::Id) = a.level == b.level && a.x == b.x
Base.hash(a::Id, h::Uint) = hash(a.level, hash(a.x, h))
As #StefanKarpinski just pointed out on the mailing list, this isn't the default for mutable values "since it makes it easy to stick something in a dict, then mutate it, and 'lose it'." That is, the object's hash value has changed but the dictionary stored it in a place based upon its old hash value, and now you can no longer access that key/value pair by key lookup. Even if you create a second object with the same original properties as the first it won't be able to find it since the dictionary checks equality after finding a hash match. The only way to lookup that key is to mutate it back to its original value or explicitly asking the dictionary to Base.rehash! its contents.
In this case, I highly recommend option 1.

Related

why does as.vector deep copy a matrix?

Using top, I manually measured the following memory usages at the specific points designated in the comments of the following code block:
x <- matrix(rnorm(1e9),nrow=1e4)
#~15gb
gc()
# ~7gb after gc()
y <- as.vector(x)
gc()
#~15gb after gc()
It's pretty clear that rnorm(1e9) is a ~7gb vector that's then copied to create the matrix. gc() removes the original vector since it's not assigned to anything. as.vector(x) then coerces and copies the data to vector.
My question is, why can't these three objects all point to the same memory block (at least until one is modified)? Isn't a matrix really just a vector with some additional metadata?
This is in R version 3.6.2
edit: also tested in 4.0.3, same results.
The question you're asking is to the reasoning. That seems more suited for R-devel, and I am assuming the answer in return is "no one knows". The relevant function from R-source is the do_asvector function.
Going down the source code of a call to as.vector(matrix(...)), it is important to note that the default argument for mode is any. This translates to ANYSXP (see R internals). This lets us find the evil culprit (line 1524) of the copy-behaviour.
// source reference: do_asvector
...
if(type == ANYSXP || TYPEOF(x) == type) {
switch(TYPEOF(x)) {
case LGLSXP:
case INTSXP:
case REALSXP:
case CPLXSXP:
case STRSXP:
case RAWSXP:
if(ATTRIB(x) == R_NilValue) return x;
ans = MAYBE_REFERENCED(x) ? duplicate(x) : x; // <== evil culprit
CLEAR_ATTRIB(ans);
return ans;
case EXPRSXP:
case VECSXP:
return x;
default:
;
}
...
Going one step further, we can find the definition for MAYBE_REFERENCED in src/include/Rinternals.h, and by digging a bit we can find that it checks whether sxpinfo.named is equal to 0 (false) or not (true). What I am guessing here is that the assignment operator <- increments the sxpinfo.named counter and thus MAYBE_REFERENCED(x) returns TRUE and we get a duplicate (deep copy).
However, Is this behaviour necessary?
That is a great question. If we had given an argument to mode other than any or class(x) (same as our input class), we skip the duplicate line, and we continue down the function, until we hit a ascommon. So I dug a bit extra and took a look at the source code for ascommon, we can see that if we were to try and convert to list manually (setting mode = "list"), ascommon only calls shallowDuplicate.
// Source reference: ascommon
---
if ((type == LISTSXP) &&
!(TYPEOF(u) == LANGSXP || TYPEOF(u) == LISTSXP ||
TYPEOF(u) == EXPRSXP || TYPEOF(u) == VECSXP)) {
if (MAYBE_REFERENCED(v)) v = shallow_duplicate(v); // <=== ascommon duplication behaviour
CLEAR_ATTRIB(v);
}
return v;
}
---
So one could imagine that the call to duplicate in do_asvector could be replaced by a call to shallow_duplicate. Perhaps a "better safe than sorry" strategy was chosen when the code was originally implemented (prior to R-2.13.0 according to a comment in the source code), or perhaps there is a scenario in one of the types not handled by ascommon that requires a deep-copy.
For now I would test if the function does a deep-copy if we set mode='list' or pass the list without assignment. In either case it might not be a bad idea to send a follow-up question to the R-devel mailing list.
Edit: <- behaviour
I took the liberty to confirm my suspicion, and looked at the source code for <-. I previously stated that I assumed that <- incremented sxpinfo.named, and we can confirm this by looking at do_set (the c source code for <-). When assigning as x <- ... x is a SYMSXP, and this we can see that the source code calls INCREMENT_NAMED which in turn calls SET_NAMED(x, NAMED(X) + 1). So everything else equal we should see a copy behaviour for x <- matrix(...); y <- as.vector(x) while we shouldn't for y <- as.vector(matrix(...)).
At the final gc(), you have x pointing to a vector with a dim attribute, and y pointing to a vector without any dim attribute. The data is an intrinsic part of the object, it's not an attribute, so those two vectors have to be different.
If matrices had been implemented as lists, e.g.
x <- list(data = rnorm(1e9), dim = c(1e4, 1e5))
then a shallow copy would be possible, but that's not how it was done. You can read the details of the internal structure of objects in the R Internals manual. For the current release, that's here: https://cloud.r-project.org/doc/manuals/r-release/R-ints.html#SEXPs .
You may wonder why things were implemented this way. I suspect it's intended to be efficient for the common use cases. Converting a matrix to a vector isn't generally necessary (you can treat x as a vector already, e.g. x[100000] and y[100000] will give the same value), so there's no need for "convert to vector" to be efficient. On the other hand, extracting elements is very common, so you don't want to have an extra pointer dereference slowing that down.

Rocket Universe Dictionary passing VM attribute value to subroutine

Okay this might get a tad complex or not.
Have a file with a multivalues in attribute 4
I want to write another dictionary item that loops through the multivalue list, calls a subroutine and returns calculated values for each item in attribute 4.
somthing like
<4> a]b]c]d]e
New attribute
#RECORD<4>;SUBR("SUB.CALC.AMT", #1)
Result
<4> AMT
a 5.00
b 15.00
c 13.50
d 3.25
Not quite sure how to pass in the values from RECORD<4>, had a vague notion of a #CNT system variable, but that's not working, which might mean it was from SB+ or one of the other 4GLs.
You might be over thinking this.
You should just be able to reference it without out doing the ";" and #1 thing (I am not familiar with that convention). Using an I-Descriptor this should do the trick, though I have traditionally used the actual dictionary names instead of the #RECORD thing.
SUBR("SUB.CALC.AMT", #RECORD<4>)
This should work provided your subroutine is compiled, cataloged and returns your desired value with the same value/subvalue structure as #RECORD<4> in the first parameter of the Subroutine.
SUBROUTINE SUB.CALC.AMT(RETURN.VALUE,JUICY.BITS)
JBC = DCOUNT(JUICY.BITS<1>,#VM)
FOR X=1 TO JBC
RETURN.VALUE<1,X> = JUICY.BITS<1,X>:" or something else"
NEXT X
RETURN
END
Good luck.

Second dict refering to first dict doesn't register changes to first

dictOne = {'a':1, 'b':2}
dictTwo = {'aa':dictOne['a'], 'bb':dictOne['b']}
print(dictTwo['aa']
# returns 1
If I make a change to dictOne like:
dictOne['a'] = 2
print(dictOne['a'])
# returns 2
but the second dict that refers to the first still returns the original value.
print(dictTwo['aa'])
# returns 1
What is happening here? I'm sure this is somehow an inappropriate usage of dict but I need to resolve this in the immediate. Thanks.
You're extracting the value from the key 'a' inside dictOne with the below piece of code
dictTwo = {'aa':dictOne['a']}
You may find some value in reading the python FAQ on pass by assignment
Without knowing more about the problem, it's difficult to say exactly how you can solve this. If you need to create a mapping between different sets of keys, there's the option to do something like:
dictTwo = {'aa' : 'a', 'bb' : 'b'}
dictOne[dictTwo['aa']]
Although maybe what you're looking for is a multi key dict
The line here:
dictTwo = {'aa':dictOne['a'], 'bb':dictOne['b']}
Is equivalent to:
dictTwo = {'aa':1, 'bb':2}
Since dictOne['a'] and dictOne['b'] both return immutable values (integers), they are passed by copy, and not by reference. See How do I pass a variable by reference?
Had you done dictTwo = dictOne, updating dictOne would also update dictTwo, however they would have the same key values.

In julia functions : passed by reference or value?

In julia how do we know if a type is manipulated by value or by reference?
In java for example (at least for the sdk):
the basic types (those that have names starting with lower case letters, like "int") are manipulated by value
Objects (those that have names starting with capital letters, like "HashMap") and arrays are manipulated by reference
It is therefore easy to know what happens to a type modified inside a function.
I am pretty sure my question is a duplicate but I can't find the dup...
EDIT
This code :
function modifyArray(a::Array{ASCIIString,1})
push!(a, "chocolate")
end
function modifyInt(i::Int)
i += 7
end
myarray = ["alice", "bob"]
modifyArray(myarray)
#show myarray
myint = 1
modifyInt(myint)
#show myint
returns :
myarray = ASCIIString["alice","bob", "chocolate"]
myint = 1
which was a bit confusing to me, and the reason why I submitted this question. The comment of #StefanKarpinski clarified the issue.
My confusion came from the fact i considred += as an operator , a method like push! which is modifying the object itself . but it is not.
i += 7 should be seen as i = i + 7 ( a binding to a different object ). Indeed this behavior will be the same for modifyArray if I use for example a = ["chocolate"].
The corresponding terms in Julia are mutable and immutable types:
immutable objects (either bitstypes, such as Int or composite types declared with immutable, such as Complex) cannot be modified once created, and so are passed by copying.
mutable objects (arrays, or composite types declared with type) are passed by reference, so can be modified by calling functions. By convention such functions end with an exclamation mark (e.g., sort!), but this is not enforced by the language.
Note however that an immutable object can contain a mutable object, which can still be modified by a function.
This is explained in more detail in the FAQ.
I think the most rigourous answer is the one in
Julia function argument by reference
Strictly speaking, Julia is not "call-by-reference" but "call-by-value where the value is a
reference" , or "call-by-sharing", as used by most languages such as
python, java, ruby...

Immutable dictionary

Is there a way to enforce a dictionary being constant?
I have a function which reads out a file for parameters (and ignores comments) and stores it in a dict:
function getparameters(filename::AbstractString)
f = open(filename,"r")
dict = Dict{AbstractString, AbstractString}()
for ln in eachline(f)
m = match(r"^\s*(?P<key>\w+)\s+(?P<value>[\w+-.]+)", ln)
if m != nothing
dict[m[:key]] = m[:value]
end
end
close(f)
return dict
end
This works just fine. Since i have a lot of parameters, which i will end up using on different places, my idea was to let this dict be global. And as we all know, global variables are not that great, so i wanted to ensure that the dict and its members are immutable.
Is this a good approach? How do i do it? Do i have to do it?
Bonus answerable stuff :)
Is my code even ok? (it is the first thing i did with julia, and coming from c/c++ and python i have the tendencies to do things differently.) Do i need to check whether the file is actually open? Is my reading of the file "julia"-like? I could also readall and then use eachmatch. I don't see the "right way to do it" (like in python).
Why not use an ImmutableDict? It's defined in base but not exported. You use one as follows:
julia> id = Base.ImmutableDict("key1"=>1)
Base.ImmutableDict{String,Int64} with 1 entry:
"key1" => 1
julia> id["key1"]
1
julia> id["key1"] = 2
ERROR: MethodError: no method matching setindex!(::Base.ImmutableDict{String,Int64}, ::Int64, ::String)
in eval(::Module, ::Any) at .\boot.jl:234
in macro expansion at .\REPL.jl:92 [inlined]
in (::Base.REPL.##1#2{Base.REPL.REPLBackend})() at .\event.jl:46
julia> id2 = Base.ImmutableDict(id,"key2"=>2)
Base.ImmutableDict{String,Int64} with 2 entries:
"key2" => 2
"key1" => 1
julia> id.value
1
You may want to define a constructor which takes in an array of pairs (or keys and values) and uses that algorithm to define the whole dict (that's the only way to do so, see the note at the bottom).
Just an added note, the actual internal representation is that each dictionary only contains one key-value pair, and a dictionary. The get method just walks through the dictionaries checking if it has the right value. The reason for this is because arrays are mutable: if you did a naive construction of an immutable type with a mutable field, the field is still mutable and thus while id["key1"]=2 wouldn't work, id.keys[1]=2 would. They go around this by not using a mutable type for holding the values (thus holding only single values) and then also holding an immutable dict. If you wanted to make this work directly on arrays, you could use something like ImmutableArrays.jl but I don't think that you'd get a performance advantage because you'd still have to loop through the array when checking for a key...
First off, I am new to Julia (I have been using/learning it since only two weeks). So do not put any confidence in what I am going to say unless it is validated by others.
The dictionary data structure Dict is defined here
julia/base/dict.jl
There is also a data structure called ImmutableDict in that file. However as const variables aren't actually const why would immutable dictionaries be immutable?
The comment states:
ImmutableDict is a Dictionary implemented as an immutable linked list,
which is optimal for small dictionaries that are constructed over many individual insertions
Note that it is not possible to remove a value, although it can be partially overridden and hidden
by inserting a new value with the same key
So let us call what you want to define as a dictionary UnmodifiableDict to avoid confusion. Such object would probably have
a similar data structure as Dict.
a constructor that takes a Dict as input to fill its data structure.
specialization (a new dispatch?) of the the method setindex! that is called by the operator [] =
in order to forbid modification of the data structure. This should be the case of all other functions that end with ! and hence modify the data.
As far as I understood, It is only possible to have subtypes of abstract types. Therefore you can't make UnmodifiableDict as a subtype of Dict and only redefine functions such as setindex!
Unfortunately this is a needed restriction for having run-time types and not compile-time types. You can't have such a good performance without a few restrictions.
Bottom line:
The only solution I see is to copy paste the code of the type Dict and its functions, replace Dict by UnmodifiableDict everywhere and modify the functions that end with ! to raise an exception if called.
you may also want to have a look at those threads.
https://groups.google.com/forum/#!topic/julia-users/n-lqjybIO_w
https://github.com/JuliaLang/julia/issues/1974
REVISION
Thanks to Chris Rackauckas for pointing out the error in my earlier response. I'll leave it below as an illustration of what doesn't work. But, Chris is right, the const declaration doesn't actually seem to improve performance when you feed the dictionary into the function. Thus, see Chris' answer for the best resolution to this issue:
D1 = [i => sind(i) for i = 0.0:5:3600];
const D2 = [i => sind(i) for i = 0.0:5:3600];
function test(D)
for jdx = 1:1000
# D[2] = 2
for idx = 0.0:5:3600
a = D[idx]
end
end
end
## Times given after an initial run to allow for compiling
#time test(D1); # 0.017789 seconds (4 allocations: 160 bytes)
#time test(D2); # 0.015075 seconds (4 allocations: 160 bytes)
Old Response
If you want your dictionary to be a constant, you can use:
const MyDict = getparameters( .. )
Update Keep in mind though that in base Julia, unlike some other languages, it's not that you cannot redefine constants, instead, it's just that you get a warning when doing so.
julia> const a = 2
2
julia> a = 3
WARNING: redefining constant a
3
julia> a
3
It is odd that you don't get the constant redefinition warning when adding a new key-val pair to the dictionary. But, you still see the performance boost from declaring it as a constant:
D1 = [i => sind(i) for i = 0.0:5:3600];
const D2 = [i => sind(i) for i = 0.0:5:3600];
function test1()
for jdx = 1:1000
for idx = 0.0:5:3600
a = D1[idx]
end
end
end
function test2()
for jdx = 1:1000
for idx = 0.0:5:3600
a = D2[idx]
end
end
end
## Times given after an initial run to allow for compiling
#time test1(); # 0.049204 seconds (1.44 M allocations: 22.003 MB, 5.64% gc time)
#time test2(); # 0.013657 seconds (4 allocations: 160 bytes)
To add to the existing answers, if you like immutability and would like to get performant (but still persistent) operations which change and extend the dictionary, check out FunctionalCollections.jl's PersistentHashMap type.
If you want to maximize performance and take maximal advantage of immutability, and you don't plan on doing any operations on the dictionary whatsoever, consider implementing a perfect hash function-based dictionary. In fact, if your dictionary is a compile-time constant, these can even be computed ahead of time (using metaprogramming) and precompiled.

Resources