Unicity of complex key dictionaries in Go but not in Julia? - dictionary

In GO when I use a struct as a key for a map, there is an unicity of the keys.
For example, the following code produce a map with only one key : map[{x 1}:1]
package main
import (
"fmt"
)
type MyT struct {
A string
B int
}
func main() {
dic := make(map[MyT]int)
for i := 1; i <= 10; i++ {
dic[MyT{"x", 1}] = 1
}
fmt.Println(dic)
}
// result : map[{x 1}:1]
I Tried to do the same in Julia and I had a strange surprise :
This Julia code, similar to the GO one, produces a dictionary whith 10 keys !
type MyT
A::String
B::Int64
end
dic = Dict{MyT, Int64}()
for i in 1:10
dic[MyT("x", 1)] = 1
end
println(dic)
# Dict(MyT("x",1)=>1,MyT("x",1)=>1,MyT("x",1)=>1,MyT("x",1)=>1,MyT("x",1)=>1,MyT("x",1)=>1,MyT("x",1)=>1,MyT("x",1)=>1,MyT("x",1)=>1,MyT("x",1)=>1)
println(keys(dic))
# MyT[MyT("x",1),MyT("x",1),MyT("x",1),MyT("x",1),MyT("x",1),MyT("x",1),MyT("x",1),MyT("x",1),MyT("x",1),MyT("x",1)]
So what I did wrong ?
Thank you #DanGetz for the solution ! :
immutable MyT # or struct MyT with julia > 0.6
A::String
B::Int64
end
dic = Dict{MyT, Int64}()
for i in 1:10
dic[MyT("x", 1)] = 1
end
println(dic) # Dict(MyT("x", 1)=>1)
println(keys(dic)) # MyT[MyT("x", 1)]

Mutable values hash by identity in Julia, since without additional knowledge about what a type represents, one cannot know if two values with the same structure mean the same thing or not. Hashing mutable objects by value can be especially problematic if you mutate a value after using it as a dictionary key – this is not a problem when hashing by identity since the identity of a mutable object remains the same even when it is modified. On the other hand, it's perfectly safe to hash immutable objects by value – since they cannot be mutated, and accordingly that is the default behavior for immutable types. In the given example, if you make MyT immutable you will automatically get the behavior you're expecting:
immutable MyT # `struct MyT` in 0.6
A::String
B::Int64
end
dic = Dict{MyT, Int64}()
for i in 1:10
dic[MyT("x", 1)] = 1
end
julia> dic
Dict{MyT,Int64} with 1 entry:
MyT("x", 1) => 1
julia> keys(dic)
Base.KeyIterator for a Dict{MyT,Int64} with 1 entry. Keys:
MyT("x", 1)
For a type holding a String and an Int value that you want to use as a hash key, immutability is probably the right choice. In fact, immutability is the right choice more often than not, which is why the keywords introducing structural types has been change in 0.6 to struct for immutable structures and mutable struct for mutable structures – on the principle that people will reach for the shorter, simpler name first, so that should be the better default choice – i.e. immutability.
As #ntdef has written, you can change the hashing behavior of your type by overloading the Base.hash function. However, his definition is incorrect in a few respects (which is probably our fault for failing to document this more prominently and thoroughly):
The method signature of Base.hash that you want to overload is Base.hash(::T, ::UInt).
The Base.hash(::T, ::UInt) method must return a UInt value.
If you are overloading Base.hash, you should also overload Base.== to match.
So this would be a correct way to make your mutable type hash by value (new Julia session required to redefine MyT):
type MyT # `mutable struct MyT` in 0.6
A::String
B::Int64
end
import Base: ==, hash
==(x::MyT, y::MyT) = x.A == y.A && x.B == y.B
hash(x::MyT, h::UInt) = hash((MyT, x.A, x.B), h)
dic = Dict{MyT, Int64}()
for i in 1:10
dic[MyT("x", 1)] = 1
end
julia> dic
Dict{MyT,Int64} with 1 entry:
MyT("x", 1) => 1
julia> keys(dic)
Base.KeyIterator for a Dict{MyT,Int64} with 1 entry. Keys:
MyT("x", 1)
This is kind of annoying to do manually, but the AutoHashEquals package automates this, taking the tedium out of it. All you need to do is prefix the type definition with the #auto_hash_equals macro:
using AutoHashEquals
#auto_hash_equals type MyT # `#auto_hash_equals mutable struct MyT` in 0.6
A::String
B::Int64
end
Bottom line:
If you have a type that should have value-based equality and hashing, seriously consider making it immutable.
If your type really has to be mutable, then think hard about whether it's a good idea to use as a hash key.
If you really need to use a mutable type as a hash key with value-based equality and hashing semantics, use the AutoHashEquals package.

You did not do anything wrong. The difference between the languages is in how they choose to hash a struct when using it as a key in the map/Dict. In go, structs are hashed by their values rather than their pointer addresses. This allows programmers to more easily implement multidimensional maps by using structs rather than maps of maps. See this blog post for more info.
Reproducing Julia's Behavior in Go
To reproduce Julia's behavior in go, redefine the map to use a pointer to MyT as the key type:
func main() {
dic := make(map[MyT]int)
pdic := make(map[*MyT]int)
for i := 1; i <= 10; i++ {
t := MyT{"x", 1}
dic[t] = 1
pdic[&t] = 1
}
fmt.Println(dic)
fmt.Println(pdic)
}
Here, pdic uses the pointer to a MyT struct as its key type. Because each MyT created in the loop has a different memory address, the key will be different. This produces the output:
map[{x 1}:1]
map[0x1040a140:1 0x1040a150:1 0x1040a160:1 0x1040a180:1 0x1040a1b0:1 0x1040a1c0:1 0x1040a130:1 0x1040a170:1 0x1040a190:1 0x1040a1a0:1]
You can play with this on play.golang.org. Unlike in Julia (see below), the way the map type is implemented go means you cannot specify a custom hashing function for a user-defined struct.
Reproducing Go's Behavior in Julia
Julia uses the function Base.hash(::K, ::UInt) to hash keys for its Dict{K,V} type. While it doesn't explicitly say so in the documentation, the default hashing algorithm uses the output from object_id, as you can see in the source code. To reproduce go's behavior in Julia, define a new hash function for your type that hashes the values of the struct:
Base.hash(t::MyT, h::Uint) = Base.hash((t.A, t.B), h)
Note that you should also define the == operator in the same way to guarantee hash(x)==hash(y) implies isequal(x,y), as mentioned in the documentation.
However, the easiest way to get Julia to act like go in your example is to redefine MyT as immutable. As an immutable type, Julia will hash MyT by its value rather than its object_id. As an example:
immutable MyT
A::String
B::Int64
end
dic = Dict{MyT, Int64}()
for i in 1:10
dic[MyT("x", 1)] = 1
end
dic[MyT("y", 2)] = 2
println(dic) # prints "Dict(MyT("y",2)=>2,MyT("x",1)=>1)"

Edit: Please refer to #StefanKarpinski's answer. The Base.hash function must return a UInt for it to be a valid hash, so my example won't work. Also there's some funkiness regarding user defined types which involves recursion.
The reason you get 10 different keys is due to the fact that Julia uses the hash function when determining the key to a dict. In this case, I'm guessing that it's using the address of the object in memory as the key for the dictionary. If you'd like to explicitly make (A,B) the unique key, you'll need to override the hash function for your particular type, with something like this:
Base.hash(x::MyT) = (x.A, x.B)
That will replicate the Go behavior, with only one item in the Dict.
Here's the documentation to the hash function.
Hope that helps!

Related

Can a julia struct be defined with persistent requirements on field dimensions?

If I define a new struct as
mutable struct myStruct
data::AbstractMatrix
labels::Vector{String}
end
and I want to throw an error if the length of labels is not equal to the number of columns of data, I know that I can write a constructor that enforces this condition like
myStruct(data, labels) = length(labels) != size(data)[2] ? error("Labels incorrect length") : new(data,labels)
However, once the struct is initialized, the labels field can be set to the incorrect length:
m = myStruct(randn(2,2), ["a", "b"])
m.labels = ["a"]
Is there a way to throw an error if the labels field is ever set to length not equal to the number of columns in data?
You could use StaticArrays.jl to fix the matrix and vector's sizes to begin with:
using StaticArrays
mutable struct MatVec{R, C, RC, VT, MT}
data::MMatrix{R, C, MT, RC} # RC should be R*C
labels::MVector{C, VT}
end
but there's the downside of having to compile for every concrete type with a unique permutation of type parameters R,C,MT,VT. StaticArrays also does not scale as well as normal Arrays.
If you don't restrict dimensions in the type parameters (with all those downsides) and want to throw an error at runtime, you got good and bad news.
The good news is you can control whatever mutation happens to your type. m.labels = v would call the method setproperty!(object::myStruct, name::Symbol, v), which you can define with all the safeguards you like.
The bad news is that you can't control mutation to the fields' types. push!(m.labels, 1) mutates in the push!(a::Vector{T}, item) method. The myStruct instance itself doesn't actually change; it still points to the same Vector. If you can't guarantee that you won't do something like x = m.labels; push!(x, "whoops") , then you really do need runtime checks, like iscorrect(m::myStruct) = length(m.labels) == size(m.data)[2]
A good option is to not access the fields of your struct directly. Instead, do it using a function. Eg:
mutable struct MyStruct
data::AbstractMatrix
labels::Vector{String}
end
function modify_labels(s::MyStruct, new_labels::Vector{String})
# do all checks and modifications
end
You should check chapter 8 from "Hands-On Design Patterns and Best Practices with Julia: Proven solutions to common problems in software design for Julia 1.x"

Julia immutable struct with mutable list

Here's some broken code:
struct NumberedList
index::Int64
values::Vector{Int64}
NumberedList(i) = new(i, Int64[])
end
function set_values!(list::NumberedList, new_values::Vector{Int64})
list.values = new_values
end
# ---
mylist = NumberedList(1)
set_values!(mylist, [1, 2, 3])
I don't want to do either of these:
I could declare the NumberedList as a mutable struct (making everything mutable)
I could replace set_values! with something like this (copy all the values):
function set_values!(list::NumberedList, new_values::Vector{Int64})
for i in new_values
push!(list.values, i)
end
end
But I'd like to make index immutable, but allow values to be assigned to.
Other notes:
Thread from the Discourse: "Mutable field in immutable type," but this doesn't apply to vectors in this setting.
It depends on what exactly you want to mutate.
If you really want to reassign the field values::Vector{Int64} itself, then you have to use a mutable struct. There's no way around that, because the actual data of the struct changes when you reassign that field.
If you use an immutable struct with a values::Vector{Int64} field, it means that you cannot change which array is contained, but the array itself is mutable and can change its elements (which are not stored in the struct). In this case, you really do have to copy values to it from external arrays, like your example code (though I would point out that your code did not reset the array to an empty one). I personally think this would be cleaner:
function set_values!(list::NumberedList, new_values::Vector{Int64})
empty!(list.values) # reset list.values to Int64[]
append!(list.values, new_values)
end
The thread you linked talks about using Base.Ref. Base.Ref is pretty much THE way to make a field of an immutable struct indirectly act like a mutable field. It works like this: the field cannot change which RefValue{Vector{Int64}} instance is contained, but the instance itself is mutable and can change its reference (again, not stored in the struct) to any Int64 array. You have to use indexing values[] to get to the array, though:
struct NumberedList
index::Int64
values::Ref{Vector{Int64}}
NumberedList(i) = new(i, Int64[])
end
function set_values!(list::NumberedList, new_values::Vector{Int64})
list.values[] = new_values # "reassign" different array to Ref
end
# ---
mylist = NumberedList(1)
set_values!(mylist, [1, 2, 3])

How to make composite key for a hash map in go

First, my definition of composite key - two ore more values combine to make the key. Not to confuse with composite keys in databases.
My goal is to save computed values of pow(x, y) in a hash table, where x and y are integers. This is where I need ideas on how to make a key, so that given x and y, I can look it up in the hash table, to find pow(x,y).
For example:
pow(2, 3) => {key(2,3):8}
What I want to figure out is how to get the map key for the pair (2,3), i.e. the best way to generate a key which is a combination of multiple values, and use it in hash table.
The easiest and most flexible way is to use a struct as the key type, including all the data you want to be part of the key, so in your case:
type Key struct {
X, Y int
}
And that's all. Using it:
m := map[Key]int{}
m[Key{2, 2}] = 4
m[Key{2, 3}] = 8
fmt.Println("2^2 = ", m[Key{2, 2}])
fmt.Println("2^3 = ", m[Key{2, 3}])
Output (try it on the Go Playground):
2^2 = 4
2^3 = 8
Spec: Map types: You may use any types as the key where the comparison operators == and != are fully defined, and the above Key struct type fulfills this.
Spec: Comparison operators: Struct values are comparable if all their fields are comparable. Two struct values are equal if their corresponding non-blank fields are equal.
One important thing: you should not use a pointer as the key type (e.g. *Key), because comparing pointers only compares the memory address, and not the pointed values.
Also note that you could also use arrays (not slices) as key type, but arrays are not as flexible as structs. You can read more about this here: Why have arrays in Go?
This is how it would look like with arrays:
type Key [2]int
m := map[Key]int{}
m[Key{2, 2}] = 4
m[Key{2, 3}] = 8
fmt.Println("2^2 = ", m[Key{2, 2}])
fmt.Println("2^3 = ", m[Key{2, 3}])
Output is the same. Try it on the Go Playground.
Go can't make a hash of a slice of ints.
Therefore the way I would approach this is mapping a struct to a number.
Here is an example of how that could be done:
package main
import (
"fmt"
)
type Nums struct {
num1 int
num2 int
}
func main() {
powers := make(map[Nums]int)
numbers := Nums{num1: 2, num2: 4}
powers[numbers] = 6
fmt.Printf("%v", powers[input])
}
I hope that helps
Your specific problem is nicely solved by the other answers. I want to add an additional trick that may be useful in some corner cases.
Given that map keys must be comparable, you can also use interfaces. Interfaces are comparable if their dynamic values are comparable.
This allows you to essentially partition the map, i.e. to use multiple types of keys within the same data structure. For example if you want to store in your map n-tuples (it wouldn't work with arrays, because the array length is part of the type).
The idea is to define an interface with a dummy method (but it can surely be not dummy at all), and use that as map key:
type CompKey interface {
isCompositeKey() bool
}
var m map[CompKey]string
At this point you can have arbitrary types implementing the interface, either explicitly or by just embedding it.
In this example, the idea is to make the interface method unexported so that other structs may just embed the interface without having to provide an actual implementation — the method can't be called from outside its package. It will just signal that the struct is usable as a composite map key.
type AbsoluteCoords struct {
CompKey
x, y int
}
type RelativeCoords struct {
CompKey
x, y int
}
func foo() {
p := AbsoluteCoords{x: 1, y: 2}
r := RelativeCoords{x: 10, y: 20}
m[p] = "foo"
m[r] = "bar"
fmt.Println(m[AbsoluteCoords{x: 10, y: 20}]) // "" (empty, types don't match)
fmt.Println(m[RelativeCoords{x: 10, y: 20}]) // "bar" (matches, key present)
}
Of course nothing stops you from declaring actual methods on the interface, that may be useful when ranging over the map keys.
The disadvantage of this interface key is that it is now your responsibility to make sure the implementing types are actually comparable. E.g. this map key will panic:
type BadKey struct {
CompKey
nonComparableSliceField []int
}
b := BadKey{nil, []int{1,2}}
m[b] = "bad!" // panic: runtime error: hash of unhashable type main.BadKey
All in all, this might be an interesting approach when you need to keep two sets of K/V pairs in the same map, e.g. to keep some sanity in function signatures or to avoid defining structs with N very similar map fields.
Playground https://play.golang.org/p/0t7fcvSWdy7

Julia: non-destructively update immutable type variable

Let's say there is a type
immutable Foo
x :: Int64
y :: Float64
end
and there is a variable foo = Foo(1,2.0). I want to construct a new variable bar using foo as a prototype with field y = 3.0 (or, alternatively non-destructively update foo producing a new Foo object). In ML languages (Haskell, OCaml, F#) and a few others (e.g. Clojure) there is an idiom that in pseudo-code would look like
bar = {foo with y = 3.0}
Is there something like this in Julia?
This is tricky. In Clojure this would work with a data structure, a dynamically typed immutable map, so we simply call the appropriate method to add/change a key. But when working with types we'll have to do some reflection to generate an appropriate new constructor for the type. Moreover, unlike Haskell or the various MLs, Julia isn't statically typed, so one does not simply look at an expression like {foo with y = 1} and work out what code should be generated to implement it.
Actually, we can build a Clojure-esque solution to this; since Julia provides enough reflection and dynamism that we can treat the type as a sort of immutable map. We can use fieldnames to get the list of "keys" in order (like [:x, :y]) and we can then use getfield(foo, :x) to get field values dynamically:
immutable Foo
x
y
z
end
x = Foo(1,2,3)
with_slow(x, p) =
typeof(x)(((f == p.first ? p.second : getfield(x, f)) for f in fieldnames(x))...)
with_slow(x, ps...) = reduce(with_slow, x, ps)
with_slow(x, :y => 4, :z => 6) == Foo(1,4,6)
However, there's a reason this is called with_slow. Because of the reflection it's going to be nowhere near as fast as a handwritten function like withy(foo::Foo, y) = Foo(foo.x, y, foo.z). If Foo is parametised (e.g. Foo{T} with y::T) then Julia will be able to infer that withy(foo, 1.) returns a Foo{Float64}, but won't be able to infer with_slow at all. As we know, this kills the crab performance.
The only way to make this as fast as ML and co is to generate code effectively equivalent to the handwritten version. As it happens, we can pull off that version as well!
# Fields
type Field{K} end
Base.convert{K}(::Type{Symbol}, ::Field{K}) = K
Base.convert(::Type{Field}, s::Symbol) = Field{s}()
macro f_str(s)
:(Field{$(Expr(:quote, symbol(s)))}())
end
typealias FieldPair{F<:Field, T} Pair{F, T}
# Immutable `with`
for nargs = 1:5
args = [symbol("p$i") for i = 1:nargs]
#eval with(x, $([:($p::FieldPair) for p = args]...), p::FieldPair) =
with(with(x, $(args...)), p)
end
#generated function with{F, T}(x, p::Pair{Field{F}, T})
:($(x.name.primary)($([name == F ? :(p.second) : :(x.$name)
for name in fieldnames(x)]...)))
end
The first section is a hack to produce a symbol-like object, f"foo", whose value is known within the type system. The generated function is like a macro that takes types as opposed to expressions; because it has access to Foo and the field names it can generate essentially the hand-optimised version of this code. You can also check that Julia is able to properly infer the output type, if you parametrise Foo:
#code_typed with(x, f"y" => 4., f"z" => "hello") # => ...::Foo{Int,Float64,String}
(The for nargs line is essentially a manually-unrolled reduce which enables this.)
Finally, lest I be accused of giving slightly crazy advice, I want to warn that this isn't all that idiomatic in Julia. While I can't give very specific advice without knowing your use case, it's generally best to have fields with a manageable (small) set of fields and a small set of functions which do the basic manipulation of those fields; you can build on those functions to create the final public API. If what you want is really an immutable dict, you're much better off just using a specialised data structure for that.
There is also setindex (without the ! at the end) implemented in the FixedSizeArrays.jl package, which does this in an efficient way.

Function with different argument types

I read about polymorphism in function and saw this example
fun len nil = 0
| len rest = 1 + len (tl rest)
All the other examples dealt with nil arg too.
I wanted to check the polymorphism concept on other types, like
fun func (a : int) : int = 1
| func (b : string) : int = 2 ;
and got the follow error
stdIn:1.6-2.33 Error: parameter or result constraints of clauses don't agree
[tycon mismatch]
this clause: string -> int
previous clauses: int -> int
in declaration:
func = (fn a : int => 1: int
| b : string => 2: int)
What is the mistake in the above function? Is it legal at all?
Subtype Polymorphism:
In a programming languages like Java, C# o C++ you have a set of subtyping rules that govern polymorphism. For instance, in object-oriented programming languages if you have a type A that is a supertype of a type B; then wherever A appears you can pass a B, right?
For instance, if you have a type Mammal, and Dog and Cat were subtypes of Mammal, then wherever Mammal appears you could pass a Dog or a Cat.
You can achive the same concept in SML using datatypes and constructors. For instance:
datatype mammal = Dog of String | Cat of String
Then if you have a function that receives a mammal, like:
fun walk(m: mammal) = ...
Then you could pass a Dog or a Cat, because they are constructors for mammals. For instance:
walk(Dog("Fido"));
walk(Cat("Zoe"));
So this is the way SML achieves something similar to what we know as subtype polymorphism in object-oriented languajes.
Ad-hoc Polymorphysm:
Coercions
The actual point of confusion could be the fact that languages like Java, C# and C++ typically have automatic coercions of types. For instance, in Java an int can be automatically coerced to a long, and a float to a double. As such, I could have a function that accepts doubles and I could pass integers. Some call these automatic coercions ad-hoc polymorphism.
Such form of polymorphism does not exist in SML. In those cases you are forced to manually coerced or convert one type to another.
fun calc(r: real) = r
You cannot call it with an integer, to do so you must convert it first:
calc(Real.fromInt(10));
So, as you can see, there is no ad-hoc polymorphism of this kind in SML. You must do castings/conversions/coercions manually.
Function Overloading
Another form of ad-hoc polymorphism is what we call method overloading in languages like Java, C# and C++. Again, there is no such thing in SML. You may define two different functions with different names, but no the same function (same name) receiving different parameters or parameter types.
This concept of function or method overloading must not be confused with what you use in your examples, which is simply pattern matching for functions. That is syntantic sugar for something like this:
fun len xs =
if null xs then 0
else 1 + len(tl xs)
Parametric Polymorphism:
Finally, SML offers parametric polymorphism, very similar to what generics do in Java and C# and I understand that somewhat similar to templates in C++.
So, for instance, you could have a type like
datatype 'a list = Empty | Cons of 'a * 'a list
In a type like this 'a represents any type. Therefore this is a polymorphic type. As such, I could use the same type to define a list of integers, or a list of strings:
val listOfString = Cons("Obi-wan", Empty);
Or a list of integers
val numbers = Cons(1, Empty);
Or a list of mammals:
val pets = Cons(Cat("Milo", Cons(Dog("Bentley"), Empty)));
This is the same thing you could do with SML lists, which also have parametric polymorphism:
You could define lists of many "different types":
val listOfString = "Yoda"::"Anakin"::"Luke"::[]
val listOfIntegers 1::2::3::4::[]
val listOfMammals = Cat("Misingo")::Dog("Fido")::Cat("Dexter")::Dog("Tank")::[]
In the same sense, we could have parametric polymorphism in functions, like in the following example where we have an identity function:
fun id x = x
The type of x is 'a, which basically means you can substitute it for any type you want, like
id("hello");
id(35);
id(Dog("Diesel"));
id(Cat("Milo"));
So, as you can see, combining all these different forms of polymorphism you should be able to achieve the same things you do in other statically typed languages.
No, it's not legal. In SML, every function has a type. The type of the len function you gave as an example is
fn : 'a list -> int
That is, it takes a list of any type and returns an integer. The function you're trying to make takes and integer or a string, and returns an integer, and that's not legal in the SML type system. The usual workaround is to make a wrapper type:
datatype wrapper = I of int | S of string
fun func (I a) = 1
| func (S a) = 2
That function has type
fn : wrapper -> int
Where wrapper can contain either an integer or a string.

Resources