Hash instability in Julia composite types

In Julia, two instances of a composite type with at least one field hash to different values even when their fields hold identical values. As a result, such types do not behave as expected when used as dictionary keys or in anything else that depends on the hash. This is also inconsistent with the behavior of other types, such as Vector{Int}.
More concretely,
vectors of non-composite types that are different objects but have identical values hash to the same value:
julia> hash([1,2,3])==hash([1,2,3])
true
composite types with no fields hash to the same value:
julia> type A end
julia> hash(A())==hash(A())
true
composite types with at least one field hash to different values if they're different objects that have the same value:
julia> type B
           b::Int
       end
julia> hash(B(1))==hash(B(1))
false
however, the same object maintains its hash even if the underlying values change:
julia> b=B(1)
julia> hash(b)
0x0b2c67452351ff52
julia> b.b=2;
julia> hash(b)
0x0b2c67452351ff52
this is inconsistent with the behavior of vectors (if you change an element, the hash changes):
julia> a = [1,2,3];
julia> hash(a)
0xd468fb40d24a17cf
julia> a[1]=2;
julia> hash(a)
0x777c61a790f5843f
this issue is not present for immutable types:
julia> immutable C
           c::Int
       end
julia> hash(C(1))==hash(C(1))
true
Is there something fundamental driving this behavior from a language design perspective? Are there plans to fix or correct this behavior?

I'm not a Julia language designer, but I'll say this sort of behavior is not surprising when comparing mutable and immutable values. Your type B is mutable: it's not obvious that two instances of it should be considered equal, even if they have the same value for field b. If you feel they should be, you are free to implement a hash function for it. But in general, mutable objects have independent identities. Immutable entities, like C, are impossible to tell apart from each other, so it makes sense for them to obey structural hashing.
Two bank accounts that happen to have $5 in them are not identical and probably shouldn't hash to the same number. But two instances of $5 are impossible to distinguish from each other.
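If value-based hashing is what you want for a mutable type, a minimal sketch might look like the following. It assumes Julia 1.x syntax (mutable struct rather than the older type keyword used in the question), and Account is a hypothetical type chosen to mirror the bank-account analogy; it does not appear in the question.

```julia
# Hypothetical mutable type; `Account` is an illustration, not from the question.
mutable struct Account
    balance::Int
end

# Opt in to structural equality and hashing. The two are defined together
# so that objects that compare equal also hash equal.
Base.:(==)(a::Account, b::Account) = a.balance == b.balance
Base.hash(a::Account, h::UInt) = hash(a.balance, hash(:Account, h))

accounts = Dict(Account(5) => "first")

same_hash = hash(Account(5)) == hash(Account(5))   # hashing is now value-based
found = haskey(accounts, Account(5))               # lookup by value works
```

The trade-off is that mutating an object after it has been used as a key changes its hash, so later lookups may fail. That is one practical reason the default for mutable types is identity-based.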

Related

For Sets (immutable) and Strings (mutable), the behavior of "===" doesn't seem to comply with the documentation in Julia (v1.5 at least)

I think I am missing something, because in my tests the behavior of "===" doesn't seem to comply with the documentation.
The documentation states "First the types of x and y are compared. If those are identical, mutable objects are compared by address in memory and immutable objects (such as numbers) are compared by contents at the bit level".
What I understand from this definition is that:
for mutable objects, two distinct objects (i.e. different memory addresses) should not be "==="
for immutable objects, when the contents are identical, they should be "==="
However:
Sets are immutable, but two sets with identical contents are not "===":
set1 = Set(["S"])
set2 = Set(["S"])
ismutable(set1)
Returns false
set1 === set2
Returns false, but according to the documentation should return true, as set1 and set2 are two immutable objects with identical contents. (or ismutable(set1) should return true?)
Strings are mutable, but two distinct objects are "===":
string1 = String("test")
string2 = String("test")
ismutable(string1)
Returns true
string1 === string2
Returns true, but according to the documentation should return false as string1 and string2 are two distinct mutable objects, and hence their address in memory should be different. (or ismutable(string1) should return false?)
What point am I missing?
These cases are tricky, and I agree they are not intuitive. Set is a bit easier to explain.
Set is defined as:
struct Set{T} <: AbstractSet{T}
    dict::Dict{T,Nothing}
    Set{T}() where {T} = new(Dict{T,Nothing}())
    Set{T}(s::Set{T}) where {T} = new(Dict{T,Nothing}(s.dict))
end
So although Set is an immutable struct, its only field, dict, is a mutable object. Comparing two sets with === therefore compares their dict fields by memory address, which is consistent with the documentation.
I agree it is confusing that Set is immutable while Dict is mutable, but the behavior follows from the rules.
With String the situation is more complex. Strings are defined to be immutable; the manual says:
As in Java, strings are immutable: the value of an AbstractString object cannot be changed. To construct a different string value, you construct a new string from parts of other strings.
However, they are implemented in a non-standard way in Julia Base for performance.
As a result, most functions respect this immutability (=== in particular). I think ismutable for strings should either have its docstring changed or return false (my personal opinion is that it should return false).
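The Set case can be reproduced with a small sketch. Wrapper below is a hypothetical immutable struct with a mutable field, analogous to Set wrapping a Dict; it is an illustration and not part of Base.

```julia
# Hypothetical immutable wrapper around a mutable field,
# analogous to Set wrapping a Dict.
struct Wrapper
    items::Vector{Int}
end

shared = [1, 2, 3]

# === on an immutable struct compares fields with ===, and a mutable
# field (the Vector) is compared by identity.
same_field = Wrapper(shared) === Wrapper(shared)         # both wrap the same array
diff_field = Wrapper([1, 2, 3]) === Wrapper([1, 2, 3])   # equal contents, different arrays

s1 = Set(["S"])
s2 = Set(["S"])
sets_identical = s1 === s2   # compares the wrapped Dict objects by identity
sets_equal = s1 == s2        # compares contents
```

Here same_field and sets_equal come out true, while diff_field and sets_identical come out false, which matches the rule quoted from the documentation once the mutable field is taken into account.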

Type Hierarchy for Containers Values confusing [duplicate]

I am trying to understand typing in Julia and have run into the following problem with Array. I wrote a function bloch_vector_2d(Array{Complex,2}); the detailed implementation is irrelevant. When I call it, I get this complaint:
julia> bloch_vector_2d(rhoA)
ERROR: MethodError: no method matching bloch_vector_2d(::Array{Complex{Float64},2})
Closest candidates are:
bloch_vector_2d(::Array{Complex,2}) at REPL[56]:2
bloch_vector_2d(::StateAB) at REPL[54]:1
Stacktrace:
[1] top-level scope at REPL[64]:1
The problem is that an array of a parent type is not automatically a supertype of an array of a child type.
julia> Complex{Float64} <: Complex
true
julia> Array{Complex{Float64},2} <: Array{Complex,2}
false
I think it would make sense for Julia to enforce Array{Complex{Float64},2} <: Array{Complex,2}. Or what is the right way to implement this in Julia? Any help or comments are appreciated!
This issue is discussed in detail in the Julia Manual, in the section on parametric composite types.
Quoting the relevant part of it:
In other words, in the parlance of type theory, Julia's type parameters are invariant, rather than being covariant (or even contravariant). This is for practical reasons: while any instance of Point{Float64} may conceptually be like an instance of Point{Real} as well, the two types have different representations in memory:
An instance of Point{Float64} can be represented compactly and efficiently as an immediate pair of 64-bit values;
An instance of Point{Real} must be able to hold any pair of instances of Real. Since objects that are instances of Real can be of arbitrary size and structure, in practice an instance of Point{Real} must be represented as a pair of pointers to individually allocated Real objects.
Now, going back to your question of how to write the method signature, you have:
julia> Array{Complex{Float64},2} <: Array{<:Complex,2}
true
Note the difference:
Array{<:Complex,2} represents a union of all types that are 2D arrays whose eltype is a subtype of Complex (i.e. no array will have this exact type).
Array{Complex,2} is a concrete type that an array can have; it means the array can store Complex values with mixed type parameters.
Here is an example:
julia> x = Complex[im 1im;
1.0im Float16(1)im]
2×2 Array{Complex,2}:
im 0+1im
0.0+1.0im 0.0+1.0im
julia> typeof.(x)
2×2 Array{DataType,2}:
Complex{Bool} Complex{Int64}
Complex{Float64} Complex{Float16}
Also note that the notation Array{<:Complex,2} is the same as writing Array{T,2} where T<:Complex (or more compactly Matrix{T} where T<:Complex).
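As a rough sketch of how this looks in a method signature, the following defines a hypothetical function describe (the name is illustrative, not from the question) that accepts both kinds of arrays, and checks that the three spellings above denote the same type.

```julia
# Hypothetical function name; only the signature matters here.
describe(a::Array{<:Complex,2}) = "matrix with element type $(eltype(a))"

concrete = describe(zeros(Complex{Float64}, 2, 2))   # concrete element type
mixed = describe(Complex[im 1im; 1.0im 2im])         # abstract element type Complex

# The three spellings mentioned above denote the same type.
same1 = Array{<:Complex,2} == (Array{T,2} where T<:Complex)
same2 = Array{<:Complex,2} == (Matrix{T} where T<:Complex)
```

Both calls dispatch to the same method, because Array{Complex{Float64},2} and Array{Complex,2} are each subtypes of Array{<:Complex,2}.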
This is more of a comment, but I can't resist posting it. This question appears so often. I'll tell you why that phenomenon must arise.
A Bag{Apple} is a Bag{Fruit}, right? Because, when I have a JuicePress{Fruit}, I can give it a Bag{Apple} to make some juice, because Apples are Fruits.
But now we run into a problem: my fruit juice factory, in which I process different fruits, has a failure. I order a new JuicePress{Fruit}. Now, I unfortunately get delivered a replacement JuicePress{Lemon} -- but Lemons are Fruits, so surely a JuicePress{Lemon} is a JuicePress{Fruit}, right?
However, the next day, I feed apples to the new press, and the machine explodes. I hope you see why: JuicePress{Lemon} is not a JuicePress{Fruit}. On the contrary: a JuicePress{Fruit} is a JuicePress{Lemon} -- I can press lemons with a fruit-agnostic press! They could have sent me a JuicePress{Plant}, though, since Fruits are Plants.
Now we can get more abstract. The real reason is: function input arguments are contravariant, while function output arguments are covariant (in an idealized setting) [2]. That is, when we have
f : A -> B
then any function that accepts a supertype of A and returns a subtype of B can be used in its place. Hence, when we fix the first argument, the induced function types satisfy
(Tree -> Apple) <: (Tree -> Fruit)
whenever Apple <: Fruit -- this is the covariant case, it preserves the direction of <:. But when we fix the second one,
(Fruit -> Juice) <: (Apple -> Juice)
whenever Fruit >: Apple -- this inverts the direction of <:, and therefore is called contravariant.
This carries over to other parametric data types, since there, too, you usually have "output-like" parameters (as in the Bag), and "input-like" parameters (as with the JuicePress). There can also be parameters that behave like neither (e.g., when they occur in both fashions) -- these are then called invariant.
There are now two ways in which languages with parametric types solve this problem. The, in my opinion, more elegant one is to mark every parameter: no annotation means invariant, + means covariant, - means contravariant (this has technical reasons -- those parameters are said to occur in "positive" and "negative position"). So we would have Bag[+T <: Fruit], or JuicePress[-T <: Fruit] (should be Scala syntax, but I haven't tried it). This makes subtyping more complicated, though.
The other route is what Julia does (and, BTW, Java): all types are invariant [1], but you can specify upper and lower bounds at the use site. So you have to say
makejuice(::JuicePress{>:T}, ::Bag{<:T}) where {T}
And that's how we arrive at the other answers.
[1] Except for tuples, but that's weird.
[2] This terminology comes from category theory. The Hom-functor is contravariant in the first, and covariant in the second argument. There's an intuitive realization of subtyping through the "forgetful" functor from the category Typ to the poset of Types under the <: relation. And the CT terminology in turn comes from tensors.
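To make the juice press story concrete, here is a small sketch in Julia. The types Fruit, Apple, Lemon, Bag and JuicePress are hypothetical stand-ins defined only for this example, and the method body is a placeholder.

```julia
# Hypothetical types mirroring the analogy above.
abstract type Fruit end
struct Apple <: Fruit end
struct Lemon <: Fruit end

struct Bag{T}
    items::Vector{T}
end

struct JuicePress{T} end

# The press parameter may be any supertype of T (input-like),
# the bag parameter may be any subtype of T (output-like).
makejuice(::JuicePress{>:T}, bag::Bag{<:T}) where {T} =
    "juice from $(length(bag.items)) item(s)"

apples = Bag([Apple(), Apple()])

# A fruit-agnostic press accepts a bag of apples.
ok = makejuice(JuicePress{Fruit}(), apples)

# A lemon press should not: there is no T with Apple <: T <: Lemon.
rejected = !applicable(makejuice, JuicePress{Lemon}(), apples)
```

With this signature, calling makejuice(JuicePress{Lemon}(), apples) is expected to raise a MethodError, so the exploding machine becomes a dispatch failure instead.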
While the "how it works" discussion has been covered in another answer, I think the best way to implement your method is the following:
function bloch_vector_2d(a::AbstractArray{Complex{T}}) where T<:Real
    sum(a) + 5*one(T) # returning something to see how this is working
end
Now this will work like this:
julia> bloch_vector_2d(ones(Complex{Float64},4,3))
17.0 + 0.0im

How does the Julia Type Graph treat arrays?

I'm trying to understand the structure of arrays in the Julia Type Graph. This seems very counter-intuitive to me:
julia> Int64 <: Number
true
julia> Array{Int64,1} <: Array{Number,1}
false
julia> Array{Int64,1} <: Array{Int,1}
true
It seems that a <: b is not sufficient for Array{a,1} <: Array{b,1}. When does Array{a,1} <: Array{b,1}?
A practical corollary: how can I type-declare an abstract array of numbers?
The following page of the manual describes how Julia's type parameters are invariant rather than covariant.
https://docs.julialang.org/en/v1/manual/types/#Parametric-Composite-Types-1
See in particular the warning admonition stating
This last point is very important: even though Float64 <: Real we DO NOT have Point{Float64} <: Point{Real}.
and the explanation that follows it:
In other words, in the parlance of type theory, Julia's type parameters are invariant, rather than being covariant (or even contravariant). This is for practical reasons: while any instance of Point{Float64} may conceptually be like an instance of Point{Real} as well, the two types have different representations in memory:
An instance of Point{Float64} can be represented compactly and efficiently as an immediate pair of 64-bit values;
An instance of Point{Real} must be able to hold any pair of instances of Real. Since objects that are instances of Real can be of arbitrary size and structure, in practice an instance of Point{Real} must be represented as a pair of pointers to individually allocated Real objects.
An abstract array with any kind of number is denoted like this:
AbstractArray{<:Number}, which is short for AbstractArray{T} where T <: Number
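As a short sketch of the practical corollary, the hypothetical function total below (the name is illustrative) accepts any array-like container of numbers. The last two lines contrast the invariant form with the bounded form.

```julia
# Hypothetical helper; accepts any array-like container whose elements are numbers.
total(xs::AbstractArray{<:Number}) = sum(xs)

t1 = total([1, 2, 3])            # Vector{Int}
t2 = total(1:3)                  # UnitRange{Int} is also an AbstractArray
t3 = total([1.5 2.5; 3.0 4.0])   # Matrix{Float64}

# Array{a,1} <: Array{b,1} holds only when a and b are the same type;
# the bounded form matches any element type below Number.
invariant = !(Vector{Int64} <: Vector{Number})
bounded = Vector{Int64} <: Vector{<:Number}
```

An array of strings, such as ["a"], does not match this signature, so calling total on it would fail with a MethodError.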

Why do structs and mutable structs have different default equality operators?

I have the following code:
julia> struct Point
           x
           y
       end
julia> Point(1,2) == Point(1,2)
true
julia> mutable struct Points
           x
           y
       end
julia> Points(1,2) == Points(1,2)
false
Why are the two objects equal when it is a normal struct but not equal when it is a mutable struct?
The reason is that by default == falls back to ===. Now the way === works is (citing the documentation):
First the types of x and y are compared. If those are identical,
mutable objects are compared by address in memory and immutable objects (such as numbers)
are compared by contents at the bit level.
So for Point, which is immutable, the contents are compared (and they are identical in your case). Points, on the other hand, is mutable, so the memory addresses of the two objects are compared, and they differ because you have created two distinct objects.
Bogumił Kamiński is correct, but you might ask why that difference in the definition of === exists between mutable and immutable types. The reason is that your immutable Point structs are actually indistinguishable. Since they can't change, their values will always be the same, and so they might as well be two names for the same object. Therefore, in the language they are defined only by their value.
In contrast, for mutable structs there are at least two ways you can distinguish them. First, since mutable structs can't usually be stack allocated, they have a memory location, and you can compare the memory locations of the two mutable structs and see they are different. Second, you can simply mutate one of them and see that only that object changes whereas the other doesn't.
So, the reason for the difference in the definition of === is that two identical mutable structs can be distinguished, but two immutable ones cannot.
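The second way of telling mutable structs apart can be sketched directly with the Points type from the question. The variable names p, q and r are illustrative.

```julia
mutable struct Points
    x
    y
end

p = Points(1, 2)
q = Points(1, 2)   # same field values, but a separate object
r = p              # a second name for the same object as p

distinct = !(p === q)   # different objects, compared by address
aliased = p === r       # same object

p.x = 10                # mutate through p
seen_by_alias = r.x == 10      # r refers to the mutated object
not_seen_by_copy = q.x == 1    # q is unaffected
```

After the mutation, p and q no longer even hold the same values, which is why treating them as the same object from the start would not have been sound.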
