When should i use `==` vs `===` vs `isequal` - julia

I see that julia has 3 different ways
to do equality.
==, ===, and isequal
Which should I use, and when?

=== is the built-in equality.
On primitives, it is value equality: if they have the same bit level representation then they are equal.
On mutable structs, it is referential equality: things are equal if they are the same memory locations.
On immutable structs, it is structural equality: two structures are equal if they are of the same type and all their fields are equal.
In all 3 cases this is all more or less just bit-level equality, references to memory are pointers.
(But of fancyness to make the immutable struct version recursive)
Ideally, one wouldn't use this much, because it is not customizable.
Sometimes though it is good to use because the optimizer can reason about it really well, so it can lead to better performance for hot loops.
== is general purpose equality
It is overloadable.
For Floats it follows the IEEE rules, i.e. -0.0 == 0.0, NaN != NaN.
And it follows 3 values logic rules for missing == missing resulting in missing.
if == is not defined then it falls back to ===
If you have defined ==, you need to define hash as isequal falls back to using ==, see below.
isequal is equality for purpose of dictionaries.
I don't know a much better way to put it.
Things that are isequal are considered the same for purposes of Dict and Set.
So you can't have two items that are isequal as distinct keys in a Dict.
Use this if you want to be sure that NaNs are equal to each other,
and similarly that missings are equal to each other.
When defining isequal you also must define hash.
isequal(a, b) implies hash(a) == hash(b)
If isequal is not defined, then it falls back to ==

Basically:
== when you are interested in the values of two objects: 1 == 1->true and 1 == 1.0->true
=== when you want to make sure that two objects cannot be distinguished (including arrays pointing to different memory): 1 === 1->true but 1 === 1.0->false and A = [1, 2, 3]; B = [1, 2, 3] leads to A == B->true but A === B->false (A === A->true).
isequal() same as == but handles floating point numbers differently: NaN == NaN->false but isequal(NaN, NaN)->true.
A deeper discussion here.

Related

For Sets (immutable) and Strings (mutable), the behavior of "===" doesn't seem to comply to the documentation in Julia (v1.5 at least)

I think I am missing a point, as in my tests it seems the behavior of "===" doesn't comply to the documentation.
The documentation states "First the types of x and y are compared. If those are identical, mutable objects are compared by address in memory and immutable objects (such as numbers) are compared by contents at the bit level".
I understand from this definition is that :
for mutable objects, two distincts objects (ie. different memory address) should not be "==="
for immutable objects, when the contents are identical, they should be "==="
However :
The Sets are immutable, but two identical objects by the content are not "==="
set1 = Set(["S"])
set2 = Set(["S"])
ismutable(set1)
Returns false
set1 === set2
Returns false, but according to the documentation should return true, as set1 and set2 are two immutable objects with identical contents. (or ismutable(set1) should return true?)
The Strings are mutable, but two distinct objects are "==="
string1 = String("test")
string2 = String("test")
ismutable(string1)
Returns true
string1 === string2
Returns true, but according to the documentation should return false as string1 and string2 are two distinct mutable objects, and hence their address in memory should be different. (or ismutable(string1) should return false?)
What is the point I am missing ?
These cases are tricky and I agree that not intuitive. Set is a bit easier to explain.
Set is defined as:
struct Set{T} <: AbstractSet{T}
dict::Dict{T,Nothing}
Set{T}() where {T} = new(Dict{T,Nothing}())
Set{T}(s::Set{T}) where {T} = new(Dict{T,Nothing}(s.dict))
end
so you can see that although it is an immutable struct its field is a mutable object dict. So the comparison of dict field uses memory address and all is consistent.
But I agree that it is confusing that Set is immutable, but Dict is mutable; however, all is consistent here.
With String the situation is more complex. They are defined to be immutable, see here:
As in Java, strings are immutable: the value of an AbstractString object cannot be changed. To construct a different string value, you construct a new string from parts of other strings.
but they are implemented in non standard way in Julia Base for performance.
Therefore most of the functions follow the rule of respecting immutability (in particular ===). I think ismutable for string should either change its docstring or return false (my personal opinion is that it should return false).

Why do structs and a mutable structs have different default equality operators?

I have the following code:
julia> struct Point
x
y
end
julia> Point(1,2) == Point(1,2)
true
julia> mutable struct Points
x
y
end
julia> Points(1,2) == Points(1,2)
false
Why are the two objects equal when it is a normal struct but not equal when it is a mutable struct?
The reason is that by default == falls back to ===. Now the way === works is (citing the documentation):
First the types of x and y are compared. If those are identical,
mutable objects are compared by address in memory and immutable objects (such as numbers)
are compared by contents at the bit level.
So for Point, which is immutable, the comparison of contents is performed (and it is identical in your case). While Points is mutable, so memory addresses of passed objects are compared and they are different as you have created two distinct objects.
Bogumił Kamiński is correct, but you might as why that difference in the definition of === exists between mutable and immutable types. The reason is that your immutable structs Point are actually indistinguishable. Since they can't change, their values will always be the same, and so they might as well be two names for the same object. Therefore, In the language they are defined by only their value.
In contrast, for mutabke structs there are at least two ways you can distinguish them. First, since mutable structs can't usually be stack allocated, they have a memory location, and you can compare the memory location of the two mutable structs and see they are different. Second, you can simply mutate one of them, and see that only one object changes whereas the other doesn't.
So, the reason for the difference in the definition of === is that two identitcal mutable structs can be distinguished, but two immutable ones cannot.

What's the difference between "equal (=)" and "identical (==)" in ocaml?

In OCaml, we have two kinds of equity comparisons:
x = y and x == y,
So what's exact the difference between them?
Is that x = y in ocaml just like x.equals(y) in Java?
and x == y just like x == y (comparing the address) in Java?
I don't know exactly how x.equals(y) works in Java. If it does a "deep" comparison, then the analogy is pretty close. One thing to be careful of is that physical equality is a slippery concept in OCaml (and functional languages in general). The compiler and runtime system are going to move values around, and may merge and unmerge pure (non-mutable) values at will. So you should only use == if you really know what you're doing. At some level, it requires familiarity with the implementation (which is something to avoid unless necessary).
The specific guarantees that OCaml makes for == are weak. Mutable values compare as physically equal in the way you would expect (i.e., if mutating one of the two will actually mutate the other also). But for non-mutable values, the only guarantee is that values that compare physically equal (==) will also compare as equal (=). Note that the converse is not true, as sepp2k points out for floating values.
In essence, what the language spec is telling you for non-mutable values is that you can use == as a quick check to decide if two non-mutable values are equal (=). If they compare physically equal, they are equal value-wise. If they don't compare physically equal, you don't know if they're equal value-wise. You still have to use = to decide.
Edit: this answer delves into details of the inner working of OCaml, based on the Obj module. That knowledge isn't meant to be used without extra care (let me emphasis on that very important point once more: don't use it for your program, but only if you wish to experiment with the OCaml runtime). That information is also available, albeit perhaps in a more understandable form in the O'Reilly book on OCaml, available online (pretty good book, though a bit dated now).
The = operator is checking structural equality, whereas == only checks physical equality.
Equality checking is based on the way values are allocated and stored within memory. A runtime value in OCaml may roughly fit into 2 different categories : either boxed or unboxed. The former means that the value is reachable in memory through an indirection, and the later means that the value is directly accessible.
Since int (int31 on 32 bit systems, or int63 on 64 bit systems) are unboxed values, both operators are behaving the same with them. A few other types or values, whose runtime implementations are actually int, will also see both operators behaving the same with them, like unit (), the empty list [], constants in algebraic datatypes and polymorphic variants, etc.
Once you start playing with more complex values involving structures, like lists, arrays, tuples, records (the C struct equivalent), the difference between these two operators emerges: values within structures will be boxed, unless they can be runtime represented as native ints (1). This necessity arises from how the runtime system must handle values, and manage memory efficiently. Structured values are allocated when constructed from other values, which may be themselves structured values, in which case references are used (since they are boxed).
Because of allocations, it is very unlikely that two values instantiated at different points of a program could be physically equal, although they'd be structurally equal. Each of the fields, or inner elements within the values could be identical, even up to physical identity, but if these two values are built dynamically, then they would end up using different spaces in memory, and thus be physically different, but structurally equal.
The runtime tries to avoid unecessary allocations though: for instance, if you have a function returning always the same value (in other words, if the function is constant), either simple or structured, that function will always return the same physical value (ie, the same data in memory), so that testing for physical equality the result of two invocations of that function will be successful.
One way to observe when the physical operator will actually return true is to use the Obj.is_block function on its runtime representation (That is to say, the result of Obj.repr on it). This function simply tells whether its parameter runtime representation is boxed.
A more contrived way is to use the following function:
let phy x : int = Obj.magic (Obj.repr x);;
This function will return an int which is the actual value of the pointer to the value bound to x in memory, if this value is boxed. If you try it on a int literal, you will get the exact same value! That's because int are unboxed (ie. the value is stored directly in memory, not through a reference).
Now that we know that boxed values are actually "referenced" values, we can deduce that these values can be modified, even though the language says that they are immutable.
consider for instance the reference type:
# type 'a ref = {mutable contents : 'a };;
We could define an immutable ref like this:
# type 'a imm = {i : 'a };;
type 'a imm = {i : 'a; }
And then use the Obj.magic function to coerce one type into the other, because structurally, these types will be reduced to the same runtime representation.
For instance:
# let x = { i = 1 };;
- : val x : int imm = { i = 1 }
# let y : int ref = Obj.magic x;;
- : val y : int ref = { contents = 1 }
# y := 2;;
- : unit = ()
# x
- : int imm = { i = 2 }
There are a few exceptions to this:
if values are objects, then even seemingly structurally identical values will return false on structural comparison
# let o1 = object end;;
val o1 : < > = <obj>
# let o2 = object end;;
val o2 : < > = <obj>
# o1 = o2;;
- : bool = false
# o1 = o1;;
- : bool = true
here we see that = reverts to physical equivalence.
If values are functions, you cannot compare them structurally, but physical comparison works as intended.
lazy values may or may not be structurally comparable, depending on whether they have been forced or not (respectively).
# let l1 = lazy (40 + 2);;
val l1 : lazy_t = <lazy>
# let l2 = lazy (40 + 2);;
val l2 : lazy_t = <lazy>
# l1 = l2;;
Exception: Invalid_argument "equal: functional value".
# Lazy.force l1;;
- : int = 42
# Lazy.force l2;;
- : int = 42
# l1 = l2;;
- : bool = true
module or record values are also comparable if they don't contain any functional value.
In general, I guess that it is safe to say that values which are related to functions, or may hold functions inside are not comparable with =, but may be compared with ==.
You should obviously be very cautious with all this: relying on the implementation details of the runtime is incorrect (Note: I jokingly used the word evil in my initial version of that answer, but changed it by fear of it being taken too seriously). As you aptly pointed out in comments, the behaviour of the javascript implementation is different for floats (structurally equivalent in javascript, but not in the reference implementation, and what about the java one?).
(1) If I recall correctly, floats are also unboxed when stored in arrays to avoid a double indirection, but they become boxed once extracted, so you shouldn't see a difference in behaviour with boxed values.
Is that x = y in ocaml just like x.equals(y) in Java?
and x == y just like x == y (comparing the address) in Java?
Yes, that's it. Except that in OCaml you can use = on every kind of value, whereas in Java you can't use equals on primitive types. Another difference is that floating point numbers in OCaml are reference types, so you shouldn't compare them using == (not that it's generally a good idea to compare floating point numbers directly for equality anyway).
So in summary, you basically should always be using = to compare any kind of values.
according to http://rigaux.org/language-study/syntax-across-languages-per-language/OCaml.html, == checks for shallow equality, and = checks for deep equality

Pointer equality in Haskell?

Is there any notion of pointer quality in Haskell? == requires things to be deriving Eq, and I have something which contains a (Value -> IO Value), and neither -> nor IO derive Eq.
EDIT: I'm creating an interpreter for another language which does have pointer equality, so I'm trying to model this behavior while still being able to use Haskell functions to model closures.
EDIT: Example: I want a function special that would do this:
> let x a = a * 2
> let y = x
> special x y
True
> let z a = a * 2
> special x z
False
EDIT: Given your example, you could model this with the IO monad. Just assign your functions to IORefs and compare them.
Prelude Data.IORef> z <- newIORef (\x -> x)
Prelude Data.IORef> y <- newIORef (\x -> x)
Prelude Data.IORef> z == z
True
Prelude Data.IORef> z == y
False
Pointer equality would break referential transparency, so NO.
Perhaps surprisingly, it is actually possible to compute extensional equality of total functions on compact spaces, but in general (e.g. functions on the integers with possible non-termination) this is impossible.
EDIT: I'm creating an interpreter for another language
Can you just keep the original program AST or source location alongside the Haskell functions you've translated them into? It seems that you want "equality" based on that.
== requires things to be deriving Eq
Actually (==) requires an instance of Eq, not necessarily a derived instance.
What you probably need to do is to provide your own instance of Eq which simply ignores the (Value -> IO Value) part. E.g.,
data D = D Int Bool (Value -> IO Value)
instance Eq D where
D x y _ == D x' y' _ = x==x && y==y'
Does that help, maybe?
Another way to do this is to exploit StableNames.
However, special would have to return its results inside of the IO monad unless you want to abuse unsafePerformIO.
The IORef solution requires IO throughout the construction of your structure. Checking StableNames only uses it when you want to check referential equality.
IORefs derive Eq. I don't understand what else you need to get the pointer equality of.
Edit: The only things that it makes sense to compare the pointer equality of are mutable structures. But as I mentioned above, mutable structures, like IORefs, already instance Eq, which allows you to see if two IORefs are the same structure, which is exactly pointer equality.
I'm creating an interpreter for
another language which does have
pointer equality, so I'm trying to
model this behavior while still being
able to use Haskell functions to model
closures.
I am pretty sure that in order to write such an interpreter, your code should be monadic. Don't you have to maintain some sort of environment or state? If so, you could also maintain a counter for function closures. Thus, whenever you create a new closure you equip it with a unique id. Then for pointer equivalence you simply compare these identifiers.

What is the difference between equality and equivalence?

I've read a few instances in reading mathematics and computer science that use the equivalence symbol ≡, (basically an '=' with three lines) and it always makes sense to me to read this as if it were equality. What is the difference between these two concepts?
Wikipedia: Equivalence relation:
In mathematics, an equivalence
relation is a binary relation between
two elements of a set which groups
them together as being "equivalent" in
some way. Let a, b, and c be arbitrary
elements of some set X. Then "a ~ b"
or "a ≡ b" denotes that a is
equivalent to b.
An equivalence relation "~" is reflexive, symmetric, and transitive.
In other words, = is just an instance of equivalence relation.
Edit: This seemingly simple criteria of being reflexive, symmetric, and transitive are not always trivial. See Bloch's Effective Java 2nd ed p. 35 for example,
public final class CaseInsensitiveString {
...
// broken
#Override public boolean equals(Object o) {
if (o instance of CaseInsensitiveString)
return s.equalsIgnoreCase(
((CaseInsensitiveString) o).s);
if (o instanceof String) // One-way interoperability!
return s.equalsIgnoreCase((String) o);
return false;
}
}
The above equals implementation breaks the symmetry because CaseInsensitiveString knows about String class, but the String class doesn't know about CaseInsensitiveString.
I take your question to be about math notation rather than programming. The triple equal sign you refer to can be written ≡ in HTML or \equiv in LaTeX.
a ≡ b most commonly means "a is defined to be b" or "let a be equal to b".
So 2+2=4 but φ ≡ (1+sqrt(5))/2.
Here's a handy equivalence table:
Mathematicians Computer scientists
-------------- -------------------
= ==
≡ =
(The other answers about equivalence relations are correct too but I don't think those are as common. There's also a ≡ b (mod m) which is pronounced "a is congruent to b, mod m" and in programmer parlance would be expressed as mod(a,m) == mod(b,m). In other words, a and b are equal after mod'ing by m.)
A lot of languages distinguish between equality of the objects and equality of the values of those objects.
Ruby for example has 3 different ways to test equality. The first, equal?, compares two variables to see if they point to the same instance. This is equivalent in a C-style language of doing a check to see if 2 pointers refer to the same address. The second method, ==, tests value equality. So 3 == 3.0 would be true in this case. The third, eql?, compares both value and class type.
Lisp also has different concepts of equality depending on what you're trying to test.
In languages that I have seen that differentiate between equality and equivalence, equality usually means the type and value are the same while equivalence means that just the values are the same. For example:
int i = 3;
double d = 3.0;
i and d would be have an equivalence relationship since they represent the same value but not equality since they have different types. Other languages may have different ideas of equivalence (such as whether two variables represent the same object).
The answers above are right / partially right but they don't explain what the difference is exactly. In theoretical computer science (and probably in other branches of maths) it has to do with quantification over free variables of the logical equation (that is when we use the two notations at once).
For me the best ways to understand the difference is:
By definition
A ≡ B
means
For all possible values of free variables in A and B, A = B
or
A ≡ B <=> [A = B]
By example
x=2x
iff (in fact iff is the same as ≡)
x=0
x ≡ 2x
iff (because it is not the case that x = 2x for all possible values of x)
False
I hope it helps
Edit:
Another thing that came to my head is the definitions of the two.
A = B is defined as A <= B and A >= B, where <= (smaller equal, not implies) can be any ordering relation
A ≡ B is defined as A <=> B (iff, if and only if, implies both sides), worth noting that implication is also an ordering relation and so it is possible (but less precise and often confusing) to use = instead of ≡.
I guess the conclusion is that when you see =, then you have to figure out the authors intention based on the context.
Take it outside the realm of programming.
(31) equal -- (having the same quantity, value, or measure as another; "on equal terms"; "all men are equal before the law")
equivalent, tantamount -- (being essentially equal to something; "it was as good as gold"; "a wish that was equivalent to a command"; "his statement was tantamount to an admission of guilt"
At least in my dictionary, 'equivelance' means its a good-enough subsitute for the original, but not necessarily identical, and likewise 'equality' conveys complete identical.
null == 0 # true , null is equivelant to 0 ( in php )
null === 0 # false, null is not equal to 0 ( in php )
( Some people use ≈ to represent nonidentical values instead )
The difference resides above all in the level at which the two concepts are introduced. '≡' is a symbol of formal logic where, given two propositions a and b, a ≡ b means (a => b AND b => a).
'=' is instead the typical example of an equivalence relation on a set, and presumes at least a theory of sets. When one defines a particular set, usually he provides it with a suitable notion of equality, which comes in the form of an equivalence relation and uses the symbol '='. For example, when you define the set Q of the rational numbers, you define equality a/b = c/d (where a/b and c/d are rational) if and only if ad = bc (where ad and bc are integers, the notion of equality for integers having already been defined elsewhere).
Sometimes you will find the informal notation f(x) ≡ g(x), where f and g are functions: It means that f and g have the same domain and that f(x) = g(x) for each x in such domain (this is again an equivalence relation). Finally, sometimes you find ≡ (or ~) as a generic symbol to denote an equivalence relation.
You could have two statements that have the same truth value (equivalent) or two statements that are the same (equality). As well the "equal sign with three bars" can also mean "is defined as."
Equality really is a special kind of equivalence relation, in fact. Consider what it means to say:
0.9999999999999999... = 1
That suggests that equality is just an equivalence relation on "string numbers" (which are defined more formally as functions from Z -> {0,...,9}). And we can see from this case, the equivalence classes are not even singletons.
The first problem is, what equality and equivalence mean in this case? Essentially, contexts are quite free to define these terms.
The general tenor I got from various definitions is: For values called equal, it should make no difference which one you read from.
The grossest example that violates this expectation is C++: x and y are said to be equal if x == y evaluates to true, and x and y are said to be equivalent if !(x < y) && !(y < x). Even apart from user-defined overloads of these operators, for floating-point numbers (float, double) those are not the same: All NaN values are equivalent to each other (in fact, equivalent to everything), but not equal to anything including themselves, and the values -0.0 and +0.0 compare equal (and equivalent) although you can distinguish them if you’re clever.
In a lot of cases, you’d need better terms to convey your intent precisely. Given two variables x and y,
identity or “the same” for expressing that there is only one object and x and y refer to it. Any change done through x is inadvertantly observable through y and vice versa. In Java, reference type variables are checked for identity using ==, in C# using the ReferenceEquals method. In C++, if x and y are references, std::addressof(x) == std::addressof(y) will do (whereas &x == &y will work most of the time, but & can be customized for user-defined types).
bitwise or structure equality for expressing that the internal representations of x and y are the same. Notice that bitwise equality breaks down when objects can reference (parts of) themselves internally. To get the intended meaning, the notion has to be refined in such cases to say: Structured the same. In D, bitwise equality is checked via is and C offers memcmp. I know of no language that has built-in structure equality testing.
indistinguishability or substitutability for expressing that values cannot be distinguished (through their public interface): If a function f takes two parameters and x and y are indistinguishable, the calls f(x, y), f(x, x), and f(y, y) always return indistinguishable values – unless f checks for identity (see bullet point above) directly or maybe by mutating the parameters. An example could be two search-trees that happen to contain indistinguishable elements, but the internal trees are layed-out differently. The internal tree layout is an implementation detail that normally cannot be observed through its public methods.
This is also called Leibniz-equality after Gottfried Wilhelm Leibniz who defined equality as the lack of differences.
equivalence for expressing that objects represent values considered essentially the same from some abstract reasoning. For an example for distinguishable equivalent values, observe that floating-point numbers have a negative zero -0.0 distinct from +0.0, and e.g. sign(1/x) is different for -0.0 and +0.0. Equivalence for floating-point numbers is checked using == in many languages with C-like syntax (aka. Algol syntax). Most object-oriented languages check equivalence of objects using an equals (or similarly named) method. C# has the IEquatable<T> interface to designate that the class has a standard/canonical/default equivalence relation defined on it. In Java, one overrides the equals method every class inherits from Object.
As you can see, the notions become increasingly vague. Checking for identity is something most languages can express. Identity and bitwise equality usually cannot be hooked by the programmer as the notions are independent from interpretations. There was a C++20 proposal, which ended up being rejected, that would have introduced the last two notions as strong† and weak equality†. († This site looks like CppReference, but is not; it is not up-to-date.) The original paper is here.
There are languages without mutation, primarily functional languages like Haskell. The difference between equality and equivalence there is less of an issue and tilts to the mathematical use of those words. (In math, generally speaking, (recursively defined) sequences are used instead of re-assignments.)
Everything C has, is also available to C++ and any language that can use C functionality. Everything said about C# is true for Visual Basic .NET and probably all languages built on the .NET framework. Analogously, Java represents the JRE languages that also include Kotlin and Scala.
If you just want stupid definitions without wisdom: An equivalence relation is a reflexive, symmetrical, and transitive binary relation on a set. Equality then is the intersection of all those equivalence relations.

Resources