What is the difference between "==" and "===" comparison operators in Julia?

What is the difference between == and === comparison operators in Julia?

@ChrisRackauckas's answer is accurate as far as it goes, i.e. for mutable objects. There's a bit more to the issue than that, however, so I'll elaborate a bit here.
The === operator (in older Julia versions an alias for the is function, which has since been removed) implements Henry Baker's EGAL predicate [1, 2]: x === y is true when two objects are programmatically indistinguishable, i.e. you cannot write code that demonstrates any difference between x and y. This boils down to the following rules:
For mutable values (arrays, mutable composite types), === checks object identity: x === y is true if x and y are the same object, stored at the same location in memory.
For immutable composite types, x === y is true if x and y have the same type – and thus the same structure – and their corresponding components are all recursively ===.
For bits types (immutable chunks of data like Int or Float64), x === y is true if x and y contain exactly the same bits.
These rules, applied recursively, define the behavior of ===.
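The three rules above can be checked directly. The following is a minimal sketch; the types MutablePoint and ImmutablePoint are hypothetical names introduced only for this illustration.

```julia
# Hypothetical types, defined only to illustrate the rules.
mutable struct MutablePoint
    x::Int
end

struct ImmutablePoint
    x::Int
end

# Rule 1: mutable values are compared by object identity.
m1 = MutablePoint(1)
m2 = MutablePoint(1)
println(m1 === m1)   # true: the same object
println(m1 === m2)   # false: two distinct objects with equal fields

# Rule 2: immutable composites are compared by type and components.
println(ImmutablePoint(1) === ImmutablePoint(1))   # true

# Rule 3: bits types are compared by their exact bits (and type).
println(1 === 1)        # true
println(1 === 1.0)      # false: Int64 and Float64 are different types
println(0.0 === -0.0)   # false: the sign bit differs
```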
The == function, on the other hand, is user-definable, and implements "abstract value equality". Overloadability is one key difference:
The === is not overloadable – it is a builtin function with fixed, pre-defined behavior. You cannot extend or change its behavior.
The == is overloadable – it is a normal (for Julia) generic function with infix syntax. It has fallback definitions that give it useful default behavior on user-defined types, but you can change that as you see fit by adding new, more specific methods to == for your types.
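As a sketch of what extending == looks like in practice, the snippet below adds a method for a hypothetical mutable type Box (a name made up for this example). Before the method is added, the generic fallback compares by ===, so two distinct boxes are unequal.

```julia
# Hypothetical mutable type, used only for illustration.
mutable struct Box
    value::Int
end

a = Box(1)
b = Box(1)

# With no specific method, == falls back to ===, and a and b are distinct objects.
before = (a == b)
println(before)   # false

# Add a more specific method to the generic function ==.
Base.:(==)(x::Box, y::Box) = x.value == y.value

after = (a == b)
println(after)    # true

# === is unaffected: it cannot be extended.
println(a === b)  # false
```

If such a type is meant to be used as a dictionary key, it is usually advisable to define a matching Base.hash method as well, so that objects that compare equal also hash equally.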
To provide more detail about how == behaves for built-in types and how it should behave for user-defined types when people extend it, from the docs:
For example, all numeric types are compared by numeric value, ignoring
type. Strings are compared as sequences of characters, ignoring
encoding.
You can think of this as "intuitive equality". If two numbers are numerically equal, they are ==:
julia> 1 == 1.0 == 1 + 0im == 1.0 + 0.0im == 1//1
true
julia> 0.5 == 1/2 == 1//2
true
Note, however that == implements exact numerical equality:
julia> 2/3 == 2//3
false
These values are unequal because 2/3 is the floating-point value 0.6666666666666666, which is the closest Float64 to the mathematical value 2/3 (or in Julia notation for a rational values, 2//3), but 0.6666666666666666 is not exactly equal to 2/3. Moreover, ==
Follows IEEE 754 semantics for floating-point numbers.
This includes some possibly unexpected properties:
There are distinct positive and negative floating-point zeros (0.0 and -0.0): they are ==, even though they behave differently and are thus not ===.
There are many different not-a-number (NaN) values: they are not == to themselves, each other, or any other value; they are each === to themselves, but not necessarily === to each other, since they can have different bits.
Examples:
julia> 0.0 === -0.0
false
julia> 0.0 == -0.0
true
julia> 1/0.0
Inf
julia> 1/-0.0
-Inf
julia> NaN === NaN
true
julia> NaN === -NaN
false
julia> -NaN === -NaN
true
julia> NaN == NaN
false
julia> NaN == -NaN
false
julia> NaN == 1.0
false
This is kind of confusing, but that's the IEEE standard.
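To see why NaN and -NaN are not ===, it can help to look at their raw bits. A small sketch using bitstring and reinterpret from Base:

```julia
pos = NaN
neg = -NaN

# bitstring shows the 64-bit pattern of a Float64; only the leading sign bit differs.
println(bitstring(pos))
println(bitstring(neg))

# reinterpret exposes the same bits as an unsigned integer.
same_bits = reinterpret(UInt64, pos) == reinterpret(UInt64, neg)
println(same_bits)      # false
println(pos === neg)    # false, because the bits differ
println(pos === pos)    # true, identical bits
println(pos == pos)     # false, per IEEE 754
println(isnan(pos) && isnan(neg))   # true: both are NaNs
```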
Further, the docs for == also state:
Collections should generally implement == by calling == recursively on all contents.
Thus, the notion of value equality as given by == is extended recursively to collections:
julia> [1, 2, 3] == [1, 2, 3]
true
julia> [1, 2, 3] == [1.0, 2.0, 3.0]
true
julia> [1, 2, 3] == Any[1//1, 2.0, 3 + 0im]
true
Accordingly, this inherits the foibles of scalar == comparisons:
julia> a = [1, NaN, 3]
3-element Array{Float64,1}:
1.0
NaN
3.0
julia> a == a
false
The === comparison, on the other hand, always tests object identity for mutable values, so even if two arrays have the same type and contain identical values, they are only === if they are the same array:
julia> b = copy(a)
3-element Array{Float64,1}:
1.0
NaN
3.0
julia> a === a
true
julia> a === b
false
julia> b === b
true
The reason that a and b are not === is that even though they currently happen to contain the same data here, since they are mutable and not the same object, you could mutate one of them and then it would become apparent that they are different:
julia> a[1] = -1
-1
julia> a # different than before
3-element Array{Float64,1}:
-1.0
NaN
3.0
julia> b # still the same as before
3-element Array{Float64,1}:
1.0
NaN
3.0
Thus you can tell that a and b are different objects through mutation. The same logic doesn't apply to immutable objects: if they contain the same data, they are indistinguishable as long as they have the same type. Thus, immutable values are freed from being tied to a specific location, which is one of the reasons that compilers are able to optimize uses of immutable values so effectively.
See Also:
Get rid of Julia's `WARNING: redifining constant` for strings that are not changed?

=== means that it's actually the same object, i.e. the variables point to the same spot in memory. == means that the objects have the same values. For example:
julia> A = rand(5,5) #Make an array
5x5 Array{Float64,2}:
0.349193 0.408216 0.703084 0.163128 0.815687
0.211441 0.0185634 0.378299 0.0734293 0.187445
0.667637 0.139323 0.286794 0.359962 0.229784
0.476224 0.49812 0.648244 0.831006 0.1787
0.960756 0.488886 0.195973 0.148958 0.200619
julia> B = A # This sets the pointer of B to the pointer of A
5x5 Array{Float64,2}:
0.349193 0.408216 0.703084 0.163128 0.815687
0.211441 0.0185634 0.378299 0.0734293 0.187445
0.667637 0.139323 0.286794 0.359962 0.229784
0.476224 0.49812 0.648244 0.831006 0.1787
0.960756 0.488886 0.195973 0.148958 0.200619
julia> B === A # Same spot in memory
true
julia> B[1,1]=2 #Change a value of B
2
julia> B
5x5 Array{Float64,2}:
2.0 0.408216 0.703084 0.163128 0.815687
0.211441 0.0185634 0.378299 0.0734293 0.187445
0.667637 0.139323 0.286794 0.359962 0.229784
0.476224 0.49812 0.648244 0.831006 0.1787
0.960756 0.488886 0.195973 0.148958 0.200619
julia> A #Also changes A since they point to the same spot
5x5 Array{Float64,2}:
2.0 0.408216 0.703084 0.163128 0.815687
0.211441 0.0185634 0.378299 0.0734293 0.187445
0.667637 0.139323 0.286794 0.359962 0.229784
0.476224 0.49812 0.648244 0.831006 0.1787
0.960756 0.488886 0.195973 0.148958 0.200619
julia> B = copy(A) #Now make B a copy of A, no longer the same pointer
5x5 Array{Float64,2}:
2.0 0.408216 0.703084 0.163128 0.815687
0.211441 0.0185634 0.378299 0.0734293 0.187445
0.667637 0.139323 0.286794 0.359962 0.229784
0.476224 0.49812 0.648244 0.831006 0.1787
0.960756 0.488886 0.195973 0.148958 0.200619
julia> B === A # Now this is false
false
julia> B == A # This is still true
true
julia> B[1,1] = 1 #Changing B
1
julia> B
5x5 Array{Float64,2}:
1.0 0.408216 0.703084 0.163128 0.815687
0.211441 0.0185634 0.378299 0.0734293 0.187445
0.667637 0.139323 0.286794 0.359962 0.229784
0.476224 0.49812 0.648244 0.831006 0.1787
0.960756 0.488886 0.195973 0.148958 0.200619
julia> A #Now does not change A
5x5 Array{Float64,2}:
2.0 0.408216 0.703084 0.163128 0.815687
0.211441 0.0185634 0.378299 0.0734293 0.187445
0.667637 0.139323 0.286794 0.359962 0.229784
0.476224 0.49812 0.648244 0.831006 0.1787
0.960756 0.488886 0.195973 0.148958 0.200619

Related

When will changing an object also change the copy of the object?

I am confused by the copy() function. As I understood, = is pointer style assignment and deepcopy() is creating a new independent copy. However, I found copy() is not very "stable". Please see the following two examples:
b = [[1,2,3], [4,5,6]];
a = copy(b);
b[1][1] = 10;
a
b
In the example above, a also changed after the assignment of b[1][1]
While in the second example:
b = [[1,2,3], [4,5,6]];
a = copy(b);
b[1] = [10,2,3];
a
b
The assignment of b[1] does not really change a. This is really confusing. Can anyone explain briefly what is happening? Thank you!
copy creates a shallow copy, and hence in your case references to the objects are copied rather than the real data.
This happens because your b is a Vector of Vectors, so it is stored as:
b = [<reference to the first vector>, <reference to the second vector>]
When you create a shallow copy, only those references are copied, not the underlying data. Hence the copied references still point to the same memory addresses.
In your second example you are replacing the actual reference. Since the object a holds a copy of the reference, replacing the entire reference in b is not seen in a.
This behavior will be seen everywhere you have an "objects inside objects" data structure. On the other hand, if you have arrays of primitives (no references), you will get an actual copy, such as:
julia> a = [1 3; 3 4]
2×2 Matrix{Int64}:
1 3
3 4
julia> b = copy(a); b[1,1] = 100
100
julia> a
2×2 Matrix{Int64}:
1 3
3 4
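The explanation above can be verified with ===, which tells whether two names refer to the same object. This sketch reuses the question's b and adds a deepcopy (the name d is introduced here) for contrast:

```julia
b = [[1, 2, 3], [4, 5, 6]]
a = copy(b)        # shallow copy: new outer vector, same inner vectors
d = deepcopy(b)    # deep copy: inner vectors are duplicated too

println(a === b)         # false: the outer vectors are distinct
println(a[1] === b[1])   # true: the inner vector is shared
println(d[1] === b[1])   # false: deepcopy made an independent inner vector

b[1][1] = 10             # mutates the shared inner vector
println(a[1])            # [10, 2, 3]: visible through a
println(d[1])            # [1, 2, 3]: not visible through d

b[1] = [0, 0, 0]         # replaces the reference in slot 1 of b only
println(a[1])            # [10, 2, 3]: a still holds the old reference
```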
This is a more detailed explanation of the differences between the equal sign, copy and deepcopy functions, extracted from the chapter 2 of my "Julia Quick Syntax Reference: A Pocket Guide for Data Science Programming" (Apress 2019) book:
Memory and copy issues
In order to avoid copying large amounts of data, Julia by default copies only the memory address of objects, unless the programmer explicitly requests a so-called "deep" copy or the compiler "judges" an actual copy more efficient.
Use copy() or deepcopy() when you don't want subsequent modifications of the copied object to apply to the original object.
In details:
Equal sign (a=b)
performs a name binding, i.e. binds (assigns) the entity (object) referenced by b to the identifier a (the variable name) as well
it results that:
if b is later rebound to some other object, a still refers to the original object
if the object referenced by b mutates (i.e. it changes internally), so does the one referenced by a, since they are the same object
if b is immutable and small in memory, under some circumstances the compiler may instead create a new object and bind it to a, but since the object is immutable this difference is not noticeable to the user
as in many high-level languages, we don't need to worry explicitly about memory leaks: a garbage collector automatically destroys objects that are no longer accessible.
a = copy(b)
creates a new, "independent" copy of the object and binds it to a. This new object may, however, in turn reference other objects through their memory addresses. In this case it is the memory addresses that are copied, not the referenced objects themselves.
it results that:
if these referenced objects (e.g. the individual elements of a vector) are rebound to some other objects, the new object referenced by a keeps its references to the original objects
if these referenced objects mutate, so do those referenced by the new object referenced by a, since they are the same objects
a = deepcopy(b)
everything is deep copied recursively
The following code snippet highlights the differences between these three methods of "copying" an object:
julia> a = [[[1,2],3],4]
2-element Array{Any,1}:
Any[[1, 2], 3]
4
julia> b = a
2-element Array{Any,1}:
Any[[1, 2], 3]
4
julia> c = copy(a)
2-element Array{Any,1}:
Any[[1, 2], 3]
4
julia> d = deepcopy(a)
2-element Array{Any,1}:
Any[[1, 2], 3]
4
# rebinds a[2] to another object.
# At the same time mutates object a:
julia> a[2] = 40
40
julia> b
2-element Array{Any,1}:
Any[[1, 2], 3]
40
julia> c
2-element Array{Any,1}:
Any[[1, 2], 3]
4
julia> d
2-element Array{Any,1}:
Any[[1, 2], 3]
4
# rebinds a[1][2] and at the same
# time mutates both a and a[1]:
julia> a[1][2] = 30
30
julia> b
2-element Array{Any,1}:
Any[[1, 2], 30]
40
julia> c
2-element Array{Any,1}:
Any[[1, 2], 30]
4
julia> d
2-element Array{Any,1}:
Any[[1, 2], 3]
4
# rebinds a[1][1][2] and at the same
# time mutates a, a[1] and a[1][1]:
julia> a[1][1][2] = 20
20
julia> b
2-element Array{Any,1}:
Any[[1, 20], 30]
40
julia> c
2-element Array{Any,1}:
Any[[1, 20], 30]
4
julia> d
2-element Array{Any,1}:
Any[[1, 2], 3]
4
# rebinds a:
julia> a = 5
5
julia> b
2-element Array{Any,1}:
Any[[1, 20], 30]
40
julia> c
2-element Array{Any,1}:
Any[[1, 20], 30]
4
julia> d
2-element Array{Any,1}:
Any[[1, 2], 3]
4
We can check if two objects have the same values with == and if two objects are actually the same with === (in the sense that immutable objects are checked at the bit level and mutable objects are checked for their memory address):
given a = [1, 2]; b = [1, 2]; a == b and a === a are true, but a === b is false;
given a = (1, 2); b = (1, 2); all a == b, a === a and a === b are true.

I want to find the number that acts as 0 in Julia - I mean the number nearest to 0

Why does this happen in Julia?
My input is
A = []
for i = 17:21
t = 1/(10^(i))
push!(A, t)
end
return(A)
And the output was:
5-element Array{Any,1}:
1.0e-17
1.0e-18
-1.1838881245526248e-19
1.2876178137472069e-19
2.5800991659088344e-19
I observed that
A[3]>0
false
I want to find the number that acts as 0 in Julia, but I found this and don't understand it.
The problem arises when i = 19. Note that:
julia> 10^19
-8446744073709551616
and it is unrelated to floating point numbers, but is caused by Int64 overflow.
Here is the code that will work as you expect. Either use 10.0 instead of 10 as 10.0 is a Float64 value:
julia> A=[]
Any[]
julia> for i=17:21
t=1/(10.0^(i))
push!(A,t)
end
julia> A
5-element Array{Any,1}:
1.0e-17
1.0e-18
1.0e-19
1.0e-20
1.0e-21
or use the arbitrary-precision BigInt type, which is created using big(10):
julia> A=[]
Any[]
julia> for i=17:21
t=1/(big(10)^(i))
push!(A,t)
end
julia> A
5-element Array{Any,1}:
9.999999999999999999999999999999999999999999999999999999999999999999999999999967e-18
9.999999999999999999999999999999999999999999999999999999999999999999999999999997e-19
9.999999999999999999999999999999999999999999999999999999999999999999999999999997e-20
1.000000000000000000000000000000000000000000000000000000000000000000000000000004e-20
9.999999999999999999999999999999999999999999999999999999999999999999999999999927e-22
You can find more discussion of this here https://docs.julialang.org/en/v1/manual/integers-and-floating-point-numbers/#Overflow-behavior.
For example notice that (which you might find surprising not knowing about the overflow):
julia> x = typemin(Int64)
-9223372036854775808
julia> x^2
0
julia> y = typemax(Int64)
9223372036854775807
julia> y^2
1
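Because Int64 arithmetic wraps around silently, it can be useful to detect the overflow explicitly. The sketch below uses Base.Checked.checked_mul, which throws an OverflowError instead of wrapping; the variable names are made up for this example.

```julia
# Plain Int64 arithmetic wraps around without any error.
wrapped = 10^19
println(wrapped < 0)    # true: the result has wrapped to a negative number

# Checked arithmetic raises an error instead of wrapping.
overflowed = try
    Base.Checked.checked_mul(10^18, 10)   # 10^18 fits in Int64, 10^19 does not
    false
catch err
    err isa OverflowError
end
println(overflowed)     # true

# BigInt and Float64 both represent 10^19 without wrapping.
println(big(10)^19 == big(10)^18 * 10)   # true
println(10.0^19 > 0)                     # true
```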
Finally to find smallest positive Float64 number use:
julia> nextfloat(0.0)
5.0e-324
or
julia> eps(0.0)
5.0e-324
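Note that this value is a subnormal number; the smallest positive normal Float64 is much larger. A short sketch comparing the two, using only functions from Base:

```julia
tiny = nextfloat(0.0)              # smallest positive Float64
println(issubnormal(tiny))         # true: it lies in the subnormal range

smallest_normal = floatmin(Float64)
println(issubnormal(smallest_normal))   # false
println(tiny < smallest_normal)         # true

# There is nothing between 0.0 and tiny.
println(prevfloat(tiny) == 0.0)    # true
println(tiny / 2 == 0.0)           # true: the result rounds to zero
```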

Create a Vector of Integers and missing Values

What a hassle...
I'm trying to create a vector of integers and missing values. This works fine:
b = [4, missing, missing, 3]
But I would actually like the vector to be longer with more missing values and therefore use repeat(), but this doesn't work
append!([1,2,3], repeat([missing], 1000))
and this also doesn't work
[1,2,3, repeat([missing], 1000)]
Please, help me out, here.
It is also worth noting that if you do not need an in-place operation with append!, in such cases it is much easier to do vertical concatenation:
julia> [[1, 2, 3]; repeat([missing], 2); 4; 5] # note ; that denotes vcat
7-element Array{Union{Missing, Int64},1}:
1
2
3
missing
missing
4
5
julia> vcat([1,2,3], repeat([missing], 2), 4, 5) # this is the same but using a different syntax
7-element Array{Union{Missing, Int64},1}:
1
2
3
missing
missing
4
5
The benefit of vcat is that it automatically does the type promotion (as opposed to append! in which case you have to correctly specify the eltype of the target container before the operation).
Note that because vcat does automatic type promotion in corner cases you might get a different eltype of the result of the operation:
julia> x = [1, 2, 3]
3-element Array{Int64,1}:
1
2
3
julia> append!(x, [1.0, 2.0]) # conversion from Float64 to Int happens here
5-element Array{Int64,1}:
1
2
3
1
2
julia> [[1, 2, 3]; [1.0, 2.0]] # promotion of Int to Float64 happens in this case
5-element Array{Float64,1}:
1.0
2.0
3.0
1.0
2.0
See also https://docs.julialang.org/en/v1/manual/arrays/#man-array-literals.
This will work:
append!(Union{Int,Missing}[1,2,3], repeat([missing], 1000))
[1,2,3] creates just a Vector{Int} and since Julia is strongly typed the Vector{Int} cannot accept values of non-Int type. Hence, when defining a structure, that you plan to hold more data types within, you need to explicitly state it - here we have defined Vector{Union{Int,Missing}}.
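A minimal sketch of two ways to build such a vector with the element type stated up front. The second variant, which preallocates a vector full of missing and then fills in the known values, is an alternative not mentioned above and is shown only as an option.

```julia
# Variant 1: typed literal, then append the missing values in place.
x = Union{Int, Missing}[1, 2, 3]
append!(x, fill(missing, 1000))
println(length(x))              # 1003
println(count(ismissing, x))    # 1000

# Variant 2: preallocate a vector of missing values, then fill in the integers.
y = Vector{Union{Int, Missing}}(missing, 1003)
y[1:3] = [1, 2, 3]
println(count(ismissing, y))    # 1000

# Both vectors accept integers and missing values.
println(eltype(x) == Union{Int, Missing})   # true
println(isequal(x, y))                      # true
```

isequal is used for the final comparison because x == y would return missing when the vectors contain missing values.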

How to check if two arrays are equal even if they contain NaN values in Julia?

I am trying to compare two arrays. It just so happens that the data for the arrays contains NaN values and when you compare arrays with NaN values, the results are not what I would have expected.
julia> a = [1,2, NaN]
3-element Array{Float64,1}:
1.0
2.0
NaN
julia> b = [1,2, NaN]
3-element Array{Float64,1}:
1.0
2.0
NaN
julia> a == b
false
Is there an elegant way to ignore these NaNs during comparison or replace them efficiently?
Use isequal:
Similar to ==, except for the treatment of floating point numbers and
of missing values. isequal treats all floating-point NaN values as
equal to each other, treats -0.0 as unequal to 0.0, and missing as
equal to missing. Always returns a Bool value.
julia> a = [1,2, NaN]
3-element Array{Float64,1}:
1.0
2.0
NaN
julia> b = [1,2, NaN]
3-element Array{Float64,1}:
1.0
2.0
NaN
julia> isequal(a, b)
true
You probably want to use isequal(a, b) (which also treats missing equal to missing, but -0.0 as unequal to 0.0).
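The differences between == and isequal listed in the docstring can be summarized in a few lines. This is a small illustrative sketch:

```julia
# NaN: unequal under ==, equal under isequal.
println(NaN == NaN)                  # false
println(isequal(NaN, NaN))           # true

# Signed zeros: equal under ==, unequal under isequal.
println(0.0 == -0.0)                 # true
println(isequal(0.0, -0.0))          # false

# missing: == propagates missing, isequal always returns a Bool.
println(ismissing(missing == missing))   # true
println(isequal(missing, missing))       # true

# The same rules apply elementwise to arrays.
println(isequal([1, 2, NaN], [1, 2, NaN]))   # true
println(isequal([0.0], [-0.0]))              # false
```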
You could filter out the NaN's on each array:
a = [1, 2, NaN]
filteredA = filter(x -> !isnan(x), a)
b = [1, 2, NaN]
filteredB = filter(x -> !isnan(x), b)
print(a == b)
print(filteredA == filteredB)
You could then create a function that does the filtering, and a custom compare function that uses the filtering function on both arguments and compare. Not sure if there is a more Julia-esque way.
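One way the filtering idea might be wrapped up is sketched below; drop_nans and equal_ignoring_nans are hypothetical helper names. Be aware that filtering discards the positions of the NaNs, so arrays with NaNs in different places can compare equal, unlike with isequal.

```julia
# Hypothetical helpers for illustration.
drop_nans(v) = filter(x -> !isnan(x), v)
equal_ignoring_nans(a, b) = drop_nans(a) == drop_nans(b)

a = [1, 2, NaN]
b = [1, 2, NaN]
println(a == b)                       # false
println(equal_ignoring_nans(a, b))    # true

# Caveat: the position of the NaN is ignored by this approach.
c = [1, NaN, 2]
println(equal_ignoring_nans(a, c))    # true
println(isequal(a, c))                # false
```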
Or create a new type. And create a Singleton nan which you use instead of NaN.
struct MyNaN end
nan = MyNaN()
and write a function for replacing NaNs by it.
with_nan(l) = map((x) -> if isnan(x) nan else x end, l)
Then you can wrap your lists using this function.
a = [1, 2, NaN]
b = [1, 2, NaN]
with_nan(a) == with_nan(b)
## true

How can I have a cell array in Julia?

Does a cell array exist in Julia? I want an array whose elements are vectors or matrices.
for example A={1,[2 3],[5 6;7 8];"salam", [1 2 3 4],magic(5)}.
if you don't mind please help me.
An Array{Any} is equivalent to a MATLAB cell array. You can put anything in there. ["hi",:bye,10]. a = Array{Any}(undef,5) builds an uninitialized one, you can a[1] = ... to modify values, push!(a,...) to increase its length, etc.
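As a rough sketch, most of the MATLAB example from the question could be written as a Vector{Any} like this. MATLAB's magic(5) has no counterpart in Julia's Base, so it is left out here.

```julia
# A heterogeneous container: a number, matrices, and a string.
A = Any[1, [2 3], [5 6; 7 8], "salam", [1 2 3 4]]
println(eltype(A))      # Any
println(size(A[3]))     # (2, 2)

# Elements can be replaced by values of any type, and the array can grow.
A[1] = 3.14
push!(A, :bye)
println(length(A))      # 6

# An uninitialized Any array: slots are undefined until assigned.
a = Array{Any}(undef, 5)
println(isassigned(a, 1))   # false
a[1] = "hi"
println(isassigned(a, 1))   # true
```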
A cell array is a data type with indexed data containers called cells, where each cell can contain any type of data
In Julia, arrays can contain values of homogeneous ([1, 2, 3]) or heterogeneous types ([1, 2.5, "3"]). Julia will try to promote the values to a common concrete type by default. If Julia cannot promote the contained types, the resulting array will have the abstract element type Any.
Example ported from Access Data in Cell Array, using Julia 1.0.3:
julia> C = ["one" "two" "three"; # Matrix literal
1 2 3 ]
2×3 Array{Any,2}:
"one" "two" "three"
1 2 3
julia> upperLeft = C[1:2,1:2] # slicing
2×2 Array{Any,2}:
"one" "two"
1 2
julia> C[1,1:3] = ["first","second","third"] # slice assignment
3-element Array{String,1}:
"first"
"second"
"third"
julia> C
2×3 Array{Any,2}:
"first" "second" "third"
1 2 3
julia> numericCells = C[2,1:3]
3-element Array{Any,1}:
1
2
3
julia> last = C[2,3] # indexing
3
julia> C[2,3] = 300 # indexing assignment
300
julia> C
2×3 Array{Any,2}:
"first" "second" "third"
1 2 300
julia> r1c1, r2c1, r1c2, r2c2 = C[1:2,1:2] # destructuring
2×2 Array{Any,2}:
"first" "second"
1 2
julia> r1c1
"first"
julia> r2c1
1
julia> r1c2
"second"
julia> r2c2
2
julia> nums = C[2,:]
3-element Array{Any,1}:
1
2
300
Example ported from Combining Cell Arrays with Non-Cell Arrays:
Notice the use of the splat operator (...) to incorporate the values of the inner array into the outer one, and the usage of the Any[] syntax to prevent Julia from promoting the UInt8 to an Int.
julia> A = [100, Any[UInt8(200), 300]..., "Julia"]
4-element Array{Any,1}:
100
0xc8
300
"Julia"
The dot broadcast syntax, f.(x), applies the function typeof elementwise.
julia> typeof.(A)
4-element Array{DataType,1}:
Int64
UInt8
Int64
String
So, in summary, Julia doesn't need cell arrays; it uses parametric n-dimensional arrays instead. Also, Julia uses only brackets for both slicing and indexing (A[n], A[i, j], A[a:b, x:y]); parentheses after a variable symbol are reserved for function calls (foo(), foo(args...), foo(bar = "baz")).
