Can a julia struct be defined with persistent requirements on field dimensions? - julia

If I define a new struct as
mutable struct myStruct
data::AbstractMatrix
labels::Vector{String}
end
and I want to throw an error if the length of labels is not equal to the number of columns of data, I know that I can write a constructor that enforces this condition like
myStruct(data, labels) = length(labels) != size(data)[2] ? error("Labels incorrect length") : new(data,labels)
However, once the struct is initialized, the labels field can be set to the incorrect length:
m = myStruct(randn(2,2), ["a", "b"])
m.labels = ["a"]
Is there a way to throw an error if the labels field is ever set to length not equal to the number of columns in data?

You could use StaticArrays.jl to fix the matrix and vector's sizes to begin with:
using StaticArrays
mutable struct MatVec{R, C, RC, VT, MT}
data::MMatrix{R, C, MT, RC} # RC should be R*C
labels::MVector{C, VT}
end
but there's the downside of having to compile for every concrete type with a unique permutation of type parameters R,C,MT,VT. StaticArrays also does not scale as well as normal Arrays.
If you don't restrict dimensions in the type parameters (with all those downsides) and want to throw an error at runtime, you got good and bad news.
The good news is you can control whatever mutation happens to your type. m.labels = v would call the method setproperty!(object::myStruct, name::Symbol, v), which you can define with all the safeguards you like.
The bad news is that you can't control mutation to the fields' types. push!(m.labels, 1) mutates in the push!(a::Vector{T}, item) method. The myStruct instance itself doesn't actually change; it still points to the same Vector. If you can't guarantee that you won't do something like x = m.labels; push!(x, "whoops") , then you really do need runtime checks, like iscorrect(m::myStruct) = length(m.labels) == size(m.data)[2]

A good option is to not access the fields of your struct directly. Instead, do it using a function. Eg:
mutable struct MyStruct
data::AbstractMatrix
labels::Vector{String}
end
function modify_labels(s::MyStruct, new_labels::Vector{String})
# do all checks and modifications
end
You should check chapter 8 from "Hands-On Design Patterns and Best Practices with Julia: Proven solutions to common problems in software design for Julia 1.x"

Related

Trying to pass an array into a function

I'm very new to Julia, and I'm trying to just pass an array of numbers into a function and count the number of zeros in it. I keep getting the error:
ERROR: UndefVarError: array not defined
I really don't understand what I am doing wrong, so I'm sorry if this seems like such an easy task that I can't do.
function number_of_zeros(lst::array[])
count = 0
for e in lst
if e == 0
count + 1
end
end
println(count)
end
lst = [0,1,2,3,0,4]
number_of_zeros(lst)
There are two issues with your function definition:
As noted in Shayan's answer and Dan's comment, the array type in Julia is called Array (capitalized) rather than array. To see:
julia> array
ERROR: UndefVarError: array not defined
julia> Array
Array
Empty square brackets are used to instantiate an array, and if preceded by a type, they specifically instantiate an array holding objects of that type:
julia> x = Int[]
Int64[]
julia> push!(x, 3); x
1-element Vector{Int64}:
3
julia> push!(x, "test"); x
ERROR: MethodError: Cannot `convert` an object of type String to an object of type Int64
Thus when you do Array[] you are actually instantiating an empty vector of Arrays:
julia> y = Array[]
Array[]
julia> push!(y, rand(2)); y
1-element Vector{Array}:
[0.10298669573927233, 0.04327245960128345]
Now it is important to note that there's a difference between a type and an object of a type, and if you want to restrict the types of input arguments to your functions, you want to do this by specifying the type that the function should accept, not an instance of this type. To see this, consider what would happen if you had fixed your array typo and passed an Array[] instead:
julia> f(x::Array[])
ERROR: TypeError: in typeassert, expected Type, got a value of type Vector{Array}
Here Julia complains that you have provided a value of the type Vector{Array} in the type annotation, when I should have provided a type.
More generally though, you should think about why you are adding any type restrictions to your functions. If you define a function without any input types, Julia will still compile a method instance specialised for the type of input provided when first call the function, and therefore generate (most of the time) machine code that is optimal with respect to the specific types passed.
That is, there is no difference between
number_of_zeros(lst::Vector{Int64})
and
number_of_zeros(lst)
in terms of runtime performance when the second definition is called with an argument of type Vector{Int64}. Some people still like type annotations as a form of error check, but you also need to consider that adding type annotations makes your methods less generic and will often restrict you from using them in combination with code other people have written. The most common example of this are Julia's excellent autodiff capabilities - they rely on running your code with dual numbers, which are a specific numerical type enabling automatic differentiation. If you strictly type your functions as suggested (Vector{Int}) you preclude your functions from being automatically differentiated in this way.
Finally just a note of caution about the Array type - Julia's array's can be multidimensional, which means that Array{Int} is not a concrete type:
julia> isconcretetype(Array{Int})
false
to make it concrete, the dimensionality of the array has to be provided:
julia> isconcretetype(Array{Int, 1})
true
First, it might be better to avoid variable names similar to function names. count is a built-in function of Julia. So if you want to use the count function in the number_of_zeros function, you will undoubtedly face a problem.
Second, consider returning the value instead of printing it (Although you didn't write the print function in the correct place).
Third, You can update the value by += not just a +!
Last but not least, Types in Julia are constantly introduced with the first capital letter! So we don't have an array standard type. It's an Array.
Here is the correction of your code.
function number_of_zeros(lst::Array{Int64})
counter = 0
for e in lst
if e == 0
counter += 1
end
end
return counter
end
lst = [0,1,2,3,0,4]
number_of_zeros(lst)
would result in 2.
Additional explanation
First, it might be better to avoid variable names similar to function names. count is a built-in function of Julia. So if you want to use the count function in the number_of_zeros function, you will undoubtedly face a problem.
Check this example:
function number_of_zeros(lst::Array{Int64})
count = 0
for e in lst
if e == 0
count += 1
end
end
return count, count(==(1), lst)
end
number_of_zeros(lst)
This code will lead to this error:
ERROR: MethodError: objects of type Int64 are not callable
Maybe you forgot to use an operator such as *, ^, %, / etc. ?
Stacktrace:
[1] number_of_zeros(lst::Vector{Int64})
# Main \t.jl:10
[2] top-level scope
# \t.jl:16
Because I overwrote the count variable on the count function! It's possible to avoid such problems by calling the function from its module:
function number_of_zeros(lst::Array{Int64})
count = 0
for e in lst
if e == 0
count += 1
end
end
return count, Base.count(==(1), lst)
The point is I used Base.count, then the compiler knows which count I mean by Base.count.

Julia: Defining promote_rule with multiple parametric types

Say I want to define a promote_rule() for a type that has multiple parametric types, for example for type MyType:
abstract type State end
struct Open<:State end
struct Closed<:State end
struct MyType{T,S<:State}
x::T
state::S
end
Is there a way to define a promote_rule() which only promotes the first type and not the second, for example:
myFloat = MyType(1.0, Open()) # MyType{Float64, Open}
myInt = MyType(2, Closed()) # MyType{Int64, Closed}
promote(myFloat, myInt)
# (MyType{Float64, Open}, MyType{Float64, Closed})
By definition, the result of a promotion is one common type. So while you can just recursively promote the Ts, you have to resort to a common supertype for the Ss if you want to keep them as is. Simply using State would be a valid choice, but a Union leads to a bit more fine-grained results:
julia> Base.promote_rule(::Type{MyType{T1, S1}}, ::Type{MyType{T2, S2}}) where {T1, T2, S1, S2} = MyType{promote_type(T1, T2), <:Union{S1, S2}}
julia> promote_type(MyType{Int, Closed}, MyType{Float64, Closed})
MyType{Float64,#s12} where #s12<:Closed
julia> promote_type(MyType{Int, Closed}, MyType{Float64, Open})
MyType{Float64,#s12} where #s12<:Union{Closed, Open}
You still have to define the respective convert methods for promote to work, of course; specifically, one ignoring the state type:
julia> Base.convert(::Type{<:MyType{T}}, m::MyType) where {T} = MyType(convert(T, m.x), m.state)
julia> promote(myFloat, myInt)
(MyType{Float64,Open}(1.0, Open()), MyType{Float64,Closed}(2.0, Closed()))
But be sure to test all kinds of combinations really well. Promotion and conversion is really fiddly and hard to get right the first time, in my experience.

Unicity of complex key dictionaries in Go but not in Julia?

In GO when I use a struct as a key for a map, there is an unicity of the keys.
For example, the following code produce a map with only one key : map[{x 1}:1]
package main
import (
"fmt"
)
type MyT struct {
A string
B int
}
func main() {
dic := make(map[MyT]int)
for i := 1; i <= 10; i++ {
dic[MyT{"x", 1}] = 1
}
fmt.Println(dic)
}
// result : map[{x 1}:1]
I Tried to do the same in Julia and I had a strange surprise :
This Julia code, similar to the GO one, produces a dictionary whith 10 keys !
type MyT
A::String
B::Int64
end
dic = Dict{MyT, Int64}()
for i in 1:10
dic[MyT("x", 1)] = 1
end
println(dic)
# Dict(MyT("x",1)=>1,MyT("x",1)=>1,MyT("x",1)=>1,MyT("x",1)=>1,MyT("x",1)=>1,MyT("x",1)=>1,MyT("x",1)=>1,MyT("x",1)=>1,MyT("x",1)=>1,MyT("x",1)=>1)
println(keys(dic))
# MyT[MyT("x",1),MyT("x",1),MyT("x",1),MyT("x",1),MyT("x",1),MyT("x",1),MyT("x",1),MyT("x",1),MyT("x",1),MyT("x",1)]
So what I did wrong ?
Thank you #DanGetz for the solution ! :
immutable MyT # or struct MyT with julia > 0.6
A::String
B::Int64
end
dic = Dict{MyT, Int64}()
for i in 1:10
dic[MyT("x", 1)] = 1
end
println(dic) # Dict(MyT("x", 1)=>1)
println(keys(dic)) # MyT[MyT("x", 1)]
Mutable values hash by identity in Julia, since without additional knowledge about what a type represents, one cannot know if two values with the same structure mean the same thing or not. Hashing mutable objects by value can be especially problematic if you mutate a value after using it as a dictionary key – this is not a problem when hashing by identity since the identity of a mutable object remains the same even when it is modified. On the other hand, it's perfectly safe to hash immutable objects by value – since they cannot be mutated, and accordingly that is the default behavior for immutable types. In the given example, if you make MyT immutable you will automatically get the behavior you're expecting:
immutable MyT # `struct MyT` in 0.6
A::String
B::Int64
end
dic = Dict{MyT, Int64}()
for i in 1:10
dic[MyT("x", 1)] = 1
end
julia> dic
Dict{MyT,Int64} with 1 entry:
MyT("x", 1) => 1
julia> keys(dic)
Base.KeyIterator for a Dict{MyT,Int64} with 1 entry. Keys:
MyT("x", 1)
For a type holding a String and an Int value that you want to use as a hash key, immutability is probably the right choice. In fact, immutability is the right choice more often than not, which is why the keywords introducing structural types has been change in 0.6 to struct for immutable structures and mutable struct for mutable structures – on the principle that people will reach for the shorter, simpler name first, so that should be the better default choice – i.e. immutability.
As #ntdef has written, you can change the hashing behavior of your type by overloading the Base.hash function. However, his definition is incorrect in a few respects (which is probably our fault for failing to document this more prominently and thoroughly):
The method signature of Base.hash that you want to overload is Base.hash(::T, ::UInt).
The Base.hash(::T, ::UInt) method must return a UInt value.
If you are overloading Base.hash, you should also overload Base.== to match.
So this would be a correct way to make your mutable type hash by value (new Julia session required to redefine MyT):
type MyT # `mutable struct MyT` in 0.6
A::String
B::Int64
end
import Base: ==, hash
==(x::MyT, y::MyT) = x.A == y.A && x.B == y.B
hash(x::MyT, h::UInt) = hash((MyT, x.A, x.B), h)
dic = Dict{MyT, Int64}()
for i in 1:10
dic[MyT("x", 1)] = 1
end
julia> dic
Dict{MyT,Int64} with 1 entry:
MyT("x", 1) => 1
julia> keys(dic)
Base.KeyIterator for a Dict{MyT,Int64} with 1 entry. Keys:
MyT("x", 1)
This is kind of annoying to do manually, but the AutoHashEquals package automates this, taking the tedium out of it. All you need to do is prefix the type definition with the #auto_hash_equals macro:
using AutoHashEquals
#auto_hash_equals type MyT # `#auto_hash_equals mutable struct MyT` in 0.6
A::String
B::Int64
end
Bottom line:
If you have a type that should have value-based equality and hashing, seriously consider making it immutable.
If your type really has to be mutable, then think hard about whether it's a good idea to use as a hash key.
If you really need to use a mutable type as a hash key with value-based equality and hashing semantics, use the AutoHashEquals package.
You did not do anything wrong. The difference between the languages is in how they choose to hash a struct when using it as a key in the map/Dict. In go, structs are hashed by their values rather than their pointer addresses. This allows programmers to more easily implement multidimensional maps by using structs rather than maps of maps. See this blog post for more info.
Reproducing Julia's Behavior in Go
To reproduce Julia's behavior in go, redefine the map to use a pointer to MyT as the key type:
func main() {
dic := make(map[MyT]int)
pdic := make(map[*MyT]int)
for i := 1; i <= 10; i++ {
t := MyT{"x", 1}
dic[t] = 1
pdic[&t] = 1
}
fmt.Println(dic)
fmt.Println(pdic)
}
Here, pdic uses the pointer to a MyT struct as its key type. Because each MyT created in the loop has a different memory address, the key will be different. This produces the output:
map[{x 1}:1]
map[0x1040a140:1 0x1040a150:1 0x1040a160:1 0x1040a180:1 0x1040a1b0:1 0x1040a1c0:1 0x1040a130:1 0x1040a170:1 0x1040a190:1 0x1040a1a0:1]
You can play with this on play.golang.org. Unlike in Julia (see below), the way the map type is implemented go means you cannot specify a custom hashing function for a user-defined struct.
Reproducing Go's Behavior in Julia
Julia uses the function Base.hash(::K, ::UInt) to hash keys for its Dict{K,V} type. While it doesn't explicitly say so in the documentation, the default hashing algorithm uses the output from object_id, as you can see in the source code. To reproduce go's behavior in Julia, define a new hash function for your type that hashes the values of the struct:
Base.hash(t::MyT, h::Uint) = Base.hash((t.A, t.B), h)
Note that you should also define the == operator in the same way to guarantee hash(x)==hash(y) implies isequal(x,y), as mentioned in the documentation.
However, the easiest way to get Julia to act like go in your example is to redefine MyT as immutable. As an immutable type, Julia will hash MyT by its value rather than its object_id. As an example:
immutable MyT
A::String
B::Int64
end
dic = Dict{MyT, Int64}()
for i in 1:10
dic[MyT("x", 1)] = 1
end
dic[MyT("y", 2)] = 2
println(dic) # prints "Dict(MyT("y",2)=>2,MyT("x",1)=>1)"
Edit: Please refer to #StefanKarpinski's answer. The Base.hash function must return a UInt for it to be a valid hash, so my example won't work. Also there's some funkiness regarding user defined types which involves recursion.
The reason you get 10 different keys is due to the fact that Julia uses the hash function when determining the key to a dict. In this case, I'm guessing that it's using the address of the object in memory as the key for the dictionary. If you'd like to explicitly make (A,B) the unique key, you'll need to override the hash function for your particular type, with something like this:
Base.hash(x::MyT) = (x.A, x.B)
That will replicate the Go behavior, with only one item in the Dict.
Here's the documentation to the hash function.
Hope that helps!

Julia: non-destructively update immutable type variable

Let's say there is a type
immutable Foo
x :: Int64
y :: Float64
end
and there is a variable foo = Foo(1,2.0). I want to construct a new variable bar using foo as a prototype with field y = 3.0 (or, alternatively non-destructively update foo producing a new Foo object). In ML languages (Haskell, OCaml, F#) and a few others (e.g. Clojure) there is an idiom that in pseudo-code would look like
bar = {foo with y = 3.0}
Is there something like this in Julia?
This is tricky. In Clojure this would work with a data structure, a dynamically typed immutable map, so we simply call the appropriate method to add/change a key. But when working with types we'll have to do some reflection to generate an appropriate new constructor for the type. Moreover, unlike Haskell or the various MLs, Julia isn't statically typed, so one does not simply look at an expression like {foo with y = 1} and work out what code should be generated to implement it.
Actually, we can build a Clojure-esque solution to this; since Julia provides enough reflection and dynamism that we can treat the type as a sort of immutable map. We can use fieldnames to get the list of "keys" in order (like [:x, :y]) and we can then use getfield(foo, :x) to get field values dynamically:
immutable Foo
x
y
z
end
x = Foo(1,2,3)
with_slow(x, p) =
typeof(x)(((f == p.first ? p.second : getfield(x, f)) for f in fieldnames(x))...)
with_slow(x, ps...) = reduce(with_slow, x, ps)
with_slow(x, :y => 4, :z => 6) == Foo(1,4,6)
However, there's a reason this is called with_slow. Because of the reflection it's going to be nowhere near as fast as a handwritten function like withy(foo::Foo, y) = Foo(foo.x, y, foo.z). If Foo is parametised (e.g. Foo{T} with y::T) then Julia will be able to infer that withy(foo, 1.) returns a Foo{Float64}, but won't be able to infer with_slow at all. As we know, this kills the crab performance.
The only way to make this as fast as ML and co is to generate code effectively equivalent to the handwritten version. As it happens, we can pull off that version as well!
# Fields
type Field{K} end
Base.convert{K}(::Type{Symbol}, ::Field{K}) = K
Base.convert(::Type{Field}, s::Symbol) = Field{s}()
macro f_str(s)
:(Field{$(Expr(:quote, symbol(s)))}())
end
typealias FieldPair{F<:Field, T} Pair{F, T}
# Immutable `with`
for nargs = 1:5
args = [symbol("p$i") for i = 1:nargs]
#eval with(x, $([:($p::FieldPair) for p = args]...), p::FieldPair) =
with(with(x, $(args...)), p)
end
#generated function with{F, T}(x, p::Pair{Field{F}, T})
:($(x.name.primary)($([name == F ? :(p.second) : :(x.$name)
for name in fieldnames(x)]...)))
end
The first section is a hack to produce a symbol-like object, f"foo", whose value is known within the type system. The generated function is like a macro that takes types as opposed to expressions; because it has access to Foo and the field names it can generate essentially the hand-optimised version of this code. You can also check that Julia is able to properly infer the output type, if you parametrise Foo:
#code_typed with(x, f"y" => 4., f"z" => "hello") # => ...::Foo{Int,Float64,String}
(The for nargs line is essentially a manually-unrolled reduce which enables this.)
Finally, lest I be accused of giving slightly crazy advice, I want to warn that this isn't all that idiomatic in Julia. While I can't give very specific advice without knowing your use case, it's generally best to have fields with a manageable (small) set of fields and a small set of functions which do the basic manipulation of those fields; you can build on those functions to create the final public API. If what you want is really an immutable dict, you're much better off just using a specialised data structure for that.
There is also setindex (without the ! at the end) implemented in the FixedSizeArrays.jl package, which does this in an efficient way.

Parametric Type Creation

I'm struggling to understand parametric type creation in julia. I know that I can create a type with the following:
type EconData
values
dates::Array{Date}
colnames::Array{ASCIIString}
function EconData(values, dates, colnames)
if size(values, 1) != size(dates, 1)
error("Date/data dimension mismatch.")
end
if size(values, 2) != size(colnames, 2)
error("Name/data dimension mismatch.")
end
new(values, dates, colnames)
end
end
ed1 = EconData([1;2;3], [Date(2014,1), Date(2014,2), Date(2014,3)], ["series"])
However, I can't figure out how to specify how values will be typed. It seems reasonable to me to do something like
type EconData{T}
values::Array{T}
...
function EconData(values::Array{T}, dates, colnames)
...
However, this (and similar attempts) simply produce and error:
ERROR: `EconData{T}` has no method matching EconData{T}(::Array{Int64,1}, ::Array{Date,1}, ::Array{ASCIIString,2})
How can I specify the type of values?
The answer is that things get funky with parametric types and inner constructors - in fact, I think its probably the most confusing thing in Julia. The immediate solution is to provide a suitable outer constructor:
using Dates
type EconData{T}
values::Vector{T}
dates::Array{Date}
colnames::Array{ASCIIString}
function EconData(values, dates, colnames)
if size(values, 1) != size(dates, 1)
error("Date/data dimension mismatch.")
end
if size(values, 2) != size(colnames, 2)
error("Name/data dimension mismatch.")
end
new(values, dates, colnames)
end
end
EconData{T}(v::Vector{T},d,n) = EconData{T}(v,d,n)
ed1 = EconData([1,2,3], [Date(2014,1), Date(2014,2), Date(2014,3)], ["series"])
What also would have worked is to have done
ed1 = EconData{Int}([1,2,3], [Date(2014,1), Date(2014,2), Date(2014,3)], ["series"])
My explanation might be wrong, but I think the probably is that there is no parametric type constructor method made by default, so you have to call the constructor for a specific instantiation of the type (my second version) or add the outer constructor yourself (first version).
Some other comments: you should be explicit about dimensions. i.e. if all your fields are vectors (1D), use Vector{T} or Array{T,1}, and if their are matrices (2D) use Matrix{T} or Array{T,2}. Make it parametric on the dimension if you need to. If you don't, slow code could be generated because functions using this type aren't really sure about the actual data structure until runtime, so will have lots of checks.

Resources