The Idiomatic Way To Do OOP, Types and Methods in Julia - julia

As I'm learning Julia, I am wondering how to properly do things I might have done in Python, Java or C++ before. For example, previously I might have used an abstract base class (or interface) to define a family of models through classes. Each class might then have a method like calculate. So to call it I might have model.calculate(), where the model is an object from one of the inheriting classes.
I get that Julia uses multiple dispatch to overload functions with different signatures such as calculate(model). The question I have is how to create different models. Do I use the type system for that and create different types like:
abstract type Model end
type BlackScholes <: Model end
type Heston <: Model end
where BlackScholes and Heston are different types of model? If so, then I can overload different calculate methods:
function calculate(model::BlackScholes)
# code
end
function calculate(model::Heston)
# code
end
But I'm not sure if this is a proper and idiomatic use of types in Julia. I will greatly appreciate your guidance!

This is a hard question to answer. Julia offers a wide range of tools to solve any given problem, and it would be hard for even a core developer of the language to assert that one particular approach is "right" or even "idiomatic".
For example, in the realm of simulating and solving stochastic differential equations, you could look at the approach taken by Chris Rackauckas (and many others) in the suite of packages under the JuliaDiffEq umbrella. However, many of these people are extremely experienced Julia coders, and what they do may be somewhat out of reach for less experienced Julia coders who just want to model something in a manner that is reasonably sensible and attainable for a mere mortal.
It is is possible that the only "right" answer to this question is to direct users to the Performance Tips section of the docs, and then assert that as long as you aren't violating any of the recommendations there, then what you are doing is probably okay.
I think the best way I can answer this question from my own personal experience is to provide an example of how I (a mere mortal) would approach the problem of simulating different Ito processes. It is actually not too far off what you have put in the question, although with one additional layer. To be clear, I make no claim that this is the "right" way to do things, merely that it is one approach that utilizes multiple dispatch and Julia's type system in a reasonably sensible fashion.
I start off with an abstract type, for nesting specific subtypes that represent specific models.
abstract type ItoProcess ; end
Now I define some specific model subtypes, e.g.
struct GeometricBrownianMotion <: ItoProcess
mu::Float64
sigma::Float64
end
struct Heston <: ItoProcess
mu::Float64
kappa::Float64
theta::Float64
xi::Float64
end
Note, in this case I don't need to add constructors that convert arguments to Float64, since Julia does this automatically, e.g. GeometricBrownianMotion(1, 2.0) will work out-of-the-box, as Julia will automatically convert 1 to 1.0 when constructing the type.
However, I might want to add some constructors for common parameterizations, e.g.
GeometricBrownianMotion() = GeometricBrownianMotion(0.0, 1.0)
I might also want some functions that return useful information about my models, e.g.
number_parameter(model::GeometricBrownianMotion) = 2
number_parameter(model::Heston) = 4
In fact, given how I've defined the models above, I could actually be a bit sneaky and define a method that works for all subtypes:
number_parameter(model::T) where {T<:ItoProcess} = length(fieldnames(typeof(model)))
Now I want to add some code that allows me to simulate my models:
function simulate(model::T, numobs::Int, stval) where {T<:ItoProcess}
# code here that is common to all subtypes of ItoProcess
simulate_inner(model, somethingelse)
# maybe more code that is common to all subtypes of ItoProcess
end
function simulate_inner(model::GeometricBrownianMotion, somethingelse)
# code here that is specific to GeometricBrownianMotion
end
function simulate_inner(model::Heston, somethingelse)
# code here that is specific to Heston
end
Note that I have used the abstract type to allow me to group all code that is common to all subtypes of ItoProcess in the simulate function. I then use multiple dispatch and simulate_inner to run any code that needs to be specific to a particular subtype of ItoProcess. For the aforementioned reasons, I hesitate to use the phrase "idiomatic", but let me instead say that the above is quite a common pattern in typical Julia code.
The one thing to be careful of in the above code is to ensure that the output type of the simulate function is type-stable, that is, the output type can be uniquely determined by the input types. Type stability is usually an important factor in ensuring performant Julia code. An easy way in this case to ensure type-stability is to always return Matrix{Float64} (if the output type is fixed for all subtypes of ItoProcess then obviously it is uniquely determined). I examine a case where the output type depends on input types below for my estimate example. Anyway, for simulate I might always return Matrix{Float64} since for GeometricBrownianMotion I only need one column, but for Heston I will need two (the first for price of the asset, the second for the volatility process).
In fact, depending on how the code is used, type-stability is not always necessary for performant code (see eg using function barriers to prevent type-instability from flowing through to other parts of your program), but it is a good habit to be in (for Julia code).
I might also want routines to estimate these models. Again, I can follow the same approach (but with a small twist):
function estimate(modeltype::Type{T}, data)::T where {T<:ItoProcess}
# again, code common to all subtypes of ItoProcess
estimate_inner(modeltype, data)
# more common code
return T(some stuff generated from function that can be used to construct T)
end
function estimate_inner(modeltype::Type{GeometricBrownianMotion}, data)
# code specific to GeometricBrownianMotion
end
function estimate_inner(modeltype::Type{Heston}, data)
# code specific to Heston
end
There are a few differences from the simulate case. Instead of inputting an instance of GeometricBrownianMotion or Heston, I instead input the type itself. This is because I don't actually need an instance of the type with defined values for the fields. In fact, the values of those fields is the very thing I am attempting to estimate! But I still want to use multiple dispatch, hence the ::Type{T} construct. Note also I have specified an output type for estimate. This output type is dependent on the ::Type{T} input, and so the function is type-stable (output type can be uniquely determined by input types). But common with the simulate case, I have structured the code so that code that is common to all subtypes of ItoProcess only needs to be written once, and code that is specific to the subtypes is separted out.
This answer is turning into an essay, so I should tie it off here. Hopefully this is useful to the OP, as well as anyone else getting into Julia. I just want to finish by emphasizing that what I have done above is only one approach, there are others that will be just as performant, but I have personally found the above to be useful from a structural perspective, as well as reasonably common across the Julia ecosystem.

Related

Can I define variable in Julia just like in Fortran

I am new to Julia. Have a quick question.
In Fortran, we can do
implicit none
Therefore whenever we use a variable, we need to define it first. Otherwise it will give error.
In Julia, I want to do the same time thing. I want to define the type of each variable first, like Float64, Int64, etc. So that I wish that Julia no longer need to automatically do type conversion, which may slow down the code.
Because I know if Julia code looks like Fortran it usually runs fast.
So, in short, is there a way in Julia such that, I can force it to only use variables whose types has been defined, and otherwise give me an error message? I just want to be strict.
I just wanted to define all the variables first, then use them. Just like Fortran does.
Thanks!
[Edit]
As suggested by the experts who answers the questions, thank you very much! I know that perhaps in Julia there is no need to manually define each variables one by one. Because for one thing, if I do that, the code will become just like Fortran, and it can be very very long especially if I have many variables.
But if I do not define the type of the each variables, is Julia smart enough to know the type? Will it do some conversions which may slow down the code?
Also, even if there is no need to define variables one by one, is there in some situations we may have to manually define the type manually?
No, this is not possible* as such. You can, however, add type annotations at any point in your code, which will raise an error if the variable is not of the expected type:
julia> i = 1
1
julia> i::Int64
1
julia> i = 1.0
1.0
julia> i::Int64
ERROR: TypeError: in typeassert, expected Int64, got a value of type Float64
Stacktrace:
[1] top-level scope
# REPL[4]:1
julia> i=0x01::UInt8
0x01
*Julia is a dynamically-typed language, though there are packages such as https://github.com/aviatesk/JET.jl for static type-checking (i.e., no type annotations required) and https://github.com/JuliaDebug/Cthulhu.jl for exploring Julia's type inference system.
Strict type declaration is not how you achieve performance in Julia.
You achieve the performance by type stability.
Moreover declaring the types is usually not recommended because it decreases interoperability.
Consider the following function:
f(x::Int) = rand() < 4+x ? 1 : "0"
The types of everything seem to be known. However, this is a terrible (from performance point of view) function because the type of output can not be calcuated by looking types of input. And this is exactly how you write the performant code with regard to types.
So how to check your code? There is a special macro #code_warntype to detect such cases so you can correct your code:
Another type related issue are the containers that should not be specified by abstract elements. So you never want to have neither Vector{Any} nor Vector{Real} - rather than that you want Vector{Int} or Vector{Float64}.
See also https://docs.julialang.org/en/v1/manual/performance-tips/ for further discussion.

What's a cterm?

The Isabelle implementation manual says:
Types ctyp and cterm represent certified types and terms, respectively. These are abstract datatypes that guarantee that its values have passed the full well-formedness (and well-typedness) checks, relative to the declarations of type constructors, constants etc. in the background theory.
My understanding is: when I write cterm t, Isabelle checks that the term is well-built according to the theory where it lives in.
The abstract types ctyp and cterm are part of the same inference kernel that is mainly responsible for thm. Thus syntactic operations on ctyp and cterm are located in the Thm module, even though theorems
are not yet involved at that stage.
My understanding is: if I want to modify a cterm at the ML level I will use operations of the Thm module (where can I find that module?)
Furthermore, it looks like cterm t is an entity that converts a term of the theory level to a term of the ML-level. So I inspect the code of cterm in the declaration:
ML_val ‹
some_simproc #{context} #{cterm "some_term"}
›
and get to ml_antiquotations.ML:
ML_Antiquotation.value \<^binding>‹cterm› (Args.term >> (fn t =>
"Thm.cterm_of ML_context " ^ ML_Syntax.atomic (ML_Syntax.print_term t))) #>
This line of code is unreadable to me with my current knowledge.
I wonder if someone could give a better low-level explanation of cterm. What is the meaning of the code below? Where are located the checks that cterm performs on theory terms? Where are located the manipulations that we can do on cterms (the module Thm above)?
The ‘c’ stands for ‘certified’ (Or ‘checked’? Not sure). A cterm is basically a term that has been undergone checking. The #{cterm …} antiquotation allows you to simply write down terms and directly get a cterm in various contexts (in this case probably the context of ML, i.e. you directly get a cterm value with the intended content). The same works for regular terms, i.e. #{term …}.
You can manipulate cterms directly using the functions from the Thm structure (which, incidentally, can be found in ~~/src/Pure/thm.ML; most of these basic ML files are in the Pure directory). However, in my experience, it is usually easier to just convert the cterm to a regular term (using Thm.term_of – unlike Thm.cterm_of, this is a very cheap operation) and then work with the term instead. Directly manipulating cterms only really makes sense if you need another cterm in the end, because re-certifying terms is fairly expensive (still, unless your code is called very often, it probably isn't really a performance problem).
In most cases, I would say the workflow is like this: If you get a cterm as an input, you turn it into a regular term. This you can easily inspect/take apart/whatever. At some point, you might have to turn it into a cterm again (e.g. because you want to instantiate some theorem with it or use it in some other way that involves the kernel) and then you just use Thm.cterm_of to do that.
I don't know exactly what the #{cterm …} antiquotation does internally, but I would imagine that at the end of the day, it just parses its parameter as an Isabelle term and then certifies it with the current context (i.e. #{context}) using something like Thm.cterm_of.
To gather my findings with cterms, I post an answer.
This is how a cterm looks like in Pure:
abstype cterm =
Cterm of {cert: Context.certificate,
t: term, T: typ,
maxidx: int,
sorts: sort Ord_List.T}
(To be continued)

Guidelines for choosing Array of Struct or Multiple Arrays

This question has been asked for other languages. I would like to ask this in relation to Julia.
What are the general guidelines for choosing between an array of struct e.g.
struct vertex
x::Real
y::Real
gradient_x::Real
gradient_y::Real
end
myarray::Array{Vertex}
and multiple arrays.
xpositions::Array{<:Real}
ypositions::Array{<:Real}
gradient_x::Array{<:Real}
gradient_y::Array{<:Real}
Are there any performance considerations? Or is it just a style/readability issue.
Your struct as it currently stands will perform poorly. From the Performance Tips you should always:
Avoid fields with abstract type
Similarly, you should always prefer Vector{<:Real} to Vector{Real}.
The Julian way to approach this is to parameterize your struct as follows:
struct Vertex{T<:Real}
x::T
y::T
gradient_x::T
gradient_y::T
end
Given the above, the two approaches discussed in the question will now have roughly similar performance. In practice, it really depends on what kind of operations you want to perform. For example, if you frequently need a vector of just the x fields, then having multiple arrays will probably be a better approach, since any time you need a vector of x fields from a Vector{Vertex} you will need to loop over the structs to allocate it:
xvec = [ v.x for v in vertexvec ]
On the other hand, if your application lends itself to functions called over all four fields of the struct, then your code will be significantly cleaner if you use a Vector{Vertex} and will be just as performant as looping over 4 arrays. Broadcasting in particular will make for nice clean code here. For example, if you have some function:
f(x, y, gradient_x, gradient_y)
then just add the method:
f(v::Vertex) = f(v.x, v.y, v.gradient_x, v.gradient_y)
Now if you want to apply it to vv::Vector{Vertex}, you can just use:
f.(vv)
Remember, user-defined types in Julia are just as performant as "in-built" types. In fact, many types that you might think of as in-built are just defined in Julia itself, much as you are doing here.
So the short summary is: both approaches are performant, so use whichever makes more sense in the context of your application.

How can I dispatch on traits relating two types, where the second type that co-satisfies the trait is uniquely determined by the first?

Say I have a Julia trait that relates to two types: one type is a sort of "base" type that may satisfy a sort of partial trait, the other is an associated type that is uniquely determined by the base type. (That is, the relation from BaseType -> AssociatedType is a function.) Together, these types satisfy a composite trait that is the one of interest to me.
For example:
using Traits
#traitdef IsProduct{X} begin
isnew(X) -> Bool
coolness(X) -> Float64
end
#traitdef IsProductWithMeasurement{X,M} begin
#constraints begin
istrait(IsProduct{X})
end
measurements(X) -> M
#Maybe some other stuff that dispatches on (X,M), e.g.
#fits_in(X,M) -> Bool
#how_many_fit_in(X,M) -> Int64
#But I don't want to implement these now
end
Now here are a couple of example types. Please ignore the particulars of the examples; they are just meant as MWEs and there is nothing relevant in the details:
type Rope
color::ASCIIString
age_in_years::Float64
strength::Float64
length::Float64
end
type Paper
color::ASCIIString
age_in_years::Int64
content::ASCIIString
width::Float64
height::Float64
end
function isnew(x::Rope)
(x.age_in_years < 10.0)::Bool
end
function coolness(x::Rope)
if x.color=="Orange"
return 2.0::Float64
elseif x.color!="Taupe"
return 1.0::Float64
else
return 0.0::Float64
end
end
function isnew(x::Paper)
(x.age_in_years < 1.0)::Bool
end
function coolness(x::Paper)
(x.content=="StackOverflow Answers" ? 1000.0 : 0.0)::Float64
end
Since I've defined these functions, I can do
#assert istrait(IsProduct{Rope})
#assert istrait(IsProduct{Paper})
And now if I define
function measurements(x::Rope)
(x.length)::Float64
end
function measurements(x::Paper)
(x.height,x.width)::Tuple{Float64,Float64}
end
Then I can do
#assert istrait(IsProductWithMeasurement{Rope,Float64})
#assert istrait(IsProductWithMeasurement{Paper,Tuple{Float64,Float64}})
So far so good; these run without error. Now, what I want to do is write a function like the following:
#traitfn function get_measurements{X,M;IsProductWithMeasurement{X,M}}(similar_items::Array{X,1})
all_measurements = Array{M,1}(length(similar_items))
for i in eachindex(similar_items)
all_measurements[i] = measurements(similar_items[i])::M
end
all_measurements::Array{M,1}
end
Generically, this function is meant to be an example of "I want to use the fact that I, as the programmer, know that BaseType is always associated to AssociatedType to help the compiler with type inference. I know that whenever I do a certain task [in this case, get_measurements, but generically this could work in a bunch of cases] then I want the compiler to infer the output type of that function in a consistently patterned way."
That is, e.g.
do_something_that_makes_arrays_of_assoc_type(x::BaseType)
will always spit out Array{AssociatedType}, and
do_something_that_makes_tuples(x::BaseType)
will always spit out Tuple{Int64,BaseType,AssociatedType}.
AND, one such relationship holds for all pairs of <BaseType,AssociatedType>; e.g. if BatmanType is the base type to which RobinType is associated, and SupermanType is the base type to which LexLutherType is always associated, then
do_something_that_makes_tuple(x::BatManType)
will always output Tuple{Int64,BatmanType,RobinType}, and
do_something_that_makes_tuple(x::SuperManType)
will always output Tuple{Int64,SupermanType,LexLutherType}.
So, I understand this relationship, and I want the compiler to understand it for the sake of speed.
Now, back to the function example. If this makes sense, you will have realized that while the function definition I gave as an example is 'correct' in the sense that it satisfies this relationship and does compile, it is un-callable because the compiler doesn't understand the relationship between X and M, even though I do. In particular, since M doesn't appear in the method signature, there is no way for Julia to dispatch on the function.
So far, the only thing I have thought to do to solve this problem is to create a sort of workaround where I "compute" the associated type on the fly, and I can still use method dispatch to do this computation. Consider:
function get_measurement_type_of_product(x::Rope)
Float64
end
function get_measurement_type_of_product(x::Paper)
Tuple{Float64,Float64}
end
#traitfn function get_measurements{X;IsProduct{X}}(similar_items::Array{X,1})
M = get_measurement_type_of_product(similar_items[1]::X)
all_measurements = Array{M,1}(length(similar_items))
for i in eachindex(similar_items)
all_measurements[i] = measurements(similar_items[i])::M
end
all_measurements::Array{M,1}
end
Then indeed this compiles and is callable:
julia> get_measurements(Array{Rope,1}([Rope("blue",1.0,1.0,1.0),Rope("red",2.0,2.0,2.0)]))
2-element Array{Float64,1}:
1.0
2.0
But this is not ideal, because (a) I have to redefine this map each time, even though I feel as though I already told the compiler about the relationship between X and M by making them satisfy the trait, and (b) as far as I can guess--maybe this is wrong; I don't have direct evidence for this--the compiler won't necessarily be able to optimize as well as I want, since the relationship between X and M is "hidden" inside the return value of the function call.
One last thought: if I had the ability, what I would ideally do is something like this:
#traitdef IsProduct{X} begin
isnew(X) -> Bool
coolness(X) -> Float64
∃ ! M s.t. measurements(X) -> M
end
and then have some way of referring to the type that uniquely witnesses the existence relationship, so e.g.
#traitfn function get_measurements{X;IsProduct{X},IsWitnessType{IsProduct{X},M}}(similar_items::Array{X,1})
all_measurements = Array{M,1}(length(similar_items))
for i in eachindex(similar_items)
all_measurements[i] = measurements(similar_items[i])::M
end
all_measurements::Array{M,1}
end
because this would be somehow dispatchable.
So: what is my specific question? I am asking, given that you presumably by this point understand that my goals are
Have my code exhibit this sort of structure generically, so that
I can effectively repeat this design pattern across a lot of cases
and then program in the abstract at the high-level of X and M,
and
do (1) in such a way that the compiler can still optimize to the best of its ability / is as aware of the relationship among
types as I, the coder, am
then, how should I do this? I think the answer is
Use Traits.jl
Do something pretty similar to what you've done so far
Also do ____some clever thing____ that the answerer will indicate,
but I'm open to the idea that in fact the correct answer is
Abandon this approach, you're thinking about the problem the wrong way
Instead, think about it this way: ____MWE____
I'd also be perfectly satisfied by answers of the form
What you are asking for is a "sophisticated" feature of Julia that is still under development, and is expected to be included in v0.x.y, so just wait...
and I'm less enthusiastic about (but still curious to hear) an answer such as
Abandon Julia; instead use the language ________ that is designed for this type of thing
I also think this might be related to the question of typing Julia's function outputs, which as I take it is also under consideration, though I haven't been able to puzzle out the exact representation of this problem in terms of that one.

How to get a function from a symbol without using eval?

I've got a symbol that represents the name of a function to be called:
julia> func_sym = :tanh
I can use that symbol to get the tanh function and call it using:
julia> eval(func_sym)(2)
0.9640275800758169
But I'd rather avoid the 'eval' there as it will be called many times and it's expensive (and func_sym can have several different values depending on context).
IIRC in Ruby you can say something like:
obj.send(func_sym, args)
Is there something similar in Julia?
EDIT: some more details on why I have functions represented by symbols:
I have a type (from a neural network) that includes the activation function, originally I included it as a funcion:
type NeuralLayer
weights::Matrix{Float32}
biases::Vector{Float32}
a_func::Function
end
However, I needed to serialize these things to files using JLD, but it's not possible to serialize a Function, so I went with a symbol:
type NeuralLayer
weights::Matrix{Float32}
biases::Vector{Float32}
a_func::Symbol
end
And currently I use the eval approach above to call the activation function. There are collections of NeuralLayers and each can have it's own activation function.
#Isaiah's answer is spot-on; perhaps even more-so after the edit to the original question. To elaborate and make this more specific to your case: I'd change your NeuralLayer type to be parametric:
type NeuralLayer{func_type}
weights::Matrix{Float32}
biases::Vector{Float32}
end
Since func_type doesn't appear in the types of the fields, the constructor will require you to explicitly specify it: layer = NeuralLayer{:excitatory}(w, b). One restriction here is that you cannot modify a type parameter.
Now, func_type could be a symbol (like you're doing now) or it could be a more functionally relevant parameter (or parameters) that tunes your activation function. Then you define your activation functions like this:
# If you define your NeuralLayer with just one parameter:
activation(layer::NeuralLayer{:inhibitory}) = …
activation(layer::NeuralLayer{:excitatory}) = …
# Or if you want to use several physiological parameters instead:
activation{g_K,g_Na,g_l}(layer::NeuralLayer{g_K,g_Na,g_l} = f(g_K, g_Na, g_l)
The key point is that functions and behavior are external to the data. Use type definitions and abstract type hierarchies to define behavior, as is coded in the external functions… but only store data itself in the types. This is dramatically different from Python or other strongly object-oriented paradigms, and it takes some getting used to.
But I'd rather avoid the 'eval' there as it will be called many times and it's expensive (and func_sym can have several different values depending on context).
This sort of dynamic dispatch is possible in Julia, but not recommended. Changing the value of 'func_sym' based on context defeats type inference as well as method specialization and inlining. Instead, the recommended approach is to use multiple dispatch, as detailed in the Methods section of the manual.

Resources