I don't quite understand how interfaces work in OCaml.
Let's see an example:
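The example under discussion is the book's small polymorphic set module, split into an interface and an implementation. Roughly (this is a sketch; the book's exact code may differ):

(* set.mli -- the interface: the representation of 'a set is not given *)
type 'a set
val empty : 'a set
val add : 'a -> 'a set -> 'a set
val mem : 'a -> 'a set -> bool

(* set.ml -- the implementation: sets are represented as lists *)
type 'a set = 'a list
let empty = []
let add x s = x :: s
let mem x s = List.mem x s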
About the 'a
So what is the meaning of 'a here? I mean, I understand that when describing functions, 'a means an arbitrary type. Then what does it mean here? Does it mean an arbitrary set?
Also, why put 'a in front of set?
abstract
When explaining this example, Jason Hickey's Introduction to Objective Caml says:
we need to define a polymorphic type of sets ’a set abstractly. That
is, in the interface we will declare a type ’a set without giving a
definition, preventing other parts of the program from knowing, or
depending on, the particular representation of sets we have chosen.
From the above statements, I guess it means that in an interface definition we should hide the implementation details. But what details have been hidden?
type 'a set = 'a list
In the implementation file, it says type 'a set = 'a list.
What does this do then?
Does it mean this set only takes a list? If it does mean that, wouldn't it be necessary to say so in the interface file, since users of this set should know it only takes a list, right?
So what is the meaning of 'a here?
This is like parametric polymorphism for functions (in Java it is known as generics), but for types. In this code, sets that store ints will have type int set, sets of strings string set, and so on. 'a is written in front because that is ordinary OCaml syntax; in the revised syntax, type variables are written after the type name, like list int or set list int. For more information about the different kinds of polymorphism I can recommend the book Types and Programming Languages, part V: Polymorphism. If you understand parametric polymorphism for functions, I think it will not be difficult to extend that understanding to types.
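For instance, with the set interface sketched above (a sketch, not the book's exact signatures), each use fixes 'a to a concrete type:

(* assuming empty and add from the sketched interface are in scope *)
let s1 = add 1 empty        (* s1 : int set *)
let s2 = add "hi" empty     (* s2 : string set *)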
what details have been hidden?
In your .ml file, the type 'a set is defined as a list of elements. To search for an element in a list, one must iterate through the list and call (=) on every element (this is how the function List.mem works). AFAIR, in OCaml the stdlib sets are implemented as balanced trees, and the values stored in a set must come with a function compare : t -> t -> int, where t is the type of the elements stored in the set.
However, sets can be defined differently, and if you look only at the abstract type in the .mli, you can only guess how it is implemented in the .ml file.
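For contrast, here is a minimal sketch of the standard library's functorised sets; note the comparison function that has to be supplied, and that lookup goes through a balanced tree rather than calling (=) on every element:

module IntSet = Set.Make (struct
  type t = int
  let compare = compare
end)

let s = IntSet.add 3 (IntSet.singleton 1)
let has_three = IntSet.mem 3 s   (* true; found by tree search *)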
Indeed, in this definition the type 'a list has been used to implement the type 'a set, but this information is not visible from the interface file: the hidden part is the fact that the set type is really a list. The way the module has been implemented, and the choice of which information is made available to the external world, make it possible for a program to use the set type without knowing how it is made.
This is an important feature of software design, since it lets the developer change the implementation of the module without having to change the code that uses it. Making the type abstract enforces that separation: you will get a type error if you try to use a list as a set outside the module.
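For example, assuming the module above is called Set (a sketch, not the book's exact code), client code cannot exploit the representation:

(* outside the module, int Set.set and int list are distinct types *)
let s : int Set.set = [1; 2; 3]
(* rejected with a type error along the lines of:
   This expression has type int list
   but an expression was expected of type int Set.set *)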
type 'a set = 'a list
AFAIR, this line introduces a 'type synonym' (or alias), saying: from here on, the type set is the same as list, and you can use set with functions that expect list, and vice versa.
When you see 'a set you should understand that it is just a set of something; if you put strings into it, it becomes a string set. From 'a set alone you can't tell what is (or will be) stored in the set, but from string set you can. Type synonyms are also covered in the book mentioned above.
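A tiny illustration of that interchangeability, with the alias visible (i.e. not hidden behind an .mli):

type 'a set = 'a list

let size (s : int set) = List.length s   (* a list function applied to a set *)
let n = size [1; 2; 3]                   (* a plain list accepted as a set *)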
P.S.
So you mean type 'a set = 'a list indicates that the set is expecting a list as its parameter?
No, it doesn't. This line just adds a new type alias. It does not shrink the set of types that can be substituted for the type variable 'a. If you write
# type 'a set = 'a list;;
type 'a set = 'a list
# let create x : _ set = [x];;
val create : 'a -> 'a set = <fun>
and then
List.map ((+)1) (create 2);;
the compiler will infer the type of create 2 as int set (since an int value is used for the parameter of type 'a of create, and the return type of that function is 'a set); it will then look at its table of type aliases (synonyms), see that the type set is the same as the type list, and continue the type-inference process.
You should also be aware that you need the right number of type variables when creating a new synonym: type 'a new_t = ('a * 'b) list makes no sense to either me or the compiler, because 'b is unbound. Every type variable used on the right-hand side must appear on the left: type ('a, 'b) new_t = ('a * 'b) list, for example, works.
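To illustrate the corrected form, here is the two-parameter synonym with one possible use (the function name is just for illustration):

type ('a, 'b) new_t = ('a * 'b) list

let find_key (k : 'a) (t : ('a, 'b) new_t) : 'b = List.assoc k t
let v = find_key "x" [("x", 1); ("y", 2)]   (* v = 1 *)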
I'm interested in defining a struct that has a field which is a vector of vectors. Potentially (but not necessarily), the inner vectors would be of type SVector (defined in the StaticArrays package). My naive approach would be to declare the field x::AbstractVector{AbstractVector{T}}; however, Julia doesn't regard, say, Vector{SVector{3, Float64}} as an instance of AbstractVector{AbstractVector}. For that matter, it doesn't regard Vector{Vector{Float64}} as AbstractVector{AbstractVector} either. It seems as though the contained type has to be a concrete type, or left out entirely. Am I going about this in the wrong way?
Use AbstractVector{<:AbstractVector} as this is a construct accepting any vector whose element type is a subtype of AbstractVector.
AbstractVector{AbstractVector}, in contrast, requires the element type to be exactly AbstractVector.
Using structs with abstract fields is not generally recommended, especially for performance reasons. As a struct field, AbstractVector{<:AbstractVector} is an abstract type, even if the element type is concrete.
Using AbstractVector{<:AbstractVector} in a function signature is fine, but probably not in a struct definition. Try this instead:
struct Foo{T<:AbstractVector{<:AbstractVector}}
    x::T
end
This will give you a concretely typed field, where the type of the field x is encoded in the type parameter.
I have code that declares a mutable dictionary, but I get an error when I try to change an element.
The code:
let layers =
    seq {
        if recipes.ContainsKey(PositionSide.Short) then yield! buildLayerSide recipes.[PositionSide.Short]
        if recipes.ContainsKey(PositionSide.Long) then yield! buildLayerSide recipes.[PositionSide.Long]
    }
    |> Seq.map (fun l -> l.Id, l)
    |> dict
This creates an IDictionary. I understand that the object itself is immutable, but the contents of the dictionary should be mutable.
When I change the code to explicitly initialize a Dictionary, it becomes mutable:
let layers =
    let a =
        seq {
            if recipes.ContainsKey(PositionSide.Short) then yield! buildLayerSide recipes.[PositionSide.Short]
            if recipes.ContainsKey(PositionSide.Long) then yield! buildLayerSide recipes.[PositionSide.Long]
        }
        |> Seq.map (fun l -> l.Id, l)
        |> dict
    let x = Dictionary<string, Layer>()
    a
    |> Seq.iter (fun kvp -> x.[kvp.Key] <- kvp.Value)
    x
Why is that?
IDictionary is an interface, not a class. This interface may have multiple different implementations. You can even make one yourself.
Dictionary is indeed one of these implementations. It supports the full functionality of the interface.
But that's not the implementation that the dict function returns. Let's try this:
> let d = dict [(1,2)]
> d.GetType().FullName
"Microsoft.FSharp.Core.ExtraTopLevelOperators+DictImpl`3[...
Turns out the implementation that the dict function returns is Microsoft.FSharp.Core.ExtraTopLevelOperators.DictImpl - a class named DictImpl defined deep in the innards of the F# standard library.
And it just so happens that certain methods of that implementation throw a NotSupportedException:
> d.Add(4,5)
System.NotSupportedException: This value cannot be mutated
That's by design. It's done this way on purpose, to support "immutability by default".
If you really want to have a mutable version, you can create a copy by using one of Dictionary's constructors:
> let m = Dictionary(d)
> m.Add(4,5) // Works now
The difference between Map and Dictionary is the implementation, which then implies memory and runtime characteristics.
Dictionary is a hash table. It offers constant-time insertion and retrieval, but in exchange it relies on consistent hashing of its keys, and its updates are destructive, which also makes it thread-unsafe.
Map is implemented as a tree. It offers logarithmic insertion and retrieval, but in return has the benefits of a persistent data structure. Also, it requires keys to be comparable. Try this:
> type Foo() = class end
> let m = Map [(Foo(), "bar")]
error FS0001: The type 'Foo' does not support the 'comparison' constraint
Comparing keys is essential for building a tree.
The difference is that dict is a read-only dictionary with some mutating methods that throw an exception (which is a disadvantage of that type), whereas the map is an immutable collection that uses functions from the Map module as well as methods to modify elements of the map and return a copy. There is a good explanation of this in Lesson 17 of the book Get Programming with F#.
Also, for the dict, "Retrieving a value by using its key is very fast, close to O(1), because the Dictionary class is implemented as a hash table." from the docs; whereas the map is based on a binary tree, so retrieving a value by using its key has O(log(N)) complexity. See Collection Types. This also means that the keys in a map are ordered; whereas in a dict, they are unordered.
For many use cases, the difference in performance would be negligible, so the default choice for Functional Programming style should be the map, since its programming interface is similar in style to the other F# collections like list and seq.
dict is a helper function that creates an IDictionary. This dictionary is immutable (so you need to supply the contents when creating the object); it is actually a read-only dictionary, so it is no surprise that you cannot modify its contents. In your second example you explicitly create a Dictionary, which is a mutable dictionary. Since Dictionary can take an IDictionary, you could just pass your IDictionary to its constructor.
Rust has the Any trait, but it also has a "do not pay for what you do not use" policy. How does Rust implement reflection?
My guess is that Rust uses lazy tagging. Every type is initially unassigned, but later if an instance of the type is passed to a function expecting an Any trait, the type is assigned a TypeId.
Or maybe Rust puts a TypeId on every type whose instances could possibly be passed to that function? I guess the former would be expensive.
First of all, Rust doesn't have reflection; reflection implies you can get details about a type at runtime, like its fields, methods, the interfaces it implements, and so on. You cannot do this in Rust. The closest you can get is explicitly implementing (or deriving) a trait that provides this information.
Each type gets a TypeId assigned to it at compile time. Because having globally ordered IDs is hard, the ID is an integer derived from a combination of the type's definition, and assorted metadata about the crate in which it's contained. To put it another way: they're not assigned in any sort of order, they're just hashes of the various bits of information that go into defining the type. [1]
If you look at the source for the Any trait, you'll see the single implementation for Any:
impl<T: 'static + ?Sized> Any for T {
    fn get_type_id(&self) -> TypeId { TypeId::of::<T>() }
}
(The bounds can be informally reduced to "all types that aren't borrowed from something else".)
You can also find the definition of TypeId:
pub struct TypeId {
    t: u64,
}

impl TypeId {
    pub const fn of<T: ?Sized + 'static>() -> TypeId {
        TypeId {
            t: unsafe { intrinsics::type_id::<T>() },
        }
    }
}
intrinsics::type_id is an internal function recognised by the compiler that, given a type, returns its internal type ID. This call just gets replaced at compile time with the literal integer type ID; there's no actual call here. [2] That's how TypeId knows what a type's ID is. TypeId, then, is just a wrapper around this u64 to hide the implementation details from users. If you find it conceptually simpler, you can just think of a type's TypeId as being a constant 64-bit integer that the compiler just knows at compile time.
Any forwards to this from get_type_id, meaning that get_type_id is really just binding the trait method to the appropriate TypeId::of method. It's just there to ensure that if you have an Any, you can find out the original type's TypeId.
Now, Any is implemented for most types, but this doesn't mean that all those types actually have an Any implementation floating around in memory. What actually happens is that the compiler only generates the actual code for a type's Any implementation if someone writes code that requires it. [3] In other words, if you never use the Any implementation for a given type, the compiler will never generate it.
This is how Rust fulfills "do not pay for what you do not use": if you never pass a given type as &Any or Box<Any>, then the associated code is never generated and never takes up any space in your compiled binary.
[1]: Frustratingly, this means that a type's TypeId can change value depending on precisely how the library gets compiled, to the point that compiling it as a dependency (as opposed to as a standalone build) causes TypeIds to change.
[2]: Insofar as I am aware. I could be wrong about this, but I'd be really surprised if that's the case.
[3]: This is generally true of generics in Rust.
I am implementing a simple C-like language in OCaml and, as usual, the AST is my intermediate code representation. As I will be doing quite a few traversals on the tree, I wanted to implement
a visitor pattern to ease the pain. My AST currently follows the semantics of the language:
type expr = Plus of string*expr*expr | Int of int | ...
type command = While of boolexpr*block | Assign of ...
type block = Commands of command list
...
The problem is now that nodes in a tree are of different type. Ideally, I would pass to the
visiting procedure a single function handling a node; the procedure would switch on the type of the node and do the work accordingly. Now, I have to pass a function for each node type, which does not seem like the best solution.
It seems to me that I can (1) really go with this approach or (2) have just a single type above. What is the usual way to approach this? Maybe use OO?
Nobody uses the visitor pattern in functional languages -- and that's a good thing. With pattern matching, you can fortunately implement the same logic much more easily and directly just using (mutually) recursive functions.
For example, assume you wanted to write a simple interpreter for your AST:
let rec run_expr = function
| Plus(_, e1, e2) -> run_expr e1 + run_expr e2
| Int(i) -> i
| ...
and run_command = function
| While(e, b) as c -> if run_expr e <> 0 then (run_block b; run_command c)
| Assign ...
and run_block = function
| Commands(cs) -> List.iter run_command cs
The visitor pattern will typically only complicate this, especially when the result types are heterogeneous, like here.
It is indeed possible to define a class with one visiting method per type in the AST (which by default does nothing) and have your visiting functions take an instance of this class as a parameter. In fact, such a mechanism is used in the OCaml world, albeit not that often.
In particular, the CIL library has a visitor class
(see https://github.com/kerneis/cil/blob/develop/src/cil.mli#L1816 for the interface). Note that CIL's visitors are inherently imperative (transformations are done in place). It is however perfectly possible to define visitors that map one AST into another, as in Frama-C, which is based on CIL and offers both in-place and copy visitors. Finally, Cαml, an AST generator meant to make it easy to take care of bound variables, generates map and fold visitors together with the datatypes.
If you have to write many different recursive operations over a set of mutually recursive datatypes (such as an AST), then you can use open recursion (in the form of classes) to encode the traversal and save yourself some boilerplate.
There is an example of such a visitor class in Real World OCaml.
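For a rough idea of what that looks like, here is a sketch over a reduced version of the question's AST (names and cases are illustrative): the base class encodes the traversal once, and each new pass only overrides the cases it cares about.

type expr = Plus of expr * expr | Int of int
type command = Assign of string * expr | Seq of command list

class mapper = object (self)
  method expr e =
    match e with
    | Plus (e1, e2) -> Plus (self#expr e1, self#expr e2)
    | Int i -> Int i
  method command c =
    match c with
    | Assign (x, e) -> Assign (x, self#expr e)
    | Seq cs -> Seq (List.map self#command cs)
end

(* A pass that only touches one case and reuses the traversal for the rest: *)
class negate_ints = object
  inherit mapper as super
  method! expr e =
    match e with
    | Int i -> Int (-i)
    | _ -> super#expr e
end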
The Visitor pattern (and every pattern related to reusable software) has to do with reusability in an inclusion-polymorphism context (subtypes and inheritance).
Composite explains a solution in which you can add a new subtype to an existing one without modifying the latter's code.
Visitor explains a solution in which you can add a new function to an existing type (and to all of its subtypes) without modifying the type code.
These solutions belong to object-oriented programming and require message sending (method invocation) with dynamic binding.
You can do this in OCaml if you use the "O" (the object layer), with some limitations that come along with the advantage of strong typing.
In OCaml, when you have a set of related types, deciding whether to use a class hierarchy and message sending or, as suggested by andreas, a concrete (algebraic) type together with pattern matching and plain function calls is a hard question.
The two approaches are not equivalent. If you choose the latter, you will be unable to add a new kind of node to your AST once your node type has been defined and compiled. Once you have said that an A is either an A1 or an A2, you cannot say later on that there are also some A3s without modifying the source code.
In your case, if you want to implement a visitor, replace your expr concrete type with a class and its subclasses, and your functions with methods (which are also functions, by the way). Dynamic binding will then do the job.
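A minimal sketch of that OO encoding (illustrative names; only an eval operation is shown):

class virtual expr = object
  method virtual eval : int
end

class int_node (i : int) = object
  inherit expr
  method eval = i
end

class plus_node (e1 : expr) (e2 : expr) = object
  inherit expr
  method eval = e1#eval + e2#eval
end

(* A new node class can be added later without touching the classes above. *)
let e = new plus_node (new int_node 1) (new int_node 2)
let v = e#eval   (* 3 *)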
I have a very big array composed of only nils and ts.
My question is: does it make sense to give the compiler a type declaration within a function that handles this specific kind of array? If so, what should the declaration look like?
For example:
(defun foo (my-array)
  (declare (type (array ?????) my-array))
  ....
First, notice that in Common Lisp an array of type (array boolean) (where BOOLEAN is the applicable type) is not an array that just happens to contain only ts and nils, but an array that can only contain those, which is a property that has to be specified when the array is created. Violating this will result in a run-time error or undefined behaviour, depending on your safety level.
I don't think there is much point in specifying the type at the function level, since I don't believe there are any applicable optimizations. You might consider using bit vectors instead, which are at least tightly packed and allow the use of fast bit-processing instructions; that is, if your data is representable in one dimension, since I am not sure how much of that applies to multidimensional (array bit) arrays.