OCaml visitor pattern - functional-programming

I am implementing a simple C-like language in OCaml and, as usual, AST is my intermediate code representation. As I will be doing quite some traversals on the tree, I wanted to implement
a visitor pattern to ease the pain. My AST currently follows the semantics of the language:
type expr = Plus of string*expr*expr | Int of int | ...
type command = While of boolexpr*block | Assign of ...
type block = Commands of command list
...
The problem now is that the nodes in the tree are of different types. Ideally, I would pass the visiting procedure a single function that handles a node; the procedure would switch on the type of the node and do the work accordingly. As it stands, I have to pass a function for each node type, which does not seem like the best solution.
It seems to me that I can (1) really go with this approach or (2) have just a single type above. What is the usual way to approach this? Maybe use OO?

Nobody uses the visitor pattern in functional languages -- and that's a good thing. With pattern matching, you can fortunately implement the same logic much more easily and directly just using (mutually) recursive functions.
For example, assume you wanted to write a simple interpreter for your AST:
let rec run_expr = function
| Plus(_, e1, e2) -> run_expr e1 + run_expr e2
| Int(i) -> i
| ...
and run_command = function
| While(e, b) as c -> if run_expr e <> 0 then (run_block b; run_command c)
| Assign ...
and run_block = function
| Commands(cs) -> List.iter run_command cs
The visitor pattern will typically only complicate this, especially when the result types are heterogeneous, like here.

It is indeed possible to define a class with one visiting method per type of the AST (which by default does nothing) and have your visiting functions taking an instance of this class as a parameter. In fact, such a mechanism is used in the OCaml world, albeit not that often.
In particular, the CIL library has a visitor class
(see https://github.com/kerneis/cil/blob/develop/src/cil.mli#L1816 for the interface). Note that CIL's visitors are inherently imperative (transformations are done in place). It is, however, perfectly possible to define visitors that map an AST into another one, as in Frama-C, which is based on CIL and offers both in-place and copy visitors. Finally, Cαml, an AST generator meant to take care of bound variables easily, generates map and fold visitors together with the datatypes.

If you have to write many different recursive operations over a set of mutually recursive datatypes (such as an AST), then you can use open recursion (in the form of classes) to encode the traversal and save yourself some boilerplate.
There is an example of such a visitor class in Real World OCaml.

The Visitor pattern (like all patterns concerned with reusable software) is about reusability in the context of inclusion polymorphism (subtypes and inheritance).
Composite describes a solution in which you can add a new subtype to an existing type without modifying the existing type's code.
Visitor describes a solution in which you can add a new function to an existing type (and to all of its subtypes) without modifying the type's code.
These solutions belong to object-oriented programming and require message sending (method invocation) with dynamic binding.
You can do this in OCaml if you use the "O" (the object layer), with some limitations that come along with the advantages of strong typing.
In OCaml, given a set of related types, deciding whether to use a class hierarchy with message sending or, as suggested by andreas, a concrete (algebraic) type with pattern matching and simple function calls is a hard question.
The two options are not equivalent. If you choose the latter, you cannot define a new kind of node in your AST once the node type has been defined and compiled. Having said that an A is either an A1 or an A2, you cannot later say that there are also some A3 without modifying the source code.
In your case, if you want to implement a visitor, replace your expr concrete type by a class and its subclasses, and your functions by methods (which are also functions, by the way). Dynamic binding will then do the job.

Related

Treating single and multiple elements the same way ("transparent" map operator)

I'm working on a programming language that is supposed to be easy, intuitive, and succinct (yeah, I know, I'm the first person to ever come up with that goal ;-) ).
One of the features that I am considering for simplifying the use of container types is to make the methods of the container's element type available on the container type itself, basically as a shortcut for invoking a map(...) method. The idea is that working with many elements should not be different from working with a single element: I can apply add(5) to a single number or to a whole list of numbers, and I shouldn't have to write slightly different code for the "one" versus the "many" scenario.
For example (Java pseudo-code):
import static java.math.BigInteger.*; // ZERO, ONE, ...
...
// NOTE: BigInteger has an add(BigInteger) method
Stream<BigInteger> numbers = Stream.of(ZERO, ONE, TWO, TEN);
Stream<BigInteger> one2Three11 = numbers.add(ONE); // = 1, 2, 3, 11
// this would be equivalent to: numbers.map(ONE::add)
As far as I can tell, the concept would not only apply to "container" types (streams, lists, sets...), but more generally to all functor-like types that have a map method (e.g., optionals, state monads, etc.).
The implementation approach would probably be more along the lines of syntactic sugar offered by the compiler rather than manipulating the actual types (Stream<BigInteger> obviously does not extend BigInteger, and even if it did, the "map-add" method would have to return a Stream<BigInteger> instead of a BigInteger, which would be incompatible with most languages' inheritance rules).
I have two questions regarding such a proposed feature:
(1) What are the known caveats with offering such a feature? Method name collisions between the container type and the element type are one problem that comes to mind (e.g., when I call add on a List<BigInteger> do I want to add an element to the list or do I want to add a number to all elements of the list? The argument type should clarify this, but it's something that could get tricky)
(2) Are there any existing languages that offer such a feature, and if so, how is this implemented under the hood? I did some research, and while pretty much every modern language has something like a map operator, I could not find any languages where the one-versus-many distinction would be completely transparent (which leads me to believe that there is some technical difficulty that I'm overlooking here)
NOTE: I am looking at this in a purely functional context that does not support mutable data (not sure if that matters for answering these questions)
Do you come from an object-oriented background? That's my guess because you're thinking of map as a method belonging to each different "type" as opposed to thinking about various things that are of the type functor.
Compare how TypeScript would handle this if map were a property of each individual functor:
declare const someOption: Option<number>
someOption.map(val => val * 2) // Option<number>
declare const someEither: Either<string, number>
someEither.map(val => val * 2) // Either<string, number>
someEither.mapLeft(_err => 'ERROR') // Either<'ERROR', number>
You could also create a constant representing each individual functor instance (option, array, identity, either, async/Promise/Task, etc.), where these constants have map as a method. Then have a standalone map method that takes one of those "functor constant"s, the mapping function, and the starting value, and returns the new wrapped value:
// Each "functor constant" carries the map implementation for one type.
declare const option: {
  map: <A, B>(f: (a: A) => B) => (o: Option<A>) => Option<B>
}
declare const someOption: Option<number>
map(option)(val => val * 2)(someOption) // Option<number>
declare const either: {
  map: <E, A, B>(f: (a: A) => B) => (e: Either<E, A>) => Either<E, B>
}
declare const someEither: Either<string, number>
map(either)(val => val * 2)(someEither) // Either<string, number>
Essentially, you have a functor "map" that uses the first parameter to identify which type you're going to be mapping, and then you pass in the data and the mapping function.
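To make the "functor constant" idea concrete, here is a small runnable sketch. It is only one possible encoding: Option is modelled as a tagged union, and the KindOf registry, FunctorName, and the array instance are names made up for this illustration, not part of any library.

```typescript
// Hypothetical Option type, modelled as a tagged union.
type Option<A> = { tag: "none" } | { tag: "some"; value: A };

// Registry from a functor's name to its concrete type
// (a common workaround for the lack of higher-kinded types).
interface KindOf<A> {
  option: Option<A>;
  array: A[];
}
type FunctorName = keyof KindOf<unknown>;

// A "functor constant": an object that knows how to map one kind of wrapper.
interface Functor<F extends FunctorName> {
  map: <A, B>(f: (a: A) => B) => (fa: KindOf<A>[F]) => KindOf<B>[F];
}

const option: Functor<"option"> = {
  map: <A, B>(f: (a: A) => B) => (fa: Option<A>): Option<B> =>
    fa.tag === "some" ? { tag: "some", value: f(fa.value) } : fa,
};

const array: Functor<"array"> = {
  map: <A, B>(f: (a: A) => B) => (fa: A[]): B[] => fa.map((a) => f(a)),
};

// Standalone map: the first argument selects which functor is being mapped.
const map =
  <F extends FunctorName>(instance: Functor<F>) =>
  <A, B>(f: (a: A) => B) =>
  (fa: KindOf<A>[F]): KindOf<B>[F] =>
    instance.map(f)(fa);

const someOption: Option<number> = { tag: "some", value: 21 };
const doubledOption = map(option)((n: number) => n * 2)(someOption);
const doubledArray = map(array)((n: number) => n * 2)([1, 2, 3]);
```

The mapping function is identical in both calls; only the instance passed first changes, which is exactly the argument a language with type classes would supply on your behalf.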
However, with proper functional languages like Haskell, you don't have to pass in that "functor constant" because the language selects it for you based on the types. I'm not fluent enough in Haskell to write you the examples, unfortunately. But that's a really nice benefit that means even less boilerplate. It also lets you write a lot of your code in what is called "point-free" style, and refactoring becomes much easier if you design your language so that you don't have to manually specify the type being used in order to take advantage of map/chain/bind/etc.
Suppose you initially write code that makes a bunch of API calls over HTTP, so you use a hypothetical async monad. If your language is not smart enough to know which type is being used, you would have code like
import { map as asyncMap }
declare const apiCall: Async<number>
asyncMap(n => n*2)(apiCall) // Async<number>
Now you change your API so it's reading a file and you make it synchronous instead:
import { map as syncMap }
declare const apiCall: Sync<number>
syncMap(n => n*2)(apiCall)
Look how you have to change multiple pieces of the code. Now imagine you have hundreds of files and tens of thousands of lines of code.
With a point-free style, you could do
import { map } from 'functor'
declare const apiCall: Async<number>
map(n => n*2)(apiCall)
and refactor to
import { map } from 'functor'
declare const apiCall: Sync<number>
map(n => n*2)(apiCall)
If you had a centralized location of your API calls, that would be the only place you're changing anything. Everything else is smart enough to recognize which functor you're using and apply map correctly.
As for your concerns about name collisions, that's a concern that will exist no matter your language or design. But in functional programming, add would be a combinator that serves as the mapping function passed into your fmap (the Haskell term) / map (the term in lots of imperative/OO languages). The function you use to add a new element to the tail end of an array/list might be called snoc ("cons" spelled backwards, where cons, from "construct", prepends an element to your array; snoc appends). You could also call it push or append.
As for your one-vs-many issue, these are not the same type. One is a list/array type, and the other is an identity type. The underlying code treating them would be different, as they are different functors (one contains a single element, while the other contains multiple elements).
I suppose you could create a language that disallows single elements by automatically wrapping them as single-element lists and then just uses the list map. But this seems like a lot of work to make two things that are very different look the same.
Instead, the approach where you wrap single elements to be identity and multiple elements to be a list/array, and then array and identity have their own under-the-hood handlers for the functor method map probably would be better.
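A minimal sketch of that last approach, assuming a made-up Container type with two cases (the names Identity, Many, one, many, and mapContainer are all invented here): single values are wrapped as identity, multiple values as a list, and each case has its own handler behind one map function.

```typescript
// Hypothetical wrappers: a single value (identity) or many values (list).
type Identity<A> = { kind: "identity"; value: A };
type Many<A> = { kind: "list"; items: A[] };
type Container<A> = Identity<A> | Many<A>;

const one = <A>(value: A): Container<A> => ({ kind: "identity", value });
const many = <A>(...items: A[]): Container<A> => ({ kind: "list", items });

// One map function; each case supplies its own under-the-hood handler.
function mapContainer<A, B>(f: (a: A) => B, c: Container<A>): Container<B> {
  switch (c.kind) {
    case "identity":
      return { kind: "identity", value: f(c.value) };
    case "list":
      return { kind: "list", items: c.items.map((item) => f(item)) };
  }
}

// The caller writes the same code for the "one" and the "many" scenario.
const addFive = (n: number) => n + 5;
const single = mapContainer(addFive, one(1));
const several = mapContainer(addFive, many(1, 2, 3));
```

The call sites look the same for one element and for many, while the two cases remain distinct types underneath.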

Subtypes of Julia concrete types

To practise Julia, I am implementing a small module containing some fixed-step ODE solvers (Euler, Runge-Kutta, Bulirsch-Stoer) using the iterator interface.
My idea was to use multiple dispatch to apply the correct method of the function next to the particular iterator; however, the Euler and Runge-Kutta iterator types (actually immutables) hold the same data.
So I have to choose between:
creating two immutable types that are identical except for the name, or
creating a single immutable with an additional field (say solving_method) and using branching instead of multiple dispatch to address this issue
Both choices seem clunky to me (in particular the second, because the solving_method field is checked at every iteration).
Reading the online discussions about inheritance in Julia, I understood that Julia does not have (and will never have) subtypes of concrete types, meaning that one cannot "add fields" to a parent type in this way.
But why can't I have subtypes of concrete types just for dispatching purposes?
One idiomatic way to solve this flavor of problem is to create a type that stores parameters or the state of the solver and then have a second immutable to specify the method:
type SolverOptions
# ... step size, error tol, etc.
end
immutable RungeKutta end
immutable Euler end
function solve(problem::ODE, method::RungeKutta, options::SolverOptions)
# ... code here ...
end
function solve(problem::ODE, method::Euler, options::SolverOptions)
# ... code here ...
end
Of course, RungeKutta and Euler need not be empty if you want to store some data in there. This isn't always the best solution (and I can't be sure that it will work in your particular case) but it can help when you are trying to prevent duplication of fieldnames.
Maybe try parametric types?
abstract OdeType
abstract Euler <: OdeType
abstract RK4 <: OdeType
immutable Common{T<:OdeType}
x::Int
end

Fundamentals of Ada's T'Class

Somewhat embarrassed to ask this, but I know it's for the best. I've been programming in Ada for many years now, and understand nearly every part of the language fluently. However, I've never seemed able to wrap my head around T'Class. To borrow from others, can someone "explain it like I'm five"?
Edit: I bought “Software Construction and Data Structures with Ada 95” by Michael B. Feldman just to have it, but it contains a great description of, and example use of, T'Class.
If you start with
package P1 is
   type T is tagged private;
   procedure Method (Self : T);
private
   type T is tagged null record;
end P1;
with P1; use P1;
package P2 is
   procedure Proc (Self : T); -- not a primitive
   procedure Proc2 (Self : T'Class);
end P2;
In the case of Proc, you are telling the compiler that the parameter should always be considered to be precisely of type T. (Remember that a tagged type is always passed by reference, so the actual object could of course be of a type derived from T; you would not lose the extra data.) In particular, that means that within the body of Proc, all calls to Method will be exactly calls to P1.Method, never calls to an overriding Method.
In the case of Proc2, you are telling the compiler that you do not know the exact type statically, so it will need to insert extra code to resolve things at run time. A call to Method within the body of Proc2 could be a call to P1.Method or to an overriding Method.
Basically: with 'Class, things are resolved at runtime.
Well, if you were five, I would say that T'Class represents the whole family of T.
By family, we mean children and grand-children and grand-grand-children.
As you're not five: this special type represents T itself and every tagged type derived from it, directly or indirectly. So if you use it as a parameter type, you can pass any object whose type has T as an ancestor.
For more information, you can read the wikibooks on this.

Invoking a primitive operation via dot operator fails

I have a problem understanding how the UFCS (Universal Function Call Syntax) works in Ada.
Let's say I have a type, like:
package People is
   type Person is tagged private;
   -- This procedure is a primitive operation:
   procedure Say_Name (Person_Object : in Person);
private
   type Person is tagged record
      Name : String;
   end record;
end People;
then I can call the procedure as if it actually belonged to the Person type:
Some_Person_Instance.Say_Name;
Now that works, but in my particular instance it doesn't make sense to have a record, and a subtype would suffice.
subtype Person is String;
At this point (assuming I changed the procedure's workings), it fails to compile and I get the error:
invalid prefix in selected component "Person".
Why? It doesn't even help if I do:
type Person is new String;
Does UFCS only work for records?
I apologize if this is an inane question, but I've no study materials for Ada (apart from a couple of e-books) and the textbook I ordered hasn't arrived yet.
UFCS is a full feature of the D language. For historical reasons, Ada has mixed approaches to calls in different parts of the language.
Ordinary subprogram calls are dealt with in ARM 6.4, and look like Subprogram_Name (Parameters) (or just Subprogram_Name if there are no parameters).
Protected subprogram calls (ARM 9.5.1) and entry calls (ARM 9.5.3) look like Object.Subprogram_Or_Entry_Name (Parameters).
Primitive subprograms of tagged types, however, can be called either way; either as an ordinary call, or, if the tagged parameter is the first parameter, using the prefix notation (ARM 4.1.3(9.1)).
There is discussion of this design in AI95-00252; apparently the designers did consider allowing both call forms for all types, but there were too many complications and too few benefits. A shame, I think we all agree, though perhaps it can be taken too far; the D example
values.multiply(10).divide(3).evens.writeln;
might be a case in point!
With regard to learning Ada and Web resources, have a look at the Ada Resource Association’s resource list.

Do monads have fluent interfaces?

Forgive me if this question seems stupid, but I'm quite new to the whole world of functional programming so I'll need some denizens on StackOverflow to set me straight.
From what I gather, an operation on a monad returns a monad. Does this mean that monads have a fluent interface, whereby each function that is applied on a monad returns that monad after it applies some operation to the variable it wraps?
Presumably you're referring to the bind operator associated with monads, wherein one can start with a monadic value, bind it to a monadic function, and wind up with another monadic value. That's a lot like a "fluent method" (or a set of such making up a "fluent interface") that returns a "this" pointer or reference, yes, but what you'd be missing out on there is that the monadic function need not return a monadic value that's the same type as the input value. The fluent method convention is to return the same type of value so as to continue chaining calls that are all valid on the instance (or instances) being prepared.
The monadic bind operator signature looks more like this:
M[a] -> (a -> M[b]) -> M[b]
That is, the "return value" is possibly of a type different from the first input value's type. It's only the same when the provided function has the type
(a -> M[a])
It all depends on the type of the monadic function—and, more specifically, the return type of the monadic function.
If you were to constrain the domain of the monadic functions you'd accept to those that return the same type as the monadic value supplied to the bind operator, then yes, you'd have something that behaves like a fluent interface.
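The difference can be sketched with a small Maybe type. Everything here (Maybe, just, nothing, bind, parseNumber, inc) is a hypothetical name introduced for illustration. The first chain changes the wrapped type at each step, which a classic fluent interface would not do; the second uses only functions of type a -> M[a] and therefore behaves like a fluent chain.

```typescript
// Hypothetical Maybe monad as a tagged union.
type Maybe<A> = { tag: "nothing" } | { tag: "just"; value: A };

const just = <A>(value: A): Maybe<A> => ({ tag: "just", value });
const nothing = <A>(): Maybe<A> => ({ tag: "nothing" });

// bind : M[a] -> (a -> M[b]) -> M[b]
const bind = <A, B>(ma: Maybe<A>, f: (a: A) => Maybe<B>): Maybe<B> =>
  ma.tag === "just" ? f(ma.value) : ma;

// A monadic function whose result type differs from its input type.
const parseNumber = (s: string): Maybe<number> => {
  const n = Number(s);
  return s.trim() !== "" && !isNaN(n) ? just(n) : nothing<number>();
};

// The wrapped type changes along the chain: string, then number, then boolean.
const step1: Maybe<string> = just("42");
const step2: Maybe<number> = bind(step1, parseNumber);
const step3: Maybe<boolean> = bind(step2, (n) => just(n > 0));

// Restricting ourselves to functions of type a -> M[a] gives a fluent-like chain.
const inc = (n: number): Maybe<number> => just(n + 1);
const fluentLike: Maybe<number> = bind(bind(just(1), inc), inc);
```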
Based on what I know about fluent interfaces, they are mostly about making the code "read nicely" by using method chaining. So for example:
Date date = date()
    .withYear(2008)
    .withMonth(Calendar.JANUARY)
    .withDayOfMonth(15)
    .toDate();
A Haskell do-notation version of it (using an imaginary date api) could look like:
do date
   withYear 2008
   withMonth JANUARY
   withDayOfMonth 15
   toDate
Whether or not this or other do-notation based DSLs like it qualify as a "fluent interface" is probably up for discussion, since there is no formal definition of what a "fluent interface" is. I'd say if it reads like this then it's close enough.
Note that this isn't exactly specific to monads; monads CAN have a fluent interface if you don't require method calling, but that would depend on the function names and the way the API is used.

Resources