Rationale behind Ada encapsulation of dynamically dispatching operations (primitives) - ada

In Ada, primitive operations of a type T can only be defined in the package where T is defined. For example, if a Vehicles package defines Car and Bike tagged records, both inheriting from a common Vehicle abstract tagged type, then all operations that can dispatch on the class-wide Vehicle'Class type must be defined in this Vehicles package.
Let's say that you cannot or do not want to add primitive operations to that package: you do not have permission to edit the source file, or you do not want to clutter the package with unrelated features.
Then you cannot define operations in other packages that implicitly dispatch on type Vehicle'Class.
For example, you may want to serialize vehicles (define a Vehicles_XML package with a To_Xml dispatching function) or display them as UI elements (define a Vehicles_GTK package with Get_Label, Get_Icon, ... dispatching functions), etc.
The only way to perform dynamic dispatch is to write the code explicitly; for example, inside Vehicles_XML:
if V in Car'Class then
   return Car_XML (Car (V));
elsif V in Bike'Class then
   return Bike_XML (Bike (V));
else
   raise Constraint_Error
     with "Vehicle_XML is only defined for Car and Bike.";
end if;
(And a Visitor pattern defined in Vehicles and used elsewhere would work, of course, but that still requires the same kind of explicit dispatching code. Edit: in fact it does not, but there is still some boilerplate code to write.)
My question is then:
is there a reason why operations dynamically dispatching on T are restricted to be defined in the defining package of T?
Is this intentional? Are there historical reasons behind this?
Thanks
EDIT:
Thanks for the current answers: basically, it seems that it is a matter of language implementation (freezing rules/virtual tables).
I agree that compilers are developed incrementally over time and that not all features fit nicely into an existing tool.
As such, isolating dispatching operations in a single package seems to be a decision guided more by existing implementations than by language design. Other languages outside the C++/Java family provide dynamic dispatch without such a requirement (e.g. OCaml and Lisp (CLOS); if that matters, those are also compiled languages, or more precisely, languages for which compilers exist).
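For comparison, here is a rough sketch of what that looks like in CLOS, where a new dispatching operation can be added to an existing class hierarchy from a separate package; the package, class, and function names below (vehicles, vehicles-xml, to-xml, bike) are invented for illustration:

(defpackage :vehicles (:use :cl) (:export #:vehicle #:bike))
(in-package :vehicles)
(defclass vehicle () ())
(defclass bike (vehicle) ())

;; In a different package, a new dynamically dispatching operation:
(defpackage :vehicles-xml (:use :cl :vehicles))
(in-package :vehicles-xml)
(defgeneric to-xml (v))                    ; a new generic function
(defmethod to-xml ((v bike)) "<bike/>")    ; dispatches on the class of V at runtime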
When I asked this question, I wanted to know whether there were more fundamental reasons, at the language-specification level, behind this part of the Ada specification (otherwise, does it really mean that the specification assumes/enforces a particular implementation of dynamic dispatch?).
Ideally, I am looking for an authoritative source, like a rationale or guideline section in Reference Manuals, or any kind of archived discussion about this specific part of the language.

I can think of several reasons:
(1) Your example has Car and Bike defined in the same package, both derived from Vehicles. However, that's not the "normal" use case, in my experience; it's more common to define each derived type in its own package. (Which I think is close to how "classes" are used in other compiled languages.) And note also that it's not uncommon to define new derived types afterwards. That's one of the whole points of object-oriented programming, to facilitate reuse; and it's a good thing if, when designing a new feature, you can find some existing type that you can derive from, and reuse its features.
So suppose you have your Vehicles package that defines Vehicle, Car, and Bike. Now in some other package V2, you want to define a new dispatching operation on a Vehicle. For this to work, you have to provide the overriding operations for Car and Bike, with their bodies; and assuming you are not allowed to modify Vehicles, the language designers have to decide where the bodies of the new operation have to be. Presumably, you'd have to write them in V2. (One consequence is that the body you write in V2 would not have access to the private part of Vehicles, and therefore it couldn't access implementation details of Car or Bike; so you could only write the body of that operation in terms of already-defined operations.)
So then the question is: does V2 need to provide operations for all types that are derived from Vehicle? What about types derived from Vehicle that don't become part of the final program (maybe they're derived to be used in someone else's project)? What about types derived from Vehicle that haven't yet been defined (see the preceding paragraph)? In theory, I suppose this could be made to work by checking everything at link time. However, that would be a major paradigm change for the language; it's not something that could be done easily. (It's pretty common, by the way, for programmers to think "it would be nice to add feature X to a language, and it shouldn't be too hard because X is simple to talk about", without realizing just what a vast impact such a "simple" feature would have.)
(2) A practical reason has to do with how dispatching is implemented. Typically, it's done with a vector of procedure/function pointers. (I don't know for sure what the exact implementation is in all cases, but I think this is basically the case for every Ada compiler as well as for C++ and Java compilers, and probably C#.) What this means is that when you define a tagged type (or a class, in other languages), the compiler will set up a vector of pointers, and based on how many operations are defined for the type, say N, it will reserve slots 1..N in the vector for the addresses of the subprograms. If a type is derived from that type and defines overriding subprograms, the derived type gets its own vector, where slots 1..N will be pointers to the actual overriding subprograms. Then, when calling a dispatching subprogram, a program can look up the address in some known slot index assigned to that subprogram, and it will jump to the correct address depending on the object's actual type. If a derived type defines new primitive subprograms, new slots are assigned N+1..N2, and types derived from that could define new subprograms that get slots N2+1..N3, and so on.
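As a very rough illustration of that slot mechanism (and only an illustration: this is not how any particular Ada, C++, or Java compiler lays out its tables, and all names here are invented), here is a sketch of the idea:

;; One dispatch vector per type; slot indices are fixed per operation.
(defparameter *vehicle-vtable*
  (vector (lambda (v) (format nil "<vehicle ~a/>" v))))       ; slot 0: To_XML

(defparameter *car-vtable*
  ;; A derived type copies the layout, overrides slot 0, and may append
  ;; new primitives starting at slot 1.
  (vector (lambda (v) (format nil "<car ~a/>" v))             ; slot 0: overriding To_XML
          (lambda (v) (declare (ignore v)) "honk")))          ; slot 1: new primitive

(defun dispatch (vtable slot &rest args)
  "Jump through the given slot of the object's dispatch vector."
  (apply (aref vtable slot) args))

;; (dispatch *car-vtable* 0 42) => "<car 42/>"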
Adding new dispatching subprograms to Vehicle would interfere with this. Since new types have been derived from Vehicle, you can't insert a new area into the vector after N, because code has already been generated that assumes the slots starting at N+1 have been assigned to new operations defined for derived types. And since we may not know all the types that have been derived from Vehicle, and we don't know what other types will be derived from Vehicle in the future or how many new operations will be defined for them, it's hard to pick some other location in the vector that could be used for the new operations. Again, this could be done if all of the slot assignment were deferred until link time, but again, that would be a major paradigm change.
To be honest, I can think of other ways to make this work, by adding new operations not in the "main" dispatch vector but in an auxiliary one; dispatching would probably require a search for the correct vector (perhaps using an ID assigned to the package that defines the new operations). Also, adding interface types to Ada 2005 has already complicated the simple vector implementation somewhat. But I do think this (i.e. it doesn't fit into the model) is one reason why the ability to add new dispatching operations like you suggest isn't present in Ada (or in any other compiled language that I know of).

Without having checked the rationale for Ada 95 (where tagged types were introduced), I am pretty sure the freezing rules for tagged types are derived from the simple requirement that all objects in T'Class should have all the dispatching operations of type T.
To fulfill that requirement, you have to freeze the type and say that no more dispatching operations can be added to type T once you:
Derive a type from T, or
Are at the end of the package specification where T was declared.
If you didn't do that, you could have a type derived from type T (i.e. in T'Class) which hadn't inherited all the dispatching operations of type T. If you passed an object of that type as a T'Class parameter to a subprogram which knew of one more dispatching operation on type T, a call to that operation would have to fail - and we wouldn't want that to happen.

Answering your extended question:
Ada comes with a Reference Manual (the ISO standard), a Rationale, and an Annotated Reference Manual. A large part of the discussion behind these documents is public as well.
For Ada 2012 see http://www.adaic.org/ada-resources/standards/ada12/
Tagged types (dynamic dispatching) were introduced in Ada 95. The documents related to that version of the standard can be found at http://www.adaic.org/ada-resources/standards/ada-95-documents/

Related

What makes Julia Composable?

I have seen many places the mention that Julia is "Composable". I know that the word itself means:
Composability is a system design principle that deals with the inter-relationships of components. A highly composable system provides components that can be selected and assembled in various combinations to satisfy specific user requirements.
But I am curious what the specific components of Julia are that make it composable. Is it the ability to override base functions with my own implementation?
I guess I'll hazard an answer, though my understanding may be no more complete than yours!
As far as I understand it (in no small part from Stefan's "Unreasonable Effectiveness of Multiple Dispatch" JuliaCon talk as linked by Oscar in the comments), I would say that it is in part:
1. As you say, the ability to override base functions with your own implementation [and, critically, then have it "just work" (be dispatched to) whenever appropriate thanks to multiple dispatch], since this means that if you make a custom type and define all the fundamental / primitive operations on that type (as in https://docs.julialang.org/en/v1/manual/interfaces/ -- say +-*/ et al. for numeric types, or getindex, setindex! et al. for an array-like type, etc.), then any more complex program built on those fundamentals will also "just work" with your new custom type. And that in turn means your custom type will also work (AKA compose) with other people's packages without any need for (e.g.) explicit compatibility shims, as long as people haven't over-constrained their function argument types (which is, incidentally, why over-constraining function argument types is a Julia antipattern).
2. Following on 1), the fact that so many Base methods are also just plain Julia, so they will also work with your new custom type as long as the proper fundamental operations are defined.
3. The fact that Julia's base types and methods are generally performant and convenient enough that in many cases there's no need to do anything custom, so you can just put together blocks that all operate on, e.g., plain Julia arrays or tuples.
This last point is perhaps most notable in contrast to a language like Python where, for example, every sufficiently large subset of the ecosystem (numpy, tensorflow, etc.) has its own reimplementation of (e.g.) arrays, which for better performance are all ultimately implemented in some other language entirely (C++ for numpy and TF) and thus probably do not compose with each other.

Changing method dispatch in Common Lisp

I'm trying to simulate something akin to Haskell's typeclasses with Common Lisp's CLOS. That is, I'd like to be able to dispatch a method on an object's "typeclasses" instead of its superclasses.
I have a metaclass defined for classes which have and implement typeclasses (which are just other classes). Those classes (the ones that implement typeclasses) have a slot containing the list of the typeclasses they implement.
I'd like to be able to define methods for a typeclass, and then be able to dispatch that method on objects whose class implement that typeclass. And I'd like to be able to add and remove typeclasses dynamically.
I figure I could probably do this by changing the method dispatch algorithm, though that doesn't seem too simple.
Is anybody comfortable enough with CLOS and the MOP to give me some suggestions?
Thanks.
Edit: My question might be specified as: how do I implement compute-applicable-methods-using-classes and compute-applicable-methods for a "custom" generic-function class such that, if some of the specializers of a generic function's methods are typeclasses (classes whose metaclass is the 'typeclass' class), the corresponding argument's class must implement the typeclass (which simply means having the typeclass stored in a slot of the argument's class) for the method to be applicable?
From what I understand from the documentation, when a generic function is called, compute-discriminating-function is first called, which will first attempt to obtain applicable methods through compute-applicable-methods-using-classes, and if unsuccessful, will try the same with compute-applicable-methods.
While my definition of compute-applicable-methods-using-classes seems to work, the generic function fails to dispatch an applicable function. So the problem must be in compute-discriminating-function or compute-effective-method.
See code.
This is not easily achievable in Common Lisp.
In Common Lisp, operations (generic functions) are separate from types (classes), i.e. they're not "owned" by types. Their dispatch is done at runtime, with the possibility of adding, modifying and removing methods at runtime as well.
Usually, errors from missing methods are signaled only at runtime. The compiler has no way to know if a generic function is being "well" used or not.
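For instance, a minimal sketch of that separation (the class and function names, circle and area, are invented):

(defclass circle () ((radius :initarg :radius :reader radius)))

;; AREA is not "owned" by CIRCLE; it is a generic function defined on its own.
(defgeneric area (shape))

;; Methods can be added -- and removed again -- while the program is running:
(defmethod area ((c circle))
  (* pi (expt (radius c) 2)))

(remove-method #'area
               (find-method #'area '() (list (find-class 'circle))))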
The idiomatic way in Common Lisp is to use generic functions and describe their requirements; in other words, the closest thing to an interface in Common Lisp is a set of generic functions and a marker mixin class. More usually, though, only a protocol is specified, along with its dependencies on other protocols. See, for instance, the CLIM specification.
As for type classes, they are a key Haskell feature that keeps the language not only fully type-safe, but also very extensible in that respect. Otherwise, either the type system would be too strict, or the lack of expressiveness would lead to type-unsafe situations, at least from the compiler's point of view. Note that Haskell doesn't keep, or doesn't have to keep, object types at runtime; it performs all type inference at compile time, much in contrast with idiomatic Common Lisp.
To have something similar to type classes in Common Lisp at runtime, you have a few choices:
Should you choose to support type classes with their rules, I suggest you use the meta-object protocol (a rough sketch follows this list):
Define a new generic function meta-class (i.e. one which inherits from standard-generic-function)
Specialize compute-applicable-methods-using-classes to return false as a second value, because classes in Common Lisp are represented solely by their name, they're not "parameterizable" or "constrainable"
Specialize compute-applicable-methods to inspect the argument's meta-classes for types or rules, dispatch accordingly and possibly memoize results
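The following is a minimal, untested sketch of that recipe, using the closer-mop portability layer; the class name typeclass-gf and the stub method bodies are invented for illustration:

(defclass typeclass-gf (standard-generic-function) ()
  (:metaclass closer-mop:funcallable-standard-class))

;; Generic functions opt in with (:generic-function-class typeclass-gf).

(defmethod closer-mop:compute-applicable-methods-using-classes
    ((gf typeclass-gf) classes)
  (declare (ignore classes))
  ;; Second value NIL: "cannot decide from the classes alone, do not cache";
  ;; CLOS then falls back to compute-applicable-methods below.
  (values nil nil))

(defmethod compute-applicable-methods ((gf typeclass-gf) arguments)
  ;; Here you would inspect each argument's class (and the typeclasses it
  ;; declares in its slot), keep the methods whose typeclass specializers
  ;; are satisfied, sort them by specificity, and possibly memoize the
  ;; result.  Stubbed out with the default behavior:
  (call-next-method))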
Should you choose to only have parameterizable types (e.g. templates, generics), an existing option is the Lisp Interface Library, where you pass around an object that implements a particular strategy using a protocol. However, I see this mostly as an implementation of the strategy pattern, or an explicit inversion of control, rather than actual parameterizable types.
For actual parameterizable types, you could define abstract unparameterized classes from which you'd intern concrete instances with funny names, e.g. lib1:collection<lib2:object>, where collection is the abstract class defined in the lib1 package, and the lib2:object is actually part of the name as is for a concrete class.
The benefit of this last approach is that you could use these classes and names anywhere in CLOS.
The main disadvantage is that you must still generate concrete classes, so you'd probably have your own defmethod-like macro that would expand into code that uses a find-class-like function which knows how to do this. Thus breaking a significant part of the benefit I just mentioned, or otherwise you should follow the discipline of defining every concrete class in your own library before using them as specializers.
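A bare-bones sketch of such a find-class-like helper (all names here, collection and parametric-class, are invented; no covariance handling; uses the closer-mop portability layer):

(defclass collection () ())          ; the abstract, unparameterized class

(defvar *parametric-classes* (make-hash-table :test 'equal))

(defun parametric-class (abstract parameter)
  "Intern (creating on first use) a concrete class named ABSTRACT<PARAMETER>."
  (let ((name (intern (format nil "~A<~A>" abstract parameter))))
    (or (gethash name *parametric-classes*)
        (setf (gethash name *parametric-classes*)
              (closer-mop:ensure-class
               name :direct-superclasses (list (find-class abstract)))))))

;; (parametric-class 'collection 'integer) => #<STANDARD-CLASS COLLECTION<INTEGER>>
;; whose name can then be used as a specializer by a defmethod-like macro.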
Another disadvantage is that without further non-trivial plumbing, this is too static, not really generic, as it doesn't take into account that e.g. lib1:collection<lib2:subobject> could be a subclass of lib1:collection<lib2:object> or vice-versa. Generically, it doesn't take into account what is known in computer science as covariance and contravariance.
But you could implement it: lib:collection<in out> could represent the abstract class with one contravariant argument and one covariant argument. The hard part would be generating and maintaining the relationships between concrete classes, if at all possible.
In general, a compile-time approach would be more appropriate at the Lisp implementation level. Such Lisp would most probably not be a Common Lisp. One thing you could do is to have a Lisp-like syntax for Haskell. The full meta-circle of it would be to make it totally type-safe at the macro-expansion level, e.g. generating compile-time type errors for macros themselves instead of only for the code they generate.
EDIT: After your question's edit, I must say that compute-applicable-methods-using-classes must return nil as a second value whenever there is a type class specializer in a method. You can call-next-method otherwise.
This is different than there being a type class specializer in an applicable method. Remember that CLOS doesn't know anything about type classes, so by returning something from c-a-m-u-c with a true second value, you're saying it's OK to memoize (cache) given the class alone.
You must really specialize compute-applicable-methods for proper type class dispatching. If there is opportunity for memoization (caching), you must do so yourself here.
I believe you'll need to override compute-applicable-methods and/or compute-applicable-methods-using-classes which compute the list of methods that will be needed to implement a generic function call. You'll then likely need to override compute-effective-method which combines that list and a few other things into a function which can be called at runtime to perform the method call.
I really recommend reading The Art of the Metaobject Protocol (as was already mentioned) which goes into great detail about this. To summarize, however, assume you have a method foo defined on some classes (the classes need not be related in any way). Evaluating the lisp code (foo obj) calls the function returned by compute-effective-method which examines the arguments in order to determine which methods to call, and then calls them. The purpose of compute-effective-method is to eliminate as much of the run-time cost of this as is possible, by compiling the type tests into a case statement or other conditional. The Lisp runtime thus does not have to query for the list of all methods each time you make a method call, but only when you add, remove or change a method implementation. Usually all of that is done once at load time and then saved into your lisp image for even better performance while still allowing you to change these things without stopping the system.

How does Julia implement multimethods?

I've been reading a bit from http://c2.com/cgi/wiki?ImplementingMultipleDispatch
I've been having some trouble finding reference on how Julia implements multimethods. What's the runtime complexity of dispatch, and how does it achieve it?
Dr. Bezanson's thesis is certainly the best source for descriptions of Julia's internals right now:
4.3 Dispatch system
Julia’s dispatch system strongly resembles the multimethod systems found in some object-oriented languages [17, 40, 110, 31, 32, 33]. However we prefer the term type-based dispatch, since our system actually works by dispatching a single tuple type of arguments. The difference is subtle and in many cases not noticeable, but has important conceptual implications. It means that methods are not necessarily restricted to specifying a type for each argument “slot”. For example a method signature could be Union{Tuple{Any,Int}, Tuple{Int,Any}}, which matches calls where either, but not necessarily both, of two arguments is an Int.
The section continues to describe the type and method caches, sorting by specificity, parametric dispatch, and ambiguities. Note that tuple types are covariant (unlike all other Julian types) to match the covariant behavior of method dispatch.
The biggest key here is that the method definitions are sorted by specificity, so it's just a linear search to check if the type of the argument tuple is a subtype of the signature. So that's just O(n), right? The trouble is that checking subtypes with full generality (including Unions and TypeVars, etc.) is hard. Very hard, in fact: worse than NP-complete, it's estimated to be Π₂ᴾ-hard (the second level of the polynomial hierarchy), and it might even be PSPACE-hard or worse.
Of course, the best source for how it actually works is the implementation in JuliaLang/julia/src/gf.c (gf = generic function). There's a rather useful comment there:
Method caches are divided into three parts: one for signatures where
the first argument is a singleton kind (Type{Foo}), one indexed by the
UID of the first argument's type in normal cases, and a fallback
table of everything else.
So the answer to your question about the complexity of method lookup is: "it depends." The first time a method is called with a new set of argument types, it must go through that linear search, looking for a subtype match. If it finds one, it specializes that method for the particular arguments and places it into one of those caches. This means that before embarking upon the hard subtype search, Julia can perform a quick equality check against the already-seen methods… and the number of methods it needs to check is further reduced since the caches are stored as hashtables based upon the first argument.
But, really, your question was about the runtime complexity of dispatch. In that case, the answer is quite often "what dispatch?" — because it has been entirely eliminated! Julia uses LLVM as a just-barely-ahead-of-time compiler, wherein methods are compiled on-demand as they're needed. In performant Julia code, the types should be concretely inferred at compile time, and so dispatch can also be performed at compile time. This completely eliminates the runtime dispatch overhead, and potentially even inlines the found method directly into the caller's body (if it's small) to remove all function call overhead and allow for further optimizations downstream. If the types aren't concretely inferred, there are other performance pitfalls, and I've not profiled to see how much time is typically spent in dispatch. There are ways to optimize this case further, but there's likely bigger fish to fry first… and for now it's typically easiest to just make the hot loops type-stable in the first place.

slot assignment: `@` vs `slot()` vs `setReplaceMethod()`

I am writing my first R package and I am trying to figure out what the best way to assign values to a slot in an S4 object, keeping in mind that end-users shouldn't have to fuss with the details of the S4 class structures being used. Which of the following is best?
Accessing the slot directly using object@MySlot <- value:
I understand that this is bad practice (e.g., this Q&A).
Using slot(object, "MySlot") <- value:
The R help says there isn't checking when getting the values, but there is checking when setting (assuming check hasn't been set to FALSE). This sounds reasonable to me, and strikes me as a nice way to do it because I don't have to code my own get/set methods as per below.
Using custom methods with setReplaceMethod():
How does this approach compare against the second option above? It's more work to produce the necessary get/set methods, but I can more explicitly be sure that the values being written to the slots are valid for that slot type.
setGeneric("MySlot", function(object) {
standardGeneric("MySlot")
})
setMethod("MySlot",
signature = "MyClass",
definition = function(object) {
return(object#MySlot)
})
setGeneric("MySlot<-",
function(object, value) {
standardGeneric("MySlot<-")
})
setReplaceMethod("MySlot",
signature="MyClass",
function(object, value) {
object#MySlot<- value
validObject(object) # could add other checks
return(object)
})
By definition, "not having to fuss with the details of the S4 class structure" means end-users should not have to know about your slots. As such, any wrappers you write as in steps 2 and 3 are more for internal consistency checks. I think more important is to check for edge cases where you think your integrity checks would fail using unit tests.
As you have pointed out, #1 can be ruled out fairly easily, and should only be used in internal methods. Whether you encourage #2 or implement #3 depends on the contents of the variable and personal taste, but I would encourage the latter. For example, if you have a logical flag, you can use #2, but more descriptive would be enableFoo(). That being said, if you have to consider developer time (which in real life is almost always true) you should think briefly about the trade-off between relegating mutators to slot<- for members that probably won't be accessed frequently (e.g., by less than 1% of users), versus implementing custom mutators for everything as in #3.
Finally, since all three of R's OOP systems are essentially syntactic sugar and aren't really respected semantically by the language (only S4 can claim a partial exception), it is easy to forget the fundamental ideas behind object-oriented programming as implemented in other languages: any code executing outside of your methods should not know about the object's members. You are providing an external interface to the world that is, and should be, a black box.

Is there a programming language that meets these requirements?

I have a few requirements before implementing my next program - hopefully a programming language exists that can do the following:
Given a class (or interface) C, the programming language allows the user access to a list of all classes which extend/implement C.
The programming language allows the user to iterate through all the variables and methods of a class.
The user is able to determine the number and types of arguments a function will take; e.g. foo(int a, String b, int c) can be queried to return 3 or [int, String, int].
Are these absurd requirements or does some language implement them as basic techniques of reflection?
I know that you prefer a statically-typed language, but if you consider a dynamically-typed one, Smalltalk may be a good fit, since everything is an object (classes and methods are no exception to this rule) and thus everything can be manipulated (not only queried, but also changed). Going to your requirements:
Given a class (or interface) C, the programming language allows the
user access to a list of all classes which extend/implement C.
In Smalltalk there is no built-in notion of interface (though I think that I've seen extensions that added support for it). However, you can:
Given a class, find its direct subclasses: Number subclasses answers {Fraction. Float. Integer}.
Or all the hierarchy under it: Number allSubclasses answers an OrderedCollection(Fraction Float Integer ScaledDecimal SmallInteger LargePositiveInteger LargeNegativeInteger)
You can also find all classes that implement a given selector (pop in this case):
SystemNavigation default allClassesImplementing: #pop answers {ContextPart. FileSystemGuide. LIFOQueue. Stack}
As you can see, defining an "Interface" object to query for classes that implement a set of methods is quite easy (just have a collection of method names and query for classes implementing each of them, adding the classes to a set). However if you want to explicitly state in the class that it implements an interface, then you'll need to do more work.
The programming language allows the user to iterate through all the
variables and methods of a class.
Point instVarNames answers #('x' 'y')
Point allMethods answers a collection of CompiledMethods (the object that represents a method)
Point allSelectors answers a collection of all the method names that an instance of that class can answer to.
The user is able to determine the number and types of arguments a
function will take.
In this case you interact with compiled methods and ask them for the number of arguments they need (there is no notion of parameter type):
(Point methodNamed: #x) numArgs answers 0, since it is just a getter.
(Point methodNamed: #+) numArgs answers 1
This is just a small preview of the reflective capabilities of Smalltalk; if you want to go deeper you can check out some of these links:
Reflective Facilities in Smalltalk-80
Evaluating message passing control techniques in Smalltalk
Debugging Objects
HTH
I would expect most Lisp systems (Scheme, Common Lisp, ...) to meet these requirements.
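For instance, a quick sketch in Common Lisp, using the closer-mop portability layer and (for the lambda list) SBCL's sb-introspect extension; the point classes and foo are invented examples:

(defclass point () ((x :initarg :x) (y :initarg :y)))
(defclass point-3d (point) ((z :initarg :z)))
(make-instance 'point-3d :x 0 :y 0 :z 0)   ; forces finalization of the class

;; 1. Classes that directly extend POINT:
(closer-mop:class-direct-subclasses (find-class 'point))
;; => (#<STANDARD-CLASS POINT-3D>)

;; 2. Iterate over all slots ("variables") of a class:
(mapcar #'closer-mop:slot-definition-name
        (closer-mop:class-slots (find-class 'point-3d)))
;; => (X Y Z)

;; 3. Number (and names) of a function's arguments (implementation-specific):
#+sbcl (require :sb-introspect)
(defun foo (a b c) (list a b c))
#+sbcl (sb-introspect:function-lambda-list #'foo)            ; => (A B C)
#+sbcl (length (sb-introspect:function-lambda-list #'foo))   ; => 3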
Java can do this. Yet, note that no language will do:
Given a class (or interface) C, the programming language allows the
user access to a list of all classes which extend/implement C
The reason is that the number of classes that extend C is either zero (in case of final classes) or infinite. In the latter case, which is the norm, only a tiny portion of all classes that extend C has actually been written down and compiled, and only those you can access.
