Hidden remote function call of a local binding in ocaml - functional-programming

Given the following example for generating a lazy list number sequence:
type 'a lazy_list = Node of 'a * (unit -> 'a lazy_list);;
let make =
  let rec gen i =
    Node (i, fun () -> gen (i + 1))
  in gen 0
;;
I asked myself the following questions when trying to understand how the example works (obviously I could not answer myself and therefore I am asking here)
When calling let Node(_, f) = make and then f(), why does the call of gen 1 inside f() succeed although gen is a local binding only existing in make?
Shouldn't the created Node be completely unaware of the existence of gen? (Obviously not since it works.)
How is a construction like this being handled by the compiler?

First of all, the questions that you are asking have nothing to do with laziness, so we can disregard that aspect to simplify the discussion.
As Jeffrey noted in the comment to your question, the answer is simple - it is a closure.
But let me extend it a little bit. Functional programming languages, as well as many other modern languages, including Python and C++, allow you to define a function in the scope of another function and to refer to the variables available in the scope of the enclosing function. These variables are called captured variables, and the created function object together with the captured values is called a closure.
From the compiler's perspective, the implementation is rather simple (to understand). The closure is a normal value that contains the code to be executed, as well as pointers to the extra values that were captured from the outer scope. Since OCaml is a garbage-collected language, the captured values are preserved for as long as they are referenced from a live object. In C++ the story is much more complicated, as C++ doesn't have a GC, but that is a completely different story.
Shouldn't the created Node be completely unaware of the existence of gen? (Obviously not since it works.)
The created Node is an object that has two pointers: a pointer to the initial object i, and a pointer to the anonymous function fun () -> gen (i + 1). The anonymous function has a pointer to the same initial object i. In our particular case, i is an integer, so instead of being a pointer the value of i is represented inline, but these are details that are irrelevant to the question.
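Since the answer mentions that Python has closures too, the same structure can be sketched there and inspected at runtime. This is only an illustration under a few assumptions: make is written as a function rather than a value, the Node is modelled as a plain tuple, and the names captured_names and captured are introduced just for this example. It shows that the function stored in the node carries references to gen and i with it, which is why calling it works outside make.

```python
def make():
    # gen is local to make, like the local binding in the OCaml example.
    def gen(i):
        # The lambda refers to both i and gen from the enclosing scopes,
        # so both are captured when the lambda is created.
        return (i, lambda: gen(i + 1))
    return gen(0)

value, f = make()      # corresponds to: let Node (_, f) = make
next_value, g = f()    # calls gen 1 through the closure

# Names of the variables the lambda captured from its enclosing scopes.
captured_names = f.__code__.co_freevars
# The captured objects live in "cells" attached to the function object.
captured = dict(zip(captured_names, (c.cell_contents for c in f.__closure__)))
```

Here captured maps "i" to the integer stored in the first node and "gen" to the local function itself, so the function object stays reachable (and is not collected) for as long as f is alive.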

Related

In (Free) Pascal, can a function return a value that can be modified without dereference?

In Pascal, I understand that one could create a function returning a pointer which can be dereferenced and then assign a value to that, such as in the following (obnoxiously useless) example:
type ptr = ^integer;
var d: integer;
function f(x: integer): ptr;
begin
  f := @x;
end;
begin
  f(d)^ := 4;
end.
And now d is 4.
(The actual usage is to access part of a quite complicated array of records data structure. I know that a class would be better than an array of nested records, but it isn't my code (it's TeX: The Program) and was written before Pascal implementations supported object-orientation. The code was written using essentially a language built on top of Pascal that added macros which expand before the compiler sees them. Thus you could define some macro m that takes an argument x and expands into thearray[x + 1].f1.f2 instead of writing that every time; the usage would be m(x) := somevalue. I want to replicate this functionality with a function instead of a macro.)
However, is it possible to achieve this functionality without the ^ operator? Can a function f be written such that f(x) := y (no caret) assigns the value y to x? I know that this is stupid and the answer is probably no, but I just (a) don't really like the look of it and (b) am trying to mimic exactly the form of the macro I mentioned above.
References are not first class objects in Pascal, unlike languages such as C++ or D. So the simple answer is that you cannot directly achieve what you want.
Using a pointer as you illustrated is one way to achieve the same effect although in real code you'd need to return the address of an object whose lifetime extends beyond that of the function. In your code that is not the case because the argument x is only valid until the function returns.
You could use an enhanced record with operator overloading to encapsulate the pointer, and so encapsulate the pointer dereferencing code. That may be a good option, but it very much depends on your overall problem, of which we do not have sight.

Why don't Julia closures copy arrays?

Just discovered a nasty bug in my program based on the fact that Julia does not copy arrays when defining a closure. This makes continuation programming hard. What was the motivation for this design choice?
Any suggestions for decoupling the state of my closure from the program state?
As an example
l = [2 1; 0 0];
f = x -> l[2,2];
Then f(1) = 0 but if you change l[2,2] = 1, then f(1) = 1.
Your assumption that this is a "closure" does not hold. l is not a "closed" variable in the context of the anonymous function at that point. It is simply a reference to a variable inherited from 'external' scope (since it has not been redefined locally inside the anonymous function).
Here's an example of a true closure:
f = let l=[2 1;0 0]
    x -> l[2,2];
end
The variable l now is local to the let block, and not present at global scope. f still has access to it, even though it has technically gone out of scope. This is what a closure means.
As a result of l having gone out of scope, it is no longer accessible except through f which is a closure having access to it as a closed variable.
PS. I'm going to go out on a limb here and assume that what you were expecting was MATLAB-like behaviour. The big difference with MATLAB is that when you define an anonymous function handle there, it captures the current state of the workspace by copying all the variables and making them part of the function 'object'. You can confirm this by using the functions command. MATLAB doesn't have references in the same way as Julia. This is a strength of Julia, not a weakness, as it allows the user to make use of optimizations that avoid reallocation of memory, which are harder to achieve in MATLAB*.
* though in fairness, MATLAB shines in other ways, by attempting to optimise this for you
EDIT: Liso pointed out a very important pitfall in the comments. Assume l already exists in the global workspace, and we type
let l=l
while this is perfectly valid syntax, making l a local variable to the let block, this is still initialised simply as a reference to the global l. Therefore any changes to the global l will still affect the closure, which is not what you want. In this case, you should be trying to 'mimic' matlab behaviour by making a copy (or a deep copy, depending on your use case), such that the local variable is truly independent of anything else once it goes out of scope and becomes 'closed' i.e.
let l = deepcopy(l)
Also, for completeness, when one makes a closure in julia, it is worth pointing out how this is implemented under the hood: your resulting f function is simply a callable object, containing a field for each 'closed' variable it needs to be aware of; you could even access this as f.l.
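The difference between sharing and copying is not specific to Julia. As a rough sketch in Python, which also lets functions refer to outer variables rather than copies of them, the following contrasts a function that shares the outer list with one that takes a private deep copy first. The helper make_isolated and the names snapshot, before and after are hypothetical and exist only for this illustration; indices are 0-based here, so l[1][1] plays the role of l[2,2].

```python
import copy

l = [[2, 1], [0, 0]]

# Refers to the variable l itself: later mutations of the list are visible.
f = lambda x: l[1][1]

def make_isolated(data):
    # Hypothetical helper: take a private deep copy so that the returned
    # function no longer shares state with the caller's list.
    snapshot = copy.deepcopy(data)
    return lambda x: snapshot[1][1]

g = make_isolated(l)

before = (f(1), g(1))   # both read 0
l[1][1] = 1             # mutate the original list
after = (f(1), g(1))    # f sees the change, g does not
```

Copying at the moment the function is created is what decouples its state from the rest of the program, at the cost of the extra allocation.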

Reflecting on a Type parameter

I am trying to create a function
import Language.Reflection
foo : Type -> TT
I tried it by using the reflect tactic:
foo = proof
  {
    intro t
    reflect t
  }
but this reflects on the variable t itself:
*SOQuestion> foo
\t => P Bound (UN "t") (TType (UVar 41)) : Type -> TT
Reflection in Idris is a purely syntactic, compile-time only feature. To predict how it will work, you need to know about how Idris converts your program to its core language. Importantly, you won't be able to get ahold of reflected terms at runtime and reconstruct them like you would with Lisp. Here's how your program is compiled:
Internally, Idris creates a hole that will expect something of type Type -> TT.
It runs the proof script for foo in this state. We start with no assumptions and a goal of type Type -> TT. That is, there's a term being constructed which looks like ?rhs : Type => TT . rhs. The ?foo : ty => body syntax shows that there's a hole called foo whose eventual value will be available inside of body.
The step intro t creates a function whose argument is t : Type - this means that we now have a term like ?foo_body : TT . \t : Type => foo_body.
The reflect t step then fills the current hole by taking the term on its right-hand side and converting it to a TT. That term is in fact just a reference to the argument of the function, so you get the variable t. reflect, like all other proof script steps, only has access to the information that is available directly at compile time. Thus, the result of filling in foo_body with the reflection of the term t is P Bound (UN "t") (TType (UVar (-1))).
If you could do what you are wanting here, it would have major consequences both for understanding Idris code and for running it efficiently.
The loss in understanding would come from the inability to use parametricity to reason about the behavior of functions based on their types. All functions would effectively become potentially ad-hoc polymorphic, because they could (say) run differently on lists of strings than on lists of ints.
The loss in performance would come from representing enough type information to do the reflection. After Idris code is compiled, there is no type information left in it (unlike in a system such as the JVM or .NET or a dynamically typed system such as Python, where types have a runtime representation that code can access). In Idris, types can be very large, because they can contain arbitrary programs - this means that far more information would have to be maintained, and computation occurring at the type level would also have to be preserved and repeated at runtime.
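To make the parametricity point concrete, here is a small sketch in Python, one of the systems named above where types do have a runtime representation. The function suspicious_reverse is hypothetical and deliberately badly behaved: its name suggests one uniform behaviour, but because it can inspect element types at runtime it is free to do something different for strings.

```python
def suspicious_reverse(items):
    # A parametric "reverse" would treat every element type the same way.
    # With runtime type information available, nothing stops a function
    # from special-casing one element type.
    if items and all(isinstance(x, str) for x in items):
        return sorted(items)          # quietly sorts when given strings
    return list(reversed(items))      # reverses everything else

ints = suspicious_reverse([3, 1, 2])        # reversed: [2, 1, 3]
strs = suspicious_reverse(["c", "a", "b"])  # sorted:   ['a', 'b', 'c']
```

In a language without runtime type inspection, a function with a fully polymorphic list type cannot behave like this, which is the reasoning guarantee the answer says would be lost.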
If you're wanting to reflect on the structure of a type for further proof automation at compile time, take a look at the applyTactic tactic. Its argument should be a function that takes a reflected context and goal and gives back a new reflected tactic script. An example can be seen in the Data.Vect source.
So I suppose the summary is that Idris can't do what you want, and it probably never will be able to, but you might be able to make progress another way.

What functional programming terminology distinguishes between avoiding modifying variables and objects?

In functional programming, what terminology is used to distinguish between avoiding modifying what a variable refers to, and avoiding modifying an object itself?
For example, in Ruby,
name += title
avoids modifying the object previously referred to by name, instead creating a new object, but regrettably makes name refer to the new object, whereas
full_title = name + title
not only avoids modifying objects, it avoids modifying what name refers to.
What terminology would you use for code that avoids the former?
Using a name to refer to something other than what it did in an enclosing/previous scope is known as "shadowing" that name. It is indeed distinct from mutation. In Haskell, for example I can write
return 1 >>= \x -> return (x + 1) >>= \x -> print x
The x that is printed is the one introduced by the second lambda, i.e., 2.
In do notation this looks a bit more familiar:
foo = do
  x <- return 1
  x <- return (x + 1)
  print x
As I understand it, Erlang forbids aliasing altogether.
However, I suspect that mathepic is right in terms of Ruby -- it's not just shadowing the name but mutating some underlying object. On the other hand, I don't know Ruby that well...
I think functional programming languages simply do not have any operators that destructively update one of the source operands (is destructive update, perhaps, the term you're looking for?). A similar philosophy is seen in instruction set design: the RISC philosophy (increasingly used in even the x86 architecture, in the newer extensions) is to have three-operand instructions for binary operators, where you have to explicitly specify that the target operand is the same as one of the sources if you want destructive update.
For the latter, some hybrid languages (like Scala; the same terminologies are used in X10) distinguish between values (val) and variables (var). The former cannot be reassigned, the latter can. If they point to a mutable object, then of course that object itself can still be modified.
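The two things being distinguished in this thread, rebinding a name and mutating an object, can be told apart by keeping a second reference to the original object. The following is a minimal sketch in Python rather than Ruby; the variable names are made up for the example, and it relies on Python strings being immutable while lists are mutable.

```python
title = " PhD"
name = "Ada"
alias = name                 # second reference to the same string object

name += title                # builds a new string and rebinds name to it
full_title = alias + title   # builds a new string, rebinds nothing

parts = ["Ada"]
parts_alias = parts          # second reference to the same list object
parts.append("PhD")          # destructive update: the list itself changes
```

After this runs, alias still refers to the unchanged "Ada" even though name was rebound, whereas parts_alias sees the appended element because the object itself was modified.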

Haskell "collections" language design

Why is the Haskell implementation so focused on linked lists?
For example, I know Data.Sequence is more efficient with most of the list operations (except for the cons operation), and is used a lot;
syntactically, though, it is "hardly supported". Haskell has put a lot of effort into functional abstractions, such as the Functor and the Foldable class, but their syntax is not compatible with that of the default list.
If, in a project I want to optimize and replace my lists with sequences - or if I suddenly want support for infinite collections, and replace my sequences with lists - the resulting code changes are abhorrent.
So I guess my wondering can be made concrete in questions such as:
Why isn't the type of map equal to (Functor f) => (a -> b) -> f a -> f b?
Why can't the [] and (:) functions be used for, for example, the type in Data.Sequence?
I am really hoping there is some explanation for this, that doesn't include the words "backwards compatibility" or "it just grew that way", though if you think there isn't, please let me know. Any relevant language extensions are welcome as well.
Before getting into why, here's a summary of the problem and what you can do about it. The constructors [] and (:) are reserved for lists and cannot be redefined. If you plan to use the same code with multiple data types, then define or choose a type class representing the interface you want to support, and use methods from that class.
Here are some generalized functions that work on both lists and sequences. I don't know of a generalization of (:), but you could write your own.
fmap instead of map
mempty instead of []
mappend instead of (++)
If you plan to do a one-off data type replacement, then you can define your own names for things, and redefine them later.
-- For now, use lists
type List a = [a]
nil = []
cons x xs = x : xs
{- Switch to Seq in the future
-- type List a = Seq a
-- nil = empty
-- cons x xs = x <| xs
-}
Note that [] and (:) are constructors: you can also use them for pattern matching. Pattern matching is specific to one type constructor, so you can't extend a pattern to work on a new data type without rewriting the pattern-matching code.
Why there's so much list-specific stuff in Haskell
Lists are commonly used to represent sequential computations, rather than data. In an imperative language, you might build a Set with a loop that creates elements and inserts them into the set one by one. In Haskell, you do the same thing by creating a list and then passing the list to Set.fromList. Since lists so closely match this abstraction of computation, they have a place that's unlikely to ever be superseded by another data structure.
The fact remains that some functions are list-specific when they could have been generic. Some common functions like map were made list-specific so that new users would have less to learn. In particular, they provide simpler and (it was decided) more understandable error messages. Since it's possible to use generic functions instead, the problem is really just a syntactic inconvenience. It's worth noting that Haskell language implementations have very little list-specific code, so new data structures and methods can be just as efficient as the "built-in" ones.
There are several classes that are useful generalizations of lists:
Functor supplies fmap, a generalization of map.
Monoid supplies methods useful for collections with list-like structure. The empty list [] is generalized to other containers by mempty, and list concatenation (++) is generalized to other containers by mappend.
Applicative and Monad supply methods that are useful for interpreting collections as computations.
Traversable and Foldable supply useful methods for running computations over collections.
Of these, only Functor and Monad were in the influential Haskell 98 spec, so the others have been overlooked to varying degrees by library writers, depending on when the library was written and how actively it was maintained. The core libraries have been good about supporting new interfaces.
I remember reading somewhere that map is for lists by default since newcomers to Haskell would be put off if they made a mistake and saw a complex error about "Functors", which they have no idea about. Therefore, they have both map and fmap instead of just map.
EDIT: That "somewhere" is the Monad Reader Issue 13, page 20, footnote 3:
3. You might ask why we need a separate map function. Why not just do away with the current list-only map function, and rename fmap to map instead? Well, that’s a good question. The usual argument is that someone just learning Haskell, when using map incorrectly, would much rather see an error about lists than about Functors.
For (:), the (<|) function seems to be a replacement. I have no idea about [].
A nitpick, Data.Sequence isn't more efficient for "list operations", it is more efficient for sequence operations. That said, a lot of the functions in Data.List are really sequence operations. The finger tree inside Data.Sequence has to do quite a bit more work for a cons (<|) equivalent to list (:), and its memory representation is also somewhat larger than a list as it is made from two data types a FingerTree and a Deep.
The extra syntax for lists is fine, it hits the sweet spot at what lists are good at - cons (:) and pattern-matching from the left. Whether or not sequences should have extra syntax is further debate, but as you can get a very long way with lists, and lists are inherently simple, having good syntax is a must.
List isn't an ideal representation for Strings - the memory layout is inefficient as each Char is wrapped with a constructor. This is why ByteStrings were introduced. Although they are laid out as an array ByteStrings have to do a bit of administrative work - [Char] can still be competitive if you are using short strings. In GHC there are language extensions to give ByteStrings more String-like syntax.
The other major lazy functional language, Clean, has always represented strings as byte arrays, but its type system made this more practical - I believe the ByteString library uses unsafePerformIO under the hood.
Since version 7.8, GHC supports overloading list literals through the OverloadedLists extension; see the manual. For example, given appropriate IsList instances, you can write
['0' .. '9'] :: Set Char
[1 .. 10] :: Vector Int
[("default",0), (k1,v1)] :: Map String Int
['a' .. 'z'] :: Text
(quoted from the documentation).
I am pretty sure this won't be an answer to your question, but still.
I wish Haskell had more liberal function names (mixfix!) a la Agda. Then the syntax for the list constructors (: and []) wouldn't have been magic, allowing us to at least hide the list type and use the same tokens for our own types.
The amount of code change while migrating between list and custom sequence types would be minimal then.
About map, you are a bit luckier. You can always hide map, and set it equal to fmap yourself.
import Prelude hiding(map)
map :: (Functor f) => (a -> b) -> f a -> f b
map = fmap
Prelude is great, but it isn't the best part of Haskell.

Resources