What are "downward funargs"? - functional-programming

Jamie Zawinski uses that term in his (1997) article "java sucks" as if you should know what it means:
I really hate the lack of downward-funargs; anonymous classes are a lame substitute. (I can live without long-lived closures, but I find lack of function pointers a huge pain.)
It seems to be Lispers' slang, and I could find the following brief definition here, but somehow I think I still don't get it:
Many closures are used only during the extent of the bindings they refer to; these are known as "downward funargs" in Lisp parlance.
Were it not for Steve Yegge, I'd just feel stupid now, but it seems it might be OK to ask:
Jamie Zawinski is a hero. A living legend. [...] A guy who can use the term "downward funargs" and then glare at you just daring you to ask him to explain it, you cretin.
-- XEmacs is dead, long live XEmacs
So is there a Lisper here who can compile this for C-style-programmers like me?

Downward funargs are local functions that are not returned and do not otherwise leave their declaration scope. They can only be passed downwards, to other functions called from the current scope.
Two examples. This is a downward funarg:
function () {
    var a = 42;
    var f = function () { return a + 1; }
    foo(f); // `foo` is a function declared somewhere else.
}
While this is not:
function () {
    var a = 42;
    var f = function () { return a + 1; }
    return f;
}

To better understand where the term comes from, you need to know some history.
The reason why an old Lisp hacker might distinguish downward funargs from funargs in general is that downward funargs are easy to implement in a traditional Lisp that lacks lexical variables, whereas the general case is hard.
Traditionally a local variable was implemented in a Lisp interpreter by adding a binding (the symbol name of the variable, paired with its value) to the environment. Such an environment was simple to implement using an association list. Each function had its own environment, and a pointer to the environment of the parent function. A variable reference was resolved by looking in the current environment, and if not found there, then in the parent environment, and so on up the stack of environments until the global environment was reached.
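As a rough illustration (a toy model in Common Lisp, not any real interpreter's internals; the names are mine), chained association-list environments and variable lookup might look like this:
(defstruct env bindings parent)   ; BINDINGS is an alist of (name . value)

(defun lookup (name env)
  (if (null env)
      (error "Unbound variable: ~S" name)
      (let ((binding (assoc name (env-bindings env))))
        (if binding
            (cdr binding)                        ; found in this frame
            (lookup name (env-parent env))))))   ; otherwise try the parent

;; (lookup 'a (make-env :bindings '((b . 2))
;;                      :parent (make-env :bindings '((a . 1)))))
;; => 1, found by walking up to the parent environment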
In such an implementation, local variables shadow global variables with the same name. For example, in Emacs Lisp, print-length is a global variable that specifies the maximum length of list to print before abbreviating it. By binding this variable around the call to a function you can change the behaviour of print statements within that function:
(defun foo () (print '(1 2 3 4 5 6))) ; output depends on the value of print-length
(foo) ; use global value of print-length
==> (1 2 3 4 5 6)
(let ((print-length 3)) (foo)) ; bind print-length locally around the call to foo.
==> (1 2 3 ...)
You can see that in such an implementation, downward funargs are really easy to implement, because variables that are in the environment of the function when it's created will still be in the environment of the function when it's evaluated.
Variables that act like this are called special or dynamic variables, and you can create them in Common Lisp using the special declaration.
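For instance, a minimal sketch of dynamic binding in Common Lisp (the variable and function names here are made up):
(defvar *limit* 100)       ; DEFVAR proclaims *LIMIT* special (dynamic)

(defun show-limit ()
  (print *limit*))         ; sees whatever dynamic binding is in effect

(show-limit)               ; prints 100, the global value
(let ((*limit* 3))
  (show-limit))            ; prints 3 while the LET is still executing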

In Common Lisp:
(let ((a 3))
  (mapcar (lambda (b) (+ a b))
          (list 1 2 3 4)))
-> (4 5 6 7)
In the above form the lambda function is passed DOWNWARD. When called by the higher-order function MAPCAR (which takes a function and a list of values as arguments, applies the function to each element of the list, and returns a list of the results), the lambda function still refers to the variable 'a' from the LET expression. But all of this happens within the LET expression.
Compare the above with this version:
(mapcar (let ((a 3))
          (lambda (b) (+ a b)))
        (list 1 2 3 4))
Here the lambda function is returned from the LET (passed UPWARD, so to speak) and then handed to MAPCAR. When MAPCAR calls the lambda function, its surrounding LET is no longer executing, yet the function still needs to reference the variable 'a' from that LET.

There's a pretty descriptive article on Wikipedia called Funarg problem:
"A downwards funarg may also refer to
a function's state when that function
is not actually executing. However,
because, by definition, the existence
of a downwards funarg is contained in
the execution of the function that
creates it, the activation record for
the function can usually still be
stored on the stack."

Related

How is it possible that a function can call itself

I know about recursion, but I don't know how it's possible. I'll use the following example to further explain my question.
(def (pow (x, y))
(cond ((y = 0) 1))
(x * (pow (x , y-1))))
The program above is in the Lisp language. I'm not sure if the syntax is correct since I came up with it in my head, but it will do. In the program, I am defining the function pow, and in pow it calls itself. I don't understand how it's able to do this. From what I know the computer has to completely analyze a function before it can be defined. If this is the case, then the computer should give an undefined message when I use pow because I used it before it was defined. The principle I'm describing is the one at play when you use an x in x = x + 1, when x was not defined previously.
Compilers are much smarter than you think.
A compiler can turn the recursive call in this definition:
(defun pow (x y)
  (cond ((zerop y) 1)
        (t (* x (pow x (1- y))))))
into a goto instruction to re-start the function from scratch:
Disassembly of function POW
(CONST 0) = 1
2 required arguments
0 optional arguments
No rest parameter
No keyword parameters
12 byte-code instructions:
0 L0
0 (LOAD&PUSH 1)
1 (CALLS2&JMPIF 172 L15) ; ZEROP
4 (LOAD&PUSH 2)
5 (LOAD&PUSH 3)
6 (LOAD&DEC&PUSH 3)
8 (JSR&PUSH L0)
10 (CALLSR 2 57) ; *
13 (SKIP&RET 3)
15 L15
15 (CONST 0) ; 1
16 (SKIP&RET 3)
If this were a more complicated recursive function that a compiler cannot unroll into a loop, it would merely call the function again.
From what I know the computer has to completely analyze a function before it can be defined.
When the compiler sees that one is defining a function POW, it notes: we are now defining the function POW. If it then sees a call to POW inside the definition, the compiler recognizes it as a call to the function currently being compiled, and it can generate code for a recursive call.
A function is just a block of code. Its name is just a convenience so you don't have to calculate the exact address it will end up at. The language implementation turns names into the addresses where the program should jump to execute.
One function calls another by storing the address of the next instruction in the current function on the stack, perhaps pushing arguments onto the stack as well, and then jumping to the address of the called function. The called function eventually jumps to the return address it finds, so that control goes back to the caller. There are several calling conventions, implemented by the language, that decide which side does what. CPUs don't really have built-in support for functions; just as there is no such thing as a while loop at the CPU level, functions are emulated.
Just like functions, arguments have names too, but at this level they are really just stack locations, like the return address. When a function calls itself, it simply pushes a new return address and new arguments onto the stack and jumps to its own start. The top of the stack is now different, so the same variable names refer to fresh locations unique to this call: the x and y of the previous call live somewhere else than the current x and y. In fact, no special treatment is needed for a function calling itself compared to calling anything else.
Historically, the first high-level language, Fortran, did not support recursion. A function could call itself, but when it returned, control went back to the original caller without executing the rest of the function after the self-call. The Fortran compiler itself would have been impossible to write without recursion, so while the implementation used recursion, the language did not offer it to the programmers who used it. This limitation is part of the reason John McCarthy created Lisp.
I think to see how this can work in general, and in particular in cases where recursive calls can't be turned into loops, it's worth thinking about how a general compiled language might work, because the problems are not different.
Let's imagine how a compiler might turn this function into machine code:
(defun foo (x)
  (+ x (bar x)))
And let's assume that it does not know anything about bar at the time of compilation. Well, it has two options.
It can compile foo in such a way that the call to bar is translated into a set of instructions which say, 'look up the function definition stored under the name bar, whatever it currently is, and arrange to call that function with the right arguments'.
It can compile foo in such a way that there is a machine-level function call to a function, but the address of that function is left as a placeholder of some kind. It can then attach some metadata to foo which says: 'before this function is called you need to find the function named bar, find its address, splice it into the code in the right place, and remove this metadata'.
Both of these mechanisms allow foo to be defined before it's known what bar is. And note that instead of bar I could have written foo: these mechanisms deal with recursive calls too. Apart from that, however, they differ.
The first mechanism means that, every time foo is called it needs to do some kind of dynamic lookup for bar which will involve some overhead (but this overhead can be pretty small):
as a consequence of this the first mechanism will be slightly slower than it might be;
but, also as a consequence of this, if bar gets redefined, then the new definition will get picked up, which is a very desirable thing for an interactive language, which Lisp implementations usually are.
The second mechanism means that, after foo has all its references to other functions linked in to it, then the calls happen at the machine level:
this means they will be quick;
but that redefinition will be, at best, more complicated or, at worst, not possible at all.
The second of these implementations is close to how traditional compilers compile code: they compile code leaving a bunch of placeholders with associated metadata saying what names those placeholders correspond to. A linker (sometimes known as a link-loader, or loader) then grovels over all the files produced by the compiler, as well as other libraries of code, and resolves all these references, resulting in a bit of code which can actually be run.
A very simple-minded Lisp system might work entirely by the first mechanism (I am pretty sure that this is how Python works, for instance). A more advanced compiler will probably work by some combination of the first and second mechanism. As an example of this, CL allows the compiler to make assumptions that apparent self-calls in functions really are self-calls, and so the compiler may well compile them as direct calls (essentially it will compile the function and then link it on the fly). But when compiling code in general, it might call 'through the name' of the function.
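As a hedged illustration of the first mechanism, here is how calling 'through the name' typically behaves in Common Lisp (using the foo and bar from above; exact behaviour can vary with inlining declarations):
(defun bar (x) (* x 10))
(defun foo (x) (+ x (bar x)))    ; the call to BAR goes through the name BAR

(foo 2)                          ; => 22
(defun bar (x) (* x 100))        ; redefine BAR...
(foo 2)                          ; => 202: FOO picks up the new definition
                                 ;    without being recompiled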
There are also more-or-less heroic strategies which things could do: for instance at the first call of a function link it, on the fly, to all the things it refers to, and note in their definitions that if they change then this thing needs to be unlinked as well so it all happens again. These kind of tricks once seemed implausible, but compilers for languages like JavaScript do things at least as hairy as this all the time now.
Note that compilers and linkers for modern systems actually do something more complicated than I've described, because of shared libraries &c: what I described is more-or-less what happened pre shared-library.

The place of closures in functional programming

I have watched Robert C. Martin's talk "Functional Programming; What? Why? When?"
https://www.youtube.com/watch?v=7Zlp9rKHGD4
The main message of this talk is that state is unacceptable in functional programming.
Martin goes even further, claiming that assignments are 'evil'.
So, keeping this talk in mind, my question is: where is the place for closures in functional programming?
When there is no state and no variables in functional code, what would be the main reason to create and use a closure (a closure that does not enclose any state or any variables)? Is the closure mechanism useful at all?
Without state or variables (maybe only with immutable identifiers), is there any need to refer to the current lexical scope (since there is nothing there that could change)?
In that case, wouldn't a Java-like lambda mechanism be enough, where there is no link to the current lexical scope (which is why the captured variables have to be final)?
Yet in some sources, closures are described as a must-have element of a functional language.
A lexical scope that can be closed over does not need to be mutable to be useful. Just consider curried functions as an example:
add = \a -> \b -> a+b
add1 = add(1)
add3 = add(3)
[add1(0), add1(2), add3(2), add3(5)] // [1, 3, 5, 8]
Here, the inner lambda closes over the value of a (or over the variable a, which doesn't make a difference because of immutability).
Closures are not ultimately necessary for functional programming, but local variables are not either. Still, they're both very good ideas. Closures allow for a very simple notation for the most(?) important task of functional programming: dynamically creating new functions with specialised behaviour from abstracted code.
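For comparison, the same curried add can be sketched in Common Lisp (funcall is needed because the returned closures are ordinary function objects):
(defun add (a)
  (lambda (b) (+ a b)))          ; the closure captures A, which never changes

(let ((add1 (add 1))
      (add3 (add 3)))
  (list (funcall add1 0) (funcall add1 2)
        (funcall add3 2) (funcall add3 5)))
;; => (1 3 5 8)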
You use closures as you would in a language with mutable variables. The difference is obviously that they (usually) can't be modified.
The following is a simple example, in Clojure (which ironically I'm writing with right now):
(let [a 10
      f (fn [b]
          (+ a b))]
  (println (f 4))) ; Prints "14"
The main benefit of closures in a case like this is that I can "partially apply" a function using a closure, then pass the partially applied function around, instead of needing to pass the non-applied function plus any data I'll need to call it (very useful in many scenarios). In the example above, what if I didn't want to call the function right away? I would need to pass a along with it so it's available when f is called.
But you could also add some mutability into the mix if you deemed it necessary (although, as @Bergi points out, this example is "evil"):
(let [a (atom 10) ; Atoms are mutable
      f (fn [b]
          (do
            (swap! a inc) ; Increment a
            (+ @a b)))]
  (println (f 4))  ; Prints "15"
  (println (f 4))) ; Prints "16"
In this way you can emulate static variables. You can use this to do cool things like define memoize. It uses a "static variable" to cache the input/output of referentially transparent functions. This increases memory use, but can save CPU time if used properly.
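For example, memoize itself can be sketched as a closure over a cache; here is a minimal Common Lisp version (my own sketch, not Clojure's built-in memoize):
(defun memoize (fn)
  (let ((cache (make-hash-table :test #'equal)))   ; the "static variable"
    (lambda (&rest args)
      (multiple-value-bind (value found) (gethash args cache)
        (if found
            value
            (setf (gethash args cache) (apply fn args)))))))

;; (defparameter *slow-square* (memoize (lambda (x) (sleep 1) (* x x))))
;; (funcall *slow-square* 4)  ; slow the first time
;; (funcall *slow-square* 4)  ; instant, served from the cache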
I have to disagree with the stance against having state. State isn't evil; it's necessary. Every program has state. Global, mutable state is evil.
Also note, you can have mutability, and still program functionally. Say I have a function, containing a map over a list. Also say, I need to maintain an accumulator while mapping. I really have 2 options (ignoring "doing it manually"):
Switch the map to a fold.
Create a mutable variable, and mutate it while mapping.
Although option one should be preferred, both of these methods can be used in functional programming. From the viewpoint of "outside the function", there is no difference, even if one version internally uses a mutable variable. The function can still be referentially transparent and pure, since the only mutable state being affected is local to the function and can't possibly affect anything outside it.
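For comparison, option 1 (the fold) might look like the following Common Lisp sketch (the name fold-fn is mine); it threads the counter and the results through the accumulator explicitly, and produces the same result as the mutable-accumulator version shown next:
(defun fold-fn (xs)
  (nreverse
   (cdr (reduce (lambda (acc x)
                  ;; ACC is (counter . results-so-far)
                  (let ((i (1+ (car acc))))
                    (cons i (cons (+ x i) (cdr acc)))))
                xs
                :initial-value (cons 0 '())))))

;; (fold-fn '(10 10 10)) => (11 12 13)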
Example code mutating a local variable:
(defn mut-fn [xs]
  (let [a (atom 0)]
    (map
      (fn [x]
        (swap! a inc) ; Increment a
        (+ x @a))     ; Return x plus the current value of a
      xs)))
Note the variable a cannot be seen from outside the function, so any effect it has can in no way cause global changes. The function will produce the same output for each input, so it's effectively pure.

Common Lisp Binary Tree

I am trying to write a program in Common Lisp using GNU CLISP to compile it. I would like to enter a list such as (A(B (C) ()) (D (E) (F (G) ()))) and, depending on the first word, print out the pre-, in-, or post-order traversal. Example:
(pre '(A(B (C)... etc))
I am having trouble putting my logic into Clisp notation. I currently have the following code:
(defun leftchild (L)(cadr L))
(defun rightchild (L)(caddr L))
(defun data (L)(car L))
(defun pre (L)(if (null L) '()((data L)(pre(leftchild L))(pre(rightchild L)))))
... similar in and post functions
I get compiling errors saying that I should use a lambda in my pre function. I think this is due to the double (( in front of data, because it is expecting a function there, but I am not sure what I should put instead. I don't think cond would work, because that would hinder the recursive loop. Also, will data L print as it is now? The compiler did not recognize (print (data L)).
I have been working on this code for over a week now, trying to troubleshoot it myself, but I am at a loss. I would greatly appreciate it if someone could explain what I am doing incorrectly.
Another question that I have is how can I make the program prompt a line to the user to enter the (pre '(A... etc)) so that when I run the compiled file the program will run instead of giving a funcall error?
Thank you for your time.
Short answer: If you want to use if, note that you'll need a progn in order to have more than one form in the consequent and alternative cases.
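Concretely, the shape is something like this (a generic sketch, not a complete solution; visit and walk-children are placeholder names):
(if (consp node)
    (progn                        ; PROGN groups several forms into one
      (visit node)                ; e.g. print or push the node's value
      (walk-children node))
    nil)                          ; base case: nothing to do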
Long answer – also explains how to traverse accumulating the visited nodes in a list:
I guess this is homework, so I won't give you a full solution, but your question shows that you have basically the right idea, so I'll show you an easy, idiomatic way to do this.
First, you're right: The car of an unquoted form should be a function, so basically anything like (foo ...), where foo is not a function (or macro, special form ...), and the whole thing is to be evaluated, will be an error. Note that this does not hold inside special forms and macros (like cond, for example). These can change the evaluation rules, and not everything that looks like (foo bar) has to be a form that is to be evaluated by the normal evaluation rules. The easiest example would be quote, which simply returns its argument unevaluated, so (quote (foo bar)) will not be an error.
Now, about your problem:
An easy solution would be to have an accumulator and a recursive helper function that traverses the tree, and pushes the values in the accumulator. Something like this:
(defun pre (node)
  (let ((result (list)))
    (labels ((rec (node)
               (cond (...
                      ...
                      ...))))
      (rec node)
      (nreverse result))))
The labels form just introduces a local helper function, which will do the actual recursion, and the outer let gives you an accumulator to collect the node values. This solution will return the result as a list. If you just want to print each node's value, you don't need the accumulator or the helper function: just print instead of pushing, and make the helper your toplevel function.
Remember that you'll need a base case where the recursion stops. You should check for that in the cond. Then you'll need the recursive steps for each subtree, and you'll need to push the node's value onto the results. The order in which you do these steps decides whether you're doing pre-, in-, or post-order traversal. Your code shows that you already understand this principle, so you'll just have to make it work in Lisp code. You can use push to push values onto result, and consp to check whether a node is a non-empty list. Since there's nothing to do for empty lists, you'll basically only need one test in the cond, but you can also explicitly check whether the node is null, as you did in your code.

How does one implement a "stackless" interpreted language?

I am making my own Lisp-like interpreted language, and I want to do tail call optimization. I want to free my interpreter from the C stack so I can manage my own jumps from function to function and my own stack magic to achieve TCO. (I really don't mean stackless per se, just the fact that calls don't add frames to the C stack. I would like to use a stack of my own that does not grow with tail calls). Like Stackless Python, and unlike Ruby or... standard Python I guess.
But, as my language is a Lisp derivative, all evaluation of s-expressions is currently done recursively (because it's the most obvious way I thought of to do this nonlinear, highly hierarchical process). I have an eval function, which calls a Lambda::apply function every time it encounters a function call. The apply function then calls eval to execute the body of the function, and so on. Mutual stack-hungry non-tail C recursion. The only iterative part I currently use is to eval a body of sequential s-expressions.
(defun f (x y)
  (a x y)) ; tail call! goto instead of call.
           ; (do not grow the stack, keep return addr)

(defun a (x y)
  (+ x y))

; ...

(print (f 1 2)) ; how does the return work here? how does it know it's supposed to
                ; return the value here to be used by print, and how does it know
                ; how to continue execution here??
So, how do I avoid using C recursion? Or can I use some kind of goto that jumps across C functions? longjmp, perhaps? I really don't know. Please bear with me, I am mostly self- (Internet-) taught in programming.
One solution is what is sometimes called "trampolined style". The trampoline is a top-level loop that dispatches to small functions that do some small step of computation before returning.
I've sat here for nearly half an hour trying to contrive a good, short example. Unfortunately, I have to do the unhelpful thing and send you to a link:
http://en.wikisource.org/wiki/Scheme:_An_Interpreter_for_Extended_Lambda_Calculus/Section_5
The paper is called "Scheme: An Interpreter for Extended Lambda Calculus", and section 5 implements a working Scheme interpreter in an outdated dialect of Lisp. The secret is in how they use the **CLINK** instead of a stack. The other globals are used to pass data around between the implementation functions, like the registers of a CPU. I would ignore **QUEUE**, **TICK**, and **PROCESS**, since those deal with threading and fake interrupts. **EVLIS** and **UNEVLIS** are, specifically, used to evaluate function arguments. Unevaluated args are stored in **UNEVLIS** until they are evaluated and moved into **EVLIS**.
Functions to pay attention to, with some small notes:
MLOOP: MLOOP is the main loop of the interpreter, or "trampoline". Ignoring **TICK**, its only job is to call whatever function is in **PC**. Over and over and over.
SAVEUP: SAVEUP conses all the registers onto the **CLINK**, which is basically the same as when C saves the registers to the stack before a function call. The **CLINK** is actually a "continuation" for the interpreter. (A continuation is just the state of a computation. A saved stack frame is technically a continuation, too. Hence, some Lisps save the stack to the heap to implement call/cc.)
RESTORE: RESTORE restores the "registers" as they were saved in the **CLINK**. It's similar to restoring a stack frame in a stack-based language. So, it's basically "return", except some function has explicitly stuck the return value into **VALUE**. (**VALUE** is obviously not clobbered by RESTORE.) Also note that RESTORE doesn't always have to return to a calling function. Some functions will actually SAVEUP a whole new computation, which RESTORE will happily "restore".
AEVAL: AEVAL is the EVAL function.
EVLIS: EVLIS exists to evaluate a function's arguments, and apply a function to those args. To avoid recursion, it SAVEUPs EVLIS-1. EVLIS-1 would just be regular old code after the function application if the code was written recursively. However, to avoid recursion, and the stack, it is a separate "continuation".
I hope I've been of some help. I just wish my answer (and link) was shorter.
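For what it's worth, here is a tiny, self-contained sketch of the trampoline idea in Common Lisp (my own example, not taken from the paper): step functions return the next step as a closure instead of calling it, and a top-level loop keeps invoking whatever comes back.
(defun trampoline (thunk)
  (loop while (functionp thunk)          ; keep bouncing while we get closures
        do (setf thunk (funcall thunk)))
  thunk)                                 ; a non-function value is the result

;; Mutually recursive even/odd written as steps: the C stack never grows,
;; because each step RETURNS the next step instead of calling it.
(defun even-step (n)
  (if (zerop n) t (lambda () (odd-step (1- n)))))

(defun odd-step (n)
  (if (zerop n) nil (lambda () (even-step (1- n)))))

;; (trampoline (lambda () (even-step 1000000))) => T, with no stack overflow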
What you're looking for is called continuation-passing style. This style adds an additional item to each function call (you could think of it as a parameter, if you like) that designates the next bit of code to run (the continuation k can be thought of as a function that takes a single parameter). For example, you can rewrite your example in CPS like this:
(defun f (x y k)
  (a x y k))

(defun a (x y k)
  (+ x y k))

(f 1 2 print)
The implementation of + will compute the sum of x and y, then pass the result to k sort of like (k sum).
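A hedged, runnable Common Lisp rendering of that sketch (with a made-up +/k wrapper, since the built-in + takes no continuation):
(defun +/k (x y k)
  (funcall k (+ x y)))      ; compute the sum, then hand it to K

(defun a/k (x y k)
  (+/k x y k))              ; tail position: just pass K along

(defun f/k (x y k)
  (a/k x y k))

;; (f/k 1 2 #'print) prints 3: PRINT is the final continuation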
Your main interpreter loop then doesn't need to be recursive at all. It will, in a loop, apply each function application one after another, passing the continuation around.
It takes a little bit of work to wrap your head around this. I recommend some reading materials such as the excellent SICP.
Tail recursion can be thought of as reusing, for the callee, the same stack frame that you are currently using for the caller. So you could just re-set the arguments and jump back to the beginning of the function.
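For instance (a sketch with made-up names), here is a tail-recursive sum next to the loop a compiler could turn it into:
(defun sum-to (n acc)
  (if (zerop n)
      acc
      (sum-to (1- n) (+ acc n))))       ; tail call: nothing left to do after it

(defun sum-to-loop (n acc)
  (loop                                  ; the "goto the beginning"
    (when (zerop n)
      (return acc))
    (psetf n   (1- n)                    ; re-set the arguments in place,
           acc (+ acc n))))              ; in parallel, then loop again

;; (sum-to 5 0) and (sum-to-loop 5 0) both return 15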

Nested functions: Improper use of side-effects?

I'm learning functional programming, and have tried to solve a couple of problems in a functional style. One thing I experienced, while dividing up my problem into functions, was that it seemed I had two options: use several disparate functions with similar parameter lists, or use nested functions which, as closures, can simply refer to bindings in the parent function.
Though I ended up going with the second approach, because it made function calls smaller and it seemed to "feel" better, from my reading it seems like I may be missing one of the main points of functional programming, in that this seems "side-effecty"? Now granted, these nested functions cannot modify the outer bindings, as the language I was using prevents that, but if you look at each individual inner function, you can't say "given the same parameters, this function will return the same results" because they do use the variables from the parent scope... am I right?
What is the desirable way to proceed?
Thanks!
Functional programming isn't all-or-nothing. If nesting the functions makes more sense, I'd go with that approach. However, if you really want the internal functions to be purely functional, explicitly pass all the needed parameters into them.
Here's a little example in Scheme:
(define (foo a)
  (define (bar b)
    (+ a b)) ; getting a from outer scope, not purely functional
  (bar 3))

(define (foo a)
  (define (bar a b)
    (+ a b)) ; getting a from function parameters, purely functional
  (bar a 3))

(define (bar a b) ; since this is purely functional, we can remove it from its
  (+ a b))        ; environment and it still works

(define (foo a)
  (bar a 3))
Personally, I'd go with the first approach, but either will work equally well.
Nesting functions is an excellent way to divide up the labor in many functions. It's not really "side-effecty"; if it helps, think of the captured variables as implicit parameters.
One example where nested functions are useful is to replace loops. The parameters to the nested function can act as induction variables which accumulate values. A simple example:
let factorial n =
  let rec facHelper p n =
    if n = 1 then p else facHelper (p*n) (n-1)
  in
  facHelper 1 n
In this case, it wouldn't really make sense to declare a function like facHelper globally, since users shouldn't have to worry about the p parameter.
Be aware, however, that it can be difficult to test nested functions individually, since they cannot be referred to outside of their parent.
Consider the following (contrived) Haskell snippet:
putLines :: [String] -> IO ()
putLines lines = putStr string
  where string = concat lines
string is a locally bound named constant. But isn't it also a function taking no arguments that closes over lines, and therefore not referentially transparent? (In Haskell, constants and nullary functions are indeed indistinguishable!) Would you consider the above code “side-effecty” or non-functional because of this?

Resources