Functional "simultanity"? - functional-programming

At this link, functional programming is spoken of. Specifically, the author says this:
Simultaneity means that we assume a statement in lambda calculus is evaluated all at once. The trivial function:
λf(x) ::= x f(x)
defines an infinite sequence of whatever you plug in for x. The stepwise expansion looks like this:
0 - f(x)
1 - x f(x)
2 - x x f(x)
3 - x x x f(x)
The point is that we have to assume that the 'f()' and 'x' in step three million have the same meaning they did in step one.
At this point, those of you who know something about FP are muttering "referential transparency" under your collective breath. I know. I'll beat up on that in a minute. For now, just suspend your disbelief enough to admit that the constraint does exist, and the aardvark won't get hurt.
The problem with infinite expansions in a real-world computer is that.. well.. they're infinite. As in, "infinite loop" infinite. You can't evaluate every term of an infinite sequence before moving on to the next evaluation unless you're planning to take a really long coffee break while you wait for the answers.
Fortunately, theoretical logic comes to the rescue and tells us that preorder evaluation will always give us the same results as postorder evaluation.
More vocabulary.. need another function for this.. fortunately, it's a simple one:
λg(x) ::= x x
Now.. when we make the statement:
g(f(x))
Preorder evaluation says we have to expand f(x) completely before plugging it into g(). But that takes forever, which is.. inconvenient. Postorder evaluation says we can do this:
0 - g(f(x))
1 - f(x) f(x)
2 - x f(x) x f(x)
3 - x x f(x) x x f(x)
. . . could someone explain to me what is meant here? I haven't a clue what's being said. Maybe point me to a really good FP primer that would get me started.

(Warning, this answer is very long-winded. I thought it best to include general knowledge of lambda calculus because it is near impossible to find good explanations of it)
The author appears to be using the syntax λg(x) to mean a named function, rather than a traditional function in lambda calculus. The author also appears to be going on at length about how lambda calculus is not functional programming, in the same way that a Turing machine isn't imperative programming. There are practicalities and ideals in those abstractions that aren't present in the programming languages frequently used to represent them. But before getting into that, a primer on lambda calculus may help. In lambda calculus, all functions look like this:
λarg.body
That's it. There's a λ symbol (called "lambda", hence the name) followed by a named argument and only one named argument, then followed by a period, then followed by an expression that represents the body of the function. For instance, the identity function which takes anything and just returns it right back would look like this:
λx.x
And evaluating an expression is just a series of simple rules for swapping out functions and arguments with their body expressions. An expression has the form:
function-or-expression arg-or-expression
Reducing it usually follows the rule: "If the left thing is an expression, reduce it. Otherwise, it must be a function, so use arg-or-expression as the argument to the function, and replace this expression with the body of the function." It is very important to note that there is no requirement that the arg-or-expression be reduced before being used as an argument. That is, both of the following are equivalent and mathematically identical reductions of the expression λx.x (λy.y 0) (assuming you have some sort of definition for 0, because lambda calculus requires you to define numbers as functions):
λx.x (λy.y 0)
=> λx.x 0
=> 0
λx.x (λy.y 0)
=> λy.y 0
=> 0
In the first reduction, the argument was reduced before being used in the λx.x function. In the second, the argument was merely substituted into the λx.x function body - it wasn't reduced before being used. When this concept is used in programming, it's called "lazy evaluation" - you don't actually evaluate (reduce) an expression until you need to. What's important to note is that in lambda calculus, it does not matter whether an argument is reduced or not before substitution. The mathematics of lambda calculus prove that you'll get the same result either way as long as both terminate. This is definitely not the case in programming languages, because all sorts of things (usually relating to a change in the program's state) can make lazy evaluation different from normal evaluation.
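To see the same point in a real language, here is a small Haskell sketch (just an illustration; constFirst and loop are invented names):
constFirst :: a -> b -> a
constFirst x _ = x                  -- never inspects its second argument
loop :: Int
loop = loop                         -- reducing this never terminates
main :: IO ()
main = print (constFirst 42 loop)   -- prints 42: loop is substituted but never reduced
An eager (call-by-value) language would reduce loop before the call and diverge, which is exactly the "same result as long as both terminate" caveat above.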
Lambda calculus needs some extensions to be useful however. There's no way to name things. Suppose we allowed that though. In particular, let's create our own definition of what a function looks like in lambda calculus:
λname(arg).body
We'll say this means that the function λarg.body is bound to name, and anywhere else in any accompanying lambda expressions we can replace name with λarg.body. So we could do this:
λidentity(x).x
And now when we write identity, we'll just replace it with λx.x. This introduces a problem however. What happens if a named function refers to itself?
λevil(x).(evil x)
Now we've got a problem. According to our rule, we should be able to replace the evil in the body with what the name is bound to. But since the name is bound to λx.(evil x), as soon as we try:
λevil(x).(evil x)
=> λevil(x).(λx.(evil x) x)
=> λevil(x).(λx.(λx.(evil x) x) x)
=> ...
We get an infinite loop. We can never evaluate this expression, because we have no way of turning it from our special named lambda form to a regular lambda expression. We can't go from the language with our special extension down to regular lambda calculus because we can't satisfy the rule of "replace evil with the function expression evil is bound to". There are some tricks for dealing with this, but we'll get to that in a minute.
An important point here is that this is completely different from a regular lambda calculus program that evaluates infinitely and never finishes. For instance, consider the self application function which takes something and applies it to itself:
λx.(x x)
If we evaluate this with the identity function, we get:
λx.(x x) λx.x
=> λx.x λx.x
=> λx.x
Using named functions and naming this function self:
self identity
=> identity identity
=> identity
But what happens if we pass self to itself?
λx.(x x) λx.(x x)
=> λx.(x x) λx.(x x)
=> λx.(x x) λx.(x x)
=> ...
We get an expression that loops into repeatedly reducing self self into self self over and over again. This is a plain old infinite loop you'd find in any (Turing-complete) programming language.
The difference between this and our problem with recursive definitions is that our names and definitions are not lambda calculus. They are shorthands which we can expand to lambda calculus by following some rules. But in the case of λevil(x).(evil x), we can't expand it to lambda calculus, so we don't even get a lambda calculus expression to run. Our named function "fails to compile" in a sense: it's as if we sent the compiler into an infinite loop, so our code never even starts, as opposed to the actual runtime looping. (Yes, it is entirely possible to make a compiler get caught in an infinite loop.)
There are some very clever ways to get around this problem, one of which is the infamous Y-combinator. The basic idea is to take our problematic evil function and change it so that, instead of accepting an argument and trying to be recursive, it accepts a function and returns another function that accepts an argument, so the body expression has two arguments to work with:
λevil(f).λy.(f y)
If we evaluate evil identity, we'll get a new function that takes an argument and just calls identity with it. The following evaluation shows first the name replacement using ->, then the reduction using =>:
(evil identity) 0
-> (λf.λy.(f y) identity) 0
-> (λf.λy.(f y) λx.x) 0
=> λy.(λx.x y) 0
=> λx.x 0
=> 0
Where things get interesting is if we pass evil to itself instead of identity:
(evil evil) 0
-> (λf.λy.(f y) λf.λy.(f y)) 0
=> λy.(λf.λy.(f y) y) 0
=> λf.λy.(f y) 0
=> λy.(0 y)
We ended up with a function that's complete nonsense, but we achieved something important - we created one level of recursion. If we were to evaluate (evil (evil evil)), we would get two levels. With (evil (evil (evil evil))), three. So instead of passing evil to itself, we need to pass a function that somehow accomplishes this recursion for us. In particular, it should be a function with some sort of self application. What we want is the Y-combinator:
λf.(λx.(f (x x)) λx.(f (x x)))
This function is pretty tricky to wrap your head around from the definition, so it's best to just call it Y and see what happens when we try and evaluate a few things with it:
Y evil
-> λf.(λx.(f (x x)) λx.(f (x x))) evil
=> λx.(evil (x x)) λx.(evil (x x))
=> evil (λx.(evil (x x)) λx.(evil (x x)))
=> evil (evil (λx.(evil (x x)) λx.(evil (x x))))
=> evil (evil (evil (λx.(evil (x x)) λx.(evil (x x)))))
And as we can see, this goes on infinitely. What we've done is take evil, which accepts first a function and then an argument and evaluates that argument using the function, and pass it a specially modified version of the evil function which expands to provide recursion. So we can create a "recursion point" in the evil function by reducing evil (Y evil). So now, whenever we see a named function using recursion like this:
λname(x).(.... some body containing (name arg) in it somewhere)
We can transform it to:
λname-rec(f).λx.(...... body with (name arg) replaced with (f arg))
λname(x).((name-rec (Y name-rec)) x)
We turn the function into a version that first accepts a function to use as a recursion point, then we provide the function Y name-rec as the function to use as the recursion point.
The reason this works, and getting waaaaay back to the original point of the author, is because the expression name-rec (Y name-rec) does not have to fully reduce Y name-rec before starting its own reduction. I cannot stress this enough. We've already seen that reducing Y name-rec results in an infinite loop, so the recursion works if there's some sort of condition in the name-rec function that means that the next step of Y name-rec might not need to be reduced.
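Haskell makes this concrete with a fixed-point function usually called fix. The untyped Y-combinator itself doesn't typecheck in Haskell, but fix plays the same role, and it only works because the argument f (fix f) is not reduced before f demands it. A hedged sketch (factRec is an invented name):
fix :: (a -> a) -> a
fix f = f (fix f)                     -- terminates only because (fix f) is evaluated lazily
factRec :: (Integer -> Integer) -> Integer -> Integer
factRec self n = if n == 0 then 1 else n * self (n - 1)   -- recursion through an explicit "recursion point"
factorial :: Integer -> Integer
factorial = fix factRec
main :: IO ()
main = print (factorial 5)            -- 120
The n == 0 branch is exactly the "condition that means the next step of Y name-rec might not need to be reduced".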
This breaks down in many programming languages, including functional ones, because they do not support this kind of lazy evaluation. Additionally, almost all programming languages support mutation. That is, if you define a variable x = 3, later in the same code you can make x = 5 and all the old code that referred to x when it was 3 will now see x as being 5. This means your program could have completely different results if that old code is "delayed" with lazy evaluation and only calculated later on, because by then x could be 5. In a language where things can be arbitrarily executed in any order at any time, you have to completely eliminate your program's dependency on things like order of statements and time-changing values. If you don't, your program could calculate arbitrarily different results depending on what order your code gets run in.
However, writing code that has no sense of order in it whatsoever is extremely difficult. We saw how complicated lambda calculus got just trying to get our heads around trivial recursion. Therefore, most functional programming languages pick a model that systematically defines in what order things are evaluated in, and they never deviate from that model.
Racket, a dialect of Scheme, specifies that in the normal Racket language all expressions are evaluated "eagerly" (no delaying) and all function arguments are evaluated eagerly from left to right, but Racket includes special forms that let you selectively make certain expressions lazy, such as (delay ...), which wraps an expression in a promise that is only evaluated when forced. Haskell does the opposite: expressions default to lazy evaluation, and the compiler runs a "strictness analyser" to work out which expressions are certain to be needed, so they can safely be evaluated eagerly.
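As a small, hedged Haskell illustration of those defaults (the names are invented):
{-# LANGUAGE BangPatterns #-}
firstFive :: [Int]
firstFive = take 5 (map (* 2) [1 ..])     -- lazy by default: only five elements of the infinite list are computed
sumStrict :: [Int] -> Int
sumStrict = go 0
  where
    go !acc []       = acc                -- the bang patterns selectively force eager evaluation of the accumulator
    go !acc (x : xs) = go (acc + x) xs
main :: IO ()
main = do
  print firstFive                         -- [2,4,6,8,10]
  print (sumStrict [1 .. 100000])         -- 5000050000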
The primary point being made seems to be that it's just too impractical to design a language that completely allows all expressions to be individually lazy or eager, because the limitations this poses on what tools you can use in the language are severe. Therefore, it's important to keep in mind what tools a functional language provides you for manipulating lazy expressions and eager expressions, because they are most certainly not equivalent in all practical functional programming languages.

Related

Prolog arithmetic in foreach

So I'm learning Prolog. One of the things I've found to be really obnoxious is demonstrated in the following example:
foreach(
    between(1,10,X),
    somePredicate(X,X+Y,Result)
).
This does not work. I am well aware that X+Y is not evaluated here, and instead I'd have to do:
foreach(
    between(1,10,X),
    (
        XPlusY is X + Y,
        somePredicate(X, XPlusY, Result)
    )
).
Except, that doesn't work, either. As near as I can tell, the scope of XPlusY extends outside of foreach - i.e., XPlusY is 1 + Y, XPlusY is 2 + Y, etc. must all be true AT ONCE, and there is no XPlusY for which that is the case. So I have to do the following:
innerCode(X, Result) :-
    XPlusY is X + Y,
    somePredicate(X, XPlusY, Result).
...
foreach(
    between(1,10,X),
    innerCode(X, Result)
).
This, finally, works. (At least, I think so. I haven't tried this exact code, but this was the path I took earlier from "not working" to "working".) That's fine and all, except that it's exceptionally obnoxious. If I had a way of evaluating arithmetic operations in-line, I could halve the lines of code, make it more readable, and NOT create a one-use clutter predicate.
Question: Is there a way to evaluate arithmetic operations in-line, without declaring a new variable?
Failing that, it would be acceptable (and, in some cases, still useful for other things) if there were a way to restrict the scope of new variables. Suppose, for instance, you could define a block within the foreach, where the variables visible from the outside were marked, and any other variables in the block were considered new for that execution of the block. (I realize my terminology may be incorrect, but hopefully it gets the point across.) For example, something resembling:
foreach(
    between(1,10,X),
    (X, Result){
        XPlusY is X + Y,
        somePredicate(X, XPlusY, Result)
    }
).
A possible solution might be to declare a lambda in-line and immediately call it. Summed up:
Alternate question: Is there a way to limit the scope of new variables within a predicate, while retaining the ability to perform lasting unifications on one or more existing variables?
(The second half I added as clarification in response to an answer about forall.)
A solution to both questions is preferred, but a solution to either will suffice.
library(yall) allows you to define lambda expressions. For instance
?- foreach(between(1,3,X),call([Y]>>(Z is Y+1,writeln(Z)),X)).
2
3
4
true.
Alternatively, library(lambda) provides the construct:
?- [library(lambda)].
true.
?- foreach(between(1,3,X),call(\Y^(Z is Y+1,writeln(Z)),X)).
2
3
4
true.
In SWI-Prolog, library(yall) is autoloaded, while to get library(lambda) you should install the related pack:
?- pack_install(lambda).
Alternatively, use the de facto standard forall/2 predicate:
forall(
    between(1,10,X),
    somePredicate(X,X+Y,Result)
).
While the foreach/2 predicate is usually implemented in the way you describe, the forall/2 predicate is defined as:
% forall(#callable, #callable)
forall(Generate, Test) :-
    \+ (Generate, \+ Test).
Note that the use of negation implies that no bindings will be returned when a call to the predicate succeeds.
Update
Lambda libraries allow the specification of both lambda global variables (aka lambda free variables) and lambda local variables (aka lambda parameters). Using Logtalk's lambda syntax (also available in SWI-Prolog in library(yall)), you can write (reusing Carlo's example) e.g.
?- G = 2, foreach(between(1,3,X),call({G}/[Y]>>(Z is Y+G,writeln(Z)),X)).
3
4
5
G = 2.
?- G = 4, foreach(between(1,3,X),call({G}/[Y]>>(Z is Y+G,writeln(Z)),X)).
5
6
7
G = 4.
Thus, it's possible to use lambdas to limit the scope of some variables within a goal without also limiting the scope of every unification in the goal.

Functional programming and the closure term birth

I'm studying functional programming and lambda calculus, but I'm wondering whether the term "closure" is also present in Church's original work, or whether it's a more modern term strictly concerned with programming languages.
I remember that in Church's work there were terms like free variable, closed into..., and so on.
It is a more modern term, due (as many things in modern FP are) to P. J. Landin (1964), "The mechanical evaluation of expressions":
Also we represent the value of a λ-expression by a
bundle of information called a "closure," comprising
the λ-expression and the environment relative to which
it was evaluated.
Consider the following function definition in Scheme:
(define (adder a)
(lambda (x) (+ a x)))
The notion of an explicit closure is not required in the pure lambda calculus, because variable substitution takes care of it. The above code snippet can be translated as:
λa.λx.(a + x)
When you apply this to a value z, it becomes
λx.(z + x)
by β-reduction, which involves substitution. You can call this closure over a if you want.
(The example uses a function argument, but this holds true for any variable binding, since in the pure lambda calculus all variable bindings must occur via λ terms.)
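For comparison, here is the same adder in Haskell, where the returned function closes over a in exactly Landin's sense (just an illustrative sketch):
adder :: Int -> (Int -> Int)
adder a = \x -> a + x          -- the returned function captures ("closes over") a
main :: IO ()
main = do
  let add3 = adder 3           -- a closure carrying a = 3 in its environment
  print (add3 4)               -- 7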

How are functions curried?

I understand what the concept of currying is, and know how to use it. These are not my questions, rather I am curious as to how this is actually implemented at some lower level than, say, Haskell code.
For example, when (+) 2 4 is curried, is a pointer to the 2 maintained until the 4 is passed in? Does Gandalf bend space-time? What is this magic?
Short answer: yes a pointer is maintained to the 2 until the 4 is passed in.
Longer than necessary answer:
Conceptually, you're supposed to think of Haskell as being defined in terms of the lambda calculus and term rewriting. Let's say you have the following definition:
f x y = x + y
This definition for f comes out in lambda calculus as something like the following, where I've explicitly put parentheses around the lambda bodies:
\x -> (\y -> (x + y))
If you're not familiar with the lambda calculus, this basically says "a function of an argument x that returns (a function of an argument y that returns (x + y))". In the lambda calculus, when we apply a function like this to some value, we can replace the application of the function by a copy of the body of the function with the value substituted for the function's parameter.
So then the expression f 1 2 is evaluated by the following sequence of rewrites:
(\x -> (\y -> (x + y))) 1 2
(\y -> (1 + y)) 2 # substituted 1 for x
(1 + 2) # substituted 2 for y
3
So you can see here that if we'd only supplied a single argument to f, we would have stopped at \y -> (1 + y). So we've got a whole term that is just a function for adding 1 to something, entirely separate from our original term, which may still be in use somewhere (for other references to f).
The key point is that if we implement functions like this, every function has only one argument but some return functions (and some return functions which return functions which return ...). Every time we apply a function we create a new term that "hard-codes" the first argument into the body of the function (including the bodies of any functions this one returns). This is how you get currying and closures.
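A quick hedged Haskell illustration of that (addOne is an invented name):
f :: Int -> Int -> Int
f x y = x + y
addOne :: Int -> Int
addOne = f 1                   -- a new function value with the 1 "hard-coded" in
main :: IO ()
main = do
  print (addOne 2)             -- 3
  print (addOne 41)            -- 42; the partial application can be shared and applied many times
  print (f 1 2)                -- the same thing as (f 1) 2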
Now, that's not how Haskell is directly implemented, obviously. Once upon a time, Haskell (or possibly one of its predecessors; I'm not exactly sure on the history) was implemented by Graph reduction. This is a technique for doing something equivalent to the term reduction I described above, that automatically brings along lazy evaluation and a fair amount of data sharing.
In graph reduction, everything is references to nodes in a graph. I won't go into too much detail, but when the evaluation engine reduces the application of a function to a value, it copies the sub-graph corresponding to the body of the function, with the necessary substitution of the argument value for the function's parameter (but shares references to graph nodes where they are unaffected by the substitution). So essentially, yes: partially applying a function creates a new structure in memory that has a reference to the supplied argument (i.e. "a pointer to the 2"), and your program can pass around references to that structure (and even share it and apply it multiple times), until more arguments are supplied and it can actually be reduced. However, it's not like it's just remembering the function and accumulating arguments until it gets all of them; the evaluation engine actually does some of the work each time it's applied to a new argument. In fact, the graph reduction engine can't even tell the difference between an application that returns a function and still needs more arguments, and one that has just got its last argument.
I can't tell you much more about the current implementation of Haskell. I believe it's a distant mutant descendant of graph reduction, with loads of clever short-cuts and go-faster stripes. But I might be wrong about that; maybe they've found a completely different execution strategy that isn't anything at all like graph reduction anymore. But I'm 90% sure it'll still end up passing around data structures that hold on to references to the partial arguments, and it probably still does something equivalent to factoring in the arguments partially, as it seems pretty essential to how lazy evaluation works. I'm also fairly sure it'll do lots of optimisations and short cuts, so if you straightforwardly call a function of 5 arguments like f 1 2 3 4 5 it won't go through all the hassle of copying the body of f 5 times with successively more "hard-coding".
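If it helps to have something concrete, here is a deliberately crude toy model of "a data structure that holds on to references to the partial arguments". This is emphatically not how GHC represents closures; it is only a sketch, and all the names are invented:
data PAP = PAP Int [Int] ([Int] -> Int)            -- arity still missing, arguments held so far, code to run when saturated
applyPAP :: PAP -> Int -> Either PAP Int
applyPAP (PAP 1 held code) x = Right (code (reverse (x : held)))    -- saturated: run the code
applyPAP (PAP n held code) x = Left (PAP (n - 1) (x : held) code)   -- keep holding on to the argument
plus :: PAP
plus = PAP 2 [] (\[a, b] -> a + b)
main :: IO ()
main = case applyPAP plus 2 of                     -- the 2 is now held inside the structure
  Left partial -> case applyPAP partial 4 of
    Right n -> print n                             -- 6
    Left _  -> error "still unsaturated"
  Right _ -> error "unexpected"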
Try it out with GHC:
ghc -C Test.hs
This will generate C code in Test.hc
I wrote the following function:
f = (+) 16777217
And GHC generated this:
R1.p[1] = (W_)Hp-4;
*R1.p = (W_)&stg_IND_STATIC_info;
Sp[-2] = (W_)&stg_upd_frame_info;
Sp[-1] = (W_)Hp-4;
R1.w = (W_)&integerzmgmp_GHCziInteger_smallInteger_closure;
Sp[-3] = 0x1000001U;
Sp=Sp-3;
JMP_((W_)&stg_ap_n_fast);
The thing to remember is that in Haskell, partially applying is not an unusual case. There's technically no "last argument" to any function. As you can see here, Haskell is jumping to stg_ap_n_fast which will expect an argument to be available in Sp.
The stg here stands for "Spineless Tagless G-Machine". There is a really good paper on it by Simon Peyton Jones. If you're curious about how the Haskell runtime is implemented, go read that first.

A Functional-Imperative Hybrid

Pure functional programming languages do not allow mutable data, but some computations are more naturally/intuitively expressed in an imperative way -- or an imperative version of an algorithm may be more efficient. I am aware that most functional languages are not pure, and let you assign/reassign variables and do imperative things but generally discourage it.
My question is, why not allow local state to be manipulated in local variables, but require that functions can only access their own locals and global constants (or just constants defined in an outer scope)? That way, all functions maintain referential transparency (they always give the same return value given the same arguments), but within a function, a computation can be expressed in imperative terms (like, say, a while loop).
IO and such could still be accomplished in the normal functional ways - through monads or passing around a "world" or "universe" token.
My question is, why not allow local state to be manipulated in local variables, but require that functions can only access their own locals and global constants (or just constants defined in an outer scope)?
Good question. I think the answer is that mutable locals are of limited practical value but mutable heap-allocated data structures (primarily arrays) are enormously valuable and form the backbone of many important collections including efficient stacks, queues, sets and dictionaries. So restricting mutation to locals only would not give an otherwise purely functional language any of the important benefits of mutation.
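For instance, in Haskell (a hedged sketch; histogram is a made-up name), a mutable array can do the work internally while the function as a whole stays observably pure:
import Data.Array (Array)
import Data.Array.ST (newArray, readArray, writeArray, runSTArray)
histogram :: Int -> [Int] -> Array Int Int         -- counts occurrences of values in [0, n)
histogram n xs = runSTArray $ do
  counts <- newArray (0, n - 1) 0
  mapM_ (\x -> readArray counts x >>= writeArray counts x . (+ 1)) xs
  return counts
main :: IO ()
main = print (histogram 3 [0, 1, 1, 2, 2, 2])      -- array (0,2) [(0,1),(1,2),(2,3)]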
On a related note, communicating sequential processes exchanging purely functional data structures offer many of the benefits of both worlds because the sequential processes can use mutation internally, e.g. mutable message queues are ~10x faster than any purely functional queues. For example, this is idiomatic in F# where the code in a MailboxProcessor uses mutable data structures but the messages communicated between them are immutable.
Sorting is a good case study in this context. Sedgewick's quicksort in C is short and simple and hundreds of times faster than the fastest purely functional sort in any language. The reason is that quicksort mutates the array in-place. Mutable locals would not help. Same story for most graph algorithms.
The short answer is: there are systems to allow what you want. For example, you can do it using the ST monad in Haskell (as referenced in the comments).
The ST monad approach is from Haskell's Control.Monad.ST. Code written in the ST monad can use references (STRef) where convenient. The nice part is that you can even use the results of the ST monad in pure code, as it is essentially self-contained (this is basically what you were wanting in the question).
The proof of this self-contained property is done through the type-system. The ST monad carries a state-thread parameter, usually denoted with a type-variable s. When you have such a computation you'll have monadic result, with a type like:
foo :: ST s Int
To actually turn this into a pure result, you have to use
runST :: (forall s . ST s a) -> a
You can read this type like: give me a computation where the s type parameter doesn't matter, and I can give you back the result of the computation, without the ST baggage. This basically keeps the mutable ST variables from escaping, as they would carry the s with them, which would be caught by the type system.
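A minimal, self-contained example of that (sumST is an invented name):
import Control.Monad.ST (runST)
import Data.STRef (newSTRef, modifySTRef', readSTRef)
sumST :: [Int] -> Int                  -- internally imperative, externally pure
sumST xs = runST $ do
  ref <- newSTRef 0
  mapM_ (\x -> modifySTRef' ref (+ x)) xs
  readSTRef ref
main :: IO ()
main = print (sumST [1 .. 10])         -- 55
If sumST tried to return the STRef itself, the s type variable would escape and the type checker would reject the program.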
This can be used to good effect on pure structures that are implemented with underlying mutable structures (like the vector package). One can cast off the immutability for a limited time to do something that mutates the underlying array in place. For example, one could combine the immutable Vector with an impure algorithms package to keep the most of the performance characteristics of the in place sorting algorithms and still get purity.
In this case it would look something like:
-- assuming: import Control.Monad.ST (runST)
--           import Data.Vector (Vector, thaw, freeze)
--           import Data.Vector.Algorithms.Intro (sort)   -- e.g. from the vector-algorithms package
pureSort :: Ord a => Vector a -> Vector a
pureSort vector = runST $ do
  mutableVector <- thaw vector
  sort mutableVector
  freeze mutableVector
The thaw and freeze functions are linear-time copying, but this won't disrupt the overall O(n lg n) running time. You can even use unsafeFreeze to avoid another linear traversal, as the mutable vector isn't used again.
"Pure functional programming languages do not allow mutable data" ... actually it does, you just simply have to recognize where it lies hidden and see it for what it is.
Mutability is where two things have the same name and mutually exclusive times of existence so that they may be treated as "the same thing at different times". But as every Zen philosopher knows, there is no such thing as "same thing at different times". Everything ceases to exist in an instant and is inherited by its successor in possibly changed form, in a (possibly) uncountably-infinite succession of instants.
In the lambda calculus, mutability thus takes the form illustrated by the following example: (λx (λx f(x)) (x+1)) (x+1), which may also be rendered as "let x = x + 1 in let x = x + 1 in f(x)" or just "x = x + 1, x = x + 1, f(x)" in a more C-like notation.
In other words, "name clash" of the "lambda calculus" is actually "update" of imperative programming, in disguise. They are one and the same - in the eyes of the Zen (who is always right).
So, let's refer to each instant and state of the variable as the Zen Scope of an object. One ordinary scope with a mutable object equals many Zen Scopes with constant, immutable objects that either get initialized if they are the first, or inherit from their predecessor if they are not.
When people say "mutability" they're misidentifying and confusing the issue. Mutability (as we've just seen here) is a complete red herring. What they actually mean (even unbeknonwst to themselves) is infinite mutability; i.e. the kind which occurs in cyclic control flow structures. In other words, what they're actually referring to - as being specifically "imperative" and not "functional" - is not mutability at all, but cyclic control flow structures along with the infinite nesting of Zen Scopes that this entails.
The key feature that lies absent in the lambda calculus is, thus, seen not as something that may be remedied by the inclusion of an overwrought and overthought "solution" like monads (though that doesn't exclude the possibility of it getting the job done) but as infinitary terms.
A control flow structure is the wrapping of an unwrapped (possibility infinite) decision tree structure. Branches may re-converge. In the corresponding unwrapped structure, they appear as replicated, but separate, branches or subtrees. Goto's are direct links to subtrees. A goto or branch that back-branches to an earlier part of a control flow structure (the very genesis of the "cycling" of a cyclic control flow structure) is a link to an identically-shaped copy of the entire structure being linked to. Corresponding to each structure is its Universally Unrolled decision tree.
More precisely, we may think of a control-flow structure as a statement that precedes an actual expression that conditions the value of that expression. The archetypical case in point is Landin's original case, itself (in his 1960's paper, where he tried to lambda-ize imperative languages): let x = 1 in f(x). The "x = 1" part is the statement, the "f(x)" is the value being conditioned by the statement. In C-like form, we could write this as x = 1, f(x).
More generally, corresponding to each statement S and expression Q is an expression S[Q] which represents the result Q after S is applied. Thus, (x = 1)[f(x)] is just λx f(x) 1. The S wraps around the Q. If S contains cyclic control flow structures, the wrapping will be infinitary.
When Landin tried to work out this strategy, he hit a hard wall when he got to the while loop and went "Oops. Never mind." and fell back into what became an overwrought and overthought solution, while this simple (and in retrospect, obvious) answer eluded his notice.
A while loop "while (x < n) x = x + 1;" - which has the "infinite mutability" mentioned above, may itself be treated as an infinitary wrapper, "if (x < n) { x = x + 1; if (x < 1) { x = x + 1; if (x < 1) { x = x + 1; ... } } }". So, when it wraps around an expression Q, the result is (in C-like notation) "x < n? (x = x + 1, x < n? (x = x + 1, x < n? (x = x + 1, ...): Q): Q): Q", which may be directly rendered in lambda form as "x < n? (λx x < n (λx x < n? (λx·...) (x + 1): Q) (x + 1): Q) (x + 1): Q". This shows directly the connection between cyclicity and infinitariness.
This is an infinitary expression that, despite being infinite, has only a finite number of distinct subexpressions. Just as we can think of there being a Universally Unrolled form to this expression - which is similar to what's shown above (an infinite decision tree) - we can also think of there being a Maximally Rolled form, which could be obtained by labelling each of the distinct subexpressions and referring to the labels, instead. The key subexpressions would then be:
A: x < n? goto B: Q
B: x = x + 1, goto A
The subexpression labels, here, are "A:" and "B:", while the references to the subexpressions so labelled are "goto A" and "goto B", respectively. So, by magic, the very essence of imperativity emerges directly out of the infinitary lambda calculus, without any need to posit it separately or anew.
This way of viewing things applies even down to the level of binary files. Every interpretation of every byte (whether it be a part of an opcode of an instruction that starts 0, 1, 2 or more bytes back, or as part of a data structure) can be treated as being there in tandem, so that the binary file is a rolling up of a much larger universally unrolled structure whose physical byte code representation overlaps extensively with itself.
Thus, emerges the imperative programming language paradigm automatically out of the pure lambda calculus, itself, when the calculus is extended to include infinitary terms. The control flow structure is directly embodied in the very structure of the infinitary expression, itself; and thus requires no additional hacks (like Landin's or later descendants, like monads) - as it's already there.
This synthesis of the imperative and functional paradigms arose in the late 1980's via the USENET, but has not (yet) been published. Part of it was already implicit in the treatment (dating from around the same time) given to languages, like Prolog-II, and the much earlier treatment of cyclic recursive structures by infinitary expressions by Irene Guessarian LNCS 99 "Algebraic Semantics".
Now, earlier I said that the monad-based formulation might get you to the same place, or to an approximation thereof. I believe there is a kind of universal representation theorem of some sort, which asserts that the infinitary-based formulation provides a purely syntactic representation, and that the semantics that arise from the monad-based representation factor through this as "monad-based semantics" = "infinitary lambda calculus" + "semantics of infinitary languages".
Likewise, we may think of the "Q" expressions above as being continuations; so there may also be a universal representation theorem for continuation semantics, which similarly rolls this formulation back into the infinitary lambda calculus.
At this point, I've said nothing about non-rational infinitary terms (i.e. infinitary terms which possess an infinite number of distinct subterms and no finite Minimal Rolling) - particularly in relation to interprocedural control flow semantics. Rational terms suffice to account for loops and branches, and so provide a platform for intraprocedural control flow semantics; but not as much so for the call-return semantics that are the essential core element of interprocedural control flow semantics, if you consider subprograms to be directly represented as embellished, glorified macros.
There may be something similar to the Chomsky hierarchy for infinitary term languages; so that type 3 corresponds to rational terms, type 2 to "algebraic terms" (those that can be rolled up into a finite set of "goto" references and "macro" definitions), and type 0 for "transcendental terms". That is, for me, an unresolved loose end, as well.

How does one implement a "stackless" interpreted language?

I am making my own Lisp-like interpreted language, and I want to do tail call optimization. I want to free my interpreter from the C stack so I can manage my own jumps from function to function and my own stack magic to achieve TCO. (I really don't mean stackless per se, just the fact that calls don't add frames to the C stack. I would like to use a stack of my own that does not grow with tail calls). Like Stackless Python, and unlike Ruby or... standard Python I guess.
But, as my language is a Lisp derivative, all evaluation of s-expressions is currently done recursively (because it's the most obvious way I thought of to do this nonlinear, highly hierarchical process). I have an eval function, which calls a Lambda::apply function every time it encounters a function call. The apply function then calls eval to execute the body of the function, and so on. Mutual stack-hungry non-tail C recursion. The only iterative part I currently use is to eval a body of sequential s-expressions.
(defun f (x y)
  (a x y)) ; tail call! goto instead of call.
           ; (do not grow the stack, keep return addr)
(defun a (x y)
  (+ x y))
; ...
(print (f 1 2)) ; how does the return work here? how does it know it's supposed to
                ; return the value here to be used by print, and how does it know
                ; how to continue execution here??
So, how do I avoid using C recursion? Or can I use some kind of goto that jumps across c functions? longjmp, perhaps? I really don't know. Please bear with me, I am mostly self- (Internet- ) taught in programming.
One solution is what is sometimes called "trampolined style". The trampoline is a top-level loop that dispatches to small functions that do some small step of computation before returning.
I've sat here for nearly half an hour trying to contrive a good, short example. Unfortunately, I have to do the unhelpful thing and send you to a link:
http://en.wikisource.org/wiki/Scheme:_An_Interpreter_for_Extended_Lambda_Calculus/Section_5
The paper is called "Scheme: An Interpreter for Extended Lambda Calculus", and section 5 implements a working Scheme interpreter in an outdated dialect of Lisp. The secret is in how they use the **CLINK** instead of a stack. The other globals are used to pass data around between the implementation functions, like the registers of a CPU. I would ignore **QUEUE**, **TICK**, and **PROCESS**, since those deal with threading and fake interrupts. **EVLIS** and **UNEVLIS** are, specifically, used to evaluate function arguments. Unevaluated args are stored in **UNEVLIS** until they are evaluated and put into **EVLIS**.
Functions to pay attention to, with some small notes:
MLOOP: MLOOP is the main loop of the interpreter, or "trampoline". Ignoring **TICK**, its only job is to call whatever function is in **PC**. Over and over and over.
SAVEUP: SAVEUP conses all the registers onto the **CLINK**, which is basically the same as when C saves the registers to the stack before a function call. The **CLINK** is actually a "continuation" for the interpreter. (A continuation is just the state of a computation. A saved stack frame is technically a continuation, too. Hence, some Lisps save the stack to the heap to implement call/cc.)
RESTORE: RESTORE restores the "registers" as they were saved in the **CLINK**. It's similar to restoring a stack frame in a stack-based language. So, it's basically "return", except some function has explicitly stuck the return value into **VALUE**. (**VALUE** is obviously not clobbered by RESTORE.) Also note that RESTORE doesn't always have to return to a calling function. Some functions will actually SAVEUP a whole new computation, which RESTORE will happily "restore".
AEVAL: AEVAL is the EVAL function.
EVLIS: EVLIS exists to evaluate a function's arguments, and apply a function to those args. To avoid recursion, it SAVEUPs EVLIS-1. EVLIS-1 would just be regular old code after the function application if the code was written recursively. However, to avoid recursion, and the stack, it is a separate "continuation".
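If a shorter sketch helps, here is the core shape of a trampoline in Haskell rather than Lisp (all names invented; your interpreter's eval would return Step-like values to a driver loop instead of calling itself recursively):
data Step a = Done a | More (() -> Step a)    -- either finished, or "here is the next step to run"
trampoline :: Step a -> a                     -- the flat driver loop: the host stack never grows with call depth
trampoline (Done x) = x
trampoline (More k) = trampoline (k ())
isEven, isOdd :: Int -> Step Bool             -- mutual recursion, trampoline-style: hand the next step back to the driver
isEven 0 = Done True
isEven n = More (\() -> isOdd (n - 1))
isOdd 0 = Done False
isOdd n = More (\() -> isEven (n - 1))
main :: IO ()
main = print (trampoline (isEven 100001))     -- False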
I hope I've been of some help. I just wish my answer (and link) was shorter.
What you're looking for is called continuation-passing style. This style adds an additional item to each function call (you could think of it as a parameter, if you like) that designates the next bit of code to run (the continuation k can be thought of as a function that takes a single parameter). For example, you can rewrite your example in CPS like this:
(defun f (x y k)
  (a x y k))
(defun a (x y k)
  (+ x y k))
(f 1 2 print)
The implementation of + will compute the sum of x and y, then pass the result to k sort of like (k sum).
Your main interpreter loop then doesn't need to be recursive at all. It will, in a loop, apply each function application one after another, passing the continuation around.
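The same idea as a hedged Haskell sketch (addCPS and fCPS are invented names):
addCPS :: Int -> Int -> (Int -> r) -> r     -- every function takes an extra continuation argument k
addCPS x y k = k (x + y)                    -- and "returns" by calling k with its result
fCPS :: Int -> Int -> (Int -> r) -> r
fCPS x y k = addCPS x y k                   -- a tail call: the continuation is passed along unchanged
main :: IO ()
main = fCPS 1 2 print                       -- here the final continuation is print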
It takes a little bit of work to wrap your head around this. I recommend some reading materials such as the excellent SICP.
Tail recursion can be thought of as reusing, for the callee, the same stack frame that you are currently using for the caller. So you could just re-set the arguments and goto the beginning of the function.
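For example, in this hedged Haskell sketch GHC compiles the self tail call to a jump back to the top of the function rather than a real call (the bang patterns just keep the accumulator from piling up thunks):
{-# LANGUAGE BangPatterns #-}
sumTo :: Int -> Int -> Int
sumTo 0 !acc = acc
sumTo n !acc = sumTo (n - 1) (acc + n)   -- tail call: the frame is reused, the stack does not grow
main :: IO ()
main = print (sumTo 1000000 0)           -- 500000500000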

Resources