Calculating Big-O time and space complexity for functional languages - functional-programming

I'm thinking of using OCaml for technical interviews in the future. However, I'm not sure how to calculate time and space complexity for functional languages. What are the running times of basic higher-order functions like map, reduce, and filter, and how do I calculate time and space complexity in general?

The time complexity of persistent recursive implementations is easy to infer directly from the implementation. In this case, the recursive definition maps directly to the recurrence relation. Consider the List.map function as it is implemented in the Standard Library:
let rec map f = function
| [] -> []
| a::l -> f a :: map f l
Assuming f runs in constant time, the running time satisfies T(N) = 1 + T(N-1), so it is O(N).
The space complexity is not always that obvious, as it requires an understanding of tail calls and the ability to spot allocations. The general rule is that in OCaml, native integers, characters, and constructors without arguments do not allocate heap memory; everything else is allocated in the heap and boxed. Every non-tail call creates a stack frame and thus consumes stack space. In our case, the stack complexity of map is O(N), as it makes N non-tail calls. The heap complexity is also O(N), as the :: constructor is applied N times.
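The recurrence can be checked by instrumenting the same definition. Below is a minimal Python sketch, not OCaml. It models OCaml lists as nested (head, tail) tuples with None for [], and cons_list, to_pylist, rec_map and the stats counters are hypothetical helpers, not library functions. It counts calls (time), the deepest chain of pending calls (stack) and cons cells built (heap). For a 100-element list, each count comes out at about N.

```python
# Illustrative sketch; cons_list, to_pylist, rec_map and stats are hypothetical names.

def cons_list(items):
    """Build an OCaml-style list: None is [], (head, tail) is head :: tail."""
    lst = None
    for x in reversed(list(items)):
        lst = (x, lst)
    return lst

def to_pylist(lst):
    """Convert a cons list back to a Python list, iteratively."""
    out = []
    while lst is not None:
        head, lst = lst
        out.append(head)
    return out

def rec_map(f, lst, stats, depth=1):
    """Mirrors: let rec map f = function [] -> [] | a::l -> f a :: map f l"""
    stats["calls"] += 1                                  # one unit of T(N) = 1 + T(N-1)
    stats["max_depth"] = max(stats["max_depth"], depth)  # pending frames = stack use
    if lst is None:
        return None
    head, tail = lst
    rest = rec_map(f, tail, stats, depth + 1)  # non-tail call: the cons still has to happen
    stats["conses"] += 1                       # one new cell per element = heap use
    return (f(head), rest)

stats = {"calls": 0, "max_depth": 0, "conses": 0}
result = rec_map(lambda x: x * 2, cons_list(range(100)), stats)
print(stats)  # {'calls': 101, 'max_depth': 101, 'conses': 100}
```
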
Closures are another place where space is consumed. If a function has at least one free variable (i.e., a variable that is not bound by the function's parameters and is not in the global scope), then a functional object called a closure is created. It contains a pointer to the code and a pointer to each free variable (also called a captured variable).
For example, consider the following function:
let rec rsum = function
  | [] -> 0
  | x :: xs ->
    List.fold_left (fun acc y -> acc + x + y) 0 xs + rsum xs
For each element x of the list, this function sums x + y over every element y that follows it, and adds up these partial sums. The naive implementation above is O(N) in stack space (each step makes two non-tail calls, and the chain of pending rsum calls is N deep). It is O(N) in heap size, as each step constructs a new closure (unless the compiler is clever enough to optimize it away). Finally, it is O(N^2) in time: T(N) = (N-1) + T(N-1).
However, this raises a question: should we take into account the garbage produced by a computation? That is, values that were allocated during the computation but are no longer referenced by it, or values that are referenced only during a single step, as in this case. It all depends on the model of computation you choose. With a reference-counting GC, the example above is O(1) in heap size, since each closure can be freed as soon as its step finishes.
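The O(N^2) count can be checked with a Python sketch that mirrors rsum. It assumes the folded closure adds the captured x to each later element (acc + x + y). It uses an index i into a Python list in place of OCaml's x :: xs pattern, and rsum here and its stats counters are hypothetical. Counting closure applications gives (N-1) + (N-2) + ... + 0 = N(N-1)/2, with one closure allocated per step.

```python
# Illustrative sketch; rsum and the stats counters are hypothetical names.
from functools import reduce

def rsum(xs, stats, i=0):
    """Mirrors the OCaml rsum; index i plays the role of the x :: xs pattern."""
    if i == len(xs):
        return 0
    x = xs[i]

    def step(acc, y):                 # a closure: x is a free (captured) variable
        stats["applications"] += 1    # one unit of work per later element
        return acc + x + y

    stats["closures"] += 1            # a fresh closure is allocated on every step
    tail = (xs[j] for j in range(i + 1, len(xs)))  # walk the tail without copying it
    return reduce(step, tail, 0) + rsum(xs, stats, i + 1)  # neither call is a tail call

print(rsum([1, 2, 3], {"applications": 0, "closures": 0}))  # 12

stats = {"applications": 0, "closures": 0}
rsum(list(range(10)), stats)
print(stats)  # {'applications': 45, 'closures': 10}
```
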
I hope this gives some insight. Feel free to ask if something is not clear.

Related

Folds versus recursion in Erlang

According to Learn You Some Erlang:
Pretty much any function you can think of that reduces lists to 1 element can be expressed as a fold. [...]
This means fold is universal in the sense that you can implement pretty much any other recursive function on lists with a fold
My first thought when writing a function that takes a list and reduces it to 1 element is to use recursion.
What are the guidelines that should help me decide whether to use recursion or a fold?
Is this a stylistic consideration or are there other factors as well (performance, readability, etc.)?
I personally prefer recursion over fold in Erlang (unlike in other languages, e.g. Haskell). I don't find fold more readable than recursion. For example:
fsum(L) -> lists:foldl(fun(X,S) -> S+X end, 0, L).
or
fsum(L) ->
    F = fun(X,S) -> S+X end,
    lists:foldl(F, 0, L).
vs
rsum(L) -> rsum(L, 0).
rsum([], S) -> S;
rsum([H|T], S) -> rsum(T, H+S).
It looks like more code, but it is straightforward and idiomatic Erlang. Using fold requires less code, but the difference shrinks as the payload grows. Imagine we want to filter the odd values and map them to their squares.
lcfoo(L) -> [ X*X || X<-L, X band 1 =:= 1].
fmfoo(L) ->
    lists:map(fun(X) -> X*X end,
              lists:filter(fun(X) when X band 1 =:= 1 -> true; (_) -> false end, L)).
ffoo(L) -> lists:foldr(
               fun(X, A) when X band 1 =:= 1 -> [X*X|A];
                  (_, A) -> A end,
               [], L).
rfoo([]) -> [];
rfoo([H|T]) when H band 1 =:= 1 -> [H*H | rfoo(T)];
rfoo([_|T]) -> rfoo(T).
Here the list comprehension wins, the recursive function comes second, and the fold version is ugly and less readable.
Finally, it is not true that fold is faster than the recursive version, especially when compiled to native (HiPE) code.
Edit:
As requested, I've added a fold version with the fun bound to a variable:
ffoo2(L) ->
    F = fun(X, A) when X band 1 =:= 1 -> [X*X|A];
           (_, A) -> A
        end,
    lists:foldr(F, [], L).
I don't see how it is more readable than rfoo/1. In particular, I find the accumulator manipulation more complicated and less obvious than direct recursion. It is even longer.
Folds are usually both more readable (since everybody knows what they do) and faster, due to optimized implementations in the runtime (especially foldl, which should always be tail-recursive). It's worth noting that they are only a constant factor faster, not asymptotically faster. If you find yourself choosing one over the other for performance reasons, it's usually premature optimization.
Use standard recursion when you do fancy things, such as working on more than one element at a time, splitting into multiple processes and similar, and stick to higher-order functions (fold, map, ...) when they already do what you want.
I expect fold is implemented recursively, so you may want to try implementing some of the list functions, such as map or filter, with fold and see how useful it can be.
Otherwise, if you are doing this recursively you may be re-implementing fold, basically.
Learn to use what comes with the language, is my thought.
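A minimal Python sketch of that exercise writes map, filter, sum and the odd-squares job from the first answer as folds. foldl, foldr and the other function names are hypothetical helpers, not a library API. The callback takes (element, accumulator), matching the argument order of Erlang's lists:foldl/3 and lists:foldr/3.

```python
# Illustrative sketch; foldl, foldr and the *_via_fold helpers are hypothetical names.
from functools import reduce

def foldl(f, acc, xs):
    """Left fold; f(elem, acc) takes its arguments in the order lists:foldl uses."""
    return reduce(lambda a, x: f(x, a), xs, acc)

def foldr(f, acc, xs):
    """Right fold, written as a loop over the reversed list (no recursion)."""
    for x in reversed(xs):
        acc = f(x, acc)
    return acc

def sum_via_fold(xs):
    return foldl(lambda x, s: s + x, 0, xs)

def map_via_fold(f, xs):
    # [f(x)] + acc copies acc, so this is quadratic; Erlang's [F(X)|A] is O(1).
    return foldr(lambda x, acc: [f(x)] + acc, [], xs)

def filter_via_fold(pred, xs):
    return foldr(lambda x, acc: [x] + acc if pred(x) else acc, [], xs)

def odd_squares(xs):
    # The same job as ffoo/1: keep the odd values and square them.
    return foldr(lambda x, acc: [x * x] + acc if x & 1 else acc, [], xs)

print(odd_squares([1, 2, 3, 4, 5]))  # [1, 9, 25]
```
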
This discussion on foldl and recursion is interesting:
Easy way to break foldl
If you look at the first paragraph in this introduction (you may want to read all of it), he states it better than I did.
http://www.cs.nott.ac.uk/~gmh/fold.pdf
Old thread but my experience is that fold works slower than a recursive function.

Why does ocaml need both "let" and "let rec"? [duplicate]

Possible Duplicate:
Why are functions in Ocaml/F# not recursive by default?
OCaml uses let to define a new function, or let rec to define a function that is recursive. Why does it need both of these - couldn't we just use let for everything?
For example, to define a non-recursive successor function and recursive factorial in OCaml (actually, in the OCaml interpreter) I might write
let succ n = n + 1;;
let rec fact n =
  if n = 0 then 1 else n * fact (n-1);;
Whereas in Haskell (GHCI) I can write
let succ n = n + 1
let fact n = if n == 0 then 1 else n * fact (n-1)
Why does OCaml distinguish between let and let rec? Is it a performance issue, or something more subtle?
Well, having both available instead of only one gives the programmer tighter control on the scope. With let x = e1 in e2, the binding is only present in e2's environment, while with let rec x = e1 in e2 the binding is present in both e1 and e2's environments.
(Edit: I want to emphasize that it is not a performance issue, that makes no difference at all.)
Here are two situations where having this non-recursive binding is useful:
shadowing an existing definition with a refinement that uses the old binding. Something like: let f x = (let x = sanitize x in ...), where sanitize is a function that ensures the input has some desirable property (e.g., it normalizes a possibly non-normalized vector). This is very useful in some cases.
metaprogramming, for example macro writing. Imagine I want to define a macro SQUARE(foo) that desugars into let x = foo in x * x, for any expression foo. I need this binding to avoid code duplication in the output (I don't want SQUARE(factorial n) to compute factorial n twice). This is only hygienic if the let binding is not recursive, otherwise I couldn't write let x = 2 in SQUARE(x) and get a correct result.
So I claim it is very important indeed to have both the recursive and the non-recursive binding available. Now, the default behaviour of the let-binding is a matter of convention. You could say that let x = ... is recursive, and one must use let nonrec x = ... to get the non-recursive binder. Picking one default or the other is a matter of which programming style you want to favor, and there are good reasons to make either choice. Haskell suffers¹ from the unavailability of this non-recursive mode, and OCaml has exactly the same defect at the type level: type foo = ... is recursive, and at the time of writing there was no non-recursive option available (see this blog post; OCaml has since added type nonrec).
¹: when Google Code Search was available, I used it to search Haskell code for the pattern let x' = sanitize x in .... This is the usual workaround when non-recursive binding is not available, but it's less safe because you risk writing x instead of x' by mistake later on. In some cases you want to have both available, so picking a different name can be deliberate. A good idiom would be to use a longer variable name for the first x, such as unsanitized_x. Anyway, just searching literally for x' (and no other variable name) and x1 turned up a lot of results. Erlang (and all languages that try to make variable shadowing difficult: CoffeeScript, etc.) has even worse problems of this kind.
That said, the choice of having Haskell bindings recursive by default (rather than non-recursive) certainly makes sense, as it is consistent with lazy evaluation by default, which makes it really easy to build recursive values -- while strict-by-default languages have more restrictions on which recursive definitions make sense.

A Functional-Imperative Hybrid

Pure functional programming languages do not allow mutable data, but some computations are more naturally/intuitively expressed in an imperative way -- or an imperative version of an algorithm may be more efficient. I am aware that most functional languages are not pure, and let you assign/reassign variables and do imperative things but generally discourage it.
My question is, why not allow local state to be manipulated in local variables, but require that functions can only access their own locals and global constants (or just constants defined in an outer scope)? That way, all functions maintain referential transparency (they always give the same return value given the same arguments), but within a function, a computation can be expressed in imperative terms (like, say, a while loop).
IO and such could still be accomplished in the normal functional ways - through monads or passing around a "world" or "universe" token.
Good question. I think the answer is that mutable locals are of limited practical value but mutable heap-allocated data structures (primarily arrays) are enormously valuable and form the backbone of many important collections including efficient stacks, queues, sets and dictionaries. So restricting mutation to locals only would not give an otherwise purely functional language any of the important benefits of mutation.
On a related note, communicating sequential processes exchanging purely functional data structures offer many of the benefits of both worlds because the sequential processes can use mutation internally, e.g. mutable message queues are ~10x faster than any purely functional queues. For example, this is idiomatic in F# where the code in a MailboxProcessor uses mutable data structures but the messages communicated between them are immutable.
Sorting is a good case study in this context. Sedgewick's quicksort in C is short and simple and hundreds of times faster than the fastest purely functional sort in any language. The reason is that quicksort mutates the array in-place. Mutable locals would not help. Same story for most graph algorithms.
The short answer is: there are systems to allow what you want. For example, you can do it using the ST monad in Haskell (as referenced in the comments).
The ST monad approach is from Haskell's Control.Monad.ST. Code written in the ST monad can use references (STRef) where convenient. The nice part is that you can even use the results of the ST monad in pure code, as it is essentially self-contained (this is basically what you were wanting in the question).
This self-contained property is enforced by the type system. The ST monad carries a state-thread parameter, usually denoted by a type variable s. When you have such a computation, you'll have a monadic result, with a type like:
foo :: ST s Int
To actually turn this into a pure result, you have to use
runST :: (forall s . ST s a) -> a
You can read this type like: give me a computation where the s type parameter doesn't matter, and I can give you back the result of the computation, without the ST baggage. This basically keeps the mutable ST variables from escaping, as they would carry the s with them, which would be caught by the type system.
This can be used to good effect on pure structures that are implemented with underlying mutable structures (like the vector package). One can cast off immutability for a limited time to do something that mutates the underlying array in place. For example, one could combine the immutable Vector with an impure algorithms package to keep most of the performance characteristics of in-place sorting algorithms and still get purity.
In this case it would look something like:
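import Control.Monad.ST (runST)
import Data.Vector (Vector, thaw, freeze)
import Data.Vector.Algorithms.Intro (sort)  -- from the vector-algorithms package
pureSort :: Ord a => Vector a -> Vector a
pureSort vector = runST $ do
  mutableVector <- thaw vector
  sort mutableVector
  freeze mutableVector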
The thaw and freeze functions are linear-time copying, but this won't disrupt the overall O(n lg n) running time. You can even use unsafeFreeze to avoid another linear traversal, as the mutable vector isn't used again.
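Python has no ST monad and no type-level guarantee like runST's, but the shape of pureSort can be sketched. The function copies the input into a private mutable list (the analogue of thaw), sorts it in place, and returns an immutable tuple (the analogue of freeze). pure_sort is a hypothetical name. Nothing stops Python code from leaking the mutable list; in Haskell, the type of runST rules that out.

```python
# Illustrative sketch; pure_sort is a hypothetical name, not a library function.

def pure_sort(values):
    """Sort without touching the caller's data; mutation stays in a local copy."""
    scratch = list(values)   # analogue of thaw: a linear-time private copy
    scratch.sort()           # in-place sort on the private copy only
    return tuple(scratch)    # analogue of freeze: hand back an immutable tuple

data = [5, 3, 4, 1]
print(pure_sort(data), data)  # (1, 3, 4, 5) [5, 3, 4, 1]
```
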
"Pure functional programming languages do not allow mutable data" ... actually it does, you just simply have to recognize where it lies hidden and see it for what it is.
Mutability is where two things have the same name and mutually exclusive times of existence so that they may be treated as "the same thing at different times". But as every Zen philosopher knows, there is no such thing as "same thing at different times". Everything ceases to exist in an instant and is inherited by its successor in possibly changed form, in a (possibly) uncountably-infinite succession of instants.
In the lambda calculus, mutability thus takes the form illustrated by the following example: (λx (λx f(x)) (x+1)) (x+1), which may also be rendered as "let x = x + 1 in let x = x + 1 in f(x)" or just "x = x + 1, x = x + 1, f(x)" in a more C-like notation.
In other words, "name clash" of the "lambda calculus" is actually "update" of imperative programming, in disguise. They are one and the same - in the eyes of the Zen (who is always right).
So, let's refer to each instant and state of the variable as the Zen Scope of an object. One ordinary scope with a mutable object equals many Zen Scopes with constant, immutable objects that either get initialized if they are the first, or inherit from their predecessor if they are not.
When people say "mutability" they're misidentifying and confusing the issue. Mutability (as we've just seen here) is a complete red herring. What they actually mean (even unbeknownst to themselves) is infinite mutability; i.e. the kind which occurs in cyclic control flow structures. In other words, what they're actually referring to, as being specifically "imperative" and not "functional", is not mutability at all, but cyclic control flow structures along with the infinite nesting of Zen Scopes that this entails.
The key feature that lies absent in the lambda calculus is, thus, seen not as something that may be remedied by the inclusion of an overwrought and overthought "solution" like monads (though that doesn't exclude the possibility of it getting the job done) but as infinitary terms.
A control flow structure is the wrapping of an unwrapped (possibly infinite) decision tree structure. Branches may re-converge. In the corresponding unwrapped structure, they appear as replicated, but separate, branches or subtrees. Gotos are direct links to subtrees. A goto or branch that back-branches to an earlier part of a control flow structure (the very genesis of the "cycling" of a cyclic control flow structure) is a link to an identically-shaped copy of the entire structure being linked to. Corresponding to each structure is its Universally Unrolled decision tree.
More precisely, we may think of a control-flow structure as a statement that precedes an actual expression and conditions the value of that expression. The archetypical case in point is Landin's original one (in his 1960s paper, where he tried to lambda-ize imperative languages): let x = 1 in f(x). The "x = 1" part is the statement; the "f(x)" is the value being conditioned by the statement. In C-like form, we could write this as x = 1, f(x).
More generally, corresponding to each statement S and expression Q is an expression S[Q] which represents the result Q after S is applied. Thus, (x = 1)[f(x)] is just (λx f(x)) 1. The S wraps around the Q. If S contains cyclic control flow structures, the wrapping will be infinitary.
When Landin tried to work out this strategy, he hit a hard wall when he got to the while loop, went "Oops. Never mind.", and fell back on what became an overwrought and overthought solution, while this simple (and, in retrospect, obvious) answer eluded his notice.
A while loop "while (x < n) x = x + 1;", which has the "infinite mutability" mentioned above, may itself be treated as an infinitary wrapper, "if (x < n) { x = x + 1; if (x < n) { x = x + 1; if (x < n) { x = x + 1; ... } } }". So, when it wraps around an expression Q, the result is (in C-like notation) "x < n? (x = x + 1, x < n? (x = x + 1, x < n? (x = x + 1, ...): Q): Q): Q", which may be directly rendered in lambda form as "x < n? (λx x < n? (λx x < n? (λx ...) (x + 1): Q) (x + 1): Q) (x + 1): Q". This shows directly the connection between cyclicity and infinitariness.
This is an infinitary expression that, despite being infinite, has only a finite number of distinct subexpressions. Just as we can think of there being a Universally Unrolled form to this expression - which is similar to what's shown above (an infinite decision tree) - we can also think of there being a Maximally Rolled form, which could be obtained by labelling each of the distinct subexpressions and referring to the labels, instead. The key subexpressions would then be:
A: x < n? goto B: Q
B: x = x + 1, goto A
The subexpression labels here are "A:" and "B:", while the references to the subexpressions so labelled are "goto A" and "goto B", respectively. So, by magic, the very essence of imperativity emerges directly out of the infinitary lambda calculus, without any need to posit it separately or anew.
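The rolled form can be made concrete in Python (an illustration, not something from the original post). Each label becomes a function, each goto becomes a tail call, and the assignment x = x + 1 becomes a new binding passed as an argument. Q is assumed to be a function q of the final x, and rolled_loop and while_loop are hypothetical names. Python does not eliminate tail calls, so rolled_loop uses stack proportional to n - x.

```python
# Illustrative sketch; rolled_loop, while_loop and q are hypothetical names.

def rolled_loop(x, n, q):
    """Labels A and B of the rolled form as mutually tail-calling functions.

    A: x < n? goto B: Q
    B: x = x + 1, goto A
    """
    def label_a(x):
        return label_b(x) if x < n else q(x)   # the test; Q is the continuation q

    def label_b(x):
        return label_a(x + 1)                  # "x = x + 1" is a fresh binding of x

    return label_a(x)

def while_loop(x, n, q):
    """The same computation with a mutable variable and a while loop."""
    while x < n:
        x = x + 1
    return q(x)

print(rolled_loop(0, 5, lambda x: x * 10), while_loop(0, 5, lambda x: x * 10))  # 50 50
```
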
This way of viewing things applies even down to the level of binary files. Every interpretation of every byte (whether it be a part of an opcode of an instruction that starts 0, 1, 2 or more bytes back, or as part of a data structure) can be treated as being there in tandem, so that the binary file is a rolling up of a much larger universally unrolled structure whose physical byte code representation overlaps extensively with itself.
Thus the imperative programming paradigm emerges automatically out of the pure lambda calculus itself, when the calculus is extended to include infinitary terms. The control flow structure is directly embodied in the very structure of the infinitary expression itself, and thus requires no additional hacks (like Landin's, or later descendants like monads), as it's already there.
This synthesis of the imperative and functional paradigms arose in the late 1980s via USENET, but has not (yet) been published. Part of it was already implicit in the treatment (dating from around the same time) given to languages like Prolog-II, and in the much earlier treatment of cyclic recursive structures by infinitary expressions in Irene Guessarian's "Algebraic Semantics" (LNCS 99).
Now, earlier I said that the monad-based formulation might get you to the same place, or to an approximation thereof. I believe there is a universal representation theorem of some sort, which asserts that the infinitary formulation provides a purely syntactic representation, and that the semantics arising from the monad-based representation factors through it as "monad-based semantics" = "infinitary lambda calculus" + "semantics of infinitary languages".
Likewise, we may think of the "Q" expressions above as being continuations; so there may also be a universal representation theorem for continuation semantics, which similarly rolls this formulation back into the infinitary lambda calculus.
At this point, I've said nothing about non-rational infinitary terms (i.e. infinitary terms which possess an infinite number of distinct subterms and no finite Minimal Rolling) - particularly in relation to interprocedural control flow semantics. Rational terms suffice to account for loops and branches, and so provide a platform for intraprocedural control flow semantics; but not as much so for the call-return semantics that are the essential core element of interprocedural control flow semantics, if you consider subprograms to be directly represented as embellished, glorified macros.
There may be something similar to the Chomsky hierarchy for infinitary term languages; so that type 3 corresponds to rational terms, type 2 to "algebraic terms" (those that can be rolled up into a finite set of "goto" references and "macro" definitions), and type 0 for "transcendental terms". That is, for me, an unresolved loose end, as well.

How does one implement a "stackless" interpreted language?

I am making my own Lisp-like interpreted language, and I want to do tail call optimization. I want to free my interpreter from the C stack so I can manage my own jumps from function to function and my own stack magic to achieve TCO. (I really don't mean stackless per se, just the fact that calls don't add frames to the C stack. I would like to use a stack of my own that does not grow with tail calls). Like Stackless Python, and unlike Ruby or... standard Python I guess.
But, as my language is a Lisp derivative, all evaluation of s-expressions is currently done recursively (because it's the most obvious way I thought of to do this nonlinear, highly hierarchical process). I have an eval function, which calls a Lambda::apply function every time it encounters a function call. The apply function then calls eval to execute the body of the function, and so on. Mutual stack-hungry non-tail C recursion. The only iterative part I currently use is to eval a body of sequential s-expressions.
(defun f (x y)
  (a x y)) ; tail call! goto instead of call.
           ; (do not grow the stack, keep return addr)
(defun a (x y)
  (+ x y))
; ...
(print (f 1 2)) ; how does the return work here? how does it know it's supposed to
                ; return the value here to be used by print, and how does it know
                ; how to continue execution here??
So, how do I avoid using C recursion? Or can I use some kind of goto that jumps across C functions? longjmp, perhaps? I really don't know. Please bear with me; I am mostly self-taught in programming, from the Internet.
One solution is what is sometimes called "trampolined style". The trampoline is a top-level loop that dispatches to small functions that do some small step of computation before returning.
I've sat here for nearly half an hour trying to contrive a good, short example. Unfortunately, I have to do the unhelpful thing and send you to a link:
http://en.wikisource.org/wiki/Scheme:_An_Interpreter_for_Extended_Lambda_Calculus/Section_5
The paper is called "Scheme: An Interpreter for Extended Lambda Calculus", and section 5 implements a working Scheme interpreter in an outdated dialect of Lisp. The secret is in how they use the **CLINK** instead of a stack. The other globals are used to pass data around between the implementation functions, like the registers of a CPU. I would ignore **QUEUE**, **TICK**, and **PROCESS**, since those deal with threading and fake interrupts. **EVLIS** and **UNEVLIS** are, specifically, used to evaluate function arguments. Unevaluated args are stored in **UNEVLIS** until they are evaluated and put into **EVLIS**.
Functions to pay attention to, with some small notes:
MLOOP: MLOOP is the main loop of the interpreter, or "trampoline". Ignoring **TICK**, its only job is to call whatever function is in **PC**. Over and over and over.
SAVEUP: SAVEUP conses all the registers onto the **CLINK**, which is basically the same as when C saves the registers to the stack before a function call. The **CLINK** is actually a "continuation" for the interpreter. (A continuation is just the state of a computation. A saved stack frame is technically a continuation, too. Hence, some Lisps save the stack to the heap to implement call/cc.)
RESTORE: RESTORE restores the "registers" as they were saved in the **CLINK**. It's similar to restoring a stack frame in a stack-based language. So, it's basically "return", except some function has explicitly stuck the return value into **VALUE**. (**VALUE** is obviously not clobbered by RESTORE.) Also note that RESTORE doesn't always have to return to a calling function. Some functions will actually SAVEUP a whole new computation, which RESTORE will happily "restore".
AEVAL: AEVAL is the EVAL function.
EVLIS: EVLIS exists to evaluate a function's arguments, and apply a function to those args. To avoid recursion, it SAVEUPs EVLIS-1. EVLIS-1 would just be regular old code after the function application if the code was written recursively. However, to avoid recursion, and the stack, it is a separate "continuation".
I hope I've been of some help. I just wish my answer (and link) was shorter.
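A short example of the trampoline idea itself can be given as a minimal Python sketch. It is not the interpreter from the paper. Bounce, trampoline, is_even and is_odd are hypothetical names chosen for illustration. Each function returns a description of its next call instead of making it, and the top-level loop keeps dispatching until a plain value comes back, so the Python stack never grows.

```python
# Illustrative sketch; Bounce, trampoline, is_even and is_odd are hypothetical names.

class Bounce:
    """A pending step: call fn(*args) next, instead of recursing right now."""
    def __init__(self, fn, *args):
        self.fn = fn
        self.args = args

def trampoline(result):
    """The top-level loop: keep running steps until a final value appears."""
    while isinstance(result, Bounce):
        result = result.fn(*result.args)
    return result

def is_even(n):
    # Return a description of the call to is_odd rather than calling it,
    # so each step finishes before the next one starts.
    return True if n == 0 else Bounce(is_odd, n - 1)

def is_odd(n):
    return False if n == 0 else Bounce(is_even, n - 1)

print(trampoline(is_even(100_000)))  # True, far past Python's default recursion limit
```
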
What you're looking for is called continuation-passing style. This style adds an additional item to each function call (you could think of it as a parameter, if you like), that designates the next bit of code to run (the continuation k can be thought of as a function that takes a single parameter). For example you can rewrite your example in CPS like this:
(defun f (x y k)
  (a x y k))
(defun a (x y k)
  (+ x y k))
(f 1 2 print)
The implementation of + will compute the sum of x and y, then pass the result to k sort of like (k sum).
Your main interpreter loop then doesn't need to be recursive at all. It will, in a loop, apply each function application one after another, passing the continuation around.
It takes a little bit of work to wrap your head around this. I recommend some reading materials such as the excellent SICP.
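Here is a minimal Python sketch of the f/a example above in CPS. One detail goes beyond the answer and is an assumption: each function returns its next application as a (function, arguments) pair instead of performing it, so the driver loop run never nests calls. collect is a hypothetical stand-in for print that records the final value.

```python
# Illustrative sketch; cps_plus, cps_a, cps_f, run and collect are hypothetical names.

def cps_plus(x, y, k):
    # The primitive computes the sum, then "returns" by handing it to k.
    return (k, (x + y,))

def cps_a(x, y, k):
    return (cps_plus, (x, y, k))   # tail call: pass our continuation along unchanged

def cps_f(x, y, k):
    return (cps_a, (x, y, k))

def run(fn, *args):
    """Driver loop: apply one function at a time; no call waits on another."""
    step = (fn, args)
    while step is not None:
        fn, args = step
        step = fn(*args)

results = []

def collect(value):
    # Final continuation: record the value and return None to stop the loop.
    results.append(value)
    return None

run(cps_f, 1, 2, collect)
print(results)  # [3]
```
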
Tail recursion can be thought of as reusing, for the callee, the same stack frame that you are currently using for the caller. So you could just reset the arguments and jump back to the beginning of the function.

Are there problems that cannot be written using tail recursion?

Tail recursion is an important performance optimisation strategy in functional languages because it allows recursive calls to consume constant stack (rather than O(n)).
Are there any problems that simply cannot be written in a tail-recursive style, or is it always possible to convert a naively-recursive function into a tail-recursive one?
If so, one day might functional compilers and interpreters be intelligent enough to perform the conversion automatically?
Yes, actually you can take some code and convert every function call—and every return—into a tail call. What you end up with is called continuation-passing style, or CPS.
For example, here's a function containing two recursive calls:
(define (count-tree t)
  (if (pair? t)
      (+ (count-tree (car t)) (count-tree (cdr t)))
      1))
And here's how it would look if you converted this function to continuation-passing style:
(define (count-tree-cps t ctn)
  (if (pair? t)
      (count-tree-cps (car t)
                      (lambda (L) (count-tree-cps (cdr t)
                                                  (lambda (R) (ctn (+ L R))))))
      (ctn 1)))
The extra argument, ctn, is a procedure which count-tree-cps tail-calls instead of returning. (sdcvvc's answer says that you can't do everything in O(1) space, and that is correct; here each continuation is a closure which takes up some memory.)
I didn't transform the calls to car or cdr or + into tail-calls. That could be done as well, but I assume those leaf calls would actually be inlined.
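The same transformation can be sketched in Python. The sketch assumes Scheme pairs are modeled as 2-tuples (car is t[0], cdr is t[1]) and anything else is a leaf; is_pair, count_tree and count_tree_cps are hypothetical names. Python does not eliminate tail calls, so this only shows the shape of the CPS code, not the stack savings.

```python
# Illustrative sketch; is_pair, count_tree and count_tree_cps are hypothetical names.

def is_pair(t):
    """A Scheme pair is modeled as a 2-tuple (car, cdr); anything else is a leaf."""
    return isinstance(t, tuple) and len(t) == 2

def count_tree(t):
    # Direct style: two non-tail recursive calls, then the addition.
    if is_pair(t):
        return count_tree(t[0]) + count_tree(t[1])
    return 1

def count_tree_cps(t, ctn):
    # CPS: every call is a tail call; the pending "+" lives inside the closures.
    if is_pair(t):
        return count_tree_cps(
            t[0],
            lambda left: count_tree_cps(t[1], lambda right: ctn(left + right)))
    return ctn(1)

tree = ((1, 2), (3, (4, 5)))
print(count_tree(tree), count_tree_cps(tree, lambda n: n))  # 5 5
```
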
Now for the fun part. Chicken Scheme actually does this conversion on all code it compiles. Procedures compiled by Chicken never return. There's a classic paper explaining why Chicken Scheme does this, written in 1994 before Chicken was implemented: CONS should not cons its arguments, Part II: Cheney on the M.T.A.
Surprisingly enough, continuation-passing style is fairly common in JavaScript. You can use it to do long-running computation, avoiding the browser's "slow script" popup. And it's attractive for asynchronous APIs. jQuery.get (a simple wrapper around XMLHttpRequest) is clearly in continuation-passing style; the last argument is a function.
It's true but not useful to observe that any collection of mutually recursive functions can be turned into a tail-recursive function. This observation is on a par with the old chestnut from the 1960s that control-flow constructs could be eliminated because every program could be written as a loop with a case statement nested inside.
What's useful to know is that many functions which are not obviously tail-recursive can be converted to tail-recursive form by the addition of accumulating parameters. (An extreme version of this transformation is the transformation to continuation-passing style (CPS), but most programmers find the output of the CPS transform difficult to read.)
Here's an example of a function that is "recursive" (actually it's just iterating) but not tail-recursive:
factorial n = if n == 0 then 1 else n * factorial (n-1)
In this case the multiply happens after the recursive call.
We can create a version that is tail-recursive by putting the product in an accumulating parameter:
factorial n = f n 1
  where f n product = if n == 0 then product else f (n-1) (n * product)
The inner function f is tail-recursive and compiles into a tight loop.
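The same accumulator transformation can be sketched in Python for illustration; the function names are hypothetical. Python does not perform tail-call elimination, so factorial_acc still uses one stack frame per step. factorial_loop writes out by hand the loop that a compiler with tail-call elimination would effectively produce: rebind the parameters and jump back to the top.

```python
# Illustrative sketch; factorial, factorial_acc and factorial_loop are hypothetical names.

def factorial(n):
    # Not tail-recursive: the multiply happens after the recursive call returns.
    return 1 if n == 0 else n * factorial(n - 1)

def factorial_acc(n, product=1):
    # Tail-recursive: the pending multiply is folded into the accumulator.
    return product if n == 0 else factorial_acc(n - 1, n * product)

def factorial_loop(n):
    # factorial_acc with the tail call turned into parameter rebinding plus a jump.
    product = 1
    while n != 0:
        n, product = n - 1, n * product
    return product

print(factorial(5), factorial_acc(5), factorial_loop(5))  # 120 120 120
```
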
I find the following distinctions useful:
In an iterative or recursive program, you solve a problem of size n by first solving one subproblem of size n-1. Computing the factorial function falls into this category, and it can be done either iteratively or recursively. (This idea generalizes, e.g., to the Fibonacci function, where you need both n-1 and n-2 to solve n.)
In a recursive program, you solve a problem of size n by first solving two subproblems of size n/2. Or, more generally, you solve a problem of size n by first solving a subproblem of size k and one of size n-k, where 1 < k < n. Quicksort and mergesort are two examples of this kind of problem: they can easily be programmed recursively but are not so easy to program iteratively or using only tail recursion. (You essentially have to simulate recursion using an explicit stack.)
In dynamic programming, you solve a problem of size n by first solving all subproblems of all sizes k, where k < n. Finding the shortest route from one point to another on the London Underground is an example of this kind of problem. (The London Underground is a multiply-connected graph, and you solve the problem by first finding all points for which the shortest path is 1 stop, then those for which it is 2 stops, and so on.)
Only the first kind of program has a simple transformation into tail-recursive form.
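The second category can be illustrated with a minimal Python sketch of quicksort in which the recursion is replaced by an explicit stack of pending (lo, hi) ranges. It uses Lomuto partitioning for brevity (a choice made for this sketch, not a specific textbook version), and quicksort_explicit_stack is a hypothetical name.

```python
# Illustrative sketch; quicksort_explicit_stack is a hypothetical name.

def quicksort_explicit_stack(a):
    """Sort list a in place; pending subarrays live on an explicit stack."""
    stack = [(0, len(a) - 1)]           # each entry is an inclusive (lo, hi) range
    while stack:
        lo, hi = stack.pop()
        if lo >= hi:
            continue
        pivot = a[hi]                   # Lomuto partition around the last element
        i = lo
        for j in range(lo, hi):
            if a[j] < pivot:
                a[i], a[j] = a[j], a[i]
                i += 1
        a[i], a[hi] = a[hi], a[i]       # the pivot is now in its final position i
        # The two recursive calls of ordinary quicksort become two pushes.
        stack.append((lo, i - 1))
        stack.append((i + 1, hi))
    return a

print(quicksort_explicit_stack([5, 3, 8, 1, 9, 2, 7]))  # [1, 2, 3, 5, 7, 8, 9]
```
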
Any recursive algorithm can be rewritten as an iterative algorithm (perhaps requiring a stack or list) and iterative algorithms can always be rewritten as tail-recursive algorithms, so I think it's true that any recursive solution can somehow be converted to a tail-recursive solution.
(In comments, Pascal Cuoq points out that any algorithm can be converted to continuation-passing style.)
Note that just because something is tail-recursive doesn't mean that its memory usage is constant. It just means that the call-return stack doesn't grow.
You can't do everything in O(1) space (space hierarchy theorem). If you insist on using tail recursion, then you can store the call stack as one of the arguments. Obviously this doesn't change anything; somewhere internally, there is a call stack, you're simply making it explicitly visible.
Such conversion will not decrease space complexity.
As Pascal Cuoq commented, another way is to use CPS; all calls are tail recursive then.
I don't think something like tak could be implemented using only tail calls (without allowing continuations).
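tak is Takeuchi's function, a classic recursion benchmark that appears in Gabriel's Lisp benchmarks. Below is a Python rendering of the common variant that returns z; the name and language are only for illustration. The three inner calls are arguments to the outer call, so their results are needed before the outer tail call can happen.

```python
# Illustrative sketch of the common tak variant (returns z); written in Python.

def tak(x, y, z):
    if not y < x:
        return z
    # The three inner calls are not in tail position: their results are the
    # arguments of the outer call, which is the only tail call here.
    return tak(tak(x - 1, y, z), tak(y - 1, z, x), tak(z - 1, x, y))

print(tak(2, 1, 0))  # 1
```
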
