this is a purely academic question:
I have recently been working with languages that use tail recursion optimization. For practice I wrote two recursive implementations of sum functions in R, one of them being tail recursive. I quickly realized there is no tail recursion optimization in R. I can live with that.
However, I also noticed a different level of allowed depth when using the local helper function for the tail recursion.
Here is the code:
## Recursive
sum <- function (i, end, fun){
if (i>=end) 0
else fun(i) + sum(i+1, end, fun)
}
## Tail recursive
sum_tail <- function (i, end, fun){
sum_helper<- function(i, acc){
if (i>=end) acc
else sum_helper(i+1, acc+fun(i))
}
sum_helper(i, 0)
}
## Simple example
harmonic <- function(k){
return(1/(k))
}
print(sum(1, 1200, harmonic)) # <- This works fine, but is close to the limit
# print(sum_tail(1, 1200, harmonic)) <- This will crash
print(sum_tail(1, 996, harmonic)) # <- This is the deepest allowed
I am fairly intrigued. Can someone explain this behavior or point me towards a document explaining how the allowed recursion depth is calculated?
I'm not sure of R's internal implementation of the call stack, but it's pretty obvious from here that there is a maximum stack depth. (Many languages have this for various reasons, mostly related to memory and detecting infinite recursion.) You can set it with options(), and the default setting seems to depends on the platform -- on my machine, I can do print(sum_tail(1, 996, harmonic)) without difficulty.
Sidebar: you really shouldn't name your naive implementation sum() because you wind up shadowing a builtin. I know you're just playing with recursion here, but you should also generally avoid doing your own implementation of sum() -- it's not provided just as a convenience function but also because it's non trivial to implement a numerically correct version of sum() with floating point.
In your naive implementation, the call to fun() returns before the recursive call -- this means that each recursive call increases the depth of the call stack by exactly 1. In the other case, you've got an additional function call that's waiting to be evaluated. For more details, you should look into how R handles closures and how lazy / eager evaluation in R is handled. If I recall correctly, R uses environments (roughly, R's notion of scope, and deeply related to closures) to wrap arguments in certain situations and delay their evaluation, thus effectively using lazy evaluation. There's a lot of information on R internals available online, see here for a quick overview of argument evaluation. I'm not sure how accurate I am on the details, but it seems that the arguments to the tail-call are themselves getting placed on the call stack, thus increasing the depth of the call-stack by more than 1.
Sidebar the Second: I don't recall well enough how R implements this, and I know placing helper functions in the body is common practice, but placing a helper function definition in the recursive call could lead to each recursive call defining the helper function anew. This could interact in various ways with the way environments and closures are handled, but I'm not sure.
The functions traceback() and trace() could be useful in exploring the call behavior if you're curious about more of the details.
I am making my own Lisp-like interpreted language, and I want to do tail call optimization. I want to free my interpreter from the C stack so I can manage my own jumps from function to function and my own stack magic to achieve TCO. (I really don't mean stackless per se, just the fact that calls don't add frames to the C stack. I would like to use a stack of my own that does not grow with tail calls). Like Stackless Python, and unlike Ruby or... standard Python I guess.
But, as my language is a Lisp derivative, all evaluation of s-expressions is currently done recursively (because it's the most obvious way I thought of to do this nonlinear, highly hierarchical process). I have an eval function, which calls a Lambda::apply function every time it encounters a function call. The apply function then calls eval to execute the body of the function, and so on. Mutual stack-hungry non-tail C recursion. The only iterative part I currently use is to eval a body of sequential s-expressions.
(defun f (x y)
(a x y)) ; tail call! goto instead of call.
; (do not grow the stack, keep return addr)
(defun a (x y)
(+ x y))
; ...
(print (f 1 2)) ; how does the return work here? how does it know it's supposed to
; return the value here to be used by print, and how does it know
; how to continue execution here??
So, how do I avoid using C recursion? Or can I use some kind of goto that jumps across c functions? longjmp, perhaps? I really don't know. Please bear with me, I am mostly self- (Internet- ) taught in programming.
One solution is what is sometimes called "trampolined style". The trampoline is a top-level loop that dispatches to small functions that do some small step of computation before returning.
I've sat here for nearly half an hour trying to contrive a good, short example. Unfortunately, I have to do the unhelpful thing and send you to a link:
http://en.wikisource.org/wiki/Scheme:_An_Interpreter_for_Extended_Lambda_Calculus/Section_5
The paper is called "Scheme: An Interpreter for Extended Lambda Calculus", and section 5 implements a working scheme interpreter in an outdated dialect of Lisp. The secret is in how they use the **CLINK** instead of a stack. The other globals are used to pass data around between the implementation functions like the registers of a CPU. I would ignore **QUEUE**, **TICK**, and **PROCESS**, since those deal with threading and fake interrupts. **EVLIS** and **UNEVLIS** are, specifically, used to evaluate function arguments. Unevaluated args are stored in **UNEVLIS**, until they are evaluated and out into **EVLIS**.
Functions to pay attention to, with some small notes:
MLOOP: MLOOP is the main loop of the interpreter, or "trampoline". Ignoring **TICK**, its only job is to call whatever function is in **PC**. Over and over and over.
SAVEUP: SAVEUP conses all the registers onto the **CLINK**, which is basically the same as when C saves the registers to the stack before a function call. The **CLINK** is actually a "continuation" for the interpreter. (A continuation is just the state of a computation. A saved stack frame is technically continuation, too. Hence, some Lisps save the stack to the heap to implement call/cc.)
RESTORE: RESTORE restores the "registers" as they were saved in the **CLINK**. It's similar to restoring a stack frame in a stack-based language. So, it's basically "return", except some function has explicitly stuck the return value into **VALUE**. (**VALUE** is obviously not clobbered by RESTORE.) Also note that RESTORE doesn't always have to return to a calling function. Some functions will actually SAVEUP a whole new computation, which RESTORE will happily "restore".
AEVAL: AEVAL is the EVAL function.
EVLIS: EVLIS exists to evaluate a function's arguments, and apply a function to those args. To avoid recursion, it SAVEUPs EVLIS-1. EVLIS-1 would just be regular old code after the function application if the code was written recursively. However, to avoid recursion, and the stack, it is a separate "continuation".
I hope I've been of some help. I just wish my answer (and link) was shorter.
What you're looking for is called continuation-passing style. This style adds an additional item to each function call (you could think of it as a parameter, if you like), that designates the next bit of code to run (the continuation k can be thought of as a function that takes a single parameter). For example you can rewrite your example in CPS like this:
(defun f (x y k)
(a x y k))
(defun a (x y k)
(+ x y k))
(f 1 2 print)
The implementation of + will compute the sum of x and y, then pass the result to k sort of like (k sum).
Your main interpreter loop then doesn't need to be recursive at all. It will, in a loop, apply each function application one after another, passing the continuation around.
It takes a little bit of work to wrap your head around this. I recommend some reading materials such as the excellent SICP.
Tail recursion can be thought of as reusing for the callee the same stack frame that you are currently using for the caller. So you could just re-set the arguments and goto to the beginning of the function.
Let's say I have this code here:
do_recv_loop(State) ->
receive
{do,Stuff} ->
case Stuff of
one_thing ->
do_one_thing(),
do_recv_loop(State);
another_thing ->
do_another_thing(),
do_recv_loop(State);
_ ->
im_dead_now
end
{die} -> im_dead_now;
_ -> do_recv_loop(State)
end.
Now, in theory this is tail-recursive, as none of the three calls to do_recv_loop require anything to be returned. But will erlang recognize that this is tail recursive and optimize appropriately? I'm worried that the nested structure might make it not able to recognize it.
Yes, it will. Erlang is required to optimize tail calls, and this is clearly a tail call since nothing happens after the function is called.
I used to wish there were a tailcall keyword in Erlang so the compiler could warn me about invalid uses, but then I got used to it.
Yes, it is tail recursive. The main gotcha to be aware of is if you are wrapped inside exceptions. In that case, sometimes the exception needs to live on the stack and that will make something that looks tail-recursive into something deceptively not so.
The tail-call optimization is applicable if the call is in tail-position. Tail position is the "last thing before the function will return". Note that in
fact(0) -> 1;
fact(N) -> N * fact(N-1).
the recursive call to fact is not in tail position because after fact(N-1) is calculated, you need to run the continuation N * _ (i.e., multiply by N).
This I think is relevant because you are asking about how you know if your recursive function is optimized by the compiler. Since you aren't using lists:reverse/1 the below might not apply but for someone else with the exact same question but with a different code example it might be very relevant.
From the The Eight Myths of Erlang Performance in the Erlang Efficiency Guide
In R12B and later releases, there is
an optimization that will in many
cases reduces the number of words used
on the stack in body-recursive calls,
so that a body-recursive list function
and tail-recursive function that calls
lists:reverse/1 at the end will use
exactly the same amount of memory.
http://www.erlang.org/doc/efficiency_guide/myths.html#id58884
I think the take away message is that you may have to measure in some cases to see what will be best.
I'm pretty new to Erlang but from what I've gathered, the rule seems to be that in order to be tail-recursive, the function has to do one of two things in any given logical branch:
not make a recursive call
return the value of the recursive call and do nothing else after it
That recursive call can be nested into as many if, case, or receive calls as you want as long as nothing actually happens after it.
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 9 years ago.
Improve this question
What are real-world problems where a recursive approach is the natural solution besides depth-first search (DFS)?
(I don't consider Tower of Hanoi, Fibonacci number, or factorial real-world problems. They are a bit contrived in my mind.)
A real world example of recursion
How about anything involving a directory structure in the file system. Recursively finding files, deleting files, creating directories, etc.
Here is a Java implementation that recursively prints out the content of a directory and its sub-directories.
import java.io.File;
public class DirectoryContentAnalyserOne implements DirectoryContentAnalyser {
private static StringBuilder indentation = new StringBuilder();
public static void main (String args [] ){
// Here you pass the path to the directory to be scanned
getDirectoryContent("C:\\DirOne\\DirTwo\\AndSoOn");
}
private static void getDirectoryContent(String filePath) {
File currentDirOrFile = new File(filePath);
if ( !currentDirOrFile.exists() ){
return;
}
else if ( currentDirOrFile.isFile() ){
System.out.println(indentation + currentDirOrFile.getName());
return;
}
else{
System.out.println("\n" + indentation + "|_" +currentDirOrFile.getName());
indentation.append(" ");
for ( String currentFileOrDirName : currentDirOrFile.list()){
getPrivateDirectoryContent(currentDirOrFile + "\\" + currentFileOrDirName);
}
if (indentation.length() - 3 > 3 ){
indentation.delete(indentation.length() - 3, indentation.length());
}
}
}
}
There are lots of mathy examples here, but you wanted a real world example, so with a bit of thinking, this is possibly the best I can offer:
You find a person who has contracted a given contageous infection, which is non fatal, and fixes itself quickly( Type A) , Except for one in 5 people ( We'll call these type B ) who become permanently infected with it and shows no symptoms and merely acts a spreader.
This creates quite annoying waves of havoc when ever type B infects a multitude of type A.
Your task is to track down all the type Bs and immunise them to stop the backbone of the disease. Unfortunately tho, you cant administer a nationwide cure to all, because the people who are typeAs are also deadly allergic to the cure that works for type B.
The way you would do this, would be social discovery, given an infected person(Type A), choose all their contacts in the last week, marking each contact on a heap. When you test a person is infected, add them to the "follow up" queue. When a person is a type B, add them to the "follow up" at the head ( because you want to stop this fast ).
After processing a given person, select the person from the front of the queue and apply immunization if needed. Get all their contacts previously unvisited, and then test to see if they're infected.
Repeat until the queue of infected people becomes 0, and then wait for another outbreak..
( Ok, this is a bit iterative, but its an iterative way of solving a recursive problem, in this case, breadth first traversal of a population base trying to discover likely paths to problems, and besides, iterative solutions are often faster and more effective, and I compulsively remove recursion everywhere so much its become instinctive. .... dammit! )
Quicksort, merge sort, and most other N-log N sorts.
Matt Dillard's example is good. More generally, any walking of a tree can generally be handled by recursion very easily. For instance, compiling parse trees, walking over XML or HTML, etc.
Recursion is often used in implementations of the Backtracking algorithm. For a "real-world" application of this, how about a Sudoku solver?
Recursion is appropriate whenever a problem can be solved by dividing it into sub-problems, that can use the same algorithm for solving them. Algorithms on trees and sorted lists are a natural fit. Many problems in computational geometry (and 3D games) can be solved recursively using binary space partitioning (BSP) trees, fat subdivisions, or other ways of dividing the world into sub-parts.
Recursion is also appropriate when you are trying to guarantee the correctness of an algorithm. Given a function that takes immutable inputs and returns a result that is a combination of recursive and non-recursive calls on the inputs, it's usually easy to prove the function is correct (or not) using mathematical induction. It's often intractable to do this with an iterative function or with inputs that may mutate. This can be useful when dealing with financial calculations and other applications where correctness is very important.
Surely that many compilers out there use recursion heavily. Computer languages are inherently recursive themselves (i.e., you can embed 'if' statements inside other 'if' statements, etc.).
Disabling/setting read-only for all children controls in a container control. I needed to do this because some of the children controls were containers themselves.
public static void SetReadOnly(Control ctrl, bool readOnly)
{
//set the control read only
SetControlReadOnly(ctrl, readOnly);
if (ctrl.Controls != null && ctrl.Controls.Count > 0)
{
//recursively loop through all child controls
foreach (Control c in ctrl.Controls)
SetReadOnly(c, readOnly);
}
}
People often sort stacks of documents using a recursive method. For example, imagine you are sorting 100 documents with names on them. First place documents into piles by the first letter, then sort each pile.
Looking up words in the dictionary is often performed by a binary-search-like technique, which is recursive.
In organizations, bosses often give commands to department heads, who in turn give commands to managers, and so on.
Famous Eval/Apply cycle from SICP
(source: mit.edu)
Here is the definition of eval:
(define (eval exp env)
(cond ((self-evaluating? exp) exp)
((variable? exp) (lookup-variable-value exp env))
((quoted? exp) (text-of-quotation exp))
((assignment? exp) (eval-assignment exp env))
((definition? exp) (eval-definition exp env))
((if? exp) (eval-if exp env))
((lambda? exp)
(make-procedure (lambda-parameters exp)
(lambda-body exp)
env))
((begin? exp)
(eval-sequence (begin-actions exp) env))
((cond? exp) (eval (cond->if exp) env))
((application? exp)
(apply (eval (operator exp) env)
(list-of-values (operands exp) env)))
(else
(error "Unknown expression type - EVAL" exp))))
Here is the definition of apply:
(define (apply procedure arguments)
(cond ((primitive-procedure? procedure)
(apply-primitive-procedure procedure arguments))
((compound-procedure? procedure)
(eval-sequence
(procedure-body procedure)
(extend-environment
(procedure-parameters procedure)
arguments
(procedure-environment procedure))))
(else
(error
"Unknown procedure type - APPLY" procedure))))
Here is the definition of eval-sequence:
(define (eval-sequence exps env)
(cond ((last-exp? exps) (eval (first-exp exps) env))
(else (eval (first-exp exps) env)
(eval-sequence (rest-exps exps) env))))
eval -> apply -> eval-sequence -> eval
Recursion is used in things like BSP trees for collision detection in game development (and other similar areas).
Real world requirement I got recently:
Requirement A: Implement this feature after thoroughly understanding Requirement A.
Recursion is applied to problems (situations) where you can break it up (reduce it) into smaller parts, and each part(s) looks similar to the original problem.
Good examples of where things that contain smaller parts similar to itself are:
tree structure (a branch is like a tree)
lists (part of a list is still a list)
containers (Russian dolls)
sequences (part of a sequence looks like the next)
groups of objects (a subgroup is a still a group of objects)
Recursion is a technique to keep breaking the problem down into smaller and smaller pieces, until one of those pieces become small enough to be a piece-of-cake. Of course, after you break them up, you then have to "stitch" the results back together in the right order to form a total solution of your original problem.
Some recursive sorting algorithms, tree-walking algorithms, map/reduce algorithms, divide-and-conquer are all examples of this technique.
In computer programming, most stack-based call-return type languages already have the capabilities built in for recursion: i.e.
break the problem down into smaller pieces ==> call itself on a smaller subset of the original data),
keep track on how the pieces are divided ==> call stack,
stitch the results back ==> stack-based return
Feedback loops in a hierarchical organization.
Top boss tells top executives to collect feedback from everyone in the company.
Each executive gathers his/her direct reports and tells them to gather feedback from their direct reports.
And on down the line.
People with no direct reports -- the leaf nodes in the tree -- give their feedback.
The feedback travels back up the tree with each manager adding his/her own feedback.
Eventually all the feedback makes it back up to the top boss.
This is the natural solution because the recursive method allows filtering at each level -- the collating of duplicates and the removal of offensive feedback. The top boss could send a global email and have each employee report feedback directly back to him/her, but there are the "you can't handle the truth" and the "you're fired" problems, so recursion works best here.
Parsers and compilers may be written in a recursive-descent method. Not the best way to do it, as tools like lex/yacc generate faster and more efficient parsers, but conceptually simple and easy to implement, so they remain common.
I have a system that uses pure tail recursion in a few places to simulate a state machine.
Some great examples of recursion are found in functional programming languages. In functional programming languages (Erlang, Haskell, ML/OCaml/F#, etc.), it's very common to have any list processing use recursion.
When dealing with lists in typical imperative OOP-style languages, it's very common to see lists implemented as linked lists ([item1 -> item2 -> item3 -> item4]). However, in some functional programming languages, you find that lists themselves are implemented recursively, where the "head" of the list points to the first item in the list, and the "tail" points to a list containing the rest of the items ([item1 -> [item2 -> [item3 -> [item4 -> []]]]]). It's pretty creative in my opinion.
This handling of lists, when combined with pattern matching, is VERY powerful. Let's say I want to sum a list of numbers:
let rec Sum numbers =
match numbers with
| [] -> 0
| head::tail -> head + Sum tail
This essentially says "if we were called with an empty list, return 0" (allowing us to break the recursion), else return the value of head + the value of Sum called with the remaining items (hence, our recursion).
For example, I might have a list of URLs, I think break apart all the URLs each URL links to, and then I reduce the total number of links to/from all URLs to generate "values" for a page (an approach that Google takes with PageRank and that you can find defined in the original MapReduce paper). You can do this to generate word counts in a document also. And many, many, many other things as well.
You can extend this functional pattern to any type of MapReduce code where you can taking a list of something, transforming it, and returning something else (whether another list, or some zip command on the list).
XML, or traversing anything that is a tree. Although, to be honest, I pretty much never use recursion in my job.
A "real-world" problem solved by recursion would be nesting dolls. Your function is OpenDoll().
Given a stack of them, you would recursilvey open the dolls, calling OpenDoll() if you will, until you've reached the inner-most doll.
Parsing an XML file.
Efficient search in multi-dimensional spaces. E. g. quad-trees in 2D, oct-trees in 3D, kd-trees, etc.
Hierarchical clustering.
Come to think of it, traversing any hierarchical structure naturally lends itself to recursion.
Template metaprogramming in C++, where there are no loops and recursion is the only way.
Suppose you are building a CMS for a website, where your pages are in a tree structure, with say the root being the home-page.
Suppose also your {user|client|customer|boss} requests that you place a breadcrumb trail on every page to show where you are in the tree.
For any given page n, you'll may want to walk up to the parent of n, and its parent, and so on, recursively to build a list of nodes back up to the root of page tree.
Of course, you're hitting the db several times per page in that example, so you may want to use some SQL aliasing where you look up page-table as a, and page-table again as b, and join a.id with b.parent so you make the database do the recursive joins. It's been a while, so my syntax is probably not helpful.
Then again, you may just want to only calculate this once and store it with the page record, only updating it if you move the page. That'd probably be more efficient.
Anyway, that's my $.02
You have an organization tree that is N levels deep. Several of the nodes are checked, and you want to expand out to only those nodes that have been checked.
This is something that I actually coded.
Its nice and easy with recursion.
In my job we have a system with a generic data structure that can be described as a tree. That means that recursion is a very effective technique to work with the data.
Solving it without recursion would require a lot of unnecessary code. The problem with recursion is that it is not easy to follow what happens. You really have to concentrate when following the flow of execution. But when it works the code is elegant and effective.
Calculations for finance/physics, such as compound averages.
Parsing a tree of controls in Windows Forms or WebForms (.NET Windows Forms / ASP.NET).
The best example I know is quicksort, it is a lot simpler with recursion. Take a look at:
shop.oreilly.com/product/9780596510046.do
www.amazon.com/Beautiful-Code-Leading-Programmers-Practice/dp/0596510047
(Click on the first subtitle under the chapter 3: "The most beautiful code I ever wrote").
Phone and cable companies maintain a model of their wiring topology, which in effect is a large network or graph. Recursion is one way to traverse this model when you want to find all parent or all child elements.
Since recursion is expensive from a processing and memory perspective, this step is commonly only performed when the topology is changed and the result is stored in a modified pre-ordered list format.
Inductive reasoning, the process of concept-formation, is recursive in nature. Your brain does it all the time, in the real world.
Ditto the comment about compilers. The abstract syntax tree nodes naturally lend themselves to recursion. All recursive data structures (linked lists, trees, graphs, etc.) are also more easily handled with recursion. I do think that most of us don't get to use recursion a lot once we are out of school because of the types of real-world problems, but it's good to be aware of it as an option.