Understanding Lazy Evaluation in Haskell - functional-programming

I am trying to learn Haskell, but i am stuck in understanding lazy evaluation.
Can someone explain me lazy evaluation in detail and the output of the following 2 cases[with explaination] in relation to the below given
Pseudo Code:
x = keyboard input (5)
y = x + 3 (=8)
echo y (8)
x = keyboard input (2)
echo y
Case 1: Static binding, lazy evaluation
Case 2: Dynamic binding, lazy evaluation.
I need to know what will the last line (echo y) is going to print...in the above 2 cases.

Sorry this is way too long but...
I'm afraid the answer is going to depend a lot on the meaning of the words...
First, here's that code in Haskell (which uses static binding and lazy evaluation):
readInt :: String -> Int
readInt = read
main = do
x <- fmap readInt getLine
let y = x + 3
print y
x <- fmap readInt getLine
print y
It prints 8 and 8.
Now here's that code in R which uses lazy evaluation and what some people call
dynamic binding:
delayedAssign('x', as.numeric(readLines(n=1)))
delayedAssign('y', x + 3)
print(y)
delayedAssign('x', as.numeric(readLines(n=1)))
print(y)
It prints 8 and 8. Not so different!
Now in C++, which uses strict evaluation and static binding:
#include <iostream>
int main() {
int x;
std::cin >> x;
int y = x + 3;
std::cout << y << "\n";
std::cin >> x;
std::cout << y << "\n";
}
It prints 8 and 8.
Now let me tell you what I think the point of the question actually was ;)
"lazy evaluation" can mean many different things. In Haskell it has a very
particular meaning, which is that in nested expressions:
f (g (h x))
evaluation works as if f gets evaluated before g (h x), ie evaluation
goes "outside -> in". Practically this means that if f looks like
f x = 2
ie just throws away its argument, g (h x) never gets evaluated.
But I think that that is not where the question was going with "lazy
evaluation". The reason I think this is that:
+ always evaluates its arguments! + is the same whether you're using lazy
evaluation or not.
The only computation that could actually be delayed is keyboard input --
and that's not really computation, because it causes an action to occur;
that is, it reads from the user.
Haskell people would generally not call this "lazy evaluation" -- they would call
it lazy (or deferred) execution.
So what would lazy execution mean for your question? It would mean that the
action keyboard input gets delayed... until the value x is really really
needed. It looks to me like that happens here:
echo y
because at that point you must show the user a value, and so you must know what
x is! So what would happen with lazy execution and static binding?
x = keyboard input # nothing happens
y = x + 3 # still nothing happens!
echo y (8) # y becomes 8. 8 gets printed.
x = keyboard input (2) # nothing happens
echo y # y is still 8. 8 gets printed.
Now about this word "dynamic binding". It can mean different things:
Variable scope and lifetime is decided at run time. This is what languages
like R do that don't declare variables.
The formula for a computation (like the formula for y is x + 3) isn't
inspected until the variable is evaluated.
My guess is that that is what "dynamic binding" means in your question. Going
over the code again with dynamic binding (sense 2) and lazy execution:
x = keyboard input # nothing happens
y = x + 3 # still nothing happens!
echo y (8) # y becomes 8. 8 gets printed.
x = keyboard input (2) # nothing happens
echo y # y is already evaluated,
# so it uses the stored value and prints 8
I know of no language that would actually print 7 for the last line... but I
really think that's what the question was hoping would happen!

The key thing about lazy evaluation in Haskell is that it doesn't affect the output of your program at all. You can read it just as if everything were evaluated as soon as it is defined, and you'll still get the same result.
Lazy evaluation is just a strategy for figuring out the value of an expression in the program. There are many possible and they all give the same result[1]; any evaluation strategy that changes the meaning of the program wouldn't be a valid strategy!
So from a certain perspective, you don't have to understand lazy evaluation (yet) if it's giving you trouble. When you're learning Haskell, especially if it's your first functional and pure language, thinking about expressing yourself in this way is much more important. I would also rate training yourself to become comfortable with reading Haskell's (often quite dense) syntax as more important than fully "grokking" lazy evaluation. So don't worry about it too much if the concept gives you difficulty.
That said, my go at explaining it is below. I haven't used your examples, as they're not really affected by lazy evaluation, and Owen has talked more clearly than I can about dynamic binding and delayed execution wrt your example.
The most important difference between (valid) evaluation strategies is that some strategies can fail to return a result at all where another strategy might succeed. Lazy evaluation has the particular property that if any (valid) evaluation strategy can find a result, lazy evaluation will find it. In particular, programs that generate infinite data structures and then only use a finite amount of the data can terminate with lazy evaluation. In the strict evaluation you're probably used to, the program has to finish generating the infinite data structure before it can go on to use part of it, and of course it will.
The way lazy evaluation achieves this is by only evaluating something when it's needed to figure out what to do next. When you call a function that returns a list, it "returns" straight away and gives you a placeholder for the list. That placeholder can be passed to other functions, stored in other data structures, anything. Only when the program needs to know something about the list will it be actually evaluated, and only as far as needed.
Say the program now is going to do something different if the list is empty than if it is not. The the function call that originally returned the placeholder is evaluated a little bit further, to see if it returns an empty list or a list with a head element. Then the evaluation stops again, as the program now knows which way to go. If the rest of the list is never needed, it will never be evaluated.
But it's also not evaluated more times than needed. If the placeholder was passed into multiple functions (so it's now involved in other not-yet-evaluated function calls), or stored into several different data structures, Haskell still "knows" that they're all the same thing, and arranges for them all to "see" the effects of any further evaluation of the placeholder triggered from any of them. Eventually, if all of the list is needed somewhere, they'll all be pointing to an ordinary fully-evaluated data structure, and laziness has no further impact.
But the key thing to remember is that everything needed to produce that list is already determined and fixed when the placeholder was generated. It can't be affected by anything else that's happened in the program since. If that were not so, then Haskell would not be pure. And vice versa; impure languages can't have Haskell-style full laziness behind the scenes, because the results you would get could change dramatically depending on when in the future the results are needed. Instead, impure languages that support lazy evaluation tend to have it only for certain things explicitly declared by the programmer, with warnings in the manual saying "don't use laziness on something dependent on side effects".
[1] I lie a little here. Keep reading below the line to see why.

Lazy Evaluation in Haskell: Leftmost-Outermost + Graph Reduction
Square x = x * x
Square (Square 42)
(Square 42) * (Square 42) -> Square 42 will be computed only one time thanks to Graph Reduction
(42 * 42) * (Square 42)
(1764) * (Square 42) -> next is Graph Reduction
1764 * 1764
=3111696
Leftmost-innermost (Java, C++)
Square (Square 42)
square ( 42 * 42)
square ( 1764 )
1764 * 1764
=3111696

Related

Simple example of call-by-need

I'm trying to understand the theorem behind "call-by-need." I do understand the definition, but I'm a bit confused. I would like to see a simple example which shows how call-by-need works.
After reading some previous threads, I found out that Haskell uses this kind of evaluation. Are there any other programming languages which support this feature?
I read about the call-by-name of Scala, and I do understand that call-by-name and call-by-need are similar but different by the fact that call-by-need will keep the evaluated value. But I really would love to see a real-life example (it does not have to be in Haskell), which shows call-by-need.
The function
say_hello numbers = putStrLn "Hello!"
ignores its numbers argument. Under call-by-value semantics, even though an argument is ignored, the parameter at the function call site may need to be evaluated, perhaps because of side effects that the rest of the program depends on.
In Haskell, we might call say_hello as
say_hello [1..]
where [1..] is the infinite list of naturals. Under call-by-value semantics, the CPU would run off trying to build an infinite list and never get to the say_hello at all!
Haskell merely outputs
$ runghc cbn.hs
Hello!
For less dramatic examples, the first ten natural numbers are
ghci> take 10 [1..]
[1,2,3,4,5,6,7,8,9,10]
The first ten odds are
ghci> take 10 $ filter odd [1..]
[1,3,5,7,9,11,13,15,17,19]
Under call-by-need semantics, each value — even a conceptually infinite one as in the examples above — is evaluated only to the extent required and no more.
update: A simple example, as asked for:
ff 0 = 1
ff 1 = 1
ff n = go (ff (n-1))
where
go x = x + x
Under call-by-name, each invocation of go evaluates ff (n-1) twice, each for each appearance of x in its definition (because + is strict in both arguments, i.e. demands the values of the both of them).
Under call-by-need, go's argument is evaluated at most once. Specifically, here, x's value is found out only once, and reused for the second appearance of x in the expression x + x. If it weren't needed, x wouldn't be evaluated at all, just as with call-by-name.
Under call-by-value, go's argument is always evaluated exactly once, prior to entering the function's body, even if it isn't used anywhere in the function's body.
Here's my understanding of it, in the context of Haskell.
According to Wikipedia, "call by need is a memoized variant of call by name where, if the function argument is evaluated, that value is stored for subsequent uses."
Call by name:
take 10 . filter even $ [1..]
With one consumer the produced value disappears after being produced so it might as well be call-by-name.
Call by need:
import qualified Data.List.Ordered as O
h = 1 : map (2*) h <> map (3*) h <> map (5*) h
where
(<>) = O.union
The difference is, here the h list is reused by several consumers, at different tempos, so it is essential that the produced values are remembered. In a call-by-name language there'd be much replication of computational effort here because the computational expression for h would be substituted at each of its occurrences, causing separate calculation for each. In a call-by-need--capable language like Haskell the results of computing the elements of h are shared between each reference to h.
Another example is, most any data defined by fix is only possible under call-by-need. With call-by-value the most we can have is the Y combinator.
See: Sharing vs. non-sharing fixed-point combinator and its linked entries and comments (among them, this, and its links, like Can fold be used to create infinite lists?).

Prolog arithmetic in foreach

So I'm learning Prolog. One of the things I've found to be really obnoxious is demonstrated in the following example:
foreach(
between(1,10,X),
somePredicate(X,X+Y,Result)
).
This does not work. I am well aware that X+Y is not evaluated, here, and instead I'd have to do:
foreach(
between(1,10,X),
(
XPlusY is X + Y,
somePredicate(X, XPlusY, Result)
)
).
Except, that doesn't work, either. As near as I can tell, the scope of XPlusY extends outside of foreach - i.e., XPlusY is 1 + Y, XPlusY is 2 + Y, etc. must all be true AT ONCE, and there is no XPlusY for which that is the case. So I have to do the following:
innerCode(X, Result) :-
XPlusY is X + Y,
somePredicate(X, XPlusY, Result).
...
foreach(
between(1,10,X),
innerCode(X, Result)
).
This, finally, works. (At least, I think so. I haven't tried this exact code, but this was the path I took earlier from "not working" to "working".) That's fine and all, except that it's exceptionally obnoxious. If I had a way of evaluating arithmetic operations in-line, I could halve the lines of code, make it more readable, and NOT create a one-use clutter predicate.
Question: Is there a way to evaluate arithmetic operations in-line, without declaring a new variable?
Failing that, it would be acceptable (and, in some cases, still useful for other things) if there were a way to restrict the scope of new variables. Suppose, for instance, you could define a block within the foreach, where the variables visible from the outside were marked, and any other variables in the block were considered new for that execution of the block. (I realize my terminology may be incorrect, but hopefully it gets the point across.) For example, something resembling:
foreach(
between(1,10,X),
(X, Result){
XPlusY is X + Y,
somePredicate(X, XPlusY, Result)
}
).
A possible solution might be if we can declare a lambda in-line, and immediately call it. Summed up:
Alternate question: Is there a way to limit the scope of new variables within a predicate, while retaining the ability to perform lasting unifications on one or more existing variables?
(The second half I added as clarification in response to an answer about forall.)
A solution to both questions is preferred, but a solution to either will suffice.
library(yall) allows you to define lambda expressions. For instance
?- foreach(between(1,3,X),call([Y]>>(Z is Y+1,writeln(Z)),X)).
2
3
4
true.
Alternatively, library(lambda) provides the construct:
?- [library(lambda)].
true.
?- foreach(between(1,3,X),call(\Y^(Z is Y+1,writeln(Z)),X)).
2
3
4
true.
In SWI-Prolog, library(yall) is autoloaded, while to get library(lambda) you should install the related pack:
?- pack_install(lambda).
Use in alternative the forall/2 de facto standard predicate:
forall(
between(1,10,X),
somePredicate(X,X+Y,Result)
).
While the foreach/2 predicate is usually implemented in the way you describe, the forall/2 predicate is defined as:
% forall(#callable, #callable)
forall(Generate, Test) :-
\+ (Generate, \+ Test).
Note that the use of negation implies that no bindings will be returned when a call to the predicate succeeds.
Update
Lambda libraries allow the specification of both lambda global (aka lambda free) and lambda local variables (aka lambda parameters). Using Logtalk lambdas syntax (available also in SWI-Prolog in library(yall), you can write (reusing Carlo's example) e.g.
?- G = 2, foreach(between(1,3,X),call({G}/[Y]>>(Z is Y+G,writeln(Z)),X)).
3
4
5
G = 2.
?- G = 4, foreach(between(1,3,X),call({G}/[Y]>>(Z is Y+G,writeln(Z)),X)).
5
6
7
G = 4.
Thus, it's possible to use lambdas to limit the scope of some variables within a goal without also limiting the scope of every unification in the goal.

How are functions curried?

I understand what the concept of currying is, and know how to use it. These are not my questions, rather I am curious as to how this is actually implemented at some lower level than, say, Haskell code.
For example, when (+) 2 4 is curried, is a pointer to the 2 maintained until the 4 is passed in? Does Gandalf bend space-time? What is this magic?
Short answer: yes a pointer is maintained to the 2 until the 4 is passed in.
Longer than necessary answer:
Conceptually, you're supposed to think about Haskell being defined in terms of the lambda calculus and term rewriting. Lets say you have the following definition:
f x y = x + y
This definition for f comes out in lambda calculus as something like the following, where I've explicitly put parentheses around the lambda bodies:
\x -> (\y -> (x + y))
If you're not familiar with the lambda calculus, this basically says "a function of an argument x that returns (a function of an argument y that returns (x + y))". In the lambda calculus, when we apply a function like this to some value, we can replace the application of the function by a copy of the body of the function with the value substituted for the function's parameter.
So then the expression f 1 2 is evaluated by the following sequence of rewrites:
(\x -> (\y -> (x + y))) 1 2
(\y -> (1 + y)) 2 # substituted 1 for x
(1 + 2) # substituted 2 for y
3
So you can see here that if we'd only supplied a single argument to f, we would have stopped at \y -> (1 + y). So we've got a whole term that is just a function for adding 1 to something, entirely separate from our original term, which may still be in use somewhere (for other references to f).
The key point is that if we implement functions like this, every function has only one argument but some return functions (and some return functions which return functions which return ...). Every time we apply a function we create a new term that "hard-codes" the first argument into the body of the function (including the bodies of any functions this one returns). This is how you get currying and closures.
Now, that's not how Haskell is directly implemented, obviously. Once upon a time, Haskell (or possibly one of its predecessors; I'm not exactly sure on the history) was implemented by Graph reduction. This is a technique for doing something equivalent to the term reduction I described above, that automatically brings along lazy evaluation and a fair amount of data sharing.
In graph reduction, everything is references to nodes in a graph. I won't go into too much detail, but when the evaluation engine reduces the application of a function to a value, it copies the sub-graph corresponding to the body of the function, with the necessary substitution of the argument value for the function's parameter (but shares references to graph nodes where they are unaffected by the substitution). So essentially, yes partially applying a function creates a new structure in memory that has a reference to the supplied argument (i.e. "a pointer to the 2), and your program can pass around references to that structure (and even share it and apply it multiple times), until more arguments are supplied and it can actually be reduced. However it's not like it's just remembering the function and accumulating arguments until it gets all of them; the evaluation engine actually does some of the work each time it's applied to a new argument. In fact the graph reduction engine can't even tell the difference between an application that returns a function and still needs more arguments, and one that has just got its last argument.
I can't tell you much more about the current implementation of Haskell. I believe it's a distant mutant descendant of graph reduction, with loads of clever short-cuts and go-faster stripes. But I might be wrong about that; maybe they've found a completely different execution strategy that isn't anything at all like graph reduction anymore. But I'm 90% sure it'll still end up passing around data structures that hold on to references to the partial arguments, and it probably still does something equivalent to factoring in the arguments partially, as it seems pretty essential to how lazy evaluation works. I'm also fairly sure it'll do lots of optimisations and short cuts, so if you straightforwardly call a function of 5 arguments like f 1 2 3 4 5 it won't go through all the hassle of copying the body of f 5 times with successively more "hard-coding".
Try it out with GHC:
ghc -C Test.hs
This will generate C code in Test.hc
I wrote the following function:
f = (+) 16777217
And GHC generated this:
R1.p[1] = (W_)Hp-4;
*R1.p = (W_)&stg_IND_STATIC_info;
Sp[-2] = (W_)&stg_upd_frame_info;
Sp[-1] = (W_)Hp-4;
R1.w = (W_)&integerzmgmp_GHCziInteger_smallInteger_closure;
Sp[-3] = 0x1000001U;
Sp=Sp-3;
JMP_((W_)&stg_ap_n_fast);
The thing to remember is that in Haskell, partially applying is not an unusual case. There's technically no "last argument" to any function. As you can see here, Haskell is jumping to stg_ap_n_fast which will expect an argument to be available in Sp.
The stg here stands for "Spineless Tagless G-Machine". There is a really good paper on it, by Simon Peyton-Jones. If you're curious about how the Haskell runtime is implemented, go read that first.

understanding referential transparency

Generally, I have a headache because something is wrong with my reasoning:
For 1 set of arguments, referential transparent function will always return 1 set of output values.
that means that such function could be represented as a truth table (a table where 1 set of output parameters is specified for 1 set of arguments).
that makes the logic behind such functions is combinational (as opposed to sequential)
that means that with pure functional language (that has only rt functions) it is possible to describe only combinational logic.
The last statement is derived from this reasoning, but it's obviously false; that means there is an error in reasoning. [question: where is error in this reasoning?]
UPD2. You, guys, are saying lots of interesting stuff, but not answering my question. I defined it more explicitly now. Sorry for messing up with question definition!
Question: where is error in this reasoning?
A referentially transparent function might require an infinite truth table to represent its behavior. You will be hard pressed to design an infinite circuit in combinatory logic.
Another error: the behavior of sequential logic can be represented purely functionally as a function from states to states. The fact that in the implementation these states occur sequentially in time does not prevent one from defining a purely referentially transparent function which describes how state evolves over time.
Edit: Although I apparently missed the bullseye on the actual question, I think my answer is pretty good, so I'm keeping it :-) (see below).
I guess a more concise way to phrase the question might be: can a purely functional language compute anything an imperative one can?
First of all, suppose you took an imperative language like C and made it so you can't alter variables after defining them. E.g.:
int i;
for (i = 0; // okay, that's one assignment
i < 10; // just looking, that's all
i++) // BUZZZ! Sorry, can't do that!
Well, there goes your for loop. Do we get to keep our while loop?
while (i < 10)
Sure, but it's not very useful. i can't change, so it's either going to run forever or not run at all.
How about recursion? Yes, you get to keep recursion, and it's still plenty useful:
int sum(int *items, unsigned int count)
{
if (count) {
// count the first item and sum the rest
return *items + sum(items + 1, count - 1);
} else {
// no items
return 0;
}
}
Now, with functions, we don't alter state, but variables can, well, vary. Once a variable passes into our function, it's locked in. However, we can call the function again (recursion), and it's like getting a brand new set of variables (the old ones stay the same). Although there are multiple instances of items and count, sum((int[]){1,2,3}, 3) will always evaluate to 6, so you can replace that expression with 6 if you like.
Can we still do anything we want? I'm not 100% sure, but I think the answer is "yes". You certainly can if you have closures, though.
You have it right. The idea is, once a variable is defined, it can't be redefined. A referentially transparent expression, given the same variables, always yields the same result value.
I recommend looking into Haskell, a purely functional language. Haskell doesn't have an "assignment" operator, strictly speaking. For instance:
my_sum numbers = ??? where
i = 0
total = 0
Here, you can't write a "for loop" that increments i and total as it goes along. All is not lost, though. Just use recursion to keep getting new is and totals:
my_sum numbers = f 0 0 where
f i total =
if i < length numbers
then f i' total'
else total
where
i' = i+1
total' = total + (numbers !! i)
(Note that this is a stupid way to sum a list in Haskell, but it demonstrates a method of coping with single assignment.)
Now, consider this highly imperative-looking code:
main = do
a <- readLn
b <- readLn
print (a + b)
It's actually syntactic sugar for:
main =
readLn >>= (\a ->
readLn >>= (\b ->
print (a + b)))
The idea is, instead of main being a function consisting of a list of statements, main is an IO action that Haskell executes, and actions are defined and chained together with bind operations. Also, an action that does nothing, yielding an arbitrary value, can be defined with the return function.
Note that bind and return aren't specific to actions. They can be used with any type that calls itself a Monad to do all sorts of funky things.
To clarify, consider readLn. readLn is an action that, if executed, would read a line from standard input and yield its parsed value. To do something with that value, we can't store it in a variable because that would violate referential transparency:
a = readLn
If this were allowed, a's value would depend on the world and would be different every time we called readLn, meaning readLn wouldn't be referentially transparent.
Instead, we bind the readLn action to a function that deals with the action, yielding a new action, like so:
readLn >>= (\x -> print (x + 1))
The result of this expression is an action value. If Haskell got off the couch and performed this action, it would read an integer, increment it, and print it. By binding the result of an action to a function that does something with the result, we get to keep referential transparency while playing around in the world of state.
As far as I understand it, referential transparency just means: A given function will always yield the same result when invoked with the same arguments. So, the mathematical functions you learned about in school are referentially transparent.
A language you could check out in order to learn how things are done in a purely functional language would be Haskell. There are ways to use "updateable storage possibilities" like the Reader Monad, and the State Monad for example. If you're interested in purely functional data structures, Okasaki might be a good read.
And yes, you're right: Order of evaluation in a purely functional language like haskell does not matter as in non-functional languages, because if there are no side effects, there is no reason to do someting before/after something else -- unless the input of one depends on the output of the other, or means like monads come into play.
I don't really know about the truth-table question.
Here's my stab at answering the question:
Any system can be described as a combinatorial function, large or small.
There's nothing wrong with the reasoning that pure functions can only deal with combinatorial logic -- it's true, just that functional languages hide that from you to some extent or another.
You could even describe, say, the workings of a game engine as a truth table or a combinatorial function.
You might have a deterministic function that takes in "the current state of the entire game" as the RAM occupied by the game engine and the keyboard input, and returns "the state of the game one frame later". The return value would be determined by the combinations of the bits in the input.
Of course, in any meaningful and sane function, the input is parsed down to blocks of integers, decimals and booleans, but the combinations of the bits in those values is still determining the output of your function.
Keep in mind also that basic digital logic can be described in truth tables. The only reason that that's not done for anything more than, say, arithmetic on 4-bit integers, is because the size of the truth table grows exponentially.
The error in Your reasoning is the following:
"that means that such function could be represented as a truth table".
You conclude that from a functional language's property of referential transparency. So far the conclusion would sound plausible, but You oversee that a function is able to accept collections as input and process them in contrast to the fixed inputs of a logic gate.
Therefore a function does not equal a logic gate but rather a construction plan of such a logic gate depending on the actual (at runtime determined) input!
To comment on Your comment: Functional languages can - although stateless - implement a state machine by constructing the states from scratch each time they are being accessed.

What is recursion and when should I use it?

Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
One of the topics that seems to come up regularly on mailing lists and online discussions is the merits (or lack thereof) of doing a Computer Science Degree. An argument that seems to come up time and again for the negative party is that they have been coding for some number of years and they have never used recursion.
So the question is:
What is recursion?
When would I use recursion?
Why don't people use recursion?
There are a number of good explanations of recursion in this thread, this answer is about why you shouldn't use it in most languages.* In the majority of major imperative language implementations (i.e. every major implementation of C, C++, Basic, Python, Ruby,Java, and C#) iteration is vastly preferable to recursion.
To see why, walk through the steps that the above languages use to call a function:
space is carved out on the stack for the function's arguments and local variables
the function's arguments are copied into this new space
control jumps to the function
the function's code runs
the function's result is copied into a return value
the stack is rewound to its previous position
control jumps back to where the function was called
Doing all of these steps takes time, usually a little bit more than it takes to iterate through a loop. However, the real problem is in step #1. When many programs start, they allocate a single chunk of memory for their stack, and when they run out of that memory (often, but not always due to recursion), the program crashes due to a stack overflow.
So in these languages recursion is slower and it makes you vulnerable to crashing. There are still some arguments for using it though. In general, code written recursively is shorter and a bit more elegant, once you know how to read it.
There is a technique that language implementers can use called tail call optimization which can eliminate some classes of stack overflow. Put succinctly: if a function's return expression is simply the result of a function call, then you don't need to add a new level onto the stack, you can reuse the current one for the function being called. Regrettably, few imperative language-implementations have tail-call optimization built in.
* I love recursion. My favorite static language doesn't use loops at all, recursion is the only way to do something repeatedly. I just don't think that recursion is generally a good idea in languages that aren't tuned for it.
** By the way Mario, the typical name for your ArrangeString function is "join", and I'd be surprised if your language of choice doesn't already have an implementation of it.
Simple english example of recursion.
A child couldn't sleep, so her mother told her a story about a little frog,
who couldn't sleep, so the frog's mother told her a story about a little bear,
who couldn't sleep, so the bear's mother told her a story about a little weasel...
who fell asleep.
...and the little bear fell asleep;
...and the little frog fell asleep;
...and the child fell asleep.
In the most basic computer science sense, recursion is a function that calls itself. Say you have a linked list structure:
struct Node {
Node* next;
};
And you want to find out how long a linked list is you can do this with recursion:
int length(const Node* list) {
if (!list->next) {
return 1;
} else {
return 1 + length(list->next);
}
}
(This could of course be done with a for loop as well, but is useful as an illustration of the concept)
Whenever a function calls itself, creating a loop, then that's recursion. As with anything there are good uses and bad uses for recursion.
The most simple example is tail recursion where the very last line of the function is a call to itself:
int FloorByTen(int num)
{
if (num % 10 == 0)
return num;
else
return FloorByTen(num-1);
}
However, this is a lame, almost pointless example because it can easily be replaced by more efficient iteration. After all, recursion suffers from function call overhead, which in the example above could be substantial compared to the operation inside the function itself.
So the whole reason to do recursion rather than iteration should be to take advantage of the call stack to do some clever stuff. For example, if you call a function multiple times with different parameters inside the same loop then that's a way to accomplish branching. A classic example is the Sierpinski triangle.
You can draw one of those very simply with recursion, where the call stack branches in 3 directions:
private void BuildVertices(double x, double y, double len)
{
if (len > 0.002)
{
mesh.Positions.Add(new Point3D(x, y + len, -len));
mesh.Positions.Add(new Point3D(x - len, y - len, -len));
mesh.Positions.Add(new Point3D(x + len, y - len, -len));
len *= 0.5;
BuildVertices(x, y + len, len);
BuildVertices(x - len, y - len, len);
BuildVertices(x + len, y - len, len);
}
}
If you attempt to do the same thing with iteration I think you'll find it takes a lot more code to accomplish.
Other common use cases might include traversing hierarchies, e.g. website crawlers, directory comparisons, etc.
Conclusion
In practical terms, recursion makes the most sense whenever you need iterative branching.
Recursion is a method of solving problems based on the divide and conquer mentality.
The basic idea is that you take the original problem and divide it into smaller (more easily solved) instances of itself, solve those smaller instances (usually by using the same algorithm again) and then reassemble them into the final solution.
The canonical example is a routine to generate the Factorial of n. The Factorial of n is calculated by multiplying all of the numbers between 1 and n. An iterative solution in C# looks like this:
public int Fact(int n)
{
int fact = 1;
for( int i = 2; i <= n; i++)
{
fact = fact * i;
}
return fact;
}
There's nothing surprising about the iterative solution and it should make sense to anyone familiar with C#.
The recursive solution is found by recognising that the nth Factorial is n * Fact(n-1). Or to put it another way, if you know what a particular Factorial number is you can calculate the next one. Here is the recursive solution in C#:
public int FactRec(int n)
{
if( n < 2 )
{
return 1;
}
return n * FactRec( n - 1 );
}
The first part of this function is known as a Base Case (or sometimes Guard Clause) and is what prevents the algorithm from running forever. It just returns the value 1 whenever the function is called with a value of 1 or less. The second part is more interesting and is known as the Recursive Step. Here we call the same method with a slightly modified parameter (we decrement it by 1) and then multiply the result with our copy of n.
When first encountered this can be kind of confusing so it's instructive to examine how it works when run. Imagine that we call FactRec(5). We enter the routine, are not picked up by the base case and so we end up like this:
// In FactRec(5)
return 5 * FactRec( 5 - 1 );
// which is
return 5 * FactRec(4);
If we re-enter the method with the parameter 4 we are again not stopped by the guard clause and so we end up at:
// In FactRec(4)
return 4 * FactRec(3);
If we substitute this return value into the return value above we get
// In FactRec(5)
return 5 * (4 * FactRec(3));
This should give you a clue as to how the final solution is arrived at so we'll fast track and show each step on the way down:
return 5 * (4 * FactRec(3));
return 5 * (4 * (3 * FactRec(2)));
return 5 * (4 * (3 * (2 * FactRec(1))));
return 5 * (4 * (3 * (2 * (1))));
That final substitution happens when the base case is triggered. At this point we have a simple algrebraic formula to solve which equates directly to the definition of Factorials in the first place.
It's instructive to note that every call into the method results in either a base case being triggered or a call to the same method where the parameters are closer to a base case (often called a recursive call). If this is not the case then the method will run forever.
Recursion is solving a problem with a function that calls itself. A good example of this is a factorial function. Factorial is a math problem where factorial of 5, for example, is 5 * 4 * 3 * 2 * 1. This function solves this in C# for positive integers (not tested - there may be a bug).
public int Factorial(int n)
{
if (n <= 1)
return 1;
return n * Factorial(n - 1);
}
Recursion refers to a method which solves a problem by solving a smaller version of the problem and then using that result plus some other computation to formulate the answer to the original problem. Often times, in the process of solving the smaller version, the method will solve a yet smaller version of the problem, and so on, until it reaches a "base case" which is trivial to solve.
For instance, to calculate a factorial for the number X, one can represent it as X times the factorial of X-1. Thus, the method "recurses" to find the factorial of X-1, and then multiplies whatever it got by X to give a final answer. Of course, to find the factorial of X-1, it'll first calculate the factorial of X-2, and so on. The base case would be when X is 0 or 1, in which case it knows to return 1 since 0! = 1! = 1.
Consider an old, well known problem:
In mathematics, the greatest common divisor (gcd) … of two or more non-zero integers, is the largest positive integer that divides the numbers without a remainder.
The definition of gcd is surprisingly simple:
where mod is the modulo operator (that is, the remainder after integer division).
In English, this definition says the greatest common divisor of any number and zero is that number, and the greatest common divisor of two numbers m and n is the greatest common divisor of n and the remainder after dividing m by n.
If you'd like to know why this works, see the Wikipedia article on the Euclidean algorithm.
Let's compute gcd(10, 8) as an example. Each step is equal to the one just before it:
gcd(10, 8)
gcd(10, 10 mod 8)
gcd(8, 2)
gcd(8, 8 mod 2)
gcd(2, 0)
2
In the first step, 8 does not equal zero, so the second part of the definition applies. 10 mod 8 = 2 because 8 goes into 10 once with a remainder of 2. At step 3, the second part applies again, but this time 8 mod 2 = 0 because 2 divides 8 with no remainder. At step 5, the second argument is 0, so the answer is 2.
Did you notice that gcd appears on both the left and right sides of the equals sign? A mathematician would say this definition is recursive because the expression you're defining recurs inside its definition.
Recursive definitions tend to be elegant. For example, a recursive definition for the sum of a list is
sum l =
if empty(l)
return 0
else
return head(l) + sum(tail(l))
where head is the first element in a list and tail is the rest of the list. Note that sum recurs inside its definition at the end.
Maybe you'd prefer the maximum value in a list instead:
max l =
if empty(l)
error
elsif length(l) = 1
return head(l)
else
tailmax = max(tail(l))
if head(l) > tailmax
return head(l)
else
return tailmax
You might define multiplication of non-negative integers recursively to turn it into a series of additions:
a * b =
if b = 0
return 0
else
return a + (a * (b - 1))
If that bit about transforming multiplication into a series of additions doesn't make sense, try expanding a few simple examples to see how it works.
Merge sort has a lovely recursive definition:
sort(l) =
if empty(l) or length(l) = 1
return l
else
(left,right) = split l
return merge(sort(left), sort(right))
Recursive definitions are all around if you know what to look for. Notice how all of these definitions have very simple base cases, e.g., gcd(m, 0) = m. The recursive cases whittle away at the problem to get down to the easy answers.
With this understanding, you can now appreciate the other algorithms in Wikipedia's article on recursion!
A function that calls itself
When a function can be (easily) decomposed into a simple operation plus the same function on some smaller portion of the problem. I should say, rather, that this makes it a good candidate for recursion.
They do!
The canonical example is the factorial which looks like:
int fact(int a)
{
if(a==1)
return 1;
return a*fact(a-1);
}
In general, recursion isn't necessarily fast (function call overhead tends to be high because recursive functions tend to be small, see above) and can suffer from some problems (stack overflow anyone?). Some say they tend to be hard to get 'right' in non-trivial cases but I don't really buy into that. In some situations, recursion makes the most sense and is the most elegant and clear way to write a particular function. It should be noted that some languages favor recursive solutions and optimize them much more (LISP comes to mind).
A recursive function is one which calls itself. The most common reason I've found to use it is traversing a tree structure. For example, if I have a TreeView with checkboxes (think installation of a new program, "choose features to install" page), I might want a "check all" button which would be something like this (pseudocode):
function cmdCheckAllClick {
checkRecursively(TreeView1.RootNode);
}
function checkRecursively(Node n) {
n.Checked = True;
foreach ( n.Children as child ) {
checkRecursively(child);
}
}
So you can see that the checkRecursively first checks the node which it is passed, then calls itself for each of that node's children.
You do need to be a bit careful with recursion. If you get into an infinite recursive loop, you will get a Stack Overflow exception :)
I can't think of a reason why people shouldn't use it, when appropriate. It is useful in some circumstances, and not in others.
I think that because it's an interesting technique, some coders perhaps end up using it more often than they should, without real justification. This has given recursion a bad name in some circles.
Recursion is an expression directly or indirectly referencing itself.
Consider recursive acronyms as a simple example:
GNU stands for GNU's Not Unix
PHP stands for PHP: Hypertext Preprocessor
YAML stands for YAML Ain't Markup Language
WINE stands for Wine Is Not an Emulator
VISA stands for Visa International Service Association
More examples on Wikipedia
Recursion works best with what I like to call "fractal problems", where you're dealing with a big thing that's made of smaller versions of that big thing, each of which is an even smaller version of the big thing, and so on. If you ever have to traverse or search through something like a tree or nested identical structures, you've got a problem that might be a good candidate for recursion.
People avoid recursion for a number of reasons:
Most people (myself included) cut their programming teeth on procedural or object-oriented programming as opposed to functional programming. To such people, the iterative approach (typically using loops) feels more natural.
Those of us who cut our programming teeth on procedural or object-oriented programming have often been told to avoid recursion because it's error prone.
We're often told that recursion is slow. Calling and returning from a routine repeatedly involves a lot of stack pushing and popping, which is slower than looping. I think some languages handle this better than others, and those languages are most likely not those where the dominant paradigm is procedural or object-oriented.
For at least a couple of programming languages I've used, I remember hearing recommendations not to use recursion if it gets beyond a certain depth because its stack isn't that deep.
A recursive statement is one in which you define the process of what to do next as a combination of the inputs and what you have already done.
For example, take factorial:
factorial(6) = 6*5*4*3*2*1
But it's easy to see factorial(6) also is:
6 * factorial(5) = 6*(5*4*3*2*1).
So generally:
factorial(n) = n*factorial(n-1)
Of course, the tricky thing about recursion is that if you want to define things in terms of what you have already done, there needs to be some place to start.
In this example, we just make a special case by defining factorial(1) = 1.
Now we see it from the bottom up:
factorial(6) = 6*factorial(5)
= 6*5*factorial(4)
= 6*5*4*factorial(3) = 6*5*4*3*factorial(2) = 6*5*4*3*2*factorial(1) = 6*5*4*3*2*1
Since we defined factorial(1) = 1, we reach the "bottom".
Generally speaking, recursive procedures have two parts:
1) The recursive part, which defines some procedure in terms of new inputs combined with what you've "already done" via the same procedure. (i.e. factorial(n) = n*factorial(n-1))
2) A base part, which makes sure that the process doesn't repeat forever by giving it some place to start (i.e. factorial(1) = 1)
It can be a bit confusing to get your head around at first, but just look at a bunch of examples and it should all come together. If you want a much deeper understanding of the concept, study mathematical induction. Also, be aware that some languages optimize for recursive calls while others do not. It's pretty easy to make insanely slow recursive functions if you're not careful, but there are also techniques to make them performant in most cases.
Hope this helps...
I like this definition:
In recursion, a routine solves a small part of a problem itself, divides the problem into smaller pieces, and then calls itself to solve each of the smaller pieces.
I also like Steve McConnells discussion of recursion in Code Complete where he criticises the examples used in Computer Science books on Recursion.
Don't use recursion for factorials or Fibonacci numbers
One problem with
computer-science textbooks is that
they present silly examples of
recursion. The typical examples are
computing a factorial or computing a
Fibonacci sequence. Recursion is a
powerful tool, and it's really dumb to
use it in either of those cases. If a
programmer who worked for me used
recursion to compute a factorial, I'd
hire someone else.
I thought this was a very interesting point to raise and may be a reason why recursion is often misunderstood.
EDIT:
This was not a dig at Dav's answer - I had not seen that reply when I posted this
1.)
A method is recursive if it can call itself; either directly:
void f() {
... f() ...
}
or indirectly:
void f() {
... g() ...
}
void g() {
... f() ...
}
2.) When to use recursion
Q: Does using recursion usually make your code faster?
A: No.
Q: Does using recursion usually use less memory?
A: No.
Q: Then why use recursion?
A: It sometimes makes your code much simpler!
3.) People use recursion only when it is very complex to write iterative code. For example, tree traversal techniques like preorder, postorder can be made both iterative and recursive. But usually we use recursive because of its simplicity.
Here's a simple example: how many elements in a set. (there are better ways to count things, but this is a nice simple recursive example.)
First, we need two rules:
if the set is empty, the count of items in the set is zero (duh!).
if the set is not empty, the count is one plus the number of items in the set after one item is removed.
Suppose you have a set like this: [x x x]. let's count how many items there are.
the set is [x x x] which is not empty, so we apply rule 2. the number of items is one plus the number of items in [x x] (i.e. we removed an item).
the set is [x x], so we apply rule 2 again: one + number of items in [x].
the set is [x], which still matches rule 2: one + number of items in [].
Now the set is [], which matches rule 1: the count is zero!
Now that we know the answer in step 4 (0), we can solve step 3 (1 + 0)
Likewise, now that we know the answer in step 3 (1), we can solve step 2 (1 + 1)
And finally now that we know the answer in step 2 (2), we can solve step 1 (1 + 2) and get the count of items in [x x x], which is 3. Hooray!
We can represent this as:
count of [x x x] = 1 + count of [x x]
= 1 + (1 + count of [x])
= 1 + (1 + (1 + count of []))
= 1 + (1 + (1 + 0)))
= 1 + (1 + (1))
= 1 + (2)
= 3
When applying a recursive solution, you usually have at least 2 rules:
the basis, the simple case which states what happens when you have "used up" all of your data. This is usually some variation of "if you are out of data to process, your answer is X"
the recursive rule, which states what happens if you still have data. This is usually some kind of rule that says "do something to make your data set smaller, and reapply your rules to the smaller data set."
If we translate the above to pseudocode, we get:
numberOfItems(set)
if set is empty
return 0
else
remove 1 item from set
return 1 + numberOfItems(set)
There's a lot more useful examples (traversing a tree, for example) which I'm sure other people will cover.
Well, that's a pretty decent definition you have. And wikipedia has a good definition too. So I'll add another (probably worse) definition for you.
When people refer to "recursion", they're usually talking about a function they've written which calls itself repeatedly until it is done with its work. Recursion can be helpful when traversing hierarchies in data structures.
An example: A recursive definition of a staircase is:
A staircase consists of:
- a single step and a staircase (recursion)
- or only a single step (termination)
To recurse on a solved problem: do nothing, you're done.
To recurse on an open problem: do the next step, then recurse on the rest.
In plain English:
Assume you can do 3 things:
Take one apple
Write down tally marks
Count tally marks
You have a lot of apples in front of you on a table and you want to know how many apples there are.
start
Is the table empty?
yes: Count the tally marks and cheer like it's your birthday!
no: Take 1 apple and put it aside
Write down a tally mark
goto start
The process of repeating the same thing till you are done is called recursion.
I hope this is the "plain english" answer you are looking for!
A recursive function is a function that contains a call to itself. A recursive struct is a struct that contains an instance of itself. You can combine the two as a recursive class. The key part of a recursive item is that it contains an instance/call of itself.
Consider two mirrors facing each other. We've seen the neat infinity effect they make. Each reflection is an instance of a mirror, which is contained within another instance of a mirror, etc. The mirror containing a reflection of itself is recursion.
A binary search tree is a good programming example of recursion. The structure is recursive with each Node containing 2 instances of a Node. Functions to work on a binary search tree are also recursive.
This is an old question, but I want to add an answer from logistical point of view (i.e not from algorithm correctness point of view or performance point of view).
I use Java for work, and Java doesn't support nested function. As such, if I want to do recursion, I might have to define an external function (which exists only because my code bumps against Java's bureaucratic rule), or I might have to refactor the code altogether (which I really hate to do).
Thus, I often avoid recursion, and use stack operation instead, because recursion itself is essentially a stack operation.
You want to use it anytime you have a tree structure. It is very useful in reading XML.
Recursion as it applies to programming is basically calling a function from inside its own definition (inside itself), with different parameters so as to accomplish a task.
"If I have a hammer, make everything look like a nail."
Recursion is a problem-solving strategy for huge problems, where at every step just, "turn 2 small things into one bigger thing," each time with the same hammer.
Example
Suppose your desk is covered with a disorganized mess of 1024 papers. How do you make one neat, clean stack of papers from the mess, using recursion?
Divide: Spread all the sheets out, so you have just one sheet in each "stack".
Conquer:
Go around, putting each sheet on top of one other sheet. You now have stacks of 2.
Go around, putting each 2-stack on top of another 2-stack. You now have stacks of 4.
Go around, putting each 4-stack on top of another 4-stack. You now have stacks of 8.
... on and on ...
You now have one huge stack of 1024 sheets!
Notice that this is pretty intuitive, aside from counting everything (which isn't strictly necessary). You might not go all the way down to 1-sheet stacks, in reality, but you could and it would still work. The important part is the hammer: With your arms, you can always put one stack on top of the other to make a bigger stack, and it doesn't matter (within reason) how big either stack is.
Recursion is the process where a method call iself to be able to perform a certain task. It reduces redundency of code. Most recurssive functions or methods must have a condifiton to break the recussive call i.e. stop it from calling itself if a condition is met - this prevents the creating of an infinite loop. Not all functions are suited to be used recursively.
hey, sorry if my opinion agrees with someone, I'm just trying to explain recursion in plain english.
suppose you have three managers - Jack, John and Morgan.
Jack manages 2 programmers, John - 3, and Morgan - 5.
you are going to give every manager 300$ and want to know what would it cost.
The answer is obvious - but what if 2 of Morgan-s employees are also managers?
HERE comes the recursion.
you start from the top of the hierarchy. the summery cost is 0$.
you start with Jack,
Then check if he has any managers as employees. if you find any of them are, check if they have any managers as employees and so on. Add 300$ to the summery cost every time you find a manager.
when you are finished with Jack, go to John, his employees and then to Morgan.
You'll never know, how much cycles will you go before getting an answer, though you know how many managers you have and how many Budget can you spend.
Recursion is a tree, with branches and leaves, called parents and children respectively.
When you use a recursion algorithm, you more or less consciously are building a tree from the data.
In plain English, recursion means to repeat someting again and again.
In programming one example is of calling the function within itself .
Look on the following example of calculating factorial of a number:
public int fact(int n)
{
if (n==0) return 1;
else return n*fact(n-1)
}
Any algorithm exhibits structural recursion on a datatype if basically consists of a switch-statement with a case for each case of the datatype.
for example, when you are working on a type
tree = null
| leaf(value:integer)
| node(left: tree, right:tree)
a structural recursive algorithm would have the form
function computeSomething(x : tree) =
if x is null: base case
if x is leaf: do something with x.value
if x is node: do something with x.left,
do something with x.right,
combine the results
this is really the most obvious way to write any algorith that works on a data structure.
now, when you look at the integers (well, the natural numbers) as defined using the Peano axioms
integer = 0 | succ(integer)
you see that a structural recursive algorithm on integers looks like this
function computeSomething(x : integer) =
if x is 0 : base case
if x is succ(prev) : do something with prev
the too-well-known factorial function is about the most trivial example of
this form.
function call itself or use its own definition.

Resources