Prolog Accumulators. Are they really a "different" concept? - recursion

I am learning Prolog under my Artificial Intelligence Lab, from the source Learn Prolog Now!.
In the 5th Chapter we come to learn about Accumulators. And as an example, these two code snippets are given.
To Find the Length of a List
without accumulators:
len([],0).
len([_|T],N) :- len(T,X), N is X+1.
with accumulators:
accLen([_|T],A,L) :- Anew is A+1, accLen(T,Anew,L).
accLen([],A,A).
I am unable to understand, how the two snippets are conceptually different? What exactly an accumulator is doing different? And what are the benefits?
Accumulators sound like intermediate variables. (Correct me if I am wrong.) And I had already used them in my programs up till now, so is it really that big a concept?

When you give something a name, it suddenly becomes more real than it used to be. Discussing something can now be done by simply using the name of the concept. Without getting any more philosophical, no, there is nothing special about accumulators, but they are useful.
In practice, going through a list without an accumulator:
foo([]).
foo([H|T]) :-
foo(T).
The head of the list is left behind, and cannot be accessed by the recursive call. At each level of recursion you only see what is left of the list.
Using an accumulator:
bar([], _Acc).
bar([H|T], Acc) :-
bar(T, [H|Acc]).
At every recursive step, you have the remaining list and all the elements you have gone through. In your len/3 example, you only keep the count, not the actual elements, as this is all you need.
Some predicates (like len/3) can be made tail-recursive with accumulators: you don't need to wait for the end of your input (exhaust all elements of the list) to do the actual work, instead doing it incrementally as you get the input. Prolog doesn't have to leave values on the stack and can do tail-call optimization for you.
Search algorithms that need to know the "path so far" (or any algorithm that needs to have a state) use a more general form of the same technique, by providing an "intermediate result" to the recursive call. A run-length encoder, for example, could be defined as:
rle([], []).
rle([First|Rest],Encoded):-
rle_1(Rest, First, 1, Encoded).
rle_1([], Last, N, [Last-N]).
rle_1([H|T], Prev, N, Encoded) :-
( dif(H, Prev)
-> Encoded = [Prev-N|Rest],
rle_1(T, H, 1, Rest)
; succ(N, N1),
rle_1(T, H, N1, Encoded)
).
Hope that helps.

TL;DR: yes, they are.
Imagine you are to go from a city A on the left to a city B on the right, and you want to know the distance between the two in advance. How are you to achieve this?
A mathematician in such a position employs magic known as structural recursion. He says to himself, what if I'll send my own copy one step closer towards the city B, and ask it of its distance to the city? I will then add 1 to its result, after receiving it from my copy, since I have sent it one step closer towards the city, and will know my answer without having moved an inch! Of course if I am already at the city gates, I won't send any copies of me anywhere since I'll know that the distance is 0 - without having moved an inch!
And how do I know that my copy-of-me will succeed? Simply because he will follow the same exact rules, while starting from a point closer to our destination. Whatever value my answer will be, his will be one less, and only a finite number of copies of us will be called into action - because the distance between the cities is finite. So the total operation is certain to complete in a finite amount of time and I will get my answer. Because getting your answer after an infinite time has passed, is not getting it at all - ever.
And now, having found out his answer in advance, our cautious magician mathematician is ready to embark on his safe (now!) journey.
But that of course wasn't magic at all - it's all being a dirty trick! He didn't find out the answer in advance out of thin air - he has sent out the whole stack of others to find it for him. The grueling work had to be done after all, he just pretended not to be aware of it. The distance was traveled. Moreover, the distance back had to be traveled too, for each copy to tell their result to their master, the result being actually created on the way back from the destination. All this before our fake magician had ever started walking himself. How's that for a team effort. For him it could seem like a sweet deal. But overall...
So that's how the magician mathematician thinks. But his dual the brave traveler just goes on a journey, and counts his steps along the way, adding 1 to the current steps counter on each step, before the rest of his actual journey. There's no pretense anymore. The journey may be finite, or it may be infinite - he has no way of knowing upfront. But at each point along his route, and hence when ⁄ if he arrives at the city B gates too, he will know his distance traveled so far. And he certainly won't have to go back all the way to the beginning of the road to tell himself the result.
And that's the difference between the structural recursion of the first, and tail recursion with accumulator ⁄ tail recursion modulo cons ⁄ corecursion employed by the second. The knowledge of the first is built on the way back from the goal; of the second - on the way forth from the starting point, towards the goal. The journey is the destination.
see also:
Technical Report TR19: Unwinding Structured Recursions into Iterations. Daniel P. Friedman and David S. Wise (Dec 1974).
What are the practical implications of all this, you ask? Why, imagine our friend the magician mathematician needs to boil some eggs. He has a pot; a faucet; a hot plate; and some eggs. What is he to do?
Well, it's easy - he'll just put eggs into the pot, add some water from the faucet into it and will put it on the hot plate.
And what if he's already given a pot with eggs and water in it? Why, it's even easier to him - he'll just take the eggs out, pour out the water, and will end up with the problem he already knows how to solve! Pure magic, isn't it!
Before we laugh at the poor chap, we mustn't forget the tale of the centipede. Sometimes ignorance is bliss. But when the required knowledge is simple and "one-dimensional" like the distance here, it'd be a crime to pretend to have no memory at all.

accumulators are intermediate variables, and are an important (read basic) topic in Prolog because allow reversing the information flow of some fundamental algorithm, with important consequences for the efficiency of the program.
Take reversing a list as example
nrev([],[]).
nrev([H|T], R) :- nrev(T, S), append(S, [H], R).
rev(L, R) :- rev(L, [], R).
rev([], R, R).
rev([H|T], C, R) :- rev(T, [H|C], R).
nrev/2 (naive reverse) it's O(N^2), where N is list length, while rev/2 it's O(N).

Related

Why after pressing semicolon program is back in deep recursion?

I'm trying to understand the semicolon functionality.
I have this code:
del(X,[X|Rest],Rest).
del(X,[Y|Tail],[Y|Rest]) :-
del(X,Tail,Rest).
permutation([],[]).
permutation(L,[X|P]) :- del(X,L,L1), permutation(L1,P).
It's the simple predicate to show all permutations of given list.
I used the built-in graphical debugger in SWI-Prolog because I wanted to understand how it works and I understand for the first case which returns the list given in argument. Here is the diagram which I made for better understanding.
But I don't get it for the another solution. When I press the semicolon it doesn't start in the place where it ended instead it's starting with some deep recursion where L=[] (like in step 9). I don't get it, didn't the recursion end earlier? It had to go out of the recursions to return the answer and after semicolon it's again deep in recursion.
Could someone clarify that to me? Thanks in advance.
One analogy that I find useful in demystifying Prolog is that Backtracking is like Nested Loops, and when the innermost loop's variables' values are all found, the looping is suspended, the vars' values are reported, and then the looping is resumed.
As an example, let's write down simple generate-and-test program to find all pairs of natural numbers above 0 that sum up to a prime number. Let's assume is_prime/1 is already given to us.
We write this in Prolog as
above(0, N), between(1, N, M), Sum is M+N, is_prime(Sum).
We write this in an imperative pseudocode as
for N from 1 step 1:
for M from 1 step 1 until N:
Sum := M+N
if is_prime(Sum):
report_to_user_and_ask(Sum)
Now when report_to_user_and_ask is called, it prints Sum out and asks the user whether to abort or to continue. The loops are not exited, on the contrary, they are just suspended. Thus all the loop variables values that got us this far -- and there may be more tests up the loops chain that sometimes succeed and sometimes fail -- are preserved, i.e. the computation state is preserved, and the computation is ready to be resumed from that point, if the user presses ;.
I first saw this in Peter Norvig's AI book's implementation of Prolog in Common Lisp. He used mapping (Common Lisp's mapcan which is concatMap in Haskell or flatMap in many other languages) as a looping construct though, and it took me years to see that nested loops is what it is really all about.
Goals conjunction is expressed as the nesting of the loops; goals disjunction is expressed as the alternatives to loop through.
Further twist is that the nested loops' structure isn't fixed from the outset. It is fluid, the nested loops of a given loop can be created depending on the current state of that loop, i.e. depending on the current alternative being explored there; the loops are written as we go. In (most of the) languages where such dynamic creation of nested loops is impossible, it can be encoded with nested recursion / function invocation / inside the loops. (Here's one example, with some pseudocode.)
If we keep all such loops (created for each of the alternatives) in memory even after they are finished with, what we get is the AND-OR tree (mentioned in the other answer) thus being created while the search space is being explored and the solutions are found.
(non-coincidentally this fluidity is also the essence of "monad"; nondeterminism is modeled by the list monad; and the essential operation of the list monad is the flatMap operation which we saw above. With fluid structure of loops it is "Monad"; with fixed structure it is "Applicative Functor"; simple loops with no structure (no nesting at all): simply "Functor" (the concepts used in Haskell and the like). Also helps to demystify those.)
So, the proper slogan could be Backtracking is like Nested Loops, either fixed, known from the outset, or dynamically-created as we go. It's a bit longer though. :)
Here's also a Prolog example, which "as if creates the code to be run first (N nested loops for a given value of N), and then runs it." (There's even a whole dedicated tag for it on SO, too, it turns out, recursive-backtracking.)
And here's one in Scheme ("creates nested loops with the solution being accessible in the innermost loop's body"), and a C++ example ("create n nested loops at run-time, in effect enumerating the binary encoding of 2n, and print the sums out from the innermost loop").
There is a big difference between recursion in functional/imperative programming languages and Prolog (and it really became clear to me only in the last 2 weeks or so):
In functional/imperative programming, you recurse down a call chain, then come back up, unwinding the stack, then output the result. It's over.
In Prolog, you recurse down an AND-OR tree (really, alternating AND and OR nodes), selecting a predicate to call on an OR node (the "choicepoint"), from left to right, and calling every predicate in turn on an AND node, also from left to right. An acceptable tree has exactly one predicate returning TRUE under each OR node, and all predicates returning TRUE under each AND node. Once an acceptable tree has been constructed, by the very search procedure, we are (i.e. the "search cursor" is) on a rightmost bottommost node .
Success in constructing an acceptable tree also means a solution to the query entered at the Prolog Toplevel (the REPL) has been found: The variable values are output, but the tree is kept (unless there are no choicepoints).
And this is also important: all variables are global in the sense that if a variable X as been passed all the way down the call chain from predicate to predicate to the rightmost bottommost node, then constrained at the last possible moment by unifying it with 2 for example, X = 2, then the Prolog Toplevel is aware of that without further ado: nothing needs to be passed up the call chain.
If you now press ;, search doesn't restart at the top of the tree, but at the bottom, i.e. at the current cursor position: the nearest parent OR node is asked for more solutions. This may result in much search until a new acceptable tree has been constructed, we are at a new rightmost bottommost node. The new variable values are output and you may again enter ;.
This process cycles until no acceptable tree can be constructed any longer, upon which false is output.
Note that having this AND-OR as an inspectable and modifiable data structure at runtime allows some magical tricks to be deployed.
There is bound to be a lot of power in debugging tools which record this tree to help the user who gets the dreaded sphynxian false from a Prolog program that is supposed to work. There are now Time Traveling Debuggers for functional and imperative languages, after all...

Could I ask for physical analogies or metaphors for recursion?

I am suddenly in a recursive language class (sml) and recursion is not yet physically sensible for me. I'm thinking about the way a floor of square tiles is sometimes a model or metaphor for integer multiplication, or Cuisenaire Rods are a model or analogue for addition and subtraction. Does anyone have any such models you could share?
Imagine you're a real life magician, and can make a copy of yourself. You create your double a step closer to the goal and give him (or her) the same orders as you were given.
Your double does the same to his copy. He's a magician too, you see.
When the final copy finds itself created at the goal, it has nowhere more to go, so it reports back to its creator. Which does the same.
Eventually, you get your answer back – without having moved an inch – and can now create the final result from it, easily. You get to pretend not knowing about all those doubles doing the actual hard work for you. "Hmm," you're saying to yourself, "what if I were one step closer to the goal and already knew the result? Wouldn't it be easy to find the final answer then ?" (*)
Of course, if you were a double, you'd have to report your findings to your creator.
More here.
(also, I think I saw this "doubles" creation chain event here, though I'm not entirely sure).
(*) and that is the essence of the recursion method of problem solving.
How do I know my procedure is right? If my simple little combination step produces a valid solution, under assumption it produced the correct solution for the smaller case, all I need is to make sure it works for the smallest case – the base case – and then by induction the validity is proven!
Another possibility is divide-and-conquer, where we split our problem in two halves, so will get to the base case much much faster. As long as the combination step is simple (and preserves validity of solution of course), it works. In our magician metaphor, I get to create two copies of myself, and combine their two answers into one when they are finished. Each of them creates two copies of themselves as well, so this creates a branching tree of magicians, instead of a simple line as before.
A good example is the Sierpinski triangle which is a figure that is built from three quarter-sized Sierpinski triangles simply, by stacking them up at their corners.
Each of the three component triangles is built according to the same recipe.
Although it doesn't have the base case, and so the recursion is unbounded (bottomless; infinite), any finite representation of S.T. will presumably draw just a dot in place of the S.T. which is too small (serving as the base case, stopping the recursion).
There's a nice picture of it in the linked Wikipedia article.
Recursively drawing an S.T. without the size limit will never draw anything on screen! For mathematicians recursion may be great, engineers though should be more cautious about it. :)
Switching to corecursion ⁄ iteration (see the linked answer for that), we would first draw the outlines, and the interiors after that; so even without the size limit the picture would appear pretty quickly. The program would then be busy without any noticeable effect, but that's better than the empty screen.
I came across this piece from Edsger W. Dijkstra; he tells how his child grabbed recursions:
A few years later a five-year old son would show me how smoothly the idea of recursion comes to the unspoilt mind. Walking with me in the middle of town he suddenly remarked to me, Daddy, not every boat has a lifeboat, has it? I said How come? Well, the lifeboat could have a smaller lifeboat, but then that would be without one.
I love this question and couldn't resist to add an answer...
Recursion is the russian doll of programming. The first example that come to my mind is closer to an example of mutual recursion :
Mutual recursion everyday example
Mutual recursion is a particular case of recursion (but sometimes it's easier to understand from a particular case than from a generic one) when we have two function A and B defined like A calls B and B calls A. You can experiment this very easily using a webcam (it also works with 2 mirrors):
display the webcam output on your screen with VLC, or any software that can do it.
Point your webcam to the screen.
The screen will progressively display an infinite "vortex" of screen.
What happens ?
The webcam (A) capture the screen (B)
The screen display the image captured by the webcam (the screen itself).
The webcam capture the screen with a screen displayed on it.
The screen display that image (now there are two screens displayed)
And so on.
You finally end up with such an image (yes, my webcam is total crap):
"Simple" recursion is more or less the same except that there is only one actor (function) that calls itself (A calls A)
"Simple" Recursion
That's more or less the same answer as #WillNess but with a little code and some interactivity (using the js snippets of SO)
Let's say you are a very motivated gold-miner looking for gold, with a very tiny mine, so tiny that you can only look for gold vertically. And so you dig, and you check for gold. If you find some, you don't have to dig anymore, just take the gold and go. But if you don't, that means you have to dig deeper. So there are only two things that can stop you:
Finding some gold nugget.
The Earth's boiling kernel of melted iron.
So if you want to write this programmatically -using recursion-, that could be something like this :
// This function only generates a probability of 1/10
function checkForGold() {
let rnd = Math.round(Math.random() * 10);
return rnd === 1;
}
function digUntilYouFind() {
if (checkForGold()) {
return 1; // he found something, no need to dig deeper
}
// gold not found, digging deeper
return digUntilYouFind();
}
let gold = digUntilYouFind();
console.log(`${gold} nugget found`);
Or with a little more interactivity :
// This function only generates a probability of 1/10
function checkForGold() {
console.log("checking...");
let rnd = Math.round(Math.random() * 10);
return rnd === 1;
}
function digUntilYouFind() {
if (checkForGold()) {
console.log("OMG, I found something !")
return 1;
}
try {
console.log("digging...");
return digUntilYouFind();
} finally {
console.log("climbing back...");
}
}
let gold = digUntilYouFind();
console.log(`${gold} nugget found`);
If we don't find some gold, the digUntilYouFind function calls itself. When the miner "climbs back" from his mine it's actually the deepest child call to the function returning the gold nugget through all its parents (the call stack) until the value can be assigned to the gold variable.
Here the probability is high enough to avoid the miner to dig to the earth kernel. The earth kernel is to the miner what the stack size is to a program. When the miner comes to the kernel he dies in terrible pain, when the program exceed the stack size (causes a stack overflow), it crashes.
There are optimization that can be made by the compiler/interpreter to allow infinite level of recursion like tail-call optimization.
Take fractals as being recursive: the same pattern get applied each time, yet each figure differs from another.
As natural phenomena with fractal features, Wikipedia presents:
Moutain ranges
Frost crystals
DNA
and, even, proteins.
This is odd, and not quite a physical example except insofar as dance-movement is physical. It occurred to me the other morning. I call it "Written in Latin, solved in Hebrew." Huh? Surely you are saying "Huh?"
By it I mean that encoding a recursion is usually done left-to-right, in the Latin alphabet style: "Def fac(n) = n*(fac(n-1))." The movement style is "outermost case to base case."
But (please check me on this) at least in this simple case, it seems the easiest way to evaluate it is right-to-left, in the Hebrew alphabet style: Start from the base case and move outward to the outermost case:
(fac(0) = 1)
(fac(1) = 1)*(fac(0) = 1)
(fac(2))*(fac(1) = 1)*(fac(0) = 1)
(fac(n)*(fac(n-1)*...*(fac(2))*(fac(1) = 1)*(fac(0) = 1)
(* Easier order to calculate <<<<<<<<<<< is leftwards,
base outwards to outermost case;
more difficult order to calculate >>>>>> is rightwards,
outermost case to base *)
Then you do not have to suspend items on the left while awaiting the results of calculations further right. "Dance Leftwards" instead of "Dance rightwards"?

Recursive thinking

I would like to ask if it is really necessary to track every recursive call when writing it, because I am having troubles if recursive call is inside a loop or inside multiple for loops. I just get lost when I am trying to understand what is happening.
Do you have some advice how to approach recursive problems and how to imagine it. I have already read a lot about it but I havent found a perfect answer yet. I understand for example how factorial works or fibonacci recursion. I get lost for example when I am trying to print all combinations from 1 to 5 length of 3 or all the solutions for n-queen problem
I had a similar problem, try drawing a tree like structure that keeps track of each recursive call. Where a node is a function and every child node of that node is a recursive call made from that function.
Everyone may have a different mental approach towards towards modeling a recursive problem. If you can solve the n queens problem in a non-recursive way, then you are just fine. It is certainly helpful to grasp the concept of recursion to break down a problem, though. If you are up for the mental exercise, then I suggest a text book on PROLOG. It is fun and very much teaches recursion from the very beginning.
Attempting a bit of a brain dump on n-queens. It goes like "how would I do it manually" by try and error. For n-queens, I propose to in your mind call it 8-queens as a start, just to make it look more familiar and intuitive. "n" is not an iterator here but specifies the problem size.
you reckon that n-queens has a self-similarity which is that you place single queens on a board - that is your candidate recursive routine
for a given board you have a routine to test if the latest queen added is in conflict with the prior placed ones
for a given board you have a routine to find a position for the queen that you have not tested yet if that test is not successful for the current position
you print out all queen positions if the queen you just placed was the nth (last) queen
otherwise (if the current queen was validly placed) you position an additional queen
The above is your program. Your routine will pass a list of positions of earlier queens. The first invocation is with an empty list.

Will Dijkstra or A* work correctly with cost a function of full path?

What I'm considering is this: when a node becomes the current node, compute "on the fly" the cost to each neighbor, where the cost is a function of the complete path to arrive at the current node. I can't think how this would break the assumptions of the algorithm, but I have a feeling it might.
I'm doing the on the fly computation for storage reasons anyway, but the new thing would be having the costs be a function of more than the two nodes involved. Could it work?
As far as I see, it doesn't break the assumptions of the Dijsktra algorithm, i.e. you can still continue to use it. However, when you want to do so, it does require you to completely refigure your graph.
More detailed, you can't simply use Node-Indices {1,...,N} anymore, but then your state rather needs to be something like {(1,all-ways-to-get-there), ... , (N,all-ways-to-get-there)}. This will bring in a bad exponential scaling.
The reason for this is that the Dijkstra algorithm -- like Dynamic programming -- relies on the fact that the problem can be split in parts and solved there, which is not the case here.
Here is an example why it can't be done by "normal" Dijkstra: say your function which assigns a cost to a given way {node_1, node_2, ..., node_N} is called f and is assumed arbitrary. Then it is completely irrelevant what your current best costs or best path {node_1, ..., node_{N-1}} is at the moment, as you can't make any implication on that -- all you can do is to work out each possible path, which grows exponentially and is hopeless for large graphs.
In case your function fulfills some requirements, however, there might be better things to do. For example, in the simplest case when your function is linear, f({path1} + {path2}) = f({path1}) + f({path2})], the "original" Dijkstra algorithm is recovered.
If it's possible to pre-compute the cost of travelling between each pair of nodes, then there is absolutely no reason you can't use Dijkstra or A*, as long as none of your edge weights can be negative.
If it's not possible to pre-compute the cost, then it's likely that you're doing something wrong in your pathfinding, as it likely depends on the state of the search. :)

How to quantitatively measure how simplified a mathematical expression is

I am looking for a simple method to assign a number to a mathematical expression, say between 0 and 1, that conveys how simplified that expression is (being 1 as fully simplified). For example:
eval('x+1') should return 1.
eval('1+x+1+x+x-5') should returns some value less than 1, because it is far from being simple (i.e., it can be further simplified).
The parameter of eval() could be either a string or an abstract syntax tree (AST).
A simple idea that occurred to me was to count the number of operators (?)
EDIT: Let simplified be equivalent to how close a system is to the solution of a problem. E.g., given an algebra problem (i.e. limit, derivative, integral, etc), it should assign a number to tell how close it is to the solution.
The closest metaphor I can come up with it how a maths professor would look at an incomplete problem and mentally assess it in order to tell how close the student is to the solution. Like in a math exam, were the student didn't finished a problem worth 20 points, but the professor assigns 8 out of 20. Why would he come up with 8/20, and can we program such thing?
I'm going to break a stack-overflow rule and post this as an answer instead of a comment, because not only I'm pretty sure the answer is you can't (at least, not the way you imagine), but also because I believe it can be educational up to a certain degree.
Let's assume that a criteria of simplicity can be established (akin to a normal form). It seems to me that you are very close to trying to solve an analogous to entscheidungsproblem or the halting problem. I doubt that in a complex rule system required for typical algebra, you can find a method that gives a correct and definitive answer to the number of steps of a series of term reductions (ipso facto an arbitrary-length computation) without actually performing it. Such answer would imply knowing in advance if such computation could terminate, and so contradict the fact that automatic theorem proving is, for any sufficiently powerful logic capable of representing arithmetic, an undecidable problem.
In the given example, the teacher is actually either performing that computation mentally (going step by step, applying his own sequence of rules), or gives an estimation based on his experience. But, there's no generic algorithm that guarantees his sequence of steps are the simplest possible, nor that his resulting expression is the simplest one (except for trivial expressions), and hence any quantification of "distance" to a solution is meaningless.
Wouldn't all this be true, your problem would be simple: you know the number of steps, you know how many steps you've taken so far, you divide the latter by the former ;-)
Now, returning to the criteria of simplicity, I also advice you to take a look on Hilbert's 24th problem, that specifically looked for a "Criteria of simplicity, or proof of the greatest simplicity of certain proofs.", and the slightly related proof compression. If you are philosophically inclined to further understand these subjects, I would suggest reading the classic Gödel, Escher, Bach.
Further notes: To understand why, consider a well-known mathematical artefact called the Mandelbrot fractal set. Each pixel color is calculated by determining if the solution to the equation z(n+1) = z(n)^2 + c for any specific c is bounded, that is, "a complex number c is part of the Mandelbrot set if, when starting with z(0) = 0 and applying the iteration repeatedly, the absolute value of z(n) remains bounded however large n gets." Despite the equation being extremely simple (you know, square a number and sum a constant), there's absolutely no way to know if it will remain bounded or not without actually performing an infinite number of iterations or until a cycle is found (disregarding complex heuristics). In this sense, every fractal out there is a rough approximation that typically usages an escape time algorithm as an heuristic to provide an educated guess whether the solution will be bounded or not.

Resources