I'm experimenting with recursion:
def fac
//fac = { int curr, res = 1G -> 1 >= curr ? res : fac( curr - 1, res * curr ) }
fac = { int curr, res = 1G -> 1 >= curr ? res : fac.trampoline( curr - 1, res * curr ) }
fac = fac.trampoline()
def rnd = new Random()
long s = System.currentTimeMillis()
100000.times{ fac rnd.nextInt( 40 ) }
println "done in ${System.currentTimeMillis() - s} ms / ${fac(40)}"
If I use it like this, I'm getting this:
done in 691 ms
If I uncomment line #2 and comment lines #3-4 to remove trampoline() and run it, I'm getting significantly lower numbers:
done in 335 ms
So with trampoline() the recursion runs about two times slower.
What am I missing?
P.S.
If I run the same example in Scala 2.12:
def fac( curr:Int, acc:BigInt = 1 ):BigInt = if( 1 >= curr ) acc else fac( curr - 1, curr * acc )
val s = System.currentTimeMillis
for( ix <- 0 until 100000 ) fac( scala.util.Random.nextInt(40).toInt )
println( s"done in ${System.currentTimeMillis - s} ms" )
it executes a bit faster:
done in 178 ms
UPDATE
Rewriting the closure to a method with the annotation:
@groovy.transform.TailRecursive
def fac( int curr, res = 1G ) { 1 >= curr ? res : fac( curr - 1, res * curr ) }
// the rest
gives
done in 164 ms
and is super-cool. Nevertheless, I still want to know about trampoline() :)
As stated in the documentation, Closure.trampoline() helps prevent overflowing the call stack.
Recursive algorithms are often restricted by a physical limit: the maximum stack height. For example, if you call a method that recursively calls itself too deep, you will eventually receive a StackOverflowException.
An approach that helps in those situations is by using Closure and its trampoline capability.
Closures are wrapped in a TrampolineClosure. Upon calling, a trampolined Closure will call the original Closure waiting for its result. If the outcome of the call is another instance of a TrampolineClosure, created perhaps as a result to a call to the trampoline() method, the Closure will again be invoked. This repetitive invocation of returned trampolined Closures instances will continue until a value other than a trampolined Closure is returned. That value will become the final result of the trampoline. That way, calls are made serially, rather than filling the stack.
Source: http://groovy-lang.org/closures.html#_trampoline
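For comparison outside Groovy, the same bouncing can be written against Scala's built-in trampoline, scala.util.control.TailCalls. A minimal sketch (parameter names mirror the question's closure):

import scala.util.control.TailCalls._

// Each step returns a TailRec value instead of calling deeper:
// done(v) ends the trampoline, tailcall(...) schedules the next bounce.
def fac(curr: Int, acc: BigInt = 1): TailRec[BigInt] =
  if (curr <= 1) done(acc) else tailcall(fac(curr - 1, acc * curr))

println(fac(40).result) // result() runs the bounce loop on the heap, not the stack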
However, using trampoline comes with a cost. Let's take a look at the JVisualVM samples.
Non-trampoline use case
Running the example without trampoline(), we get a result in ~441 ms:
done in 441 ms / 815915283247897734345611269596115894272000000000
This execution allocates ~2,927,550 objects and consumes around 100 MB of memory.
The CPU has little to do: apart from spending time in the main() and run() methods, it spends some cycles coercing arguments.
The trampoline() use case
Introducing the trampoline changes a lot. Firstly, it makes execution almost two times slower than the previous attempt.
done in 856 ms / 815915283247897734345611269596115894272000000000
Secondly, it allocates ~5,931,470 (!!!) objects and consumes ~221 MB of memory. The main difference is that in the previous case a single $_main_closure1 instance was used across all executions, while with trampoline every call to the trampoline() method creates:
a new $_main_closure1 object
which gets wrapped with the CurriedClosure<T>
which then gets wrapped with the TrampolineClosure<T>
This alone allocates more than 1,200,000 objects.
As for the CPU, it also has much more to do. Just look at the numbers:
all calls to TrampolineClosure<T>.<init>() consume 199 ms
using trampoline introduces calls to PojoMetaMethodSite$PojoCachedMethodSiteNoUnwrap.invoke(), which in total consume an additional 201 ms
all calls to CachedClass$3.initValue() consume in total an additional 98.8 ms
all calls to ClosureMetaClass$NormalMethodChooser.chooseMethod() consume in total an additional 100 ms
And this is exactly why introducing trampoline in your case makes the code execution much slower.
So why does @TailRecursive do much better?
In short, the @TailRecursive annotation replaces the recursive calls with a good old while loop. The factorial function with @TailRecursive looks something like this at the bytecode level:
//
// Source code recreated from a .class file by IntelliJ IDEA
// (powered by Fernflower decompiler)
//
package factorial;
import groovy.lang.GroovyObject;
import groovy.lang.MetaClass;
import java.math.BigInteger;
import org.codehaus.groovy.runtime.ScriptBytecodeAdapter;
import org.codehaus.groovy.runtime.dgmimpl.NumberNumberMultiply;
import org.codehaus.groovy.transform.tailrec.GotoRecurHereException;
public class Groovy implements GroovyObject {
    public Groovy() {
        MetaClass var1 = this.$getStaticMetaClass();
        this.metaClass = var1;
    }

    public static BigInteger factorial(int number, BigInteger acc) {
        BigInteger _acc_ = acc;
        int _number_ = number;

        try {
            while(true) {
                try {
                    while(_number_ != 1) {
                        int __number__ = _number_;
                        int var7 = _number_ - 1;
                        _number_ = var7;
                        Number var8 = NumberNumberMultiply.multiply(__number__, _acc_);
                        _acc_ = (BigInteger)ScriptBytecodeAdapter.castToType(var8, BigInteger.class);
                    }

                    BigInteger var4 = _acc_;
                    return var4;
                } catch (GotoRecurHereException var13) {
                    ;
                }
            }
        } finally {
            ;
        }
    }

    public static BigInteger factorial(int number) {
        return factorial(number, (BigInteger)ScriptBytecodeAdapter.castToType(1, BigInteger.class));
    }
}
I documented this use case on my blog some time ago. You can read the blog post if you want more information:
https://e.printstacktrace.blog/tail-recursive-methods-in-groovy/
Related
I want to calculate the time an Akka stream takes to complete
object Demo extends App {
  implicit val system = ActorSystem("MyDemo")
  implicit val materializer = ActorMaterializer()

  val startTime = System.currentTimeMillis
  System.out.println(elapsedTime)

  val flowA = Flow[String].map { element ⇒
    println(s"Flow A : $element ${Thread.currentThread().getName()}")
    Thread.sleep(1000)
    element
  }

  val flowB = Flow[String].map { element ⇒
    println(s"Flow B : $element ${Thread.currentThread().getName()}")
    Thread.sleep(1000)
    element
  }

  val flowC = Flow[String].map { element ⇒
    println(s"Flow C : $element ${Thread.currentThread().getName()}")
    Thread.sleep(1000)
    element
  }

  import system.dispatcher

  val completion = Source(List("Java", "Scala", "C++"))
    .via(flowA)
    .via(flowB)
    .via(flowC)
    .runWith(Sink.foreach(s ⇒ println("Got output " + s)))

  val stopTime = System.currentTimeMillis
  val elapsedTime = stopTime - startTime
  println(elapsedTime)

  completion.onComplete(_ => system.terminate())
}
Output
0
113
Flow A : Java MyDemo-akka.actor.default-dispatcher-4
Flow B : Java MyDemo-akka.actor.default-dispatcher-4
Flow C : Java MyDemo-akka.actor.default-dispatcher-4
Got output Java
Flow A : Scala MyDemo-akka.actor.default-dispatcher-4
Flow B : Scala MyDemo-akka.actor.default-dispatcher-4
Flow C : Scala MyDemo-akka.actor.default-dispatcher-4
Got output Scala
Flow A : C++ MyDemo-akka.actor.default-dispatcher-4
Flow B : C++ MyDemo-akka.actor.default-dispatcher-4
Flow C : C++ MyDemo-akka.actor.default-dispatcher-4
Got output C++
Queries
The elapsed time (113) gets printed before the stream completes; I don't understand why. I want to print the elapsed time after the stream has finished processing.
How can we calculate the time taken to complete the stream processing? I want to compare the results of the time taken using .map versus replacing .map with .async.
Running a stream is asynchronous. For
val completion =
// omitted for brevity
.runWith(Sink.foreach(s => println(s"Got output $s")))
completion is a Future[Done] (the materialized value of Sink.foreach) that will be completed with Done (a singleton) when the stream successfully completes (the future will be failed if the stream fails). Effectively that line of code is complete and execution moves on once the stream has been materialized and started.
You can get an upper bound on the time taken by simply moving the code that calculates the elapsed time into an onComplete callback on completion.
completion.onComplete { _ => // there's only one possible value here, so we don't need it
  val stopTime = System.currentTimeMillis()
  val elapsedTime = stopTime - startTime
  println(elapsedTime)
  system.terminate()
}
Note that this callback will execute at some point after the stream completes, but there are no guarantees that it will immediately be executed. That said, as long as the system and JVM you're running this on aren't under a heavy load, it's good enough.
Two other things are worth noting:
currentTimeMillis really shouldn't be used for reliable benchmarking: it's not even guaranteed to be monotonic (it can go backwards). System.nanoTime is generally more reliable for this purpose.
It may be more realistic to take the start time right before val completion = ???, as otherwise you're also measuring time to construct the "blueprint" of the stream, not just the time to materialize and run the stream.
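Putting both points together, a sketch reusing flowA/flowB/flowC, system, and the dispatcher import from the question (names are the question's, not a fixed API):

val startTime = System.nanoTime() // taken right before materializing the stream

val completion = Source(List("Java", "Scala", "C++"))
  .via(flowA)
  .via(flowB)
  .via(flowC)
  .runWith(Sink.foreach(s => println(s"Got output $s")))

completion.onComplete { _ =>
  val elapsedMs = (System.nanoTime() - startTime) / 1000000
  println(s"done in $elapsedMs ms") // printed only after the stream has completed
  system.terminate()
}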
I tried to build a graph to measure the flow time; maybe it can help you.
import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.FlowShape
import akka.stream.scaladsl.{Flow, GraphDSL, Source, Unzip, Zip}
object TimedFlow {
  def apply[In, Out](innerFlow: Flow[In, Out, NotUsed], func: (Long, Long) => Any): Flow[In, Out, NotUsed] = {
    val flowWithLong = Flow.fromGraph(GraphDSL.create() {
      implicit builder =>
        import akka.stream.scaladsl.GraphDSL.Implicits._
        val unzip = builder.add(Unzip[In, Long]())
        val zip = builder.add(Zip[Out, Long]())
        unzip.out0 ~> innerFlow ~> zip.in0
        unzip.out1 ~> zip.in1
        FlowShape(unzip.in, zip.out)
    })

    Flow[In]
      .map(in => (in, System.currentTimeMillis()))
      .via(flowWithLong)
      .via(Flow[(Out, Long)].map {
        case (out, beginTime) =>
          val endTime = System.currentTimeMillis()
          func(beginTime, endTime)
          out
      })
  }

  def main(args: Array[String]): Unit = {
    implicit val system: ActorSystem = ActorSystem("QuickStart")
    val source: Source[Int, NotUsed] = Source(1 to 100)
    implicit val ec = system.dispatcher

    val plusOneFlowWithTimePrint = TimedFlow(plusOneFlow(), (beginTime: Long, endTime: Long) => {
      println(s"begin ${beginTime} end ${endTime}")
      println(s"end - begin: ${endTime - beginTime}")
    })

    val done = source.via(plusOneFlowWithTimePrint).runForeach(println)
    done.onComplete(_ => system.terminate())
  }

  def plusOneFlow(): Flow[Int, Int, NotUsed] = {
    Flow[Int]
      .map { x =>
        Thread.sleep(50)
        x + 1
      }
  }
}
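Note that this graph measures per-element latency through the inner flow (each element is paired with its entry timestamp via Unzip and joined with an exit timestamp via Zip), not the total runtime of the stream, so it complements rather than replaces the onComplete timing above.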
While memoization of a function is a good idea, it could crash a program, because the program could potentially run out of memory.
Therefore it is NOT A SAFE OPTION to use in a production program.
Instead, I have developed the fixed-slot cache below, with a soft limit and a hard limit. When the number of cached slots exceeds the hard limit, the least-used slots are deleted until the count is reduced to the soft limit.
struct cacheType
    softlimit::Int
    hardlimit::Int
    memory::Dict{Any,Any}
    freq::Dict{Any,Int}
    cacheType(soft::Int, hard::Int) = new(soft, hard, Dict(), Dict())
end

function tidycache!(c::cacheType)
    memory_slots = length(c.memory)
    if memory_slots > c.hardlimit
        num_to_delete = memory_slots - c.softlimit
        # Now sort the freq dictionary into array of key => AccessFrequency
        # where the first few items have the lowest AccessFrequency
        for item in sort(collect(c.freq), by = x -> x[2])[1:num_to_delete]
            delete!(c.freq, item[1])
            delete!(c.memory, item[1])
        end
    end
end

# Fibonacci function
function cachefib!(cache::cacheType, x)
    if haskey(cache.memory, x)
        # Increment the number of times this key has been accessed
        cache.freq[x] += 1
        return cache.memory[x]
    else
        # perform housekeeping and remove cache entries if over the hardlimit
        tidycache!(cache)
        if x < 3
            cache.freq[x] = 1
            return cache.memory[x] = 1
        else
            result = cachefib!(cache, x - 2) + cachefib!(cache, x - 1)
            cache.freq[x] = 1
            cache.memory[x] = result
            return result
        end
    end
end
c = cacheType(3,4)
cachefib!(c,3)
cachefib!(c,4)
cachefib!(c,5)
cachefib!(c,6)
cachefib!(c,4)
println("c.memory is ",c.memory)
println("c.freq is ",c.freq)
I think this would be more useful in a production environment than plain memoization with no limit on memory consumption, which could crash the program.
In Python, there is
@functools.lru_cache(maxsize=128, typed=False)
Decorator to wrap a function with a memoizing callable that saves up to the maxsize most recent calls. It can save time when an expensive or I/O bound function is periodically called with the same arguments.
Since a dictionary is used to cache results, the positional and keyword arguments to the function must be hashable.
Is there an equivalent in Julia language?
There is LRUCache.jl, which provides an LRU type that basically acts like a Dict. Unfortunately, this doesn't seem to work with the Memoize.jl package, but you can use my answer to your other question:
using LRUCache

const fibmem = LRU{Int,Int}(3) # store only 3 values (newer LRUCache.jl releases spell this LRU{Int,Int}(maxsize = 3))
function fib(n)
    get!(fibmem, n) do
        n < 3 ? 1 : fib(n-1) + fib(n-2)
    end
end
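As an aside, three slots are enough to keep this fib linear: computing fib(n) only ever needs the two most recently computed values, so the recursive calls keep hitting the cache.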
I'm currently trying some operations in Kotlin that would cause a StackOverflowError.
Knowing that, I remembered that Kotlin has support for tailrec functions, so I tried to do:
private tailrec fun Turn.debuffPhase(): List<Turn> {
    val turns = listOf(this)
    if (facts.debuff == 0 || knight.damage == 0) {
        return turns
    }

    // Recursively find all possible thresholds of debuffing
    return turns + debuff(debuffsForNextThreshold()).debuffPhase()
}
To my surprise, IDEA didn't recognize it as tailrec, so I tried making it a normal function rather than an extension function:
private tailrec fun debuffPhase(turn: Turn): List<Turn> {
    val turns = listOf(turn)
    if (turn.facts.debuff == 0 || turn.knight.damage == 0) {
        return turns
    }

    // Recursively find all possible thresholds of debuffing
    val newTurn = turn.debuff(turn.debuffsForNextThreshold())
    return turns + debuffPhase(newTurn)
}
Even so, it isn't accepted. Isn't the important thing that the last function call is to the same function? I know that + is a call to List's plus function, but should that make a difference? All the examples I see on the internet of tail calls in other languages allow this kind of thing.
I tried the same with Int, which seemed more common than appending to lists, but got the same result:
private tailrec fun discoverBuffsNeeded(dragon: RPGChar): Int {
    val buffedDragon = dragon.buff(buff)
    if (dragon.turnsToKill(initKnight) < 1 + buffedDragon.turnsToKill(initKnight)) {
        return 0
    }
    return 1 + discoverBuffsNeeded(buffedDragon)
}
Shouldn't all those implementations allow tail calls? I thought of other ways to solve this (like passing the list as a MutableList parameter), but when possible I try to avoid passing collections to be mutated inside a function, and this seems like a case where that should be possible.
PS: About the question program, I'm implementing a solution to this problem.
None of your examples are tail-recursive.
A tail call is the last call in a subroutine. A recursive call is a call of a subroutine to itself. A tail-recursive call is a tail call of a subroutine to itself.
In all of your examples, the tail call is to +, not to the subroutine. So, all of those are recursive (because they call themselves), and all of those have tail calls (because every subroutine always has a "last call"), but none of them is tail-recursive (because the recursive call isn't the last call).
Infix notation can sometimes obscure what the tail call is; it is easier to see when you write every operation in prefix form or as a method call:
return plus(turns, debuff(debuffsForNextThreshold()).debuffPhase())
// or
return turns.plus(debuff(debuffsForNextThreshold()).debuffPhase())
Now it becomes much easier to see that the call to debuffPhase is not in tail position, but rather it is the call to plus (i.e. +) which is in tail position. If Kotlin had general tail calls, then that call to plus would indeed be eliminated, but AFAIK, Kotlin only has tail-recursion (like Scala), so it won't.
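The standard fix is to pass the list being built as an accumulator parameter, so that the recursive call really is the last operation. A minimal sketch of the pattern in Scala (a toy countdown stands in for the question's Turn logic):

import scala.annotation.tailrec

// The result list grows in the accumulator on the way down,
// leaving nothing to do after the recursive call returns.
@tailrec
def countdown(n: Int, acc: List[Int] = Nil): List[Int] =
  if (n <= 0) acc.reverse else countdown(n - 1, n :: acc)

println(countdown(5)) // List(5, 4, 3, 2, 1)

Kotlin's tailrec accepts the same shape.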
Without giving away an answer to your puzzle, here's a non-tail-recursive function.
fun fac(n: Int): Int =
    if (n <= 1) 1 else n * fac(n - 1)
It is not tail recursive because the recursive call is not in a tail position, as noted by Jörg's answer.
It can be transformed into a tail-recursive function using CPS,
tailrec fun fac2(n: Int, k: Int = 1): Int =
    if (n <= 1) k else fac2(n - 1, n * k)
although a better interface would likely hide the continuation in a private helper function.
fun fac3(n: Int): Int {
    tailrec fun fac_continue(n: Int, k: Int): Int =
        if (n <= 1) k else fac_continue(n - 1, n * k)

    return fac_continue(n, 1)
}
There are articles and presentations about functional-style programming in D (e.g. http://www.drdobbs.com/architecture-and-design/component-programming-in-d/240008321). I've never used D before, but I'm interested in trying it. Is there a way to write code in D similar to this Python expression:
max(x*y for x in range(N) for y in range(x, N) if str(x*y) == str(x*y)[::-1])
Are there D constructs for generators or list (array) comprehensions?
Here's one possible solution, not particularly pretty:
iota(1, N)
    .map!(x =>
        iota(x, N)
            .map!(y => tuple(x, y)))
    .joiner
    .map!(xy => xy[0] * xy[1])
    .filter!(xy => equal(to!string(xy), to!string(xy).retro))
    .reduce!max;
So what this actually does is create a range from 1 to N, and map each element to a range of tuples with your x,y values. This gives you a nested range ([[(1,1),(1,2)],[(2,2)]] for N = 2).
We then join this range to get a range of tuples ([(1,1),(1,2),(2,2)] for N = 2).
Next we map to x*y (D's map for some reason doesn't allow unpacked tuples, so we need to use indexing).
Penultimately we filter out non-palindromes, before finally reducing the range to its largest element.
Simple answer, no, D does not have generators or list comprehensions (AFAIK). However, you can create a generator using an InputRange. For that solution, see this related question: What is a "yield return" equivalent in the D programming language?
However, your code isn't actually using generators, so it could be translated as:
import std.algorithm : max, reduce, retro, equal;
import std.conv : to;

immutable N = 13;

void main() {
    int[] keep;
    foreach (x; 0 .. N) {
        foreach (y; x .. N) {
            auto val = x * y;
            auto s = to!string(val);
            if (equal(s, s.retro)) // reverse doesn't work on immutable Ranges
                keep ~= val; // don't use ~ if N gets large, use appender instead
        }
    }
    reduce!max(keep); // returns 121 (11*11)
}
For me, this is much more readable than your list comprehension because the list comprehension has gotten quite large.
There may be a better solution out there, but this is how I'd implement it. An added bonus is you get to see std.algorithm in all its glory.
However, for this particular piece of code I wouldn't use the array; instead, to save on memory, I'd store only the best value. Something like this:
import std.algorithm : retro, equal;
import std.conv : to;

immutable N = 13;

void main() {
    int best = 0;
    foreach (x; 0 .. N) {
        foreach (y; x .. N) {
            auto val = x * y;
            auto s = to!string(val);
            if (equal(s, s.retro) && val > best) // keep the largest palindrome seen so far
                best = val;
        }
    }
}
I am struggling to understand the recursion used in this dynamic programming example. Can anyone explain how it works? The objective is to find the least number of coins for a value.
// f(n) = 1 + min f(n-d) for all denominations d
Pseudocode:
int memo[128]; // initialized to -1

int min_coin(int n)
{
    if (n < 0) return INF;
    if (n == 0) return 0;
    if (memo[n] != -1)
    int ans = INF;
    for (int i = 0; i < num_denomination; ++i)
    {
        ans = min(ans, min_coin(n - denominations[i]));
    }
    return memo[n] = ans + 1; // when does this get called?
}
This particular example is explained very well in this article at Topcoder.
Basically this recursion is using the solutions to smaller problems (least number of coins for a smaller n) to find the solution for the overall problem. The dynamic programming aspect of this is the memoization of the solutions to the sub-problems so they don't have to be recalculated every time.
And yes - there are {} missing as ring0 mentioned in his comment - the recursion should only be executed if the sub-problem has not been solved before.
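For reference, here is the same memoized recursion with that guard made explicit, sketched in Scala (the coin set and INF value are illustrative, not from the question):

import scala.collection.mutable

val denominations = Array(1, 5, 10, 25) // illustrative coin set
val INF = Int.MaxValue / 2              // "infinity" that survives the +1
val memo = mutable.Map.empty[Int, Int]

def minCoin(n: Int): Int =
  if (n < 0) INF
  else if (n == 0) 0
  else memo.get(n) match {
    case Some(cached) => cached // the early return the missing braces should have guarded
    case None =>
      val ans = 1 + denominations.map(d => minCoin(n - d)).min
      memo(n) = ans
      ans
  }

println(minCoin(63)) // 6 (25 + 25 + 10 + 1 + 1 + 1)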
To answer the asker's question ("when does this get called?"): in a recursive solution, the same function is called by itself... but it eventually returns. When does it return? From the moment the function stops calling itself:
f(a) {
    if (a > 0) f(a-1);
    display "x"
}
f(5);
f(5) calls f(4), which in turn calls f(3), which calls f(2), which calls f(1), which calls f(0).
In f(0), a is 0, so it does not call f() again; it displays "x" and returns. Control goes back to f(1), which, its call to f(0) done, also displays "x". f(1) ends, f(2) displays "x", ..., up to f(5). You get 6 "x"s.
In other words, as ring0 already mentioned: it returns when the program reaches the base case and starts to unwind by going back up the stack (call frames). For a similar case using a factorial example, see this:
#!/usr/bin/env perl
use strict;
use IO::Handle;
use Carp qw(cluck);
STDOUT->autoflush(1);
STDERR->autoflush(1);
sub factorial {
    my $v = shift;
    dummy_func();
    return 1 if $v == 1;
    print "Variable v value: $v and it's address:", \$v, "\ncurrent sub factorial addr:", \&factorial, "\n", "-" x 40;
    return $v * factorial($v - 1);
}

sub dummy_func {
    cluck;
}
factorial(5);