I'm trying to solve problem #14 of Project Euler.
The main objective is to find the length of a Collatz sequence.
First, I solved the problem with a regular loop:
compute <- function(n) {
result <- 0
max_chain <- 0
hashmap <- 1
for (i in 1:n) {
chain <- 1
number <- i
while (number > 1) {
if (!is.na(hashmap[number])) {
chain <- chain + hashmap[number]
break
}
if (number %% 2 == 0) {
chain <- chain + 1
number <- number / 2
} else {
chain <- chain + 2
number <- (3 * number + 1) / 2
}
}
hashmap[i] <- chain
if (chain > max_chain) {
max_chain <- chain
result <- i
}
}
return(result)
}
It takes only 2 seconds for n = 1000000.
I then decided to replace the while loop with recursion:
len_collatz_chain <- function(n, hashmap) {
get_len <- function(n) {
if (is.na(hashmap[n])) {
hashmap[n] <<- ifelse(n %% 2 == 0, 1 + get_len(n / 2), 2 + get_len((3 * n + 1) / 2))
}
return(hashmap[n])
}
get_len(n)
return(hashmap)
}
compute <- function(n) {
result <- 0
max_chain <- 0
hashmap <- 1
for (i in 1:n) {
hashmap <- len_collatz_chain(i, hashmap)
print(length(hashmap))
if (hashmap[i] > max_chain) {
max_chain <- hashmap[i]
result <- i
}
}
return(result)
}
This solution works, but it is very slow: almost 1 minute for n = 10000.
I suppose one of the reasons is that R creates the hashmap object anew each time the function len_collatz_chain is called.
I know about the Rcpp package, and yes, the first solution works fine, but I can't understand where I'm going wrong.
Any tips?
For example, my recursive Python solution runs in 1 second with n = 1000000:
def len_collatz_chain(n: int, hashmap: dict) -> int:
    if n not in hashmap:
        hashmap[n] = 1 + len_collatz_chain(n // 2, hashmap) if n % 2 == 0 else 2 + len_collatz_chain((3 * n + 1) // 2, hashmap)
    return hashmap[n]

def compute(n: int) -> int:
    result, max_chain, hashmap = 0, 0, {1: 1}
    for i in range(2, n):
        chain = len_collatz_chain(i, hashmap)
        if chain > max_chain:
            result, max_chain = i, chain
    return result
The main difference between your R and Python code is that in R you use a vector for the hashmap, while in Python you use a dictionary, and that hashmap is passed as a function argument many times.
In Python, when a dictionary is passed as a function argument, only a reference to the actual data is transferred to the called function. This is fast. The called function works on the same data as the caller.
In R, a vector passed as a function argument is copied as soon as the called function modifies it (copy-on-modify). This is potentially slow, but safer in the sense that the called function cannot alter the data of the caller.
This is the main reason that Python is so much faster in your code.
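The difference can be illustrated in Python itself. The following is only a sketch: the function names are made up for illustration, and the explicit `dict(cache)` copy merely imitates what R does implicitly when an argument is modified.

```python
def add_entry_shared(cache, key, value):
    # Python passes a reference: this mutates the caller's dictionary.
    cache[key] = value


def add_entry_copied(cache, key, value):
    # Imitation of R's copy-on-modify: work on a private copy and
    # return it; the caller's dictionary is left untouched.
    local_cache = dict(cache)  # O(len(cache)) copy on every call
    local_cache[key] = value
    return local_cache


shared = {1: 1}
add_entry_shared(shared, 2, 2)
print(shared)  # the caller sees the new entry

original = {1: 1}
updated = add_entry_copied(original, 2, 2)
print(original, updated)  # only the returned copy has the new entry
```

With a cache of a million entries, the copying variant pays for a full copy on every call, which is roughly the cost the recursive R version incurs.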
You can, however, alter the R code slightly so that the hashmap is no longer passed as a function argument:
len_collatz_chain <- local({
hashmap <- 1L
get_len <- function(n) {
if (is.na(hashmap[n])) {
hashmap[n] <<- ifelse(n %% 2 == 0, 1 + get_len(n / 2), 2 + get_len((3 * n + 1) / 2))
}
hashmap[n]
}
get_len
})
compute <- function(n) {
result <- rep(NA_integer_, n)
for (i in seq_len(n)) {
result[i] <- len_collatz_chain(i)
}
result
}
compute(n=10000)
This makes the R code much faster. (Python will probably still be faster though).
Note that I have also removed the return statements in the R code, as they are not needed and add one level to the call stack.
I want to create a script that calculates probabilities for a role-playing game.
I'm new to programming and I'm stuck with return values and nested functions. What I want is to use the values returned by the first function in the next one.
I have two functions, dice(k, n) and fight(a, b) (for this example, the functions are only partly written):
dice <- function (k, n) {
if (k > 3 && n > 2){
a <- 3
b <- 2
attack <- sample(1:6, a)
deff <- sample(1:6, b)
}
return(c(attack, deff))
}
So I want to use the vectors attack and deff in the next function:
fight <- function(a, b){
if (a == 3 && b == 2){
if(sort(attack,T)[1] > sort(deff,T)[1]){
n <- n - 1}
if (sort(attack,T)[1] <= sort(deff,T)[1]) {
k <- k - 1}
if (sort(attack,T)[2] > sort(deff,T)[2]) {
n <- n - 1}
if (sort(attack,T)[2]<= sort(deff,T)[2]){
k <- k - 1}
}
return(c(k, n))
}
But this gives me the following error:
Error in sort(attack, T) : object 'attack' not found
Any ideas? Thanks!
I tried to write a function to calculate the nth Fibonacci number in R. I can do this recursively.
fibonacci = function(n){
if (n == 1) {return(1)}
if (n == 2) {return(2)}
return(fibonacci(n - 1) + fibonacci(n - 2))
}
I couldn't find any examples in R, but from guides in other languages I came up with the following. However, it doesn't seem to run any faster.
fibonacci = function(n, lookup = NULL){
if (is.null(lookup)) {
lookup = integer(n + 1)
}
if (n == 1) {return(1)}
if (n == 2) {return(2)}
lookup[1] = 1
lookup[2] = 2
if (lookup[n - 1] == 0) {
lookup[n - 1] = fibonacci(n - 1, lookup)
}
if (lookup[n - 2] == 0) {
lookup[n - 2] = fibonacci(n - 2, lookup)
}
return(lookup[n - 1] + lookup[n - 2])
}
The problem with your solution is that your lookup vector is always local to the call frame environment and new solutions are not propagated up to the callers, i.e., changes to the lookup vector are lost when the function returns. In order to make a persistent variable a la static variables in C, you may create an attribute to the function that acts as a memoizer. Here is one solution:
fibonaccid = function(n, init=T){
if (init) {
lookup <- integer(n + 1)
lookup[1] <- 1
lookup[2] <- 2
} else {
lookup <- attr(fibonaccid, ".lookup")
}
# ... calculate lookup as before, recurse with fibonaccid(...,init=F)
attr(fibonaccid, ".lookup") <<- lookup
return(lookup[n - 1] + lookup[n - 2])
}
This indeed runs much faster:
R> system.time(print(fibonacci(35)))
[1] 14930352
user system elapsed
20.923 0.140 21.446
R> system.time(print(fibonaccid(35)))
[1] 14930352
user system elapsed
0.202 0.006 0.209
See this post for more information.
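The same idea, a lookup table that outlives individual calls, can be sketched with a closure. This is an illustrative Python version rather than the answer's attribute-based R approach; `make_fibonacci` is a made-up name, and the base cases follow the question's convention (1 and 2 for n = 1 and n = 2).

```python
def make_fibonacci():
    # The lookup table lives in the enclosing scope, so it persists
    # across calls instead of being local to each call frame.
    lookup = {1: 1, 2: 2}  # base cases as in the question's R code

    def fib(n):
        # Assumes n is a positive integer.
        if n not in lookup:
            lookup[n] = fib(n - 1) + fib(n - 2)
        return lookup[n]

    return fib


fibonacci = make_fibonacci()
print(fibonacci(35))  # 14930352, matching the timing output above
```

Each value is computed once, so the number of recursive calls grows linearly with n instead of exponentially.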
I have a function that I want to write in tail recursive form. The function calculates the number of ways to get the sum of k by rolling an s sided die n times. I have seen the mathematical solution for this function on this answer. It is as follows:
My reference recursive implementation in R is:
sum_ways <- function(n_times, k_sum, s_side) {
if (k_sum < n_times || k_sum > n_times * s_side) {
return(0)
} else if (n_times == 1) {
return(1)
} else {
sigma_values <- sapply(
1:s_side,
function(j) sum_ways(n_times - 1, k_sum - j, s_side)
)
return(sum(sigma_values))
}
}
I have tried to re-write the function in continuation passing style as I have learned from this answer, but I wasn't successful. Is there a way to write this function in tail-recursive form?
EDIT
I know that R doesn't optimise for tail-recursion. My question is not R specific, a solution in any other language is just as welcome. Even if it is a language that does not optimise for tail-recursion.
sapply isn't in continuation-passing style, so you have to replace it.
Here's a translation to continuation-passing style in Python (another language that does not have proper tail calls):
def sum_ways_cps(n_times, k_sum, s_side, ctn):
    """Compute the number of ways to get the sum k by rolling an s-sided die
    n times. Then pass the answer to ctn."""
    if k_sum < n_times or k_sum > n_times * s_side:
        return ctn(0)
    elif n_times == 1:
        return ctn(1)
    else:
        f = lambda j, ctn: sum_ways_cps(n_times - 1, k_sum - j, s_side, ctn)
        # j runs over the faces 1..s_side (inclusive), as in the R sapply call
        return sum_cps(1, s_side, 0, f, ctn)

def sum_cps(j, j_max, total_so_far, f, ctn):
    """Compute the sum of f(x) for x=j to j_max (inclusive).
    Then pass the answer to ctn."""
    if j > j_max:
        return ctn(total_so_far)
    else:
        return f(j, lambda result: sum_cps(j + 1, j_max, total_so_far + result, f, ctn))

sum_ways_cps(2, 7, 6, print)  # 6
Try this (with recursion, we need to think of a linear recurrence relation if we want a tail recursive version):
f <- function(n, k) {
if (n == 1) { # base case
return(ifelse(k<=6, 1, 0))
} else if (k > n*6 | k < n) { # some validation
return(0)
}
else {
# recursive calls, f(1,j)=1, 1<=j<=6, otherwise 0
return(sum(sapply(1:min(k-n+1, 6), function(j) f(n-1,k-j))))
}
}
sapply(1:13, function(k) f(2, k))
# [1] 0 1 2 3 4 5 6 5 4 3 2 1 0
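One way to act on the recurrence without any recursion is to build the counts bottom-up, one die at a time. The following Python sketch is an assumption about what such a version could look like, not part of the answer above; `sum_ways_dp` is a hypothetical name, and its arguments mirror those of sum_ways in the question.

```python
def sum_ways_dp(n_times, k_sum, s_side):
    """Number of ways to reach k_sum with n_times rolls of an s_side-sided die."""
    if n_times < 1 or k_sum < n_times or k_sum > n_times * s_side:
        return 0
    # ways[k] = number of ways to reach sum k with the dice rolled so far.
    ways = [0] * (k_sum + 1)
    for k in range(1, min(s_side, k_sum) + 1):
        ways[k] = 1  # one die: each face value is reached in exactly one way
    for _ in range(n_times - 1):
        new_ways = [0] * (k_sum + 1)
        for k in range(1, k_sum + 1):
            # Add one more die showing j = 1..s_side.
            new_ways[k] = sum(ways[k - j]
                              for j in range(1, s_side + 1) if k - j >= 1)
        ways = new_ways
    return ways[k_sum]


print([sum_ways_dp(2, k, 6) for k in range(1, 14)])
# [0, 1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1, 0]
```

The loop carries the whole table forward as its state, which is the same role an accumulator argument would play in a tail-recursive formulation.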
I am self-studying the book "Introduction to Algorithms" by Cormen et al. In the book, they use pseudo-code which assumes that arrays are passed by pointer (by reference). This is different from R (where objects are passed by value), so I am having some difficulties trying to translate their pseudo-code as closely as possible, especially when recursion is involved. Most of the time, I have to implement things quite differently.
For example, with the Merge Sort algorithm, they define the Merge Function (which I think I have translated correctly) and the recursive MergeSort function (where direct translation to R does not work).
The merge function in pseudo-code is as follows where: A is an array and p, q, and r are indices into the array such that p < q < r. The procedure assumes that the subarrays A[p:q] and A[q+1:r] are in sorted order. It merges them to form a single sorted subarray that replaces the current subarray A[p:r]
Merge(A, p, q, r)
n1 = q - p + 1
n2 = r - q
let L[1...n1+1] and R[1...n2+1] be new arrays
for i = 1 to n1
L[i] = A[p+i-1]
for j = 1 to n2
R[j] = A[q+j]
L[n1+1] = infinite
R[n2+1] = infinite
i=1
j=1
for k = p to r
if L[i] <= R[j]
A[k] = L[i]
i = i + 1
else
A[k] = R[j]
j = j + 1
Which I've translated to R as:
Merge <- function(a, p, q, r){
n1 <- q - p + 1
n2 <- r - q
L <- numeric(n1+1)
R <- numeric(n2+1)
for(i in 1:n1){
L[i] <- a[p+i-1]
}
for(j in 1:n2){
R[j] <- a[q+j]
}
L[n1+1] <- Inf
R[n2+1] <- Inf
i=1
j=1
for(k in p:r){
if(L[i] <= R[j]){
a[k] <- L[i]
i <- i +1
}else{
a[k] <- R[j]
j <- j+1
}
}
a
}
And it seems to work fine.
Merge(c(1,3,5, 2,4,6), 1, 3, 6)
[1] 1 2 3 4 5 6
Now the MergeSort function is defined in pseudo-code as follows:
MergeSort(A, p, r)
if p < r
q = floor((p+r)/2)
MergeSort(A, p, q)
MergeSort(A, q+1, r)
Merge(A, p, q, r)
This assumes that A is passed by reference and that every change is visible to every recursive call, which is not true in R.
So, given the Merge function defined above, how you would implement the MergeSort function in R to obtain the correct results? (if possible, and preferable, but not necessary, somewhat similar to the pseudo-code)
Trying to do a literal translation of pseudocode that is written for a language that allows for pass-by-reference in a language that does not support it is a terrible idea. R's not meant to work on slices of an array within a function. That's just not an appropriate translation. The pseudocode is supposed to communicate the spirit of the algorithm which you then translate into the appropriate language. Here's one possible translation of the spirit of mergesort into R.
mmerge<-function(a,b) {
r<-numeric(length(a)+length(b))
ai<-1; bi<-1; j<-1;
for(j in 1:length(r)) {
if((ai<=length(a) && a[ai]<b[bi]) || bi>length(b)) {
r[j] <- a[ai]
ai <- ai+1
} else {
r[j] <- b[bi]
bi <- bi+1
}
}
r
}
mmergesort<-function(A) {
if(length(A)>1) {
q <- ceiling(length(A)/2)
a <- mmergesort(A[1:q])
b <- mmergesort(A[(q+1):length(A)])
mmerge(a,b)
} else {
A
}
}
You can run it with
x<-c(18, 16, 8, 7, 6, 3, 11, 9, 15, 1)
mmergesort(x)
In this version nothing is replaced via reference: all functions return new values. Additionally, rather than passing in slice indexes, we simply subset vectors and pass them whole to the functions.
Of course the performance of this version is likely to suffer because of all the memory reallocations that occur at the intermediate steps. There's not much you can do about that in base R because of how the language was designed. If you like, you can write C/C++ code and call that via the foreign language interfaces.
If you want to leave your Merge as-is (and ignore the R-way to do things), then you could do...
MergeSort<-function(A, p, r) {
if(p < r) {
q <- floor((p+r)/2)
A <- MergeSort(A, p, q)
A <- MergeSort(A, q+1, r)
Merge(A, p, q, r)
} else {
A
}
}
x <- c(18, 16, 8, 7, 6, 3, 11, 9, 15, 1)
MergeSort(x, 1, length(x))
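For contrast, in a language where lists are passed by reference, the textbook pseudocode carries over almost line for line. Here is a minimal Python sketch, assuming 0-based inclusive indices (so the initial call uses 0 and len(x) - 1 rather than 1 and length(x)); the names merge and merge_sort are illustrative.

```python
import math


def merge(a, p, q, r):
    """Merge the sorted runs a[p..q] and a[q+1..r] (inclusive) in place."""
    left = a[p:q + 1] + [math.inf]       # sentinel, as in the pseudocode
    right = a[q + 1:r + 1] + [math.inf]
    i = j = 0
    for k in range(p, r + 1):
        if left[i] <= right[j]:
            a[k] = left[i]
            i += 1
        else:
            a[k] = right[j]
            j += 1


def merge_sort(a, p, r):
    """Sort a[p..r] in place; no return value is needed."""
    if p < r:
        q = (p + r) // 2
        merge_sort(a, p, q)
        merge_sort(a, q + 1, r)
        merge(a, p, q, r)


x = [18, 16, 8, 7, 6, 3, 11, 9, 15, 1]
merge_sort(x, 0, len(x) - 1)
print(x)  # [1, 3, 6, 7, 8, 9, 11, 15, 16, 18]
```

Because every recursive call writes into the same list, nothing has to be returned and reassigned, which is exactly the step the R version above has to add.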
UPDATE:
Including a benchmarking harness (requires the microbenchmark package):
library(microbenchmark)
m1<-function() {
x<-sample(1000, 250);
mmergesort(x)
}
m2<-function() {
x<-sample(1000, 250);
MergeSort(x, 1, length(x))
}
microbenchmark(m1(), m2())
This solution calls length() only once per call and uses simpler logic. The merge step is implemented inside mergesort:
mergesort = function(x){
l = length(x)
if(l<=1)
{
return(x)
}
else
{
a = mergesort(x[1:((l - l %% 2)/2)])
b = mergesort(x[((l + 2 - l %% 2)/2):l])
a = c(a, Inf)
b = c(b, Inf)
for(el in 1:l){
if(a[1]>=b[1]){
x[el] = b[1]
b = b[-1]
}
else{
x[el] = a[1]
a = a[-1]
}
}
return(x)
}
}
I would like to create an array A[i,j,k] with the following elements:
A[i,j,k] = 0 if k+j-s-i != 0
A[i,j,k] = p[s] if k+j-s-i = 0 (p[s] is a given vector)
This may be written with a characteristic function as p[s]*δ(k+j-s-i) or with the Kronecker delta function as p[s]*δ(0,k+j-s-i).
Is there any built-in function in R that gives this? I mean, is "δ" built in?
Or do I have to write it myself?
I suppose it would be very useful to have a built-in function which returns 1 when f(x)=0 and 0 otherwise, at least for linear f(x).
I'd rewrite this as
A[i,j,k] = p[k+j-i] if that exists, otherwise 0
which could then be implemented as
p <- c(1,2,3,4,5)
pfun <- function(x) {
if (x < 1 | x > length(p)) {
0
} else {
p[x]
}
}
n <- 5
A <- array(0, c(n, n, n))
for (i in 1:n) {
for (j in 1:n) {
for (k in 1:n) {
A[i,j,k] <- pfun(k+j-i)
}
}
}
There may be something more elegant than triply-nested for loops.
As for the function you ask about, something as simple as
as.numeric(f(x)==0)
would work.
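To check that the rewritten rule agrees with the Kronecker delta formulation, here is a small Python sketch. It uses plain nested lists, the helper names are hypothetical, and the indices i, j, k and s are kept 1-based inside the helpers to match the R notation.

```python
def kronecker_delta(a, b):
    # 1 when the two arguments are equal, 0 otherwise
    return 1 if a == b else 0


def p_at(p, s):
    # p[s] with 1-based s, or 0 when s falls outside the vector
    return p[s - 1] if 1 <= s <= len(p) else 0


def build_array(p, n):
    # A[i][j][k] = p[k + j - i] if that element exists, otherwise 0
    return [[[p_at(p, k + j - i) for k in range(1, n + 1)]
             for j in range(1, n + 1)]
            for i in range(1, n + 1)]


def element_via_delta(p, i, j, k):
    # Delta form: sum over s of p[s] * delta(0, k + j - s - i)
    return sum(p[s - 1] * kronecker_delta(0, k + j - s - i)
               for s in range(1, len(p) + 1))


p = [1, 2, 3, 4, 5]
A = build_array(p, 5)
print(A[2][1][3], element_via_delta(p, 3, 2, 4))  # i=3, j=2, k=4 gives p[3] = 3
```

Both forms select the single s with s = k + j - i, so they agree wherever that index falls inside p and give 0 everywhere else.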