What is wrong with my R for-loop that sums a series? - r

Here is my function that does a loop:
answer = function(a,n) {
for (k in 0:n) {
x =+ (a^k)/factorial(k)
}
return(x)
}
answer(1,2) should return 2.5 as it is the calculated value of
1^0 / 0! + 1^1 / 1! + 1^2 / 2! = 1 + 1 + 0.5 = 2.5
But I get
answer(1,2)
#[1] 0.5
Looks like it fails to accumulate all three terms and just stores the newest value every time. += does not work so I used =+ but it is still not right. Thanks.

answer = function(a,n) {
x <- 0 ## initialize the accumulator
for (k in 0:n) {
x <- x + (a^k)/factorial(k) ## note how to accumulate value in R
}
return(x)
}
answer(1, 2)
#[1] 2.5
There is "vectorized" solution:
answer = function(a,n) {
x <- a ^ (0:n) / factorial(0:n)
return(sum(x))
}
In this case you don't need to initialize anything. R will allocate memory behind that <- and sum.
You are using Taylor expansion to approximate exp(a). See this Q & A on the theme. You may want to pay special attention to the "numerical convergence" issue mentioned in my answer.

Related

Slow recursion even with memoization in R

I'm trying to solve the problem #14 of Project Euler.
So the main objective is finding length of Collatz sequence.
Firstly I solved problem with regular loop:
compute <- function(n) {
result <- 0
max_chain <- 0
hashmap <- 1
for (i in 1:n) {
chain <- 1
number <- i
while (number > 1) {
if (!is.na(hashmap[number])) {
chain <- chain + hashmap[number]
break
}
if (number %% 2 == 0) {
chain <- chain + 1
number <- number / 2
} else {
chain <- chain + 2
number <- (3 * number + 1) / 2
}
}
hashmap[i] <- chain
if (chain > max_chain) {
max_chain <- chain
result <- i
}
}
return(result)
}
Only 2 seconds for n = 1000000.
I decided to replace while loop to recursion
len_collatz_chain <- function(n, hashmap) {
get_len <- function(n) {
if (is.na(hashmap[n])) {
hashmap[n] <<- ifelse(n %% 2 == 0, 1 + get_len(n / 2), 2 + get_len((3 * n + 1) / 2))
}
return(hashmap[n])
}
get_len(n)
return(hashmap)
}
compute <- function(n) {
result <- 0
max_chain <- 0
hashmap <- 1
for (i in 1:n) {
hashmap <- len_collatz_chain(i, hashmap)
print(length(hashmap))
if (hashmap[i] > max_chain) {
max_chain <- hashmap[i]
result <- i
}
}
return(result)
}
This solution works but works so slow. Almost 1 min for n = 10000.
I suppose that one of the reasons is R creates hashmap object each time when call function len_collatz_chain.
I know about Rcpp packages and yes, the first solution works fine but I can't understand where I'm wrong.
Any tips?
For example, my Python recursive solution works in 1 second with n = 1000000
def len_collatz_chain(n: int, hashmap: dict) -> int:
if n not in hashmap:
hashmap[n] = 1 + len_collatz_chain(n // 2, hashmap) if n % 2 == 0 else 2 + len_collatz_chain((3 * n + 1) // 2, hashmap)
return hashmap[n]
def compute(n: int) -> int:
result, max_chain, hashmap = 0, 0, {1: 1}
for i in range(2, n):
chain = len_collatz_chain(i, hashmap)
if chain > max_chain:
result, max_chain = i, chain
return result
The main difference between your R and Python code is that in R you use a vector for the hashmap, while in Python you use a dictionary and that hashmap is transferred many times as function argument.
In Python, if you have a Dictionary as function argument, only a reference to the actual data is transfered to the called function. This is fast. The called function works on the same data as the caller.
In R, a vector is copied when used as function argument. This is potentially slow, but safer in the sense that the called function cannot alter the data of the caller.
This the main reason that Python is so much faster in your code.
You can however alter the R code slightly, such that the hashmap is not transfered as function argument anymore:
len_collatz_chain <- local({
hashmap <- 1L
get_len <- function(n) {
if (is.na(hashmap[n])) {
hashmap[n] <<- ifelse(n %% 2 == 0, 1 + get_len(n / 2), 2 + get_len((3 * n + 1) / 2))
}
hashmap[n]
}
get_len
})
compute <- function(n) {
result <- rep(NA_integer_, n)
for (i in seq_len(n)) {
result[i] <- len_collatz_chain(i)
}
result
}
compute(n=10000)
This makes the R code much faster. (Python will probably still be faster though).
Note that I have also removed the return statements in the R code, as they are not needed and add one level to the call stack.

Writing a square root function in R

I'm trying to write a square root function in R. The function is supposed to behave like sqrt() but not use that function of course. I'm supposed to use Newton's method for computing the square root, which is:
y(a+1) = [y(a) + x / y(a)]/2
Here x is the number I'm trying to calculate the square root of and y(0) would be the initial guess of the square root of x.
The function is supposed to take in four arguments: x (the number I'm trying to compute the square root of), eps (the difference in value between iterations that are considered be equal), iter (the max number of iterations), and verbose (says I want to output intermediate results).
My issue is that I am not very well versed in writing functions in R. I have experience in C++, but they are slightly different in R.
I believe I'm supposed to write something that goes like this.
Asks the user to input a number as a guess for the value we want to calculate the square root of. Make a for loop from 1 to iter with two if statements 1) that stop the function and output the y value if the max number of iterations have been reached 2) stop the function and output the y value if the difference between successive iterations is less than eps.
Here is the code I have so far:
MySqrt <- function (x, eps = 1e-6, iter = 100, verbose = TRUE) {
for (i in 0:itmax) {
y[0] <- readline(prompt="Please enter your initial square root guess: ")
y[i + 1] = (y[i] + x / y[i])/2
if (i == 100) {
stop (return(y[i + 1]))
}
if (abs(y[i + 1] - y[i]) < eps) {
stop (return(y[i + 1]))
}
}
return(y[i + 1])
}
Here is the error I receive after entering the initial square root guess: Error in y[0] <- readline(prompt = "Please enter your initial square root guess: ") :
object 'y' not found
Honestly, I didn't expect the code to work because I'm sure there are more than one errors.
You should use iter instead of itmax.
I initialized y within the function and input of y should be formatted as a number instead of a character. You could also simplify the if statement by using | (or).
I also added "cat" function so you could see what i is before the function prints out the square root value.
MySqrt <- function (x, eps = 1e-6, iter = 100, verbose = TRUE) {
y = 0
y[1] = as.numeric(readline(prompt="Please enter your initial square root guess: "))
for (i in 1:iter) {
y[i+1] = as.numeric((y[i] + (x/y[i]))/2)
if (i == 100 || abs(y[i+1] - y[i]) < eps) {
cat("This is", i,"th try: \n")
return(y[i+1])
}
}
}
Try this simply:
newton.raphson <- function(x, start, epsilon=0.0001, maxiter=100) {
y <- c(start) # initial guess
a <- 1 # number of iterations
while (TRUE) {
y <- c(y, (y[a] + x / y[a])/2)
if (abs(y[a+1] - y[a]) < epsilon | a > maxiter) { # converged or exceeded maxiter
return(y[a+1])
}
a <- a + 1
}
}
newton.raphson(2, 0.5, 0.01)
# [1] 1.414234
newton.raphson(3, 0.5, 0.01)
# [1] 1.732051
since sqrt(n) < n/2 then with precision of 1/10000
sqrnt=function(y){
x=y/2
while (abs(x*x-y) > 1e-10)
{x=(x+y/x)/2 }
x
}
In Newton’s method. If you want to know the square root of a, you can start estimate a number, x (for examples a/2), you can compute a better estimate with the following formula:
y = (x + a / x) / 2
If y != x, you set x = y, and repeat until y == x. Then you get the square root of a. Please see the code below:
square_root <- function(a) {
x <- a/2
while (TRUE) {
y <- (x + a / x) / 2
if (y == x) break
x <- y
}
return(y)
}

Get distance to next largest floating point number in R [duplicate]

Is there any implementation of functionality in R, such that it is possible to get the next representable floating point number from a given floating point number. This would be similar to the nextafter function in the C standard library. Schemes such as number + .Machine$double.eps don't work in general.
No, but there are two ways you can make it:
Using C
If you want the exact functionality of the nextafter() function, you can write a C function that works as an interface to the function such that the following two constraints are met:
The function does not return a value. All work is accomplished as a "side effect" (changing the values of arguments).
All the arguments are pointers. Even scalars are vectors (of length one) in R.
That function should then be compiled as a shared library:
R CMD SHLIB foo.c
for UNIX-like OSs. The shared library can be called using dyn.load("foo.so"). You can then call the function from inside R using the .C() function
.C("foo", ...)
A more in depth treatment of calling C from R is here.
Using R
number + .Machine$double.eps is the way to go but you have to consider edge cases, such as if x - y < .Machine$double.eps or if x == y. I would write the function like this:
nextafter <- function(x, y){
# Appropriate type checking and bounds checking goes here
delta = y - x
if(x > 0){
factor = 2^floor(log2(x)) + ifelse(x >= 4, 1, 0)
} else if (x < 0) {
factor = 65
}
if (delta > .Machine$double.eps){
return(x + factor * .Machine$double.eps)
} else if (delta < .Machine$double.eps){
return(x - factor * .Machine$double.eps)
} else {
return(x)
}
}
Now, unlike C, if you want to check integers, you can do so in the same function but you need to change the increment based on the type.
UPDATE
The previous code did not perform as expected for numbers larger than 2. There is a factor that needs to be multiplied by the .Machine$double.eps to make it large enough to cause the numbers to be different. It is related to the nearest power of 2 plus one. You can get an idea of how this works with the below code:
n <- -100
factor <- vector('numeric', 100)
for(i in 1:n){
j = 0
while(TRUE){
j = j + 1
if(i - j * .Machine$double.eps != i) break()
}
factor[i] = j
}
If you prefer Rcpp:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
double nextAfter(double x, double y) {
return nextafter(x, y);
}
Then in R:
sprintf("%.20f", 1)
#[1] "1.00000000000000000000"
sprintf("%.20f", nextAfter(1, 2))
#[1] "1.00000000000000022204"
I'm not sure if Christopher Louden's answer works for all values, but here's a pure R version of the classic approach (increments/decrements the integer bits). R does not make it easy to convert between doubles and integers, nor does it have a 64-bit integer type, so there's quite a lot of code for this.
doubleToRaw <- function(d) writeBin(d, raw());
rawToDouble <- function(r) readBin(r, numeric());
int64inc <- function(lo, hi) {
if (lo == 0xffffffff) { hi <- hi + 1; lo <- 0; } else { lo <- lo + 1; }
return(c(lo, hi));
}
int64dec <- function(lo, hi) {
if (lo == 0) { hi <- hi - 1; lo <- 0xffffffff; } else { lo <- lo - 1; }
return(c(lo, hi));
}
nextafter <- function(x, y) {
if (is.nan(x + y))
return(NaN);
if (x == y)
return(x);
if (x == 0)
return(sign(y) * rawToDouble(as.raw(c(0, 0, 0, 0, 0, 0, 0, 1))));
ints <- packBits(rawToBits(doubleToRaw(x)), "integer")
if ((y > x) == (x > 0))
ints <- int64inc(ints[1], ints[2])
else
ints <- int64dec(ints[1], ints[2]);
return(rawToDouble(packBits(intToBits(ints), "raw")))
}

Newton's Method in R Precision/Output

So, I'm supposed to write the code to execute Newton's Method to calculate the square root of any arbitrary number to a specified precision (tolerance).
Here is my code:
MySqrt <- function(x, eps = 1e-6, itmax = 100, verbose = TRUE) {
GUESS <- 11
myvector <- integer(0)
i <- 1
if (x < 0) {
stop("Square root of negative value")
}
else {
myvector[i] <- GUESS
while (i <= itmax) {
GUESS <- (GUESS + (x/GUESS)) * 0.5
myvector[i+1] <- GUESS
if (abs(GUESS-myvector[i]) < eps) {
break()
}
if (verbose) {
cat("Iteration: ", formatC(i, width = 1), formatC(GUESS, digits = 10, width = 12), "\n")
}
i <- i + 1
}
}
myvector[i]
}
eps is the tolerance. When I use the function to calculate the square root of, say, 21, I got this as an output:
> MySqrt(21, eps = 1e-1, verbose = TRUE)
Iteration: 1 6.454545455
Iteration: 2 4.854033291
Iteration: 3 4.59016621
I'm not sure if the function stops carrying out iterations when it is supposed to, however. Can someone verify if my code is correct? This would be greatly appreciated!
Your code is almost correct. It is iterating the correct number of times. The only bug is that you don't increment i until after the break statement, so you are not returning the most recent approximation. Instead you are returning the previous one.
In order to verify that it is stopping at the right time, you can move the tracing line up above the break. You can also add GUESS-myvector[i] to the trace, so you can watch it halt as soon as the difference gets small enough. If you do this and run the function, the fact that it is stopping at the right time, as well as the fact that it is returning the wrong value, will be obvious:
> MySqrt(21,eps=1e-1)
Iteration: 1 6.454545 -4.545455
Iteration: 2 4.854033 -1.600512
Iteration: 3 4.590166 -0.2638671
Iteration: 4 4.582582 -0.007584239
[1] 4.590166
While your code is (almost) correct, it is not written in very good R style. For example, unless you want to return the entire vector of estimates, there is no reason that you need to keep them all around. Also, rather than using a while loop, here it would make more sense to use a for loop. Here one possible improved version of your function:
MySqrt <- function(x, eps = 1e-6, itmax = 100, verbose = TRUE) {
GUESS <- 11
if (x < 0) {
stop("Square root of negative value")
}
for(i in 1:itmax){
nextGUESS <- (GUESS + (x/GUESS)) * 0.5
if (verbose)
cat("Iteration: ", i, nextGUESS, nextGUESS-GUESS, "\n")
if (abs(GUESS-nextGUESS) < eps)
break
GUESS<- nextGUESS
}
nextGUESS
}

speeding up a loop with loop-carried values in R

I'm trying to speed up code that takes time series data and limits it to a maximum value and then stretches it forward until sum of original data and the "stretched" data are the same.
I have a more complicated version of this that is taking 6 hours to run on 100k rows. I don't think this is vectorizable because it uses values calculated on prior rows - is that correct?
x <- c(0,2101,3389,3200,1640,0,0,0,0,0,0,0)
dat <- data.frame(x=x,y=rep(0,length(x)))
remainder <- 0
upperlimit <- 2000
for(i in 1:length(dat$x)){
if(dat$x[i] >= upperlimit){
dat$y[i] <- upperlimit
} else {
dat$y[i] <- min(remainder,upperlimit)
}
remainder <- remainder + dat$x[i] - dat$y[i]
}
dat
I understand you can use ifelse but I don't think cumsum can be used to carry forward the remainder - apply doesn't help either as far as I know. Do I need to resort to Rcpp? Thank you greatly.
I went ahead and implemented this in Rcpp and made some adjustments to the R function:
require(Rcpp);require(microbenchmark);require(ggplot2);
limitstretchR <- function(upperlimit,original) {
remainder <- 0
out <- vector(length=length(original))
for(i in 1:length(original)){
if(original[i] >= upperlimit){
out[i] <- upperlimit
} else {
out[i] <- min(remainder,upperlimit)
}
remainder <- remainder + original[i] - out[i]
}
out
}
The Rcpp function:
cppFunction('
NumericVector limitstretchC(double upperlimit, NumericVector original) {
int n = original.size();
double remainder = 0.0;
NumericVector out(n);
for(int i = 0; i < n; ++i) {
if (original[i] >= upperlimit) {
out[i] = upperlimit;
} else {
out[i] = std::min<double>(remainder,upperlimit);
}
remainder = remainder + original[i] - out[i];
}
return out;
}
')
Testing them:
x <- c(0,2101,3389,3200,1640,0,0,0,0,0,0,0)
original <- rep(x,20000)
upperlimit <- 2000
system.time(limitstretchR(upperlimit,original))
system.time(limitstretchC(upperlimit,original))
That yielded 80.655 and 0.001 seconds respectively. Native R is quite bad for this. However, I ran a microbenchmark (using a smaller vector) and got some confusing results.
res <- microbenchmark(list=
list(limitstretchR=limitstretchR(upperlimit,rep(x,10000)),
limitstretchC=limitstretchC(upperlimit,rep(x,10000))),
times=110,
control=list(order="random",warmup=10))
print(qplot(y=time, data=res, colour=expr) + scale_y_log10())
boxplot(res)
print(res)
If you were to run that you would see nearly identical results for both functions. This is my first time using microbenchmark, any tips?

Resources