Get distance to next largest floating point number in R [duplicate]

Is there any functionality in R for getting the next representable floating point number after a given floating point number? This would be similar to the nextafter function in the C standard library. Schemes such as number + .Machine$double.eps don't work in general.
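For instance (a quick illustration, not from the original post), adding .Machine$double.eps has no effect at all once the number is large enough:
x <- 1e9
x + .Machine$double.eps == x
# [1] TRUE: eps is far smaller than the gap between adjacent doubles near 1e9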

No, but there are two ways you can make it:
Using C
If you want the exact functionality of the nextafter() function, you can write a C function that works as an interface to the function such that the following two constraints are met:
The function does not return a value. All work is accomplished as a "side effect" (changing the values of arguments).
All the arguments are pointers. Even scalars are vectors (of length one) in R.
That function should then be compiled as a shared library:
R CMD SHLIB foo.c
for UNIX-like OSs. The shared library can be called using dyn.load("foo.so"). You can then call the function from inside R using the .C() function
.C("foo", ...)
A more in-depth treatment of calling C from R is here.
Using R
number + .Machine$double.eps is the way to go but you have to consider edge cases, such as if x - y < .Machine$double.eps or if x == y. I would write the function like this:
nextafter <- function(x, y){
  # Appropriate type checking and bounds checking goes here
  delta = y - x
  if(x > 0){
    factor = 2^floor(log2(x)) + ifelse(x >= 4, 1, 0)
  } else if (x < 0) {
    factor = 65
  }
  if (delta > .Machine$double.eps){
    return(x + factor * .Machine$double.eps)
  } else if (delta < .Machine$double.eps){
    return(x - factor * .Machine$double.eps)
  } else {
    return(x)
  }
}
Unlike C, if you want to handle integers, you can do so in the same function, but you need to change the increment based on the type.
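As a quick sanity check of the function above (my own example, assuming standard IEEE-754 doubles), the gap above 1 is exactly .Machine$double.eps, while the gap above 2 is twice as wide, which is why the factor is needed:
nextafter(1, 2) - 1 == .Machine$double.eps
# [1] TRUE
nextafter(2, 3) - 2 == 2 * .Machine$double.eps
# [1] TRUE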
UPDATE
The previous code did not perform as expected for numbers larger than 2. There is a factor that needs to multiply .Machine$double.eps to make the increment large enough to change the number. It is related to the nearest power of 2 plus one. You can get an idea of how this works with the code below:
n <- 100
factor <- vector('numeric', 100)
for(i in 1:n){
  j = 0
  while(TRUE){
    j = j + 1
    if(i - j * .Machine$double.eps != i) break()
  }
  factor[i] = j
}

If you prefer Rcpp:
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
double nextAfter(double x, double y) {
  return nextafter(x, y);
}
Then in R:
sprintf("%.20f", 1)
#[1] "1.00000000000000000000"
sprintf("%.20f", nextAfter(1, 2))
#[1] "1.00000000000000022204"

I'm not sure if Christopher Louden's answer works for all values, but here's a pure R version of the classic approach (increments/decrements the integer bits). R does not make it easy to convert between doubles and integers, nor does it have a 64-bit integer type, so there's quite a lot of code for this.
doubleToRaw <- function(d) writeBin(d, raw())
rawToDouble <- function(r) readBin(r, numeric())

int64inc <- function(lo, hi) {
  if (lo == 0xffffffff) { hi <- hi + 1; lo <- 0 } else { lo <- lo + 1 }
  return(c(lo, hi))
}

int64dec <- function(lo, hi) {
  if (lo == 0) { hi <- hi - 1; lo <- 0xffffffff } else { lo <- lo - 1 }
  return(c(lo, hi))
}

nextafter <- function(x, y) {
  if (is.nan(x + y))
    return(NaN)
  if (x == y)
    return(x)
  if (x == 0)
    return(sign(y) * rawToDouble(as.raw(c(0, 0, 0, 0, 0, 0, 0, 1))))
  ints <- packBits(rawToBits(doubleToRaw(x)), "integer")
  if ((y > x) == (x > 0))
    ints <- int64inc(ints[1], ints[2])
  else
    ints <- int64dec(ints[1], ints[2])
  return(rawToDouble(packBits(intToBits(ints), "raw")))
}
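A quick check of this version (my own; it assumes a little-endian platform, since writeBin() and packBits() work on the native byte order):
sprintf("%.20f", nextafter(1, 2))
# [1] "1.00000000000000022204"  (matches the Rcpp result above)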

Related

Slow recursion even with memoization in R

I'm trying to solve problem #14 of Project Euler.
The main objective is finding the length of the Collatz sequence.
First I solved the problem with a regular loop:
compute <- function(n) {
  result <- 0
  max_chain <- 0
  hashmap <- 1
  for (i in 1:n) {
    chain <- 1
    number <- i
    while (number > 1) {
      if (!is.na(hashmap[number])) {
        chain <- chain + hashmap[number]
        break
      }
      if (number %% 2 == 0) {
        chain <- chain + 1
        number <- number / 2
      } else {
        chain <- chain + 2
        number <- (3 * number + 1) / 2
      }
    }
    hashmap[i] <- chain
    if (chain > max_chain) {
      max_chain <- chain
      result <- i
    }
  }
  return(result)
}
This takes only 2 seconds for n = 1000000.
I decided to replace the while loop with recursion:
len_collatz_chain <- function(n, hashmap) {
  get_len <- function(n) {
    if (is.na(hashmap[n])) {
      hashmap[n] <<- ifelse(n %% 2 == 0, 1 + get_len(n / 2), 2 + get_len((3 * n + 1) / 2))
    }
    return(hashmap[n])
  }
  get_len(n)
  return(hashmap)
}
compute <- function(n) {
  result <- 0
  max_chain <- 0
  hashmap <- 1
  for (i in 1:n) {
    hashmap <- len_collatz_chain(i, hashmap)
    print(length(hashmap))
    if (hashmap[i] > max_chain) {
      max_chain <- hashmap[i]
      result <- i
    }
  }
  return(result)
}
This solution works but is very slow: almost 1 minute for n = 10000.
I suppose one of the reasons is that R creates the hashmap object each time the function len_collatz_chain is called.
I know about the Rcpp package, and yes, the first solution works fine, but I can't understand where I'm going wrong.
Any tips?
For comparison, my Python recursive solution runs in 1 second with n = 1000000:
def len_collatz_chain(n: int, hashmap: dict) -> int:
    if n not in hashmap:
        hashmap[n] = 1 + len_collatz_chain(n // 2, hashmap) if n % 2 == 0 else 2 + len_collatz_chain((3 * n + 1) // 2, hashmap)
    return hashmap[n]

def compute(n: int) -> int:
    result, max_chain, hashmap = 0, 0, {1: 1}
    for i in range(2, n):
        chain = len_collatz_chain(i, hashmap)
        if chain > max_chain:
            result, max_chain = i, chain
    return result
The main difference between your R and Python code is that in R you use a vector for the hashmap, while in Python you use a dictionary, and that hashmap is passed many times as a function argument.
In Python, if you pass a dictionary as a function argument, only a reference to the actual data is transferred to the called function. This is fast: the called function works on the same data as the caller.
In R, a vector is copied when used as a function argument. This is potentially slow, but safer in the sense that the called function cannot alter the data of the caller.
This is the main reason that Python is so much faster for your code.
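A minimal illustration of that copy-on-modify behaviour (hypothetical example, not from the original answer):
f <- function(v) {
  v[1] <- 99   # changes the function's local copy only
  v
}
x <- c(1, 2, 3)
f(x)   # returns 99 2 3
x      # still 1 2 3: the caller's vector is untouched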
You can, however, alter the R code slightly so that the hashmap is no longer passed as a function argument:
len_collatz_chain <- local({
  hashmap <- 1L
  get_len <- function(n) {
    if (is.na(hashmap[n])) {
      hashmap[n] <<- ifelse(n %% 2 == 0, 1 + get_len(n / 2), 2 + get_len((3 * n + 1) / 2))
    }
    hashmap[n]
  }
  get_len
})

compute <- function(n) {
  result <- rep(NA_integer_, n)
  for (i in seq_len(n)) {
    result[i] <- len_collatz_chain(i)
  }
  result
}
compute(n=10000)
This makes the R code much faster. (Python will probably still be faster though).
Note that I have also removed the return statements in the R code, as they are not needed and add one level to the call stack.

What is wrong with my R for-loop that sums a series?

Here is my function that does a loop:
answer = function(a,n) {
  for (k in 0:n) {
    x =+ (a^k)/factorial(k)
  }
  return(x)
}
answer(1,2) should return 2.5 as it is the calculated value of
1^0 / 0! + 1^1 / 1! + 1^2 / 2! = 1 + 1 + 0.5 = 2.5
But I get
answer(1,2)
#[1] 0.5
It looks like it fails to accumulate all three terms and just stores the newest value every time. += does not work in R, so I used =+, but it is still not right. Thanks.
answer = function(a, n) {
  x <- 0  ## initialize the accumulator
  for (k in 0:n) {
    x <- x + (a^k)/factorial(k)  ## note how to accumulate a value in R
  }
  return(x)
}
answer(1, 2)
#[1] 2.5
There is also a "vectorized" solution:
answer = function(a, n) {
  x <- a ^ (0:n) / factorial(0:n)
  return(sum(x))
}
In this case you don't need to initialize anything; R allocates the necessary memory behind the <- and sum() calls.
You are using the Taylor expansion to approximate exp(a). See this Q & A on the theme. You may want to pay special attention to the "numerical convergence" issue mentioned in my answer.
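As a small illustration of that Taylor-series point (my own check, not from the answers above), the partial sums approach exp(a) as n grows:
answer(1, 10)   # [1] 2.718282 (at the default print precision)
exp(1)          # [1] 2.718282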

Writing a square root function in R

I'm trying to write a square root function in R. The function is supposed to behave like sqrt() but not use that function of course. I'm supposed to use Newton's method for computing the square root, which is:
y(a+1) = [y(a) + x / y(a)]/2
Here x is the number I'm trying to calculate the square root of and y(0) would be the initial guess of the square root of x.
The function is supposed to take in four arguments: x (the number I'm trying to compute the square root of), eps (the difference in value between iterations that are considered to be equal), iter (the max number of iterations), and verbose (whether to output intermediate results).
My issue is that I am not very well versed in writing functions in R. I have experience in C++, but they are slightly different in R.
I believe I'm supposed to write something that goes like this.
Ask the user to input a number as a guess for the square root of x. Make a for loop from 1 to iter with two if statements: 1) stop the function and output the y value if the max number of iterations has been reached; 2) stop the function and output the y value if the difference between successive iterations is less than eps.
Here is the code I have so far:
MySqrt <- function (x, eps = 1e-6, iter = 100, verbose = TRUE) {
  for (i in 0:itmax) {
    y[0] <- readline(prompt="Please enter your initial square root guess: ")
    y[i + 1] = (y[i] + x / y[i])/2
    if (i == 100) {
      stop (return(y[i + 1]))
    }
    if (abs(y[i + 1] - y[i]) < eps) {
      stop (return(y[i + 1]))
    }
  }
  return(y[i + 1])
}
Here is the error I receive after entering the initial square root guess: Error in y[0] <- readline(prompt = "Please enter your initial square root guess: ") :
object 'y' not found
Honestly, I didn't expect the code to work because I'm sure there is more than one error.
You should use iter instead of itmax.
I initialized y within the function, and the input for y should be converted to a number instead of a character. You could also simplify the two if statements by combining them with | (or).
I also added a cat() call so you can see which iteration i the function stopped on before it prints out the square root value.
MySqrt <- function (x, eps = 1e-6, iter = 100, verbose = TRUE) {
  y <- 0
  y[1] <- as.numeric(readline(prompt="Please enter your initial square root guess: "))
  for (i in 1:iter) {
    y[i+1] <- as.numeric((y[i] + (x/y[i]))/2)
    if (i == iter || abs(y[i+1] - y[i]) < eps) {
      cat("This is", i, "th try: \n")
      return(y[i+1])
    }
  }
}
Try this simply:
newton.raphson <- function(x, start, epsilon=0.0001, maxiter=100) {
  y <- c(start)  # initial guess
  a <- 1         # number of iterations
  while (TRUE) {
    y <- c(y, (y[a] + x / y[a])/2)
    if (abs(y[a+1] - y[a]) < epsilon | a > maxiter) {  # converged or exceeded maxiter
      return(y[a+1])
    }
    a <- a + 1
  }
}
newton.raphson(2, 0.5, 0.01)
# [1] 1.414234
newton.raphson(3, 0.5, 0.01)
# [1] 1.732051
Since sqrt(y) <= y/2 for y >= 4, y/2 makes a reasonable starting guess; then iterate until the desired precision is reached:
sqrnt = function(y) {
  x = y/2
  while (abs(x*x - y) > 1e-10) {
    x = (x + y/x)/2
  }
  x
}
In Newton's method, if you want to know the square root of a, you start with an estimate x (for example a/2) and compute a better estimate with the following formula:
y = (x + a / x) / 2
If y != x, you set x = y and repeat until y == x. Then you have the square root of a. Please see the code below:
square_root <- function(a) {
  x <- a/2
  while (TRUE) {
    y <- (x + a / x) / 2
    if (y == x) break
    x <- y
  }
  return(y)
}
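For example (my own check, assuming the iteration reaches an exact fixed point, which it does here):
square_root(2)
# [1] 1.414214
sqrt(2)
# [1] 1.414214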

speeding up a loop with loop-carried values in R

I'm trying to speed up code that takes time series data, caps it at a maximum value, and then carries the excess forward until the sums of the original data and the "stretched" data are the same.
I have a more complicated version of this that is taking 6 hours to run on 100k rows. I don't think this is vectorizable because it uses values calculated on prior rows - is that correct?
x <- c(0,2101,3389,3200,1640,0,0,0,0,0,0,0)
dat <- data.frame(x=x, y=rep(0,length(x)))
remainder <- 0
upperlimit <- 2000
for(i in 1:length(dat$x)){
  if(dat$x[i] >= upperlimit){
    dat$y[i] <- upperlimit
  } else {
    dat$y[i] <- min(remainder, upperlimit)
  }
  remainder <- remainder + dat$x[i] - dat$y[i]
}
dat
I understand you can use ifelse but I don't think cumsum can be used to carry forward the remainder - apply doesn't help either as far as I know. Do I need to resort to Rcpp? Thank you greatly.
I went ahead and implemented this in Rcpp and made some adjustments to the R function:
require(Rcpp); require(microbenchmark); require(ggplot2)

limitstretchR <- function(upperlimit, original) {
  remainder <- 0
  out <- vector(length=length(original))
  for(i in 1:length(original)){
    if(original[i] >= upperlimit){
      out[i] <- upperlimit
    } else {
      out[i] <- min(remainder, upperlimit)
    }
    remainder <- remainder + original[i] - out[i]
  }
  out
}
The Rcpp function:
cppFunction('
NumericVector limitstretchC(double upperlimit, NumericVector original) {
  int n = original.size();
  double remainder = 0.0;
  NumericVector out(n);
  for(int i = 0; i < n; ++i) {
    if (original[i] >= upperlimit) {
      out[i] = upperlimit;
    } else {
      out[i] = std::min<double>(remainder, upperlimit);
    }
    remainder = remainder + original[i] - out[i];
  }
  return out;
}
')
Testing them:
x <- c(0,2101,3389,3200,1640,0,0,0,0,0,0,0)
original <- rep(x,20000)
upperlimit <- 2000
system.time(limitstretchR(upperlimit,original))
system.time(limitstretchC(upperlimit,original))
That yielded 80.655 and 0.001 seconds respectively. Native R is quite bad for this. However, I ran a microbenchmark (using a smaller vector) and got some confusing results.
res <- microbenchmark(
  list=list(limitstretchR=limitstretchR(upperlimit, rep(x,10000)),
            limitstretchC=limitstretchC(upperlimit, rep(x,10000))),
  times=110,
  control=list(order="random", warmup=10))
print(qplot(y=time, data=res, colour=expr) + scale_y_log10())
boxplot(res)
print(res)
If you were to run that you would see nearly identical results for both functions. This is my first time using microbenchmark, any tips?

Modulus warning in R - Lehmann Primality Test

I spent a little time hacking together an R implementation of the Lehmann primality test. I borrowed the function design from http://davidkendal.net/articles/2011/12/lehmann-primality-test
Here is my code:
primeTest <- function(n, iter){
  a <- sample(1:(n-1), 1)
  lehmannTest <- function(y, tries){
    x <- ((y^((n-1)/2)) %% n)
    if (tries == 0) {
      return(TRUE)
    } else {
      if ((x == 1) | (x == (-1 %% n))){
        lehmannTest(sample(1:(n-1), 1), (tries-1))
      } else {
        return(FALSE)
      }
    }
  }
  lehmannTest(a, iter)
}

primeTest(4, 50)  # false
primeTest(3, 50)  # true
primeTest(10, 50) # false
primeTest(97, 50) # gives false # SHOULD BE TRUE !!!! WTF

prime_test <- c(2,3,5,7,11,13,17,19,23,29,31,37)
for (i in 1:length(prime_test)) {
  print(primeTest(prime_test[i], 50))
}
For small primes it works, but as soon as I get to around ~30, I get an ugly warning and the function stops working correctly:
2: In lehmannTest(a, iter) : probable complete loss of accuracy in modulus
After some investigating I believe it has to do with floating point conversions: very large numbers are rounded, so the mod operator gives a bad response.
Now the questions:
Is this a floating point problem, or is it a problem in my implementation?
Is there a purely R solution, or is R just bad at this?
Thanks
Solution:
After the great feedback and an hour of reading about modular exponentiation algorithms, I have a solution. The first step is to write my own modular exponentiation function. The basic idea is that modular multiplication lets you reduce intermediate results: you can take the mod after each iteration, so you never get a giant number that overwhelms R's numeric precision.
modexp <- function(a, b, n){
  r = 1
  for (i in 1:b){
    r = (r*a) %% n
  }
  return(r)
}

primeTest <- function(n, iter){
  a <- sample(1:(n-1), 1)
  lehmannTest <- function(y, tries){
    x <- modexp(y, (n-1)/2, n)
    if (tries == 0) {
      return(TRUE)
    } else {
      if ((x == 1) | (x == (-1 %% n))){
        lehmannTest(sample(1:(n-1), 1), (tries-1))
      } else {
        return(FALSE)
      }
    }
  }
  if( n < 2 ){
    return(FALSE)
  } else if (n == 2) {
    return(TRUE)
  } else {
    lehmannTest(a, iter)
  }
}

primeTest(4, 50)  # false
primeTest(3, 50)  # true
primeTest(10, 50) # false
primeTest(97, 50) # NOW IS TRUE !!!!

prime_test <- c(5,7,11,13,17,19,23,29,31,37,1009)
for (i in 1:length(prime_test)) {
  print(primeTest(prime_test[i], 50))
}
# ALL TRUE
Of course there is a problem with representing integers. In R, whole numbers stored as doubles are represented exactly only up to 2^53 - 1, which is about 9e15, and the term y^((n-1)/2) will easily exceed that even for small numbers. You will have to compute (y^((n-1)/2)) %% n by repeatedly squaring y and taking the modulus, following the binary representation of (n-1)/2.
Even the 'real' number theory programs do it like that -- see Wikipedia's entry on "modular exponentiation". That said, it should be mentioned that programs like R (or Matlab and other systems for numerical computing) may not be a proper environment for implementing number theory algorithms, probably not even as playing fields with small integers.
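As a rough sketch of that square-and-multiply idea in plain R (the name modexp_bin and the code are mine, not from the answer; intermediate products a * a must still stay below 2^53 for the results to be exact):
modexp_bin <- function(a, b, n) {
  r <- 1
  a <- a %% n
  while (b > 0) {
    if (b %% 2 == 1) r <- (r * a) %% n  # odd bit of the exponent: multiply into the result
    a <- (a * a) %% n                   # square the base for the next bit
    b <- b %/% 2
  }
  r
}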
Edit: The original package was incorrect
You could utilize the function modpower() in package 'pracma' like this:
library(pracma)

primeTest <- function(n, iter){
  a <- sample(1:(n-1), 1)
  lehmannTest <- function(y, tries){
    x <- modpower(y, (n-1)/2, n)  # ((y^((n-1)/2)) %% n)
    if (tries == 0) {
      return(TRUE)
    } else {
      if ((x == 1) | (x == (-1 %% n))){
        lehmannTest(sample(1:(n-1), 1), (tries-1))
      } else {
        return(FALSE)
      }
    }
  }
  lehmannTest(a, iter)
}
The following test is successful as 1009 is the only prime in this set:
prime_test <- seq(1001, 1011, by = 2)
for (i in 1:length(prime_test)) {
  print(primeTest(prime_test[i], 50))
}
# FALSE FALSE FALSE FALSE TRUE FALSE
If you are just using base R, I would pick #2b... "R is bad at this". R's integers (which you do not appear to be using) are 32-bit, and its doubles represent whole numbers exactly only up to 2^53; above those limits you will get rounding errors. You should probably be looking at package:gmp or package:Brobdingnag. Package:gmp has large-integer and large-rational classes.
