Speeding up a loop with loop-carried values in R

I'm trying to speed up code that takes time series data, limits it to a maximum value, and then stretches the excess forward until the sums of the original data and the "stretched" data are the same.
I have a more complicated version of this that takes 6 hours to run on 100k rows. I don't think this is vectorizable because each row uses values calculated on prior rows. Is that correct?
x <- c(0,2101,3389,3200,1640,0,0,0,0,0,0,0)
dat <- data.frame(x=x,y=rep(0,length(x)))
remainder <- 0
upperlimit <- 2000
for(i in 1:length(dat$x)){
if(dat$x[i] >= upperlimit){
dat$y[i] <- upperlimit
} else {
dat$y[i] <- min(remainder,upperlimit)
}
remainder <- remainder + dat$x[i] - dat$y[i]
}
dat
I understand you can use ifelse but I don't think cumsum can be used to carry forward the remainder - apply doesn't help either as far as I know. Do I need to resort to Rcpp? Thank you greatly.

I went ahead and implemented this in Rcpp and made some adjustments to the R function:
require(Rcpp);require(microbenchmark);require(ggplot2);
limitstretchR <- function(upperlimit,original) {
remainder <- 0
out <- numeric(length(original)) # preallocate a numeric (not logical) result
for(i in seq_along(original)){
if(original[i] >= upperlimit){
out[i] <- upperlimit
} else {
out[i] <- min(remainder,upperlimit)
}
remainder <- remainder + original[i] - out[i]
}
out
}
The Rcpp function:
cppFunction('
NumericVector limitstretchC(double upperlimit, NumericVector original) {
int n = original.size();
double remainder = 0.0;
NumericVector out(n);
for(int i = 0; i < n; ++i) {
if (original[i] >= upperlimit) {
out[i] = upperlimit;
} else {
out[i] = std::min<double>(remainder,upperlimit);
}
remainder = remainder + original[i] - out[i];
}
return out;
}
')
Testing them:
x <- c(0,2101,3389,3200,1640,0,0,0,0,0,0,0)
original <- rep(x,20000)
upperlimit <- 2000
system.time(limitstretchR(upperlimit,original))
system.time(limitstretchC(upperlimit,original))
That yielded 80.655 and 0.001 seconds respectively. Native R is quite bad for this. However, I ran a microbenchmark (using a smaller vector) and got some confusing results.
res <- microbenchmark(list=
list(limitstretchR=limitstretchR(upperlimit,rep(x,10000)),
limitstretchC=limitstretchC(upperlimit,rep(x,10000))),
times=110,
control=list(order="random",warmup=10))
print(qplot(y=time, data=res, colour=expr) + scale_y_log10())
boxplot(res)
print(res)
If you were to run that you would see nearly identical results for both functions. This is my first time using microbenchmark, any tips?
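A likely cause of the near-identical timings, offered as a sketch rather than a definitive diagnosis: the list argument of microbenchmark is documented as a list of unevaluated expressions, but list(...) evaluates each call immediately. What then gets timed is the lookup of an already computed result, which costs about the same for both functions. The base R part below shows the difference between list() and alist(); slow_square is a hypothetical stand-in, not a function from the post.

```r
# Hypothetical stand-in for a slow function (not from the original post).
slow_square <- function(v) { Sys.sleep(0.01); v^2 }

evaluated   <- list(sq = slow_square(1:3))   # runs slow_square() right now
unevaluated <- alist(sq = slow_square(1:3))  # stores the call itself

is.numeric(evaluated$sq)  # TRUE: a finished result, nothing left to time
is.call(unevaluated$sq)   # TRUE: a call that can be re-run on every repetition

# If the package is installed, passing the unevaluated list times the real call.
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  res <- microbenchmark::microbenchmark(list = unevaluated, times = 5L)
  print(res)
}
```

For the functions above, passing the calls directly as named arguments, for example microbenchmark(limitstretchR = limitstretchR(upperlimit, original), limitstretchC = limitstretchC(upperlimit, original)), should have the same effect. Building the input vector once outside the timed expressions also keeps rep() out of the measurement.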

Related

What is wrong with my R for-loop that sums a series?

Here is my function that does a loop:
answer = function(a,n) {
for (k in 0:n) {
x =+ (a^k)/factorial(k)
}
return(x)
}
answer(1,2) should return 2.5 as it is the calculated value of
1^0 / 0! + 1^1 / 1! + 1^2 / 2! = 1 + 1 + 0.5 = 2.5
But I get
answer(1,2)
#[1] 0.5
Looks like it fails to accumulate all three terms and just stores the newest value every time. += does not work so I used =+ but it is still not right. Thanks.
answer = function(a,n) {
x <- 0 ## initialize the accumulator
for (k in 0:n) {
x <- x + (a^k)/factorial(k) ## note how to accumulate value in R
}
return(x)
}
answer(1, 2)
#[1] 2.5
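As for why the original version returned 0.5: R has no += operator, and x =+ v is parsed as x = (+v), that is, a plain assignment of the value with a unary plus in front. Each pass through the loop therefore overwrites x with the newest term. A minimal illustration:

```r
x <- 10
x =+ 5   # parsed as x = (+5): plain assignment, not accumulation
x        # 5, not 15

# Accumulation has to name the variable on both sides of the assignment.
total <- 0
for (k in 0:2) total <- total + 1^k / factorial(k)
total    # 2.5
```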
There is also a "vectorized" solution:
answer = function(a,n) {
x <- a ^ (0:n) / factorial(0:n)
return(sum(x))
}
In this case you don't need to initialize anything. R allocates the memory itself when it evaluates <- and sum.
You are using a Taylor expansion to approximate exp(a). See this Q & A on the theme. You may want to pay special attention to the "numerical convergence" issue mentioned in my answer.
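A short check of the Taylor claim, plus one common form of convergence trouble. This is a sketch under the assumption that the issue meant is cancellation between large alternating terms for negative a; the linked answer may discuss something else. The name taylor_exp is introduced here for illustration only.

```r
# Partial sum of the Taylor series of exp(a), as in the vectorized answer.
taylor_exp <- function(a, n) sum(a^(0:n) / factorial(0:n))

all.equal(taylor_exp(1, 20), exp(1))   # TRUE

# For a negative argument of large magnitude the terms alternate in sign and
# are many orders of magnitude larger than the result, so rounding errors
# swamp the sum. The relative error below is typically nowhere near zero.
a <- -30
abs(taylor_exp(a, 150) - exp(a)) / exp(a)

# Summing the all-positive series for -a and inverting avoids the cancellation.
all.equal(1 / taylor_exp(-a, 150), exp(a))   # TRUE
```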

Get distance to next largest floating point number in R [duplicate]

Is there any functionality in R for getting the next representable floating point number after a given floating point number? This would be similar to the nextafter function in the C standard library. Schemes such as number + .Machine$double.eps don't work in general.
No, but there are two ways you can make it:
Using C
If you want the exact functionality of the nextafter() function, you can write a C function that works as an interface to the function such that the following two constraints are met:
The function does not return a value. All work is accomplished as a "side effect" (changing the values of arguments).
All the arguments are pointers. Even scalars are vectors (of length one) in R.
That function should then be compiled as a shared library:
R CMD SHLIB foo.c
for UNIX-like OSs. The shared library can be called using dyn.load("foo.so"). You can then call the function from inside R using the .C() function
.C("foo", ...)
A more in depth treatment of calling C from R is here.
Using R
number + .Machine$double.eps is the way to go but you have to consider edge cases, such as if x - y < .Machine$double.eps or if x == y. I would write the function like this:
nextafter <- function(x, y){
# Appropriate type checking and bounds checking goes here
delta = y - x
if(x > 0){
factor = 2^floor(log2(x)) + ifelse(x >= 4, 1, 0)
} else if (x < 0) {
factor = 65
}
if (delta > .Machine$double.eps){
return(x + factor * .Machine$double.eps)
} else if (delta < .Machine$double.eps){
return(x - factor * .Machine$double.eps)
} else {
return(x)
}
}
Now, unlike C, if you want to check integers, you can do so in the same function but you need to change the increment based on the type.
UPDATE
The previous code did not perform as expected for numbers larger than 2. There is a factor that needs to be multiplied by the .Machine$double.eps to make it large enough to cause the numbers to be different. It is related to the nearest power of 2 plus one. You can get an idea of how this works with the below code:
n <- 100
factor <- vector('numeric', 100)
for(i in 1:n){
j = 0
while(TRUE){
j = j + 1
if(i - j * .Machine$double.eps != i) break()
}
factor[i] = j
}
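The dependence on powers of 2 can also be written down directly. The sketch below assumes IEEE 754 doubles with a 52-bit fraction and a positive, normal x; the helper name ulp is introduced here and is not part of the answer above.

```r
eps <- .Machine$double.eps   # 2^-52, the gap between 1 and the next double

1 + eps == 1                 # FALSE: eps is exactly one step at 1
1e10 + eps == 1e10           # TRUE: eps is far smaller than one step at 1e10

# Gap between adjacent doubles in the power-of-2 interval that contains x.
ulp <- function(x) 2^(floor(log2(x)) - 52)

1e10 + ulp(1e10) == 1e10     # FALSE: this step is large enough
ulp(1) == eps                # TRUE
ulp(1e10) / eps              # 2^33 = 8589934592
```

This gives the step when moving upward from x. Moving downward from an exact power of 2 the step is half as large, which is one reason a single multiplier of .Machine$double.eps is awkward to get right in both directions.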
If you prefer Rcpp:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
double nextAfter(double x, double y) {
return nextafter(x, y);
}
Then in R:
sprintf("%.20f", 1)
#[1] "1.00000000000000000000"
sprintf("%.20f", nextAfter(1, 2))
#[1] "1.00000000000000022204"
I'm not sure if Christopher Louden's answer works for all values, but here's a pure R version of the classic approach (increments/decrements the integer bits). R does not make it easy to convert between doubles and integers, nor does it have a 64-bit integer type, so there's quite a lot of code for this.
doubleToRaw <- function(d) writeBin(d, raw());
rawToDouble <- function(r) readBin(r, numeric());
int64inc <- function(lo, hi) {
if (lo == 0xffffffff) { hi <- hi + 1; lo <- 0; } else { lo <- lo + 1; }
return(c(lo, hi));
}
int64dec <- function(lo, hi) {
if (lo == 0) { hi <- hi - 1; lo <- 0xffffffff; } else { lo <- lo - 1; }
return(c(lo, hi));
}
nextafter <- function(x, y) {
if (is.nan(x + y))
return(NaN);
if (x == y)
return(x);
if (x == 0)
return(sign(y) * rawToDouble(as.raw(c(0, 0, 0, 0, 0, 0, 0, 1))));
ints <- packBits(rawToBits(doubleToRaw(x)), "integer")
if ((y > x) == (x > 0))
ints <- int64inc(ints[1], ints[2])
else
ints <- int64dec(ints[1], ints[2]);
return(rawToDouble(packBits(intToBits(ints), "raw")))
}

R recursive function or loop in loop

A simple problem.
I want to check if the difference between two points (i, j) is greater than a threshold (cutoff).
If the difference between the points exceeds the threshold, the index should be returned and the next distance is measured from the new data point. It is a simple cutoff filter in which all data points under a predefined threshold are filtered out. The only trick is that the measurement is always performed from the "last" point (the one that was "far enough away" from the point before).
I first wrote it as two nested loops like:
x <- sample(1:100)
for(i in 1:(length(x)-1)){
for(j in (i+1):length(x)){
if(abs(x[i] - x[j]) >= cutoff) {
print(j)
i <- j # set the index to the current datapoint
break }
}}
This solution is fairly intuitive, but it does not work properly. I think the assignment to i is not taking effect: the outer loop ignores the jump and runs through all data points.
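The behaviour described here follows from how for works in R: the loop walks over a sequence that is fixed when the loop starts, and the loop variable is overwritten with the next element at the start of every iteration, so assigning to i inside the body has no lasting effect. A minimal illustration, with a while loop as the usual alternative when the index has to jump:

```r
visited <- integer(0)
for (i in 1:5) {
  visited <- c(visited, i)
  i <- 5L          # overwritten at the start of the next iteration
}
visited            # 1 2 3 4 5

# With while, the code itself controls the index.
skipped <- integer(0)
i <- 1L
while (i <= 5L) {
  skipped <- c(skipped, i)
  i <- i + 2L      # jump ahead
}
skipped            # 1 3 5
```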
Well, I did not want to waste time with debugging and just thought I can do the same with a recursive function.
So I wrote it like:
checkCutOff.f <- function(x,cutoff,i = 1) {
options(expressions=500000)
# Loops through the data and compares the temporarily fixed point 'i' with the looping points 'j'
for(j in (i+1):length(x)){
if( abs(x[i] - x[j]) >= cutoff ){
break
}
}
# Recursive function to update the new 'i - stops at the end of the dataset
if( j<length(x) ) return(c(j,checkCutOff.f(x,cutoff,j)))
else return(j)
}
x<-sample(1:100000)
checkCutOff.f(x,1)
This code works, but I get a stack overflow with big datasets. That's why I wonder whether this code is efficient.
To me, having to increase limits is always a hint of inefficient code...
So my question is:
What kind of solution is really efficient?
Thanks!
You should avoid growing your return value with c. That's inefficient. Allocate to the maximum size and subset to the needed size in the end.
Note that your function always includes length(x) in your result, which is wrong:
set.seed(42)
x<-sample(1:10)
checkCutOff.f(x, 100)
#[1] 10
Here is an R solution with a loop:
checkCutOff.f1 <- function(x,cutoff) {
i <- 1
j <- 1
k <- 1
result <- integer(length(x))
while(j < length(x)) {
j <- j + 1
if (abs(x[i] - x[j]) >= cutoff) {
result[k] <- j
k <- k + 1
i <- j
}
}
result[seq_len(k - 1)]
}
all.equal(checkCutOff.f(x, 4), checkCutOff.f1(x, 4))
#[1] TRUE
#the correct solution includes length(x) here (by chance)
It's easy to translate to Rcpp:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
IntegerVector checkCutOff_f1cpp(NumericVector x, double cutoff) {
int i = 0;
int j = 1;
int k = 0;
IntegerVector result(x.size());
while(j < x.size()) {
if (std::abs(x[i] - x[j]) >= cutoff) {
result[k] = j + 1;
k++;
i = j;
}
j++;
}
result = result[seq_len(k)-1];
return result;
}
Then in R:
all.equal(checkCutOff.f(x, 4), checkCutOff_f1cpp(x, 4))
#[1] TRUE
Benchmarks:
library(microbenchmark)
y <- sample(1:1000)
microbenchmark(
checkCutOff.f(y, 4),
checkCutOff.f1(y, 4),
checkCutOff_f1cpp(y, 4)
)
#Unit: microseconds
#                    expr      min        lq       mean   median        uq       max neval cld
#     checkCutOff.f(y, 4) 3665.105 4681.6005 7798.41776 5323.068 6635.9205 41028.930   100   c
#    checkCutOff.f1(y, 4) 1384.524 1507.2635 1831.43236 1769.031 2070.7225  3012.279   100   b
# checkCutOff_f1cpp(y, 4)    8.765   10.7035   26.40709   14.240   18.0005   587.958   100   a
I'm sure this can be improved further and more testing should be done.

%dopar% or alternative method to speed up sequential stochastic calculation

I have written a stochastic process simulator but I would like to speed it up since it's pretty slow.
The main part of the simulator is made of a for loop which I would like to re-write as a foreach loop with %dopar%.
I have tried doing so with a simplified loop but I'm running into some problems. Suppose my for loop looks like this
library(foreach)
r=0
t<-rep(0,500)
for(n in 1:500){
s<-1/2+r
u<-runif(1, min = 0, max = 1)
if(u<s){
t[n]<-u
r<-r+0.001
}else{r<-r-0.001}
}
which means that at each iteration I update the value of r and s and, in one of the two outcomes, populate my vector t. I have tried several different ways of re-writing it as a foreach loop but it seems like with each iteration my values don't get updated and I get some pretty strange results. I have tried using return but it doesn't seem to work!
This is an example of what I have come up with.
rr=0
tt<-foreach(i=1:500, .combine=c) %dopar% {
ss<-1/2+rr
uu<-runif(1, min = 0, max = 1)
if(uu<=ss){
return(uu)
rr<-rr+0.001
}else{
return(0)
rr<-rr-0.001}
}
If it is impossible to use foreach what other way is there for me to re-write the loop so to be able to use all cores and speed up things?
Since your comments about turning to C were encouraging, and mostly to show that this isn't a hard task (especially for such operations) and that it's worth looking into, here is a comparison of two sample functions that accept a number of iterations and perform the steps of your loop:
ffR = function(n)
{
r = 0
t = rep(0, n)
for(i in 1:n) {
s = 1/2 + r
u = runif(1)
if(u < s) {
t[i] = u
r = r + 0.001
} else r = r - 0.001
}
return(t)
}
ffC = inline::cfunction(sig = c(R_n = "integer"), body = '
int n = INTEGER(AS_INTEGER(R_n))[0];
SEXP ans;
PROTECT(ans = allocVector(REALSXP, n));
double r = 0.0, s, u, *pans = REAL(ans);
GetRNGstate();
for(int i = 0; i < n; i++) {
s = 0.5 + r;
u = runif(0.0, 1.0);
if(u < s) {
pans[i] = u;
r += 0.001;
} else {
pans[i] = 0.0;
r -= 0.001;
}
}
PutRNGstate();
UNPROTECT(1);
return(ans);
', includes = "#include <Rmath.h>")
A comparison of results:
set.seed(007); ffR(5)
#[1] 0.00000000 0.39774545 0.11569778 0.06974868 0.24374939
set.seed(007); ffC(5)
#[1] 0.00000000 0.39774545 0.11569778 0.06974868 0.24374939
A comparison of speed:
microbenchmark::microbenchmark(ffR(1e5), ffC(1e5), times = 20)
#Unit: milliseconds
#       expr        min         lq     median         uq        max neval
# ffR(1e+05) 497.524808 519.692781 537.427332 668.875402 692.598785    20
# ffC(1e+05)   2.916289   3.019473   3.133967   3.445257   4.076541    20
And for the sake of completeness:
set.seed(101); ans1 = ffR(1e5)
set.seed(101); ans2 = ffC(1e5)
all.equal(ans1, ans2)
#[1] TRUE
Hope any of this could be helpful in some way.
What you are trying to do, since every iteration is dependent on the previous steps of the loop, doesn't seem to be parallelizable. You are updating the variable r and expecting other branches that are running simultaneously to know about it, and in fact wait for the update to happen, which
1) Doesn't happen. They won't wait, they'll just take r's current value whatever that is at the time they are running
2) If it did it would be same as running it without %dopar%
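What can usually be spread over cores is a set of independent runs of the whole simulation, since each run only depends on its own history. The sketch below assumes that several replicates are wanted, which the question does not state; sim_one and the seed values are hypothetical names introduced for illustration.

```r
# One complete sequential simulation (same steps as the loop in the question).
sim_one <- function(seed, n = 500L) {
  set.seed(seed)
  r <- 0
  out <- numeric(n)
  for (i in seq_len(n)) {
    u <- runif(1)
    if (u < 0.5 + r) {
      out[i] <- u
      r <- r + 0.001
    } else {
      r <- r - 0.001
    }
  }
  out
}

seeds <- 1:8                     # hypothetical: one seed per replicate
runs <- lapply(seeds, sim_one)   # sequential reference version

# On Unix-alikes the same call can be distributed over cores with
# parallel::mclapply(seeds, sim_one, mc.cores = 2L)
length(runs)                     # 8
```

Each replicate stays sequential inside; only the replicates run side by side. Setting the seed inside sim_one keeps every run reproducible regardless of which worker executes it.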

Speeding up Julia's poorly written R examples

The Julia examples to compare performance against R seem particularly convoluted. https://github.com/JuliaLang/julia/blob/master/test/perf/perf.R
What is the fastest performance you can eke out of the two algorithms below (preferably with an explanation of what you changed to make it more R-like)?
## mandel
mandel = function(z) {
c = z
maxiter = 80
for (n in 1:maxiter) {
if (Mod(z) > 2) return(n-1)
z = z^2+c
}
return(maxiter)
}
mandelperf = function() {
re = seq(-2,0.5,.1)
im = seq(-1,1,.1)
M = matrix(0.0,nrow=length(re),ncol=length(im))
count = 1
for (r in re) {
for (i in im) {
M[count] = mandel(complex(real=r,imag=i))
count = count + 1
}
}
return(M)
}
assert(sum(mandelperf()) == 14791)
## quicksort ##
qsort_kernel = function(a, lo, hi) {
i = lo
j = hi
while (i < hi) {
pivot = a[floor((lo+hi)/2)]
while (i <= j) {
while (a[i] < pivot) i = i + 1
while (a[j] > pivot) j = j - 1
if (i <= j) {
t = a[i]
a[i] = a[j]
a[j] = t
}
i = i + 1;
j = j - 1;
}
if (lo < j) qsort_kernel(a, lo, j)
lo = i
j = hi
}
return(a)
}
qsort = function(a) {
return(qsort_kernel(a, 1, length(a)))
}
sortperf = function(n) {
v = runif(n)
return(qsort(v))
}
sortperf(5000)
The key word in this question is "algorithm":
What is the fastest performance you can eke out of the two algorithms below (preferably with an explanation of what you changed to make it more R-like)?
As in "how fast can you make these algorithms in R?" The algorithms in question here are the standard Mandelbrot complex loop iteration algorithm and the standard recursive quicksort kernel.
There are certainly faster ways to compute the answers to the problems posed in these benchmarks – but not using the same algorithms. You can avoid recursion, avoid iteration, and avoid whatever else R isn't good at. But then you're no longer comparing the same algorithms.
If you really wanted to compute Mandelbrot sets in R or sort numbers, yes, this is not how you would write the code. You would either vectorize it as much as possible, thereby pushing all the work into predefined C kernels, or just write a custom C extension and do the computation there. Either way, the conclusion is that R isn't fast enough to get really good performance on its own: you need to have C do most of the work in order to get good performance.
And that's exactly the point of these benchmarks: in Julia you never have to rely on C code to get good performance. You can just write what you want to do in pure Julia and it will have good performance. If an iterative scalar loop algorithm is the most natural way to do what you want to do, then just do that. If recursion is the most natural way to solve the problem, then that's ok too. At no point will you be forced to rely on C for performance – whether via unnatural vectorization or writing custom C extensions. Of course, you can write vectorized code when it's natural, as it often is in linear algebra; and you can call C if you already have some library that does what you want. But you don't have to.
We do want to have the fairest possible comparison of the same algorithms across languages:
If someone does have faster versions in R that use the same algorithm, please submit patches!
I believe that the R benchmarks on the julia site are already byte-compiled, but if I'm doing it wrong and the comparison is unfair to R, please let me know and I will fix it and update the benchmarks.
Hmm, in the Mandelbrot example the matrix M has its dimensions transposed
M = matrix(0.0,nrow=length(im), ncol=length(re))
because it's filled by incrementing count in the inner loop (successive values of im). My implementation creates a vector of complex numbers in mandelperf.1 and operates on all elements, using an index and subsetting to keep track of which elements of the vector have not yet satisfied the condition Mod(z) <= 2
mandel.1 = function(z, maxiter=80L) {
c <- z
result <- integer(length(z))
i <- seq_along(z)
n <- 0L
while (n < maxiter && length(z)) {
j <- Mod(z) <= 2
if (!all(j)) {
result[i[!j]] <- n
i <- i[j]
z <- z[j]
c <- c[j]
}
z <- z^2 + c
n <- n + 1L
}
result[i] <- maxiter
result
}
mandelperf.1 = function() {
re = seq(-2,0.5,.1)
im = seq(-1,1,.1)
mandel.1(complex(real=rep(re, each=length(im)),
imaginary=im))
}
for a 13-fold speed-up (the results are equal but not identical because the original returns numeric rather than integer values).
> library(rbenchmark)
> benchmark(mandelperf(), mandelperf.1(),
+ columns=c("test", "elapsed", "relative"),
+ order="relative")
test elapsed relative
2 mandelperf.1() 0.412 1.00000
1 mandelperf() 5.705 13.84709
> all.equal(sum(mandelperf()), sum(mandelperf.1()))
[1] TRUE
The quicksort example doesn't actually sort
> set.seed(123L); qsort(sample(5))
[1] 2 4 1 3 5
but my main speed-up was to vectorize the partition around the pivot
qsort_kernel.1 = function(a) {
if (length(a) < 2L)
return(a)
pivot <- a[floor(length(a) / 2)]
c(qsort_kernel.1(a[a < pivot]), a[a == pivot], qsort_kernel.1(a[a > pivot]))
}
qsort.1 = function(a) {
qsort_kernel.1(a)
}
sortperf.1 = function(n) {
v = runif(n)
return(qsort.1(v))
}
for a 7-fold speedup (in comparison to the uncorrected original)
> benchmark(sortperf(5000), sortperf.1(5000),
+ columns=c("test", "elapsed", "relative"),
+ order="relative")
test elapsed relative
2 sortperf.1(5000) 6.60 1.000000
1 sortperf(5000) 47.73 7.231818
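On why the original quicksort does not sort, one likely reason is R's copy-on-modify semantics: the recursive call works on its own copy of a, and since its return value is discarded, the swaps it makes are lost. The sketch below assigns the result back and keeps the index updates inside the if block, which is an assumption about the intended algorithm rather than the benchmark's official code. The name qsort_kernel_fixed is introduced here.

```r
qsort_kernel_fixed <- function(a, lo, hi) {
  i <- lo
  j <- hi
  while (i < hi) {
    pivot <- a[floor((lo + hi) / 2)]
    while (i <= j) {
      while (a[i] < pivot) i <- i + 1
      while (a[j] > pivot) j <- j - 1
      if (i <= j) {
        tmp <- a[i]; a[i] <- a[j]; a[j] <- tmp
        i <- i + 1
        j <- j - 1
      }
    }
    # Keep the sorted left part; without the assignment this work is thrown away.
    if (lo < j) a <- qsort_kernel_fixed(a, lo, j)
    lo <- i
    j <- hi
  }
  a
}

set.seed(123L)
v <- runif(1000)
identical(qsort_kernel_fixed(v, 1, length(v)), sort(v))   # TRUE
```

The extra copying makes this version slower than an in-place sort would be, so it says something about correctness only, not about the timings above.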
Since in the original comparison Julia is about 30 times faster than R for mandel, and 500 times faster for quicksort, the implementations above are still not really competitive.

Resources