Are "self-contained" functions more efficient in R? - r

I'm writing a function that needs to apply a function g, passed as a parameter, to each element of a list, iteratively.
I'm wondering how to make this as fast as possible. I can achieve acceptable speed using Rcpp and a specific kind of g (writing everything in C++), but I can't figure out whether I can reach similar speed when passing an R function as an argument.
I was doing some tests to figure out why R is slower and found some really unexpected results:
minus <- function(x) -x
minus_vec <- Vectorize(minus, "x")
Testing with some simple functions to invert signs.
f0 <- function(x) {
sapply(x, minus)
}
f1 <- function(x) {
for(i in seq_along(x)){
x[i] <- -x[i]
}
x
}
f2 <- function(x) {
for(i in seq_along(x)){
x[i] <- minus(x[i])
}
x
}
I got the following results:
a <- 1:10^5
library(rbenchmark)
benchmark(f0(a), f1(a), f2(a), minus_vec(a), minus(a))[,c(1,4)]
test relative
1 f0(a) 454.842
2 f1(a) 25.579
3 f2(a) 178.211
4 minus_vec(a) 523.789
5 minus(a) 1.000
I would like some explanation on the following points:
Why don't f1 and f2 have the same speed? Should writing -x[i] inline and calling minus(x[i]) really be so different when they do exactly the same thing?
Why is f0 slower than f2? I always thought apply functions were more efficient than for loops, but never really understood why and now I even found a counter-example.
Can I make a function as fast as f1 using the function minus ?
Why does vectorizing minus (unnecessary since - is already vectorized, but that might not always be the case) make it so slow?

Not a full answer, but here are a few notes
1 minus(x) vs -x: Doing nothing is better than doing something
Your function minus calls `-`, so every call pays for an extra R function call on top of the `-` operation itself. I honestly do not know the exact mechanics, so I cannot say how much extra computation time ought to be expected.
Here is an example highlighting it: we have four functions, all squaring numbers
fa <- function (n) n^2
fb <- function (n) fa(n)
fc <- function (n) fb(n)
fd <- function (n) fc(n)
Fa <- function (n) {
for (i in seq_along(n)) n[i] <- fa(i)
n
}
Fb <- function (n) {
for (i in seq_along(n)) n[i] <- fb(i)
n
}
Fc <- function (n) {
for (i in seq_along(n)) n[i] <- fc(i)
n
}
Fd <- function (n) {
for (i in seq_along(n)) n[i] <- fd(i)
n
}
And here are the benchmarking results
n <- 1:10^4
b <- benchmark(Fa(n),Fb(n),Fc(n),Fd(n), replications = 1000L)
b
# test replications elapsed relative user.self sys.self user.child sys.child
# 1 Fa(n) 1000 3.93 1.000 3.85 0.00 NA NA
# 2 Fb(n) 1000 7.08 1.802 6.94 0.02 NA NA
# 3 Fc(n) 1000 10.16 2.585 9.94 0.06 NA NA
# 4 Fd(n) 1000 13.68 3.481 13.56 0.00 NA NA
# looks rather even
diff(b$elapsed)
# [1] 3.15 3.08 3.52
Now back to your minus function
a <- 1:10^5
b <- benchmark(f0(a), f1(a), f2(a), minus_vec(a), minus(a))
b$elapsed[b$test == 'f2(a)'] - b$elapsed[b$test == 'f1(a)']
# [1] 3.39
2 apply vs for vs Vectorize:
@NavyCheng provided some good material on the topic. My understanding is that the apply family (just like Vectorize) loops in R, whereas, if I'm not mistaken, the looping for `-` is done in C.
Again, I do not know the exact details, but if apply/Vectorize use R loops, then in theory (and often in practice) it is possible to write a proper for loop that performs as well or better.
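As a rough sketch of that last point, assuming a function g that takes and returns a single number (with_vapply and with_for are names made up here for illustration), the two looping styles below compute the same thing. Which one is faster depends on the R version and on g, so it is worth timing both on your own machine:

```r
minus <- function(x) -x

# Hypothetical helpers, for illustration only.
with_vapply <- function(x, g) vapply(x, g, numeric(1))

with_for <- function(x, g) {
  out <- numeric(length(x))                    # pre-allocate the result
  for (i in seq_along(x)) out[i] <- g(x[i])    # fill it element by element
  out
}

a <- as.numeric(1:1e4)
identical(with_vapply(a, minus), with_for(a, minus))

# Timings vary by machine and R version; compare them yourself.
system.time(with_vapply(a, minus))
system.time(with_for(a, minus))
```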
3 A Function as fast as f1:
Ad hoc, the closest I came was by cheating with the Rcpp package (cheating, since one writes the function in C++ first).
In C++
#include <RcppArmadillo.h>
//[[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector minusCpp(NumericVector x) {
for (int k = 0; k < x.length(); ++k) {
x[k] = -x[k];
}
return x;
}
Now to the benchmarks in R
a <- 1:10^5
b <- benchmark(f0(a), f1(a), f2(a), minus_vec(a), minus(a), minusCpp(a))
b
# test replications elapsed relative user.self sys.self user.child sys.child
# 1 f0(a) 100 9.47 NA 9.22 0.01 NA NA
# 2 f1(a) 100 0.53 NA 0.54 0.00 NA NA
# 3 f2(a) 100 4.23 NA 4.24 0.00 NA NA
# 5 minus(a) 100 0.00 NA 0.00 0.00 NA NA
# 4 minus_vec(a) 100 10.42 NA 10.39 0.02 NA NA
# 6 minusCpp(a) 100 0.05 NA 0.04 0.00 NA NA

Setting aside -x[i] versus minus(x[i]), I would condense the four questions into two:
Why is the apply family slower than a for loop?
Why is Vectorize slower than the apply family?
For the 1st question:
The apply functions are designed to be convenient and clear to read, not necessarily fast.
The apply family also does more work than a for loop: sapply, for instance, calls lapply and then tries to simplify the answer into a suitable form (a vector or matrix).
You can read here and here for more detail.
For the 2nd question: Vectorize is a wrapper around mapply. If you type Vectorize at the console in RStudio, you will see its source code. You can read this for more help.
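A small check of what these wrappers do, using the minus function from the question. This is only a sketch of the observable behaviour, not a description of the internals in any particular R version:

```r
minus <- function(x) -x
minus_vec <- Vectorize(minus, "x")
x <- 1:5

# sapply() gives the same result as lapply() followed by a simplification step.
same_as_lapply <- identical(sapply(x, minus), unlist(lapply(x, minus)))

# The function returned by Vectorize() gives the same result as calling mapply().
same_as_mapply <- identical(minus_vec(x), mapply(minus, x))

c(same_as_lapply, same_as_mapply)
```

Each layer (sapply on top of lapply, Vectorize on top of mapply) adds its own bookkeeping, which is consistent with the ordering seen in the benchmark.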

Related

Multiplication of many matrices in R

I want to multiply several matrices of the same size with an initial vector. In the example below p.state is a vector of m elements and tran.mat is a list where each member is an m x m matrix.
for (i in 1:length(tran.mat)){
p.state <- p.state %*% tran.mat[[i]]
}
The code above gives the correct answer but can be slow when length(tran.mat) is large. I was wondering if there was a more efficient way of doing this?
Below is an example with m=3 and length(tran.mat)=10 that can generate this:
p.state <- c(1,0,0)
tran.mat<-lapply(1:10,function(y){apply(matrix(runif(9),3,3),1,function(x){x/sum(x)})})
for (i in 1:length(tran.mat)){
p.state <- p.state %*% tran.mat[[i]]
}
print(p.state)
NB: tran.mat does not have to be a list it is just currently written as one.
Edit after a few comments:
Reduce is useful when m is small. However, when m=6 the loop outperformed both of the above solutions.
library(rbenchmark)
p.state1 <- p.state <- c(1,0,0,0,0,0)
tran.mat<-lapply(1:10000,function(y){t(apply(matrix(runif(36),6,6),1,function(x){x/sum(x)}))})
tst<-do.call(c, list(list(p.state), tran.mat))
benchmark(
'loop' = {
for (i in 1:length(tran.mat)){
p.state <- p.state %*% tran.mat[[i]]
}
},
'reduce' = {
p.state1 %*% Reduce('%*%', tran.mat)
},
'reorder' = {
Reduce(`%*%`,tran.mat,p.state1)
}
)
This results in
test replications elapsed relative user.self sys.self user.child sys.child
1 loop 100 0.87 1.000 0.87 0 NA NA
2 reduce 100 1.41 1.621 1.39 0 NA NA
3 reorder 100 1.00 1.149 1.00 0 NA NA
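One plausible reading of these timings (an interpretation, not something the benchmark itself states) is the cost of each product: multiplying the matrices together first means m x m times m x m products, while starting from the vector keeps every step a 1 x m times m x m product. The sketch below, with a seed and list length chosen only for illustration, confirms that both orders give the same answer:

```r
set.seed(1)
m <- 6

# Row-stochastic transition matrices, built as in the question.
tran.mat <- lapply(1:50, function(k) {
  t(apply(matrix(runif(m * m), m, m), 1, function(r) r / sum(r)))
})
p0 <- c(1, rep(0, m - 1))

# Matrix-first: multiply all matrices together, then apply the vector.
matrix_first <- p0 %*% Reduce(`%*%`, tran.mat)

# Vector-first: start from the vector, so each step is vector times matrix.
vector_first <- Reduce(`%*%`, tran.mat, p0)

all.equal(matrix_first, vector_first)
```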
A faster way is to use Reduce() to do sequential matrix multiplication on the list of matrices.
You can get approximately a 4x speedup that way. Below is an example of your code tested, with 1000 elements in the list instead of 10 to see the performance improvement more easily.
Code
library(rbenchmark)
p.state <- c(1,0,0)
tran.mat<-lapply(1:1000,function(y){apply(matrix(runif(9),3,3),1,function(x){x/sum(x)})})
benchmark(
'loop' = {
for (i in 1:length(tran.mat)){
p.state <- p.state %*% tran.mat[[i]]
}
},
'reduce' = {
p.state %*% Reduce('%*%', tran.mat)
}
)
Output
test replications elapsed relative user.self sys.self user.child sys.child
1 loop 100 0.23 3.833 0.23 0 NA NA
2 reduce 100 0.06 1.000 0.07 0 NA NA
You can see the reduce method is about 3.8 times faster.
I am not sure that this will be any faster but it is shorter:
prod <- Reduce("%*%", L)
all.equal(prod, L[[1]] %*% L[[2]] %*% L[[3]] %*% L[[4]])
## [1] TRUE
Note
We used this test input:
m <- matrix(1:9, 3)
L <- list(m^0, m, m^2, m^3)
I am going to use a function from the Rfast package to reduce the execution time of the multiplication. Unfortunately, the time spent in the for loop itself cannot be reduced.
The function Rfast::eachcol.apply is a good fit for your purpose. Your multiplication could also be written with crossprod, but that is slow for our purpose.
Here are some helper functions:
mult.list<-function(x,y){
for (xm in x){
y <- y %*% xm
}
y
}
mult.list2<-function(x,y){
for (xm in x){
y <- Rfast::eachcol.apply(xm,y,oper="*",apply="sum")
}
y
}
Here is an example:
x<-list()
y<-rnorm(1000)
for(i in 1:100){
x[[i]]<-Rfast::matrnorm(1000,1000)
}
microbenchmark::microbenchmark(R=a<-mult.list(x,y),Rfast=b<-mult.list2(x,y),times = 10)
Unit: milliseconds
expr min lq mean median uq max neval
R 410.067525 532.176979 633.3700627 649.155826 699.721086 916.542414 10
Rfast 239.987159 251.266488 352.1951486 276.382339 458.089342 741.340268 10
all.equal(as.numeric(a),as.numeric(b))
[1] TRUE
The argument oper is the operation applied to each element, and apply is the operation applied to each column. This should be fast for large matrices, but I couldn't test it on my laptop with bigger matrices.

Speed up R loop [duplicate]

Speeding up loops in R can easily be done using a function from the apply family. How can I use an apply function in the code below to speed it up? Note that within the loop, at each iteration, one column is permuted and a function is applied to the new data frame (i.e., the initial data frame with one column permuted). I cannot seem to get apply to work because the new data frame has to be built within the loop.
#x <- data.frame(a=1:10,b=11:20,c=21:30) #small example
x <- data.frame(matrix(runif(50*100),nrow=50,ncol=100)) #larger example
y <- rowMeans(x)
start <- Sys.time()
totaldiff <- numeric()
for (i in 1:ncol(x)){
x.after <- x
x.after[,i] <- sample(x[,i])
diff <- abs(y-rowMeans(x.after))
totaldiff[i] <- sum(diff)
}
colnames(x)[which.max(totaldiff)]
Sys.time() - start
After working through this and other replies, the optimization strategies (and approximate speed-up) here seem to be
(30x) Choose an appropriate data representation -- matrix, rather than data.frame
(1.5x) Reduce unnecessary data copies -- difference of columns, rather than of rowMeans
Structure for loops as *apply functions (to emphasize code structure, simplify memory management, and provide type consistency)
(2x) Hoist vector operations outside loops -- abs and sum on columns become abs and colSums on a matrix
for an overall speed-up of about 100x. For this size and complexity of code, the use of the compiler or parallel packages would not be effective.
I put your code into a function
f0 <- function(x) {
y <- rowMeans(x)
totaldiff <- numeric()
for (i in 1:ncol(x)){
x.after <- x
x.after[,i] <- sample(x[,i])
diff <- abs(y-rowMeans(x.after))
totaldiff[i] <- sum(diff)
}
which.max(totaldiff)
}
and here we have
x <- data.frame(matrix(runif(50*100),nrow=50,ncol=100)) #larger example
set.seed(123)
system.time(res0 <- f0(x))
## user system elapsed
## 1.065 0.000 1.066
Your data can be represented as a matrix, and operations on R matrices are faster than on data.frames.
m <- matrix(runif(50*100),nrow=50,ncol=100)
set.seed(123)
system.time(res0.m <- f0(m))
## user system elapsed
## 0.036 0.000 0.037
identical(res0, res0.m)
##[1] TRUE
That's probably the biggest speed-up. But for the specific operation here we don't need to calculate the row means of the updated matrix, just the change in the mean from shuffling one column
f1 <- function(x) {
y <- rowMeans(x)
totaldiff <- numeric()
for (i in 1:ncol(x)){
diff <- abs(sample(x[,i]) - x[,i]) / ncol(x)
totaldiff[i] <- sum(diff)
}
which.max(totaldiff)
}
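The shortcut used in f1 rests on a small identity: shuffling column i changes each row mean by (new value - old value) / ncol(x). A quick numerical check, with the seed and column index picked arbitrarily for illustration:

```r
set.seed(1)
m <- matrix(runif(50 * 100), nrow = 50, ncol = 100)
i <- 3                                   # arbitrary column for the check

shuffled <- sample(m[, i])
m.after <- m
m.after[, i] <- shuffled

# Full recomputation of the row means, as in f0.
full <- abs(rowMeans(m) - rowMeans(m.after))

# Shortcut: only the shuffled column contributes to the change.
short <- abs(shuffled - m[, i]) / ncol(m)

all.equal(full, short)
```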
The for loop doesn't follow the right pattern for filling up the result vector totaldiff (you want to "pre-allocate and fill", so totaldiff <- numeric(ncol(x))) but we can use an sapply and let R worry about that (this memory management is one of the advantages of using the apply family of functions)
f2 <- function(x) {
totaldiff <- sapply(seq_len(ncol(x)), function(i, x) {
sum(abs(sample(x[,i]) - x[,i]) / ncol(x))
}, x)
which.max(totaldiff)
}
set.seed(123); identical(res0, f1(m))
set.seed(123); identical(res0, f2(m))
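For reference, the "pre-allocate and fill" pattern mentioned above looks like this in isolation. grow and prealloc are made-up names, and the squaring is just a stand-in for real work:

```r
# Grows the result one element at a time.
grow <- function(n) {
  out <- numeric()
  for (i in seq_len(n)) out[i] <- i^2
  out
}

# Allocates the full result once, then fills it in.
prealloc <- function(n) {
  out <- numeric(n)
  for (i in seq_len(n)) out[i] <- i^2
  out
}

identical(grow(1000), prealloc(1000))
```

Both return the same vector; the second form avoids repeatedly extending the result, which tends to matter more as n grows.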
The timings are
> library(microbenchmark)
> microbenchmark(f0(m), f1(m), f2(m))
Unit: milliseconds
expr min lq median uq max neval
f0(m) 32.45073 33.07804 33.16851 33.26364 33.81924 100
f1(m) 22.20913 23.87784 23.96915 24.06216 24.66042 100
f2(m) 21.02474 22.60745 22.70042 22.80080 23.19030 100
@flodel points out that vapply can be faster (and provides type safety)
f3 <- function(x) {
totaldiff <- vapply(seq_len(ncol(x)), function(i, x) {
sum(abs(sample(x[,i]) - x[,i]) / ncol(x))
}, numeric(1), x)
which.max(totaldiff)
}
and that
f4 <- function(x)
which.max(colSums(abs((apply(x, 2, sample) - x))))
is still faster (ncol(x) is a constant factor, so removed) -- The abs and sum are hoisted outside the sapply, maybe at the expense of additional memory use. The advice in the comments to compile functions is good in general; here are some further timings
> microbenchmark(f0(m), f1(m), f1.c(m), f2(m), f2.c(m), f3(m), f4(m))
Unit: milliseconds
expr min lq median uq max neval
f0(m) 32.35600 32.88326 33.12274 33.25946 34.49003 100
f1(m) 22.21964 23.41500 23.96087 24.06587 24.49663 100
f1.c(m) 20.69856 21.20862 22.20771 22.32653 213.26667 100
f2(m) 20.76128 21.52786 22.66352 22.79101 69.49891 100
f2.c(m) 21.16423 21.57205 22.94157 23.06497 23.35764 100
f3(m) 20.17755 21.41369 21.99292 22.10814 22.36987 100
f4(m) 10.10816 10.47535 10.56790 10.61938 10.83338 100
where the ".c" functions are compiled versions of f1 and f2.
Compilation is particularly helpful in code written with for loops but doesn't do much for vectorized code; this is shown here, where there's a small but consistent improvement from compiling f1's for loop, but not f2's sapply.
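The hoisting that f4 performs can be checked on a small matrix. In this sketch the permutation is drawn once and reused, so the per-column version and the hoisted version see exactly the same data (the matrix size and seed are arbitrary):

```r
set.seed(42)
m <- matrix(runif(20), nrow = 5)
shuffled <- apply(m, 2, sample)          # permute each column once

# abs() and sum() applied inside a loop over columns.
per_column <- vapply(seq_len(ncol(m)), function(i) {
  sum(abs(shuffled[, i] - m[, i]))
}, numeric(1))

# abs() and the sum hoisted out: one abs() on the matrix, one colSums().
hoisted <- colSums(abs(shuffled - m))

all.equal(per_column, hoisted)

# Dividing by the constant ncol(m) does not change which column is the maximum.
which.max(hoisted / ncol(m)) == which.max(hoisted)
```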
Since you are looking at efficiency/optimization, start by using the rbenchmark package for comparison purposes.
Rewriting your given example as a function (so that it can be replicated and compared)
forFirst <- function(x) {
y <- rowMeans(x)
totaldiff <- numeric()
for (i in 1:ncol(x)){
x.after <- x
x.after[,i] <- sample(x[,i])
diff <- abs(y-rowMeans(x.after))
totaldiff[i] <- sum(diff)
}
colnames(x)[which.max(totaldiff)]
}
Applying some standard optimizations (pre-allocating totaldiff to the right size, eliminating intermediate variables that are only used once) gives
forSecond <- function(x) {
y <- rowMeans(x)
totaldiff <- numeric(ncol(x))
for (i in 1:ncol(x)){
x.after <- x
x.after[,i] <- sample(x[,i])
totaldiff[i] <- sum(abs(y-rowMeans(x.after)))
}
colnames(x)[which.max(totaldiff)]
}
Not much more can be done for this that I can see to improve the algorithm itself in the loop. A better algorithm would be the most help, but since this particular problem is just an example, it is not worth spending that time.
The apply version looks very similar.
applyFirst <- function(x) {
y <- rowMeans(x)
totaldiff <- sapply(seq_len(ncol(x)), function(i) {
x[,i] <- sample(x[,i])
sum(abs(y-rowMeans(x)))
})
colnames(x)[which.max(totaldiff)]
}
Benchmarking them gives:
> library("rbenchmark")
> benchmark(forFirst(x),
+ forSecond(x),
+ applyFirst(x),
+ order = "relative")
test replications elapsed relative user.self sys.self user.child
1 forFirst(x) 100 16.92 1.000 16.88 0.00 NA
2 forSecond(x) 100 17.02 1.006 16.96 0.03 NA
3 applyFirst(x) 100 17.05 1.008 17.02 0.01 NA
sys.child
1 NA
2 NA
3 NA
The differences between these is just noise. In fact, running the benchmark again gives a different ordering:
> benchmark(forFirst(x),
+ forSecond(x),
+ applyFirst(x),
+ order = "relative")
test replications elapsed relative user.self sys.self user.child
3 applyFirst(x) 100 17.05 1.000 17.02 0 NA
2 forSecond(x) 100 17.08 1.002 17.05 0 NA
1 forFirst(x) 100 17.44 1.023 17.41 0 NA
sys.child
3 NA
2 NA
1 NA
So these approaches are the same speed. Any real improvement will come from using a better algorithm than just simple looping and copying to create the intermediate results.
Apply functions do not necessarily speed up loops in R. Sometimes they can even slow them down. There's no reason to believe that turning this into an apply family function will speed it up any appreciable amount.
As an aside, this code seems like a relatively pointless endeavour. It's just going to select a random column. I could get the same result by just doing that in the first place. Perhaps this is nested in a larger loop looking for a distribution?

Benchmark analysis and display results of analysis AND benchmark?

Probably I just missed a parameter, but maybe someone can point me to it: how can I run an analysis in R, benchmark it, and still store the result somewhere? I know R functions can only return one single object, but I could either make use of a list here or paste the benchmark results and store the analysis in the function's return value.
But, is there any way to evaluate benchmark (or system.time) and analysis without running it twice like this?:
require(rbenchmark)
bmark <- function(x){
res <- list()
res[[1]] <- benchmark(x^6)
res[[2]] <- x^6
res
}
EDIT: I am sorry I caused some confusion about what I really want to do. Maybe the use case makes it clearer: I don't have a typical benchmark situation where I want to check whether my custom function is faster than some other function. It's rather that I run the same thing with different data on different machines. I don't need this in a test environment, but in production – I just want to let users of a script know how long it took. If that's an hour or more people can plan their lunch break :) .
Here's an example using two functions. The first one uses plyr and the second uses data.table.
# dummy data
require(plyr)
require(data.table)
set.seed(45)
x1 <- data.frame(x=rnorm(1e6), grp = sample(letters[1:26], 1e6, replace=T))
x1.dt <- data.table(x1, key="grp")
# function that uses plyr
DF.FUN <- function(x) {
ddply(x1, .(grp), summarise, m.x = mean(x))
}
# function that uses data.table
DT.FUN <- function(x) {
x1.dt[, list(m.x=mean(x)),by=grp]
}
require(rbenchmark)
> benchmark( s1 <- DF.FUN(), s2 <- DT.FUN(), order="elapsed", replications=2)
# test replications elapsed relative user.self sys.self user.child sys.child
# 2 s2 <- DT.FUN() 2 0.036 1.000 0.031 0.006 0 0
# 1 s1 <- DF.FUN() 2 0.527 14.639 0.363 0.163 0 0
Now, s1 and s2 contain the results from each function, and the benchmarked results will be displayed on screen.
# > head(s1)
# grp m.x
# 1 a 0.0069312201
# 2 b -0.0002422315
# 3 c -0.0129449586
# 4 d -0.0036275338
# 5 e 0.0013438022
# 6 f -0.0015428427
# > head(s2)
# grp m.x
# 1: a 0.0069312201
# 2: b -0.0002422315
# 3: c -0.0129449586
# 4: d -0.0036275338
# 5: e 0.0013438022
# 6: f -0.0015428427
Is this what you were after?
I read the question a bit differently than Arun. This would be the answer to what I thought was being asked:
> bres <- bmark(2)
> bres
[[1]]
test replications elapsed relative user.self sys.self user.child sys.child
1 x^6 100 0.001 1 0.001 0.001 0 0
[[2]]
[1] 64
The bmark function is returning a result with the default 100 replications. If you wanted to annotate the results you could use paste(), and if you wanted to add a parameter for the number of reps:
bmark2 <- function(x, reps=100){
res <- list()
res[[1]] <- benchmark(x^6, replications=reps)
res[[2]] <- paste(reps, " replications of ", x, "to the 6th in", res[[1]]$elapsed)
res
}
I am unsure of what StackOverflow thinks about answering old questions, but it seems like nobody actually answered after your edit. So here goes:
To time a process in R you can use two methods.
The first one uses system.time(expression) and gives you how much time it took to evaluate the expression within the brackets.
If this is not practical in your case you can get system time with Sys.time() before the operation and after the operation and subtract the two.
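Both approaches can keep the result of the computation while running it only once. A minimal sketch, where timed is a hypothetical helper name and the sum is just a stand-in for the real analysis:

```r
# Option 1: assign inside system.time(); the value ends up in `result`.
timing <- system.time(result <- sum((1:1e6)^2))

# Option 2: a small wrapper that returns the value and the elapsed time.
timed <- function(expr) {
  start <- Sys.time()
  value <- expr                          # the argument is evaluated here, once
  elapsed <- difftime(Sys.time(), start, units = "secs")
  list(value = value, elapsed = elapsed)
}

out <- timed(sum((1:1e6)^2))
message("Finished in ", format(out$elapsed))
```

The wrapper relies on R's lazy evaluation: the expression is not run until value <- expr touches it, so the clock starts before the work does.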
If this finally answers your question please accept the solution :)

Memoize and vectorize a custom function

I want to know how to vectorize and memoize a custom function in R. It seems
my way of thinking is not aligned with R's way of operation. So, I gladly
welcome any links to good reading material. For example, R inferno is a nice
resource, but it didn't help to figure out memoization in R.
More generally, can you provide a relevant usage example for the memoise
or R.cache packages?
I haven't been able to find any other discussions on this subject. Searching
for "memoise" or "memoize" on r-bloggers.com returns zero results. Searching
for those keywords at http://r-project.markmail.org/ does not return helpful
discussions. I emailed the mailing list and did not receive a complete
answer.
I am not solely interested in memoizing the GC function, and I am aware of
Bioconductor and the various packages
available there.
Here's my data:
seqs <- c("","G","C","CCC","T","","TTCCT","","C","CTC")
Some sequences are missing, so they're blank "".
I have a function for calculating GC content:
> GC <- function(s) {
if (!is.character(s)) return(NA)
n <- nchar(s)
if (n == 0) return(NA)
m <- gregexpr('[GCSgcs]', s)[[1]]
if (m[1] < 1) return(0)
return(100.0 * length(m) / n)
}
It works:
> GC('')
[1] NA
> GC('G')
[1] 100
> GC('GAG')
[1] 66.66667
> sapply(seqs, GC)
G C CCC T TTCCT
NA 100.00000 100.00000 100.00000 0.00000 NA 40.00000 NA
C CTC
100.00000 66.66667
I want to memoize it. Then, I want to vectorize it.
Apparently, I must have the wrong mindset for using the memoise or
R.cache R packages:
> system.time(dummy <- sapply(rep(seqs,100), GC))
user system elapsed
0.044 0.000 0.054
>
> library(memoise)
> GCm1 <- memoise(GC)
> system.time(dummy <- sapply(rep(seqs,100), GCm1))
user system elapsed
0.164 0.000 0.173
>
> library(R.cache)
> GCm2 <- addMemoization(GC)
> system.time(dummy <- sapply(rep(seqs,100), GCm2))
user system elapsed
10.601 0.252 10.926
Notice that the memoized functions are much slower: roughly 3 times slower with memoise and about 200 times slower with R.cache.
I tried the hash package, but things seem to be happening behind the
scenes and I don't understand the output. The sequence C should have a
value of 100, not NULL.
Note that using has.key(s, cache) instead of exists(s, cache) results
in the same output. Also, using cache[s] <<- result instead of
cache[[s]] <<- result results in the same output.
> cache <- hash()
> GCc <- function(s) {
if (!is.character(s) || nchar(s) == 0) {
return(NA)
}
if(exists(s, cache)) {
return(cache[[s]])
}
result <- GC(s)
cache[[s]] <<- result
return(result)
}
> sapply(seqs,GCc)
[[1]]
[1] NA
$G
[1] 100
$C
NULL
$CCC
[1] 100
$T
NULL
[[6]]
[1] NA
$TTCCT
[1] 40
[[8]]
[1] NA
$C
NULL
$CTC
[1] 66.66667
At least I figured out how to vectorize:
> GCv <- Vectorize(GC)
> GCv(seqs)
G C CCC T TTCCT
NA 100.00000 100.00000 100.00000 0.00000 NA 40.00000 NA
C CTC
100.00000 66.66667
Relevant stackoverflow posts:
Options for caching / memoization / hashing in R
While this won't give you memoization across calls, you can use factors to make individual calls a lot faster if there is a fair bit of repetition. Eg using Joshua's GC2 (though I had to remove fixed=T to get it to work):
GC2 <- function(s) {
if(!is.character(s)) stop("'s' must be character")
n <- nchar(s)
m <- gregexpr('[GCSgcs]', s)
len <- sapply(m, length)
neg <- sapply(m, "[[", 1)
len <- len*(neg > 0)
100.0 * len/n
}
One can easily define a wrapper like:
GC3 <- function(s) {
x <- factor(s)
GC2(levels(x))[x]
}
system.time(GC2(rep(seqs, 50000)))
# user system elapsed
# 8.97 0.00 8.99
system.time(GC3(rep(seqs, 50000)))
# user system elapsed
# 0.06 0.00 0.06
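For memoization across calls, which the factor trick does not provide, a minimal base-R sketch is to keep results in an environment keyed by the input string. memo_by_string and counted are names invented for this example, and the wrapper assumes a function of one single string. Keys are prefixed so that "" still gives a valid name, and inherits = FALSE makes sure that names such as C or T, which also exist in base R, are never mistaken for cache hits:

```r
GC <- function(s) {
  if (!is.character(s)) return(NA)
  n <- nchar(s)
  if (n == 0) return(NA)
  m <- gregexpr("[GCSgcs]", s)[[1]]
  if (m[1] < 1) return(0)
  100 * length(m) / n
}

# Hypothetical helper: memoize a function that takes one string.
memo_by_string <- function(f) {
  cache <- new.env(hash = TRUE, parent = emptyenv())
  function(s) {
    key <- paste0("key:", s)             # prefix so "" is a valid name too
    if (exists(key, envir = cache, inherits = FALSE)) {
      return(get(key, envir = cache, inherits = FALSE))
    }
    val <- f(s)
    assign(key, val, envir = cache)
    val
  }
}

seqs <- c("", "G", "C", "CCC", "T", "", "TTCCT", "", "C", "CTC")

# Count how often the underlying function really runs.
calls <- 0
counted <- function(s) { calls <<- calls + 1; GC(s) }
GCm <- memo_by_string(counted)

res <- sapply(rep(seqs, 100), GCm)
calls                                    # one call per distinct string
```

Whether this beats the plain function depends on how expensive the function is; for something as cheap as GC the lookup itself may cost about as much as the computation.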
This doesn't explicitly answer your question, but this function is ~4 times faster than yours.
GC2 <- function(s) {
if(!is.character(s)) stop("'s' must be character")
n <- nchar(s)
m <- gregexpr('[GCSgcs]', s)
len <- sapply(m, length)
neg <- sapply(m, "[[", 1)
len <- len*(neg > 0)
len/n
}

Vectorize a product calculation which depends on previous elements?

I'm trying to speed up/vectorize some calculations in a time series.
Can I vectorize a calculation in a for loop which can depend on results from an earlier iteration? For example:
z <- c(1,1,0,0,0,0)
zi <- 2:6
for (i in zi) {z[i] <- ifelse (z[i-1]== 1, 1, 0) }
uses the z[i] values updated in earlier steps:
> z
[1] 1 1 1 1 1 1
In my effort at vectorizing this
z <- c(1,1,0,0,0,0)
z[zi] <- ifelse( z[zi-1] == 1, 1, 0)
the element-by-element operations don't use results updated in the operation:
> z
[1] 1 1 1 0 0 0
So this vectorized operation operates in 'parallel' rather than iterative fashion. Is there a way I can write/vectorize this to get the results of the for loop?
ifelse is vectorized and there's a bit of a penalty if you're using it on one element at a time in a for-loop. In your example, you can get a pretty good speedup by using if instead of ifelse.
fun1 <- function(z) {
for(i in 2:NROW(z)) {
z[i] <- ifelse(z[i-1]==1, 1, 0)
}
z
}
fun2 <- function(z) {
for(i in 2:NROW(z)) {
z[i] <- if(z[i-1]==1) 1 else 0
}
z
}
z <- c(1,1,0,0,0,0)
identical(fun1(z),fun2(z))
# [1] TRUE
system.time(replicate(10000, fun1(z)))
# user system elapsed
# 1.13 0.00 1.32
system.time(replicate(10000, fun2(z)))
# user system elapsed
# 0.27 0.00 0.26
You can get some additional speed gains out of fun2 by compiling it.
library(compiler)
cfun2 <- cmpfun(fun2)
system.time(replicate(10000, cfun2(z)))
# user system elapsed
# 0.11 0.00 0.11
So there's a 10x speedup without vectorization. As others have said (and some have illustrated) there are ways to vectorize your example, but that may not translate to your actual problem. Hopefully this is general enough to be applicable.
The filter function may be useful to you as well if you can figure out how to express your problem in terms of an autoregressive or moving average process.
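To give an idea of what that looks like, here is a sketch with an illustrative recurrence, y[i] = x[i] + phi * y[i-1]. This is not the recurrence from the question, and the values of x and phi are made up:

```r
x <- c(1, 2, 3, 4, 5)
phi <- 0.5                               # illustrative coefficient

# Explicit loop, with y[1] = x[1].
y_loop <- numeric(length(x))
y_loop[1] <- x[1]
for (i in 2:length(x)) y_loop[i] <- x[i] + phi * y_loop[i - 1]

# Same recurrence via stats::filter(); the result is a "ts" object.
y_filter <- as.numeric(stats::filter(x, phi, method = "recursive"))

all.equal(y_loop, y_filter)
```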
This is a nice and simple example where Rcpp can shine.
So let us first recast functions 1 and 2 and their compiled counterparts:
library(inline)
library(rbenchmark)
library(compiler)
fun1 <- function(z) {
for(i in 2:NROW(z)) {
z[i] <- ifelse(z[i-1]==1, 1, 0)
}
z
}
fun1c <- cmpfun(fun1)
fun2 <- function(z) {
for(i in 2:NROW(z)) {
z[i] <- if(z[i-1]==1) 1 else 0
}
z
}
fun2c <- cmpfun(fun2)
We write an Rcpp variant very easily:
funRcpp <- cxxfunction(signature(zs="numeric"), plugin="Rcpp", body="
Rcpp::NumericVector z = Rcpp::NumericVector(zs);
int n = z.size();
for (int i=1; i<n; i++) {
z[i] = (z[i-1]==1.0 ? 1.0 : 0.0);
}
return(z);
")
This uses the inline package to compile, load and link the five-liner on the fly.
Now we can define our test data, which we make a little longer than the original (as just running the original too few times results in unmeasurable times):
R> z <- rep(c(1,1,0,0,0,0), 100)
R> all(sapply(list(fun2(z), fun1c(z), fun2c(z), funRcpp(z)), identical, fun1(z)))
[1] TRUE
R>
All answers are seen as identical.
Finally, we can benchmark:
R> res <- benchmark(fun1(z), fun2(z),
+ fun1c(z), fun2c(z),
+ funRcpp(z),
+ columns=c("test", "replications", "elapsed",
+ "relative", "user.self", "sys.self"),
+ order="relative",
+ replications=1000)
R> print(res)
test replications elapsed relative user.self sys.self
5 funRcpp(z) 1000 0.005 1.0 0.01 0
4 fun2c(z) 1000 0.466 93.2 0.46 0
2 fun2(z) 1000 1.918 383.6 1.92 0
3 fun1c(z) 1000 10.865 2173.0 10.86 0
1 fun1(z) 1000 12.480 2496.0 12.47 0
The compiled version wins by a factor of almost 400 against the best R version, and almost 100 against its byte-compiled variant. For function 1, the byte compilation matters much less and both variants trail the C++ by a factor of well over two-thousand.
It took about one minute to write the C++ version. The speed gain suggests it was a minute well spent.
For comparison, here is the result for the original short vector called more often:
R> z <- c(1,1,0,0,0,0)
R> res2 <- benchmark(fun1(z), fun2(z),
+ fun1c(z), fun2c(z),
+ funRcpp(z),
+ columns=c("test", "replications",
+ "elapsed", "relative", "user.self", "sys.self"),
+ order="relative",
+ replications=10000)
R> print(res2)
test replications elapsed relative user.self sys.self
5 funRcpp(z) 10000 0.046 1.000000 0.04 0
4 fun2c(z) 10000 0.132 2.869565 0.13 0
2 fun2(z) 10000 0.271 5.891304 0.27 0
3 fun1c(z) 10000 1.045 22.717391 1.05 0
1 fun1(z) 10000 1.202 26.130435 1.20 0
The qualitative ranking is unchanged: the Rcpp version dominates and function 2 is second-best, with the byte-compiled version being about twice as fast as the plain R variant, but still almost three times slower than the C++ version. The relative differences are smaller here: on the longer vectors, the function call overhead matters less and the actual looping matters more, so C++ gets a bigger advantage on the actual loop operations. That is an important result, as it suggests that on more real-life sized data, the compiled version may reap a larger benefit.
Edited to correct two small oversights in the code examples. And edited again with thanks to Josh to catch a setup error relative to fun2c.
I think this is cheating and not generalizable, but: according to the rules you have above, any occurrence of 1 in the vector will make all subsequent elements 1 (by recursion: z[i] is set to 1 if z[i-1] equals 1; therefore z[i] will be set to 1 if z[i-2] equals 1; and so forth). Depending on what you really want to do, there may be such a recursive solution available if you think carefully about it ...
z <- c(1,1,0,0,0,0)
first1 <- min(which(z==1))
z[seq_along(z)>first1] <- 1
edit: this is wrong, but I'm leaving it up to admit my mistakes. Based on a little bit of playing (and less thinking), I think the actual solution to this recursion is more symmetric and even simpler:
rep(z[1],length(z))
Test cases:
z <- c(1,1,0,0,0,0)
z <- c(0,1,1,0,0,0)
z <- c(0,0,1,0,0,0)
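A quick check of that claim on the three test cases. The closed form below keeps z[1] as it is and fills the rest with 1 or 0, which coincides with rep(z[1], length(z)) whenever z[1] is 0 or 1 (loop_version and closed_form are names made up for this sketch):

```r
# The loop from the question, using if/else on one element at a time.
loop_version <- function(z) {
  for (i in 2:length(z)) z[i] <- if (z[i - 1] == 1) 1 else 0
  z
}

# Every element after the first depends only on whether z[1] equals 1.
closed_form <- function(z) {
  c(z[1], rep(as.numeric(z[1] == 1), length(z) - 1))
}

cases <- list(c(1, 1, 0, 0, 0, 0),
              c(0, 1, 1, 0, 0, 0),
              c(0, 0, 1, 0, 0, 0))

ok <- vapply(cases, function(z) identical(loop_version(z), closed_form(z)),
             logical(1))
all(ok)
```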
Check out the rollapply function in zoo.
I'm not super familiar with it, but I think this does what you want:
> c( 1, rollapply(z,2,function(x) x[1]) )
[1] 1 1 1 1 1 1
I'm sort of kludging it by using a window of 2 and then only using the first element of that window.
For more complicated examples you could perform some calculation on x[1] and return that instead.
Sometimes you just need to think about it totally differently. What you're doing is creating a vector where every item is the same as the first if it's a 1 or 0 otherwise.
z <- c(1,1,0,0,0,0)
if (z[1] != 1) z[1] <- 0
z[2:length(z)] <- z[1]
There is a function that does this particular calculation: cumprod (cumulative product)
> cumprod(z[zi])
[1] 1 0 0 0 0
> cumprod(c(1,2,3,4,0,5))
[1] 1 2 6 24 0 0
Otherwise, vectorize with Rcpp as other answers have shown.
It's also possible to do this with "apply" using the original vector and a lagged version of the vector as the constituent columns of a data frame.

Resources