Multiplication of many matrices in R

I want to multiply an initial vector by several matrices of the same size, in sequence. In the example below, p.state is a vector of m elements and tran.mat is a list where each member is an m x m matrix.
for (i in 1:length(tran.mat)){
p.state <- p.state %*% tran.mat[[i]]
}
The code above gives the correct answer but can be slow when length(tran.mat) is large. I was wondering if there was a more efficient way of doing this?
Below is an example with m = 3 and length(tran.mat) = 10 that generates this:
p.state <- c(1,0,0)
# t() makes each row sum to 1 (apply() returns each normalized row as a column)
tran.mat<-lapply(1:10,function(y){t(apply(matrix(runif(9),3,3),1,function(x){x/sum(x)}))})
for (i in 1:length(tran.mat)){
p.state <- p.state %*% tran.mat[[i]]
}
print(p.state)
NB: tran.mat does not have to be a list; it is just currently written as one.
Edit after a few comments:
Reduce is useful when m is small. However, when m = 6 the loop outperformed both Reduce-based solutions:
library(rbenchmark)
p.state1 <- p.state <- c(1,0,0,0,0,0)
tran.mat<-lapply(1:10000,function(y){t(apply(matrix(runif(36),6,6),1,function(x){x/sum(x)}))})
tst<-do.call(c, list(list(p.state), tran.mat))
benchmark(
'loop' = {
for (i in 1:length(tran.mat)){
p.state <- p.state %*% tran.mat[[i]]
}
},
'reduce' = {
p.state1 %*% Reduce('%*%', tran.mat)
},
'reorder' = {
Reduce(`%*%`,tran.mat,p.state1)
}
)
This results in
test replications elapsed relative user.self sys.self user.child sys.child
1 loop 100 0.87 1.000 0.87 0 NA NA
2 reduce 100 1.41 1.621 1.39 0 NA NA
3 reorder 100 1.00 1.149 1.00 0 NA NA
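A plausible explanation for this ranking, offered as a sketch rather than a measurement: the 'reduce' version first multiplies m x m matrices together (on the order of m^3 operations per product) and only then applies the vector, whereas 'reorder' and the loop only ever multiply a length-m row vector by a matrix (on the order of m^2 operations per step). The snippet below uses a hypothetical generator for row-stochastic matrices and only checks that the three forms give the same vector.

```r
set.seed(1)
m <- 4
# Hypothetical input: 50 random row-stochastic m x m matrices (rows sum to 1)
tran.mat <- lapply(1:50, function(k) {
  a <- matrix(runif(m * m), m, m)
  a / rowSums(a)   # rowSums(a) is recycled down the columns, so row i is divided by its sum
})
p0 <- c(1, 0, 0, 0)

# "reorder": only vector-by-matrix products, roughly m^2 work per step
p.reorder <- Reduce(`%*%`, tran.mat, p0)

# "reduce": builds the full matrix product first, roughly m^3 work per step
p.reduce <- p0 %*% Reduce(`%*%`, tran.mat)

# the original loop
p.loop <- p0
for (mat in tran.mat) p.loop <- p.loop %*% mat

all.equal(p.reorder, p.reduce)
all.equal(p.reorder, p.loop)
sum(p.reorder)   # stays 1 because every matrix is row-stochastic
```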

A faster way is to use Reduce() to do sequential matrix multiplication on the list of matrices.
You can get approximately a 4x speedup that way. Below is an example of your code tested, with 1000 elements in the list instead of 10 to see the performance improvement more easily.
Code
library(rbenchmark)
p.state <- c(1,0,0)
tran.mat<-lapply(1:1000,function(y){apply(matrix(runif(9),3,3),1,function(x){x/sum(x)})})
benchmark(
'loop' = {
for (i in 1:length(tran.mat)){
p.state <- p.state %*% tran.mat[[i]]
}
},
'reduce' = {
p.state %*% Reduce('%*%', tran.mat)
}
)
Output
test replications elapsed relative user.self sys.self user.child sys.child
1 loop 100 0.23 3.833 0.23 0 NA NA
2 reduce 100 0.06 1.000 0.07 0 NA NA
You can see the reduce method is about 3.8 times faster.

I am not sure that this will be any faster but it is shorter:
prod <- Reduce("%*%", L)
all.equal(prod, L[[1]] %*% L[[2]] %*% L[[3]] %*% L[[4]])
## [1] TRUE
Note
We used this test input:
m <- matrix(1:9, 3)
L <- list(m^0, m, m^2, m^3)

I am going to use a function from the Rfast package to reduce the execution time of the multiplications. Unfortunately, the for loop itself cannot be avoided.
The function Rfast::eachcol.apply is a good fit for your purpose. Your multiplication could also be written with crossprod, but that is slow for our purpose.
Here are some helper functions:
mult.list<-function(x,y){
for (xm in x){
y <- y %*% xm
}
y
}
mult.list2<-function(x,y){
for (xm in x){
y <- Rfast::eachcol.apply(xm,y,oper="*",apply="sum")
}
y
}
Here is an example:
x<-list()
y<-rnorm(1000)
for(i in 1:100){
x[[i]]<-Rfast::matrnorm(1000,1000)
}
microbenchmark::microbenchmark(R=a<-mult.list(x,y),Rfast=b<-mult.list2(x,y),times = 10)
Unit: milliseconds
expr min lq mean median uq max neval
R 410.067525 532.176979 633.3700627 649.155826 699.721086 916.542414 10
Rfast 239.987159 251.266488 352.1951486 276.382339 458.089342 741.340268 10
all.equal(as.numeric(a),as.numeric(b))
[1] TRUE
The argument oper sets the operation applied to each element, and apply sets the operation applied to each column. It should be fast on large matrices, but I couldn't test bigger matrices on my laptop.
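For readers without Rfast, the operation described here (multiply each column of xm element-wise by y, then sum each column) can be sketched in base R. Assuming that is what eachcol.apply(xm, y, oper = "*", apply = "sum") computes, which the all.equal check above supports, it matches y %*% xm. The matrix size and seed below are arbitrary, and no timing claims are made.

```r
set.seed(42)
n <- 200
xm <- matrix(rnorm(n * n), n, n)   # hypothetical test matrix
y <- rnorm(n)

# Reference: row vector times matrix, as in the loop (dropped to a plain vector)
ref <- drop(y %*% xm)

# Column-wise version: y is recycled down each column of xm,
# so column j becomes xm[, j] * y, and colSums() adds each column up
cw <- colSums(xm * y)

# crossprod(xm, y) is t(xm) %*% y: the same numbers as an n x 1 matrix
cp <- drop(crossprod(xm, y))

all.equal(ref, cw)
all.equal(ref, cp)
```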

Related

Are "self-contained" functions more efficient in R?

I'm writing a function that needs to call a function g, passed as an argument, on each element of a list, iteratively.
I'm wondering how to make this as fast as possible. I can achieve an acceptable speed using Rcpp and a specific kind of g (writing everything in C++), but I can't figure out whether I can reach a similar speed when passing an R function as an argument.
I was doing some tests to figure out why R is slower and found some really unexpected results:
minus <- function(x) -x
minus_vec <- Vectorize(minus, "x")
Testing with some simple functions to invert signs.
f0 <- function(x) {
sapply(x, minus)
}
f1 <- function(x) {
for(i in seq_along(x)){
x[i] <- -x[i]
}
x
}
f2 <- function(x) {
for(i in seq_along(x)){
x[i] <- minus(x[i])
}
x
}
I got the following results:
a <- 1:10^5
library(rbenchmark)
benchmark(f0(a), f1(a), f2(a), minus_vec(a), minus(a))[,c(1,4)]
test relative
1 f0(a) 454.842
2 f1(a) 25.579
3 f2(a) 178.211
4 minus_vec(a) 523.789
5 minus(a) 1.000
I would like some explanation on the following points:
Why don't f1 and f2 have the same speed? Should writing -x[i] and calling minus(x[i]) really be so different when they do exactly the same thing?
Why is f0 slower than f2? I always thought apply functions were more efficient than for loops, but never really understood why, and now I have even found a counter-example.
Can I make a function as fast as f1 using the function minus?
Why did vectorizing minus (unnecessary since - is already vectorized, but that might not always be the case) make it so slow?
Not a full answer, but here are a few notes
1 minus(x) vs -x: Doing nothing is better than doing something
Your function minus calls `-`, so the extra function call adds computation time. I honestly do not know the specifics, so I cannot say how much extra time to expect.
Here is an example highlighting it: we have four functions, all squaring numbers
fa <- function (n) n^2
fb <- function (n) fa(n)
fc <- function (n) fb(n)
fd <- function (n) fc(n)
Fa <- function (n) {
for (i in seq_along(n)) n[i] <- fa(i)
n
}
Fb <- function (n) {
for (i in seq_along(n)) n[i] <- fb(i)
n
}
Fc <- function (n) {
for (i in seq_along(n)) n[i] <- fc(i)
n
}
Fd <- function (n) {
for (i in seq_along(n)) n[i] <- fd(i)
n
}
And here are the benchmarking results
n <- 1:10^4
b <- benchmark(Fa(n),Fb(n),Fc(n),Fd(n), replications = 1000L)
b
# test replications elapsed relative user.self sys.self user.child sys.child
# 1 Fa(n) 1000 3.93 1.000 3.85 0.00 NA NA
# 2 Fb(n) 1000 7.08 1.802 6.94 0.02 NA NA
# 3 Fc(n) 1000 10.16 2.585 9.94 0.06 NA NA
# 4 Fd(n) 1000 13.68 3.481 13.56 0.00 NA NA
# looks rather even
diff(b$elapsed)
# [1] 3.15 3.08 3.52
Now back to your minus function:
a <- 1:10^5
b <- benchmark(f0(a), f1(a), f2(a), minus_vec(a), minus(a))
b$elapsed[b$test == 'f2(a)'] - b$elapsed[b$test == 'f1(a)']
# [1] 3.39
2 apply vs for vs Vectorize:
@NavyCheng provided some good material on the topic. My understanding is that the apply family (just like Vectorize) still calls an R function once per element, whereas, if I'm not mistaken, the looping for `-` is done entirely in C.
Again, I do not know the exact details, but if apply/Vectorize call an R function for every element, then in theory (and often in practice) it is possible to write a proper for loop that performs as well or better.
3 A Function as fast as f1:
Ad hoc, the closest I came was by cheating with the Rcpp package (cheating, since one writes the function in C++ first).
In C++ (compiled, for example, with Rcpp::sourceCpp()):
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector minusCpp(NumericVector x) {
  // clone() so that a double vector passed from R is not negated in place
  NumericVector out = clone(x);
  for (R_xlen_t k = 0; k < out.length(); ++k) {
    out[k] = -out[k];
  }
  return out;
}
Now to the benchmarks in R:
a <- 1:10^5
b <- benchmark(f0(a), f1(a), f2(a), minus_vec(a), minus(a), minusCpp(a))
b
# test replications elapsed relative user.self sys.self user.child sys.child
# 1 f0(a) 100 9.47 NA 9.22 0.01 NA NA
# 2 f1(a) 100 0.53 NA 0.54 0.00 NA NA
# 3 f2(a) 100 4.23 NA 4.24 0.00 NA NA
# 5 minus(a) 100 0.00 NA 0.00 0.00 NA NA
# 4 minus_vec(a) 100 10.42 NA 10.39 0.02 NA NA
# 6 minusCpp(a) 100 0.05 NA 0.04 0.00 NA NA
Setting aside -x[i] versus minus(x[i]), I would summarize the four questions as two:
Why is the apply family slower than a for loop?
Why is Vectorize slower than the apply family?
For the 1st question:
The apply functions are designed to be convenient and clear to read, not necessarily fast.
The apply family also does more work than a for loop. For example, the sapply function first uses as.vector(unlist(...)) to convert anything to a vector, and in the end tries to simplify the answer into a suitable form.
You can read here and here for more detail.
For the 2nd question, it's because Vectorize is a wrapper around mapply; if you type Vectorize in RStudio (or at any R prompt), you'll see the full source code. You can read this for more help.
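A small sketch of what this answer describes: the function returned by Vectorize() forwards its arguments to mapply(), which calls minus() once per element at R level and then simplifies the list of results, whereas -a is a single vectorized call. The grepl() check on the deparsed body is only an illustration of the current Vectorize() implementation.

```r
minus <- function(x) -x
minus_vec <- Vectorize(minus, "x")

# The function built by Vectorize() forwards its arguments to mapply()
uses_mapply <- any(grepl("mapply", deparse(body(minus_vec)), fixed = TRUE))
uses_mapply

# Roughly what minus_vec(a) does: one R-level call of minus() per element,
# then simplification of the list of results back to a vector
a <- 1:5
manual <- mapply(minus, a)
identical(manual, minus_vec(a))
identical(manual, -a)
```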

Is it possible to speed up my function for creating a correlation matrix?

I have written the following function to estimate the pairwise correlations of multinomial variables using so-called Cramér's V. I use the vcd package for this purpose, but to my knowledge there is no existing function that would create a symmetrical correlation matrix of V from a matrix or data.frame similar to cor.
The function is:
require(vcd)
get.V<-function(y){
col.y<-ncol(y)
V<-matrix(ncol=col.y,nrow=col.y)
for(i in 1:col.y){
for(j in 1:col.y){
V[i,j]<-assocstats(table(y[,i],y[,j]))$cramer
}
}
return(V)
}
However, for large numbers of variables it gets relatively slow.
no.var<-5
y<-matrix(ncol=no.var,sample(1:5,100*no.var,TRUE))
get.V(y)
As no.var increases, computing time can explode, since the double loop performs no.var^2 tests. Since I need to apply this to a data.frame with 100 or more variables, is it possible to speed up my function with more efficient programming? Thank you.
As well as reducing the number of tests performed, or otherwise
optimising the running of the whole function, we might be able to make
assocstats faster. We'll start by establishing a test case to make
sure we don't accidentally make a faster function that's incorrect.
x <- vcd::Arthritis$Improved
y <- vcd::Arthritis$Treatment
correct <- vcd::assocstats(table(x, y))$cramer
correct
## [1] 0.3942
is_ok <- function(x) stopifnot(all.equal(x, correct))
We'll start by making a version of assocstats that's very close to the
original.
cramer1 <- function (x, y) {
mat <- table(x, y)
tab <- summary(MASS::loglm(~1 + 2, mat))$tests
phi <- sqrt(tab[2, 1] / sum(mat))
cont <- sqrt(phi ^ 2 / (1 + phi ^ 2))
sqrt(phi ^ 2 / min(dim(mat) - 1))
}
is_ok(cramer1(x, y))
The slowest operation here is going to be loglm, so before we try
making that faster, it's worth looking for an alternative approach. A
little googling finds a useful blog
post.
Let's also try that:
cramer2 <- function(x, y) {
chi <- chisq.test(x, y, correct=FALSE)$statistic[[1]]
ulength_x <- length(unique(x))
ulength_y <- length(unique(y))
sqrt(chi / (length(x) * (min(ulength_x, ulength_y) - 1)))
}
is_ok(cramer2(x, y))
How does the performance stack up:
library(microbenchmark)
microbenchmark(
cramer1(x, y),
cramer2(x, y)
)
## Unit: microseconds
## expr min lq median uq max neval
## cramer1(x, y) 1080.0 1149.3 1182.0 1222.1 2598 100
## cramer2(x, y) 800.7 850.6 881.9 934.6 1866 100
cramer2() is faster. chisq.test() is likely to be the bottleneck: it does a lot
more than compute the test statistic, so let's see if we can make it faster by
doing less. A few minutes' careful work reduces the function to:
chisq_test <- function (x, y) {
O <- table(x, y)
n <- sum(O)
E <- outer(rowSums(O), colSums(O), "*")/n
sum((abs(O - E))^2 / E)
}
We can then create a new cramer3() that uses chisq_test().
cramer3 <- function(x, y) {
chi <- chisq_test(x, y)
ulength_x <- length(unique(x))
ulength_y <- length(unique(y))
sqrt(chi / (length(x) * (min(ulength_x, ulength_y) - 1)))
}
is_ok(cramer3(x, y))
microbenchmark(
cramer1(x, y),
cramer2(x, y),
cramer3(x, y)
)
## Unit: microseconds
## expr min lq median uq max neval
## cramer1(x, y) 1088.6 1138.9 1169.6 1221.5 2534 100
## cramer2(x, y) 796.1 840.6 865.0 906.6 1893 100
## cramer3(x, y) 334.6 358.7 373.5 390.4 1409 100
And now that we have our own simple version of chisq.test() we could
eke out a little more speed by using the results of table() to figure
out the number of unique elements in x and y:
cramer4 <- function(x, y) {
O <- table(x, y)
n <- length(x)
E <- outer(rowSums(O), colSums(O), "*")/n
chi <- sum((abs(O - E))^2 / E)
sqrt(chi / (length(x) * (min(dim(O)) - 1)))
}
is_ok(cramer4(x, y))
microbenchmark(
cramer1(x, y),
cramer2(x, y),
cramer3(x, y),
cramer4(x, y)
)
## Unit: microseconds
## expr min lq median uq max neval
## cramer1(x, y) 1097.6 1145.8 1183.3 1233.3 2318 100
## cramer2(x, y) 800.7 840.5 860.7 895.5 2079 100
## cramer3(x, y) 334.4 353.1 365.7 384.1 1654 100
## cramer4(x, y) 248.0 263.3 273.2 283.5 1342 100
Not bad - we've made it 4 times faster just using R code. From here, you
could try to get even more speed by:
Using tcrossprod() instead of outer()
Making a faster version of table() for this special (2d) case
Using Rcpp to compute the test-statistic from the tabular data
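As a sketch of the first suggestion, here is a hypothetical cramer5() that computes the expected counts with tcrossprod() instead of outer(). It is checked against chisq.test() without continuity correction on made-up data, rather than vcd::Arthritis, so that it runs with base R only; whether it is actually faster than cramer4() has not been measured here.

```r
# Hypothetical cramer5(): like cramer4(), but expected counts via tcrossprod()
cramer5 <- function(x, y) {
  O <- table(x, y)
  n <- sum(O)
  # expected counts: row totals times column totals, divided by n (r x c matrix)
  E <- tcrossprod(rowSums(O), colSums(O)) / n
  chi <- sum((O - E)^2 / E)
  sqrt(chi / (n * (min(dim(O)) - 1)))
}

# Made-up data so the check needs base R only
set.seed(2)
x <- sample(letters[1:4], 500, replace = TRUE)
y <- sample(LETTERS[1:3], 500, replace = TRUE)

# Reference: chisq.test() without continuity correction, as in cramer2()
chi_ref <- suppressWarnings(chisq.test(x, y, correct = FALSE))$statistic[[1]]
v_ref <- sqrt(chi_ref / (length(x) * (min(length(unique(x)), length(unique(y))) - 1)))

all.equal(cramer5(x, y), v_ref)
```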
You are best off using the vectorized version of outer, as Tyler suggested. You can still get a performance boost by writing a function that calculates only Cramer's V. The assocstats function uses summary on the table, which calculates a lot of statistics you don't need. If you replace the call to assocstats with a user-defined function along the lines of
cv <- function(x, y) {
t <- table(x, y)
chi <- chisq.test(t)$statistic
cramer <- sqrt(chi / (NROW(x) * (min(dim(t)) - 1)))
cramer
}
This new function, by calculating only Cramer's V, runs in about 40% of the time required by assocstats. You could potentially speed it up further by reducing chisq.test to something that calculates only the chi-square test statistic.
Even if you just adjust your loop indices to take advantage of the symmetric matrix with 1s on the diagonal, and use this cv function instead of assocstats, you are easily looking at a 5-fold increase in performance.
Edit: As requested, the full code I've been using to get a 4x speed up is
cv <- function(x, y) {
t <- table(x, y)
chi <- suppressWarnings(chisq.test(t))$statistic
cramer <- sqrt(chi / (NROW(x) * (min(dim(t)) - 1)))
cramer
}
get.V3<-function(y, fill = TRUE){
col.y<-ncol(y)
V<-matrix(ncol=col.y,nrow=col.y)
for(i in 1:(col.y - 1)){
for(j in (i + 1):col.y){
V[i,j]<-cv(y[,i],y[,j])
}
}
diag(V) <- 1
if (fill) {
for (i in 1:ncol(V)) {
V[, i] <- V[i, ]
}
}
V
}
It looks very similar to what Hadley suggests below, although his version of the function to get Cramer's V uses correct = FALSE in chisq.test. If all the tables are larger than 2x2, the setting of correct doesn't matter. For 2x2 tables, the results will vary depending on the argument. It is probably best to follow his example and set correct = FALSE so that everything is calculated the same way regardless of the table size.
You could reduce the calculation time by calculating only one half of your matrix:
get.V2 <-function(y){
cb <- combn(1:ncol(y), 2, function(i)assocstats(table(y[, i[1]], y[, i[2]]))$cramer)
m <- matrix(0, ncol(y), ncol(y))
m[lower.tri(m)] <- cb
diag(m) <- 1
## copy the lower.tri to upper.tri, suggested by @iacobus
for (i in 1:nrow(m)) {
m[i, ] <- m[, i]
}
return(m)
}
EDIT: added @iacobus's suggestion to populate the upper.tri of the matrix, and added a little benchmark:
library("vcd")
library("qdapTools")
library("rbenchmark")
## suggested by @TylerRinker
get.V3 <- function(y)v_outer(y, function(i, j)assocstats(table(i, j))$cramer)
set.seed(1)
no.var<-10
y<-matrix(ncol=no.var,sample(1:5,100*no.var,TRUE))
benchmark(get.V(y), get.V2(y), get.V3(y), replications=10, order="relative")
# test replications elapsed relative user.self sys.self user.child sys.child
#2 get.V2(y) 10 0.992 1.000 0.988 0.000 0 0
#1 get.V(y) 10 2.239 2.257 2.232 0.004 0 0
#3 get.V3(y) 10 2.495 2.515 2.484 0.004 0 0
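The half-matrix idea can be combined with a lighter Cramer's V function. A sketch, in which cramer_v() and get_V_half() are hypothetical names: cramer_v() mirrors cramer4() from another answer (no vcd dependency), and the upper triangle is filled by indexing rather than a loop. A brute-force double loop checks the result; no timings are claimed.

```r
# cramer_v(): mirrors cramer4() from another answer
cramer_v <- function(x, y) {
  O <- table(x, y)
  n <- length(x)
  E <- outer(rowSums(O), colSums(O), "*") / n
  chi <- sum((O - E)^2 / E)
  sqrt(chi / (n * (min(dim(O)) - 1)))
}

# Compute only the k * (k - 1) / 2 pairs below the diagonal
get_V_half <- function(y) {
  k <- ncol(y)
  V <- diag(k)                       # Cramer's V of a variable with itself is 1
  if (k < 2) return(V)
  pairs <- combn(k, 2)               # columns (1,2), (1,3), ..., (k-1,k)
  # combn() order matches the column-major order of lower.tri()
  V[lower.tri(V)] <- apply(pairs, 2, function(p) cramer_v(y[, p[1]], y[, p[2]]))
  V[upper.tri(V)] <- t(V)[upper.tri(V)]  # mirror into the upper triangle
  V
}

set.seed(1)
no.var <- 6
y <- matrix(ncol = no.var, sample(1:5, 100 * no.var, TRUE))
V_half <- get_V_half(y)

# Brute-force check: all k^2 pairs with a double loop
V_full <- matrix(NA_real_, no.var, no.var)
for (i in 1:no.var) for (j in 1:no.var) V_full[i, j] <- cramer_v(y[, i], y[, j])

all.equal(V_half, V_full)
isSymmetric(V_half)
```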
This uses a vectorized version of outer:
library(qdapTools)
y <- matrix(ncol=no.var,sample(1:5,100*no.var,TRUE))
get.V2<-function(x, y){
assocstats(table(x, y))$cramer
}
v_outer(y, get.V2)
## > v_outer(y, get.V2)
## V1 V2 V3 V4 V5
## V1 1.000 0.224 0.158 0.195 0.217
## V2 0.224 1.000 0.175 0.163 0.240
## V3 0.158 0.175 1.000 0.208 0.145
## V4 0.195 0.163 0.208 1.000 0.189
## V5 0.217 0.240 0.145 0.189 1.000
Edit
On 1000 variables these are the system times:
Tyler: Time difference of 38.79437 mins
sgibb: Time difference of 19.54342 mins
Clearly sgibb's approach is superior.

Speed up R loop [duplicate]

This question already has answers here:
Any documentation for optimizing the performance of R? [duplicate]
(4 answers)
Speeding up loops in R can easily be done using a function from the apply family. How can I use an apply function in the code below to speed it up? Note that within the loop, at each iteration, one column is permuted and a function is applied to the new data frame (i.e., the initial data frame with one column permuted). I cannot seem to get apply to work because the new data frame has to be built within the loop.
#x <- data.frame(a=1:10,b=11:20,c=21:30) #small example
x <- data.frame(matrix(runif(50*100),nrow=50,ncol=100)) #larger example
y <- rowMeans(x)
start <- Sys.time()
totaldiff <- numeric()
for (i in 1:ncol(x)){
x.after <- x
x.after[,i] <- sample(x[,i])
diff <- abs(y-rowMeans(x.after))
totaldiff[i] <- sum(diff)
}
colnames(x)[which.max(totaldiff)]
Sys.time() - start
After working through this and other replies, the optimization strategies (and approximate speed-up) here seem to be
(30x) Choose an appropriate data representation -- matrix, rather than data.frame
(1.5x) Reduce unnecessary data copies -- difference of columns, rather than of rowMeans
Structure for loops as *apply functions (to emphasize code structure, simplify memory management, and provide type consistency)
(2x) Hoist vector operations outside loops -- abs and sum on columns become abs and colSums on a matrix
for an overall speed-up of about 100x. For this size and complexity of code, the use of the compiler or parallel packages would not be effective.
I put your code into a function
f0 <- function(x) {
y <- rowMeans(x)
totaldiff <- numeric()
for (i in 1:ncol(x)){
x.after <- x
x.after[,i] <- sample(x[,i])
diff <- abs(y-rowMeans(x.after))
totaldiff[i] <- sum(diff)
}
which.max(totaldiff)
}
and here we have
x <- data.frame(matrix(runif(50*100),nrow=50,ncol=100)) #larger example
set.seed(123)
system.time(res0 <- f0(x))
## user system elapsed
## 1.065 0.000 1.066
Your data can be represented as a matrix, and operations on R matrices are faster than on data.frames.
m <- matrix(runif(50*100),nrow=50,ncol=100)
set.seed(123)
system.time(res0.m <- f0(m))
## user system elapsed
## 0.036 0.000 0.037
identical(res0, res0.m)
##[1] TRUE
That's probably the biggest speed-up. But for the specific operation here we don't need to calculate the row means of the updated matrix, just the change in the mean from shuffling one column:
f1 <- function(x) {
y <- rowMeans(x)
totaldiff <- numeric()
for (i in 1:ncol(x)){
diff <- abs(sample(x[,i]) - x[,i]) / ncol(x)
totaldiff[i] <- sum(diff)
}
which.max(totaldiff)
}
The for loop doesn't follow the right pattern for filling up the result vector totaldiff (you want to "pre-allocate and fill", so totaldiff <- numeric(ncol(x))) but we can use an sapply and let R worry about that (this memory management is one of the advantages of using the apply family of functions)
f2 <- function(x) {
totaldiff <- sapply(seq_len(ncol(x)), function(i, x) {
sum(abs(sample(x[,i]) - x[,i]) / ncol(x))
}, x)
which.max(totaldiff)
}
set.seed(123); identical(res0, f1(m))
set.seed(123); identical(res0, f2(m))
The timings are
> library(microbenchmark)
> microbenchmark(f0(m), f1(m), f2(m))
Unit: milliseconds
expr min lq median uq max neval
f0(m) 32.45073 33.07804 33.16851 33.26364 33.81924 100
f1(m) 22.20913 23.87784 23.96915 24.06216 24.66042 100
f2(m) 21.02474 22.60745 22.70042 22.80080 23.19030 100
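A sketch of the "pre-allocate and fill" form of f1 mentioned above, under the hypothetical name f1_prealloc(). With the same seed it draws the same permutations as f2(), so the two should pick the same column; its speed relative to f2() is not measured here.

```r
# f1_prealloc() is a hypothetical name: f1 with totaldiff pre-allocated
f1_prealloc <- function(x) {
  totaldiff <- numeric(ncol(x))          # allocate the full result once
  for (i in seq_len(ncol(x))) {
    # fill one element per iteration instead of growing the vector
    totaldiff[i] <- sum(abs(sample(x[, i]) - x[, i]) / ncol(x))
  }
  which.max(totaldiff)
}

# f2 from above, repeated so this snippet runs on its own
f2 <- function(x) {
  totaldiff <- sapply(seq_len(ncol(x)), function(i, x) {
    sum(abs(sample(x[, i]) - x[, i]) / ncol(x))
  }, x)
  which.max(totaldiff)
}

m <- matrix(runif(50 * 100), nrow = 50, ncol = 100)
set.seed(123); r_loop  <- f1_prealloc(m)
set.seed(123); r_apply <- f2(m)
identical(r_loop, r_apply)
```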
@flodel points out that vapply can be faster (and provides type safety):
f3 <- function(x) {
totaldiff <- vapply(seq_len(ncol(x)), function(i, x) {
sum(abs(sample(x[,i]) - x[,i]) / ncol(x))
}, numeric(1), x)
which.max(totaldiff)
}
and that
f4 <- function(x)
which.max(colSums(abs((apply(x, 2, sample) - x))))
is still faster (ncol(x) is a constant factor, so removed) -- The abs and sum are hoisted outside the sapply, maybe at the expense of additional memory use. The advice in the comments to compile functions is good in general; here are some further timings
> microbenchmark(f0(m), f1(m), f1.c(m), f2(m), f2.c(m), f3(m), f4(m))
Unit: milliseconds
expr min lq median uq max neval
f0(m) 32.35600 32.88326 33.12274 33.25946 34.49003 100
f1(m) 22.21964 23.41500 23.96087 24.06587 24.49663 100
f1.c(m) 20.69856 21.20862 22.20771 22.32653 213.26667 100
f2(m) 20.76128 21.52786 22.66352 22.79101 69.49891 100
f2.c(m) 21.16423 21.57205 22.94157 23.06497 23.35764 100
f3(m) 20.17755 21.41369 21.99292 22.10814 22.36987 100
f4(m) 10.10816 10.47535 10.56790 10.61938 10.83338 100
where the ".c" versions are byte-compiled.
Compilation is particularly helpful for code written with for loops but doesn't do much for vectorized code; this is shown here, where there is a small but consistent improvement from compiling f1's for loop, but not f2's sapply.
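The answer does not show how the ".c" versions were created. A minimal sketch, assuming they were byte-compiled with compiler::cmpfun() (as in another answer on this page), checking that compilation does not change the result. Since R 3.4.0 the JIT compiler byte-compiles closures by default, so on a current R an explicit cmpfun() is likely to make less difference than these timings show.

```r
library(compiler)

# f1 from above
f1 <- function(x) {
  y <- rowMeans(x)
  totaldiff <- numeric()
  for (i in 1:ncol(x)) {
    diff <- abs(sample(x[, i]) - x[, i]) / ncol(x)
    totaldiff[i] <- sum(diff)
  }
  which.max(totaldiff)
}
f1.c <- cmpfun(f1)   # byte-compiled copy of f1

m <- matrix(runif(50 * 100), nrow = 50, ncol = 100)
set.seed(123); r_plain <- f1(m)
set.seed(123); r_comp  <- f1.c(m)
identical(r_plain, r_comp)
```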
Since you are looking at efficiency/optimization, start by using the rbenchmark package for comparison purposes.
Rewriting your given example as a function (so that it can be replicated and compared)
forFirst <- function(x) {
y <- rowMeans(x)
totaldiff <- numeric()
for (i in 1:ncol(x)){
x.after <- x
x.after[,i] <- sample(x[,i])
diff <- abs(y-rowMeans(x.after))
totaldiff[i] <- sum(diff)
}
colnames(x)[which.max(totaldiff)]
}
Applying some standard optimizations (pre-allocating totaldiff to the right size, eliminating intermediate variables that are only used once) gives
forSecond <- function(x) {
y <- rowMeans(x)
totaldiff <- numeric(ncol(x))
for (i in 1:ncol(x)){
x.after <- x
x.after[,i] <- sample(x[,i])
totaldiff[i] <- sum(abs(y-rowMeans(x.after)))
}
colnames(x)[which.max(totaldiff)]
}
As far as I can see, not much more can be done to improve the algorithm itself inside the loop. A better algorithm would help most, but since this particular problem is just an example, it is not worth spending that time.
The apply version looks very similar.
applyFirst <- function(x) {
y <- rowMeans(x)
totaldiff <- sapply(seq_len(ncol(x)), function(i) {
x[,i] <- sample(x[,i])
sum(abs(y-rowMeans(x)))
})
colnames(x)[which.max(totaldiff)]
}
Benchmarking them gives:
> library("rbenchmark")
> benchmark(forFirst(x),
+ forSecond(x),
+ applyFirst(x),
+ order = "relative")
test replications elapsed relative user.self sys.self user.child sys.child
1 forFirst(x) 100 16.92 1.000 16.88 0.00 NA NA
2 forSecond(x) 100 17.02 1.006 16.96 0.03 NA NA
3 applyFirst(x) 100 17.05 1.008 17.02 0.01 NA NA
The differences between these is just noise. In fact, running the benchmark again gives a different ordering:
> benchmark(forFirst(x),
+ forSecond(x),
+ applyFirst(x),
+ order = "relative")
test replications elapsed relative user.self sys.self user.child sys.child
3 applyFirst(x) 100 17.05 1.000 17.02 0 NA NA
2 forSecond(x) 100 17.08 1.002 17.05 0 NA NA
1 forFirst(x) 100 17.44 1.023 17.41 0 NA NA
So these approaches are the same speed. Any real improvement will come from using a better algorithm than just simple looping and copying to create the intermediate results.
Apply functions do not necessarily speed up loops in R. Sometimes they can even slow them down. There's no reason to believe that turning this into an apply family function will speed it up any appreciable amount.
As an aside, this code seems like a relatively pointless endeavour. It's just going to select a random column. I could get the same result by just doing that in the first place. Perhaps this is nested in a larger loop looking for a distribution?

Speeding up a for loop containing a sum in R

I'm wondering if it's possible to modify this loop to run faster. When I run it with n=2000000 it takes about 25 seconds. Any tricks available?
for(i in 1:n)
{
x[i] <- sum(runif(20))
}
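n <- 2e6
# n sums of 20 uniforms each: runif(2e6) with ncol = 20 would give only 1e5 rows
system.time(x <- rowSums(matrix(runif(n * 20), ncol = 20)))
# timing originally reported for runif(2e6), i.e. 1e5 sums:
#   user  system elapsed
#  0.108   0.620   0.748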
Using apply can get you some speed increases.
# How many rows?
n <- 1000
# How many samples from runif?
k <- 20
# Preallocate x
x <- double(n)
## Your loop
for(i in 1:n){
x[i] <- sum(runif(k))
}
## Using apply
## First create a matrix that has n rows and k columns
## then find the sum of the row.
x <- apply(matrix(runif(n*k), nrow=n), 1, sum)
Now test the speed:
benchmark(
loop = expression(
for(i in 1:n){
x[i] <- sum(runif(k))
}
),
apply = expression(
x <- apply(matrix(runif(n*k), nrow=n), 1, sum)
)
)
# Result of benchmark
#
# test replications elapsed relative user.self sys.self user.child sys.child
#2 apply 100 1.08 1.000000 1.06 0.00 NA NA
#1 loop 100 1.69 1.564815 1.63 0.02 NA NA
The loop takes longer than apply.
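Most of the gain here probably comes from drawing all n * k uniforms in a single runif() call rather than from apply() itself, and rowSums() avoids apply()'s per-row function calls. A sketch that also checks the vectorized version reproduces the loop's numbers when started from the same seed (byrow = TRUE keeps each block of k consecutive draws in one row, matching the loop's draw order):

```r
n <- 1000   # number of sums
k <- 20     # uniforms per sum

# Original loop, with x pre-allocated
set.seed(1)
x_loop <- double(n)
for (i in seq_len(n)) {
  x_loop[i] <- sum(runif(k))
}

# Vectorized: one runif() call, one row per sum, then rowSums()
set.seed(1)
x_vec <- rowSums(matrix(runif(n * k), nrow = n, byrow = TRUE))

all.equal(x_loop, x_vec)
```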
I would prefer the following solution:
x <- rep(sum(runif(20)), 2e6)
EDIT: Sorry, I recognize that you will get the same number 2e6 times.

Vectorize a product calculation which depends on previous elements?

I'm trying to speed up/vectorize some calculations in a time series.
Can I vectorize a calculation in a for loop which can depend on results from an earlier iteration? For example:
z <- c(1,1,0,0,0,0)
zi <- 2:6
for (i in zi) {z[i] <- ifelse (z[i-1]== 1, 1, 0) }
uses the z[i] values updated in earlier steps:
> z
[1] 1 1 1 1 1 1
In my effort at vectorizing this
z <- c(1,1,0,0,0,0)
z[zi] <- ifelse( z[zi-1] == 1, 1, 0)
the element-by-element operations don't use results updated in the operation:
> z
[1] 1 1 1 0 0 0
So this vectorized operation works in a 'parallel' rather than an iterative fashion. Is there a way I can write/vectorize this to get the results of the for loop?
ifelse is vectorized and there's a bit of a penalty if you're using it on one element at a time in a for-loop. In your example, you can get a pretty good speedup by using if instead of ifelse.
fun1 <- function(z) {
for(i in 2:NROW(z)) {
z[i] <- ifelse(z[i-1]==1, 1, 0)
}
z
}
fun2 <- function(z) {
for(i in 2:NROW(z)) {
z[i] <- if(z[i-1]==1) 1 else 0
}
z
}
z <- c(1,1,0,0,0,0)
identical(fun1(z),fun2(z))
# [1] TRUE
system.time(replicate(10000, fun1(z)))
# user system elapsed
# 1.13 0.00 1.32
system.time(replicate(10000, fun2(z)))
# user system elapsed
# 0.27 0.00 0.26
You can get some additional speed gains out of fun2 by compiling it.
library(compiler)
cfun2 <- cmpfun(fun2)
system.time(replicate(10000, cfun2(z)))
# user system elapsed
# 0.11 0.00 0.11
So there's a 10x speedup without vectorization. As others have said (and some have illustrated) there are ways to vectorize your example, but that may not translate to your actual problem. Hopefully this is general enough to be applicable.
The filter function may also be useful to you if you can figure out how to express your problem in terms of an autoregressive or moving-average process.
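As an illustration of that idea (not of the 0/1 recursion in the question, which is not linear), here is a sketch of an AR(1)-style recursion y[i] = e[i] + phi * y[i-1], computed once with a loop and once with stats::filter(method = "recursive"). The values of e and phi are arbitrary.

```r
set.seed(3)
e <- rnorm(10)
phi <- 0.5

# Explicit loop: y[i] = e[i] + phi * y[i-1], with y[1] = e[1]
y_loop <- numeric(length(e))
y_loop[1] <- e[1]
for (i in 2:length(e)) y_loop[i] <- e[i] + phi * y_loop[i - 1]

# Same recursion in one call; filter() returns a ts object, so drop that class
y_filt <- as.numeric(stats::filter(e, filter = phi, method = "recursive"))

all.equal(y_loop, y_filt)
```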
This is a nice and simple example where Rcpp can shine.
So let us first recast functions 1 and 2 and their compiled counterparts:
library(inline)
library(rbenchmark)
library(compiler)
fun1 <- function(z) {
for(i in 2:NROW(z)) {
z[i] <- ifelse(z[i-1]==1, 1, 0)
}
z
}
fun1c <- cmpfun(fun1)
fun2 <- function(z) {
for(i in 2:NROW(z)) {
z[i] <- if(z[i-1]==1) 1 else 0
}
z
}
fun2c <- cmpfun(fun2)
We write a Rcpp variant very easily:
funRcpp <- cxxfunction(signature(zs="numeric"), plugin="Rcpp", body="
Rcpp::NumericVector z = Rcpp::NumericVector(zs);
int n = z.size();
for (int i=1; i<n; i++) {
z[i] = (z[i-1]==1.0 ? 1.0 : 0.0);
}
return(z);
")
This uses the inline package to compile, load and link the five-liner on the fly.
Now we can define our test data, which we make a little longer than the original (as running the original too few times results in unmeasurable times):
R> z <- rep(c(1,1,0,0,0,0), 100)
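R> # identical() compares only two objects, so compare each result with fun1(z)
R> all(sapply(list(fun2(z), fun1c(z), fun2c(z), funRcpp(z)), identical, fun1(z)))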
[1] TRUE
R>
All answers are seen as identical.
Finally, we can benchmark:
R> res <- benchmark(fun1(z), fun2(z),
+ fun1c(z), fun2c(z),
+ funRcpp(z),
+ columns=c("test", "replications", "elapsed",
+ "relative", "user.self", "sys.self"),
+ order="relative",
+ replications=1000)
R> print(res)
test replications elapsed relative user.self sys.self
5 funRcpp(z) 1000 0.005 1.0 0.01 0
4 fun2c(z) 1000 0.466 93.2 0.46 0
2 fun2(z) 1000 1.918 383.6 1.92 0
3 fun1c(z) 1000 10.865 2173.0 10.86 0
1 fun1(z) 1000 12.480 2496.0 12.47 0
The C++ version wins by a factor of almost 400 against the best plain R version, and almost 100 against its byte-compiled variant. For function 1, byte compilation matters much less, and both variants trail the C++ version by a factor of well over two thousand.
It took about one minute to write the C++ version. The speed gain suggests it was a minute well spent.
For comparison, here is the result for the original short vector called more often:
R> z <- c(1,1,0,0,0,0)
R> res2 <- benchmark(fun1(z), fun2(z),
+ fun1c(z), fun2c(z),
+ funRcpp(z),
+ columns=c("test", "replications",
+ "elapsed", "relative", "user.self", "sys.self"),
+ order="relative",
+ replications=10000)
R> print(res2)
test replications elapsed relative user.self sys.self
5 funRcpp(z) 10000 0.046 1.000000 0.04 0
4 fun2c(z) 10000 0.132 2.869565 0.13 0
2 fun2(z) 10000 0.271 5.891304 0.27 0
3 fun1c(z) 10000 1.045 22.717391 1.05 0
1 fun1(z) 10000 1.202 26.130435 1.20 0
The qualitative ranking is unchanged: the Rcpp version dominates and function 2 is second-best, with the byte-compiled version being about twice as fast as the plain R variant but still almost three times slower than the C++ version. The relative differences are lower, though: with the longer vectors, the function call overhead matters less and the actual looping matters more, and C++ gets a bigger advantage on the loop itself. That is an important result, as it suggests that for more realistically sized data the compiled version may reap a larger benefit.
Edited to correct two small oversights in the code examples. And edited again with thanks to Josh to catch a setup error relative to fun2c.
I think this is cheating and not generalizable, but: according to the rules you have above, any occurrence of 1 in the vector will make all subsequent elements 1 (by recursion: z[i] is set to 1 if z[i-1] equals 1; therefore z[i] will be set to 1 if z[i-2] equals 1; and so forth). Depending on what you really want to do, there may be such a recursive solution available if you think carefully about it ...
z <- c(1,1,0,0,0,0)
first1 <- min(which(z==1))
z[seq_along(z)>first1] <- 1
edit: this is wrong, but I'm leaving it up to admit my mistakes. Based on a little bit of playing (and less thinking), I think the actual solution to this recursion is more symmetric and even simpler:
rep(z[1],length(z))
Test cases:
z <- c(1,1,0,0,0,0)
z <- c(0,1,1,0,0,0)
z <- c(0,0,1,0,0,0)
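To check a closed form against the original loop on these test cases, here is a sketch with hypothetical helper names loop_version() and closed_form(). It uses the observation that the loop never changes z[1], and that every later element is 1 exactly when z[1] == 1; the last test case (with z[1] = 2) is an extra one added here, and length(z) >= 2 is assumed.

```r
# The loop from the question, wrapped as a function for comparison
loop_version <- function(z) {
  for (i in 2:length(z)) z[i] <- if (z[i - 1] == 1) 1 else 0
  z
}

# Closed form: keep z[1]; all later elements are 1 if z[1] == 1, else 0
closed_form <- function(z) c(z[1], rep(if (z[1] == 1) 1 else 0, length(z) - 1))

cases <- list(c(1,1,0,0,0,0), c(0,1,1,0,0,0), c(0,0,1,0,0,0), c(2,0,1,1,0,1))
sapply(cases, function(z) identical(loop_version(z), closed_form(z)))
```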
Check out the rollapply function in zoo.
I'm not super familiar with it, but I think this does what you want:
> c( 1, rollapply(z,2,function(x) x[1]) )
[1] 1 1 1 1 1 1
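I'm sort of kludging it by using a window of 2 and then only using the first element of that window.
For more complicated examples you could perform some calculation on x[1] and return that instead. Note, though, that rollapply() works on the original values of z, not on values updated earlier in the same pass, so it does not reproduce the recursion in general: for z <- c(1,1,0,0,0,0) this expression gives 1 1 1 0 0 0, so the output shown above matches the loop only if z already holds the loop's result.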
Sometimes you just need to think about it totally differently. What you're doing is creating a vector in which every item after the first is 1 if the first item is 1, and 0 otherwise (the first item itself is never changed).
z <- c(1,1,0,0,0,0)
z[-1] <- if (z[1] == 1) 1 else 0
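For the related recursion z[i] <- z[i] * z[i-1] there is a built-in function, cumprod (cumulative product). Note that this is not the same recursion as in the question: for the example z, the question's loop gives all 1s.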
> cumprod(z[zi])
[1] 1 0 0 0 0
> cumprod(c(1,2,3,4,0,5))
[1] 1 2 6 24 0 0
Otherwise, speed up the loop with Rcpp, as other answers have shown.
It's also possible to do this with "apply" using the original vector and a lagged version of the vector as the constituent columns of a data frame.
