I wondered if there is a simpler (and possibly more efficient) way to write the following loop:
l = leg(u[1],k)
for (i in 2:length(u)){l=rbind(l,leg(u[i],k))}
where leg is a user-defined function that returns a vector of length k. Basically, I want to return a length(u) x k matrix.
The problem is that the computation is very slow when length(u) is large, so maybe smarter functions (like apply or its variants) would make it faster.
Contrary to popular belief, sapply is not necessarily faster than a for loop.
Let's test a few different methods with microbenchmark:
library(microbenchmark)
u<-1:10
k<-3
Here I invent a function for the sake of testing:
leg<-function(u,k){u**(1:k)}
Let's test different methods:
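method1<-function(u,k){
l = leg(u[1],k)
for (i in 2:length(u)){l=rbind(l,leg(u[i],k))}  # grows l by one row per iteration, copying it each time
l  # return the matrix explicitly; a trailing for loop would return NULL
}
method2<-function(u,k){
l<-matrix(nrow = length(u),ncol = k)  # preallocate the result
for (i in 1:length(u)){l[i,]<-leg(u[i],k)}
l
}
method3<-function(u,k){
do.call(rbind,lapply(1:length(u),function(i)leg(u[i],k)))
}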
Now:
microbenchmark(times = 100, method1(u,k),method2(u,k),method3(u,k))
expr min lq mean median uq max neval
method1(u, k) 30.031 32.6920 36.88837 34.2125 41.4350 53.219 100
method2(u, k) 21.668 25.8490 29.60131 27.1800 32.6915 70.705 100
method3(u, k) 21.667 26.2295 29.42637 27.3700 33.0715 51.699 100
If we vectorize our function:
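leg2<-function(u,k){
result<-matrix(nrow = length(u),ncol = k)
for(i in 1:k){result[,i]<-u**i}  # one vectorized operation per column
result  # return the matrix; the for loop alone would return NULL
}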
microbenchmark(times = 100, method1(u,k),method2(u,k),method3(u,k),leg2(u,k))
Unit: microseconds
expr min lq mean median uq max neval
method1(u, k) 28.891 31.1710 34.79391 32.692 37.8235 64.243 100
method2(u, k) 20.527 24.7085 29.20205 26.229 31.3610 79.068 100
method3(u, k) 22.428 24.7090 28.49507 26.610 31.1710 71.465 100
leg2(u, k) 6.462 7.6030 9.03213 8.363 8.7430 19.768 100
So the vectorized version is about 3 times faster!
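Two further options that weren't benchmarked above, sketched with the same toy leg(): vapply() can preallocate its result because FUN.VALUE declares the length of each call's output, and outer() builds the whole length(u) x k matrix in a single vectorized call. The outer() version only applies when leg() is itself vectorized the way this toy one is; method4 and method5 are just illustrative names.

```r
leg <- function(u, k) u^(1:k)

# vapply() fills a k x length(u) matrix, checking that every call returns
# numeric(k); t() turns it into the length(u) x k layout the question wants.
method4 <- function(u, k) t(vapply(u, leg, FUN.VALUE = numeric(k), k = k))

# outer() evaluates u[i]^j for every (i, j) pair in a single call.
# This relies on the toy leg() being u^(1:k); a general leg() would not fit.
method5 <- function(u, k) outer(u, 1:k, "^")

u <- 1:10
k <- 3
m4 <- method4(u, k)
m5 <- method5(u, k)
```

Either can be added to the microbenchmark() call above; I haven't timed them here.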
Related
Is there any generic way to make the following R code faster? In Python, for example, a dict comprehension (see the equivalent below) would be a nice, faster alternative.
R:
l1 <- 1:3
l2 <- c("a", "b", "c")
foo <- function(x) {return(5*x)}
bar <- list()
for (i in 1:length(l1)) bar[l2[i]] <- foo(l1[i])
Python:
l1 = range(1, 4)
l2 = ["a", "b", "c"]
def foo(x):
    return 5*x
{b: foo(a) for a, b in zip(l1, l2)}
We're talking about speed, so let's do some benchmarking:
library(microbenchmark)
microbenchmark(op = {for (i in 1:length(l1)) bar[l2[i]] <- foo(l1[i])},
lapply = setNames(lapply(l1,foo),l2),
vectorised = setNames(as.list(foo(l1)), l2))
Unit: microseconds
expr min lq mean median uq max neval
op 7.982 9.122 10.81052 9.693 10.548 36.206 100
lapply 5.987 6.557 7.73159 6.842 7.270 55.877 100
vectorised 4.561 5.132 6.72526 5.417 5.987 80.964 100
But these small values don't mean much, so I increased the vector length to 10,000, where you'll really see a difference:
l <- 10000
l1 <- seq_len(l)
l2 <- sample(letters, l, replace = TRUE)
microbenchmark(op = {bar <- list(); for (i in 1:length(l1)) bar[l2[i]] <- foo(l1[i])},
lapply = setNames(lapply(l1,foo),l2),
vectorised = setNames(as.list(foo(l1)), l2),
times = 100)
Unit: microseconds
expr min lq mean median uq max neval
op 30122.865 33325.788 34914.8339 34769.8825 36721.428 41515.405 100
lapply 13526.397 14446.078 15217.5309 14829.2320 15351.933 19241.767 100
vectorised 199.559 259.997 349.0544 296.9155 368.614 3189.523 100
But adding to what everyone else said: it doesn't have to be a list. If you drop the list requirement:
microbenchmark(setNames(foo(l1), l2))
Unit: microseconds
expr min lq mean median uq max neval
setNames(foo(l1), l2) 22.522 23.8045 58.06888 25.0875 48.322 1427.417 100
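One caveat about the 10,000-element benchmark: l2 is drawn with sample(letters, l, replace = TRUE), so names repeat. The loop (like a Python dict comprehension) overwrites repeated keys and ends up with at most 26 entries, whereas the setNames() versions keep all 10,000 entries with duplicated names, so they don't produce the same object. Here is a sketch that reproduces the dict behaviour with vectorized calls; dict_like() is a hypothetical helper, not an existing function.

```r
foo <- function(x) 5 * x

# Named list with Python-dict semantics for repeated keys: each key appears
# once, in order of first appearance, holding the value from its LAST occurrence.
dict_like <- function(values, keys) {
  ukeys <- unique(keys)                                # first-appearance order
  last  <- length(keys) + 1L - match(ukeys, rev(keys)) # index of each key's last occurrence
  setNames(as.list(values[last]), ukeys)
}

l1 <- 1:6
l2 <- c("a", "b", "a", "c", "b", "a")

# The original loop, for comparison
loop <- list()
for (i in seq_along(l1)) loop[l2[i]] <- foo(l1[i])

vec <- dict_like(foo(l1), l2)
```

I haven't benchmarked dict_like() against the versions above.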
Let's say I have a vector x that:
is very large (> 200,000 elements)
is integer
is sorted
has only unique values
I would like to check whether an integer value y is in this vector and, if it is, get its index. I would like to take advantage of the fact that the vector is sorted, so that this can be done fast.
How would I accomplish such a thing?
Here's some data:
set.seed(123)
x = sort(unique(floor(runif(1e6, 1, 1e7))))
y = sample(1e7, 10000)
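And a couple of approaches:
f0 = function(y, vec) y %in% vec
# pmax() maps findInterval()'s 0 (y < vec[1]) to 1 so the result stays aligned with y; vec[1] == y is then FALSE
f1 = function(y, vec) vec[pmax(findInterval(y, vec), 1L)] == y
%in% does a full scan; findInterval() does a binary search (I think). They generate the same result: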
> identical(f0(y, x), f1(y, x))
[1] TRUE
And they have approximately similar amortized performance:
> library(microbenchmark)
> microbenchmark(f0(y, x), f1(y, x), times=10)
Unit: milliseconds
expr min lq mean median uq max neval
f0(y, x) 99.35425 100.87319 102.32160 102.20107 103.67718 105.70854 10
f1(y, x) 94.83219 95.05068 95.93625 95.77822 96.72601 97.50961 10
But I think findInterval() is faster for small queries:
> microbenchmark(f0(y[1:10], x), f1(y[1:10], x), times=10)
Unit: milliseconds
expr min lq mean median uq max neval
f0(y[1:10], x) 83.441578 85.116818 86.264751 86.07515 87.13516 89.430801 10
f1(y[1:10], x) 7.731606 7.734207 7.757201 7.75199 7.77210 7.810957 10
David suggests (I think):
f2 = function(x, vec) vec[which.max(x == vec)] == x
which.max() only works for a scalar y, which is seldom a good use of R (I'm saying this for the benefit of the OP). It also appears to be slower than findInterval():
> microbenchmark(f1(x[1000], x), f2(x[1000], x), times=10)
Unit: milliseconds
expr min lq mean median uq max neval
f1(x[1000], x) 7.707420 7.709047 7.714576 7.711979 7.718953 7.729688 10
f2(x[1000], x) 9.353225 9.358874 9.381781 9.378680 9.400808 9.426102 10
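Since the question also asks for the index, here is a sketch that returns it (NA when y is absent), again relying on findInterval() for the binary search. index_in_sorted() is a made-up name; it assumes vec is sorted and free of duplicates, as in the question.

```r
# Index of each y in the sorted, duplicate-free vector vec, or NA if absent.
index_in_sorted <- function(y, vec) {
  i <- findInterval(y, vec)         # last position with vec[i] <= y; 0 when y < vec[1]
  hit <- vec[pmax(i, 1L)] == y      # pmax() avoids vec[0], which would drop an element
  i[!hit] <- NA_integer_
  i
}

x <- c(2L, 5L, 9L, 14L)
idx <- index_in_sorted(c(1L, 5L, 10L, 14L), x)
```

For unsorted data, match(y, vec) gives the same answer without exploiting the ordering.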
Contrary to @Laterow, I don't see any particular performance difference between which() and which.max() (in current R-devel or the R-3-2-branch; also, the results aren't the same, so it's an apples-to-oranges comparison). I have a vague recollection of an R-devel conversation about this in the last 6 months...
> set.seed(123) ; x <- sample(2e5, replace = TRUE)
> microbenchmark(which.max(x == 1e7), which(x == 1e7)[1])
Unit: milliseconds
expr min lq mean median uq max neval
which.max(x == 1e+07) 4.240606 4.266470 5.975966 5.015947 5.217903 43.78467 100
which(x == 1e+07)[1] 4.060040 4.132667 5.550078 4.986287 5.059128 43.88074 100
Performance of which versus which.max might have changed with this commit; previously, which.max() would coerce a logical vector to numeric before the scan, triggering a copy.
I was looking at the benchmarks in this answer, and wanted to compare them with diag (used in a different answer). Unfortunately, it seems that diag takes ages:
library(microbenchmark)
nc <- 1e4
set.seed(1)
m <- matrix(sample(letters,nc^2,replace=TRUE), ncol = nc)
microbenchmark(
diag = diag(m),
cond = m[row(m)==col(m)],
vec = m[(1:nc-1L)*nc+1:nc],
mat = m[cbind(1:nc,1:nc)],
times=10)
Comments: I checked with identical() that these all return the same result. I took "cond" from one of the answers to this homework question. Results are similar with a matrix of integers (1:26 instead of letters).
Results:
Unit: microseconds
expr min lq mean median uq max neval
diag 604343.469 629819.260 710371.3320 706842.3890 793144.019 837115.504 10
cond 3862039.512 3985784.025 4175724.0390 4186317.5260 4312493.742 4617117.706 10
vec 317.088 329.017 432.9099 350.1005 629.460 651.376 10
mat 272.147 292.953 441.7045 345.9400 637.506 706.860 10
It is just a matrix-subsetting operation, so I don't know why there's so much overhead. Looking inside the function, I see a few checks and then c(m)[v], where v is the same vector used in the "vec" benchmark. Timing these two...
v <- (1:nc-1L)*nc+1:nc
microbenchmark(diaglike=c(m)[v],vec=m[v])
# Unit: microseconds
# expr min lq mean median uq max neval
# diaglike 579224.436 664853.7450 720372.8105 712649.706 767281.5070 931976.707 100
# vec 334.843 339.8365 568.7808 646.799 663.5825 1445.067 100
...it seems I have found my culprit. So, the new variation on my question is: Why is there a seemingly unnecessary and very time-consuming c in diag?
Summary
As of R version 3.2.1 ("World-Famous Astronaut"), diag() has been updated. The discussion moved to r-devel, where it was noted that c() strips non-name attributes, which may have been why it was placed there. While some people worried that removing c() would cause unknown issues on matrix-like objects, Peter Dalgaard found that, "The only case where the c() inside diag() has an effect is where M[i,j] != M[(i-1)*m+j] AND c(M) will stringize M in column-major order, so that M[i,j] == c(M)[(i-1)*m+j]."
Luke Tierney tested @Frank's removal of c(), found that it did not affect anything on CRAN or Bioconductor, and so the change replacing c(x)[...] with x[...] on line 27 was implemented. This leads to relatively large speedups in diag(). Below is a speed test showing the improvement with R 3.2.1's version of diag().
library(microbenchmark)
nc <- 1e4
set.seed(1)
m <- matrix(sample(letters,nc^2,replace=TRUE), ncol = nc)
microbenchmark(diagOld(m),diag(m))
Unit: microseconds
expr min lq mean median uq max neval
diagOld(m) 451189.242 526622.2775 545116.5668 531905.5635 540008.704 682223.733 100
diag(m) 222.563 646.8675 644.7444 714.4575 740.701 1015.459 100
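diagOld() isn't defined in the snippet above. For a self-contained comparison, here is a stripped-down stand-in. It is an assumption that keeps only the final indexing step, not the old diag()'s argument checks. The diagonal of an n-row matrix sits at column-major positions 1 + (i - 1)(n + 1), the same positions as v above; the only difference between the two versions is whether c() first allocates a copy of the whole matrix.

```r
# Column-major positions of x[1, 1], x[2, 2], ... for any rectangular matrix
diag_index <- function(x) {
  n <- min(dim(x))
  1L + (seq_len(n) - 1L) * (nrow(x) + 1L)
}

# Hypothetical simplified stand-ins, not the real diag() source:
diag_copy   <- function(x) c(x)[diag_index(x)]  # c() copies all of x before indexing
diag_direct <- function(x) x[diag_index(x)]     # reads only the diagonal cells

set.seed(1)
sq   <- matrix(sample(letters, 25, replace = TRUE), ncol = 5)
wide <- matrix(1:15, nrow = 3)
```

Timing diag_copy() against diag_direct() on the 1e4 x 1e4 matrix m should show the same kind of gap as the table above; I haven't rerun it.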
Is there an implementation of colMeans in R that includes an alpha-trimmed mean parameter?
If not, how could I make one?
The original question has been answered in the comments below.
apply(x, 2, mean, trim=.05) is not as optimized as colMeans; what is an implementation of equal efficiency?
Here are some examples of different ways to calculate trimmed colMeans, with a comparison of their performance.
m <- matrix(runif(1000000), ncol=1000)
trim <- 0.1
Using apply:
out1 <- apply(m, 2, mean, trim=trim)
Using sapply:
out2 <- sapply(seq_len(ncol(m)), function(i) mean(m[, i], trim=trim))
Using Rcpp:
library(inline)
library(RcppArmadillo)
f <- 'using namespace arma;
mat x = sort(as<mat>(x_));            // sort each column ascending
double trim = as<double>(trim_);
// same cut-offs as mean.default(trim = ): drop floor(n * trim) values at each end
int low = static_cast<int>(std::floor(trim * x.n_rows));
int high = static_cast<int>(x.n_rows) - 1 - low;
return(wrap(mean(x.rows(low, high))));'
trim.colMeans <- cxxfunction(signature(x_='matrix', trim_='numeric'),
f, plugin="RcppArmadillo")
out3 <- trim.colMeans(m, trim)
Comparison
identical(out1, out2)
[1] TRUE
identical(out1, c(out3))
[1] TRUE
library(microbenchmark)
microbenchmark(apply=apply(m, 2, mean, trim=trim),
sapply=sapply(seq_len(ncol(m)), function(i) mean(m[, i], trim=trim)),
Rcpp=trim.colMeans(m, trim),
colMeans=colMeans(m))
Unit: microseconds
expr min lq median uq max neval
apply 68907.162 100439.4775 102555.396 109044.4025 136034.067 100
sapply 64675.928 66383.6010 66937.615 68152.1115 98680.906 100
Rcpp 43614.629 44297.6980 44761.360 45164.4850 46883.602 100
colMeans 782.458 805.7995 828.538 988.4625 1452.877 100
I'm sure my Rcpp implementation is sub-optimal, so feel free to chime in with improvements. As you can see, none of these methods is as efficient as an untrimmed calculation of colMeans, yet I suspect equivalent efficiency is impossible, since additional calculations must be made, including sorting and subsetting of the matrix. This penalty for trimming data is evident when benchmarking the mean of a vector vs. the trimmed counterpart:
v <- runif(1000)
microbenchmark(mean(v), mean(v, trim=0.1))
Unit: microseconds
expr min lq median uq max neval
mean(v) 5.722 6.325 6.927 7.229 124.989 100
mean(v, trim = 0.1) 42.165 43.671 44.574 44.876 84.630 100
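For comparison, here is a base-R sketch that avoids an R-level loop over columns (trim_colMeans() is a hypothetical name, and I haven't benchmarked it against the versions above). It sorts every column in one order() call by ordering on (column, value), then applies the same cut-offs mean.default() uses, lo = floor(n * trim) + 1 and hi = n + 1 - lo, before a single colMeans() call. Results should agree with apply(m, 2, mean, trim = trim) up to floating-point summation order.

```r
trim_colMeans <- function(m, trim = 0) {
  n <- nrow(m)
  # mean() switches to the median for trim >= 0.5 and returns NA with NAs;
  # this sketch simply does not handle those cases.
  stopifnot(trim >= 0, trim < 0.5, !anyNA(m))
  ms <- matrix(m[order(col(m), m)], nrow = n)  # every column now sorted ascending
  lo <- floor(n * trim) + 1
  hi <- n + 1 - lo
  colMeans(ms[lo:hi, , drop = FALSE])          # average the kept middle rows
}

set.seed(42)
m <- matrix(runif(2000), ncol = 20)
out_sketch <- trim_colMeans(m, 0.1)
out_apply  <- apply(m, 2, mean, trim = 0.1)
```

The order() call is O(N log N) over the whole matrix, so whether this beats the Rcpp version is an open question.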
I am trying to do Conway–Maxwell–Poisson regression using COMPoissonReg in R.
However, it is extremely slow for large datasets, so I profiled it and checked the source code.
The majority of the time (>95%) is spent in the function COMPoissonReg:::computez, which is equivalent to:
test <- function (lambda, nu, max=100)
{
forans <- matrix(0, ncol = max + 1, nrow = length(lambda))
for (j in 1:max) {
temp <- matrix(0, ncol = j, nrow = length(lambda))
for (i in 1:j) {
temp[, i] <- lambda/(i^nu)
}
for (k in 1:length(lambda)) {
forans[k, j + 1] <- prod(temp[k, ])
}
}
forans[, 1] <- rep(1, length(lambda))
ans <- rowSums(forans)
return(ans)
}
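Reading the loops, test() evaluates, for each element of lambda, a truncated COM-Poisson normalizing constant. Column j + 1 of forans holds the s = j term, built as a product of j factors, and column 1 holds the s = 0 term:

```latex
\prod_{i=1}^{j} \frac{\lambda}{i^{\nu}} = \frac{\lambda^{j}}{(j!)^{\nu}},
\qquad
Z(\lambda, \nu) = \sum_{s=0}^{\infty} \frac{\lambda^{s}}{(s!)^{\nu}}
  \approx \sum_{s=0}^{\texttt{max}} \frac{\lambda^{s}}{(s!)^{\nu}}
```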
Here v is nu, lambda is a vector, and max is the upper limit of s (set to 100 as an approximation to infinity).
The question doesn't really need special background stats knowledge, but the link or link2 is here just in case.
Here is a simple script to test performance. It takes 8 seconds, or 4 seconds if I lazily byte-compile it with cmpfun. I believe it can be improved further without rewriting it in C; I am aiming for around 0.05 seconds so that I don't have to refactor the code in the package, which calls this function iteratively:
lambda <- rnorm(10000, 1.5, 0.3)
Rprof(tmp <- tempfile())
sum(log(test(lambda, 1.2)))
Rprof()
summaryRprof(tmp)
Update
I realized there is another issue: the limits of floating-point arithmetic. Evaluating a power series directly is dangerous because it can overflow very quickly, especially if we vectorize. For example, lambda^100 certainly overflows to Inf if lambda > 10000 (10000^100 = 1e400, far beyond the largest double, about 1.8e308). In another language I might use reduce, but I fear Reduce() is slow in R.
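A minimal sketch of an overflow-safer approach in plain R, assuming the sum is the Z(lambda, nu) series computed by test(): each term is obtained from the previous one as term_s = term_(s-1) * lambda / s^nu, so lambda^100 and (100!)^nu are never formed separately. The loop runs over the max values of s rather than over the elements of lambda, so each iteration is a single vectorized operation and Reduce() isn't needed. z_incremental() and z_direct() are hypothetical names, and I haven't timed them.

```r
z_incremental <- function(lambda, nu, max = 100) {
  term  <- rep(1, length(lambda))   # s = 0 term for every lambda
  total <- term
  for (s in seq_len(max)) {
    term  <- term * lambda / s^nu   # lambda^s / (s!)^nu, built from the previous term
    total <- total + term
  }
  total
}

# Direct evaluation of the same truncated series, one lambda at a time
z_direct <- function(lambda, nu, max = 100) {
  vapply(lambda, function(l) sum(l^(0:max) / factorial(0:max)^nu), numeric(1))
}

lam <- c(0.5, 1.5, 3)
small_ok  <- all.equal(z_incremental(lam, 1.2), z_direct(lam, 1.2))
big_inc   <- z_incremental(2000, 1.2)   # largest term is on the order of 1e140: finite
big_naive <- z_direct(2000, 1.2)        # 2000^100 overflows, so the sum is Inf
```

Individual terms can still overflow if they are genuinely larger than about 1e308; this only removes the overflow of the separate numerator.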
You can make it much faster than the function you are using by avoiding loops. For example:
test2<-function(lambda,nu,max=100){
len<-length(lambda)
mm<-matrix(rep(lambda,each=max+1),max+1,len)
mm<-mm^(0:max)
mm<-mm/factorial(0:max)^nu
colSums(mm)
}
This runs about 50 times faster with lambda of length 100:
> require(microbenchmark)
> lam<-rnorm(100)
> max(abs(test(lam,1.2)-test2(lam,1.2)))
[1] 4.510281e-16
> microbenchmark(test(lam,1.2),test2(lam,1.2),times=10)
Unit: milliseconds
expr min lq median uq max neval
test(lam, 1.2) 77.124705 77.422619 78.241945 79.635746 81.260280 10
test2(lam, 1.2) 1.335716 1.373116 1.401411 1.507765 1.562447 10
You can probably optimize it a little more, but this should get most of the gains, unless there is some kind of builtin function you can exploit rather than doing the sum explicitly.
On input of length 10000, it takes 0.148 seconds on my machine, versus 6.850 seconds for test:
> lam<-rnorm(10000)
> max(abs(test(lam,1.2)-test2(lam,1.2)))
[1] 3.552714e-15
> system.time(test2(lam,1.2))
user system elapsed
0.132 0.016 0.148
> system.time(test(lam,1.2))
user system elapsed
6.780 0.056 6.850
OK, here's an Rcpp answer. As expected, it's a lot faster than either of the others.
require(Rcpp)
rcppfun<-"
Rcpp::NumericVector myfun(Rcpp::NumericVector lambda,
Rcpp::NumericVector weights)
{
int num = lambda.size();
int max = weights.size();
std::vector<double> r(num);
for(int i=0; i<num; i++){
double total = 0;
double prod = 1;
for(int j=0; j<max; j++){
total += prod/weights[j];
prod *= lambda[i];
}
r[i]=total;
}
return Rcpp::wrap(r);
}
"
testRcpp<-cppFunction(rcppfun)
test5<-function(lambda,nu,max=100){
wts<-factorial(0:max)^nu
testRcpp(lambda,wts)
}
This is about 40x faster than my original test2 and about 12x faster than @NealFultz's improved logarithmic implementation.
> lam<-abs(rnorm(10000))
> max(abs(test5(lam,1.2)-test2(lam,1.2)))
[1] 7.105427e-15
> microbenchmark(test2(lam,1.2),test3(lam,1.2),test4(lam,1.2),test5(lam,1.2))
Unit: milliseconds
expr min lq median uq max neval
test2(lam, 1.2) 125.601616 126.790516 127.700099 135.182263 222.340179 100
test3(lam, 1.2) 125.523424 126.666410 126.921035 131.316254 178.633839 100
test4(lam, 1.2) 41.734015 42.640340 43.190553 50.932952 97.765219 100
test5(lam, 1.2) 3.432029 3.501046 3.519007 3.532603 3.754232 100
Edit: here's one more Rcpp version that should at least partially address the overflow issue by computing each term incrementally, rather than computing the numerator and denominator separately.
rcppfun2<-"
Rcpp::NumericVector myfun2(Rcpp::NumericVector lambda, Rcpp::NumericVector denom){
int num = lambda.size();
int max = denom.size();   // denom[j] = (j+1)^nu, supplied from R
std::vector<double> r(num);
for(int i=0; i<num; i++){
double term = 1;
double total = 1;
for(int j=0; j< max; j++){
term *= (lambda[i]/denom[j]);   // term is now lambda^(j+1) / ((j+1)!)^nu
total += term;
}
r[i]=total;
}
return Rcpp::wrap(r);
}
"
testRcpp2<-cppFunction(rcppfun2)
test6<-function(lambda,nu,max=100){
testRcpp2(lambda,(1:max)^nu)
}
> lam<-abs(rnorm(10000))
> max(abs(test2(lam,1.2)-test6(lam,1.2)))
[1] 1.065814e-14
> microbenchmark(test5(lam,1.2),test6(lam,1.2))
Unit: milliseconds
expr min lq median uq max neval
test5(lam, 1.2) 3.416786 3.426013 3.435492 3.444196 3.604486 100
test6(lam, 1.2) 3.554147 3.572285 3.580865 3.588030 3.840713 100
I slept on it and came up with another big improvement, if you can use the gsl package. All you're doing is evaluating a polynomial:
require(gsl)
test5 <- function(lambda, nu, max=100){
gsl_poly(factorial(0:max)^-nu, lambda)
}
R>microbenchmark(test2(1:50,5.1), test4(1:50,5.1), test5(1:50,5.1))
Unit: microseconds
expr min lq median uq max neval
test2(1:50, 5.1) 4518.957 4838.5185 5318.5040 5617.6330 19978.039 100
test4(1:50, 5.1) 2043.422 2268.3490 2472.0430 2727.1045 10328.376 100
test5(1:50, 5.1) 311.144 407.2465 476.0755 540.6095 1138.766 100
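If gsl isn't available, the same polynomial can be evaluated with Horner's rule in base R. This is a sketch under the assumption that vectorizing over lambda is what matters: the loop runs max times, each step is one vectorized multiply-add over all of lambda, and lambda^s is never formed explicitly. horner_z() is a hypothetical name, and I haven't benchmarked it against gsl_poly().

```r
horner_z <- function(lambda, nu, max = 100) {
  coef <- factorial(0:max)^(-nu)             # coef[s + 1] multiplies lambda^s
  acc  <- rep(coef[max + 1], length(lambda)) # start from the highest-order coefficient
  for (s in max:1) {
    acc <- acc * lambda + coef[s]            # Horner step: fold in the next lower coefficient
  }
  acc
}

lam <- c(1, 2.5, 10, 50)
h <- horner_z(lam, 5.1)
# Reference: the same truncated series summed term by term
d <- sapply(lam, function(l) sum(l^(0:100) * factorial(0:100)^(-5.1)))
```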
Following up on @mrips, working on the log scale can sometimes be a bit faster because you can multiply instead of exponentiating:
test4 <- function(lambda,nu,max=100){
mm<-matrix(log(lambda),max,length(lambda), byrow=TRUE)
mm<-mm * 1:max - nu*lfactorial(1:max)
1 + colSums(exp(mm))
}
I've also factored out the special case where s = 0. Here are my timings:
R>microbenchmark(test2(1:50,5), test4(1:50,5))
Unit: microseconds
expr min lq median uq max neval
test2(1:50, 5) 952.360 1432.6600 1436.4525 1440.1860 3467.981 100
test4(1:50, 5) 695.189 1041.4785 1042.8315 1045.6525 2970.441 100