I'm wondering if it's possible to modify this loop to run faster. When I run it with n=2000000 it takes about 25 seconds. Any tricks available?
for(i in 1:n)
{
x[i] <- sum(runif(20))
}
Generate all the draws at once and let rowSums do the adding:
system.time(x <- rowSums(matrix(runif(2e6),ncol=20)))
# user system elapsed
# 0.108 0.620 0.748
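Note that matrix(runif(2e6), ncol=20) has 1e5 rows rather than the n=2e6 the question asks for; the general recipe is to draw n*k uniforms at once. A minimal sketch in those terms (n and k are just placeholder names):
n <- 2e6   # number of sums requested in the question
k <- 20    # uniforms per sum
x <- rowSums(matrix(runif(n * k), ncol = k))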
Using apply can get you some speed increases.
# How many rows?
n <- 1000
# How many samples from runif?
k <- 20
# Preallocate x
x <- double(n)
## Your loop
for(i in 1:n){
x[i] <- sum(runif(k))
}
## Using apply
## First create a matrix that has n rows and k columns
## then find the sum of the row.
x <- apply(matrix(runif(n*k), nrow=n), 1, sum)
Now test the speed:
library(rbenchmark)
benchmark(
  loop = {
    for(i in 1:n){
      x[i] <- sum(runif(k))
    }
  },
  apply = {
    x <- apply(matrix(runif(n*k), nrow=n), 1, sum)
  }
)
# Result of benchmark
#
# test replications elapsed relative user.self sys.self user.child sys.child
#2 apply 100 1.08 1.000000 1.06 0.00 NA NA
#1 loop 100 1.69 1.564815 1.63 0.02 NA NA
The loop takes longer than apply.
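As an aside (a small sketch, not part of the original benchmark): apply(m, 1, sum) still loops over the rows at the R level, so the dedicated rowSums() shown in the answer above is usually faster still:
x <- rowSums(matrix(runif(n*k), nrow=n))   # same result as the apply() version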
I would prefer the following solution:
x <- rep(sum(runif(20)), 2e6)
EDIT: Sorry, I realize that this gives you the same number 2e6 times.
Related
I'm writing a function that needs to apply a function g, passed as a parameter, to each element of a list, iteratively.
I'm wondering how to make this as fast as possible. I can achieve acceptable speed using Rcpp and a specific kind of g (writing everything in C++), but I can't figure out whether I can reach similar speed when passing an R function as an argument.
I was doing some tests to figure out why R is slower and found some really unexpected results:
minus <- function(x) -x
minus_vec <- Vectorize(minus, "x")
Testing with some simple functions to invert signs.
f0 <- function(x) {
sapply(x, minus)
}
f1 <- function(x) {
for(i in seq_along(x)){
x[i] <- -x[i]
}
x
}
f2 <- function(x) {
for(i in seq_along(x)){
x[i] <- minus(x[i])
}
x
}
I got the following results:
a <- 1:10^5
library(rbenchmark)
benchmark(f0(a), f1(a), f2(a), minus_vec(a), minus(a))[,c(1,4)]
test relative
1 f0(a) 454.842
2 f1(a) 25.579
3 f2(a) 178.211
4 minus_vec(a) 523.789
5 minus(a) 1.000
I would like some explanation on the following points:
Why don't f1 and f2 have the same speed? Should writing -x[i] inline and calling the function minus(x[i]) really be so different when they do exactly the same thing?
Why is f0 slower than f2? I always thought apply functions were more efficient than for loops, but never really understood why, and now I have even found a counterexample.
Can I make a function as fast as f1 using the function minus?
Why does vectorizing minus (unnecessary since - is already vectorized, but that might not always be the case) make it so much slower?
Not a full answer, but here are a few notes
1 minus(x) vs -x: Doing nothing is better than doing something
Your function minus calls `-`, so the extra function call adds computation time. I honestly can't say exactly how much extra time should be expected, only that each added layer of calls costs something.
Here is an example highlighting it: we have four functions, all squaring numbers
fa <- function (n) n^2
fb <- function (n) fa(n)
fc <- function (n) fb(n)
fd <- function (n) fc(n)
Fa <- function (n) {
for (i in seq_along(n)) n[i] <- fa(i)
n
}
Fb <- function (n) {
for (i in seq_along(n)) n[i] <- fb(i)
n
}
Fc <- function (n) {
for (i in seq_along(n)) n[i] <- fc(i)
n
}
Fd <- function (n) {
for (i in seq_along(n)) n[i] <- fd(i)
n
}
And here are the benchmarking results
n <- 1:10^4
b <- benchmark(Fa(n),Fb(n),Fc(n),Fd(n), replications = 1000L)
b
# test replications elapsed relative user.self sys.self user.child sys.child
# 1 Fa(n) 1000 3.93 1.000 3.85 0.00 NA NA
# 2 Fb(n) 1000 7.08 1.802 6.94 0.02 NA NA
# 3 Fc(n) 1000 10.16 2.585 9.94 0.06 NA NA
# 4 Fd(n) 1000 13.68 3.481 13.56 0.00 NA NA
# looks rather even
diff(b$elapsed)
# [1] 3.15 3.08 3.52
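Putting rough numbers on that (a back-of-the-envelope figure from the diff above, not a precise measurement): each extra layer of R function calls adds about 3.1-3.5 seconds per 1000 replications of 1e4 calls, i.e. a few tenths of a microsecond per call.
3.25 / (1000 * 1e4) * 1e6   # roughly 0.3 microseconds of overhead per extra call layer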
Now back to your minus function
a <- 1:10^5
b <- benchmark(f0(a), f1(a), f2(a), minus_vec(a), minus(a))
b$elapsed[b$test == 'f2(a)'] - b$elapsed[b$test == 'f1(a)']
# [1] 3.39
2 apply vs for vs Vectorize:
@NavyCheng provided some good material on the topic. My understanding is that the apply family (just like Vectorize) loops in R (whereas, if I'm not mistaken, the looping for `-` is done in C).
Again, I do not know the exact details, but if apply/Vectorize use R-level loops then, in theory (and often in practice), it is possible to write a plain for loop that performs as well or better.
3 A Function as fast as f1:
Ad hoc, the closest I came up with was cheating with the Rcpp package (cheating since one writes the function in C++ first).
In C++
#include <RcppArmadillo.h>
//[[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector minusCpp(NumericVector x) {
for (int k = 0; k < x.length(); ++k) {
x[k] = -x[k];
}
return x;
}
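Assuming the C++ snippet above is saved as minus.cpp (a hypothetical file name), Rcpp::sourceCpp compiles it and exports minusCpp() into the R session before the benchmark is run:
library(Rcpp)
sourceCpp("minus.cpp")   # compiles minus.cpp and makes minusCpp() callable from R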
Now to the benchmarks in R
a <- 1:10^5
b <- benchmark(f0(a), f1(a), f2(a), minus_vec(a), minus(a), minusCpp(a))
b
# test replications elapsed relative user.self sys.self user.child sys.child
# 1 f0(a) 100 9.47 NA 9.22 0.01 NA NA
# 2 f1(a) 100 0.53 NA 0.54 0.00 NA NA
# 3 f2(a) 100 4.23 NA 4.24 0.00 NA NA
# 5 minus(a) 100 0.00 NA 0.00 0.00 NA NA
# 4 minus_vec(a) 100 10.42 NA 10.39 0.02 NA NA
# 6 minusCpp(a) 100 0.05 NA 0.04 0.00 NA NA
Ignoring the -x[i] vs minus(x[i]) question, I'll summarize the four questions into two:
Why is the apply family slower than a for loop?
Why is Vectorize slower than the apply family?
For the 1st question:
The apply functions are designed to be convenient and clear to read,
not necessarily fast.
and the apply family does more work than a plain for loop.
Also the sapply function first uses as.vector(unlist(...)) to convert anything to a vector, and in the end tries to simplify the answer into a suitable form.
You can read here and here for more detail.
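A small illustration of that last point (a sketch, not part of the original answer): sapply is essentially lapply plus a simplification step, which is extra work the plain for loop never does.
a <- 1:10
all.equal(sapply(a, minus), simplify2array(lapply(a, minus)))   # should be TRUE; sapply = lapply + simplify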
For the 2nd question, it's because Vectorize is a wrapper around mapply; if you type Vectorize in RStudio, you'll see its source code. You can read this for more help.
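You can confirm that from code as well (a quick sketch; the exact printed source depends on your R version): inspecting the vectorized function shows that its body is essentially a do.call() on mapply wrapping the original minus.
minus_vec        # print the function created by Vectorize()
body(minus_vec)  # the body ends in a do.call("mapply", ...) on the original function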
I want to multiply several matrices of the same size with an initial vector. In the example below p.state is a vector of m elements and tran.mat is a list where each member is an m x m matrix.
for (i in 1:length(tran.mat)){
p.state <- p.state %*% tran.mat[[i]]
}
The code above gives the correct answer but can be slow when length(tran.mat) is large. I was wondering if there was a more efficient way of doing this?
Below is an example with m=3 and length(tran.mat)=10 that generates this:
p.state <- c(1,0,0)
tran.mat<-lapply(1:10,function(y){apply(matrix(runif(9),3,3),1,function(x){x/sum(x)})})
for (i in 1:length(tran.mat)){
p.state <- p.state %*% tran.mat[[i]]
}
print(p.state)
NB: tran.mat does not have to be a list; it is just currently written as one.
Edit after a few comments:
Reduce is useful when m is small. However, when m=6 the loop outperformed both of the other solutions, presumably because Reduce('%*%', tran.mat) multiplies full m x m matrices at every step, whereas the loop (and the reordered Reduce with the vector as the initial value) only ever multiplies a vector by a matrix.
library(rbenchmark)
p.state1 <- p.state <- c(1,0,0,0,0,0)
tran.mat<-lapply(1:10000,function(y){t(apply(matrix(runif(36),6,6),1,function(x){x/sum(x)}))})
tst<-do.call(c, list(list(p.state), tran.mat))
benchmark(
'loop' = {
for (i in 1:length(tran.mat)){
p.state <- p.state %*% tran.mat[[i]]
}
},
'reduce' = {
p.state1 %*% Reduce('%*%', tran.mat)
},
'reorder' = {
Reduce(`%*%`,tran.mat,p.state1)
}
)
This results in
test replications elapsed relative user.self sys.self user.child sys.child
1 loop 100 0.87 1.000 0.87 0 NA NA
2 reduce 100 1.41 1.621 1.39 0 NA NA
3 reorder 100 1.00 1.149 1.00 0 NA NA
A faster way is to use Reduce() to do sequential matrix multiplication on the list of matrices.
You can get approximately a 4x speedup that way. Below is an example of your code tested, with 1000 elements in the list instead of 10 to see the performance improvement more easily.
Code
library(rbenchmark)
p.state <- c(1,0,0)
tran.mat<-lapply(1:1000,function(y){apply(matrix(runif(9),3,3),1,function(x){x/sum(x)})})
benchmark(
'loop' = {
for (i in 1:length(tran.mat)){
p.state <- p.state %*% tran.mat[[i]]
}
},
'reduce' = {
p.state %*% Reduce('%*%', tran.mat)
}
)
Output
test replications elapsed relative user.self sys.self user.child sys.child
1 loop 100 0.23 3.833 0.23 0 NA NA
2 reduce 100 0.06 1.000 0.07 0 NA NA
You can see the reduce method is about 3.8 times faster.
I am not sure that this will be any faster but it is shorter:
prod <- Reduce("%*%", L)
all.equal(prod, L[[1]] %*% L[[2]] %*% L[[3]] %*% L[[4]])
## [1] TRUE
Note
We used this test input:
m <- matrix(1:9, 3)
L <- list(m^0, m, m^2, m^3)
I am going to use a function from the Rfast package to reduce the execution time of the multiplication. Unfortunately, the time spent in the for loop itself cannot be reduced.
The function Rfast::eachcol.apply is a good fit for your purpose. Your multiplication could also be written with crossprod, but that is slow for our purpose.
Here are some helper functions:
mult.list<-function(x,y){
for (xm in x){
y <- y %*% xm
}
y
}
mult.list2<-function(x,y){
for (xm in x){
y <- Rfast::eachcol.apply(xm,y,oper="*",apply="sum")
}
y
}
Here is an example:
x<-list()
y<-rnorm(1000)
for(i in 1:100){
x[[i]]<-Rfast::matrnorm(1000,1000)
}
microbenchmark::microbenchmark(R=a<-mult.list(x,y),Rfast=b<-mult.list2(x,y),times = 10)
Unit: milliseconds
expr min lq mean median uq max neval
R 410.067525 532.176979 633.3700627 649.155826 699.721086 916.542414 10
Rfast 239.987159 251.266488 352.1951486 276.382339 458.089342 741.340268 10
all.equal(as.numeric(a),as.numeric(b))
[1] TRUE
The argument oper is the operation applied to each element and apply is the operation applied to each column. For large matrices it should be fast. I couldn't test bigger matrices on my laptop.
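As a quick sanity check of that description (a minimal sketch, assuming Rfast is installed), eachcol.apply with oper = "*" and apply = "sum" should reproduce an ordinary vector-matrix product on a small example:
set.seed(1)
xm <- matrix(rnorm(9), 3, 3)
y  <- rnorm(3)
all.equal(as.numeric(y %*% xm),
          as.numeric(Rfast::eachcol.apply(xm, y, oper = "*", apply = "sum")))   # should be TRUE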
I want to extract elements from a list based on indices stored in a separate vector.
This is my attempt at it:
list_positions<-c(2,3,4)
my_list<-list(c(1,3,4),c(2,3,4,5,6),c(1,2,3,4,6))
my_fun<-function(x,y){
x[y]
}
mapply(my_fun,x=my_list,y=list_positions)
Maybe somebody can suggest a faster solution. My list has around 14 million elements. I tried parallel solutions, where instead of mapply I used clusterMap, but I would still like better performance.
We may unlist the list, create an index based on the lengths of 'my_list', and extract from the resulting vector:
v1 <- unlist(my_list)
p1 <- list_positions
v1[cumsum(lengths(my_list))- (lengths(my_list)-p1)]
#[1] 3 4 4
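To see why the indexing works (a small worked sketch on the example data): cumsum(lengths(my_list)) gives the position in the unlisted vector where each list element ends, and subtracting (lengths(my_list) - p1) steps back from that end to the requested position.
lengths(my_list)                    # 3 5 5
ends <- cumsum(lengths(my_list))    # 3 8 13 -> last index of each list element within v1
ends - (lengths(my_list) - p1)      # 2 6 12 -> positions of the 2nd, 3rd and 4th entries
v1[c(2, 6, 12)]                     # 3 4 4, matching mapply(my_fun, my_list, list_positions)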
Benchmarks
set.seed(24)
lst <- lapply(1:1e6, function(i) sample(1:10, sample(2:5), replace=FALSE))
p2 <- sapply(lst, function(x) sample(length(x), 1))
system.time({
r1 <- mapply(`[`, lst, p2)
})
#user system elapsed
# 1.84 0.02 1.86
system.time( r4 <- mapply(my_fun, lst, p2) )
# user system elapsed
# 1.88 0.01 1.89
system.time({ r4 <- mapply(my_fun, lst, p2) }) #placing inside the {}
# user system elapsed
# 2.31 0.00 2.31
system.time({ ##cccmir's function
r3 <- mapply(my_func1, lst, p2)
})
# user system elapsed
# 12.10 0.03 12.13
system.time({
v2 <- unlist(lst)
r2 <- v2[cumsum(lengths(lst))- (lengths(lst)-p2)]
})
# user system elapsed
# 0.14 0.00 0.14
identical(r1, r2)
#[1] TRUE
You should use a for loop in this case, for example:
library(microbenchmark)
list_positions<-c(2,3,4)
my_list<-list(c(1,3,4),c(2,3,4,5,6),c(1,2,3,4,6))
my_fun<-function(x,y){
x[y]
}
mapply(my_fun,x=my_list,y=list_positions)
my_func1 <- function(aList, positions){
res <- numeric(length(aList))
for(i in seq_along(aList)) {
res[i] <- aList[[i]][positions[i]]
}
return(res)
}
my_func2 <- function(aList, positions) {
  v1 <- unlist(aList)
  p1 <- positions
  v1[cumsum(lengths(aList)) - (lengths(aList) - p1)]
}
microbenchmark(mapply(my_fun,x=my_list,y=list_positions), my_func1(my_list, list_positions), my_func2(my_list, list_positions), times = 1000)
#Unit: microseconds
# expr min lq mean median uq max neval
#mapply(my_fun, x = my_list, y = list_positions) 12.764 13.858 17.453172 14.588 16.775 119.613 1000
# my_func1(my_list, list_positions) 5.106 5.835 7.328412 6.200 6.929 38.292 1000
# my_func2(my_list, list_positions) 2.553 3.282 4.337367 3.283 3.648 52.514 1000
# @akrun's solution is the fastest
Suppose I write the following R code:
first.value <- sample(100, 100, replace=TRUE)
second.value <- sample(10, 100, replace=TRUE)
X <- data.frame(first.value, second.value)
split.X <- split(X, second.value)
This code creates a data frame with two fields, and splits it into bins according to the second. Now suppose I wanted to normalize each bin, i.e., subtract the mean and divide by the standard deviation. I could accomplish this by
normalized.first.value <- sapply(split.X, function(X) {(X$first.value - mean(X$first.value)) / sd(X$first.value)})
But this creates a new list with the normalized versions of each bin. What I really want to do is replace the copy of the data in split.X with its normalized version.
To illustrate, here's some sample output:
> first.value <- sample(100, 100, replace=TRUE)
> second.value <- sample(10, 100, replace=TRUE)
> X <- data.frame(first.value, second.value)
> split.X <- split(X, second.value)
> normalized.first.value <- sapply(split.X, function(X) {(X$first.value - mean(X$first.value)) / sd(X$first.value)})
> split.X[[1]]
first.value second.value
4 34 1
8 40 1
24 21 1
31 34 1
37 23 1
40 22 1
> normalized.first.value[[1]]
[1] 0.625 1.375 -1.000 0.625 -0.750 -0.875
What I really want to do is to put the values of normalized.first.value[[1]] into split.X[[1]]$first.value, and the same for the other indices.
This could be achieved with a for loop as follows:
for (i in 1:length(split.X)) {
  split.X[[i]]$first.value <- (split.X[[i]]$first.value - mean(split.X[[i]]$first.value)) / sd(split.X[[i]]$first.value)
}
But for loops are BAD in R, and I'd like to use sapply, lapply, etc. if I can. Unfortunately, when dealing with a list of data frames, sapply and lapply don't seem to iterate in the way I want.
Here's a more arcane way (though I still reckon the for loop is fine in this case)
new.split.X <- mapply(`[<-`, split.X, T, 'first.value', normalized.first.value,
SIMPLIFY=F)
How it works: it applies [<- to each split.X[[i]]. The T is the i index to replace (i.e. all rows), 'first.value' is the j index to replace (that column), and normalized.first.value contains the replacements.
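A minimal sketch of what one of those calls does, on a hypothetical toy data frame (just to illustrate the argument order x, i, j, value):
d <- data.frame(first.value = c(1, 2), second.value = c(9, 9))
`[<-`(d, TRUE, 'first.value', c(10, 20))   # same as d[TRUE, 'first.value'] <- c(10, 20), returning the modified copy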
The loop may be easier to read in the end though, and probably not slower than tricksy *apply solutions.
library(rbenchmark)
benchmark(loop={
for (i in 1:length(split.X))
split.X[[i]]$first.value <- normalized.first.value[[i]]
},
mapply={
mapply(`[<-`, split.X, T, 'first.value', normalized.first.value,
SIMPLIFY=F)
},
Map={
Map(function(x,y) {x[['first.value']] <- y;x} ,split.X, normalized.first.value)
},
lapply={
lapply(seq_along(split.X), function(i) {
x1 <- split.X[[i]]
x1[,'first.value'] <- normalized.first.value[[i]]
x1})
})
test replications elapsed relative user.self sys.self user.child sys.child
4 lapply 100 0.034 4.857 0.035 0 0 0
1 loop 100 0.007 1.000 0.007 0 0 0
3 Map 100 0.012 1.714 0.013 0 0 0
2 mapply 100 0.030 4.286 0.032 0 0 0
So the explicit loop is the fastest, and easiest to read anyway.
You can use Map as both the lists have the same length. It works by replacing the first column in 'split.X' with the corresponding list element in 'normalized.first.value'
Map(function(x,y) {x[['first.value']] <- y;x} ,split.X, normalized.first.value)
Or we can loop through the length of 'split.X', get the list elements of the 'split.X' and 'normalized.first.value' based on the index and then replace.
lapply(seq_along(split.X), function(i) {
x1 <- split.X[[i]]
x1[,'first.value'] <- normalized.first.value[[i]]
x1})
I've been running through Project Euler trying to write programs that are computationally efficient. Consider problem 1: http://projecteuler.net/problem=1. I've upped the range from 1000 to 10,000,000 to highlight inefficiencies.
This is my solution:
system.time({
x <- 1:1E7
a <- sum(as.numeric(x[x%%3 ==0 | x%%5==0]))
})
user system elapsed
0.980 0.041 1.011
Here is some C++ code written by a friend to do the same thing.
#include <iostream>
using namespace std;
int main(int argc, char** argv)
{
long x = 0;
for (int i = 1; i < 10000000; i++)
{
if (i % 3 == 0)
x += i;
else if (i % 5 == 0)
x += i;
}
cout << x;
return 0;
}
cbaden$ time ./a.out
23333331666668
real 0m0.044s
user 0m0.042s
sys 0m0.001s
I know C++ should be faster than R, but this much faster? Rprof indicates that I'm spending almost 60% of my time on the modulo operator and 13% of the time on the "==" operation. Are there any vectorized ways of doing this faster?
A secondary concern would be that I'm going to run out of memory--this approach is not very scalable as the range gets larger. Is there a good way to do this that preserves the vectorizability, yet doesn't try to keep the subset in memory?
Modulo is faster when it operates on integers and not numerics:
f1 <- function() {
x <- 1:1E7
a <- sum(as.numeric(x[x%%3 ==0 | x%%5==0]))
}
f2 <- function() {
x <- 1:1E7
a <- sum(as.numeric(x[x %% 3L == 0L | x %% 5L == 0L]))
}
library(rbenchmark)
benchmark(f1(), f2(), replications = 5)
# test replications elapsed relative user.self sys.self user.child sys.child
# 1 f1() 5 14.78 4.976431 13.95 0.67 NA NA
# 2 f2() 5 2.97 1.000000 2.37 0.50 NA NA
That's still far from C++ performance, but it's a step in the right direction.
A faster solution
x <-1E7
a<-x%/%3
b<-x%/%5
c<-x%/%15
ans<-3*a*(a+1)/2+5*b*(b+1)/2-15*c*(c+1)/2
This doesn't really help with regard to the modulo question, though.
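This closed form also sidesteps the memory concern, since nothing of length 1E7 is ever allocated. A quick sketch checking it against the brute-force subset on a smaller limit (999, as in the original Euler problem):
x <- 999
a <- x %/% 3; b <- x %/% 5; c <- x %/% 15
3*a*(a+1)/2 + 5*b*(b+1)/2 - 15*c*(c+1)/2              # 233168
sum((1:x)[(1:x) %% 3 == 0 | (1:x) %% 5 == 0])         # 233168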
A slight improvement [on the OP]
system.time({
x_3 <- seq(3, 1E7, by = 3)
x_5 <- seq(5, 1E7, by = 5)
x_3_5 <- unique(c(x_3, x_5))
a <- sum(as.numeric(x_3_5))}
)
## user system elapsed
## 1.53 0.13 1.66
EDIT: Having used profr to profile the code, I replaced seq and unique with the internal generics / default methods.
new2 <- function(){
x_3 <- seq.int(3, 1E7, by = 3)
x_5 <- seq.int(5, 1E7, by = 5)
x_3_5 <- unique.default(c(x_3, x_5))
a <- sum(as.numeric(x_3_5))
}
system.time(new2())
## user system elapsed
## 1.11 0.04 1.16
For comparison (my slow machine):
system.time({
x <- 1:1E7
a <- sum(as.numeric(x[x %% 3 == 0 | x %% 5 == 0]))
})
## user system elapsed
## 4.47 0.18 4.64
Benchmarking
orig <- function(){
x <- 1:1E7
a <- sum(as.numeric(x[x %% 3 == 0 | x %% 5 == 0]))
}
new <- function(){
x_3 <- seq(3, 1E7, by = 3)
x_5 <- seq(5, 1E7, by = 5)
x_3_5 <- unique(c(x_3, x_5))
a <- sum(as.numeric(x_3_5))
}
benchmark(orig(), new(), new2(), replications = 5)
## test replications elapsed relative
## 2 new() 5 7.67 1.198438
## 3 new2() 5 6.40 1.000000
## 1 orig() 5 22.01 3.439063