plyr::aaply is very slow compared to nested loops

plyr::aaply is very slow compared to nested loops - r

I have a simple task to do. I have a 3D array (10,1350,1280) and I want to calculate the min over the first dimensions. I can do it using aaply like the following
minObs <- plyr::aaply(obs, c(2,3), min) # min of observation
But it is extremely slow compared to when I just write a nested loop.
minObs<-matrix(nrow=dim(obs)[2],ncol=dim(obs)[3])
for (i in 1:dim(obs)[2]){
for (j in 1:dim(obs)[3]){
minObs[i,j]<-min(obs[,i,j],na.rm = TRUE)
}
}
I am new to R , but I am guessing that I am doing something wrong with aaply function. And hint would be very much appreciated. How can I speed up using aaply?

Why not just use the base apply function?
apply(obs, c(2,3), min)
It's fast, doesn't require loading an additional package and gives the same result, as per:
all.equal(
apply(obs, 2:3, min),
aaply(obs, 2:3, min), check.attributes=FALSE)
#[1] TRUE
Timings using system.time() using a 10 x 1350 x 1280 array:
Loop
# user system elapsed
# 3.79 0.00 3.79
Base apply()
# user system elapsed
# 2.87 0.02 2.89
plyr::aaply()
#Timing stopped at: 122.1 0.04 122.24

Related

times of function apply()? apply() vs for loop? [duplicate]

It is often said that one should prefer lapply over for loops.
There are some exception as for example Hadley Wickham points out in his Advance R book.
(http://adv-r.had.co.nz/Functionals.html) (Modifying in place, Recursion etc).
The following is one of this case.
Just for sake of learning, I tried to rewrite a perceptron algorithm in a functional form in order to benchmark
relative performance.
source (https://rpubs.com/FaiHas/197581).
Here is the code.
# prepare input
data(iris)
irissubdf <- iris[1:100, c(1, 3, 5)]
names(irissubdf) <- c("sepal", "petal", "species")
head(irissubdf)
irissubdf$y <- 1
irissubdf[irissubdf[, 3] == "setosa", 4] <- -1
x <- irissubdf[, c(1, 2)]
y <- irissubdf[, 4]
# perceptron function with for
perceptron <- function(x, y, eta, niter) {
# initialize weight vector
weight <- rep(0, dim(x)[2] + 1)
errors <- rep(0, niter)
# loop over number of epochs niter
for (jj in 1:niter) {
# loop through training data set
for (ii in 1:length(y)) {
# Predict binary label using Heaviside activation
# function
z <- sum(weight[2:length(weight)] * as.numeric(x[ii,
])) + weight[1]
if (z < 0) {
ypred <- -1
} else {
ypred <- 1
}
# Change weight - the formula doesn't do anything
# if the predicted value is correct
weightdiff <- eta * (y[ii] - ypred) * c(1,
as.numeric(x[ii, ]))
weight <- weight + weightdiff
# Update error function
if ((y[ii] - ypred) != 0) {
errors[jj] <- errors[jj] + 1
}
}
}
# weight to decide between the two species
return(errors)
}
err <- perceptron(x, y, 1, 10)
### my rewriting in functional form auxiliary
### function
faux <- function(x, weight, y, eta) {
err <- 0
z <- sum(weight[2:length(weight)] * as.numeric(x)) +
weight[1]
if (z < 0) {
ypred <- -1
} else {
ypred <- 1
}
# Change weight - the formula doesn't do anything
# if the predicted value is correct
weightdiff <- eta * (y - ypred) * c(1, as.numeric(x))
weight <<- weight + weightdiff
# Update error function
if ((y - ypred) != 0) {
err <- 1
}
err
}
weight <- rep(0, 3)
weightdiff <- rep(0, 3)
f <- function() {
t <- replicate(10, sum(unlist(lapply(seq_along(irissubdf$y),
function(i) {
faux(irissubdf[i, 1:2], weight, irissubdf$y[i],
1)
}))))
weight <<- rep(0, 3)
t
}
I did not expected any consistent improvement due to the aforementioned
issues. But nevertheless I was really surprised when I saw the sharp worsening
using lapply and replicate.
I obtained this results using microbenchmark function from microbenchmark library
What could possibly be the reasons?
Could it be some memory leak?
expr min lq mean median uq
f() 48670.878 50600.7200 52767.6871 51746.2530 53541.2440
perceptron(as.matrix(irissubdf[1:2]), irissubdf$y, 1, 10) 4184.131 4437.2990 4686.7506 4532.6655 4751.4795
perceptronC(as.matrix(irissubdf[1:2]), irissubdf$y, 1, 10) 95.793 104.2045 123.7735 116.6065 140.5545
max neval
109715.673 100
6513.684 100
264.858 100
The first function is the lapply/replicate function
The second is the function with for loops
The third is the same function in C++ using Rcpp
Here According to Roland the profiling of the function.
I am not sure I can interpret it in the right way.
It looks like to me most of the time is spent in subsetting
Function profiling

First of all, it is an already long debunked myth that for loops are any slower than lapply. The for loops in R have been made a lot more performant and are currently at least as fast as lapply.
That said, you have to rethink your use of lapply here. Your implementation demands assigning to the global environment, because your code requires you to update the weight during the loop. And that is a valid reason to not consider lapply.
lapply is a function you should use for its side effects (or lack of side effects). The function lapply combines the results in a list automatically and doesn't mess with the environment you work in, contrary to a for loop. The same goes for replicate. See also this question:
Is R's apply family more than syntactic sugar?
The reason your lapply solution is far slower, is because your way of using it creates a lot more overhead.
replicate is nothing else but sapply internally, so you actually combine sapply and lapply to implement your double loop. sapply creates extra overhead because it has to test whether or not the result can be simplified. So a for loop will be actually faster than using replicate.
inside your lapply anonymous function, you have to access the dataframe for both x and y for every observation. This means that -contrary to in your for-loop- eg the function $ has to be called every time.
Because you use these high-end functions, your 'lapply' solution calls 49 functions, compared to your for solution that only calls 26. These extra functions for the lapply solution include calls to functions like match, structure, [[, names, %in%, sys.call, duplicated, ...
All functions not needed by your for loop as that one doesn't do any of these checks.
If you want to see where this extra overhead comes from, look at the internal code of replicate, unlist, sapply and simplify2array.
You can use the following code to get a better idea of where you lose your performance with the lapply. Run this line by line!
Rprof(interval = 0.0001)
f()
Rprof(NULL)
fprof <- summaryRprof()$by.self
Rprof(interval = 0.0001)
perceptron(as.matrix(irissubdf[1:2]), irissubdf$y, 1, 10)
Rprof(NULL)
perprof <- summaryRprof()$by.self
fprof$Fun <- rownames(fprof)
perprof$Fun <- rownames(perprof)
Selftime <- merge(fprof, perprof,
all = TRUE,
by = 'Fun',
suffixes = c(".lapply",".for"))
sum(!is.na(Selftime$self.time.lapply))
sum(!is.na(Selftime$self.time.for))
Selftime[order(Selftime$self.time.lapply, decreasing = TRUE),
c("Fun","self.time.lapply","self.time.for")]
Selftime[is.na(Selftime$self.time.for),]

There is more to the question of when to use for or lapply and which "performs" better. Sometimes speed is important, other times memory is important. To further complicate things, the time complexity may not be what you expect - that is, different behavior can be observed at different scopes, invalidating any blanket statement such as "faster than" or "at least as fast as". Finally, one performance metric often overlooked is thought-to-code, pre-mature optimization yada yada.
That said, in the Introduction to R the authors hint at some performance concerns:
Warning: for() loops are used in R code much less often than in compiled languages. Code that takes a ‘whole object’ view is likely to be both clearer and faster in R.
Given a similar use case, input and output, disregarding user preferences, is one clearly better than the other?
Benchmark - Fibonacci sequence
I compare approaches to compute 1 to N Fibonacci numbers (inspired by the benchmarkme package), shunning the 2nd Circle and ensuring that inputs and outputs for each approach are the same. Four additional approaches are included to throw some oil on the fire - a vectorized approach and purrr::map, and *apply variants vapply and sapply.
fib <- function(x, ...){
x <- 1:x ; phi = 1.6180339887498949 ; v = \() vector("integer", length(x))
bench::mark(
vector = {
y=v(); y = ((rep(phi, length(x))^x) - ((-rep(phi, length(x)))^-x)) / sqrt(5); y},
lapply = {
y=v(); y = unlist(lapply(x, \(.) (phi^. - (-phi)^(-.)) / sqrt(5)), use.names = F); y},
loop = {
y=v(); `for`(i, x, {y[i] = (phi^i - (-phi)^(-i)) / sqrt(5)}); y},
sapply = {
y=v(); y = sapply(x, \(.) (phi^. - (-phi)^(-.)) / sqrt(5)); y},
vapply = {
y=v(); y = vapply(x, \(.) (phi^. - (-phi)^(-.)) / sqrt(5), 1); y},
map = {
y=v(); y <- purrr::map_dbl(x, ~ (phi^. - (-phi)^(-.))/sqrt(5)); y
}, ..., check = T
)[c(1:9)]
}
Here is a comparison of the performance, ranked by median time.
lapply(list(3e2, 3e3, 3e4, 3e5, 3e6, 3e7), fib) # n iterations specified separately
N = 300
expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time
1 vector 38.8us 40.9us 21812. 8.44KB 0 1000 0 45.8ms
2 vapply 500us 545us 1653. 3.61KB 1.65 999 1 604ms
3 sapply 518us 556us 1725. 12.48KB 0 1000 0 580ms
4 lapply 513.4us 612.8us 1620. 6KB 8.14 995 5 614.2ms
5 loop 549.9us 633.6us 1455. 3.61KB 8.78 994 6 683.3ms
6 map 649.6us 754.6us 1312. 3.61KB 9.25 993 7 756.9ms
N = 3000
1 vector 769.7us 781.5us 1257. 82.3KB 1.26 999 1 794.83ms
2 vapply 5.38ms 5.58ms 173. 35.2KB 0.697 996 4 5.74s
3 sapply 5.59ms 5.83ms 166. 114.3KB 0.666 996 4 6.01s
4 loop 5.38ms 5.91ms 167. 35.2KB 8.78 950 50 5.69s
5 lapply 5.24ms 6.49ms 156. 58.7KB 8.73 947 53 6.07s
6 map 6.11ms 6.63ms 148. 35.2KB 9.13 942 58 6.35s
N = 30 000
1 vector 10.7ms 10.9ms 90.9 821KB 0.918 297 3 3.27s
2 vapply 57.3ms 60.1ms 16.4 351.66KB 0.741 287 13 17.5s
3 loop 59.2ms 60.7ms 15.9 352KB 16.7 146 154 9.21s
4 sapply 59.6ms 62.1ms 15.7 1.05MB 0.713 287 13 18.2s
5 lapply 57.3ms 67.6ms 15.1 586KB 20.5 127 173 8.43s
6 map 66.7ms 69.1ms 14.4 352KB 21.6 120 180 8.35s
N = 300 000
1 vector 190ms 193ms 5.14 8.01MB 0.206 100 4 19.45s
2 loop 693ms 713ms 1.40 3.43MB 7.43 100 532 1.19m
3 map 766ms 790ms 1.26 3.43MB 7.53 100 598 1.32m
4 vapply 633ms 814ms 1.33 3.43MB 0.851 100 39 45.8s
5 lapply 685ms 966ms 1.06 5.72MB 9.13 100 864 1.58m
6 sapply 694ms 813ms 1.27 12.01MB 0.810 100 39 48.1s
N = 3 000 000
1 vector 3.17s 3.21s 0.312 80.1MB 0.249 20 16 1.07m
2 vapply 8.22s 8.37s 0.118 34.3MB 4.97 20 845 2.83m
3 loop 8.3s 8.42s 0.119 34.3MB 4.35 20 733 2.81m
4 map 9.09s 9.17s 0.109 34.3MB 4.91 20 903 3.07m
5 lapply 10.42s 11.09s 0.0901 57.2MB 4.10 20 909 3.7m
6 sapply 10.43s 11.28s 0.0862 112.1MB 3.58 20 830 3.87m
N = 30 000 000
1 vector 44.8s 45.94s 0.0214 801MB 0.00854 10 4 7.8m
2 vapply 1.56m 1.6m 0.0104 343MB 0.883 10 850 16m
3 loop 1.56m 1.62m 0.00977 343MB 0.366 10 374 17.1m
4 map 1.72m 1.74m 0.00959 343MB 1.23 10 1279 17.4m
5 lapply 2.15m 2.22m 0.00748 572MB 0.422 10 565 22.3m
6 sapply 2.05m 2.25m 0.00747 1.03GB 0.405 10 542 22.3m
# Intel i5-8300H CPU # 2.30GHz / R version 4.1.1 / purrr 0.3.4
for and lapply approaches perform similarly, but lapply is greedier when it comes to memory, and a bit slower when the size of input increases (for this task). Note that purrr::map memory usage is equivalent to the for-loop, superior to that of lapply, in itself a debated topic. However, when the appropriate *apply* is used, here vapply, the performance is similar. But the choice could have a large impact on memory use, sapply being noticeably less memory efficient than vapply.
A peek under the hood reveals the reason of different performance for the approaches. The for-loop performs many type checks, resulting in some overhead. lapply on the other hand, suffers from a flawed language design where lazy evaluation, or use of promises, comes at a cost, the source code confirming that the X and FUN arguments to .Internal(lapply) are promises.
Vectorized approaches are fast, and probably desirable over a for or lapply approach. Notice how the vectorized approach grows irregularly compared to the other approaches. However, aesthetics of vectorized code may be a concern: which approach would you prefer to debug?
Overall, I'd say a choice between lapply or for is not something the average R user should ponder over. Stick to what is easiest to write, think of, and debug or that is less (silent?) error prone. What is lost in performance will likely be canceled out by time saved writing. For performance critical applications, make sure to run some tests with different input sizes and to properly chunk code.

Actually,
I did test the difference with a a problem that a solve recently.
Just try yourself.
In my conclusion, have no difference but for loop to my case were insignificantly more faster than lapply.
Ps: I try mostly keep the same logic in use.
ds <- data.frame(matrix(rnorm(1000000), ncol = 8))
n <- c('a','b','c','d','e','f','g','h')
func <- function(ds, target_col, query_col, value){
return (unique(as.vector(ds[ds[query_col] == value, target_col])))
}
f1 <- function(x, y){
named_list <- list()
for (i in y){
named_list[[i]] <- func(x, 'a', 'b', i)
}
return (named_list)
}
f2 <- function(x, y){
list2 <- lapply(setNames(nm = y), func, ds = x, target_col = "a", query_col = "b")
return(list2)
}
benchmark(f1(ds2, n ))
benchmark(f2(ds2, n ))
As you could see, I did a simple routine to build a named_list based in a dataframe, the func function does the column values extracted, the f1 uses a for loop to iterate through the dataframe and the f2 uses a lapply function.
In my computer I get this results:
test replications elapsed relative user.self sys.self user.child
1 f1(ds2, n) 100 110.24 1 110.112 0 0
sys.child
1 0
&&
test replications elapsed relative user.self sys.self user.child
1 f1(ds2, n) 100 110.24 1 110.112 0 0
sys.child
1 0

tapply() and acast() are super slow and memory consuming when grouping by two variables

I just noticed that tapply() and reshape2::acast() are super slow and memory consuming in scenario when grouping by two variables!
See this example:
#download data and functions for monitoring time & memory
download.file("http://artax.karlin.mff.cuni.cz/~ttel5535/pub/so/tapply,reshape2_slow/tapply,reshape_slow.Rdata", "tapply,reshape_slow.Rdata", mode="wb")
load(file = "tapply,reshape_slow.Rdata")
require(reshape2)
mstart()
xx <- acast(bb, fi ~ gi, sum, value.var = "hour")
mstop()
# user system elapsed
# 6.58 0.79 7.90
#max memory used: 911.2Mb.
Surprisingly very slow and memory greedy! Just to show properties of the data:
nrow(bb)
#[1] 9467
dim(xx)
#[1] 4850 1492
print(object.size(xx), units = "Mb")
#28 Mb
Now tapply():
mstart()
xx2 <- tapply(bb$hour, list(bb$fi, bb$gi), sum, default = 0)
mstop()
# user system elapsed
# 6.45 2.36 9.44
#max memory used: 1135.9Mb.
Even slower and more memory greedy!
Now, to compare, solution when grouping is done by SQLite and acast() is used only for the reshaping:
require(sqldf)
mstart()
xx3_0 <- sqldf("select fi, gi, sum(hour) as sum from bb group by fi, gi")
xx3 <- acast(xx3_0, fi ~ gi, fill = 0, value.var = "sum")
mstop()
# user system elapsed
# 0.22 0.05 0.28
#max memory used: 174.1Mb.
Normally, I'm using sqldf for almost every data operations, but now I wanted to make it "easier" by using basic functions :-) But now I am really surprised that these functions perform really bad! Didn't anyone notice yet? Or am I using them wrong?

Memory efficient scale() function

I am trying to scale a large matrix (the matrix I'm actually working with is much larger):
x = matrix(rnorm(1e8), nrow=1e4)
x = scale(x)
This matrix uses ~800 MB of memory. However, with lineprof, I see that the scale function allocates 9.5 GB of memory and releases 8.75 GB after it has finished running. Because this function is so memory inefficient, it will sometimes crash my session when I run it.
I am trying to find a memory-efficient way to run this function. If I code it myself, it only allocates ~6.8 GB, but this still seems like a lot:
x = matrix(rnorm(1e8), nrow=1e4)
u = apply(x, 2, mean)
s = apply(x, 2, sd)
x = t((t(x) - u)/s)
I thought I could do even better by splitting the columns of x into groups, then scaling each column group separately:
x = matrix(rnorm(1e8), nrow=1e4)
g = split(1:ncol(x), ceiling(1:ncol(x)/100))
for(j in g){
x[,j] = scale(x[,j])
}
With profvis, I see that overall this function is LESS efficient. It allocates 10.8 GB of memory and releases 10.5 GB. However, I think R can probably do garbage collection within the for loop, but it is not doing so because it doesn't need to. Is this correct? If so, then this might be the best option?
Questions:
• What is the best way to code functions like these to avoid memory crashes? (If a package is available, even better)
• How do I account for garbage collection while profiling code? My understanding is that GC isn't always run unless it is needed.
Update: In terms of runtime, splitting the columns into 10 groups is not much slower than using the scale(x) function. Running both functions on a [1000 x 1000] matrix, the mean runtimes assessed with microbenchmark are:
• scale(x) = 154 ms
• splitting into 10 column groups = 167 ms
• splitting into 1000 column groups (i.e. scaling each column separately) = 373 ms

Given that you do not need to keep the original matrix, you can save some memory by directly modifying it (instead of making copies of it). Also, you can bypass base::scale with a simple for loop. For instance:
library(profvis) # to profile RAM usage and time
library(matrixStats) # CPU/RAM efficient versions of rowMeans
profvis({
set.seed(1)
x = matrix(rnorm(1e7), nrow=1e3) # Note that I reduced nrow and ncol (I do not have enough RAM to test at your desired matrix dimension)
x = scale(x)
})
profvis({
set.seed(1)
x = matrix(rnorm(1e7), nrow=1e3) # Note that I reduced nrow and ncol (I do not have enough RAM to test at your desired matrix dimension)
mu = matrixStats::colMeans2(x)
sigma = matrixStats::colSds(x)
for(i in 1:ncol(x))
{
x[,i] = (x[,i]-mu[i])/sigma[i]
}
})
In my machine, peak memory is substantially reduced only with these minor changes (please test at your desired matrix dimension).

Modify my answer thanks to the comment of adn bps, regarding the use of memory. First I use the gc{base} Garbage Collection function, to release some memory.
gc()
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 684317 36.6 1168576 62.5 940480 50.3
Vcells 1053307 8.1 2060183 15.8 1359327 10.4
gc(reset = TRUE)
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 684296 36.6 1168576 62.5 684296 36.6
Vcells 1053271 8.1 2060183 15.8 1053271 8.1
I found a form that I think can help you, first I generate the matrix of rnorm with the Rcpp package, using c ++ code, it speeds up the process a bit
library(Rcpp)
cppFunction('NumericVector ranM(int n, int m) {
NumericVector v = rnorm(n * m);
v.attr("dim") = Dimension(n, m);
return v;
}')
system.time(x <- ranM(1e4,1e4))
user system elapsed
7.19 0.09 7.30
system.time(y<- matrix(rnorm(1e8), nrow=1e4))
user system elapsed
10.67 0.42 11.09
The size of matrix x and y is the same
print(object.size(x), units = "auto")
762.9 Mb
print(object.size(y), units = "auto")
762.9 Mb
#system.time(w <- scale(x))
# user system elapsed
# 11.86 5.79 221.54 without using gc(TRUE)
system.time(w <- scale(x))
user system elapsed
9.52 5.39 47.33 using gc(TRUE)
remove(w,y)
I load the library data.table, , and I convert the matrix x to the data.table class, to use the scale function
library(data.table)
system.time(z <- data.table(x))
system.time(z <- data.table(x))
user system elapsed
1.18 0.33 1.55
system.time(z<-z[, lapply(.SD, scale)])
user system elapsed
8.34 0.21 8.58
print(object.size(z), units = "auto")
763.5 Mb
Now I use the bigmemory library for efficient use of memory and I remove the original matrix x if I need it, so as not to accumulate heavy objects in the environment
library(bigmemory)
system.time(z <- as.big.matrix(z))
user system elapsed
15.90 6.64 23.34
print(object.size(x), units = "b")/print(object.size(z), units = "auto")
800000200 bytes
664 bytes
1204819.6 bytes
remove(x)
gc()
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 783279 41.9 1442291 77.1 1442291 77.1
Vcells 1180947 9.1 461812104 3523.4 601095521 4586.0
elapsed time shows a considerable time improvement about 5 times faster.
Note that the bigmatrix object is 1 million times smaller. A short example that you can reproduce to see that the results in scale in matrix and bib.matrix are equal
set.seed(1)
m1 <- matrix(rnorm(5*5), nrow = 5)
m2 <- as.big.matrix(m1)
class(m2)
[1] "big.matrix"
attr(,"package")
[1] "bigmemory"
scale(m1) == scale(m2[,])
[,1] [,2] [,3] [,4] [,5]
[1,] TRUE TRUE TRUE TRUE TRUE
[2,] TRUE TRUE TRUE TRUE TRUE
[3,] TRUE TRUE TRUE TRUE TRUE
[4,] TRUE TRUE TRUE TRUE TRUE
[5,] TRUE TRUE TRUE TRUE TRUE

What are the performance differences between for-loops and the apply family of functions?

It is often said that one should prefer lapply over for loops.
There are some exception as for example Hadley Wickham points out in his Advance R book.
(http://adv-r.had.co.nz/Functionals.html) (Modifying in place, Recursion etc).
The following is one of this case.
Just for sake of learning, I tried to rewrite a perceptron algorithm in a functional form in order to benchmark
relative performance.
source (https://rpubs.com/FaiHas/197581).
Here is the code.
# prepare input
data(iris)
irissubdf <- iris[1:100, c(1, 3, 5)]
names(irissubdf) <- c("sepal", "petal", "species")
head(irissubdf)
irissubdf$y <- 1
irissubdf[irissubdf[, 3] == "setosa", 4] <- -1
x <- irissubdf[, c(1, 2)]
y <- irissubdf[, 4]
# perceptron function with for
perceptron <- function(x, y, eta, niter) {
# initialize weight vector
weight <- rep(0, dim(x)[2] + 1)
errors <- rep(0, niter)
# loop over number of epochs niter
for (jj in 1:niter) {
# loop through training data set
for (ii in 1:length(y)) {
# Predict binary label using Heaviside activation
# function
z <- sum(weight[2:length(weight)] * as.numeric(x[ii,
])) + weight[1]
if (z < 0) {
ypred <- -1
} else {
ypred <- 1
}
# Change weight - the formula doesn't do anything
# if the predicted value is correct
weightdiff <- eta * (y[ii] - ypred) * c(1,
as.numeric(x[ii, ]))
weight <- weight + weightdiff
# Update error function
if ((y[ii] - ypred) != 0) {
errors[jj] <- errors[jj] + 1
}
}
}
# weight to decide between the two species
return(errors)
}
err <- perceptron(x, y, 1, 10)
### my rewriting in functional form auxiliary
### function
faux <- function(x, weight, y, eta) {
err <- 0
z <- sum(weight[2:length(weight)] * as.numeric(x)) +
weight[1]
if (z < 0) {
ypred <- -1
} else {
ypred <- 1
}
# Change weight - the formula doesn't do anything
# if the predicted value is correct
weightdiff <- eta * (y - ypred) * c(1, as.numeric(x))
weight <<- weight + weightdiff
# Update error function
if ((y - ypred) != 0) {
err <- 1
}
err
}
weight <- rep(0, 3)
weightdiff <- rep(0, 3)
f <- function() {
t <- replicate(10, sum(unlist(lapply(seq_along(irissubdf$y),
function(i) {
faux(irissubdf[i, 1:2], weight, irissubdf$y[i],
1)
}))))
weight <<- rep(0, 3)
t
}
I did not expected any consistent improvement due to the aforementioned
issues. But nevertheless I was really surprised when I saw the sharp worsening
using lapply and replicate.
I obtained this results using microbenchmark function from microbenchmark library
What could possibly be the reasons?
Could it be some memory leak?
expr min lq mean median uq
f() 48670.878 50600.7200 52767.6871 51746.2530 53541.2440
perceptron(as.matrix(irissubdf[1:2]), irissubdf$y, 1, 10) 4184.131 4437.2990 4686.7506 4532.6655 4751.4795
perceptronC(as.matrix(irissubdf[1:2]), irissubdf$y, 1, 10) 95.793 104.2045 123.7735 116.6065 140.5545
max neval
109715.673 100
6513.684 100
264.858 100
The first function is the lapply/replicate function
The second is the function with for loops
The third is the same function in C++ using Rcpp
Here According to Roland the profiling of the function.
I am not sure I can interpret it in the right way.
It looks like to me most of the time is spent in subsetting
Function profiling

First of all, it is an already long debunked myth that for loops are any slower than lapply. The for loops in R have been made a lot more performant and are currently at least as fast as lapply.
That said, you have to rethink your use of lapply here. Your implementation demands assigning to the global environment, because your code requires you to update the weight during the loop. And that is a valid reason to not consider lapply.
lapply is a function you should use for its side effects (or lack of side effects). The function lapply combines the results in a list automatically and doesn't mess with the environment you work in, contrary to a for loop. The same goes for replicate. See also this question:
Is R's apply family more than syntactic sugar?
The reason your lapply solution is far slower, is because your way of using it creates a lot more overhead.
replicate is nothing else but sapply internally, so you actually combine sapply and lapply to implement your double loop. sapply creates extra overhead because it has to test whether or not the result can be simplified. So a for loop will be actually faster than using replicate.
inside your lapply anonymous function, you have to access the dataframe for both x and y for every observation. This means that -contrary to in your for-loop- eg the function $ has to be called every time.
Because you use these high-end functions, your 'lapply' solution calls 49 functions, compared to your for solution that only calls 26. These extra functions for the lapply solution include calls to functions like match, structure, [[, names, %in%, sys.call, duplicated, ...
All functions not needed by your for loop as that one doesn't do any of these checks.
If you want to see where this extra overhead comes from, look at the internal code of replicate, unlist, sapply and simplify2array.
You can use the following code to get a better idea of where you lose your performance with the lapply. Run this line by line!
Rprof(interval = 0.0001)
f()
Rprof(NULL)
fprof <- summaryRprof()$by.self
Rprof(interval = 0.0001)
perceptron(as.matrix(irissubdf[1:2]), irissubdf$y, 1, 10)
Rprof(NULL)
perprof <- summaryRprof()$by.self
fprof$Fun <- rownames(fprof)
perprof$Fun <- rownames(perprof)
Selftime <- merge(fprof, perprof,
all = TRUE,
by = 'Fun',
suffixes = c(".lapply",".for"))
sum(!is.na(Selftime$self.time.lapply))
sum(!is.na(Selftime$self.time.for))
Selftime[order(Selftime$self.time.lapply, decreasing = TRUE),
c("Fun","self.time.lapply","self.time.for")]
Selftime[is.na(Selftime$self.time.for),]

There is more to the question of when to use for or lapply and which "performs" better. Sometimes speed is important, other times memory is important. To further complicate things, the time complexity may not be what you expect - that is, different behavior can be observed at different scopes, invalidating any blanket statement such as "faster than" or "at least as fast as". Finally, one performance metric often overlooked is thought-to-code, pre-mature optimization yada yada.
That said, in the Introduction to R the authors hint at some performance concerns:
Warning: for() loops are used in R code much less often than in compiled languages. Code that takes a ‘whole object’ view is likely to be both clearer and faster in R.
Given a similar use case, input and output, disregarding user preferences, is one clearly better than the other?
Benchmark - Fibonacci sequence
I compare approaches to compute 1 to N Fibonacci numbers (inspired by the benchmarkme package), shunning the 2nd Circle and ensuring that inputs and outputs for each approach are the same. Four additional approaches are included to throw some oil on the fire - a vectorized approach and purrr::map, and *apply variants vapply and sapply.
fib <- function(x, ...){
x <- 1:x ; phi = 1.6180339887498949 ; v = \() vector("integer", length(x))
bench::mark(
vector = {
y=v(); y = ((rep(phi, length(x))^x) - ((-rep(phi, length(x)))^-x)) / sqrt(5); y},
lapply = {
y=v(); y = unlist(lapply(x, \(.) (phi^. - (-phi)^(-.)) / sqrt(5)), use.names = F); y},
loop = {
y=v(); `for`(i, x, {y[i] = (phi^i - (-phi)^(-i)) / sqrt(5)}); y},
sapply = {
y=v(); y = sapply(x, \(.) (phi^. - (-phi)^(-.)) / sqrt(5)); y},
vapply = {
y=v(); y = vapply(x, \(.) (phi^. - (-phi)^(-.)) / sqrt(5), 1); y},
map = {
y=v(); y <- purrr::map_dbl(x, ~ (phi^. - (-phi)^(-.))/sqrt(5)); y
}, ..., check = T
)[c(1:9)]
}
Here is a comparison of the performance, ranked by median time.
lapply(list(3e2, 3e3, 3e4, 3e5, 3e6, 3e7), fib) # n iterations specified separately
N = 300
expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time
1 vector 38.8us 40.9us 21812. 8.44KB 0 1000 0 45.8ms
2 vapply 500us 545us 1653. 3.61KB 1.65 999 1 604ms
3 sapply 518us 556us 1725. 12.48KB 0 1000 0 580ms
4 lapply 513.4us 612.8us 1620. 6KB 8.14 995 5 614.2ms
5 loop 549.9us 633.6us 1455. 3.61KB 8.78 994 6 683.3ms
6 map 649.6us 754.6us 1312. 3.61KB 9.25 993 7 756.9ms
N = 3000
1 vector 769.7us 781.5us 1257. 82.3KB 1.26 999 1 794.83ms
2 vapply 5.38ms 5.58ms 173. 35.2KB 0.697 996 4 5.74s
3 sapply 5.59ms 5.83ms 166. 114.3KB 0.666 996 4 6.01s
4 loop 5.38ms 5.91ms 167. 35.2KB 8.78 950 50 5.69s
5 lapply 5.24ms 6.49ms 156. 58.7KB 8.73 947 53 6.07s
6 map 6.11ms 6.63ms 148. 35.2KB 9.13 942 58 6.35s
N = 30 000
1 vector 10.7ms 10.9ms 90.9 821KB 0.918 297 3 3.27s
2 vapply 57.3ms 60.1ms 16.4 351.66KB 0.741 287 13 17.5s
3 loop 59.2ms 60.7ms 15.9 352KB 16.7 146 154 9.21s
4 sapply 59.6ms 62.1ms 15.7 1.05MB 0.713 287 13 18.2s
5 lapply 57.3ms 67.6ms 15.1 586KB 20.5 127 173 8.43s
6 map 66.7ms 69.1ms 14.4 352KB 21.6 120 180 8.35s
N = 300 000
1 vector 190ms 193ms 5.14 8.01MB 0.206 100 4 19.45s
2 loop 693ms 713ms 1.40 3.43MB 7.43 100 532 1.19m
3 map 766ms 790ms 1.26 3.43MB 7.53 100 598 1.32m
4 vapply 633ms 814ms 1.33 3.43MB 0.851 100 39 45.8s
5 lapply 685ms 966ms 1.06 5.72MB 9.13 100 864 1.58m
6 sapply 694ms 813ms 1.27 12.01MB 0.810 100 39 48.1s
N = 3 000 000
1 vector 3.17s 3.21s 0.312 80.1MB 0.249 20 16 1.07m
2 vapply 8.22s 8.37s 0.118 34.3MB 4.97 20 845 2.83m
3 loop 8.3s 8.42s 0.119 34.3MB 4.35 20 733 2.81m
4 map 9.09s 9.17s 0.109 34.3MB 4.91 20 903 3.07m
5 lapply 10.42s 11.09s 0.0901 57.2MB 4.10 20 909 3.7m
6 sapply 10.43s 11.28s 0.0862 112.1MB 3.58 20 830 3.87m
N = 30 000 000
1 vector 44.8s 45.94s 0.0214 801MB 0.00854 10 4 7.8m
2 vapply 1.56m 1.6m 0.0104 343MB 0.883 10 850 16m
3 loop 1.56m 1.62m 0.00977 343MB 0.366 10 374 17.1m
4 map 1.72m 1.74m 0.00959 343MB 1.23 10 1279 17.4m
5 lapply 2.15m 2.22m 0.00748 572MB 0.422 10 565 22.3m
6 sapply 2.05m 2.25m 0.00747 1.03GB 0.405 10 542 22.3m
# Intel i5-8300H CPU # 2.30GHz / R version 4.1.1 / purrr 0.3.4
for and lapply approaches perform similarly, but lapply is greedier when it comes to memory, and a bit slower when the size of input increases (for this task). Note that purrr::map memory usage is equivalent to the for-loop, superior to that of lapply, in itself a debated topic. However, when the appropriate *apply* is used, here vapply, the performance is similar. But the choice could have a large impact on memory use, sapply being noticeably less memory efficient than vapply.
A peek under the hood reveals the reason of different performance for the approaches. The for-loop performs many type checks, resulting in some overhead. lapply on the other hand, suffers from a flawed language design where lazy evaluation, or use of promises, comes at a cost, the source code confirming that the X and FUN arguments to .Internal(lapply) are promises.
Vectorized approaches are fast, and probably desirable over a for or lapply approach. Notice how the vectorized approach grows irregularly compared to the other approaches. However, aesthetics of vectorized code may be a concern: which approach would you prefer to debug?
Overall, I'd say a choice between lapply or for is not something the average R user should ponder over. Stick to what is easiest to write, think of, and debug or that is less (silent?) error prone. What is lost in performance will likely be canceled out by time saved writing. For performance critical applications, make sure to run some tests with different input sizes and to properly chunk code.

Actually,
I did test the difference with a a problem that a solve recently.
Just try yourself.
In my conclusion, have no difference but for loop to my case were insignificantly more faster than lapply.
Ps: I try mostly keep the same logic in use.
ds <- data.frame(matrix(rnorm(1000000), ncol = 8))
n <- c('a','b','c','d','e','f','g','h')
func <- function(ds, target_col, query_col, value){
return (unique(as.vector(ds[ds[query_col] == value, target_col])))
}
f1 <- function(x, y){
named_list <- list()
for (i in y){
named_list[[i]] <- func(x, 'a', 'b', i)
}
return (named_list)
}
f2 <- function(x, y){
list2 <- lapply(setNames(nm = y), func, ds = x, target_col = "a", query_col = "b")
return(list2)
}
benchmark(f1(ds2, n ))
benchmark(f2(ds2, n ))
As you could see, I did a simple routine to build a named_list based in a dataframe, the func function does the column values extracted, the f1 uses a for loop to iterate through the dataframe and the f2 uses a lapply function.
In my computer I get this results:
test replications elapsed relative user.self sys.self user.child
1 f1(ds2, n) 100 110.24 1 110.112 0 0
sys.child
1 0
&&
test replications elapsed relative user.self sys.self user.child
1 f1(ds2, n) 100 110.24 1 110.112 0 0
sys.child
1 0

Equivalent to rowMeans() for min()

I have seen this question being asked multiple times on the R mailing list, but still could not find a satisfactory answer.
Suppose I a matrix m
m <- matrix(rnorm(10000000), ncol=10)
I can get the mean of each row by:
system.time(rowMeans(m))
user system elapsed
0.100 0.000 0.097
But obtaining the minimum value of each row by
system.time(apply(m,1,min))
user system elapsed
16.157 0.400 17.029
takes more than 100 times as long, is there a way to speed this up?

You could use pmin, but you would have to get each column of your matrix into a separate vector. One way to do that is to convert it to a data.frame then call pmin via do.call (since data.frames are lists).
system.time(do.call(pmin, as.data.frame(m)))
# user system elapsed
# 0.940 0.000 0.949
system.time(apply(m,1,min))
# user system elapsed
# 16.84 0.00 16.95

Quite late to the party, but as the author of matrixStats and in case someone spots this, please note that matrixStats::rowMins() is very fast these days, e.g.
library(microbenchmark)
library(Biobase) # rowMin()
library(matrixStats) # rowMins()
options(digits=3)
m <- matrix(rnorm(10000000), ncol=10)
stats <- microbenchmark(
rowMeans(m), ## A benchmark by OP
rowMins(m),
rowMin(m),
do.call(pmin, as.data.frame(m)),
apply(m, MARGIN=1L, FUN=min),
times=10
)
> stats
Unit: milliseconds
expr min lq mean median uq max
rowMeans(m) 77.7 82.7 85.7 84.4 90.3 98.2
rowMins(m) 72.9 74.1 88.0 79.0 90.2 147.4
rowMin(m) 341.1 347.1 395.9 383.4 395.1 607.7
do.call(pmin, as.data.frame(m)) 326.4 357.0 435.4 401.0 437.6 657.9
apply(m, MARGIN = 1L, FUN = min) 3761.9 3963.8 4120.6 4109.8 4198.7 4567.4

If you want to stick to CRAN packages, then both the matrixStats and the fBasics packages have the function rowMins [note the s which is not in the Biobase function] and a variety of other row and column statistics.

library("sos")
findFn("rowMin")
gets a hit in the Biobase package, from Bioconductor ...
source("http://bioconductor.org/biocLite.R")
biocLite("Biobase")
m <- matrix(rnorm(10000000), ncol=10)
system.time(rowMeans(m))
## user system elapsed
## 0.132 0.148 0.279
system.time(apply(m,1,min))
## user system elapsed
## 11.825 1.688 13.603
library(Biobase)
system.time(rowMin(m))
## user system elapsed
## 0.688 0.172 0.864
Not as fast as rowMeans, but a lot faster than apply(...,1,min)

I've been meaning to try out the new compiler package in R 2.13.0. This essentially follows the post outlined by Dirk here.
library(compiler)
library(rbenchmark)
rowMin <- function(x, ind) apply(x, ind, min)
crowMin <- cmpfun(rowMin)
benchmark(
rowMin(m,1)
, crowMin(m,1)
, columns=c("test", "replications","elapsed","relative")
, order="relative"
, replications=10)
)
And the results:
test replications elapsed relative
2 crowMin(m, 1) 10 120.091 1.0000
1 rowMin(m, 1) 10 122.745 1.0221
Anticlimatic to say the least, though looks like you've gotten some other good options.

Not particularly R-idiosyncratic, but surely the fastest method is just to use pmin and loop over columns:
x <- m[,1]
for (i in 2:ncol(m)) x <- pmin(x, m[,i])
On my machine that takes just 3 times longer than rowMeans for the 1e+07x10 matrix, and is slightly faster than the do.call method via data.frame.

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

plyr::aaply is very slow compared to nested loops - r

Related

times of function apply()? apply() vs for loop? [duplicate]

tapply() and acast() are super slow and memory consuming when grouping by two variables

Memory efficient scale() function

What are the performance differences between for-loops and the apply family of functions?

Equivalent to rowMeans() for min()

Categories

Resources