Fast Simplex in R

I am not sure whether this is the right place to ask, but I need to solve a very large linear programming problem efficiently in R (and there is no way around R here). I have tried packages like lpSolve, but the performance seems unsatisfactory. I would be glad for any advice on packages, or alternatively a better place to ask this question.
Here is the problem:
library(MASS)    # for mvrnorm
library(lpSolve) # for lp()
N <- 10^4 # number of products
K <- 10^4 # number of scenarios
### Get expectation and covariance matrix
mu <- rep(100, N)
A <- matrix(rnorm(N^2, 0, 1), nrow = N, ncol = N)
Sigma <- t(A) %*% A
R <- mvrnorm(K, mu, Sigma) # create scenarios
means <- apply(R, 2, mean) # computes mean for each product
### The LP
# There are some additional constraints to pure expectation maximization
# This leads to additional variables
c <- c(-means,0,rep(0,K))
A1 <- c(rep(1,N),0,rep(0,K))
A2 <- c(rep(0,N),1,-rep((0.05*K)^(-1),K))
A3 <- cbind(R,rep(-1,K),diag(1,K))
A <- rbind(A1,A2,A3)
b <- c(1,98,rep(0,K))
system.time(lp <- lp(direction = "min", objective.in = c, const.mat = A,
const.dir = c("=", ">=", rep(">=",K)), const.rhs = b))
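As a rough aside (my own back-of-the-envelope estimate, not part of the question): at this size the dense objects are already huge, which matters for whichever solver is used. For instance, the dense constraint matrix A alone takes roughly
(K + 2) * (N + 1 + K) * 8 / 1024^3 # GiB needed for the dense double constraint matrix
# ~1.5 GiB for the constraint matrix alone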

You can try the Rglpk package. In the kantorovich package, there are two functions which solve the same linear programming problem: one using Rglpk, the other one using lpSolve. Benchmarks show that the first one is faster when the data is large.
library(kantorovich)
library(microbenchmark)
mu <- rpois(50, 20)
mu <- mu/sum(mu)
nu <- rpois(50, 20)
nu <- nu/sum(nu)
microbenchmark(
  Rglpk = kantorovich_glpk(mu, nu),
  lpSolve = kantorovich_lp(mu, nu),
  times = 3
)
# Unit: milliseconds
# expr min lq mean median uq max neval cld
# Rglpk 402.0955 552.3754 605.159 702.6553 706.6908 710.7264 3 a
# lpSolve 1092.7131 1184.7517 1327.208 1276.7904 1444.4552 1612.1200 3 b
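For reference, here is a rough sketch (not from the answer above) of how the LP in the question could be passed directly to Rglpk_solve_LP. Note that Rglpk writes equality constraints as "==" rather than "=", and, if I recall correctly, it also accepts sparse constraint matrices (e.g. from the slam package), which would help considerably at this problem size:
library(Rglpk)
# A, b and the objective vector c are the objects built in the question
# ('c' here refers to that numeric vector, not to base::c)
sol <- Rglpk_solve_LP(obj = c,
                      mat = A,
                      dir = c("==", ">=", rep(">=", K)),
                      rhs = b,
                      max = FALSE)
sol$optimum        # optimal objective value
head(sol$solution) # first few decision variables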
Edit: better with 'CVXR'
I tried it, and CVXR with the ECOS solver is faster:
Unit: milliseconds
expr min lq mean median uq max neval cld
Rglpk 364.2730 378.3198 383.0741 392.3666 392.4746 392.5827 3 b
lpSolve 1012.4465 1087.0728 1128.5777 1161.6990 1186.6433 1211.5875 3 c
CVXR 370.5944 386.2229 392.8022 401.8515 403.9061 405.9607 3 b
CVXR_GLPK 483.2246 488.4495 508.6683 493.6744 521.3902 549.1060 3 b
CVXR_ECOS 219.0252 222.9561 224.3361 226.8871 226.9915 227.0960 3 a
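For completeness, here is a minimal CVXR sketch of the same LP, under the assumption that the c, A1, A2, A3 and b objects from the question are in memory (cvec is only introduced to avoid reusing the name c; the constraint comments are my reading of the blocks built above):
library(CVXR)
cvec <- c                               # the cost vector built in the question
x <- Variable(N + 1 + K)                # decision vector
objective <- Minimize(t(cvec) %*% x)
constraints <- list(t(A1) %*% x == 1,   # weights sum to one
                    t(A2) %*% x >= 98,  # threshold constraint
                    A3 %*% x >= 0)      # scenario constraints
prob <- Problem(objective, constraints)
res <- solve(prob, solver = "ECOS")     # or solver = "GLPK"
res$value                               # optimal objective value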

Related

How can I vectorize a for loop used for permutation?

I am using R for analysis and would like to perform a permutation test. For this I am using a for loop that is quite slow and I would like to make the code as fast as possible. I think that vectorization is key for this. However, after several days of trying I still haven't found a suitable way to re-code this. I would deeply appreciate your help!
I have a symmetric matrix with pairwise ecological distances between populations ("dist.mat"). I want to randomly shuffle the rows and columns of this distance matrix to generate a permuted distance matrix ("dist.mat.mix"). Then, I would like to save the upper triangular values of this permuted distance matrix (there are "nr.pairs" of them). This process should be repeated several times ("nr.runs"). The result should be a matrix ("result") containing the permuted upper triangular values of the several runs, with dimensions nrow=nr.runs and ncol=nr.pairs. Below is example R code that does what I want using a for loop:
# example number of populations
nr.pops <- 20
# example distance matrix
dist.mat <- as.matrix(dist(matrix(rnorm(nr.pops * 5), nr.pops, 5)))
# example number of runs
nr.runs <- 1000
# find number of unique pairwise distances in distance matrix
nr.pairs <- nr.pops*(nr.pops-1) / 2
# start loop
result <- matrix(NA, nr.runs, nr.pairs)
for (i in 1:nr.runs) {
  mix <- sample(nr.pops, replace = FALSE)
  dist.mat.mix <- dist.mat[mix, mix]
  result[i, ] <- dist.mat.mix[upper.tri(dist.mat.mix, diag = FALSE)]
}
# inspect result
result
I already made some clumsy vectorization attempts with the base::replicate function, but this doesn't speed things up. Actually it's a bit slower:
# my for loop approach
my.for.loop <- function() {
  result <- matrix(NA, nr.runs, nr.pairs)
  for (i in 1:nr.runs) {
    mix <- sample(nr.pops, replace = FALSE)
    dist.mat.mix <- dist.mat[mix, mix]
    result[i, ] <- dist.mat.mix[upper.tri(dist.mat.mix, diag = FALSE)]
  }
  result
}
# my replicate approach
my.replicate <- function() {
  results <- t(replicate(nr.runs, {
    mix <- sample(nr.pops, replace = FALSE)
    dist.mat.mix <- dist.mat[mix, mix]
    dist.mat.mix[upper.tri(dist.mat.mix, diag = FALSE)]
  }))
  results
}
# compare speed
require(microbenchmark)
microbenchmark(my.for.loop(), my.replicate(), times=100L)
# Unit: milliseconds
# expr min lq mean median uq max neval
# my.for.loop() 23.1792 24.4759 27.1274 25.5134 29.0666 61.5616 100
# my.replicate() 25.5293 27.4649 30.3495 30.2533 31.4267 68.6930 100
I would deeply appreciate your support in case you know how to speed up my for loop using a neat vectorized solution. Is this even possible?
Slightly faster:
minem <- function() {
  result <- matrix(NA, nr.runs, nr.pairs)
  ut <- upper.tri(matrix(NA, nr.pops, nr.pops)) # create upper triangular index matrix outside the loop
  for (i in 1:nr.runs) {
    mix <- sample.int(nr.pops) # slightly faster sampling function
    result[i, ] <- dist.mat[mix, mix][ut]
  }
  result
}
microbenchmark(my.for.loop(), my.replicate(), minem(), times = 100L)
# Unit: microseconds
# expr min lq mean median uq max neval cld
# my.for.loop() 75.062 78.222 96.25288 80.1975 104.6915 249.284 100 a
# my.replicate() 118.519 122.667 152.25681 126.0250 165.1355 495.407 100 a
# minem() 45.432 48.000 104.23702 49.5800 52.9380 4848.986 100 a
Update:
We can compute the necessary matrix indices a little differently, so that we can subset all the elements at once:
minem4 <- function() {
  n <- dim(dist.mat)[1]
  ut <- upper.tri(matrix(NA, n, n))
  im <- matrix(1:n, n, n)
  p1 <- im[ut]
  p2 <- t(im)[ut]
  dm <- unlist(dist.mat)
  si <- replicate(nr.runs, sample.int(nr.pops))
  p <- (si[p1, ] - 1L) * n + si[p2, ]
  result2 <- matrix(dm[p], nr.runs, nr.pairs, byrow = TRUE)
  result2
}
microbenchmark(my.for.loop(), minem(), minem4(), times = 100L)
# Unit: milliseconds
# expr min lq mean median uq max neval cld
# my.for.loop() 13.797526 14.977970 19.14794 17.071401 23.161867 29.98952 100 b
# minem() 8.366614 9.080490 11.82558 9.701725 15.748537 24.44325 100 a
# minem4() 7.716343 8.169477 11.91422 8.723947 9.997626 208.90895 100 a
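As a sanity check (not part of the original answer), we can verify for a single permutation that the precomputed linear indices reproduce dist.mat[mix, mix][ut]; the two orderings of each index pair give the same value because dist.mat is symmetric:
n  <- dim(dist.mat)[1]
ut <- upper.tri(matrix(NA, n, n))
im <- matrix(1:n, n, n)
p1 <- im[ut]     # row indices of the upper-triangular cells
p2 <- t(im)[ut]  # column indices of the upper-triangular cells
dm <- as.vector(dist.mat)
mix <- sample.int(nr.pops)
all.equal(dist.mat[mix, mix][ut], dm[(mix[p1] - 1L) * n + mix[p2]])
# [1] TRUE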
Update 2:
We can get some additional speedup by using the sample function from the dqrng package:
minem5 <- function() {
  n <- dim(dist.mat)[1]
  ut <- upper.tri(matrix(NA, n, n))
  im <- matrix(1:n, n, n)
  p1 <- im[ut]
  p2 <- t(im)[ut]
  dm <- unlist(dist.mat)
  require(dqrng)
  si <- replicate(nr.runs, dqsample.int(nr.pops))
  p <- (si[p1, ] - 1L) * n + si[p2, ]
  result2 <- matrix(dm[p], nr.runs, nr.pairs, byrow = TRUE)
  result2
}
microbenchmark(my.for.loop(), minem(), minem4(), minem5(), times = 100L)
# Unit: milliseconds
# expr min lq mean median uq max neval cld
# my.for.loop() 13.648983 14.672587 17.713467 15.265771 16.967894 36.18290 100 d
# minem() 8.282466 8.773725 10.679960 9.279602 10.335206 27.03683 100 c
# minem4() 7.719503 8.208984 9.039870 8.493231 9.097873 25.32463 100 b
# minem5() 6.134911 6.379850 7.226348 6.733035 7.195849 19.02458 100 a

Efficiently find set differences and generate random sample

I have a very large data set with categorical labels a and a vector b that contains all possible labels in the data set:
a <- c(1,1,3,2) # artificial data
b <- c(1,2,3,4) # fixed categories
Now I want to find for each observation in a the set of all remaining categories (that is, the elements of b excluding the given observation in a). From these remaining categories, I want to sample one at random.
My approach using a loop is
goal <- numeric() # container for results
for (i in 1:4) {
  d <- setdiff(b, a[i]) # find the categories other than the one observed in the data
  goal[i] <- sample(d, 1) # sample one of the remaining categories at random
}
goal
[1] 4 4 1 1
However, this has to be done a large number of times and applied to very large data sets. Does anyone have a more efficient version that leads to the desired result?
EDIT:
The function by akrun is unfortunately slower than the original loop. If anyone has a creative idea with a competitive result, I'm happy to hear it!
We can use vapply
vapply(a, function(x) sample(setdiff(b, x), 1), numeric(1))
set.seed(24)
a <- sample(c(1:4), 10000, replace=TRUE)
b <- 1:4
system.time(vapply(a, function(x) sample(setdiff(b, x), 1), numeric(1)))
# user system elapsed
# 0.208 0.007 0.215
It turns out that resampling only the drawn labels that are equal to the labels in the data is an even faster approach:
test <- sample(b, length(a), replace = TRUE)
resample <- (a == test)
while (any(resample)) {
  test[resample] <- sample(b, sum(resample), replace = TRUE)
  resample <- (a == test)
}
Updated Benchmarks for N=10,000:
Unit: microseconds
expr min lq mean median uq max neval
loop 14337.492 14954.595 16172.2165 15227.010 15585.5960 24071.727 100
akrun 14899.000 15507.978 16271.2095 15736.985 16050.6690 24085.839 100
resample 87.242 102.423 113.4057 112.473 122.0955 174.056 100
shree(data = a, labels = b) 5195.128 5369.610 5472.4480 5454.499 5574.0285 5796.836 100
shree_mapply(data = a, labels = b) 1500.207 1622.516 1913.1614 1682.814 1754.0190 10449.271 100
Update: Here's a fast version with mapply. This method avoids calling sample() separately for every element, so it is a bit faster:
mapply(function(x, y) b[!b == x][y], a, sample(length(b) - 1, length(a), replace = T))
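A quick illustrative check (not from the original answer) that the mapply version never returns the observed label:
set.seed(1)
a <- sample(1:4, 10000, replace = TRUE)
b <- 1:4
out <- mapply(function(x, y) b[!b == x][y],
              a, sample(length(b) - 1, length(a), replace = TRUE))
any(out == a)
# [1] FALSE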
Here's a version without setdiff (setdiff can be a bit slow), although I think even more optimization is possible:
vapply(a, function(x) sample(b[!b == x], 1), numeric(1))
Benchmarks:
set.seed(24)
a <- sample(c(1:4), 1000, replace=TRUE)
b <- 1:4
microbenchmark::microbenchmark(
  akrun = vapply(a, function(x) sample(setdiff(b, x), 1), numeric(1)),
  shree = vapply(a, function(x) sample(b[!b == x], 1), numeric(1)),
  shree_mapply = mapply(function(x, y) b[!b == x][y], a, sample(length(b) - 1, length(a), replace = TRUE))
)
Unit: milliseconds
expr min lq mean median uq max neval
akrun 28.7347 30.66955 38.319655 32.57875 37.45455 237.1690 100
shree 5.6271 6.05740 7.531964 6.47270 6.87375 45.9081 100
shree_mapply 1.8286 2.01215 2.628989 2.14900 2.54525 7.7700 100

How to do R multiplication with Nx1 1xM for Matrix NxM?

I want to do a simple column (Nx1) times row (1xM) multiplication, resulting in (NxM) matrix.
Here is the code, where I create a row as a sequence and a column by transposing a similar sequence:
row1 <- seq(1:6)
col1 <- t(seq(1:6))
col1 * row1
The output, which suggests that R treats these more like plain vectors than matrices:
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 4 9 16 25 36
Expected output: NxM matrix.
OS: Debian 8.5
Linux kernel: 4.6 backports
Hardware: Asus Zenbook UX303UA
In this case using outer would be a more natural choice
outer(1:6, 1:6)
In general for two numerical vectors x and y, the matrix rank-1 operation can be computed as
outer(x, y)
If you want to resort to real matrix multiplication routines, use tcrossprod:
tcrossprod(x, y)
If either of your x and y is a matrix with a dim attribute, use as.numeric to cast it to a plain vector first.
It is not recommended to use the general matrix multiplication operator "%*%" for this. But if you want to, make sure the dimensions are conformable: x must be a one-column matrix and y a one-row matrix, so that x %*% y works.
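As a small equivalence check (not in the original answer), all three formulations produce the same NxM matrix for the example from the question:
x <- 1:6
y <- 1:6
m1 <- outer(x, y)
m2 <- tcrossprod(x, y)
m3 <- matrix(x, ncol = 1) %*% matrix(y, nrow = 1)
all.equal(m1, m2) # TRUE
all.equal(m2, m3) # TRUE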
Can you say anything about efficiency?
The rank-1 matrix operation is known to be memory-bound, so be sure to call gc() for garbage collection after every replicate to make R release memory from the heap (otherwise your system will stall):
x <- runif(500)
y <- runif(500)
xx <- matrix(x, ncol = 1)
yy <- matrix(y, nrow = 1)
system.time(replicate(200, {outer(x,y); gc();}))
# user system elapsed
# 4.484 0.324 4.837
system.time(replicate(200, {tcrossprod(x,y); gc();}))
# user system elapsed
# 4.320 0.324 4.653
system.time(replicate(200, {xx %*% yy; gc();}))
# user system elapsed
# 4.372 0.324 4.708
In terms of performance, they are all very similar.
Follow-up
When I came back I saw another answer with a different benchmark. Well, the thing is, it depends on the problem size. If you only try a small example, you cannot eliminate the function interpretation / calling overhead of the three functions. If you do
x <- y <- runif(500)
microbenchmark(tcrossprod(x,y), x %*% t(y), outer(x,y), times = 200)
you will see roughly identical performance again.
#Unit: milliseconds
# expr min lq mean median uq max neval cld
# tcrossprod(x, y) 2.09644 2.42466 3.402483 2.60424 3.94238 35.52176 200 a
# x %*% t(y) 2.22520 2.55678 3.707261 2.66722 4.05046 37.11660 200 a
# outer(x, y) 2.08496 2.55424 3.695660 2.69512 4.08938 35.41044 200 a
Here's a comparison of the execution speed for the three methods when the vectors being used are of length 100. The fastest is tcrossprod, with x%*%t(y) taking 17% longer and outer(x,y) taking 45% longer (in median time).
In the table, neval is the number of times the function was evaluated to get the benchmark scores.
> x <- runif(100,0,100)
> y <- runif(100,0,100)
> microbenchmark(tcrossprod(x,y), x%*%t(y), outer(x,y), times=5000)
Unit: microseconds
expr min lq mean median uq max neval
tcrossprod(x, y) 11.404 16.6140 50.42392 17.7300 18.7555 5590.103 5000
x %*% t(y) 13.878 19.4315 48.80170 20.5405 21.7310 4459.517 5000
outer(x, y) 19.238 24.0810 72.05250 25.3595 26.8920 89861.855 5000
To get a plot of the benchmark results, run
library("ggplot2")
bench <- microbenchmark(tcrossprod(x, y), x %*% t(y), outer(x, y), times = 5000)
autoplot(bench)
Edit: The performance depends on the size of x and y, and of course the machine running the code. I originally did the benchmark with vectors of length 100 because that's what Masi asked about. However, it appears the three methods have very similar performance for larger vectors. For vectors of length 1000, the median times of the three methods are within 5% of each other on my machine.
> x <- runif(1000)
> y <- runif(1000)
> microbenchmark(tcrossprod(x,y),x%*%t(y),outer(x,y),times=2000)
Unit: milliseconds
expr min lq mean median uq max neval
tcrossprod(x, y) 1.870282 2.030541 4.721175 2.916133 4.482346 75.77459 2000
x %*% t(y) 1.861947 2.067908 4.921061 3.067670 4.527197 105.60500 2000
outer(x, y) 1.886348 2.078958 5.114886 3.033927 4.556067 93.93450 2000
An easy way to look at this is to convert your vectors to matrices
row1.mat = matrix(row1)
col1.mat = matrix(col1)
and then use dim to see the dimension of the matrices:
dim(row1.mat)
dim(col1.mat)
If you want the product to work here, you need a 6x1 matrix multiplied by a 1x6 matrix, so you need to transpose col1.mat using t(col1.mat).
And, as you might know, the matrix product operator is %*%
row1.mat %*% t(col1.mat)
Comparison of this method to others
library("microbenchmark")
x <- runif(1000)
y <- runif(1000)
xx = matrix(x)
yy = matrix(y)
microbenchmark(tcrossprod(x,y),x%*%t(y),outer(x,y), xx %*% t(yy), times=2000)
Unit: milliseconds
expr min lq mean median uq max neval
tcrossprod(x, y) 2.829099 3.243785 6.015880 4.801640 5.040636 77.87932 2000
x %*% t(y) 2.847175 3.251414 5.942841 4.810261 5.049474 86.53374 2000
outer(x, y) 2.886059 3.277811 5.983455 4.788054 5.074997 96.12442 2000
xx %*% t(yy) 2.868185 3.255833 6.126183 4.699884 5.056234 87.80024 2000

Why is the diag function so slow? [in R 3.2.0 or earlier]

I was looking at the benchmarks in this answer, and wanted to compare them with diag (used in a different answer). Unfortunately, it seems that diag takes ages:
library(microbenchmark)
nc <- 1e4
set.seed(1)
m <- matrix(sample(letters, nc^2, replace = TRUE), ncol = nc)
microbenchmark(
  diag = diag(m),
  cond = m[row(m) == col(m)],
  vec = m[(1:nc - 1L) * nc + 1:nc],
  mat = m[cbind(1:nc, 1:nc)],
  times = 10)
Comments: I verified with identical() that these all return the same result. I took "cond" from one of the answers to this homework question. Results are similar with a matrix of integers (1:26 instead of letters).
Results:
Unit: microseconds
expr min lq mean median uq max neval
diag 604343.469 629819.260 710371.3320 706842.3890 793144.019 837115.504 10
cond 3862039.512 3985784.025 4175724.0390 4186317.5260 4312493.742 4617117.706 10
vec 317.088 329.017 432.9099 350.1005 629.460 651.376 10
mat 272.147 292.953 441.7045 345.9400 637.506 706.860 10
It is just a matrix-subsetting operation, so I don't know why there's so much overhead. Looking inside the function, I see a few checks and then c(m)[v], where v is the same vector used in the "vec" benchmark. Timing these two...
v <- (1:nc-1L)*nc+1:nc
microbenchmark(diaglike=c(m)[v],vec=m[v])
# Unit: microseconds
# expr min lq mean median uq max neval
# diaglike 579224.436 664853.7450 720372.8105 712649.706 767281.5070 931976.707 100
# vec 334.843 339.8365 568.7808 646.799 663.5825 1445.067 100
...it seems I have found my culprit. So, the new variation on my question is: Why is there a seemingly unnecessary and very time-consuming c in diag?
Summary
As of R version 3.2.1 (World-Famous Astronaut) diag() has received an update. The discussion moved to r-devel where it was noted that c() strips non-name attributes and may have been why it was placed there. While some people worried that removing c() would cause unknown issues on matrix-like objects, Peter Dalgaard found that, "The only case where the c() inside diag() has an effect is where M[i,j] != M[(i-1)*m+j] AND c(M) will stringize M in column-major order, so that M[i,j] == c(M)[(i-1)*m+j]."
Luke Tierney tested @Frank's removal of c() and found that it did not affect anything on CRAN or Bioconductor, so the change replacing c(x)[...] with x[...] on line 27 was implemented. This leads to relatively large speedups in diag(). Below is a speed test showing the improvement with R 3.2.1's version of diag().
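Note that diagOld() is not defined in the snippet below; presumably it is a copy of the pre-3.2.1 diag(). For anyone who wants to reproduce the comparison, a rough stand-in that mimics just the slow c(x)[...] code path (an illustrative sketch, not the actual old source) could be:
diagOld <- function(x) {
  # coerce the whole matrix with c() before linear indexing,
  # which is essentially what the pre-3.2.1 diag() did
  n <- min(dim(x))
  c(x)[seq.int(1L, by = nrow(x) + 1L, length.out = n)]
}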
library(microbenchmark)
nc <- 1e4
set.seed(1)
m <- matrix(sample(letters,nc^2,replace=TRUE), ncol = nc)
microbenchmark(diagOld(m),diag(m))
Unit: microseconds
expr min lq mean median uq max neval
diagOld(m) 451189.242 526622.2775 545116.5668 531905.5635 540008.704 682223.733 100
diag(m) 222.563 646.8675 644.7444 714.4575 740.701 1015.459 100
