R: Changing the diagonal of a sparse matrix is very slow

I have a sparse matrix with zeros on the main diagonal that I want to change to ones, but compared to a QR decomposition the operation is very slow:
mat <- matrix(c(0,1,1,1,0,1,1,1,0),ncol=3)
mat1 <- Matrix::bdiag(mat,mat,mat)
mat2 <- Matrix::bdiag(mat,mat,mat)
identity_mat <- Matrix::Diagonal(9)
microbenchmark::microbenchmark(
qr(mat1),
Matrix::diag(mat2) <- 1,
mat1 + identity_mat
)
results in
Unit: microseconds
expr min lq mean median uq max neval
qr(mat1) 55.825 69.0080 79.16561 72.9365 85.6095 149.676 100
Matrix::diag(mat2) <- 1 302.172 326.2365 379.60509 364.1985 401.8005 756.477 100
mat1 + identity_mat 1714.578 1762.8665 2006.50270 1974.4125 2073.1795 6671.644 100
How can I set the diagonal to ones faster?

This is slightly faster (it uses the triplet sparse matrix instead of the compressed one).
N <- 3
mat <- matrix(c(0,1,1,1,0,1,1,1,0),ncol=3)
mat1 <- do.call(Matrix::bdiag, replicate(N, mat, simplify = FALSE))
mat2 <- do.call(Matrix::bdiag, replicate(N, mat, simplify = FALSE))
mat3 <- Matrix::.bdiag(replicate(N, mat, simplify = FALSE))
identity_mat <- Matrix::Diagonal(3*N)
microbenchmark::microbenchmark(
qr(mat1),
Matrix::diag(mat2) <- 1,
Matrix::diag(mat3) <- 1,
mat1 + identity_mat
)
#> Unit: microseconds
#> expr min lq mean median uq max neval
#> qr(mat1) 50.519 65.8000 83.40258 74.1075 84.9095 451.866 100
#> Matrix::diag(mat2) <- 1 266.200 318.6375 452.58706 338.8715 405.3270 5460.654 100
#> Matrix::diag(mat3) <- 1 164.340 181.7700 246.14324 204.1055 235.4700 3083.771 100
#> mat1 + identity_mat 1519.636 1739.8940 2297.10306 1863.0430 2251.7720 18617.782 100
For much larger matrices these barely take any longer (below is for N = 300), which makes me wonder if it's just making the S4 objects that is slow (there's probably a lot of validation going on in the background).
N <- 300
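The construction code for this size is not repeated in the original; presumably it is the same as above, just re-run with N = 300:
mat1 <- do.call(Matrix::bdiag, replicate(N, mat, simplify = FALSE))
mat2 <- do.call(Matrix::bdiag, replicate(N, mat, simplify = FALSE))
mat3 <- Matrix::.bdiag(replicate(N, mat, simplify = FALSE))
identity_mat <- Matrix::Diagonal(3*N)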
#> Unit: microseconds
#> expr min lq mean median uq max neval
#> qr(mat1) 239799.888 251484.867 260169.1626 257957.9940 265350.8880 321234.482 100
#> Matrix::diag(mat2) <- 1 396.399 415.131 529.8535 495.5805 575.4920 2367.596 100
#> Matrix::diag(mat3) <- 1 257.128 276.636 361.8436 322.2445 380.6375 2210.064 100
#> mat1 + identity_mat 1605.454 1692.756 2176.5367 1833.2210 2000.9815 16803.231 100
If you can make assumptions about your matrices you may be able to hack it to work faster. In particular, if the matrix you are writing the diagonal to has no entries on the diagonal beforehand (as in your example), you could do this:
N <- 3
mat4 <- Matrix::.bdiag(replicate(N, mat, simplify = FALSE))
insert_diagonal <- function(m, d) {
  # append the diagonal entries directly to the triplet slots
  m@i <- c(m@i, 0:(d-1))
  m@j <- c(m@j, 0:(d-1))
  m@x <- c(m@x, rep(1, d))
  m
}
microbenchmark::microbenchmark(
qr(mat1),
Matrix::diag(mat2) <- 1,
Matrix::diag(mat3) <- 1,
insert_diagonal(mat4, 3*N)
)
#> Unit: microseconds
#> expr min lq mean median uq max neval
#> qr(mat1) 63.885 81.0315 97.14267 90.4660 99.4635 413.534 100
#> Matrix::diag(mat2) <- 1 325.229 368.2320 417.94677 408.6095 425.9595 755.734 100
#> Matrix::diag(mat3) <- 1 195.907 212.6790 266.83832 249.9585 266.0280 796.030 100
#> insert_diagonal(mat4, 3 * N) 23.676 30.2365 35.59022 35.5075 39.2745 62.028 100
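Since insert_diagonal writes to the slots directly, it skips the validity checks that Matrix normally performs; a one-off sanity check afterwards may be worthwhile (my own suggestion, not part of the original answer):
mat5 <- insert_diagonal(mat4, 3*N)  # hypothetical name for the patched matrix
methods::validObject(mat5)          # errors if the triplet representation is malformed
all(Matrix::diag(mat5) == 1)        # TRUE: the diagonal is now ones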

You can try
`diag<-`(as.matrix(mat1), 1)
and benchmark
microbenchmark::microbenchmark(
qr(mat1),
Matrix::diag(mat2) <- 1,
`diag<-`(as.matrix(mat1), 1)
)
gives
Unit: microseconds
expr min lq mean median uq max neval
qr(mat1) 52.4 56.55 61.457 60.85 64.05 103.5 100
Matrix::diag(mat2) <- 1 269.2 275.35 290.202 282.75 297.65 443.4 100
`diag<-`(as.matrix(mat1), 1) 38.6 42.40 47.721 46.10 48.90 147.4 100
Update
microbenchmark::microbenchmark(
qr(mat1),
Matrix::diag(mat2) <- 1,
as(`diag<-`(as.matrix(mat1), 1), "sparseMatrix")
)
shows
Unit: microseconds
expr min lq mean median uq max neval
qr(mat1) 50.6 56.55 61.592 60.2 64.00 179.5 100
Matrix::diag(mat2) <- 1 270.6 277.20 290.993 280.3 294.35 492.3 100
as(`diag<-`(as.matrix(mat1), 1), "sparseMatrix") 96.6 105.45 111.285 112.1 115.25 136.2 100

Here's a solution using RcppArmadillo:
Rcpp::cppFunction(code="
arma::sp_mat set_unit_diagonal(arma::sp_mat& A) {
  A.diag().ones();
  return A;
}", depends="RcppArmadillo")
mat <- matrix(c(0,1,1,1,0,1,1,1,0),ncol=3)
mat1 <- Matrix::bdiag(mat,mat,mat)
mat2 <- Matrix::bdiag(mat,mat,mat)
mat3 <- Matrix::bdiag(mat,mat,mat)
identity_mat <- Matrix::Diagonal(9)
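The benchmark call itself is not shown in the answer; presumably it is the same as before with the new function added:
microbenchmark::microbenchmark(
  qr(mat1),
  Matrix::diag(mat2) <- 1,
  mat1 + identity_mat,
  set_unit_diagonal(mat3)
)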
The solution is very quick:
Unit: microseconds
expr min lq mean median uq max neval
qr(mat1) 52.374 62.4805 69.12798 65.747 75.4455 110.630 100
Matrix::diag(mat2) <- 1 272.026 289.9080 323.98992 300.557 358.5355 419.486 100
mat1 + identity_mat 1543.191 1620.2835 1913.68513 1637.990 1970.8930 13572.716 100
set_unit_diagonal(mat3) 7.426 11.9100 14.00686 13.970 15.5955 26.484 100

Related

How to make this function run quickly in R - it should take around 1 second and it's around 19 now

# BEGIN CODE
my.kernel <- function(Yt){
  for (i in 1:length(Yt)) {
    Yt[i] <- ifelse(abs(Yt[i]) <= 1, (35/32)*(1 - Yt[i]^2)^3, 0)}
  Yt}
# Print results
my.kernel.density.estimator <- function(y,Yt,h){
  result <- 0
  for(i in 1:length(Yt)){
    result <- result + (1/(length(Yt)*h))*my.kernel((Yt[i]-y)/h)}
  result}
# Print results
my.loglik.cv <- function(Yt,h){
  result <- 0
  for(i in 1:length(Yt)){
    result <- result + log(my.kernel.density.estimator(Yt[i],Yt[-i],h))}
  result}
# Print the results
# END CODE
Yt, h and y can be any vector/number. Here is one example.
Yt<- seq(0, 10, 0.01)
h <- 1
y<- 1
The main point is to understand how to make it run faster.
In R, mathematical operations are vectorized: you do not need to apply the same operation to each vector element separately; you can perform it on all elements at once.
The function
my.kernel <- function(Yt){
  for (i in 1:length(Yt)) {
    Yt[i] <- ifelse(abs(Yt[i]) <= 1, (35/32)*(1 - Yt[i]^2)^3, 0)}
  Yt}
can be rewritten as
my.kernel.vec <- function(x) ifelse(abs(x) <= 1, (35/32)*(1 - x^2)^3, 0)
Yt <- seq(0, 10, 0.01)
h <- 1
y <- 1
all.equal(my.kernel(Yt),
my.kernel.vec(Yt))
#output
TRUE
The difference in speed is not minor:
library(microbenchmark)
microbenchmark(my.kernel(Yt),
my.kernel.vec(Yt))
Unit: microseconds
expr min lq mean median uq max neval cld
my.kernel(Yt) 1110.8 1179.2 1438.136 1311.35 1708.9 6756.4 100 b
my.kernel.vec(Yt) 54.3 66.3 104.204 70.20 74.3 3495.4 100 a
That is quite a speedup.
Similarly
my.kernel.density.estimator <- function(y,Yt,h){
  result <- 0
  for(i in 1:length(Yt)){
    result <- result + (1/(length(Yt)*h))*my.kernel((Yt[i]-y)/h)}
  result}
can be changed to utilize R's vectorized operations:
my.kernel.density.estimator.vec <- function(y,Yt,h) sum((1/(length(Yt)*h))*my.kernel.vec((Yt-y)/h))
all.equal(my.kernel.density.estimator.vec(1, Yt, 1),
my.kernel.density.estimator(1, Yt, 1))
#output
TRUE
microbenchmark(my.kernel.density.estimator.vec(1, Yt, 1),
my.kernel.density.estimator(1, Yt, 1))
Unit: microseconds
expr min lq mean median uq max neval cld
my.kernel.density.estimator.vec(1, Yt, 1) 57.8 59.6 101.918 63.10 70.25 3716.4 100 a
my.kernel.density.estimator(1, Yt, 1) 2110.8 2163.6 2285.316 2231.35 2283.20 7826.7 100 b
Finally in
my.loglik.cv <- function(Yt,h){
  result <- 0
  for(i in 1:length(Yt)){
    result <- result + log(my.kernel.density.estimator(Yt[i],Yt[-i],h))}
  result}
you need to loop in order to create the vectors Yt[i] and Yt[-i], so the loop is left as is; the speedup in the benchmark below comes from calling the vectorized estimator inside it (a presumed definition is sketched next).
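my.loglik.cv.vec is not defined in the answer; presumably it is the same leave-one-out loop with my.kernel.density.estimator.vec swapped in, roughly:
my.loglik.cv.vec <- function(Yt,h){
  result <- 0
  for(i in 1:length(Yt)){
    result <- result + log(my.kernel.density.estimator.vec(Yt[i],Yt[-i],h))}
  result}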
microbenchmark(my.loglik.cv.vec(Yt, 1),
my.loglik.cv(Yt, 1), times = 10)
Unit: milliseconds
expr min lq mean median uq max neval cld
my.loglik.cv.vec(Yt, 1) 59.1957 59.6794 79.13856 90.46365 92.7877 93.4487 10 a
my.loglik.cv(Yt, 1) 2240.7176 2280.7737 2309.83982 2299.39885 2343.6714 2412.8111 10 b
Not to mention the speedup on larger vectors:
Yt <- seq(0, 10, 0.001)
microbenchmark(my.loglik.cv.vec(Yt, 1),
my.loglik.cv(Yt, 1), times = 1)
Unit: seconds
expr min lq mean median uq max neval
my.loglik.cv.vec(Yt, 1) 5.460431 5.460431 5.460431 5.460431 5.460431 5.460431 1
my.loglik.cv(Yt, 1) 230.221194 230.221194 230.221194 230.221194 230.221194 230.221194 1

Is there a way to find the index of the last element in a vector that has an actual value (i.e. not NA) in R

This is my vector
x <- c("1", "1", "PNP004", "10", "10", NA, NA)
I need to find a way to return the index/value of the final element that is not NA.
Does anyone know a good way of doing this?
Any help is appreciated!
Ind <- max(which(!is.na(yourvec)))
yourvec[Ind]
Using dplyr:
dplyr::last(which(!is.na(yourvec)))
You can use tail on the result of which applied to !is.na of your vector.
tail(which(!is.na(x)), 1)
#[1] 5
Or a for loop.
idxLNNA <- function(x) {
  if(length(x) > 0) {
    for(i in length(x):1) if(!is.na(x[i])) break
    if(i == 1 & is.na(x[i])) {0} else {i}
  } else {0}
}
idxLNNA(x)
#[1] 5
Or using cumsum with which.max, which will return 1 in case there are only NAs.
which.max(cumsum(!is.na(x)))
#[1] 5
Or subtracting the rev hit from the length of x.
length(x) - which.min(rev(is.na(x))) + 1 #Will fail in case of only NA
#length(x) - match(FALSE, is.na(rev(x))) + 1 #Alternative
#[1] 5
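As a quick illustration of that caveat (my own example, not from the original answers), on an all-NA vector these approaches either point at an NA or fail outright:
x_na <- c(NA, NA, NA)
length(x_na) - which.min(rev(is.na(x_na))) + 1 # 3, but x_na[3] is NA
which.max(cumsum(!is.na(x_na)))                # 1, but x_na[1] is NA
max(which(!is.na(x_na)))                       # -Inf, with a warning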
Benchmark:
library(microbenchmark)
fun <- alist(Dason = max(which(!is.na(x)))
, juljo = dplyr::last(which(!is.na(x)))
, GKiTail = tail(which(!is.na(x)), 1)
, GKiCum = which.max(cumsum(!is.na(x)))
, GKiRev = length(x) - which.min(rev(is.na(x))) + 1
, GKiMatch = length(x) - match(FALSE, is.na(rev(x))) + 1
, GKiFor = idxLNNA(x)
)
x <- numeric(1e6)
microbenchmark(list = fun, control=list(order="block"))
#Unit: microseconds
# expr min lq mean median uq max neval cld
# Dason 4855.744 5740.1355 7941.21809 7082.5535 7671.371 107165.201 100 bc
# juljo 4145.322 4616.7815 5756.38147 6134.9200 6625.008 7378.724 100 b
# GKiTail 4082.716 4434.3880 5576.70509 6051.1465 6489.966 7433.579 100 b
# GKiCum 6552.213 7445.1525 8632.12253 8988.8700 9420.481 16791.845 100 c
# GKiRev 4005.929 4138.4735 5845.70457 4212.7470 5851.034 101665.685 100 b
# GKiMatch 5180.600 5483.8545 7507.82723 5998.2760 7373.458 108327.967 100 bc
# GKiFor 1.541 1.5775 2.16462 1.6145 1.724 20.436 100 a
x <- rep(NA, 1e6) #Dason, GKiCum and GKiRev Fail
microbenchmark(list = fun[-c(1,4,5)], control=list(order="block"))
#Unit: milliseconds
# expr min lq mean median uq max neval cld
# juljo 3.011272 3.076340 3.525396 3.111676 3.494768 6.367839 100 a
# GKiTail 2.942336 3.014327 3.529691 3.063891 3.809653 6.136984 100 a
# GKiMatch 4.928626 4.975369 7.490588 5.039941 6.823780 98.194653 100 b
# GKiFor 155.078444 159.314918 163.706542 160.168266 163.464146 258.136977 100 c
x <- numeric(0) #Dason Fails
microbenchmark(list = fun[-1], control=list(order="block"))
Unit: nanoseconds
# expr min lq mean median uq max neval cld
# juljo 26794 27324.0 28694.75 27640.0 27933.0 120143 100 d
# GKiTail 6746 7027.5 7396.45 7206.5 7432.5 21898 100 c
# GKiCum 869 880.0 947.72 890.0 948.0 3403 100 a
# GKiRev 2466 2527.0 2657.99 2565.5 2652.0 8071 100 b
# GKiMatch 2739 2807.5 2919.78 2862.5 2935.5 5651 100 b
# GKiFor 492 512.5 671.74 537.5 604.5 9088 100 a

Better way to create a function that returns a gamma distribution?

So I have created a function that generates a data frame for a gamma distribution. Currently, I have:
sample_gamma <- function(alpha,beta,n,iter) {
  gamma.df <- as.data.frame(matrix(nrow = iter, ncol = 3))
  colnames(gamma.df) <- c("iteration","mean","standard dev")
  gamma.df$iteration <- c(1:iter)
  for (i in 1:iter) {
    gamma.dist <- rgamma(n,shape = alpha, rate = beta, scale = 1/beta)
    gamma.df[i,2] <- mean(gamma.dist)
    gamma.df[i,3] <- sd(gamma.dist)
  }
  print(gamma.df)
}
The function does everything I need it to do, but I was wondering if there are any alternate or cleaner ways to do it.
I would create a function which returns mean and sd for one iteration.
sample_gamma <- function(alpha,beta,n) {
  dist <- rgamma(n,shape = alpha, rate = beta)
  c(mean = mean(dist), sd = sd(dist))
}
and then repeat it using replicate
t(replicate(5, sample_gamma(2, 3, 4)))
# mean sd
#[1,] 0.5990206 0.2404226
#[2,] 0.6108976 0.3083426
#[3,] 1.0616542 0.4602403
#[4,] 0.3415355 0.1543885
#[5,] 1.0558066 0.9659599
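If, as in the question, a data frame with an iteration column is wanted, the replicate result can be wrapped (my own addition, hypothetical names):
res <- t(replicate(5, sample_gamma(2, 3, 4)))
gamma.df <- data.frame(iteration = seq_len(nrow(res)), res)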
While I think Ronak Shah's answer is simple and relatively idiomatic (R-wise), here's one that is a little more efficient when scaled to high iter counts (since it only does a single random-pull):
sample_gamma <- function(alpha, beta, n, iter) {
  mtx <- matrix(rgamma(n*iter, shape=alpha, rate=beta), nrow=n, ncol=iter)
  t(apply(mtx, 2, function(a) c(mean=mean(a), sd=sd(a))))
}
sample_gamma(2, 3, 4, 5)
# mean sd
# [1,] 0.6486220 0.22900833
# [2,] 0.8551055 0.07874287
# [3,] 0.7854750 0.72694260
# [4,] 0.7045878 0.24834502
# [5,] 1.1783301 0.25210538
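The benchmark below refers to sample_gamma_RS and sample_gamma_r2, which are not defined above; presumably they are just the two versions renamed so they can coexist:
sample_gamma_RS <- function(alpha, beta, n) {        # Ronak Shah's replicate-based version
  dist <- rgamma(n, shape = alpha, rate = beta)
  c(mean = mean(dist), sd = sd(dist))
}
sample_gamma_r2 <- function(alpha, beta, n, iter) {  # the single-pull matrix version above
  mtx <- matrix(rgamma(n*iter, shape=alpha, rate=beta), nrow=n, ncol=iter)
  t(apply(mtx, 2, function(a) c(mean=mean(a), sd=sd(a))))
}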
Benchmarking:
microbenchmark::microbenchmark(
RS=t(replicate(5, sample_gamma_RS(2,3,4))),
r2=sample_gamma_r2(2,3,4,5)
)
# Unit: microseconds
# expr min lq mean median uq max neval
# RS 413.7 493.70 757.884 743.80 946.1 1611.6 100
# r2 405.2 461.15 681.630 706.35 898.6 1348.2 100
microbenchmark::microbenchmark(
RS=t(replicate(500, sample_gamma_RS(2,3,4))),
r2=sample_gamma_r2(2,3,4,500)
)
# Unit: milliseconds
# expr min lq mean median uq max neval
# RS 31.271 40.58735 56.44298 57.85735 65.08605 95.1866 100
# r2 29.110 38.81230 53.99426 57.45820 61.35720 100.5820 100
microbenchmark::microbenchmark(
RS=t(replicate(500, sample_gamma_RS(2,3,400))),
r2=sample_gamma_r2(2,3,400,500)
)
# Unit: milliseconds
# expr min lq mean median uq max neval
# RS 60.6782 101.3112 121.3533 116.7464 140.8845 227.1904 100
# r2 66.3892 81.0329 106.9920 98.7170 126.7742 198.3947 100
I confess I thought it would be a more dramatic difference in performance.

Fastest way of norming vectors in a matrix

I'd like to sort out what is the fastest way to get the norms of a set of vectors contained in a matrix. I was using apply (this is an example; my matrices are much bigger):
a = matrix(1:9, 3,3)
norm_a = apply(a, 1, function(x) sqrt(sum(x^2)))
but then I wanted to speed up my code and moved to:
norm_a = sqrt(a^2%*%rep(1,dim(a)[2]))
which is actually much faster (as seen with system.time; I'm not an expert in benchmarking). But I haven't found any definitive answer to this question so far. Does anyone have an insight about this?
Thanks.
This depends on the size of your matrix:
library(microbenchmark)
microbenchmark(f1 = apply(a, 1, function(x) sqrt(sum(x^2))),
f2 = sqrt(a^2%*%rep(1,dim(a)[2])),
f3 = sqrt(rowSums(a^2)))
#Unit: microseconds
# expr min lq mean median uq max neval cld
# f1 44.656 46.812 52.21050 47.5815 49.4295 191.248 100 c
# f2 1.849 2.772 4.07532 4.3120 4.6210 16.323 100 a
# f3 6.160 7.392 9.25537 9.5480 10.1630 20.943 100 b
set.seed(42)
b <- matrix(rnorm(1e6), 1000)
microbenchmark(f1 = apply(b, 1, function(x) sqrt(sum(x^2))),
f2 = sqrt(b^2%*%rep(1,dim(b)[2])),
f3 = sqrt(rowSums(b^2)))
#Unit: milliseconds
# expr min lq mean median uq max neval cld
# f1 30.851752 55.513228 86.84168 109.439043 112.54796 152.27730 100 b
# f2 5.503050 7.434152 14.36080 8.861268 10.42327 66.41539 100 a
# f3 4.430403 5.895553 12.92235 7.359163 8.62321 74.65256 100 a

Check whether all elements of a list are equal in R

I have a list of several vectors. I would like to check whether all vectors in the list are equal. There's identical, which only works for pairwise comparison. So I wrote the following function, which looks ugly to me, but I did not find a better solution. Here's my reproducible example:
test_true <- list(a=c(1,2,3),b=c(1,2,3),d=c(1,2,3))
test_false <- list(a=c(1,2,3),b=c(1,2,3),d=c(1,32,13))
compareList <- function(li){
  stopifnot(length(li) > 1)
  l <- length(li)
  res <- lapply(li[-1],function(X,x) identical(X,x),x=li[[1]])
  res <- all(unlist(res))
  res
}
compareList(test_true)
compareList(test_false)
Any suggestions? Are there any native checks for identical for more than just pairwise comparison?
How about
allSame <- function(x) length(unique(x)) == 1
allSame(test_true)
# [1] TRUE
allSame(test_false)
# [1] FALSE
As @JoshuaUlrich pointed out below, unique may be slow on lists. Also, identical and unique may use different criteria. Reduce is a function I recently learned about for extending pairwise operations:
identicalValue <- function(x,y) if (identical(x,y)) x else FALSE
Reduce(identicalValue,test_true)
# [1] 1 2 3
Reduce(identicalValue,test_false)
# [1] FALSE
This inefficiently continues making comparisons after finding one non-match. My crude solution to that would be to write else break instead of else FALSE, throwing an error.
I would do:
all.identical <- function(l) all(mapply(identical, head(l, 1), tail(l, -1)))
all.identical(test_true)
# [1] TRUE
all.identical(test_false)
# [1] FALSE
To summarize the solutions, here is data for the tests:
x1 <- as.list(as.data.frame(replicate(1000, 1:100)))
x2 <- as.list(as.data.frame(replicate(1000, sample(1:100, 100))))
Solutions:
comp_list1 <- function(x) length(unique.default(x)) == 1L
comp_list2 <- function(x) all(vapply(x[-1], identical, logical(1L), x = x[[1]]))
comp_list3 <- function(x) all(vapply(x[-1], function(x2) all(x[[1]] == x2), logical(1L)))
comp_list4 <- function(x) sum(duplicated.default(x)) == length(x) - 1L
Test on the data:
for (i in 1:4) cat(match.fun(paste0("comp_list", i))(x1), " ")
#> TRUE TRUE TRUE TRUE
for (i in 1:4) cat(match.fun(paste0("comp_list", i))(x2), " ")
#> FALSE FALSE FALSE FALSE
Benchmarks:
library(microbenchmark)
microbenchmark(comp_list1(x1), comp_list2(x1), comp_list3(x1), comp_list4(x1))
#> Unit: microseconds
#> expr min lq mean median uq max neval cld
#> comp_list1(x1) 138.327 148.5955 171.9481 162.013 188.9315 269.342 100 a
#> comp_list2(x1) 1023.932 1125.2210 1387.6268 1255.985 1403.1885 3458.597 100 b
#> comp_list3(x1) 1130.275 1275.9940 1511.7916 1378.789 1550.8240 3254.292 100 c
#> comp_list4(x1) 138.075 144.8635 169.7833 159.954 185.1515 298.282 100 a
microbenchmark(comp_list1(x2), comp_list2(x2), comp_list3(x2), comp_list4(x2))
#> Unit: microseconds
#> expr min lq mean median uq max neval cld
#> comp_list1(x2) 139.492 140.3540 147.7695 145.380 149.6495 218.800 100 a
#> comp_list2(x2) 995.373 1030.4325 1179.2274 1054.711 1136.5050 3763.506 100 b
#> comp_list3(x2) 977.805 1029.7310 1134.3650 1049.684 1086.0730 2846.592 100 b
#> comp_list4(x2) 135.516 136.4685 150.7185 139.030 146.7170 345.985 100 a
As we can see, the most efficient solutions are based on the duplicated and unique functions.
Putting in my self-promoting suggestion of cgwtools::approxeq, which essentially does what all.equal does but returns a vector of logical values indicating equality or not.
So: it depends on whether you want exact equality or floating-point-representational equality.
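As a small illustration of that distinction (my own example, using base R rather than cgwtools):
a <- c(1, 2, 3)
b <- a + 1e-12
identical(a, b)          # FALSE: exact, representation-level comparison
isTRUE(all.equal(a, b))  # TRUE: equal within the default numerical tolerance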
UPDATE
The overall best solution:
all.identical.list <- function(l) identical(unname(l[-length(l)]), unname(l[-1]))
Implementing Frank's solution with a break:
all.identical <- function(l) class(try(Reduce(function(x, y) if(identical(x, y)) x else break, l), silent = TRUE)) != "try-error"
Continuing with Artem's benchmarking and adding the solution from Jake's comment, speeds are pretty dependent on the objects being compared, but all.identical.list is consistently the fastest (or very close to fastest):
library(microbenchmark)
all.identical.list <- function(l) identical(unname(l[-length(l)]), unname(l[-1]))
all.identical <- function(l) !is.null(Reduce(function(x, y) if(identical(x, y)) x else NULL, l))
all.identical.break <- function(l) class(try(Reduce(function(x, y) if(identical(x, y)) x else break, l), silent = TRUE)) != "try-error"
comp_list4 <- function(l) sum(duplicated.default(l)) == length(l) - 1L
comp_list5 <- function(l) all(duplicated.default(l)[-1])
x1 <- as.list(as.data.frame(replicate(1000, 1:100)))
x2 <- as.list(as.data.frame(replicate(1000, sample(100))))
microbenchmark(all.identical.list(x1), all.identical(x1), all.identical.break(x1), comp_list4(x1), comp_list5(x1), check = "equal")
#> Unit: microseconds
#> expr min lq mean median uq max neval
#> all.identical.list(x1) 60.3 66.65 125.803 72.90 94.30 3271.5 100
#> all.identical(x1) 1134.0 1209.45 1484.864 1265.85 1655.95 5085.3 100
#> all.identical.break(x1) 1156.6 1226.75 1602.869 1337.25 1698.05 5030.4 100
#> comp_list4(x1) 170.5 179.35 234.169 184.75 200.40 2164.1 100
#> comp_list5(x1) 173.3 182.35 213.542 187.55 194.50 1704.0 100
microbenchmark(all.identical.list(x2), all.identical(x2), all.identical.break(x2), comp_list4(x2), comp_list5(x2), check = "equal")
#> Unit: microseconds
#> expr min lq mean median uq max neval
#> all.identical.list(x2) 31.0 34.30 47.182 37.65 46.90 180.8 100
#> all.identical(x2) 1002.8 1059.85 1237.426 1106.65 1278.35 3404.4 100
#> all.identical.break(x2) 119.4 137.15 156.748 147.60 164.00 340.8 100
#> comp_list4(x2) 165.0 172.35 189.869 181.20 192.25 334.6 100
#> comp_list5(x2) 166.6 171.10 188.782 179.25 190.55 394.9 100
x1 <- as.list(as.data.frame(replicate(10, 1:1e5)))
x2 <- as.list(as.data.frame(replicate(10, sample(1e5))))
microbenchmark(all.identical.list(x1), all.identical(x1), all.identical.break(x1), comp_list4(x1), comp_list5(x1), check = "equal")
#> Unit: microseconds
#> expr min lq mean median uq max neval
#> all.identical.list(x1) 211.4 217.25 264.978 229.5 258.00 711.4 100
#> all.identical(x1) 182.2 187.50 218.062 195.3 217.05 499.4 100
#> all.identical.break(x1) 194.8 207.25 258.043 222.7 266.70 1013.4 100
#> comp_list4(x1) 1457.3 1495.30 1659.118 1543.0 1806.75 2689.0 100
#> comp_list5(x1) 1457.7 1502.45 1685.194 1553.5 1769.10 3021.2 100
microbenchmark(all.identical.list(x2), all.identical(x2), all.identical.break(x2), comp_list4(x2), comp_list5(x2), check = "equal")
#> Unit: microseconds
#> expr min lq mean median uq max neval
#> all.identical.list(x2) 3.1 4.45 7.894 6.35 9.85 48.5 100
#> all.identical(x2) 12.0 15.25 19.404 17.05 22.05 56.1 100
#> all.identical.break(x2) 114.3 128.80 172.876 144.90 190.45 511.5 100
#> comp_list4(x2) 1292.2 1342.35 1443.261 1397.00 1472.25 1908.5 100
#> comp_list5(x2) 1292.4 1364.90 1478.291 1409.50 1484.80 2467.2 100
This also works, comparing each pair of list elements:
m <- combn(length(test_true),2)
for(i in 1:ncol(m)){
  print(all(test_true[[m[,i][1]]] == test_true[[m[,i][2]]]))
}
