Efficiently finding minimum cell values from a set of matrices in R

I have a list of matrices (each of size n*n), and I need to create a new matrix giving the minimum value observed in each cell across the list.
For instance, with the following list of matrices:
> a = list(matrix(rexp(9), 3), matrix(rexp(9), 3), matrix(rexp(9), 3))
> a
[[1]]
[,1] [,2] [,3]
[1,] 0.5220069 0.39643016 0.04255687
[2,] 0.4464044 0.66029350 0.34116609
[3,] 2.2495949 0.01705576 0.08861866
[[2]]
[,1] [,2] [,3]
[1,] 0.3823704 0.271399 0.7388449
[2,] 0.1227819 1.160775 1.2131681
[3,] 0.1914548 1.004209 0.7628437
[[3]]
[,1] [,2] [,3]
[1,] 0.2125612 0.45379057 1.5987420
[2,] 0.3242311 0.02736743 0.4372894
[3,] 0.6634098 1.15401347 0.9008529
The output should be:
[,1] [,2] [,3]
[1,] 0.2125612 0.271399 0.04255687
[2,] 0.1227819 0.02736743 0.34116609
[3,] 0.1914548 0.01705576 0.08861866
I tried an apply loop with the following code (using melt and dcast from the reshape2 library):
library(reshape2)
all = melt(a)
allComps = unique(all[,c(1:2)])
allComps$min=apply(allComps, 1, function(x){
g1 = x[1]
g2 = x[2]
b = unlist(lapply(a, function(y){
return(y[g1,g2])
}))
return(b[which(b==min(b))])
})
dcast(allComps, Var1~Var2)
It works, but it takes a very long time to run on large matrices (6000*6000). I am looking for a faster way to do this.

Use Reduce with pmin:
Reduce(pmin, a)
# [,1] [,2] [,3]
#[1,] 0.02915345 0.03157736 0.3142273
#[2,] 0.57661027 0.05621098 0.1452668
#[3,] 0.48021473 0.18828404 0.4787604
Data:
set.seed(123)
a = list(matrix(rexp(9), 3), matrix(rexp(9), 3), matrix(rexp(9), 3))

Consider storing the matrices in an array instead of a list, which can be done with simplify2array. In an array, the minimum over specific dimensions can be found by using min in apply.
A <- simplify2array(a)
apply(A, 1:2, min)

We can use
apply(array(unlist(a), c(3, 3, 3)), 1:2, min)
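As a quick sanity check, here is a minimal sketch that runs the approaches above on the same data and confirms they agree. The seed and the variable names (m_reduce, m_docall, m_apply) are just illustrative choices, and the do.call(pmin, a) variant is an extra not mentioned in the answers; it relies on pmin accepting any number of arguments.

```r
# Reproducible example data (the seed is an arbitrary choice)
set.seed(123)
a <- list(matrix(rexp(9), 3), matrix(rexp(9), 3), matrix(rexp(9), 3))

# Element-wise minimum, folding pmin over the list two matrices at a time
m_reduce <- Reduce(pmin, a)

# Same idea, passing all matrices to a single pmin call
m_docall <- do.call(pmin, a)

# Stack into a 3 x 3 x 3 array, then take the min over the third dimension
m_apply <- apply(simplify2array(a), 1:2, min)

# All three should give the same 3 x 3 matrix
stopifnot(isTRUE(all.equal(m_reduce, m_docall)),
          isTRUE(all.equal(m_reduce, m_apply)))
```

On large matrices the pmin-based versions are likely to be the faster ones, since apply calls min once per cell, but it is worth timing on your own data.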

Related

Conditional function in R which returns a matrix

I am sorry in advance if that's a silly question, but I am a bit new to it.
I would like to write a for loop where the input is a time sequence. Based on the time conditions I would like to select either mat1, mat2, or mat3 to substitute the "mat" parameter and multiply it by 2.
output <- mat*2 #general function
For each time point, I need to have an output.
time=seq(0,10, by=1)
mat1 <- matrix(data = rexp(9, rate = 10), nrow = 3, ncol = 3)
mat2 <- matrix(data = rexp(9, rate = 10), nrow = 3, ncol = 3)
mat3 <- matrix(data = rexp(9, rate = 10), nrow = 3, ncol = 3)
When time <= 3, I would like "mat1" to be selected;
when time > 3 & time <= 6, "mat2";
and when time > 6, "mat3". The selected matrix should then be multiplied by 2.
I know that all this is a bit sketchy but any help would be highly appreciated.
By the way, if you want a list of consecutive integers you can simply use time <- 0:10
Here is one method
lapply(as.character(cut(time,c(-1,3.1,6.1,10),labels=c('mat1','mat2','mat3'))), function(x) get(x)*2)
[[1]]
[,1] [,2] [,3]
[1,] 0.4013379 1.2690301 0.142831401
[2,] 0.1536697 0.1132762 0.040964909
[3,] 0.1412248 0.2209273 0.007446217
[[2]]
[,1] [,2] [,3]
[1,] 0.4013379 1.2690301 0.142831401
[2,] 0.1536697 0.1132762 0.040964909
[3,] 0.1412248 0.2209273 0.007446217
...
[[10]]
[,1] [,2] [,3]
[1,] 0.16712782 0.06451693 0.06554605
[2,] 0.03614116 0.18526124 0.46443236
[3,] 0.53055007 0.01203971 0.16585931
[[11]]
[,1] [,2] [,3]
[1,] 0.16712782 0.06451693 0.06554605
[2,] 0.03614116 0.18526124 0.46443236
[3,] 0.53055007 0.01203971 0.16585931
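An alternative that avoids looking variables up by name with get() is to keep the matrices in a list and index into it. This is only a sketch: the names mats, idx and output are made up for illustration, and the breaks are chosen to match the conditions in the question (time <= 3, 3 < time <= 6, time > 6).

```r
time <- 0:10
mat1 <- matrix(rexp(9, rate = 10), nrow = 3, ncol = 3)
mat2 <- matrix(rexp(9, rate = 10), nrow = 3, ncol = 3)
mat3 <- matrix(rexp(9, rate = 10), nrow = 3, ncol = 3)

# Keep the candidate matrices together so they can be picked by position
mats <- list(mat1, mat2, mat3)

# labels = FALSE makes cut return integer codes 1, 2, 3 for the intervals
# (-Inf, 3], (3, 6] and (6, Inf)
idx <- cut(time, breaks = c(-Inf, 3, 6, Inf), labels = FALSE)

# One output matrix per time point
output <- lapply(idx, function(i) mats[[i]] * 2)
```

Here output[[1]] corresponds to time 0 and output[[11]] to time 10.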

Performing element-wise standard deviation in R with two matrices

As the title suggests, I am looking for a way to get the standard deviation per element from two separate matrices. However, I am quite the beginner at R and I can't seem to figure out how to do this. Below is an example of what I am trying to accomplish with a small sample of my data (first three rows).
I have two matrices with coordinates (df143 and df143_2, or matrices A and B as you will)
A:
[1,] 21.729504 -55.66055 -37.26477
[2,] 39.445610 -67.67449 -32.19464
[3,] 57.604027 -54.16734 -28.48679
B:
[1,] 21.706865 -55.50722 -37.57840
[2,] 39.553314 -67.68414 -31.95995
[3,] 57.286247 -54.13008 -28.44446
I am looking for a matrix output that shows the standard deviation per element across the two matrices.
You can do this in base R:
matrix(mapply(function(x,y) sd(c(x,y)),A, B), ncol=ncol(A))
# [,1] [,2] [,3]
#[1,] 0.01600819 0.10842068 0.22176990
#[2,] 0.07615823 0.00682358 0.16595089
#[3,] 0.22470439 0.02634680 0.02993183
I believe this is what you're looking to do:
library(abind)
a <- c(21.729504, -55.66055, -37.26477, 39.445610, -67.67449, -32.19464, 57.604027, -54.16734, -28.48679)
a <- matrix(a, ncol=3, byrow=TRUE)
b <- c(21.706865, -55.50722, -37.57840, 39.553314, -67.68414, -31.95995, 57.286247, -54.13008, -28.44446)
b <- matrix(b, ncol=3, byrow=TRUE)
m <- abind(a, b, along=3)
apply(m, 1:2, sd)
## [,1] [,2] [,3]
## [1,] 0.01600819 0.10842068 0.22176990
## [2,] 0.07615823 0.00682358 0.16595089
## [3,] 0.22470439 0.02634680 0.02993183
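For the special case of exactly two matrices there is also a closed form: the sample standard deviation of two values x and y is |x - y| / sqrt(2), so the whole thing can be done without mapply or apply. A small sketch, reusing the numbers from the question (sd_fast and sd_ref are just illustrative names):

```r
A <- matrix(c(21.729504, -55.66055, -37.26477,
              39.445610, -67.67449, -32.19464,
              57.604027, -54.16734, -28.48679), ncol = 3, byrow = TRUE)
B <- matrix(c(21.706865, -55.50722, -37.57840,
              39.553314, -67.68414, -31.95995,
              57.286247, -54.13008, -28.44446), ncol = 3, byrow = TRUE)

# sd(c(x, y)) equals abs(x - y) / sqrt(2) when there are only two values
sd_fast <- abs(A - B) / sqrt(2)

# Reference result using the mapply approach from the answer above
sd_ref <- matrix(mapply(function(x, y) sd(c(x, y)), A, B), ncol = ncol(A))

stopifnot(isTRUE(all.equal(sd_fast, sd_ref)))
```

This shortcut does not generalise to three or more matrices; for that, the abind/apply approach is the one to use.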

Fastest R equivalent to MATLAB's reshape() method?

I am converting a MATLAB script into R and regretting it so far, as it is slower at the moment. I'm trying to use "vectorized functions" as much as possible, but I'm relatively new to R and do not know what is meant by this. From my research for loops are only slower than the apply() method in R if you use loads of operators (including the parenthesis). Otherwise, I don't see what R could have done to slow down it further. Here is code that works that I want to speed up.
somPEs <- 9;
inputPEs <- 6;
initial_w <- matrix(1, nrow=somPEs, ncol=inputPEs)
w <- apply(initial_w, 1, function(i) runif(i));
# Reshape w to a 3D matrix of dimension: c(sqrt(somPEs), sqrt(somPEs), inputPEs)
nw <- array(0, dim=c(sqrt(somPEs), sqrt(somPEs), inputPEs))
for (i in 1:inputPEs) {
nw[,,i] <- matrix(w[i,], nrow=sqrt(somPEs), ncol=sqrt(somPEs), byrow=TRUE)
}
w <- nw;
In MATLAB, this is done by a built-in function called "reshape", as shown below:
w = reshape(w,[sqrt(somPEs) sqrt(somPEs) inputPEs]);
I timed my current R code and it's actually super fast, but I'd still like to learn about vectorization and how to convert my code to apply() for readability's sake.
user system elapsed
0.003 0.000 0.002
The first step is to convert your array w from 6x9 to 3x3x6 size, which in your case can be done by transposing and then changing the dimension:
neww <- t(w)
dim(neww) <- c(sqrt(somPEs), sqrt(somPEs), inputPEs)
This is almost what we want, except that the first two dimensions are flipped. You can use the aperm function to transpose them:
neww <- aperm(neww, c(2, 1, 3))
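To see why the transpose is needed, it may help to look at how R lays out data: like MATLAB's reshape, assigning to dim fills column by column. The toy sketch below (x and m are made-up names) shows this, and shows that transposing first is what turns row-wise reading into column-wise reading.

```r
# Assigning dim reshapes in column-major order
x <- 1:6
dim(x) <- c(2, 3)
x[, 1]   # first column holds the first two elements: 1 2

# A matrix filled by row stores its rows interleaved in memory ...
m <- matrix(1:6, nrow = 2, byrow = TRUE)
as.vector(m)      # 1 4 2 5 3 6

# ... so transposing first makes the underlying vector read row by row
as.vector(t(m))   # 1 2 3 4 5 6
```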
This should be a good deal quicker than looping through the matrix and individually copying over data by row. To see this, let's look at a larger example with 10,000 rows and 100 columns (which will be mapped to a 10x10x10,000 array):
josilber <- function(w) {
neww <- t(w)
dim(neww) <- c(sqrt(dim(w)[2]), sqrt(dim(w)[2]), dim(w)[1])
aperm(neww, c(2, 1, 3))
}
OP <- function(w) {
nw <- array(0, dim=c(sqrt(dim(w)[2]), sqrt(dim(w)[2]), dim(w)[1]))
for (i in 1:(dim(w)[1])) {
nw[,,i] <- matrix(w[i,], nrow=sqrt(dim(w)[2]), ncol=sqrt(dim(w)[2]), byrow=TRUE)
}
nw
}
bigw <- matrix(runif(1000000), nrow=10000, ncol=100)
all.equal(josilber(bigw), OP(bigw))
# [1] TRUE
library(microbenchmark)
microbenchmark(josilber(bigw), OP(bigw))
# Unit: milliseconds
# expr min lq mean median uq max neval
# josilber(bigw) 8.483245 9.08430 14.46876 9.431534 11.76744 135.7204 100
# OP(bigw) 83.379053 97.07395 133.86606 117.223236 129.28317 1553.4381 100
The approach using t, dim, and aperm is more than 10x faster in median runtime than the looping approach.
I did not test the speed, but you could try
nw1 <- aperm(`dim<-`(t(w), c(3, 3, 6)), c(2, 1, 3))
> nw1
, , 1
[,1] [,2] [,3]
[1,] 0.8257185 0.5475478 0.4157915
[2,] 0.8436991 0.3310513 0.1546463
[3,] 0.1794918 0.1836032 0.2675192
, , 2
[,1] [,2] [,3]
[1,] 0.6914582 0.1674163 0.2921129
[2,] 0.2558240 0.4269716 0.7335542
[3,] 0.6416367 0.8771934 0.6553210
, , 3
[,1] [,2] [,3]
[1,] 0.9761232 0.05223183 0.6651574
[2,] 0.5740032 0.80621864 0.2295017
[3,] 0.1138926 0.76009870 0.6932736
, , 4
[,1] [,2] [,3]
[1,] 0.437871558 0.5172516 0.1145181
[2,] 0.006923583 0.3235762 0.3751655
[3,] 0.823235642 0.4586850 0.6013853
, , 5
[,1] [,2] [,3]
[1,] 0.7425735 0.1665975 0.8659373
[2,] 0.1418979 0.1878132 0.2357267
[3,] 0.6963537 0.5391961 0.1112467
, , 6
[,1] [,2] [,3]
[1,] 0.7246276 0.02896792 0.04692648
[2,] 0.7563403 0.22027518 0.41138672
[3,] 0.8303413 0.31908307 0.25180560

How to generate symmetric random matrix?

I want to generate a random matrix which should be symmetric.
I have tried this:
matrix(sample(0:1, 25, TRUE), 5, 5)
but it is not necessarily symmetric.
How can I do that?
Another option relies on the following mathematical fact: for any matrix A, the product of A and its transpose is always symmetric.
> A <- matrix(runif(25), 5, 5)
> A %*% t(A)
[,1] [,2] [,3] [,4] [,5]
[1,] 1.727769 1.0337816 1.2195505 1.4661507 1.1041355
[2,] 1.033782 1.0037048 0.7368944 0.9073632 0.7643080
[3,] 1.219551 0.7368944 1.8383986 1.3309980 0.9867812
[4,] 1.466151 0.9073632 1.3309980 1.3845322 1.0034140
[5,] 1.104135 0.7643080 0.9867812 1.0034140 0.9376534
Try this from the Matrix package
library(Matrix)
x<-Matrix(rnorm(9),3)
x
3 x 3 Matrix of class "dgeMatrix"
[,1] [,2] [,3]
[1,] -0.9873338 0.8965887 -0.6041742
[2,] -0.3729662 -0.5882091 -0.2383262
[3,] 2.1263985 -0.3550972 0.1067264
X<-forceSymmetric(x)
X
3 x 3 Matrix of class "dsyMatrix"
[,1] [,2] [,3]
[1,] -0.9873338 0.8965887 -0.6041742
[2,] 0.8965887 -0.5882091 -0.2383262
[3,] -0.6041742 -0.2383262 0.1067264
If you don't want to use a package:
n=3
x <- matrix(rnorm(n*n), n)
ind <- lower.tri(x)
x[ind] <- t(x)[ind]
x
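The same trick also works for the 0/1 matrix from the question. A minimal sketch, assuming a 5 x 5 matrix as in the original attempt:

```r
n <- 5
# Random 0/1 matrix, not symmetric yet
x <- matrix(sample(0:1, n * n, replace = TRUE), n, n)

# Overwrite the lower triangle with the mirrored upper triangle
ind <- lower.tri(x)
x[ind] <- t(x)[ind]

# Base R check that x equals its transpose
stopifnot(isSymmetric(x))
```

The diagonal and upper triangle keep their random values, so the result is still a random 0/1 matrix.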
I like this one:
n <- 3
aux <- matrix(NA, nrow = n, ncol = n)
for(i in c(1:n)){
for(j in c(i:n)){
aux[i,j] <- sample(c(1:n), 1)
aux[j,i] <- aux[i,j]
}
}

How to use some apply function to solve what requires two for-loops in R

I have a matrix, named "mat", and a smaller matrix, named "center".
temp = c(1.8421,5.6586,6.3526,2.904,3.232,4.6076,4.8,3.2909,4.6122,4.9399)
mat = matrix(temp, ncol=2)
[,1] [,2]
[1,] 1.8421 4.6076
[2,] 5.6586 4.8000
[3,] 6.3526 3.2909
[4,] 2.9040 4.6122
[5,] 3.2320 4.9399
center = matrix(c(3, 6, 3, 2), ncol=2)
[,1] [,2]
[1,] 3 3
[2,] 6 2
I need to compute the distance between each row of mat and every row of center. For example, the (squared Euclidean) distance between mat[1,] and center[1,] can be computed as
diff = mat[1,]-center[1,]
t(diff)%*%diff
[,1]
[1,] 3.92511
Similarly, I can find the distance of mat[1,] and center[2,]
diff = mat[1,]-center[2,]
t(diff)%*%diff
[,1]
[1,] 24.08771
Repeating this process for each row of mat, I end up with
[,1] [,2]
[1,] 3.925110 24.087710
[2,] 10.308154 7.956554
[3,] 11.324550 1.790750
[4,] 2.608405 16.408805
[5,] 3.817036 16.304836
I know how to implement it with for-loops. I was really hoping someone could tell me how to do it with some kind of an apply() function, maybe mapply() I guess.
Thanks
apply(center, 1, function(x) colSums((x - t(mat)) ^ 2))
# [,1] [,2]
# [1,] 3.925110 24.087710
# [2,] 10.308154 7.956554
# [3,] 11.324550 1.790750
# [4,] 2.608405 16.408805
# [5,] 3.817036 16.304836
If you want apply for expressiveness of code, that's one thing, but it is still looping, just with different syntax. This can be done without any loops, or with a very small loop across center instead of mat. I'd transpose first, because it's wise to get into the habit of moving as much as possible out of the apply call. (BrodieG's answer is pretty much identical in function.) This works because R automatically recycles the shorter vector along the matrix, and does so much faster than apply or for.
tm <- t(mat)
apply(center, 1, function(m){
colSums((tm - m)^2) })
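For a version with no apply at all, one option is to expand the squared distance as |a|^2 + |b|^2 - 2 a.b and let matrix algebra do the work. This is only a sketch and not taken from the answers here; d_vec and d_ref are illustrative names.

```r
temp <- c(1.8421, 5.6586, 6.3526, 2.904, 3.232,
          4.6076, 4.8, 3.2909, 4.6122, 4.9399)
mat <- matrix(temp, ncol = 2)
center <- matrix(c(3, 6, 3, 2), ncol = 2)

# Squared norms of every row pair, minus twice the row-by-row dot products
# (tcrossprod(mat, center) is mat %*% t(center))
d_vec <- outer(rowSums(mat^2), rowSums(center^2), "+") -
  2 * tcrossprod(mat, center)

# Reference result from the apply-based answer
d_ref <- apply(center, 1, function(x) colSums((x - t(mat))^2))

stopifnot(isTRUE(all.equal(d_vec, d_ref, check.attributes = FALSE)))
```

Note that the expansion can lose a little floating-point precision when the points are very close together relative to their magnitude, so the direct difference is the safer choice if that matters.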
Use dist and then extract the relevant submatrix:
ix <- 1:nrow(mat)
as.matrix( dist( rbind(mat, center) )^2 )[ix, -ix]
#           6         7
# 1 3.925110 24.087710
# 2 10.308154 7.956554
# 3 11.324550 1.790750
# 4 2.608405 16.408805
# 5 3.817036 16.304836
You could use outer as well
d <- function(i, j) sum((mat[i, ] - center[j, ])^2)
outer(1:nrow(mat), 1:nrow(center), Vectorize(d))
This will solve it
t(apply(mat,1,function(row){
d1<-sum((row-center[1,])^2)
d2<-sum((row-center[2,])^2)
return(c(d1,d2))
}))
Result:
[,1] [,2]
[1,] 3.925110 24.087710
[2,] 10.308154 7.956554
[3,] 11.324550 1.790750
[4,] 2.608405 16.408805
[5,] 3.817036 16.304836

Resources