How can we construct a block-diagonal matrix from a three-dimensional array in R? There are several possibilities when starting from a list of matrices (e.g., Reduce(magic::adiag, list_of_matrices)) or individual matrices (e.g., magic::adiag(matrix1, matrix2)). However, I could not find anything when we start with an array:
matrices <- array(NA, c(3,3,2))
matrices[,,1] <- diag(1,3)
matrices[,,2] <- matrix(rnorm(9), 3, 3)
Are there any efficient solutions for constructing the corresponding 9x9 block matrix or is it a better idea to just convert to a list and use magic::adiag? The latter seems relatively inefficient, especially when the number of matrices is large.
I guess converting to a list and using magic::adiag is the fastest way. Try the following lines of code which is rather short and I use frequently:
library(magic)
arr <- array(1:8, c(2,2,3))
do.call("adiag", lapply(seq(dim(arr)[3]), function(x) arr[ , , x]))
This essentially reduces to a one-liner but uses lists.
Related
I'm working with a vector (~14000x1) of various values that I would like to put on the diagonal of a sparse matrix where I'm using the library Matrix. I want to do this while avoiding the need of creating a full matrix and then converting back to a sparse matrix after.
So far I can do this with a for loop but it takes a long time. Can you think of a more efficient and least memory-intense way of doing it?
Here's a simple reproducible example:
library(Matrix)
x = Matrix(matrix(1,14000,1),sparse=TRUE)
X = Diagonal(14000)
for(i in 1:13383){
X[i,i]=aa[i]
print(i)
}
I have 100,000 sparse matrices("dgCMatrix") store in a list object. The row number of every matrix is the same(8,000,000) and the size of the list is approximately 25 Gb. Now when I do:
do.call(cbind, theListofMatrices)
to combine all matrices into one big sparse matrix, I got "node stack overflow". Actually, I can't even do this with only 500 elements out of that list, which should output a sparse matrix with a size of only 100 Mb.
My speculation for this is that the cbind() function transformed the sparse matrix to a normal dense matrix and thus cause the stack overflow?
Actually, I have tried something like this:
tmp = do.call(cbind, theListofMatrices[1:400])
this works fine, and tmp is still a sparse matrix with a size of 95 Mb, and then I tried:
> tmp = do.call(cbind, theListofMatrices[1:410])
Error in stopifnot(0 <= deparse.level, deparse.level <= 2) :
node stack overflow
and then the error occurred. However, I am having no trouble doing something like:
cbind(tmp, tmp, tmp, tmp)
thus, I believe it has something to do with do.call()
Reduce() seems to solve my problem, though I still don't know the reason why do.call() crushes.
The problem is not in do.call() but due to the way cbind from the Matrix package is implemented. It uses recursion to bind the individual arguments together. For instance, Matrix::cbind(mat1, mat2, mat3) is translated to something along the lines of Matrix::cbind(mat1, Matrix::cbind(mat2, mat3)).
Since do.call(cbind, theListofMatrices) is basically cbind(theListofMatrices[[1]], theListofMatrices[[2]], ...) you have too many arguments to the cbind function and you will end up with a recursion that's nested too deeply and it will fail.
Thus, Ben's comment to use Reduce() is a good way to work around that issue since it avoids the recursion and replaces it with an iteration:
tmp <- Reduce(cbind, theListofMatrices[-1], theListofMatrices[[1]])
In R: a 2-column matrix can have up to 2^30-1 rows = 1073,741,823 rows. So, I would check the row number and check the RAM size to make sure it can accommodate the big matrix size.
For example:
Let M be some matrix mXn matrix where n is large enough to make manual entry impossible.
tmp_list[1] <- M[,1:10]
tmp_list[2] <- M[,11:20]
.
.
.
tmp_list[last] <- M[end - 9,end]
The problem I'm working on is sort of monte carlo, repeating an experiment involving a random mXn matrix 100K times. I'm still pretty new to R, I've done it using a for loop, but it obviously took a very long time. So I'm hoping to assign each "experiment" to an element of a list and use lapply.
let's take the easy case, and you can expand it from there
say n=100, develop your start indeces
n<-100
byParam<-10
starts<-seq(1, n-(byParam-1), by=byParam)
then lapply
tmp_list<-lapply(starts, function(startIndex) M[, startIndex:(startIndex+(byParam-1)])
just one way to do it, becomes a bit more complicated if n is not a nice multiple of 10 (or whatever you set the "byParam" equal to). If that is the case then you can develop your start and end indeces, and then use mapply instead
#given start and end indeces
tmp_list<-mapply(function(startInd, endInd){
M[, startInd:endInd},
startInd=starts, endInd=ends)
Now lapply and mapply are still iterative, so I wouldn't expect massive improvement on time efficiency
EDIT
After discussion in the comments, here is a solution for the entire set up, not just the above question
tmp_list<-lapply(1:1000, function(i){
vect<-sample(c(0,1), 10*1000, replace=TRUE)
dim(vect)<-c(10, 1000)
vect
})
Let's break this down, it makes everything very simple.
We first create a random sample of 1's and 0's, of the length 10*1000 (the number of elements in each sub-matrix). We can then neatly convert that vector to a matrix by assigning it's dim attribute to be c(10, 1000), which changes its form to have 10 rows and 1000 columns. Then we return that into a list at the index i. We lapply over 1:1000, or iterate 1000 times.
I'm new to R programming and I know I could write a loop to do this, but everything I read says that for simplicity its best to avoid loops and use apply instead.
I have a matrix and i would like to run this function on each element in the matrix.
cellresidue <- function(i,j){
result <- (cluster[i,j] - cluster.I[i,] - cluster.J[j,] - cluster.IJ)/(cluster.N*cluster.M)
return (result)
}
i= element row
j= element column
cluster.J is a matrix of column means
cluster.I is a matrix of row means
cluster.IJ is the mean of the entire matrix named cluster
What I can't figure out is how do I get the row and column of the element (I think should use row() and column col() functions) that mapply is working with and how do pass those arguments to mapply or apply?
There is no need for loops or *apply functions. You can just use plain matrix operations:
nI <- nrows(cluster)
nJ <- ncols(cluster)
cluster.I <- matrix(rowMeans(cluster), nI, nJ, byrow = FALSE)
cluster.J <- matrix(rowMeans(cluster), nI, nJ, byrow = TRUE)
cluster.IJ <- matrix( mean(cluster), nI, nJ)
residue.mat <- (cluster - cluster.I - cluster.J - cluster.IJ) /
(cluster.N * cluster.M)
(You did not explain what cluster.N and cluster.M are but I assume they are scalars)
It is not clear from your question what you are trying to do. It is best on this site to provide some mock data (preferably generated by the code, not pasted), and then show what form the end result should look like. It seems that the apply family is not what you seek.
Quick disambiguation between apply, sapply and mapply:
#providing data for examples
X=matrix(rnorm(9),3,3)
apply: apply a function to either columns (2) or rows (1) of a matrix or array
#here, sum by columns, same as colSums(X)
apply(X, 2, sum)
sapply: apply a function against (usually) a list of objects
#create a list with three vectors
mylist=list(1:4, 5:10, c(1,1,1))
#get the mean of each vector
sapply(mylist, mean)
#remove 2 to each element of X, same as c(X-2)
sapply(X, FUN=function(x) x-2)
mapply: a multivariate version of sapply, taking an arbitrary number of arguments. Never had much use of it… Some rock-bottom examples:
#same as c(1,2,3,4) + c(15,16,17,18)
mapply(sum, 1:4, 15:18)
#same as c(X+X), the vectorized matrix sum
mapply(sum, X, X)
Side note: It's perfectly ok to use loops in R; use whichever suits the best your thoughts. The issue is that if you have a "really big" number of iterations, this is where you could meet bottlenecks, depending on your patience. There are two solutions to this: rewrite your function in C/FORTRAN (and boost speed), or use built-in functions if applicable (which are, by the way, often writen in C or FORTRAN).
This is my problem:
There is a predefined list named gamma with three entries: gamma$'2' is 2x2 matrix gamma$'3' a 3x3 matrix and gamma$'4' a 4x4 matrix. I would like to have function that returns the matrix I need:
GiveMatrix <- function(n) {
gamma.list <- #init the list of matrices
gamma.list$n # return the list entry named n
Since n is not a character, the last line does not work. I tried gamma.list$paste(n)and gamma.list$as.character(n)but both did not work. Is there a function that converts nto the right format? Or is there maybe a much better way? I know, I am not really good in R.
You need to use:
gamma.list[[as.character(n)]]
In your example, R is looking for a entry in the list called n. When using [[, the contents of n is used, which is what you need.
I've found it!
gamma.list[as.character(n)] is the solution I needed.