I have three large sparse matrices: I, G, and G^2. They are 4 million x 4 million. I would like to check whether they are linearly independent, and I would like to do this in R.
For small matrices, a way to do this is to vectorize each matrix: stack the columns on top of each other and test whether the matrix formed by the three stacked vectors has rank three.
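For what it is worth, here is a minimal sketch of that small-matrix approach (toy 3 x 3 matrices, not my actual data; rankMatrix is from the Matrix package):
# Minimal sketch of the small-matrix version (toy matrices, not the real data)
library(Matrix)
A1 <- diag(3)
A2 <- matrix(rnorm(9), 3, 3)
A3 <- A2 %*% A2
V  <- cbind(c(A1), c(A2), c(A3))  # c() stacks the columns of each matrix into one vector
rankMatrix(V) == 3                # TRUE means the three matrices are linearly independent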
However, due to the size of my problem I am not sure how to proceed.
(1) Is there a way to vectorize a Large Sparse Matrix into a Very Large Sparse Vector in R?
(2) Is there any other solution to the problem that could make this test efficient?
Thanks in advance
When converting your matrices to vectors, you can keep only the non-zero elements.
# Sample data
n <- 4e6
k <- n
library(Matrix)
I <- spMatrix(n, n, 1:n, 1:n, rep(1,n))
G <- spMatrix(n, n,
              sample(1:n, k, replace=TRUE),
              sample(1:n, k, replace=TRUE),
              sample(0:9, k, replace=TRUE)
)
G2 <- G %*% G
G2 <- as(G2, "dgTMatrix") # For the j slot
# Only keep elements that are non-zero in one of the 3 matrices
i <- as.integer( c(G@i, G2@i, I@i) + 1 )
j <- as.integer( c(G@j, G2@j, I@j) + 1 )
ij <- cbind(i,j)
rankMatrix( cbind( G2[ij], G[ij], I[ij] ) ) # 3
# Another example
m <- ceiling(n/2)-1
G <- spMatrix(n, n,
              c(1:n, 2*(1:m)),
              c(1:n, 2*(1:m)+1),
              rep(1, n+m)
)
G2 <- as(G %*% G, "dgTMatrix")
i <- c(G@i, G2@i, I@i) + 1
j <- c(G@j, G2@j, I@j) + 1
ij <- cbind(i,j)
rankMatrix( cbind( G2[ij], G[ij], I[ij] ) ) # 2 (here G = I + S with S %*% S = 0, so G %*% G = 2*G - I)
(To speed things up, you could take only a small part of those vectors:
if the rank is already 3, you know that the matrices are independent;
if it is 2, you can check whether the linear dependence relation also holds for the full vectors.)
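For instance, a rough sketch of that shortcut, reusing the ij index matrix built above (the subset size is arbitrary):
idx <- sample(nrow(ij), min(1e5, nrow(ij)))  # arbitrary subset of the stacked rows
r <- rankMatrix( cbind( G2[ij[idx,]], G[ij[idx,]], I[ij[idx,]] ) )
if (r == 3) {
  # Full rank on a subset already proves the three matrices are independent.
} else {
  # Rank-deficient on the subset: recover the dependence relation there
  # and check whether it also annihilates the full stacked vectors.
}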
I have an n x p matrix and would like to compute the n x n matrix B defined as
B[i, j] = f(A[i,], A[j,])
where f is a function that accepts arguments of the appropriate dimensionality. Is there a neat trick to compute this in R? f is symmetric and positive-definite (if this can help in the computation).
EDIT: Praneet asked to specify f. That is a good point. Although I think it would be interesting to have an efficient solution for any function, I would get a lot of mileage from efficient computation in the important case where f(x, y) is base::norm(x-y, type='F').
You can use outer on the row indices.
n <- 10
p <- 5
A <- matrix( rnorm(n*p), n, p )
f <- function(x,y) sqrt(sum((x-y)^2))
B <- outer(
  1:n, 1:n,
  Vectorize( function(i,j) f(A[i,], A[j,]) )
)
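For the specific f in the edit (the Euclidean distance between rows), dist gives the same matrix much faster:
B2 <- as.matrix(dist(A))                  # Euclidean distances between the rows of A
all.equal(B, B2, check.attributes=FALSE)  # TRUE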
Given a matrix mat (of size N by M) and a power p (e.g., 4), produce p matrices, where the k-th matrix contains all possible combinations of the columns of mat at degree k.
In my current approach, I generate the k-th matrix and then use it in the next call to produce the (k+1)-th matrix. Can this be 'automated' for a given power p, rather than done manually?
I am a novice when it comes to R and understand that there is likely a more efficient and elegant way to achieve this solution than the following attempt...
N = 5
M = 3
p = 4
mat = matrix(1:(N*M),N,M)
mat_1 = mat
mat_2 = t(sapply(1:N, function(i) tcrossprod(mat_1[i, ], mat[i, ])))
mat_3 = t(sapply(1:N, function(i) tcrossprod(mat_2[i, ], mat[i, ])))
mat_4 = t(sapply(1:N, function(i) tcrossprod(mat_3[i, ], mat[i, ])))
Can anyone provide some suggestions? My goal is to create a function for a given matrix mat and power p that outputs the p different matrices in a more 'automated' fashion.
Related question that got me started: How to multiply columns of two matrix with all combinations
This solves your problem.
N = 5
M = 3
p = 4
mat = matrix(1:(N*M),N,M)
f <- function(x) matrix(apply(x, 2, "*", mat), nrow(x))  # all products of each column of x with each column of mat
# p-1 repeated applications of f, keeping the intermediates; rev() orders them as mat_1, ..., mat_p
rev(Reduce(function(f, x) f(x), rep(c(f), p - 1), mat, right = TRUE, accumulate = TRUE))
You can do something like this
N = 5
M = 3
p = 4
mat = matrix(1:(N*M),N,M)
res_mat <- list()
res_mat[[1]] <- mat
for(i in 2:p) {
  res_mat[[i]] <- t(sapply(1:N, function(j) tcrossprod(res_mat[[i-1]][j, ], res_mat[[1]][j, ])))
}
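Since the goal is a function of mat and p, the loop wraps up directly; col_powers below is just a placeholder name:
# Possible wrapper (col_powers is a made-up name): returns a list whose
# k-th element holds all degree-k column combinations of mat.
col_powers <- function(mat, p) {
  res <- vector("list", p)
  res[[1]] <- mat
  for (i in seq_len(p - 1) + 1) {
    res[[i]] <- t(sapply(seq_len(nrow(mat)),
                         function(j) tcrossprod(res[[i - 1]][j, ], mat[j, ])))
  }
  res
}
str(col_powers(mat, p))  # matrices of size N x M, N x M^2, ..., N x M^p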
Suppose we have dataset G2:
data(iris)
G2 <- iris[1:5, -5]
We need to calculate the Euclidean distance between each row x of G2 and all the other rows of G2; formally, D(x_i) = (1/(n-1)) * sum over j != i of d(x_i, x_j), the average distance from x_i to the other rows.
I wonder what is the best way to do this. Here is my initial attempt:
D <- dist(G2)
m1 <- as.matrix(D)
(1 / (5 - 1)) * colSums(m1)
Your notation is a bit confusing because you use D differently in the code and formula. How about
m <- as.matrix(dist(G2, upper=TRUE))
D <- apply(m, 2, mean)  # column means include the zero diagonal entry
n <- length(D)
D <- n/(n-1)*D          # rescale so each value is the mean over the other n-1 rows
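As a quick sanity check, this matches the direct column-sum version from the question:
all.equal(D, colSums(as.matrix(dist(G2))) / (nrow(G2) - 1))  # TRUE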
As of now I am computing some features from a large matrix and doing it all in a for-loop. As expected it's very slow. I have been able to vectorize part of the code, but I'm stuck on one part.
I would greatly appreciate some advice/help!
s1 <- MyMatrix #dim = c(5167,256)
fr <- MyVector #vector of length 256
tw <- 5
fw <- 6
# For each point S(t,f) we need the sub-matrix of points S_hat(i,j),
# i in [t - tw, t + tw], j in [f - fw, f + fw] for the feature vector.
# To avoid edge effects, I pad the original matrix with zeros,
# resulting in a matrix of size nobs+2*tw x nfreqs+2*fw
nobs <- dim(s1)[1] #note: this is 5167
nf <- dim(s1)[2] #note: this is 256
sp <- matrix(0, nobs+2*tw, nf+2*fw)
t1 <- tw+1; tn <- nobs+tw
f1 <- fw+1; fn <- nf+fw
sp[t1:tn, f1:fn] <- s1 # embed the actual matrix into the padding
nfeatures <- 1 + (2*tw+1)*(2*fw+1) + 1
fsp <- array(NaN, c(dim(sp),nfeatures))
for (t in t1:tn){
  for (f in f1:fn){
    fsp[t,f,1] <- fr[(f - f1 + 1)]  # this part I can vectorize
    fsp[t,f,2:(nfeatures-1)] <- as.vector(sp[(t-tw):(t+tw),(f-fw):(f+fw)])  # this line is the problem
    fsp[t,f,nfeatures] <- var(fsp[t,f,2:(nfeatures-1)])
  }
}
fsp[t1:tn, f1:fn, 1] <- t(matrix(rep(fr,(tn-t1+1)),ncol=(tn-t1+1)))
#vectorized version of the first feature ^
return(fsp[t1:tn, f1:fn, ]) #this is the returned matrix
I assume that the var feature will be easy to vectorize after the 2nd feature is vectorized
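One way to attack the problematic line is to flip the loops around: instead of visiting every (t, f) cell, loop over the (2*tw+1)*(2*fw+1) window offsets, each of which fills a whole slice with a single shifted-submatrix assignment. A rough sketch, assuming the padded matrix sp and the bounds defined above:
# Sketch (not tested on the real data): one assignment per window offset.
# expand.grid varies dt fastest, matching the column-major order of as.vector().
offs <- expand.grid(dt = -tw:tw, df = -fw:fw)
for (k in seq_len(nrow(offs))) {
  fsp[t1:tn, f1:fn, k + 1] <- sp[(t1:tn) + offs$dt[k], (f1:fn) + offs$df[k]]
}
# The variance feature can then be taken over the third dimension, e.g.
fsp[t1:tn, f1:fn, nfeatures] <- apply(fsp[t1:tn, f1:fn, 2:(nfeatures - 1)], c(1, 2), var)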
I'm having trouble efficiently loading data into a sparse matrix format in R.
Here is an (incomplete) example of my current strategy:
library(Matrix)
a1=Matrix(0,5000,100000,sparse=T)
for (i in 1:5000) {
  a1[i, idxOfCols] <- x
}
Where x is usually around length 20. This is not efficient and eventually slows to a crawl. I know there is a better way but wasn't sure how. Suggestions?
You can populate the matrix all at once:
library(Matrix)
n <- 5000
m <- 1e5
k <- 20
idxOfCols <- sample(1:m, k)
x <- rnorm(k)
a2 <- sparseMatrix(
  i = rep(1:n, each=k),
  j = rep(idxOfCols, n),
  x = rep(x, n),
  dims = c(n, m)
)
# Compare
a1 <- Matrix(0,5000,100000,sparse=T)
for(i in 1:n) {
a1[i,idxOfCols] <- x
}
sum(a1 - a2) # 0
You don't need to use a for-loop. You can just use standard matrix indexing with a two-column index matrix:
a1[ cbind(i,idxOfCols) ] <- x
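For example, using the n, m, k, idxOfCols, and x objects from the previous answer, all rows can be filled in a single assignment (assuming the same pattern in every row, as in the question):
ij <- cbind(rep(1:n, each=k), rep(idxOfCols, n))  # one (row, col) pair per nonzero entry
a1 <- Matrix(0, n, m, sparse=TRUE)
a1[ij] <- rep(x, n)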