Linear Independence of Large Sparse Matrices in R

I have three large matrices: I, G, and G^2. They are sparse, 4 million x 4 million each. I would like to check whether they are linearly independent, and I would like to do this in R.
For small matrices, one way to do this is to vectorize each matrix (stack its columns on top of each other) and test whether the matrix formed by the three stacked vectors has rank three.
However, due to the size of my problem I am not sure how to proceed.
(1) Is there a way to vectorize a large sparse matrix into a very large sparse vector in R?
(2) Is there any other solution to the problem that would make this test efficient?
Thanks in advance

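When converting your matrices to vectors, you can keep only the non-zero elements. Positions where all three matrices are zero only contribute all-zero rows, which do not change the rank.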
# Sample data
n <- 4e6
k <- n
library(Matrix)
I <- spMatrix(n, n, 1:n, 1:n, rep(1,n))
G <- spMatrix(n, n,
  sample(1:n, k, replace=TRUE),
  sample(1:n, k, replace=TRUE),
  sample(0:9, k, replace=TRUE)
)
G2 <- G %*% G
G2 <- as(G2, "TsparseMatrix") # Triplet form, for the i and j slots
# Only keep elements that are non-zero in one of the 3 matrices
i <- as.integer( c(G@i, G2@i, I@i) + 1 ) # slots are 0-based
j <- as.integer( c(G@j, G2@j, I@j) + 1 )
ij <- cbind(i,j)
rankMatrix( cbind( G2[ij], G[ij], I[ij] ) ) # 3
# Another example
m <- ceiling(n/2)-1
G <- spMatrix(n, n,
  c(1:n, 2*(1:m)),
  c(1:n, 2*(1:m)+1),
  rep(1, n+m)
)
G2 <- as(G %*% G, "TsparseMatrix")
i <- c(G@i, G2@i, I@i) + 1
j <- c(G@j, G2@j, I@j) + 1
ij <- cbind(i,j)
rankMatrix( cbind( G2[ij], G[ij], I[ij] ) ) # 2
(To speed things up, you could take only a small part of those vectors:
if the rank is already 3, you know that they are independent,
if it is 2, you can check if the linear dependence relation also holds for the large vectors.)
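The shortcut described in the parenthetical could look roughly like the sketch below. It is only an illustration on a small deterministic case (n = 1000, with G built as in the second example, so that G^2 = 2G - I). `rank_at` is a hypothetical helper, the sample size of 50 is arbitrary, and the fallback simply recomputes the rank on all stored positions rather than verifying the dependence relation itself.

```r
library(Matrix)

# Hypothetical helper: rank of the matrices restricted to the positions in ij
rank_at <- function(ij, mats) {
  V <- sapply(mats, function(M) M[ij])   # one column per matrix
  as.integer(rankMatrix(V))
}

n <- 1000
m <- ceiling(n / 2) - 1
I <- spMatrix(n, n, 1:n, 1:n, rep(1, n))
G <- spMatrix(n, n, c(1:n, 2 * (1:m)), c(1:n, 2 * (1:m) + 1), rep(1, n + m))
G2 <- as(G %*% G, "TsparseMatrix")

# All positions that are stored in at least one of the three matrices (1-based)
ij <- unique(cbind(c(G@i, G2@i, I@i), c(G@j, G2@j, I@j)) + 1L)

# First try a small random subset of those positions
set.seed(1)
sub <- ij[sample(nrow(ij), 50), , drop = FALSE]
r_sub <- rank_at(sub, list(G2, G, I))

# Rank 3 on the subset already proves independence; otherwise check everything
r_full <- if (r_sub < 3L) rank_at(ij, list(G2, G, I)) else r_sub
```

A rank below 3 on the subset is inconclusive on its own, which is why the full set of positions is checked in that case.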

Related

computing kernel matrix without loops in R [duplicate]

I have an n x p matrix A and would like to compute the n x n matrix B defined as
B[i, j] = f(A[i,], A[j,])
where f is a function that accepts arguments of the appropriate dimensionality. Is there a neat trick to compute this in R? f is symmetric and positive-definite (if this can help in the computation).
EDIT: Praneet asked to specify f. That is a good point. Although I think it would be interesting to have an efficient solution for any function, I would get a lot of mileage from efficient computation in the important case where f(x, y) is base::norm(x-y, type='F').
You can use outer with the matrix dimensions.
n <- 10
p <- 5
A <- matrix( rnorm(n*p), n, p )
f <- function(x,y) sqrt(sum((x-y)^2))
B <- outer(
  1:n, 1:n,
  Vectorize( function(i,j) f(A[i,], A[j,]) )
)
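For the specific case mentioned in the edit (Euclidean distance between rows), base R's `dist` may be a faster alternative, since it avoids calling an R function for every pair. The sketch below compares it with the `outer` approach on the same small random data; the names `B_outer`, `B_dist`, and `same` are just illustrative.

```r
set.seed(1)
n <- 10
p <- 5
A <- matrix(rnorm(n * p), n, p)

# General approach from the answer: works for any f, one call per (i, j) pair
f <- function(x, y) sqrt(sum((x - y)^2))
B_outer <- outer(1:n, 1:n, Vectorize(function(i, j) f(A[i, ], A[j, ])))

# Special case: Euclidean distances between the rows of A
B_dist <- as.matrix(dist(A))

# The two results agree up to numerical tolerance (dimnames differ)
same <- isTRUE(all.equal(B_outer, B_dist, check.attributes = FALSE))
```

The `outer` version remains the one to use when f is not a distance that `dist` supports.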

R - Given a matrix and a power, produce multiple matrices containing all combinations of matrix columns

Given a matrix mat (of size N by M) and a power, p (e.g., 4), produce p matrices, where each p-th matrix contains all possible combinations of the columns in mat at that degree.
In my current approach, I generate the p-th matrix and then use it in the next call to produce the p+1th matrix. Can this be 'automated' for a given power p, rather than done manually?
I am a novice when it comes to R and understand that there is likely a more efficient and elegant way to achieve this solution than the following attempt...
N = 5
M = 3
p = 4
mat = matrix(1:(N*M),N,M)
mat_1 = mat
mat_2 = t(sapply(1:N, function(i) tcrossprod(mat_1[i, ], mat[i, ])))
mat_3 = t(sapply(1:N, function(i) tcrossprod(mat_2[i, ], mat[i, ])))
mat_4 = t(sapply(1:N, function(i) tcrossprod(mat_3[i, ], mat[i, ])))
Can anyone provide some suggestions? My goal is to create a function for a given matrix mat and power p that outputs the p different matrices in a more 'automated' fashion.
Related question that got me started: How to multiply columns of two matrix with all combinations
This solves your problem.
N = 5
M = 3
p = 4
mat = matrix(1:(N*M),N,M)
f=function(x) matrix(apply(x,2,"*",mat),nrow(x))
rev(Reduce(function(f,x)f(x), rep(c(f), p-1), mat, T,T))
You can do something like this
N = 5
M = 3
p = 4
mat = matrix(1:(N*M),N,M)
res_mat <- list()
res_mat[[1]] <- mat
for(i in 2:p) {
  res_mat[[i]] <- t(sapply(1:N, function(j) tcrossprod(res_mat[[i-1]][j, ], res_mat[[1]][j, ])))
}
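Since the question asks for a function, here is one possible way to wrap the idea. It is a sketch, not either answerer's code: `col_powers` is a hypothetical name, and instead of calling `tcrossprod` row by row it multiplies suitably repeated columns, which should give the same column order as the question's `mat_2`, `mat_3`, and so on.

```r
# Hypothetical wrapper: returns a list of p matrices, the d-th having M^d columns
col_powers <- function(mat, p) {
  storage.mode(mat) <- "double"        # avoid integer overflow in repeated products
  M <- ncol(mat)
  out <- vector("list", p)
  out[[1]] <- mat
  for (d in seq_len(p - 1) + 1) {
    prev <- out[[d - 1]]
    K <- ncol(prev)
    # Column c of the result is prev[, a] * mat[, b], with a cycling fastest,
    # matching as.vector(tcrossprod(prev[i, ], mat[i, ])) for each row i
    out[[d]] <- prev[, rep(seq_len(K), times = M), drop = FALSE] *
                mat[,  rep(seq_len(M), each  = K), drop = FALSE]
  }
  out
}

N <- 5
M <- 3
mat <- matrix(1:(N * M), N, M)
res <- col_powers(mat, 4)
sapply(res, ncol)   # number of columns per degree: 3, 9, 27, 81
```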

Euclidean distance for each row in dataset

Suppose we have dataset G2:
data(iris)
G2 <- iris[1:5, -5]
We need to calculate the mean Euclidean distance between x (a row in G2) and the other rows of G2, for every x in G2.
I wonder what is the best way to do this. Here is my initial attempt:
D <- dist(G2)
m1 <- as.matrix(D)
(1 / (5 - 1)) * colSums(m1)
Your notation is a bit confusing because you use D differently in the code and formula. How about
m <- as.matrix(dist(G2, upper=T))
D <- apply(m, 2, mean)
n <- length(D)
D <- n/(n-1)*D
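As a quick check, the question's attempt and the answer's rescaled mean compute the same quantity, because the diagonal of the distance matrix is zero. The sketch below compares both with a direct per-row computation; the names `d_question`, `d_answer`, and `d_direct` are illustrative only.

```r
data(iris)
G2 <- iris[1:5, -5]

m <- as.matrix(dist(G2))   # full symmetric distance matrix, zero diagonal
n <- nrow(m)

# Question's attempt: column sums divided by the number of other rows
d_question <- colSums(m) / (n - 1)

# Answer's version: mean over all n entries, rescaled to exclude the zero diagonal
d_answer <- n / (n - 1) * apply(m, 2, mean)

# Direct definition: mean distance from row k to every other row
d_direct <- sapply(seq_len(n), function(k) mean(m[k, -k]))
```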

Speeding up this tricky matrix calculation

As of now I am computing some features from a large matrix and doing it all in a for-loop. As expected it's very slow. I have been able to vectorize part of the code, but I'm stuck on one part.
I would greatly appreciate some advice/help!
s1 <- MyMatrix #dim = c(5167,256)
fr <- MyVector #vector of length 256
tw <- 5
fw <- 6
# For each point S(t,f) we need the sub-matrix of points S_hat(i,j),
# i in [t - tw, t + tw], j in [f - fw, f + fw] for the feature vector.
# To avoid edge effects, I pad the original matrix with zeros,
# resulting in a matrix of size nobs+2*tw x nfreqs+2*fw
nobs <- dim(s1)[1] #note: this is 5167
nf <- dim(s1)[2] #note: this is 256
sp <- matrix(0, nobs+2*tw, nf+2*fw)
t1 <- tw+1; tn <- nobs+tw
f1 <- fw+1; fn <- nf+fw
sp[t1:tn, f1:fn] <- s1 # embed the actual matrix into the padding
nfeatures <- 1 + (2*tw+1)*(2*fw+1) + 1
fsp <- array(NaN, c(dim(sp),nfeatures))
for (t in t1:tn){
  for (f in f1:fn){
    fsp[t,f,1] <- fr[(f - f1 + 1)] #this part I can vectorize
    fsp[t,f,2:(nfeatures-1)] <- as.vector(sp[(t-tw):(t+tw),(f-fw):(f+fw)]) #this line is the problem
    fsp[t,f,nfeatures] <- var(fsp[t,f,2:(nfeatures-1)])
  }
}
fsp[t1:tn, f1:fn, 1] <- t(matrix(rep(fr,(tn-t1+1)),ncol=(tn-t1+1)))
#vectorized version of the first feature ^
return(fsp[t1:tn, f1:fn, ]) #this is the returned matrix
I assume that the var feature will be easy to vectorize after the 2nd feature is vectorized
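One possible way to vectorize the window feature is to loop over the (2*tw+1)*(2*fw+1) = 143 window offsets instead of over every point: for a fixed offset, the values for all points at once form a shifted block of the padded matrix. The sketch below uses small random stand-in data (40 x 12 rather than 5167 x 256) and writes into an array `out` indexed by unpadded coordinates, which is meant to correspond to `fsp[t1:tn, f1:fn, ]`. These choices are assumptions for illustration.

```r
set.seed(1)
nobs <- 40; nf <- 12                 # small stand-ins for 5167 and 256
tw <- 5; fw <- 6
s1 <- matrix(rnorm(nobs * nf), nobs, nf)
fr <- seq_len(nf)                    # stand-in for the frequency vector

# Zero-padded copy, as in the question
sp <- matrix(0, nobs + 2 * tw, nf + 2 * fw)
t1 <- tw + 1; tn <- nobs + tw
f1 <- fw + 1; fn <- nf + fw
sp[t1:tn, f1:fn] <- s1

nwin <- (2 * tw + 1) * (2 * fw + 1)
nfeatures <- 1 + nwin + 1
out <- array(NaN, c(nobs, nf, nfeatures))

# Feature 1: frequency value, constant along the time axis
out[, , 1] <- matrix(fr, nobs, nf, byrow = TRUE)

# Window features: one shifted block per offset. The time offset varies
# fastest, matching the column-major order of as.vector() in the loop version.
k <- 1
for (dj in -fw:fw) {
  for (di in -tw:tw) {
    k <- k + 1
    out[, , k] <- sp[(t1 + di):(tn + di), (f1 + dj):(fn + dj)]
  }
}

# Last feature: sample variance over the window, for all points at once
win <- out[, , 2:(nfeatures - 1), drop = FALSE]
mu <- rowMeans(win, dims = 2)
out[, , nfeatures] <- rowSums((win - as.vector(mu))^2, dims = 2) / (nwin - 1)
```

At the original size the result array is as large as in the loop version (roughly 5167 x 256 x 145 doubles), so memory use is unchanged; only the number of R-level iterations drops.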

Efficiently Load A Sparse Matrix in R

I'm having trouble efficiently loading data into a sparse matrix format in R.
Here is an (incomplete) example of my current strategy:
library(Matrix)
a1=Matrix(0,5000,100000,sparse=T)
for(i in 1:5000)
  a1[i,idxOfCols]=x
Where x is usually around length 20. This is not efficient and eventually slows to a crawl. I know there is a better way but wasn't sure how. Suggestions?
You can populate the matrix all at once:
library(Matrix)
n <- 5000
m <- 1e5
k <- 20
idxOfCols <- sample(1:m, k)
x <- rnorm(k)
a2 <- sparseMatrix(
  i=rep(1:n, each=k),
  j=rep(idxOfCols, n),
  x=rep(x, n), # one copy of x per row, same length as i and j
  dims=c(n,m)
)
# Compare
a1 <- Matrix(0,5000,100000,sparse=T)
for(i in 1:n) {
  a1[i,idxOfCols] <- x
}
sum(a1 - a2) # 0
You don't need to use a for-loop. You can just use standard matrix indexing with a two-column matrix of (row, column) pairs:
a1[ cbind(i,idxOfCols) ] <- x
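If every row has its own column indices and values, as the loop in the question suggests, the same build-it-all-at-once idea still applies: collect the indices and values first and call `sparseMatrix` a single time. In the sketch below, `cols` and `vals` are hypothetical per-row lists standing in for whatever produces `idxOfCols` and `x` for each row.

```r
library(Matrix)

# Hypothetical per-row inputs: column indices and matching values for 3 rows
cols <- list(c(2, 5), c(1), c(3, 4, 5))
vals <- list(c(0.5, 1.5), c(2), c(1, 1, 3))

a <- sparseMatrix(
  i = rep(seq_along(cols), lengths(cols)),  # row index repeated once per entry
  j = unlist(cols),
  x = unlist(vals),
  dims = c(length(cols), 5)
)
```

This avoids modifying the sparse matrix repeatedly, which is the part that slows the original loop down.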
