Computing a kernel matrix without loops in R [duplicate]

I have an n x p matrix A and would like to compute the n x n matrix B defined as
B[i, j] = f(A[i,], A[j,])
where f is a function that accepts arguments of the appropriate dimensionality. Is there a neat trick to compute this in R? f is symmetric and positive-definite (if this can help in the computation).
EDIT: Praneet asked to specify f. That is a good point. Although I think it would be interesting to have an efficient solution for any function, I would get a lot of mileage from efficient computation in the important case where f(x, y) is base::norm(x-y, type='F').

You can use outer with the matrix dimensions.
n <- 10
p <- 5
A <- matrix( rnorm(n*p), n, p )
f <- function(x,y) sqrt(sum((x-y)^2))
B <- outer(
  1:n, 1:n,
  Vectorize( function(i,j) f(A[i,], A[j,]) )
)
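For the Euclidean case specifically, base R's dist already computes all pairwise row distances in compiled code. The sketch below compares it with the outer approach above and with an algebraic identity for the squared distance; the names B_outer, B_dist and B_alg are only illustrative.

```r
set.seed(1)
n <- 10
p <- 5
A <- matrix(rnorm(n * p), n, p)

# Reference: generic approach, one call to f per pair (i, j)
f <- function(x, y) sqrt(sum((x - y)^2))
B_outer <- outer(1:n, 1:n, Vectorize(function(i, j) f(A[i, ], A[j, ])))

# Euclidean distances between the rows of A, computed in compiled code
B_dist <- as.matrix(dist(A))

# Same result from ||x - y||^2 = ||x||^2 + ||y||^2 - 2 * <x, y>
sq <- rowSums(A^2)
# pmax() guards against tiny negative values caused by rounding
B_alg <- sqrt(pmax(outer(sq, sq, "+") - 2 * tcrossprod(A), 0))
```

The dist route only covers the distances that dist implements; for an arbitrary f, the outer and Vectorize version remains the general fallback.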

Related

Generating a function via for-loop for the Jacobian matrix in R

Suppose I have the following two variables y and z and the variable x
y = 1:10
z = 1:10
Now I would like to create a jacobian of the following function
f <- function(x) c(y[1]*x[1]+z[1]*x[2],
y[2]*x[1]+z[2]*x[2],
: : : :
y[10]*x[1]+z[10]*x[2])
The Jacobian can then be obtained with
jacobian(f, c(1,1))
Now suppose
y = 1:i.
When i becomes large, writing the function out by hand becomes time-consuming.
Is there a way to construct the same function for an arbitrary i?
I tried the following:
for (i in 1:10) {
f[i] <- function(x) c(y[i]*x[1]+z[i]*x[2])
}
jacobian(f, c(1,1))
ThomasIsCoding suggests:
f <- function(x) tcrossprod(cbind(y, z), t(x))
Which works perfectly for this case.
Now suppose that the function is more complex
y[1]*x[1]^2+z[1]/x[2]
The suggested t(x) no longer works. How do I now write this in terms of the vector x?
You can try the following way for function f
f <- function(x) tcrossprod(cbind(y, z), t(x))
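For the nonlinear case, one option is to rely on R's element-wise arithmetic: y and z are vectors, so a single expression returns all components at once, whatever their length. A minimal sketch, assuming the component y[i]*x[1]^2 + z[i]/x[2] from the question; the analytic Jacobian J and the finite-difference helper num_jac are added here for illustration only and do not depend on any package.

```r
y <- 1:10
z <- 1:10

# All components at once: y[i] * x[1]^2 + z[i] / x[2] for every i
f <- function(x) y * x[1]^2 + z / x[2]

# Analytic Jacobian: column 1 is d/dx[1], column 2 is d/dx[2]
J <- function(x) cbind(2 * y * x[1], -z / x[2]^2)

# Central finite-difference check (illustrative, not a replacement for jacobian())
num_jac <- function(f, x, h = 1e-6) {
  sapply(seq_along(x), function(k) {
    e <- replace(numeric(length(x)), k, h)  # step of size h in coordinate k
    (f(x + e) - f(x - e)) / (2 * h)
  })
}

x0 <- c(1, 1)
max(abs(J(x0) - num_jac(f, x0)))  # should be close to zero
```

Because f no longer refers to individual indices, the same definition works unchanged when y and z are redefined with a different length.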

Linear Independence of Large Sparse Matrices in R

I have three large matrices: I, G, and G^2. They are 4 million x 4 million and sparse. I would like to check whether they are linearly independent, and I would like to do this in R.
For small matrices, one way to do this is to vectorize each matrix (stack its columns on top of each other) and test whether the matrix formed by the three stacked vectors has rank three.
However, due to the size of my problem I am not sure how to proceed.
(1) Is there a way to vectorize a Large Sparse Matrix into a Very Large Sparse Vector in R?
(2) Is there any other solution that would make this test efficient?
Thanks in advance
When converting your matrices to vectors, you can keep only the non-zero elements.
# Sample data
n <- 4e6
k <- n
library(Matrix)
I <- spMatrix(n, n, 1:n, 1:n, rep(1,n))
G <- spMatrix(n, n,
  sample(1:n, k, replace=TRUE),
  sample(1:n, k, replace=TRUE),
  sample(0:9, k, replace=TRUE)
)
G2 <- G %*% G
G2 <- as(G2, "dgTMatrix") # For the j slot
# Only keep elements that are non-zero in one of the 3 matrices
i <- as.integer( c(G@i, G2@i, I@i) + 1 )
j <- as.integer( c(G@j, G2@j, I@j) + 1 )
ij <- cbind(i,j)
rankMatrix( cbind( G2[ij], G[ij], I[ij] ) ) # 3
# Another example
m <- ceiling(n/2)-1
G <- spMatrix(n, n,
  c(1:n, 2*(1:m)),
  c(1:n, 2*(1:m)+1),
  rep(1, n+m)
)
G2 <- as(G %*% G, "dgTMatrix")
i <- c(G@i, G2@i, I@i) + 1
j <- c(G@j, G2@j, I@j) + 1
ij <- cbind(i,j)
rankMatrix( cbind( G2[ij], G[ij], I[ij] ) ) # 2
(To speed things up, you could take only a small part of those vectors:
if the rank is already 3, you know that they are independent,
if it is 2, you can check if the linear dependence relation also holds for the large vectors.)
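The shortcut described in the parenthesis could be sketched as follows, on a much smaller problem. This assumes the Matrix package; the shift matrix G, the helper nonzero_pos and the subset size of 200 are illustrative choices, not part of the original answer.

```r
library(Matrix)
set.seed(1)

n  <- 2000
I  <- sparseMatrix(i = 1:n, j = 1:n, x = 1, dims = c(n, n))
G  <- sparseMatrix(i = 1:(n - 1), j = 2:n, x = 1, dims = c(n, n))  # shift matrix
G2 <- G %*% G

# 1-based (row, column) positions of the stored entries of a sparse matrix
nonzero_pos <- function(M) {
  Tm <- as(M, "TsparseMatrix")  # triplet form exposes the 0-based i and j slots
  cbind(Tm@i, Tm@j) + 1L
}

ij <- unique(rbind(nonzero_pos(G2), nonzero_pos(G), nonzero_pos(I)))

# Compute the rank on a random subset of the positions only
idx <- ij[sample(nrow(ij), 200), , drop = FALSE]
r_small <- rankMatrix(cbind(G2[idx], G[idx], I[idx]))

# Rank 3 on the subset already proves independence; a smaller rank would
# still have to be confirmed on the full set of positions.
as.integer(r_small)
```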

efficient way of calculating lots of matrices

I'm trying to write a program that does the following:
Given two intervals A and B, for every (a,b) with a in A and b in B
create a variance matrix ymat, depending on (a,b)
calculate the (multivariate normal) density of some vector y
with mean 0 and variance matrix ymat
I learned that using loops is bad in R, so I wanted to use outer(). Here are my two functions:
y_mat <- function(n,lambda,theta,sigma) {
L <- diag(n);
L[row(L) == col(L) + 1] <- -1;
K <- t(1/n * L - theta*diag(n))%*%(1/n * L - theta*diag(n));
return(sigma^2*diag(n) + 1/lambda*K);
}
make_plot <- function(y,sigma,theta,lambda) {
n <- length(y)
sig_intv <- seq(.1,2*sigma,.01);
th_intv <- seq(-abs(2*theta),abs(2*theta),.01);
z <- outer(sig_intv,th_intv,function(s,t){dmvnorm(y,rep(0,n),y_mat(n,lambda,theta=t,sigma=s))})
contour(sig_intv,th_intv,z);
}
The shape of the variance matrix isn't relevant for this question. n and lambda are just two scalars, as are sigma and theta.
When I try
make_plot(y,.5,-3,10)
I get the following error message:
Error in t(1/n * L - theta * diag(n)) :
dims [product 25] do not match the length of object [109291]
In addition: Warning message:
In theta * diag(n) :
longer object length is not a multiple of shorter object length
Could someone enlighten me as to what's going wrong? Am I maybe going about this the wrong way?
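The third argument of outer should be a vectorized function: outer calls it once, with vectors holding every combination of the two inputs, rather than once per pair. Wrapping it with Vectorize should suffice: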
make_plot <- function(y, sigma, theta, lambda) {
  n <- length(y)
  sig_intv <- seq(.1,2*sigma,.01);
  th_intv <- seq(-abs(2*theta),abs(2*theta),.01);
  z <- outer(
    sig_intv, th_intv,
    Vectorize(function(s,t){dmvnorm(y,rep(0,n),y_mat(n,lambda,theta=t,sigma=s))})
  )
  contour(sig_intv,th_intv,z);
}
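To see why the wrapper is needed: outer expects its function to return one value per combination, but a scalar-only function collapses the whole input to a single result. A toy sketch (the function g is hypothetical and simply stands in for any scalar-only function, such as the density call above):

```r
x <- 1:3
y <- 1:4

# Scalar-only function: sum() collapses whatever it receives to one number
g <- function(s, t) sum(c(s, t))

# outer() passes all 12 combinations in a single call, so g returns one value
# and outer() cannot reshape it into a 3 x 4 matrix
failed <- inherits(try(outer(x, y, g), silent = TRUE), "try-error")

# Vectorize() wraps g in mapply(), calling it once per (s, t) pair
z <- outer(x, y, Vectorize(g))
dim(z)
```

Vectorize does not make the computation faster; it still evaluates the function once per pair and only adapts the interface to what outer expects.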

I have defined a vector according to a certain rule. How do I define a function that outputs the vector?

The rule is: for(i in 1:10){v[i]=f(q,m)}. f(q,m) is a function that generates random outputs in an interval according to the inputs q, m. 'v' is the vector.
After specifying the components of v that way, I can type v and get the vector back. What I would like to do is define a function that takes the inputs q, m and returns the vector v.
The reason is that eventually I want to graph the mean of v as q varies, but I think I need a function that returns v first. Any advice on how to define such a function would be greatly appreciated.
Thanks.
Generating values is elegantly done using the apply family of functions. vapply is less well known than sapply, but it checks the type and length of each result and is often a little faster, so I promote it here. The numeric(1) specifies what each call to f is expected to return:
# Emulating your function f
f <- function(q, m) runif(1, q, m)
# Generator function
g <- function(n=10, q, m) vapply(seq_len(n), function(i) f(q, m), numeric(1))
# Try it out
q <- 3
m <- 5
v <- g(10, q, m)
# or, if f is defined as above, simplify to:
v <- runif(10, q, m)
Exactly following your code:
makeVector <- function(q, m) {
  v <- c()
  for (i in 1:10) {
    v[i] <- f(q, m)
  }
  v
}
Or, more elegant:
makeVector <- function(q, m) sapply(1:10, function(i) f(q, m))
It's probably equivalent to the solutions already offered (and doesn't fully address the missing details), but have you tried the following?
Vf <- Vectorize(f)
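For the stated goal of graphing the mean of v as q varies, a generator function like the ones above can be applied over a grid of q values. A minimal sketch, assuming f draws uniformly between q and m as in the first answer; the names make_vector, qs and mean_v are made up for illustration.

```r
set.seed(1)

# Stand-in for the real f (assumption): one uniform draw between q and m
f <- function(q, m) runif(1, q, m)

# Returns the vector v of n draws for given q and m
make_vector <- function(q, m, n = 10) {
  vapply(seq_len(n), function(i) f(q, m), numeric(1))
}

m  <- 5
qs <- seq(0, 4, by = 0.5)

# Mean of v for each q on the grid
mean_v <- vapply(qs, function(q) mean(make_vector(q, m)), numeric(1))

plot(qs, mean_v, type = "b", xlab = "q", ylab = "mean of v")
```

With only 10 draws per point the curve will be noisy; raising n in make_vector smooths it at the cost of more calls to f.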
