Fast computation of kernel matrix in R

I have an n x p matrix A and would like to compute the n x n matrix B defined as
B[i, j] = f(A[i,], A[j,])
where f is a function that accepts arguments of the appropriate dimensionality. Is there a neat trick to compute this in R? f is symmetric and positive-definite (if this can help in the computation).
EDIT: Praneet asked to specify f. That is a good point. Although I think it would be interesting to have an efficient solution for any function, I would get a lot of mileage from efficient computation in the important case where f(x, y) is base::norm(x-y, type='F').

You can use outer with the matrix dimensions.
n <- 10
p <- 5
A <- matrix( rnorm(n*p), n, p )
f <- function(x,y) sqrt(sum((x-y)^2))
B <- outer(
1:n, 1:n,
Vectorize( function(i,j) f(A[i,], A[j,]) )
)
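For the Euclidean case singled out in the question's edit, a loop-free alternative is worth sketching. It assumes f is the plain Euclidean distance between rows (the Frobenius norm of the difference of two vectors), and it uses stats::dist from base R, which computes all pairwise row distances in compiled code. The names B_outer and B_dist are only illustrative.

```r
set.seed(42)
n <- 10
p <- 5
A <- matrix(rnorm(n * p), n, p)

# Reference: the outer() + Vectorize() approach from the answer above
f <- function(x, y) sqrt(sum((x - y)^2))
B_outer <- outer(1:n, 1:n, Vectorize(function(i, j) f(A[i, ], A[j, ])))

# Alternative for the Euclidean case only: dist() works on the rows of A
# and returns a "dist" object; as.matrix() expands it to the full n x n matrix
B_dist <- unname(as.matrix(dist(A)))

# Both approaches should agree up to floating-point error
all.equal(B_dist, B_outer)
```

This only helps when f is one of the metrics that dist() supports; for an arbitrary f the outer() approach above remains the general route.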

Related

Generating a function via for-loop for the Jacobian matrix in R

Suppose I have the following two variables y and z and the variable x
y = 1:10
z = 1:10
Now I would like to create a jacobian of the following function
f <- function(x) c(y[1]*x[1]+z[1]*x[2],
y[2]*x[1]+z[2]*x[2],
: : : :
y[10]*x[1]+z[10]*x[2])
Then the Jacobian can easily be obtained with
jacobian(f, c(1,1))
Now suppose
y= 1:i.
When i becomes large, computing the function manually becomes a time-consuming task.
Is there a way to construct the same function for an arbitrary i?
I tried the following:
for (i in 1:10) {
f[i] <- function(x) c(y[i]*x[1]+z[i]*x[2])
}
jacobian(f, c(1,1))
ThomasIsCoding suggests:
f <- function(x) tcrossprod(cbind(y, z), t(x))
Which works perfectly for this case.
Now suppose that the function is more complex
y[1]*x[1]^2+z[1]/x[2]
The suggested t(x) no longer works. How do I now write a vector for x?
You can define the function f as follows:
f <- function(x) tcrossprod(cbind(y, z), t(x))
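For the more complex function mentioned in the follow-up, one possible approach is to rely on R's element-wise arithmetic instead of a matrix product. The sketch below assumes every component has the form y[k]*x[1]^2 + z[k]/x[2], as in the question; jac_f is a hypothetical helper that writes out the analytic Jacobian, so the result can be checked without any extra package.

```r
y <- 1:10
z <- 1:10

# Element-wise arithmetic builds all components at once:
# component k is y[k] * x[1]^2 + z[k] / x[2]
f <- function(x) y * x[1]^2 + z / x[2]

# Analytic Jacobian (hypothetical helper): column 1 is d/dx[1], column 2 is d/dx[2]
jac_f <- function(x) cbind(2 * y * x[1], -z / x[2]^2)

f(c(1, 1))
jac_f(c(1, 1))
```

Because f still takes a single vector x and returns a vector of length(y), it should be usable in the jacobian(f, c(1,1)) call from the question without modification.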

R: fastest way to set up matrix of integrals?

I have a three-parameter function f(x, y, z), and two limits L, U.
Given a vector v, I want to set up a matrix with element M[i, j] = INTEGRAL( f(x, v[i], v[j]) ), where the integrals limits go from x = L to x = U.
So the problem has two elements:
We need to be able to calculate the integrals. I don't care how this is done, as long as it's FAST and reasonably accurate. Fast, fast, fast!! What's the fastest way?
We need to set up the matrix M[i, j]. What's the fastest way?
Please don't make this an issue of "dO yOu WaNt GauSsIan QuaDraTure oR SimPsoNs ruLe?". I don't care. Speed is the only thing relevant here. Whatever's faster, I'll take it, as long as the integrals are accurate to at least 1-2 digits or so.
A potentially fastest solution is given below
library(pracma)
M <- matrix(0,nrow = length(v),ncol = length(v))
p <- sapply(seq(length(v)-1), function(k) integral(f,v[k],v[k+1]))
u <- unlist(sapply(rev(seq_along(p)), function(k) cumsum(tail(p,k))))
M[lower.tri(M)] <- u
M <- t(M-t(M))
Regarding the two elements requested by OP
I guess integral from package pracma is fast enough
To build the matrix M, I did not use nested for loops. The idea is explained at the bottom, and I believe it speeds up the computation remarkably
Benchmark
I wrote down some of the possible solutions and you can compare their performance (my "fastest" solution is in method1()).
set.seed(1)
library(pracma)
# dummy data: function f and vector v
f <- function(x) x**3 + cos(x**2)
v <- rnorm(500)
# my "fastest" solution
method1 <- function() {
m1 <- matrix(0,nrow = length(v),ncol = length(v))
p <- sapply(seq(length(v)-1), function(k) integral(f,v[k],v[k+1]))
u <- unlist(sapply(rev(seq_along(p)), function(k) cumsum(tail(p,k))))
m1[lower.tri(m1)] <- u
t(m1-t(m1))
}
# faster than brute-force solution
method2 <- function() {
m2 <- matrix(0,nrow = length(v),ncol = length(v))
for (i in 1:(length(v)-1)) {
for (j in i:length(v)) {
m2[i,j] <- integral(f,v[i],v[j])
}
}
m2 - t(m2) # lower triangle is the negative of the upper: integral(f,b,a) = -integral(f,a,b)
}
# slowest, brute-force solution
method3 <- function() {
m3 <- matrix(0,nrow = length(v),ncol = length(v))
for (i in 1:length(v)) {
for (j in 1:length(v)) {
m3[i,j] <- integral(f,v[i],v[j])
}
}
m3
}
# timing for comparison
system.time(method1())
system.time(method2())
system.time(method3())
which gives
> system.time(method1())
user system elapsed
0.17 0.01 0.19
> system.time(method2())
user system elapsed
25.72 0.07 25.81
> system.time(method3())
user system elapsed
41.84 0.03 41.89
Principle
The idea in method1() is that you only need to calculate the integrals over intervals between adjacent points in v. Note the following properties of integrals:
integral(f,v[i],v[j]) is equal to integral(f,v[i],v[i+1]) + integral(f,v[i+1],v[i+2]) + ... + integral(f,v[j-1],v[j])
integral(f,v[j],v[i]) is equal to -integral(f,v[i],v[j])
In this sense, given n <- length(v), you only need to run integral operations (which are computationally expensive compared to a matrix transpose or a cumulative sum) n-1 times. That is far fewer than the choose(n,2) times in method2() or the n**2 times in method3(), particularly when n is large.
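The two properties above can be sketched in a slightly different form from method1(): accumulate the adjacent-interval integrals once, then take differences. This is only an illustration under stated assumptions: it swaps in base R's stats::integrate for pracma::integral, sorts v so that every integration limit is increasing, and uses the illustrative names cum and M.

```r
# Dummy integrand from the benchmark above
f <- function(x) x^3 + cos(x^2)

set.seed(1)
v <- sort(rnorm(20))   # sorted here only to keep the limits increasing (assumption)
n <- length(v)

# n - 1 integrals over the adjacent intervals [v[k], v[k + 1]]
p <- vapply(seq_len(n - 1),
            function(k) integrate(f, v[k], v[k + 1])$value,
            numeric(1))

# cum[i] is the integral from v[1] to v[i] (additivity over adjacent intervals)
cum <- c(0, cumsum(p))

# M[i, j] = cum[j] - cum[i], i.e. the integral from v[i] to v[j];
# swapping i and j flips the sign, so M is antisymmetric
M <- outer(cum, cum, function(a, b) b - a)

# Spot check against a direct integral
c(M[2, 5], integrate(f, v[2], v[5])$value)
```

Only n - 1 calls to the integrator are made; everything else is a cumulative sum and an outer difference.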

Linear Independence of Large Sparse Matrices in R

I have three large matrices: I, G, and G^2. These are 4 million x 4 million matrices and they are sparse. I would like to check whether they are linearly independent, and I would like to do this in R.
For small matrices, a way to do this is to vectorize each matrix: stack the columns on top of each other and test whether the matrix formed by the three stacked vectors has rank three.
However, due to the size of my problem I am not sure how to proceed.
(1) Is there a way to vectorize a Large Sparse Matrix into a Very Large Sparse Vector in R?
(2) Is there any other solution to the problem that could make this test efficient ?
Thanks in advance
When converting your matrices to vectors, you can keep only the non-zero elements.
# Sample data
n <- 4e6
k <- n
library(Matrix)
I <- spMatrix(n, n, 1:n, 1:n, rep(1,n))
G <- spMatrix(n, n,
sample(1:n, k, replace=TRUE),
sample(1:n, k, replace=TRUE),
sample(0:9, k, replace=TRUE)
)
G2 <- G %*% G
G2 <- as(G2, "dgTMatrix") # For the j slot
# Only keep elements that are non-zero in one of the 3 matrices
i <- as.integer( c(G@i, G2@i, I@i) + 1 )
j <- as.integer( c(G@j, G2@j, I@j) + 1 )
ij <- cbind(i,j)
rankMatrix( cbind( G2[ij], G[ij], I[ij] ) ) # 3
# Another example
m <- ceiling(n/2)-1
G <- spMatrix(n, n,
c(1:n, 2*(1:m)),
c(1:n, 2*(1:m)+1),
rep(1, n+m)
)
G2 <- as(G %*% G, "dgTMatrix")
i <- c(G@i, G2@i, I@i) + 1
j <- c(G@j, G2@j, I@j) + 1
ij <- cbind(i,j)
rankMatrix( cbind( G2[ij], G[ij], I[ij] ) ) # 2
(To speed things up, you could take only a small part of those vectors:
if the rank is already 3, you know that they are independent,
if it is 2, you can check if the linear dependence relation also holds for the large vectors.)
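The small-matrix version of the test described in the question (stack the columns, then check the rank) can be sketched with base R alone. The helper name indep_rank and the two 3 x 3 example matrices are hypothetical and dense; the sketch is meant to show the logic, not to scale to the sparse case.

```r
# Rank of the matrix whose columns are vec(I), vec(G), vec(G^2).
# c() on a matrix stacks its columns into one long vector.
indep_rank <- function(G) {
  I  <- diag(nrow(G))
  G2 <- G %*% G
  qr(cbind(c(I), c(G), c(G2)))$rank
}

# A nilpotent shift matrix: I, G and G^2 have disjoint supports
G_shift <- matrix(c(0, 1, 0,
                    0, 0, 1,
                    0, 0, 0), 3, 3)

# A projection: G^2 equals G, so the three matrices are dependent
G_proj <- diag(c(1, 1, 0))

indep_rank(G_shift)  # 3: linearly independent
indep_rank(G_proj)   # 2: linearly dependent
```

A rank of 3 means no nontrivial combination a*I + b*G + c*G^2 is zero, which is the same criterion the sparse code above applies to the non-zero entries only.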

I have defined a vector according to a certain rule. How do I define a function that outputs the vector?

The rule is: for(i in 1:10){v[i]=f(q,m)}. Here f(q,m) is a function that generates random outputs in an interval determined by the inputs q and m, and v is the vector.
After specifying the components of v that way, I can type v and get the vector back. What I would like to do is define a function that takes the inputs q and m and returns the vector v.
The reason is that I eventually want to graph the mean of v as q varies, but I think I need a function that returns v first. So any advice on how to define such a function would be greatly appreciated.
Thanks.
Generating values is elegantly done using the apply family of functions. vapply is lesser known, but more efficient than sapply, so I promote it here. The numeric(1) specifies what the result of f is expected to be:
# Emulating your function f
f <- function(q, m) runif(1, q, m)
# Generator function
g <- function(n=10, q, m) vapply(seq_len(n), function(i) f(q, m), numeric(1))
# Try it out
q <- 3
m <- 5
v <- g(10, q, m)
# or, if f is defined as above, simplify to:
v <- runif(10, q, m)
Exactly following your code:
makeVector <- function(q, m) {
v <- c()
for (i in 1:10) {
v[i] <- f(q, m)
}
v
}
Or, more elegant:
makeVector <- function(q, m) sapply(1:10, function(i) f(q, m))
It's probably equivalent to the solutions offered (and doesn't fully address the missing details) but have you tried?
Vf <- Vectorize(f)
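The question's end goal, the mean of v as q varies, might be sketched as follows. The f below is the runif stand-in borrowed from the first answer, not the asker's real function, and make_v, mean_v, qs and means are hypothetical names chosen for illustration.

```r
# Stand-in for the asker's f (an assumption, taken from the answer above)
f <- function(q, m) runif(1, q, m)

# Returns the vector v for given q and m
make_v <- function(q, m, n = 10) vapply(seq_len(n), function(i) f(q, m), numeric(1))

# Mean of one freshly generated v
mean_v <- function(q, m, n = 10) mean(make_v(q, m, n))

set.seed(1)
m  <- 5
qs <- seq(0, 4, by = 0.5)

# One mean per value of q, with m held fixed
means <- vapply(qs, mean_v, numeric(1), m = m)
```

The result can then be drawn with, for example, plot(qs, means, type = "l"). Since each v is random, the curve is noisy for small n.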
