I am looking for a fast way to compute, in R, the trace (trace(A)) of a matrix A = B' C. The fastest way I can think of is the following:
set.seed(123)
n <- 10^6
B <- matrix(rnorm(n), ncol=sqrt(n))
C <- matrix(rnorm(n), ncol=sqrt(n))
ptm <- proc.time()
A <- tcrossprod(B,C)
traceA <- sum(diag(A))
proc.time() - ptm
I am wondering whether there is a faster way (especially if the matrices B and C are symmetric, or even idempotent). I mean, with the line A <- tcrossprod(B,C) I am computing the whole matrix A, although I just need the sum of its diagonal elements (trace(A)).
To speed this up, I thought of a parallel computation for tcrossprod, but I haven't found an implementation of this (and I don't know whether that would be a good idea anyway). Does someone have an idea?
Tr(B'C) is just the inner product of the matrices B and C viewed as vectors, since Tr(B'C) = sum over i,j of B[i,j] * C[i,j]. Thus
sum(B*C)
does the trick and is several orders of magnitude faster in this example.
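As a quick sanity check (a sketch; the timings themselves are machine-dependent), the identity can be verified against the original approach:
# The full product forms a 1000 x 1000 matrix just to read off its diagonal;
# the elementwise version never materializes it.
set.seed(123)
n <- 10^6
B <- matrix(rnorm(n), ncol = sqrt(n))
C <- matrix(rnorm(n), ncol = sqrt(n))
system.time(t1 <- sum(diag(tcrossprod(B, C))))
system.time(t2 <- sum(B * C))
all.equal(t1, t2)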
Related
When making predictions for a linear statistical model we usually have a model matrix X of predictors corresponding to the points at which we want to make predictions; a vector of coefficients beta; and a variance-covariance matrix V. Computing the predictions is just X %*% beta. The most straightforward way to compute the variances of the predictions is
diag(X %*% V %*% t(X))
or slightly more efficiently
diag(X %*% tcrossprod(V,X))
However, this is very inefficient, because it constructs an n*n matrix when all we really want is the diagonal. I know I could write some Rcpp-loopy thing that would compute just the diagonal terms, but I'm wondering if there is an existing linear algebra trick in R that will nicely do what I want ... (if someone wants to write the Rcpp-loopy thing for me as an answer I wouldn't object, but I'd prefer a pure-R solution)
FWIW predict.lm seems to do something clever by multiplying X by the inverse of the R component of the QR-decomposition of the lm; I'm not sure that's always going to be available, but it might be a good starting point (see here)
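For what it's worth, here is a sketch of that QR trick, under the assumption that we have a fitted lm object fit (with its qr component, kept by default) and a new model matrix Xnew; since vcov(fit) = s^2 * (R'R)^{-1} with R the R factor of the QR decomposition, the diagonal reduces to row sums of squares:
# Hypothetical names: fit is a fitted lm, Xnew the model matrix at the new points
R <- qr.R(fit$qr)                            # upper-triangular R factor
XRinv <- Xnew %*% solve(R)                   # Xnew %*% R^{-1}
pred_var <- rowSums(XRinv^2) * sigma(fit)^2  # diag(Xnew %*% vcov(fit) %*% t(Xnew))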
Along the lines of this Octave/Matlab question, for two matrices A and B, we can use the fact that the nth diagonal entry of AB is the dot product of the nth row of A with the nth column of B. We can naively extend that to the case of three matrices, ABC. I have not considered how to optimize for the case where C=A^T, but aside from that, this code looks like a promising speedup:
start_time <- Sys.time()
A=matrix(1:1000000, nrow = 1000, ncol = 1000)
B=matrix(1000000:1, nrow = 1000, ncol = 1000)
# Try one of these two
res=diag(A %*% B %*% t(A)) # ~0.47s
res=rowSums(A * t(B %*% t(A))) # ~0.27s
end_time <- Sys.time()
print(end_time - start_time)
Using tcrossprod did not appear to accelerate the results when I ran this code. However, the row-sum-dot-product approach already appears to be a lot more efficient, at least on this silly example. The reason is that rowSums(A * t(B %*% t(A))) never forms the final n*n product at all: only the intermediate B %*% t(A) is computed in full, whereas diag(A %*% B %*% t(A)) computes every entry of the final matrix just to keep its diagonal.
I am not quite sure how efficient this is, but here is one approach:
Find U such that V = U %*% t(U); this is possible since V is a covariance matrix.
XU = X %*% U
result = apply(XU, 1, function(x) sum(x^2))
Demo
V <- cov(iris[, -5])
X <- as.matrix(iris[1:5, -5])
Using SVD
svd_v <- svd(V)
U <- svd_v$u %*% diag(sqrt(svd_v$d))
XU = X %*% U
apply(XU, 1, function(x) sum(x^2))
# 1 2 3 4 5
#41.35342 39.36286 35.42369 38.25584 40.30839
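A quick cross-check against the direct computation (cheap on this small example):
all.equal(apply(XU, 1, function(x) sum(x^2)), diag(X %*% V %*% t(X)))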
Another approach - though this isn't going to be faster than @davewy's either. Note that chol(V) returns the upper-triangular factor U with V = t(U) %*% U, so it is t(U) we need here:
U <- chol(V)
XU = (X %*% t(U))^2
rowSums(XU)
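And the same sanity check for the Cholesky version:
all.equal(rowSums(XU), diag(X %*% V %*% t(X)))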
I recently found emulator::quad.diag(), which is just
colSums(crossprod(M, Conj(x)) * x)
This is slightly better than @davewy's solution (although the overall differences are less than I thought they would be anyway).
library(microbenchmark)
library(emulator)
microbenchmark(full = diag(A %*% B %*% t(A)),
               davewy = rowSums(A * t(B %*% t(A))),
               emu = quad.diag(A, B))
Unit: milliseconds
   expr      min       lq     mean   median       uq      max neval cld
   full 32.76241 35.49665 39.51683 37.63958 41.46561 57.41370   100   c
 davewy 22.74787 25.06874 28.42179 26.97330 29.68895 45.38188   100  b
    emu 17.68390 20.21322 23.59981 22.09324 24.80734 43.60953   100 a
I seem to have a misunderstanding about memory usage when using a subset of a matrix in R. I came across it when I tried to program a cross-validation function, but I think the problem is more general. I have cooked up a small example below.
# parameters
n <- 1e6 # the real data are much bigger, but this will do
m <- 50
nfolds <- 10
X <- matrix(rnorm(n*m,0,1),nrow=n,ncol=m)
y <- rnorm(n,0,1)
mse <- rep(0,nfolds)
foldid <- sample(rep(seq(nfolds), length = n))
# produces big spikes in memory
for (i in (1:nfolds)) {
which <- foldid == i
xpx <- crossprod(X[!which,])
xpy <- crossprod(X[!which,],y[!which])
b <- solve(xpx,xpy)
mse[i] <- mean((y[which] - X[which,] %*% b)**2)
}
# does not produce spikes in memory usage
for (i in (1:nfolds)) {
xpx <- crossprod(X)
xpy <- crossprod(X,y)
b <- solve(xpx,xpy)
mse[i] <- mean((y - X %*% b)**2)
}
I don't understand why the first loop produces big upward spikes in memory usage, whereas the second loop doesn't, although a strictly larger matrix is multiplied.
Let's compare the first lines within the loops.
First, the simple crossprod:
xpx <- crossprod(X)
Without subsetting, you work with matrices X (already existing 400 MB) and xpx (small).
Second, with subsetting:
xpx <- crossprod(X[!which,])
Here you work with X, the temporary matrix X[!which,], and xpx. The temporary matrix X[!which,] alone requires an additional 360 MB of memory:
object.size(X[!which,])
# 360000200 bytes
R's memory management is relatively lazy, so the temporary matrix may not be discarded until the next garbage collection.
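One way to sidestep the large temporary entirely (a sketch, not tested against the original data): the cross-product over the training rows equals the full cross-product minus the held-out fold's contribution, so only the small fold ever needs to be subsetted:
# crossprod(X[!which,]) == crossprod(X) - crossprod(X[which,]), since X'X is a
# sum of row-wise outer products; X[which,] is only ~1/nfolds the size of X.
xpx_full <- crossprod(X)
xpy_full <- crossprod(X, y)
for (i in 1:nfolds) {
  which <- foldid == i
  xpx <- xpx_full - crossprod(X[which,])
  xpy <- xpy_full - crossprod(X[which,], y[which])
  b <- solve(xpx, xpy)
  mse[i] <- mean((y[which] - X[which,] %*% b)**2)
}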
Without using any control flow statements (i.e., if, while, for), write an R function that lists all Pythagorean triplets {a, b, c} less than 1000 with a < b < c. I have no idea how to go about this problem other than knowing that the which function will help. I'm guessing it's some sort of recursion.
# Just because I think this is pretty
xx <- (1:1000)^2              # all squares up to 1000^2
xy <- combn(xx, 2)            # all pairs a^2 < b^2
xz <- rbind(xy, colSums(xy))  # append a^2 + b^2 as a third row
xp <- xz[, xz[3,] %in% xx]    # keep pairs whose sum is itself a square
sqrt(xp)                      # rows are a, b, c
I want to perform a matrix factorization with alternating least squares (ALS) in R. While the code works fine for small matrices, it is incredibly slow for larger ones. I would appreciate any help in speeding up the process. I am using RRopen 8.01, so it is already running on multiple cores via MKL.
I am using a binary matrix as the implicit feedback matrix. Furthermore, I implemented a weighting matrix.
## Matrix Factorization with Alternating Least Squares
## R is u * v binary matrix,
## W is u * v weighting matrix
## U is u * k user feature matrix,
## V is k * v item feature matrix
## u is the number of users,
## v is the number of items,
## k is the number of features
## iter is the number of iterations
Here is what I did:
# implicit feedback data matrix.
R <- matrix(nr=2, nc=5, data=rbinom(2*5,1, prob=.2))
W <- matrix(nr=2, nc=5, data=rbinom(2*5,7, prob=.2))
I set the following parameter:
k <- 20
its <- 10
Create the initial matrices for users and items
# initial users matrix.
U <- matrix(nr= nrow(R), nc=k, data=5 *rnorm(nrow(R)*k))
# initial items matrix.
V <- matrix(nr=k, nc=ncol(R), data=5* rnorm(ncol(R)*k))
And now I perform the Matrix Factorization with ALS
w.err <- NULL
for (iter in 1:its) {
  # update users
  for (i in 1:nrow(R)) {
    U[i,] <- t(solve(V %*% (diag(R[i,]) %*% t(V)) + 0.1 * diag(k),
                     as.vector(V %*% as.vector(t(W[i,]) %*% diag(R[i,])))))
  }
  # update items
  for (j in 1:ncol(R)) {
    V[,j] <- solve(t(U) %*% (diag(R[,j]) %*% U) + 0.1 * diag(k),
                   t(U) %*% (diag(R[,j]) %*% W[, j]))
  }
  R.hat <- U %*% V
  w.err[iter] <- sum((R * (W - U %*% V))^2)
}
R.hat is the desired end matrix.
w.err is just a control for the errors over the iterations. Nice for plotting :)
The code as it is works fine. But when I increase the number of rows and columns of R (and W), performance decreases significantly. While it is fine for, say, nr=200, nc=500, it has already been running for two hours for nr=2000, nc=5000 (and is not finished yet) on an 8-core 2.67 GHz machine.
I didn't use the NMF or the NMFN package since negative values are possible, so it is not a non-negative MF.
Does anyone have an idea how to increase performance? Maybe I am just being stupid and my code is nonsense; I would be happy if you could point out improvements.
I looked for similar questions but couldn't find one. Maybe I just overlooked it.
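One likely culprit (an untested guess from reading the code): each update materializes a full v x v or u x u diagonal matrix via diag(R[i,]) and multiplies through it, which is quadratic in the larger dimension. Since diag(d) %*% M merely scales the rows of M by d, the same normal equations can be built with elementwise multiplication instead; a minimal sketch:
# diag(d) %*% M is replaced by d * M (column-major recycling scales the rows),
# so no v x v or u x u diagonal matrix is ever formed.
for (iter in 1:its) {
  # update users
  for (i in 1:nrow(R)) {
    Vt <- t(V)  # v x k
    U[i,] <- solve(crossprod(Vt, R[i,] * Vt) + 0.1 * diag(k),
                   V %*% (R[i,] * W[i,]))
  }
  # update items
  for (j in 1:ncol(R)) {
    V[,j] <- solve(crossprod(U, R[,j] * U) + 0.1 * diag(k),
                   crossprod(U, R[,j] * W[,j]))
  }
}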
I have a long vector x, and another vector v of group lengths. I would like to sum x so that the answer y is a vector of length length(v), where y[1] is sum(x[1:v[1]]), y[2] is sum(x[(1+v[1]):(v[1]+v[2])]), and so on. Essentially this is performing a sparse matrix multiplication from a space of dimension length(x) to one of dimension length(v). However, I would prefer not to bring in "advanced machinery", although I might have to. It does need to be very, very fast. Can anyone think of anything simpler than using a sparse matrix package?
Example -
x <- c(1,1,3,4,5)
v <- c(2,3)
y <- myFunc(x,v)
y should be c(2,12)
I am open to any pre-processing - e.g., storing in v the starting indexes of each stretch.
y <- cumsum(x)[cumsum(v)]
y <- c(y[1], diff(y))
This looks like it's doing extra work because it's computing the cumsum for the whole vector, but it's actually faster than the other solutions so far, for both small and large numbers of groups.
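Packaged as the myFunc from the question (a sketch), applied to the example data:
myFunc <- function(x, v) {
  y <- cumsum(x)[cumsum(v)]  # running totals at each group boundary
  c(y[1], diff(y))           # difference out the preceding groups
}
myFunc(c(1, 1, 3, 4, 5), c(2, 3))
# [1]  2 12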
Here's how I simulated the data
set.seed(5)
N <- 1e6
n <- 10
x <- round(runif(N,0,100),1)
v <- as.vector(table(sample(n, N, replace=TRUE)))
On my machine the timings with n <- 10 are:
Brandon Bertelsen (for loop): 0.017
Ramnath (rowsum): 0.057
John (split/apply): 0.280
Aaron (cumsum): 0.008
changing to n <- 1e5 the timings are:
Brandon Bertelsen (for loop): 2.181
Ramnath (rowsum): 0.226
John (split/apply): 0.852
Aaron (cumsum): 0.015
I suspect this is faster than doing matrix multiplication, even with a sparse matrix package, because one doesn't have to form the matrix or do any multiplication. If more speed is needed, I suspect it could be sped up by writing it in C; not hard to do with the inline and Rcpp packages, but I'll leave that to you.
You can do this using rowsum. It should be reasonably fast as it uses C code in the background.
y <- rowsum(x, rep(1:length(v), v))
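Note that rowsum() returns a one-column matrix keyed by group; if a plain vector is wanted, drop it with as.vector:
y <- as.vector(rowsum(x, rep(seq_along(v), v)))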
Here's a slightly different tack.
s <- rep(1:length(v), v)
l <- split(x, s)
y <- sapply(l, sum)
Try something like:
cv <- cumsum(v)             # group boundaries as cumulative indexes
y <- numeric(length(v))
for (i in seq_along(v)) {
  y[i] <- if (i > 1) sum(x[(cv[i-1] + 1):cv[i]]) else sum(x[1:cv[i]])
}