I tried using the eigenvalue solver from the Eigen library (via RcppEigen) to improve performance:
// [[Rcpp::depends(RcppEigen)]]
#include <RcppEigen.h>
using namespace Eigen;

// [[Rcpp::export]]
MatrixXd Eigen4(const Map<MatrixXd> bM) {
  SelfAdjointEigenSolver<MatrixXd> es(bM);
  return es.eigenvectors();
}
Yet, when comparing on a 2000x2000 matrix:
n <- 5e3
m <- 2e3
b <- crossprod(matrix(rnorm(n*m), n))
print(system.time(test <- Eigen4(b))) # 18 sec
print(system.time(test2 <- eigen(b, symmetric = TRUE))) # 8.5 sec
And the microbenchmark results:
Unit: seconds
expr min lq mean median uq max neval
Eigen4(b) 18.614694 18.687407 19.136380 18.952063 19.292021 20.812116 10
eigen(b, symmetric = TRUE) 8.652628 8.663302 8.696543 8.676914 8.718517 8.831664 10
R is twice as fast as Eigen?
I'm using the latest versions of R and RcppEigen.
Am I doing something wrong?
R's eigen is an interface to Fortran functions from LAPACK. Eigen uses its generic C++ code by default, although it can be configured to use external BLAS/LAPACK backends for certain dense matrix operations, including eigendecomposition. Depending on your architecture and compilers, R's default LAPACK may well be faster. If you configure both R and Eigen to use the same highly optimized platform-specific BLAS/LAPACK (e.g. MKL on Intel) you should get virtually identical (and better) results.
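Before comparing timings, it is worth checking which BLAS/LAPACK your R session is actually linked against; recent R versions report this directly. (On the Eigen side, the documented EIGEN_USE_BLAS / EIGEN_USE_LAPACKE macros route supported dense operations to an external backend.)
sessionInfo()   # recent R versions list the "BLAS:" and "LAPACK:" library paths here
La_version()    # LAPACK version used by eigen(), solve(), etc.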
Related
I wonder if iterative solvers are a faster way to solve linear systems (non-sparse, symmetric, positive definite).
I tried conjugate gradient methods from the R packages Rlinsolve and cPCG, but both seem to be less accurate and slower than the non-iterative base::solve().
library(Rlinsolve)
library(cPCG)
library(microbenchmark)
n <- 2000
A <- tcrossprod(matrix(rnorm(n^2),nrow=n) + diag(rep(10,n)))
x <- rnorm(n)
b <- A%*%x
mean(abs(x - solve(A,b)))
## [1] 1.158205e-08
mean(abs(x - lsolve.cg(A,b)$x))
## [1] 0.03836865
mean(abs(x - cgsolve(A,b)))
## [1] 0.02642611
mean(abs(x - pcgsolve(A,b)))
## [1] 0.02638013
microbenchmark(solve(A, b), lsolve.cg(A, b),
cgsolve(A, b), pcgsolve(A, b), times=5)
## Unit: milliseconds
## expr min lq mean median uq max neval cld
## solve(A, b) 183.3039 188.6678 189.7362 188.8665 189.8514 197.9914 5 a
## lsolve.cg(A, b) 7178.7477 7784.7646 7934.8406 8114.5838 8218.7356 8377.3714 5 d
## cgsolve(A, b) 1907.0940 2020.8368 2226.0513 2121.2917 2483.1947 2597.8393 5 b
## pcgsolve(A, b) 4059.5856 4109.0319 4203.4093 4242.7750 4275.9537 4329.7005 5 c
(R version 3.6.1 with OpenBLAS and 4 cores.)
Am I missing something? What is a typical use-case for such iterative methods?
What is a good R example for non-sparse linear systems showing the advantages of iterative solvers?
As the creator of the Rlinsolve package, I somewhat disagree with what this question implicitly argues. If you have a dense matrix A, all the benefits of sparse storage and the computation tailored to it disappear at once. I've seen some decent uses of sparse solvers when the covariance structure under a Gaussian model is banded, but such literature is extremely thin.
Please keep in mind that no single tool is designed to solve every problem. If you have a symmetric, positive-definite matrix, a Cholesky- or EVD-based solution is a great tool.
FYI, I've seen Rlinsolve used in a statistical computing paper that compares the performance of an EM-based iterative solver, the authors' own creation, against methods provided by my package. I believe it serves a good role there.
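To make the Cholesky suggestion above concrete, here is a minimal sketch in base R (chol_solve is just an illustrative name), reusing A, b and x from the question:
# Minimal sketch of a Cholesky-based solve for a dense SPD system.
chol_solve <- function(A, b) {
  R <- chol(A)                                    # A = t(R) %*% R, R upper triangular
  backsolve(R, backsolve(R, b, transpose = TRUE)) # solve t(R) y = b, then R x = y
}
mean(abs(x - chol_solve(A, b)))                   # accuracy comparable to solve(A, b)
microbenchmark(solve(A, b), chol_solve(A, b), times = 5)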
To accelerate my package, which includes plenty of matrix calculation, I used Rcpp to rewrite all the code. However, some functions are even slower than before. I used microbenchmark to analyze this and found that the matrix multiplication in Rcpp is slower.
Why does this happen?
And how can I accelerate my package? Thanks a lot.
The Rcpp code is as follows:
#include <Rcpp.h>
#include <numeric>   // for std::inner_product
using namespace Rcpp;

// [[Rcpp::export]]
NumericMatrix mmult(const NumericMatrix& a, const NumericMatrix& b) {
  if (a.ncol() != b.nrow()) stop("Incompatible matrix dimensions");
  NumericMatrix out(a.nrow(), b.ncol());
  NumericVector rm1, cm2;
  for (int i = 0; i < a.nrow(); ++i) {
    rm1 = a(i, _);
    for (int j = 0; j < b.ncol(); ++j) {
      cm2 = b(_, j);
      out(i, j) = std::inner_product(rm1.begin(), rm1.end(), cm2.begin(), 0.0);
    }
  }
  return out;
}
The R code is as follows:
X = matrix(rnorm(10*10,1),10,10)
Y = matrix(rnorm(10*10,1),10,10)
microbenchmark(
mmult(X,Y),
X%*%Y)
The result is:
Unit: microseconds
expr min lq mean median uq max neval
mmult(X, Y) 45.720 48.9860 126.79228 50.385 51.785 6368.512 100
X %*% Y 5.599 8.8645 12.85787 9.798 10.730 153.486 100
This is the opposite of what was seen for matrix-vector multiplication, but it is the expected result. Here R is using BLAS to do all the heavy work, which might even run in parallel. By using your naive matrix multiplication, you are throwing away all the optimized memory management done in the BLAS library.
Instead of trying to reinvent low-level operations like matrix multiplication, you could try to implement larger parts of your code using something like RcppArmadillo, which uses the same BLAS library as R but also (not only!) offers a convenient syntax on top of that.
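As a rough sketch of that suggestion (assuming RcppArmadillo is installed; mmult_arma is just an illustrative name), Armadillo hands the product to the same BLAS routine (dgemm) that R's %*% uses:
# Sketch only: let Armadillo/BLAS do the product instead of a hand-written loop.
Rcpp::cppFunction(depends = "RcppArmadillo", code = '
  arma::mat mmult_arma(const arma::mat& a, const arma::mat& b) {
    return a * b;   // dense product dispatched to BLAS
  }
')
microbenchmark(mmult(X, Y), mmult_arma(X, Y), X %*% Y)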
I am trying to calculate the local efficiency of a graph using shortest.paths from the igraph package.
The local efficiency of a vertex v is, by definition, the "global efficiency" computed among all direct neighbors of v (Latora & Marchiori, 2001).
I came up with the code below for global and local efficiency. However, the latter includes the target vertex in its calculation, and in the paper above they say the target vertex has to be taken out.
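For reference, my reading of those definitions (notation mine, not quoted from the paper):
E_{glob}(G) = \frac{1}{N(N-1)} \sum_{i \neq j} \frac{1}{d_{ij}}, \qquad E_{loc}(v) = E_{glob}(G_v)
where d_{ij} is the shortest-path distance between vertices i and j, N is the number of vertices, and G_v is the subgraph induced by the direct neighbors of v with v itself removed.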
# Global efficiency (average inverse shortest paths between all u--v vertices)
eff <- 1/(shortest.paths(my.graph))
eff[!is.finite(eff)] <- 0
gl.eff <- mean(eff, na.rm = TRUE)

# Mean local efficiency (global efficiency for each node)
gn <- graph.neighborhood(my.graph, 1) # list with subgraphs of directly connected vertices
names(gn) <- colnames(my.corr.matrix)
local.eff <- numeric(length(gn))
for (i in 1:length(gn)) {
  gn[[i]] <- gn[[i]] - vertex(V(gn[[i]])[grep(names(gn[i]), V(gn[[i]]))]) # doesn't match
  eff.gn <- 1/(shortest.paths(gn[[i]]))
  eff.gn[!is.finite(eff.gn)] <- 0
  eff.gn <- mean(eff.gn, na.rm = TRUE)
  local.eff[i] <- eff.gn
  mean.local.eff <- mean(local.eff, na.rm = TRUE)
}
I am trying to match the list name (each element of the list is a subgraph) with the name of the vertex inside that subgraph using grep(), but I haven't been able to get it right. Could someone give me a hand with that?
Thanks in advance,
I have already written a function to do this that is many times faster than what you've written. See if the following will suit your needs. For smaller graphs (or if you are using Windows), you will possibly want to replace simplify2array(mclapply(nodes, with sapply(nodes,, and then of course remove the argument mc.cores=detectCores(). However this really helps performance on large graphs.
You can see the code at the following link:
Local efficiency code
EDIT: Included some benchmark info (where the function f is yours, and g is the function linked above). This was done on a laptop with 4 cores @ 2.10 GHz (Intel i3-2310m).
g.rand <- sample_gnp(100, .1)
V(g.rand)$degree <- degree(g.rand)
compare <- microbenchmark(f(g.rand), g(g.rand), times=1e2)
compare
Unit: milliseconds
expr min lq mean median uq max neval cld
f(g.rand) 476.9853 4097.2202 4544.720 4539.911 4895.020 9346.873 100 b
g(g.rand) 299.2696 329.6629 1319.377 1114.054 2314.304 3003.966 100 a
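For reference, a minimal sketch of the neighbourhood-subgraph approach (simplified, not identical to the linked function; local_eff_sketch and its internals are illustrative names only):
library(igraph)
library(parallel)

# Illustrative sketch: local efficiency of each vertex as the global efficiency
# of the subgraph induced by its neighbours (the vertex itself excluded).
local_eff_sketch <- function(graph) {
  one_vertex <- function(v) {
    nb <- neighbors(graph, v)
    if (length(nb) < 2) return(0)
    inv <- 1 / distances(induced_subgraph(graph, nb))
    inv[!is.finite(inv)] <- 0
    sum(inv) / (length(nb) * (length(nb) - 1))
  }
  simplify2array(mclapply(seq_len(vcount(graph)), one_vertex,
                          mc.cores = detectCores()))
}
mean(local_eff_sketch(g.rand))  # mean local efficiency of the benchmark graph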
In case someone needs the local efficiency in Python, here is my code for that:
Python version
import numpy as np
from igraph import *
np.seterr(divide='ignore')
def nodal_eff(g):
    """
    This function calculates the nodal efficiency of a weighted graph object.
    Created by: Loukas Serafeim (seralouk), Nov 2017

    Args:
        g: An igraph Graph() object.

    Returns:
        The nodal efficiency of each node of the graph
    """
    sp = g.shortest_paths_dijkstra(weights=g.es["weight"])
    sp = np.asarray(sp)
    temp = 1.0 / sp
    np.fill_diagonal(temp, 0)
    N = temp.shape[0]
    ne = (1.0 / (N - 1)) * np.apply_along_axis(sum, 0, temp)
    return ne
I know that my title doesn't fully reflect what my question really was, so I'm answering it myself since I just got it to work.
# Mean local efficiency (global efficiency for each node)
gn <- graph.neighborhood(my.graph, 1) # list with subgraphs of directly connected vertices
names(gn) <- V(my.graph)$name
local.eff <- numeric(length(gn))
for (i in 1:length(gn)) {
  gn[[i]] <- gn[[i]] - vertex(V(gn[[i]])[match(names(gn[i]), V(gn[[i]])$name)]) # MATCHES
  eff.gn <- 1/(shortest.paths(gn[[i]]))
  eff.gn[!is.finite(eff.gn)] <- 0
  eff.gn <- mean(eff.gn, na.rm = TRUE)
  local.eff[i] <- eff.gn
}
local.eff[local.eff %in% NaN] <- NA
mean.local.eff <- mean(local.eff, na.rm = TRUE)
I have written what I believe to be a semi-quick OLS regression function:
ols32 <- function(y, x, Ridge = 1.1) {
  xrd <- crossprod(x)
  xry <- crossprod(x, y)
  diag(xrd) <- Ridge * diag(xrd)
  solve(xrd, xry)
}
Now I want to apply this to the following:
(vapply(1:la, function(J)
  ME %*% ols32(nza[, J], cbind(nzaf1[, J], nzaf2[, J], nza[, -J], MOMF))[(la + 2):(2 * la + 1)],
  FUN.VALUE = 0))
where nza, nzaf1, nzaf2 and MOMF are 500x50 matrices, la = 50, and ME is a vector of length 50.
So what I actually do is run a regression but keep only the beta coefficients belonging to MOMF, which I multiply by ME.
nza.mat<-matrix(rnorm(500*200),ncol=200)
nza<-nza.mat[,1:50]
nzaf2<-nza.mat[,101:150]
MOMF<-nza.mat[,151:200]
nzaf1<-nza.mat[,51:100]
ME<-nza.mat[500,151:200]
Is there an immediate way of making things faster, or do I need to use something like RcppEigen?
Thanks, P
So I came up with a slightly faster way of solving this by rewriting my ols-function so that it calculates the two crossproducts only once for a whole matrix. The new function looks like this:
ols.quick <- function(y, x, ME, Ridge = 1.1) {
  la <- dim(y)[2]
  XX.cross <- crossprod(x)
  XY.cross <- crossprod(x, y)
  diag(XX.cross) <- Ridge * diag(XX.cross)
  betas <- sapply(1:la, function(J) {
    idx <- c((1:la)[-J], la + J, 2 * la + J, (3 * la + 1):(4 * la))
    solve(XX.cross[idx, idx], XY.cross[idx, J])
  }, simplify = TRUE)
  ME %*% betas[(la + 2):(2 * la + 1), ]
}
where
y=nza (500x50) and x=cbind(nza,nzaf1,nzaf2,MOMF) (500x200)
This solves the problem about 3.5 times faster.
microbenchmark(ols.quick(nza, nza.mat, ME),
               vapply(1:la, function(J)
                 ME %*% ols32(nza[, J], cbind(nzaf1[, J], nzaf2[, J], nza[, -J], MOMF))[(la + 2):(2 * la + 1)],
                 FUN.VALUE = 0))
min lq median uq max neval
66.30495 67.71903 68.57001 70.17742 77.80069 100
251.59395 255.43306 258.35041 262.85742 296.51313 100
I suppose I could gain some speed with parLapply from the parallel package, but I haven't looked into that yet.
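For completeness, here is a rough, untested sketch of what that could look like with the parallel package; it assumes XX.cross, XY.cross and la have been computed as in ols.quick and exist in the global environment:
library(parallel)

# Untested sketch: run the 50 per-column solves from ols.quick on a cluster.
cl <- makeCluster(detectCores() - 1)
clusterExport(cl, c("XX.cross", "XY.cross", "la"))
betas <- parSapply(cl, 1:la, function(J) {
  idx <- c((1:la)[-J], la + J, 2 * la + J, (3 * la + 1):(4 * la))
  solve(XX.cross[idx, idx], XY.cross[idx, J])
})
stopCluster(cl)
ME %*% betas[(la + 2):(2 * la + 1), ]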
I have the following code in R
library(mvtnorm)
m = matrix(rnorm(2000000),nrow=200)
A = matrix(rnorm(40000),ncol=200)
A = A%*%t(A)
C = array(A,c(200,200,10000))
B = 10000
S = 100
postpred = array(NA,c(200,S,B))
for (i in 1:B) {
  postpred[,,i] = t(rmvnorm(S, m[,i], C[,,i], method = "svd"))
}
but this code is extremely slow because I have to loop 10,000 times while also simulating from the multivariate normal 100 times in each iteration, and m and C can be very large as well. So what I would like to do is calculate postpred outside of a loop. I have tried using the apply function, but to no avail. Any help or suggestions would be greatly appreciated.
Others have pointed out that apply (and similar functions) won't help you much in your case, and they are right.
For what it is worth, I checked whether you would gain performance by compiling your code. Here is a little benchmark I made for your problem (I reduced the size of the matrices, because otherwise I could not run them):
library(mvtnorm)
func = function() {
  m = matrix(rnorm(200000), nrow = 100)
  A = matrix(rnorm(10000), ncol = 100)
  A = A %*% t(A)
  C = array(A, c(100, 100, 1000))
  B = 1000
  S = 10
  postpred = array(NA, c(100, S, B))  # dimensions must match the 100 x S draws
  for (i in 1:B) {
    postpred[,,i] = t(rmvnorm(S, m[,i], C[,,i], method = "svd"))
  }
}
require(compiler)
func_compiled <- cmpfun(func)
require(microbenchmark)
microbenchmark(func_compiled(), func(), times=10) # grab a coffee, this takes some time
The results show that compiling won't give you any advantage:
Unit: seconds
expr min lq median uq max neval
func_compiled() 9.938632 10.12269 10.18237 10.48215 15.43299 10
func() 9.969320 10.07676 10.21916 15.44664 15.66109 10
(This could have been expected, as the mvtnorm library is already compiled.)
Overall, you have only two ways left to optimize your code in R:
use smaller numbers (if acceptable)
parallelize your code
As Josillber says, vectorisation (the apply family of functions) isn't going to do much for you; it really is a bit of an R myth that it gives significant speed improvements.
I suggest you look at parallel options; there are the parallel package (mclapply etc.) and the snow package. Read more here: http://stat.ethz.ch/R-manual/R-devel/library/parallel/doc/parallel.pdf
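If it helps, here is a rough, unbenchmarked sketch of the original loop rewritten with parallel::mclapply (forking, so on Windows you would need parLapply instead); m, C, S and B are as defined in the question:
library(parallel)
library(mvtnorm)

# Draw each of the B slices in parallel, then reassemble the 200 x S x B array.
draws <- mclapply(seq_len(B), function(i) {
  t(rmvnorm(S, m[, i], C[, , i], method = "svd"))
}, mc.cores = detectCores())
postpred <- array(unlist(draws), dim = c(nrow(m), S, B))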