Fastest way in R to compute the inverse for large matrices - r

I need to compute a hat matrix (as from linear regression). Standard R code would be:
H <- tcrossprod(tcrossprod(X, solve(crossprod(X))), X)
with X being a relatively large matrix (i.e 1e5*100), and this line has to run thousands of times. I understand the most limiting part is the inverse computation, but the crossproducts may be time-consuming too. Is there any faster alternative to perform these matrix operations? I tried Rcpp and reviewed several posts but any alternative I tested was slower. Maybe I did not code properly my C++ code, as I am not an advanced C++ programmer.
Thanks!

Chasing the code for this line by line is a little difficult because the setup of R code is a little on the complicated side. But read on, pointers below.
The important part is that the topic has been discussed many times: what happens is that R dispatches this to the BLAS (Basic Linear Algebra Subprogram) and LAPACK (Linear Algebra PACKage) libraries. Which contain the most efficient code known to man for this. In general, you cannot gain on it by rewriting.
One can gain performance differences by switching one BLAS/LAPACK implementation for another---there are many, many posts on this online too. R itself comes with the so-called 'reference BLAS' known to be correct, but slowest. You can switch to Atlas, OpenBLAS, MKL, ... depending on your operating system; instructions on how to do so are in some of the R manuals that come with your installation.
For completeness, per file src/main/names.c the commands %*%, crossprod and tcrossprod all refer to do_matprod. This is in file src/main/array.c and does much argument checking and arranging and branching on types of arguments but e.g. one path then calls
F77_CALL(dsyrk)(uplo, trans, &nc, &nr, &one, x, &nr, &zero, z, &nc
FCONE FCONE);
which is this LAPACK function. It is essentially the same for all others making this an unlikely venue for your optimisation.

Related

Converting cosine distance function in R to Rcpp

I've been developing an R package for single cell RNA-seq analysis, and one of the functions I used repeatedly calculates the cosine dissimilarity matrix for a given matrix of m cells by n genes. The function I wrote is as follows:
CosineDist <- function(input = NULL) {
if (is.null(input)) { stop("You forgot to provide an input matrix") }
dist_mat <- as.dist(1 - input %*% t(input) / (sqrt(rowSums(input^2) %*% t(rowSums(input^2)))))
return(dist_mat)
}
This code works fine for smaller datasets, but when I run it on anything over 20,000 rows it takes forever and then crashes my R session due to memory issues. I believe that porting this to Rcpp would make it both faster and more memory efficient (I know this is a bit of a naive belief, but my knowledge of C++ in general is limited). Finally, the output of the function, though it does not have to be a distance matrix object when returned, does need to be able to be converted to that format after its generation.
How should I got about converting this function to Rcpp and then calling it as I would any of the other functions in my package? Alternatively, is this the best way to go about solving the speed / memory problem?
Hard to help you, since as the comments pointed out you are basically searching for an Rcpp intro.
I'll try to give you some hints, which I already mentioned partly in the comments.
In general using C/C++ can provide a great speedup (dependent on the task of course). But I've reached for (loop intensive, not optimized code) 100x+ speedups.
Since adding C++ can be complicated and sometimes cause problems, before you go this way check the following:
1. Is your R code optimized?
You can make lot of bad choices here (e.g. loops are slow in R). Just by optimizing your R code speedups of 10x or much more can often be easily reached.
2. Are there better implementations in other packages?
Especially if it is helper functions or common functionalities, often other packages have these already implemented. Benchmark different existing solutions with the 'microbenchmark' package. It is easier to just use an optimized function from another R package then doing everything on your own. (maybe the other package implementations are already in C++). I mostly try to look for mainstream and popular packages (since these are better tested and they are unlikely to suddenly drop from CRAN).
3. Profile your code
Take a look what parts exactly cause the speed / memory problems. Might be that you can keep parts in R and only create a function for the critical parts in C++. Or you find another package that has a R function that is implemented in C for exactly this critical part.
In the end I'd say, I prefer using Rcpp/C++ over C code. Think this is the easier way to go. For the Rcpp learning part you have to go with a dedicated tutorial (and not a SO question).

Lapack Orthonormalization Function for Rectangular Matrix

I was wondering if there was a function in Lapack for orthonormalizing the columns of a very tall and skinny matrix. A similar previous question asked this question, presumably in the context of a square matrix. My setting is as follows: I have an M by N matrix A that I am trying to orthonormalize the columns of.
So, my first thought was to do a qr decomposition. The functions for doing a qr decomposition in Lapack seem to be dgeqrf and dormqr. Great. However, my problem is as follows: my matrix A is so tall, that I don't want to actually compute all of Q, because it is M by M. In fact, I can't afford to instantiate an M by M matrix at all during any of my computation (it would not fit in memory). I would rather compute just the matrix that wikipedia calls Q1. However, I can't seem to find a way to make this work.
The weird thing is, that I think it is possible. Numpy, in particular, has a function numpy.linalg.qr that appears to do just this. However, even after reading their source code, I can't figure out how they are using lapack calls to get this to work.
Do folks have ideas? I would strongly prefer this to only use lapack functions because I am hoping to port this code to CuSOLVE, which has implemented several lapack functions (including dgeqrf and dormqr) for the GPU.
You want the "thin" or "economy size" version of QR. In matlab, you can do this with:
[Q,R] = qr(A,0);
I haven't used Lapack directly, but I would imagine there's a corresponding call there. It appears that you can do this in python with:
numpy.linalg.qr(a, mode='reduced')

Fast, portable, c++ math library for matrix and vector manipulations

I have my own game engine which is written in opengl and c++. I also have my own math library for matrix and vectors manipulations. I always had doubts about the performance of my math library, so I recently decided to search for some popular math library which is used by many game / graphics developers. I was surprised that I couldn't find anything.
People on stackoverflow suggested GLM and Eigen libraries in similar posts, so I made some performance tests. I multiplied 1000000 times two matrices 4x4, and here are results:
GLM: 4.23 seconds
Eigen: 12.57 seconds
My library: 0.25 seconds
I was surprised by these results, because my implementation of multiplying matrices is taken from wikipedia. I checked the code from glm and eigen and I found, that there is a lot of typedefs, assertions and other type checking, unnecessary code, which decrease performance a lot.
So, my question is:
Do you know any FAST math library with nice API for gamedev / graphics purpose? I need functionality like: creating translation, rotation, projection, matrix * matrix, inverse, look at, matrix * vector, quaternions, etc...
I checked the code from glm and eigen and I found, that there is a lot of typedefs, assertions and other type checking, unnecessary code, which decrease performance a lot.
Are you sure that you have done all this benchmarks using higher compiler optimization ON ?
And not for example using Debug settings ?
Another alternative would be also MathFu from Google.
http://google.github.io/mathfu/

LIM Package in R: Faster read of input

For solving linear inverse models in R there's an excellent package called LIM (http://cran.r-project.org/web/packages/LIM/).
The model problem is formulated in text files in a way that is natural and comprehensible. Functions in LIM then converts this input into the required linear equality and inequality conditions, which can be solved either by least squares or by linear programming techniques.
I have a text File with approx. 6000 lines (simple list of equalities, inequalities, components, parameters), which describes the linear inverse model.
I make it available to R for processing by following 2 lines
liminput <- Read(File)
lim <- Setup(liminput)
Problem:
The 2 lines need around 5 minutes to run.
The first line Read command accounts for almost 100% of these 5 minutes.
Question:
Is there a way to make it faster?
I don't think there's going to be a very easy answer to this; you will probably need to find some way to re-write the Read() function for better speed (but see one possibility below). Looking at the Read() function in detail (in case you didn't know, you can print the source code by typing Read), it is essentially reading in lines and parsing them in R code. Most of these operations will probably be hard to vectorize, and moderately difficult to re-write in Rcpp/C++ ...
Noam Ross has written a very accessible guide to speeding up R code (one of the first recommendations is "get a better computer"). There is really only one "low-hanging fruit" suggestion that might work without digging into the code yourself, which is to use R's byte compiler:
library(compiler)
Read.comp <- cmpfun(Read)
Read.comp(File) ## **maybe** faster than Read(File) ...

CVX-esque convex optimization in R?

I need to solve (many times, for lots of data, alongside a bunch of other things) what I think boils down to a second order cone program. It can be succinctly expressed in CVX something like this:
cvx_begin
variable X(2000);
expression MX(2000);
MX = M * X;
minimize( norm(A * X - b) + gamma * norm(MX, 1) )
subject to
X >= 0
MX((1:500) * 4 - 3) == MX((1:500) * 4 - 2)
MX((1:500) * 4 - 1) == MX((1:500) * 4)
cvx_end
The data lengths and equality constraint patterns shown are just arbitrary values from some test data, but the general form will be much the same, with two objective terms -- one minimizing error, the other encouraging sparsity -- and a large number of equality constraints on the elements of a transformed version of the optimization variable (itself constrained to be non-negative).
This seems to work pretty nicely, much better than my previous approach, which fudges the constraints something rotten. The trouble is that everything else around this is happening in R, and it would be quite a nuisance to have to port it over to Matlab. So is doing this in R viable, and if so how?
This really boils down to two separate questions:
1) Are there any good R resources for this? As far as I can tell from the CRAN task page, the SOCP package options are CLSCOP and DWD, which includes an SOCP solver as an adjunct to its classifier. Both have similar but fairly opaque interfaces and are a bit thin on documentation and examples, which brings us to:
2) What's the best way of representing the above problem in the constraint block format used by these packages? The CVX syntax above hides a lot of tedious mucking about with extra variables and such, and I can just see myself spending weeks trying to get this right, so any tips or pointers to nudge me in the right direction would be very welcome...
You might find the R package CVXfromR useful. This lets you pass an optimization problem to CVX from R and returns the solution to R.
OK, so the short answer to this question is: there's really no very satisfactory way to handle this in R. I have ended up doing the relevant parts in Matlab with some awkward fudging between the two systems, and will probably migrate everything to Matlab eventually. (My current approach predates the answer posted by user2439686. In practice my problem would be equally awkward using CVXfromR, but it does look like a useful package in general, so I'm going to accept that answer.)
R resources for this are pretty thin on the ground, but the blog post by Vincent Zoonekynd that he mentioned in the comments is definitely worth reading.
The SOCP solver contained within the R package DWD is ported from the Matlab solver SDPT3 (minus the SDP parts), so the programmatic interface is basically the same. However, at least in my tests, it runs a lot slower and pretty much falls over on problems with a few thousand vars+constraints, whereas SDPT3 solves them in a few seconds. (I haven't done a completely fair comparison on this, because CVX does some nifty transformations on the problem to make it more efficient, while in R I'm using a pretty naive definition, but still.)
Another possible alternative, especially if you're eligible for an academic license, is to use the commercial Mosek solver, which has an R interface package Rmosek. I have yet to try this, but may give it a go at some point.
(As an aside, the other solver bundled with CVX, SeDuMi, fails completely on the same problem; the CVX authors aren't kidding when they suggest trying multiple solvers. Also, in a significant subset of cases, SDTP3 has to switch from Cholesky to LU decomposition, which makes the processing orders of magnitude slower, with only very marginal improvement in the objective compared to the pre-LU steps. I've found it worth reducing the requested precision to avoid this, but YMMV.)
There is a new alternative: CVXR, which comes from the same people.
There is a website, a paper and a github project.
Disciplined Convex Programming seems to be growing in popularity observing cvxpy (Python) and Convex.jl (Julia), again, backed by the same people.

Resources