R bindings for SLEPc / PETSc?

Has anyone had much luck with bindings between R and SLEPc? I'm looking for faster SVD and eigenvalue algorithms in R.
Update: I'm generally interested in both scenarios (speed vs. scale).

I don't follow: SLEPc is for scalable, i.e. very large, problems, yet you ask for faster SVDs.
If you are after speed, I'd consider fast implementations: either something like RcppEigen, or an accelerated BLAS such as OpenBLAS, which can be a drop-in replacement for R's BLAS if R was built the right way.
But if you are really after large problems, maybe the new pbd-r.org project, which wraps ScaLAPACK, can be of help.
So please let us know whether you're after speed or scale.
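As a minimal sketch of the speed route (assuming an accelerated BLAS has been set up separately), this checks which BLAS/LAPACK the R session is linked against and times a dense SVD; the matrix size is arbitrary:

```r
# Sketch: verify which BLAS/LAPACK this R session uses, then time a dense SVD.
# With a tuned BLAS (e.g. OpenBLAS) swapped in, the same call runs much faster.
sessionInfo()   # recent R versions report the linked BLAS/LAPACK here

set.seed(1)
X <- matrix(rnorm(2000 * 500), nrow = 2000)   # arbitrary test size
system.time(s <- svd(X))                      # LAPACK SVD through R's BLAS
```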

Related

When to use Eigen and when to use BLAS

I did some basic reading on Eigen and BLAS. Both libraries support matrix-matrix and matrix-vector multiplication, but I don't understand which one I should use in which case; to me it seems both have almost the same performance. It would be nice if someone could point me to a resource, or just tell me what advantages one library has over the other and how the two differ for matrix and vector manipulation. Thanks in advance.
Use Eigen: it's more complete and much easier to use. Then, if you wonder whether a fully optimized BLAS implementation could give you higher performance, just recompile your code with -DEIGEN_USE_BLAS, link to your favorite BLAS, and see for yourself.
Also, when using Eigen, don't forget to enable compiler optimizations, e.g. -O3, plus the instruction sets your hardware supports, e.g. -mavx -mfma with recent Eigen versions.
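Since this thread started from R, here is a hedged sketch of trying Eigen from R via RcppEigen (both Rcpp and RcppEigen assumed installed); the flags above would go in ~/.R/Makevars rather than in the code itself:

```r
# Sketch: call Eigen from R through RcppEigen. Flags like -O3 -mavx -mfma
# belong in ~/.R/Makevars, not here.
library(Rcpp)

cppFunction(depends = "RcppEigen", '
  Eigen::MatrixXd matmul(const Eigen::MatrixXd& A, const Eigen::MatrixXd& B) {
    return A * B;   // Eigen dispatches its own optimized kernels
  }
')

A <- matrix(rnorm(400 * 400), 400)
B <- matrix(rnorm(400 * 400), 400)
all.equal(matmul(A, B), A %*% B)   # should agree with R's BLAS-backed product
```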
The answer to this question is covered in the Eigen FAQ:
http://eigen.tuxfamily.org/index.php?title=FAQ#How_does_Eigen_compare_to_BLAS.2FLAPACK.3F
More or less, I use Eigen mostly because it has a comfortable interface. If you need speed and multicore parallelism, or have only a little but time-consuming linear algebra in your code, go for GotoBLAS2; it is usually fastest on Intel machines.

Efficient and fast way of solving linear programming problems in R

I am working with a very large dataset, typically dealing with a few million combinations, and I want to solve the assignment problem (maximize the sum).
I have tried solving it on a small test set using adagio::assignment and clue::solve_LSAP.
I wasn't able to install the "lpSolve" package on my system; it threw a segmentation fault.
I'd like to know which of these is faster, or whether there is any other method that does it faster still. Thanks.
An LP formulation is not a good way to solve the assignment problem, whichever library you use. You should use the Hungarian algorithm, and it looks like solve_LSAP does exactly that.
No need to try anything else, IMHO.
EDIT: An efficient implementation of the Hungarian method runs in O(n^3) time, which is extremely fast as optimization algorithms go. If solve_LSAP is not fast enough for your problem (assuming it is implemented correctly), it is very unlikely that any exact method will be, and you will have to use some sort of heuristic to approximate the solution.
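A minimal sketch of the suggested route with clue::solve_LSAP on a random benefit matrix (the 10x10 size is arbitrary); note that solve_LSAP minimizes by default, so maximum = TRUE is needed to maximize the sum:

```r
# Sketch: Hungarian-method assignment via clue::solve_LSAP (package assumed
# installed). solve_LSAP minimizes cost by default; maximum = TRUE maximizes.
library(clue)

set.seed(42)
benefit <- matrix(runif(100), nrow = 10, ncol = 10)
sol <- solve_LSAP(benefit, maximum = TRUE)

sol                                        # column assigned to each row
sum(benefit[cbind(seq_along(sol), sol)])   # the maximized total
```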

Advice about inversion of large sparse matrices

Just got a Windows box set up with two 64-bit Intel Xeon X5680 3.33 GHz processors (6 cores each) and 12 GB of RAM. I've been using SAS on some large data sets, but it's just too slow, so I want to set up R to do parallel processing. I want to be able to carry out matrix operations, e.g., multiplication and inversion. Most of my data are not huge, in the 3-4 GB range, but one file is around 50 GB.
It's been a while since I used R, so I looked around on the web, including the CRAN HPC task view, to see what was available. I think a foreach loop and the bigmemory package will be applicable. I came across this post: Is there a package for parallel matrix inversion in R, which had some interesting suggestions. I was wondering if anyone has experience with the HiPLAR packages. It looks like HiPLARM adds functionality to the Matrix package and HiPLARb adds new functions altogether. Which of these would be recommended for my application?
Furthermore, there is a reference to the PLASMA library. Is this of any help? My matrices have a lot of zeros, so I think they could be considered sparse. I didn't see any examples of how to pass data from R to PLASMA, and the PLASMA docs say it does not support sparse matrices, so I'm thinking I don't need this library. Am I on the right track here? Any suggestions on other approaches?
EDIT: It looks like HiPLAR and the pbdR packages will not be helpful. I'm leaning more toward bigmemory, although it looks like I/O may be a problem: http://files.meetup.com/1781511/bigmemoryRandLinearAlgebra_BryanLewis.pdf. This article talks about a package called vam for virtual associative matrices, but it must be proprietary. Would the ff package be of any help here? My R skills just aren't current enough to know which direction to pursue. I'm pretty sure I can read the data using bigmemory, but I'm not sure the processing will be very fast.
If you want to use HiPLAR (the MAGMA and PLASMA libraries in R), it is only available for Linux at the moment. For this and many other reasons, I suggest switching your OS to the penguin.
That being said, Intel MKL optimization can do wonders for these sorts of operations, and for most practical uses it is the way to go. Python built with MKL optimization, for example, can process large matrices about 20x faster than IDL, which was designed specifically for image processing; R has shown similarly vast improvements when built with MKL. You can also install R Open from Revolution Analytics, which includes MKL optimization, but I am not sure it has quite the same effect as building R yourself using the Intel tools: https://software.intel.com/en-us/articles/build-r-301-with-intel-c-compiler-and-intel-mkl-on-linux
I would definitely consider the type of operations you are looking to perform. GPU processing lends itself to high parallelism (many of the same little computations running at once, as with matrix algebra) but is limited by bus speeds. Intel MKL optimization is similar in that it helps use all of your CPU cores, but it is really tuned to Intel CPU architecture, so it should provide basic memory optimization too; I think that is the simplest route. HiPLAR is certainly the future, as it is CPU-GPU by design, especially with highly parallel heterogeneous architectures making their way into consumer systems, though I think most consumer systems today cannot fully utilize this.
Cheers,
Adam
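Since the matrices are mostly zeros, a hedged sketch of what the Matrix package (which ships with R) already offers before reaching for HiPLAR or MKL: store the data sparsely and solve linear systems rather than forming an explicit inverse. The size and density below are arbitrary:

```r
# Sketch: sparse storage and a sparse solve with the Matrix package.
# Solving A x = b avoids materializing the (generally dense) inverse of A.
library(Matrix)

set.seed(1)
n <- 5000
A <- rsparsematrix(n, n, density = 0.001)   # arbitrary random sparse matrix
A <- crossprod(A) + Diagonal(n)             # make it symmetric positive definite
b <- rnorm(n)

x <- solve(A, b)   # sparse Cholesky (CHOLMOD) under the hood
```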

Using a GPU for optimization in R (OpenCL)?

Based on an idea from another thread, I was hoping you could help me out with this idea / push me in the right direction.
I have seen an example of OpenCL that didn't look too complicated for basic calculations, so I hope to rewrite the numerical-gradient function the optimization routine uses in OpenCL and squeeze it into the optimizer function, so that every time I optimize some function, the independent calculations are done on the GPU.
Idea: use the GPU to calculate functionals and gradients during optimization (e.g. with nlminb()).
Problems:
1. How do I tap into the optimization routine? (I can't seem to locate the C file that does the optimization.)
2. Can I just replace the gradient calculation with what I prepare for the GPU?
3. Has anyone got something similar to work? Any ideas or notes?
Thank you and have a nice day!
PS: If you think it wouldn't speed up the optimization, or that it's hard to code / hard to do, etc., please let me know! I'm a very inexperienced and lousy "programmer".
You could compile R linked against one of the OpenCL-optimized BLAS libraries. But judging from past attempts to speed up R with other BLAS libraries, the gains may be limited to special cases; yours may be one of them, though.
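On problems 1 and 2: there is no C file to patch, because nlminb() (like optim()) accepts a user-supplied gradient function, so a GPU-computed gradient can be dropped in from R. A minimal sketch, with an analytic gradient standing in for the GPU version:

```r
# Sketch: nlminb() takes a 'gradient' argument, so the default numerical
# gradient can be replaced without touching R's C internals. The analytic
# gradient below stands in for one computed on the GPU via OpenCL.
f  <- function(x) sum((x - 1)^2)   # toy objective
gr <- function(x) 2 * (x - 1)      # swap in a GPU-computed gradient here

fit <- nlminb(start = rep(0, 5), objective = f, gradient = gr)
fit$par   # converges to (1, 1, 1, 1, 1)
```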

General sparse iterative solver libraries

What are some of the better libraries for solving large sparse linear systems iteratively (conjugate gradient, MINRES, GMRES, etc.)? I've often coded my own routines, but I'm interested to know which "off-the-shelf" packages people prefer. I've heard of PETSc, TAUCS, IML++, and a few others, and I'm wondering how these stack up and what else is out there. My preference is for ease of use and freely available software.
Victor Eijkhout's Overview of Iterative Linear System Solver Packages would probably be a good place to start.
You may also wish to look at Trilinos (http://trilinos.sandia.gov/). It is designed by some great software craftsmen, using modern design techniques. Moreover, from within Trilinos you can call PETSc if you desire.
NIST has some sparse linear algebra software you can download here: http://math.nist.gov/sparselib++/ and here: http://math.nist.gov/spblas/. I haven't used those packages myself, but I've heard good things about them.
http://www.cise.ufl.edu/research/sparse/umfpack/
UMFPACK is a set of routines for solving unsymmetric sparse linear systems, Ax = b, using the Unsymmetric MultiFrontal method. Written in ANSI/ISO C, with a MATLAB (Version 6.0 and later) interface. Appears as a built-in routine (for lu, backslash, and forward slash) in MATLAB. Includes a MATLAB interface, a C-callable interface, and a Fortran-callable interface. Note that "UMFPACK" is pronounced in two syllables, "Umph Pack", not "You Em Ef Pack".
I'm using it for FEM code.
I would check out Microsoft's Solver Foundation. It's free to cheap even for pretty big problems; the unlimited version is industrial-strength, is based on Gurobi, and of course isn't cheap.
http://code.msdn.microsoft.com/solverfoundation
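For a concrete sense of what the iterative methods named in the question do, here is a minimal textbook conjugate-gradient sketch in plain R (for symmetric positive definite A; the libraries above add preconditioning and far more robust stopping rules):

```r
# Sketch: textbook conjugate gradient for a symmetric positive definite A.
# Purely illustrative -- production solvers add preconditioners and
# numerically careful convergence tests.
cg <- function(A, b, x0 = rep(0, length(b)), tol = 1e-8, maxit = 1000) {
  x  <- x0
  r  <- b - as.vector(A %*% x)    # residual
  p  <- r                         # initial search direction
  rs <- sum(r * r)
  for (i in seq_len(maxit)) {
    Ap    <- as.vector(A %*% p)
    alpha <- rs / sum(p * Ap)     # exact line search along p
    x     <- x + alpha * p
    r     <- r - alpha * Ap
    rs_new <- sum(r * r)
    if (sqrt(rs_new) < tol) break
    p  <- r + (rs_new / rs) * p   # next A-conjugate direction
    rs <- rs_new
  }
  x
}

# Tiny check against a dense solve on a random SPD system:
set.seed(1)
M <- matrix(rnorm(100), 10, 10)
A <- crossprod(M) + diag(10)
b <- rnorm(10)
max(abs(cg(A, b) - solve(A, b)))   # ~1e-9 or smaller
```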
