BBoptim equivalent or alternative in Scala

I'm planning on converting some R code to Scala and came across a function called BBoptim. It seems to be a wrapper over spg, but having little background in mathematics, I don't know what the equivalent code would be in Scala.
For example, is it possible to convert the code below to Scala? Or is there an alternative? I suspect the SpectralProjectedGradient or NonLinearMinimizer in the Breeze library could be used.
# Use a preset seed so test values are reproducible.
require("setRNG")
old.seed <- setRNG(list(kind="Mersenne-Twister", normal.kind="Inversion",
                        seed=1234))
rosbkext <- function(x) {
  # Extended Rosenbrock function
  n <- length(x)
  j <- 2 * (1:(n/2))
  jm1 <- j - 1
  sum(100 * (x[j] - x[jm1]^2)^2 + (1 - x[jm1])^2)
}
p0 <- rnorm(500)
BBoptim(par=p0, fn=rosbkext)
Thanks in advance.
Edit:
I'm restricted to using only the JVM, so calling R from Scala is not possible.

As you mentioned, this R function is a wrapper for spg, which implements the Spectral Projected Gradient algorithm.
The SPG implementation by the TANGO project for nonlinear gradient optimization is written in Fortran 77 with interfaces to a number of languages, including Java. All Java libraries work with Scala and thus this should be a suitable solution. This implementation is the focus of several academic articles on Spectral Projected Gradients.
You might also check out the Java API for the Open Optimization Library.
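Since the question mentions Breeze: Breeze does ship JVM-native minimizers that can handle this kind of unconstrained problem. As a rough sketch (not BBoptim, and not SPG: this uses Breeze's LBFGS with a finite-difference gradient, since the SPG-style classes under breeze.optimize.proximal have changed their constructors between versions), something like the following should be close. The class names below are based on my reading of Breeze 0.13/1.x and are worth checking against the version you use.

import breeze.linalg.DenseVector
import breeze.optimize.{ApproximateGradientFunction, LBFGS}

object RosbkextBreeze extends App {
  // Extended Rosenbrock, same definition as the R function in the question.
  def rosbkext(x: DenseVector[Double]): Double =
    (0 until x.length by 2).map { i =>
      100.0 * math.pow(x(i + 1) - x(i) * x(i), 2) + math.pow(1.0 - x(i), 2)
    }.sum

  // Finite-difference gradient, so we do not have to derive the gradient by hand.
  val f = new ApproximateGradientFunction[Int, DenseVector[Double]](rosbkext)

  // Analogue of p0 <- rnorm(500); BBoptim(par = p0, fn = rosbkext)
  val p0 = DenseVector(Array.fill(500)(scala.util.Random.nextGaussian))
  val lbfgs = new LBFGS[DenseVector[Double]](maxIter = 1000, m = 7)
  val solution = lbfgs.minimize(f, p0)
  println(s"minimum found: ${rosbkext(solution)}")
}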

Your safest bet if you're limited to the JVM, though not the most idiomatic Scala, is to pick a suitable optimization routine from Apache Commons Math. The exact one you're using isn't there, but the general solvers work pretty well on most classes of problems I've thrown at them. You might have to try several different approaches to get a decent box constraint: add a sharp penalty near where your walls are, then restrict the search space to those surfaces if you're near a wall.
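To make that concrete, here is a rough Scala sketch against Apache Commons Math 3.x using BOBYQA, a derivative-free optimizer that accepts box constraints directly. Treat it as a sketch, not a drop-in BBoptim replacement: the dimension is kept small because BOBYQA does not scale well to the 500-variable problem in the question, and the bounds, starting point and evaluation budget below are arbitrary choices for illustration.

import org.apache.commons.math3.analysis.MultivariateFunction
import org.apache.commons.math3.optim.{InitialGuess, MaxEval, SimpleBounds}
import org.apache.commons.math3.optim.nonlinear.scalar.{GoalType, ObjectiveFunction}
import org.apache.commons.math3.optim.nonlinear.scalar.noderiv.BOBYQAOptimizer

object RosbkextCommonsMath extends App {
  val n = 10 // keep the demo small; the question's problem has 500 variables

  // Extended Rosenbrock, same definition as the R function in the question.
  val rosbkext = new MultivariateFunction {
    override def value(x: Array[Double]): Double =
      (0 until n by 2).map { i =>
        100.0 * math.pow(x(i + 1) - x(i) * x(i), 2) + math.pow(1.0 - x(i), 2)
      }.sum
  }

  // 2n + 1 interpolation points is the value recommended in the BOBYQA documentation.
  val optimizer = new BOBYQAOptimizer(2 * n + 1)
  val result = optimizer.optimize(
    new MaxEval(200000),
    new ObjectiveFunction(rosbkext),
    GoalType.MINIMIZE,
    new InitialGuess(Array.fill(n)(-1.2)),                       // classic Rosenbrock start
    new SimpleBounds(Array.fill(n)(-10.0), Array.fill(n)(10.0))  // box constraints
  )
  println(s"minimum found: ${result.getValue}")
}

If you don't need bounds at all, the Nelder-Mead SimplexOptimizer or the gradient-based optimizers in the same package are also worth a look.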
Otherwise, it's fairly easy to call R from Scala using rScala. That way you don't have to change the algorithm at all. (But you do have to have R installed.)

Related

Converting cosine distance function in R to Rcpp

I've been developing an R package for single cell RNA-seq analysis, and one of the functions I used repeatedly calculates the cosine dissimilarity matrix for a given matrix of m cells by n genes. The function I wrote is as follows:
CosineDist <- function(input = NULL) {
  if (is.null(input)) { stop("You forgot to provide an input matrix") }
  dist_mat <- as.dist(1 - input %*% t(input) / (sqrt(rowSums(input^2) %*% t(rowSums(input^2)))))
  return(dist_mat)
}
This code works fine for smaller datasets, but when I run it on anything over 20,000 rows it takes forever and then crashes my R session due to memory issues. I believe that porting this to Rcpp would make it both faster and more memory efficient (I know this is a bit of a naive belief, but my knowledge of C++ in general is limited). Finally, the output does not have to be a distance matrix object when returned, but it does need to be convertible to that format afterwards.
How should I go about converting this function to Rcpp and then calling it as I would any of the other functions in my package? Alternatively, is this the best way to go about solving the speed / memory problem?
It's hard to help you here since, as the comments pointed out, you are basically asking for an Rcpp introduction.
I'll try to give you some hints, which I already mentioned partly in the comments.
In general, using C/C++ can provide a great speedup (depending on the task, of course). For loop-intensive, unoptimized code I have seen speedups of 100x or more.
Since adding C++ can be complicated and sometimes causes problems, check the following before you go that way:
1. Is your R code optimized?
You can make a lot of bad choices here (e.g. loops are slow in R). Just by optimizing your R code, speedups of 10x or much more can often be reached easily.
2. Are there better implementations in other packages?
Especially for helper functions or common functionality, other packages often have these implemented already. Benchmark different existing solutions with the 'microbenchmark' package. It is easier to use an optimized function from another R package than to do everything on your own (and the other package's implementation may already be in C++). I mostly look for mainstream, popular packages, since these are better tested and unlikely to suddenly drop off CRAN.
3. Profile your code
Take a look at exactly which parts cause the speed / memory problems. You might be able to keep most of the code in R and only write a C++ function for the critical parts. Or you may find another package with an R function implemented in C for exactly that critical part.
In the end I'd say I prefer Rcpp/C++ over plain C code; I think it is the easier way to go. For learning Rcpp you will have to go through a dedicated tutorial (and not an SO question).

Fast, portable C++ math library for matrix and vector manipulations

I have my own game engine, which is written in C++ with OpenGL. I also have my own math library for matrix and vector manipulations. I always had doubts about the performance of my math library, so I recently decided to search for a popular math library used by many game / graphics developers. I was surprised that I couldn't find anything.
People on stackoverflow suggested the GLM and Eigen libraries in similar posts, so I ran some performance tests. I multiplied two 4x4 matrices 1,000,000 times, and here are the results:
GLM: 4.23 seconds
Eigen: 12.57 seconds
My library: 0.25 seconds
I was surprised by these results, because my implementation of matrix multiplication is taken from Wikipedia. I checked the code from GLM and Eigen and found a lot of typedefs, assertions and other type checking and seemingly unnecessary code, which I assume decreases performance a lot.
So, my question is:
Do you know any FAST math library with nice API for gamedev / graphics purpose? I need functionality like: creating translation, rotation, projection, matrix * matrix, inverse, look at, matrix * vector, quaternions, etc...
"I checked the code from GLM and Eigen and found a lot of typedefs, assertions and other type checking and seemingly unnecessary code, which I assume decreases performance a lot."
Are you sure that you ran all of these benchmarks with compiler optimizations turned on, and not, for example, with Debug settings?
Another alternative would be MathFu from Google:
http://google.github.io/mathfu/

CVX-esque convex optimization in R?

I need to solve (many times, for lots of data, alongside a bunch of other things) what I think boils down to a second-order cone program. It can be expressed succinctly in CVX, something like this:
cvx_begin
    variable X(2000);
    expression MX(2000);
    MX = M * X;
    minimize( norm(A * X - b) + gamma * norm(MX, 1) )
    subject to
        X >= 0
        MX((1:500) * 4 - 3) == MX((1:500) * 4 - 2)
        MX((1:500) * 4 - 1) == MX((1:500) * 4)
cvx_end
The data lengths and equality constraint patterns shown are just arbitrary values from some test data, but the general form will be much the same, with two objective terms -- one minimizing error, the other encouraging sparsity -- and a large number of equality constraints on the elements of a transformed version of the optimization variable (itself constrained to be non-negative).
This seems to work pretty nicely, much better than my previous approach, which fudges the constraints something rotten. The trouble is that everything else around this is happening in R, and it would be quite a nuisance to have to port it over to Matlab. So is doing this in R viable, and if so how?
This really boils down to two separate questions:
1) Are there any good R resources for this? As far as I can tell from the CRAN task view, the SOCP package options are CLSOCP and DWD, which includes an SOCP solver as an adjunct to its classifier. Both have similar but fairly opaque interfaces and are a bit thin on documentation and examples, which brings us to:
2) What's the best way of representing the above problem in the constraint block format used by these packages? The CVX syntax above hides a lot of tedious mucking about with extra variables and such, and I can just see myself spending weeks trying to get this right, so any tips or pointers to nudge me in the right direction would be very welcome...
You might find the R package CVXfromR useful. This lets you pass an optimization problem to CVX from R and returns the solution to R.
OK, so the short answer to this question is: there's really no very satisfactory way to handle this in R. I have ended up doing the relevant parts in Matlab with some awkward fudging between the two systems, and will probably migrate everything to Matlab eventually. (My current approach predates the answer posted by user2439686. In practice my problem would be equally awkward using CVXfromR, but it does look like a useful package in general, so I'm going to accept that answer.)
R resources for this are pretty thin on the ground, but the blog post by Vincent Zoonekynd that he mentioned in the comments is definitely worth reading.
The SOCP solver contained within the R package DWD is ported from the Matlab solver SDPT3 (minus the SDP parts), so the programmatic interface is basically the same. However, at least in my tests, it runs a lot slower and pretty much falls over on problems with a few thousand vars+constraints, whereas SDPT3 solves them in a few seconds. (I haven't done a completely fair comparison on this, because CVX does some nifty transformations on the problem to make it more efficient, while in R I'm using a pretty naive definition, but still.)
Another possible alternative, especially if you're eligible for an academic license, is to use the commercial Mosek solver, which has an R interface package Rmosek. I have yet to try this, but may give it a go at some point.
(As an aside, the other solver bundled with CVX, SeDuMi, fails completely on the same problem; the CVX authors aren't kidding when they suggest trying multiple solvers. Also, in a significant subset of cases, SDPT3 has to switch from Cholesky to LU decomposition, which makes the processing orders of magnitude slower, with only very marginal improvement in the objective compared to the pre-LU steps. I've found it worth reducing the requested precision to avoid this, but YMMV.)
There is a new alternative: CVXR, which comes from the same people.
There is a website, a paper and a GitHub project.
Disciplined Convex Programming seems to be growing in popularity, judging by cvxpy (Python) and Convex.jl (Julia), again backed by the same people.

Changing the LAPACK implementation used by IDL linear algebra routines?

Over at http://scicomp.stackexchange.com I asked this question about parallel matrix algorithms in IDL. The answers suggest using a multi-threaded LAPACK implementation and suggest some hacks to get IDL to use a specific LAPACK library. I haven't been able to get this to work.
I would ideally like the existing LAPACK DLM to simply be able to use a multi-threaded LAPACK library and it feels like this should be possible but I have not had any success. Alternatively I guess the next simplest step would be to create a new DLM to wrap a matrix inversion call in some C code and ensure this DLM points to the desired implementation. The documentation for creating DLMs is making me cross-eyed though, so any pointers to doing this (if it is required) would also be appreciated.
What platform are you targeting?
Looking at idl_lapack.so with nm on my platform (Mac OS X, IDL 8.2.1) seems to indicate that the LAPACK routines are directly in the .so, so my (albeit limited) understanding is that it would not be simple to swap them out (i.e., by setting LD_LIBRARY_PATH).
$ nm idl_lapack.so
...
000000000023d5bb t _dgemm_
000000000023dfcb t _dgemv_
000000000009d9be t _dgeqp3_
000000000009e204 t _dgeqr2_
000000000009e41d t _dgeqrf_
000000000023e714 t _dger_
000000000009e9ad t _dgerfs_
000000000009f4ba t _dgerq2_
000000000009f6e1 t _dgerqf_
Some other possibilities...
My personal library has a directory src/dist_tools/bindings containing routines for automatically creating bindings for a library, given "simple" (i.e., not using typedefs) function prototypes. LAPACK would be fairly easy to create bindings for (the hardest part would probably be building the package you want to use: ATLAS, PLAPACK, ScaLAPACK, etc.). The library is free to use; a small consulting contract could be arranged if you would like it done for you.
The next version of GPULib will contain a GPU implementation of LAPACK, using the MAGMA library. This is effectively a highly parallel option, but only works on CUDA graphics cards. It would also work best if other operations besides the matrix inversion could be done on the GPU to minimize memory transfer. This option costs money.

Converting models in Matlab/R to C++/Java

I would like to convert an ARIMA model developed in R using the forecast library to Java code. Note that I need to implement only the forecasting part. The fitting can be done in R itself. I am going to look at the predict function and translate it to Java code. I was just wondering if anyone else had been in a similar situation before and managed to successfully use a Java library for the same.
Along similar lines, and perhaps this is a more general question without a concrete answer: what is the best way to deal with situations where model building can be done in Matlab/R but the prediction/forecasting needs to be done in Java/C++? Increasingly, I have been encountering this situation over and over again. I guess you have to bite the bullet and write the code yourself, and this is generally not as hard as writing the fitting/estimation yourself. Any advice on the topic would be helpful.
You write about 'R or Matlab' to 'C++ or Java'. This gives 2 x 2 choices which is too many degrees of freedom for my taste. So allow me to concentrate on C++ as the target.
Let's consider a simpler case: Prototyping in R, and deploying in C++. If and when the R package you use is actually implemented in C or C++, this becomes pretty easy. You "merely" need to disentangle the routine you are after from its other dependencies (header files, defines, data structures, ...) and provide it with the data and parameters needed. I have done that in the past for production systems.
Here, you talk about the forecast package. This happens to depend on the RcppArmadillo package which itself brings the nice Armadillo C++ library to R. So chances are you can in fact re-write this as a self-contained unit.
Armadillo is also interesting when you want to port Matlab to C++ as it is written to help with exactly that task in mind. I have ported some relatively extensive Matlab code to C++ and reaped a substantial speed gain.
I'm not sure whether this is possible in R, but in Matlab you can interact with your Matlab code from Java - see http://www.cs.virginia.edu/~whitehouse/matlab/JavaMatlab.html. This would enable you to leave all the forecasting code in Matlab and have e.g. an interface written in Java.
Alternatively, you might want to have the predictive code written in Java so that you can produce a model and then distribute a program that uses it without a dependency on Matlab. The Matlab compiler may be useful here, but I've never used it.
A final simple way of interacting messily between Matlab and Java would be (on linux) using pseudoterminals where you would have a pty/tty pair to interface Java and Matlab. In this case you would send data from Java to Matlab, and have Matlab return the forecasting results. I expect this would also work in R, but I don't know the syntax.
In general though, reimplementing the code is a decent solution and probably quicker than learning how to interface java+matlab or create Matlab libraries.
Some further information on the answer given by Richante: Matlab has some really nice capabilities for interop with compiled languages such as C/C++, C#, and Java. In your particular case you might find the toolbox Matlab Builder JA to be particularly relevant. It allows you to export your Matlab code directly to Java, meaning you can directly call code that you've constructed during your model-building phase in Matlab from Java.
More information from the Mathworks here.
I am also concerned with converting "R to Java" so will speak to that part.
As Vincent Zoonekynd said in his comment, the PMML library in R makes sense for model export in general, but "forecast" is not a supported library as of yet.
An alternative is to use something like https://www.opencpu.org/ to make a call to R from your Java program. It exposes the R code over an HTTP server. You can then call it with parameters, as with a normal HTTP call, and read back what is needed using java.net.HttpUrlConnection or one of the HTTP libraries available in Java.
Pros: Separation of concerns, no need to re-write the R code
Cons: You are invoking an R server from your live process, so you need to make sure that is handled robustly.
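For reference, a minimal sketch of that approach from the JVM side (written in Scala here, but the same calls work from Java). The host and port are placeholders for wherever your OpenCPU server runs, and stats::rnorm stands in for whatever package/function you actually expose; OpenCPU maps the /ocpu/library/{package}/R/{function}/json URL pattern to an R function call and returns the result as JSON.

import java.io.{BufferedReader, InputStreamReader, OutputStreamWriter}
import java.net.{HttpURLConnection, URL}

object OpenCpuCall extends App {
  // POST .../R/{function}/json runs the R function and returns its result as JSON.
  val url = new URL("http://localhost:5656/ocpu/library/stats/R/rnorm/json")
  val conn = url.openConnection().asInstanceOf[HttpURLConnection]
  conn.setRequestMethod("POST")
  conn.setDoOutput(true)
  conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded")

  // Function arguments are sent as ordinary form fields (here: n=5).
  val writer = new OutputStreamWriter(conn.getOutputStream)
  writer.write("n=5")
  writer.close()

  val reader = new BufferedReader(new InputStreamReader(conn.getInputStream))
  val body = Iterator.continually(reader.readLine()).takeWhile(_ != null).mkString("\n")
  reader.close()

  println(body) // e.g. a JSON array of 5 random draws
}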
