I have been working on an extension to R that will do some clustering. The project uses C++ and Rcpp (calculations are performed using RcppArmadillo). As a result I have a few classes I need to test. It was suggested that I use googletest. Unfortunately, I have not managed to run any test code.
The problem is that in order to test classes that use Rcpp with the googletest framework, I have to work outside of the R environment.
Specifically, I do not convert the data into standard C++ data structures such as std::vector, because the dataset is expected to be enormous. I receive a NumericMatrix with the data and pass it down, so all my C++ classes end up including Rcpp.h (or Armadillo). I wonder whether I can use these classes outside of R.
I was looking for information on standalone programs that use Rcpp as a library, but all I find is 'standalone' code as opposed to C++ code compiled directly from the R command line interface by the inline package. I would prefer to work with googletest because I can test the C++ directly.
The question is whether one can use Rcpp without R?
In a strict sense, you can't because Rcpp code is meant to be called from R.
In a wider sense, of course you can, provided you write your interfaces correctly. Write C++ code that does not depend on R and Rcpp headers, using just C++ and STL and Armadillo and maybe googletest idioms. I.e., do not use Rcpp types such as Rcpp::NumericMatrix but use Armadillo types such as arma::mat. Test the living daylights out of them. Maybe wrap them up in a library.
Then just write a thin access layer using Rcpp and RcppArmadillo. Et voila -- you have tested code, accessed in R.
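To make the split concrete, here is a minimal sketch of what I mean. It is only an illustration: the function name nearest_centroid and the wrapper name nearest_centroid_r are made up, and I use raw pointers into column-major storage (which is how R lays out a matrix) instead of arma::mat so the snippet compiles with nothing but the standard library. The core takes pointers, so nothing gets copied even if the data is huge. The Rcpp wrapper is shown only as a comment because it needs R to build.

```cpp
// Core code: no R or Rcpp headers anywhere in this part.
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical core routine for illustration. 'data' is a column-major
// n x d matrix (one observation per row), 'centroids' a column-major
// k x d matrix. Returns, for each observation, the index of the nearest
// centroid by squared Euclidean distance.
std::vector<std::size_t> nearest_centroid(const double* data, std::size_t n,
                                          const double* centroids, std::size_t k,
                                          std::size_t d) {
    std::vector<std::size_t> label(n, 0);
    for (std::size_t i = 0; i < n; ++i) {
        double best = std::numeric_limits<double>::infinity();
        for (std::size_t c = 0; c < k; ++c) {
            double dist = 0.0;
            for (std::size_t j = 0; j < d; ++j) {
                // Element (row, col) of a column-major matrix is at row + col * nrow.
                const double diff = data[i + j * n] - centroids[c + j * k];
                dist += diff * diff;
            }
            if (dist < best) {
                best = dist;
                label[i] = c;
            }
        }
    }
    return label;
}

// The thin Rcpp layer would live in a separate file that is only compiled
// as part of the R package (assumed shape, not compiled here):
//
//   // [[Rcpp::export]]
//   Rcpp::IntegerVector nearest_centroid_r(Rcpp::NumericMatrix x,
//                                          Rcpp::NumericMatrix centers) {
//       std::vector<std::size_t> lab = nearest_centroid(
//           x.begin(), x.nrow(), centers.begin(), centers.nrow(), x.ncol());
//       return Rcpp::IntegerVector(lab.begin(), lab.end());
//   }
```

A googletest case (or a plain assert) can then call nearest_centroid on a small hand-built array without R being installed at all.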
I am new to Fortran and am trying to learn how to do simple plots. I already have a program that creates a file of the values that I'm looking to test out in a simple plotting exercise, but every example I've seen so far uses gnuplot. As the computer I'm using is not a personal computer, installing or downloading gnuplot is not really the easiest option at first glance.
Would it be correct to assume that without gnuplot, plotting using Fortran 90 is very difficult?
Fortran is a general-purpose programming language. It is designed to work on any type of computer, even those without any screen or operating system (with some newer facilities to interact with an OS if one is present).
All such languages, like Fortran, C or C++, cannot directly do any graphical output or plotting. They require external libraries which are written in a system-specific way to interact with the graphical interface. There are such libraries available for Fortran, but using them is not trivial. It is much (MUCH!) harder than installing gnuplot, if you already know how to use gnuplot.
I will not recommend any such libraries as it is off-topic here.
You can use gtk-fortran. It is a GTK / Fortran binding and it offers also an interface to PLplot:
https://github.com/vmagnin/gtk-fortran/wiki
But you need a Fortran 2003 compliant compiler (which is the case for all recent compilers).
Plotting with Fortran is generally not easy because you need to install such libraries and learn how they work.
When is RNGScope needed when using Rcpp? In general, you need it when using the C RNG functions. In some places you can read that it is not needed when using Rcpp; examples in the Rcpp documentation use it, and some examples by Dirk Eddelbuettel use it, while others (like this one or this one) do not. So in the end I'm confused...
When exactly is it needed and when not? Does it make a difference if I use Rcpp::runif(), R::runif(), or R::unif_rand()? I'm interested mostly in using Rcpp within an R package rather than calling standalone code.
In short:
whenever you call the RNGs of the C API of R, you need to save and later re-set state
RNGScope automates this for you as it is so darn useful (and generally inexpensive)
whenever you use Rcpp Attributes, it inserts RNGScope (as you can see when you turn on verbose=TRUE)
So in general you don't need to do anything manually -- unless you go really old school and write all code directly, foregoing the glue provided by Rcpp.
And if you use Rcpp, it doesn't matter whether you use the scalar interfaces from the R:: namespace or the vectorized Rcpp Sugar ones via Rcpp::. RNGScope will be there for you.
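For anyone wondering what "save and later re-set state" looks like mechanically, here is a rough sketch of the pattern. This is not the Rcpp source: fake_GetRNGstate and fake_PutRNGstate are stand-ins I made up for R's GetRNGstate() and PutRNGstate() so that the snippet compiles without R, and RngScopeSketch is a hypothetical name. The idea is a scope guard whose constructor reads the RNG state and whose destructor writes it back, so you cannot forget the second call even on an early return.

```cpp
#include <cassert>

// Stand-ins for R's C API calls GetRNGstate() / PutRNGstate(). Here they
// only count how often they are called, so the example needs no R headers.
static int g_get_calls = 0;
static int g_put_calls = 0;
inline void fake_GetRNGstate() { ++g_get_calls; }
inline void fake_PutRNGstate() { ++g_put_calls; }

// Hypothetical guard illustrating the idea behind Rcpp::RNGScope:
// acquire the state on construction, hand it back on destruction.
class RngScopeSketch {
public:
    RngScopeSketch() { fake_GetRNGstate(); }
    ~RngScopeSketch() { fake_PutRNGstate(); }
    RngScopeSketch(const RngScopeSketch&) = delete;
    RngScopeSketch& operator=(const RngScopeSketch&) = delete;
};

// Hypothetical function that would draw random numbers through R's C API.
double draw_something() {
    RngScopeSketch scope;  // RNG state is read here
    double x = 0.5;        // placeholder for a real draw such as unif_rand()
    return x;              // state is written back when 'scope' is destroyed
}
```

With Rcpp Attributes the generated wrapper already declares the real guard for you, which is why hand-written code of this kind is rarely needed.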
I often use Rcpp to incorporate C++ code into R. Through the BH package I am also able to use the Boost library. However, the Boost library lacks a function that I would like to use (to be precise, it only has the Bessel function, but I would like to get the log-Bessel directly because of overflow). I know that Alglib does have this feature.
Would it be possible to use Alglib with Rcpp, that is, use the log-bessel function from Alglib somehow?
I do not see a clear difference in functionality between the
AlgLib documentation on Bessel functions, and
Boost documentation on Bessel functions.
As such, I think you can just use the BH package giving you all of Boost Math and then some.
Last but not least, there is a package Bessel on CRAN, written by an R Core member who focuses on special functions, so you could start from there too.
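If overflow is the real concern, one option is to stay in log space yourself. The sketch below is neither Boost's nor AlgLib's implementation: it is a plain power-series evaluation of the modified Bessel function of the first kind, I_nu(x) = sum_k (x/2)^(2k+nu) / (k! Gamma(k+nu+1)), where every term is computed as a logarithm with std::lgamma and the terms are combined with a running log-sum-exp. The name log_bessel_i, the restriction to x > 0 and nu >= 0, and the stopping rule are my own assumptions; the number of terms grows roughly with x, so treat it as a starting point rather than production code.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <stdexcept>

// Hypothetical helper: log(I_nu(x)) for x > 0 and nu >= 0, evaluated from
// the power series with all terms kept on the log scale.
double log_bessel_i(double nu, double x) {
    if (!(x > 0.0) || nu < 0.0)
        throw std::domain_error("log_bessel_i: requires x > 0 and nu >= 0");

    const double log_half_x = std::log(0.5 * x);
    double running_max = -std::numeric_limits<double>::infinity();
    double scaled_sum = 0.0;  // sum of exp(term - running_max) seen so far

    for (int k = 0; k < 1000000; ++k) {
        const double kk = static_cast<double>(k);
        // log of the k-th series term
        const double term = (2.0 * kk + nu) * log_half_x
                            - std::lgamma(kk + 1.0) - std::lgamma(kk + nu + 1.0);
        if (term > running_max) {
            // New maximum: rescale what has been accumulated so far.
            scaled_sum = scaled_sum * std::exp(running_max - term) + 1.0;
            running_max = term;
        } else {
            scaled_sum += std::exp(term - running_max);
        }
        // Terms shrink once k is past about x/2; stop when they are negligible.
        if (kk > 0.5 * x && term - running_max < -40.0) break;
    }
    return running_max + std::log(scaled_sum);
}
```

For x = 1000 the value itself, I_0(1000), is far beyond the range of a double, while its logarithm is an ordinary number close to x - 0.5 * log(2 * pi * x).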
Over at http://scicomp.stackexchange.com I asked this question about parallel matrix algorithms in IDL. The answers suggest using a multi-threaded LAPACK implementation and suggest some hacks to get IDL to use a specific LAPACK library. I haven't been able to get this to work.
I would ideally like the existing LAPACK DLM to simply be able to use a multi-threaded LAPACK library and it feels like this should be possible but I have not had any success. Alternatively I guess the next simplest step would be to create a new DLM to wrap a matrix inversion call in some C code and ensure this DLM points to the desired implementation. The documentation for creating DLMs is making me cross-eyed though, so any pointers to doing this (if it is required) would also be appreciated.
What platform are you targeting?
Looking at idl_lapack.so with nm on my platform (Mac OS X, IDL 8.2.1) seems to indicate that the LAPACK routines are directly in the .so, so my (albeit limited) understanding is that it would not be simple to swap out (i.e., by setting LD_LIBRARY_PATH).
$ nm idl_lapack.so
...
000000000023d5bb t _dgemm_
000000000023dfcb t _dgemv_
000000000009d9be t _dgeqp3_
000000000009e204 t _dgeqr2_
000000000009e41d t _dgeqrf_
000000000023e714 t _dger_
000000000009e9ad t _dgerfs_
000000000009f4ba t _dgerq2_
000000000009f6e1 t _dgerqf_
Some other possibilities...
My personal library has a directory src/dist_tools/bindings containing routines for automatically creating bindings for a library given "simple" (i.e., not using typedefs) function prototypes. LAPACK would be fairly easy to create bindings for (the hardest part would probably be building the package you want to use: ATLAS, PLAPACK, ScaLAPACK, etc.). The library is free to use; a small consulting contract could be arranged if you would like it done for you.
The next version of GPULib will contain a GPU implementation of LAPACK, using the MAGMA library. This is effectively a highly parallel option, but only works on CUDA graphics cards. It would also work best if other operations besides the matrix inversion could be done on the GPU to minimize memory transfer. This option costs money.
I would like to convert an ARIMA model developed in R using the forecast library to Java code. Note that I need to implement only the forecasting part. The fitting can be done in R itself. I am going to look at the predict function and translate it to Java code. I was just wondering if anyone else had been in a similar situation before and managed to successfully use a Java library for the same.
Along similar lines, and perhaps this is a more general question without a concrete answer: what is the best way to deal with situations where model building can be done in Matlab/R but the prediction/forecasting needs to be done in Java/C++? I have been encountering this situation more and more often. I guess you have to bite the bullet and write the code yourself, and this is generally not as hard as writing the fitting/estimation yourself. Any advice on the topic would be helpful.
You write about 'R or Matlab' to 'C++ or Java'. This gives 2 x 2 choices which is too many degrees of freedom for my taste. So allow me to concentrate on C++ as the target.
Let's consider a simpler case: Prototyping in R, and deploying in C++. If and when the R package you use is actually implemented in C or C++, this becomes pretty easy. You "merely" need to disentangle the routine you are after from its other dependencies (header files, defines, data structures, ...) and provide it with the data and parameters needed. I have done that in the past for production systems.
Here, you talk about the forecast package. This happens to depend on the RcppArmadillo package which itself brings the nice Armadillo C++ library to R. So chances are you can in fact re-write this as a self-contained unit.
Armadillo is also interesting when you want to port Matlab to C++, as it was written with exactly that task in mind. I have ported some relatively extensive Matlab code to C++ and reaped a substantial speed gain.
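To give a feel for how small the "forecast only" part can be once the fit has been done in R, here is a sketch for the simplest case. It is a deliberately reduced illustration, not a port of the forecast package: it handles only a stationary AR(p) model with mean mu, with no differencing, MA terms or prediction intervals, and the function name ar_forecast and its argument layout are my own. The coefficients phi and the mean mu are assumed to have been exported from the model fitted in R.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: h-step-ahead point forecasts for the AR(p) model
//   y_t - mu = phi[0] * (y_{t-1} - mu) + ... + phi[p-1] * (y_{t-p} - mu) + e_t
// 'history' holds the observed series, oldest value first.
std::vector<double> ar_forecast(const std::vector<double>& history,
                                const std::vector<double>& phi,
                                double mu, std::size_t h) {
    const std::size_t p = phi.size();
    if (history.size() < p)
        throw std::invalid_argument("ar_forecast: need at least p observations");

    // Centred copy of the last p observations; forecasts are appended to it
    // so that later steps can build on earlier ones.
    std::vector<double> z;
    z.reserve(p + h);
    for (std::size_t i = history.size() - p; i < history.size(); ++i)
        z.push_back(history[i] - mu);

    std::vector<double> out;
    out.reserve(h);
    for (std::size_t step = 0; step < h; ++step) {
        double next = 0.0;
        for (std::size_t i = 0; i < p; ++i)
            next += phi[i] * z[z.size() - 1 - i];  // phi[0] pairs with the newest value
        z.push_back(next);
        out.push_back(next + mu);
    }
    return out;
}
```

For example, with phi = {0.5}, mu = 10 and a last observation of 14, the first three forecasts are 12, 11 and 10.5, decaying towards the mean as expected for an AR(1).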
I'm not sure whether this is possible in R, but in Matlab you can interact with your Matlab code from Java - see http://www.cs.virginia.edu/~whitehouse/matlab/JavaMatlab.html. This would enable you to leave all the forecasting code in Matlab and have e.g. an interface written in Java.
Alternatively, you might want to have the predictive code written in Java so that you can produce a model and then distribute a program that uses the model without having a dependency on Matlab. The Matlab compiler may be useful here, but I've never used it.
A final simple, if messy, way of interacting between Matlab and Java would be (on Linux) to use pseudoterminals, where you would have a pty/tty pair to interface Java and Matlab. In this case you would send data from Java to Matlab, and have Matlab return the forecasting results. I expect this would also work in R, but I don't know the syntax.
In general though, reimplementing the code is a decent solution and probably quicker than learning how to interface Java and Matlab or create Matlab libraries.
Some further information on the answer given by Richante: Matlab has some really nice capabilities for interop with compiled languages such as C/C++, C#, and Java. In your particular case you might find the toolbox Matlab Builder JA to be particularly relevant. It allows you to export your Matlab code directly to Java, meaning you can directly call code that you've constructed during your model-building phase in Matlab from Java.
More information from the Mathworks here.
I am also concerned with converting "R to Java" so will speak to that part.
As Vincent Zooneykind said in his comment - the PMML library in R makes sense for model export in general but "forecast" is not a supported library as of yet.
An alternative is to use something like https://www.opencpu.org/ to make a call to R from your Java program. It exposes the R code on an HTTP server. You can then call it with parameters as with a normal HTTP call and return what is needed using java.net.HttpURLConnection or any of the HTTP libraries available in Java.
Pros: Separation of concerns, no need to re-write the R code
Cons: You are invoking an R server in your live process, so you need to make sure that is handled robustly