I'm looking for references that explain the pros and cons of using Rcpp compared to using rdyncall.
Can someone who has used both explain the basic differences from the perspective of an R package developer who is interested in providing R wrappers to C++ code?
Thanks.
I think we mention rdyncall in the (brief) comparison to other approaches in the intro vignette / JSS paper. It is a neat package, but it aims for a much lower-level connection. As I understand it, it gives you C-level APIs with the least amount of fuss, as motivated by, say, the rgl package. There is a very good and detailed paper about rdyncall in a recent R Journal issue.
And unless I am missing something, it does nothing for you on the C++ side. Rcpp, on the other hand, uses .Call() to pass complete R objects back and forth, and maps a wide variety of R and C++ types automatically for you, with the possibility to add your own mappers.
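To make that concrete, here is a minimal sketch of the Rcpp side (the function name and body are made up for illustration); Rcpp::cppFunction() compiles the inline C++ and exposes it as an ordinary R function, with the R/C++ type mapping handled for you:

    library(Rcpp)
    cppFunction('
      NumericVector timesTwo(NumericVector x) {
        return x * 2;   // Rcpp sugar: vectorised arithmetic on the C++ side
      }
    ')
    timesTwo(1:3)   # the integer vector 1:3 is mapped to a NumericVector automatically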
Related
I'm an R newbie.
Is there a way I can calculate
(x+x^2+x^3)^2
in R?
so that I will get the result:
x^6+2 x^5+3 x^4+2 x^3+x^2
I get an Error: object 'x' not found.
Thanks!
R isn't well suited for this. Some interface packages to languages and libraries that are better at this do exist, such as rSymPy, which allows you to access the SymPy Python library for symbolic mathematics (you'll need to install both). In a similar vein, Ryacas links to the yacas algebra system.
Those interfaces are useful if you need symbolic manipulation as part of an R workflow. Otherwise, consider using the original tools. The ones above are open source and freely available, while other free-to-use alternatives also exist, such as the proprietary, web-based Wolfram Alpha (for limited use).
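For example, a minimal sketch with Ryacas and rSymPy (this assumes the yac_str() interface of newer Ryacas versions - older versions used yacas() instead - and rSymPy's string-based sympy() call):

    library(Ryacas)
    yac_str("Expand((x + x^2 + x^3)^2)")   # expands to x^6 + 2*x^5 + 3*x^4 + 2*x^3 + x^2

    library(rSymPy)
    sympy("expand((x + x**2 + x**3)**2)")  # same result via SymPy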
I would like to convert an ARIMA model developed in R using the forecast library to Java code. Note that I need to implement only the forecasting part. The fitting can be done in R itself. I am going to look at the predict function and translate it to Java code. I was just wondering if anyone else had been in a similar situation before and managed to successfully use a Java library for the same.
Along similar lines, and perhaps this is a more general question without a concrete answer: what is the best way to deal with situations where the model building can be done in Matlab/R but the prediction/forecasting needs to be done in Java/C++? Increasingly, I have been encountering this situation over and over again. I guess you have to bite the bullet and write the code yourself, and this is generally not as hard as writing the fitting/estimation yourself. Any advice on the topic would be helpful.
You write about 'R or Matlab' to 'C++ or Java'. This gives 2 x 2 choices which is too many degrees of freedom for my taste. So allow me to concentrate on C++ as the target.
Let's consider a simpler case: Prototyping in R, and deploying in C++. If and when the R package you use is actually implemented in C or C++, this becomes pretty easy. You "merely" need to disentangle the routine you are after from its other dependencies (header files, defines, data structures, ...) and provide it with the data and parameters needed. I have done that in the past for production systems.
Here, you talk about the forecast package. This happens to depend on the RcppArmadillo package which itself brings the nice Armadillo C++ library to R. So chances are you can in fact re-write this as a self-contained unit.
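As a hedged illustration of that "fit in R, forecast elsewhere" split, this is roughly what you would pull out of a fitted model from the forecast package before hand-porting the prediction step (the data set is just a stand-in):

    library(forecast)
    fit <- auto.arima(AirPassengers)   # model building stays in R

    coef(fit)              # AR/MA/seasonal coefficients to hard-code or export
    fit$sigma2             # innovation variance (for prediction intervals)
    arimaorder(fit)        # (p, d, q) and seasonal orders
    tail(residuals(fit))   # recent residuals, needed to start the MA recursion

    forecast(fit, h = 12)$mean   # R's own forecasts, to validate the ported code against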
Armadillo is also interesting when you want to port Matlab to C++ as it is written to help with exactly that task in mind. I have ported some relatively extensive Matlab code to C++ and reaped a substantial speed gain.
I'm not sure whether this is possible in R, but in Matlab you can interact with your Matlab code from Java - see http://www.cs.virginia.edu/~whitehouse/matlab/JavaMatlab.html. This would enable you to leave all the forecasting code in Matlab and have e.g. an interface written in Java.
Alternatively, you might want to have predictive code written in Java so that you can produce a model and then distribute a program that uses the model without having a dependency on Matlab. The Matlab compiler may be useful here, but I've never used it.
A final, simple way of interacting messily between Matlab and Java would be (on Linux) using pseudoterminals, where you would have a pty/tty pair to interface Java and Matlab. In this case you would send data from Java to Matlab, and have Matlab return the forecasting results. I expect this would also work in R, but I don't know the syntax.
In general though, reimplementing the code is a decent solution and probably quicker than learning how to interface Java and Matlab or create Matlab libraries.
Some further information on the answer given by Richante: Matlab has some really nice capabilities for interop with compiled languages such as C/C++, C#, and Java. In your particular case you might find the toolbox Matlab Builder JA to be particularly relevant. It allows you to export your Matlab code directly to Java, meaning you can call the code you constructed during your model-building phase in Matlab directly from Java.
More information from the Mathworks here.
I am also concerned with converting "R to Java" so will speak to that part.
As Vincent Zoonekynd said in his comment, the PMML library in R makes sense for model export in general, but "forecast" is not a supported library as of yet.
An alternative is to use something like https://www.opencpu.org/ to make a call to R from your Java program. It surfaces the R code on an HTTP server. You can then call it with parameters as with a normal HTTP call and retrieve what is needed using java.net.HttpURLConnection or one of the HTTP libraries available in Java.
Pros: Separation of concerns, no need to re-write the R code
Cons: Invoking an R server in your live process so need to make sure that is handled robustly
Which R packages make good use of S4 classes? I'm looking for packages that use S4 appropriately (i.e. when the complexity of the underlying problem demands), are well written and well documented (so you can read the code and understand what's going on).
I'm interested because I'll be teaching S4 soon and I'd like to point students to good examples in practice so they can read the code to help them learn.
Thinking about this some more, maybe Matrix and/or lme4? Matrix does a lot of trickery with efficient representation of sparse matrices so this may be a worthwhile (though possibly heavy) example.
Else, given that all of BioConductor is done in S4, some of it is bound to be better than average :) I am sure Martin Morgan will pipe in with good examples.
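If you go with Matrix, even a short session makes the S4 machinery visible to students (the toy matrix is just for illustration):

    library(Matrix)
    m <- Matrix(c(0, 1, 2, 0), nrow = 2, sparse = TRUE)
    isS4(m)            # TRUE: sparse matrices are S4 objects
    class(m)           # a formal class such as "dgCMatrix"
    showMethods("%*%", classes = class(m))   # S4 methods dispatched on that class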
This doesn't exactly answer your question, but....
R in a Nutshell develops an S4 class for a timeseries object and then compares it to the S3 representation. It's a very nice illustration (without being overly complex or too simple) of the differences between S3 and S4.
R Programming for Bioinformatics briefly discusses the ExpressionSet object.
With regard to using the Bioconductor packages, you might find that to fully appreciate the code - or even just to get started - you will need a reasonable knowledge of biology. I suppose the same applies to complex statistics packages; you need to have a vague idea of what's going on to understand the reasons behind the code structure.
At the last LondonR meeting, Brandon Whitcher gave a fascinating talk about the use of S4 classes in his package dcemriS4, which is used for analysing magnetic resonance imaging (MRI) data in medical research.
His talk is available here:
http://www.londonr.org/Medical%20Image%20Analysis%20using%20S4%20classes%20&%20methods.pdf
And the package is on CRAN:
http://star-www.st-andrews.ac.uk/cran/web/packages/dcemriS4/index.html
sp and its dependent packages use S4 and are well documented. They are the alpha and omega for spatial stuff in R.
I would go for kernlab, which additionally includes a lot of C code.
It comes with a handy vignette detailing some of the S4 concepts. (It doesn't seem to use roxygen for the documentation, though, but that is not the question here.)
Trying to get a hold of the S4 system, I ran across an educational package, sequence. The implementation of the class system is illustrated in an accompanying set of slides in a repo, roo, by the same author. Though the example used is from biostatistics, it is easy to follow.
It is a great learning resource, because the author carefully contrasted the different object systems while keeping the complexity of the package appropriate for learning.
I need a relatively efficient way to share data between Matlab and R.
I have checked SaveR and MATLAB R-link, but SaveR formats Matlab's binary data as text strings first and then prints them to an ASCII file, which is not efficient for large datasets, and MATLAB R-link only works on Windows (it uses a COM-based interface).
Update:
Dirk has posted a list of what seem to be better solutions to this problem than SaveR and Matlab R-link. I also learned recently about RAM disks (see here and here for some implementation examples), and thought that they might facilitate the task of sharing large datasets between Matlab and R (or similar computational environments) further. This leads me to the following questions:
Assuming that the data fits in the machine's memory in Matlab's or R's native data containers:
Are any of the solutions listed so far a better fit for RAM disks?
Are there any additional considerations to be taken into account when dealing with RAM disks instead of with secondary-storage solutions?
Thanks!
A couple of ideas, with the caveat that I know more about the R side of things:
The R.matlab package on CRAN can help: this package provides methods to read and write MAT files, and it also makes it possible to communicate (evaluate code, send and retrieve objects, etc.) with Matlab v6 or higher running locally or on a remote host (see the sketch after this list)
HDF5, as you suggested, is a possibility but I heard that the R support in CRAN package hdf5 is somewhat basic
NetCDF may be an alternative; CRAN has packages RNetCDF, ncdf and ncdf4
Use a database, especially a light and file-based one like SQLite or H2, both of which have R support
Use a common serialization / de-serialization format; R has support for Google Protocol Buffers via RProtoBuf and Google points to protobuf-matlab for Matlab
Write your own! Especially when you only need something basic like large rectangular matrices, nothing will beat a direct binary write; I did this once years ago for Octave (which is close to Matlab). You can extend Matlab via mex files; R has its API and helpers like Rcpp. The larger your data sets, the more attractive this may look, as you save the conversions.
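A minimal sketch of the R.matlab route from the first point above (the file name and toy matrix are placeholders):

    library(R.matlab)

    # R -> Matlab: write a MAT file that Matlab's load() can read
    writeMat("exchange.mat", x = matrix(rnorm(6), nrow = 2), label = "demo")

    # Matlab -> R: read a MAT file produced by Matlab's save()
    d <- readMat("exchange.mat")
    str(d$x)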
Matlab uses HDF5 natively in recent versions ("save" and "load"). There is a package for R, so HDF5 might be a good solution.
I'm new to R and having a hard time piecing together information from various online sources about what is considered "good" practice when writing R code. I've read basic guides, but it has been hard to find information that is definitely up to date.
What are some examples of well written/documented S3 classes?
How about corresponding S4 classes?
What conventions do you use when commenting .R classes/functions? Do you put all of your comments in both .Rd files and .R files? Is synchronization of these files tiresome?
Whether to use S3, S4, or a package at all is mostly a style issue (as Dirk says), but I would suggest using one of those if you want a very well-structured object (just as you would in any OOP language). For instance, all the major time series packages have time series objects (I believe they're all S3, with the exception of its) because it allows them to enforce certain behavior around the construction and usage of those objects. Similarly with the question about creating a package: it's a good idea to do this if you will be re-using your code frequently or if the code will be useful to someone else. It requires a little more effort, but the added organizational structure can easily make up for the cost.
Regarding S3 vs. S4 (discussed on R-Help here and here), the basic guideline is that S3 classes are more "quick and dirty" while S4 classes place more rigid control over objects and types. If you're working on Bioconductor, you typically will use S4 (see, for instance, "S4 classes and methods").
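To make the "quick and dirty" versus "rigid control" contrast concrete, here is a minimal sketch of the same toy class written both ways (the class and slot names are made up for illustration):

    # S3: a list with a class attribute; nothing stops you from building an invalid object
    acct <- structure(list(owner = "Ana", balance = 100), class = "account")
    print.account <- function(x, ...) cat(x$owner, "has a balance of", x$balance, "\n")
    print(acct)

    # S4: typed slots, a validity check, and formal method dispatch
    setClass("Account",
             representation(owner = "character", balance = "numeric"),
             validity = function(object) {
               if (object@balance < 0) "balance must be non-negative" else TRUE
             })
    setMethod("show", "Account", function(object) {
      cat(object@owner, "has a balance of", object@balance, "\n")
    })
    new("Account", owner = "Ana", balance = 100)
    # new("Account", owner = "Ana", balance = -1)   # would fail the validity check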
I would recommend reading some of the following:
"A (Not So) Short Introduction to S4" by Christophe Genolini
"Programmers' niche: A simple class, in S3 and S4" by Thomas Lumley
"Brobdingnag: a ''hello world'' package using S4 methods" by Robin K. S. Hankin
"Converting packages to S4" by Douglas Bates
"How S4 Methods Work" by John Chambers
For documentation, Hadley's suggestion is spot on: Roxygen will make life easier and puts the documentation right next to the code. That aside, you may still want to provide other comments in your code beyond what Roxygen or the man files require, in which case it's a good practice to comment your code for other developers. Those comments will not end up in your package; they will only be visible in the source code.
For question 3 (commenting conventions): use roxygen - it works like javadoc, taking comments in your source files and building Rd files.
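A minimal sketch of what that looks like in practice (the function is just a made-up example); running roxygen2::roxygenise() or devtools::document() turns the comments into an Rd file:

    #' Add two numeric vectors
    #'
    #' @param x A numeric vector.
    #' @param y A numeric vector of the same length as \code{x}.
    #' @return The element-wise sum of \code{x} and \code{y}.
    #' @export
    add2 <- function(x, y) {
      x + y
    }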
That's half a dozen or more questions bundled into one, which makes it difficult to answer.
So let's try from the inside out: First try to solve your RODBC wrapper problem. A code representation will suggest itself. I would start with simple functions, and then maybe build a package around it. That already gives you some encapsulation.
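As a hedged sketch of that "start with simple functions" step (the function name, DSN, and query below are placeholders, and it assumes you are using the RODBC package), a small wrapper already hides the connection handling behind one call:

    library(RODBC)

    fetch_query <- function(dsn, sql) {
      ch <- odbcConnect(dsn)    # open the connection
      on.exit(odbcClose(ch))    # always close it, even on error
      sqlQuery(ch, sql)         # returns a data.frame
    }

    # Hypothetical usage:
    # sales <- fetch_query("mydsn", "SELECT * FROM sales WHERE year = 2011")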
Much of the rest is style. Some prominent R coders swear by S4, while others swear about it. You can always read other people's packages as well as the code in R itself. And you can always re-implement your RODBC wrapper in different ways and then compare your own approaches.
Edit: Reflecting your updated and much shortened question: pick some packages from CRAN, in particular among those you use. I think you will quickly find some of them more or less interesting according to your style.
Somewhat more related to style than substance, but the Google R style guide is worth reading: