I have recently been implementing a binary tree search algorithm in R, and before that I used linked array-like structures. These algorithms would be much easier if R had pointers (not C pointers, but references to objects). Is there a workaround? I don't know S4 at all; maybe it is possible in that framework? I would rather avoid environment-related tricks, since that kind of pass-by-reference is a bit too much of a workaround. I would also avoid calling C or C++'s STL. It's an R question after all.
R 2.12 will start to bring you some of this. In the meantime, the common recommendation is to use environments to approximate call-by-reference.
You might also be interested in the binsearch() function from the genetics package: http://www.biometrics.mtu.edu/CRAN/web/packages/genetics/index.html . It implements a binary search.
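If environments are off the table, the array-based approach from the question can be made fairly tidy: store the tree in parallel vectors and use integer indices in place of pointers. Below is a minimal sketch, written in Python only for brevity (the same layout should work with R integer vectors). The class name ArrayBST and the -1 "no child" sentinel are choices made for this illustration, not something taken from a package.

```python
class ArrayBST:
    """Binary search tree stored in three parallel lists instead of linked nodes."""

    NONE = -1  # sentinel index meaning "no child" (an assumption of this sketch)

    def __init__(self):
        self.key = []    # key[i] is the key stored in node i
        self.left = []   # left[i] is the index of node i's left child, or NONE
        self.right = []  # right[i] is the index of node i's right child, or NONE

    def _new_node(self, k):
        # Append a leaf and return its index; the index plays the role of a pointer.
        self.key.append(k)
        self.left.append(self.NONE)
        self.right.append(self.NONE)
        return len(self.key) - 1

    def insert(self, k):
        if not self.key:
            self._new_node(k)  # first node becomes the root at index 0
            return
        i = 0
        while True:
            if k == self.key[i]:
                return  # duplicates are ignored
            side = self.left if k < self.key[i] else self.right
            if side[i] == self.NONE:
                side[i] = self._new_node(k)
                return
            i = side[i]

    def contains(self, k):
        i = 0 if self.key else self.NONE
        while i != self.NONE:
            if k == self.key[i]:
                return True
            i = self.left[i] if k < self.key[i] else self.right[i]
        return False


tree = ArrayBST()
for value in [5, 3, 8, 1, 4]:
    tree.insert(value)
print(tree.contains(4), tree.contains(7))
```

Because nodes are only ever appended, no references are needed; the cost is that deleting a node requires extra bookkeeping (for example a free list), which this sketch leaves out.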
I am working with a very large dataset, typically dealing with a few million combinations.
I want to solve the assignment problem (maximise the sum).
I tried solving it on a small test set using adagio::assignment and clue::solve_LSAP.
I wasn't able to install the "lpSolve" package on my system; it threw a segmentation fault.
I would like to know which of these is faster, or whether any other method is faster still.
Thanks.
An LP formulation is not a good way to solve the assignment problem, whichever library you use. You have to use the Hungarian algorithm, and it looks like solve_LSAP does exactly that.
No need to try anything else IMHO.
EDIT: An efficient implementation of the Hungarian method should be O(n^3), which is extremely fast for an exact optimization algorithm. If solve_LSAP is not fast enough for your problem (assuming it is implemented correctly), it is very unlikely that any exact method will work.
You will have to use some sort of heuristic to approximate the solution.
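For readers who want to see what an O(n^3) Hungarian-style solver looks like, here is a minimal sketch in Python (chosen only for brevity; it is not the code behind clue::solve_LSAP). It assumes a square profit matrix and turns the maximisation into a minimisation by negating the entries. The function name solve_assignment_max is made up for this illustration.

```python
def solve_assignment_max(profit):
    """Return (total, assignment) maximising sum(profit[i][assignment[i]]).

    Sketch of the potentials-based Hungarian method; assumes a square matrix.
    """
    n = len(profit)
    cost = [[-value for value in row] for row in profit]  # maximise = minimise the negation
    INF = float("inf")
    u = [0] * (n + 1)    # row potentials (1-based)
    v = [0] * (n + 1)    # column potentials (1-based)
    p = [0] * (n + 1)    # p[j] = row currently matched to column j (0 = none)
    way = [0] * (n + 1)  # way[j] = previous column on the augmenting path

    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0 = p[j0]
            delta = INF
            j1 = 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j] = cur
                        way[j] = j0
                    if minv[j] < delta:
                        delta = minv[j]
                        j1 = j
            # Shift the potentials so that at least one new edge becomes tight.
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:  # reached an unmatched column: path found
                break
        # Flip the matching along the augmenting path.
        while True:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
            if j0 == 0:
                break

    assignment = [0] * n
    for j in range(1, n + 1):
        assignment[p[j] - 1] = j - 1
    total = sum(profit[i][assignment[i]] for i in range(n))
    return total, assignment


print(solve_assignment_max([[4, 1, 3], [2, 0, 5], [3, 2, 2]]))
```

Each of the n outer iterations does O(n^2) work, which is where the O(n^3) bound comes from. With a few million candidate pairs, memory for the dense matrix is likely to become the limit before the running time does.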
I'm looking for references that explain the pros and cons of using Rcpp compared to rdyncall.
Can someone who has used both explain the basic differences, from the perspective of an R package developer who is interested in providing R wrappers for C++ code?
Thanks.
I think we mention rdyncall in the (brief) comparison to other approaches in the intro vignette / JSS paper. It is a neat package, but it aims for a much lower-level connection. As I understand it, it gives you C-level APIs with the least amount of fuss, as motivated by, say, the rgl package. There is a very good and detailed paper about rdyncall in a recent R Journal issue.
And unless I am missing something, it does nothing for you on the C++ side. Rcpp, on the other hand, makes use of .Call() to pass complete R objects back and forth, and manages to map a wide variety of R and C++ types automatically for you, with the possibility to add your own mappers.
Does someone have an implementation of the Iterative Closest Point (ICP) algorithm for two dimensions (2D) in R?
Here is an attempt in C#
Iterative Closest Point Implementation
Here is a more general question
iterative closest point library
This is to match two sets of points through translation and scaling.
Spacedman's comment is probably best. You might also take a look at http://www.mathworks.com/matlabcentral/fileexchange/loadFile.do?objectId=12627&objectType=file for a Matlab implementation. Assuming it works OK, translating Matlab to R code is relatively easy.
This is somewhat of an answer in the form of a non-answer.
There are many variants of ICP. The design choices are at least partially organized by the late 90's Ph.D. work of Pulli and by Rusinkiewicz & Levoy. If you're going to be using ICP for anything remotely important (translation: "more than just a class assignment"), you should understand the tradeoffs.
Thus, it's probably best to take one of the existing implementations and port it to R.
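To give an idea of what would be ported, here is a minimal sketch of the plainest variant, restricted to what the question asks for (translation plus uniform scaling in 2D, no rotation). It is written in Python for brevity and is meant as a starting point, not a finished implementation. Assumptions: brute-force nearest-neighbour matching, a closed-form least-squares fit for scale and shift, and no outlier rejection. The names icp_2d and fit_scale_translation are made up for this illustration.

```python
def fit_scale_translation(src, dst):
    """Least-squares s, (tx, ty) minimising sum |s * p + t - q|^2 over paired points."""
    n = len(src)
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(q[0] for q in dst) / n
    dy = sum(q[1] for q in dst) / n
    num = sum((p[0] - sx) * (q[0] - dx) + (p[1] - sy) * (q[1] - dy)
              for p, q in zip(src, dst))
    den = sum((p[0] - sx) ** 2 + (p[1] - sy) ** 2 for p in src)
    s = num / den if den > 0 else 1.0  # degenerate case: all source points coincide
    return s, (dx - s * sx, dy - s * sy)


def icp_2d(source, target, max_iter=50, tol=1e-12):
    """Align source to target; returns (scale, (tx, ty), transformed_source)."""
    current = list(source)
    s_total, tx, ty = 1.0, 0.0, 0.0
    prev_err = float("inf")
    for _ in range(max_iter):
        # 1. Match every source point to its closest target point (brute force).
        matched = [min(target, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)
                   for p in current]
        # 2. Fit the best scale and shift for these correspondences and apply them.
        s, (dx, dy) = fit_scale_translation(current, matched)
        current = [(s * x + dx, s * y + dy) for x, y in current]
        # 3. Compose with the transform accumulated so far.
        s_total, tx, ty = s * s_total, s * tx + dx, s * ty + dy
        # 4. Stop once the mean squared error no longer changes.
        err = sum((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
                  for p, q in zip(current, matched)) / len(current)
        if abs(prev_err - err) < tol:
            break
        prev_err = err
    return s_total, (tx, ty), current


target = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0), (2.0, 5.0)]
source = [((x - 0.2) / 1.1, (y + 0.1) / 1.1) for x, y in target]
scale, shift, moved = icp_2d(source, target)
print(scale, shift)
```

Like any ICP variant, this only finds a local optimum, so it needs a reasonable starting alignment. The matching step is O(n * m) per iteration; a k-d tree is the usual replacement once the point sets get large.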
Three years too late, but there is the function icpmat in the package Morpho, by the same author who wrote Rvcg. I don't know which variant it implements, though.
Link:
https://github.com/zarquon42b/Morpho
There is a self-contained (as far as I can tell) C++ implementation of ICP here. Maybe you can create your own R wrapper around this C++ code.
I would like to convert an ARIMA model developed in R using the forecast library to Java code. Note that I need to implement only the forecasting part; the fitting can be done in R itself. I am going to look at the predict function and translate it to Java code. I was wondering whether anyone else has been in a similar situation and managed to use a Java library for this.
Along similar lines, and perhaps this is a more general question without a concrete answer: what is the best way to deal with situations where model building can be done in Matlab/R but the prediction/forecasting needs to be done in Java/C++? I have been encountering this situation over and over again. I guess you have to bite the bullet and write the code yourself, which is generally not as hard as writing the fitting/estimation yourself. Any advice on the topic would be helpful.
You write about 'R or Matlab' to 'C++ or Java'. This gives 2 x 2 choices which is too many degrees of freedom for my taste. So allow me to concentrate on C++ as the target.
Let's consider a simpler case: Prototyping in R, and deploying in C++. If and when the R package you use is actually implemented in C or C++, this becomes pretty easy. You "merely" need to disentangle the routine you are after from its other dependencies (header files, defines, data structures, ...) and provide it with the data and parameters needed. I have done that in the past for production systems.
Here, you talk about the forecast package. This happens to depend on the RcppArmadillo package which itself brings the nice Armadillo C++ library to R. So chances are you can in fact re-write this as a self-contained unit.
Armadillo is also interesting when you want to port Matlab to C++, as it was written with exactly that task in mind. I have ported some relatively extensive Matlab code to C++ and reaped a substantial speed gain.
I'm not sure whether this is possible in R, but in Matlab you can interact with your Matlab code from Java - see http://www.cs.virginia.edu/~whitehouse/matlab/JavaMatlab.html. This would enable you to leave all the forecasting code in Matlab and have e.g. an interface written in Java.
Alternatively, you might want to have the predictive code written in Java so that you can produce a model and then distribute a program that uses the model without depending on Matlab. The Matlab compiler may be useful here, but I've never used it.
A final, simple but messy way of interacting between Matlab and Java would be (on Linux) to use pseudoterminals, with a pty/tty pair interfacing Java and Matlab. In this case you would send data from Java to Matlab and have Matlab return the forecasting results. I expect this would also work in R, but I don't know the syntax.
In general, though, reimplementing the code is a decent solution and probably quicker than learning how to interface Java and Matlab or create Matlab libraries.
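As a rough idea of how little code the forecasting part can need, here is a sketch of the point-forecast recursion for a non-seasonal ARMA(p, q) model with a mean, written in Python for brevity (it should port line by line to Java or C++). Assumptions: the sign convention is the one documented for R's arima(), i.e. y[t] - mu = sum(phi[i] * (y[t-i] - mu)) + e[t] + sum(theta[j] * e[t-j]); differencing, seasonal terms, drift and prediction intervals are left out; the coefficients and in-sample residuals are exported from the fitted R model. The function name arma_forecast is made up for this illustration.

```python
def arma_forecast(history, residuals, phi, theta, mu, horizon):
    """Point forecasts for an ARMA(p, q) model with mean mu.

    history   : observed series, most recent value last (at least p values)
    residuals : in-sample residuals, most recent last (at least q values)
    phi       : AR coefficients [phi_1, ..., phi_p]
    theta     : MA coefficients [theta_1, ..., theta_q]
    """
    y = [value - mu for value in history]  # work with the demeaned series
    e = list(residuals)
    forecasts = []
    for _ in range(horizon):
        ar_part = sum(phi[i] * y[-1 - i] for i in range(len(phi)))
        ma_part = sum(theta[j] * e[-1 - j] for j in range(len(theta)))
        next_value = ar_part + ma_part
        y.append(next_value)  # forecasts feed back into the AR recursion
        e.append(0.0)         # future shocks are replaced by their expectation, zero
        forecasts.append(next_value + mu)
    return forecasts


print(arma_forecast([9.0, 12.0], [0.5, 1.0], phi=[0.5], theta=[0.4], mu=10.0, horizon=3))
```

Before relying on a port like this, it is worth comparing its output with predict() in R on the same fitted model, since the details (differencing, how the mean or drift is parametrised) are where hand-written ports usually diverge.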
Some further information on the answer given by Richante: Matlab has some really nice capabilities for interop with compiled languages such as C/C++, C#, and Java. In your particular case you might find the toolbox Matlab Builder JA to be particularly relevant. It allows you to export your Matlab code directly to Java, meaning you can directly call code that you've constructed during your model-building phase in Matlab from Java.
More information from the Mathworks here.
I am also concerned with converting "R to Java" so will speak to that part.
As Vincent Zoonekynd said in his comment, the PMML library in R makes sense for model export in general, but "forecast" is not a supported library yet.
An alternative is to use something like https://www.opencpu.org/ to call R from your Java program. It exposes the R code on an HTTP server. You can then call it with parameters like any normal HTTP call and return what is needed, using java.net.HttpURLConnection or one of the other HTTP libraries available in Java.
Pros: Separation of concerns, no need to re-write the R code
Cons: You invoke an R server from your live process, so you need to make sure that is handled robustly
Which R packages make good use of S4 classes? I'm looking for packages that use S4 appropriately (i.e. when the complexity of the underlying problem demands), are well written and well documented (so you can read the code and understand what's going on).
I'm interested because I'll be teaching S4 soon and I'd like to point students to good examples in practice so they can read the code to help them learn.
Thinking about this some more, maybe Matrix and/or lme4? Matrix does a lot of trickery with efficient representation of sparse matrices so this may be a worthwhile (though possibly heavy) example.
Else, given that all of Bioconductor is done in S4, some of it is bound to be better than average :) I am sure Martin Morgan will chime in with good examples.
This doesn't exactly answer your question, but....
R in a Nutshell develops an S4 class for a timeseries object and then compares it to the S3 representation. It's a very nice illustration (without being overly complex or too simple) of the differences between S3 and S4.
R Programming for Bioinformatics briefly discusses the ExpressionSet object.
With regard to the Bioconductor packages, you might find that to fully appreciate the code, or even just to get started, you will need a reasonable knowledge of biology. I suppose the same applies to complex statistics packages; you need at least a vague idea of what's going on to understand the reasons behind the code structure.
At the last LondonR meeting Brandon Whitcher gave a fascinating talk about the use of S4 classes in his package dcemriS4, which is used for analysing magnetic resonance imaging (MRI) data in medical research.
His talk is available here:
http://www.londonr.org/Medical%20Image%20Analysis%20using%20S4%20classes%20&%20methods.pdf
And the package is on CRAN:
http://star-www.st-andrews.ac.uk/cran/web/packages/dcemriS4/index.html
sp and its dependent packages use S4 and are well documented. They are the alpha and omega for spatial stuff in R.
I would go for kernlab, which additionally includes a lot of C code.
It comes with a handy vignette detailing some S4 concepts. (It doesn't seem to use roxygen for the documentation, though, but that is not the question here.)
While trying to get to grips with the S4 system, I ran across an educational package, sequence. The implementation of the class system is illustrated in an accompanying set of slides in a repo, roo, by the same author. Though the example used is from biostatistics, it's easy to follow.
It is a great learning resource, because the author carefully contrasted the different object systems while keeping the complexity of the package adequate for learning.