Which packages make good use of S4 objects? - r

Which R packages make good use of S4 classes? I'm looking for packages that use S4 appropriately (i.e. when the complexity of the underlying problem demands), are well written and well documented (so you can read the code and understand what's going on).
I'm interested because I'll be teaching S4 soon and I'd like to point students to good examples in practice so they can read the code to help them learn.

Thinking about this some more, maybe Matrix and/or lme4? Matrix does a lot of trickery with efficient representation of sparse matrices so this may be a worthwhile (though possibly heavy) example.
Else, given that all of BioConductor is done in S4, some of it is bound to be better than average :) I am sure Martin Morgan will pipe in with good examples.

This doesn't exactly answer your question, but....
R in a Nutshell develops an S4 class for a timeseries object and then compares it to the S3 representation. It's a very nice illustration (without being overly complex or too simple) of the differences between S3 and S4.
R programming for Bioinformatics briefly discusses the ExpressionSet set object.
In regards with using the Bioconductor packages, you might find that to fully appreciate the code - or even just to get started - you will have to a reasonable knowledge of biology. I suppose the same applies to complex statistics packages; you need to have a vague idea of what's going on to understand the reasons behind the code structure.

At the last LondonR meeting Brandon Whicher gave a fascinating talk about the use of S4 classes in his package dcemriS4, for use in analysing magnetic resonance imaging (MRI) in medical research.
His talk is available here:
http://www.londonr.org/Medical%20Image%20Analysis%20using%20S4%20classes%20&%20methods.pdf
And the package is on CRAN:
http://star-www.st-andrews.ac.uk/cran/web/packages/dcemriS4/index.html

sp and dependent packages use S4 and well documented. Alpha and omega for spatial stuff in R.

I would go for kernlab, which additionally includes a lot of C code.
It comes with an handy vignette, detailing some of S4 concepts. (It doesn't seem to use roxygen for the documentation, though, but this is not the question here.)

Trying to get a hold of the S4 system I ran across an educational package sequence. The implementation of the class system is illustrated in an accompanying set of slides in a repo roo by the same author. Though the example used is from biostatistics, it's good to follow.
It is a great learning resource, because the author took carefully contrasted the different object systems while at the same time keeping the complexity of the package adequate for learning.

Related

Multivariate Dynamic Time Warping(DTW) with R

I'm currently dealing with multivariate dynamic time warping (DTW) in R. The best library I found so far is the dtw package as described here: http://dtw.r-forge.r-project.org/
But I do not know how multivariate dtw is actually implemented and it is also not described in the description of the package. All in all, I would like to know if it implements DTWD (Dependent DTW) or DTWI (Independent DTW).
Does anyone have an idea or a suggestion how to find out which of these two approaches the package uses? Or are there libaries which allow me to choose the variant?
You can always look at the companion papers that come with the packages for a more theoretical grounding of the packages
https://www.jstatsoft.org/article/view/v031i07
Additionally, you can always look into the "guts" of the functions to understand how things are implemented in the code
https://www.rdocumentation.org/packages/dtw/versions/1.20-1/source
As far as this question in general, perhaps it is better suited for https://stats.stackexchange.com/ where the questions are more methodological than programming.
The two types of multivariate dynamic time warping are carefully explained here
https://www.cs.ucr.edu/~eamonn/Multi-Dimensional_DTW_Journal.pdf

Rcpp or rdyncall

I'm looking for some kind of references which explain the pro's en con's of using Rcpp when compared to using rdyncall.
Can someone who has used both explain the basic differences from an R package developers perspective who is interested in providing R wrappers to C++ code.
Thanks.
I think we mention rdyncall in the (brief) comparison to other approaches the intro vignette / JSS paper. It is a neat package, but aims for a much lower-level connection. As I understand it, it gives you C-level APIs with least amount of fuzz, as motivated by say, the rgl package. there is very good and detailed paper about rdyncall in a recent R Journal issue.
And unless I miss something, it does nothing for you on the C++ side. Whereas Rcpp makes use as .Call() to pass complete R objects back and forth, and manages to map a wide variety of R and C++ types automatically for you---with the possibility add your own mappers.

Is an implementation of Iterative Closest Point (ICP) available in R?

Does someone have an implementation of Iterative Closest Point (ICP) algorithm for two dimensions (2D) in R?
Here is an attempt in c#
Iterative Closest Point Implementation
Here is a more general question
iterative closest point library
This is to match two sets of points through translation and scaling.
Spacedman's comment is probably best. You might also take a look at http://www.mathworks.com/matlabcentral/fileexchange/loadFile.do?objectId=12627&objectType=file for a matlab implementation. Assuming it works ok, translating Matlab to R code is relatively easy.
This is somewhat of an answer in the form of a non-answer.
There are many variants of ICP. The design choices are at least partially organized by the late 90's Ph.D. work of Pulli and by Rusinkiewicz & Levoy. If you're going to be using ICP for anything remotely important (translation: "more than just a class assignment"), you should understand the tradeoffs.
Thus, it's probably best to take one of the existing implementations and port it to R.
3 Years too late, but there is the function icpmat in the package Morpho by the same guy who wrote Rvcg. I don't know which variant is implemented though.
Link:
https://github.com/zarquon42b/Morpho
There is a self-contained (as far as I can tell) C++ implementation of ICP here. Maybe you can create your own R wrapper around this C++ code.

Workaround for pointers in R?

I have been implementing binary tree search algorithm recently in R, and before that I used linked array-like structures. These algorithm would be much easier if there were pointers in R (not C pointers, but references to objects). I wonder if there is a workaround. I don't know S4 at all; maybe it is possible in that framework? I would avoid environment-related tricks, since that pass-by-reference is a little too bit of a workaround. And I would avoid invocations of C or C++'s STL. It's an R question after all.
R 2.12 will start to bring you some of this. In the meantime, the common recommendation is to use environments to approximate call-by-reference.
You might also be interested in the binsearch() function from the genetics package: http://www.biometrics.mtu.edu/CRAN/web/packages/genetics/index.html . It implements a binary search.

R code examples/best practices

I'm new to R and having a hard time piecing together information from various sources online related to what is considered a "good" practice with writing R code. I've read basic guides but I've been having a hard time finding information that is definitely up to date.
What are some examples of well written/documented S3 classes?
How about corresponding S4 classes?
What conventions do you use when commenting .R classes/functions? Do you put all of your comments in both .Rd files and .R files? Is synchronization of these files tiresome?
Whether to use S3, S4, or a package at all is mostly a style issue (as Dirk says), but I would suggest using one of those if you want to have a very well structured object (just as you would in any OOP language). For instance, all the time series classes have time series objects (I believe that they're all S3 with the exception of its) because it allows them to enforce certain behavior around the construction and usage of those objects. Similarly with the question about creating a package: it's a good idea to do this if you will be re-using your code frequently or if the code will be useful to someone else. It requires a little more effort, but the added organizational structure can easily make up for the cost.
Regarding S3 vs. S4 (discussed on R-Help here and here), the basic guideline is that S3 classes are more "quick and dirty" while S4 classes place more rigid control over objects and types. If you're working on Bioconductor, you typically will use S4 (see, for instance, "S4 classes and methods").
I would recommend reading some of the following:
"A (Not So) Short Introduction to S4" by Christophe Genolini
"Programmers' niche: A simple class, in S3 and S4" by Thomas Lumley
"Brobdingnag: a ''hello world'' package using S4 methods" by Robin K. S. Hankin
"Converting packages to S4" by Douglas Bates
"How S4 Methods Work" by John Chambers
For documentation, Hadley's suggestion is spot on: Roxygen will make life easier and puts the documentation right next to the code. That aside, you may still want to provide other comments in your code beyond what Roxygen or the man files require, in which case it's a good practice to comment your code for other developers. Those comments will not end up in your package; they will only be visible in the source code.
For 3. Use roxygen - it works like javadoc to take comments in your source files and build Rd files.
That's half a dozen or more questions bundled into one, which makes it difficult to answer.
So let's try from the inside out: First try to solve your RODBC wrapper problem. A code representation will suggest itself. I would start with simple functions, and then maybe build a package around it. That already gives you some encapsulation.
Much of the rest is style. Some prominent R codes swear by S4, while other swear about it. You can always read the packages of others as well as code in R itself. And you can always re-implement your RODBC wrapper in different ways and the compare your own approaches.
Edit: Reflecting you updated and much shortened question: Pick some packages from CRAN, in particular among those you use. I think you will quickly find some more or less interesting according to your style.
somewhat more style related than substance, but the Google R style guide is worth reading:

Resources