Occasionally I see small ways I could improve either R (recently the IQR command) or the R documentation (just this week, perhaps elaborating on the differences among aggregate, tapply, and by, and cross-referencing them better). But I don't see a way to really make that contribution back. I looked into the developer site and it seems that my options are either to attempt to become a full-fledged developer or to create packages, neither of which fits what I wish to accomplish.
I did propose IQR changes on the R mailing list but got no response so I figure that's going nowhere.
And to clarify, I'm talking about base-R. Additional packages are another matter.
Any tips?
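For readers who have not compared the three functions mentioned above, here is a minimal sketch using the built-in mtcars data (my choice of example, not the asker's). All three compute the same group means here and differ mainly in what they take and what they return:

```r
# Mean mpg per cylinder count, computed three ways (mtcars ships with R).

# tapply: takes a vector plus a grouping vector, returns a named array.
res_tapply <- tapply(mtcars$mpg, mtcars$cyl, mean)

# aggregate: formula interface, returns a data frame with one row per group.
res_aggregate <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)

# by: splits a data frame and applies a function to each piece,
# returning an object of class "by".
res_by <- by(mtcars, mtcars$cyl, function(d) mean(d$mpg))

print(res_tapply)
print(res_aggregate)
print(res_by)
```

The numbers agree; the return types (array, data frame, "by" object) are where the documentation for each function could point to the others.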
Send (or CC) to r-devel. Traffic is quite high on r-help, and things can be overlooked there.
File a bug under the wishlist category detailing the improvement you would like to see.
Having filed the bug, try to provide a patch against the R code and/or documentation as appropriate. I've done this before where there was a problem or infelicity in R, supplied a patch and a fix to the help files/manual and had the changes accepted (after suitable modification) by R Core.
If it is an addition to the R code base, you are going to have to show that there is a real pressing need for the addition. Basically you are asking R Core to maintain your code in perpetuity, and they are unlikely to do that unless you can demonstrate a need.
If it is an addition, look for a popular R package that does similar/related things and suggest to the package maintainer that they include your function. That way you don't need to start a whole package for something simple but can still contribute your code. There are several popular *misc packages on CRAN, for example.
If you want to contribute fixes to the R documentation and/or manuals, provide patches to the sources. You can find the sources at svn.r-project.org/R
Hopefully that gives you some ideas. Patches and code always help!
How about patches to existing packages?
How about open bug reports on packages? R-Forge projects don't seem to use the issue trackers much, but some folks on the RPostgreSQL team I'm on enabled it (where it is hosted on Google Code), and it has been helpful -- see here. And we had a really useful inflow of fresh blood with a rocking new developer from Japan, probably in part because of the visibility of the project there.
In essence, try to find a project / group / team to become acquainted with and join. In that sense, this is just like any other Open Source project. The r-devel list (gmane view) is a good place for R development in general.
The R Core team, on the other hand, is a little more closed and by invitation only, and that is unlikely to change. So be it, for better or worse. It has worked so far, and hence I am not among those who bemoan this loudly.
I am working on a project where I am fetching bulk data from Bloomberg, such as the stock of the 1000 highest valued US companies, and then computing summary statistics on them.
I would like to use R for the procedure and I am wondering which package would suit the task better, RBloomberg or Rblpapi.
This is what I think are the pros and cons of the packages:
RBloomberg
+ Has good Manual from 2010 and more SO questions
+ May be more stable since it's been around for longer
- May not work on new version of R, Requires Java
- Will likely not receive new functions and support
Rblpapi
+ Faster, does not require Java
+ Will likely receive new functions
- If the package is updated significantly, I may have to rewrite my code
In addition, is the functionality of the two packages equivalent?
Thank you for your input.
These opinion based questions are not always the best fit for Stack Overflow but this may help you:
1) This debate may be of use: in 2014 Whit, one of the authors of Rblpapi, said to go with Rbbg until Rblpapi's functionality is more developed.
2) @Dirk Eddelbuettel's write-up explains the history of these packages and how the collaborators are linked, from Dirk to Ana to John to Whit. So there is a lot of idea sharing between the two packages.
3) Only binaries, not source, are available for Rbbg, which can be a problem for non-Windows users (please see @GSee's comments). Also, packages like packrat for sandboxing do not like the lack of src files for Rbbg. (Others might comment on a workaround for this.)
Disclaimer: I do not use Rblpapi yet so I cannot judge it.
R offers a breadth and depth in statistical computing beyond what is available in commercial
closed source products. Yet R remains, primarily, a programming language for the highly
skilled statistician, and out of the reach of many. --- The R Journal Vol. 1/2, December 2009
Note: Name changed from Interactive R Language Online Learning Platform: CloudStat School
As stated, R is the best tool and is the lingua franca of statistics. But many people, especially my students, find R difficult to use.
I wish to make an interactive R Learning Platform, called CloudStat School.
The best way to learn R programming is learning by doing.
In CloudStat School, you will see a console box at your top left hand side, while the lesson notes at your top right. Bottom is the output box. Anything you “Run” in console box will be shown as a result in the output box.
So, while reading the notes, you can “run” the R examples immediately without opening another window, program, or tab. You can do it all on one page.
I did make a simple working prototype:
Lesson 1: Overview of R Language & CloudStat School
The prototype simply embeds Rweb, hosted at Pôle Bioinformatique Lyonnais, in an iframe.
If many of you think that this idea is great, I will start making a better version.
This is my current simple idea, hope to get some feedback from you.
Thanks a lot.
It would probably be much more resource intensive and require more effort to create, but check this out: I found Codecademy to be a fun way to tinker with JavaScript. Unfortunately the site is (so far) only for a single language and is run by a closed-source, venture-backed startup.
The main problem is that Rweb (as I am learning right now) does execute everything in batch, so this interpreted line-by-line approach used in CA probably cannot be done with it. If you were to create a similar app to CA for R, you'd have to open an R session for every user, hence the resource intensiveness disadvantage stated above. Hope this can be overcome, maybe someone will have an idea.
Hope you find this useful, at least as an inspiration for your endeavors. I wish you the best of luck.
A couple of pointers that might help: Eloquent Javascript and CodingBat.
Eloquent Javascript is an "interactive Hyperbook" where the students can edit and try out the examples right there in HTML as they are learning Javascript. Might be worth a look to get ideas for CloudStat.
In CodingBat, Stanford professor Nick Parlante has been doing (for Java and Python) exactly what you are attempting to do for R learners. Especially relevant is the Authoring Page.
The success of your CloudStat School will be in getting crowdsourced contributions. To that end, my suggestion is for you to create 4-5 really good exercises with levels and hints, and then to focus on the 'meta' aspect of directing others to create the R exercises for you. Provide instructions for creating hints, tests, code, and tags. You could even consider assigning the task of 'creating new exercises' as a midterm/endterm project to your R students for extra credit.
Hope that helps and good luck.
Commonly, there are two methods of learning R. One is step-by-step, which is what CloudStat School is doing, as do other R language books and websites. This is good especially for newcomers who are learning R without a specific purpose.
The other is learning through problems. When you face a specific problem and need specific functions, you are forced to learn them. Instead of "starting from zero", the better way is learning through examples. This works even for experienced R users.
Since we want to make the best Interactive R Language Online Learning Platform, we need to add as many analysis examples/case studies as possible here. If you need some ideas for doing a statistical analysis with R, this should be the first place you visit. :)
I want to look into Rcpp to improve the speed of some of my R code without having to resort to messy C++ code (I've had some success with that, but it looks like code from hell).
So, I checked the documentation provided with Rcpp, and also the bundle of documents provided at Dirk Eddelbuettel's site. I installed and looked at RcppExamples, but (at least from its documentation) most of these refer to RcppClassic?. Besides that, I did some googling but that didn't result in answers to what seem like basic questions.
Do indexes in Rcpp work zero-based or one-based?
List provides both operator() and operator[], but apparently not operator[[]]. It is not clear which ones are similar to [] and [[]] in R.
Is there any support for factors in Rcpp (there does not appear to be any)?
Note: in fact I found some answers from the first example in Rcpp-introduction.pdf, but that just felt like luck.
Also, my STL is very rusty, so if anybody can provide me with a simple example where each element of a List is (e.g.) printed with an STL-style loop, that would be neat.
If anybody wants to call me an idiot for not finding this information: go ahead and make your day. Then make mine and point me to the docs I need :-)
As a suggestion to Mr. Eddelbuettel and other Rcpp authors (I expect some of them to read this): the class hierarchies and the like, provided by doxygen, are really neat when you are already knee-deep into Rcpp, but as a beginner (in Rcpp), I am more interested in a list of 'this method in this class does this like that function in R' than in 'you can find the declaration of this operator in this header file'. After all, I understand one of the goals of Rcpp is to lower the threshold for using C++ in R? Note: from what I have seen and understood, I highly value the actual code of Rcpp and have the highest respect for its creators. If the lack of basic documentation is merely a result of 'lack of resources', I would be willing to become a resource (e.g.: work on 'basic' documentation once I get through it myself).
I do not quite know where to start answering this but here is a quick attempt:
The package has a website. The website lists the documentation.
The package has eight (8) vignettes. They are clearly listed. They are mostly meant to be read as documentation, some more introductory and some more advanced. Some (such as the unit testing output) are more of a quality-control initiative.
There is a vignette called Rcpp-introduction. We refer to it repeatedly. We suggest you read it. This is now also a peer-reviewed and published paper which may lend it even more credibility.
There is a vignette called Rcpp-FAQ. Its first question is "How do I get started?" which points to the aforementioned Rcpp-introduction.
There is a mailing list dedicated to the project; you could actually read the archive.
We have given numerous talks, slides are available as is a 90 minute recording of a Google Tech Talk.
Even StackOverflow has a tag for it: [rcpp]. You could read the earlier posts.
There are over two dozen packages clearly listed on the CRAN page for Rcpp as using it. You could read their source code.
All that said, Rcpp cannot be used instead of C++ so if you do not know or understand that operator[[]] cannot exist in C++ we cannot help you either. This is not a magic fairy, or R-to-C++ code compiler. Rather, its focus is to make it much easier to get to C++ code from R, and in some cases even manages to improve on C++ practice. In essence, it tries to be "super-additive": the combination R and C++ should be more than either in isolation.
Lastly, I do grant you that the RcppExamples package -- which by the way covers the old and new API -- could use more examples. However, its sources give good porting hints from the old ("classic") to the new and current API.
But there is only so much documentation we can write ourselves. I myself find the above bullet points quite exhaustive. You may have homed in on the weakest part of the chain though. That is bad luck. Please do try some of the other pointers listed here.
This question is inspired by the remark of Duncan Murdoch on the r-devel mailing list in response to a bug report about Sweave :
This is fixed in R-patched. (It would have been fixed in 2.12.0 if more people tested the betas...).
Honestly, I've stayed away from beta -aka development- versions for a number of reasons, and these are reasons I hear from more people :
I am a bit horrified it would somehow cause conflicts with my current R distribution. As I need it for work, having to repair it regularly would be a loss of time I can't explain to my boss
I wouldn't have a clue how to test efficiently. I reckon every test I could come up with has already been run by the development team.
I still find it difficult to figure out when something is a bug, and when (most often) it is my own stupidity kicking in.
But as I understood, it would be a valuable contribution to the R community, and I'm willing to do my bit of the testing as well if I can fit it somehow into my own work. I was thinking of keeping the beta on the side and running my scripts through it as well as a checkup. Saving the constructed objects allows a quick and easy all.equal() to see if something is wrong.
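The save-and-compare idea above could look something like the following. This is only a sketch under my own assumptions: `run_analysis()` is a hypothetical stand-in for your real script, and the reference file lives in a temporary directory here, whereas in practice it would sit somewhere both R installations can read.

```r
# Hypothetical stand-in for "my scripts": returns the objects worth comparing.
run_analysis <- function() {
  fit <- lm(mpg ~ wt + hp, data = mtcars)
  list(coef = coef(fit), r.squared = summary(fit)$r.squared)
}

# Assumed location of the reference results (a temp dir for this sketch).
ref_file <- file.path(tempdir(), "reference_results.rds")

# Step 1, run under the release version: store the constructed objects.
saveRDS(run_analysis(), ref_file)

# Step 2, run under the version being tested: recompute and compare.
# all.equal() returns TRUE or a character vector describing the differences.
check <- all.equal(readRDS(ref_file), run_analysis())
if (isTRUE(check)) {
  message("Results match the reference.")
} else {
  print(check)
}
```

Because `all.equal()` compares with a numerical tolerance, harmless floating-point noise between builds should not raise a false alarm, while a changed coefficient would be reported.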
Does anybody have more/better ideas on how I could help testing with a minimum amount of effort and a maximum amount of efficiency?
I'd also like to promote this a bit more in our department. Apart from "It's time to give back to the community", are there any other good reasons why testing betas is worth the effort? How can I counter the arguments given above?
Edit:
As Dirk Eddelbuettel pointed out in the comments, part of the deal is keeping the test version out of the path variables in Windows. I have some ideas on that, but pointers on how to practically organize your computer for testing R-devel versions are greatly appreciated as well.
I fear you misunderstand. This may not be straightforward or obvious at first so maybe this helps:
"patched" is not "beta". Patched is what R 2.12.1 will be.
There is no conflict. It drops in for 2.12.0.
It is a separate download, and a nightly build available from here.
This is not r-devel but r-patched.
It is our duty as users to test pre-releases as well. So if anything, in an ideal world you would have R-patched installed --- as well as R-devel!
Testing can be as easy as installing another version, keeping it outside your path and then adjusting PATH and R_HOME dynamically from a script. Testing means running it on your code and data to prevent you from getting bitten by bugs once the new code is released.
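A related way to keep a second installation off the PATH is to call its Rscript by full path from a small driver script. The sketch below is written in R and is an assumption of mine rather than the setup described above; `run_with_r()` is a hypothetical helper, and `R.home()` is used as the install directory only so the example runs anywhere. In practice you would pass the directory where R-patched or R-devel lives. Whether R_HOME also needs adjusting may depend on the platform and how R was installed.

```r
# Hypothetical helper: evaluate an expression with the R found under `r_home`,
# without that installation ever being on PATH.
run_with_r <- function(r_home, expr) {
  rscript <- file.path(r_home, "bin", "Rscript")
  system2(rscript, c("-e", shQuote(expr)), stdout = TRUE)
}

# For illustration this points at the currently running installation;
# replace R.home() with the directory of the version under test.
out <- run_with_r(R.home(), "cat(R.version.string)")
print(out)
```

The same helper can run a whole test script by swapping the `-e` expression for a file name.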
I wouldn't have a clue how to test efficiently. I reckon every test I could come up with has already been run by the development team.
I still find it difficult to figure out when something is a bug, and when (most often) it is my own stupidity kicking in.
The problem is, software is not (or not only) going to be used by developers. It is going to be used by people that may not have programming knowledge at all (I'm speaking generally, this is valid for R as well as for any other software).
If the help or the interface or the general way the software is built does not give you enough information on how to do something, well, that is maybe not a bug, but it is something that can be improved (and pointed out to the devs).
Also, remember that the developers wrote the software. They know how to use it, and often they will be biased towards testing it by using it correctly and seeing whether it gives the correct result rather than by "trying to break it".
By using it in YOUR way (which may possibly be "incorrect"), you are effectively running tests that may have escaped the developers, just because they were not thinking of using it like you did.
I started using R a little while ago and am not sure how often to update the installed packages (at this time, I'm using mostly ggplot2 and rattle). On one hand it's the typical geek impulse to have the latest version :-) On the other, updates can break functionality and, as an R beginner, I don't want to waste time looking into package incompatibilities and reinstalling libraries; it's almost certain I wouldn't notice any difference with an improved package.
With other applications I have a sense developed from experience on how often to upgrade, how much to wait between the release of an upgrade and installing it and so on. But I'm in the dark with regards to R.
And to be clear: I'm not talking about R itself, but its libraries.
Thanks.
Here is my philosophy: the naïve user never updates. The sophisticated user always updates. The power user updates often, but carefully.
Mindless updating is not always beneficial. Bugs work their way into updated versions of R libraries (or R itself!), and you could break your existing code by updating without reading the change log or commit history. For example, R 2.11 broke lme4 on OS X... it pays to carefully update and run demos of packages between releases. It really sucks to update to a new library or R release and realize something broke when you have a deadline.
Yes it is.
Why exactly would you want to hang on to old bugs and lacking features?
The question is already answered, but I'll offer my 2 cents. In an organization, updating R should be treated like updating gcc or Java: with warnings, with a staging area, a rollback plan, etc. Others' work and results may be affected. [See update #2]
I am more impulsive about updating R packages. As long as you can reproduce the state of your system at any point in time, there's little to worry about. Ensuring that nightly backups occur should be the domain of your sysadmin.
The basic idea is that you should be able to reproduce everything. Actually testing that your earlier results are reproduced is dependent on whether you want to disprove your assumption that there are no bugs or changes that will affect later results. :)
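One cheap way to make the package state reproducible is to record which versions are installed before updating. The following is a minimal sketch, not something the answer above prescribes: `snapshot_packages()` is a hypothetical helper name and the output file location is an assumption.

```r
# Hypothetical helper: write the name, version and library of every
# installed package to a CSV file, and return the table invisibly.
snapshot_packages <- function(file) {
  ip <- installed.packages()[, c("Package", "Version", "LibPath"), drop = FALSE]
  rownames(ip) <- NULL  # package names can repeat across libraries
  snap <- as.data.frame(ip, stringsAsFactors = FALSE)
  write.csv(snap, file, row.names = FALSE)
  invisible(snap)
}

# Assumed location; a dated file name keeps one snapshot per day.
snap_file <- file.path(tempdir(), paste0("packages-", Sys.Date(), ".csv"))
snap <- snapshot_packages(snap_file)
head(snap)
```

Taking a snapshot before each update means that, if results change, you can at least see which package versions moved.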
Update 1. As has been mentioned in comments and above, updating in a production environment, or in any environment where stability is paramount (e.g. bugs are either known or not significant), risks introducing new bugs, new dependencies, different output, or any variety of other software regressions, and should be done quite carefully. Moreover, where you're updating from matters a lot. Updating from R-Forge is more likely to expose you to the newest bugs than updating from CRAN. Even so, I have found and reported bugs that persisted through 3+ versions of a package on CRAN, as well as other regressions that have just magically appeared. I test a lot, but updating, finding new bugs, and debugging is an effort that I don't always want to (or have time to) undertake.
I am reminded of this question after finding a new bug in a new version of a package that I use a lot. Just to check, I reverted to an earlier version - no more crashes, though tracking down the cause took a couple of hours, because I assumed it was not arising in this package. I'll send a note to the maintainer before long, so others won't have to stumble on the same bug.
Update 2. In an organization, I have to say that the answer is no. In fact, in any case where there may be two or more concurrent instances of R, it is very unwise to blindly update the packages while another may be using them. There are likely to be good methods for hot-swapping packages, I just don't yet know them. Keep in mind that the two instances need only share libraries (i.e. where the packages are stored), and, AFAIK, need not run concurrently on the same machine. Thus, if libraries are placed on shared systems, e.g. over NFS, one may not know where else R is running at the same time, using those libraries. Accidentally killing another R process is not usually a good thing.
Yes, unless you have a good reason not to (see my comment to Dirk)
Although some of the following has been mentioned in previous answers, I think it might be beneficial to make a few things explicit. As a developer, I think that updating packages often (and R-devel for that matter) is good practice. You definitely want to stick with the latest out there. If your package imports/depends on/suggests... interacts with other packages, you want to ensure interoperability on a day-to-day basis, and not face the 'bugs' just before release, when time is short.
On the other hand, some environments will put special emphasis on exact reproducibility. In that case, one may want to adopt a more careful strategy with updating.
But it is worth emphasising that these two behaviours are not exclusive. It is possible to install different versions of R and maintain different libraries, to benefit from a bleeding edge development environment and a more stable one for production.
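As an illustration of keeping separate libraries, here is a minimal sketch using `.libPaths()`. The directory name is hypothetical and sits in a temporary directory only so the example is harmless to run; a real development library would live somewhere permanent.

```r
# Hypothetical location for a separate "bleeding edge" package library.
dev_lib <- file.path(tempdir(), "R-dev-library")
dir.create(dev_lib, showWarnings = FALSE, recursive = TRUE)

old_paths <- .libPaths()

# Put the development library first: packages are now looked up there
# before the regular libraries.
.libPaths(c(dev_lib, old_paths))
first_lib <- .libPaths()[1]
print(.libPaths())

# Installing with lib = dev_lib would leave the stable library untouched
# (commented out because it needs network access):
# install.packages("ggplot2", lib = dev_lib)

# Restore the original search path for the rest of the session.
.libPaths(old_paths)
```

A production script would simply never add the development library, so it keeps seeing the stable package versions.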
Hope this helps.
I'd be inclined to respond: as often as you need to, and never when you're in a hurry!
Firstly, debate the chances that you're labouring under a bug of which you are unaware. I would moot that this is quite rare. If you're suffering under a bug and there's a newer version in which the bug is fixed, plan an upgrade. If you want a new feature, plan an upgrade. If it's your first day back after Christmas and the biggest overhead is trying to remember what you were actually doing last, then the overhead of messing about with some new dependency requirements (which may include system components outside of R) is probably relatively small, so consider seeing what updates are available (guess what I did today) ;-)
The golden rule is probably that there isn't a single, recommended schedule other than what makes sense for your use; daily updates will inevitably result in fewer updates each time and thus minimize the pain of the actual update, but it's not worth it if you constantly get different numerical results from one day to the next because of some change to how a function does sampling (different numerical results have plagued Coursera students using caret). Don't underestimate the value of a stable system that allows you to just get on with productive work rather than faffing.