I started using R a little while ago and am not sure how often to update the installed packages (at this time I'm using mostly ggplot2 and rattle). On one hand there's the typical geek impulse to have the latest version :-) On the other hand, updates can break functionality, and as an R beginner I don't want to waste time tracking down package incompatibilities and reinstalling libraries; it's almost certain I wouldn't notice any difference with an improved package anyway.
With other applications I have a sense, developed from experience, of how often to upgrade, how long to wait between the release of an upgrade and installing it, and so on. But I'm in the dark when it comes to R.
And to be clear: I'm not talking about R itself, but its libraries.
Thanks.
Here is my philosophy: the naïve user never updates. The sophisticated user always updates. The power user updates often, but carefully.
Mindless updating is not always beneficial. Bugs work their way into updated versions of R libraries (or R itself!), and you could break your existing code by updating without reading the change log or commit history. For example, R 2.11 broke lme4 on OS X... it pays to update carefully and run demos of packages between releases. It really sucks to update to a new library or R release and realize something broke when you have a deadline.
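To make "carefully update" concrete, here is a minimal R sketch of the routine I have in mind (lme4 is just an example package): see what is outdated, read a change log where one is available, and only then update interactively.

    # See which installed packages have newer versions on CRAN
    old.packages()

    # Read a package's change log before updating, if it ships a NEWS file
    news(package = "lme4")

    # Update interactively, so you can skip anything you rely on for a deadline
    update.packages(ask = TRUE)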
Yes it is.
Why exactly would you want to hang on to old bugs and lacking features?
The question is already answered, but I'll offer my 2 cents. In an organization, updating R should be treated like updating gcc or Java: with warnings, with a staging area, a rollback plan, etc. Others' work and results may be affected. [See update #2]
I am more impulsive about updating R packages. As long as you can reproduce the state of your system at any point in time, there's little to worry about. Ensuring that nightly backups occur should be the domain of your sysadmin.
The basic idea is that you should be able to reproduce everything. Actually testing that your earlier results are reproduced is dependent on whether you want to disprove your assumption that there are no bugs or changes that will affect later results. :)
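One low-tech way to make that reproducibility concrete is to keep dated snapshots of which package versions were installed; a minimal R sketch (the snapshot file name below is made up):

    # Record the installed packages and their versions, stamped with today's date
    snapshot <- installed.packages()[, c("Package", "Version")]
    write.csv(snapshot, file = paste0("pkg-snapshot-", Sys.Date(), ".csv"),
              row.names = FALSE)

    # Later, compare a saved snapshot (hypothetical file name) against the current state
    old <- read.csv("pkg-snapshot-2010-11-01.csv", stringsAsFactors = FALSE)
    now <- as.data.frame(installed.packages()[, c("Package", "Version")],
                         stringsAsFactors = FALSE)
    merge(old, now, by = "Package", suffixes = c(".then", ".now"))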
Update 1. As has been mentioned in the comments and above, updating should be done quite carefully in a production environment, or in any environment where stability is essential (e.g. existing bugs are either known or not significant), because an update can introduce new bugs, new dependencies, different output, or any variety of other software regressions. Moreover, where you're updating from matters a lot. Updating from R-Forge is more likely to expose you to the newest bugs than updating from CRAN. Even so, I have found and reported bugs that persisted through 3+ versions of a package on CRAN, as well as other regressions that just magically appeared. I test a lot, but updating, finding new bugs, and debugging is an effort that I don't always want to (or have time to) undertake.
I am reminded of this question after finding a new bug in a new version of a package that I use a lot. Just to check, I reverted to an earlier version - no more crashes, though tracking down the cause took a couple of hours, because I assumed it was not arising in this package. I'll send a note to the maintainer before long, so others won't have to stumble on the same bug.
Update 2. In an organization, I have to say that the answer is no. In fact, in any case where there may be two or more concurrent instances of R, it is very unwise to blindly update the packages while another instance may be using them. There are likely good methods for hot-swapping packages; I just don't know them yet. Keep in mind that the two instances need only share libraries (i.e. where the packages are stored) and, AFAIK, need not run concurrently on the same machine. Thus, if libraries are placed on shared systems, e.g. over NFS, one may not know where else R is running at the same time, using those libraries. Accidentally killing another R process is not usually a good thing.
Yes, unless you have a good reason not to (see my comment to Dirk)
Although some of the following has been mentioned in previous answers, I think it might be beneficial to make a few things explicit. As a developer, I think that updating packages often (and R-devel, for that matter) is good practice. You definitely want to stick with the latest out there. If your package imports, depends on, suggests, or otherwise interacts with other packages, you want to ensure interoperability on a day-to-day basis, and not face the 'bugs' just before release, when time is short.
On the other hand, some environments will put special emphasis on exact reproducibility. In that case, one may want to adopt a more careful strategy with updating.
But it is worth emphasising that these two behaviours are not mutually exclusive. It is possible to install different versions of R and maintain different libraries, to benefit from a bleeding-edge development environment and a more stable one for production.
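A minimal sketch of that two-environment setup in R, assuming a library directory you choose yourself:

    # Keep a separate library for the bleeding-edge setup, leaving the stable one untouched
    dev_lib <- "~/R/library-devel"                # hypothetical location
    dir.create(dev_lib, recursive = TRUE, showWarnings = FALSE)

    .libPaths(c(dev_lib, .libPaths()))            # new installs now go to dev_lib first
    install.packages("ggplot2")                   # lands in dev_lib

    # An R session started without this .libPaths() call (or with R_LIBS_USER pointing
    # elsewhere) keeps using the stable production library exactly as before.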
Hope this helps.
I'd be inclined to respond as often as you need to, and never when you're in a hurry!
Firstly, consider the chances that you're labouring under a bug of which you are unaware. I would moot that this is quite rare. If you're suffering under a bug and there's a newer version in which the bug is fixed, plan an upgrade. If you want a new feature, plan an upgrade. If it's your first day back after Christmas and the biggest overhead is trying to remember what you were actually doing, then the overhead of messing about with some new dependency requirements (which may include system components outside of R) is probably relatively small, so consider seeing what updates are available (guess what I did today) ;-)
The golden rule is probably that there isn't a single, recommended schedule other than what makes sense for your use; daily updates will inevitably result in fewer updates each time and thus minimize the pain of the actual update, but it's not worth it if you constantly get different numerical results from one day to the next because of some change to how a function does sampling (different numerical results have plagued Coursera students using caret). Don't underestimate the value of a stable system that allows you to just get on with productive work rather than faffing.
I was changing out some PHP code the other day because it was deprecated, and no longer worked. I understand the meaning of deprecated code based on an answer I found here: https://stackoverflow.com/a/8111799/1810777
But several questions came to mind:
I was wondering what is the purpose of deprecating code?
Why not just leave it in use, instead of recommending others to use new alternatives?
Does it slow stuff down?
I couldn't find anywhere else online that talked about it. I'm just really wondering why code that used to work well isn't useful anymore. Thanks!
It means that in a future release it's going to be removed.
This allows an API developer to give people time to migrate to the new version / method of doing whatever rather than just pulling the rug out from under them. Both the new and the old versions are available for a limited time.
As for why not leave it there forever... because there's a new, better way to do it. You can't support legacy code forever (if you value your sanity and your budget). All support has a cost (be that tech support hours, bug fixes, regression testing, etc.).
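Since R is the running example in this thread, here is roughly what that migration window looks like there; base R provides .Deprecated() for the purpose, and the function names below are invented:

    # The new, preferred implementation
    summarise_scores <- function(x) summary(x)

    # The old entry point still works for now, but warns callers to migrate
    old_summarise <- function(x) {
      .Deprecated("summarise_scores")   # emits a deprecation warning
      summarise_scores(x)
    }

    old_summarise(1:10)   # runs with a warning; a later release could switch to .Defunct()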
This question is inspired by the remark of Duncan Murdoch on the r-devel mailing list in response to a bug report about Sweave:
This is fixed in R-patched. (It would have been fixed in 2.12.0 if more people tested the betas...).
Honestly, I've stayed away from beta (a.k.a. development) versions for a number of reasons, and these are reasons I hear from other people too:
I am a bit horrified it would somehow cause conflicts with my current R distribution. As I need it for work, having to repair it regularly would be a loss of time I can't explain to my boss.
I wouldn't have a clue how to test efficiently. I reckon every test I could come up with has already been run by the development team.
I still find it difficult to figure out when something is a bug, and when (most often) it is my own stupidity kicking in.
But as I understand it, testing would be a valuable contribution to the R community, and I'm willing to do my bit of the testing as well if I can fit it somehow into my own work. I was thinking of keeping the beta on the side and running my scripts through it as well, as a checkup. Saving the constructed objects allows a quick and easy all.equal() to see if something is wrong.
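A minimal sketch of that "save the objects, compare later" idea; the model and file name here are just placeholders:

    # Under the current release: run the script once and keep the result
    fit <- lm(mpg ~ wt, data = mtcars)
    save(fit, file = "fit-release.RData")

    # Under the beta/patched R: rerun the same code and compare with the saved object
    fit_beta <- lm(mpg ~ wt, data = mtcars)
    fit_release <- local({ load("fit-release.RData"); fit })
    all.equal(fit_release, fit_beta)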
Does anybody have more/better ideas on how I could help with testing, with a minimum amount of effort and a maximum amount of efficiency?
I'd also like to promote this a bit more in our department. Apart from "it's time to give back to the community", are there any other good reasons why testing betas is worth the effort? How can I counter the arguments given above?
Edit:
As Dirk Eddelbuettel pointed out in the comments, part of the deal is sorting out the path variables on Windows. I have some ideas on that, but pointers on how to practically organize your computer for testing R-devel versions are greatly appreciated as well.
I fear you misunderstand. This may not be straightforward or obvious at first so maybe this helps:
"patched" is not "beta". Patched is what R 2.12.1 will be.
There is no conflict. It drops in for 2.12.0.
It is a separate download, and a nightly build is available from here.
This is not r-devel but r-patched.
It is our duty as users to test pre-releases as well. So if anything, in an ideal world you would have R-patched installed --- as well as R-devel!
Testing can be as easy as installing another version, keeping it outside your path, and then adjusting PATH and R_HOME dynamically from a script. Testing means running it on your code and data to prevent you from getting bitten by bugs once the new code is released.
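To make the "run it on your code and data" part concrete, a small sketch callable from your regular R session; the install path is hypothetical, so point it at wherever R-patched actually lives:

    # Run an existing analysis script under the separately installed R-patched build
    rscript_patched <- "C:/R/R-patched/bin/Rscript.exe"   # hypothetical location
    system2(rscript_patched, args = "analysis.R",
            stdout = "patched-output.txt", stderr = "patched-errors.txt")

    # Then diff patched-output.txt against the output from your regular R, or compare
    # saved objects with all.equal() as suggested in the question.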
I wouldn't have a clue how to test efficiently. I reckon every test I could come up with has already been run by the development team.
I still find it difficult to figure out when something is a bug, and when (most often) it is my own stupidity kicking in.
The problem is, software is not (or not only) going to be used by developers. It is going to be used by people who may not have any programming knowledge at all (I'm speaking generally; this is valid for R as well as for any other software).
If the help, the interface, or the general way the software is built does not give you enough information on how to do something, well, that is maybe not a bug, but it is something that can be improved (and pointed out to the devs).
Also, remember that the developers wrote the software. They know how to use it, and they will often be biased in testing it: mainly by using it correctly and seeing whether it gives the right result, rather than by "trying to break it".
By using it in YOUR way (which may possibly be "incorrect"), you are effectively running tests that may have escaped the developers, just because they were not thinking of using it like you did.
Occasionally I see small ways I could improve either R (recently the IQR command) or the R documentation (just this week, perhaps elaborating the differences among, and better interconnecting, aggregate, tapply, and by). But I don't see a way to really make that contribution back. I looked into the developer site, and it seems that my options are either to attempt to become a full-fledged developer or to create packages, neither of which fits what I wish to accomplish.
I did propose IQR changes on the R mailing list but got no response so I figure that's going nowhere.
And to clarify, I'm talking about base-R. Additional packages are another matter.
Any tips?
Send (or CC) to r-devel. Traffic is quite high on r-help, and things can be overlooked there.
File a bug under the wishlist category detailing the improvement you would like to see.
Having filed the bug, try to provide a patch against the R code and/or documentation as appropriate. I've done this before where there was a problem or infelicity in R: supplied a patch and a fix to the help files/manual, and had the changes accepted (after suitable modification) by R Core.
If it is an addition to the R code base, you are going to have to show that there is a real pressing need for the addition. Basically you are asking R Core to maintain your code in perpetuity, and they are unlikely to do that unless you can demonstrate a need.
If it is an addition, look for a popular R package that does similar/related things and suggest to the package maintainer that they include your function. That way you don't need to start a whole package for something simple but can still contribute your code. There are several popular *misc packages on CRAN, for example.
If you want to contribute fixes to the R documentation and/or manuals, provide patches to the sources. You can find the sources at svn.r-project.org/R
Hopefully that gives you some ideas. Patches and code always help!
How about patches to existing packages?
How about open bug reports on packages? R-Forge projects don't seem to use the issue trackers much, but some folks on the RPostgreSQL team I'm on enabled it (where it is hosted on Google Code), and it has been helpful -- see here. And we had a really useful inflow of fresh blood with a rocking new developer from Japan, probably in part because of the visibility of the project there.
In essence, try to find a project / group / team to become acquainted with and join. In that sense, this is just like any other Open Source project. The r-devel list (gmane view) is a good place for R development in general.
The R Core team, on the other hand, is a little more closed and per invitation only and unlikely to change. So be it, for better or worse. It has worked so far, and hence I am not among those who bemoan this loudly.
Although I have been using Drupal since the D4 series, I only started developing for it professionally with D6, so, although I did various site upgrades, I was never faced with the task of having to port my own code to a new version.
I know the Drupal community will come up with a lot of technical support about changed APIs and architectural changes (see the deadwood module for D5-D6 or even these stubs of D6-D7 how-tos for modules and themes).
However, what I am looking for with my question is more along the lines of strategic thinking; in other words, I am looking for input and advice on how to plan / implement / review the process of porting my own code, in light of what fellow developers have learned from previous experience. Some examples:
Would you advise beginning to port my modules as soon as I have time to do it, and maintaining a concurrent D7 version for some time (so I am "prepared" for D-day), or would you advise rather waiting until the port is actually imminent and then upgrading the modules to D7 and dropping the D6 version?
Only some of my modules have full test coverage. Would you advise completing test coverage for the D6 version, so that all tests are in place to check the D7 port, or would you advise writing my tests at porting time, directly against the D7 version?
Did you find that being an early adopter gives you an edge in terms of new features and better APIs, or did you rather find it more convenient to delay the conversion so as to leverage the larger amount of readily available contrib modules?
Did you set quality standards / evaluation criteria for yourself, or did you just set the bar to "if it works, I'm happy"? Why? If you set certain standards or goals, what were they / what will they be? How did they help you?
Are there common pitfalls that you experienced in the past and that you think are applicable to the D6-D7 porting process?
Is porting a good moment to do some refactoring, or is it just going to make everything more complex to put back together?
...
These questions are not an exhaustive list, but I hope they give an idea of what kind of information I am looking for. I would rather say: whatever you think is relevant and I did not list above gets a "plus"! :)
If I did not manage to express myself clearly enough, please post a comment with the info you think I should add in the question. Thank you in advance for your time!
PS: Yes I know... D7 is not yet out and it will take months before important contrib modules will be upgraded... but it's never too early to start thinking! :)
Good questions, so let's see:
(when to start porting)
This certainly depends on the complexity of the modules to port. If there are really complex/large ones, it might be useful to start early in order to find tricky spots while not being under pressure. For smaller/standard ones, I'd try to find a bigger time slot later on where I can port many of them in a row in order to get the routine stuff memorized quickly (and benefit from the probably improved documentation).
(test coverage)
Normally I'd say that having a good test coverage before starting refactoring/porting would certainly be advisable. But given that Drupal-7 introduces a major change concerning the testing framework by moving it to core, I'd expect the need to rewrite a significant amount of tests anyway. So if there is no need to maintain the Drupal-6 versions after the migration, I'd save the time/trouble and aim for increased coverage after the porting.
(early adopter vs. wait and see)
Using Drupal since the 4.7 version, we have always waited for at least the first official release of a new major version before even thinking about porting. With Drupal 6, we waited for the views module before porting our first site, and we still have some smaller projects on Drupal-5, as they are working just fine and it would be hard to justify the extra bill for our clients as long as there are still maintenance/security fixes for it. There is just so much time in a day and there is always this backlog of bugs to fix, features to add, etc., so no use playing with unfinished technology while there are more imminent things to do that would immediately benefit our clients. Now this would certainly be different if we'd have to maintain one or more 'official' contributed modules, as offering an early port would be a good thing.
I'm a bit in a bind here - being an early adopter certainly benefits the community, as someone has to find the bugs before they can get fixed, but on the other hand, it makes little business sense to fight hour after hour with bugs others might have found/fixed if you'd just waited a bit longer. As long as I have to do this for a living, I need to watch my resources, trying to strike an acceptable balance between serving the community and benefiting from it :-/
(quality standards)
"If it works, I'm happy" just doesn't cut it, as I don't want to be happy momentarily only, but tomorrow as well. So one of my quality standards is that I need to be (somewhat) certain that I 'grokked' new concepts well enough in order to not just makes things work, but make them work like they should. Now this is hard to define more precisely, as it is obviously impossible to know if one 'got it' before 'getting it', so it boils down to a gut feeling/distinction of 'yeah, it kinda works' vs. 'yup, that looks right', and one has to accept that he will quite regularly be wrong about this.
That said, one particular point I'm looking out for is 'intervene as early as possible'. As a beginner, I often tweaked stuff 'after the fact' during the theming stage, while it would have been much easier to apply the 'fix' earlier in the processing chain by means of one hook or the other. So right now, whenever I'm about to 'adjust' something in the theme layer, I deliberately take a small time-out to check whether it cannot be done more cleanly/compatibly within a hook earlier on. As I expect Drupal-7 to add even more hooking options, this is something I will pay extra attention to, as it usually reduces conflicts and sudden 'breaking of stuff' when adding new modules.
(common pitfalls)
Well - mainly porting too early, and finding out afterwards/in between that one or more needed modules were not available for the new version at all, or only in a dev/alpha/early-beta state. So I'd make sure to compile a complete list of used/needed modules first, listing their porting state, along with a quick inspection of their issue queues.
Besides that, I have so far always been very pleased with the new versions and their improvements, and I'm looking forward to Drupal-7 again.
(refactoring while porting)
One could say that porting is a rather large refactoring in itself, so there is no need to add to the complexity by restructuring non porting related stuff. On the other hand, if you already have to shred your modules to pieces anyway, why not use the opportunity to make it a major overhaul? I'd try to draw a line based on complexity - for big/complex modules, I'd do the port as straight as possible, and refactor more later on, if need be. For smaller modules, it shouldn't really matter, as the likelihood of introducing subtle bugs should be rather small.
(other stuff)
... need to think about it ...
Ok, other stuff:
Resource needs - given some of the Drupal-7 threads, it looks like they are likely to go up, so this should be evaluated before porting smaller sites that sit on a shared/restricted hosting account.
Latest versions first - This one is rather obvious and always stressed in the upgrade guides, but nevertheless worth mentioning: upgrade core and all modules to their latest current version first before doing a major upgrade, as the upgrade code is highly likely to depend on the latest table/data structures to work correctly. Given Drupal's 'piecemeal', one-step-at-a-time update strategy, it would be very hard to implement upgrade code that would detect different pre-upgrade states and act accordingly.
Further to my question at accidentally-released-code-to-live-how-to-prevent-happening-again: after client UAT we often have the client saying they are happy for a subset of features to be released, while they want the others in a future release instead.
My question is "How do you release 2/3 (two out of 3) of your features".
I'd be interested in how the big players such as Microsoft handle situations like..
"Right we're only going to release 45/50 of the initially proposed features/fixes in the next version of Word, package it and ship it out".
Assuming those 5 features that are not being released in the next version have already been started... how can you exclude them from the release build & deployment?
How do you release 2/3 of your developed features?
How to release a subset of deliverables?
-- Lee
If you haven't thought about this in advance, it's pretty hard to do.
But in the future, here's how you could set yourself up to do this:
Get a real version control system, with very good support for both branching and merging. Historically, this has meant something like git or Mercurial, because Subversion's merge support has been very weak. (The Subversion team has recently been working on improving their merge support, however.) On the Windows side, I don't know what VC tools are best for something like this.
Decide how to organize work on individual features. One approach is to keep each feature on its own branch, and only merge it back to the main branch when the new feature is ready. The goal here is to keep the main branch almost shippable at all times. This is easiest when the feature branches don't sit around collecting dust—perhaps each programmer could work on only 1 or 2 features at a time, and merge them as soon as they're ready?
Alternatively, you can try to cherry-pick individual patches out of your version control history. This is tedious and error-prone, but it may be possible for certain very disciplined development groups who write very clean patches that make exactly 1 complete change. You see this type of patch in the Linux kernel community. Try looking at some patches on the Linux 2.6 gitweb to see what this style of development looks like.
If you have trouble keeping your trunk "almost shippable" at all times, you might want to read a book on agile programming, such as Extreme Programming Explained. All the branching and merging in the world will be useless if your new code tends to be very buggy and require long periods of testing to find basic logic errors.
Updates
How do feature branches work with continuous integration? In general, I tend to build feature branches after each check-in, just like the main branch, and I expect developers to commit more-or-less daily. But more importantly, I try to merge feature branches back to the main branch very aggressively—a 2-week-old feature branch would make me very, very nervous, because it means somebody is living off in their own little world.
What if the client only wants some of the already-working features? This would worry me a bit, and I would want to ask why the client only wants some of the features. Are they nervous about the quality of the code? Are we building the right features? If we're working on features that the client really wants, and if our main branch is always solid, then the client should be eager to get everything we've implemented. So in this case, I would first look hard for underlying problems with our process and try to fix them.
However, if there were some special once-in-a-blue-moon reason for this request, I would basically create a new trunk, re-merge some branches, and cherry-pick other patches. Or disable some of the UI, as the other posters have suggested. But I wouldn't make a habit of it.
This reminds me a lot of an interview question I was asked at Borland when I was applying for a program manager position. There the question was phrased differently (there's a major bug in one feature that can't be fixed before a fixed release date), but I think the same approach can work: remove the UI elements for the features that will ship in a future release.
Of course this assumes that the features you want to leave out have no effect on the rest of what you want to ship... but if that's the case, just changing the UI is easier than trying to make a more drastic change.
In practice what I think you would do would be to branch the code for release and then make the UI removals on that branch.
It's usually a function of version control, but doing something like that can be quite complicated depending on the size of the project and how many changesets/revisions you have to classify as desired or not desired.
A different but fairly successful strategy that I've employed in the past is making the features themselves configurable and just deploying them as disabled for unfinished features or for clients who don't want to use certain features yet.
This approach has several advantages: you don't have to juggle which features/fixes have and have not been merged, and, depending on how you implement the configuration and whether the feature was finished at the time of deployment, the client can change their mind and not have to wait for a new release to take advantage of the additional functionality.
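To keep the illustrations in one language, here is the general shape of that configuration idea as a hypothetical R sketch (the option and function names are invented); the same pattern applies in any stack that has a configuration mechanism:

    # Configuration shipped with the release: the unfinished feature is off by default
    options(myapp.enable_new_export = FALSE)

    legacy_export <- function(data) write.csv(data, "export.csv", row.names = FALSE)
    new_export    <- function(data) stop("not finished yet")   # stub for the unfinished feature

    run_export <- function(data) {
      if (isTRUE(getOption("myapp.enable_new_export"))) {
        new_export(data)      # only reachable once the flag is flipped on
      } else {
        legacy_export(data)   # what clients get today
      }
    }

    run_export(mtcars)   # takes the legacy path because the flag is FALSE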
That's easy, been there done that:
Create a 2/3 release branch off your current mainline.
In the 2/3 release branch, delete unwanted features.
Stabilize the 2/3 release branch.
Name the version My Product 2.1 2/3.
Release from the 2/3 release branch.
Return to the development in the mainline.