Alternatives to the packrat package - package reproducibility - r

Packrat is a neat tool in theory, but for years it has been plagued by huge hang times upon starting RStudio, and the devs don't seem to be able to fix the issue. It's become unsustainable in my project. Does anybody have any good alternatives to packrat? Google searches did not turn up anything useful, so any help would be greatly appreciated.

I'll assume you're using Packrat for reproducibility, rather than version control.
Start with the CRAN task view for reproducible research, specifically the section on Package Reproducibility. You'll find it suggests checkpoint, rbundler and packrat.
Another approach is to move from base R to Microsoft R Open. It has reproducibility built in: by default it installs packages from a fixed, dated CRAN snapshot.
Side Note: As an example use case of reproducibility, let's assume you've written some R code with packages. Then you share your research. But the package owner makes a change between the time you did the research and the time someone else is trying to reproduce your research. The change made by the package owner breaks your research. In order for someone to reproduce your research, they need to use your code WITH THE ORIGINAL PACKAGE - not the new package.
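To make the side note concrete, here is a minimal base-R sketch of recording which package versions an analysis ran against, so that someone reproducing it knows what to install. The helper name record_versions is made up for illustration, and "stats" and "utils" stand in for whatever packages your code actually uses.

```r
# Hypothetical helper: collect the installed version of each package used.
record_versions <- function(pkgs) {
  data.frame(
    package = pkgs,
    version = vapply(
      pkgs,
      function(p) as.character(utils::packageVersion(p)),
      character(1)
    ),
    stringsAsFactors = FALSE,
    row.names = NULL
  )
}

# "stats" and "utils" ship with R; replace them with your own dependencies.
versions <- record_versions(c("stats", "utils"))
print(versions)
```

Saving this table (or the output of sessionInfo()) alongside the research tells a reader which versions to aim for, but it does not install them. That part is what tools like checkpoint, packrat and renv automate.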

{renv} is developed by the RStudio folks and aims to solve at least some of the problems that packrat had: https://blog.rstudio.com/2019/11/06/renv-project-environments-for-r/
"The goal then is for renv to be a robust, stable replacement for the Packrat package, with fewer surprises and better default behaviors."
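For anyone who wants to try it, the basic renv workflow looks roughly like the sketch below. It assumes renv is already installed and that you run it from the project directory; the interactive() guard is my own addition so that nothing happens if the script is sourced non-interactively.

```r
# install.packages("renv")  # one-time install, if needed

renv_available <- requireNamespace("renv", quietly = TRUE)

if (renv_available && interactive()) {
  # Create a project-local library and an renv.lock file.
  renv::init()

  # After installing or updating packages, record the versions in use.
  renv::snapshot()

  # On another machine, with the project (including renv.lock) copied over:
  # renv::restore()  # reinstall the recorded versions
}
```

The lockfile is what you commit or share; restore() reads it to rebuild the same library elsewhere.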

Related

Advice needed for R package security in production

I am working as a Data Scientist for a small startup and we are using R as part of our platform for analysis, dashboards etc. Therefore, I need to ensure that we maintain security with each package we use and load.
I have looked around and done extensive searching and have come across the following links:
This is the official RStudio Blog security update page.
This blog post shows how you can implement rJava to help with those packages that require it, though it does state that '...the integrity & safety of the R package ecosystem is still in the “trust me, everything’s 👍!!”'
This post gives some good advice for package security, but basically boils down to: if you get it from CRAN or another trusted source then it should be ok.
The CVE site lists vulnerabilities, though the last one was back in 2017.
However, all the above links essentially say the same thing, which is "if it's from CRAN (or similar), then it is probably fine". Now this might indeed be the case, but I was hoping for something a bit more rigorous. Has anyone else come across this issue with production R deployment?
If someone could direct me to where I might find more information on checking for security updates, breaches and changes for R packages, or on how to go about testing the security myself, I would be very grateful.
Thanks!

Installing, Loading Packages & Verifying these actions. Path, Lib, Dependencies, etc

This is one of the more comprehensive threads and discussions on these topics I have located to date.
How to find out which package version is loaded in R?
Nonetheless, I am finding this has not provided me with sufficient information to ensure I have installed and loaded the two packages I must have before I can begin to expect R to function properly. These packages are: Rserve (1.8-0) and MASS (7.3-45).
It seems answers to these topics can be application dependent or, perhaps a better phrase, purpose driven. Unfortunately, the R documentation can be confusing, so I thought it best to solicit the input of more experienced users.
I am working on a personal laptop with R 3.5.1 and Win7 Pro. It is clear from the R documentation that Windows is not the best or preferred environment for R.
Despite a lot of work, something remains missing and I have been unable to identify what it is.
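One way to verify this kind of requirement is a small check like the sketch below. The helper check_package is hypothetical, and it assumes "at least this version" is acceptable; swap >= for == if an exact match is needed. The Rserve and MASS lines are commented out because they only give a useful answer on a machine where those packages matter.

```r
# Hypothetical helper: TRUE if pkg is installed at or above the required version.
check_package <- function(pkg, required) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(FALSE)  # not installed in any library on .libPaths()
  }
  utils::packageVersion(pkg) >= package_version(required)
}

# Libraries that R searches for installed packages.
print(.libPaths())

# Example with a package that ships with R:
stats_ok <- check_package("stats", "1.0.0")

# For the packages in the question:
# check_package("Rserve", "1.8-0")
# check_package("MASS", "7.3-45")
```

Note that requireNamespace() loads the package's namespace without attaching it, so this checks that the package can be loaded, not that library() has been called.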

Using feature hashing/hashing tricks for machine learning in R

I just learned about feature hashing (also known as the hashing trick) and that some see it as an important feature for efficiently doing machine learning on large data sets.
However, I haven't seen anything like this being used for machine learning with R.
A Google search revealed that there is indeed a package hash on CRAN.
Could someone provide an example where this is used in R to speed up a machine learning task (or just to reduce RAM usage)?
I recently submitted a package named FeatureHashing. Please check the GitHub page for a demo: https://github.com/wush978/FeatureHashing and let me know if you have any issues using it.
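For readers new to the idea, here is a toy base-R sketch of the hashing trick itself. It is not the FeatureHashing package's API or implementation: the function names and the simple polynomial hash are assumptions made for illustration, and a real implementation would use a stronger hash function and sparse output.

```r
# Toy hash: polynomial rolling hash over the token's code points,
# mapped to a bucket index in 1..n_buckets.
hash_index <- function(token, n_buckets) {
  h <- 0
  for (code in utf8ToInt(token)) {
    h <- (h * 31 + code) %% n_buckets
  }
  h + 1  # R vectors are 1-indexed
}

# Turn a set of tokens into a fixed-length count vector.
hash_features <- function(tokens, n_buckets = 8) {
  vec <- numeric(n_buckets)
  for (tok in tokens) {
    idx <- hash_index(tok, n_buckets)
    vec[idx] <- vec[idx] + 1
  }
  vec
}

x <- hash_features(c("red", "green", "red"), n_buckets = 8)
print(x)
```

The point is that the output length is fixed by n_buckets no matter how many distinct tokens appear, so no dictionary of feature names has to be kept in memory. The price is that different tokens can collide in the same bucket.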

Centralizing libraries in julia

I've long thought about learning Julia, a language I secretly hope will become the new standard for scientific computing, and now that it is packaged and included in the standard Ubuntu repositories, I figured it was time. I quickly found this tutorial and started hacking...
In the linked chapter, one is urged to download a library called ols.jl from a GitHub repository, place it in the local directory and start using it. I feel there must be a better way of doing this.
For example, it would be logical to have some "default" directory in which Julia can always look for library files. That folder could reside under my home directory, or (perhaps even better) somewhere under e.g. /usr/share/lib on an Ubuntu system.
Also, downloading the libraries directly seems to me like something I should be able to avoid. Isn't it possible to find libraries like these in some sort of packaging system (be it via Ubuntu's apt-get or something else)?
I do realize that many of these questions and concerns may be just because Julia is a young language, that most of these features are missing because of this, and that there are plans (or at least wishes) to go in this direction in the future. However, it would be nice to know if I'm just missing something obvious =)
That tutorial on Forio is ancient. There's a newer, much better package system as of version 0.1 of Julia. See the documentation here: http://docs.julialang.org/en/release-0.1/manual/packages/

Are there features of R that are system-dependent?

My co-workers would like to make sure that our work in R is platform-independent, specifically that code will run on Linux, Mac, and Windows, and that files created on one system will work on other systems.
Since the issue has come up before in my group, I would appreciate a general answer that will make it easier for me to confidently assure my collaborators that there will not be an issue. E.g., it would help to have a reference other than "because (subject matter expert) said so on SO".
Generally, is there a way to know if any features of R are platform-specific (can I assume that this would be stated in a function's help)?
Are there packages or functions that I can be confident will be platform-independent?
Are there types of packages or functions that I should be wary of?
I have previously asked two questions about the cross-platform readability of files created by R: What are the disadvantages of using .Rdata files compared to HDF5 or netCDF? and Are R objects dumped using `dump` readable cross-platform?
Besides Carl's answer, the obvious way to ensure that your work is platform-independent is to test on all platforms.
Which is precisely what CRAN does with its 3800+ packages, and you have access to logs here.
In short, R really tries hard to be platform-independent, and mostly succeeds. To do so with your code, it is up to you to avoid APIs or tools which introduce dependencies. Look at abstractions like system.file(package="boot") and the functions they use---you can easily abstract file-system "roots", and separators are already taken care of.
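As a small illustration of those abstractions, the sketch below builds paths without hard-coding a separator or a root. The directory and file names are made up, and "stats" is used instead of "boot" only because it is always installed.

```r
# file.path() joins components with "/" on every platform,
# and R accepts "/" in paths on Windows as well.
data_path <- file.path("data", "raw", "input.csv")

# system.file() locates files inside an installed package,
# wherever that package's library happens to live on this machine.
pkg_dir <- system.file(package = "stats")

# If platform-specific behaviour is unavoidable, branch on it explicitly.
os_type <- .Platform$OS.type  # "unix" or "windows"

print(data_path)
print(pkg_dir)
print(os_type)
```

Keeping every path relative to a project root or to system.file() means the same script runs unchanged on Linux, Mac and Windows.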
Check cran.r-project.org for package listings. Every package has a page which will tell you if it's passed testing for different operating systems. Further, as you suggested, the help files are pretty explicit about OS dependencies.
R is "smart" enough to translate "/" to "\" in pathnames for those poor folks working in Windows.
Generally speaking, graphics access is the area most likely to have platform dependencies. Obviously if your system lacks {X11, ImageMagick, ..} you're stuck anyway.
Besides Carl's and Dirk's comments, you should understand that any package that requires compilation from source (as do many (all?) packages that are on Omegahat, Rforge or r-forge) will need to be done on a machine that has the proper C and Fortran libraries. Some interesting packages depend on GTK+ and Tcl/Tk, and there may be a need to make sure you can get the right versions. The http://r.research.att.com/ page that Simon Urbanek maintains is a useful resource for keeping up with supporting resources for Macs.

Resources