Docker in R and/or Packrat for Reproducible Science

I am not completely sure whether Docker is enough for R development or whether I should use it in conjunction with Packrat. I have read several posts that state that Docker is sufficient. The only place that supports this claim is this post. However, I was not able to build that example due to errors in the git2r installation.
My overall goal is to have full control of the package versions I use, so my analysis will still work even if the package is later upgraded.

You need both. Think of the Docker image as just the final product of your source code, including the Dockerfile and every piece of data used to build the final image.
You should pin the Docker base image (avoid FROM blah:latest) to be sure that the underlying libraries and tools will always be the same. Don't use base images such as debian/testing that may change on every run of apt-get install.
If you don't use packrat, then when you need to rebuild your image you may get a new version of some library whose code no longer works with yours; think, for instance, of a deprecated function you may have used.
And of course version your own code; at least tag it so you can easily go back in time and start a new build again.
This is the minimum you can do, because things like a broken Docker Hub or CRAN repository can still happen. Saving a versioned Docker image in a private Docker registry is just the final step.
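As a rough sketch of the packrat half of this advice: the two helpers below wrap the snapshot and restore calls. It assumes the packrat package is installed and the project has already been initialised with packrat::init(). The helper names pin_packages and rebuild_packages are made up for illustration, and calling the restore step from the Dockerfile is an assumption about how you would wire it into the image build.

```r
# Sketch only: nothing runs until you call these functions yourself.

# Record the exact package versions the project currently uses
# (packrat writes them to packrat/packrat.lock inside the project).
pin_packages <- function(project = ".") {
  if (!requireNamespace("packrat", quietly = TRUE)) {
    stop("packrat is not installed; run install.packages('packrat') first")
  }
  packrat::snapshot(project = project)
}

# Reinstall exactly the versions recorded in the lockfile, for example
# from an Rscript call during the image build (assumed setup).
rebuild_packages <- function(project = ".") {
  if (!requireNamespace("packrat", quietly = TRUE)) {
    stop("packrat is not installed; run install.packages('packrat') first")
  }
  packrat::restore(project = project)
}
```

Committing the lockfile next to the Dockerfile keeps the base image pin and the R package pins under the same version tag.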

Let's say you use a certain docker image to do your analysis now. If you later start the same docker image, i.e. not just the same name (e.g. rocker/rstudio) or the same version (e.g. rocker/rstudio:3.5.0) but the same image id, you are guaranteed to get the exact same versions of R, R packages and system libraries. This is more than what packrat offers (same R package versions), but requires you to save the docker image.

Related

R and Rstudio Docker vs Binder

My problem is that I can't use RStudio at my workplace because IT does not support it. I want to use the R and RStudio installed on my personal laptop from my company laptop (using a modern browser, which is behind a firewall). I am thinking of two options:
Should I build a Docker image for R and RStudio (I see base images are already available)? I am mostly interested in base R and dplyr (plus the haven, xporter, and reticulate packages).
Or should I use Binder? I am not a technical person and my programming skills are very limited. Can anyone suggest a way?
What exactly is the difference between the Docker option and Binder?
I know I can use RStudio online and get my work done, but with the new paid account I am running out of project hours, and it is sometimes very slow. Thanks in advance.
Here are some examples beyond the modern RStudio MyBinder example:
https://github.com/fomightez/pythonista_skewedf
https://github.com/fomightez/r_phylogenetics_worshop
https://github.com/fomightez/chapter7/tree/master/binder
The modern RStudio MyBinder example has been set as a template on GitHub so you can use
The first one is for a special use of a package not on conda. And I started that one from square one.
The other two were converted from content by others to aid in making them Binder-ready.
You essentially list everything you need from conda in the environment.yml along with the appropriate channels. If you need special stuff not on conda, you need the other configuration files included there.
Getting everything working can take some iterations of adding things, letting the image get built, and testing that your libraries are available, although you seem to think your situation is not overly complex.
The Binder launch badges you see are just images where you modify the URL to point the MyBinder federation site at your repository. Look at the URL and you should see the pattern where you put rstudio at the end of the URL pointing at your repo. The form at the MyBinder.org site can help with this; however, most often it is easier to just adapt a working launch badge's code copied from elsewhere. The form isn't set up at this time for making the URLs for launching to RStudio.
Download anything useful you create in a running session. Sessions time out after 10 minutes of inactivity, although RStudio usually keeps them active.
Lack of Persistence and limited memory, storage, & power can be drawbacks. The inherent reproducibility and portability are advantages.
MyBinder.org doesn't work with private repos. If you have code you don't want to share, you can upload it to the temporary session, using the repo for specifying the environment. You could host a private BinderHub that does allow the use of private git repositories; however, that is probably overkill for your use case and would exceed your ability level at this time.
GitHub isn't the only place to host repositories that can be pointed at the MyBinder system. If you go to the MyBinder.org page and click where it says 'GitHub' on the left side of the top line of the form, you can see a list of the sources at which you can host a repository and point the system to build an image and launch a container with that specified image.
Building the image from a source repository takes some minutes the first time. Once the image is built on the service, though, launch typically takes less than 30 seconds. Each time you make a change on the source repo, a new build is necessary. Some changes don't make the new build as long as the initial one, because some optimizing is done to rebuild only what is necessary after a change. Keep in mind there are several members of the federation around the world, and if traffic on the internet gets sent to a member where the built image isn't yet available, it will be built from scratch there first.
The Holepunch project is out there to offer some help for users working in the R ecosystem; however, with the R-Conda system that is now integrated into MyBinder it is pretty much as easy to do it the way I described. Last I knew, the Holepunch route makes a Dockerfile that isn't as easy to troubleshoot as the current R-Conda system route. Dockerfiles are essentially a last-ditch configuration file that MyBinder can handle, because the other configuration files are much easier and don't require knowing Dockerfile syntax. MyBinder aims to let users take advantage of Docker containers with a specified environment without needing to know anything about Docker.
There is a Binder Help category for posting to get help at the Jupyter Discourse Forum. Some other examples of posts already there may help you troubleshoot.
Notice of a common pitfall
Most of the configuration files for making a repository Binder-ready are simply text and can be edited right in the GitHub browser interface, without needing git or even cloning the repo locally.
Last I knew, there are two exceptions to this. The postBuild and start configuration files have settings that allow them to be run as scripts, and these get altered in a way that they no longer work if you edit them via the GitHub browser interface. (This was my experience when last I tried. Your mileage may vary or things may have changed now.) To edit those, you have to have git available on a system you have and pull one from some other source. Then edit it on your machine that has git working, add it to your repo, and push it back up from your local computer.
(If this is a problem, you can post in the Jupyter Discourse Forum Binder help category and you and I could coordinate where I fork and edit those files in your repo to your specifications and then make a pull request to update your source of the fork with those changes.)
If you are using Jupyter notebooks extensively, then it may make sense to use Binder.
But if you simply want to use R and RStudio, then all you need is Docker. A good resource is
https://github.com/rocker-org/rocker

How to make sure the user of a shiny app is using the right package versions in R

Due to recent experience with several bugs created by updating packages, I wonder what the best approach is for the following problem:
I currently provide a stand alone version so to say of my shiny App (just the script files to run it locally) and run a long list of require() functions to load / install the needed packages. However, in the end I would like to use fixed package versions to avoid bugs created by changes in packages.
Is there a way to ensure that the user, who may have older or newer versions of packages on their computer, is using the right version of all the packages my app needs?
You can consider using packrat: https://rstudio.github.io/packrat/.
Unfortunately, private libraries don’t travel well; like all R
libraries, their contents are compiled for your specific machine
architecture, operating system, and R version. Packrat lets you
snapshot the state of your private library, which saves to your
project directory whatever information packrat needs to be able to
recreate that same private library on another machine.
Short tutorial:
RStudio - File - New Project - New Directory - New Project - "Do: use Path" - Create Project
Enter in the R(Studio) console:
Code:
packrat::init()               # set up a private packrat library for this project
.libPaths()                   # test if libpath has changed
install.packages("reshape2")  # installs within one of the packrat libpaths
# Console output:
# Installing package into ‘C:/R/packRatTest/packrat/lib/x86_64-w64-mingw32/3.4.3’
The assumption is that you can use and share RStudio Projects, but I think it would be hard to work without them anyway ;).
Try writing your shiny app as a package. You can, to some extent, control versions through the DESCRIPTION file.
Since you said you're using scripts, take a look at: https://github.com/chasemc/electricShine
Even if you don't use it, hopefully looking at the code will help for things like setting the download repo to be a specific MRAN date.
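For the script-based setup in the question, a lightweight complement to the options above is a startup check that refuses to run when the installed versions differ from the tested ones. This is only a sketch: check_versions is a hypothetical helper, and the required vector below uses the installed version of stats purely so the example runs anywhere. Replace it with the packages and versions your app was tested with.

```r
# Hypothetical pin list: package name -> exact version the app was tested with.
# The stats entry is a stand-in so the sketch runs on any R installation.
required <- c(stats = as.character(utils::packageVersion("stats")))

# Returns a named logical vector: TRUE where the installed version matches
# the pinned one, FALSE where the package is missing or the version differs.
check_versions <- function(required) {
  vapply(names(required), function(pkg) {
    requireNamespace(pkg, quietly = TRUE) &&
      utils::packageVersion(pkg) == package_version(required[[pkg]])
  }, logical(1))
}

ok <- check_versions(required)
if (!all(ok)) {
  stop("Wrong or missing package versions: ",
       paste(names(ok)[!ok], collapse = ", "))
}
```

This only detects a mismatch; it does not install anything. Installing the pinned versions is what packrat or a dated repository snapshot would handle.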

Using RStudio to Make Pull Requests in Git

My enterprise has a Git repository. To make changes, I have to make changes in my forked repository and then make a pull request.
I primarily use RStudio, so I have enabled its integration with Git. I can make changes to my forked repository and then push, pull, sync, etc. The problem is that I still have an additional step of logging into GitHub and making a pull request for my forked repository. Is there a way of doing this from RStudio?
I too use RStudio for R development and I do not believe there is a way to do this. The reason is because this is more than just adding code to a branch, you're requesting a management feature to take place which is pulling part of your code into another branch of the code base. RStudio appears to be limited to pulling, syncing and committing. Likely you need to use a separate, more full featured GitHub client.
This could be done via the GitHub API, which could be executed from an R package using the httr or curl package, after which such a package could have an addin for RStudio, which would let you check everything using a nice Shiny app!
Now we only need to look for someone who wants to develop this… Can’t seem to find it (Jan 2022).
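As a rough sketch of the API route described above, the functions below call GitHub's REST endpoint for creating a pull request through httr. The names pr_body and create_pull_request are hypothetical, the token is assumed to live in a GITHUB_PAT environment variable, and the api_base default targets github.com; an enterprise installation would need its own API base URL.

```r
# Build the request body for GitHub's "create a pull request" endpoint.
# For a pull request from a fork, head has the form "user:branch".
pr_body <- function(title, head, base, body = "") {
  list(title = title, head = head, base = base, body = body)
}

# Send the request. Requires the httr package and a personal access token.
# api_base is an assumption; adjust it for a GitHub Enterprise host.
create_pull_request <- function(owner, repo, title, head, base,
                                token = Sys.getenv("GITHUB_PAT"),
                                api_base = "https://api.github.com") {
  url <- sprintf("%s/repos/%s/%s/pulls", api_base, owner, repo)
  resp <- httr::POST(
    url,
    httr::add_headers(
      Authorization = paste("token", token),
      Accept = "application/vnd.github+json"
    ),
    body = pr_body(title, head, base),
    encode = "json"
  )
  httr::stop_for_status(resp)  # raise an R error on a 4xx/5xx response
  httr::content(resp)$html_url # URL of the newly created pull request
}
```

A call would then look like create_pull_request("upstream-org", "project", "Fix typo", head = "my-user:fix-typo", base = "main"), where all of those names are placeholders.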

Is there a quick way to debug an external meteor package?

You just installed a meteor package, and for some reason it isn't working. You suspect that it's the package itself that has a bug. You want to investigate that. How do you do that?
Optimally, you'd be able to run a command that forks the original package repository with the right version and replaces the original in your meteor application, ready for you to debug it and, once fixed, possibly generate a pull request.
I don't expect something like this to exist as a single command, but is there a workflow that you follow to do exactly that? Or do you approach the problem in a different way?
Do a git clone of the package into your local packages folder. Fix any bugs you need to. Commit them. And make a pull request. Once the pull request is accepted, you can remove the local package and use the regular package.
From when I've asked in the past, there isn't really an easier way to do this it seems. But to be honest, this approach isn't too much work.
Also, if you just want to debug, you can step through the package code while it's running without cloning the repo locally. (Assuming it's running in development mode and hasn't been minified by Meteor).

Create R Windows Binary from .tar.gz linux

This is sort of related to a previous post of mine. I need to use the bigmemory library on my 32-bit Windows PC to do some ugly matrix calculations. Unfortunately, it appears that the maintainers have temporarily ceased production of Windows binaries. I have Ubuntu on my home PC. I would really like to take the .tar.gz file and build it into a Windows binary that I can actually run at work. I realize there are more efficient ways, like installing Rtools on the Windows device. However, our IT keeps our admin rights on lockdown, so I can never edit my PATH environment variable. Could anyone provide some general guidance for doing this? Are there any tools I need to install on my Ubuntu PC above and beyond R?
I found similar questions, but nothing that thoroughly answered my questions.
Unless the package source is incompatible with current versions of R, you could use the R project's win-builder site to build a Windows binary. Quoting from the linked site, win-builder is a service:
intended for useRs who do not have Windows available for checking and building Windows binary packages.
As a convenience, Hadley Wickham's devtools package includes a utility function, build_win(), that you can use for this purpose. From ?build_win:
Works by building source package, and then uploading to http://win-builder.r-project.org/. Once building is complete you'll receive a link to the built package in the email address listed in the maintainer field. It usually takes around 30 minutes.
Windows has four sets of environment variables (system, user, volatile and process sets). The first three sets are stored in the registry but the process set is not, so even if they have locked down the registry it's typically still possible to set the process environment variables (including the PATH) in a local process, i.e. on a temporary basis. You might therefore double-check your assumption that you can't modify anything. It's more likely that you can't modify the system variables and registry but can still modify the set in your local process. To check this from the Windows cmd line enter this:
set mytest=123
set mytest
and if the second line shows that mytest has the value 123 then you likely have all the permissions you need.
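The same process-level behaviour can be observed from inside R, since Sys.setenv() changes only the environment of the running R process. A small sketch, where the variable name MYTEST mirrors the check above and the Rtools directory is a placeholder rather than a path from this answer:

```r
# Set a variable in the environment of the current R process only;
# nothing is written to the registry.
Sys.setenv(MYTEST = "123")
Sys.getenv("MYTEST")

# The PATH can be extended the same way for the lifetime of this session.
# "C:/Rtools/bin" is a placeholder; use your actual Rtools location.
old_path <- Sys.getenv("PATH")
Sys.setenv(PATH = paste("C:/Rtools/bin", old_path, sep = .Platform$path.sep))
```

Processes started from this R session inherit the modified PATH, and it disappears again when the session ends.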
Furthermore anything you need to set is all handled automatically for you by R.bat in the batchfiles distribution so you don't have to set anything yourself.
Just ensure that Rtools and R are installed into the standard locations (you can tell them to skip the setting of any registry keys during the installation process), ensure R.bat is on your path or in current directory and run:
R.bat CMD INSTALL mypackage.tar.gz
without setting environment variables, registry keys or path.
If that does not work, try Rpathset.bat, also from the batchfiles. It is not automatic like R.bat, but on the other hand it is extremely flexible, since you modify the SET statements in it to whatever you want.
There is a PDF document that comes with the batchfiles which gives more info.

Resources