renv::restore() is nail-bitingly slow

I am trying out the renv package for the first time.
I took an existing project I manage with packrat, removed the .Rprofile and the packrat dir (I was happy starting from scratch), added a local work repo using options, and then ran renv::init(). This discovered what looks like a complete list of dependencies (138 CRAN packages, 10 work packages, and one GitHub-installed package).
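For reference, the setup was roughly this (the work repo URL is a placeholder, not the real one):
# Make the internal work repository visible alongside CRAN before init
options(repos = c(
  WORK = "https://cran.work.example.com",  # placeholder for the internal repo
  CRAN = "https://cloud.r-project.org"
))
renv::init()  # discovers dependencies and writes renv.lock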
I then copied that folder to another computer, changed RENV_PATHS_SOURCE to a location globally available on that system, went to the project directory, and started R. It told me the project was out of sync and asked me to run status. I did; it looked fine and reported a set of libraries that needed installing. Then I ran renv::restore().
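In code terms, roughly (the shared source path here is made up):
Sys.setenv(RENV_PATHS_SOURCE = "/shared/renv/sources")  # made-up shared path
renv::status()   # reported the libraries that needed installing
renv::restore()  # installs everything recorded in renv.lock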
It correctly listed all the dependencies and then moved on to install them.
And this is really slow.
I sit here waiting, watching new tarballs being listed as they are fetched, and it takes, at best, a whole minute to fetch each one, but more typically somewhere in the range of 1-2 minutes each.
Which is strange, because they are listed as being downloaded in 0.2-0.7 seconds each. Sniffing the network interfaces confirms this: there are bursts of traffic for brief moments as each new tarball is listed, which seem to match the reported download times.
R CPU usage fluctuates between 0.0% and 1.0% during all this.
So what is renv doing?
So for a not particularly complicated project that pulls in 180 packages, I'd be waiting 3 hours just to fetch tarballs that should themselves take only about 0.5 seconds each? This will become a problem. Personally, I'm holding off on migrating from packrat until this is solved or understood.
But what is renv doing or waiting for? There is no network activity, no CPU usage, and no iowait, as far as I can tell (looking at top output).

Related

How to replicate the package check time performed on CRAN?

I've been trying to reduce the check time on a package I am submitting to CRAN. On my local machines, check time is somewhere between a minute (i7 CPU) and 2 minutes (i5 CPU). However, CRAN reviewers keep pointing out the check time is over 10 minutes. The only way I could find to reproduce such long check times is by uploading my package to http://win-builder.r-project.org/, where it indeed takes > 600 s to check.
I wish I could reproduce this check time locally so I am not dependent on a remote solution. The only difference I can see between Win builder and my local machine is the OS (Win vs. Linux) and how Win builder seems to be doing multiarch checks (i386 and x64).
I am not sure how to reproduce this locally. I have tried R CMD check with seemingly relevant switches like --multiarch and --force-multiarch, but it doesn't seem to do anything differently. I guess I have to install some extra packages like r-cran-i386 or whatever, but I couldn't find anything of the sort in my repositories ("R" can be such a PITA of a search expression), and the instructions in README files like the one at https://cran.r-project.org/bin/linux/ubuntu/ didn't get me far enough.
I am already using --as-cran, and am aware of solutions like this, though I think installing R i386 on a separate VM containing a 32-bit OS defeats the purpose of what I am trying to accomplish.
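For reference, this is a sketch of how I time a local check run; rcmdcheck is just one convenient wrapper around R CMD check, and the flags mirror the ones above:
system.time(
  rcmdcheck::rcmdcheck(".", args = c("--as-cran", "--multiarch"))
)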

How to install R package binaries on Linux just like on Windows?

When I run install.packages("somepkg") on Linux (mostly Ubuntu), the installation process involves building the R package from source, which can be time-consuming. It can also be prone to failure due to missing development-related Linux packages.
Is there a way to install compiled binaries like on Windows? I heard that it can be done, but couldn't find an easy-to-understand resource. I hope that by asking here I will make the answer (if it exists) more googlable.
It depends on whether binaries exist. Which, in turn, depends on which Linux distro and version you are running.
For Ubuntu 18.04 (and later, as they are compatible), you can use the Rutter PPAs, which cover over four thousand CRAN packages. This is described (albeit very briefly) at the top of this README at CRAN.
I also blogged about that (a few times) under my r4/ tag -- and because it didn't really "stick", I amplified it again with a short video plus slides; see this post. The video runs for about 5 minutes, during which we install rstan and tidyverse as binaries, each with just one command, each taking about a minute (depending on bandwidth and disk speed, of course) and pulling in all dependencies pre-built and in a fail-safe manner.
If this matches your needs, give it a try and please come to the r-sig-debian list for questions.
If you are on a different Linux flavor then I am unfortunately less sure whether a comparable service exists.
Edit on 2020-09-17 As this was just upvoted and I was thus reminded of it, you now have better options and can get Linux binaries via install.packages("pkgname"). One way is RSPM, the other is BSPM. I have a first blog post comparing both here (even with animated gif movies ;-)) and should be able to say more about BSPM "soon".
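For example, the RSPM route boils down to pointing your repos option at one of its Linux binary URLs; the URL below follows RSPM's naming for Ubuntu 20.04 ("focal"), so adjust it for your distribution, and note that some setups also need an HTTPUserAgent option so the server can detect your R version:
options(repos = c(
  CRAN = "https://packagemanager.rstudio.com/cran/__linux__/focal/latest"
))
install.packages("data.table")  # now resolves to a pre-built Linux binary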
Edit on 2022-08-03 And going beyond RSPM and BSPM is the newer r2u which has been up for a few months and is currently serving around two thousand binaries a day. It is the best approach for binaries on Ubuntu LTS installations (currently: 20.04 and 22.04). See r2u for more including demos.

Rstudio is painfully slow

Suddenly, Rstudio is painfully slow, and now it is unusable. This means, I open it up and there is a lag of several seconds if I type anything. I have explored all the options I can come up with:
1. re-installing both R and Rstudio (although I am not 100% sure I could remove all components),
2. trying to reset settings: the obvious things, such as clearing the workspace and the console.
The size of my data is negligible. I cannot think of anything else. Any ideas?
The only observation I can make that suggests something could be wrong with the configuration is that (sometimes) I see "gctorture false" as a value in the environment.
Just a guess, but ?gctorture says
Provokes garbage collection on (nearly) every memory allocation.
Intended to ferret out memory protection bugs. Also makes R run
_very_ slowly, unfortunately.
which sounds about right for your problem! You could try
gctorture(FALSE)
If that speeds things up, then look for somewhere this might have been set, e.g. in a .Rprofile (in the current working directory, your user home directory, or the installation directory of R; see ?.Rprofile). Also make sure that you start R without loading any .Rhistory or .RData files (again, in the working directory, your home directory, etc.).
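A quick way to list which startup files actually exist on a machine (the set of candidate paths follows ?Startup):
# Candidate startup files R may read; see ?Startup for the exact rules
candidates <- c(
  file.path(R.home("etc"), "Rprofile.site"),
  "~/.Rprofile",   # user home directory
  ".Rprofile"      # current working directory
)
candidates[file.exists(candidates)]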
I had an RStudio project with Git version control, and then it became very slow. I solved the problem by removing the Git version control.

Pinning R package versions

How do you best pin package versions in R?
Rejected strategy 1: Pin to CRAN source tar.gzs
Doesn't work if you want to pin the latest version, since CRAN does not put the tip version in the archive (duh).
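That is, something like the following, which only works for versions that have already been superseded, because current releases live in src/contrib/ rather than Archive/ (the package and version here are just examples):
install.packages(
  "https://cran.r-project.org/src/contrib/Archive/dplyr/dplyr_0.5.0.tar.gz",
  repos = NULL, type = "source"
)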
Rejected strategy 2: Use devtools
Don't want to, because it takes ages to compile and adds lots of stuff I don't want to use
Rejected strategy 3: Vendor
Would rather avoid having to copy all source
To provide a little more information on packrat, which I use for this purpose, from the website:
R package dependencies can be frustrating. Have you ever had to use
trial-and-error to figure out what R packages you need to install to
make someone else’s code work–and then been left with those packages
globally installed forever, because now you’re not sure whether you
need them? Have you ever updated a package to get code in one of your
projects to work, only to find that the updated package makes code in
another project stop working?
We built packrat to solve these problems. Use packrat to make your R
projects more:
Isolated: Installing a new or updated package for one project won’t break your other projects, and vice versa. That’s because packrat gives each project its own private package library.
Portable: Easily transport your projects from one computer to another, even across different platforms. Packrat makes it easy to install the packages your project depends on.
Reproducible: Packrat records the exact package versions you depend on, and ensures those exact versions are the ones that get installed wherever you go.
Packrat stores the version of the packages you use in the packrat.lock file, and then downloads that version from CRAN whenever you run packrat::restore(). It is much lighter weight than devtools, but can still take some time to re-download all of the packages (depending on the packages you are using).
If you prefer to store all of the sources in a zip file, you can use packrat::snapshot() to pull down the sources and update the packrat.lock, and then packrat::bundle() to "bundle" everything up. The aim of this is to make projects / research reproducible and portable over time by storing the package versions and dependencies used in the original design (along with the source code, so that the OS dependency on a binary is avoided).
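A minimal sketch of that workflow (the unbundle destination is just an example):
packrat::init()                   # give the project its own private library
packrat::snapshot()               # record exact versions in packrat/packrat.lock
bundle_path <- packrat::bundle()  # tar up the project, lockfile, and sources
# ...later, on another machine:
packrat::unbundle(bundle_path, where = "~/projects")  # example destination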
There is much more information on the website I linked to, and you can see current activity on the git repo. I have encountered a few cases that work in a less-than-ideal way (packages not on CRAN have some issues at times), but the git repo still seems to be pretty active with issues/patches which is encouraging.

RStudio server - Hangs when switching projects

I am currently running RStudio via a server installation that I only partially maintain. I am working with some fairly large data sets and models (> 9 million rows of 611 variables). When I try to switch projects, RStudio hangs when loading the project (it says "Switching projects to..." at the top) or, if it loads, takes forever.
RStudio otherwise continues to run while attempting to switch projects, but menus and the like do not work.
I have searched thoroughly for a fix to no avail. How would I go about troubleshooting (or, ideally, fixing) the problem?
RStudio is running on a linux (Open SuSE) VM.
Thanks in advance.
EDIT: Per this thread: https://stackoverflow.com/a/15373596/3469671, I deleted .rstudio from my home directory and that seemed to free things up. Is there some setting I can change to facilitate the loading of larger projects?
Per this thread, I deleted .rstudio from my home directory and that seemed to free things up.
As a practice, I learned to stop saving my workspace while it was loaded with data, and I included steps within projects to save and load prepared data.
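In practice that pattern can be as simple as the following (object and file names are just examples):
# At the end of the data-preparation script:
saveRDS(prepared_data, "data/prepared_data.rds")
# At the top of the analysis script:
prepared_data <- readRDS("data/prepared_data.rds")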
