R/RStudio slowly taking up hard drive storage

I have written a fairly extensive function, but I still need to debug it. As I've been running the function repeatedly, the available space on my system drive has been slowly decreasing. Just over the last day or two it has taken up 10-20 GB of hard drive space (with nothing, to my knowledge, being downloaded in the background).
Since I found this problem I've moved my script into an R project on another drive, but the system drive continues to fill up. The code generates a few tables and graphs, with all the results stored in a list variable from which I then display the graphs. I clean the environment and plot windows every time I run the function.
I've checked all the R installation folders, but they all look roughly the right size / not too big. Is there anywhere else on a default installation that R could be storing files that is causing this issue?
A similar question was asked here 3 years ago with no solution (I haven't tried reinstalling R or RStudio yet, though).
Windows x64 and R x64 installation.
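One place worth checking is R's per-session temporary directory, which sits on the system drive by default on Windows and can accumulate leftovers if sessions end abnormally. A minimal sketch for inspecting it from inside R (the cleanup target is hypothetical; verify any path before deleting it):

# Minimal sketch: locate the session temp directory and measure its size.
tmp <- tempdir()
print(tmp)                                     # typically under %TEMP% on Windows
files <- list.files(tmp, full.names = TRUE, recursive = TRUE)
sum(file.size(files)) / 1024^2                 # total size in MB
# Stale "RtmpXXXXXX" folders from crashed sessions live alongside this one;
# they can be removed manually, e.g. unlink(old_dir, recursive = TRUE),
# where old_dir is a path you have inspected yourself.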

Related

Rstudio potential memory leak / background activity?

I'm having a lot of trouble working with RStudio on a new PC. I could not find a solution searching the web.
When RStudio is open, it constantly eats up memory until it becomes unworkable. If I work on an existing project, it takes half an hour to an hour to become impossible to work with. If I start a new project without loading any objects or packages, just writing scripts without even running them, it takes longer to reach that point, but it still does.
When I first start the program, Task Manager already shows memory usage of 950-1000 MB (sometimes more), and as I work it climbs to around 6000 MB, at which point it is impossible to work with as every activity is delayed and 'stuck'. Just to compare, on my old PC the Task Manager shows 100-150 MB while I work in the program. When I open the "Memory Usage Report" within RStudio, "used by session" is very small and "used by system" is almost at the maximum, yet RStudio is the only thing taking up system memory on the PC.
Things I tried: installing older versions of both R and RStudio, pausing my anti-virus program, changing compatibility mode, setting zoom to 100%. It feels like RStudio is continuously running something in the background, as the memory usage keeps growing (and quite quickly). But maybe it is something else entirely.
I am currently using the latest versions of R and RStudio (4.1.2 and 2021.09.0-351) on a PC with an Intel i7 processor, 64-bit, 16 GB RAM, Windows 10.
What should I look for at this point?
On Windows, there are several typical memory or CPU issues with RStudio. In this answer I explain how the RStudio interface itself uses memory and CPU as soon as you open a project (e.g., when RStudio shows you some .Rmd files). The memory/CPU cost of the computation itself is not covered (i.e., performance issues while executing a line of code are not covered).
When working on 'long' .Rmd files within RStudio on Windows, CPU and/or memory usage sometimes gets very high and increases progressively (e.g., because of a process named 'QtWebEngineProcess'). To solve the problem caused by long .Rmd files loaded within an RStudio session, you should:
Pay attention to which RStudio process consumes memory while it scans your code (i.e., disable or enable options in RStudio's 'Global Options' menu). For example, try disabling inline display (Tools => Global Options => R Markdown => Show equation and image previews => Never). This post put me on the track of considering that memory/CPU leaks are sometimes due to RStudio itself, not the data or the code.
Set up a bookdown project in order to split your large .Rmd file into several smaller .Rmd files. See here.
As a bonus step, check whether some loaded packages conflict, using the command tidyverse_conflicts() (a sketch follows below), but that is already a 'computing problem' (not covered here).
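A minimal sketch of that last check, assuming the tidyverse meta-package is installed:

# List masking conflicts among the attached tidyverse packages.
library(tidyverse)
tidyverse_conflicts()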

Global variables not saved after executing R code

Recently I swapped my personal PC, where I had admin rights, for my employer's PC, where we use AD to log in and where I do not hold admin rights.
Since then, I have had trouble running my R code. I am currently developing a Shiny app. Each time I click "Run App", two things behave differently from my old setup.
First of all, I can run the app only once - after that, RStudio is busy forever, preventing me from running the app again, accessing data frames created by my app, and so on. So after each test of some change, I literally have to restart RStudio entirely.
Secondly, despite the fact that I have set my working directory to "C:/Users/mylogin/Documents", after restarting R there are no global variables visible, even after saving the workspace image. I used to use global variables to debug my app after closing it. Of course I could rewrite the entire code to, for example, dump all the tables to different files.
My question is: is it possible that this behavior is related to not having admin rights on my current PC? Or is it related to another issue, and if so, can someone help me with it? I have little to no knowledge about debugging RStudio.
Setup: Windows 10 64-bit, R version 4.0.2 (2020-06-22), RStudio Desktop 1.3.1093 "Apricot Nasturtium".
For future generations: the problem went away on its own, without any particular reason; quite frustrating that we will never know the answer.
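On the workspace point, a hedged check is to write the workspace image to an explicit file and reload it in the next session, rather than relying on the default .RData in the working directory; the file name here is hypothetical:

# Write the current global environment to an explicit file...
save.image("C:/Users/mylogin/Documents/debug_workspace.RData")
# ...and restore it in a fresh R session after the app has been closed.
load("C:/Users/mylogin/Documents/debug_workspace.RData")
ls()   # the debug objects should be listed again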

How to replicate the package check time performed on CRAN?

I've been trying to reduce the check time on a package I am submitting to CRAN. On my local machines, check time is somewhere between a minute (i7 CPU) and 2 minutes (i5 CPU). However, CRAN reviewers keep pointing out the check time is over 10 minutes. The only way I could find to reproduce such long check times is by uploading my package to http://win-builder.r-project.org/, where it indeed takes > 600 s to check.
I wish I could reproduce this check time locally so I am not dependent on a remote solution. The only differences I can see between win-builder and my local machine are the OS (Windows vs. Linux) and the fact that win-builder seems to run multiarch checks (i386 and x64).
I am not sure how to reproduce this locally. I have tried R CMD check with seemingly relevant switches like --multiarch and --force-multiarch, but it doesn't seem to do anything differently. I guess I would have to install some extra packages like r-cran-i386 or whatever, but I couldn't find anything of the sort in my repositories ("R" can be such a PITA of a search term), and the instructions in README files like the one at https://cran.r-project.org/bin/linux/ubuntu/ didn't get me far enough.
I am already using --as-cran, and I am aware of solutions like this one, though I think installing i386 R on a separate VM running a 32-bit OS defeats the purpose of what I am trying to accomplish.
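For timing the local check itself, a minimal sketch from within R, assuming a source tarball has already been built with R CMD build (the tarball name is a placeholder; local timings will still not match win-builder's hardware or its i386+x64 double run without a 32-bit R installation):

# Time a local check with CRAN-like switches; "mypkg_0.1.0.tar.gz" is a placeholder.
check_time <- system.time(
  system("R CMD check --as-cran --force-multiarch mypkg_0.1.0.tar.gz")
)
print(check_time["elapsed"])   # elapsed wall-clock seconds for the whole check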

Loading .dta data into R takes long time

Some confidential data is stored on a server and accessible for researchers via remote access.
Researchers can log in via some (I think Cisco) remote client and share virtual machines on the same host.
A 64-bit Windows runs on the virtual machine.
The system appears to be optimized for Stata; I'm among the first to use the data with R. There is no RStudio installed on the client, just the RGui 3.0.2.
And here's my problem: the data is saved in the stata format (.dta), and I need to open it in R. At the moment I am doing
library(foreign)
read.dta(fileName, convert.factors = FALSE)[fields]
Loading a smaller file (around 200 MB) takes 1-2 minutes. However, loading the main file (3-4 GB) takes very long, longer than I had patience for. During that time, the R GUI stops responding.
I can test my code on my own machine (OS X, RStudio) on a smaller data sample, and everything works fine. Is this
because of OS X + RStudio, or only
because of the size of the file?
A colleague is using Stata on a similar file in their environment, and that works fine for him.
What can I do to improve the situation? Possible solutions I came up with were
Load the data into R somehow differently (perhaps there is a way that doesn't require all this memory usage). I also have access to Stata; if all else fails, I could prepare the data in Stata, for example slice it into smaller pieces and reassemble it in R.
Ask them to allocate more memory to my user of the VM (if that indeed is the issue).
Ask them to provide RStudio as a front end (even if that's not faster, perhaps it's less prone to crashes).
Certainly the size of the file is a prime factor, but the machine and configuration might be, too. Hard to tell without more information. You need a 64 bit operating system and a 64 bit version of R.
I don't imagine that RStudio will help or hinder the process.
If the process scales linearly, your big file will take about (120 seconds) * (4096 MB / 200 MB) = 2458 seconds, or roughly 41 minutes. Is that how long you waited?
The process might not be linear.
Was the process making progress? If you checked CPU and memory, was the process still running? Was it doing a lot of page swapping?
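On the first proposed workaround (loading the data differently): a hedged sketch using the haven package, assuming a recent version of it can be installed on the server; reading only the columns you need keeps memory usage down. The column names are placeholders for the real ones.

# Read only selected columns of a large Stata file with haven (placeholder columns).
library(haven)
dat <- read_dta(fileName, col_select = c(id, income, year))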

RStudio server - Hangs when switching projects

I am currently running RStudio via a server installation that I only partially maintain. I am working with some fairly large data sets and models (more than 9 million rows by 611 variables). When I try to switch projects, RStudio either hangs while loading the project (it says "Switching projects to..." at the top) or, if it does load, takes forever.
While it is attempting to switch projects, RStudio otherwise keeps running, but menus and the like do not work.
I have searched thoroughly for a fix to no avail. How would I go about troubleshooting (or, ideally, fixing) the problem?
RStudio is running on a Linux (openSUSE) VM.
Thanks in advance.
EDIT: Per this thread: https://stackoverflow.com/a/15373596/3469671, I deleted .rstudio from my home directory and that seemed to free things up. Is there some setting I can change to facilitate the loading of larger projects?
Per this thread, I deleted .rstudio from my home directory and that seemed to free things up.
As a practice, I learned to stop saving my workspace while it was loaded with data, and I added steps within my projects to save and load prepared data.
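A hedged sketch of that practice; the preparation function, object, and file names are placeholders:

# Run the expensive preparation once and persist only the result.
prepared <- prepare_data(raw_data)             # placeholder for the real preparation step
saveRDS(prepared, "data/prepared.rds")

# In later sessions, or after switching projects, reload it instead of a big workspace.
prepared <- readRDS("data/prepared.rds")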
