Rstudio potential memory leak / background activity?

I’m having a lot of trouble working with Rstudio on a new PC. I could not find a solution searching the web.
When Rstudio is on, it is constantly eating up memory until it becomes unworkable. If I work on an existing project, it takes half an hour to an hour to become impossible to work with. If I start a new project without loading any objects or packages, just writing scripts without even running them, it takes longer to reach that point, but it still does.
When I first start the program, the Task Manager already shows memory usage of 950-1000 MB (sometimes more), and as I work it climbs to 6000 MB, at which point it is impossible to work with as every activity is delayed and 'stuck'. Just to compare, on my old PC the Task Manager shows 100-150 MB while I work in the program. When I click the "Memory Usage Report" within Rstudio, "used by session" is very small and "used by system" is almost at the maximum, yet Rstudio is the only thing taking up the system memory on the PC.
Things I tried: installing older versions of both R and Rstudio, pausing my anti-virus program, changing compatibility mode, setting zoom to 100%. It feels like Rstudio is continuously running something in the background, as the memory usage keeps growing (and quite quickly). But maybe it is something else entirely.
I am currently using the latest versions of R and Rstudio (4.1.2 and 2021.09.0-351) on a 64-bit Windows 10 PC with an Intel i7 processor and 16 GB of RAM.
What should I look for at this point?

On Windows, there are several typical memory or CPU issues with Rstudio. In this answer, I explain how the Rstudio interface itself uses memory and CPU as soon as you open a project (e.g., when Rstudio shows you some .Rmd files). The memory/CPU cost of the computation itself is not covered here (i.e., performance issues when executing a line of code are out of scope).
When working on 'long' .Rmd files within Rstudio on Windows, the CPU and/or memory usage sometimes gets very high and increases progressively (e.g., because of a process named 'QtWebEngineProcess'). To solve the problem caused by long .Rmd files loaded within an Rstudio session, you should:
Pay attention to which Rstudio processes consume memory while scanning your code (i.e., disable or enable options in Rstudio's 'Global Options' menu). For example, try disabling inline display (Tools => Global Options => R Markdown => Show equation and image preview => Never). This is what made me realize that a memory/CPU leak is sometimes due to Rstudio itself, not the data or the code.
Set up a bookdown project in order to split your large .Rmd file into several smaller .Rmd files. See here.
As a bonus step, check whether any of your loaded packages conflict with each other using the command tidyverse_conflicts() (a minimal sketch follows below), but that is already a 'computing problem' (not covered here).
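For the bonus step, a minimal sketch of the conflict check (this assumes the tidyverse package is installed); gc() is just a quick way to see how much memory the R session itself, as opposed to the Rstudio interface, is holding:

# Load the tidyverse meta-package (assumption: it is installed)
library(tidyverse)

# List functions that are masked by, or conflict with, other loaded packages
tidyverse_conflicts()

# Rough check of the R session's own memory footprint
# (this reports R's usage, not the memory held by the Rstudio UI processes)
gc()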

Related

"Cannot allocate vector of size xxx mb" error, nothing seems to fix

I'm running RStudio x64 on Windows 10 with 16GB of RAM. RStudio seems to be running out of memory for allocating large vectors, in this case a 265MB one. I've gone through multiple tests and checks to identify the problem:
Memory limit checks via memory.limit() and memory.size(). Memory limit is ~16GB and size of objects stored in environment is ~5.6GB.
Garbage collection via gc(). This removes some 100s of MBs.
Upped priority of rsession.exe and rstudio.exe via Task Manager to real-time.
Ran chkdsk and RAM diagnostics on system restart. Both returned no errors.
But the problem persists. It seems to me that R can access 16GB of RAM (and shows 16GB committed on Resource Monitor), but somehow is still unable to allocate a large vector. My main confusion is this: the problem only begins appearing if I run code on multiple datasets consecutively, without restarting RStudio in between. If I do restart RStudio, the problem doesn't show up anymore, at least not for a few runs.
The error should be replicable with any large R vector allocation (see e.g. the code here). I'm guessing the fault is software, some kind of memory leak, but I'm not sure where or how, or how to automate a fix.
Any thoughts? What have I missed?
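For reference, a minimal sketch of the checks described above, run at the top level of the session (note that memory.limit() and memory.size() are Windows-only and became no-ops in R 4.2, so this matches older setups like the one in the question):

# Windows-only memory checks (no-ops from R 4.2 onwards)
memory.limit()   # maximum memory available to R, in MB
memory.size()    # memory currently used by this R session, in MB

# Sizes of objects in the global environment, largest first (bytes)
obj_sizes <- sapply(ls(envir = globalenv()),
                    function(x) object.size(get(x, envir = globalenv())))
sort(obj_sizes, decreasing = TRUE)

# Force garbage collection and report what was freed
gc()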

Rstudio is painfully slow

Suddenly, Rstudio is painfully slow, and now it is unusable. By this I mean that I open it up and there is a lag of several seconds whenever I type anything. I have explored all the options I can come up with:
1. re-installing both R and Rstudio (although I am not 100% sure I could remove all components),
2. trying to reset settings.... the obvious things such as clearing the workspace and the console.
The size of my data is negligible. I cannot think of anything else.... any ideas?
The only observation I can make that suggests something could be wrong with the configuration is that I (sometimes) see "gctorture false" as a value in the environment.
Just a guess, but ?gctorture says
Provokes garbage collection on (nearly) every memory allocation.
Intended to ferret out memory protection bugs. Also makes R run
_very_ slowly, unfortunately.
which sounds about right for your problem! You could try
gctorture(FALSE)
If that speeds things up, then look for somewhere this might have been set, e.g., in a .Rprofile (in the current working directory, your user home directory, or the installation directory of R; see ?.Rprofile). Also make sure that you start R without loading any .Rhistory or .RData files (again in the working directory, your home directory, etc.).
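A minimal sketch of where to look for a stray gctorture(TRUE) and for stale startup state (locations per ?Startup; adjust for your setup):

# Possible startup files where gctorture(TRUE) could have been set
candidates <- c(
  file.path(getwd(), ".Rprofile"),           # current working directory
  path.expand("~/.Rprofile"),                # user home directory
  file.path(R.home("etc"), "Rprofile.site")  # R installation directory
)
candidates[file.exists(candidates)]

# Inspect any that exist, e.g. readLines(path.expand("~/.Rprofile"))

# Also check for stale state that gets restored at startup
file.exists(file.path(getwd(), c(".RData", ".Rhistory")))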
I had an RStudio project with Git version control and then it became very slow. I solved the problem by removing the Git version control.

Loading .dta data into R takes a long time

Some confidential data is stored on a server and accessible to researchers via remote access.
Researchers can log in via some (I think Cisco) remote client, and share virtual machines on the same host
There's a 64 bit Windows running on the virtual machine
The system appears to be optimized for Stata; I'm among the first to use the data with R. There is no RStudio installed on the client, just the RGui 3.0.2.
And here's my problem: the data is saved in the stata format (.dta), and I need to open it in R. At the moment I am doing
read.dta(fileName, convert.factors = FALSE)[fields]
Loading a smaller file (around 200 MB) takes 1-2 minutes. However, loading the main file (3-4 GB) takes very long, longer than my patience lasted. During that time, the R GUI stops responding.
I can test my code on my own machine (OS X, RStudio) on a smaller data sample, which works all fine. Is this
because of OS X + RStudio, or only
because of the size of the file?
A colleague is using Stata on a similar file in their environment, and that works fine for him.
What can I do to improve the situation? Possible solutions I came up with were:
Load the data into R somehow differently (perhaps there is a way that doesn't require all this memory usage; see the sketch after this list). I also have access to Stata. If all else fails, I could prepare the data in Stata, for example slice it into smaller pieces and reassemble them in R
Ask them to allocate more memory to my user of the VM (if that indeed is the issue)
Ask them to provide RStudio as a backend (even if that's not faster, perhaps it's less prone to crashes)
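For the first option, a minimal sketch using the haven package to read only selected columns or rows (this assumes haven can be installed on the server; with R 3.0.2 there, it may not be available, and the column names below are hypothetical):

library(haven)

# Read only selected columns; the field names here are hypothetical placeholders
dat <- read_dta(fileName, col_select = c(id, year, income))

# Or test the pipeline on a limited number of rows first
dat_head <- read_dta(fileName, n_max = 10000)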
Certainly the size of the file is a prime factor, but the machine and configuration might be, too. Hard to tell without more information. You need a 64 bit operating system and a 64 bit version of R.
I don't imagine that RStudio will help or hinder the process.
If the process scales linearly, your big data case will take (120 seconds) * (4096 MB / 200 MB) = 2458 seconds, or around 40 minutes. Is that how long you waited?
The process might not be linear.
Was the process making progress? If you checked CPU and memory, was the process still running? Was it doing a lot of page swaps?
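One way to sanity-check the linear-scaling assumption is to time the smaller file and extrapolate; a minimal sketch ('smallFileName' is a hypothetical path):

library(foreign)

# Time the smaller (~200 MB) file
t_small <- system.time(
  small <- read.dta(smallFileName, convert.factors = FALSE)
)[["elapsed"]]

# Naive linear extrapolation to the ~4 GB file, in seconds
# (loading may well not scale linearly)
t_small * (4096 / 200)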

RStudio freezes on "saving workspace image," previously saved .RData file disappears

First, I've also posted this question on the RStudio Support page. If I get a response there, I will post it here for all to see (and vice versa).
I'm enjoying RStudio but am having trouble using Rprojects to save model outputs. I'm running sets of models that take ~1 day to run, so this is really setting me back. This is on an iMac running 10.9.5 (Mavericks).
Here's what happens:
I close the project and allow the "saving workspace image" to go through. (This is taking ~15 min, and the .RData files are 6 GB, which seems surprisingly large to me.)
Often there is no problem upon reopening, the Rdata files are restored, and I see the objects I've created in the Global Environment pane. I run another model (or set of them), and close the project again. RStudio now gets hung up on "saving workspace image." Eventually, the wheel showing that this is active stops turning. Sometimes the mouse disappears from the screen and the entire computer freezes.
I either force RStudio to close, or force the computer to shutdown. When I restart and open RStudio, then load the Rproject, the Global Environment is empty. In the Files pane, there are no .RData files shown.
When I check the Rproject directory in Finder, there are multiple .RDataTmp (hidden) files. I'm not clear whether I can use any of them to recover my data, or how to attempt to load them in RStudio.
Solutions I've tried so far:
Updating everything, including R, RStudio, and Safari, per another post on RStudio Support.
Disabling my syncing program (SugarSync) from updating the .Rproj.user folder, also after reading a post there.
Enabling access for RStudio in the Privacy/Security settings.
I haven't been able to find any other possible solutions, and I am growing frustrated with testing this out, as it seems to happen only intermittently, and (sigh) only after the problem seems resolved, such that I've run a whole bunch of models and have a good deal of data to lose! This makes me wonder whether (a) the universe is simply cruel, or (b) it's the large file size that is causing the problem. The other option is (c) both.
I read elsewhere on RStudio Support that file compression can be enabled, but that this will slow the process of saving. Since it's already taking quite a long time to save upon closing the project and I'm not clear on why it might help, I'm hesitant to enable file compression until I know more.
Thanks for your help,
MK
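Regarding the leftover .RDataTmp files: such a file may be a partially written workspace image, so it may or may not be loadable. One way to find out is to try load()-ing it into a fresh, isolated environment. A minimal sketch (the file name is hypothetical):

# List hidden files in the project directory that look like workspace temps
list.files(all.files = TRUE, pattern = "RDataTmp")

# Try loading one into an isolated environment, so a corrupt file
# cannot clobber the global workspace; the file name below is hypothetical
recovered <- new.env()
try(load(".RDataTmp12345", envir = recovered))
ls(recovered)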

RStudio server - Hangs when switching projects

I am currently running RStudio via a server installation that I only partially maintain. I am working with some fairly large data sets and models (> 9 million rows of 611 variables). When I try to switch projects, RStudio hangs when loading the project (it says "Switching projects to..." at the top) or, if it loads, takes forever.
While it is attempting to switch projects, RStudio otherwise appears to keep running, but menus and the like do not respond.
I have searched thoroughly for a fix to no avail. How would I go about troubleshooting (or, ideally, fixing) the problem?
RStudio is running on a linux (Open SuSE) VM.
Thanks in advance.
EDIT: Per this thread: https://stackoverflow.com/a/15373596/3469671, I deleted .rstudio from my home directory and that seemed to free things up. Is there some setting I can change to facilitate the loading of larger projects?
Per this thread, I deleted .rstudio from my home directory and that seemed to free things up.
As a practice, I learned to stop saving my workspace while it was loaded with data, and I added steps within projects to save and load prepared data (a minimal sketch follows below).
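A minimal sketch of that workflow (object and file names are hypothetical):

# With "Save workspace to .RData on exit" set to "Never" in
# Tools => Global Options => General, save key results explicitly instead.
saveRDS(model_fit, "output/model_fit.rds")   # hypothetical object/file names

# In the next session, reload only what you need instead of a workspace image
model_fit <- readRDS("output/model_fit.rds")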
