RAM used per MB stored in the R workspace

Is there any way of telling how much RAM is used per MB of data stored in the workspace in R?
I've got ~700 MB of objects in the workspace and it brings my PC to a complete freeze, even though it has 4 GB of RAM and runs Ubuntu, a lightweight OS.
This is just data; I am only doing basic exploratory stats on it, like plotting, averaging, etc.
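As a first step, you can measure how much RAM each object in the workspace actually occupies with `object.size()`. A minimal sketch (the `big` matrix is just a stand-in so the listing has something to show):

```r
# Stand-in object; in your session the workspace is already populated
big <- matrix(rnorm(1e6), ncol = 100)

# In-memory size of every object in the global environment, largest first
sizes <- sapply(ls(envir = .GlobalEnv),
                function(nm) object.size(get(nm, envir = .GlobalEnv)))
sizes_mb <- sort(sizes, decreasing = TRUE) / 1024^2
print(round(sizes_mb, 2))

# Total workspace footprint in MB
cat("total MB:", sum(sizes_mb), "\n")
```

Note that `object.size()` can undercount shared memory, and operations like plotting or averaging make temporary copies, so peak RAM usage can be several times the workspace size.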

Related

Memory Limit in R

I'm relatively new to R and seem to be having a memory limit issue with a new laptop. When I run a large dataset of about 50K survey respondents and over 500 variables, I receive the error message: Error: cannot allocate vector of size 413 Kb
I got around this issue on my old laptop by increasing the memory limit via memory.limit(size = 10000). Everything worked fine, but on my new laptop, which is faster and more powerful, the memory limit fills up very quickly and R will crash at size 27000 after I run about 7 models.
I have tried closing all unnecessary programs, removing all unneeded objects in R, and running garbage collection with gc(). I was using the latest version of R (4.1) and have now gone back to 4.0.4, which worked fine on my old PC, but none of this really helps.
I am running the 64-bit version of R on a 64-bit PC with 8 GB of RAM.
Does anyone know why this might be occurring on a brand-new, faster laptop, while my 4-year-old PC ran slower but at least worked?
Also, how high can you set the memory limit, given that the manual says R can handle up to 8 TB? And how do you set a memory limit permanently?
Thanks
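One pattern that often helps when memory fills up across consecutive model runs (a sketch, not a guaranteed fix): fit each model inside a function and persist the result to disk, so the intermediate objects can be garbage-collected between runs. The `lm()` call is a stand-in for whatever models you are actually fitting:

```r
fit_and_save <- function(formula, data, path) {
  m <- lm(formula, data = data)   # stand-in for your actual model
  saveRDS(m, path)                # persist the fitted model to disk
  rm(m)                          # drop the in-memory copy...
  invisible(gc())                # ...and let R reclaim the space
}

fit_and_save(mpg ~ wt, mtcars, "model1.rds")
m1 <- readRDS("model1.rds")       # reload only when needed
```

Because `m` only lives inside the function, each model's working memory is released before the next one starts, instead of accumulating seven models' worth of objects in the workspace.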

Problem loading large .RData file: Error reading from connection

I have an .RData file that is rather large. It takes up 1.10 GB on my hard drive and contains a data frame with 100 variables and 12 million observations. When I try to load it, I can watch memory usage in the task manager climb all the way to 7450 MB, at which point my RAM is completely exhausted and I get "Error reading from connection." I'm fairly sure this memory shortage is the problem, but how can that be? As I said, the .RData file is only 1.10 GB.
I'm using R x64 4.0.5. If it's any clue, the 32-bit version of R (4.0.5) tells me "Error: memory exhausted (limit reached?)", reinforcing my suspicion that this is a memory issue.
I am unable to access the data any other way; I have to make the .RData file work or the data is gone. Why does R require more than 8 GB of RAM to load a 1 GB workspace?
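Part of the answer is that .RData files are compressed on disk (`save()` gzip-compresses by default), so 1.10 GB on disk can decompress to several gigabytes in RAM. You can also load the file into a separate environment to inspect what it contains without filling the global workspace. A sketch, using a small stand-in file since the real one is not available here:

```r
# Create a small stand-in .RData file ("big.RData" is a placeholder name)
df <- data.frame(x = rnorm(1e5), y = rnorm(1e5))
save(df, file = "big.RData")     # save() gzip-compresses by default
rm(df)

e <- new.env()                   # load into a separate environment
load("big.RData", envir = e)

# Compare the on-disk (compressed) size with the in-memory size
cat("on disk:", round(file.size("big.RData") / 1024^2, 2), "MB\n")
cat("in RAM :", format(object.size(get("df", envir = e)), units = "MB"), "\n")
```

For real datasets (especially ones with repetitive values, unlike the random numbers above), the in-memory size is typically much larger than the file size, which would explain 1.10 GB on disk exhausting 8 GB of RAM.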

"Cannot allocate vector of size xxx mb" error, nothing seems to fix

I'm running RStudio x64 on Windows 10 with 16 GB of RAM. RStudio seems to be running out of memory when allocating large vectors, in this case a 265 MB one. I've gone through several tests and checks to identify the problem:
Checked the memory limit via memory.limit() and memory.size(). The limit is ~16 GB, and the objects stored in the environment total ~5.6 GB.
Ran garbage collection via gc(). This frees a few hundred MB.
Raised the priority of rsession.exe and rstudio.exe to real-time via Task Manager.
Ran chkdsk and RAM diagnostics after a system restart. Both returned no errors.
But the problem persists. It seems that R can access all 16 GB of RAM (Resource Monitor shows 16 GB committed) but is somehow still unable to allocate a large vector. My main confusion is this: the problem only appears if I run code on multiple datasets consecutively, without restarting RStudio in between. If I do restart RStudio, the problem stays away for a few runs.
The error should be reproducible with any large R vector allocation. I'm guessing the fault is in software, some kind of memory leak, but I'm not sure where or how, or how to automate a fix.
Any thoughts? What have I missed?
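If the cause is accumulation or fragmentation in the R heap across consecutive runs, a cleanup step between datasets may approximate what a restart gives you (only partially; a fresh process is the only complete reset). A sketch:

```r
# Drop everything from the workspace and force a full garbage collection.
# full = TRUE requires R >= 3.5; reset = TRUE re-bases the "max used" stats.
rm(list = ls(envir = .GlobalEnv), envir = .GlobalEnv)
invisible(gc(reset = TRUE, full = TRUE))

# In RStudio, a true restart between datasets can also be scripted:
# rstudioapi::restartSession()   # only works from within RStudio
```

Wrapping each dataset's processing in a function (so its temporaries are local and freed on return) tends to help more than manual gc() calls, since R runs garbage collection automatically when it needs space.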

how to override the 2GB memory limit when R starts

When R starts, the memory limit (as returned by memory.limit()) is set to 2 GB, regardless of the memory available on the machine (I found this out recently). I imagine that at some point during startup this limit is raised to the actually available memory.
You can see this by printing memory.limit() in the .Rprofile file, which is sourced at startup: it prints 2047. On the other hand, once R has finished starting and I type memory.limit() at the console, I get 16289.
I use a custom .Rprofile file and need access to more than 2 GB during startup.
How can I override this limit?
My current workaround is to set the limit myself in .Rprofile via memory.limit(size = 16289), but then I have to edit this every time I work on a computer with a different amount of RAM, which happens fairly often.
Is there an option I can change, a .ini file I can edit, or anything I can do about it?
OS and R version:
> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
Edit: this is not a duplicate, at least not of the proposed question. It is not about managing available memory! I have 16 GB of memory, and memory.limit() shows that my limit is indeed 16 GB.
It all started when I got a warning that I had "reached 2GB memory allocation" (implying a 2 GB memory limit). After investigating, it appears that R does indeed limit memory to 2 GB during the startup process.
I want to load my data automatically when R starts, so I have a small loading script in .Rprofile. I load more than 2 GB of data, hence I need access to my full 16 GB. My question is about achieving this; it has nothing in common with the proposed duplicate except keywords.
I'm interpreting this as: you want memory.limit(size = 16289) in your .Rprofile, but you don't want to hard-code that number for every computer you use. Why not pull the memory size dynamically? On Windows:
TOT_MEM <- as.numeric(gsub("\r", "", gsub("TotalVisibleMemorySize=", "", system('wmic OS get TotalVisibleMemorySize /Value', intern = TRUE)[3]))) / 1024
memory.limit(size = TOT_MEM)
which sets the limit to the system's total physical memory (wmic reports KB, hence the division by 1024), or
FREE_MEM <- as.numeric(gsub("\r", "", gsub("FreePhysicalMemory=", "", system('wmic OS get FreePhysicalMemory /Value', intern = TRUE)[3]))) / 1024
memory.limit(size = FREE_MEM)
which sets memory.limit() to the memory that is actually free at startup.
Place this in your .Rprofile, above the point where you load your data.
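A slightly more robust variant of the same idea searches the wmic output for the key instead of hard-coding line 3 (the helper name here is mine, not a standard function). Two caveats: wmic is deprecated on recent Windows versions, and memory.limit() itself is defunct from R 4.2 onward, where Windows builds no longer impose this limit:

```r
# Hypothetical helper: total physical memory in MB (Windows-only)
get_total_mem_mb <- function() {
  out  <- system("wmic OS get TotalVisibleMemorySize /Value", intern = TRUE)
  line <- grep("TotalVisibleMemorySize=", out, value = TRUE)
  as.numeric(trimws(sub(".*=", "", line))) / 1024   # wmic reports KB
}

# Guarded so it is a no-op on other platforms and on R >= 4.2
if (.Platform$OS.type == "windows" && getRversion() < "4.2.0") {
  memory.limit(size = get_total_mem_mb())
}
```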

Loading .dta data into R takes long time

Some confidential data is stored on a server and is accessible to researchers via remote access.
Researchers log in via a remote client (Cisco, I think) and share virtual machines on the same host.
The virtual machine runs 64-bit Windows.
The system appears to be optimized for Stata; I'm among the first to use the data with R. There is no RStudio installed on the client, just RGui 3.0.2.
And here's my problem: the data is saved in Stata format (.dta), and I need to open it in R. At the moment I am doing
read.dta(fileName, convert.factors = FALSE)[fields]
Loading a smaller file (around 200 MB) takes 1-2 minutes. However, loading the main file (3-4 GB) takes so long that it outlasted my patience, and during that time the R GUI stops responding.
I can test my code on my own machine (OS X, RStudio) on a smaller data sample, and that works fine. Is this
because of OS X + RStudio, or only
because of the size of the file?
A colleague uses Stata on a similar file in the same environment, and it works fine for him.
What can I do to improve the situation? Possible solutions I came up with:
Load the data into R differently (perhaps there is a way that doesn't require so much memory). I also have access to Stata, so if all else fails I could prepare the data there, for example slice it into smaller pieces and reassemble them in R.
Ask them to allocate more memory to my user on the VM (if that is indeed the issue).
Ask them to install RStudio (even if it's not faster, perhaps it's less prone to crashes).
Certainly the size of the file is a prime factor, but the machine and its configuration might matter too; it's hard to tell without more information. You need a 64-bit operating system and a 64-bit version of R.
I don't imagine that RStudio will help or hinder the process.
If the process scales linearly, your big-data case will take (120 seconds) × (4096 MB / 200 MB) ≈ 2458 seconds, or around 40 minutes. Is that how long you waited?
The process might not scale linearly.
Was the process making progress? If you checked CPU and memory usage, was it still running? Was it doing a lot of page swaps?
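One concrete thing worth trying: foreign::read.dta is old and relatively slow (and only reads Stata formats up to version 12), while the haven package is generally much faster and can restrict which columns it reads, cutting memory use. A sketch, using a tiny stand-in file since the real file and its field names are not available here:

```r
library(haven)

# Tiny stand-in .dta file ("mainfile.dta" and the field names are placeholders)
write_dta(data.frame(id = 1:5, income = runif(5), junk = letters[1:5]),
          "mainfile.dta")

# read_dta is typically much faster than foreign::read.dta;
# col_select limits what gets read into memory in the first place
d <- read_dta("mainfile.dta", col_select = c("id", "income"))

# After one slow import, cache as RDS so later sessions reload quickly
saveRDS(d, "mainfile.rds")
```

If installing packages on the remote machine is not possible, the Stata-side approach from the question (slicing the file there and reassembling in R) remains a reasonable fallback.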