Strangely, I haven't found an answer, and perhaps there is no solution. When I load a large file in R (a 2 GB .csv with fread, for example), R uses roughly 2 GB of system memory, but even after deleting the imported data with rm() and calling gc(), it still holds roughly 2 GB of system memory. My question is: is there a way to release the system memory held by R after deleting unused data, without restarting R?
easy.unleash.memory() :o)
Linux, 64bits, R version 3.4.3 (2017-11-30)
I'm aware that several older posts are close to my question, but they don't solve my problem. It's quite possible that there is no answer, but in that case at least it will have been said!
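For reference, the situation described above can be reproduced with a minimal sketch like the one below. The vector size is an arbitrary stand-in for the imported data. gc() reports what R itself considers in use; whether the operating system sees that memory returned depends on the platform's allocator, so the figure shown by a system monitor may not drop even when R's own count does.

```r
# Stand-in for the imported data: about 40 MB of doubles.
x <- numeric(5e6)
size_mb <- as.numeric(object.size(x)) / 1024^2

# gc() returns a matrix; row 2 is "Vcells" (vector heap), column 1 is "used".
before <- gc()[2, 1]

rm(x)                  # drop the only reference to the object
after <- gc()[2, 1]    # force a collection and read the counter again

freed_cells <- before - after
cat(sprintf("object size: %.1f MB, Vcells released inside R: %.0f\n",
            size_mb, freed_cells))
```

Comparing this number with the resident memory shown by top or a similar tool makes it easier to tell whether R is still holding objects or the allocator is simply keeping freed pages.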
Related
I am trying to deal with memory limitations in R. I was running code that generates monthly data output. It was all going fine: R saved all the monthly csv files to disk as expected, but the console seemed to have frozen (although the code ran to completion). When I restarted R, it did not launch as expected, and I had to wipe everything and reinstall Windows. I downloaded the new version of R (version 4.2.1), and my code no longer runs because of a memory limitation. I get the error message below.
Error: cannot allocate vector of size 44.2 Gb
I tried increasing the memory with memory.limit() as I did before, but it seems it is no longer supported by R (see the question "memory.limit() bug: 'memory.limit() is not longer supported'. Increasing memory").
How to deal with this?
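To put the error in perspective, here is a small sketch that converts between the size in the message and the number of numeric values it corresponds to. It assumes 8 bytes per double and that "Gb" in the message means 1024^3 bytes; the helper names and the row/column counts are hypothetical.

```r
bytes_per_double <- 8

# Hypothetical helpers for back-of-the-envelope sizing.
gb_for_doubles <- function(n_values) n_values * bytes_per_double / 1024^3
values_in_gb   <- function(gb) gb * 1024^3 / bytes_per_double

# Number of doubles in a single 44.2 Gb vector: roughly 5.9e9.
values_in_gb(44.2)

# Size of a hypothetical 1e6-row table with 12 numeric columns.
gb_for_doubles(1e6 * 12)
```

If the expected data are nowhere near this size, the request most likely comes from an intermediate step (a join, a reshape, an outer product) rather than from the data themselves.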
When R boots, the memory limit (as returned by memory.limit) is set to 2GB, regardless of the available memory on the computer. (I found that out recently). I imagine that at some point in the booting process, this limit is set to the actually available memory.
This can be seen by printing memory.limit() in the .Rprofile file which is sourced at startup. It prints "2047". On the other hand, when R has booted and I type memory.limit() in the console, I get "16289".
I use a custom .Rprofile file and I need to have access to more than 2GB during bootup.
How can I override this limit?
My current workaround is to set the limit myself in the .Rprofile using memory.limit(size=16289) but then I will have to edit this every time I work on a computer with a different amount of RAM which happens fairly often.
Is there an option I can change, a .ini file I can edit, or anything I can do about it?
OS and R version:
> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
Edit: this is not a duplicate, at least not a duplicate of the proposed question. It is not about managing available memory! I have 16GB of memory and memory.limit() shows that my limit is indeed 16GB.
It all started when I got the warning that I had "reached 2GB memory allocation" (implying that I had a 2GB memory limit). After investigation, it appears that R does indeed limit memory to 2GB during the startup process.
I want to load my data automatically when R starts, for this I have a small loading script in the .Rprofile. I load more than 2GB data hence I need to have access to my 16GB. My question is about achieving this. This has nothing at all in common with the proposed duplicate, except keywords...
I'm interpreting this as you wanting memory.limit(size=16289) in your .Rprofile file, but not wanting to set the specific number every time you move to a computer with a different amount of memory. Why not just query the memory dynamically? On Windows:
TOT_MEM <- as.numeric(gsub("\r","",gsub("TotalVisibleMemorySize=","",system('wmic OS get TotalVisibleMemorySize /Value',intern=TRUE)[3])))/1024
memory.limit(size=TOT_MEM)
which would set the available memory to the total memory of the system, or
FREE_MEM <- as.numeric(gsub("\r","",gsub("FreePhysicalMemory=","",system('wmic OS get FreePhysicalMemory /Value',intern=TRUE)[3])))/1024
memory.limit(size=FREE_MEM)
which would set memory.limit to the free physical memory at startup.
Place this in your .Rprofile, above the point where you load your data.
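A slightly more defensive variant of the same idea is sketched below. It looks up the wmic line by key instead of relying on its position in the output, and it only calls memory.limit() on Windows with R older than 4.2.0, since later versions report it as no longer supported. The helper names and the sample wmic output are made up for illustration.

```r
# Hypothetical helper: pull "<key>=<number>" out of wmic's /Value output (KB).
parse_wmic_kb <- function(lines, key) {
  prefix <- paste0("^", key, "=")
  hit <- grep(prefix, lines, value = TRUE)
  if (length(hit) == 0) return(NA_real_)
  as.numeric(trimws(sub(prefix, "", hit[1])))   # trimws() drops the trailing "\r"
}

# Hypothetical helper: raise the limit to total RAM where that is supported.
raise_limit_to_total <- function() {
  supported <- .Platform$OS.type == "windows" && getRversion() < "4.2.0"
  if (!supported) return(invisible(FALSE))
  out <- system("wmic OS get TotalVisibleMemorySize /Value", intern = TRUE)
  total_mb <- parse_wmic_kb(out, "TotalVisibleMemorySize") / 1024
  if (!is.na(total_mb)) memory.limit(size = total_mb)
  invisible(!is.na(total_mb))
}

# Made-up sample of what wmic returns, to show the parsing step on its own.
sample_output <- c("\r", "\r", "TotalVisibleMemorySize=16679936\r", "\r")
parse_wmic_kb(sample_output, "TotalVisibleMemorySize") / 1024   # 16289
```

Calling raise_limit_to_total() near the top of .Rprofile then does nothing on other platforms or newer R versions instead of raising a warning.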
Some confidential data is stored on a server and accessible for researchers via remote access.
Researchers can login via some (I think cisco) remote client, and share virtual machines on the same host
There's a 64 bit Windows running on the virtual machine
The system appears to be optimized for Stata, I'm among the first to use the data using R. There is no RStudio installed on the client, just the RGui 3.0.2.
And here's my problem: the data is saved in the stata format (.dta), and I need to open it in R. At the moment I am doing
read.dta(fileName, convert.factors = FALSE)[fields]
Loading a smaller file (around 200MB) takes 1-2 minutes. However, loading the main file (3-4 GB) takes very long, longer than I had the patience to wait. During that time, the R GUI stops responding.
I can test my code on my own machine (OS X, RStudio) on a smaller data sample, which works all fine. Is this
because of OS X + RStudio, or only
because of the size of the file?
A colleague is using Stata on a similar file in their environment, and that works fine for them.
What can I do to improve the situation? Possible solutions I came up with were
Load the data into R in some different way (perhaps there is a way that doesn't require all this memory usage). I also have access to Stata. If all else fails, I could prepare the data in Stata, for example slice it into smaller pieces and reassemble it in R
Ask them to allocate more memory to my user of the VM (if that indeed is the issue)
Ask them to provide RStudio as a backend (even if that's not faster, perhaps it's less prone to crashes)
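The "slice and reassemble" option from the list above could look roughly like the sketch below. It uses small CSV files written to a temporary directory as a stand-in for slices exported from Stata; the file names, column names, and the choice of CSV are assumptions for illustration only.

```r
# Stand-in for slices exported from Stata: three small CSV files.
slice_dir <- tempfile("slices")
dir.create(slice_dir)
full <- data.frame(id = 1:6, income = c(10, 20, 30, 40, 50, 60))
parts <- split(full, rep(1:3, each = 2))
for (i in seq_along(parts)) {
  write.csv(parts[[i]],
            file.path(slice_dir, sprintf("slice_%02d.csv", i)),
            row.names = FALSE)
}

# Read each slice, keep only the needed fields, then stack the pieces.
fields <- c("id", "income")
files <- sort(list.files(slice_dir, pattern = "\\.csv$", full.names = TRUE))
pieces <- lapply(files, function(f) read.csv(f)[fields])
combined <- do.call(rbind, pieces)

unlink(slice_dir, recursive = TRUE)   # clean up the temporary slices
print(combined)
```

Selecting the fields per slice keeps the peak memory use closer to the size of the final object than to the size of the full file.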
Certainly the size of the file is a prime factor, but the machine and configuration might be, too. Hard to tell without more information. You need a 64 bit operating system and a 64 bit version of R.
I don't imagine that RStudio will help or hinder the process.
If the process scales linearly, your big data case will take (120 seconds)*(4096 MB/200 MB) = about 2458 seconds, or roughly 41 minutes. Is that how long you waited?
The process might not be linear.
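The linear estimate can be written out as a quick calculation. It takes the upper figure of 2 minutes for the 200 MB file and 4 GB for the large one, as in the numbers above; it is only a lower bound if the process turns out to be worse than linear.

```r
small_seconds <- 120    # upper figure for the 200 MB file
small_mb <- 200
large_mb <- 4096        # 4 GB

est_seconds <- small_seconds * (large_mb / small_mb)
est_minutes <- est_seconds / 60

est_seconds   # 2457.6
est_minutes   # 40.96
```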
Was the processor making progress? If you checked CPU and memory, was the process still running? Was it doing a lot of page swaps?
I am running some basic data manipulation on a Macbook Air (4GB Memory, 120GB HD with 8GB available). My input file is about 40 MB, and I don't write anything to the disk until end of the process. However, in the middle of my process, my Mac says there's no memory to run. I checked hard drive and found there's about 500MB left.
So here are my questions:
How is it possible that R filled up my disk so quickly? My understanding is that R stores everything in memory (unless I explicitly write something out to disk).
If R does write temporary files on the disk, how can I find these files to delete them?
Thanks a lot.
Update 1: error message I got:
Force Quit Applications: Your Mac OS X startup disk has no more space available for
application memory
Update 2: I checked tempdir() and it shows "var/folders/k_xxxxxxx/T//Rtmpdp9GCo". But I can't locate this directory from Finder.
Update 3: After running unlink(tempdir(),recursive=TRUE) in R and restarting my computer, I got my disk space back. I would still like to know whether R writes to my hard drive, to avoid similar situations in the future.
Update 4: My main object is about 1GB. I use Activity Monitor to track the process, and while memory usage is about 2GB, disk activity is extremely high: data read 14GB, data written 44GB. I have no idea what R is writing.
R writes to a temporary per-session directory which it also cleans up at exit.
It follows convention and respects TMP and related environment variables.
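To see what the session has actually written there, something like the following sketch can be run from the R console. It uses only base functions; the scratch file is created purely so the listing is not empty.

```r
td <- tempdir()                     # the per-session temporary directory
cat("session temp dir:", td, "\n")

# Environment variables R consults when choosing that directory.
print(Sys.getenv(c("TMPDIR", "TMP", "TEMP"), unset = NA))

# Create a small scratch file so there is something to list.
f <- tempfile(fileext = ".txt")
writeLines("scratch", f)

# List everything in the session directory with its size in bytes.
files <- list.files(td, full.names = TRUE, recursive = TRUE)
sizes <- file.info(files)$size
print(data.frame(file = basename(files), bytes = sizes))
cat("total bytes:", sum(sizes, na.rm = TRUE), "\n")

unlink(f)                           # remove the scratch file again
```

If the total here stays small while the disk keeps filling, the writes are more likely coming from the operating system swapping memory to disk than from files R created.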
What makes you think that disk space has anything to do with this? R needs all objects held in memory, not off disk (by default; there are add-on packages that allow a subset of operations on on-disk stored files too big to fit into RAM).
One of the steps in the "process" is causing R to request a chunk of RAM from the OS to enable it to continue. The OS could not comply, and thus R terminated the "process" that you were running with the error message you failed to give us. [Hint: it would help if you showed the actual error, not your paraphrasing thereof. Some inkling of the code you were running would also help. 40MB on disk sounds like a reasonably large file; how many rows/columns etc.? How big is the object within R; object.size()?]
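The details asked for above can be collected with a couple of base calls. The data frame below is a made-up stand-in for the real object.

```r
# Made-up stand-in for the object read from the 40MB file.
obj <- data.frame(id = seq_len(1e5), value = rnorm(1e5))

dim(obj)                               # rows and columns
print(object.size(obj), units = "MB")  # in-memory size, human readable
size_bytes <- as.numeric(object.size(obj))
```

The in-memory size is often several times the on-disk size, which is why both numbers are worth reporting.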
I am executing a sql query in R using sqldf package to create a data frame in R. But, it is throwing an error:
Error: cannot allocate vector of size 3.9 Gb
I have gone through various threads with a similar issue but I could not find a suitable answer.
Can anyone please help me out on this.
I am using R 2.15.1 version on 64-bit linux machine with 32 GB RAM.
The error is often misunderstood. It means that R is unable to allocate an additional chunk of 3.9Gb of memory space. If you were to look at the R process, it would have been using a very large amount of the available RAM before it issued the error you saw and you'd have realised that the error meant additional RAM.
You will have to expand upon this in another question to explain what it is you are trying to do as if you can't read data into R with 32Gb of RAM available you will probably need to look at incremental processing of that data. For that we need details of what you are trying to achieve.
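As a rough idea of what incremental processing can look like, here is a base-R sketch that reads a CSV file a few rows at a time and keeps only a running total in memory. The file, its columns, and the chunk size are invented for the example; with sqldf or a database the same idea would be expressed as aggregating in the query rather than pulling every row into R.

```r
# Invented example file: 10 rows, two columns.
path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:10, value = (1:10) * 2), path, row.names = FALSE)

con <- file(path, open = "r")
header <- gsub('"', "", strsplit(readLines(con, n = 1), ",")[[1]])

chunk_size <- 4     # rows per chunk; tiny here, much larger in practice
total <- 0
repeat {
  lines <- readLines(con, n = chunk_size)
  if (length(lines) == 0) break                  # end of file
  chunk <- read.csv(text = lines, header = FALSE, col.names = header)
  total <- total + sum(chunk$value)              # keep only the summary
}
close(con)
unlink(path)

total   # 110
```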
It may just be that the memory limit in R is too low. First try memory.size(), then use memory.limit() to see the current limit and set a new one (note that both functions only have an effect on Windows). I'm not sure whether it will help. Just let us all know.