Problem loading large .RData file: Error reading from connection - r

I have an .RData file that is rather large. It takes up 1.10 GB on my hard drive and contains a data frame with 100 variables and 12 million observations. When I try to load it, I can open the Task Manager and watch the memory usage climb all the way to 7450 MB, at which point my RAM is completely exhausted and I get "Error reading from connection." I'm pretty sure this memory shortage is the problem, but how can that be? Like I said, the .RData file is only 1.10 GB.
I'm using R x64 4.0.5. If it's any clue, when I open the 32-Bit version of R (4.0.5) it tells me "Error: memory exhausted (limit reached?)", reinforcing my suspicion that this is a memory issue.
I am unable to access the data any other way; I have to make the .RData file work or the data is gone. Why does R require more than 8 GB of RAM to load a 1.10 GB workspace?
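A quick back-of-the-envelope check (a sketch, assuming the 100 columns are mostly 8-byte doubles, which is an assumption about your data) shows why 8 GB may not be enough: .RData files are compressed on disk, so the object in memory is far larger than the 1.10 GB file, and load() needs extra headroom while deserializing.

rows <- 12e6              # observations
cols <- 100               # variables
rows * cols * 8 / 2^30    # bytes for 8-byte double columns -> roughly 8.9 GiB in RAM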

Related

R: how to deal with memory limitation

I am trying to deal with memory limitation issues in R. I was running code that generates monthly data output. It was all going fine: R saved all the monthly CSV data to the computer as expected, but the console seemed to freeze (although the code ran to completion). When I restarted, R did not launch as expected, and I had to wipe everything and reinstall Windows. I downloaded the new version of R (version 4.2.1), and now my code no longer runs because of memory limitations. I get the error message below.
Error: cannot allocate vector of size 44.2 Gb
I tried increasing the memory with memory.limit() as I did before, but it seems it is no longer supported by R 4.2 (calling it now just warns that memory.limit() is no longer supported).
How to deal with this?
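One approach (a sketch only, with made-up object and function names, since the original code isn't shown in the question) is to keep only one month in memory at a time: build each month's result, write it to CSV, then drop it and let the garbage collector reclaim the space before moving on.

# Hypothetical per-month loop; `months` and process_month() are placeholders
# standing in for the code that generates the monthly output.
for (m in months) {
  monthly <- process_month(m)                      # build only this month's data
  write.csv(monthly, sprintf("output_%s.csv", m),  # write it out straight away
            row.names = FALSE)
  rm(monthly)                                      # drop the object...
  gc()                                             # ...and let R reclaim the memory
}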

How to avoid RStudio writing temp files to SSD?

I have RStudio and R installed on my SSD, which has only 13 GB of empty space left. The problem is that when I launch some demanding algorithm from RStudio, it quickly fills my SSD with temporary files that are quite huge, probably because of the 16 GB of RAM I have in my computer. However, I also have an almost empty 1 TB hard disk in the same computer. Is it possible to reroute those temp files to the hard disk? Any suggestions on how to do this?
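One way to do this (a sketch; D:/RTemp is just a placeholder path on the 1 TB drive) is to point R's per-session temp directory at the hard disk via the TMPDIR environment variable, which R consults at startup. The directory must already exist, and R/RStudio must be restarted for the change to take effect.

# In ~/.Renviron (on Windows usually Documents/.Renviron); D:/RTemp is a
# placeholder for an existing folder on the 1 TB hard disk:
TMPDIR=D:/RTemp

# After restarting RStudio, confirm where per-session temp files now go:
tempdir()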

Error: vector memory exhausted (limit reached?)

I previously saved a 2.8 GB .RData file and now I'm trying to load it so I can work on it again, but weirdly, I can't. It gives the error
Error: vector memory exhausted (limit reached?)
This is strange, since I was working with it fine before. One thing that changed, though, is that I upgraded to the latest version of R, 3.5.0. I saw a previous post with the same error, but it wasn't resolved. I was hopeful about a solution that increases memory.limit(), but unfortunately that is only available on Windows.
Can anyone help? I don't really understand what the problem is here, since I was able to work with my dataset before the update, so it shouldn't be throwing this error.
Did the update somehow decrease the RAM allocated to R? Can we manually increase memory.limit() on a Mac to solve this error?
This change was necessary to deal with operating-system memory over-commit issues on macOS. From the NEWS file:
The environment variable R_MAX_VSIZE can now be used to specify the maximal vector heap size. On macOS, unless specified by this environment variable, the maximal vector heap size is set to the maximum of 16GB and the available physical memory. This is to avoid having the R process killed when macOS over-commits memory.
Set the environment variable R_MAX_VSIZE to an appropriate value for your system before starting R, and you should be able to read your file.
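For example (a sketch; 100Gb is a placeholder limit, pick a value that suits your RAM and swap), you can set it in ~/.Renviron so every R session picks it up, or export it in the shell for a one-off launch:

# In ~/.Renviron, so the limit is in place before R starts
# (100Gb is a placeholder value):
R_MAX_VSIZE=100Gb

# Or, for a single session launched from Terminal:
export R_MAX_VSIZE=100Gb
R

In the new session, Sys.getenv("R_MAX_VSIZE") should report the value you set.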

R Running out of memory over large file

I had an unusual problem yesterday when trying to read a large .csv file into memory.
The file itself is 9 GB, with a bit more than 80 million rows and 10 columns.
It loaded perfectly and took up around 7 GB in memory on a remote machine with 128 GB of RAM.
My problem is that I want to work on the data on a local machine that only has 32 GB of RAM.
I tried reading it with data.table::fread, but R crashes when it uses all of the machine's memory.
Is there a safer way of reading the data that won't crash R?
Is this a known issue? Could something be wrong with the machine?
Both machines are running Windows 7 Enterprise.
EDIT:
Saving and reading the data as an RDS file worked, but I still want to be able to use just one computer for the entire job.
Is there any other way to read the data directly from the CSV file?
I don't want to report a bug in data.table unless I am sure this is an issue with fread and not a local issue.
Any other ideas?
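A couple of things that often help here (a sketch; "bigfile.csv", the column names, and the chunk size are placeholders, not taken from your data): read only the columns you actually need with fread's select argument, or read the file in fixed-size chunks with skip/nrows and reduce each chunk before reading the next.

library(data.table)

# Option 1: read only the columns you actually need (names are hypothetical):
dt <- fread("bigfile.csv", select = c("id", "date", "value"))

# Option 2: process the file in chunks instead of holding it all in RAM.
col_names  <- names(fread("bigfile.csv", nrows = 0))   # read the header only
chunk_size <- 5e6
offset     <- 0
repeat {
  chunk <- tryCatch(
    fread("bigfile.csv", skip = offset + 1, nrows = chunk_size,
          header = FALSE, col.names = col_names),
    error = function(e) NULL)                 # fread errors once skip passes EOF
  if (is.null(chunk) || nrow(chunk) == 0) break
  # ... filter/summarise `chunk` here and keep only the reduced result ...
  offset <- offset + chunk_size
  rm(chunk); gc()
}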

Where does R store temporary files

I am running some basic data manipulation on a MacBook Air (4 GB memory, 120 GB HD with 8 GB available). My input file is about 40 MB, and I don't write anything to disk until the end of the process. However, in the middle of my process, my Mac says there's no memory left to run. I checked the hard drive and found there's only about 500 MB left.
So here are my questions:
How is it possible that R filled up my disk so quickly? My understanding is that R stores everything in memory (unless I explicitly write something out to disk).
If R does write temporary files on the disk, how can I find these files to delete them?
Thanks a lot.
Update 1: error message I got:
Force Quit Applications: Your Mac OS X startup disk has no more space available for
application memory
Update 2: I checked tempdir() and it shows "var/folders/k_xxxxxxx/T//Rtmpdp9GCo", but I can't locate this directory in Finder.
Update 3: After unlink(tempdir(), recursive = TRUE) in R and restarting my computer, I got my disk space back. I would still like to know whether R writes to my hard drive, so I can avoid similar situations in the future.
Update 4: My main object is about 1 GB. I use Activity Monitor to track the process, and while memory usage is about 2 GB, disk activity is extremely high: data read 14 GB, data written 44 GB. I have no idea what R is writing.
R writes to a temporary per-session directory which it also cleans up at exit.
It follows convention and respects TMP and related environment variables.
What makes you think that disk space has anything to do with this? By default, R holds all objects in memory, not on disk (there are add-on packages that allow a subset of operations on files stored on disk that are too big to fit into RAM).
One of the steps in the "process" is causing R to request a chunk of RAM from the OS so it can continue. The OS could not comply, and thus R terminated the "process" you were running with the error message you failed to give us. [Hint: it would help if you showed the actual error, not your paraphrasing of it. Some inkling of the code you were running would also help. 40 MB on disk sounds like a reasonably large file; how many rows/columns, etc.? How big is the object within R (object.size())?]
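For example (a sketch, run from the session that shows the problem; `my_df` is a placeholder for your main object's name), you can check how big the object actually is in RAM and whether anything is piling up in the per-session temp directory:

print(object.size(my_df), units = "auto")     # in-memory size of your main object

tempdir()                                     # this session's temporary directory
files <- list.files(tempdir(), full.names = TRUE)
sum(file.info(files)$size) / 2^20             # total MB of temp files on disk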

Resources