Memory issue in R

I know there are lots of memory questions about R, but why can it sometimes find room for an object and other times not? For instance, I'm running 64-bit R on Linux, on an interactive node with 15 GB of memory. My workspace is almost empty:
dat <- lsos()
dat$PrettySize
[1] "87.5 Kb" "61.8 Kb" "18.4 Kb" "9.1 Kb" "1.8 Kb" "1.4 Kb" "48 bytes"
The first time I load R after cd'ing into the desired directory, I can load the .RData file fine. But then sometimes I need to reload it and I get the usual:
> load("PATH/matrix.RData")
Error: cannot allocate vector of size 2.9 Gb
If I can load it once, and there's enough (I assume contiguous) room, then what's going on? Am I missing something obvious?

The basic answer is that the memory allocator needs to find contiguous memory to construct objects (both permanent and temporary), and other processes (the R process itself or others) may have fragmented the available space. R will not delete an object that is being overwritten until the load process is completed, so even though you think you are laying new data on top of old data, you are not: both copies must exist at once.
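For instance, a minimal sketch of the workaround that follows from this (big_matrix is a placeholder name for whatever object matrix.RData contains):
rm(big_matrix)             # drop the existing copy first ('big_matrix' is a placeholder name)
gc()                       # return the freed space so the allocator can reuse it
load("PATH/matrix.RData")  # now load() only needs room for one copy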

Related

Freeing all RAM in R session without restarting R session?

Is there a way to clear more RAM than rm(list=ls()); gc()?
I expected garbage collection (i.e. gc()) to bring RAM usage back down to the level in use when the R session began. However, I have observed the following on a laptop with 16 GB of RAM:
# Load a large object
large_object <- readRDS("large_object.RDS")
object.size(large_object)
13899229872 bytes # i.e. ~14 gig
# Clear everything
rm(list=ls(all=T)); gc()
# Load large object again
large_object <- readRDS("large_object.RDS")
Error: vector memory exhausted (limit reached?)
I can't explain why there was enough memory the first time, but not the second.
Note: when the R session is restarted (i.e. .rs.restartR()), readRDS("large_object.RDS") works again
Question
In addition to rm(list=ls()) and gc(), how can more RAM be freed during the current R session, without restarting?
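One diagnostic worth running before reaching for a restart (a sketch, not a guaranteed fix; gc(full = TRUE) needs R >= 3.5.0): compare what R itself reports after a full collection with what the operating system reports for the process.
rm(list = ls(all.names = TRUE))
gc(full = TRUE)    # request a full collection (R >= 3.5.0)
gc()               # the "used" columns show what R still holds internally
# On Linux, cross-check against the OS's view of the R process:
system(paste0("grep VmRSS /proc/", Sys.getpid(), "/status"))
If gc() reports little memory in use but the OS still shows a large process, the space has been freed inside R but not yet returned to the operating system, which would be consistent with only a restart bringing the number back down.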

Increase Available Memory in R on Ubuntu [duplicate]

R's memory.size() is Windows-only. For other functions (such as windows()) the help page points to non-Windows counterparts.
But for memory.size() I could find no such pointer.
So here is my question: is there a function that does the same as memory.size(), but on Linux?
I think that this should be handled by the operating system. There is no built-in limit that I know of; if necessary, R will use all the memory that it can get.
To obtain information on the total and/or on the available memory in linux, you can try
system('grep MemTotal /proc/meminfo')
or
system('free -m')
or
system('lshw -class memory')
The last command will complain that you should run it as super-user and will warn that the output may not be accurate, but in my experience it still provides fairly useful output.
To obtain information on the memory usage of a running R script one could either monitor the currently used resources by starting top in a separate terminal, or use, e.g., the following system call from within the R script:
system(paste0("cat /proc/",Sys.getpid(),"/status | grep VmSize"))
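If you want that value as a number inside R rather than printed to the console, something along these lines should work (a small sketch assuming the Linux /proc layout; vm_size_mb is a made-up helper name):
vm_size_mb <- function() {
  status <- readLines(file.path("/proc", Sys.getpid(), "status"))
  vmsize <- grep("^VmSize:", status, value = TRUE)   # e.g. "VmSize:  123456 kB"
  as.numeric(gsub("[^0-9]", "", vmsize)) / 1024      # kB -> MB
}
vm_size_mb()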
Hope this helps.
Using the pryr library:
library("pryr")
mem_used()
# 27.9 MB
x <- mem_used()
x
# 27.9 MB
class(x)
# [1] "bytes"
The result is the same as in @RHertel's answer, but with pryr we can assign the result to a variable.
system('grep MemTotal /proc/meminfo')
# MemTotal: 263844272 kB
To assign the output of a system call to a variable, use intern = TRUE:
x <- system('grep MemTotal /proc/meminfo', intern = TRUE)
x
# [1] "MemTotal: 263844272 kB"
class(x)
# [1] "character"
Yes, memory.size() and memory.limit() do not work on Linux/Unix.
I can suggest the unix package.
To increase the memory limit on Linux:
install.packages("unix")
library(unix)
rlimit_as(1e12)  # raises the address-space limit to 1e12 bytes (~1 TB)
You can also check the current limits with:
rlimit_all()
For detailed information:
https://rdrr.io/cran/unix/man/rlimit.html
You can also find further information here:
limiting memory usage in R under linux

readr import - could not allocate memory ... in C function 'R_AllocStringBuffer'

I'm having trouble loading a large text file; I'll post the code below. The file is ~65 GB and is pipe-delimited ("|"). I have 10 of them. The process I'll describe below has worked for 9 of the files, but the last file is giving me trouble. Note that about half of the other 9 files are larger than this one, at about 70 GB.
# Libraries I'm using
library(readr)
library(dplyr)
# Function to filter only the results I'm interested in
f <- function(x, pos) filter(x, x[,41] == "CA")
# Reading in the file.
# Note that this has worked for 9/10 files.
tax_history_01 <- read_delim_chunked(
  "Tax_History_148_1708_07.txt",
  DataFrameCallback$new(f),
  col_types = cols(`UNFORMATTED APN` = col_character()),
  chunk_size = 1000000,
  delim = "|"
)
This is the error message I get:
Error: cannot allocate vector of size 81.3 Mb
Error during wrapup: could not allocate memory (47 Mb) in C function 'R_AllocStringBuffer'
If it helps, Windows says the file is 69,413,856,071 bytes and readr is indicating 100% at 66198 MB. I've done some searching and really haven't a clue as to what's going on. I have a small hunch that there could be something wrong with the file (e.g. a missing delimiter).
Edit: Just a small sample of the resources I consulted.
More specifically, what's giving me trouble is "Error during wrapup: ... in C function 'R_AllocStringBuffer'"; I can't find much on this error.
Some of the language in this post has led me to believe that the size limit of a single string has been reached and that there is possibly a parsing error.
R could not allocate memory on ff procedure. How come?
Saw this post and it seemed I was facing a different issue. For me it's not really a calculations issue.
R memory management / cannot allocate vector of size n Mb
I referred to this post regarding cleaning up my work space. Not really an issue within one import but good practice when I ran the script importing all 10.
Cannot allocate vector in R of size 11.8 Gb
Just more topics related to this:
R Memory "Cannot allocate vector of size N"
Found this too but it's no help because of machine restrictions due to data privacy:
https://rpubs.com/msundar/large_data_analysis
Just reading up on general good practices:
http://adv-r.had.co.nz/memory.html
http://stat.ethz.ch/R-manual/R-devel/library/base/html/Memory-limits.html
Look at how wide the files are. If this is a very wide file, then chunk_size = 1000000 could be making this the biggest single chunk that gets read in at one time, even if it's not the biggest file overall.
Also, ensure that you're freeing (rm) the previously read blocks, so that the memory is returned and becomes available again; see the sketch below. If you're relying on overwriting the previous chunk, then you've effectively doubled the memory requirements.
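For instance, when looping over all ten files, the pattern could look roughly like this (a sketch only; the file-naming pattern, the output paths and the smaller chunk_size are assumptions, not taken from the original script):
library(readr)
library(dplyr)
f <- function(x, pos) filter(x, x[[41]] == "CA")
files <- list.files(pattern = "^Tax_History_.*\\.txt$")   # assumed naming scheme
for (path in files) {
  ca_rows <- read_delim_chunked(path, DataFrameCallback$new(f),
                                col_types = cols(`UNFORMATTED APN` = col_character()),
                                chunk_size = 100000,       # smaller chunks for very wide files
                                delim = "|")
  saveRDS(ca_rows, paste0(tools::file_path_sans_ext(path), "_CA.rds"))
  rm(ca_rows)   # explicitly free the previous result...
  gc()          # ...so the memory is available before the next file
}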
I just ran into this error. I went through maxo's links, read the comments, and still found no solution.
It turns out that, in my case, the csv I was reading had been corrupted during the copy (I checked this using an md5sum check, which, in hindsight, I should have done right away).
I'm guessing that, due to the nature of the corrupted data, there was an open quote without its corresponding closing quote, leading to the rest of the file being read in as one very large string. That's my guess.
Anyway, hope this helps someone in the future :-).
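For reference, the integrity check mentioned above can also be run from within R (using the question's file name as an example):
tools::md5sum("Tax_History_148_1708_07.txt")   # compare against the checksum of the source copy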

not all RAM is released after gc() after using ffdf object in R

I am running the script as follows:
library(ff)
library(ffbase)
setwd("D:/My_package/Personal/R/reading")
x <- cbind(rnorm(1:100000000), rnorm(1:100000000), 1:100000000)
system.time(write.csv2(x, "test.csv", row.names = FALSE))
# make an ffdf object with minimal RAM overhead
system.time(x <- read.csv2.ffdf(file = "test.csv", header = TRUE,
                                first.rows = 1000, next.rows = 10000, levels = NULL))
# increase column #1 of the ffdf object 'x' by 5, chunk by chunk
chunk_size <- 100
m <- numeric(chunk_size)
# list of chunks
chunks <- chunk(x, length.out = chunk_size)
# FOR loop to increase column #1 by 5
system.time(
  for (i in seq_along(chunks)) {
    x[chunks[[i]], ][[1]] <- x[chunks[[i]], ][[1]] + 5
  }
)
# output of x
print(x)
# clear RAM used
rm(list = ls(all = TRUE))
gc()
# another option: run the garbage collector explicitly with a reset
gc(reset = TRUE)
The issue is that some RAM remains unreleased even though all objects and functions have been swept away from the current environment.
Moreover, each subsequent run of the script increases the portion of unreleased RAM, as if it were cumulative (according to Task Manager on Windows 7, 64-bit).
However, if I create a non-ffdf object and sweep it away, the output of rm() and gc() is fine.
So my guess is that the unreleased RAM is connected to the specifics of ffdf objects and the ff package.
So the only effective way to clear up the RAM is to quit the current R session and start it again, but that is not very convenient.
I have scanned a bunch of posts about memory cleaning up including this one:
Tricks to manage the available memory in an R session
But I have not found a clear explanation of this situation or an effective way to overcome it (without restarting the R session).
I would be very grateful for your comments.
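One thing that may be worth trying before a full restart (a hedged sketch, not a confirmed fix; it assumes ff's close() and delete() generics apply to the ffdf created above and that the mapped pages are what Task Manager is counting):
close(x)                   # unmap the ff files backing the ffdf
delete(x)                  # remove the temporary backing files from disk
rm(list = ls(all = TRUE))
gc()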

Why does ff still store data in RAM?

Using the ff package in R, I imported a csv file into an ffdf object, but was surprised to find that the object occupied some 700 MB of RAM. Isn't ff supposed to keep data on disk rather than in RAM? Did I do something wrong? I am a novice in R. Any advice is appreciated. Thanks.
> training.ffdf <- read.csv.ffdf(file="c:/temp/training.csv", header=T)
> # [Edit: the csv file is conceptually a large data frame consisting
> # of heterogeneous types of data --- some integers and some character
> # strings.]
>
> # The ffdf object occupies 718MB!!!
> object.size(training.ffdf)
753193048 bytes
Warning messages:
1: In structure(.Internal(object.size(x)), class = "object_size") :
Reached total allocation of 1535Mb: see help(memory.size)
2: In structure(.Internal(object.size(x)), class = "object_size") :
Reached total allocation of 1535Mb: see help(memory.size)
>
> # Shouldn't biglm be able to process data in small chunks?!
> fit <- biglm(y ~ as.factor(x), data=training.ffdf)
Error: cannot allocate vector of size 18.5 Mb
Edit: I followed Tommy's advice, omitted the object.size call, and looked at Task Manager instead (I ran R on a Windows XP machine with 4 GB of RAM). I saved the object with ffsave, closed R, reopened it, and loaded the data from file. The problem persisted:
> library(ff); library(biglm)
> # At this point RGui.exe had used up 26176 KB of memory
> ffload(file="c:/temp/trainingffimg")
> # Now 701160 KB
> fit <- biglm(y ~ as.factor(x), data=training.ffdf)
Error: cannot allocate vector of size 18.5 Mb
I have also tried
> options("ffmaxbytes" = 402653184) # default = 804782080 B ~ 767.5 MB
but after loading the data, RGui still used up more than 700MB of memory and the biglm regression still issued an error.
You need to provide the data in chunks to biglm, see ?biglm.
If you pass a ffdf object instead of a data.frame, you run into one of the following two problems:
1. ffdf is not a data.frame, so something undefined happens.
2. The function you pass it to tries to convert the ffdf to a data.frame via, e.g., as.data.frame(ffdf), which easily exhausts your RAM; this is likely what happened to you.
Check ?chunk.ffdf for an example of how to pass chunks from ffdf to biglm.
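A hedged sketch of that chunked pattern (the column names y and x are placeholders, as in the question): fit biglm on the first chunk, then update() it with the remaining chunks.
library(ff)
library(ffbase)
library(biglm)
idx <- chunk(training.ffdf)                       # list of row ranges covering the ffdf
fit <- biglm(y ~ as.factor(x), data = training.ffdf[idx[[1]], ])
for (i in idx[-1]) {
  fit <- update(fit, training.ffdf[i, ])          # each chunk arrives as an ordinary data.frame
}
summary(fit)
Note that as.factor(x) must see the same set of levels in every chunk; with an ffdf this usually holds, because the factor levels are stored once for the whole column.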
The ff package uses memory mapping to just load parts of the data into memory as needed.
But it seems that by calling object.size, you actually force loading the whole thing into memory! That's what the warning messages seem to indicate...
So don't do that... Use Task Manager (Windows) or the top command (Linux) to see how much memory the R process actually uses before and after you've loaded the data.
I had the same problem and posted a question about it, and there is a possible explanation for your issue.
When you read a file, character columns are treated as factors, and if there are a lot of unique levels, those levels go into RAM. ff seems to always keep factor levels in RAM. See this answer from jwijffels to my question:
Loading ffdf data take a lot of memory
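If you want to check whether this applies to your file, a small sketch along these lines (assuming levels() works on the underlying ff vectors, which it should for factor columns) counts how many levels are held in RAM per column:
sapply(physical(training.ffdf), function(col) length(levels(col)))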
best,
miguel.

Resources