R memory puzzle on ECDF environments

I have a massive list of ECDF objects.
Similar to:
vals <- rnorm(10000)
x <- ecdf(vals)
ecdfList <- lapply(1:10000, function(i) ecdf(vals))
save(ecdfList, file='mylist.rda')
class(ecdfList[[1]])
[1] "ecdf" "stepfun" "function"
Let's quit the R session and start fresh.
q()
R   # back at the shell; I'm on a server running Ubuntu, R 3.4.4
Now, the problem is, when starting with a fresh env,
loading and deleting the ecdfList doesn't free the memory.
load('mylist.rda')
rm(ecdfList)
gc()
top and free still show the memory as being used by R.
So I thought I would be clever and load them into a new environment.
e = new.env()
load('mylist.rda', envir=e)
rm(e)
gc()
But, same thing happens. top and free still show the memory as being used.
Where are those ecdf objects? How can I safely remove that list of ecdfs from memory?
Maybe the memory is just being held by R, just in case? This doesn't happen with other data objects.
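To see what one of these ecdf objects actually carries, here is a quick inspection sketch (assuming ecdfList is loaded; e1 is just a throwaway name, and note that, per ?object.size, a function's environment is not counted in its reported size):
e1 <- ecdfList[[1]]
ls(environment(e1))   # the closure's environment holds the data captured by approxfun()
object.size(e1)       # looks small: the enclosing environment is not included
gc()                  # R's own accounting (Ncells/Vcells), which need not match top/free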
Here's an example of watching the memory with 'free'.
From RStudio, I'll create a list of vectors and then release them, checking the memory used before and after.
dave@yoga:~$ free
              total        used        free      shared
Mem:       16166812     1548680    11725452      932416
Then make a list of vectors.
x <- lapply(1:10000, function(a) rnorm(n=10000))
Then check the free memory.
davidgibbs@gibbs-yoga:~$ free
              total        used        free      shared
Mem:       16166812     2330068    10954372      921956
From within RStudio, rm the vectors.
rm(x)
gc()
Check the memory again:
davidgibbs@gibbs-yoga:~$ free
              total        used        free      shared
Mem:       16166812     1523252    11750620      932528
OK, so the memory is returned.
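For a view from inside R rather than from the shell, the same before/after check can be done with gc(); a rough sketch (10000 vectors of 10000 doubles is roughly 800 MB, which matches the change seen in free above):
gc(reset = TRUE)                                    # reset the "max used" counters
x <- lapply(1:10000, function(a) rnorm(n = 10000))  # ~800 MB of doubles (8 bytes x 1e8)
gc()                                                # Vcells "used" jumps accordingly
rm(x)
gc()                                                # and drops back after collection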
Now we'll try it with a list of ECDFs.
# the ecdf list was already saved as above
First, check the memory before loading anything:
dave@yoga:~$ free
              total        used        free      shared
Mem:       16166812     1752808    10213168     1166136
e <- new.env()
load('mylist.rda', envir = e)
And we'll check the memory
dave@yoga:~$ free
              total        used        free      shared
Mem:       16166812     3365536     8667616     1096236
Now we'll rm that env.
rm(e)
gc()
Final memory check.
dave@yoga:~$ free
              total        used        free
Mem:       16166812     3321584     8726964
And the memory is still shown as in use until we restart R.
Thank you!!
-dave

Related

Freeing all RAM in R session without restarting R session?

Is there a way to clear more RAM than rm(list=ls()); gc()?
I expected garbage collection (i.e. gc()) to clear all RAM back to the level of RAM that was being used when the R session began, however, I have observed the following on a laptop with 16gb RAM:
# Load a large object
large_object <- readRDS("large_object.RDS")
object.size(large_object)
13899229872 bytes # i.e. ~14 gig
# Clear everything
rm(list=ls(all=T)); gc()
# Load large object again
large_object <- readRDS("large_object.RDS")
Error: vector memory exhausted (limit reached?)
I can't explain why there was enough memory the first time, but not the second.
Note: when the R session is restarted (i.e. .rs.restartR()), readRDS("large_object.RDS") works again
Question
In addition to rm(list=ls()) and gc(), how can more RAM be freed during the current R session, without restarting?
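A sketch of the fullest clean-up I know of (assuming R >= 3.5.0 for the full argument to gc()); whether this actually returns memory to the OS is exactly what I'm unsure about:
rm(list = ls(all.names = TRUE))  # all.names = TRUE also removes dot-prefixed objects
gc(full = TRUE)                  # explicitly request a full collection
gc()                             # the "used" columns show what R itself still holds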

Garbage Collection in R

I started to use gc() for garbage collection in R. I have 16 GB RAM and sometimes, up to 10 GB RAM gets freed when using this command.
Does it make sense to use gc() inside functions? Often, the functions I write/use need almost all RAM that is available. Or does R reliably clean up memory that was used only inside a function?
Example:
f <- function(x) {
  # do something
  y <- doStuff(x)
  # do something else
  z <- doMoreStuff(y)
  # garbage collection
  gc()
  # return result
  return(z)
}
Calling gc() is largely pointless, as R calls it automatically when more memory is needed. The only reason I can think of for calling gc() explicitly is if another program needs memory that R is hogging.
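To illustrate: once f() returns, its locals are unreachable and R collects them on its own when it needs the space, so the explicit gc() buys nothing. A rough sketch, with the doStuff()/doMoreStuff() placeholders replaced by a throwaway allocation:
f <- function(n) {
  y <- rnorm(n)   # large temporary, referenced only inside f()
  sum(y)          # return a small scalar; y becomes unreachable when f() returns
}
gc(reset = TRUE)  # reset the "max used" counters
res <- f(1e8)     # roughly 800 MB allocated during the call
gc()              # "used" is back down afterwards; "max used" shows the peak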

not all RAM is released after gc() after using ffdf object in R

I am running the script as follows:
library(ff)
library(ffbase)
setwd("D:/My_package/Personal/R/reading")
x<-cbind(rnorm(1:100000000),rnorm(1:100000000),1:100000000)
system.time(write.csv2(x,"test.csv",row.names=FALSE))
#make ffdf object with minimal RAM overheads
system.time(x <- read.csv2.ffdf(file="test.csv", header=TRUE, first.rows=1000, next.rows=10000,levels=NULL))
# increase column #1 of the ffdf object 'x' by 5, chunk by chunk
chunk_size<-100
m<-numeric(chunk_size)
#list of chunks
chunks <- chunk(x, length.out=chunk_size)
#FOR loop to increase column#1 by 5
system.time(
  for (i in seq_along(chunks)) {
    x[chunks[[i]], ][[1]] <- x[chunks[[i]], ][[1]] + 5
  }
)
# output of x
print(x)
#clear RAM used
rm(list = ls(all = TRUE))
gc()
#another option to run garbage collector explicitly.
gc(reset=TRUE)
The issue is that some RAM is still not released, even though all objects and functions have been swept out of the current environment.
Moreover, each subsequent run of the script increases the portion of unreleased RAM, as if it were cumulative (according to Task Manager in Win7 64-bit).
However, if I make a non-ffdf object and sweep it away, the output of rm() and gc() is fine.
So my guess is that the unreleased RAM is connected with the specifics of ffdf objects and the ff package.
The only effective way to clear up the RAM seems to be to quit the current R session and start it again, but that is not very convenient.
I have scanned a bunch of posts about memory clean-up, including this one:
Tricks to manage the available memory in an R session
But I have not found a clear explanation of this situation or an effective way to overcome it (without restarting the R session).
I would be very grateful for your comments.
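For reference, here is a sketch of explicitly deleting the ff backing files before rm(); I am assuming that delete() and physical() from the ff package behave as documented (please check ?delete and ?physical.ffdf in your ff version):
for (col in physical(x)) {   # physical() lists the ff objects behind the ffdf columns
  delete(col)                # delete() removes the temporary backing file of an ff object
}
rm(x)
gc()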

R data.table Size and Memory Limits

I have a 15.4GB R data.table object with 29 Million records and 135 variables. My system & R info are as follows:
Windows 7 x64 on an x86_64 machine with 16GB RAM, "R version 3.1.1 (2014-07-10)" on "x86_64-w64-mingw32".
I get the following memory allocation error (see image):
I set my memory limits as follows:
#memory.limit(size=7000000)
#Change memory.limit to 40GB when using ff library
memory.limit(size=40000)
My questions are the following:
Should I change the memory limit to 7 TB?
Break the file into chunks and process it that way?
Any other suggestions?
Try to profile your code to identify which statements cause the "waste of RAM":
# install.packages("pryr")
library(pryr) # for memory debugging
memory.size(max = TRUE) # print max memory used so far (works only with MS Windows!)
mem_used()
gc(verbose=TRUE) # show internal memory stuff (see help for more)
# start profiling your code
Rprof(pfile <- "rprof.log", memory.profiling=TRUE) # start profiling memory consumption to a log file
# !!! Your code goes here
# Print memory statistics within your code wherever you think it is sensible
memory.size(max = TRUE)
mem_used()
gc(verbose=TRUE)
# stop profiling your code
Rprof(NULL)
summaryRprof(pfile,memory="both") # show the memory consumption profile
Then evaluate the memory consumption profile...
Since your code stops with an "out of memory" exception, you should reduce the input data to an amount that makes your code workable and use this input for memory profiling...
You could try to use the ff package. It works well with on disk data.
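On the chunking idea from the question: if the data originally come from a CSV, a plain connection lets you process the file piece by piece, so only one chunk plus a running result sits in RAM at a time. A minimal base R sketch; "big_table.csv" and the per-chunk colSums() are stand-ins for the real file and the real processing:
con <- file("big_table.csv", open = "r")
header <- readLines(con, n = 1)            # keep the header to reuse for every chunk
chunk_rows <- 1e6
totals <- NULL
repeat {
  lines <- readLines(con, n = chunk_rows)
  if (length(lines) == 0) break
  chunk <- read.csv(text = c(header, lines))
  num <- sapply(chunk, is.numeric)
  totals <- rbind(totals, colSums(chunk[, num, drop = FALSE]))
  rm(chunk, lines)                         # keep only the running summary in memory
}
close(con)
colSums(totals)                            # combine the per-chunk summaries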

Memory issue in R

I know there are lots of memory questions about R, but why can it sometimes find room for an object and other times it can't? For instance, I'm running 64-bit R on Linux, on an interactive node with 15 GB of memory. My workspace is almost empty:
dat <- lsos()
dat$PrettySize
[1] "87.5 Kb" "61.8 Kb" "18.4 Kb" "9.1 Kb" "1.8 Kb" "1.4 Kb" "48 bytes"
The first time I start R after cd'ing into the desired directory, I can load the RData file fine. But then sometimes I need to reload it and I get the usual:
> load("PATH/matrix.RData")
Error: cannot allocate vector of size 2.9 Gb
If I can load it once, and there's enough (I assume contiguous) room, then what's going on? Am I missing something obvious?
The basic answer is that the memory allocation routine needs to find contiguous memory for the construction of objects (both permanent and temporary), and other processes (the R process or others) may have fragmented the available space. R will not delete an object that is being overwritten until the load process is completed, so even though you think you are laying new data on top of old data, you are not.
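In practice that suggests removing the old copy before re-loading, so R never needs to hold both at once; a small sketch, where my_matrix is a stand-in name for whatever object matrix.RData actually contains:
rm(my_matrix)                # drop the existing copy first (stand-in name)
gc()                         # hand the space back before asking for 2.9 Gb again
load("PATH/matrix.RData")    # now only one copy needs to fit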
