R's memory.size() is Windows-only. For other functions (such as windows()) the help page gives pointers to non-Windows counterparts, but for memory.size() I could find no such pointers.
So here is my question: is there a function that does the same as memory.size(), but on Linux?
I think that this should be handled by the operating system. There is no built-in limit that I know of; if necessary, R will use all the memory that it can get.
To obtain information on the total and/or available memory on Linux, you can try
system('grep MemTotal /proc/meminfo')
or
system('free -m')
or
system('lshw -class memory')
The last command will complain that you should run it as super-user, and it will give a warning that the output may not be accurate; but in my experience it still provides fairly useful output.
To obtain information on the memory usage of a running R script one could either monitor the currently used resources by starting top in a separate terminal, or use, e.g., the following system call from within the R script:
system(paste0("cat /proc/",Sys.getpid(),"/status | grep VmSize"))
Hope this helps.
Using the pryr library:
library("pryr")
mem_used()
# 27.9 MB
x <- mem_used()
x
# 27.9 MB
class(x)
# [1] "bytes"
The result is the same as in @RHertel's answer, but with pryr we can assign the result to a variable.
system('grep MemTotal /proc/meminfo')
# MemTotal: 263844272 kB
To assign the output of a system call to a variable, use intern = TRUE:
x <- system('grep MemTotal /proc/meminfo', intern = TRUE)
x
# [1] "MemTotal: 263844272 kB"
class(x)
# [1] "character"
Yes, memory.size() and memory.limit() do not work on Linux/Unix.
I can suggest the unix package.
To increase the memory limit on Linux:
install.packages("unix")
library(unix)
rlimit_as(1e12) # raises the address-space limit to ~1 TB (the value is in bytes)
You can also check the current limits with:
rlimit_all()
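A minimal usage sketch, assuming rlimit_as() reports the soft (cur) and hard (max) limits in bytes, as the package documentation describes:
library(unix)
lim <- rlimit_as()
lim$cur # current soft limit in bytes
rlimit_as(cur = 8e9) # raise the soft limit to ~8 GB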
For detailed information:
https://rdrr.io/cran/unix/man/rlimit.html
You can also find further info here:
limiting memory usage in R under linux
I am running the script as follows:
library(ff)
library(ffbase)
setwd("D:/My_package/Personal/R/reading")
x<-cbind(rnorm(1:100000000),rnorm(1:100000000),1:100000000)
system.time(write.csv2(x,"test.csv",row.names=FALSE))
#make ffdf object with minimal RAM overheads
system.time(x <- read.csv2.ffdf(file="test.csv", header=TRUE, first.rows=1000, next.rows=10000,levels=NULL))
#make increase by 5 of the column#1 of ffdf object 'x' by the chunk approach
chunk_size<-100
m<-numeric(chunk_size)
#list of chunks
chunks <- chunk(x, length.out=chunk_size)
#FOR loop to increase column#1 by 5
system.time(
for(i in seq_along(chunks)){
x[chunks[[i]],][[1]]<-x[chunks[[i]],][[1]]+5
}
)
# output of x
print(x)
#clear RAM used
rm(list = ls(all = TRUE))
gc()
#another option: run the garbage collector explicitly and reset the max-used statistics
gc(reset=TRUE)
The issue is that some RAM remains unreleased even though all objects and functions have been swept away from the current environment.
Moreover, the next run of the script increases the portion of unreleased RAM, as if it were cumulative (according to Task Manager in Win7 64-bit).
However, if I make a non-ffdf object and sweep it away, rm() and gc() release the memory as expected.
So my guess is that the unreleased RAM is connected with the specifics of ffdf objects and the ff package.
So the effective way to clear up RAM is to quit the current R session and start it again, but that is not very convenient.
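For reference, ff provides close() and delete() for the files backing its objects; a minimal sketch of the cleanup one might try before rm(), though I have not verified that it releases the RAM in question:
close(x) # unmap the ff files behind the ffdf
delete(x) # remove the backing files (if no ffdf method exists, loop over physical(x))
rm(x); gc()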
I have scanned a bunch of posts about memory cleanup, including this one:
Tricks to manage the available memory in an R session
But I have not found a clear explanation of this situation or an effective way to overcome it (without restarting the R session).
I would be very grateful for your comments.
I am working on a job in which a temporary hash table is used repeatedly through a loop. The hash table is represented by an environment variable in R. The problem is that as the loop proceeds the memory cost keeps rising no matter what method I use to delete the table (I tried rm() and gc(), but neither was able to free the memory). As a consequence I cannot accomplish an extraordinarily long loop, say 10M cycles. It looks like a memory leak, but I have failed to find a solution elsewhere.
What is the correct way to completely remove an environment variable and simultaneously release all the memory it previously occupied? Thanks in advance for helping me check the problem.
Here is a very simple example. I am using Windows 8 and R version 3.1.0.
> fun = function(){
H = new.env()
for(i in rnorm(100000)){
H[[as.character(i)]] = rnorm(100)
}
rm(list=names(H), envir=H, inherits=FALSE)
rm(H)
gc()
}
>
> for(k in 1:5){
print(k)
fun()
gc()
print(memory.size(F))
}
[1] 1
[1] 40.43
[1] 2
[1] 65.34
[1] 3
[1] 82.56
[1] 4
[1] 100.22
[1] 5
[1] 120.36
Environments in R are not a good choice for situations where the keys can vary a lot during the computation. The reason is that environments require keys to be symbols, and symbols are not garbage collected. So each run of your function is adding to the internal symbol table. Arranging for symbols to be garbage collected would be one possibility, though care would be needed since a lot of internals code assumes they are not. Another option would be to create better hash table support so environments don't have to try to serve this purpose for which they were not originally designed.
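As a workaround, a hash structure that keeps its keys as plain strings avoids growing the symbol table. A minimal sketch using the fastmap CRAN package, which was written specifically to avoid this leak (its API is assumed from the package documentation; it is not part of base R):
library(fastmap)
fun2 = function(){
H = fastmap() # keys are stored as plain strings, not interned as symbols
for(i in rnorm(100000)){
H$set(as.character(i), rnorm(100))
}
H$reset() # drop all entries; the memory is then collectable by gc()
}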
I have a 15.4GB R data.table object with 29 Million records and 135 variables. My system & R info are as follows:
Windows 7 x64 on an x86_64 machine with 16 GB RAM. "R version 3.1.1 (2014-07-10)" on "x86_64-w64-mingw32".
I get the following memory allocation error (see image)
I set my memory limits as follows:
#memory.limit(size=7000000)
#Change memory.limit to 40GB when using ff library
memory.limit(size=40000)
My questions are the following:
Should I change the memory limit to 7 TB?
Should I break the file into chunks and process them that way?
Any other suggestions?
Try to profile your code to identify which statements cause the "waste of RAM":
# install.packages("pryr")
library(pryr) # for memory debugging
memory.size(max = TRUE) # print max memory used so far (works only with MS Windows!)
mem_used()
gc(verbose=TRUE) # show internal memory stuff (see help for more)
# start profiling your code
Rprof( pfile <- "rprof.log", memory.profiling=TRUE) # uncomment to profile the memory consumption
# !!! Your code goes here
# Print memory statistics within your code wherever you think it is sensible
memory.size(max = TRUE)
mem_used()
gc(verbose=TRUE)
# stop profiling your code
Rprof(NULL)
summaryRprof(pfile,memory="both") # show the memory consumption profile
Then evaluate the memory consumption profile...
Since your code stops with an "out of memory" exception, you should reduce the input data to an amount that makes your code workable, and use this reduced input for memory profiling...
You could try the ff package. It works well with on-disk data.
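A minimal sketch of a chunked import with ff; the file name big.csv is hypothetical, and first.rows/next.rows control how many rows are read per pass:
library(ff)
dat <- read.csv.ffdf(file = "big.csv", header = TRUE, first.rows = 10000, next.rows = 100000)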
I know there are lots of memory questions about R, but why can it sometimes find room for an object and other times not? For instance, I'm running 64-bit R on Linux, on an interactive node with 15 GB of memory. My workspace is almost empty:
dat <- lsos()
dat$PrettySize
[1] "87.5 Kb" "61.8 Kb" "18.4 Kb" "9.1 Kb" "1.8 Kb" "1.4 Kb" "48 bytes"
The first time I load R after cd-ing into the desired directory, I can load the .RData file fine. But then sometimes I need to reload it and I get the usual:
> load("PATH/matrix.RData")
Error: cannot allocate vector of size 2.9 Gb
If I can load it once, and there's enough (I assume contiguous) room, then what's going on? Am I missing something obvious?
The basic answer is that the memory allocation function needs to find contiguous memory for the construction of objects (both permanent and temporary), and other processes (the R process or others) may have fragmented the available space. R will not delete an object that is being overwritten until the load process has completed, so even though you think you may be laying new data on top of old data, you are not.
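A minimal sketch of the workaround this implies; the name my_matrix is hypothetical and stands for whatever object the .RData file contains:
rm(my_matrix) # drop the old copy first so old and new need not coexist
gc() # return the freed pages to the allocator
load("PATH/matrix.RData") # now only one 2.9 Gb allocation is needed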
Using the ff package of R, I imported a csv file into an ffdf object, but was surprised to find that the object occupied some 700 MB of RAM. Isn't ff supposed to keep data on disk rather than in RAM? Did I do something wrong? I am a novice in R. Any advice is appreciated. Thanks.
> training.ffdf <- read.csv.ffdf(file="c:/temp/training.csv", header=T)
> # [Edit: the csv file is conceptually a large data frame consisting
> # of heterogeneous types of data --- some integers and some character
> # strings.]
>
> # The ffdf object occupies 718MB!!!
> object.size(training.ffdf)
753193048 bytes
Warning messages:
1: In structure(.Internal(object.size(x)), class = "object_size") :
Reached total allocation of 1535Mb: see help(memory.size)
2: In structure(.Internal(object.size(x)), class = "object_size") :
Reached total allocation of 1535Mb: see help(memory.size)
>
> # Shouldn't biglm be able to process data in small chunks?!
> fit <- biglm(y ~ as.factor(x), data=training.ffdf)
Error: cannot allocate vector of size 18.5 Mb
Edit: I followed the advice of Tommy, omitted the object.size call and looked at Task Manager instead (I ran R on a Windows XP machine with 4 GB of RAM). I saved the object with ffsave, closed R, reopened it, and loaded the data from file. The problem persisted:
> library(ff); library(biglm)
> # At this point RGui.exe had used up 26176 KB of memory
> ffload(file="c:/temp/trainingffimg")
> # Now 701160 KB
> fit <- biglm(y ~ as.factor(x), data=training.ffdf)
Error: cannot allocate vector of size 18.5 Mb
I have also tried
> options("ffmaxbytes" = 402653184) # default = 804782080 B ~ 767.5 MB
but after loading the data, RGui still used up more than 700MB of memory and the biglm regression still issued an error.
You need to provide the data in chunks to biglm, see ?biglm.
If you pass a ffdf object instead of a data.frame, you run into one of the following two problems:
ffdf is not a data.frame, so something undefined happens
the function you passed it to tries to convert the ffdf to a data.frame via, e.g., as.data.frame(ffdf), which easily exhausts your RAM; this is likely what happened to you
Check ?chunk.ffdf for an example of how to pass chunks from ffdf to biglm.
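A minimal sketch of that chunked pattern, assuming y and x are columns of training.ffdf (see ?chunk.ffdf for the authoritative example):
library(ff); library(biglm)
ch <- chunk(training.ffdf) # list of row ranges covering the ffdf
fit <- biglm(y ~ as.factor(x), data = training.ffdf[ch[[1]], ])
for (i in ch[-1]) {
fit <- update(fit, training.ffdf[i, ]) # feed the remaining chunks one by one
}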
The ff package uses memory mapping to load only the parts of the data that are currently needed into memory.
But it seems that by calling object.size, you actually force the whole thing to be loaded into memory! That's what the warning messages seem to indicate...
So don't do that... Use Task Manager (Windows) or the top command (Linux) to see how much memory the R process actually uses before and after you've loaded the data.
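From within R on Linux, a minimal sketch of the same check (standard procps flags; prints the resident set size in kB):
system(paste("ps -o rss= -p", Sys.getpid()))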
I had the same problem, and posted a question, and there is a possible explanation for your issue.
When you read a file, character columns are treated as factors, and if there are a lot of unique levels, they will go into RAM. ff seems to always load factor levels into RAM. See this answer from jwijffels to my question:
Loading ffdf data take a lot of memory
best,
miguel.