I have loaded a fairly large set of data using data.table. I then want to add around 30 columns using instructions of the form:
DT[, x5:=cumsum(y1), by=list(x1, x2)]
DT[, x6:=cummean(y2), by=x1]
At some point I start to get "warnings" like this:
1: In structure(.Call(C_objectSize, x), class = "object_size") :
Reached total allocation of 8072Mb: see help(memory.size)
I check tracemem(DT) every now and then to ensure that no copies are made. The only output I ever get is:
"<0000000005E8E700>"
I also check ls() to see which objects exist and object.size() to see how much RAM the object takes up. The only object listed by ls() is my data.table, and its size after the first warning is 5303.1 Mb.
I am on a 64-bit Windows machine running 64-bit R with 8 GB of RAM. Only 80% of the 8 GB is in use when I get the warning, and of that R is using 5214.0 Mb (strange, since the table is bigger than this).
My question is: if R is only using 5303.1 Mb and I still have around 2 GB of free memory, why do I get the warning that R has reached the 8 GB limit, and is there anything I can do about it? If not, what are the other options? I know I could use bigmemory, but then I would have to rewrite my whole code and would lose the sweet by-reference modifications which data.table offers.
The problem is that the operations require RAM beyond what the object itself takes up. You could verify that Windows is using a page file, and if it is, try increasing its size. http://windows.microsoft.com/en-us/windows/change-virtual-memory-size
If that fails, you could try running a live environment of Lubuntu Linux to see whether its memory overhead is small enough to allow the operation. http://lubuntu.net/
Ultimately, I suspect you're going to have to use bigmemory or similar.
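In the meantime, it can help to see which of the grouped := updates pushes memory over the edge. Below is a minimal sketch with made-up data (reusing the column names from the question) that runs the updates one at a time and forces a garbage collection in between; the cummean line is rewritten as a plain data.table expression so no extra package is needed.
library(data.table)

## toy data standing in for the real table
DT <- data.table(x1 = sample(10, 1e6, TRUE),
                 x2 = sample(10, 1e6, TRUE),
                 y1 = rnorm(1e6),
                 y2 = rnorm(1e6))

DT[, x5 := cumsum(y1), by = .(x1, x2)]
print(gc())                                    # current and peak memory after this update

DT[, x6 := cumsum(y2) / seq_len(.N), by = x1]  # cumulative mean without dplyr
print(gc())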
Related
I am running into issues trying to use large objects in R. For example:
> memory.limit(4000)
> a = matrix(NA, 1500000, 60)
> a = matrix(NA, 2500000, 60)
> a = matrix(NA, 3500000, 60)
Error: cannot allocate vector of size 801.1 Mb
> a = matrix(NA, 2500000, 60)
Error: cannot allocate vector of size 572.2 Mb # Can't go smaller anymore
> rm(list=ls(all=TRUE))
> a = matrix(NA, 3500000, 60) # Now it works
> b = matrix(NA, 3500000, 60)
Error: cannot allocate vector of size 801.1 Mb # But that is all there is room for
I understand that this is related to the difficulty of obtaining contiguous blocks of memory (from here):
Error messages beginning cannot allocate vector of size indicate a failure to obtain memory, either because the size exceeded the address-space limit for a process or, more likely, because the system was unable to provide the memory. Note that on a 32-bit build there may well be enough free memory available, but not a large enough contiguous block of address space into which to map it.
How can I get around this? My main difficulty is that I get to a certain point in my script and R can't allocate 200-300 Mb for an object... I can't really pre-allocate the block because I need the memory for other processing. This happens even when I diligently remove unneeded objects.
EDIT: Yes, sorry: Windows XP SP3, 4 GB RAM, R 2.12.0:
> sessionInfo()
R version 2.12.0 (2010-10-15)
Platform: i386-pc-mingw32/i386 (32-bit)
locale:
[1] LC_COLLATE=English_Caribbean.1252 LC_CTYPE=English_Caribbean.1252
[3] LC_MONETARY=English_Caribbean.1252 LC_NUMERIC=C
[5] LC_TIME=English_Caribbean.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
Consider whether you really need all this data explicitly, or whether the matrix can be sparse. There is good support in R for sparse matrices (see, e.g., the Matrix package).
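As a small illustration (made-up data, not the matrices from the question), here is roughly how much a sparse representation can save when most entries are zero:
library(Matrix)

m <- matrix(0, nrow = 1e5, ncol = 60)
m[sample(length(m), 1000)] <- 1       # only 1,000 non-zero entries
sm <- Matrix(m, sparse = TRUE)        # dgCMatrix: stores the non-zeros only

object.size(m)    # ~48 MB for the dense matrix
object.size(sm)   # a few dozen KB for the sparse version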
Keep all other processes and objects in R to a minimum when you need to make objects of this size. Use gc() to clear now-unused memory or, better, only create the object you need in one session.
If the above cannot help, get a 64-bit machine with as much RAM as you can afford, and install 64-bit R.
If you cannot do that there are many online services for remote computing.
If you cannot do that, memory-mapping tools like the ff package (or bigmemory, as Sascha mentions) will help you build a new solution. In my limited experience ff is the more advanced package, but you should read the High Performance Computing topic on CRAN Task Views.
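For example, a rough sketch of a file-backed matrix with ff (the file name here is just an example); the data live in a roughly 1.7 GB file on disk rather than in RAM, and you read or write pieces of it by indexing:
library(ff)

x <- ff(vmode = "double", dim = c(3500000, 60),
        filename = "big_matrix.ff")    # backed by a file, not RAM
x[1, 1:5] <- rnorm(5)                  # assign and read chunks as needed
x[1, 1:5]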
For Windows users, the following helped me a lot to understand some memory limitations:
before opening R, open the Windows Resource Monitor (Ctrl-Alt-Delete / Start Task Manager / Performance tab / click on bottom button 'Resource Monitor' / Memory tab)
you will see how much RAM is already used before you open R, and by which applications. In my case, 1.6 GB of the total 4 GB is used, so I will only be able to get 2.4 GB for R; but now comes the worst part...
open R and create a data set of 1.5 GB, then reduce its size to 0.5 GB; the Resource Monitor shows my RAM usage at nearly 95%.
use gc() to do garbage collection => it works: I can see the memory use go down to 2 GB
Additional advice that works on my machine:
prepare the features, save them as an RData file, close R, re-open R, and load the train features. The Resource Monitor typically shows lower memory usage, which means that even gc() does not recover all possible memory, and closing/re-opening R works best to start with the maximum memory available.
the other trick is to load only the train set for training (do not load the test set, which can typically be half the size of the train set). The training phase can use memory to the maximum (100%), so anything available is useful. Take all of this with a grain of salt, as I am still experimenting with R's memory limits.
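A hypothetical sketch of that save / restart / reload workflow (train_features is just a placeholder name):
train_features <- data.frame(x = rnorm(1e6), y = rnorm(1e6))   # prepared features
saveRDS(train_features, "train_features.rds")

## ...close R, start a fresh session...

train_features <- readRDS("train_features.rds")   # load only the training data; leave the test set on disk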
I went to the help page for memory.limit and found out that on my computer R can by default use up to ~1.5 GB of RAM, and that the user can increase this limit. Using the following code,
> memory.limit()
[1] 1535.875
> memory.limit(size=1800)
helped me to solve my problem.
Here is a presentation on this topic that you might find interesting:
http://www.bytemining.com/2010/08/taking-r-to-the-limit-part-ii-large-datasets-in-r/
I haven't tried the things discussed here myself, but the bigmemory package seems very useful.
The simplest way to sidestep this limitation is to switch to 64-bit R.
I encountered a similar problem, and I used two flash drives as 'ReadyBoost'. The two drives gave an additional 8 GB of memory (for cache), which solved the problem and also increased the speed of the system as a whole.
To use ReadyBoost, right-click the drive, go to Properties, select the 'ReadyBoost' tab, choose the 'Use this device' radio button, and click Apply or OK.
If you are running your script in a Linux environment on a cluster that uses the LSF scheduler (which provides bsub), you can use this command:
bsub -q server_name -R "rusage[mem=requested_memory]" "Rscript script_name.R"
and the server will allocate the requested memory to you (subject to the server's limits, but on a well-provisioned server huge files can be used).
One option is to run a "garbage collection" with the gc() command before and after the command that causes high memory consumption; this will free up memory for your analyses, in addition to using the memory.limit() command.
Example:
gc()
memory.limit(9999999999)
fit <- lm(Y ~ X)
gc()
The save/load method mentioned above works for me. I am not sure how, or whether, gc() defragments memory, but this seems to work.
# defrag memory
save.image(file="temp.RData")
rm(list=ls())
load(file="temp.RData")
I would like to increase (or decrease) the amount of memory available to R. What are the methods for achieving this?
From:
http://gking.harvard.edu/zelig/docs/How_do_I2.html (mirror)
Windows users may get the error that R has run out of memory.
If you have R already installed and subsequently install more RAM, you may have to reinstall R in order to take advantage of the additional capacity.
You may also set the amount of available memory manually. Close R, then right-click on your R program icon (the icon on your desktop or in your programs directory). Select "Properties", and then select the "Shortcut" tab. Look for the "Target" field and, after the closing quotes around the location of the R executable, add
--max-mem-size=500M
as shown in the figure on the linked page. You may increase this value up to 2 GB or the maximum amount of physical RAM you have installed.
If you get the error that R cannot allocate a vector of length x, close out of R and add the following line to the "Target" field:
--max-vsize=500M
or as appropriate. You can always check to see how much memory R has available by typing at the R prompt
memory.limit()
which gives you the amount of available memory in MB. In previous versions of R you needed to use: round(memory.limit()/2^20, 2).
Use memory.limit(). You can increase the default with this command: memory.limit(size=2500), where size is in MB. You need to be using 64-bit R to take real advantage of this.
One other suggestion is to use memory-efficient objects wherever possible: for instance, use a matrix instead of a data.frame.
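A quick check with made-up data; for purely numeric data the gap is mostly per-column and attribute overhead, while mixed types and row names cost more:
m  <- matrix(rnorm(1e6 * 5), ncol = 5)
df <- as.data.frame(m)

object.size(m)    # one contiguous numeric block plus a small header
object.size(df)   # the same values split into columns, plus names and other attributes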
For Linux/Unix, I can suggest the unix package.
To increase the memory limit on Linux:
install.packages("unix")
library(unix)
rlimit_as(1e12)  # raises the address-space limit to 1e12 bytes (~1 TB)
You can also check the memory with this:
rlimit_all()
For detailed information:
https://rdrr.io/cran/unix/man/rlimit.html
You can also find further info here:
limiting memory usage in R under linux
Microsoft Windows will grant any memory request from a process if it can.
There is no limit on the memory that can be provided to a process, other than the virtual memory size.
The virtual memory size is 4 GB per process on 32-bit systems, no matter how many applications you are running: any process can allocate up to 4 GB of memory on a 32-bit system.
In practice, Windows backs the allocated memory with RAM or the page file, depending on what processes request and on the paging mechanism.
Another limit, however, is the size of the paging file. If you have a small paging file, you cannot allocate large amounts of memory. According to Microsoft, you can increase the size of the paging file to get more memory space.
Buy more RAM.
Switch to a 64-bit OS. Combine with point 1.
To increase the amount of memory allocated to R you can use memory.limit:
memory.limit(size = ...)
Or
memory.size(max = ...)
About the arguments:
size - numeric. If NA, report the memory limit; otherwise request a new limit, in Mb. Only values of up to 4095 are allowed on 32-bit R builds, but see 'Details'.
max - logical. If TRUE, the maximum amount of memory obtained from the OS is reported; if FALSE, the amount currently in use; if NA, the memory limit.
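A short usage sketch; note that these functions are Windows-only, and in recent R versions (4.2 and later) they have been reduced to stubs:
memory.limit()              # current limit in Mb
memory.limit(size = 8000)   # request a higher limit (the limit cannot be decreased)
memory.size(max = TRUE)     # maximum memory obtained from the OS so far
memory.size()               # memory currently in use, in Mb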
In RStudio, to increase:
file.edit(file.path("~", ".Rprofile"))
then in .Rprofile add this line and save:
invisible(utils::memory.limit(size = 60000))
To decrease:
open .Rprofile
invisible(utils::memory.limit(size = 30000))
save and restart RStudio.
I know that some version of this question has been addressed multiple times in the past, but I think this iteration of this widely shared problem is sufficiently distinct to justify its own response. I would like to permanently set the maximum memory available to R to the largest value that my machine can handle, i.e., not just for a single session. I am running 64-bit R on a Windows 7 machine with 6 GB of RAM.
Currently I am trying to convert a 10 GB Stata file into a .rds object. On similar smaller objects the compression in the .dta to .rds conversion has been by a factor of four or better, and I (rather surprisingly) have not had any trouble doing dplyr manipulation on objects of 2 to 3 GB (after compression), even when two of them plus the work product are all in memory at once. This seems to conflict with my previous belief that the amount of physical RAM is the absolute upper limit of what R can handle, as I am fairly certain that between loaded .rds objects and various intermediate work products I have had more than 6 GB of undeleted objects lying about my workspace at one time.
I find conflicting statements about whether the maximum memory size is my actual RAM less OS demands, or my actual RAM, or my actual RAM plus an unknown (to me) amount of virtual RAM (subject to a potentially serious slowdown when you reach into virtual RAM). These file conversions are one-time (per file) jobs and I do not care if they are slow.
Looking at the base R help page on "Memory limits" and the help pages for memory.size(), it seems that there are multiple distinct limits under Windows, relating to the total memory used in a session, the memory available to a single process, the memory allocatable by malloc, and the size of a single vector. The individual vectors in my file are only around eight million rows long.
memory.size and memory.limit both report current settings in the neighborhood of 6 GB. I got multiple warning messages saying that I was pressed up against that limit, but the actual error message was something like “cannot allocate vector of length 120 MB”.
So I think there are three distinct questions:
How do I determine the maximum possible memory for each 64-bit R memory setting;
How many distinct memory settings do I need to make; and
How do I make them permanent, as opposed to setting them for a single session?
Following the advice of #Konrad below, I had this rather puzzling exchange with R/RStudio:
> memory.size()
[1] 424.85
> memory.size(max=TRUE)
[1] 454.94
> memory.size()
[1] 436.89
> memory.size(5000)
[1] 6046
Warning message:
In memory.size(5000) : cannot decrease memory limit: ignored
> memory.size()
[1] 446.27
The first three interactions seem to suggest that there is a hard memory limit on my machine of 455 MB. The second-to-last one, on the other hand, appears to be saying that the memory limit is set at my RAM level, without allowance for the OS and without using virtual memory. Then the last one goes back to claiming a limit of around 450 MB.
I just tried the recommendation here:
Increasing (or decreasing) the memory available to R processes
but with 6000 MB rather than 500; I'll provide a report.