Training R caret models with big datasets [duplicate] - r

I am running into issues trying to use large objects in R. For example:
> memory.limit(4000)
> a = matrix(NA, 1500000, 60)
> a = matrix(NA, 2500000, 60)
> a = matrix(NA, 3500000, 60)
Error: cannot allocate vector of size 801.1 Mb
> a = matrix(NA, 2500000, 60)
Error: cannot allocate vector of size 572.2 Mb # Can't go smaller anymore
> rm(list=ls(all=TRUE))
> a = matrix(NA, 3500000, 60) # Now it works
> b = matrix(NA, 3500000, 60)
Error: cannot allocate vector of size 801.1 Mb # But that is all there is room for
I understand that this is related to the difficulty of obtaining contiguous blocks of memory (from here):
Error messages beginning cannot allocate vector of size indicate a failure to obtain memory, either because the size exceeded the address-space limit for a process or, more likely, because the system was unable to provide the memory. Note that on a 32-bit build there may well be enough free memory available, but not a large enough contiguous block of address space into which to map it.
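(For what it's worth, the sizes in those messages line up with the raw storage of a logical matrix: matrix(NA, ...) is logical, so 4 bytes per element.)
# 3,500,000 x 60 logical elements at 4 bytes each, expressed in Mb
3500000 * 60 * 4 / 2^20   # ~801.1, matching the error above
2500000 * 60 * 4 / 2^20   # ~572.2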
How can I get around this? My main difficulty is that I get to a certain point in my script and R can't allocate 200-300 MB for an object... I can't really pre-allocate the block because I need the memory for other processing. This happens even when I diligently remove unneeded objects.
EDIT: Yes, sorry: Windows XP SP3, 4 GB RAM, R 2.12.0:
> sessionInfo()
R version 2.12.0 (2010-10-15)
Platform: i386-pc-mingw32/i386 (32-bit)
locale:
[1] LC_COLLATE=English_Caribbean.1252 LC_CTYPE=English_Caribbean.1252
[3] LC_MONETARY=English_Caribbean.1252 LC_NUMERIC=C
[5] LC_TIME=English_Caribbean.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base

Consider whether you really need all of this data explicitly, or whether the matrix can be sparse. R has good support for sparse matrices (see, e.g., the Matrix package); a minimal sketch follows this list.
Keep all other processes and objects in R to a minimum when you need to make objects of this size. Use gc() to clear memory that is no longer used or, better, only create the object you need in a single session.
If the above cannot help, get a 64-bit machine with as much RAM as you can afford, and install 64-bit R.
If you cannot do that, there are many online services for remote computing.
If you cannot do that, memory-mapping tools like the ff package (or bigmemory, as Sascha mentions) will help you build a new solution. In my limited experience ff is the more advanced package, but you should read the High Performance Computing topic on CRAN Task Views.
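As a minimal sketch of the sparse-matrix suggestion above (assuming most entries really are zero; the dimensions are taken from the question and everything else is illustrative):
library(Matrix)
# A dense 3,500,000 x 60 numeric matrix needs roughly 1.6 GB;
# a sparse matrix stores only the non-zero entries.
m <- Matrix(0, nrow = 3500000, ncol = 60, sparse = TRUE)
m[1, 1] <- 1.5                       # fill in the few non-zero values you actually have
print(object.size(m), units = "Mb")  # tiny compared with the ~1.6 GB dense equivalent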

For Windows users, the following helped me a lot in understanding some memory limitations:
before opening R, open the Windows Resource Monitor (Ctrl-Alt-Delete / Start Task Manager / Performance tab / click the 'Resource Monitor' button at the bottom / Memory tab)
you will see how much RAM is already used before you open R, and by which applications. In my case, 1.6 GB of the total 4 GB is used, so I can only get about 2.4 GB for R, and now comes the worse part...
open R and create a data set of 1.5 GB, then reduce its size to 0.5 GB; the Resource Monitor shows my RAM at nearly 95% usage.
use gc() to do garbage collection => it works, I can see the memory use go down to 2 GB
Additional advice that works on my machine:
prepare the features, save them as an RData file, close R, re-open R, and load the training features. The Resource Monitor typically shows lower memory usage, which means that even gc() does not recover all possible memory, and closing/re-opening R is the best way to start with the maximum memory available (a minimal sketch of this workflow follows below).
the other trick is to load only the training set for training (do not load the test set, which can typically be half the size of the training set). The training phase can use memory to the maximum (100%), so anything available is useful. Take all of this with a grain of salt, as I am still experimenting with R's memory limits.
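A minimal sketch of that save/close/re-open workflow (prepare_features() and the file name are placeholders for whatever your own preprocessing produces):
# session 1: build the training features, save them, then quit R
train_features <- prepare_features(raw_data)   # your own feature preparation
save(train_features, file = "train_features.RData")
quit(save = "no")

# session 2 (a fresh R process with maximum memory available):
load("train_features.RData")
gc()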

I went to the help page for memory.limit and found that, on my computer, R can by default use up to ~1.5 GB of RAM, and that the user can increase this limit. Using the following code,
> memory.limit()
[1] 1535.875
> memory.limit(size=1800)
helped me to solve my problem.
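For context, you can compare R's current usage against the limit before raising it. These are Windows-only functions, and in recent R versions (4.2 and later) they no longer have any effect, but they work on the R 2.x builds discussed here:
memory.size()              # MB currently used by this R session
memory.size(max = TRUE)    # maximum MB obtained from the OS so far
memory.limit()             # current ceiling in MB
memory.limit(size = 1800)  # raise the ceiling; it can only be increased, not lowered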

Here is a presentation on this topic that you might find interesting:
http://www.bytemining.com/2010/08/taking-r-to-the-limit-part-ii-large-datasets-in-r/
I haven't tried these things myself, but the bigmemory package seems very useful; a rough sketch of how it might be used is below.
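As a rough sketch only (I have not run this; the file names are made up, so check the package documentation for the exact arguments), a file-backed matrix with bigmemory might look like this:
library(bigmemory)
# The data live in a memory-mapped file on disk, so the whole matrix
# does not have to fit into R's address space at once.
x <- filebacked.big.matrix(nrow = 3500000, ncol = 60, type = "double",
                           backingfile = "big.bin",
                           descriptorfile = "big.desc")
x[1, 1] <- 3.14   # indexed like an ordinary matrix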

The simplest way to sidestep this limitation is to switch to 64-bit R.

I encountered a similar problem, and I used 2 flash drives as 'ReadyBoost'. The two drives gave an additional 8 GB of memory (as cache), which solved the problem and also increased the speed of the system as a whole.
To use ReadyBoost, right-click the drive, go to Properties, select 'ReadyBoost', select the 'use this device' radio button, and click Apply or OK to configure it.

If you are running your script in a Linux environment you can use this command:
bsub -q server_name -R "rusage[mem=requested_memory]" "Rscript script_name.R"
and the server will allocate the requested memory for you (subject to the server's limits, but on a good server huge files can be used).

One option is to run a garbage collection with the gc() command before and after the command that causes high memory consumption; this will free up memory for your analyses, in addition to using the memory.limit() command.
Example:
gc()
memory.limit(9999999999)
fit <- lm(Y ~ X)
gc()

The save/load method mentioned above works for me. I am not sure how (or whether) gc() defragments memory, but this seems to work.
# defrag memory
save.image(file="temp.RData")
rm(list=ls())
load(file="temp.RData")
