Using swap in R

I'm trying to join two tables that are ~100 MB larger than those in a previous successful attempt.
This is what I tried:
left_join(A, B, by = c("col_1","col_2","col_3"))
And I get
Error in left_join_impl(x, y, by$x, by$y, suffix$x, suffix$y, check_na_matches(na_matches)) :
std::bad_alloc
This means that I'm out of RAM.
Have you worked around a similar issue, for example by using swap instead of RAM?

Yes.
I increased the swap partition size in Linux, and that prevented the previously common R crashes with the error message "bad_alloc" caused by a RAM shortage.
I increased the partition from 8 GB to 16 GB in Ubuntu using the GParted application (I had earlier created the linux-swap partition with GParted as well). Increasing the partition size alone was not enough; the following steps were also required for the change to take effect:
I had to right-click the created partition and click "swapon".
I had to check the UUID by clicking "Information" in the same window (running blkid on the command line also gives this information).
I had to edit the line containing "swap" in the file /etc/fstab to include the UUID found in the previous step.
I rebooted the system.
WARNING: This solution might come at the cost of a somewhat shorter lifespan of the SSD (source). So if it is possible to add physical RAM to the system, that might be the preferred solution, although it has a cost too, a monetary one.
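After the reboot, it can be worth confirming from within R that the enlarged swap is actually active. A minimal sketch, assuming a Linux system that exposes /proc/meminfo; the helper name swap_total_gb is hypothetical and not part of any package:

```r
# Hypothetical helper: parse the SwapTotal entry (reported in kB) out of
# /proc/meminfo-style lines and convert it to GiB.
swap_total_gb <- function(meminfo_lines) {
  line <- grep("^SwapTotal:", meminfo_lines, value = TRUE)
  if (length(line) == 0) return(NA_real_)
  kb <- as.numeric(gsub("[^0-9]", "", line[1]))
  kb / 1024^2
}

# Only Linux exposes /proc/meminfo, so guard the read.
if (file.exists("/proc/meminfo")) {
  cat("Swap total (GiB):", swap_total_gb(readLines("/proc/meminfo")), "\n")
}
```

If the reported value still shows the old size, the /etc/fstab entry or the swapon step is the first thing to re-check.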

Related

How to allocate enough memory to join datasets in R [duplicate]

I would like to increase (or decrease) the amount of memory available to R. What are the methods for achieving this?
From:
http://gking.harvard.edu/zelig/docs/How_do_I2.html (mirror)
Windows users may get the error that R has run out of memory.
If you have R already installed and subsequently install more RAM, you may have to reinstall R in order to take advantage of the additional capacity.
You may also set the amount of available memory manually. Close R, then right-click on your R program icon (the icon on your desktop or in your programs directory). Select "Properties", and then select the "Shortcut" tab. Look for the "Target" field and after the closing quotes around the location of the R executable, add
--max-mem-size=500M
as shown in the figure below. You may increase this value up to 2GB or the maximum amount of physical RAM you have installed.
If you get the error that R cannot allocate a vector of length x, close out of R and add the following line to the "Target" field:
--max-vsize=500M
or as appropriate. You can always check to see how much memory R has available by typing at the R prompt
memory.limit()
which gives you the amount of available memory in MB. In previous versions of R you needed to use: round(memory.limit()/2^20, 2).
Use memory.limit(). You can increase the default using this command, memory.limit(size=2500), where the size is in MB. You need to be using 64-bit in order to take real advantage of this.
One other suggestion is to use memory efficient objects wherever possible: for instance, use a matrix instead of a data.frame.
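Whether a matrix really saves memory over a data.frame depends on the data, so it is worth measuring rather than assuming. A small sketch using base R's object.size(); the dimensions and variable names are illustrative only:

```r
# Build the same numeric data as a matrix and as a data.frame.
set.seed(1)
m  <- matrix(rnorm(1e5), ncol = 10)
df <- as.data.frame(m)

# object.size() gives an estimate of the memory taken by each object.
cat("matrix:    ", format(object.size(m),  units = "Kb"), "\n")
cat("data.frame:", format(object.size(df), units = "Kb"), "\n")
```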
For Linux/Unix, I can suggest the unix package.
To increase the memory limit in Linux:
install.packages("unix")
library(unix)
rlimit_as(1e12) # sets the address-space limit to 1e12 bytes (~1 TB); use 12e9 for ~12 GB
You can also check the current limits with:
rlimit_all()
For detailed information:
https://rdrr.io/cran/unix/man/rlimit.html
You can also find further info here:
limiting memory usage in R under linux
Microsoft Windows grants any memory request from a process if it can be satisfied.
There is no limit on the memory that can be provided to a process other than the virtual memory size.
On 32-bit systems the virtual address space is 4 GB per process, no matter how many applications are running, so a process can never address more than 4 GB (and by default part of that range is reserved for the operating system).
In practice, Windows backs allocated memory with RAM or the page file as needed, depending on what processes request and on the paging mechanism.
Another limit is the size of the paging file. If the paging file is small, large allocations will fail. You can increase the size of the paging file, following Microsoft's instructions, to gain more memory space.
1. Buy more RAM.
2. Switch to a 64-bit OS. Combine with point 1.
To increase the amount of memory allocated to R you can use memory.limit
memory.limit(size = ...)
Or
memory.size(max = ...)
About the arguments
size - numeric. If NA report the memory limit, otherwise request a new limit, in Mb. Only values of up to 4095 are allowed on 32-bit R builds, but see ‘Details’.
max - logical. If TRUE the maximum amount of memory obtained from the OS is reported, if FALSE the amount currently in use, if NA the memory limit.
In RStudio, to increase:
file.edit(file.path("~", ".Rprofile"))
then in .Rprofile type this and save
invisible(utils::memory.limit(size = 60000))
To decrease:
open .Rprofile
invisible(utils::memory.limit(size = 30000))
save and restart RStudio.
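Note that memory.limit() only ever had an effect on Windows, and from R 4.2.0 it is a stub there as well, so an unconditional call in .Rprofile produces a warning on other setups. A hedged sketch of a guarded variant; the variable name limit_is_adjustable is made up for illustration, and 60000 is simply the value used above:

```r
# Only request a higher limit where memory.limit() still does something:
# Windows builds of R older than 4.2.0.
limit_is_adjustable <- .Platform$OS.type == "windows" &&
  getRversion() < "4.2.0"

if (limit_is_adjustable) {
  invisible(utils::memory.limit(size = 60000))  # size is in MB
}
```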

R: How do I permanently set the amount of memory R will use to the maximum for my machine?

I know that some version of this question has been addressed multiple times in the past, but I think this iteration of this widely shared problem is sufficiently distinct to justify its own response. I would like to permanently set the maximum memory available to R to the largest value that my machine can handle, i.e., not just for a single session. I am running 64-bit R on a Windows 7 machine with 6 GB of RAM.
Currently I am trying to convert a 10 GB Stata file into an .rds object. On similar smaller objects the compression in the .dta to .rds conversion has been by a factor of four or better, and I (rather surprisingly) have not had any trouble doing dplyr manipulation on objects of 2 to 3 GB (after compression), even when two of them and the work product are all in memory at once. This seems to conflict with my previous belief that the amount of physical RAM is the absolute upper limit of what R can handle, as I am fairly certain that between loaded .rds objects and various intermediate work products I have had more than 6 GB of undeleted objects lying about my workspace at one time.
I find conflicting statements about whether the maximum memory size is my actual RAM less OS demands, or my actual RAM, or my actual RAM plus an unknown (to me) amount of virtual RAM (subject to a potentially serious slowdown when you reach into virtual RAM). These file conversions are one-time (per file) jobs and I do not care if they are slow.
Looking at the base R help page on “Memory limits” and the help-pages for memory.size(), it seems that there are multiple distinct limits under Windows, relating to total memory used in a session, available to a single process, allocatable by malloc or contained in a single vector. The individual vectors in my file are only around eight million rows long.
memory.size and memory.limit both report current settings in the neighborhood of 6 GB. I got multiple warning messages saying that I was pressed up against that limit, but the actual error message was something like “cannot allocate vector of length 120 MB”.
So I think there are three distinct questions:
How do I determine the maximum possible memory for each 64-bit R memory setting; and
How many distinct memory settings do I need to make; and
How do I make them permanently, as opposed to for a single session?
Following the advice of @Konrad below, I had this rather puzzling exchange with R/RStudio:
> memory.size()
[1] 424.85
> memory.size(max=TRUE)
[1] 454.94
> memory.size()
[1] 436.89
> memory.size(5000)
[1] 6046
Warning message:
In memory.size(5000) : cannot decrease memory limit: ignored
> memory.size()
[1] 446.27
The first three interactions seem to suggest that there is a hard memory limit on my machine of about 455 MB. The second-to-last one, on the other hand, appears to be saying that the memory limit is set at my RAM level, without allowance for the OS and without using virtual memory. Then the last one goes back to claiming a limit of around 450 MB.
I just tried the recommendation here:
Increasing (or decreasing) the memory available to R processes
but with 6000 MB rather than 500; I'll provide a report.
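Part of the puzzle may be that memory.size() called without arguments reports the memory currently in use rather than a limit. A portable way to look at the same "in use" figure is gc(); a minimal sketch, with mem_in_use_mb as a hypothetical helper name:

```r
# gc() returns a matrix with rows Ncells and Vcells; its second column
# gives the memory currently used by each, in Mb.
mem_in_use_mb <- function() {
  sum(gc()[, 2])
}

before <- mem_in_use_mb()
x <- numeric(5e6)          # roughly 40 MB of doubles
after <- mem_in_use_mb()
cat("In use before:", before, "Mb; after:", after, "Mb\n")
```

The figure grows and shrinks with the workspace, which would be consistent with the ~425 to ~455 values in the exchange above being usage rather than a ceiling.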

Is filebacked.big.matrix in the bigmemory package memory neutral?

I have been using filebacked.big.matrix to store a very large matrix (~1 million x 20 thousand). I am working on a cluster with very high memory, but not quite that much. I have previously used the ff package, which worked great and kept the memory usage consistent regardless of the matrix size, but it died when I surpassed 2^31 items in the matrix (the R community really needs to fix that problem). The filebacked.big.matrix initially seemed to work very well and generally runs without problems, but when I check the memory usage it is sometimes spiking into the 100s of GBs. I am careful to read/write only a relatively small number of rows at a time, so I think there should not be much in memory at any given time.
Does it do some sort of automatic memory caching or something that is driving the memory usage up? If so can this caching be disabled or limited? The high memory usage is causing some nasty side effects on the cluster so I need a way to do this that is memory neutral. I have checked the filebacked.big.matrix help page, but can't find any pertinent information there.
Thanks!
UPDATE:
I am also using bigmemoryExtras.
I was wrong earlier: the problem happens when I loop through the entire matrix, reading it into a different, smaller file-backed matrix like this:
tmpGeno=fileBackedMatrix(rowIndex-1,numMarkers,'double',tmpDir)
front=1
back=40000
# large matrix must be copied in chunks to avoid integer.max errors
# (<= so that a final one-row chunk is not skipped)
while(front <= rowIndex-1){
  if(back>rowIndex-1) back=rowIndex-1
  tmpGeno[front:back,1:numMarkers]=genotypeMatrix[front:back,1:numMarkers,drop=FALSE]
  front=front+40000
  back=back+40000
}
The physical memory usage is initially very low (with virtual memory very high). But while running this loop, and even after it has finished it seems to just keep most of the matrix in physical memory. I need it to only keep the one small chunk of the matrix in memory at a time.
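The chunk bookkeeping in the loop above can be separated from the copying itself. A sketch using ordinary in-memory matrices as stand-ins for the file-backed ones, so that it runs without bigmemory; copy_in_chunks and the tiny sizes are hypothetical:

```r
# Copy `src` into `dst` in row blocks of at most `chunk` rows.
copy_in_chunks <- function(src, dst, chunk = 40000L) {
  n <- nrow(src)
  if (n == 0L) return(dst)                # nothing to copy
  for (front in seq(1L, n, by = chunk)) {
    back <- min(front + chunk - 1L, n)    # clamp the last block
    dst[front:back, ] <- src[front:back, , drop = FALSE]
  }
  dst
}

src <- matrix(seq_len(35), nrow = 7)      # small stand-in for genotypeMatrix
dst <- matrix(0L, nrow = 7, ncol = 5)     # small stand-in for tmpGeno
dst <- copy_in_chunks(src, dst, chunk = 3L)
```

This only tidies the indexing. It does not change how the operating system caches pages of a memory-mapped file, which may be what the cluster metrics are counting.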
UPDATE 2:
It is a bit confusing to me: the cluster metrics and the top command say that it is using tons of memory (~80 GB), but the gc() command says that memory usage never went over 2 GB. The free command says that all the memory is used, but in the -/+ buffers/cache line it says only 7 GB are being used in total.

R - Memory allocation besides objects in ls()

I have loaded a fairly large set of data using data.table. I then want to add around 30 columns using instructions of the form:
DT[, x5:=cumsum(y1), by=list(x1, x2)]
DT[, x6:=cummean(y2), by=x1]
At some point I start to get "warnings" like this:
1: In structure(.Call(C_objectSize, x), class = "object_size") :
Reached total allocation of 8072Mb: see help(memory.size)
I check tracemem(DT) every now and then to ensure that no copies are made. The only output I ever get is:
"<0000000005E8E700>"
Also I check ls() to see which objects are in use and object.size() to see how much of my RAM is allocated by the object. The only output of ls() is my data.table and the object size after the first error is 5303.1 Mb.
I am on a Windows 64-bit machine running R in 64-bit and have 8 GB RAM. Of these 8 GB RAM only 80% are in use when I get the warning. Of these R is using 5214.0 Mb (strange since the table is bigger than this).
My question is: if the only RAM R is using is 5303.1 Mb and I still have around 2 GB of free memory, why do I get the error that R has reached the limit of 8 GB, and is there anything I can do about it? If not, what are other options? I know I could use bigmemory, but then I would have to rewrite my whole code and would lose the sweet by-reference modifications which data.table offers.
The problem is that the operations require RAM beyond what the object itself takes up. You could verify that Windows is using a page file. If it is, you could try increasing its size. http://windows.microsoft.com/en-us/windows/change-virtual-memory-size
If that fails, you could try running a live environment of Lubuntu Linux to see if its memory overhead is small enough to allow the operation. http://lubuntu.net/
Ultimately, I suspect you're going to have to use bigmemory or similar.
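The point that an operation needs memory beyond the final object can be checked with gc(), which tracks a "max used" high-water mark. A rough sketch in base R (data.table is not needed for the illustration, and peak_mb is a hypothetical helper; the figure is approximate because the statistics are only updated at collection time):

```r
# Report the approximate peak memory (in Mb) reached while evaluating `expr`.
peak_mb <- function(expr) {
  gc(reset = TRUE)                        # reset the "max used" statistics
  force(expr)                             # evaluate the lazy argument now
  g <- gc()
  max_col <- which(colnames(g) == "max used") + 1  # the matching "(Mb)" column
  sum(g[, max_col])
}

x <- runif(2e6)
size_mb <- as.numeric(object.size(x)) / 1024^2
peak <- peak_mb(cumsum(sort(x)))          # sort() and cumsum() each allocate a new vector
cat("object:", round(size_mb, 1), "Mb; peak during operation:", peak, "Mb\n")
```

The peak covers the whole R session, not just the one object, which is why it can sit well above what object.size() reports.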
