Cholmod error 'out of memory' : Merging Seurat Objects - r

I am trying to merge Seurat class objects that contain transcriptome count data (sparse matrices). I am relatively new to R, so any help or solutions are appreciated. I have added a screenshot of the data I'm working with.
**General Info:**
-------------
> memory.size(max = TRUE)
[1] 2533.94
R version 4.0.3 (2020-10-10)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows 10 x64 (build 19041)
attached base packages:
[1] stats graphics grDevices utils
[5] datasets methods base
other attached packages:
[1] RSQLite_2.2.3 Seurat_3.2.3
I am not sure if my storage is the issue or if I should split the function in two.
options(stringsAsFactors = F)
setwd("C:/Users/Amara/OneDrive - Virginia Tech/XieLab/ZebraFish_Project/zf_brain-master/data")
folders <- list.files("C:/Users/Amara/OneDrive - Virginia Tech/XieLab/ZebraFish_Project/zf_brain-master/data")
library(Seurat)
library(dplyr)
zfbrainList = lapply(folders, function(folder){
  CreateSeuratObject(counts = Read10X(folder),
                     project = folder)
})
zfbrain.combined <- merge(zfbrainList[[1]],
                          y = c(zfbrainList[[2]], zfbrainList[[3]], zfbrainList[[4]], zfbrainList[[5]],
                                zfbrainList[[6]], zfbrainList[[7]], zfbrainList[[8]], zfbrainList[[9]],
                                zfbrainList[[10]], zfbrainList[[11]], zfbrainList[[12]], zfbrainList[[13]],
                                zfbrainList[[14]], zfbrainList[[15]]),
                          add.cell.ids = folders,
                          project = "zebrafish")
Error in .cbind2Csp(x, y) :
Cholmod error 'out of memory' at file ../Core/cholmod_memory.c, line 147

The machine used to process the data in the original question has a 64-bit Windows operating system running a 32-bit version of R. The result from memory.size() shows that approximately 2.5Gb of RAM is available to the malloc() function used by R. Per the help for memory.size(), the 32-bit version of R on Windows can access a maximum of slightly less than 4Gb of RAM when running on 64-bit Windows.
The help page Memory Limits in R tells us that in 32-bit R on Windows it is usually not possible to allocate a single vector of 2Gb because Windows consumes some memory in the middle of the 2Gb address space.
Once we load the data from the question, the zfbrainList object consumes about 1.2Gb of RAM.
options(stringsAsFactors = F)
folders <- list.files("./data/zebraFishData",full.names = TRUE)
library(Seurat)
library(dplyr)
zfbrainList = lapply(folders, function(folder){
  CreateSeuratObject(counts = Read10X(folder),
                     project = folder)
})
format(object.size(zfbrainList),units = "Gb")
...and the result:
> format(object.size(zfbrainList),units = "Gb")
[1] "1.2 Gb"
At this point, the code attempts to merge the objects from the list into a single object.
zfbrain.combined <- merge(zfbrainList[[1]],
                          y = c(zfbrainList[[2]], zfbrainList[[3]], zfbrainList[[4]], zfbrainList[[5]],
                                zfbrainList[[6]], zfbrainList[[7]], zfbrainList[[8]], zfbrainList[[9]],
                                zfbrainList[[10]], zfbrainList[[11]], zfbrainList[[12]], zfbrainList[[13]],
                                zfbrainList[[14]], zfbrainList[[15]]),
                          add.cell.ids = folders,
                          project = "zebrafish")
When we calculate the size of the resulting zfbrain.combined object, we find that it is also about 1.2Gb in size, which exceeds the RAM available to R on the original poster's machine.
format(object.size(zfbrain.combined),units = "Gb")
> format(object.size(zfbrain.combined),units = "Gb")
[1] "1.2 Gb"
Because zfbrainList must stay in RAM while zfbrain.combined is being created, the merge as coded above cannot run in an R session with only about 2.5Gb of RAM accessible: zfbrainList and zfbrain.combined together consume 2.4 - 2.5Gb, before counting the other RAM R needs to run.
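As a rough check, the combined footprint can be computed directly in a session where the merge succeeds (for example, the 64-bit session described below); this is a minimal sketch that assumes both objects are still present, and it counts only R-level allocations:
# Combined size of the list and the merged object, in Gb
combined_gb <- (as.numeric(object.size(zfbrainList)) +
                as.numeric(object.size(zfbrain.combined))) / 1024^3
round(combined_gb, 2)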
Solution: use the 64-bit version of R
Since memory.size() reported roughly 2.5Gb available to 32-bit R, and most Windows-based machines have at least 4Gb of RAM, it is likely there is at least 4Gb of RAM on the machine. The machine used in the original post already had 64-bit Windows installed, so we can enable R to access more memory by installing and running the 64-bit version of R.
On a Windows-based machine with 8Gb RAM, 32-bit R reports the following for memory.size() and memory.limit().
Interestingly, memory.size() reports 25.25 (the value is rounded, per the help documentation), while memory.limit() returns a number between 0 and 4095Mb (also per the documentation); on our test machine it reports 3583, about 3.5Gb of RAM.
When we run the same functions in 64-bit R on the same machine, memory.size() reports 34.25 for the fresh session, and memory.limit() reports that R can access the full 8Gb of RAM installed on this particular machine, with no 4Gb ceiling on the session.
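To compare builds yourself, the same checks can be run in a 32-bit and a 64-bit session (note that both functions are Windows-only and are no longer meaningful from R 4.2 onward):
memory.size(max = TRUE)   # Mb of RAM this session has obtained from the OS
memory.limit()            # Mb ceiling for the session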
Testing the solution
When I run the code in a 32-bit R 4.0.3 session on 64-bit Windows, I am able to replicate the out of memory error.
When I run the code in the 64-bit version of R, it runs to completion, and I am able to calculate the size of the resulting zfbrain.combined object.
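As an aside, Seurat's merge() also accepts a list for y, so the call does not need to enumerate every element; a sketch assuming zfbrainList holds all fifteen objects:
zfbrain.combined <- merge(zfbrainList[[1]],
                          y = zfbrainList[-1],
                          add.cell.ids = folders,
                          project = "zebrafish")
This does not reduce the memory required; it only shortens the call.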

Related

How to get RAM usage (total, used, free, free/total, used/total) in R?

How can I get the current RAM usage in R? Ideally, it would work on both Unix and Windows platforms. On Windows I used the following code:
# TotalPhysicalMemory is reported by WMI in bytes
total_mem <- system("wmic ComputerSystem get TotalPhysicalMemory", intern = TRUE)
total_mem <- as.numeric(gsub("\\D", "", total_mem[2]))
total_mem <- total_mem / 1000   # approximate conversion to kB (dividing by 1024 would be exact)
# FreePhysicalMemory is reported by WMI in kB
free_mem <- system("wmic OS get FreePhysicalMemory", intern = TRUE)
free_mem <- as.numeric(gsub("\\D", "", free_mem[2]))
used_mem <- total_mem - free_mem
percentage_used_mem <- 1 - free_mem / total_mem
Is there a better way to get the current RAM usage that works on both Unix and Windows platforms?
Following "How to get current CPU and RAM usage in Python?" and using the "reticulate" package:
library(reticulate)
aa<-reticulate::import("psutil")
mem_percent=aa$virtual_memory()$percent
But this way needs Python to be installed on the platform.
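A Python-free alternative is to branch on the operating system and stay in base R. The sketch below is an assumption-laden helper (get_ram_usage() is an invented name, the parsing is deliberately minimal, and the non-Windows branch covers Linux only via /proc/meminfo):
get_ram_usage <- function() {
  if (.Platform$OS.type == "windows") {
    # WMI reports total memory in bytes and free memory in kB
    total <- as.numeric(gsub("\\D", "",
      system("wmic ComputerSystem get TotalPhysicalMemory", intern = TRUE)[2])) / 1024
    free  <- as.numeric(gsub("\\D", "",
      system("wmic OS get FreePhysicalMemory", intern = TRUE)[2]))
  } else {
    # /proc/meminfo reports values in kB (Linux only; macOS would need a different path)
    meminfo <- readLines("/proc/meminfo")
    total <- as.numeric(gsub("\\D", "", grep("^MemTotal:",     meminfo, value = TRUE)))
    free  <- as.numeric(gsub("\\D", "", grep("^MemAvailable:", meminfo, value = TRUE)))
  }
  c(total_kb = total, free_kb = free, used_fraction = 1 - free / total)
}
get_ram_usage()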

R system() cannot allocate memory even though the same command can be run from a terminal

I have an issue with the R system() function (for running an OS command from within R): it fails once the R session uses more than some fraction of the available RAM (maybe ~75% in my case), even though plenty of RAM is still free (~15GB in my case) and the same OS command runs without trouble at the same time from a terminal.
System info:
64GB RAM PC (local desktop PC, not cloud-based or cluster)
Ubuntu 18.04.1 LTS - x86_64-pc-linux-gnu (64-bit)
R version 3.5.2 (executed directly, not e.g. via docker)
This example demonstrates the issue. The size of the data frame d needs to be adjusted to be as small as possible and still provoke the error. This will depend on how much RAM you have and what else is running at the same time.
ross@doppio:~$ R
R version 3.5.2 (2018-12-20) -- "Eggshell Igloo"
Copyright (C) 2018 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> n <- 5e8
> d <- data.frame(
+ v0 = rep_len(1.0, n),
+ v1 = rep_len(1.0, n),
+ v2 = rep_len(1.0, n),
+ v3 = rep_len(1.0, n),
+ v4 = rep_len(1.0, n),
+ v5 = rep_len(1.0, n),
+ v6 = rep_len(1.0, n),
+ v7 = rep_len(1.0, n),
+ v8 = rep_len(1.0, n),
+ v9 = rep_len(1.0, n)
+ )
> dim(d)
[1] 500000000 10
> gc()
             used    (Mb) gc trigger    (Mb)   max used    (Mb)
Ncells     260857    14.0     627920    33.6     421030    22.5
Vcells 5000537452 38151.1 6483359463 49464.2 5000559813 38151.3
> system("free -m", intern = FALSE)
Warning messages:
1: In system("free -m", intern = FALSE) :
system call failed: Cannot allocate memory
2: In system("free -m", intern = FALSE) : error in running command
The call to gc() indicates R has allocated ~38GB out of 64 GB RAM and running free -m in a terminal at the same time (see below) shows that the OS thinks there is ~16GB free.
ross@doppio:~$ free -m
              total        used        free      shared  buff/cache   available
Mem:          64345       44277       15904         461        4162       18896
Swap:           975           1         974
ross@doppio:~$
So free -m cannot be run from within R because memory cannot be allocated, yet free -m runs fine at the same time from a terminal, and you would think 15GB of free RAM is plenty for a lightweight command like free -m.
If the R memory usage is below some threshold then free -m can be run from within R.
I guess that R is trying to allocate more memory for free -m than is actually needed, and that the amount depends on how much memory is already allocated. Can anyone shed some light on what is going on here?
Thanks
I've run into this one. R calls fork() to run the subprocess, temporarily duplicating the ~38GB image and pushing the requirement past the 64GB you have. Had the fork survived, it would next have called exec() and given back the duplicated memory. This is not how fork/exec is supposed to go (it is supposed to be copy-on-write with no extra cost), but somehow that is what happens in this case.
It looks like this may be a known limitation: to fork, you must have enough memory (RAM plus swap) to potentially duplicate the pages, even if the duplication never actually happens. I would guess you do not have enough swap (swap at least the size of RAM seems to be recommended). Here are some instructions on configuring swap (written for EC2, but they cover Linux in general): https://aws.amazon.com/premiumsupport/knowledge-center/ec2-memory-swap-file/
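As a rough diagnostic (Linux only, and the comparison is a heuristic, not a guarantee about fork()), the current R image can be compared against free RAM plus swap from within R before calling system():
# Compare this R process's footprint against available RAM plus free swap (kB values)
meminfo  <- readLines("/proc/meminfo")
kb       <- function(key) as.numeric(gsub("\\D", "", grep(key, meminfo, value = TRUE)))
avail_kb <- kb("^MemAvailable:") + kb("^SwapFree:")
rss_kb   <- as.numeric(gsub("\\D", "",
  grep("^VmRSS:", readLines(sprintf("/proc/%d/status", Sys.getpid())), value = TRUE)))
# fork() may be refused if the OS cannot promise to duplicate the whole image
c(r_image_gb = rss_kb / 1024^2, headroom_gb = avail_kb / 1024^2)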

Memory allocation for entities in R only 2 GB?

Windows 10 64 bit, 32 GB RAM, Rstudio 1.1.383 and R 3.4.2 (up-to-date)
I have several csv files, each with at least one or two lines full of nul values. So I wrote a script that uses read_lines_raw() from the readr package, which reads the file in raw format and produces a list where each element is a row. I then check each row for 00 (the nul value) and delete the rows where it is found.
One of the files is 2.5 GB in size and also has nul values somewhere in it. The problem is that read_lines_raw() is not able to read this file and throws an error:
Error in read_lines_raw_(ds, n_max = n_max, progress = progress) :
  negative length vectors are not allowed
I don't even understand the problem. Some of my research hints at something related to size, but not even half of the RAM is in use. Some other files it was able to read were 1.5 GB in size. Is this file too big, or is something else causing the error?
Update 1:
I tried to read in the whole file using scan but that also gave me an error:
could not allocate memory (2048 Mb) in C function 'R_AllocStringBuffer'
So although my PC has 32 GB, the maximum allowed size for a single entity is 2 GB? I also checked to make sure it is running 64-bit R, and yes it is.
> version
_
platform x86_64-w64-mingw32
arch x86_64
os mingw32
system x86_64, mingw32
status
major 3
minor 4.2
year 2017
month 09
day 28
svn rev 73368
language R
version.string R version 3.4.2 (2017-09-28)
nickname Short Summer
It seems like many people are facing similar issues, but I could not find a solution. How can we increase the memory allocation for individual entities? memory.limit() returns 32 GB, which is the RAM size, but that isn't helpful. memory.size() gives something close to 2 GB, and since the file is 2.7 GB on disk, I assume this is the reason for the error.
Thank you.
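No accepted answer is shown here, but one way to stay clear of the ~2Gb single-string limit is to stream the file in binary chunks with base R and drop nul-containing lines before any normal read. The sketch below is hypothetical code (strip_nul_lines() and its arguments are invented names); it splits on LF, which also works for CRLF files:
strip_nul_lines <- function(input_path, output_path, chunk_bytes = 64 * 1024^2) {
  con_in  <- file(input_path,  open = "rb")
  con_out <- file(output_path, open = "wb")
  on.exit({ close(con_in); close(con_out) }, add = TRUE)
  carry <- raw(0)                                  # incomplete line held over from the previous chunk
  repeat {
    chunk <- readBin(con_in, what = "raw", n = chunk_bytes)
    done  <- length(chunk) == 0L                   # end of file reached
    buf   <- c(carry, chunk)
    if (length(buf) == 0L) break
    nl <- which(buf == as.raw(0x0A))               # LF positions
    if (done && (length(nl) == 0L || nl[length(nl)] < length(buf))) {
      nl <- c(nl, length(buf))                     # treat trailing bytes as a final line
    }
    if (length(nl) == 0L) { carry <- buf; next }   # no complete line yet, keep reading
    start <- 1L
    for (end in nl) {
      line <- buf[start:end]
      if (!any(line == as.raw(0x00))) writeBin(line, con_out)  # drop lines containing nul bytes
      start <- end + 1L
    }
    carry <- if (nl[length(nl)] < length(buf)) buf[(nl[length(nl)] + 1L):length(buf)] else raw(0)
    if (done) break
  }
  invisible(output_path)
}
# Usage (hypothetical file names); the cleaned file can then be read normally:
# strip_nul_lines("big_file.csv", "big_file_clean.csv")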

Is there a 2GB memory usage limit when R boots?

I have the following code for loading some data in my .Rprofile (an R script in my project folder that runs automatically when I switch to the project in Rstudio).
data_files <- list.files(pattern = "\\.(RData|rda)$")
if ("data.rda" %in% data_files) {
  attach(what = "data.rda",
         pos = 2)
  cat("The file 'data.rda' was attached to the search path under 'file:data.rda'.\n\n")
}
The data being loaded is relatively big:
                             Type       Size     PrettySize    Rows Columns
individual_viewings_26 data.frame 1547911120 [1] "1.4 Gb"   3685312      63
viewing_statements_all data.table  892316088 [1] "851 Mb"   3431935      38
weights                data.frame  373135464 [1] "355.8 Mb" 3331538      14
pet                    data.table   63926168 [1] "61 Mb"     227384      34
But I have 16 GB of RAM and can allocate all of it:
> memory.limit()
[1] 16289
When my data was not as big, I did not have any issue. I recently saved some more data frames in data.rda and my R session suddenly fails at start-up (when I switch to the project in Rstudio and .Rprofile is executed):
Error: cannot allocate vector of size 26.2 Mb
In addition: Warning messages:
1: Reached total allocation of 2047Mb: see help(memory.size)
2: Reached total allocation of 2047Mb: see help(memory.size)
3: Reached total allocation of 2047Mb: see help(memory.size)
4: Reached total allocation of 2047Mb: see help(memory.size)
I suspect that for some reason, the memory limit is set at 2GB at boot? Any way I can change that?
Edit: Added OS and software version
> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
Edit2: Just to clarify, I am able to load the data myself by running the code; I have plenty of available memory, and the R process commonly uses up to 10GB during my daily work. The problem is that there is apparently a 2GB memory limit when R boots and executes the .Rprofile...
Yes, there is a 2GB limit when R starts, at least while the user profile (.Rprofile files and the .First() function) is executed.
Proof:
Content of Rprofile:
message("Available memory when .Rprofile is sourced: ", memory.limit())
.First <- function() {
message("Available memory when .First() is called: ", memory.limit())
}
Output at startup
Available memory when .Rprofile is sourced: 2047
Available memory when .First() is called: 2047
Output of memory.limit() once R has started:
> memory.limit()
[1] 16289
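The answer above only confirms the limit. One possible way to sidestep it (untested against the poster's setup, and data_env is an invented name) is to register the load lazily in .Rprofile with base R's delayedAssign(), so the expensive load() runs on first use, after start-up, when the full memory limit is in effect:
# In .Rprofile: create a promise instead of loading the data during start-up.
delayedAssign("data_env", local({
  e <- new.env()
  load("data.rda", envir = e)   # runs the first time data_env is touched
  e
}))
# Objects are then reached as data_env$individual_viewings_26 and so on,
# rather than through attach().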

Parallel processing in R 2.11 Windows 64-bit using SNOW not quite working

I'm running R 2.11 64-bit on a WinXP64 machine with 8 processors. With R 2.10.1 the following code spawned 6 R processes for parallel processing:
require(foreach)
require(doSNOW)
cl = makeCluster(6, type='SOCK')
registerDoSNOW(cl)
bl2 = foreach(i=icount(length(unqmrno))) %dopar% {
(Some code here)
}
stopCluster(cl)
When I run the same code in R 2.11 Win64, the 6 R processes are not spawning, and the code hangs. I'm wondering if this is a problem with the port of SNOW to 2.11-64bit, or if any additional code is required on my part. Thanks
BTW, this works just fine on my multicore machine at home running Ubuntu Karmic 64-bit and R 2.11. Unfortunately, I have to work on Win64 at work.
The code seems to be working here.
R version 2.11.0 (2010-04-22)
x86_64-pc-mingw32
other attached packages:
[1] doSNOW_1.0.3 snow_0.3-3 foreach_1.3.0 codetools_0.2-2
[5] iterators_1.0.3
loaded via a namespace (and not attached):
[1] tools_2.11.0
Check your sessionInfo() to make sure your versions match mine. One thing I noted is that on my Windows 7 machine the first call to makeCluster() triggered a request for a firewall exception. If you did not explicitly allow the socket communication, that could be why it is hanging. The default exception it opened (ugly as it is) was all TCP and UDP ports when operating under the private profile.
It is an old question, but I ran into the same problem with 64-bit R-2.13.1 on 64-bit Windows.
doSNOW worked fine with 32-bit R but not with 64-bit R, and it likewise hung at cl = makeCluster(6, type='SOCK').
To resolve the problem I eventually added "C:\Program Files\R\R-2.13.1\bin\x64" to the %PATH% environment variable (win+pause/advanced system settings/advanced/environment variables/system variables). Also make sure to allow the R connections in Windows Firewall, and that C:\Program Files\R\R-2.13.1\bin contains copies of the 32-bit versions of R.exe and Rscript.exe (not the x64 ones).
After doing this, running makeCluster() starts 12 processes, six 32-bit and six 64-bit, but during the calculations only the 64-bit ones are used.
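For anyone debugging similar hangs, redirecting worker output to the master console usually reveals why start-up stalls (missing binary, blocked port, firewall prompt). A small sketch, assuming the same foreach/doSNOW stack as above; passing outfile = "" to the socket cluster is an assumption worth verifying for your snow version:
library(foreach)
library(doSNOW)

cl <- makeCluster(6, type = "SOCK", outfile = "")  # worker messages print to this console
registerDoSNOW(cl)

# Trivial job across the cluster: confirm the workers respond before running real work.
worker_pids <- foreach(i = 1:6, .combine = c) %dopar% Sys.getpid()
print(worker_pids)

stopCluster(cl)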
