I am trying to merge Seurat class objects that contain transcriptome count data (sparse matrices). I am relatively new to R, so any help or solutions would be appreciated. I have added a screenshot of the data I'm working with.
**General Info:**
-------------
> memory.size(max = TRUE)
[1] 2533.94
R version 4.0.3 (2020-10-10)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows 10 x64 (build 19041)
attached base packages:
[1] stats graphics grDevices utils
[5] datasets methods base
other attached packages:
[1] RSQLite_2.2.3 Seurat_3.2.3
I am not sure if my storage is the issue or if I should split the function in two.
options(stringsAsFactors = F)
setwd("C:/Users/Amara/OneDrive - Virginia Tech/XieLab/ZebraFish_Project/zf_brain-master/data")
folders <- list.files("C:/Users/Amara/OneDrive - Virginia Tech/XieLab/ZebraFish_Project/zf_brain-master/data")
library(Seurat)
library(dplyr)
zfbrainList <- lapply(folders, function(folder) {
  CreateSeuratObject(counts = Read10X(folder),
                     project = folder)
})
zfbrain.combined <- merge(zfbrainList[[1]],
                          y = c(zfbrainList[[2]], zfbrainList[[3]], zfbrainList[[4]], zfbrainList[[5]],
                                zfbrainList[[6]], zfbrainList[[7]], zfbrainList[[8]], zfbrainList[[9]],
                                zfbrainList[[10]], zfbrainList[[11]], zfbrainList[[12]], zfbrainList[[13]],
                                zfbrainList[[14]], zfbrainList[[15]]),
                          add.cell.ids = folders,
                          project = "zebrafish")
Error in .cbind2Csp(x, y) :
Cholmod error 'out of memory' at file ../Core/cholmod_memory.c, line 147
Data folder
The machine used to process the data in the original question has a 64-bit Windows operating system running a 32-bit version of R. The result from memory.size() shows that approximately 2.4Gb of RAM is available to the malloc() function used by R. The 32-bit version of R on Windows can access a maximum of slightly less than 4Gb of RAM when running on 64-bit Windows, per the help for memory.size().
The Memory Limits in R help page tells us that in 32-bit R on Windows it is usually not possible to allocate a single vector 2Gb in size, because Windows consumes some memory in the middle of the 2Gb address space.
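(The limits referenced here are documented in R's built-in help topic, which can be opened from the console with either of the calls below.)
# Open R's documentation on memory limits:
?"Memory-limits"
help("Memory-limits")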
Once we load the data from the question, the zfbrainList object consumes about 1.2Gb of RAM.
options(stringsAsFactors = F)
folders <- list.files("./data/zebraFishData",full.names = TRUE)
library(Seurat)
library(dplyr)
zfbrainList <- lapply(folders, function(folder) {
  CreateSeuratObject(counts = Read10X(folder),
                     project = folder)
})
format(object.size(zfbrainList),units = "Gb")
...and the result:
> format(object.size(zfbrainList),units = "Gb")
[1] "1.2 Gb"
At this point, the code attempts to merge the objects from the list into a single object.
zfbrain.combined <- merge(zfbrainList[[1]],
                          y = c(zfbrainList[[2]], zfbrainList[[3]], zfbrainList[[4]], zfbrainList[[5]],
                                zfbrainList[[6]], zfbrainList[[7]], zfbrainList[[8]], zfbrainList[[9]],
                                zfbrainList[[10]], zfbrainList[[11]], zfbrainList[[12]], zfbrainList[[13]],
                                zfbrainList[[14]], zfbrainList[[15]]),
                          add.cell.ids = folders,
                          project = "zebrafish")
When we calculate the size of the resulting zfbrain.combined object, we find that it is also about 1.2Gb in size, which, together with the 1.2Gb zfbrainList that must remain in memory, exceeds the RAM available to R on the original poster's machine.
format(object.size(zfbrain.combined),units = "Gb")
> format(object.size(zfbrain.combined),units = "Gb")
[1] "1.2 Gb"
Since zfbrainList must stay in RAM while zfbrain.combined is being created, it is not possible to execute the merge as coded above in an instance of R that can access only 2.4Gb of RAM: the RAM consumed by zfbrainList and zfbrain.combined together is between 2.4 and 2.5Gb, exclusive of the other RAM R needs to run.
Solution: use the 64-bit version of R
Since most Windows-based machines have at least 4Gb of RAM, and the amount of RAM reported by memory.size() was 2.4Gb, it's likely there is at least 4Gb of RAM on the machine. The machine used in the original post already had 64-bit Windows installed, so we can enable R to access more memory by installing and running the 64-bit version of R.
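To confirm which build a session is actually running, base R exposes this directly (standard fields, shown here only for reference):
# Check whether the current session is 32-bit or 64-bit R:
R.version$arch            # "x86_64" for 64-bit, "i386" for 32-bit on Windows
.Machine$sizeof.pointer   # 8 on a 64-bit build, 4 on a 32-bit build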
On a Windows-based machine with 8Gb RAM, 32-bit R reports the following for memory.size() and memory.limit().
Interestingly, R reports 25.25 for memory.size() because 1Mb is rounded down to 0.01 per the help documentation, but memory.limit() provides a number between 0 and 4095 (also per the documentation). On our test machine it reports 3583, about 3.5Gb of RAM.
When we run these functions in 64-bit R on the same machine, memory.size() reports 34.25, which means that malloc() will allocate a single object as large as 3.3Gb, and memory.limit() reports that R can access a total of 8Gb of RAM, the total amount that is installed on this particular machine.
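For reference, the calls compared above are the following (Windows-only functions; the values reported will differ by machine):
memory.size()             # Mb of RAM currently in use by this R session
memory.size(max = TRUE)   # Mb of RAM obtained from the OS so far
memory.limit()            # Mb of RAM the session is allowed to use in total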
Testing the solution
When I run the code in a 32-bit R 4.0.3 session on 64-bit Windows, I am able to replicate the out of memory error.
When I run the code in the 64-bit version of R, it runs to completion, and I am able to calculate the size of the resulting zfbrain.combined object.
If I type
memory.limit()
in the RStudio console, I get "1e+13". Then, if I try to increase this memory limit by running
memory.limit(1e+20)
or
memory.limit(size=1e+20)
I get a warning: "Warning message: In memory.limit(size = 1e+20) : cannot decrease memory limit: ignored", which makes no sense to me. How can I increase this memory limit? (I am running Windows 10 with 24GB of RAM.)
I have an issue with the R system() function (for running an OS command from within R) that only arises when the R session uses up more than some fraction of the available RAM (maybe ~75% in my case), even though there is plenty of RAM available (~15GB in my case) and the same OS command can be easily run at the same time from a terminal.
System info:
64GB RAM PC (local desktop PC, not cloud-based or cluster)
Ubuntu 18.04.1 LTS - x86_64-pc-linux-gnu (64-bit)
R version 3.5.2 (executed directly, not e.g. via docker)
This example demonstrates the issue. The size of the data frame d needs to be adjusted to be as small as possible and still provoke the error. This will depend on how much RAM you have and what else is running at the same time.
ross@doppio:~$ R
R version 3.5.2 (2018-12-20) -- "Eggshell Igloo"
Copyright (C) 2018 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
> n <- 5e8
> d <- data.frame(
+ v0 = rep_len(1.0, n),
+ v1 = rep_len(1.0, n),
+ v2 = rep_len(1.0, n),
+ v3 = rep_len(1.0, n),
+ v4 = rep_len(1.0, n),
+ v5 = rep_len(1.0, n),
+ v6 = rep_len(1.0, n),
+ v7 = rep_len(1.0, n),
+ v8 = rep_len(1.0, n),
+ v9 = rep_len(1.0, n)
+ )
> dim(d)
[1] 500000000 10
> gc()
             used    (Mb)  gc trigger    (Mb)    max used    (Mb)
Ncells     260857    14.0      627920    33.6      421030    22.5
Vcells 5000537452 38151.1  6483359463 49464.2  5000559813 38151.3
> system("free -m", intern = FALSE)
Warning messages:
1: In system("free -m", intern = FALSE) :
system call failed: Cannot allocate memory
2: In system("free -m", intern = FALSE) : error in running command
The call to gc() indicates R has allocated ~38GB of the 64GB of RAM, and running free -m in a terminal at the same time (see below) shows that the OS thinks there is ~16GB free.
ross@doppio:~$ free -m
              total        used        free      shared  buff/cache   available
Mem:          64345       44277       15904         461        4162       18896
Swap:           975           1         974
ross@doppio:~$
So free -m cannot be run from within R because memory cannot be allocated, yet it can be run at the same time from a terminal, and you would think 15GB would be more than enough for a lightweight command like free -m.
If the R memory usage is below some threshold then free -m can be run from within R.
I guess that R is trying to allocate more memory for free -m than is actually needed, and that the amount depends on how much memory is already allocated. Can anyone shed some light on what is going on here?
Thanks
I've run into this one. R uses fork to run the subprocess, temporarily doubling the 35GB image to more than the 64GB you have. If the fork had survived, it would next have called exec and given back the duplicated memory. This isn't how fork/exec is supposed to go (it is supposed to be copy-on-write with no extra cost), but somehow that is not what happens in this case.
It looks like this may be a known limitation: to fork, you must have enough memory to potentially duplicate the pages (even if that duplication never happens). I would guess you may not have enough swap (at least the size of RAM seems to be recommended). Here are some instructions on configuring swap (they are written for EC2, but cover Linux in general): https://aws.amazon.com/premiumsupport/knowledge-center/ec2-memory-swap-file/
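As an aside (my own hedged sketch, not part of the answer above): another way around the fork memory charge is to start a small helper R process before doing the large allocation and route system calls through it, since the fork then happens in the small worker rather than in the huge main session.
library(parallel)
# Start a lightweight PSOCK worker *before* allocating large objects.
cl <- makeCluster(1)

n <- 5e8
d <- data.frame(v0 = rep_len(1.0, n))   # large allocation in the main session

# The fork for the system call happens inside the small worker process,
# so it is not affected by the main session's memory footprint.
clusterEvalQ(cl, system("free -m", intern = TRUE))

stopCluster(cl)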
When I run memory.size(max=NA) I get: [1] 16264
But memory.size(max=T) gives me [1] 336.88
When I look at Task Manager, the 4 threads are using a total of ~1,000 MB (1/16 of my 16GB of available RAM) but they are using 100% of my CPU. While running, all processes combined are only using 50% of my 16GB of available RAM.
Whenever I try to increase memory allocation with memory.size(max=1000), I get the warning message:
Warning message:
In memory.size(max = 1000) : cannot decrease memory limit: ignored
What is going on here?
1) Is my CPU just slow given the amount of RAM I have? (Intel i7-6500U 2.5 GHz)
2) Does memory allocation require additional steps when using parallel threading? (e.g. doParallel)
Windows 10 64 bit, 32 GB RAM, Rstudio 1.1.383 and R 3.4.2 (up-to-date)
I have several CSV files which have at least one or two lines full of nul values. So I wrote a script that uses read_lines_raw() from the readr package in R, which reads the file in raw format and produces a list where each element is a row. I then check each line for 00 (the nul byte), and when it is found that line gets deleted.
One of the files is 2.5 GB in size and also has nul value somewhere in it. The problem is, read_lines_raw is not able to read this file and throws an error:
Error in read_lines_raw_(ds, n_max = n_max, progress = progress) :
  negative length vectors are not allowed
I don't even understand the problem. Some of my research hints at something size-related, but not even half of the RAM is being used. Some other files that it was able to read were 1.5 GB in size. Is this file too big, or is something else causing this?
Update 1:
I tried to read in the whole file using scan but that also gave me an error:
could not allocate memory (2048 Mb) in C function 'R_AllocStringBuffer'
So although my PC has 32 GB of RAM, the maximum allowed size for a single entity is 2 GB? I checked to make sure I am running 64-bit R, and yes I am.
> version
               _
platform       x86_64-w64-mingw32
arch           x86_64
os             mingw32
system         x86_64, mingw32
status
major          3
minor          4.2
year           2017
month          09
day            28
svn rev        73368
language       R
version.string R version 3.4.2 (2017-09-28)
nickname       Short Summer
It seems like many people are facing similar issues, but I could not find a solution. How can we increase the memory allocation for individual entities? memory.limit() gives back 32 GB, which is the RAM size, but that isn't helpful. memory.size() does give something close to 2 GB, and since the file is 2.7 GB on disk, I assume this is the reason for the error.
Thank you.
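A possible workaround (my own hedged sketch, not from the original post): instead of loading the whole file with read_lines_raw(), stream it in binary chunks with readBin() and strip the embedded nul bytes while writing a cleaned copy, which keeps every buffer well under R's per-object string limits. Note this removes the nul bytes themselves rather than dropping the whole lines that contain them, so it is a slightly different cleaning rule than the original script; clean_nuls() and the file names are hypothetical.
# Hedged sketch: stream a file in chunks and drop embedded nul bytes.
# clean_nuls() is a hypothetical helper, not an existing API.
clean_nuls <- function(infile, outfile, chunk_size = 64 * 1024^2) {
  con_in  <- file(infile,  open = "rb")
  con_out <- file(outfile, open = "wb")
  on.exit({ close(con_in); close(con_out) }, add = TRUE)
  repeat {
    chunk <- readBin(con_in, what = "raw", n = chunk_size)
    if (length(chunk) == 0) break
    writeBin(chunk[chunk != as.raw(0)], con_out)   # keep everything except 0x00
  }
  invisible(outfile)
}
# Usage (hypothetical file names):
# clean_nuls("big_file.csv", "big_file_clean.csv")
# df <- readr::read_csv("big_file_clean.csv")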