I am running functions which are deeply nested and consume quite a bit of memory as reported by the Windows task manager. The output variables are relatively small (1-2 orders of magnitude smaller than the amount of memory consumed), so I am assuming that the difference can be attributed to intermediate variables assigned somewhere in the function (or within sub-functions being called) and a delay in garbage collection. So, my questions are:
1) Is my assumption correct? Why or why not?
2) Is there any sense in simply nesting calls to functions more deeply rather than assigning intermediate variables? Will this reduce memory usage?
3) Suppose a scenario in which R is using 3GB of memory on a system with 4GB of RAM. After running gc(), it's now using only 2GB. In such a situation, is R smart enough to run garbage collection on its own if I had, say, called another function which used up 1.5GB of memory?
There are certain datasets I am working with which are able to crash the system as it runs out of memory when they are processed, and I'm trying to alleviate this. Thanks in advance for any answers!
Josh
1) Memory used to represent objects in R and memory marked by the OS as in-use are separated by several layers (R's own memory handling, when and how the OS reclaims memory from applications, etc.). I'd say (a) I don't know for sure but (b) at times the task manager's notion of memory use might not accurately reflect the memory actually in use by R, but that (c) yes, probably the discrepancy you describe reflects memory allocated by R to objects in your current session.
2) In a function like
f = function() { a = 1; g=function() a; g() }
invoking f() prints 1, implying that memory used by a is still being marked as in use when g is invoked. So nesting functions doesn't help with memory management, probably the reverse.
Your best bet is to clean-up or re-use variables representing large allocations before making more large allocations. Appropriately designed functions can help with this, e.g.,
f = function() { m = matrix(0, 10000, 10000); 1 }
g = function() { m = matrix(0, 10000, 10000); 1 }
h = function() { f(); g() }
The large memory of f is no longer needed by the time f returns, and so is available for garbage collection if the large memory required for g necessitates this.
3) If R tries to allocate memory for a variable and can't, it'll run its garbage collector a and try again. So you don't gain anything by running gc() yourself.
I'd make sure that you've written memory efficient code, and if there are still issues I'd move to a 64bit platform where memory is less of an issue.
R has facilities for memory profiling, but it needs to be built that. While we enable that for Debian / Ubuntu, I do not know what the default for Windows is.
Usage of memory profiling is discussed (briefly) in the 'Writing R Extensions' manual.
Coping with (limited) memory on a 32-bit system (and particularly Windows) has its challenges. Most people will recommend that you switch to a system with as much RAM as possible running a 64-bit OS.
Related
In R I have a task that I'm trying to parallelize. Part of this is comparing run-times and peak memory usage for different implementations of the task at hand. I'm using the peakRAM library to determine peak memory, which I think just uses gc() under the surface, since if I do it manually I get the same peak memory results.
The problem is that the results from peakRAM are different from the computer's task manager (or top on Linux). If I run a single-core, these numbers are in the same ballpark, but even using 2 cores, they are really different.
I'm parallelizing using pblapply in a manner similar to this.
times_parallel = peakRAM(
pblapply(X = 1:10,
FUN = \(x) data[iteration==x] %>% parallel_task(),
cl = makeCluster(numcores, type = "FORK"))
)
With a single core, this process requires a peak of 30G of memory. But with 2 cores, peakRAM reports only about 3G of memory. Looking at top however, shows that each of the 2 threads is using around 20-30G of memory at a time.
The only thing I can think of is that peakRAM is only reporting the memory of the main thread but I see nothing in the gc() details that suggests this is happening.
The time reported from peakRAM seems appropriate. Sub-linear gains at different core levels.
Original motivation behind this is that I have a dynamically sized array of floats that I want to pass to R through Rcpp without either incurring the cost of a zeroing out nor the cost of a deep copy.
Originally I had thought that there might be some way to take heap allocated array, make it aware to R's gc system and then wrap it with other data to create a "Rcpp::NumericVector" but it seems like that that's not possible - or doable with my current knowledge.
However and correct me if I'm wrong it looks like simply constructing a NumericVector with a size N and then using it as an N sized allocation will call R.h's Rf_allocVector and that itself does not either zero out the allocated array - I tested it on a small C program that gets dyn.loaded into R and it looks like garbage values. I also took a peek at the assembly and there doesn't seem to be any zeroing out.
Can anyone confirm this or offer any alternate solution?
Welcome to StackOverflow.
You marked this rcpp but that is a function from the C API of R -- whereas the Rcpp API offers you its constructors which do in fact set the memory tp zero:
> Rcpp::cppFunction("NumericVector goodVec(int n) { return NumericVector(n); }")
> sum(goodVec(1e7))
[1] 0
>
This creates a dynamically allocated vector using R's memory functions. The vector is indistinguishable from R's own. And it has the memory set to zero
as we use R_Calloc, which is documented in Writing R Extension to setting the memory to zero. (We may also use memcpy() explicitly, you can check the sources.)
So in short, you just have yourself confused over what the C API of R, as well as Rcpp offer, and what is easiest to use when. Keep reading documentation, running and writing examples, and studying existing code. It's all out there!
My code eats up to 3GB of memory at a single time. I figured it out using gc():
gc1 <- gc(reset = TRUE)
graf(...) # the code
gc2 <- gc()
cat(sprintf("mem: %.1fMb.\n", sum(gc2[,6] - gc1[,2])))
# mem: 3151.7Mb.
Which I guess means that there is one single time, when 3151.7 MB are allocated at once.
My goal is to minimize the maximum memory allocated at any single time. How do I figure out which part of my code is reposponsible for the maximum usage of those 3GB of memory? I.e. the place where those 3GB are allocated at once.
I tried memory profiling with Rprof and profvis, but both seem to show different information (which seems undocumented, see my other question). Maybe I need to use them with different parameters (or use different tool?).
I've been looking at Rprofmem... but:
in the profmem vignette they wrote: "with utils::Rprofmem() it is not possible to quantify the total memory usage at a given time because it only logs allocations and does therefore not reflect deallocations done by the garbage collector."
how to output the result of Rprofmem? This source speaks for itself: "Summary functions for this output are still being designed".
My code eats up to 3GB of memory at a single time.
While it looks like your code is consuming a lot of RAM at once by calling one function you can break down the memory consumption by looking into the implementation details of the function (and its sub calls) by using RStudio's built-in profiling (based on profvis) to see the execution time and rough memory consumption. Eg. if I use my demo code:
# graf code taken from the tutorial at
# https://rawgit.com/goldingn/intecol2013/master/tutorial/graf_workshop.html
library(dismo) # install.packages("dismo")
library(GRaF) # install_github('goldingn/GRaF')
data(Anguilla_train)
# loop to call the code under test several times to get better profiling results
for (i in 1:5) {
# keep the first n records of SegSumT, SegTSeas and Method as covariates
covs <- Anguilla_train[, c("SegSumT", "SegTSeas", "Method")]
# use the presence/absence status to fit a simple model
m1 <- graf(Anguilla_train$Angaus, covs)
}
Start profiling with the Profile > Start Profiling menu item, source the above code and stop the profiling via the above menu.
After Profile > Stop Profiling RStudio is showing the result as Flame Graph but what you are looking for is hidden in the Data tab of the profile result (I have unfolded all function calls which show heavy memory consumption):
The numbers in the memory column indicate the memory allocated (positive) and deallocated (negative numbers) for each called function and the values should include the sum of the whole sub call tree + the memory directly used in the function.
My goal is to minimize the maximum memory allocated at any single time.
Why do you want to do that? Do you run out-of-memory or do you suspect that repeated memory allocation is causing long execution times?
High memory consumption (or repeated allocations/deallocations) often come together with a slow execution performance since copying memory costs time.
So look at the Memory or Time column depending on your optimization goals to find function calls with high values.
If you look into the source code of the GRaF package you can find a loop in the graf.fit.laplace function (up to 50 "newton iterations") that calls "slow" R-internal functions like chol, backsolve, forwardsolve but also slow functions implemented in the package itself (like cov.SE.d1).
Now you can try to find faster (or less memory consuming) replacements for these functions... (sorry, I can't help here).
PS: profvis uses Rprof internally so the profiling data is collected by probing the current memory consumption in regular time intervals and counting it for the currently active function (call stack).
Rprof has limitations (mainly not an exact profiling result since the garbage collector triggers at non-deterministic times and the freed memory is attributed to the function the next probing interval break stops at and it does not recognize memory allocated directly from the OS via C/C++ code/libraries that bypasses R's memory management API).
Still it is the easiest and normally good enough indication of memory and performance problems...
For an introduction into profvis see: For https://rstudio.github.io/profvis/
I have been using filebacked.big.matrix to store a very large matrix (~1 million x 20 thousand). I am working on a cluster with very high memory, but not quite that much. I have previously used the ff package which worked great and kept the memory usage consistent despite the matrix size, but it died when I surpassed 10^32 items in the matrix (R community really needs to fix that problem). the filebacked.big.matrix initially seemed to work very well and generally runs without problems, but when I check on the memory usage it is sometimes spiking into the 100s of GBs. I am careful to only read/write to the matrix a relatively few rows at a time, so I think there should not be much in memory at any given time.
Does it do some sort of automatic memory caching or something that is driving the memory usage up? If so can this caching be disabled or limited? The high memory usage is causing some nasty side effects on the cluster so I need a way to do this that is memory neutral. I have checked the filebacked.big.matrix help page, but can't find any pertinent information there.
Thanks!
UPDATE:
I am also using bigmemoryExtras.
I was wrong earlier, the problem is happening when I loop through the entire matrix reading it into a different, smaller file.backed matrix like this:
tmpGeno=fileBackedMatrix(rowIndex-1,numMarkers,'double',tmpDir)
front=1
back=40000
large matrix must be copied in chunks to avoid integer.max errors
while(front < rowIndex-1){
if(back>rowIndex-1) back=rowIndex-1
tmpGeno[front:back,1:numMarkers]=genotypeMatrix[front:back,1:numMarkers,drop=F]
front=front+40000
back=back+40000
}
The physical memory usage is initially very low (with virtual memory very high). But while running this loop, and even after it has finished it seems to just keep most of the matrix in physical memory. I need it to only keep the one small chunk of the matrix in memory at a time.
UPDATE 2:
It is a bit confusing to me: the cluster metrics and top command say that it is using tons of memory (~80GB), but the gc() command says that memory usage never went over 2GB. The free command says that all the memory is used, but in the -/+ buffers/cache line is says only 7GB are being used total.
Suppose I have a matrix bigm. I need to use a random subset of this matrix and give it to a machine learning algorithm such as say svm. The random subset of the matrix will only be known at runtime. Additionally there are other parameters that are also chosen from a grid.
So, I have code that looks something like this:
foo = function (bigm, inTrain, moreParamsList) {
parsList = c(list(data=bigm[inTrain, ]), moreParamsList)
do.call(svm, parsList)
}
What I am seeking to know is whether R uses new memory to save that bigm[inTrain, ] object in parsList. (My guess is that it does.) What commands can I use to test such hypotheses myself? Additionally, is there a way of using a sub-matrix in R without using new memory?
Edit:
Also, assume I am calling foo using mclapply (on Linux) where bigm resides in the parent process. Does that mean I am making mc.cores number of copies of bigm or do all cores just use the object from the parent?
Any functions and heuristics of tracking memory location and consumption of objects being made in different cores?
Thanks.
I am just going to put in here what I find from my research on this topic:
I don't think using mclapply makes mc.cores copies of bigm based on this from the manual for multicore:
In a nutshell fork spawns a copy (child) of the current process, that can work in parallel
to the master (parent) process. At the point of forking both processes share exactly the
same state including the workspace, global options, loaded packages etc. Forking is
relatively cheap in modern operating systems and no real copy of the used memory is
created, instead both processes share the same memory and only modified parts are copied.
This makes fork an ideal tool for parallel processing since there is no need to setup the
parallel working environment, data and code is shared automatically from the start.
For your first part of the question, you can use tracemem :
This function marks an object so that a message is printed whenever the internal code copies the object
Here an example:
a <- 1:10
tracemem(a)
## [1] "<0x000000001669cf00"
b <- a ## b and a share memory (no message)
d <- stats::rnorm(10)
invisible(lm(d ~ a+log(b)))
## tracemem[0x000000001669cf00 -> 0x000000001669e298] ## object a is copied twice
## tracemem[0x000000001669cf00 -> 0x0000000016698a38]
untracemem(a)
You already found from the manual that mclapply isn't supposed to make copies of bigm.
But each thread needs to make its own copy of the smaller training matrix as it varies across the threads.
If you'd parallelize with e.g. snow, you'd need to have a copy of the data in each of the cluster nodes. However, in that case you could rewrite your problem in a way that only the smaller training matrices are handed over.
The search term for the general investigation of memory consumption behaviour is memory profiling. Unfortunately, AFAIK the available tools are not (yet) very comfortable, see e.g.
Monitor memory usage in R
Memory profiling in R - tools for summarizing