R: Cannot allocate memory greater than x MB - r

I have a main function in R which calls other files to run my program. I call the main file through a bat file(.exe). When I run it line-by-line it runs without a memory error, but when I call the bat file to run it, it halts and gives me the following error:
Cannot allocate memory greater than 51 MB.
How can I avoid this?

Memory limitations in R such as this are a recurring nightmare for a lot of us.
Very often the problem is a limit imposed by your OS (which can usually be changed on a Bash or PowerShell command line), by the architecture (32-bit vs. 64-bit), or by the availability of contiguous free RAM, regardless of overall available memory.
It's hard to say why something would not cause a memory issue when run line by line, but would hit the memory limit when run as a .bat.
What version of R are you running? Do you have both the 32-bit and 64-bit builds installed? Is the 32-bit build being called by Rscript when you run your .bat file, whereas you run the 64-bit build line by line? You can check the version of R that's being run with R.Version().
You can test this by running memory.limit() in both your R IDE/terminal and in your .bat file (be sure to print the result or save it as an object in your .bat file). Note that memory.limit() is Windows-only, and in R 4.2 and later it is no longer supported and simply returns Inf. You might also do well to try setting memory.limit() in your .bat file, as it may just have a smaller default, perhaps due to differences between the .Rprofile that's invoked in your IDE or terminal and the one used by the .bat file.
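To compare the two environments, a small diagnostic along the lines of the sketch below could be run once interactively and once from the .bat file. The helper name describe_r_session is hypothetical; the values it reads (R.version, .Machine$sizeof.pointer, R.home(), commandArgs()) are all base R.

```r
# Collect the facts that most often differ between an IDE session and Rscript.
# describe_r_session() is a hypothetical helper name, not part of any package.
describe_r_session <- function() {
  c(
    version      = R.version$version.string,
    arch         = R.version$arch,                              # e.g. "x86_64" or "i386"
    pointer_bits = as.character(8 * .Machine$sizeof.pointer),   # "64" or "32"
    r_home       = R.home(),
    executable   = commandArgs()[1]
  )
}

info <- describe_r_session()

# One "name: value" line per item, so the two runs are easy to diff.
cat(paste(names(info), info, sep = ": "), sep = "\n")
```

When this runs from the .bat file the console may close immediately, so redirecting the output to a file there makes the comparison easier.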
If architecture isn't the cause of your memory error, then you have several more troubleshooting steps to try:
Check memory usage in both environments (in R directly and via your .bat process) using this:
sort( sapply(ls(),function(x){object.size(get(x))}))
Run the garbage collector explicitly in your scripts with the gc() command.
Check all object sizes in a readable unit to make sure there are no unexpected results in your .bat process (note that this sorts the formatted strings, so the order is alphabetical rather than by size): sort(sapply(ls(), function(x) format(object.size(get(x)), units = "Mb")))
Try memory profiling:
Rprof(tf <- "rprof.log", memory.profiling = TRUE)
# ... run the code you want to profile here ...
Rprof(NULL)
summaryRprof(tf, memory = "both")
While this is a RAM issue, for good measure you might want to check that the compute power available is both sufficient and not varying between these two ways of running your code: parallel::detectCores()
Examine your performance with Hadley Wickham's lineprof tool (warning: it requires devtools and doesn't work on lines of code that call into C).
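As a rough sketch of the object-size check above, the helper below sorts on the raw byte counts and only formats them afterwards, so the largest objects really do come first. The name object_sizes and the demo objects are made up for illustration.

```r
# List the objects in an environment, largest first, sorting on raw byte
# counts rather than on formatted strings.
# object_sizes() is a hypothetical helper name.
object_sizes <- function(env = globalenv()) {
  nms <- ls(envir = env)
  if (length(nms) == 0) {
    return(data.frame(object = character(0), bytes = numeric(0),
                      size = character(0)))
  }
  bytes <- vapply(nms, function(n) as.numeric(object.size(get(n, envir = env))),
                  numeric(1))
  ord <- order(bytes, decreasing = TRUE)
  data.frame(
    object = nms[ord],
    bytes  = unname(bytes[ord]),
    size   = sprintf("%.1f Mb", unname(bytes[ord]) / 1024^2),
    row.names = NULL
  )
}

# Example with two throwaway objects in a fresh environment
demo_env <- new.env()
assign("small", 1:10, envir = demo_env)
assign("big", numeric(1e6), envir = demo_env)
print(object_sizes(demo_env))
```

Calling object_sizes() with no argument inspects the global environment, which is what you would want at the end of the script run from the .bat file.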
References: While I'm pulling these snippets out of my own code, most of them originally came from other, related StackOverflow posts, such as:
Reaching memory allocation in R
R Memory Allocation "Error: cannot allocate vector of size 75.1 Mb"
R memory limit warning vs "unable to allocate..."
How to compute the size of the allocated memory for a general type
R : Any other solution to "cannot allocate vector size n mb" in R?

Yes, you should be using 64-bit R if you can.
See this question, and this from the R docs.

Related

Tau2Slog2 not able to process 6gb tau.trc files

I am profiling my code using the TAU profiler, with tau_exec at runtime. It generates trace files, some of which are gigabytes in size. tau_treemerge.pl merges them and generates a tau.trc of 6 GB. tau2slog2 then fails, complaining about the heap space.
It would be helpful if anybody could show how to reduce the size of the trace files.
Following is the way I am running the code:
mpirun -n 64 tau_exec ./a.out
tau_treemerge.pl;
tau2slog2 tau.trc tau.edf -o tau.slog2
I was able to solve the issue by increasing the heap size of the JVM.
java -Xmx50000m -Xms32000m -cp /tau/x86_64/lib/TAU_tf.jar:/tau/x86_64/lib/traceTOslog2.jar:/tau/x86_64/lib/tau2slog2.jar edu/uoregon/tau/Tau2Slog2 tau.trc tau.edf -o tau.slog2
Obviously this is a workaround and not an elegant solution. So, to reduce the tau.trc file size, I added more filtering parameters during the instrumentation.
I also first just profiled the code (export TAU_PROFILE=1), then ran pprof to figure out which MPI functions are called an enormous number of times, and throttled those functions to further reduce the file size.

Error: Maximal number of DLLs reached

I'm writing an R package which depends upon many other packages. When I load too many packages into the session I frequently get this error:
Error in dyn.load(file, DLLpath = DLLpath, ...) :
unable to load shared object '/Library/Frameworks/R.framework/Versions/3.2/Resources/library/proxy/libs/proxy.so':
`maximal number of DLLs reached...
This post Exceeded maximum number of DLLs in R pointed out that the issue is with the Rdynload.c of the base R code:
#define MAX_NUM_DLLS 100
Is there any way to bypass this issue except modifying and building from source?
As of R 3.4, you can set a different maximum number of DLLs using the environment variable R_MAX_NUM_DLLS. From the release notes:
The maximum number of DLLs that can be loaded into R e.g. via
dyn.load() can now be increased by setting the environment
variable R_MAX_NUM_DLLS before starting R.
Increasing that number is of course "possible"... but it also costs a bit
(adding to the fixed memory footprint of R).
I did not set that limit, but I'm pretty sure it was also meant as a reminder for the useR to "clean up" a bit in her / his R session, i.e., not to load package namespaces unnecessarily. I cannot yet imagine that you need > 100 packages | namespaces loaded in your R session.
OTOH, some packages nowadays have a host of dependencies, so I agree that this at least may happen accidentally more frequently than in the past.
The real solution of course would be a code improvement that starts with a relatively small number of "DLLinfo" structures (say 32), and then allocates more batches (of size say 32) if needed.
Patches to the R sources (development trunk in subversion at https://svn.r-project.org/R/trunk/ ) are very welcome!
---- added Jan.26, 2017: In the meantime, we've had a public bug report about this, a proposed patch (which was not good enough: there is always an OS-dependent limit on the number of open files), and today that bug report has been closed by R core member Tomas Kalibera, who implemented new code where the maximal number of loaded DLLs is set at
pmax(100, pmin(1000, 0.6* OS_dependent_getrlimit_or_equivalent()))
and so on Windows and Linux (and not yet tested, but "almost surely" macOS), the limit should be considerably higher than previously.
----- Update #2 (written Jan.5, 2018):
In Oct'17, the above change was made more automatic with the following commit to the sources (of the development version of R - only!)
r73545 | kalibera | 2017-10-12 14:41:20
Increase the number of DLLs that can be loaded by default. If needed,
increase the soft limit on open files.
and on the help page ?dyn.load (https://stat.ethz.ch/R-manual/R-devel/library/base/html/dynload.html) the ulimit -n <num_open_files> is now mentioned (section Note close to bottom).
So you might consider using R's development version till that becomes "main stream" in April.
Alternatively, you do (in a terminal / shell)
ulimit -n 2048
and then start R from that terminal. Tomas Kalibera mentioned this to work on macOS.
I had this issue with the simpleSingleCell package from Bioconductor.
On macOS you can't exceed 256, so I set this in the .Renviron file in my home directory:
R_MAX_NUM_DLLS=150
It's easy:
Go to the environment variables and edit:
variable_name = R_MAX_NUM_DLLS
value = 1000
Restart R.
This worked well for me.
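To see how close a session is to the limit, a quick check like the following may help. It is only a sketch using base R calls (getLoadedDLLs(), Sys.getenv(), loadedNamespaces()). It assumes, per the release note quoted above, that R_MAX_NUM_DLLS has to be set before R starts.

```r
# Compare the number of DLLs currently loaded with the configured limit.
# R_MAX_NUM_DLLS must be set before R starts (for example in ~/.Renviron);
# this script only reports the value, it does not change the limit.
loaded <- length(getLoadedDLLs())
limit  <- Sys.getenv("R_MAX_NUM_DLLS", unset = NA)

cat("DLLs loaded:", loaded, "\n")
if (is.na(limit)) {
  cat("R_MAX_NUM_DLLS is not set; R is using its built-in default.\n")
} else {
  cat("R_MAX_NUM_DLLS =", limit, "\n")
}

# Namespaces currently loaded. Whether unloading one of them also frees its
# DLL slot depends on the package, so treat that as an assumption.
print(loadedNamespaces())
```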

Git-svn out of memory

I'm trying to clone a reasonably big svn repository with git-svn, and at a certain point I get an error message:
Failure loading plugin: APR: Can't create a character converter from 'UTF-8' to native encoding: Cannot allocate memory at /usr/libexec/git-core/git-svn line 5061
And sometimes a
Cannot allocate memory: zlib (compress2): out of memory: Compression of svndiff data failed at /usr/libexec/git-core/git-svn line 5061
error message. I still have ~3GB RAM free. What should I do so git-svn can utilize it?
(I'm doing this on RedHat Enterprise Linux 6.5 if that makes any difference)
From:
This error message is about the memory git is trying to allocate --
it's more than what is free. This is most likely caused by a large
file having been checked into SVN. Unfortunately, there's no easy way
to fix it (apart from buying more memory) -- you would have to remove
the large file and the commit adding it from SVN.
However, try the following:
Increase swap space
Increase the ulimit

java.lang.OutOfMemoryError using bartMachine package in R

I ran a BART model with 11000 samples and 20 features (half of them categorical variables). My Mac has 8 GB of RAM. At first, I set the memory to 5000 MB via the function set_bart_machine_memory(5000).
Then I can fit a model through the function bartMachine one time. If I want to run another model, R returns an error like this:
Exception in thread "pool-10-thread-1" Exception in thread "pool-10-thread-3"
java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
Exception in thread "pool-10-thread-2" java.lang.OutOfMemoryError: Java heap space
Exception in thread "pool-10-thread-4" java.lang.OutOfMemoryError: Java heap space
Error in .jcall(bart_machine$java_bart_machine, "Z", "isDestroyed") :
java.lang.OutOfMemoryError: Java heap space
I think that having two bartMachine objects in memory may not be a good idea, so I just kill the first model through the function destroy_bart_machine(); then the second model runs fine.
The main problem is with bartMachineCV(). There are about 20 models to fit by default, and a memory error like the one above hits me when R is running the BART model with the second set of parameter settings (that is: bartMachine CV try: k: 2 nu, q: 3, 0.9 m: 200).
I'm not familiar with Java. Is there some way to run bartMachineCV() on a computer with 8 GB of RAM? Thanks.
I'm the maintainer of the bartMachine package. Make sure you download the new version and pay attention to the message that appears after you initialize the library:
> library(bartMachine)
...
Welcome to bartMachine v1.2.0! You have 0.48GB memory available.
If you see a low amount of RAM in the message, something is wrong with your JVM setup. A 64-bit JVM is a must. Use
options(java.parameters = "-Xmx2500m")
before calling library(bartMachine) to attempt to set more.
You'll need to run a 64-bit Java JVM; the 32-bit JVM only gives you ~1.8GB max heap. I'd recommend that you use JDK 7 or higher; that's production for Oracle these days.
Once you have that, you can set JVM memory settings like this:
http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html
You'll want to set -Xmx1024M or something like that (note that there is no "=" between -Xmx and the size).
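From R, the heap flag can be passed through the java.parameters option mentioned above. The sketch below wraps that in a hypothetical helper, set_jvm_heap_mb, mainly to show the flag format; it assumes the option is set in a fresh session before the JVM is started, i.e. before library(bartMachine) or library(rJava) is called.

```r
# Build a JVM heap option of the form "-Xmx<N>m" (no "=" sign) and register it
# for rJava-based packages. This has to run before the JVM is started.
# set_jvm_heap_mb() is a hypothetical helper name.
set_jvm_heap_mb <- function(mb) {
  stopifnot(is.numeric(mb), length(mb) == 1, mb > 0)
  opt <- sprintf("-Xmx%dm", as.integer(mb))
  options(java.parameters = opt)
  invisible(opt)
}

set_jvm_heap_mb(2500)
print(getOption("java.parameters"))

# library(bartMachine)   # load only after the option has been set
```

The welcome message printed when bartMachine loads should then report the larger amount of available memory; if it does not, the JVM was probably already running when the option was set.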

Java runtime segfaults when saving large data.frame from JRI

I've followed the rtest.java example code from the rJava installation (/usr/lib/R/site-library/rJava/jri/examples/rtest.java on Debian and derivatives) for building data.frames from java arrays.
This works well for small data frames (~10000 rows), however when I try to do this in anger (i.e. > 1000000 rows) it causes the java runtime to segfault.
Oddly, I appear to be able to create the data.frame ok (making the usual rniPutXXXArray calls), however when I come to save the data.frame (using an eval, after assigning the data.frame to an R symbol) the issue occurs.
I can see some debug when I make calls to eval on the R engine, however when I go via the low level interface (rniXXX) I get no debug at all. Is there a way to switch more debug on than I already have?
For what it's worth, here's the top of the segv message. I can of course provide more detail on request.
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007f1be6259ea5, pid=6898, tid=139758087001856
#
# JRE version: 7.0_03-b21
# Java VM: OpenJDK 64-Bit Server VM (22.0-b10 mixed mode linux-amd64 compressed oops)
# Derivative: IcedTea7 2.1.3
# Distribution: Debian GNU/Linux unstable (sid), package 7u3-2.1.3-1
# Problematic frame:
# C [libR.so+0x117ea5] SET_VECTOR_ELT+0x11f5
...
Please ask on stats-rosuda-devel, including the actual code you're using. Note that with RNI calls you're responsible for protecting the objects. Unfortunately the example code skips that aspect, so what probably happens is that, due to the size of your objects, garbage collection occurs before you are done with the construction; some of the objects get collected and are thus invalid, and R crashes on you. If you want to be safe, protect the columns and then the generic vector you create out of them.
BTW: It is much safer to use the org.rosuda.REngine API instead of using RNI directly. It even provides REXP.createDataFrame() method that does all the work for you.
