Java runtime segfaults when saving large data.frame from JRI - r

I've followed the rtest.java example code from the rJava installation (/usr/lib/R/site-library/rJava/jri/examples/rtest.java on Debian and derivatives) for building data.frames from java arrays.
This works well for small data frames (~10000 rows), however when I try to do this in anger (i.e. > 1000000 rows) it causes the java runtime to segfault.
Oddly, I appear to be able to create the data.frame ok (making the usual rniPutXXXArray calls), however when I come to save the data.frame (using an eval, after assigning the data.frame to an R symbol) the issue occurs.
I can see some debug when I make calls to eval on the R engine, however when I go via the low level interface (rniXXX) I get no debug at all. Is there a way to switch more debug on than I already have?
For what it's worth, here's the top of the segv message. I can of course provide more detail on request.
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007f1be6259ea5, pid=6898, tid=139758087001856
#
# JRE version: 7.0_03-b21
# Java VM: OpenJDK 64-Bit Server VM (22.0-b10 mixed mode linux-amd64 compressed oops)
# Derivative: IcedTea7 2.1.3
# Distribution: Debian GNU/Linux unstable (sid), package 7u3-2.1.3-1
# Problematic frame:
# C [libR.so+0x117ea5] SET_VECTOR_ELT+0x11f5
...

Please ask on stats-rosuda-devel including the actual code you're using. Note that with RNI calls you're responsible for protection of the objects - unfortunately the example code skips that aspect so what probably happens is that due to the size of your objects the garbage collection occurs before you are done with the construction so some of the objects get collected and thus are invalid and R crashes on you. If you want to be safe, protect the columns and then the generic vector you create out of it.
BTW: It is much safer to use the org.rosuda.REngine API instead of using RNI directly. It even provides REXP.createDataFrame() method that does all the work for you.

Related

Error: Maximal number of DLLs reached

I'm writing an R package which depends upon many other packages. When I load too many packages into the session I frequently got this error:
Error in dyn.load(file, DLLpath = DLLpath, ...) :
unable to load shared object '/Library/Frameworks/R.framework/Versions/3.2/Resources/library/proxy/libs/proxy.so':
`maximal number of DLLs reached...
This post Exceeded maximum number of DLLs in R pointed out that the issue is with the Rdynload.c of the base R code:
#define MAX_NUM_DLLS 100
Is there any way to bypass this issue except modifying and building from source?
As of R 3.4, you can set a different max number of DLLs using and environmental variable R_MAX_NUM_DLLS. From the release notes:
The maximum number of DLLs that can be loaded into R e.g. via
dyn.load() can now be increased by setting the environment
variable R_MAX_NUM_DLLS before starting R.
Increasing that number is of course "possible"... but it also costs a bit
(adding to the fixed memory footprint of R).
I did not set that limit, but I'm pretty sure it was also meant as reminder for the useR to "clean up" a bit in her / his R session, i.e., not load package namespaces unnecessarily. I cannot yet imagine that you need > 100 packages | namespaces loaded in your R session.
OTOH, some packages nowadays have a host of dependencies, so I agree that this at least may happen accidentally more frequently than in the past.
The real solution of course would be a code improvement that starts with a relatively small number of "DLLinfo" structures (say 32), and then allocates more batches (of size say 32) if needed.
Patches to the R sources (development trunk in subversion at https://svn.r-project.org/R/trunk/ ) are very welcome!
---- added Jan.26, 2017: In the mean time, we've had a public bug report about this, a proposed patch (which was not good enough: There is always an OS dependent limit on the number of open files), and today that bug report has been closed by R core member #TomasKalibera who implemented new code where the maximal number of loaded DLLs is set at
pmax(100, pmin(1000, 0.6* OS_dependent_getrlimit_or_equivalent()))
and so on Windows and Linux (and not yet tested, but "almost surely" macOS), the limit should be considerably higher than previously.
----- Update #2 (written Jan.5, 2018):
In Oct'17, the above change was made more automatic with the following commit to the sources (of the development version of R - only!)
r73545 | kalibera | 2017-10-12 14:41:20
Increase the number of DLLs that can be loaded by default. If needed,
increase the soft limit on open files.
and on the help page ?dyn.load (https://stat.ethz.ch/R-manual/R-devel/library/base/html/dynload.html) the ulimit -n <num_open_files> is now mentioned (section Note close to bottom).
So you might consider using R's development version till that becomes "main stream" in April.
Alternatively, you do (in a terminal / shell)
ulimit -n 2048
and then start R from that terminal. Tomas Kalibera mentioned this to work on macOS.
I had this issue with the simpleSingleCell library in bioconductor
On the macOS you can't exceed 256. So I set my .Renviron in my home dir
R_MAX_NUM_DLLS=150
It's easy
Go to the environment variable and edit
variable_name = R_MAX_NUM_DLL
value = 1000
Restart R
worked well for me

Running R code on linux in parallel on computing cluster

I've recently converted my windows R code to a Linux installation for running DEoptim on a function. On my windows system it all worked fine using:
ans <- DEoptim1(Calibrate,lower,upper,
DEoptim.control(trace=TRUE,parallelType=1,parVAr=parVarnames3,
packages=c("hydromad","maptools","compiler","tcltk","raster")))
where the function 'Calibrate' consisted of multiple functions. On the windows system I simply downloaded the various packages needed into the R library. The option paralleType=1 ran the code across a series of cores.
However, now I want to put this code onto a Linux based computing cluster - the function 'Calibrate' works fine when stand alone, as does DEoptim if I want to run the code on one core. However, when I specify the parelleType=1, the code fails and returns:
Error in checkForRemoteErrors(lapply(cl, recvResult)) :
7 nodes produced errors; first error: there is no package called ‘raster’
This error is reproduced whatever package I try and recall, even though the
library(raster)
command worked fine and 'raster' is clearly shown as okay when I call all the libraries using:
library()
So, my gut feeling is, is that even though all the packages and libraries are loaded okay, it is because I have used a personal library and the packages element of DEoptim.control is looking in a different space. An example of how the packages were installed is below:
install.packages("/home/antony/R/Pkges/raster_2.4-15.tar.gz",rpeo=NULL,target="source",lib="/home/antony/R/library")
I also set the lib paths option as below:
.libPaths('/home/antony/R/library')
Has anybody any idea of what I am doing wrong and how to set the 'packages' option in DEoptim control so I can run DEoptim across multiple cores in parallel?
Many thanks, Antony

R: Cannot allocate memory greater than x MB

I have a main function in R which calls other files to run my program. I call the main file through a bat file(.exe). When I run it line-by-line it runs without a memory error, but when I call the bat file to run it, it halts and gives me the following error:
Cannot allocate memory greater than 51 MB.
How can I avoid this?
Memory limitations in R such as this are a recurring nightmare for a lot of us.
Very often the problem is a limit imposed by your OS limits (which can usually be changed on a Bash or PowerShell command line), architecture (32 v. 64 bit), or the availability of contiguous free RAM, irregardless of overall available memory.
It's hard to say why something would not cause a memory issue when run line by line, but would hit the memory limit when run as a .bat.
What version of R are you running? Do you have both installed? Is 32-bit being called by Rscript when you run your .bat file whereas you run a 64-bit version line by line? You can check the version of R that's being run with R.Version().
You can test this by running the command memory.limit() in both your R IDE/terminal and in your .bat file (be sure to print or save the result as an object in your .bat file). You might also do well to try setting memory.limit() in your .bat file, as it may just have a smaller default, perhaps due to differences in your R Profile that's invoked in your IDE or terminal versus the .bat file.
If architecture isn't the cause of your memory error, then you have several more troubleshooting steps to try:
Check memory usage in both environments (in R directly and via your .bat process) using this:
sort( sapply(ls(),function(x){object.size(get(x))}))
Run the garbage collector explicitly in your scripts, that's the gc() command
Check all object sizes to make sure there are no unexpected results in your .bat process: sort( sapply(ls(),function(x){format(object.size(get(x)), units = "Mb")}))
Try memory profiling:
Rprof(tf <- "rprof.log", memory.profiling=TRUE)
Rprof(NULL)
summaryRprof(tf)
While this is a RAM issue, for good measure you might want to check that the compute power available is both sufficient and not varying between these two ways of running your code: parallel::detectCores()
Examine your performance with Prof. Hadley Wikham's lineprof tool (warning: requires devtools and doesn't work on lines of code which call the C programming language)
References While I'm pulling these snippets out of my own code, most of them originally came from other, related StackOverflow posts, such as:
Reaching memory allocation in R
R Memory Allocation "Error: cannot allocate vector of size 75.1 Mb"
R memory limit warning vs "unable to allocate..."
How to compute the size of the allocated memory for a general type
R : Any other solution to "cannot allocate vector size n mb" in R?
Yes you should be using 64bit R, if you can.
See this question, and this from the R docs.

How to crash R?

Is there a simple way to trigger a crash in R? This is for testing purposes only, to see how a certain program that uses R in the background reacts to a crash and help determine if some rare problems are due to crashes or not.
The easiest way is to call C-code. C provides a standard function abort()[1] that does what you want. You need to call: .Call("abort").
As #Phillip pointed out you may need to load libc via:
on Linux, dyn.load("/lib/x86_64-linux-gnu/libc.so.6") before issuing .Call("abort"). The path may of course vary depending on your system.
on OS X, dyn.load("/usr/lib/libc.dylib")
on Windows (I just tested it on XP as I could not get hold of a newer version.) you will need to install Rtools[2]. After that you should load dyn.load("C:/.../Rtools/bin/cygwin1.dll").
There is an entire package on GitHub dedicated to this:
crash
R package that purposely crash an R session. WARNING: intended
for test.
How to install a package from github is covered in other questions.
I'm going to steal an idea from #Spacedman, but I'm giving him full conceptual credit by copying from his Twitter feed:
Segfault #rstats in one easy step:
options(device=function(){});plot(1)
reported Danger, will crash your R session.
— Barry Rowlingson (#geospacedman) July 16, 2014
As mentioned in a comment to your question, the minimal approach is a simple call to the system function abort(). One way to do this in one line is to
R> Rcpp::cppFunction('int crashMe(int ignored) { ::abort(); }');
R> crashMe(123)
Aborted (core dumped)
$
or you can use the inline package:
R> library(inline)
R> crashMe <- cfunction(body="::abort();")
R> crashMe()
Aborted (core dumped)
$
You can of course also do this outside of Rcpp or inline, but then you need to deal with the system-dependent ways of compiling, linking and loading.
I'll do this in plain C because my C++-foo isn't Dirkian:
Create a C file, segv.c:
#include <signal.h>
void crashme(){raise(SIGSEGV);}
Compile it at the command line (windows users will have to work this out for themselves):
R CMD SHLIB segv.c
In R, load and run:
dyn.load("segv.so") # or possibly .dll for Windows users
.C("crashme")
Producing a segfault:
> .C("crashme")
*** caught segfault ***
address 0x1d9e, cause 'unknown'
Traceback:
1: .C("crashme")
Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection: 1
aborting ...
Segmentation fault
This is the same behaviour as the one Thomas references in the graphics system bug report which I have filed and might get fixed one day. However this two-liner will always raise a segfault...
Maybe Dirk can one-line-Rcpp-ise it?
If you want to crash your R, try this
lapply("", function(x) eval(sys.call(1)))
(Save everything before running because this immediately results in "R Session Aborted")
Edit: This works for me on Windows 10.

JVM crashes after one successful request when using JRI

I am using the JRI api to use "R" in Java. I have created a web-service which has the JRI code . When I consume this web-service for the first time it works properly, but with a subsequent request the JVM crashes and says : "The crash happened outside the Java Virtual Machine in native code."
#
# A fatal error has been detected by the Java Runtime Environment:
#
# Internal Error (0xc0000029), pid=9148, tid=9716
#
# JRE version: 6.0_26-b03
# Java VM: Java HotSpot(TM) Client VM (20.1-b02 mixed mode windows-x86 )
# Problematic frame:
# C [ntdll.dll+0x8e1b9]
#
# An error report file with more information is saved as:
# C:\Users\ambarish\.netbeans\dev\config\GF3\domain1\hs_err_pid9148.log
#
# If you would like to submit a bug report, please visit:
# http://java.sun.com/webapps/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
Is this somehow connected to the fact that R has no threading support, you can run only one instance of R within a multi-threaded application?
I am using Rengine to run R scripts in Java, I tried to stop/destroy the Rengine object but it did not work. How can I make sure that Rengine instance is garbage-collected before the second request.
Please let me know how can I solve this issue.
One can only create a single instance of Rengine using JRI. So use Rserve instead, which supports threading.

Resources