I have created a script that works end to end when I run it line by line on Ubuntu; however, when I run it with Rscript it fails with a memory allocation error.
I have tried removing excess variables using rm() followed by gc(), and I have tried running the code with R CMD BATCH, but nothing seems to work.
Does anyone know why memory allocation is different between running R and Rscript?
Sorry, I cannot really provide a reproducible example for this problem because the data are sensitive, but the packages I use are RJDBC with the setting options(java.parameters = "-Xmx8g") on a computer with 16 GB of RAM, and I also use the parallel package, which is where the error occurs.
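For context, a minimal sketch of the kind of setup described above; the driver class, jar path, connection string, and query are placeholders rather than details from the actual script:

options(java.parameters = "-Xmx8g")   # must be set before RJDBC/rJava starts the JVM
library(RJDBC)
library(parallel)
drv <- JDBC("com.example.Driver", "/path/to/driver.jar")          # placeholder driver
con <- dbConnect(drv, "jdbc:example://host/db", "user", "pass")   # placeholder connection
dat <- dbGetQuery(con, "SELECT ...")                              # query elided
res <- mclapply(1:4, function(i) summary(dat), mc.cores = 4)      # stand-in for the real parallel step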
Hi guys.
I am currently trying to build an R package (using Rcpp). Following this suggestion, I encountered an error that was fixed by putting the following line in src/Makevars and/or src/Makevars.win.
PKG_CXXFLAGS = -DRCPP_ARMADILLO_FIX_Field
However, while the error disappeared when I ran my function on Windows on my laptop, it appeared again on Linux. (I usually write the code on Windows on my laptop and run it on a Linux system provided by my university for parallel computing.) It seems that this line fixes the problem on Windows but not on Linux.
How can this happen, and how can I fix this?
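For reference, a minimal sketch of a complete src/Makevars (and src/Makevars.win); the PKG_LIBS line is the usual RcppArmadillo boilerplate and is an assumption, not something taken from the question:

PKG_CXXFLAGS = -DRCPP_ARMADILLO_FIX_Field
PKG_LIBS = $(LAPACK_LIBS) $(BLAS_LIBS) $(FLIBS)

On Linux it is worth watching the g++ lines printed during R CMD INSTALL to confirm that -DRCPP_ARMADILLO_FIX_Field actually appears in the compiler invocations.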
I want to run valgrind on the tests, examples and vignettes of my package. Various sources insinuate that the way to do this should be:
R CMD build my-pkg
R CMD check --use-valgrind my-pkg_0.0.tar.gz
R CMD check seems to run fine but shows no evidence of valgrind output, even after setting the environment variable VALGRIND_OPTS to --memcheck:leak-check=full. I've found sources hinting that R needs to run interactively for valgrind to show output, but R -d CMD check (or R -d "CMD check") seems to be the wrong format.
R -d "valgrind --tool=memcheck --leak-check=full" --vanilla < my-pkg.Rcheck/my-pkg-Ex.R does work, but only on the example files; I can't see a simple way to run this against my vignettes and testthat tests.
What is the best way to run all relevant scripts through valgrind? For what it's worth, the goal is to integrate this in a GitHub actions script.
Edit Mar 2022: The R CMD check case is actually simpler: running R CMD check --use-valgrind [other options you may want] will run the tests and examples under valgrind and then append the standard valgrind summary at the end of the examples output (i.e., pkg.Rcheck/pkg-Ex.Rout) and the test output (i.e., pkg.Rcheck/tinytest.Rout, as I use tinytest). What puzzles me now is that an error detected by valgrind does not seem to fail the test.
Original answer below the separator.
There is a bit more to this: it helps to ensure that the R build is instrumented for it. See Section 4.3.2 of Writing R Extensions:
On platforms where valgrind is installed you can build a version of R with extra instrumentation to help valgrind detect errors in the use of memory allocated from the R heap. The configure option is --with-valgrind-instrumentation=level, where level is 0, 1 or 2. Level 0 is the default and does not add anything. Level 1 will detect some uses of uninitialised memory and has little impact on speed (compared to level 0). Level 2 will detect many other memory-use bugs but make R much slower when running under valgrind. Using this in conjunction with gctorture can be even more effective (and even slower).
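As a rough sketch, building an instrumented R from an unpacked source tarball might look like this (level 2 chosen here; the flags beyond the instrumentation option are illustrative):

./configure --with-valgrind-instrumentation=2 --without-recommended-packages
make -j4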
So you probably want to build yourself a Docker-ized version of R to call from your GitHub Action. I think the excellent 'sumo' container by Winston has a valgrind build as well, so you could try that too. It's huge at over 4 GB:
edd@rob:~$ docker images | grep wch # some whitespace edited out
wch1/r-debug latest a88fabe8ec81 8 days ago 4.49GB
edd@rob:~$
And of course if you test dependent packages you have to get them into the valgrind session too...
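A rough sketch of wiring that container into a check, assuming (I have not verified the launcher name) that the image exposes an RDvalgrind build of R; the tarball name is a placeholder:

docker pull wch1/r-debug
docker run --rm -v "$PWD":/work -w /work wch1/r-debug \
  RDvalgrind CMD check --use-valgrind mypkg_0.1.tar.gz
# dependencies of mypkg would need to be installed inside the container first (not shown)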
Unfortunately, I found the learning curve involved in dockerizing R in GitHub Actions, per @dirk-eddelbuettel's suggestion, too steep.
I came up with the hacky approach of adding a file memcheck.R to the root of the package with the contents
devtools::load_all()
devtools::run_examples()
devtools::build_vignettes()
devtools::test()
(remembering to add memcheck.R to .Rbuildignore).
Running R -d "valgrind --tool=memcheck --leak-check=full" --vanilla < memcheck.R then seems to work, albeit reporting some issues that appear to be false positives, or at least issues that are not identified by CRAN's valgrind checks.
Here's an example of this in action in a GitHub actions script.
(Readers who know what they are doing are invited to suggest shortcomings of this approach in the comments!)
I have run into a strange problem that I cannot reproduce outside the context of my code. When in browser mode (i.e., stopped inside browser() while debugging), certain basic R commands just hang. For instance, class() and object.size() never return when run on objects that should be 1-2 MB in size. I'm running R 3.6.3 on Ubuntu 18.04 from the command line. Any thoughts on what this might mean? The code does NOT hang when not run in browser mode.
The bartMachine package for R is supposed to rely on parallel processing to reduce computing time, but I can't figure out how to make that work: the package's documentation repeatedly states that it supports parallel processing, yet there are no instructions on how to enable it, and I can see that only one of my PC's logical cores is working.
I use Ubuntu 16.04.4 and I tried installing bartMachine by compiling from source, as recommended on its GitHub page, though I'm not sure I did everything correctly.
What can I do to make bartMachine finally work in parallel?
Have you tried running set_bart_machine_num_cores(num_cores) in R before running bartMachine? This did the trick for me.
See https://rdrr.io/cran/bartMachine/man/set_bart_machine_num_cores.html
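For example, a minimal sketch (the heap size, core count, and the X/y training data are placeholders):

options(java.parameters = "-Xmx5g")   # raise the JVM heap before bartMachine/rJava loads
library(bartMachine)
set_bart_machine_num_cores(4)         # tell bartMachine how many cores to use
bm <- bartMachine(X, y)               # X (data.frame of predictors) and y assumed to exist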
I have read through this SO question and answers (R parallel computing and zombie processes) but it doesn't seem to quite address my situation.
I have a 4-core MacBook Pro running Mac OS X 10.10.3, R 3.2.0, and RStudio 0.99.441.
Yesterday, I was trying out the packages "foreach" and "doParallel" (I want to use them in a package I am working on). I did this:
library(doParallel)   # also attaches foreach, iterators, and parallel
cl <- makeCluster(14)
registerDoParallel(cl)
a <- 0
ls <- foreach(icount(100)) %dopar% {
  b <- a + 1
}
It is clear to me that it doesn't make sense to have 14 processes on my 4-core machine, but the software will actually be run on a 16-core machine. At this point my computer ground to a halt. I opened activity monitor and found 16 (or more, maybe?) R processes. I tried to force quit them from the activity monitor -- no luck. I closed RStudio and that killed all the R processes. I reopened RStudio and that restarted all the R processes. I restarted the computer and restarted RStudio and that restarted all the R processes.
How can I start RStudio without restarting all those processes?
EDIT: I forgot to mention that I also rebuilt the package I was working on at the time (all the processes may have been running during the build)
EDIT2: Also, I can't run stopCluster(cl) because cl is not in the environment anymore; I closed that R session.
EDIT3: When I open R.app (The R GUI provided with R) or open R in the terminal, no such problem occurs. So I think it must be RStudio-related.
EDIT4: There appears to be a random delay between opening RStudio and the starting of all these undesired processes. Between 15s and 2 mins.
EDIT5: It seems the processes only start after I open the project from which they were started.
EDIT6: I have been picking through the .Rproj.user files looking for things to delete. I deleted all the files (but not the directories) in ctx, pcs, and sdb. Problem persists.
EDIT7: When I run "killall R" at the command line it kills all these processes, but when I restart RStudio and reopen the project, all the processes start again.
EDIT8: I used "killall -s R | wc -l" to find that the number of R processes grows and grows while the project is open. It got up to 358 and then I ran "killall R" because my computer was making scary sounds.
EDIT9: RStudio is completely unusable currently. Every time I "killall R", it restarts all the processes within 15 seconds.
EDIT10: When I initiate a build, that also starts up tons of R processes -- 109 at last check. These processes all get started when the build says "preparing package for lazy loading". At this point the computer grinds to a near-halt.
EDIT11: I deleted the .Rproj file (actually just moved it as a backup) and the .Rproj.user directory. I used "create project from directory" in RStudio. When I open that NEW project, I still get the same behavior. What is RStudio doing when I open a project that isn't contained anywhere in the .Rproj file or the .Rproj.user directory!? I've spent the whole day on this one problem....:(
Best guess -- the newest version of RStudio tries to do some work behind the scenes to build an autocompletion database, based on the library() and require() calls it detects within open files in your project. To do this, it launches new R processes, loads those packages (with library()), and then returns the set of all objects made available by each package.
By any chance are you loading certain packages that have complex .onLoad() actions? It's possible that this engine in RStudio is running those in R processes behind the scenes, but getting stuck for some reason and leaving you with these (maybe stale or busy) R processes.
For reference, there was a somewhat similar issue reported here.
Here's what ended up fixing it:
1. Delete the package I built (the binary, I believe; I clicked the "x" to the right of its name in the "Packages" pane of RStudio).
2. Rebuild it, with library(parallel) commented out.
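For anyone who prefers doing those two steps from the console, a rough equivalent (the package name is a placeholder):

remove.packages("mypackage")   # same effect as clicking the "x" in the Packages pane
# comment out the library(parallel) call in the package code, then, from the package directory:
devtools::install()            # rebuild and reinstall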
unloadNamespace("doParallel")
will kill the unnamed workers started by registerDoParallel().
If you still have the cluster object, you can use:
stopCluster(cl)
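Putting the two together, a minimal cleanup sketch (assuming cl is still in scope from makeCluster()):

library(doParallel)
cl <- makeCluster(4)
registerDoParallel(cl)
# ... foreach(...) %dopar% work here ...
stopCluster(cl)                  # shut down the named workers
registerDoSEQ()                  # point foreach back at the sequential backend
unloadNamespace("doParallel")    # also stops any unnamed workers registerDoParallel() started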