Given the parallel package's warning against using mclapply() in GUI environments, I've been moving away from RStudio for scripts that call that function. I think I've observed a performance improvement, though I have no way to test for it.
I realize that knitting markdown documents with parallel processes works in RStudio much of the time, as does running mclapply() there. But can I expect better performance if I knit from the Terminal instead of from RStudio? Or might RStudio's calls to knit() not actually fork the GUI? If so, could calling source() from RStudio's Console be safe as well?
Unfortunately, I don't know how to reproduce the problems that (sometimes?) occur when forking through a GUI, so I haven't been able to run any tests myself. So perhaps a better question is, can anyone think of a systematic method of testing for which types of function calls will result in these problems?
For reference: https://stat.ethz.ch/R-manual/R-devel/library/parallel/doc/parallel.pdf. (See point 2 in the introduction.)
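For context, the GUI-safe alternative the parallel docs point toward is a PSOCK cluster, which launches fresh worker processes rather than forking the running (GUI-attached) one. A minimal sketch of the two approaches side by side:

```r
library(parallel)

# Fork-based call: what the question uses. The parallel docs warn against
# this in GUI/embedded environments (RStudio included), because forked
# children share the GUI process's state.
# res <- mclapply(1:4, function(i) i^2, mc.cores = 2)

# GUI-safe alternative: a PSOCK cluster starts fresh worker processes
# instead of forking the current one, at the cost of some startup overhead.
cl <- makeCluster(2)
res <- parLapply(cl, 1:4, function(i) i^2)
stopCluster(cl)
```

Since PSOCK workers never share the GUI's state, this sidesteps the warning entirely, whichever front end launches the script.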
Related
I use R, RStudio, and Rcpp, and I spent over a week debugging code that was giving errors and warnings in unexpected places, in some cases with sample code taken straight from online sources or package documentation.
I often restart the R session or RStudio when there are obvious problems, and they usually go away.
But this morning it was really bad, to the point where basic R commands would fail and restarting R did nothing. I closed all the RStudio sessions and restarted the machine for good measure (which turned out to be unnecessary).
When it came back and I re-loaded the sessions everything seems to be working.
Even some Rcpp code I had been working on for weeks, which uses outside packages, will now compile and run where it previously gave gibberish errors.
I have known for a while that R needs to be restarted once in a while, but I only find out when basic functions stop running. How can I know earlier?
I am looking for a good general resource, or a function, that can tell me I need to restart because something is not running right. It would also be nice to know what to restart: the R session, the GUI such as RStudio, all sessions and GUIs, or the full machine.
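As a starting point, a crude "canary" function could exercise a few layers at once. Everything below is my own sketch: the three probes are guesses at a minimal baseline (interpreter, filesystem, subprocess spawning), not an established diagnostic, and the system2("true") probe assumes a Unix-alike:

```r
# Hypothetical sanity check: returns TRUE when all probes pass, and
# names the failing probes otherwise. If arithmetic or tempfile I/O
# fails, restart R; if spawning a child process fails, suspect the
# GUI or the machine rather than the session alone.
session_looks_healthy <- function() {
  checks <- c(
    arithmetic = isTRUE(all.equal(1 + 1, 2)),
    tempfiles = {
      f <- tempfile()
      writeLines("ok", f)
      out <- readLines(f)
      file.remove(f)
      identical(out, "ok")
    },
    subprocess = identical(tryCatch(system2("true"), error = function(e) -1L), 0L)
  )
  if (!all(checks)) {
    message("Failing checks: ", paste(names(checks)[!checks], collapse = ", "))
  }
  all(checks)
}
```

Running it at the top of a script, or whenever results look suspicious, at least turns "basic functions don't run" into an early, explicit signal.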
For as long as I have been dabbling with or actually using R (ie more than two decades), it has always been recommended to start a clean and fresh session.
Which is why I prefer to work on command-line for tests. When you invoke R, or Rscript, or, in my case, r (from littler) you know you get a fresh session free of possible side-effects. By keeping these tests to the command-line, my main sessions (often multiple instances inside Emacs via ESS, possibly multiple RStudio sessions too) are less affected.
Even RStudio defaults to 'install and restart' when you rebuild a package.
(I will note that a certain development package implies you could cleanly unload a package. That claim has been debated at length, and I think by now even its authors qualify it. I don't really know or care, as I don't use it, having established my workflows before it appeared.)
And to add: you almost never need to restart the computer. But a fresh, clean process is something to use often. Your computer can create millions of those for you.
I've recently run into an issue when using RStudio Server where multiple sessions are spawned instead of a single session. In my case (see below) five sessions are created instead of one. This happens even after trying the normal solutions: deleting ~/.rstudio, clearing .GlobalEnv, and restarting R. Note that there is no spawning issue when using the R command prompt.
I believe the source of this problem is a prematurely terminated mclapply() call. Here are the relevant docs from the parallel package (discovered after the fact):
It is strongly discouraged to use these functions in GUI or embedded environments, because it leads to several processes sharing the same GUI which will likely cause chaos (and possibly crashes). Child processes should never use on-screen graphics devices.
At least one other person has had the same error as me, but there is no documented solution that I can find. As the warning has already been ignored, I would appreciate any pointers that can help me get untangled.
Edit:
I am still encountering the error but was able to catch the ephemeral script sourcing issue that I believe is causing this problem. Unfortunately, I don't know what other files are being sourced and therefore what settings need to be changed. Grrrrr.....
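Not a confirmed fix, but one thing worth ruling out while hunting: mclapply() has a documented mc.cleanup argument (TRUE by default) that sends SIGTERM to any children still alive when the call returns. A sketch of making that explicit, and of catching a console interrupt so the call never ends half-finished silently:

```r
library(parallel)

# Sketch only: run the job with explicit child cleanup, and surface
# interrupts instead of letting them orphan forked workers quietly.
run_with_cleanup <- function(X, FUN, cores = 2L) {
  tryCatch(
    mclapply(X, FUN, mc.cores = cores, mc.cleanup = TRUE),
    interrupt = function(cond) {
      message("mclapply() interrupted; check for orphaned R workers with `ps`")
      NULL
    }
  )
}

res <- run_with_cleanup(1:4, function(i) i * 10L)
```

If stray forked workers are what RStudio Server is reattaching to as extra sessions, ensuring every mclapply() exits cleanly is at least a way to test that hypothesis.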
Can anybody give me some direction on editing the source code of an R package? From what I've seen, changing the package from within R does not seem possible. When editing outside of R, I'm stuck after unpacking the tar.gz: while I can now modify the function to my heart's content, the unpacked folder looks nothing like the installed snow library. I presume I will need to turn the contents back into a tar.gz and install it in the standard way?
My colleagues and I have been trying to get makeSOCKcluster() to work with remote IPs for the past three days; it hangs indefinitely. After digging into the snow package, I've found the issue to be in the way newSOCKnode() calls socketConnection(). If I run makeSOCKcluster("IP", manual=T) and paste the output into PowerShell, the connection is made but the program does not complete. However, I can run makeSOCKcluster("IP", manual=T) in one R instance and then run system("output", wait=F, input="") in another instance, which results in the program completing. I believe I can simply modify snow to do this automatically.
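On the packaging part of the question: the installed library folder looks nothing like the unpacked tarball because installation produces a processed (lazy-loaded, possibly byte-compiled) copy; you always edit the source tree and reinstall. The usual round-trip is just the following (the version number is illustrative; use whatever your tarball says):

```
# Unpack the source tarball (creates a snow/ source directory)
tar -xzf snow_0.4-4.tar.gz

# ... edit the files under snow/R/ as needed ...

# Rebuild a tarball from the edited sources and install it
R CMD build snow
R CMD INSTALL snow_*.tar.gz
```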
I've inherited a sweave file from a different author. I'd like to pause it after it finishes running the R code to interrogate the variables and see the objects in the console before it goes to PDF generation.
Is there a way to do this conveniently in RStudio? Or even in Emacs if I must?
Thanks!
For debugging or checking Sweave documents, run the file through Stangle, e.g.
Stangle("a.rnw")
This produces a pure R file, which you can debug separately. If the tangled file runs OK but the Sweave'd document does not, the failure is almost always due to some \Sexpr{} expression. These are difficult to locate, and the error messages can be highly confusing.
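A self-contained version of that loop, with a throwaway document standing in for the inherited a.rnw:

```r
# Write a minimal Sweave document (a stand-in for the real a.rnw)
writeLines(c(
  "\\documentclass{article}",
  "\\begin{document}",
  "<<setup>>=",
  "x <- 1:10",
  "m <- mean(x)",
  "@",
  "\\end{document}"
), "a.rnw")

Stangle("a.rnw")            # extracts the code chunks into a.R
source("a.R", echo = TRUE)  # runs them in *this* session ...
m                           # ... so every object is left behind to inspect
```

Because source() runs the chunks in your own workspace, you can pause anywhere with browser(), or simply poke at the variables afterwards, before ever going near PDF generation.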
R CMD check takes a significant amount of time to complete on one of my packages because there are many examples/tests to run. Perhaps there's a way to run them in parallel?
I stumbled upon this post which seems to have a solution for R CMD install on linux (I can't see how it would work on Windows):
http://r.789695.n4.nabble.com/parallel-build-for-package-equivalent-of-make-j8-td921920.html
Is there a solution for parallel R CMD check on Windows?
It's a hack, but you could take the tests out of the tests directory and put them somewhere they won't get run automatically (e.g. inst/tests), then use your own, parallelizable framework to run them (e.g. make run in parallel: http://dannythorpe.com/2008/03/06/parallel-make-in-win32/ may be relevant). This won't help for the examples, though.
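A sketch of that "own framework" idea in R itself, under the assumptions that the test files have been moved to inst/tests, that each is independently sourceable, and that each signals failure by raising an error. It uses a PSOCK cluster, so unlike a forking approach it works on Windows too:

```r
library(parallel)

# Homemade parallel test runner (sketch). Each file is sourced into its
# own environment on a worker; the result per file is either "ok" or the
# error message it raised.
run_tests_parallel <- function(dir = "inst/tests", cores = 2L) {
  files <- list.files(dir, pattern = "\\.R$", full.names = TRUE)
  cl <- makeCluster(cores)
  on.exit(stopCluster(cl))
  results <- parLapply(cl, files, function(f) {
    tryCatch({
      source(f, local = new.env())
      "ok"
    }, error = function(e) conditionMessage(e))
  })
  setNames(unlist(results), basename(files))
}
```

As noted above, this only covers the tests; the examples still run serially inside R CMD check itself.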
A completely different approach would be to use the cacheSweave package, which caches the unchanging parts of your code from run to run. If you are tweaking some code but most of it is unchanged, this could save a lot of time. If plots are what's slowing things down, however, cacheSweave won't help much (as explained in its vignette).