multiple Rstudio sessions following the use of the parallel package - r

I've recently run into an issue when using RStudio Server: multiple sessions are spawned instead of a single one. In my case (see below) five sessions are created instead of one. This happens even after trying the usual fixes: deleting ~/.rstudio, clearing .GlobalEnv, and restarting R. Note that there is no spawning issue when using the R command prompt.
I believe the problem is caused by a prematurely terminated mclapply call. Here is the relevant passage from the parallel package documentation (discovered after the fact):
It is strongly discouraged to use these functions in GUI or embedded environments, because it leads to several processes sharing the same GUI which will likely cause chaos (and possibly crashes). Child processes should never use on-screen graphics devices.
At least one other person has had the same error as me but there is no documented solution that I can find. As the warning has already been ignored, I would appreciate any pointers that can help me get untangled.
Edit:
I am still encountering the error but was able to catch the ephemeral script sourcing issue that I believe is causing this problem. Unfortunately, I don't know what other files are being sourced and therefore what settings need to be changed. Grrrrr.....

Related

permissions for installing packages on julia in slurm cluster

I've just installed Julia for use on a Slurm cluster. Running a hello world job works well, so the installation was successful ... until I tried installing a first package, which runs into permissions issues. A script with the command
Pkg.add("MAT")
or
Pkg.installed()
gives the error message
ERROR: LoadError: SystemError (with /home/<my_user_name>/.julia/logs): mkdir: Permission denied
The same error appears if I start the Julia command line from the user directory. The message disappears when starting Julia with sudo, but obviously I cannot sudo cluster jobs.
I tried installing the package with sudo on the user account and then using it without sudo, but other error messages arise, similar to those documented here.
https://github.com/JuliaLang/julia/issues/12876
That page suggests running chown user MAT.ji, but that does not work. I tried removing and re-adding the package, but I'm just running in circles with the same error messages. At one point I also got error messages like EACCES, similar to those documented here
https://discourse.julialang.org/t/juliapro-pkg-installation-error-ioerror-unlink-permission-denied-eacces/35912
I'm a novice with permissions issues like this, so I could use some guidance on how to approach this problem. I'm not sure what to try, and in what order.
Permissions issues on clusters can be tricky.
If we are talking about a physical cluster, the simplest generic solution that you can probably get to work without involving your sysadmin is probably to just install your .julia somewhere where just about every process has filesystem permissions. Namely, global networked scratch (wherever that exactly is on your cluster).
This is arguably a good idea anyways, given that global scratch tends to be the fastest or one of the fastest networked filesystems around on most clusters, and every julia process is going to have to read from .julia when you start a job, so if that's on a fast parallel filesystem, so much the better. On the other hand, scratch tends to have a time limit, so you might want to keep a local copy around for when scratch/<yourusername>/.julia inevitably gets deleted.
For this to work well, you have to tell Julia where to look for .julia, so that it does not just make a new one when it doesn't find it in the default location (~). One relatively simple way to do this is with environment variables. You could set them manually, but I recommend instead putting something like the following in your ~/.bash_profile.
# Some julia-specific environment variables
# To make sure I call the Julia installation in global scratch
export JULIA_DEPOT_PATH="/path/to/your/global/scratch/username/.julia"
export JULIA_PROJECT="/path/to/your/global/scratch/username/.julia/environments/v1.6/Project.toml"
export JULIA_LOAD_PATH="/path/to/your/global/scratch/username/.julia/environments/v1.6/Project.toml:/path/to/your/julia/application/folder/possibly/also/in/scratch/julia-1.6.1/share/julia/stdlib/v1.6" # The second one may be tricky to find if you're using a cluster-provided julia, but you can always just download the latest julia release yourself and plop it in scratch too
export JULIA_BINDIR="/path/to/your/julia/application/folder/possibly/also/in/scratch/julia-1.6.1/bin" # This last line may not be necessary, but doesn't hurt to specify.
where julia versions and the actual path to your global scratch folder are adjusted as appropriate.
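Since the original error was a failed mkdir inside the depot, it can help to confirm that the chosen location is actually writable before pointing JULIA_DEPOT_PATH at it. The following is only a minimal sketch: the helper name first_writable_dir is hypothetical (not part of Julia or Slurm), and the paths in the usage comment are placeholders to adjust for your cluster.

```shell
# Hypothetical helper: print the first candidate directory that exists
# (or can be created) and is writable by the current user.
first_writable_dir() {
  for dir in "$@"; do
    if mkdir -p "$dir" 2>/dev/null && [ -w "$dir" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
  done
  # No candidate was usable
  return 1
}

# Possible usage in ~/.bash_profile (placeholder paths, adjust as needed):
#   if depot=$(first_writable_dir "/path/to/your/global/scratch/username/.julia"); then
#     export JULIA_DEPOT_PATH="$depot"
#   fi
```

If the function returns a non-zero status, none of the candidates is usable, and the job will most likely hit the same "Permission denied" error as before.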

When should I restart R session, GUI or computer?

I use R, RStudio and Rcpp, and I spent over a week debugging some code that was giving errors and warnings in unexpected places, in some cases with sample code taken directly from online sources or package documentation.
I often restart the R session or Rstudio if there are obvious problems and they usually go away.
But this morning it was really bad, to the point where basic R commands would fail and restarting R did nothing. I closed all the RStudio sessions and restarted the machine for good measure (which was unnecessary).
When it came back and I re-loaded the sessions everything seems to be working.
Even some Rcpp code I had been working on for weeks with outside packages will now compile and run, where it gave gibberish errors before.
I have known for a while that R needs to be restarted once in a while, but I only notice it when basic functions stop running. How can I know earlier?
I am looking for a good general resource or function that can tell me I need to restart because something is not running right. It would be nice if I could also know what to restart:
the R session, the GUI such as RStudio, all sessions and GUIs, or the whole machine.
For as long as I have been dabbling with or actually using R (ie more than two decades), it has always been recommended to start a clean and fresh session.
Which is why I prefer to work on the command line for tests. When you invoke R, or Rscript, or, in my case, r (from littler) you know you get a fresh session free of possible side-effects. By keeping these tests to the command line, my main sessions (often multiple instances inside Emacs via ESS, possibly multiple RStudio sessions too) are less affected.
Even RStudio defaults to 'install and restart' when you rebuild a package.
(I will note that a certain development package implies you could cleanly unload a package. That has been debated at length, and I think by now even its authors qualify that claim. I don't really know or care as I don't use it, having had established workflows before it appeared.)
And to add: You almost never need to restart the computer. But a fresh clean process is something to use often. Your computer can create millions of those for you.
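The "fresh process per test" habit described above can be wrapped in a small shell function. This is a sketch under assumptions, not the answerer's own setup: run_fresh and the RUNNER override are hypothetical names, and the default runner is Rscript with its --vanilla option so that no saved workspace or profile files are loaded.

```shell
# Hypothetical helper: run each script in its own fresh process and report
# which ones fail. RUNNER is an assumed override; it defaults to Rscript --vanilla.
run_fresh() {
  runner="${RUNNER:-Rscript --vanilla}"
  status=0
  for script in "$@"; do
    # $runner is left unquoted on purpose so "Rscript --vanilla" splits into words
    if $runner "$script"; then
      echo "ok: $script"
    else
      echo "FAILED: $script"
      status=1
    fi
  done
  return "$status"
}

# Possible usage (file names are placeholders):
#   run_fresh tests/test_a.R tests/test_b.R
```

Because every script gets a new process, a failure here cannot be blamed on leftover state in a long-running interactive session.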

RStudio Project stalls even without running anything

One of my RStudio projects stalls even before I've run any code or loaded any packages or data. I can edit scripts but it won't save them and it won't run code in the console. I am still able to use R from the terminal. After some time (in the range of an hour or so), multiple dialog boxes will pop up with the message Unable to establish connection with R session. I've seen a similar thing before when loading big datasets or running something computationally intensive but never before I've even run any code or loaded any data.
My other Projects don't seem to have the same problem.
I've also filed an issue here on the RStudio github with some screenshots and log files in case that's helpful.
The solution was to update the data.table package. See the issue I filed or this thread on RStudio community

How to switch R architectures dynamically in RStudio

In RStudio there's a Tools menu which allows you to select an installed version/architecture of R under Global Options.
That's great, but my issue with that is that, as the name implies, it is a Global option, so once you select a different architecture (or version number) you then have to restart RStudio and it applies to all of your RStudio instances and projects.
This is a problem for me because:
I have some scripts within a given project that strictly require 32-bit R due to the fact that they're interfacing with 32-bit databases, such as Hortonworks' Hadoop
I have other scripts within the same project which strictly require 64-bit R, due to (a) availability of certain packages and (b) memory limits being prohibitively small in 32-bit R on my OS
which we can call "Issue #1" and it's also a problem because I have certain projects which require a specific architecture, though all the scripts within the project use the same architecture (which should theoretically be an easier to solve problem that we can call "Issue #2").
If we can solve Issue #1 then Issue #2 is solved as well. If we can solve Issue #2 I'll still be better off, even if Issue #1 is unsolved.
I'm basically asking if anyone has a hack, work-around, or better workflow to address this need for frequently switching architectures and/or needing to run different architectures in different R/RStudio sessions simultaneously for different projects on a regular basis.
I know that this functionality would probably represent a feature request for RStudio and if this question is not appropriate for StackOverflow for that reason then let me know and I'll delete it. I just figured that a lot of other people probably have this issue, so maybe someone has found a work-around/hack?
There's no simple way to do this, but there are some workarounds. One you might consider is launching the correct bit-flavor of R from the current bit-flavor of R via system2 invoking Rscript.exe, e.g. (untested code):
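source32 <- function(file) {
  # Run the script with the 32-bit Rscript; quote the script path in case it contains spaces
  system2("C:\\Program Files\\R\\R-3.1.0\\bin\\i386\\Rscript.exe", shQuote(normalizePath(file)))
}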
...
# Run a 64 bit script
source("my64.R")
# Run a 32 bit script
source32("my32.R")
Of course that doesn't really give you a 32 bit interactive session so much as the ability to run code as 32 bit.
One other tip: If you hold down CTRL while launching RStudio, you can pick the R flavor and bitness to launch on startup. This will save you some time if you're switching a lot.

Refresh R console without quitting the session?

I usually keep the R console open all day, but sometimes I need to clear my history and my workspace so that I can test functions or load new data.
I'm wondering whether there is an easier way, such as a command in .Rprofile, to refresh the R console without quitting or restarting my current session.
What I have usually done is to q() without saving, start R again, and clear the history. I think somebody here might be able to give me some better suggestions.
Thanks in advance.
As for the history, on Unix-like systems (mine is Debian) this command refreshes it:
loadhistory("")
However, as said in comments, loadhistory seems to be platform-dependent.
Check your ?loadhistory if present on your platform. Mine says:
There are several history mechanisms available for the different
R consoles, which work in similar but not identical ways. There
are separate versions of this help file for Unix and Windows.
The functions described here work on Unix-alikes under the
readline command-line interface but may not otherwise (for
example, in batch use or in an embedded application)
