Using RStudio with self-compiled R

How can I get RStudio to recognize my version of R, which I compiled myself (make install) and installed to
/opt/R/3.4.3/
and then symlinked with ln -s /opt/R/${R_VERSION}/bin/R /bin/R? When run from a shell, R works just fine. Only RStudio does not pick up the new path and is still looking at:
/usr/local/lib64/R/bin/exec/R
Exact error message:
Feb 3 14:50:18 devbox systemd: Starting RStudio Server...
Feb 3 14:50:18 devbox systemd: Started RStudio Server.
Feb 3 14:50:18 devbox rserver[22411]: ERROR R did not return any output when queried for directory location information; LOGGED FROM: bool rstudio::core::r_util::<unnamed>::detectRLocationsUsingR(const std::string&, rstudio::core::FilePath*, rstudio::core::FilePath*, rstudio::core::config_utils::Variables*, std::string*) /root/rstudio/src/cpp/core/r_util/REnvironmentPosix.cpp:483
Feb 3 14:50:18 devbox rserver[22411]: ERROR system error 71 (Protocol error) [description=Unable to parse version from R, version-info=, r-error=/usr/local/lib64/R/bin/exec/R: error while loading shared libraries: libmkl_gf_lp64.so: cannot open shared object file: No such file or directory|||]; OCCURRED AT: rstudio::core::Error rstudio::core::r_util::rVersion(const rstudio::core::FilePath&, const rstudio::core::FilePath&, const std::string&, std::string*) /root/rstudio/src/cpp/core/r_util/REnvironmentPosix.cpp:784; LOGGED FROM: bool rstudio::core::r_util::detectREnvironment(const rstudio::core::FilePath&, const rstudio::core::FilePath&, const std::string&, std::string*, std::string*, rstudio::core::r_util::EnvironmentVars*, std::string*) /root/rstudio/src/cpp/core/r_util/REnvironmentPosix.cpp:678
I realized (see answer below) that R only worked as long as I did not lose the current bash environment. Executing:
source /opt/intel/mkl/bin/mklvars.sh intel64
fixes this. However, I can't get RStudio to execute this before starting up. I played around with ExecStartPre=/opt/intel/mkl/bin/mklvars.sh intel64, but it fails to set up the environment correctly.

On Linux, RStudio Desktop and Open-Source Server use the version of R pointed to by the output of which R. If RStudio is unable to locate R using which R, it will fall back to scanning explicitly for the R script in the /usr/local/bin and /usr/bin directories.
If you want to override which version of R is used then you can set the RSTUDIO_WHICH_R environment variable to the R executable that you want to run against. For example:
export RSTUDIO_WHICH_R=/usr/local/bin/R
See RStudio Support: Using Different Versions of R

I need to manually load
source /opt/intel/mkl/bin/mklvars.sh intel64
into the environment for R to work. Otherwise the MKL shared libraries cannot be found and R won't start, which leads to RStudio complaining (with a not entirely helpful error message).
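A likely reason the ExecStartPre attempt has no effect: systemd runs each ExecStartPre command as its own process, so anything a script exports there is gone before the main service starts. The same applies to executing the script instead of sourcing it. Below is a minimal shell sketch of that difference. The temporary file and the variable DEMO_LIB_PATH are hypothetical stand-ins for mklvars.sh and the variables it sets.

```shell
#!/bin/sh
# Hypothetical stand-in for mklvars.sh: a script that only exports a variable.
unset DEMO_LIB_PATH
tmp=$(mktemp)
printf 'export DEMO_LIB_PATH=/opt/demo/lib\n' > "$tmp"

# Run it as a separate process (comparable to ExecStartPre):
# the export happens in the child and is lost when the child exits.
sh "$tmp"
after_exec="${DEMO_LIB_PATH:-unset}"

# Source it into the current shell: the export stays in this process
# and is inherited by anything started from here.
. "$tmp"
after_source="${DEMO_LIB_PATH:-unset}"

echo "after running as a child process: $after_exec"
echo "after sourcing:                   $after_source"
rm -f "$tmp"
```

For a systemd service, the usual ways to carry such variables into the main process are Environment= or EnvironmentFile= entries in the unit, or a wrapper script that sources the file and then starts the server. Which of these suits a given RStudio Server installation is an assumption to verify locally.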

Related

R: "internet routines cannot be loaded" when starting from RStudio

I am running Red Hat Enterprise Linux (RHEL) 8.5 with Linux kernel 4.18 and Gnome 3.32.2. In this system, I've got R 4.1.2 compiled with the tool asdf with shared libraries enabled. On top of that, I installed RStudio 2021.09.01-372 from an RPM from the official RStudio website.
When I start RStudio, the first line of output after the usual R startup is an error:
Error in tools::startDynamicHelp() : internet routines cannot be loaded
I am unable to figure out what's causing this error, and with it I can't do things like refresh CRAN or update packages. But if I start a pure R session from the terminal (instead of RStudio), this error does not occur.
Some things I tried:
Install the krb5 and libssh2 packages on my host system: Didn't help.
Starting a "pure" R session (both with and without the --vanilla argument) from the Terminal tab within RStudio also gives this error. If I try to run update.packages() from this session, it pops up a window to select a CRAN mirror and then fails with the following:
Warning: failed to download mirrors file (internet routines cannot be loaded); using local file '/home/[my username]/.asdf/installs/R/4.1.2/lib64/R/doc/CRAN_mirrors.csv'
Warning: unable to access index for repository https://cloud.r-project.org/src/contrib:
internet routines cannot be loaded
Warning message:
In download.file(url, destfile = f, quiet = TRUE) :
unable to load shared object '/home/penyuan/.asdf/installs/R/4.1.2/lib64/R/modules//internet.so':
/lib64/libssh.so.4: undefined symbol: EVP_KDF_ctrl, version OPENSSL_1_1_1b
But like I said, the strange thing is that if I start an R session outside of RStudio, these errors don't happen.
Within RStudio, the only workaround I can find is to run this command upon startup (suggested in this thread):
options(download.file.method="wget")
Once this is done, everything else seems to work, such as package updates.
However, I don't want to manually do this every time I start RStudio. So I tried to put it into ~/.Rprofile including a test print() as follows:
print("This is `~/.Rprofile`")
options(download.file.method="wget")
When I open RStudio, I can see the output from the print() call, but the options() command is not run because the original error shows up again. I still have to manually enter options(download.file.method="wget") every time.
I also tried to fold everything into a .First function in ~/.Rprofile as follows:
.First <- function() {
  options(download.file.method="wget")
  print("This is the `.First` function in `~/.Rprofile`")
}
Unfortunately, same result as before: print()'s output is seen, but options() is not run.
I also made sure that my ~/.Rprofile includes a trailing newline as discussed here. But this didn't help.
The above are the steps I've tried so far.
Why does this error only occur when running RStudio or a terminal within RStudio? Why doesn't it happen if I start R from a terminal outside of RStudio?
Is there a way to solve the problem so that the error doesn't happen in the first place? If it can't be solved, how do I set up my ~/.Rprofile so that options(download.file.method="wget") will be run?
Thank you.

mlflow R installation MLFLOW_PYTHON_BIN

I am trying to install mlflow in R and I'm getting this error message:
mlflow::install_mlflow()
Error in mlflow_conda_bin() :
Unable to find conda binary. Is Anaconda installed?
If you are not using conda, you can set the environment variable MLFLOW_PYTHON_BIN to the path of yourpython executable.
I have tried the following
export MLFLOW_PYTHON_BIN="/usr/bin/python"
source ~/.bashrc
echo $MLFLOW_PYTHON_BIN -> this prints the /usr/bin/python.
or in R,
Sys.setenv(MLFLOW_PYTHON_BIN="/usr/bin/python")
Sys.getenv() -> prints MLFLOW_PYTHON_BIN is set to /usr/bin/python.
However, it still does not work.
I do not want to use a conda environment.
How do I get past this error?
The install_mlflow command only works with conda right now, sorry about the confusing message. You can either:
install conda - this is the recommended way of installing and using mlflow
or
install mlflow python package yourself via pip
To install mlflow yourself, pip install the correct Python version of mlflow (matching the R package) and set the MLFLOW_PYTHON_BIN environment variable as well as the MLFLOW_BIN environment variable, e.g.:
library(mlflow)
system(paste("pip install -U mlflow==", mlflow:::mlflow_version(), sep=""))
Sys.setenv(MLFLOW_BIN=system("which mlflow"))
Sys.setenv(MLFLOW_PYTHON_BIN=system("which python"))
Just ran across this, and the accepted answer by @Tomas was very helpful. I added a comment above but, for some additional context, I wanted to write a more thorough response in case any other Enterprise Databricks R users run across this post while trying to use the MLflow package for R on Databricks.
The Databricks MLflow quickstart guide will tell you that you need to run the following:
library(mlflow)
install_mlflow()
However, for Enterprise Databricks users, the install_mlflow() function will fail if your cluster doesn't have outside connectivity privileges (as most probably don't) and can't connect to the Anaconda repo to download the necessary packages. You'll likely get an error like this:
CondaHTTPError: HTTP 000 CONNECTION FAILED for url https://conda.anaconda.org/conda-forge/linux-64/current_repodata.js
The good news is that MLflow should already be installed on your Databricks runtime. So you can reference that install instead, and then, as @Tomas mentioned, use it to set your R environment variables for MLFLOW_BIN and MLFLOW_PYTHON_BIN. From there, the R MLflow API works as specified (in my experience, but ymmv).
The only catch with the above solution is that when you use the system() function in R, you need to set intern=TRUE in order to capture the output of the command. The default behavior of the system() function is intern=FALSE. Thus, if you do not explicitly set intern=TRUE, the exit code 0 will be returned from your system() call (or perhaps another exit code upon an error) and Sys.setenv() will set the environment variable to 0!
### intern=True missing ###
Sys.setenv(MLFLOW_BIN=system("which mlflow"))
Sys.setenv(MLFLOW_PYTHON_BIN=system("which python"))
Example output (you can see that the environment variables did not get set correctly):
s <- Sys.getenv()
s[grep("MLFLOW", names(s))]
MLFLOW_BIN 0
MLFLOW_CONDA_HOME /databricks/conda
MLFLOW_PYTHON_BIN 0
MLFLOW_PYTHON_EXECUTABLE
/databricks/python/bin/python
MLFLOW_TRACKING_URI databricks
However, when intern=TRUE, you'll get the correct environment variables:
### intern=True set ###
Sys.setenv(MLFLOW_BIN=system("which mlflow", intern=TRUE))
Sys.setenv(MLFLOW_PYTHON_BIN=system("which python", intern=TRUE))
Example output:
s <- Sys.getenv()
s[grep("MLFLOW", names(s))]
MLFLOW_BIN /databricks/python3/bin/mlflow
MLFLOW_CONDA_HOME /databricks/conda
MLFLOW_PYTHON_BIN /databricks/python3/bin/python
MLFLOW_PYTHON_EXECUTABLE
/databricks/python/bin/python
MLFLOW_TRACKING_URI databricks
Note: This was using Databricks runtime 9.1 LTS ML. This may or may not work on other Databricks runtime configurations.
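The intern=TRUE point mirrors a distinction that already exists at the shell level, which is where system() hands the command: a command has an exit status and, separately, whatever it prints. The sketch below is only an illustration of that distinction. It looks up sh instead of mlflow or python so that it runs on any machine.

```shell
#!/bin/sh
# Exit status of the lookup: this is the kind of value that
# system("...") returns in R when intern is left at FALSE.
command -v sh > /dev/null
status=$?

# Printed output of the same lookup: this is what intern=TRUE captures.
path=$(command -v sh)

echo "exit status:     $status"
echo "captured output: $path"
```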

Information shows in browser, not "R Help" pane, when using StatET

I am using Eclipse 2018-09 and StatET 3.6.1. I have two R environments: one for R 3.3.2, another for R 3.5.1. When I execute a command like ?lm to call up a help page in v3.3.2, the page appears in the StatET "R Help" pane, as I want it to. But when I execute the same command to call up a help page in v3.5.1, the console tells me "starting httpd help server," and the help page loads in my browser. How can I get help to load in the "R Help" pane when using v3.5.1?
The problem arises whether or not I run Eclipse as an administrator, and whether or not I "Auto Run" R 3.5.1 upon booting. And as far as I can tell, the configurations for the two versions of R are nearly identical:
Both are running Windows 10 and JRE 1.8.0_121.
Both with version 2.1 of the rj package.
For both, I've checked every box in "Run Configurations > R Console > R Console > Eclipse Integration," including "Enable R Help for StatET."
For both, options("help_type") == 'html'.
Inspection of the different log files that I get when using Auto Run with the two versions of R suggests that there is some sort of StatET config problem that has to do with Derby. These messages appear only when I boot R 3.5.1:
!ENTRY de.walware.statet.r.core 4 -1 2018-11-16 02:50:29.092
!MESSAGE An error occurred when initializing DB for model.
!STACK 1
org.eclipse.core.runtime.CoreException: An error occurred when loading embedded DB (Derby + DBCP)
DB ConnectionURL=[path redacted]\eclipse-workspace\.metadata\.plugins\de.walware.statet.r.core\db
[...]
Caused by: java.sql.SQLException: Another instance of Derby may have already booted the database C:\[path redacted]\eclipse-workspace\.metadata\.plugins\de.walware.statet.r.core\db.
[...]
Caused by: ERROR XSDB6: Another instance of Derby may have already booted the database C:\[path redacted]\eclipse-workspace\.metadata\.plugins\de.walware.statet.r.core\db.
But I don't know what to make of this problem or how to fix it.
Stephan Wahlbrink, the creator of StatET, recommended running this command at startup:
registerS3method("print", "help_files_with_topic", rj::print.help_files_with_topic)
I ran the command, and it solved the problem.
(Stephan added that the Derby errors that I saw in the logs were unrelated to the problem of displaying help in the "R Help" pane.)

Sys.setenv("SNC_LIB" = lib_path_64) fails

I'm using the package RSAP to read SAP data.
RSAP loads an SNC (Secure Network Connection) dynamic library and locates it through the environment variable SNC_LIB.
Depending on the local user system, this might be a 32 or 64 bit library.
I'm setting the environment variable within my R script.
But RSAP still searches in the old path.
I am trying to avoid setting the environment variable outside my application because it's a Shiny app which should be used by many users.
It seems that the environment variable is changed only within the RStudio session but not outside.
Initial situation of the environment variables within RStudio console:
> Sys.getenv("SNC_LIB_64")
[1] "C:\\Program Files\\SAP\\FrontEnd\\SecureLogin\\lib\\sapcrypto.dll"
> Sys.getenv("SNC_LIB")
[1] "C:\\Program Files (x86)\\SAP\\FrontEnd\\SecureLogin\\lib\\sapcrypto.dll"
Coding:
# check SNC_LIB path from environment variables
# 32 or 64 bit?
# if 64 bit lib path is set, set the default lib path variable
# SNC_LIB to it
lib_path_64 <- Sys.getenv("SNC_LIB_64")
if (lib_path_64 != "") {
  Sys.setenv("SNC_LIB" = lib_path_64)
}
After the code is executed in RStudio debugger:
Browse[2]> Sys.getenv("SNC_LIB")
[1] "C:\\Program Files\\SAP\\FrontEnd\\SecureLogin\\lib\\sapcrypto.dll"
Error thrown by RSAP on loading the library:
[Thr 12160] Wed Jan 03 17:42:57 2018
[Thr 12160] *** ERROR => SncPDLInit()==SNCERR_INIT, Adapter #1 (C:\Program Files (x86)\SAP\FrontEnd\SecureLogin\lib\sapcrypto.dll) not loaded [sncxxdl.c 727]
Old path is used. When I change the path outside before running RStudio it's working.
Question:
Is there another way to set the library path variable SNC_LIB so that it is changed globally rather than only locally, and RSAP's dynamic loading works?
An easy way to reproduce this:
Start RStudio
Call Sys.setenv("TEST_VAR" = "good")
Call Sys.getenv("TEST_VAR")
See right result [1] "good"
Close RStudio
Start RStudio again
Call again Sys.getenv("TEST_VAR")
See 'wrong' unexpected result [1] ""
Environment variables set in R affect that process and the processes it runs; they don't persist when R quits.
It's not clear what steps you took to lead to your RSAP error, but your "easy code to reproduce" script is acting as expected.
The only way a Sys.setenv() in an R session will affect a subsequent library load is if that load is happening in the R session (e.g. loading an R package that loads the library) or in a process R launches (e.g. running a command using system()).
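The inheritance rule described here is not specific to R, and a small shell sketch can show it. In the sketch, the outer sh -c is a stand-in for the R session and the inner one for a process that R launches. TEST_VAR is the name from the question; everything else is illustrative.

```shell
#!/bin/sh
unset TEST_VAR

# The "session" sets and exports TEST_VAR, then launches a child.
# The child inherits the value.
seen_by_child=$(sh -c 'TEST_VAR=good; export TEST_VAR; sh -c "echo \$TEST_VAR"')

# After the "session" has exited, the parent still has no TEST_VAR:
# the change never travels upward or into later, unrelated processes.
seen_by_parent="${TEST_VAR:-unset}"

echo "child of the session sees: $seen_by_child"
echo "parent afterwards sees:    $seen_by_parent"
```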

rbundler build error: "cannot open file 'startup.Rs': No such file or directory"

I'm running into an issue when building the following package: https://github.com/yoni/rbundler
My test attempts to run rbundler's bundle command on a trivial package which has a single dependency. The test passes on my OSX machine, but fails on my x86_64-redhat-linux-gnu Jenkins server. Both machines are running R 2.15.1 with devtools 0.7.1, which includes this bug fix.
The full test output can be found in this gist.
Here's a short summary of the error I'm seeing:
Error in file(filename, "r", encoding = encoding) :
cannot open the connection
Calls: local ... eval.parent -> eval -> eval -> eval -> eval -> source -> file
In addition: Warning message:
In file(filename, "r", encoding = encoding) :
cannot open file 'startup.Rs': No such file or directory
Execution halted
The background for this is that I'm trying to build a dependency management system for R. The idea is that an R project should be able to run without using system-wide or user-wide libraries. Rather, the R project will have its own library installed under its root directory.
For my previous Stack Overflow question related to Dependency Management in R, see Dependency management in R
In my case, this issue was caused by the environment variable R_TESTS, which was set to startup.Rs.
When you execute another R process from within your tests (in my case it was submitted via OGS qsub), the presence of this environment variable causes issues.
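A minimal shell sketch of that situation, assuming the harness has exported R_TESTS as described: a child process inherits the variable unless it is removed for that child. The sh -c calls are stand-ins for the nested R process, and whether clearing the variable is the right fix for a particular test setup is an assumption to check.

```shell
#!/bin/sh
# Simulate the test harness environment described above.
export R_TESTS=startup.Rs

# A child process inherits the variable as-is.
inherited=$(sh -c 'echo "${R_TESTS:-unset}"')

# Unsetting it inside a subshell hides it from that child only;
# the surrounding environment keeps its value.
cleared=$( (unset R_TESTS; sh -c 'echo "${R_TESTS:-unset}"') )

echo "child, variable inherited: $inherited"
echo "child, variable cleared:   $cleared"
echo "parent still has:          $R_TESTS"
```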
I can't answer your question directly, but here are two things you can try to get more information about what is happening:
use 'env' to dump environment variables on your OSX machine and the Jenkins host
run the process through strace on Linux and dtruss on OSX to trap the system calls
strace/dtruss should reveal the places in which it is searching for startup.Rs, and the env output will likely show an environment variable that differs between the two systems and accounts for the different outcome.
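The env comparison suggested above could be sketched as follows. On real hosts, each dump would be taken on its own machine and the files copied to one place. Here the second dump is simulated locally by adding one variable, which is purely an illustration.

```shell
#!/bin/sh
unset R_TESTS
dump_a=$(mktemp)
dump_b=$(mktemp)

# Dump and sort the environment; sorting makes the two files comparable.
env | sort > "$dump_a"

# Stand-in for the dump from the other host: same environment plus one variable.
R_TESTS=startup.Rs env | sort > "$dump_b"

# comm -13 prints the lines that appear only in the second file.
only_in_b=$(comm -13 "$dump_a" "$dump_b")
echo "$only_in_b"

rm -f "$dump_a" "$dump_b"
```

Running comm -23 on the same two files would list the variables present only in the first dump.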

Resources