mlflow R installation MLFLOW_PYTHON_BIN - r

I am trying to install mlflow in R and I'm getting this error message:
mlflow::install_mlflow()
Error in mlflow_conda_bin() :
Unable to find conda binary. Is Anaconda installed?
If you are not using conda, you can set the environment variable MLFLOW_PYTHON_BIN to the path of yourpython executable.
I have tried the following
export MLFLOW_PYTHON_BIN="/usr/bin/python"
source ~/.bashrc
echo $MLFLOW_PYTHON_BIN -> this prints the /usr/bin/python.
or in R,
Sys.setenv(MLFLOW_PYTHON_BIN="/usr/bin/python")
Sys.getenv() -> prints MLFLOW_PYTHON_BIN is set to /usr/bin/python.
However, it still does not work.
I do not want to use a conda environment.
How do I get past this error?

The install_mlflow command only works with conda right now, sorry about the confusing message. You can either:
install conda - this is the recommended way of installing and using mlflow
or
install mlflow python package yourself via pip
To install mlflow yourself, pip install the correct Python version of mlflow (the one matching the R package) and set the MLFLOW_PYTHON_BIN environment variable as well as the MLFLOW_BIN environment variable, e.g.:
library(mlflow)
system(paste("pip install -U mlflow==", mlflow:::mlflow_version(), sep=""))
Sys.setenv(MLFLOW_BIN=system("which mlflow"))
Sys.setenv(MLFLOW_PYTHON_BIN=system("which python"))

Just ran across this, and the accepted answer by @Tomas was very helpful. I added a comment above but, for some additional context, I wanted to write a more thorough response in case any other Enterprise Databricks R users run across this post while trying to use the MLflow package for R on Databricks.
The Databricks MLflow quickstart guide will tell you that you need to run the following:
library(mlflow)
install_mlflow()
However, for Enterprise Databricks users, the install_mlflow() function will fail if your cluster doesn't have outside connectivity privileges (as most probably don't) and can't connect to the Anaconda repo to download the necessary packages. You'll likely get an error like this:
CondaHTTPError: HTTP 000 CONNECTION FAILED for url https://conda.anaconda.org/conda-forge/linux-64/current_repodata.js
The good news is that MLflow should already be installed on your Databricks runtime. So you can reference that install instead and then, as @Tomas mentioned, use it to set your R environment variables for MLFLOW_BIN and MLFLOW_PYTHON_BIN. From there, the R MLflow API works as specified (in my experience, but ymmv).
The only catch with the above solution is that when you use the system() function in R, you need to set intern=TRUE in order to capture the output of the command. The default behavior of the system() function is intern=FALSE. Thus, if you do not explicitly set intern=TRUE, the system() call returns its exit code (0 on success, or another code upon an error) and Sys.setenv() will set the environment variable to 0!
### intern=TRUE missing ###
Sys.setenv(MLFLOW_BIN=system("which mlflow"))
Sys.setenv(MLFLOW_PYTHON_BIN=system("which python"))
Example output (you can see that the environment variables did not get set correctly):
s <- Sys.getenv()
s[grep("MLFLOW", names(s))]
MLFLOW_BIN                0
MLFLOW_CONDA_HOME         /databricks/conda
MLFLOW_PYTHON_BIN         0
MLFLOW_PYTHON_EXECUTABLE  /databricks/python/bin/python
MLFLOW_TRACKING_URI       databricks
However, when intern=TRUE, you'll get the correct environment variables:
### intern=TRUE set ###
Sys.setenv(MLFLOW_BIN=system("which mlflow", intern=TRUE))
Sys.setenv(MLFLOW_PYTHON_BIN=system("which python", intern=TRUE))
Example output:
s <- Sys.getenv()
s[grep("MLFLOW", names(s))]
MLFLOW_BIN                /databricks/python3/bin/mlflow
MLFLOW_CONDA_HOME         /databricks/conda
MLFLOW_PYTHON_BIN         /databricks/python3/bin/python
MLFLOW_PYTHON_EXECUTABLE  /databricks/python/bin/python
MLFLOW_TRACKING_URI       databricks
Note: This was using Databricks runtime 9.1 LTS ML. This may or may not work on other Databricks runtime configurations.
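For anyone who would rather not shell out at all, here is a minimal sketch of the same idea using base R's Sys.which(). It returns the path as a string (or an empty string if the program is not on the PATH), so there is no intern argument to forget. The helper name set_env_from_path is made up for illustration, and I am assuming mlflow and python are on the PATH of the R process, as they were in the output above.

```r
# Hypothetical helper (not part of the mlflow package): look up `program`
# on the PATH and store its full path in the environment variable `var`.
set_env_from_path <- function(var, program) {
  path <- unname(Sys.which(program))
  if (!nzchar(path)) {
    stop("could not find '", program, "' on the PATH")
  }
  # Sys.setenv() takes name = value pairs, so build the call from a named list.
  do.call(Sys.setenv, stats::setNames(list(path), var))
  invisible(path)
}

# Usage (assumes both programs are on the PATH of the R process):
# set_env_from_path("MLFLOW_BIN", "mlflow")
# set_env_from_path("MLFLOW_PYTHON_BIN", "python")
```

Failing loudly when the program is missing seemed safer to me than silently setting the variable to an empty string, but that is a design choice, not something the mlflow package requires.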

Related

responding yes to terminal prompt via system2() in R

tl;dr: How can I invoke the system command y | conda create --name gee_interface from an R console, e.g. via system2()? I'm comfortable enough with system2('conda', c('create', '--name', 'gee_interface')), but I don't know how to handle piping in the 'y' via system2().
Details
I am trying to use an R console to run the bash command conda create --name gee_interface (OSX Mojave with Anaconda installed).
In terminal, that command executes just fine, but prompts me to answer with Proceed ([y]/n)? (I answer 'y' and everything works smoothly).
In R, I run
Sys.setenv(PATH = paste(c("/Applications/anaconda3/bin", Sys.getenv("PATH")), collapse = .Platform$path.sep)) # ensures that system2() finds conda
system2('conda', c('create', '--name', 'gee_interface')) # This is the key line for the purposes of this question
When running the second line [i.e. system2('conda', c('create', '--name', 'gee_interface'))], the process never finishes, but quickly falls to zero CPU usage. Presumably the system is waiting for my response to the prompt, but I don't know how to provide it. How does one do this via an R script? Note also that in my particular case, the number of times that I need to respond 'y' is variable, depending on whether an environment of the name gee_interface already exists or not.
The fix to your first problem is to tell conda not to ask for confirmation using -y:
system2('conda', c('create', '--name', 'gee_interface', '-y'))
As to the second part (variable times that your input is required), I'm guessing it's to overwrite the environment if it exists? In that case, you could check for its existence first with conda info --envs, and run conda remove --name gee_interface --all if necessary before creating it.
See:
https://docs.conda.io/projects/conda/en/latest/commands/create.html
https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#removing-an-environment
You could also try your system2 call, with the argument input = "y", but that doesn't fix your second problem of needing to affirm multiple times.
See: Invoke a system command and pipe a variable as an argument
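A minimal sketch of the check-then-recreate idea, assuming that conda info --envs prints one environment per line with the name as the first field and comment lines starting with #. The function names conda_env_listed and recreate_env are made up for illustration, and I have not tested this against every conda version.

```r
# Hypothetical helper: given the lines printed by `conda info --envs`,
# report whether an environment called `env_name` is listed.
# Assumes comment lines start with "#" and the name is the first field.
conda_env_listed <- function(lines, env_name) {
  lines <- trimws(lines)
  lines <- lines[nzchar(lines) & !startsWith(lines, "#")]
  first_field <- vapply(strsplit(lines, "[[:space:]]+"), `[`, character(1), 1)
  env_name %in% first_field
}

# Remove the environment if it already exists, then create it fresh.
# Every conda call that could prompt passes -y so nothing waits for input.
recreate_env <- function(env_name, conda = "conda") {
  listed <- system2(conda, c("info", "--envs"), stdout = TRUE)
  if (conda_env_listed(listed, env_name)) {
    system2(conda, c("remove", "--name", env_name, "--all", "-y"))
  }
  system2(conda, c("create", "--name", env_name, "-y"))
}

# Usage (needs conda on the PATH, e.g. after the Sys.setenv(PATH = ...) line
# from the question):
# recreate_env("gee_interface")
```

Keeping the parsing in its own function means it can be checked against pasted output without having conda installed.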

Azure Machine Learning integration of R: Should the 'azureml' module have an attribute 'core'?

I'm having issues with Azure Machine Learning SDK for R: "module 'azureml' has no attribute 'core'"...
For reasons that aren't my own, I have to use azureml to apply machine learning (my own stuff, written in R) to data from our data warehouse that is put in the blob storage. The modelled output should be put back into the blob storage so it can be accessed from the data warehouse.
I've written the code in R on my local machine (stored in a git repo). Preferably, I'd find some method to pull my code from git into a pipeline in the azureml environment so that it can be directly run whenever new data is available in the blob storage.
I've embarked on a tutorial-spree and found this seemingly relevant walkthrough: Train and deploy your first model with Azure ML (and this one).
But... after trying all I could think of, I'm stuck on the first steps. After installing all (or at least.. that's what I think) packages, modules, apps etc, and running the following code in RStudio:
library(azuremlsdk)
existing_ws <- get_workspace(name = name,
                             subscription_id = subscription_id,
                             resource_group = resource_group)
I run into an error that I haven't been able to fix:
AttributeError: module 'azureml' has no attribute 'core'
It seems that the azureml module is supposed to have an attribute "core", but when looking at it more closely, there is indeed no such attribute.
The function "get_workspace()" is trying to access: "azureml$core$Workspace$get".
I found that "azureml$Workspace" does exist, but then I can't figure out how to make that work.
Can anyone explain to me why I'm encountering this error?
Does anyone know of a better tutorial on how to connect my R code to azureml's cloud service?
Any pointers in the right direction are much appreciated!
EDITS - still not solved:
After advice from others, I double, triple and quadruple checked the installation.
I updated R and I'm now running:
R.version
platform x86_64-w64-mingw32
arch x86_64
os mingw32
system x86_64, mingw32
status
major 3
minor 6.2
year 2019
month 12
day 12
svn rev 77560
language R
version.string R version 3.6.2 (2019-12-12)
nickname Dark and Stormy Night
I installed Conda with Python 3.6.10.
I installed the azuremlsdk R package (I tried both provided options).
I then realized that there are some inconsistencies with the versions of the azure modules, so I also tried installing it with the '--no-multiarch' install option:
remotes::install_cran('azuremlsdk', repos = 'http://cran.us.r-project.org', INSTALL_opts=c("--no-multiarch"))
Then, I installed the azureml python sdk.
I had a look at all the versions again (using python -m pip freeze):
azure-common==1.1.24
azure-graphrbac==0.61.1
azure-mgmt-authorization==0.60.0
azure-mgmt-containerregistry==2.8.0
azure-mgmt-keyvault==2.0.0
azure-mgmt-resource==7.0.0
azure-mgmt-storage==7.1.0
azureml==0.2.7
azureml-automl-core==1.0.83.1
azureml-core==1.0.69
azureml-dataprep==1.1.36
azureml-dataprep-native==13.2.0
azureml-pipeline==1.0.69
azureml-pipeline-core==1.0.69
azureml-pipeline-steps==1.0.69
azureml-sdk==1.0.69
azureml-telemetry==1.0.69
azureml-train==1.0.69
azureml-train-automl-client==1.0.83
azureml-train-core==1.0.69
azureml-train-restclients-hyperdrive==1.0.69
As I was surprised to see all the 1.0.69 versions, instead of the 1.0.83 versions, I re-installed the azureml python sdk using:
azuremlsdk::install_azureml(version = "1.0.83")
This worked, in the sense that indeed all versions are now 1.0.83:
azure-common==1.1.24
azure-graphrbac==0.61.1
azure-mgmt-authorization==0.60.0
azure-mgmt-containerregistry==2.8.0
azure-mgmt-keyvault==2.0.0
azure-mgmt-resource==7.0.0
azure-mgmt-storage==7.1.0
azureml==0.2.7
azureml-automl-core==1.0.83.1
azureml-core==1.0.83
azureml-dataprep==1.1.36
azureml-dataprep-native==13.2.0
azureml-pipeline==1.0.83
azureml-pipeline-core==1.0.83
azureml-pipeline-steps==1.0.83
azureml-sdk==1.0.83
azureml-telemetry==1.0.83
azureml-train==1.0.83
azureml-train-automl-client==1.0.83
azureml-train-core==1.0.83
azureml-train-restclients-hyperdrive==1.0.83
But still... I get the error with the missing core. I get it both when running:
library(azuremlsdk)
get_current_run()
and also when running:
library(azuremlsdk)
existing_ws <- get_workspace(name = name,
                             subscription_id = subscription_id,
                             resource_group = resource_group)
Note that the first time running this code after starting up RStudio, I get the error:
Error in py_get_attr_impl(x, name, silent) :
AttributeError: module 'azureml' has no attribute '_base_sdk_common'
And every time after that I get this error:
Error in py_get_attr_impl(x, name, silent) :
AttributeError: module 'azureml' has no attribute 'core'
Any help would be much appreciated!
This issue was introduced by the latest reticulate 1.14 release, in which reticulate would create a default r-reticulate conda environment. Since Azure ML was installing the python SDK in an environment named r-azureml, the r-reticulate environment used by reticulate was missing the python SDK. A fix for this issue was addressed in a PR and has been merged into master. Please install from GitHub for now if you have reticulate version 1.14 and are running into this issue. We will be releasing an update to CRAN shortly.
I seem to have fixed the issue by specifically installing the python packages azureml AND azureml.core:
python -m pip install azureml
and then...
python -m pip install azureml.core
I did this for the Conda version that was called by R (r-reticulate). It's a bit odd to not be able to use the Conda environment 'r-azureml' without R switching back to 'r-reticulate', but ah well... at least I don't get my 'azureml' has no attribute 'core' anymore.

using RStudio with self compiled R

How can I get RStudio to recognize my version of R which is installed to
/opt/R/3.4.3/
by compiling it myself (make install) and ln -s /opt/R/${R_VERSION}/bin/R /bin/R. When executing on a shell, R works just fine. Only RStudio does not recognize the different path and is still looking at:
/usr/local/lib64/R/bin/exec/R
exact error message:
Feb 3 14:50:18 devbox systemd: Starting RStudio Server...
Feb 3 14:50:18 devbox systemd: Started RStudio Server.
Feb 3 14:50:18 devbox rserver[22411]: ERROR R did not return any output when queried for directory location information; LOGGED FROM: bool rstudio::core::r_util::<unnamed>::detectRLocationsUsingR(const std::string&, rstudio::core::FilePath*, rstudio::core::FilePath*, rstudio::core::config_utils::Variables*, std::string*) /root/rstudio/src/cpp/core/r_util/REnvironmentPosix.cpp:483
Feb 3 14:50:18 devbox rserver[22411]: ERROR system error 71 (Protocol error) [description=Unable to parse version from R, version-info=, r-error=/usr/local/lib64/R/bin/exec/R: error while loading shared libraries: libmkl_gf_lp64.so: cannot open shared object file: No such file or directory|||]; OCCURRED AT: rstudio::core::Error rstudio::core::r_util::rVersion(const rstudio::core::FilePath&, const rstudio::core::FilePath&, const std::string&, std::string*) /root/rstudio/src/cpp/core/r_util/REnvironmentPosix.cpp:784; LOGGED FROM: bool rstudio::core::r_util::detectREnvironment(const rstudio::core::FilePath&, const rstudio::core::FilePath&, const std::string&, std::string*, std::string*, rstudio::core::r_util::EnvironmentVars*, std::string*) /root/rstudio/src/cpp/core/r_util/REnvironmentPosix.cpp:678
I realized (see answer below) that R only worked as long as I did not lose the current bash environment. Executing:
source /opt/intel/mkl/bin/mklvars.sh intel64
fixes this. However, I can't get RStudio to execute this before starting up. I played around with ExecStartPre=/opt/intel/mkl/bin/mklvars.sh intel64, but it fails to set up the environment correctly.
On Linux, RStudio Desktop and Open-Source Server use the version of R pointed to by the output of which R. If RStudio is unable to locate R using which R, it will fall back to scanning explicitly for the R script in the /usr/local/bin and /usr/bin directories.
If you want to override which version of R is used then you can set the RSTUDIO_WHICH_R environment variable to the R executable that you want to run against. For example:
export RSTUDIO_WHICH_R=/usr/local/bin/R
See RStudio Support: Using Different Versions of R
I manually need to load
source /opt/intel/mkl/bin/mklvars.sh intel64
into the environment for R to work, as otherwise links are broken and R won't start up which leads to RStudio complaining (with a not 100% helpful error message).

RStudio : Rook does not work?

I would like to build a simple webserver using Rook, however I am having strange errors when trying it in R-Studio:
The code
library(Rook)
s <- Rhttpd$new()
s$start()
print(s)
returns the rather useless error
"Error in listenPort > 0 :
comparison (6) is possible only for atomic and list types".
When trying the same code in a plain R console, everything works, so I would like to understand why that happens and how I can fix it.
RStudio is Version 0.99.484 and R is R 3.2.2
I've experienced the same thing.
TLDR: This pull request solves the problem: https://github.com/jeffreyhorner/Rook/pull/31
RStudio is handled differently: there, Rook uses the same port as the value of tools:::httpdPort. The problem is that current Rook master assigns tools:::httpdPort directly, but it is a function, so it has to be called first to get the port number.
If you want this fixed right now, without waiting for the merge into master, install devtools and install the package from my fork on GitHub:
install.packages("devtools")
library(devtools)
install_github("filipstachura/Rook")
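To illustrate what goes wrong, here is a small self-contained sketch. It does not use Rook's actual source; fake_httpd_port is a stand-in I made up for tools:::httpdPort (assumed to be a zero-argument function returning the port), and resolve_port is a hypothetical helper showing the "evaluate it first" idea.

```r
# Hypothetical helper: accept either a port number or a function that
# returns one, and always hand back the number.
resolve_port <- function(port) {
  if (is.function(port)) port() else port
}

# Stand-in for tools:::httpdPort (assumption: a zero-argument function).
fake_httpd_port <- function() 8080L

# Comparing the function itself with 0 fails, which is the kind of error
# reported in the question.
direct <- tryCatch(fake_httpd_port > 0, error = function(e) conditionMessage(e))
print(direct)

# Calling the function first gives a number that can be compared.
print(resolve_port(fake_httpd_port) > 0)
```

The same helper also passes plain numbers through unchanged, so it would behave the same on setups where the port is stored as a value.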

External Scripting and R (Kognitio)

I have created the R script environment (using the command "create script environment RSCRIPT command '/usr/local/R/bin/Rscript --vanilla --slave'") and tried running an R script, but it fails with the error message below.
ERROR: RS 10 S 332659 R 31A004F LO:Script stderr: external script vfork child: No such file or directory
Is it because of the line below, which I am using in the script?
mydata <- read.csv(file=file("stdin"), header=TRUE)
if (nrow(mydata) > 0){
I am not sure what it is expecting.
I have one more question:
1) Do we need to install R on our Unix box ourselves, or does the Kognitio package include it?
I suspect the problem here is that you have not installed the R environment on ALL the database nodes in your system - it must be installed on every DB node involved in processing (as explained in chapter 10 of the Kognitio Guide which you can download from http://www.kognitio.com/forums/viewtopic.php?t=3) or you will see errors like "external script vfork child: No such file or directory".
You would normally use a remote deployment tool (e.g. HP's RDP) to ensure the installation was identical on all DB nodes. Alternatively, you can leverage the Kognitio wxsync tool to synchronise files across nodes.
Section 10.6 of the Kognitio Guide also explains how to constrain which DB nodes are involved in processing - this is appropriate if your script environment should not run on all nodes for some reason (e.g. it has an expensive per-node/per-core licence). That does not seem appropriate for using R though.
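As an aside on the script itself: the stdin-reading line from the question is a reasonable pattern on its own, but read.csv() raises an error on completely empty input instead of returning zero rows, so the nrow() check is never reached in that case. Below is a minimal sketch of a guard. It assumes (this is not taken from the Kognitio documentation) that the script may sometimes be handed no rows at all; read_rows is a made-up name, and the sample rows are invented so the snippet runs on its own.

```r
# Hypothetical helper: read CSV rows from a connection, treating input that
# cannot be read (for example, completely empty input) as zero rows.
read_rows <- function(con) {
  tryCatch(
    read.csv(file = con, header = TRUE),
    error = function(e) data.frame()
  )
}

# In the real script this would be: mydata <- read_rows(file("stdin"))
# A text connection with invented rows stands in for stdin here.
mydata <- read_rows(textConnection(c("id,value", "1,10", "2,20")))
if (nrow(mydata) > 0) {
  print(nrow(mydata))
}
```

Catching every error this way is blunt, since a malformed file would also be treated as empty; a stricter version could inspect the error message before deciding.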

Resources