Running Azure Machine Learning Service pipeline locally - azure-machine-learning-workbench

I'm using Azure Machine Learning Service with the azureml-sdk python library.
I'm using azureml.core version 1.0.8
I'm following this https://learn.microsoft.com/en-us/azure/machine-learning/service/how-to-create-your-first-pipeline tutorial.
I've got it working when I use Azure Compute resources, but I would like to run it locally.
I get the following error:
raise ErrorResponseException(self._deserialize, response)
azureml.pipeline.core._restclients.aeva.models.error_response.ErrorResponseException: (BadRequest) Response status code does not indicate success: 400 (Bad Request).
Trace id: [uuid], message: Can't build command text for [train.py], moduleId [uuid] executionId [id]: Assignment for parameter Target is not specified
My code looks like:
run_config = RunConfiguration()
compute_target = LocalTarget()
run_config.target = LocalTarget()
run_config.environment.python.conda_dependencies = CondaDependencies(conda_dependencies_file_path='environment.yml')
run_config.environment.python.interpreter_path = 'C:/Projects/aml_test/.conda/envs/aml_test_env/python.exe'
run_config.environment.python.user_managed_dependencies = True
run_config.environment.docker.enabled = False
trainStep = PythonScriptStep(
    script_name="train.py",
    compute_target=compute_target,
    source_directory='.',
    allow_reuse=False,
    runconfig=run_config
)
steps = [trainStep]
# Build the pipeline
pipeline = Pipeline(workspace=ws, steps=[steps])
pipeline.validate()
experiment = Experiment(ws, 'Test')
# Fails locally, works on Azure Compute
run = experiment.submit(pipeline)
# Works both locally and on Azure Compute
src = ScriptRunConfig(source_directory='.', script='train.py', run_config=run_config)
run = experiment.submit(src)
The train.py is a very simple, self-contained script, dependent only on numpy, that approximates pi.

Local compute cannot be used with ML Pipelines. Please see this article.

Training on your local machine (for instance during development) is possible and very easy according to the documentation: how-to-set-up-training-targets
I did this on my Windows computer as follows:
Define the local environment:
sklearn_env = Environment("user-managed-env")
sklearn_env.python.user_managed_dependencies = True
# You can choose a specific Python environment by pointing to a Python path
sklearn_env.python.interpreter_path = r'C:\Dev\tutorial\venv\Scripts\python.exe'
And compute_target='local' seems to be the magic word to direct a script to my local environment.
src = ScriptRunConfig(source_directory=script_folder,
                      script='train_iris.py',
                      arguments=[dataset.as_named_input('iris')],
                      compute_target='local',
                      environment=sklearn_env)
I will then need to make sure that my local environment has all the dependencies that the script needs.
Additionally, I needed to install these packages on my local machine:
azureml-defaults
packaging
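Putting the snippets above together, a minimal sketch of a local submission could look like the following. This assumes a config.json for the workspace is available in the working directory, and the interpreter path, experiment name and script name are placeholders to adjust to your own setup:
from azureml.core import Workspace, Experiment, Environment, ScriptRunConfig

ws = Workspace.from_config()  # assumes a config.json for your workspace in the working directory

# User-managed environment: all dependencies come from the local interpreter
local_env = Environment("user-managed-env")
local_env.python.user_managed_dependencies = True
local_env.python.interpreter_path = r'C:\Dev\tutorial\venv\Scripts\python.exe'  # adjust to your interpreter

# compute_target='local' runs the script on the machine that submits it
src = ScriptRunConfig(source_directory='.',
                      script='train.py',
                      compute_target='local',
                      environment=local_env)

run = Experiment(ws, 'Test').submit(src)
run.wait_for_completion(show_output=True)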

Related

Why is gmailr not working in docker build process?

I'm using the gmailr package for sending emails from an R script.
Locally it's all working fine, but when I try to run it during a docker build step on Google Cloud, I get an error.
I implemented it in the way described here.
So basically, locally the part of my code for sending emails looks like this:
gm_auth_configure(path = "credentials.json")
gm_auth(email = TRUE, cache = "secret")
gm_send_message(buy_email)
Please note that I renamed the .secret folder to secret, because I want to deploy my script with docker on gcloud and didn't want to get any unexpected errors due to the dot in the folder name.
This is the code, which I'm now trying to run in the cloud:
setwd("/home/rstudio/")
gm_auth_configure(path = "credentials.json")
options(
gargle_oauth_cache = "secret",
gargle_oauth_email = "email.which.was.used.to.get.secret#gmail.com"
)
gm_auth(email = "email.which.was.used.to.get.secret#gmail.com")
When running this code in a docker build process, I'm receiving the following error:
Error in gmailr_POST(c("messages", "send"), user_id, class = "gmail_message", :
Gmail API error: 403
Request had insufficient authentication scopes.
Calls: gm_send_message -> gmailr_POST -> gmailr_query
I can reproduce the error locally when I do not check the following box.
Therefore my first assumption is that the secret folder is not being pushed correctly during the docker build and that gmailr tries to authenticate again; in a non-interactive session the box can't be checked and the error is thrown.
This is the part of the Dockerfile.txt where I copy the files and run the script:
#2 ADD FILES TO LOCAL
COPY . /home/rstudio/
WORKDIR /home/rstudio
#3 RUN R SCRIPT
CMD Rscript /home/rstudio/run_script.R
and this is the folder which contains all the files/folders being pushed to the cloud.
My second assumption is that I have to somehow specify the scope to use the Google platform for my docker image, but unfortunately I'm not sure where to do that.
I'd really appreciate any help! Thanks in advance!
For anyone experiencing the same problem, I was finally able to find a solution.
The problem is that the gargle package picks up GCE auth instead of using the normal user OAuth flow.
To temporarily disable GCE auth, I'm using the following piece of code now:
library(gargle)

# Temporarily drop all registered credential functions (including GCE auth) ...
cred_funs_clear()
# ... and re-register only the regular user OAuth flow
cred_funs_add(credentials_user_oauth2 = credentials_user_oauth2)

gm_auth_configure(path = "credentials.json")
options(
  gargle_oauth_cache = "secret",
  gargle_oauth_email = "sp500tr.cloud@gmail.com"
)
gm_auth(email = "email.which.was.used.for.credentials.com")

# Restore gargle's default credential functions afterwards
cred_funs_set_default()
For further reference, see also here.

Is CPLEX Optimization Studio 12.9.0 compatible with Python notebooks for APIs? If yes, which version of Python?

I have installed CPLEX (Optimization Studio 12.9.0 - Community Edition) and need to use its Python API.
After running setup.py as per https://www.ibm.com/support/knowledgecenter/SSSA5P_12.6.2/ilog.odms.cplex.help/CPLEX/GettingStarted/topics/set_up/Python_setup.html?view=embed,
I get the following error:
DOcplexException: CPLEX runtime not found: please install CPLEX or
solve this model on DOcplexcloud
How can I solve this error?
Have you set the PYTHONPATH environment variable to the value of yourCplexhome/python/VERSION/PLATFORM?
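If you would rather not change environment variables, roughly the same effect can be achieved from inside the script itself; note that the install path below is only an illustration and must be replaced with your actual CPLEX installation directory:
import sys

# Hypothetical CPLEX install location; substitute your own VERSION and PLATFORM,
# e.g. ...\cplex\python\3.6\x64_win64 on 64-bit Windows
sys.path.append(r"C:\Program Files\IBM\ILOG\CPLEX_Studio_Community129\cplex\python\3.6\x64_win64")

import cplex                       # should now resolve the CPLEX runtime
from docplex.mp.model import Model

mdl = Model(name='local-check')    # without a cloud context, docplex looks for the local runtime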
Or you could try to use DOcplexcloud. For instance, the following example from https://www.ibm.com/developerworks/community/forums/html/topic?id=80146d62-1e2b-490e-b5f8-6fbf38a51e18&ps=25
from docplex.mp.model import Model
from docplex.mp.context import Context

url = "https://api-oaas.docloud.ibmcloud.com/job_manager/rest/v1"
key = "YOUR API KEY"
ctx = Context.make_default_context(url=url, key=key)

mdl = Model(name='buses', context=ctx)
mdl.nbbus40 = mdl.integer_var(name='nbBus40')
mdl.nbbus30 = mdl.integer_var(name='nbBus30')
mdl.add_constraint(mdl.nbbus40*40 + mdl.nbbus30*30 >= 300, 'kids')
mdl.minimize(mdl.nbbus40*500 + mdl.nbbus30*400)
mdl.solve()
print(mdl.nbbus40.solution_value)
print(mdl.nbbus30.solution_value)
This works fine.

RDotNet (R.NET) issues in Azure Functions

I have an Azure Function that executes R code using the RDotNet (R.NET) library.
Everything works fine in my local environment, but when I deploy my code to Azure, the process of loading specific libraries (zoo, TTR) never ends.
This is my code:
string rHome = @"D:\home\site\wwwroot\R\R-3.4.4";
string rPath = Path.Combine(rHome, System.Environment.Is64BitProcess ? @"bin\x64" : @"bin\i386");
REngine.SetEnvironmentVariables(rPath, rHome);
if (engine == null)
{
    engine = REngine.GetInstance();
}
engine.Evaluate(@"library(data.table)");
engine.Evaluate(@"library(RODBC)");
engine.Evaluate(@"library(nlstools)");
engine.Evaluate(@"library(minpack.lm)");
engine.Evaluate(@"library(zoo)");
engine.Evaluate(@"library(TTR)");
The first 4 libraries are loaded without any problem, but when the program tries to load the zoo library the call never returns and I cannot continue executing the rest of my code.
No error is displayed, so the Azure Function is restarted after a few minutes.
The same thing happens with the TTR library.
Any idea what could be the cause of these symptoms?

H2O .savemodel on network path not working

We have an H2O cluster on a Linux machine (started from the command line), and we are connecting to it from our local (Windows) machine, which is on the same network.
When we try to call saveModel we get errors.
ERROR: Unexpected HTTP Status code: 412 Precondition Failed (url = http://10.0.0.4:54321/99/Models.bin/)
# Code to load the model from the local machine (IDE: RStudio)
ModelName <- "GLM_model_R_1522217891094_1279"
modelpath <- file.path("file://dsvm-dev/Models", ModelName)
Model.h2o <- h2o.loadModel(modelpath)

# To save the model
h2o.saveModel(object = best_model, path = "file://dsvm-dev/Models2/", force = TRUE)
Please suggest any alternative ways to save the model to the local (Windows) machine instead of on the server. Also, when saving it on the server it goes to the library folder, and we do not have access to that machine.

How to submit jobs to spark master running locally

I am using R and Spark to run a simple example to test Spark.
I have a spark master running locally using the following:
spark-class org.apache.spark.deploy.master.Master
I can see the status page at http://localhost:8080/
Code:
system("spark-submit --packages com.databricks:spark-csv_2.10:1.0.3 --master local[*]")
suppressPackageStartupMessages(library(SparkR)) # Load the library
sc <- sparkR.session(master = "local[*]")
df <- as.DataFrame(faithful)
head(df)
Now this runs fine when I do the following (the code is saved as 'sparkcode.R'):
Rscript sparkcode.R
Problem:
But what happens is that a new Spark instance is created; I want R to use the existing master instance instead (I should then see this as a completed application at http://localhost:8080/#completed-app).
P.S.: using Mac OS X, Spark 2.1.0 and R 3.3.2
A number of things:
If you use a standalone cluster, use the correct URL, which should be sparkR.session(master = "spark://hostname:port"). Both hostname and port depend on the configuration, but the standard port is 7077 and the hostname should default to the machine's hostname. This is the main problem.
Avoid using spark-class directly. That is what the $SPARK_HOME/sbin/ scripts are for (like start-master.sh). They are not crucial, but they handle small and tedious tasks for you.
The standalone master is only a resource manager. You have to start worker nodes as well (start-slave*).
It is usually better to use bin/spark-submit, though it shouldn't matter much here.
spark-csv is no longer necessary in Spark 2.x, and even if it were, Spark 2.1 uses Scala 2.11 by default. Not to mention that 1.0.3 is extremely old (from around the Spark 1.3 era).
