How do you reference the models repository in Azure Machine Learning from inside a Python script step? - azure-machine-learning-studio

I know there's a $MODEL_VERSION variable when you create a scoring script using AKS, but what about a script task (for example, a Python script step)? I can't find documentation on how to deserialize a model into an object from within a script step running on a Linux AML compute cluster.
Is there a way to use models I've published to the Models tab in the workspace (say the name is mymodel) from within a Python script step?
For example in this code snippet:
import joblib
model = joblib.load(file_path + "mymodel")
I'm looking for the relative or absolute *nix path to use for file_path during a run, where mymodel has already been published to the workspace.

You can interact with the registered models via the AML SDK.
Assuming that you have the SDK installed and authenticated, or that you are running this on an AML compute, you can use the following code to get the workspace:
from azureml.core import Workspace, Model
ws = Workspace.from_config()
Once you have the workspace, you can list all the models:
for model in Model.list(ws):
    print(model.name, model.version)
And for a specific model, you can get the path and load it back to memory with joblib:
import joblib
model_path = Model.get_model_path(model_name='RegressionDecisionTree', version=1, _workspace=ws)
model_obj = joblib.load(model_path)
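If this code runs inside a script step submitted to the compute cluster, Workspace.from_config() may not find a config.json there. A minimal sketch of an alternative (assuming the azureml-core SDK is available in the run environment and a registered model named mymodel) is to take the workspace from the run context instead:

import joblib
from azureml.core import Run, Model

# Get the workspace from the submitted run's context
# (no config.json is needed on the compute cluster).
run = Run.get_context()
ws = run.experiment.workspace

# Resolve the local path of the registered model and deserialize it.
# 'mymodel' is assumed to be the name shown in the Models tab.
model_path = Model.get_model_path(model_name='mymodel', _workspace=ws)
model_obj = joblib.load(model_path)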

Related

Register Trained Model in Azure Machine Learning

I'm training an Azure Machine Learning model using a script via the Python SDK. I'm able to see the environment creation and the model getting trained in std_log in the Outputs + logs folder. After the model training I try to dump the model, but I don't see the model in any folder.
If possible, I want to register the model directly into the Models section in Azure ML rather than dumping it into a pickle file.
I'm using the following link for reference https://learn.microsoft.com/en-us/azure/machine-learning/tutorial-1st-experiment-sdk-train
Below is the output log snapshot for the model training run
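A minimal sketch of one way to do this from inside the training script (the model name, file names, and the placeholder training step below are illustrative, not the tutorial's exact code): write the fitted model to the run's outputs folder, upload it, and register it against the current run so it shows up in the Models section.

import os
import joblib
from azureml.core import Run
from sklearn.linear_model import LogisticRegression

run = Run.get_context()

# Placeholder training step; substitute your own model here.
model = LogisticRegression().fit([[0], [1], [2], [3]], [0, 0, 1, 1])

# Persist the fitted model inside the run's outputs folder.
os.makedirs('outputs', exist_ok=True)
joblib.dump(model, 'outputs/model.pkl')

# Upload the file to the run and register it against this run;
# it then appears under the workspace Models section.
run.upload_file('outputs/model.pkl', 'outputs/model.pkl')
run.register_model(model_name='my-trained-model', model_path='outputs/model.pkl')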

Register model from Azure Machine Learning run without downloading to local file

A model was trained on a remote compute using azureml.core Experiment as follows:
experiment = Experiment(ws, name=experiment_name)
src = ScriptRunConfig(<...>)
run = experiment.submit(src)
run.wait_for_completion(show_output=True)
How can a model trained in this run be registered with Azure Machine Learning workspace without being downloaded to a local file first?
The model can be registered using the register_model method available on the Run object (see the Run.register_model documentation).
Example (here best_run is the completed Run object, e.g. the run returned by experiment.submit above):
model = best_run.register_model(model_name='sklearn-iris', model_path='outputs/model.joblib')
The following notebook can also be used as an example for setting up training experiments and registering models obtained as a result of experiment runs.

Can R work with AWS S3 within a AWS Lambda function?

I'm looking for an approach to run a simulation algorithm developed in R in AWS. The input for the R model will come from S3 and the output will need to be written back to S3. The data scientist group at my organization are R experts, and my organization has identified AWS as its enterprise cloud platform. Given this, I need to find a way to run the R simulation model in AWS.
I saw this blog post (https://aws.amazon.com/blogs/compute/analyzing-genomics-data-at-scale-using-r-aws-lambda-and-amazon-api-gateway/) which talks about using Lambda to run R code within Python using the rpy2 package. I plan to follow the same approach. I was planning to implement the Lambda function as below:
1) Read input files from S3 and write them to local Lambda storage (/tmp). This will be done using the Python boto3 SDK.
2) Invoke the R algorithm using rpy2. I plan to save the R algorithm as a .RDS file in S3, load it using rpy2, and then run the R algorithm. The R algorithm would write the output back to local Lambda storage.
3) Write the output from Lambda storage to S3. Again, this will be done using the Python boto3 SDK.
As you can see, Python is used to interact with S3 and bring files to local Lambda storage. R will read from local Lambda storage and run the simulation algorithm. All R code will be wrapped within rpy2 in the Lambda function, as sketched below. I planned it this way because I was not sure whether R can work with S3 directly.
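A rough sketch of the handler described above (the bucket, keys, and file names are placeholders; it assumes rpy2 plus an R runtime are packaged with the function and that the .RDS file contains a callable R function):

import boto3
import rpy2.robjects as robjects

s3 = boto3.client('s3')

def handler(event, context):
    # 1) Copy the input data and the serialized R algorithm to local Lambda storage.
    s3.download_file('my-bucket', 'input/input_data.csv', '/tmp/input_data.csv')
    s3.download_file('my-bucket', 'model/algorithm.RDS', '/tmp/algorithm.RDS')

    # 2) Run the R algorithm via rpy2; R reads from /tmp and writes its output there.
    robjects.r('''
        algorithm <- readRDS("/tmp/algorithm.RDS")
        input <- read.csv("/tmp/input_data.csv")
        output <- algorithm(input)
        write.csv(output, "/tmp/output.csv", row.names = FALSE)
    ''')

    # 3) Push the result back to S3.
    s3.upload_file('/tmp/output.csv', 'my-bucket', 'output/output.csv')
    return {'status': 'done'}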
I now realize that Lambda local storage is limited to 512 MB, and I doubt the input and output files will stay within this limit. I'm now trying to see if R can work directly with S3 within Lambda. If this is possible, I don't have to bring the files to local Lambda storage and hence will not run out of space. Again, the R-S3 interaction will need to be wrapped inside rpy2. Is there a way to achieve this? Can the R cloudyr libraries work in this scenario? I see examples of cloudyr interacting with S3, but I don't see any example of this usage within Lambda using rpy2.
Any thoughts please?

Save and deploy R model in Watson Studio

I've developed a little model in RStudio in a Watson Studio environment in IBM Cloud (https://dataplatform.cloud.ibm.com).
I'm trying to save the model in RStudio and deploy it in Watson to publish it as an API, but I'm not finding a way to save it from RStudio.
Is it possible to do what I'm trying to do in the current version?
I've found the following documentation, but I guess it refers to a different version of Watson Studio:
https://content-dsxlocal.mybluemix.net/docs/content/SSAS34_current/local-dev/ml-r-models.htm
I couldn't find a way to save the model through Watson Studio functionality.
However, I was able to export it in PMML format using the R pmml library and then deploy the PMML as a service.
install.packages("pmml")
library(pmml)
pmml(LogModel, model.name = "Churn_Logistic_Regression_Model", app.name = "Churn_LogReg", description = "Modelo de Regresion para Demo", copyright = NULL, transforms = NULL, unknownValue = NULL, weights = NULL)
Some further documentation:
https://www.rdocumentation.org/packages/pmml/versions/1.5.7/topics/pmml.glm
Adding to Gabo's answer with the Watson Studio perspective, and covering the deployment part with IBM Watson Machine Learning.
What you need to do first is convert the model using pmml.
For example, run the following code in RStudio on Watson Studio or in an R notebook in Watson Studio:
install.packages("nnet")
library(nnet)
ird <- data.frame(rbind(iris3[,,1], iris3[,,2], iris3[,,3]),
species = factor(c(rep("s",50), rep("c", 50), rep("v", 50))))
samp <- c(sample(1:50,25), sample(51:100,25), sample(101:150,25))
ir.nn2 <- nnet(species ~ ., data = ird, subset = samp, size = 2, rang = 0.1,
decay = 5e-4, maxit = 200)
install.packages("pmml")
library(pmml)
pmmlmodel <- pmml(ir.nn2)
saveXML(pmmlmodel,file = "IrisNet.xml")
The saveXML() call writes the IrisNet.xml file to the RStudio working directory (or the local space of the R notebook); you need to download this file to your local machine.
Now, to deploy it to the Watson Machine Learning service, follow these steps:
In your Watson Studio project, click Add to Project -> Watson Machine Learning Model, name your model, and select the WML service you want to use
Select the From File tab
Drag and drop the XML file
Click Create and the model will be saved to the WML service you selected
Now you can deploy this model to the WML service using the Deployment tab
Simply name your deployment and click Save
Now you have a deployed model and you can start consuming it via the REST API.

Refresh the dataset in Azure machine learning

I have an experiment (exp) which is published as a web service (exp [Predictive Exp.]) in Azure Machine Learning Studio. The data used by this experiment was pushed from R using the AzureML package:
library(AzureML)
ws <- workspace(
  id = 'xxxxxxxxx',
  auth = 'xxxxxxxxxxx'
)
upload.dataset(data_for_azure, ws, "data_for_azure")
The above worked, but let's say I want to update the dataset (same schema, just more rows added).
I tried this, but it does not work:
delete.datasets(ws, "data_for_azure")
refresh(ws, what = c("everything", "data_for_azure", "exp", "exp [Predictive Exp.]"))
I get the error stating the following:
Error: AzureML returns error code:
HTTP status code : 409
Unable to delete dataset due to lingering dependants
I went through the documentation, and I know that a simple refresh (reusing the same name) is not possible. The only alternative I see is to delete the web service and redo everything.
Any solution will be greatly appreciated!
From the R documentation:
The AzureML API does not support uploads for replacing datasets with
new data by re-using a name. If you need to do this, first delete the
dataset from the AzureML Studio interface, then upload a new version.
Now, I think this is particular to the R SDK, as the Python SDK and the AzureML Studio UI let you upload a new version of a dataset. I will check in with the R team about this.
I would recommend uploading it as a new dataset with a new name, and then replacing the dataset in your experiment with this new dataset. Sorry this seems roundabout, but I think it is the easier option.
Alternatively, you can upload the new version through AzureML Studio: go to +NEW, Dataset, select your file, and check the box that says this is an existing dataset. The filename should be the same.
