Azure ML metrics: using log() or parent.log()? - azure-machine-learning-studio

At the moment I am using log() to track ML experiment metrics in Azure. An example of the output:
Run 1 mse=0.3
Run 2 mse=0.2
Run 3 mse=0.1
However, I want one MSE value that summarises the entire pipeline. Would parent_run.log() allow me to do this?
Research material used:
https://learn.microsoft.com/en-us/azure/machine-learning/how-to-track-designer-experiments

Totally! You can do this. Inside your training script you'd have:
from azureml.core import Run
run = Run.get_context()
run.parent.log("mse_global", 0.3)
See the Remarks section of the Run class docs for more info!
You also might want to look into Run.log_list() or Run.log_table().
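For reference, a minimal sketch of what those calls look like (the metric names and values here are made up for illustration):
from azureml.core import Run

run = Run.get_context()
# log_list records a list of values under a single metric name
run.log_list("mse_per_run", [0.3, 0.2, 0.1])  # illustrative values
# log_table records a dictionary mapping column names to lists of values
run.log_table("mse_by_run", {"run": [1, 2, 3], "mse": [0.3, 0.2, 0.1]})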

Related

How to include an environment when submitting an AutoML experiment in Azure Machine Learning

I use code like the following to create an AutoML object to submit an experiment for classification training:
automl_settings = {
    "n_cross_validations": 2,
    "primary_metric": 'accuracy',
    "enable_early_stopping": True,
    "experiment_timeout_hours": 1.0,
    "max_concurrent_iterations": 4,
    "verbosity": logging.INFO,
}
automl_config = AutoMLConfig(task = 'classification',
                             compute_target = compute_target,
                             training_data = train_data,
                             label_column_name = label,
                             **automl_settings
                             )
ws = Workspace.from_config()
experiment = Experiment(ws, "your-experiment-name")
run = experiment.submit(automl_config, show_output=True)
I want to include my conda yml file (like below) in my experiment submission.
env = Environment.from_conda_specification(name='myenv', file_path='conda_dependencies.yml')
However, I don't see any environment parameter in the AutoMLConfig class documentation (similar to what the environment parameter does in ScriptRunConfig), nor can I find any example of how to do so.
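For comparison, this is roughly how the environment parameter is used with ScriptRunConfig (a sketch only; the source_directory and script name are placeholders, and this is not the AutoML path the question asks about):
from azureml.core import Environment, ScriptRunConfig

env = Environment.from_conda_specification(name='myenv', file_path='conda_dependencies.yml')
# placeholder script run, just to show where `environment` plugs in
src = ScriptRunConfig(source_directory='.',
                      script='train.py',
                      compute_target=compute_target,
                      environment=env)
run = experiment.submit(src)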
I notice that after the experiment is submitted, I get a message like this:
Running on remote.
No run_configuration provided, running on aml-compute with default configuration
Is run_configuration used for specifying environment? If so, how do I provide run_configuration in my AutoML experiment run?
Thank you.
I figured out how to fix the issues associated with the SDK 1.19.0 upgrade in the AML environment I use, so there is no need for the workaround I was considering (i.e. passing an SDK 1.18.0 conda environment file to the AutoML experiment run). My original question no longer needs an answer; I just want to add this note in case someone else has the same question later on.
I still don't know why an AutoML experiment run has no option to pass in a conda environment file. It would be nice if a reason were given in the AML documentation.

LDA gensim model in a flask HTTP API - Memory issues

I am new to machine learning and this is the first time that I am using Python's gensim to extract topics from text.
I successfully trained a model (for 100 topics) and then had the idea to use that model in an HTTP API that I created using Python Flask. The endpoint gives back terms for a given text.
By the way, the model is loaded when I initialize the API.
After trying this out in production, memory (on a small VM with ~1 GB RAM) was exhausted and I finally got an error:
tags = tags + lda.topic_words(topic_index, num_of_keywords_for_topic, model, words)
File "/var/app/tagbee/lda.py", line 64, in topic_words
x2 = model.get_topic_terms(topicid=topic_index, topn=number_of_keywords)
File "/usr/local/lib/python3.6/dist-packages/gensim/models/ldamodel.py", line 1224, in get_topic_terms
topic = self.get_topics()[topicid]
File "/usr/local/lib/python3.6/dist-packages/gensim/models/ldamodel.py", line 1204, in get_topics
topics = self.state.get_lambda()
File "/usr/local/lib/python3.6/dist-packages/gensim/models/ldamodel.py", line 269, in get_lambda
return self.eta + self.sstats
MemoryError: Unable to allocate 96.6 MiB for an array with shape (100, 253252) and data type float32
So I have some questions:
Can a gensim LDA model be used that way, meaning in an HTTP API?
If yes, what is the trick to make it happen? If it needs at least 90 MB of memory per request, how does it scale?
Is there any alternative approach?
Thank you in advance!
Your question seems to be related to LDA or gensim only accidentally. The main point seems to be how to maintain (and reuse) an object in memory across a number of Flask requests.
Inspired by the Flask documentation and the answers to this question:
Flask - Store values in memory between requests, I propose the following approach:
from flask import Flask, g  # g is the global context of all queries
import gensim

app = Flask(__name__)

def get_lda_model():
    if 'lda' not in g:
        g.lda = gensim.models.LdaModel.load('path/to/model')  # read a model file here (path is a placeholder)
    return g.lda

@app.route('/example_request_path', methods=['POST'])
def my_request():
    lda = get_lda_model()
    # use lda model here....
    return 'ok'
Once the LDA model is loaded, you can reuse it very quickly across a number of requests without reloading it into memory. As long as your model is not going to be changed across requests, it does not matter whether this approach is thread-safe.
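As an illustration only (not part of the original answer), a hedged sketch of what "use lda model here" might look like, assuming the client posts a hypothetical topic_index field and the model was saved together with its dictionary:
from flask import request, jsonify

@app.route('/topic_terms', methods=['POST'])  # hypothetical endpoint
def topic_terms():
    lda = get_lda_model()
    topic_index = int(request.json.get('topic_index', 0))  # assumed request field
    # get_topic_terms returns (word_id, probability) pairs for the requested topic
    pairs = lda.get_topic_terms(topicid=topic_index, topn=10)
    words = [(lda.id2word[word_id], float(prob)) for word_id, prob in pairs]
    return jsonify(words)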

MXNet Time-series Example - Dropout Error when running locally

I am looking into using MXNet LSTM modelling for time-series analysis for a problem I am currently working on.
As a way of understanding how to implement this, I am following the example code given by MXNet at the link: https://mxnet.incubator.apache.org/tutorials/r/MultidimLstm.html
When running this script after downloading the necessary data to my local source, I am able to execute the code fine until I get to the following section to train the model:
## train the network
system.time(model <- mx.model.buckets(symbol = symbol,
                                      train.data = train.data,
                                      eval.data = eval.data,
                                      num.round = 100,
                                      ctx = ctx,
                                      verbose = TRUE,
                                      metric = mx.metric.mse.seq,
                                      initializer = initializer,
                                      optimizer = optimizer,
                                      batch.end.callback = NULL,
                                      epoch.end.callback = epoch.end.callback))
When running this section, the following error occurs once a connection to the API is established.
Error in mx.nd.internal.as.array(nd) :
[14:22:53] c:\jenkins\workspace\mxnet\mxnet\src\operator\./rnn-inl.h:359:
Check failed: param_.p == 0 (0.2 vs. 0) Dropout is not supported at the moment.
Is there currently a problem internally within the MXNet R package that prevents this code from running? I can't imagine they would provide a tutorial example for the package that is not executable.
My other thought is that it is something to do with my local device execution and connection to the API. I haven't been able to find any information about this being a problem for other users though.
Any inputs or suggestions would be greatly appreciated, thanks.
Looks like you're running an old version of the R package. I think following the instructions on this page to build a recent R package should resolve this issue.

Use Azure custom-vision trained model with tensorflow.js

I've trained a model with Azure Custom Vision and downloaded the TensorFlow files for Android
(see: https://learn.microsoft.com/en-au/azure/cognitive-services/custom-vision-service/export-your-model). How can I use this with tensorflow.js?
I need a model (pb file) and weights (json file). However, Azure gives me a .pb file and a text file with tags.
From my research I also understand that there are different kinds of .pb files, but I can't find which type Azure Custom Vision exports.
I found the tfjs converter. This is used to convert a TensorFlow SavedModel (is the *.pb file from Azure a SavedModel?) or Keras model to a web-friendly format. However, I need to fill in "output_node_names" (how do I get these?). I'm also not 100% sure if my pb file for Android is equal to a "tf_saved_model".
I hope someone has a tip or a starting point.
Just parroting what I said here to save you a click. I do hope that the option to export directly to tfjs is available soon.
These are the steps I did to get an exported TensorFlow model working for me:
Replace PadV2 operations with Pad. This Python function should do it. input_filepath is the path to the .pb model file and output_filepath is the full path of the updated .pb file that will be created.
import tensorflow as tf

def ReplacePadV2(input_filepath, output_filepath):
    graph_def = tf.GraphDef()
    with open(input_filepath, 'rb') as f:
        graph_def.ParseFromString(f.read())

    for node in graph_def.node:
        if node.op == 'PadV2':
            node.op = 'Pad'
            del node.input[-1]
            print("Replaced PadV2 node: {}".format(node.name))

    with open(output_filepath, 'wb') as f:
        f.write(graph_def.SerializeToString())
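A hypothetical invocation might look like this (the file names are placeholders, not from the original post):
ReplacePadV2('model.pb', 'model_pad_fixed.pb')  # writes the patched graph to a new file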
Install tensorflowjs 0.8.6 or earlier. Converting frozen models is deprecated in later versions.
When calling the converter, set --input_format to tf_frozen_model and set output_node_names to model_outputs. This is the command I used:
tensorflowjs_converter --input_format=tf_frozen_model --output_json=true --output_node_names='model_outputs' --saved_model_tags=serve path\to\modified\model.pb folder\to\save\converted\output
Ideally, tf.loadGraphModel('path/to/converted/model.json') should now work (tested for tfjs 1.0.0 and above).
Partial answer:
Trying to achieve the same thing - here is the start of an answer on how to make use of the output_node_names:
tensorflowjs_converter --input_format=tf_frozen_model --output_node_names='model_outputs' model.pb web_model
I am not yet sure how to incorporate this into the same code - do you have anything, @Kasper Kamperman?

How to save a model when using MXNet

I am using MXNet to train a CNN (in R) and I can train the model without any error with the following code:
model <- mx.model.FeedForward.create(symbol=network,
                                     X=train.iter,
                                     ctx=mx.gpu(0),
                                     num.round=20,
                                     array.batch.size=batch.size,
                                     learning.rate=0.1,
                                     momentum=0.1,
                                     eval.metric=mx.metric.accuracy,
                                     wd=0.001,
                                     batch.end.callback=mx.callback.log.speedometer(batch.size, frequency = 100)
                                     )
But as this process is time-consuming, I run it on a server during the night, and I want to save the model so that I can use it after the training finishes.
I used:
save(list = ls(), file="mymodel.RData")
and
mx.model.save("mymodel", 10)
But neither of them saves the model! For example, when I load "mymodel.RData", I cannot predict the labels for the test set!
Another example is when I load "mymodel.RData" and try to plot it with the following code:
graph.viz(model$symbol$as.json())
I get the following error:
Error in model$symbol$as.json() : external pointer is not valid
Can anybody give me a solution for saving and then loading this model for future use?
Thanks
You can save the model by adding an epoch.end.callback:
model <- mx.model.FeedForward.create(symbol=network,
                                     X=train.iter,
                                     ctx=mx.gpu(0),
                                     num.round=20,
                                     array.batch.size=batch.size,
                                     learning.rate=0.1,
                                     momentum=0.1,
                                     eval.metric=mx.metric.accuracy,
                                     wd=0.001,
                                     epoch.end.callback=mx.callback.save.checkpoint("model_prefix"),
                                     batch.end.callback=mx.callback.log.speedometer(batch.size, frequency = 100)
                                     )
An MXNet model is an R list, but its first component is not an R object but a C++ pointer, and it can't be saved and reloaded as an R object. Therefore, the model needs to be serialized to behave as an actual R object. The serialized object is also a list, but its first component is a text string containing model information.
To save a model:
modelR <- mx.serialize(model)
save(modelR, file="~/model1.RData")
To retrieve it and use it again:
load("~/model1.RData", verbose=TRUE)
model <- mx.unserialize(modelR)
The best practice for saving a snapshot of your training progress is to use save_checkpoint (http://mxnet.io/api/python/module.html#mxnet.module.Module.save_checkpoint) as part of the callback after every training epoch. In R the equivalent command is probably mx.callback.save.checkpoint, but I'm not using R and am not sure about its usage.
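For reference, a minimal sketch of checkpointing in the Python API that the link above describes (the symbol, iterator, and prefix names are assumptions, not from the question):
import mxnet as mx

# assumed: `net` is a Symbol and `train_iter` a DataIter defined elsewhere
mod = mx.mod.Module(symbol=net, context=mx.cpu())
# write <prefix>-symbol.json and <prefix>-NNNN.params after every epoch
mod.fit(train_iter,
        num_epoch=20,
        epoch_end_callback=mx.callback.do_checkpoint("model_prefix"))
# later: reload the snapshot saved after epoch 20
sym, arg_params, aux_params = mx.model.load_checkpoint("model_prefix", 20)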
Using these snapshots can also allow you to take advantage of the low-cost option of using the AWS Spot market (https://aws.amazon.com/ec2/spot/pricing/), which for example now offers an instance with 16 K80 GPUs for $3.8/hour compared to the on-demand price of $14.4. Such 80%-90% discounts are common in the spot market and can optimize the speed and cost of your training, as long as you use these snapshots correctly.
