AML - Web service TimeoutError - azure-machine-learning-studio

We created a webservice endpoint and tested it with the following code, and also with POSTMAN.
We deployed the service to an AKS in the same resource group and subscription as the AML resource.
UPDATE: the attached AKS had a custom networking configuration and rejected external connections.
import numpy
import os, json, datetime, sys
from operator import attrgetter
from azureml.core import Workspace
from azureml.core.model import Model
from azureml.core.image import Image
from azureml.core.webservice import Webservice
from azureml.core.authentication import AzureCliAuthentication

cli_auth = AzureCliAuthentication()

# Get workspace
ws = Workspace.from_config(auth=cli_auth)

# Get the AKS details
try:
    with open("../aml_config/aks_webservice.json") as f:
        config = json.load(f)
except FileNotFoundError:  # a bare except would also swallow SystemExit and KeyboardInterrupt
    print("No new model, thus no deployment on AKS")
    # raise Exception('No new model to register as production model performs better')
    sys.exit(0)
service_name = config["aks_service_name"]
# Get the hosted web service
service = Webservice(workspace=ws, name=service_name)
# Input for Model with all features
input_j = [[1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]]
print(input_j)
test_sample = json.dumps({"data": input_j})
test_sample = bytes(test_sample, encoding="utf8")
try:
    prediction = service.run(input_data=test_sample)
    print(prediction)
except Exception as e:
    print(str(e))
    raise Exception("AKS service is not working as expected")
In AML Studio, the deployment state is "Healthy".
We get the following error when testing:
Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
Log just after deploying the AKS Webservice here.
Log after running the test script here.
How can we know what is causing this problem and fix it?

Did you try service.get_logs()? Please also try a local deployment first: https://learn.microsoft.com/en-us/azure/machine-learning/how-to-deploy-local-container-notebook-vm

I'm not sure what the difference is between Webservice and AKSWebservice, but give the AKS variant a try (link). I would also try to isolate whether this is an AKS issue by deploying through ACI and validating your dependencies and scoring script.

We checked the AKS networking configuration and realized it uses an Azure CNI profile.
To test the webservice, we need to call it from inside the created virtual network.
It worked well!
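Since the root cause turned out to be network-level (WinError 10060 is a plain TCP connection timeout), a raw socket check can separate a blocked or unreachable endpoint from an application error inside the container. A minimal stdlib-only sketch (the hostname in the comment is a placeholder):

```python
import socket

def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers timeouts, refusals, and DNS failures
        return False

# Example: check the scoring endpoint's host/port before blaming the service.
# can_connect("my-aks-endpoint.region.cloudapp.azure.com", 443)
```

If this returns False from your workstation but True from a VM inside the virtual network, the problem is networking (as it was here), not the model or the scoring script.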

Related

Using Apache Airflow Tool, Implement a DAG for a batch processing pipeline to get a directory from a remote system

Using the Apache Airflow tool, how can I implement a DAG for the following Python code? The task accomplished in the code is to fetch a directory from a GPU server to the local system. The code works fine in a Jupyter notebook. Please help me implement it in Airflow; I'm very new to this. Thanks.
import os
import pysftp

myHostname = "hostname"
myUsername = "username"
myPassword = "pwd"

with pysftp.Connection(host=myHostname, username=myUsername, password=myPassword) as sftp:
    print("Connection successfully established ... ")
    src = '/path/src/'
    dst = '/home/path/path/destination'
    os.makedirs(dst, exist_ok=True)  # os.mkdir raises if dst already exists
    sftp.get_d(src, dst, preserve_mtime=True)
    print("Fetched source images from GPU server to local directory")
# connection closed automatically at the end of the with-block
For SFTP duties, Airflow provides an SFTPOperator that you can use directly.
Alternatively, its corresponding SFTPHook can be used with a simple PythonOperator.
I acknowledge there aren't many examples, but this might be helpful.
For the SSH connection, see this.
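To adapt the snippet for a PythonOperator, one option is to wrap the transfer in a plain callable. The sketch below is hypothetical: fetch_directory and its connect parameter are illustrative names, and the connection factory is injected so the logic can be exercised without a live SFTP server (in a real DAG you would pass something like functools.partial(pysftp.Connection, host=..., username=..., password=...)).

```python
import os

def fetch_directory(connect, src, dst):
    """Fetch a remote directory through an SFTP-like connection.

    `connect` is a zero-argument callable returning a context manager
    with a pysftp-style get_d(src, dst, preserve_mtime=...) method.
    """
    os.makedirs(dst, exist_ok=True)  # unlike os.mkdir, won't fail if dst exists
    with connect() as sftp:
        sftp.get_d(src, dst, preserve_mtime=True)
    return dst
```

In a DAG this becomes PythonOperator(task_id='fetch', python_callable=fetch_directory, op_kwargs={...}), or, more idiomatically, the provider's SFTPOperator configured with an Airflow connection ID.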

Running Azure Machine Learning Service pipeline locally

I'm using Azure Machine Learning Service with the azureml-sdk python library.
I'm using azureml.core version 1.0.8
I'm following this https://learn.microsoft.com/en-us/azure/machine-learning/service/how-to-create-your-first-pipeline tutorial.
I've got it working when I use Azure Compute resources. But I would like to run it locally.
I get the following error
raise ErrorResponseException(self._deserialize, response)
azureml.pipeline.core._restclients.aeva.models.error_response.ErrorResponseException: (BadRequest) Response status code does not indicate success: 400 (Bad Request).
Trace id: [uuid], message: Can't build command text for [train.py], moduleId [uuid] executionId [id]: Assignment for parameter Target is not specified
My code looks like:
run_config = RunConfiguration()
compute_target = LocalTarget()
run_config.target = LocalTarget()
run_config.environment.python.conda_dependencies = CondaDependencies(conda_dependencies_file_path='environment.yml')
run_config.environment.python.interpreter_path = 'C:/Projects/aml_test/.conda/envs/aml_test_env/python.exe'
run_config.environment.python.user_managed_dependencies = True
run_config.environment.docker.enabled = False

trainStep = PythonScriptStep(
    script_name="train.py",
    compute_target=compute_target,
    source_directory='.',
    allow_reuse=False,
    runconfig=run_config
)
steps = [trainStep]

# Build the pipeline
pipeline = Pipeline(workspace=ws, steps=steps)
pipeline.validate()
experiment = Experiment(ws, 'Test')

# Fails locally, works on Azure Compute
run = experiment.submit(pipeline)

# Works both locally and on Azure Compute
src = ScriptRunConfig(source_directory='.', script='train.py', run_config=run_config)
run = experiment.submit(src)
The train.py is a very simple, self-contained script, dependent only on numpy, that approximates pi.
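For reference, a self-contained stand-in for such a train.py is easy to write. This sketch is not the asker's script; it estimates pi by Monte Carlo sampling and uses only the standard library instead of numpy, so it runs in any local environment:

```python
import random

def approximate_pi(samples=100_000, seed=0):
    """Estimate pi as 4 * (fraction of random points inside the unit quarter-circle)."""
    rng = random.Random(seed)
    inside = sum(
        1 for _ in range(samples)
        if rng.random() ** 2 + rng.random() ** 2 <= 1.0
    )
    return 4.0 * inside / samples

if __name__ == "__main__":
    print(approximate_pi())
```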
Local compute cannot be used with ML Pipelines. Please see this article.
Training on your local machine (for instance, during development) is possible and very easy according to the documentation: how-to-set-up-training-targets
I did this on my windows computer as follows:
Define the local environment:
sklearn_env = Environment("user-managed-env")
sklearn_env.python.user_managed_dependencies = True
# You can choose a specific Python environment by pointing to a Python path
sklearn_env.python.interpreter_path = r'C:\Dev\tutorial\venv\Scripts\python.exe'
And compute_target='local' seems to be the magic word to direct a script to my local environment.
src = ScriptRunConfig(source_directory=script_folder,
                      script='train_iris.py',
                      arguments=[dataset.as_named_input('iris')],
                      compute_target='local',
                      environment=sklearn_env)
I will then need to make sure that my local environment has all the dependencies that the script needs.
Additionally I needed to install these packages on my local machine:
azureml-defaults
packaging

Azure Virtual Machine SQL Server 2016 R service Error :Unable to launch runtime for 'R' script

I am using R Services in SQL Server 2016 SP1 with Machine Learning Server 9.2.1 on an Azure Virtual Machine (I installed both SQL Server and Machine Learning Server on the VM). Everything was going fine.
After changing a few network configuration settings of the Virtual Machine, R Services started failing. First, the Launchpad service could not run. Then I tried to repair SQL Server 2016 (during the repair it reported that the R service failed). After the repair, Launchpad runs, but if I execute any SQL R script:
EXEC sp_execute_external_script
    @language = N'R'
    , @script = N'
X<-3'
I get this error:
Msg 39021, Level 16, State 1, Line 0
Unable to launch runtime for 'R' script. Please check the configuration of the 'R' runtime.
Msg 39019, Level 16, State 1, Line 0
An external script error occurred:
Unable to launch the runtime. ErrorCode 0x80070718: 1816(Not enough quota is available to process this command.).

undefined symbol in psycopg only when using uwsgi

I have an application built on CherryPy. I've been using the CherryPy built-in HTTP server but now want to move to uWSGI (and nginx).
On my workstation this works fine and the change was simple enough to make.
However on my test server I get the following error.
  File "./validator/dbtools.py", line 3, in <module>
    import psycopg2
  File "/home/apiuser/API25/env/lib64/python3.5/site-packages/psycopg2/__init__.py", line 50, in <module>
    from psycopg2._psycopg import BINARY, NUMBER, STRING, DATETIME, ROWID
ImportError: /home/apiuser/API25/env/lib64/python3.5/site-packages/psycopg2/_psycopg.cpython-35m-x86_64-linux-gnu.so: undefined symbol: lo_truncate64
unable to load app 0 (mountpoint='') (callable not found or import error)
The test server is different (CentOS 7 vs. an Ubuntu-based distribution on my workstation). When I google the error, I see reports of similar errors, but in different libraries. The psycopg library didn't change; in fact, the two versions of the app (the CherryPy and uWSGI versions) run from the same virtualenv. Running it as a CherryPy service still works fine.
I'm new to all of this so any help will be appreciated!
EDIT -- In response to the question by Piotr: I start the app with a small bash script like this:
#!/bin/bash
. env/bin/activate
uwsgi --socket 127.0.0.1:8030 --protocol=http --wsgi-file wsgi.py --callable wsgiapp
Two other things I have picked up:
1. I set up a new test server running CentOS. On this new server the uwsgi app starts without the undefined-symbol error. I would still like to know what causes it, as a matter of learning about this.
2. Some of the API endpoints include functions launched into background threads. Those work fine when running the app behind CherryPy's default server. When running the app under uWSGI, the background threads don't execute but appear to get queued up. When I terminate the app (Ctrl-C on the console), those threads suddenly run, and those tasks (particularly push notifications) go through.
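On the undefined-symbol error: lo_truncate64 is, to my knowledge, exported by PostgreSQL's libpq from version 9.3 onward, so the usual cause is a psycopg2 build that expects a newer libpq than the one the dynamic loader resolves at runtime (running ldd on _psycopg*.so shows which libpq gets picked up; library search paths can differ between an interactive shell and a uwsgi service, which would explain the difference in behavior). A small stdlib-only probe for checking whether a shared library exports a symbol:

```python
import ctypes
import ctypes.util

def has_symbol(library, symbol):
    """Return True if the shared library exports the given symbol."""
    try:
        lib = ctypes.CDLL(library)
    except OSError:
        return False
    return hasattr(lib, symbol)  # attribute lookup performs the symbol resolution

# e.g. has_symbol(ctypes.util.find_library("pq"), "lo_truncate64")
print(has_symbol(ctypes.util.find_library("c"), "printf"))
```

If the libpq on the failing server lacks the symbol while the one psycopg2 was built against has it, installing a matching libpq (or rebuilding psycopg2 against the system one) should resolve the import error.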

Access Violation Error After Cx_Freeze/py2exe Qt5.5.1 App

I used Qt Designer to design the GUI and pyuic5 to generate the Python code. After freezing it with cx_Freeze or py2exe, the app crashes in the pyuic5-generated setupUi function. When running the app from source, it works as it is supposed to. I use Python 3.4 (within the Anaconda distribution) and Qt 5.5.1.
The first piece of code where the app breaks (in the setupUi method of the main UI):
# Add a QWidget to the QTabWidget
self.ModuleListingWidget.addTab(self.MarketingTab, "")
When I use Dependency Walker to profile the resulting exe, it logs the following errors:
00:00:00.000: Started "dist\MAIN.EXE" (process 0x2938) at address 0x000C0000. Cannot hook module.
00:00:00.000: Loaded "c:\windows\system32\NTDLL.DLL" at address 0x772C0000. Cannot hook module.
00:00:00.047: Loaded "c:\windows\syswow64\KERNEL32.DLL" at address 0x76CA0000. Cannot hook module.
00:00:00.047: Loaded "c:\windows\syswow64\KERNELBASE.DLL" at address 0x74BF0000. Cannot hook module.
...
00:00:00.125: First chance exception 0xC0000005 (Access Violation) occurred at address 0x721E5B10.
00:00:00.125: Second chance exception 0xC0000005 (Access Violation) occurred at address 0x721E5B10.
Both py2exe and cx_Freeze give the same error, 0xC0000005 (Access Violation).
How can I fix this problem, or learn more about the problem itself?
Here is the cxfreeze config:
import sys
from cx_Freeze import setup, Executable

base = "Win32GUI"
path_platforms = (".\platforms\qwindows.dll", "platforms\qwindows.dll")
build_options = {"packages": ['atexit'], "include_files": [path_platforms]}

setup(
    name="myapp",
    version="0.1",
    description="Sample cx_Freeze script",
    options={"build_exe": build_options},
    executables=[Executable("main.py", base=base)]
)
and py2Exe config:
from distutils.core import setup
import py2exe
import sys
sys.argv.append('py2exe')
import glob
import src.MyProj

setup(
    console=['main.py'],
    options={"py2exe": {
        # typelib for WMI
        "typelibs": [('{565783C6-CB41-11D1-8B02-00600806D9B6}', 0, 1, 2)],
        # create a compressed zip archive
        "compressed": True,
        'bundle_files': 1,
        "optimize": 2,
        "includes": ['sip', "PyQt5", 'PyQt5.QtCore', 'PyQt5.QtWidgets', 'PyQt5.QtGui', 'PyQt5.QtSql'],
    }},
    # The lib directory contains everything except the executables and the python dll.
    # Can include a subdirectory name.
    zipfile="./shared.zip",
    data_files=[
        ('Images', glob.glob('Images/*.*')),
        ('sqldrivers', ('C:/Users/user/src/dist/plugins/qsqlmysql.dll',)),
        ('/c/Python34/Lib/site-packages/PyQt5', ['C:/Python34/Lib/site-packages/PyQt5/Qt5Core.dll'])
    ]
)
I could not find a solution for this; it is probably a dependency error. Instead, I used PyInstaller, and it works like a charm.
