Azure Wide and Deep Recommender real inference pipeline error - Invalid graph: You have required input port(s) unconnected - azure-machine-learning-studio

It seems like there is a bug in creating real-time endpoints for the "Wide & Deep Recommender" module, at least with the sample workflow. I kept getting "Invalid graph: You have required input port(s) unconnected". Does anyone know how to get around this issue?
Repro Steps:
Go to Azure ML -> Designer -> "Wide & Deep based Recommendation - Restaurant"
Train the model -> Create "Real-time inference pipeline" -> in the real-time inference pipeline, click "Submit" -> Error occurs

You can now register a trained model from Designer into the model registry and use the automatically generated score.py/conda YAML to deploy it using the SDK or CLI. After successful model training, you can publish to an inference cluster by authenticating to the AKS cluster and then consume the endpoint.
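For anyone taking the SDK route, here is a minimal sketch (azureml-core SDK v1; the model, file, cluster, and endpoint names are hypothetical) of registering a Designer-trained model and deploying it to AKS with the auto-generated score.py and conda YAML:

from azureml.core import Workspace, Model, Environment
from azureml.core.model import InferenceConfig
from azureml.core.compute import AksCompute
from azureml.core.webservice import AksWebservice

ws = Workspace.from_config()

# Register the model exported from Designer (path and name are examples)
model = Model.register(ws, model_path="wide_deep_model", model_name="wide-deep-recommender")

# Wrap the auto-generated scoring script and conda environment
env = Environment.from_conda_specification("wide-deep-env", "conda_env.yaml")
inference_config = InferenceConfig(entry_script="score.py", environment=env)

# Deploy to an existing AKS inference cluster attached to the workspace
aks_target = AksCompute(ws, "my-aks-cluster")
service = Model.deploy(ws, "wide-deep-endpoint", [model], inference_config,
                       AksWebservice.deploy_configuration(), aks_target)
service.wait_for_deployment(show_output=True)
print(service.scoring_uri)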

I was facing the same issue. I just deleted the 'Evaluate Recommender' module at the end (the one with a yellow exclamation mark) and submitted again.
That fixed my problem.

Related

Sagemaker Pipeline Error: Failed to upload to jumpstart-cache-prod-ap-southeast-1/source-directory-tarballs/lightgbm/inference/classification

I encountered this error when running pipeline.upsert():
S3UploadFailedError: Failed to upload /tmp/tmpexrxqr32/new.tar.gz to jumpstart-cache-prod-ap-southeast-1/source-directory-tarballs/lightgbm/inference/classification/v1.1.2/sourcedir.tar.gz: An error occurred (AccessDenied) when calling the CreateMultipartUpload operation: Access Denied
My pipeline consists of preprocessing, training, evaluating, model-creation, and transform steps. When I ran these steps separately they worked just fine, but when I put them together in a pipeline, the mentioned error occurred. Can anyone tell me the cause of this error? I did not write any line of code to upload anything to the JumpStart S3 bucket.
from sagemaker.model import Model

model = Model(
    image_uri=infer_image_uri,
    model_data=step_train.properties.ModelArtifacts.S3ModelArtifacts,
    sagemaker_session=pipeline_session,
    role=role,
    source_dir=infer_source_uri,
    entry_point="inference.py",
)
When I comment out the entry_point line, pipeline.upsert() returns no error, but the transform job fails. The model I'm using is JumpStart LightGBM.
Nature of the problem
This happens because, by default, your experiment tries to upload the source code to the session's default bucket (which here is the JumpStart bucket).
You can check the default bucket assigned to pipeline_session by printing pipeline_session.default_bucket().
Evidently you do not have write permissions on that bucket, and I encourage you to check them.
When you comment out entry_point, you don't get that error precisely because there is nothing to upload. However, the moment the transform job tries to run inference, it won't find the script.
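A quick way to check, using the session object from the question:

# Shows which bucket the session uploads code artifacts to
print(pipeline_session.default_bucket())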
One possible quick and controllable solution
To verify what I described, try setting the code_location parameter on the Model.
This way you control exactly where your pipeline step writes; you will need to specify the S3 URI of the desired destination folder.
code_location (str) – Name of the S3 bucket where custom code is
uploaded (default: None). If not specified, the default bucket created
by sagemaker.session.Session is used.
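A sketch of the suggested fix, reusing the Model definition from the question (the bucket and prefix in code_location are hypothetical; point it at a bucket you can write to):

from sagemaker.model import Model

model = Model(
    image_uri=infer_image_uri,
    model_data=step_train.properties.ModelArtifacts.S3ModelArtifacts,
    sagemaker_session=pipeline_session,
    role=role,
    source_dir=infer_source_uri,
    entry_point="inference.py",
    # Redirect the repacked source code to a bucket you control
    code_location="s3://my-writable-bucket/lightgbm-inference-code",
)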

I got Target group not attached to the Auto Scaling group error while doing blue green deployment through code deploy

I have been trying blue green deployment with code deploy, but it throws an error: The following validation error occurred: Target group not attached to the Auto Scaling group (Service: AmazonAutoScaling; Status Code: 400; Error Code: ValidationError; Request ID: cd58091b-fe83-4dcf-b090-18c3b3d2dbbc; Proxy: null)
Though the policy with the following actions has been applied to allow creating and attaching target groups:
codedeploy:GetDeployment
elasticloadbalancing:DescribeTargetGroups
autoscaling:AttachLoadBalancers
autoscaling:AttachLoadBalancerTargetGroups
Could anyone help me sort out the issue? What am I missing?
In our case, we managed to fix it the hard way by contacting the AWS support team. Briefly, about our app: we run a Magento application behind an Application Load Balancer with auto scaling, and deployments are managed with AWS CodeDeploy using blue/green deployment.
We spent several days figuring out what was going on. Others suggested there might be an issue with IAM permissions, but we hadn't touched those for months and deployment had never had any issues.
The AWS rep replied that in our case there is a known issue/limitation in AWS CodeDeploy: it currently doesn't support blue/green deployments based on ASGs that use target tracking scaling policies, because it doesn't attach the green ASG to the original target group, which is a requirement when target tracking scaling policies are enabled on the Auto Scaling group.
We then realized we had made a minor change to our Auto Scaling group's dynamic scaling policies: we had switched from a "CPU utilization"-based metric to "Request count". Reverting to the CPU utilization-based metric solved the issue, and we could run the deployment successfully.
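For reference, reverting to a CPU-based target tracking policy can also be done with boto3; a sketch, with the group and policy names being hypothetical:

import boto3

autoscaling = boto3.client("autoscaling")

# Replace the request-count policy with a CPU-based target tracking policy
autoscaling.put_scaling_policy(
    AutoScalingGroupName="magento-asg",
    PolicyName="cpu-target-tracking",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "TargetValue": 60.0,
    },
)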
Hope this helps, as this error does not seem to be documented in the AWS docs.

Azure Machine Learning throws error "Invalid graph: You have invalid compute target(s) in node(s)" while running the pipeline

I am facing a strange issue with the Azure Machine Learning (Preview) interface.
I designed a training pipeline that ran on a certain compute cluster (a 2-node cluster with minimal configuration). However, it took a lot of time to execute, so I tried to create a new training cluster (an 8-node cluster with a higher configuration). During this process, I created and deleted some of the training clusters.
But strangely, since then, submitting the pipeline fails with the error "Invalid graph: You have invalid compute target(s) in node(s)".
Could you please advise on this situation?
Thanks,
Mitul
I bet this was pretty frustrating. A common debugging strategy I have is to delete compute targets and create new ones. Perhaps this was another "transient" error?
The issue has been fixed and the fix will be rolled out soon. Meanwhile, as a temporary workaround, you can refresh the page to make it work.
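If you hit the same error from the SDK rather than the Designer UI, the usual cause is a pipeline step still referencing the name of a deleted cluster. A minimal sketch (azureml-core SDK v1; cluster and script names are hypothetical) of re-pointing a step at a cluster that still exists:

from azureml.core import Workspace
from azureml.core.compute import ComputeTarget
from azureml.pipeline.steps import PythonScriptStep

ws = Workspace.from_config()
print(list(ws.compute_targets))  # names of clusters that still exist

# Re-point the step at a live cluster instead of the deleted one
compute = ComputeTarget(workspace=ws, name="new-8node-cluster")
step = PythonScriptStep(
    name="train",
    script_name="train.py",
    source_directory=".",
    compute_target=compute,
)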

Microsoft Translator custom model deploy failed

I've tried twice to deploy an already successfully trained model in the Microsoft Custom Translator dashboard (BLEU score: 38.31).
The deployment process always ends with a "Deployment Failed" status and no error message. The model is the first and only one in the project.
I have an S1 subscription plan. Has anyone encountered a similar issue?
Thanks
Max

How to display the logged information on an aerospike server as a graph?

I would like to log some stats on some Aerospike nodes and analyse them.
I found that Aerospike comes with a tool called asgraphite, which seems to be using a forked version of the …
The asgraphite integration guide mentions some commands which are supposed to, e.g., start logging. I can already run the following command on my node and see the expected output, so it looks like I am all set to start logging:

python /opt/aerospike/bin/asgraphite --help

(By the way, we are running the Community Edition, which, it seems, does not provide historical latency stats in the AMC dashboard.)
However, I don't see any information on how to monitor the data logged this way. I am expecting the kind of web interface usually provided by Graphite.
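On the visualization side, asgraphite only ships metrics to a Graphite server; the web UI comes from graphite-web, which runs separately. Once metrics are flowing, you can also pull them programmatically through Graphite's render API. A small sketch (host and metric path are hypothetical):

import requests

# Fetch the last 24h of a metric as JSON from graphite-web's render API
resp = requests.get(
    "http://graphite.example.com/render",
    params={"target": "aerospike.node1.reads", "from": "-24h", "format": "json"},
)
for series in resp.json():
    print(series["target"], series["datapoints"][:5])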
