I'm fairly new to using Microsoft's cognitive services. I'd like to know what is the difference between MS Computer Vision API and MS Custom Vision API?
They both deal with computer vision on images, but hopefully, I can help make them more distinguishable here. :)
Computer Vision
The Computer Vision API is where Microsoft has built their own image models that can give you a few things:
Image classification - The API gives you a number of tags that classify the image, each with a confidence score for how strongly the model believes the tag applies.
Content Moderation - The API can return isAdult and isRacy flags indicating whether the image meets those criteria, each with an accompanying confidence score.
OCR - The API can read text within images and return it. It also works with handwritten text, not just printed text such as signs.
Facial Recognition - This API will recognize the faces of celebrities or other well-known people within images.
Landmark Recognition - This will recognize landmarks within images.
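As a rough sketch of how the tagging and moderation features above might be requested over REST: the `/vision/v3.2/analyze` path, the `visualFeatures` parameter, and the `tags`/`confidence` response fields reflect my understanding of the service and should be checked against the current documentation. The function names, endpoint, and key are hypothetical placeholders, and the request is only built here, not sent.

```python
import json
import urllib.parse
import urllib.request


def build_analyze_request(endpoint, key, image_url, features=("Tags", "Adult")):
    """Build (but do not send) a POST request asking for the given visual features.

    The URL path and header name are assumptions; verify them for your resource.
    """
    query = urllib.parse.urlencode({"visualFeatures": ",".join(features)})
    url = f"{endpoint.rstrip('/')}/vision/v3.2/analyze?{query}"
    body = json.dumps({"url": image_url}).encode("utf-8")
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def confident_tags(result, threshold=0.8):
    """Return the names of tags whose confidence is at least `threshold`.

    `result` is the decoded JSON response, assumed to contain a "tags" list
    of {"name": ..., "confidence": ...} objects.
    """
    return [t["name"] for t in result.get("tags", []) if t["confidence"] >= threshold]
```

To actually call the service you would pass the request to `urllib.request.urlopen`, decode the JSON body, and hand it to `confident_tags`.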
Custom Vision
The Custom Vision service is a little different: you train a model on your own images, building on a prebuilt model that Microsoft provides. For one thing, it can only do image classification and object detection. Object detection tells you not only which tag applies to an image, but also where in the image the tagged object is. Currently, this part of the service is in preview, but I've seen good results with it so far.
Another difference is that the Custom Vision service allows you to upload your own images. For image classification, this means you upload your images and give each one a single tag or multiple tags. When you run an image through the model, it returns the tag(s) it thinks apply, along with each tag's confidence score. For object detection, the process is the same, except that you mark where in each image the object you want to detect is and give that region a tag.
Each time you upload and tag new images the model needs to be trained. From there you can evaluate how well your model performs, give it test images, or even use the REST URLs or SDKs to interact with it.
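A minimal sketch of working with the prediction REST URL after training and publishing a model. The URL layout and the `predictions`/`tagName`/`probability` field names are assumptions based on my reading of the prediction API, and the helper names are hypothetical. The Custom Vision portal shows the exact prediction URL and key for your project, so prefer those.

```python
def prediction_url(endpoint, project_id, published_name):
    """Assumed layout of the image-classification prediction URL (image passed by URL)."""
    return (
        f"{endpoint.rstrip('/')}/customvision/v3.0/Prediction/"
        f"{project_id}/classify/iterations/{published_name}/url"
    )


def top_predictions(response, min_probability=0.5):
    """Return (tag, probability) pairs at or above the cutoff, most likely first.

    `response` is the decoded JSON body, assumed to contain a "predictions" list.
    """
    kept = [
        (p["tagName"], p["probability"])
        for p in response.get("predictions", [])
        if p["probability"] >= min_probability
    ]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)
```

The request itself would be a POST to that URL with the prediction key in a header and a JSON body naming the image URL.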
To summarize, the biggest difference between the two is the Custom Vision service can only do image classification and object detection, as well as take in your own images to perform those against. The Computer Vision APIs can do a bit more, but you don't have any control over how the models are trained.
Hope that helps! If you have any questions, just let me know.
I was wondering if there is any documentation available for Google's AutoML Vision on training a model to recognize specific logos.
At this point I can only find documentation about object detection.
Google already has a feature to detect popular product logos within an image - Detect logos. However, it's managed by Google and won't cover every logo. It recognizes well-known, worldwide logos such as GENERAL ELECTRIC, Shell, and Nike.
There is also an IssueTracker Feature Request that could open up the possibility of customizing Vision API logo detection. You can star it to show that you would also like this feature.
To achieve what you want, recognizing lesser-known logos, you would need to train an AutoML model, as you mention.
The AutoML Vision API Tutorial has some general information about training; however, for your scenario I would suggest following the Building Image Detection with Google Cloud AutoML guide, where Image Detection was used (open it in a private browser window if you hit access limits).
It covers every step, from Gathering Images through Image Labelling, Model Training, and Model Evaluation to Outputs and Conclusions.
In this case you would need to use AutoML from scratch, so you would need to train it on all of the logos yourself.
I have been trying to use a custom keyword with the Speech Devices SDK, but have had problems when I deploy my own custom keyword to an Android phone (the standard ones are better, but still not as good as I need or would expect in a commercial application). The screenshots on the linked page imply that you can "Add training data to train keyword model", but that option doesn't appear when I use Speech Studio.
My suspicion is that the speech files automatically generated by Speech Studio are not good enough to train a model for users with accents (like myself).
We have not yet widely enabled KWS model adaptation.
The Custom Keyword generated from the portal aims to be sufficient for an initial trial; it is not currently at the level needed for a commercial application.
We are enabling the ability to upload data to adapt the model; this is being trialed with customers before a wider roll-out. The upload is on the Custom Keyword page, not the Custom Speech page.
Thank you for using the speech SDK!
Did you follow the instructions here:
https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/speech-devices-sdk-create-kws
And here (for how to prepare the data):
https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/how-to-custom-speech-test-data#upload-data
I would like to implement a web crawler that, given a photo of an object, can search for that object on a specific website.
I'm looking into Firebase ML Kit to do the job, more specifically the object detection section, but it is still unclear to me whether that would be the right tool.
Does anyone know how I could implement that?
Comparing photos (or searching for photos) is not a use-case that ML Kit offers a pre-built model for.
The closest I can think of is running the ML Kit label extraction model on both images, and then checking if both return the same label(s). But even then you need to have all the photos locally already.
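The label-comparison idea could be sketched like this, assuming you already have the label strings each image produced. The labeling call itself is not shown, and `label_overlap` is a hypothetical helper, not part of ML Kit. It scores two label sets by their Jaccard overlap (shared labels divided by all distinct labels).

```python
def label_overlap(labels_a, labels_b):
    """Jaccard similarity of two label collections, ignoring case and whitespace."""
    a = {label.strip().lower() for label in labels_a}
    b = {label.strip().lower() for label in labels_b}
    if not a and not b:
        # Two images with no labels give no evidence of a match.
        return 0.0
    return len(a & b) / len(a | b)


def best_match(query_labels, candidates):
    """Return the candidate key whose labels overlap most with the query, or None.

    `candidates` maps an identifier (e.g. an image URL) to its list of labels.
    """
    scored = [(label_overlap(query_labels, labels), key) for key, labels in candidates.items()]
    scored = [pair for pair in scored if pair[0] > 0]
    return max(scored)[1] if scored else None
```

Labels are coarse ("Shoe", "Footwear"), so this can narrow candidates down but is unlikely to pick out one specific product on its own.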
If you want something closer, you'll have to build a model yourself. But even then, you're likely going to need to build an extensive web service that searches web sites and pre-labels a lot of images.
Hi and thank you for looking into this.
(Disclaimer: I have little-to-no technical background and would like to find the least complex solution. Ideally, only "connecting" different out-of-the-box components and no coding.)
HIGH-LEVEL PROBLEM:
I have trained a model for text classification using Google AutoML. I want to make this model available on a website, i.e. I want to enable visitors to enter their text and receive the model's predicted class.
CONSIDERATIONS SO FAR: AutoML allows us to deploy the model via REST API and I understand that what I want are the API's PUT and GET function (right?). Ideally, I would use some form of plug-in or script to create an input field for the user which accepts the PUT and then delivers the GET.
Are you aware of any services for this? I'm also happy to host the website in a content management system like WordPress.
I'm very open regarding other approaches to solving my problem and highly appreciate any constructive input.
Many thanks!
AutoML Documentation https://cloud.google.com/natural-language/automl/docs/predict
EDIT Jan 10 There is another question related to this, and a repo was shared there that supposedly provides a solution. I'm not able to access the repo, but the question might help you to understand my issue: Is there a way to use Googles AutoML with JavaScript?
EDIT Jan 16 I have learned that in order to provide the input to the model the POST function could be used instead of the PUT.
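For illustration, a minimal sketch of what that single POST might look like, based on my reading of the linked prediction documentation. The URL layout, the `textSnippet` body, and the default `us-central1` location are assumptions to verify there; the function name and all IDs are hypothetical placeholders. The request is only built, not sent.

```python
import json
import urllib.request


def build_predict_request(project_id, model_id, text, access_token, location="us-central1"):
    """Build (but do not send) a POST request asking the model to classify `text`."""
    url = (
        f"https://automl.googleapis.com/v1/projects/{project_id}"
        f"/locations/{location}/models/{model_id}:predict"
    )
    body = {"payload": {"textSnippet": {"content": text, "mime_type": "text/plain"}}}
    headers = {
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(
        url, data=json.dumps(body).encode("utf-8"), headers=headers, method="POST"
    )
```

Because the access token must stay secret, a call like this would normally run in a small server-side script that the website's input form submits to, rather than in the visitor's browser.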
I am using the Google Charts API to generate a QR code, as described in this link: https://www.gregorystrike.com/2011/01/26/how-to-use-google-charts-api-to-generate-your-own-qr-code/
Because the parameters are visible, this seems insecure.
Users can change the parameter values using developer tools such as Firefox's element inspector. Is there a secure way to generate QR codes?
Thanks
QR codes are insecure anyway: anyone can see the data you place in one by using any reader.
If you want to use QR codes in any sort of secure sense, or to store sensitive data, you will have to manage encryption and decryption of that data yourself. There are a few encrypted examples, but it's not a heavily used function. Now that we are past the fad stage of QR codes, they are mainly used like barcodes for identification, with the ability to store more information if needed and the odd embedded URL.
Although the limit is quite large, there is still a limit to the amount of data you can place in a QR code.
Lots of information on the Wikipedia page https://en.wikipedia.org/wiki/QR_code
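On the tampering concern specifically, one option is to sign the payload rather than encrypt it. This is a different technique from the encryption mentioned above: the data stays readable, but your server can detect changed values. Below is a minimal sketch using HMAC-SHA256 from the Python standard library; the function names and payload format are hypothetical, and the secret must live only on the server.

```python
import base64
import hashlib
import hmac


def sign_payload(payload: str, secret: bytes) -> str:
    """Append a URL-safe HMAC-SHA256 tag to the payload, separated by a dot."""
    mac = hmac.new(secret, payload.encode("utf-8"), hashlib.sha256).digest()
    tag = base64.urlsafe_b64encode(mac).decode("ascii").rstrip("=")
    return f"{payload}.{tag}"


def verify_payload(signed: str, secret: bytes):
    """Return the original payload if the tag matches, otherwise None."""
    payload, sep, tag = signed.rpartition(".")
    if not sep:
        return None  # no tag present at all
    expected = sign_payload(payload, secret).rpartition(".")[2]
    # compare_digest avoids leaking information through comparison timing.
    return payload if hmac.compare_digest(tag, expected) else None
```

You would encode the signed string in the QR code and call `verify_payload` on whatever comes back from a scan. If the data must also stay hidden, it still needs to be encrypted as described above.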