I would like to store a set of images in my Google Cloud Storage bucket and compare an image against that set using the Vision API.
Is this possible?
The closest thing I could find in my research is creating a searchable image set https://cloud.google.com/solutions/image-search-app-with-cloud-vision but I can't see how I can leverage this to do what I want.
Ideal Scenario
I take an image on my device and send it in a JSON object to the Vision endpoint; that image is then compared against the image set in my bucket and a similarity score is returned for each image in my bucket.
Cloud Vision gives you a match percentage against a "label", not against a specific image.
There is no universal measure of similarity between two images; each similarity algorithm uses whatever formula its authors thought would work best for their own needs.
When I used Cloud Vision to find the most similar image in a set, the formula I ended up with was:
https://drive.google.com/file/d/0B3BLwu7Vb2U-SVhKYWVMR2JvOFk/view?usp=sharing
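The exact formula is in the Drive link above; purely as an illustration of the label-based approach (my own stand-in, not the linked formula), a confidence-weighted overlap of two images' label sets could be sketched like this in Python with the google-cloud-vision client, where the gs:// URIs are placeholders:

# Illustrative sketch only: the author's actual formula is in the linked file.
# Scores similarity as a confidence-weighted overlap of Vision label sets.
from google.cloud import vision

client = vision.ImageAnnotatorClient()

def labels_for(uri):
    # Label-detect an image stored in a GCS bucket.
    image = vision.Image(source=vision.ImageSource(image_uri=uri))
    response = client.label_detection(image=image)
    return {label.description: label.score for label in response.label_annotations}

def label_similarity(uri_a, uri_b):
    # Weighted Jaccard overlap: an assumption, not Cloud Vision's own metric
    # (the API has no direct image-vs-image similarity endpoint).
    a, b = labels_for(uri_a), labels_for(uri_b)
    keys = set(a) | set(b)
    shared = sum(min(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    total = sum(max(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    return shared / total if total else 0.0

# score = label_similarity("gs://my-bucket/query.jpg", "gs://my-bucket/ref-01.jpg")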
But when I need to match by visual similarity rather than by labels, I use my gem implementing the IDHash perceptual hashing algorithm: https://github.com/Nakilon/dhash-vips
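That gem is Ruby; purely to illustrate the underlying idea, a plain difference-hash (dHash, a simpler relative of the gem's IDHash) can be sketched in Python with Pillow, with placeholder file names:

# Minimal difference-hash (dHash) sketch using Pillow.
# Note: dhash-vips implements IDHash, a variant of this basic idea.
from PIL import Image

def dhash(path, size=8):
    # Hash an image by comparing each pixel to its right-hand neighbour.
    img = Image.open(path).convert("L").resize((size + 1, size), Image.LANCZOS)
    pixels = list(img.getdata())
    bits = 0
    for row in range(size):
        for col in range(size):
            left = pixels[row * (size + 1) + col]
            right = pixels[row * (size + 1) + col + 1]
            bits = (bits << 1) | (left > right)
    return bits

def hamming(a, b):
    # Lower distance = more visually similar.
    return bin(a ^ b).count("1")

# distance = hamming(dhash("query.jpg"), dhash("candidate.jpg"))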
I need to build an image classification model in Azure ML. It initially takes input from a phone: a check-in app that captures information like an ID and also a photo of the person (the ID is used to tag the image), which is sent to data storage. Once that's done, we upload any number of images of people to the data storage, and the model should classify the images by facial recognition and file them into separate folders per person (just like Google Photos). In short: if 100 unique people come to check in, and during the event we take random photos of those 100 people, then when we load that data to blob storage it should categorize the people separately.
Can I go with this approach?
1. Check-in app loads the image with a tag
2. Blob storage stores the image
3. Custom Vision as the ML classifier
4. Loading any number of images to blob storage
5. Comparing each image with the check-in image and grouping them into albums, just like Google Photos
6. Loading the albums into the app so attendees can see the images
Please guide me on the solution and the Azure services that need to be considered to make this possible.
Thanks in advance.
Within Azure you need to look into Cognitive Services, with more information located here: https://azure.microsoft.com/en-us/services/cognitive-services/
Azure Cognitive Services is largely surfaced as a series of API endpoints. In your example, you can post images from the mobile device to the Azure endpoint, train the service to recognize individuals, and have it return a JSON package of the people in the picture, place rectangles around those people, and so on. Other Cognitive Services cover images, speech, video, and more.
The Face API maps to your scenario well: https://azure.microsoft.com/en-us/services/cognitive-services/face/
https://azure.microsoft.com/en-us/services/cognitive-services/computer-vision/#overview
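For the check-in-then-categorize flow specifically, a rough sketch with the Face API Python SDK (azure-cognitiveservices-vision-face) might look like this; the endpoint, key, IDs, and file names are all placeholders, and in practice you would wait for training to complete before identifying:

# Sketch of the check-in / identify flow using the Face API Python SDK.
from azure.cognitiveservices.vision.face import FaceClient
from msrest.authentication import CognitiveServicesCredentials

face_client = FaceClient("https://<FACE_ENDPOINT>.cognitiveservices.azure.com/",
                         CognitiveServicesCredentials("<FACE_KEY>"))

GROUP_ID = "event-attendees"  # placeholder person group

# 1. Check-in: create a person per attendee, tagged with their ID,
#    and enroll the check-in photo as a face for that person.
face_client.person_group.create(person_group_id=GROUP_ID, name="Event attendees")
person = face_client.person_group_person.create(GROUP_ID, name="attendee-0042")
with open("checkin_0042.jpg", "rb") as f:
    face_client.person_group_person.add_face_from_stream(GROUP_ID, person.person_id, f)
face_client.person_group.train(GROUP_ID)  # wait for training before identifying

# 2. Event photos: detect faces, then identify them against the group
#    so each photo can be filed into the matching person's album.
with open("event_photo.jpg", "rb") as f:
    faces = face_client.face.detect_with_stream(f)
results = face_client.face.identify([face.face_id for face in faces],
                                    person_group_id=GROUP_ID)
for result in results:
    if result.candidates:
        print(result.face_id, "->", result.candidates[0].person_id)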
I want to detect a product within an image using Cloud Vision. If the product is too small relative to the image, the algorithm does not detect it. For example, if I use an image of just a product, it correctly labels it as a product, but when I use an image of a person holding that product, it returns plenty of (good) info about the person while failing to identify the object. Is there a way to force it?
You can use this image to test it using the Cloud Vision Web UI: https://img.bleacherreport.net/img/images/photos/003/758/947/hi-res-bc77cb085652783632d48c378e0a0ffb_crop_north.jpg?h=533&w=800&q=70&crop_x=center&crop_y=top
If I scan the entire image, it returns the label 'product' among other things. But if I crop just the Coca-Cola bottle and scan the cropped image, it provides far more detail, e.g. 'Coca-Cola', 'soft drink', etc. How can I get details about a product that occupies only a small portion of a larger image?
You can use Object Localization, which, as documented, can detect less prominent objects. I ran it on the image you provided and it returned 'bottle' for the cola. It also returns the boundingPoly vertices for each object, which, as you noted, you can use to crop the image and get a better detection.
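Roughly, with the Python client and Pillow (the file name is a placeholder):

# Sketch: localize objects, crop each one, and re-run label detection on
# the crop to get more specific labels. "soda.jpg" is a placeholder.
import io

from google.cloud import vision
from PIL import Image

client = vision.ImageAnnotatorClient()

with open("soda.jpg", "rb") as f:
    image = vision.Image(content=f.read())

objects = client.object_localization(image=image).localized_object_annotations

full = Image.open("soda.jpg")
width, height = full.size

for obj in objects:
    # boundingPoly vertices are normalized to [0, 1]; scale them to pixels.
    xs = [v.x * width for v in obj.bounding_poly.normalized_vertices]
    ys = [v.y * height for v in obj.bounding_poly.normalized_vertices]
    crop = full.crop((int(min(xs)), int(min(ys)), int(max(xs)), int(max(ys))))

    # Re-run label detection on just the cropped region.
    buf = io.BytesIO()
    crop.save(buf, format="JPEG")
    labels = client.label_detection(
        image=vision.Image(content=buf.getvalue())).label_annotations
    print(obj.name, [label.description for label in labels[:5]])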
You need to pass the PRODUCT_SEARCH feature with the request (it may otherwise default to TYPE_UNSPECIFIED), so that it knows it should detect products rather than people or other prominent objects in the view.
See Searching for Products and Managing Products and Reference Images, which explain that you have to upload reference images of your products to use this feature; the model has to learn those products first.
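As a rough sketch with the Python client (the project, location, and product-set names are placeholders, and the product set must already contain indexed reference images):

# Sketch: a PRODUCT_SEARCH request against an existing, indexed product set.
from google.cloud import vision

client = vision.ImageAnnotatorClient()
product_search_client = vision.ProductSearchClient()

# Placeholder project/location/product-set names.
product_set_path = product_search_client.product_set_path(
    project="my-project", location="us-west1", product_set="my-products")

with open("query.jpg", "rb") as f:
    image = vision.Image(content=f.read())

response = client.annotate_image({
    "image": image,
    "features": [{"type_": vision.Feature.Type.PRODUCT_SEARCH}],
    "image_context": {
        "product_search_params": {
            "product_set": product_set_path,
            "product_categories": ["packagedgoods-v1"],
        }
    },
})

for result in response.product_search_results.results:
    print(result.product.display_name, result.score)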
Say I have images and I want to generate labels for them in Spanish. Does the Google Cloud Vision API allow selecting which language to return the labels in?
Label Detection
Google Cloud Vision APIs do not allow configuring the result language for label detection. You will need to use a different API, such as the Cloud Translation API, to perform that step instead.
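A minimal sketch of that combination with the Python clients (the file name and target language are placeholders):

# Sketch: detect labels in English, then translate them to Spanish
# with the Cloud Translation API.
from google.cloud import vision
from google.cloud import translate_v2 as translate

vision_client = vision.ImageAnnotatorClient()
translate_client = translate.Client()

with open("photo.jpg", "rb") as f:
    image = vision.Image(content=f.read())

labels = vision_client.label_detection(image=image).label_annotations
for label in labels:
    result = translate_client.translate(label.description, target_language="es")
    print(label.description, "->", result["translatedText"])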
OCR (Text detection)
If you're interested in text detection in your image, Google Cloud Vision APIs support Optical Character Recognition (OCR) with automatic language detection in a broad set of languages listed here.
For TEXT_DETECTION and DOCUMENT_TEXT_DETECTION requests, you can provide languageHints parameter in the request to get better results for certain cases where the language is unknown and/or not easily detectable.
languageHints[] (string): List of languages to use for TEXT_DETECTION. In most cases, an empty value yields the best results since it enables automatic language detection. For languages based on the Latin alphabet, setting languageHints is not needed. In rare cases, when the language of the text in the image is known, setting a hint will help get better results (although it will be a significant hindrance if the hint is wrong). Text detection returns an error if one or more of the specified languages is not one of the supported languages.
The DetectedLanguage information is available in the response to identify the language along with a confidence value.
Detected language for a structural component.
JSON representation
{
  "languageCode": string,
  "confidence": number
}
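With the Python client, those fields can be read from an OCR response roughly like this (the file name is a placeholder):

# Sketch: read DetectedLanguage entries from an OCR response.
from google.cloud import vision

client = vision.ImageAnnotatorClient()
with open("scan.jpg", "rb") as f:
    image = vision.Image(content=f.read())

response = client.document_text_detection(image=image)
for page in response.full_text_annotation.pages:
    for lang in page.property.detected_languages:
        print(lang.language_code, lang.confidence)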
I am trying to use Google Cloud Vision with TEXT_DETECTION to do OCR on a seven-segment display, but I am getting pretty lousy results, mostly because it seems to think the text is in a different language. The typical locale it associates with it is "zh" or "ja".
Is there a specific hint that I can give Cloud Vision which might produce better results?
For example, this image below --
produces this output --
"locale" : "ja",
...
...
"description" : "ココ\n"
I have also tried to preprocess the image by increasing contrast, gaussian blur and even erode it to fill in the spaces between the segments, but without much luck.
Any help/pointers would be appreciated.
Try adding this inside the imageContext of your JSON request:
"languageHints": ["en"]
Is there any way to get the approximate length and width of a building, given its address, from an API such as Google Maps or similar? I basically want to use it to find the lat/long coordinates of the approximate boundaries of any building or area whose address is inputted. Free APIs or services would be preferred.
I don't think so; that would require enormous effort to catalog most of the map into "is a building / is not a building" (I'd guess the military might have that, but they're unlikely to share it, if they even have it). What you could do is this (a rough sketch follows the list):
- geocode address to lat/long
- grab satellite image of the surroundings
- try to detect shape in that
- estimate physical size from zoom+pixel size
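Roughly, that pipeline could look like this in Python (the API key is a placeholder, the ground-resolution formula is the standard Web Mercator one for 256px tiles, and the shape-detection step is left out):

# Rough sketch: geocode an address, fetch a satellite tile around it,
# and compute meters-per-pixel from the zoom level. API_KEY is a placeholder.
import math
import requests

API_KEY = "YOUR_KEY"

def geocode(address):
    # Step 1: address -> lat/long via the Geocoding API.
    r = requests.get("https://maps.googleapis.com/maps/api/geocode/json",
                     params={"address": address, "key": API_KEY}).json()
    loc = r["results"][0]["geometry"]["location"]
    return loc["lat"], loc["lng"]

def satellite_image(lat, lng, zoom=20, size="640x640"):
    # Step 2: grab a satellite image of the surroundings (Static Maps API).
    r = requests.get("https://maps.googleapis.com/maps/api/staticmap",
                     params={"center": f"{lat},{lng}", "zoom": zoom,
                             "size": size, "maptype": "satellite",
                             "key": API_KEY})
    return r.content  # image bytes; shape detection would run on these

def meters_per_pixel(lat, zoom):
    # Step 4: standard Web Mercator ground resolution for 256px tiles.
    return 156543.03392 * math.cos(math.radians(lat)) / (2 ** zoom)

lat, lng = geocode("1600 Amphitheatre Parkway, Mountain View, CA")
print(meters_per_pixel(lat, 20), "m/px at zoom 20")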
I see a few problems there:
- not sure if GMaps allows image scraping
- the building may not be distinguishable from the background
- the address might be geocoded off the building, or between two buildings
- the address might be shared among multiple buildings