I'm trying the handwriting recognition API in Microsoft Cognitive Services. Obviously, the results won't be 100% correct, so we need a confidence value/property for each word returned by the API to mark the results in different colors, for example black for high confidence and red for low confidence. Users could then see which words are right and which might be wrong. I couldn't find any information about confidence for this Azure service.
I need help!
[I work on this OCR API at Microsoft]
Sorry for the delayed response - this is a frequently requested feature and is on our product roadmap. Unfortunately, at this time, I cannot provide a specific timeline on when it would be enabled.
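In the meantime, for anyone wondering what the color-coding would look like once a per-word confidence is exposed, here is a minimal sketch. The confidence field and the 0.8 threshold are assumptions, not part of the current API response:

# Hypothetical sketch: color-code recognized words by confidence.
# The per-word "confidence" field does NOT exist in the current
# handwriting recognition response; it is assumed for illustration.
HIGH_CONFIDENCE = 0.8  # assumed threshold

def color_for_word(word):
    confidence = word.get("confidence")  # assumed future field
    return "black" if confidence is None or confidence >= HIGH_CONFIDENCE else "red"

def colorize(recognition_result):
    # recognition_result follows the API's lines[] -> words[] shape.
    return [
        {"text": word["text"], "color": color_for_word(word)}
        for line in recognition_result["lines"]
        for word in line["words"]
    ]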
Related
I'm trying to get body tracking to register on small action figures that are about 12" tall. I've tried other depth sensors like the ZED 2 and D435i, and their skeletal SDKs recognize the toys as "humanoid" and attempt to track the skeleton.
Is it possible to change the world scale or a filtering option so that the Azure Kinect or Kinect v2 doesn't ignore the toys?
I reached out to Microsoft and this was their response:
" AK Body Tracking has been tuned to process human’s from 7-8 years and up in age. The action doll is being filtered out as too small. They currently don’t expose the tuning parameters. They will considering exposing the tuning parameters but they have nothing to announce at this time. "
Unfortunately, it's a no-go at the moment.
I have a requirement to integrate batch transcription with LUIS: I will pass the transcriptions as-is to LUIS and get the intent of the audio.
As far as I know, we can pass the data to LUIS for intent analysis as a query, which accepts only 500 characters.
So here is the question: is it possible to pass the full transcription from the Speech-to-Text batch transcription API to LUIS for intent analysis, or do we have to feed the data to LUIS in chunks?
If we feed the data in chunks (500 characters each), how will we get the overall intent of the audio, since different utterances may lead to different top-level intents?
I have done a lot of research on this, reading the Microsoft documentation, but could not find any answer.
Please suggest the best possible way to achieve this scenario.
In my opinion, we can't get the intent of the audio accurately if we feed the data in chunks. I think we'd better limit the text to no more than 500 characters; if it is longer than that, just return an error message (or disallow input longer than 500 characters).
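For what it's worth, here is a rough sketch of the chunk-and-aggregate idea (majority vote over per-chunk top intents), assuming the LUIS v2 REST endpoint; the region, app ID, and key are placeholders. It also shows why the result is only approximate: each chunk votes independently.

import requests
from collections import Counter

# Placeholders: substitute your own region, app ID, and subscription key.
LUIS_URL = "https://westus.api.cognitive.microsoft.com/luis/v2.0/apps/<app-id>"
LUIS_KEY = "<subscription-key>"
MAX_QUERY_LEN = 500  # the LUIS query limit discussed above

def split_into_chunks(text, limit=MAX_QUERY_LEN):
    # Greedily pack whole words into chunks no longer than `limit`.
    chunks, current = [], ""
    for word in text.split():
        candidate = (current + " " + word).strip()
        if len(candidate) > limit and current:
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

def top_intent(query):
    response = requests.get(LUIS_URL, params={"subscription-key": LUIS_KEY, "q": query})
    response.raise_for_status()
    return response.json()["topScoringIntent"]["intent"]

def overall_intent(transcription):
    # Crude aggregation: majority vote across chunks. Different
    # utterances can still pull the vote in different directions.
    votes = Counter(top_intent(chunk) for chunk in split_into_chunks(transcription))
    return votes.most_common(1)[0][0]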
By the way, is it possible to get rid of unimportant words before sending the text to LUIS?
Here is the LUIS integration with the Speech service: https://learn.microsoft.com/en-us/azure/architecture/solution-ideas/articles/interactive-voice-response-bot#ai-and-nlp-azure-services
We do have a Telephony channel which is currently in private preview, and as such, comes with preview terms (no SLA, etc).
Here are the details about the preview: https://github.com/microsoft/BotFramework-IVR.
Say I have images and I want to generate labels for them in Spanish. Does the Google Cloud Vision API allow selecting which language to return the labels in?
Label Detection
The Google Cloud Vision API does not allow configuring the result language for label detection. You will need to use a different API, such as the Cloud Translation API, to perform that operation instead.
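As a sketch of that two-step approach, detecting labels (which come back in English) and then translating them, assuming the google-cloud-vision and google-cloud-translate Python clients:

from google.cloud import translate_v2 as translate
from google.cloud import vision

def labels_in_spanish(image_uri):
    # Step 1: detect labels; the Vision API returns them in English.
    vision_client = vision.ImageAnnotatorClient()
    image = vision.Image()
    image.source.image_uri = image_uri
    labels = [
        annotation.description
        for annotation in vision_client.label_detection(image=image).label_annotations
    ]
    # Step 2: translate the label text with the Cloud Translation API.
    translate_client = translate.Client()
    results = translate_client.translate(labels, target_language="es")
    return [result["translatedText"] for result in results]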
OCR (Text detection)
If you're interested in text detection in your image, Google Cloud Vision APIs support Optical Character Recognition (OCR) with automatic language detection in a broad set of languages listed here.
For TEXT_DETECTION and DOCUMENT_TEXT_DETECTION requests, you can provide the languageHints parameter in the request to get better results in certain cases where the language is unknown and/or not easily detectable (see the sketch after the field description below).
languageHints[] (string): List of languages to use for TEXT_DETECTION. In most cases, an empty value yields the best results since it enables automatic language detection. For languages based on the Latin alphabet, setting languageHints is not needed. In rare cases, when the language of the text in the image is known, setting a hint will help get better results (although it will be a significant hindrance if the hint is wrong). Text detection returns an error if one or more of the specified languages is not one of the supported languages.
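For example, a minimal sketch of passing a language hint through the Python client (with raw REST you would set imageContext.languageHints in the request body); the image URI is a placeholder:

from google.cloud import vision

client = vision.ImageAnnotatorClient()
image = vision.Image()
image.source.image_uri = "gs://my-bucket/sign.jpg"  # placeholder

# Hint that the text is Hindi; omit language_hints for auto-detection.
response = client.text_detection(
    image=image,
    image_context={"language_hints": ["hi"]},
)
if response.text_annotations:
    print(response.text_annotations[0].description)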
The DetectedLanguage information is available in the response to identify the language, along with a confidence value.
Detected language for a structural component.
JSON representation
{
  "languageCode": string,
  "confidence": number
}
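For instance, assuming a parsed JSON response from a REST images:annotate call, the detected languages and their confidences can be read like this:

# response_json is the parsed JSON body of an images:annotate response.
for page in response_json["fullTextAnnotation"]["pages"]:
    for language in page.get("property", {}).get("detectedLanguages", []):
        print(language["languageCode"], language.get("confidence"))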
Imagine a big place like a shopping mall, and that I have 360-degree pictures of several spots inside and outside of it. Is it possible, through Cognitive Services / Computer Vision, to tell whether a photo taken by users of my app matches any of these 360-degree pictures, so I can add a description saying what is in the photo?
Microsoft Cognitive Services - Computer Vision currently does not offer this type of functionality. Training or customization is not yet supported; this is a highly requested feature and is under review.
I am building an internal reporting tool that I want to update with Google's PageRank once per week.
The list of keywords would be predefined at this point.
Any ideas?
They do have an AdWords API that may get you closer to what you are looking to do.
API - http://code.google.com/apis/adwords/docs/
Specifically, the TrafficEstimatorService allows you to specify keyword parameters and estimate the traffic you could receive.
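For illustration, a rough sketch using the googleads Python client (note that the AdWords API has since been deprecated in favor of the Google Ads API); it assumes credentials in a local googleads.yaml, and the keyword and CPC values are placeholders:

from googleads import adwords

client = adwords.AdWordsClient.LoadFromStorage()  # reads googleads.yaml
traffic_estimator_service = client.GetService(
    'TrafficEstimatorService', version='v201809')

selector = {
    'campaignEstimateRequests': [{
        'adGroupEstimateRequests': [{
            'keywordEstimateRequests': [{
                # Placeholder keyword from your predefined list.
                'keyword': {'xsi_type': 'Keyword', 'matchType': 'EXACT', 'text': 'flowers'},
            }],
            'maxCpc': {'xsi_type': 'Money', 'microAmount': 1000000},  # $1.00
        }],
    }],
}

estimates = traffic_estimator_service.get(selector)
# Each keyword estimate carries min/max clicksPerDay, averageCpc, etc.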