Use case for LUIS (Microsoft Cognitive Service)

We want to use LUIS to get the entities and intent from a user question and identify the entities that belong to our domain. To do this, we are training LUIS with many entities that come from our domain context. Is this a valid and "correct" use of LUIS?
Thanks

Yes, you can get the intents and entities from the user question with LUIS. You have to provide training examples accordingly. LUIS has several features for this: you can label entities that follow a specific pattern using the Patterns feature (pattern.any), and you can provide phrase lists for synonyms. Use them based on your scenario. Hope that helps!

I'm creating a search engine to search medical documents containing very specific terms. For this I'm training LUIS with these kinds of words or tags as "entities".
Yes, you are right. The medical terms you are referring to are supposed to be entities.
But this approach implies loading a large bulk of terms into LUIS.
If the only difference is the term itself, i.e. if your utterances are like
search for a
search for b
then you can add a and b as a phrase list in LUIS; that way you don't have to repeat the utterance for each term. You can check out how to add a phrase list. If you look at the 3rd point there, you can see that for the name City many city values are entered. You can do the same with the medical terms you need to search.
In this way you can get the medical terms at your server side by inspecting the entity value.
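As an illustration of that last step, here is a minimal Python sketch of inspecting entity values on the server side. The payload mimics the shape of a LUIS v2 prediction response; the MedicalTerm entity type and the values are made up for the example.

```python
# Illustrative payload following the shape of a LUIS v2 prediction
# response; the MedicalTerm entity type and values are made up.
luis_response = {
    "query": "search for ibuprofen",
    "topScoringIntent": {"intent": "Search", "score": 0.97},
    "entities": [
        {"entity": "ibuprofen", "type": "MedicalTerm",
         "startIndex": 11, "endIndex": 19, "score": 0.92},
    ],
}

def extract_terms(response, entity_type):
    """Collect the surface values of all entities of a given type."""
    return [e["entity"] for e in response.get("entities", [])
            if e["type"] == entity_type]

print(extract_terms(luis_response, "MedicalTerm"))  # ['ibuprofen']
```

Your server-side code would then feed the extracted terms into the document search.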

Related

In Watson conversation, How to turn on fuzzy matching only for synonyms and not for the value?

I am modelling a conversation in Watson conversation. The conversation is around the facilities available at airports. I have configured airport names as an entity and have added variations of airport names as synonyms and have kept the IATA code for the airport as the entity value. For example, Schiphol airport in Amsterdam looks like the below
I have turned on fuzzy matching on this entity so that I can catch typos people will make when they try to say Schiphol or Amsterdam. However, Watson is now capturing the word am in the below sentence as this entity
I am wondering if there is wifi in schiphol airport?
How do I stop Watson from fuzzy matching on the entity value but only do it on synonyms?
I don't think it's possible to enable Fuzzy matching on synonyms only, at the moment. I see a couple of "easy" workarounds.
Option 1: Leverage Watson's confidence level.
If you place <? entities ?> within your node's response and test it in the Try it out panel, you'll notice that each detected entity has a confidence level associated with it. This is Watson's confidence level in the entity, expressed as a float ranging from 0 to 1.
When testing it with your entity value and synonyms, I got 90% confidence (i.e., 0.9) for amsterdaam but only 70% for am.
So assuming these numbers hold for you, you could use entities[0].confidence > 0.7 as the condition in your node to decide when to assign the airport to a context variable and when to ask for clarification from the user in the response.
You might have to do some testing to see if you can find a confidence level value that works reasonably well as a threshold for your #airport entity values and their common misspellings.
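If you do that filtering in your own application code rather than in dialog node conditions, it could look like the following Python sketch. The entities list is a hand-built stand-in for what the service returns, and the 0.7 and 0.9 confidences are the numbers quoted above.

```python
# Hypothetical detections for "I am wondering if there is wifi in
# schiphol airport?" -- structure mirrors the service's entities array.
entities = [
    {"entity": "airport", "value": "AMS", "confidence": 0.7,  # fuzzy hit on "am"
     "location": [2, 4]},
    {"entity": "airport", "value": "AMS", "confidence": 0.9,  # "schiphol"
     "location": [33, 41]},
]

THRESHOLD = 0.7  # tune against your own entity values and common typos

def confident_entities(entities, threshold=THRESHOLD):
    """Keep only detections whose confidence exceeds the threshold."""
    return [e for e in entities if e["confidence"] > threshold]

print(confident_entities(entities))  # only the 0.9 detection survives
```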
Option 2: Use two entities
#airport-code: Use the airport code for the value, no synonyms, and no fuzzy matching enabled.
#airport-name: Use the airport name for the value, various synonyms including city, and fuzzy matching enabled.
Depending on how your chatbot works, this might be an acceptable compromise, or it might complicate your logic too much.
I'll give you an example. If your $airport context variable will work whether AMS or Amsterdam Airport Schiphol is stored, this solution solves the problem for you as is. If not, there is an extra step before you can assign the value to the $airport context variable. Namely, you may need to implement a lookup to retrieve the airport code for the given airport name.
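That lookup step can be as simple as a table mapping recognized airport names to IATA codes, sketched below in Python. The table entries are illustrative.

```python
# Illustrative lookup table from recognized airport names (and common
# variants) to IATA codes, for normalizing before assigning $airport.
AIRPORT_CODES = {
    "amsterdam airport schiphol": "AMS",
    "schiphol": "AMS",
    "amsterdam": "AMS",
}

def to_iata(name):
    """Normalize a detected airport name to its IATA code, or None."""
    return AIRPORT_CODES.get(name.strip().lower())

print(to_iata("Amsterdam Airport Schiphol"))  # AMS
```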

Extract the address portion of a callers response

In a conversation turn the caller is asked for the address of their destination. A few response examples:
I'm heading to 123 Lombard Street.
I'll be at 2210 third Ave.
I should be arriving to 44 Cross Terrace about 3:00 this afternoon.
Is it possible to isolate and extract the address portion of the user's response:
123 Lombard Street
2210 third Ave.
44 Cross Terrace
I'm looking for advice and best practices on whether this extraction can be accomplished using intents and entities to locate the numeric portion and the (Street, Ave, Terrace) portions, wild-carding what's in between (Lombard, third, Cross), or whether application code will be required to locate and extract the address portion.
If any additional information is required I'll be happy to provide on request.
A strategy could be to use the system entity #sys-number to point to the beginning of your referenced address snippets. The metadata for each found entity holds the location information (begin / end in the input string). From there you would need to search the input string for anything not in your specific "address vocabulary".
You could add your address vocabulary (street, road, terrace, avenue, plus synonyms) as entities. The range from the smallest entity position to the highest could be extracted and then used to normalize the address.
The extraction and some processing can be done inside the conversation service, but you likely need to have outside logic to normalize the found address snippets to what you need.
This blog entry about tips & tricks for building chatbots has some useful stuff and links to a repository with some detailed examples of processing entities and variables.
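The span idea above can be sketched in a few lines of Python. The detections list is hand-built to mirror the location metadata (begin/end character offsets) that the service attaches to each entity; in a real application these offsets would come from the conversation response.

```python
text = "I should be arriving to 44 Cross Terrace about 3:00 this afternoon."

# Hypothetical detections: a #sys-number at "44" and an address-vocabulary
# entity at "Terrace"; the "3:00" number is excluded by application logic.
detections = [
    {"entity": "sys-number", "location": [24, 26]},   # "44"
    {"entity": "street-type", "location": [33, 40]},  # "Terrace"
]

# Slice from the smallest begin to the largest end across all detections.
begin = min(d["location"][0] for d in detections)
end = max(d["location"][1] for d in detections)
print(text[begin:end])  # 44 Cross Terrace
```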

LUIS Intent identification conflict

I am trying to implement a hierarchical chat bot leveraging LUIS to identify primary and secondary intents.
As part of this I created and trained numerous LUIS models.
However, the behavior of LUIS is weird and unpredictable in various instances.
For instance, got a LUIS model named Leave trained with following utterances.
Utterance                               Intent
Am I eligible for leave of adoption?    Leave Query
What is my leave balance?               Leave Query
What is sick leave?                     Leave Query
Who approves my sick leave?             Leave Approval
After training with these utterances, queries in the leave context work as expected.
However, when the following messages are validated against the Leave model with the expectation of receiving the "None" intent, LUIS returns intents other than "None", which does not make any sense.
Query                       Expected Intent    Actual Intent
Am I eligible for loan?     None               Leave Query
What is my loan balance?    None               Leave Query
Who approves my loan?       None               Leave Query
The issue here is that "Am I eligible for loan" doesn't belong to this LUIS model at all, so I am expecting a "None" intent.
The idea is to receive the None intent when an utterance doesn't belong to the queried LUIS model, so that I can check the other models for a valid intent.
However, I always get some intent instead of "None".
Not sure if I am doing something wrong here.
Any help/guidance on this would be much appreciated.
I agree with what Steven has suggested above:
Training the None intent is good practice.
Defining entities will help.
If you want to categorize your intents by domain, e.g. Leave in the present case, I would suggest creating a List entity with the value leave. That way, anything containing the word leave goes to the Leave Query intent.
anything about [leave]
Current version results:
  Top scoring intent: Leave Query (1)
  Other intents: None (0.28)

And the rest of the sentences, without "leave":

anything about loan
Current version results:
  Top scoring intent: None (0.89)
  Other intents: Leave Query (0)
The constraint here is that this makes the model more definitive: the score for Leave Query will be either 1 or 0.
Whether you want a definitive or a predictive approach depends on your use case. For machine-to-machine communication you might take the definitive approach, but for something like a chatbot you might prefer the predictive one.
Nonetheless, this is a nice little trick which might help you.
Hope this helps
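The cascading idea from the question, falling through to the next model whenever the top intent is "None" or scores too low, can be sketched like this in Python. Here query_model is a placeholder for the HTTP call to each model's endpoint, and the canned responses are made up for the example.

```python
# Canned stand-ins for each model's prediction response; replace
# query_model with a real call to each LUIS model's endpoint.
CANNED = {
    "Leave": {"topScoringIntent": {"intent": "None", "score": 0.89}},
    "Loan":  {"topScoringIntent": {"intent": "Loan Query", "score": 0.95}},
}

def query_model(model, utterance):
    return CANNED[model]

def route(utterance, models, threshold=0.5):
    """Return the first model whose top intent is confident and not None."""
    for model in models:
        top = query_model(model, utterance)["topScoringIntent"]
        if top["intent"] != "None" and top["score"] >= threshold:
            return model, top["intent"]
    return None, "None"

print(route("Am I eligible for loan?", ["Leave", "Loan"]))
# ('Loan', 'Loan Query')
```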
How well trained is your model, and how many utterances are registered? Just to check: after you received the utterances "Am I eligible for loan?" and "Who approves my loan", did you go into the LUIS portal and train the model so that they do not match the Leave intents?
Please note that until any language understanding model is thoroughly trained, they are going to be prone to errors.
When looking at your utterances I noticed that they're all very similar:
"Am I eligible for leave of adoption?" vs "Am I eligible for loan?"
"What is my leave balance?" vs "What is my loan balance?"
"Who approves my sick leave?" vs "Who approves my loan"
These utterances have minimal differences. They're very general questions and you haven't indicated that any entities are currently being used. While the lack of entities for these questions is understandable with your simple examples, entities definitely help LUIS in understanding which intent to match against.
To resolve this problem you'll need to train your model more and should add entities. Some additional utterances you might use are "What's my leave balance?", "Check my leave balance", "Tell me my leave balance.", "Check leave balances", et cetera.

University data aggregation

I have a client who wants to build a web application targeted towards college students. They want the students to be able to pick which class they're in from a valid list of classes and teachers. Websites like koofers, schedulizer, and noteswap all have accurate lists from many universities which are accurate year by year.
How do these companies aggregate this data? Do these universities have some api for this specific purpose? Or, do these companies pay students from these universities to input this data every year?
We've done some of this for a client, and in each case we had to scrape the data. If you can get an API definitely use it, but my guess is that the vast majority will need to be scraped.
I would guess that these companies have some kind of agreements and use an API for data exchange. If you don't have access to that API though you can still build a simple webscraper that extracts that data for you.
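A scraper of that kind can be built with nothing but the standard library, as in the sketch below. The HTML is a made-up stand-in for a course-catalog page; in practice you would fetch the real page (and check the site's terms of use first).

```python
from html.parser import HTMLParser

# Made-up stand-in for a university course-catalog page.
SAMPLE = """
<ul class="courses">
  <li class="course">CS 101 - Intro to Programming - Prof. Smith</li>
  <li class="course">MATH 220 - Linear Algebra - Prof. Jones</li>
</ul>
"""

class CourseParser(HTMLParser):
    """Collects the text of every <li class="course"> element."""

    def __init__(self):
        super().__init__()
        self.in_course = False
        self.courses = []

    def handle_starttag(self, tag, attrs):
        if tag == "li" and ("class", "course") in attrs:
            self.in_course = True

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_course = False

    def handle_data(self, data):
        if self.in_course and data.strip():
            self.courses.append(data.strip())

parser = CourseParser()
parser.feed(SAMPLE)
print(parser.courses)
```

For real pages you would combine this with urllib.request (or a proper scraping library) and normalize the extracted class and teacher names into your own schema.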

Travel APIs how to integrate them all?

I may start working on a project very similar to Hipmunk.com, which pulls hotel cost information by calling different APIs (Expedia, Orbitz, Travelocity, Hotels.com, etc.).
I did some research on this, but I am not able to find any unique hotel ID or any field to match hotels across the several APIs. Does anyone have experience with how to compare a hotel from Expedia with one from Orbitz or Travelocity?
Thanks
EDIT: Google is also doing the same thing: http://www.google.com/hotelfinder/
From what I have seen of GDS systems and these APIs, there is rarely a unique identifier between systems for things like hotels.
Airports, airlines and countries have unique ISO identifiers: http://www.iso-code.com/airports.2.html
I would guess you are going to have to build your own internal mapping to identify and disambiguate the properties.
:|
When you get started with hotel APIs, the choice of free ones isn't really that big, see e.g. here for an overview.
The most extensive and accessible one is Expedia's EAN http://developer.ean.com/ which includes Sabre and Venere with unique IDs but still each structured differently.
That is, you are looking into different database tables.
You do get several identifiers such as name, address, and coordinates, which can serve for unique identification, assuming they are free of errors. Which is an assumption.
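One way to build such an internal mapping is to treat two records as the same property when their names are similar after token sorting and their coordinates are close. A Python sketch, with illustrative records and thresholds:

```python
import math
from difflib import SequenceMatcher

def distance_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (lat1, lon1, lat2, lon2))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 6371 * 2 * math.asin(math.sqrt(a))

def token_sort(name):
    """Lowercase and sort tokens so 'Hotel Okura' matches 'Okura Hotel'."""
    return " ".join(sorted(name.lower().split()))

def same_hotel(a, b, name_threshold=0.8, max_km=0.3):
    """Heuristic match: similar (token-sorted) names and nearby coordinates."""
    sim = SequenceMatcher(None, token_sort(a["name"]), token_sort(b["name"])).ratio()
    return sim >= name_threshold and distance_km(
        a["lat"], a["lon"], b["lat"], b["lon"]) <= max_km

# Illustrative records, as two APIs might return them for the same property.
expedia = {"name": "Hotel Okura Amsterdam", "lat": 52.3499, "lon": 4.8935}
orbitz  = {"name": "Okura Hotel Amsterdam", "lat": 52.3500, "lon": 4.8936}
print(same_hotel(expedia, orbitz))  # True
```

The thresholds would need tuning on real data, and ambiguous matches would still need manual review, but this kind of heuristic can bootstrap the internal mapping table.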
