How to handle shared entities (e.g. size) - watson-conversation

Trying to build a PoC that allows a user to ask something such as "I want a small pizza with a drink". Since the drink has no size, I would then prompt for the desired drink size.
Going with the restaurant example, should sizes (small, medium, large) be individual entities (e.g. @small, @medium, @large)? Does it matter that these entities would be used for both pizza size and drink size, or should we have separate @pizza_small, @pizza_medium, @drink_small, @drink_medium, etc. entities? With this, I may need a @drink_no_size entity so I would know that I need to prompt for a size.
Thanks!

One solution to this problem is to create the following entities (in Watson Conversation, entities are referenced with @; # is used for intents): @size for general sizes, @pizza_size for pizza sizes and @drink_size for drink sizes. You can then use these to disambiguate which sizes were specified in the user input. If only a general size is specified, you'll get the @size entity back; if an explicit pizza or drink size is specified, you'll also get that particular entity.
The "no size" case is indicated by no size entity being detected in the user input.
Here is an example entity definition in CSV format (entity, value, then synonyms):
size,small,tiny,little
size,medium,normal,standard
size,large,big,biggest,gigantic
pizza_size,psmall,small pizza,tiny pizza,little pizza
pizza_size,pmedium,medium pizza,normal pizza,standard pizza
pizza_size,plarge,large pizza,big pizza,biggest pizza,gigantic pizza
drink_size,dsmall,small drink,tiny drink,little drink,short drink
drink_size,dmedium,medium drink,normal drink,standard drink
drink_size,dlarge,large drink,big drink,biggest drink,tall drink,gigantic drink
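To illustrate the disambiguation idea, here is a rough Python sketch that loads the CSV above and decides whether to prompt for a drink size. It is only an approximation: the whole-word matcher stands in for the service's own entity detection, and checking for the word "drink" is a hypothetical placeholder for a real drink entity in your workspace.

```python
import csv
import io
import re

# Entity definitions from the CSV above (entity, value, synonyms...).
ENTITY_CSV = """\
size,small,tiny,little
size,medium,normal,standard
size,large,big,biggest,gigantic
pizza_size,psmall,small pizza,tiny pizza,little pizza
pizza_size,pmedium,medium pizza,normal pizza,standard pizza
pizza_size,plarge,large pizza,big pizza,biggest pizza,gigantic pizza
drink_size,dsmall,small drink,tiny drink,little drink,short drink
drink_size,dmedium,medium drink,normal drink,standard drink
drink_size,dlarge,large drink,big drink,biggest drink,tall drink,gigantic drink
"""


def load_synonyms(text):
    """Map every surface form (value or synonym) to its (entity, value) pair."""
    forms = {}
    for entity, value, *synonyms in csv.reader(io.StringIO(text)):
        for form in (value, *synonyms):
            forms[form.lower()] = (entity, value)
    return forms


FORMS = load_synonyms(ENTITY_CSV)


def detect_entities(utterance):
    """Naive whole-word matcher; a stand-in for the service's entity detection."""
    text = utterance.lower()
    return {pair for form, pair in FORMS.items()
            if re.search(r"\b" + re.escape(form) + r"\b", text)}


def needs_drink_size(utterance):
    # Hypothetical rule: the word "drink" stands in for a real drink entity.
    mentions_drink = re.search(r"\bdrink\b", utterance.lower()) is not None
    has_drink_size = any(entity == "drink_size"
                         for entity, _ in detect_entities(utterance))
    return mentions_drink and not has_drink_size


print(detect_entities("I want a small pizza with a drink"))
print(needs_drink_size("I want a small pizza with a drink"))  # True
```

For "I want a small pizza with a drink" both the general size and the pizza size are detected, but no drink size, so the dialog would ask for one.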

Related

How to generate recommendations for a User using Gremlin?

I am using Gremlin on an AWS Neptune database to generate recommendations for a user to try new food items. The problem I am facing is that the recommendations need to be in the same cuisine that the user likes.
We are given three different types of nodes: "User", "the cuisine he likes" and "the category of the cuisine" that it lies in.
In the picture above, the recommendations for "User 2" would be "Node 1" and "Node 2". However, "Node 1" belongs to a different category, which is why we cannot recommend that node to "User2". We can only recommend "Node 2" to the user, since that is the only node that belongs to the same category the user likes. How do I write a Gremlin query to achieve this?
Note: there are multiple nodes for a user and multiple categories that these nodes belong to.
Here's a sample dataset that we can use:
g.addV('user').property('name','ben').as('b')
.addV('user').property('name','sally').as('s')
.addV('food').property('foodname','chicken marsala').as('fvm')
.addV('food').property('foodname','shrimp diavolo').as('fsd')
.addV('food').property('foodname','kung pao chicken').as('fkpc')
.addV('food').property('foodname','mongolian beef').as('fmb')
.addV('cuisine').property('type','italian').as('ci')
.addV('cuisine').property('type','chinese').as('cc')
.addE('hasCuisine').from('fvm').to('ci')
.addE('hasCuisine').from('fsd').to('ci')
.addE('hasCuisine').from('fkpc').to('cc')
.addE('hasCuisine').from('fmb').to('cc')
.addE('eats').from('b').to('fvm')
.addE('eats').from('b').to('fsd')
.addE('eats').from('b').to('fkpc')
.addE('eats').from('b').to('fmb')
.addE('eats').from('s').to('fmb')
Let's start with the user Sally...
g.V().has('name','sally').
Then we want to find all food item nodes that Sally likes.
(Note: It is best to add edge labels to your edges here to help with navigation.)
Let's call the edge from a user to a food item "eats", and assume that it points from the user to the food item (edges always have a direction). So let's traverse to all foods that Sally likes, saving them in a temporary collection called 'liked' that we'll use later in the query to filter out the foods Sally already likes.
.out('eats').aggregate('liked').
From this point in the graph, we need to diverge and fetch two downstream pieces of data. First, we want to go fetch the cuisines related to food items that Sally likes. We want to "hold our place" in the graph while we go fetch these items, so we use the sideEffect() step which allows us to go do something but come back to where we currently are in the graph to continue our traversal.
sideEffect(
out('hasCuisine').
dedup().
aggregate('cuisineschosen')).
Inside the sideEffect() we traverse from food items to cuisines, deduplicate the related cuisines, and save them in a temporary collection called 'cuisineschosen'.
Once we fetch the cuisines, we come back to where we were previously, at the food items. We now want to find the users related to Sally through common food items. We also want to make sure we're not traversing back to Sally, so we use simplePath(), which filters out any traverser whose path revisits a vertex (here, the path Sally -> food -> Sally).
in('eats').
simplePath().
From here we want to find all food items that our related users like and only return the ones with a cuisine that Sally already likes. We also remove the foods that Sally already likes.
out('eats').
where(without('liked')).
where(
out('hasCuisine').
where(
within('cuisineschosen'))).
values('foodname')
NOTE: You may also want to add a dedup() after the final out('eats') so that only a distinct list of food items is returned.
Putting it altogether...
g.V().has('name','sally').
out('eats').aggregate('liked').
sideEffect(
out('hasCuisine').
dedup().
aggregate('cuisineschosen')).
in('eats').
simplePath().
out('eats').
where(without('liked')).
where(
out('hasCuisine').
where(
within('cuisineschosen'))).
values('foodname')
Results:
['kung pao chicken']
At scale, you may need to use the sample() or coin() steps in Gremlin when finding related users, as that step can fan out very quickly. Query performance depends on how many objects each query needs to traverse.
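To sanity-check the traversal's logic without a Neptune instance, here is a rough Python sketch that replays the same steps over the sample data held in plain dictionaries. It mirrors only the filtering logic (it is not a Gremlin client), and the dictionary layout is an assumption made for illustration.

```python
# Sample data from the g.addV()/addE() script above, as plain dictionaries.
EATS = {
    "ben": ["chicken marsala", "shrimp diavolo", "kung pao chicken", "mongolian beef"],
    "sally": ["mongolian beef"],
}
HAS_CUISINE = {
    "chicken marsala": "italian",
    "shrimp diavolo": "italian",
    "kung pao chicken": "chinese",
    "mongolian beef": "chinese",
}


def recommend(user):
    liked = EATS[user]                                 # out('eats').aggregate('liked')
    cuisines_chosen = {HAS_CUISINE[f] for f in liked}  # sideEffect(out('hasCuisine').dedup().aggregate(...))
    results = []
    for food in liked:
        for other, foods in EATS.items():
            # in('eats') + simplePath(): other eaters of this food, never the user again
            if other == user or food not in foods:
                continue
            for candidate in foods:                    # out('eats')
                if candidate in liked:                 # where(without('liked'))
                    continue
                if HAS_CUISINE[candidate] in cuisines_chosen:  # where(out('hasCuisine').where(within(...)))
                    results.append(candidate)          # values('foodname')
    return results  # like the traversal, no dedup(): repeats are possible


print(recommend("sally"))  # ['kung pao chicken']
```

The result matches the traversal's output for Sally; running it for Ben returns an empty list, since the only other user (Sally) eats nothing Ben hasn't already eaten.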

How to query Gremlin when multiple connections between nodes are present

I'm trying to build a suggestion engine using Gremlin but I'm having a hard time trying to understand how to create a query when multiple nodes are connected by different intermediate nodes.
Playground:
https://gremlify.com/alxrvpfnlo9/2
Graph:
In this simple example I have two users; both like cheese and bread. But User2 also likes sandwiches, which seems a good suggestion for User1, as he shares some common interests with User2.
The question I'm trying to answer is: "What can I suggest to User1 based on what other users like?"
The answer should be: everything liked by other users who like the same things as User1, excluding what User1 already likes. In this case it should return a sandwich.
So far I have this query:
g.V(2448600).as('user1')
.out().as('user1Likes')
.in().where(neq('user1')) // to get to User2
.out().where(neq('user1Likes')) // to get to what User2 likes but excluding items that User1 likes
Which returns:
Sandwich, bread, Sandwich (again), cheese
I think it returns that data because it walks through the graph via the Cheese node first, so Bread is not included in 'user1Likes' and thus is not excluded from the final result. Then it walks through the Bread node, so Cheese in this case counts as a good suggestion.
Any ideas or suggestions on how to write that query? Take into consideration that it should scale to multiple users and ingredients.
I suggest that you model your problem differently. Normally the vertex label is used to determine the type of an entity, not to identify the entity. In your case, I think you need two vertex labels: "user" and "product".
Here is the code that creates the graph.
g.addV('user').property('name', 'User1').as('user1').
addV('user').property('name', 'User2').as('user2').
addV('product').property('name', 'Cheese').as('cheese').
addV('product').property('name', 'Bread').as('bread').
addV('product').property('name', 'Sandwiches').as('sandwiches').
addE('likes').from('user1').to('cheese').
addE('likes').from('user1').to('bread').
addE('likes').from('user2').to('cheese').
addE('likes').from('user2').to('bread').
addE('likes').from('user2').to('sandwiches')
And here is the traversal that gets the recommended products for "User1".
g.V().has('user', 'name', 'User1').as('user1').
out('likes').aggregate('user1Likes').
in('likes').
where(neq('user1')).
dedup().
out('likes').
where(without('user1Likes')).
dedup()
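The aggregate step collects all the products liked by "User1" into a collection named "user1Likes". Because aggregate() gathers every product before the traversal moves on (unlike as(), which labels only the single product on the current traverser), the exclusion applies to all of User1's products at once.
The without predicate passes only the vertices that are not within the collection "user1Likes".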
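The same recommendation logic can be expressed as plain set operations, which makes it easy to see why the full collection matters. This is a minimal Python sketch over an in-memory dictionary (the LIKES structure is an assumption for illustration, not part of the Gremlin answer); the comments map each step to the traversal above.

```python
LIKES = {
    "User1": {"Cheese", "Bread"},
    "User2": {"Cheese", "Bread", "Sandwiches"},
}


def suggest(user, likes=LIKES):
    mine = likes[user]                     # out('likes').aggregate('user1Likes'): the FULL set
    similar = {other for other, items in likes.items()
               if other != user and items & mine}  # in('likes').where(neq('user1')).dedup()
    suggestions = set()
    for other in similar:
        suggestions |= likes[other] - mine  # out('likes').where(without('user1Likes')).dedup()
    return suggestions


print(suggest("User1"))  # {'Sandwiches'}
```

Because `mine` holds every product User1 likes, Bread and Cheese are both excluded no matter which shared product led to User2, which is exactly what the single-item as() label in the question could not do.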

Querying firestore using multiple in clauses

I'm working on a product listing page (similar to any e-commerce site) where users are expected to filter products based on multiple attributes and multiple values per attribute.
Let's assume the data model is as shown below:
Product
Category - Shirt
Size - Medium
Colour - Blue
Dropdown filters on the search page would be,
1. Category: Shirts, T-Shirts, Trousers etc
2. Size: Medium, Large etc
I'm not sure how to query Firestore when a user wants to search all Shirts & T-Shirts in Small & Medium sizes.
A query like this isn't supported in Firestore:
firebase.firestore().collection("Products")
.where("Category", "in", ["Shirt", "T-Shirts"])
.where("Size", "in", ["Medium", "Large"])
On top of this, I need to paginate the response so filtering on the client side doesn't seem like an option.
Please suggest if there is any option.
Since you can only have a single in condition in a query, your current approach won't work. The only workaround I know of is to keep a separate field whose value is the combination of sizes that are available. So something like:
"available_in": "Medium_Large"
And then query with:
firebase.firestore().collection("Products")
.where("Category", "in", ["Shirt", "T-Shirts"])
.where("available_in", "==", "Medium_Large")
This type of solution only works if the number of combinations is reasonable, but that seems to be the case here.
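The combined field only works if every writer builds the value the same way. Here is a minimal Python sketch, assuming a fixed canonical size order (the SIZE_ORDER list and the underscore separator are assumptions chosen to match the "Medium_Large" example), that builds the key and shows how quickly the number of possible values grows: n sizes give 2^n - 1 non-empty combinations.

```python
from itertools import combinations

# Hypothetical canonical order; fixing one order guarantees that
# {"Large", "Medium"} and {"Medium", "Large"} produce the same field value.
SIZE_ORDER = ["Small", "Medium", "Large", "XLarge"]


def available_in_key(sizes):
    """Build the combined 'available_in' value, e.g. 'Medium_Large'."""
    unknown = set(sizes) - set(SIZE_ORDER)
    if unknown:
        raise ValueError(f"unknown sizes: {sorted(unknown)}")
    return "_".join(s for s in SIZE_ORDER if s in sizes)


def all_keys(order=SIZE_ORDER):
    """Every non-empty combination of sizes: 2**n - 1 values for n sizes."""
    return [available_in_key(c)
            for r in range(1, len(order) + 1)
            for c in combinations(order, r)]


print(available_in_key({"Large", "Medium"}))  # Medium_Large
print(len(all_keys()))                        # 15
```

Keep in mind that an equality filter on this field matches only products whose set of available sizes is exactly that combination.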

Designing "hot or not" style database in Entity Framework Model First

I want to design a duel of sorts between movies so that two are pitched against each other, the user selects the better one which then gets a point. In the database I want a list of all movies that have dueled and how many times they have won respectively.
Movie
Id
Name
Duels
Duel
ContenderOne
ContenderTwo
NumberOfDuels
ContenderOneWins
ContenderTwoWins
I'm trying to set this up so that the movie entity has 1 property called Duels in which I can fetch all duel entities where it is involved, regardless of it being contender one or contender two. ContenderOne and ContenderTwo should link back to the movies.
Any ideas how to achieve this? I am stumped.
You can't have just one property. You will need two: DuelsAsContenderOne and DuelsAsContenderTwo. If you need all duels disregarding the position, you can use the expression x.DuelsAsContenderOne.Concat(x.DuelsAsContenderTwo)
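The shape this gives the model can be sketched outside EF. The Python below only models the objects, not Entity Framework itself: the snake_case collections stand in for DuelsAsContenderOne and DuelsAsContenderTwo, the `duels` property plays the role of the Concat expression, and the manual wiring in `__post_init__` is an assumption standing in for what the two EF relationships would maintain.

```python
from dataclasses import dataclass, field
from itertools import chain


@dataclass(eq=False)
class Movie:
    name: str
    # Two inverse collections, one per foreign key on Duel.
    duels_as_contender_one: list = field(default_factory=list, repr=False)
    duels_as_contender_two: list = field(default_factory=list, repr=False)

    @property
    def duels(self):
        """All duels regardless of position (cf. DuelsAsContenderOne.Concat(DuelsAsContenderTwo))."""
        return list(chain(self.duels_as_contender_one, self.duels_as_contender_two))


@dataclass(eq=False)
class Duel:
    contender_one: Movie
    contender_two: Movie
    contender_one_wins: int = 0
    contender_two_wins: int = 0

    def __post_init__(self):
        # Wire up both sides by hand; in EF the two relationships would do this.
        self.contender_one.duels_as_contender_one.append(self)
        self.contender_two.duels_as_contender_two.append(self)


a, b, c = Movie("Movie A"), Movie("Movie B"), Movie("Movie C")
d1 = Duel(a, b)
d2 = Duel(c, a)
print(len(a.duels))  # 2
```

Movie A appears once as contender one and once as contender two, and `duels` returns both without caring about the position.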

Database Headaches - Mind not working today

I can't seem to get my head around how to create this.
Each bold item below (Product, Product Group, and so on) is a database table.
I need this to work with Entity Framework
Product
[ Product belongs to one group]
Product Group - [Computer]
[many to many]
[Group has many items]
[Product belongs to one Group Item]
Product Group Item - [Hard Drive]
[many to many]
[Group Items has Many Fields]
[Fields does not change for each product only changes for each Group Item]
Product Group Item Field - [Form Factor]
[Group Item Fields has many values]
[Field Values Change with each product]
Product Group Item Field Values - [ 3.5" ]
I can pretty much get the first three to work.
My problem is how to do the last two tables.
I hope I explained it clearly enough.
Thanks in advance.
(Diagram: http://myimgs.net/images/cjgo.gif)
Maybe this will help, or just hurt, who knows.
Product = a hard drive
so:
Group - Computer
GroupItem - Harddrive
GroupItemField - Form Factor : GroupItemFieldValue - 3.5"
GroupItemField - Capacity : GroupItemFieldValue - 600MB
etc...
The field value changes for each product of type Harddrive, but the field itself does not.
I think you may be trying to over-generalise your solution.
It seems to me you want to standardise the information you capture for different kinds of products.
E.g. Hard Drives
1 Supplier1 Model 1a 3.5" 600MB
2 Supplier1 Model 1b 3.5" 200GB
3 Supplier2 Model X 2.5" 600MB
And you want to represent the attributes in a single table:
1 FormFactor 3.5"
1 Capacity 600MB
2 FormFactor 3.5"
2 Capacity 200GB
3 FormFactor 2.5"
3 Capacity 600MB
The problem is that over-generalising like this you lose all the data integrity controls that your RDBMS provides.
You may be better off with:
Product (*Id, Name, GroupId, Supplier, Model, ...)
HardDrive (*Id, FormFactor, Capacity, ...)
Monitor (*Id, Resolution, ...)
Memory (*Id, Capacity, Speed, ...)
Each of the above product-specific tables has an optional-to-one reference to Product. With such a design, it becomes impossible to capture Monitor attributes for a hard drive unless you add a Monitor row for the product.
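A minimal sketch of the table-per-product-type design, using Python's built-in sqlite3 module. SQLite stands in for whatever RDBMS you use; GroupId, Name and the Memory table are left out for brevity, and the CHECK constraint on FormFactor is an illustrative assumption. The point is that the database itself rejects rows that break the rules.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE Product (
    Id       INTEGER PRIMARY KEY,
    Supplier TEXT NOT NULL,
    Model    TEXT NOT NULL
);
-- Shared primary key: a HardDrive row extends exactly one Product,
-- and a Product can have at most one HardDrive row (optional-to-one).
CREATE TABLE HardDrive (
    Id         INTEGER PRIMARY KEY REFERENCES Product(Id),
    FormFactor TEXT NOT NULL CHECK (FormFactor IN ('2.5"', '3.5"')),
    Capacity   TEXT NOT NULL
);
CREATE TABLE Monitor (
    Id         INTEGER PRIMARY KEY REFERENCES Product(Id),
    Resolution TEXT NOT NULL
);
""")
conn.executemany("INSERT INTO Product VALUES (?, ?, ?)",
                 [(1, "Supplier1", "Model 1a"), (2, "Supplier1", "Model 1b"),
                  (3, "Supplier2", "Model X")])
conn.executemany("INSERT INTO HardDrive VALUES (?, ?, ?)",
                 [(1, '3.5"', "600MB"), (2, '3.5"', "200GB"), (3, '2.5"', "600MB")])


def rejected(sql, params):
    """Return True if the database refuses the statement with a constraint error."""
    try:
        conn.execute(sql, params)
    except sqlite3.IntegrityError:
        return True
    return False


# No Product 99 exists, so the foreign key rejects this hard drive.
print(rejected("INSERT INTO HardDrive VALUES (?, ?, ?)", (99, '3.5"', "1TB")))  # True
```

A second HardDrive row for the same product is rejected by the primary key, and an unexpected form factor by the CHECK constraint, with no application code involved.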
That said, if you're willing to forgo integrity controls, or manage them yourself in code, then looking at sample data helps to produce your schema. (I'm going to use the terminology of attributes.)
AttributeValues (*ProductId, *AttributeId, Value) -- Note a problem here: what type should Value be?
You will need some way of indicating what attributes are allowed for each Group:
HardDrive FormFactor Req
HardDrive Capacity Req
Monitor Resolution Req
Monitor Colour Opt
Memory Capacity Req
Memory Speed Req
GroupAttributes (*GroupId, *AttributeId, IsOptional)
Then you need to indicate the group to which a product belongs (so that you can figure out which values need to be filled in)
1 Supplier1 Model 1a HardDrive
2 Supplier1 Model 1b HardDrive
3 Supplier2 Model X HardDrive
4 Supplier2 Model M1 Monitor
Products (*ProductId, GroupId, SupplierId, ModelNo)
I'm not sure where your GroupItems fit in.
Relationships
Products.GroupId -> Groups.GroupId
Products.SupplierId -> Suppliers.SupplierId
GroupAttributes.GroupId -> Groups.GroupId
GroupAttributes.AttributeId -> Attributes.AttributeId
AttributeValues.ProductId -> Products.ProductId
AttributeValues.AttributeId -> Attributes.AttributeId
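The attribute-based design can be sketched the same way with sqlite3. Table names follow the definitions in this answer (GroupAttributes, AttributeValues); Suppliers is reduced to a text column and the Memory group is omitted for brevity, and the sample rows come from the examples above. The query uses GroupAttributes to list required attributes a product has not filled in yet, and the last insert shows the integrity gap: a Resolution value can be attached to a hard drive, because AttributeValues only references Attributes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Groups     (GroupId INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Attributes (AttributeId INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE GroupAttributes (
    GroupId     INTEGER NOT NULL REFERENCES Groups(GroupId),
    AttributeId INTEGER NOT NULL REFERENCES Attributes(AttributeId),
    IsOptional  INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (GroupId, AttributeId)
);
CREATE TABLE Products (
    ProductId INTEGER PRIMARY KEY,
    GroupId   INTEGER NOT NULL REFERENCES Groups(GroupId),
    Supplier  TEXT,
    ModelNo   TEXT
);
CREATE TABLE AttributeValues (
    ProductId   INTEGER NOT NULL REFERENCES Products(ProductId),
    AttributeId INTEGER NOT NULL REFERENCES Attributes(AttributeId),
    Value       TEXT,  -- the "what type should Value be?" problem: everything is TEXT
    PRIMARY KEY (ProductId, AttributeId)
);
""")
conn.executemany("INSERT INTO Groups VALUES (?, ?)", [(1, "HardDrive"), (2, "Monitor")])
conn.executemany("INSERT INTO Attributes VALUES (?, ?)",
                 [(1, "FormFactor"), (2, "Capacity"), (3, "Resolution"), (4, "Colour")])
conn.executemany("INSERT INTO GroupAttributes VALUES (?, ?, ?)",
                 [(1, 1, 0), (1, 2, 0), (2, 3, 0), (2, 4, 1)])  # Colour is optional
conn.executemany("INSERT INTO Products VALUES (?, ?, ?, ?)",
                 [(1, 1, "Supplier1", "Model 1a"), (2, 1, "Supplier1", "Model 1b"),
                  (3, 1, "Supplier2", "Model X"), (4, 2, "Supplier2", "Model M1")])
conn.executemany("INSERT INTO AttributeValues VALUES (?, ?, ?)",
                 [(1, 1, '3.5"'), (1, 2, "600MB"), (2, 1, '3.5"'), (2, 2, "200GB"),
                  (3, 1, '2.5"'), (3, 2, "600MB")])

# Required attributes (per the product's group) that have no value yet.
MISSING_REQUIRED = """
SELECT p.ProductId, a.Name
FROM Products p
JOIN GroupAttributes ga ON ga.GroupId = p.GroupId AND ga.IsOptional = 0
JOIN Attributes a ON a.AttributeId = ga.AttributeId
LEFT JOIN AttributeValues av
       ON av.ProductId = p.ProductId AND av.AttributeId = ga.AttributeId
WHERE av.ProductId IS NULL
ORDER BY p.ProductId, a.Name
"""
missing = conn.execute(MISSING_REQUIRED).fetchall()
print(missing)  # [(4, 'Resolution')]

# Integrity gap: nothing stops a Resolution value on a hard drive (product 1).
conn.execute("INSERT INTO AttributeValues VALUES (1, 3, '1920x1080')")
```

Closing that gap means enforcing the GroupAttributes rules yourself (triggers or application code), which is the extra work this answer warns about.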
NOTE
I've illustrated (with IsOptional) how you can add columns defining rules for the attribute values. You could do the same for the Attributes table, where you'd probably at a minimum need to indicate the data type of the attribute.
You may notice that before long you'll be replicating the metadata that your RDBMS provides to define tables and columns. The highly generalised solution does have its benefits, such as using a simple template mechanism to capture and view products, but it becomes quite a bit more difficult (in code and processing time) to perform other tasks. So I suggest you weigh your requirements holistically against the design.

Resources