Gremlin - Looking at a graph at different levels (hierarchies) - gremlin

Disclaimer: I'm coming from more of a relational DB world, so I might have some misconceptions about the best practices for storing and working with data in graph databases.
Anyway, let's say I have data with some hierarchy in it.
Let's say I have the following hierarchy:
Food / Fruit / Orange
Food / Vegetable / Lettuce
Food / Vegetable / Onion
Dishes / Thai / Phad Thai
Dishes / Thai / Larb Gai
Dishes / Desert / Orange Cake
Dishes / Dish / Ceasar Salad
In my graph, I have a vertex for every last-level item in the hierarchy, and each of them has 2 properties that record its full hierarchy. For example: Tomato has the properties level1: 'Food', level2: 'Fruit'.
In addition, I have edges used_in when some ingredient is used in a dish.
All edges are between vertices (last level items in the hierarchy).
Now, I would like to be able to look at a higher-level graph, based on level2.
For example I would like to be able to see:
Fruit -> used_in -> Desert
Vegetable -> used_in -> Thai
Graph looks like this:
And I want to query the graph such that I get the following result:
So is there some way to group vertices by some combination of fields (in this case the key is the combination of the level1 and level2 fields) such that the edges between those groups remain?
Is there some other way I should model my data? For example, adding labels based on all the items in the hierarchy?
To create the graph:
g.addV('Orange').property(id, 'Orange').property('level3', 'Orange').property('level2', 'Fruit').property('level1', 'Food')
.addV('Lettuce').property(id, 'Lettuce').property('level3', 'Lettuce').property('level2', 'Vegetable').property('level1', 'Food')
.addV('Onion').property(id, 'Onion').property('level3', 'Onion').property('level2', 'Vegetable').property('level1', 'Food')
.addV('Phad Thai').property(id, 'Phad Thai').property('level3', 'Spoon').property('level2', 'Thai').property('level1', 'Dishes')
.addV('Larb Gai').property(id, 'Larb Gai').property('level3', 'Fork').property('level2', 'Thai').property('level1', 'Dishes')
.addV('Orange Cake').property(id, 'Orange Cake').property('level3', 'Orange Crepe').property('level2', 'Desert').property('level1', 'Dishes')
.addV('Ceasars Salad').property(id, 'Ceasars Salad').property('level3', 'Ceasars Salad').property('level2', 'Salads').property('level1', 'Dishes')
.addE('used_in').from(g.V().has(id, 'Orange')).to(g.V().has(id, 'Orange Cake'))
.addE('used_in').from(g.V().has(id, 'Lettuce')).to(g.V().has(id, 'Ceasars Salad'))
.addE('used_in').from(g.V().has(id, 'Onion')).to(g.V().has(id, 'Phad Thai'))
.addE('used_in').from(g.V().has(id, 'Onion')).to(g.V().has(id, 'Larb Gai'))
.addE('used_in').from(g.V().has(id, 'Lettuce')).to(g.V().has(id, 'Larb Gai'))
.iterate()
Thanks in advance! :)

I reformatted the graph creation steps and replaced g.V() with just V() in all the mid-traversal steps. Using g.V() in the middle of a traversal was deprecated and no longer works in TinkerPop 3.5.x and higher versions. It has bad side effects that most users do not realize. I also think that changing the data model might be a good idea.
Looking at the data, you are really using properties in a way that simulates what edges are good at. For example, why not have edges with labels like level1 and use those edges to connect the appropriate vertices? Anyway, here is the reformatted graph creation.
g.addV('Orange').
    property(id, 'Orange').
    property('level3', 'Orange').
    property('level2', 'Fruit').
    property('level1', 'Food').
  addV('Lettuce').
    property(id, 'Lettuce').
    property('level3', 'Lettuce').
    property('level2', 'Vegetable').
    property('level1', 'Food').
  addV('Onion').
    property(id, 'Onion').
    property('level3', 'Onion').
    property('level2', 'Vegetable').
    property('level1', 'Food').
  addV('Phad Thai').
    property(id, 'Phad Thai').
    property('level3', 'Spoon').
    property('level2', 'Thai').
    property('level1', 'Dishes').
  addV('Larb Gai').
    property(id, 'Larb Gai').
    property('level3', 'Fork').
    property('level2', 'Thai').
    property('level1', 'Dishes').
  addV('Orange Cake').
    property(id, 'Orange Cake').
    property('level3', 'Orange Crepe').
    property('level2', 'Desert').
    property('level1', 'Dishes').
  addV('Ceasars Salad').
    property(id, 'Ceasars Salad').
    property('level3', 'Ceasars Salad').
    property('level2', 'Salads').
    property('level1', 'Dishes').
  addE('used_in').
    from(V().has(id, 'Orange')).
    to(V().has(id, 'Orange Cake')).
  addE('used_in').
    from(V().has(id, 'Lettuce')).
    to(V().has(id, 'Ceasars Salad')).
  addE('used_in').
    from(V().has(id, 'Onion')).
    to(V().has(id, 'Phad Thai')).
  addE('used_in').
    from(V().has(id, 'Onion')).
    to(V().has(id, 'Larb Gai')).
  addE('used_in').
    from(V().has(id, 'Lettuce')).
    to(V().has(id, 'Larb Gai'))
As a start (I can expand on this if you need more), we can use a path step to generate the relationships shown in your diagrams. I would consider changing the data model though.
g.V().has('level2','Fruit').
  outE('used_in').
  inV().
  path().
    by('level2').
    by(label).
    by('level2')
Which finds:
[Fruit, used_in, Desert]
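If you want every group-to-group relationship at once rather than only the ones starting at Fruit, one option is to roll the edges up on the client after fetching them. The following is only a minimal Python sketch of that idea, not a Gremlin feature: the level2 mapping and the edge list are copied by hand from the graph creation script, rollup_edges is a hypothetical helper name, and it groups on level2 alone (a level1 + level2 key would work the same way with a tuple as the group value).

```python
from collections import Counter

# level2 value for each vertex id, copied from the graph creation script
level2 = {
    "Orange": "Fruit",
    "Lettuce": "Vegetable",
    "Onion": "Vegetable",
    "Phad Thai": "Thai",
    "Larb Gai": "Thai",
    "Orange Cake": "Desert",
    "Ceasars Salad": "Salads",
}

# used_in edges as (source vertex id, target vertex id)
edges = [
    ("Orange", "Orange Cake"),
    ("Lettuce", "Ceasars Salad"),
    ("Onion", "Phad Thai"),
    ("Onion", "Larb Gai"),
    ("Lettuce", "Larb Gai"),
]


def rollup_edges(edges, group_of, label="used_in"):
    """Count edges between groups instead of between individual vertices."""
    return Counter((group_of[src], label, group_of[dst]) for src, dst in edges)


for (src, label, dst), count in sorted(rollup_edges(edges, level2).items()):
    print(f"{src} -> {label} -> {dst} (x{count})")
```

The count per group pair is kept so that you can tell a single ingredient link from a heavily used one; drop it if you only need the distinct relationships.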

Related

How do I collect values from a vertex used in a traversal?

I want the details of a vertex along with details of vertices that are joined to it.
I have a group vertex, incoming 'member' edges to user vertices. I want the details of the vertices.
g.V(1).as('a').in('member').valueMap().as('b').select('a','b').unfold().dedup()
==>a=v[1]
==>b={image=[images/profile/friend9.jpg], name=[Thomas Thompson], email=[me@thomasthompson.co.uk]}
==>b={image=[images/profile/friend13.jpg], name=[Laura Tostevin], email=[me@lauratostevin.co.uk]}
==>b={image=[images/profile/friend5.jpg], name=[Alan Thompson], email=[me@alanthompson.co.uk]}
==>b={image=[images/profile/friend10.jpg], name=[Laura Bourne], email=[me@laurabourne.co.uk]}
Ideally what I'd want is:
{label: 'group', id=1, name='A Group', users=[{id=2, label="user",name=".."}, ... }]}
When I tried a project, it didn't like me using 'in'
gremlin> g.V('1').project('name','users').by('name').by(in('member').select())
groovysh_parse: 1: unexpected token: in @ line 1, column 83.
'name','users').by('name').by(in('member
To get your preferred output format, you have to join the group's valueMap() with the list of users. On TinkerPop's modern toy graph you would do something like this:
gremlin> g.V(3).union(valueMap(true).
                       by(unfold()),
                     project('users').
                       by(__.in('created').
                            valueMap(true).
                              by(unfold()).
                            fold())).
           unfold().
           group().
             by(keys).
             by(select(values))
==>[name:lop,id:3,lang:java,label:software,users:[[id:1,label:person,name:marko,...],...]]
Mapping this to your graph should be pretty straight-forward, it's basically just about changing labels.
Because in is a reserved keyword in Groovy, you must use the verbose syntax __.in
Try:
g.V('1').project('name','users').by('name').by(__.in('member').valueMap(true).fold())

Gremlin traversal: output all edge details and also in/out vertex IDs

I'm having trouble constructing the Gremlin query to give me all of the edge details (label, properties) and also the IDs of the inV and outV adjoining vertices (I don't need any more info from the linked vertices, just the IDs).
All I have is the Edge ID as a starting point.
So my Edge is as follows:
Label: "CONTAINS"
id: c6b4f3cb-f96e-cc97-dedb-e405771cb4f2
keys:
key="ekey1", value="e1"
key="ekey2", value="e2"
inV has id 50b4f3cb-f907-c31c-6284-1a3463fd72b9
outV has id 7cb4f3cb-d9a2-1398-61d7-9339be34833b
What I want is a single query that will return me something like -
"CONTAINS", "c6b4f3cb-f96e-cc97-dedb-e405771cb4f2", {ekey1=e1, ekey2=e2, ...}, "50b4f3cb-f907-c31c-6284-1a3463fd72b9", "7cb4f3cb-d9a2-1398-61d7-9339be34833b"
I can get the info in separate queries i.e.
g.E("c6b4f3cb-f96e-cc97-dedb-e405771cb4f2").bothV()
==>v[50b4f3cb-f907-c31c-6284-1a3463fd72b9]
==>v[7cb4f3cb-d9a2-1398-61d7-9339be34833b]
g.E("c6b4f3cb-f96e-cc97-dedb-e405771cb4f2").valueMap()
==>{ekey1=e1, ekey2=e2}
g.E("c6b4f3cb-f96e-cc97-dedb-e405771cb4f2").label()
==>CONTAINS
But I can't for the life of me work out how to combine these into one.
You could use project() to get what you're looking for:
g.E("c6b4f3cb-f96e-cc97-dedb-e405771cb4f2").
  project('ekey1', 'inV', 'outV', 'label').
    by('ekey1').
    by(inV().id()).
    by(outV().id()).
    by(label)

Finding subgraph from a graph in OrientDb

This is my graph
[Person] -livesIn-> [City]
[Factory] -locatedIn-> [City]
[Person] -worksAt-> [Factory]
How do I find people who have to travel far for work, i.e. "people working at a factory that is not located in the city they are living in"?
I tried to do this:
Match
{class:Person, as: person} -worksAt-> {class:Factory, as: factory} -locatedIn-> {class:City, as: city},
{how do i check, person !livesIn city }
return person
I don't think this problem is specific to OrientDB, so feel free to let me know how one can solve this in any other graph database.
I'm familiar with the SQL dialect of OrientDB and with Gremlin too.
Direction or help in any of these languages is greatly appreciated.
Try this:
Match
{class:Person, as: person} -worksAt-> {class:Factory, as: factory} -locatedIn-> {class:City, as: city, where: (in("locatedIn").name <> name)}
return person.name as person
In the output, only John is returned.
Hope it helps
Regards
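For readers coming from other graph databases, the condition being asked for can be stated independently of OrientDB. Below is a minimal Python sketch of the intended logic only. The records are hypothetical (the question does not show its data, so the names and cities here are made up for illustration), and commuters is an assumed helper name, not an OrientDB function.

```python
# Hypothetical sample data; the original question does not show its records.
lives_in = {"John": "Rome", "Mary": "Milan"}        # Person -livesIn-> City
works_at = {"John": "Acme", "Mary": "Globex"}       # Person -worksAt-> Factory
located_in = {"Acme": "Milan", "Globex": "Milan"}   # Factory -locatedIn-> City


def commuters(lives_in, works_at, located_in):
    """People whose factory is in a different city from the one they live in."""
    return sorted(
        person
        for person, factory in works_at.items()
        if located_in[factory] != lives_in[person]
    )


print(commuters(lives_in, works_at, located_in))
```

Whatever query language you use, the key point is that the city reached through worksAt and locatedIn has to be compared against the city reached through livesIn for the same person.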

How do I define a custom slot type that isn't a list?

I'm playing around with the Alexa Skills Kit (for the Amazon Echo) and want to create a skill that would send the intent to an AWS Lambda function which would just email something back to me.
Sample Utterances would be something like this:
MemoIntent take a memo {myMemo}
MemoIntent to take a memo {myMemo}
MemoIntent send a memo {myMemo}
This would allow me to say something like "Alexa, ask my secretary to take a memo, remind me to go to the store on my way home today" and would then get an email from my Lambda function saying, "remind me to go to the store on my way home today."
The myMemo slot is freeform - at this point just a sentence or two will do, but I'm not finding a lot of help in the documentation for how to write the schema for something like this. My best guess at the moment fails with a:
Error: There was a problem with your request: Unknown slot name
'{myMemo}'. Occurred in sample 'MemoIntent take a memo {myMemo}' on
line 1.
I'm using the AMAZON.LITERAL slot type, which the documentation discourages, but it also doesn't offer any suggestions on how else to go about this. And besides, like I mentioned, it fails.
Here is the schema that fails:
{
  "intents": [
    {
      "intent": "MemoIntent",
      "slots": [
        {
          "name": "myMemo",
          "type": "AMAZON.LITERAL"
        }
      ]
    }
  ]
}
Literals are different than other slot types in that you must provide training in the sample utterance, as mentioned in the official documentation:
https://developer.amazon.com/public/solutions/alexa/alexa-skills-kit/docs/alexa-skills-kit-interaction-model-reference
Sample Utterances Syntax
Sample utterances map the phrases the user can speak to the intents you have defined. They are written as lines in a plain text file, using the following format:
IntentName this is a sample utterance with no slots
IntentName this is a sample utterance containing a {SlotName}
IntentName this is a sample utterance containing a {SlotName} and {AnotherSlotName}
Note that the above format applies to all slot types except AMAZON.LITERAL. For AMAZON.LITERAL, you also need to specify a sample slot value:
IntentName this is a sample utterance containing a {slot value|SlotName} using LITERAL
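Writing many LITERAL training lines by hand gets tedious, so they can be generated. This is just a small Python sketch of the {slot value|SlotName} format quoted above; literal_utterances is a hypothetical helper, and the carrier phrases and sample values are taken from this thread as examples.

```python
def literal_utterances(intent, carrier_phrases, sample_values, slot_name):
    """Build sample utterance lines in the {slot value|SlotName} form."""
    return [
        f"{intent} {phrase} {{{value}|{slot_name}}}"
        for phrase in carrier_phrases
        for value in sample_values
    ]


lines = literal_utterances(
    "MemoIntent",
    ["take a memo", "send a memo"],
    ["walk the dog", "go to the store on the way home"],
    "myMemo",
)
print("\n".join(lines))
```

Each carrier phrase is combined with each sample value, so varying the length of the sample values gives the recognizer examples of short and long memos.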
Alternatively, a custom slot type lets you capture the slot after you define numerous sample values for it. In this scenario, you would give the myMemo slot a custom slot type, named something like MY_MEMO, and populate that type with potential values (these are not the only values it will receive), such as:
walk the dog
eat more bacon
go to the store on the way home
We are currently developing an AI (for Alexa) which should be able to answer a wide variety of questions. It is very important that users are able to phrase complex questions which shall be analyzed in the backend. If Alexa drops them early on because of limited utterances and slot types, we can't provide such a service.
At the moment we are experimenting with the following approach. (Keep in mind that our experiment is based on German. Other languages might behave differently.)
1. Custom Slot Types per Word Class
We defined custom slot types for the following word classes:
interrogation (what, who, when)
item (cybersecurity, darknet, malware)
verb (is, has, can)
adjective (popular, inexpensive, insecure)
pronoun (the, he, she)
2. Sample Utterances for Sentence Structure
Then we have defined possible structures for sentences with sample utterances:
QuestionIntent {Interrogation}
QuestionIntent {Item}
QuestionIntent {Verb}
QuestionIntent {Adjective}
QuestionIntent {Interrogation} {Verb} {Item}
QuestionIntent {Interrogation} {Verb} {Item} {Adjective}
QuestionIntent {Interrogation} {Verb} {Pronoun} {Item}
QuestionIntent {Interrogation} {Verb} {Pronoun} {Pronoun} {Item}
QuestionIntent {Interrogation} {Verb} {Pronoun} {Item} {Preposition} {Item}
QuestionIntent {Interrogation} {Verb} {Adjective} {Item}
QuestionIntent {Interrogation} {Verb} {Pronoun} {Adjective} {Item}
QuestionIntent {Interrogation} {Item} {Verb}
QuestionIntent {Interrogation} {Item} {Verb} {Adjective}
QuestionIntent {Interrogation} {Item} {Verb} {Pronoun} {Adjective}
QuestionIntent {Item} {Verb} {Interrogation}
QuestionIntent {Verb} {Item} {Verb}
QuestionIntent {Verb} {Adjective} {Item} {Verb}
3. NLP Analysis in Backend
Then we do an NLP analysis of the submitted words in the backend. The received data looks like this:
"intent": {
  "name": "QuestionIntent",
  "slots": {
    "Item": {
      "name": "Item",
      "value": "darknet"
    },
    "Preposition": {
      "name": "Preposition"
    },
    "Adjective": {
      "name": "Adjective"
    },
    "Verb": {
      "name": "Verb",
      "value": "is"
    },
    "Interrogation": {
      "name": "Interrogation",
      "value": "what"
    },
    "Pronoun": {
      "name": "Pronoun",
      "value": "the"
    }
  }
}
Some words might be lost, some others might be misheard. In this case, we remember topics from earlier exchanges and "fill" the missing words with these. For example: What is {it}? ⇒ What is {Darknet}?
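As a rough illustration of that "fill" step, here is a minimal Python sketch. It assumes the slots arrive in the shape shown in the JSON above (a slot without a "value" key was not recognized) and that the backend keeps a simple dict of remembered topics; fill_missing_slots and remembered are hypothetical names, not part of the Alexa SDK.

```python
def fill_missing_slots(slots, remembered):
    """Return slot values, using remembered topics where Alexa sent no value."""
    filled = {}
    for name, slot in slots.items():
        value = slot.get("value")
        if value is None and name in remembered:
            # Fall back to the topic remembered from an earlier exchange.
            value = remembered[name]
        filled[name] = value
    return filled


# "What is it?" where the item was not recognized in this request
slots = {
    "Item": {"name": "Item"},
    "Verb": {"name": "Verb", "value": "is"},
    "Interrogation": {"name": "Interrogation", "value": "what"},
}
remembered = {"Item": "darknet"}

print(fill_missing_slots(slots, remembered))
```

Slots with neither a received value nor a remembered topic stay None, so the later analysis can still tell that a word is missing.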
We were experimenting with a broad list of lists for slot types. But this increases the risk of mishearing something (a good example in English is write and right, luckily they are not assigned to the same word class). So we switched to a very narrow approach. The lists only contain words which can be handled by the AI and are stored in the knowledge base. For example, the list of items does not contain the words pony or unicorn. We expect this to come up with better results (less confusing answers).
Complex sentences that are not covered by a sample utterance structure are highly confusing to work with, for example a sentence that contains more than 2 verbs (which might be necessary to build a tense). But so far our approach leads to results with a good level of accuracy, as long as the user behaves with some level of politeness.
But in the end: Unfortunately, at the moment, it is not possible to dictate something like a memo with an infinite amount of different words and sentence structures.
I tried another approach to this.
I created a Custom Slot Type with a list of values like this.
wordOne
wordOne wordTwo
wordOne wordTwo wordThree
wordOne wordTwo wordThree wordFour
wordOne wordTwo wordThree wordFour wordFive
You can continue the list with as long strings as you need.
My guess was that Alexa, when trying to fill slots, orients itself on the number of space-separated words in the values of a slot type to match what it heard.
I had quite some success grabbing whole sentences in a single slot with this custom slot type, though I have never tested it on intents with more than just the slot as the utterance.
But if you separate your intents it might work. Maybe something like this:
StartMemoIntent take a memo
StartMemoIntent to take a memo
StartMemoIntent send a memo
StartMemoIntent record a memo
StartMemoIntent listen to my memo
RecordMemoIntent {memo}
You have to be careful, though: Alexa can confuse the intents if your other intents do not have enough sample utterances.
If you give StartMemoIntent enough sample utterances, at least 7-8, it should have no problem picking the right one.
According to some of the comments here, I figured out you can get Alexa to recognise free form words or phrases by adding a large random list of words to the custom slot values field.
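I generated mine by running:
from nltk.corpus import words  # needs the corpus once: nltk.download('words')
import json

# Use the first 100 entries of the NLTK word list as filler slot values.
words_list = words.words()[:100]
values = []
for word in words_list:
    # Build one custom slot value entry per word.
    value = {}
    value['id'] = None
    value['name'] = {}
    value['name']['value'] = word
    value['name']['synonyms'] = []
    values.append(value)
print(json.dumps(values))
Then copy and paste those values into: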
{
"languageModel": {
"types": [
{
"name": "phrase",
"values": [values you get from above]
...
AMAZON.SearchQuery
AMAZON.SearchQuery slot type lets you capture less-predictable input that makes up the search query.
Ex:
{
  "intents": [
    {
      "name": "SearchIntent",
      "slots": [
        {
          "name": "Query",
          "type": "AMAZON.SearchQuery"
        },
        {
          "name": "CityList",
          "type": "AMAZON.US_CITY"
        }
      ],
      "samples": [
        "search for {Query} near me",
        "find out {Query}",
        "search for {Query}",
        "give me details about {CityList}"
      ]
    }
  ]
}
More on AMAZON.SearchQuery here
There is also the AMAZON.LITERAL slot type, which passes the recognised words for the slot value with no conversion. But it's not recommended, and you cannot use AMAZON.LITERAL in a skill configured with a dialog model.

XQuery: multiple results in a node

When I run my XQuery code against an XML file, I get multiple results in one of my fields.
Here is my XML file:
<Actors>
<Actor name="NTR">
<Movie TITLE="Yamadonga" Director="Rajamouli"></Movie>
<Movie TITLE="AADI" Director="VV vinayak">
</Movie>
</Actor>
<Actor name="Rajeev">
<Movie TITLE="Yamadonga" Director="Rajamouli" ></Movie>
</Actor>
<Actor name="mahesh">
<Movie TITLE="pokiri" Director="puri">
</Movie>
</Actor>
My XQuery file:
<Director>
{
  for $Movie in doc("actors.xml")/Actors/Actor/Movie
  return
    if ($Movie/@TITLE = $title)
    then
      data($Movie/@Director)
    else ()
}
</Director>
Most importantly, my result
<movies>
<movie>
<Title>Yamadonga</Title>
<Actor>NTR</Actor>
<Actor>Rajeev</Actor>
<Director>Rajamouli Rajamouli</Director>
</movie>
</movies>
How to get only one value in the director field?
My procedure:
I ran the distinct-values function over (../Movie/@TITLE), and that gave me the answer for displaying the title. But as title and director are both attributes of Movie, I cannot access one using the other. When I iterate over Actor, because two actors share a single director for a single movie, the director name gets printed twice. When I iterate over Movie, I cannot use distinct-values over it as it is not an attribute.
Your XQuery is really not very efficient or easily readable. You can do a simple XPath:
<Director>
{
  data((doc("actors.xml")/Actors/Actor/Movie[@TITLE = $title])[1]/@Director)
}
</Director>
It's because the for is returning 2 movies. Why don't you just use an XPath with distinct-values()?
<Director>
{
  distinct-values(doc("actors.xml")/Actors/Actor/Movie[@TITLE=$title]/data(@Director))
}
</Director>
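To see why the duplicate appears and what the distinct step removes, here is the same idea as a small Python sketch using the standard library's ElementTree. This is only an illustration and not XQuery: the XML is the sample from the question (with the root element closed), and directors_of is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Sample document from the question, with the closing root tag added.
ACTORS_XML = """
<Actors>
  <Actor name="NTR">
    <Movie TITLE="Yamadonga" Director="Rajamouli"/>
    <Movie TITLE="AADI" Director="VV vinayak"/>
  </Actor>
  <Actor name="Rajeev">
    <Movie TITLE="Yamadonga" Director="Rajamouli"/>
  </Actor>
  <Actor name="mahesh">
    <Movie TITLE="pokiri" Director="puri"/>
  </Actor>
</Actors>
"""


def directors_of(root, title):
    """Return (all matches, distinct matches) for a movie title."""
    movies = root.findall(f"./Actor/Movie[@TITLE='{title}']")
    all_directors = [movie.get("Director") for movie in movies]
    # dict.fromkeys keeps first-seen order while dropping repeats.
    distinct = list(dict.fromkeys(all_directors))
    return all_directors, distinct


root = ET.fromstring(ACTORS_XML)
print(directors_of(root, "Yamadonga"))
```

The movie element exists once per actor, so the path matches it twice; deduplicating the attribute values (or taking the first match, as in the other answer) is what collapses it to a single director.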
