I have a graph database storing different types of entities and I am building an API to fetch entities from the graph. It is however a bit more complicated since for each type of entity there is a set of rules that are used for fetching related entities as well as the original.
To do this I used an aggregate step to aggregate all related entities that I am fetching into a collection.
An added requirement is to fetch a batch of entities (and related entities). I was going to do this by changing the has step that is fetching entities to use P.within and map the aggregation to each of the found entities.
This works as long as I fetch a single entity, but if I fetch two, the result set for the first one is correct while the result set for the second contains both its own results and those of the first.
I think this is because the second traverser simply adds to the collection the first one already aggregated, since the aggregation side-effect key is the same.
I haven't found any way to clear the collection between the first and the second, nor any way to use a dynamic aggregation side-effect key.
Code:
return graph.traversal().V()
    .hasLabel(ENTITY_LABEL)
    .has("entity_ref", P.within(entityRefs)) // entityRefs is a list of entities I am looking for
    .flatMap(
        __.aggregate("match")
          .sideEffect(
              // The logic that applies the rules lives here. It will add items to the "match" collection.
          )
          .select("match")
          .fold()
    )
    .toStream()
...
The result should be a list of lists of entities where the first list of entities in the outer list contains results for the first entity in entityRefs, and the second list of entities contains results for the second entity in entityRefs.
Example:
I want to fetch the vertices for entity refs A and B and their related entities.
Let's say I expect the results to then be [[A, C], [B, D, E]], but I get the results [[A, C], [A, C, B, D, E]] (The second results contain the results from the first one).
Questions:
Is there a way to clear the "match" collection after the selection?
Is there a way to have dynamic side effect keys such that I create a collection for each entityRef?
Is there perhaps a different way I can do this?
Have I misidentified the problem?
EDIT:
This is a miniature version of the problem. The graph is set up like so:
g.addV('entity').property('id',1).property('type', 'customer').as('1').
addV('entity').property('id',2).property('type', 'email').as('2').
addV('entity').property('id',6).property('type', 'customer').as('6').
addV('entity').property('id',3).property('type', 'product').as('3').
addV('entity').property('id',4).property('type', 'subLocation').as('4').
addV('entity').property('id',7).property('type', 'location').as('7').
addV('entity').property('id',5).property('type', 'productGroup').as('5').
addE('AKA').from('1').to('2').
addE('AKA').from('2').to('6').
addE('HOSTED_AT').from('3').to('4').
addE('LOCATED_AT').from('4').to('7').
addE('PART_OF').from('3').to('5').iterate()
I want to fetch a batch of entities, given their ids and fetch related entities. Which related entities should be returned is a function of the type of the original entity.
My current query is like this (slightly modified for this example):
g.V().
hasLabel('entity').
has('id', P.within(1,3)).
flatMap(
aggregate('match').
sideEffect(
choose(values('type')).
option('customer',
both('AKA').
has('type', P.within('email', 'phone')).
sideEffect(
has('type', 'email').
aggregate('match')).
both('AKA').
has('type', 'customer').
aggregate('match')).
option('product',
bothE('HOSTED_AT', 'PART_OF').
choose(label()).
option('PART_OF',
bothV().
has('type', P.eq('productGroup')).
aggregate('match')).
option('HOSTED_AT',
bothV().
has('type', P.eq('subLocation')).
aggregate('match').
both('LOCATED_AT').
has('type', P.eq('location')).
aggregate('match')))
).
select('match').
unfold().
dedup().
values('id').
fold()
).
toList()
If I fetch only one entity I get correct results: for id 1 I get [1,2,6] and for id 3 I get [3,5,4,7]. However, when I fetch both I get:
==>[3,5,4,7]
==>[3,5,4,7,1,2,6]
The first result is correct, but the second contains the results for both ids.
You can leverage the group().by(key).by(value) traversal step (not too well documented, to be honest, but powerful).
That way you can drop the aggregate() side-effect step that is causing you trouble. As an alternative for collecting multiple vertices matching some traversal into a list, I used union().
Here is an example that uses the graph you posted (I only included the 'customer' option for brevity):
g.V().
  hasLabel('entity').
  has('id', P.within(1, 3)).
  <String, List<Entity>>group().
    by('id').
    by(choose(values('type')).
        option('customer', union(
            identity(),
            both('AKA').has('type', 'email'),
            both('AKA').has('type', P.within('email', 'phone')).
                both('AKA').has('type', 'customer')).
          map((traversal) -> new Entity(traversal.get())). // or whatever business class you have
          fold()). // important: folds all three union branches into a single list
        option('product', union())).
  next()
This traversal has the obvious drawback of being more verbose: it declares the step over the 'AKA' edges from a customer twice, where your traversal declared it only once.
It does, however, keep the by(value) part of the group() step separate for each key, which is what we wanted.
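To see the difference in semantics outside of Gremlin, here is a plain-Java sketch (the lists of ids are taken from the example above; the rest is invented for illustration): a single shared collection behaves like aggregate('match') across traversers, while a per-key map behaves like group().by(key).

```java
import java.util.*;

public class AggregateVsGroup {
    public static void main(String[] args) {
        // Related-entity results for the two input entities (ids 1 and 3).
        List<List<Integer>> related = List.of(List.of(1, 2, 6), List.of(3, 5, 4, 7));

        // aggregate('match'): one shared collection, so the second traverser
        // appends to what the first one already collected.
        List<Integer> match = new ArrayList<>();
        List<List<Integer>> aggregated = new ArrayList<>();
        for (List<Integer> r : related) {
            match.addAll(r);
            aggregated.add(new ArrayList<>(match));
        }
        System.out.println(aggregated); // [[1, 2, 6], [1, 2, 6, 3, 5, 4, 7]]

        // group().by(key): one value per key, so the results stay separate.
        Map<Integer, List<Integer>> grouped = new LinkedHashMap<>();
        for (List<Integer> r : related) {
            grouped.put(r.get(0), new ArrayList<>(r));
        }
        System.out.println(grouped); // {1=[1, 2, 6], 3=[3, 5, 4, 7]}
    }
}
```

The first printout reproduces exactly the leakage reported in the question; the second keeps each key's value list isolated.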
The properties in my graph are dynamic, meaning a vertex can carry any number of properties. This also means that when I search, I do not know which property to look in. Is it possible in Gremlin to query the graph for all vertices that have any property with a given value?
For example, with name and desc as properties, if the incoming search term is 'test', the query would be g.V().has('name', 'test').or().has('desc', 'test'). How can I achieve the same when I do not know which properties exist? I need to search across all properties and check whether any of their values is 'test'.
You can do this using the following syntax:
g.V().properties().hasValue('test')
However, with any size dataset I would expect this to be a very slow traversal to perform as it is the equivalent of asking an RDBMS "Find me any cell in any column in any table where the value equals 'test'". If this is a high frequency request I would suggest looking at refactoring your graph model or using a database optimized for searches such as Elasticsearch.
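To illustrate why no index can help here, a small in-memory Java model (the vertex maps and the helper are invented for illustration): matching by value alone forces a visit to every property of every vertex, which is what g.V().properties().hasValue('test') amounts to.

```java
import java.util.*;

public class PropertyScan {
    // Each vertex is modeled as a property map with arbitrary keys.
    static List<Map<String, Object>> findByAnyValue(List<Map<String, Object>> vertices, Object value) {
        List<Map<String, Object>> hits = new ArrayList<>();
        for (Map<String, Object> v : vertices) {
            // Every property of every vertex is inspected; with unknown keys
            // there is no column/property to index on.
            if (v.containsValue(value)) {
                hits.add(v);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> vertices = List.of(
            Map.of("name", "test", "desc", "a widget"),
            Map.of("name", "other", "desc", "test"),
            Map.of("color", "blue"));
        System.out.println(findByAnyValue(vertices, "test").size()); // 2
    }
}
```

The cost grows with the total number of properties in the graph, which is why a dedicated search index (e.g. Elasticsearch) is the usual fix for frequent value-only searches.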
I have this requirement in ArangoDB AQL: I have a graph built from a document collection for nodes and an edge collection for directed edges.
I want to pass a list of nodes as input to an AQL query and get all their traversals / the resulting subgraph as output.
How can I achieve this in AQL?
I want to know the relations between the given nodes. Please comment if more details are needed.
So far I know this query:
FOR v IN 1..1 INBOUND[or OUTBOUND] 'Collection/_key' EdgeCollection
OPTIONS {bfs: true}
RETURN v
I'd recommend reviewing the queries on the ArangoDB sample page where it shows how it performs graph queries, and how to review the results.
In your sample query above you are only returning v (vertex information) as in FOR v IN.
That returns only the last vertex from every path that the query returns, it doesn't return edge or path information.
For that you need to test with FOR v, e, p IN and it will return extra information about the last edge (e), and the path (p) it took.
In particular look at the results of p as it contains a JSON object that holds path information, which is a collection of vertices and edges.
By iterating through that data you should be able to extract the information you require.
AQL gives you many tools to aggregate, group, filter, de-duplicate, and reduce data sets, so make sure you look at the wider language functions and practice building more complex queries.
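As a minimal sketch of that post-processing step in Java (the path contents are invented; in ArangoDB each p is a JSON object whose vertices array holds the documents along the path): collecting the distinct vertices across all returned paths recovers the node set of the subgraph.

```java
import java.util.*;

public class PathVertices {
    public static void main(String[] args) {
        // Stand-ins for the p.vertices arrays of each path returned by
        // FOR v, e, p IN ... RETURN p
        List<List<String>> pathVertices = List.of(
            List.of("A", "B"),
            List.of("A", "B", "C"));

        // De-duplicate while keeping first-seen order.
        Set<String> subgraphNodes = new LinkedHashSet<>();
        for (List<String> vertices : pathVertices) {
            subgraphNodes.addAll(vertices);
        }
        System.out.println(subgraphNodes); // [A, B, C]
    }
}
```

The same idea applies to p.edges if you also need the subgraph's edge set.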
I have a data model like this:
Person node
Email node
OWNS relationship
LISTS relationship
KNOWS relationship
Each Person OWNS one Email and LISTS multiple Emails (like a contact list; assume 200 contacts per Person).
The query I am trying to write finds all the Persons that OWN an Email that a contact LISTS, and creates a KNOWS relationship between them.
MATCH (n:Person {uid:'123'}) -[r1:LISTS]-> (m:Email) <-[r2:OWNS]- (l:Person)
CREATE UNIQUE (n)-[:KNOWS]->(l)
The counts of my current database is as follows:
Number of Person nodes: 10948
Number of Email nodes: 1951481
Number of OWNS rels: 21882
Number of LISTS rels: 4376340 (Each Person has 200 unique LISTS rels)
Now my problem is that running this query on the current database takes between 4.3 and 4.8 seconds, which is unacceptable for my needs. I wanted to know whether this timing is normal given my data model, or whether I am doing something wrong with the query (or even the model).
Any help would be much appreciated. Also if this is normal for Neo4j please feel free to suggest other graph databases that can handle this kind of model better.
Thank you very much in advance
UPDATE:
My query is: profile match (n: {uid: '4692'}) -[:LISTS]-> (:Email) <-[:OWNS]- (l) create unique (n)-[r:KNOWS]->(l)
The PROFILE command on my query returns this:
Cypher version: CYPHER 2.2, planner: RULE. 3919222 total db hits in 2713 ms.
Yes, 4.5 seconds to match one person from the index along with its listed email addresses, and to merge a relationship from that person to the single owner of each email, is slow.
First, make sure you have an index on the uid property for nodes with the :Person label. Check your indexes with the :schema command and, if it is missing, create one with CREATE INDEX ON :Person(uid).
Secondly, CREATE UNIQUE may or may not do the job, but you will want to use MERGE instead. CREATE UNIQUE is deprecated, and though the two are sometimes equivalent, the operation you want performed is expressed with MERGE.
Thirdly, to find out why the query is slow you can profile it:
PROFILE
MATCH (n:Person {uid:'123'})-[:LISTS]->(m:Email)<-[:OWNS]-(l:Person)
MERGE (n)-[:KNOWS]->(l)
See the Neo4j documentation on profiling and query tuning for details. You may also want to profile your query while forcing the use of one or the other of the cost- and rule-based query planners, to compare their plans.
CYPHER planner=cost
PROFILE
MATCH (n:Person {uid:'123'})-[:LISTS]->(m:Email)<-[:OWNS]-(l:Person)
MERGE (n)-[:KNOWS]->(l)
With these you can hopefully find and correct the problem, or update your question with the information to help others help you find it.
Please explain the question above with an example scenario; I am confused about which is best.
If you want to fetch a specific object from a list based on a keyword or some other identifier, you have to iterate the list, take each object, and compare its values.
In a map you store key/value pairs directly: you pass the key and get the value back.
Example:
Suppose there is a User object with several properties, one of which is a user code.
With a list of User objects you fetch each object one by one and compare its code. With a map you store each User object with its code as the key, then pass the key and get the desired object directly:
map.get("key");
But if your access pattern is not key-based, a list is the better choice, for example when you just display a list of items or need to take sublists.
Too broad a question, but I will try to keep it short:
When you have to get a value based on a key (the key can be anything), go for a HashMap. Consider a telephone directory, where you go to a person's name to find their number.
If you have similar objects and want to store them and later retrieve them by index, or traverse them one by one, go for a list. So if your task is to find all employees older than 50, you can return a list of those employees.
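Both answers boil down to the same trade-off, which can be shown in a short Java sketch (the User class and codes are hypothetical): a list lookup by code scans element by element, while a map keyed by code returns the object directly.

```java
import java.util.*;

public class ListVsMap {
    record User(String code, String name) {}

    public static void main(String[] args) {
        List<User> users = List.of(new User("U1", "Alice"), new User("U2", "Bob"));

        // List: iterate and compare each user's code -- O(n) per lookup.
        User fromList = null;
        for (User u : users) {
            if (u.code().equals("U2")) {
                fromList = u;
                break;
            }
        }

        // Map: index the users by code once, then look up directly -- O(1) average.
        Map<String, User> byCode = new HashMap<>();
        for (User u : users) {
            byCode.put(u.code(), u);
        }
        User fromMap = byCode.get("U2");

        System.out.println(fromList.name() + " / " + fromMap.name()); // Bob / Bob
    }
}
```

If the workload is mostly iteration (display everything, filter by age, take sublists), the list alone is enough; the map pays off once key-based lookups dominate.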