Gremlin with Neptune: How to handle Vertex ID updates?

Case:
A Customer vertex's ~id needs to be updated in certain cases.
In the following case, what is the recommended approach for replacing a vertex's ~id with the new vertex ID?
Is this possible with an upsert, or am I better off deleting the old vertex and then creating a new one with the new vertex ID as its ~id?
We have 20 million vertices, and approximately 2% of them fall into this use case.
Customer Vertex updates

~id   New Vertex Id   Old Vertex Id
AAA   CCC             AAA
BBB   DDD             BBB

Vertex IDs in Neptune are immutable. If this is something that you need to change on a regular basis, then you'll want to make it a property instead.

@Sascha You cannot change a vertex ID once the vertex has been created, as Taylor Riggan already mentioned. Your option is to identify the affected vertices, delete them, and create new vertices with the new IDs.
If you don't want to delete the old vertices until the new ones exist, set a new property, say "tobedeleted", to Y on all the identified old vertices, create the new set of vertices, and then delete the vertices where "tobedeleted" is Y. Keep in mind that dropping a vertex also drops its edges, so recreate any edges you need on the new vertices before the final delete.
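The mark, create, and drop sequence can be sketched on a small in-memory model. This is plain Python, not the Neptune or Gremlin API: the `rekey` helper, the sample properties, and the `placed` edge are hypothetical, and only the ID pairs come from the question.

```python
# In-memory stand-in for a property graph; not the Neptune or Gremlin API.
vertices = {
    "AAA": {"label": "customer", "name": "Alice"},  # hypothetical properties
    "BBB": {"label": "customer", "name": "Bob"},
    "X1": {"label": "order"},
}
edges = [("AAA", "placed", "X1"), ("BBB", "placed", "X1")]  # (from, label, to)


def rekey(vertices, edges, id_map):
    """Copy each old vertex to its new ID, re-point edges, then drop the old ones."""
    # Refuse to overwrite an existing vertex before changing anything.
    for new_id in id_map.values():
        if new_id in vertices:
            raise ValueError(f"target ID {new_id} already exists")
    # Step 1: flag the old vertices.
    for old_id in id_map:
        vertices[old_id]["tobedeleted"] = "Y"
    # Step 2: create the replacements with the same properties, minus the flag.
    for old_id, new_id in id_map.items():
        vertices[new_id] = {
            k: v for k, v in vertices[old_id].items() if k != "tobedeleted"
        }
    # Step 3: re-point every edge that touches an old vertex.
    new_edges = [
        (id_map.get(src, src), label, id_map.get(dst, dst))
        for src, label, dst in edges
    ]
    # Step 4: drop everything that is still flagged.
    flagged = [vid for vid, props in vertices.items() if props.get("tobedeleted") == "Y"]
    for vid in flagged:
        del vertices[vid]
    return new_edges


edges = rekey(vertices, edges, {"AAA": "CCC", "BBB": "DDD"})
print(sorted(vertices), edges)
```

Against a real graph, each step would be its own batch of queries, and the flag makes it possible to see which old vertices are still waiting to be removed.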

Related

Query to identify vertices of Label "A" that have edges to multiple vertices of Label "B"

I have a graph with vertices of label "organization" with edges to, among others, vertices of label "social_account". In this one-to-many relationship, an organization vertex should only have one social_account vertex per distinct social platform. That is to say that an organization can have edges to a "Twitter" social_account vertex, a "Medium" social_account vertex, and an "Instagram" social_account vertex, but it should not have edges to more than one "Twitter" social_account vertex, illustrated here:
[Image: toy data diagram]
I'd like a query which identifies:
organization vertices with more than one social_account vertex for a given social platform
The IDs of the associated social_account vertices which should be dropped, based on their updated_at value.
I have the following query which returns the organization ID and the associated IDs of the social_account vertices to which the organization vertex has an edge.
g.V().hasLabel("organization").as("org", "social_ids").select("org", "social_ids").by(id).by(out().hasLabel('social_account').id().fold())
Given an input "Twitter" I would like to see something along the lines of this for the above toy data:
[org: Org-2, to_drop: [Twitter-39]]
and similar output for a general input corresponding to a social_account "social" property.
I'm relatively new to Gremlin so advice toward efficiently filtering these results to the desired output would be most appreciated.
It helps a lot when you provide a traversal that builds the graph so we can give you a tested answer.
g.addV('Organization').property(id, 'Org-1').as('org1').
  addV('social_account').property(id, 'Twitter-01').property('social', 'Twitter').property('updated_at', 1234).as('twitter01').
  addE('has_social_account').from('org1').to('twitter01').
  addV('social_account').property(id, 'Medium-34').property('social', 'Medium').property('updated_at', 1345).as('medium34').
  addE('has_social_account').from('org1').to('medium34').
  addV('social_account').property(id, 'Insta-55').property('social', 'Instagram').property('updated_at', 4567).as('insta55').
  addE('has_social_account').from('org1').to('insta55').
  addV('Organization').property(id, 'Org-2').as('org2').
  addV('social_account').property(id, 'Twitter-39').property('social', 'Twitter').property('updated_at', 1111).as('twitter39').
  addE('has_social_account').from('org2').to('twitter39').
  addV('social_account').property(id, 'Twitter-60').property('social', 'Twitter').property('updated_at', 2345).as('twitter60').
  addE('has_social_account').from('org2').to('twitter60').
  addV('social_account').property(id, 'Insta-19').property('social', 'Instagram').property('updated_at', 4567).as('insta19').
  addE('has_social_account').from('org2').to('insta19')
And here is the traversal that gets the organizations having more than one Twitter account and the IDs of the Twitter accounts that need to be dropped.
g.V().hasLabel('Organization').where(out('has_social_account').has('social', 'Twitter').count().is(gt(1))).
  project('org', 'to_drop').
    by(id).
    by(out('has_social_account').has('social', 'Twitter').
       order().by('updated_at', desc).
       range(1, -1).id().fold())
The where step keeps only the organizations that have more than one Twitter account. The project step then returns the ID of each such organization together with the IDs of its Twitter accounts that need to be deleted.
To find the accounts to delete, the nested traversal in the second by modulator takes the Twitter accounts associated with the organization and orders them by "updated_at", descending. The range step then skips the first (most recently updated) account, which is the one to keep.
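The same "keep the newest, drop the rest" logic can be checked outside the graph. The following is a plain-Python sketch, not a Gremlin client; the `accounts_to_drop` helper is hypothetical, and the rows copy the toy data from the setup traversal.

```python
from collections import defaultdict

# (org_id, account_id, platform, updated_at) rows mirroring the toy graph.
accounts = [
    ("Org-1", "Twitter-01", "Twitter", 1234),
    ("Org-1", "Medium-34", "Medium", 1345),
    ("Org-1", "Insta-55", "Instagram", 4567),
    ("Org-2", "Twitter-39", "Twitter", 1111),
    ("Org-2", "Twitter-60", "Twitter", 2345),
    ("Org-2", "Insta-19", "Instagram", 4567),
]


def accounts_to_drop(rows, platform):
    """Return, per organization, the stale duplicate accounts for one platform."""
    by_org = defaultdict(list)
    for org, account, social, updated_at in rows:
        if social == platform:
            by_org[org].append((updated_at, account))
    result = []
    for org, items in by_org.items():
        if len(items) > 1:  # same test as count().is(gt(1))
            newest_first = sorted(items, reverse=True)  # order().by('updated_at', desc)
            to_drop = [account for _, account in newest_first[1:]]  # range(1, -1)
            result.append({"org": org, "to_drop": to_drop})
    return result


print(accounts_to_drop(accounts, "Twitter"))
# → [{'org': 'Org-2', 'to_drop': ['Twitter-39']}]
```

Passing a different platform name, such as "Medium", gives the general form the question asks for.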

Gremlin traversal on AWS Neptune

I have a graph structure like this:
Node1(Console) <----Uses--- Node2(Name, Age) -----plays----> Node3(Game)
So, I have three nodes:
Node1 is a console, like a PS3 or a Nintendo.
Node2 is a person with the properties name and age.
Node3 is a game node that holds a game name, like 'warcraft'.
Now I want a Gremlin query that tells me:
How many users (Node2) who have a console like the PS3 play a game like 'warcraft'?
I think I need to start the traversal from Node2 and filter it based on the Node1 property, i.e. console is 'PS3', and on playing a game like 'warcraft'.
I am new to Gremlin and am using something like this:
g.V().hasLabel('user').outE('uses').inV().has('console', 'ps3').count()
The above query only gives half of the result I need. How do I filter Node2 based on the plays relationship as well?
Any help is appreciated.
There are multiple ways to write the query.
Option 1: Start from console
g.V().has('console', 'ps3').in('uses').where(out('plays').has('game', 'warcraft')).valueMap('name')
Let me explain the structure here:
g.V().has('console', 'ps3') --> Select all vertices that have a property with key console and value ps3.
in('uses') --> From those vertices, follow incoming edges labelled uses to the vertices at the other end. At this stage, the solutions are the player vertices.
where(out('plays').has('game', 'warcraft')) --> Filter the existing solutions. Because the condition sits inside where, the traversal stays on the player vertices instead of moving to the game vertices.
valueMap('name') --> Project one or more properties of the existing solutions, which are the player vertices.
Option 2: Another way to write above query
g.V().has('console', 'ps3').in('uses').as('myusers').out('plays').has('game', 'warcraft').select('myusers').by('name')
as('myusers') --> Labels this step. It does not collect the results into a separate store; it only marks this position in each traverser's path so that a later step can refer back to it.
out('plays').has('game', 'warcraft') --> Unlike the previous query, where we stayed put because of where, this time we move on to the game vertices.
select('myusers').by('name') --> We want to project the users, but the current solutions are game vertices, so we select the user vertices through the label we set earlier.
Option 3: Start from user
g.V().hasLabel('user').where(out('plays').has('game','warcraft')).where(out('uses').has('console','ps3')).valueMap('name')
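There are more ways to write this query, such as using path(), but I won't go into details here. Since the question asks how many users match, replace the final valueMap or select step with count() to get a number instead of the names.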
Since you are beginning to learn Gremlin, I would recommend that you start with https://kelvinlawrence.net/book/Gremlin-Graph-Guide.html
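All three options apply the same two conditions to each user: it must have a uses edge to the console and a plays edge to the game. A plain-Python sketch of that filter, on hypothetical sample data (the user names and the `users_with` helper are illustrative, not part of the question), looks like this:

```python
# Hypothetical sample data: user -> consoles used, user -> games played.
uses = {"ann": {"ps3"}, "ben": {"ps3", "nintendo"}, "cat": {"nintendo"}}
plays = {"ann": {"warcraft"}, "ben": {"tetris"}, "cat": {"warcraft"}}


def users_with(console, game):
    """Keep the users that satisfy both conditions, like two chained where() steps."""
    return sorted(
        user
        for user in uses
        if console in uses[user] and game in plays.get(user, set())
    )


matching = users_with("ps3", "warcraft")
print(matching, len(matching))
# → ['ann'] 1
```

Here ben is excluded by the game condition and cat by the console condition, which is why filtering on only one edge gives half of the answer.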

Janusgraph - How to hide the edge relation between two vertices and establish / retrieve again based on a condition?

I'm new to the JanusGraph database. I have a requirement where I need to hide the relation (edge) between two vertices without dropping them, and later I should be able to retrieve or re-establish the same relation between those vertices based on a condition.
I only know how to drop edges; I don't know how to retrieve or restore the relation afterwards. Could you please help me out here?
Thanks a lot for your time.
If you want to 'restore' the connections, I think you shouldn't drop them at all.
Just keep a property on the edge that indicates its state (active/inactive), or keep a start and end date on the edge.
This way, when you traverse the graph you need to make sure to use only the active edges, but the old ones can still easily be found if you want to restore them.
For example (this assumes a graph that accepts user-supplied vertex IDs, such as TinkerGraph; otherwise look the vertices up by a property such as name):
g.addV('person').property(id, 'bob').property('name', 'Bob')
g.addV('person').property(id, 'alice').property('name', 'Alice')
g.addV('person').property(id, 'eve').property('name', 'Eve')
g.V('bob').addE('friend').to(__.V('alice'))
g.V('bob').addE('friend').to(__.V('eve'))
So Bob is friends with Alice and Eve:
g.V('bob').out('friend').values("name")
==>Alice
==>Eve
Let's say Bob and Alice had a falling-out and are no longer friends:
g.V('bob').outE('friend').where(inV().hasId('alice')).property('status', 'inactive')
Now you can query only Bob's active friends, without dropping the old edges:
g.V('bob').outE('friend').not(has('status', 'inactive')).inV().values("name")
==>Eve
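Restoring the relation is then just another property update. The following plain-Python sketch models the edges as dictionaries to show the hide and restore round trip; it is not the JanusGraph API, and the `set_status` and `active_friends` helpers are hypothetical.

```python
# Plain-Python stand-in for the friend edges; not the JanusGraph API.
edges = [
    {"out": "bob", "label": "friend", "in": "alice", "props": {}},
    {"out": "bob", "label": "friend", "in": "eve", "props": {}},
]


def set_status(out_v, in_v, status):
    """Hide or restore an edge by updating its 'status' property."""
    for edge in edges:
        if edge["out"] == out_v and edge["in"] == in_v:
            edge["props"]["status"] = status


def active_friends(out_v):
    """Follow only the friend edges that are not marked inactive."""
    return [
        edge["in"]
        for edge in edges
        if edge["out"] == out_v
        and edge["label"] == "friend"
        and edge["props"].get("status") != "inactive"
    ]


set_status("bob", "alice", "inactive")  # hide the edge
hidden = active_friends("bob")
set_status("bob", "alice", "active")  # restore it
restored = active_friends("bob")
print(hidden, restored)
# → ['eve'] ['alice', 'eve']
```

Because the edge is never removed, any other properties it carries survive the round trip.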

neo4j Cypher Query

I have the following graph in a Neo4j graph database, and using the Cypher query language I want to retrieve all the data that is connected to the root node and its child nodes.
For example:
[Image: graph diagram. Node 1 has two children, and those children in turn have many children connected by the same relationship. What I want is to query node 1 with Cypher and get back the data of its child nodes, their child nodes, and so on. The relationship between the nodes is "Parent_of".]
Can anyone help me with this?
start n=node(1) // use the id, or find it using an index
match n-[:parent_of*0..]->m
return m
This returns node 1 itself (because of the lower bound 0) and every node reachable from it through parent_of relationships, bound to m. You could also return m.some_property instead of m if you don't want the node itself but some property stored on your nodes.
Careful, though: because the path length has no upper limit, this query could become very expensive on a large graph.
You can see an example of *0.. here: http://gist.neo4j.org/?6608600
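To see what an unbounded pattern reaches and how an upper bound (for example `*0..3` in place of `*0..`) limits it, here is a plain-Python breadth-first sketch over a hypothetical parent_of tree. The node numbers and the `descendants` helper are made up for illustration.

```python
from collections import deque

# Hypothetical tree: node -> children reached through parent_of.
parent_of = {1: [2, 3], 2: [4, 5], 3: [6], 4: [], 5: [7], 6: [], 7: []}


def descendants(root, max_depth=None):
    """Return root plus every node reachable from it, optionally capped by depth."""
    reached = [root]  # the root is included, like the lower bound 0 in *0..
    visited = {root}
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        if max_depth is not None and depth >= max_depth:
            continue  # do not expand beyond the cap
        for child in parent_of.get(node, []):
            if child not in visited:
                visited.add(child)
                reached.append(child)
                queue.append((child, depth + 1))
    return reached


print(descendants(1))  # → [1, 2, 3, 4, 5, 6, 7]
print(descendants(1, max_depth=1))  # → [1, 2, 3]
```

In a tree each node is reached by exactly one path, so this matches the nodes the query returns; in a graph with several paths to the same node, Cypher returns one row per path unless DISTINCT is used.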

How to store and retrieve different types of Vertices with the Tinkerpop/Blueprints graph API?

Looking at the Tinkerpop-Blueprints API, it is quite straightforward to use one type of vertex, but how can I store two? E.g. users and their interests?
And how can I get a vertex by ID? There could be a user named 'timetabling' as well as an interest named 'timetabling'. How do I handle that ID conflict?
-
I know that the first problem could be solved by introducing an index on a type property, and for the second problem I could auto-generate the ID and create another index on the name property. BUT why would I then need the vertex ID at all? E.g. the in-memory implementation keeps a HashMap of all vertices, which would be of no use and would waste memory! (I could solve the problem differently by combining type and name as the ID, but then it would be inefficient if I e.g. list all users.)
Hmmh, ok. I'm just using the combined ID (name+type) for the vertices and a separate index for type. Better solutions?
In general it is best to rely on the automatic ID system of the underlying graph database (e.g. Neo4j, InfiniteGraph, OrientDB, etc.). The way in which you would add the information you want is as follows:
Vertex v = graph.addVertex(null)  // null lets the graph assign the ID
v.setProperty("name","timetabling")
Vertex marko = graph.addVertex(null)
graph.addEdge(null, marko, v, "hasInterest")  // marko -hasInterest-> v
Vertex aType = graph.addVertex(null)
graph.addEdge(null, aType, v, "hasType")  // aType -hasType-> v
In short, the ID of a vertex/edge is a non-domain-specific way of retrieving vertices/edges. Generally, it is best to use properties in your domain model for indexing.
Hope that speaks to your question,
Marko.
http://markorodriguez.com
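The idea of generated IDs plus indexed domain properties can be sketched without any graph library. The `TinyGraph` class below is a hypothetical plain-Python model, not the Blueprints API, and it models the type as a property (the approach from the question) rather than as a separate type vertex.

```python
import itertools
from collections import defaultdict


class TinyGraph:
    """Hypothetical in-memory model: generated IDs plus a property index."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.vertices = {}  # generated id -> properties
        self.index = defaultdict(set)  # (key, value) -> vertex ids

    def add_vertex(self, **props):
        vid = next(self._ids)  # the ID carries no domain meaning
        self.vertices[vid] = props
        for item in props.items():
            self.index[item].add(vid)
        return vid

    def find(self, **props):
        """Look vertices up by domain properties instead of by ID."""
        matches = [self.index.get(item, set()) for item in props.items()]
        if not matches:
            return sorted(self.vertices)
        return sorted(set.intersection(*matches))


graph = TinyGraph()
user = graph.add_vertex(type="user", name="timetabling")
interest = graph.add_vertex(type="interest", name="timetabling")
print(graph.find(name="timetabling"))  # → [1, 2]
print(graph.find(type="user", name="timetabling"))  # → [1]
```

The two 'timetabling' vertices never collide because their IDs are generated, and listing all users is a single index lookup on the type property.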
