I'm creating a Titan graph (backed by DynamoDB); I'm using Titan 1.0.0 and running Gremlin Server 3 (on TinkerPop 3).
I'm trying to add a vertex to my graph with a label and multiple properties in a single line. I'm able to add a vertex with a label and a single property, and I can add multiple properties to a vertex after it has been created, but it seems that I can't do it all at once.
For testing I'm running commands in the Gremlin shell, but the end use case is interacting with it via a REST API (which is already working fine).
As a note, I'm rolling back after each of these transactions so I have a clean slate.
Here is how I'm initiating my session:
gremlin> graph = TitanFactory.open('conf/gremlin-server/dynamodb.properties')
==>standardtitangraph[com.amazon.titan.diskstorage.dynamodb.DynamoDBStoreManager:[127.0.0.1]]
gremlin> g = graph.traversal()
==>graphtraversalsource[standardtitangraph[com.amazon.titan.diskstorage.dynamodb.DynamoDBStoreManager:[127.0.0.1]], standard]
I can create a vertex with a label and a single property like this:
gremlin> graph.addVertex('date_of_birth').property('date_of_birth','1949-01-01')
==>vp[date_of_birth->1949-01-01]
gremlin> g.V().hasLabel('date_of_birth').has('date_of_birth','1949-01-01').valueMap()
==>[date_of_birth:[1949-01-01]]
I can also create a vertex and then append many properties afterward with a traversal starting at the vertex I just created:
gremlin> v1 = graph.addVertex('date_of_birth')
==>v[409608296]
gremlin> g.V(v1).property('date_of_birth','1949-01-01').property('year_of_birth',1949).property('date_of_birth','1949-01-01').property('day_of_birth',1).property('age',67).property('month_of_birth',1)
==>v[409608296]
gremlin> g.V(v1).valueMap()
==>[day_of_birth:[1], date_of_birth:[1949-01-01], month_of_birth:[1], age:[67], year_of_birth:[1949]]
This is all well and good, but I'm trying to avoid making 2 calls to achieve this result, so I'd like to create the vertex with all of these properties at once. Essentially, I want to be able to do something like the following, but it fails with more than 1 .property():
gremlin> graph.addVertex('date_of_birth').property('date_of_birth','1949-01-01').property('year_of_birth',1949).property('date_of_birth','1949-01-01').property('day_of_birth',1).property('age',67).property('month_of_birth',1)
No signature of method: com.thinkaurelius.titan.graphdb.relations.SimpleTitanProperty.property() is applicable for argument types: (java.lang.String, java.lang.String) values: [date_of_birth, 1949-01-01]
I've also tried using 1 .property() with multiple properties (along with all other syntax variations I could think of), but it only seems to catch the first one:
gremlin> graph.addVertex('date_of_birth').property('date_of_birth','1949-01-01','year_of_birth',1949,'date_of_birth','1949-01-01','day_of_birth',1,'age',67,'month_of_birth',1)
gremlin> g.V().hasLabel('date_of_birth').has('date_of_birth','1949-01-01').valueMap()
==>[date_of_birth:[1949-01-01]]
I've looked through all of the documentation I can get my hands on from all sources I can find and I can't find anything on this "all at once" method. Has anyone done this before or know how it could be done?
Thanks in advance!
As described in Chapter 3 Getting Started of the Titan docs, the GraphOfTheGodsFactory.java source code shows how to add a vertex with a label and multiple properties.
saturn = graph.addVertex(T.label, "titan", "name", "saturn", "age", 10000);
The method addVertex(Object... keyValues) ultimately comes from the Graph interface defined by Apache TinkerPop. Titan 1.0.0 uses TinkerPop 3.0.1, and you can find more documentation on the addVertex step (and many other steps) in the TinkerPop docs.
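To make the alternating key/value convention concrete, here is a toy model in plain Python. It is only a sketch of the idea, not the Titan or TinkerPop API: `add_vertex` and `LABEL` are made-up names for illustration, and the default label of "vertex" is an assumption of this sketch.

```python
# Toy model of addVertex(Object... keyValues); not the real TinkerPop API.
LABEL = object()  # hypothetical stand-in for T.label


def add_vertex(*key_values):
    """Split alternating key/value arguments into a label and a property dict."""
    if len(key_values) % 2 != 0:
        raise ValueError("key_values must contain an even number of items")
    label = "vertex"  # assumed default label for this sketch
    properties = {}
    for key, value in zip(key_values[0::2], key_values[1::2]):
        if key is LABEL:
            label = value
        else:
            properties[key] = value
    return {"label": label, "properties": properties}


# Mirrors the shape of the saturn example: label first, then property pairs.
saturn = add_vertex(LABEL, "titan", "name", "saturn", "age", 10000)
print(saturn)
```

The point of the sketch is that the label and every property travel in one call, so there is no need to chain .property() onto the returned value.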
I am completely new to Gremlin and have some really old code that uses addInE() and addOutE(). I understand that they are deprecated as of release 3.1.0 and, according to the javadocs, should be replaced with addE().
My problem is that I have very little knowledge of Gremlin in general and found almost no documentation for the addInE() and addOutE() steps.
In the reference documentation for version 3.0.0 there is exactly one example where it is used, but not explained.
Here is the example that is given:
gremlin> g.V(1).as('a').out('created').in('created').where(neq('a')).addOutE('co-developer','a','year',2009) //(1)
==>e[12][4-co-developer->1]
==>e[13][6-co-developer->1]
gremlin> g.withSideEffect('a',g.V(3,5).toList()).V(4).addInE('createdBy','a') //(2)
==>e[14][3-createdBy->4]
==>e[15][5-createdBy->4]
gremlin> g.V().as('a').out('created').as('b').select('a','b').addOutE('b','createdBy','a','acl','public') //(3)
==>e[16][3-createdBy->1]
==>e[17][5-createdBy->4]
==>e[18][3-createdBy->4]
==>e[19][3-createdBy->6]
gremlin> g.V(1).as('a').out('knows').addInE('livesNear','a','year',2009).inV().inE('livesNear').values('year') //(4)
==>2009
==>2009
My current interpretation of the first query
g.V(1).as('a').out('created').in('created').where(neq('a')) selects elements from the graph
addOutE('co-developer','a','year',2009) will add something to the selection
I would appreciate if someone could first elaborate on what is happening here and then point out how addInE() and addOutE() could be represented using addE().
This is a trip down memory lane!
Using one of the examples you found
gremlin> g.V(1).as('a').out('created').in('created').where(neq('a')).addOutE('co-developer','a','year',2009)
would, in current-day Gremlin, be written as
g.V(1).as('a').
  out('created').
  in('created').
  where(neq('a')).
  addE('co-developer').to('a').property('year',2009)
The way to read this is
Starting at the vertex with an ID of one.
Find all the vertices connected to V(1) by outgoing 'created' edges
Find all the people who also created the same thing
Don't include where you started (ie ignore yourself)
Add a new 'co-developer' edge to V(1) from the people found, with an edge property of the year.
When replacing addInE, still use the addE step, but replace the to with a from. Note that addE can also take both from and to at the same time.
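The direction rule can be checked against the console output quoted in the question (e[12][4-co-developer->1] and e[14][3-createdBy->4]). The following is a toy model in plain Python using tuples, not the TinkerPop API; the function names are made up for illustration.

```python
# Toy model of edge direction as (out_vertex, label, in_vertex) tuples.
# Plain Python for illustration only; nothing here calls TinkerPop.

def add_out_e(current, label, other):
    """Legacy addOutE: the edge leaves the current vertex.

    Modern equivalent: addE(label).to(other)
    """
    return (current, label, other)


def add_in_e(current, label, other):
    """Legacy addInE: the edge enters the current vertex.

    Modern equivalent: addE(label).from(other)
    """
    return (other, label, current)


# Traverser is at v[4], 'a' refers to v[1]: matches e[12][4-co-developer->1].
print(add_out_e(4, "co-developer", 1))
# Traverser is at v[4], 'a' refers to v[3]: matches e[14][3-createdBy->4].
print(add_in_e(4, "createdBy", 3))
```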
I'm using Amazon Neptune for a personal project and I'm having trouble writing a query. I'm using Gremlin-Java to query and mutate the graph.
The best way I can think of to represent the desired outputs for different inputs is with diagrams, so below is the problem statement in diagram form, broken into 5 cases. Lastly, a full solution isn't necessary (though would be accepted) - even a nudge in the right direction would be much appreciated!
Here's the code I have that should perform this mutation, but it must have a flaw (or multiple flaws!) because it doesn't mutate the graph at all in Case 1 and I've thus been unable to test Case 2. Case 3, Case 4, and Case 5 would also not exhibit the desired behavior with this code.
void createShard(String username, String shardName, Set<String> inheritedShardNames, Set<String> inheritedUsers) {
    g.V()
        .hasLabel("shard")
        .has("shardName", P.within(inheritedShardNames))
        .as("inheritedShards")
        .V()
        .hasLabel("user")
        .has("username", P.within(inheritedUsers))
        .as("inheritedUsers")
        .V()
        .has("user", "username", username)
        .as("user")
        .addV("shard")
        .property(single, "shardName", shardName)
        .property(single, "createdAt", new Date())
        .as("newShard")
        .addE("shardInheritsShard").from("newShard").to("inheritedShards")
        .addE("shardInheritsUser").from("newShard").to("inheritedUsers")
        .addE("userOwnsShard").from("user").to("newShard")
        .addE("userFollowsShard").from("user").to("newShard")
        .iterate();
}
Case 1
Function call:
createShard(
"john",
"newShard",
new HashSet<>(),
new HashSet<>()
)
Initial Graph State UML
Desired Final Graph State UML
Case 2
Function call:
createShard(
"john",
"newShard",
new HashSet<>(Arrays.asList("firstShardToInherit", "secondShardToInherit")),
new HashSet<>(Arrays.asList("john", "userToInherit"))
)
Initial Graph State UML
Desired Final Graph State UML
Case 3
Function call:
createShard(
"john",
"newShard",
new HashSet<>(Arrays.asList("userWhoDoesNotExist")),
new HashSet<>()
)
Initial Graph State UML
Desired Final Graph State: Same as initial graph state. Would be ideal if the mutation query would throw an exception like java.util.NoSuchElementException or give some other indication that the graph wasn't mutated by the query.
Case 4
Function call:
createShard(
"john",
"newShard",
new HashSet<>(Arrays.asList()),
new HashSet<>(Arrays.asList("shardWhichDoesNotExist"))
)
Initial Graph State UML
Desired Final Graph State: Same as initial graph state. Would be ideal if the mutation query would throw an exception like java.util.NoSuchElementException or give some other indication that the graph wasn't mutated by the query.
Case 5
Function call:
createShard(
"john",
"existingShardName",
new HashSet<>(),
new HashSet<>()
)
Initial Graph State UML
Desired Final Graph State: Same as initial graph state. Would be ideal if the mutation query would throw an exception or give some other indication that the graph wasn't mutated by the query because a vertex with the same label and "shardName" property already exists.
This is a complex question, but I think your problems stem from assuming that more of your traversal will execute than actually does. Let's take "Case 1" as an example, where the graph starts with just one "user" vertex whose "username" is "john". Your traversal begins as:
g.V().hasLabel("shard").
  has("shardName", P.within(inheritedShardNames)).as("inheritedShards")
As there are no "shard" vertices in the graph, the traversal produces no traversers (i.e. objects that travel in the stream to carry the data) and terminates immediately, as there is nothing to trigger downstream steps. It seems that you want some form of upsert or "get or create" pattern to solve your problem.
A quick example in Gremlin Console demonstrates the issue:
gremlin> g = TinkerFactory.createModern().traversal()
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
gremlin> g.V().hasLabel('country').addV('person')
gremlin> g.V().hasLabel('country').fold().addV('person')
==>v[13]
The first g.V().hasLabel('country').addV('person') does nothing because there are no "country" vertices in the graph and therefore addV() is never triggered. On the other hand, the traversal following that one does add a vertex because calling fold() is a reducing operation that will produce a traverser carrying a List object that is empty, which in turn enables addV() to be called.
The patterns for upsert-style traversals with Gremlin are described in a variety of places. Please see:
https://stackoverflow.com/a/49758568/1831717
https://tinkerpop.apache.org/docs/current/recipes/#element-existence
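The difference between the two console traversals above can be mimicked with a toy model in plain Python, where each step is a generator over incoming traversers. This is only an illustration of the streaming behaviour under that assumption; the function names are made up and nothing here is TinkerPop code.

```python
# Toy model of a traverser stream using Python generators; not TinkerPop itself.

def has_label(traversers, label):
    """Filter step: pass through only the vertices with the given label."""
    return (v for v in traversers if v["label"] == label)


def fold(traversers):
    """Reducing step: always emits exactly one traverser, a (possibly empty) list."""
    yield list(traversers)


def add_v(traversers, label, graph):
    """Side-effecting step: runs once per incoming traverser."""
    for _ in traversers:
        vertex = {"label": label}
        graph.append(vertex)
        yield vertex


graph = [{"label": "person"}, {"label": "software"}]

# No "country" vertices exist, so add_v receives nothing and never runs.
without_fold = list(add_v(has_label(list(graph), "country"), "person", graph))

# fold() emits one empty list, which is enough to trigger add_v once.
with_fold = list(add_v(fold(has_label(list(graph), "country")), "person", graph))

print(len(without_fold), len(with_fold), len(graph))
```

In the createShard traversal the same thing happens at the very first has() filter: once the stream is empty, none of the later addV() or addE() steps are ever reached.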
I have been stuck on this for longer than I would like. How can I add edges between vertices? I have 50 vertices, with vertex IDs 0 to 49, and I can't find a way to add edges between them. So far I have tried:
gremlin> (0..<50)each{g.addEdge(V[it],V[it+1]).next()}
No such property: V for class: groovysh_evaluate
gremlin> (0..<=49)each{g.addEdge(g.getVertex([NodeID]),g.getVertex([NodeID+1]),'abc')}
groovysh_parse: 2: unexpected token: = # line 2, column 6.
(0..<=49)each{g.addEdge(g.getVertex([NodeID]),g.getVertex([NodeID+1]),'abc')}
^
1 error
It looks like you just want to iterate through the vertices and add an edge from one vertex to the next until they are all connected. First, I'll create the 50 vertices:
gremlin> g.inject((0..<50).toArray()).as('i').addV('myid',select('i')).iterate()
Then I'll add the edges:
gremlin> (0..<49).each { def v = g.V().has('myid',(long) it).next(); v.addEdge('knows',g.V().has('myid',(long)it+1).next()) }
I cast to "long" in my example above as I was using a TinkerGraph. That cast may not be necessary for DynamoDB. Note that you can combine all of this into a single line with:
gremlin> g.addV().repeat(__.as('a').addV().as('b').
                  select(last,'a','b').
                  addE('.').from('a').to('b').
                  inV().as('a')).
           times(49)
The above will create both the vertices and the edges at the same time in an iterative fashion. Note that "49" represents the number of edges you'd like to have.
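To see why the loop runs over 49 values rather than 50, here is a small plain-Python sketch of the edge pairs for a chain of vertices. It involves no graph database at all; `chain_edges` is a made-up helper for illustration.

```python
# Pure-Python sketch of the (from, to) pairs for a chain of vertices.

def chain_edges(vertex_count):
    """Return (from_id, to_id) pairs linking vertex i to vertex i + 1."""
    return [(i, i + 1) for i in range(vertex_count - 1)]


edges = chain_edges(50)
print(len(edges))           # 49
print(edges[0], edges[-1])  # (0, 1) (48, 49)
```

Looping over all 50 ids, as in the first attempt in the question, would ask for a vertex with id 50 on the last iteration, which does not exist.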
You have posted this same question under multiple tags on StackOverflow. In all cases you have lots of basic syntax errors and are calling methods that don't exist and referencing objects that don't exist. I suggest you focus more on the basics of Java and Groovy before digging too deeply into DynamoDB and TinkerPop. At a minimum, start with the TinkerPop tutorials (like the one mentioned in the comment to your question) to get a better feel for the APIs and how the programming syntax works.
What is the easiest & most efficient way to count the number of nodes/edges in a large graph via Gremlin? The best I have found is using the V iterator:
gremlin> g.V.gather{it.size()}
However, this is not a viable option for large graphs, per the documentation for V:
The vertex iterator for the graph. Utilize this to iterate through all
the vertices in the graph. Use with care on large graphs unless used
in combination with a key index lookup.
I think the preferred way to do a count of all vertices would be:
gremlin> g = TinkerGraphFactory.createTinkerGraph()
==>tinkergraph[vertices:6 edges:6]
gremlin> g.V.count()
==>6
gremlin> g.E.count()
==>6
Though I think that on a very large graph g.V/E just breaks down no matter what you do. On a very large graph the best option for doing a count is to use a tool like Faunus (http://thinkaurelius.github.io/faunus/) so that you can leverage the power of Hadoop to do the counts in parallel.
UPDATE: The original answer above was for TinkerPop 2.x. For TinkerPop 3.x the answer is largely the same and implies use of Gremlin Spark or some provider specific tooling (like DSE GraphFrames for DataStax Graph) that is optimized to do those kinds of large scale traversals.
I tried the above and it didn't work for me. For some of you, this may work:
gremlin> g.V.count()
{"detailedMessage":"Query parsing failed at line 1, character position at 3, error message : no viable alternative at input 'g.V.'","code":"MalformedQueryException","requestId":"99f749db-c240-9834-aa12-e17bb21e598e"}
Type ':help' or ':h' for help.
Display stack trace? [yN]
gremlin> g.V().count()
==>37
gremlin> g.E().count()
==>45
gremlin>
Use g.V().count() instead of g.V.count() (for those where the other command errors out).
via python:
from gremlin_python.structure.graph import Graph
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
graph = Graph()
graph_db_uri = 'ws://localhost/gremlin'
g = graph.traversal().withRemote(DriverRemoteConnection(graph_db_uri,'g'))
count=g.V().hasLabel('node_label').count().next()
print("vertex count: ",count)
count=g.E().hasLabel('edge_label').count().next()
print("edge count: ",count)
I need to use Gremlin to find the shortest path between two nodes (vertices) while avoiding a list of given vertices.
I already have:
v.bothE.bothV.loop(2){!it.object.equals(y)}.paths>>1
To get my shortest path.
I was attempting something like:
v.bothE.bothV.filter{it.name!="ignored"}.loop(3){!it.object.equals(y)}.paths>>1
but it does not seem to work.
Please HELP!!!
The second solution you have looks correct. However, to be clear about what you are trying to accomplish: if x and y are the vertices you want to find the shortest path between, and a vertex should be ignored during the traversal if it has the property name:"ignored", then the query is:
x.both.filter{it.name!="ignored"}.loop(2){!it.object.equals(y)}.paths>>1
If the "list of given vertices" you want filtered is actually a list, then the traversal is described as such:
list = [ ... ] // construct some list
x.both.except(list).loop(2){!it.object.equals(y)}.paths>>1
Moreover, I tend to use a range filter just to be safe as this will go into an infinite loop if you forget the >>1 :)
x.both.except(list).loop(2){!it.object.equals(y)}[1].paths>>1
Also, if there is a potential for no path, then to avoid an infinitely long search, you can do a loop limit (e.g. no more than 4 steps):
x.both.except(list).loop(2){!it.object.equals(y) & it.loops < 5}.filter{it.object.equals(y)}.paths>>1
Note why the last filter step before paths is needed. There are two reasons the loop can be broken out of, so you might not be at y when you leave it (you may instead have left because it.loops < 5 no longer held).
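For readers who want to see the same idea outside Gremlin, here is a plain-Python sketch. It is not a translation of the traversal: it swaps in an ordinary breadth-first search with a visited set, over a hypothetical adjacency dict, but it combines the same three ingredients (excluded vertices, a hop limit, and a final check that the goal was actually reached).

```python
from collections import deque


def shortest_path(adjacency, start, goal, excluded=(), max_hops=4):
    """Breadth-first search over an undirected adjacency dict.

    Returns the first (shortest) path from start to goal that avoids every
    vertex in `excluded` and uses at most `max_hops` edges, or None.
    """
    excluded = set(excluded)
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        current = path[-1]
        if current == goal:
            return path
        if len(path) - 1 >= max_hops:
            continue  # hop limit reached; do not expand this path further
        for neighbour in adjacency.get(current, ()):
            if neighbour in excluded or neighbour in visited:
                continue
            visited.add(neighbour)
            queue.append(path + [neighbour])
    return None


# Hypothetical graph: x-a-y is the short route, x-b-c-y the detour.
adjacency = {
    "x": ["a", "b"],
    "a": ["x", "y"],
    "b": ["x", "c"],
    "c": ["b", "y"],
    "y": ["a", "c"],
}

print(shortest_path(adjacency, "x", "y"))
print(shortest_path(adjacency, "x", "y", excluded=["a"]))
print(shortest_path(adjacency, "x", "y", excluded=["a"], max_hops=2))
```

Returning None when the queue empties plays the role of the final filter step: running out of hops is not the same as arriving at y.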
Here is your solution implemented over the Grateful Dead graph distributed with Gremlin. First some setup code, where we load the graph and define two vertices x and y:
gremlin> g = new TinkerGraph()
==>tinkergraph[vertices:0 edges:0]
gremlin> g.loadGraphML('data/graph-example-2.xml')
==>null
gremlin> x = g.v(89)
==>v[89]
gremlin> y = g.v(100)
==>v[100]
gremlin> x.name
==>DARK STAR
gremlin> y.name
==>BROWN EYED WOMEN
Now your traversal. Note that there is no name:"ignored" property in this graph, so instead I altered the filter to use the number of performances of each song along the path. Thus, this finds the shortest path through songs played more than 10 times in concert:
gremlin> x.both.filter{it.performances > 10}.loop(2){!it.object.equals(y)}.paths>>1
==>v[89]
==>v[26]
==>v[100]
If you use Gremlin 1.2+, then you can use a path closure to provide the names of those vertices (for example) instead of just the raw vertex objects:
gremlin> x.both.filter{it.performances > 10}.loop(2){!it.object.equals(y)}.paths{it.name}>>1
==>DARK STAR
==>PROMISED LAND
==>BROWN EYED WOMEN
I hope that helps.
Good luck!
Marko.