In Apache Tinkerpop Gremlin load csv file(text file)? - gremlin

I am using the following script in gremlin to create a graph by using CSV file (text file):
code:
g = TinkerGraph.open()
vs=[ ] as Set
new File("edges.txt").eachLine{l->p=l.split("\t");vs<<p[0];vs<<p1;}
vs.each{v->g.addVertex(v)}
new File("edges.txt").eachLine{l->p=l.split("\t");g.addEdge(g.getVertex(p[0]),g.getVertex(p1),"friend")}
g.E
When the above code is applied these errors occur:
No signature of method: org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph.getVertex() is applicable for argument types: (String) values: 1
Possible solutions: addVertex([Ljava.lang.Object;), getAt(java.lang.String)
Note that csv file("edges.txt") is like this:
Source Destination
2 4
3 5
2 8

I changed your code to use the Traversal API, as there is no addEdge method exported by TinkerGraph (addEdge is actually a method of the Vertex class). In general using the Traversal API is the way to go for application code. The Graph API is more for backend developers these days (database providers).
g = TinkerGraph.open().traversal()
vs=[ ] as Set
new File(edges.txt").eachLine{l->p=l.split(",");vs<<p[0];vs<<p1;}
vs.each{v->g.addV().property(T.id,v).next()}
new File("edges.txt").eachLine{l->p=l.split(",");g.addE("friend").from(V(p[0])).to(V(p[1])).next()}
I ran this in the gremlin console and after all the steps above have been run, we can see the graph has been created. Note I changed the separator to be a comma, just as it was easier to type when I was creating the test file.
gremlin> g.E()
==>e[0][2-friend->4]
==>e[1][3-friend->5]
==>e[2][2-friend->8]

Related

Gremlin: Adding Edges to Graph Over HTTP using Vertex Variables

I am trying to execute a gremlin script over https against a remote JanusGraph instance. I have filtered my problem to the part where I am trying to add an edge using vertex variables. I am trying to add two vertices, assign the results to a variable and use them to add an edge. Also I am also tying to avoid a single line script like g.V().addV(..).aaddV(..).addE(..), because of the program logic that is behind the script
The following gremlin works in the gremlin console (remote session)
def graph=ConfiguredGraphFactory.open("ga");
def g = graph.traversal();
v1=g.addV('node1');
v2=g.addV('node2');
v1.addE('test').to(v2);
But when I try to do the same over https (issued against a compose-janusgraph server), I get an error. I did add .iterate() to the addV() and the vertexes are getting added if I remove the addE(..) line. But when I try
{"gremlin":"def graph=ConfiguredGraphFactory.open('ga');
def g = graph.traversal();
v1=g.addV('node16').property('name','testn16').iterate();
v2=g.addV('node17').property('name','testn2').iterate();
v1.addE('test18').to(v2);
g.tx().commit()"}
I get the exception
The traversal strategies are complete and the traversal can no longer
be modulated","Exception-Class":"java.lang.IllegalStateException"
Also note that I am joining the whole gremlin into one single line before sending it over curl. I have split them to newlines here for readability. Any help would be great. -- Thank you
iterate() doesn't return a Vertex...it just iterates the traversal to generate side-effects (i.e. the graph gets a vertex added but no result is returned). You probably just need to do:
{"gremlin":"graph=ConfiguredGraphFactory.open('ga');
g = graph.traversal();
g.addV('node16').property('name','testn16').as('v1').
addV('node17').property('name','testn2').as('v2').
addE('test18').from('v1').to('v2').iterate();
g.tx().commit()"}

Printing/Fetching Vertex values from a path

Just getting started with gremlin.
Printing out all the Vertex values worked out fine
gremlin> g.V().values()
==>testing 2
==>Cash Processing
==>Sales
==>Marketing
==>Accounting
I was able to find all the directly connected path between my Vertices.
gremlin> g.V().hasLabel('Process')
.repeat(both().simplePath())
.until(hasLabel('Process'))
.dedup().path()
==>[v[25],v[28]]
==>[v[25],v[26]]
==>[v[26],v[27]]
==>[v[26],v[25]]
Now am trying to print out the values in the path like ['Sales', 'Accounting'] instead of [v[25],v[28]]
Not been able to figure out a way yet
Already tried and failed with
Unfold: Does not get me 1-1 mapping
gremlin> g.V().hasLabel('Process').repeat(both().simplePath()).until(hasLabel('Process')).dedup().path().unfold().values()
==>Cash Processing
==>Accounting
==>Cash Processing
==>Sales
==>Sales
==>Marketing
==>Sales
==>Cash Processing
Path seems to be of a different data-type and does not support .values() function
gremlin> g.V().hasLabel('Process')
.repeat(both().simplePath())
.until(hasLabel('Process'))
.dedup().path().values()
org.apache.tinkerpop.gremlin.process.traversal.step.util.ImmutablePath cannot be cast to org.apache.tinkerpop.gremlin.structure.Element
Tried the following google searches and didnt get the answer
gremlin print a path
gremlin get values in a path
and few more word twisting
Found one at here that was for java but that didnt work for me
l = []; g.V().....path().fill(l)
(but cant create list, Cannot set readonly property: list for class: org.apache.tinkerpop.gremlin.structure.VertexProperty$Cardinality
)
I have running it on Gremlin console (running ./gremlin.sh)
You can use the by step to modulate the elements inside the path. For example by supplying valueMap(true) to by you get the properties of the vertices, together with the vertex labels and their ids:
gremlin> g.V().repeat(both().simplePath()).times(1).dedup().path().by(valueMap(true))
==>[[id:1,name:[marko],label:person,age:[29]],[id:3,name:[lop],lang:[java],label:software]]
==>[[id:1,name:[marko],label:person,age:[29]],[id:2,name:[vadas],label:person,age:[27]]]
==>[[id:1,name:[marko],label:person,age:[29]],[id:4,name:[josh],label:person,age:[32]]]
==>[[id:2,name:[vadas],label:person,age:[27]],[id:1,name:[marko],label:person,age:[29]]]
==>[[id:3,name:[lop],lang:[java],label:software],[id:6,name:[peter],label:person,age:[35]]]
==>[[id:4,name:[josh],label:person,age:[32]],[id:5,name:[ripple],lang:[java],label:software]]
I used the modern graph which is one of TinkerPop's toy graphs that are often used for such examples. Your output will look a bit different and you may want to use something else than valueMap(true) for the by modulator. The TinkerPop documentation of the path step itself contains two more advanced examples for path().by() that you might want to check out.

Creating graph in titan from data in csv - example wiki.Vote gives error

I am new to Titan - I loaded titan and successfully ran GraphOfTheGods example including queries given. Next I went on to try bulk loading csv file to create graph and followed steps in Powers of ten - Part 1 http://thinkaurelius.com/2014/05/29/powers-of-ten-part-i/
I am getting an error in loading wiki-Vote.txt
gremlin> g = TitanFactory.open("/tmp/1m") Backend shorthand unknown: /tmp/1m
I tried:
g = TitanFactory.open('conf/titan-berkeleydb-es.properties’)
but get an error in the next step in load-1m.groovy
==>titangraph[berkeleyje:/titan-0.5.4-hadoop2/conf/../db/berkeley] No signature of method: groovy.lang.MissingMethodException.makeKey() is applicable for argument types: () values: [] Possible solutions: every(), any()
Any hints what to do next? I am using groovy for the first time. what kind of groovy expertise needed for working with gremlin
That blog post is meant for Titan 0.4.x. The API shifted when Titan went to 0.5.x. The same principles discussed in the posts generally apply to data loading but the syntax is different in places. The intention is to update those posts in some form when Titan 1.0 comes out with full support of TinkerPop3. Until then, you will need to convert those code examples to the revised API.
For example, an easy way to create a berkeleydb database is with:
g = TitanFactory.build()
.set("storage.backend", "berkeleyje")
.set("storage.directory", "/tmp/1m")
.open();
Please see the docs here. Then most of the schema creation code (which is the biggest change) is now described here and here.
After much experimenting today, I finally figured it out. A lot of changes were needed:
Use makePropertyKey() instead of makeKey(), and makeEdgeLabel() instead of makeLabel()
Use cardinality(Cardinality.SINGLE) instead of unique()
Building the index is quite a bit more complicated. Use the management system instead of the graph both to make the keys and labels, as well as build the index (see https://groups.google.com/forum/#!topic/aureliusgraphs/lGA3Ye4RI5E)
For posterity, here's the modified script that should work (as of 0.5.4):
g = TitanFactory.build().set("storage.backend", "berkeleyje").set("storage.directory", "/tmp/1m").open()
m = g.getManagementSystem()
k = m.makePropertyKey('userId').dataType(String.class).cardinality(Cardinality.SINGLE).make()
m.buildIndex('byId', Vertex.class).addKey(k).buildCompositeIndex()
m.makeEdgeLabel('votesFor').make()
m.commit()
getOrCreate = { id ->
def p = g.V('userId', id)
if (p.hasNext()) {
p.next()
} else {
g.addVertex([userId:id])
}
}
new File('wiki-Vote.txt').eachLine {
if (!it.startsWith("#")){
(fromVertex, toVertex) = it.split('\t').collect(getOrCreate)
fromVertex.addEdge('votesFor', toVertex)
}
}
g.commit()

RNeo4j cypher - retrieving paths

I'm trying to extract a sub-graph from a global network (sub-networks of specific nodes to a specific depth).
The network is composed of nodes labeled as Account with a property of iban and relationships of TRANSFER_TO_AGG.
The cypher syntax is as followed:
MATCH (a:Account { iban :'FR7618206004274157697300156' }),(b:Account),
p = allShortestPaths((a)-[:TRANSFER_TO_AGG*..3]-(b))
RETURN p limit 250
This works perfectly on the Neo4J web interface. However, when trying to save the results to an R object using the command cypher I get the following error:
"Error in as.data.frame.list(value, row.names = rlabs) :
supplied 92 row names for 1 rows"
I believe this is due to the fact that if returning data, you can only query for tabular results. That is, this method has no current functionality for Cypher results containing array properties, collections, nodes, or relationships.
Can anyone offer a solution ?
I've recently added functionality for returning pathways as R objects. First, uninstall / reinstall RNeo4j. Then, see:
?getSinglePath
?getPaths
?shortestPath
?allShortestPaths
?nodes
?rels
?startNode
?endNode
For your query specifically, you would use getPaths():
library(RNeo4j)
graph = startGraph("http://localhost:7474/db/data/")
query = "
MATCH (a:Account { iban :'FR7618206004274157697300156' }),(b:Account),
p = allShortestPaths((a)-[:TRANSFER_TO_AGG*..3]-(b))
RETURN p limit 250
"
p = getPaths(graph, query)
p is a list of path objects. See the docs for examples of using the apply family of functions with a list of path objects.

Issue while ingesting a Titan graph into Faunus

I have installed both Titan and Faunus and each seems to be working properly (titan-0.4.4 & faunus-0.4.4)
However, after ingesting a sizable graph in Titan and trying to import it in Faunus via
FaunusFactory.open( )
I am experiencing issues. To be more precise, I do seem to get a faunus graph from the call FaunusFactory.open( ),
faunusgraph[titanhbaseinputformat->titanhbaseoutputformat]
but then, even asking a simple
g.v(10)
I do get this error:
Task Id : attempt_201407181049_0009_m_000000_0, Status : FAILED
com.thinkaurelius.titan.core.TitanException: Exception in Titan
at com.thinkaurelius.titan.diskstorage.hbase.HBaseStoreManager.getAdminInterface(HBaseStoreManager.java:380)
at com.thinkaurelius.titan.diskstorage.hbase.HBaseStoreManager.ensureColumnFamilyExists(HBaseStoreManager.java:275)
at com.thinkaurelius.titan.diskstorage.hbase.HBaseStoreManager.openDatabase(HBaseStoreManager.java:228)
My property file is taken straight out of the Faunus page with Titan-HBase input, except of course changing the url of the hadoop cluster:
faunus.graph.input.format=com.thinkaurelius.faunus.formats.titan.hbase.TitanHBaseInputFormat
faunus.graph.input.titan.storage.backend=hbase
faunus.graph.input.titan.storage.hostname= my IP
faunus.graph.input.titan.storage.port=2181
faunus.graph.input.titan.storage.tablename=titan
faunus.graph.output.format=com.thinkaurelius.faunus.formats.titan.hbase.TitanHBaseOutputFormat
faunus.graph.output.titan.storage.backend=hbase
faunus.graph.output.titan.storage.hostname= IP of my host
faunus.graph.output.titan.storage.port=2181
faunus.graph.output.titan.storage.tablename=titan
faunus.graph.output.titan.storage.batch-loading=true
faunus.output.location=output1
zookeeper.znode.parent=/hbase-unsecure
titan.graph.output.ids.block-size=100000
Anyone can help?
ADDENDUM:
To address the comment below, here is some context: as I have mentioned, I have a graph in Titan and can perform basic gremlin queries on it.
However, I do need to run a gremlin global query which, due to the size of the graph, needs Faunus and its underlying MR capabilities. Hence the need to import it. The error I get doesn't look to me as if it points to some inconsistency in the graph itself.
I'm not sure that you have your "flow" of Faunus right. If your end result is to do a global query of the graph, then consider this approach:
pull your graph to sequence file
issue your global query over the sequence file
More specifically create hbase-seq.properties:
# input graph parameters
faunus.graph.input.format=com.thinkaurelius.faunus.formats.titan.hbase.TitanHBaseInputFormat
faunus.graph.input.titan.storage.backend=hbase
faunus.graph.input.titan.storage.hostname=localhost
faunus.graph.input.titan.storage.port=2181
faunus.graph.input.titan.storage.tablename=titan
# hbase.mapreduce.scan.cachedrows=1000
# output data (graph or statistic) parameters
faunus.graph.output.format=org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat
faunus.sideeffect.output.format=org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
faunus.output.location=snapshot
faunus.output.location.overwrite=true
In Faunus, copy do:
g = FaunusFactory.open('hbase-seq.properties')
g._()
That will read the graph from hbase and write it to sequence file in HDFS. Next, create: seq-noop.properties with these contents:
# input graph parameters
faunus.graph.input.format=org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat
faunus.input.location=snapshot/job-0
# output data parameters
faunus.graph.output.format=com.thinkaurelius.faunus.formats.noop.NoOpOutputFormat
faunus.sideeffect.output.format=org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
faunus.output.location=analysis
faunus.output.location.overwrite=true
The above configuration will read your sequence file from the previous step and without re-writing the graph (that's what NoOpOutputFormat is for). Now in Faunus do:
g = FaunusFactory.open('seq-noop.properties')
g.V.sideEffect('{it.degree=it.bothE.count()}').degree.groupCount()
This will execute a degree distribution, writing the results in HDFS to the 'analysis' directory. Obviously you can do whatever Faunus-flavored Gremlin you want here - I just wanted to provide an example. I think this is a pretty standard "flow" or pattern for using Faunus from a graph analysis perspective.

Resources