Faunus graph not printing nodes without using side effect from gremlin shell - gremlin

I'm trying to print a graph in Faunus (v0.4.0) where a node has any edges (incoming or outgoing). From the gremlin shell, I tried:
g = FaunusFactory.open('faunus.properties')
g.V.filter("{it.bothE.hasNext()}").sideEffect("{println it}")
When I do this, I get a printout of all the nodes as I expected
But without the println, I do not.
According to How do I write a for loop in gremlin?, the gremlin terminal should print this info out for me, but it does not seem to.
Is there something specific I need to do to enable the printing from the console?

Faunus and Gremlin are close to each other in terms of purpose and functionality but not identical. The filter isn't producing a side-effect, which will be written to HDFS. If you did:
g.V.filter("{it.bothE.hasNext()}").id
You could then view the list of ids matching that filter with something like:
hdfs.head('output',100)
to see the first 100 lines of the output. If you need more than just the element identifier you could do a transform to get some of the element properties in there as well. You might find these hdfs helper tips helpful.

Related

hasNot() having no effect

I'm using gremlin traversals via a Jupiter Notebook on Amazon Neptune.
I'm trying to filter edges from a specific vertex by their label, but it doesn't seem to work.
some sample data:
%%gremlin
g.addV().property(id, 'u0').as('u0').
addV().property(id, 'u1').as('u1').
addV().property(id, 'u2').as('u2').
addV().property(id, 'u3').as('u3').
addE('freind').
from('u0').
to('u1').
addE('buddy').
from('u0').
to('u2').
addE('foe').
from('u0').
to('u3').
iterate()
and my query:
(It is more complex than needed for this example, but my actual query repeats several times, therefore I can't simply use has('friend').has('buddy') because the next step has other labels.)
%%gremlin
g.withSack(1.0f).V('u0')
.repeat(
bothE().hasNot('foe')
.bothV())
.times(1)
.path().by().by(label)
output:
path[v[u0], freind, v[u1]]
path[v[u0], buddy, v[u2]]
path[v[u0], foe, v[u3]]
I have a user I start with (u0) and want all user who are his friends, buddies, and so on, but not his foes.
unfortunately its not filtering as its supposed to...
any Idea what I'm doing wrong?
The hasNot() step will only filter out elements that have a property with the specified name, in this case a property named foe. Instead, you should look at using not() with hasLabel() to find items that do not have a specific label, as shown here:
g.withSack(1.0f).V('u0')
.repeat(
bothE().not(hasLabel('foe'))
.bothV())
.times(1)
.path().by().by(label)

Assign query using 'match()' to subgraph

I have a JanusGraph database with a graph structure as follows:
(Paper)<-[AuthorOf]-(Author)
I'm want to use Gremlin's match clause to query the data and assign the results to a subgraph. This is what I have so far:
g.V().match(
__.as('a').has('Paper','paperTitle', 'The name of my paper'),
__.as('a').in('AuthorOf').outV().as('b')).
select('b').values()
This query returns what I want, the Authors of the paper I'm for which I'm searching. However, I want to assign the results to a subgraph so I can export it using:
sg.io(IoCore.graphml()).writeGraph("/home/ubuntu/myresults.graphml")
Previously, I've achieved this with a different query structure like this:
sg = g.V().has('paperTitle', 'The name of my paper').
inE('AuthorOf').subgraph('sg1').
outV().
cap('sg1').
next()
Is there away to achieve the same results using the 'match()' statement?
After a little trial and error I was able to create a working solution:
sg = g.V().match(
__.as('a').has('Paper','paperTitle', 'ladle pouring guide'),
__.as('a').inE('AuthorOf').subgraph('sg').outV().as('b')).
cap('sg').next()
At first, I was trying to use the 'select' statement to isolate the subgraph. After reviewing the documentation on 'subgraph' and learning more about side-effects in gremlin I realized it wasn't necessary.

Gremlin: Adding Edges to Graph Over HTTP using Vertex Variables

I am trying to execute a gremlin script over https against a remote JanusGraph instance. I have filtered my problem to the part where I am trying to add an edge using vertex variables. I am trying to add two vertices, assign the results to a variable and use them to add an edge. Also I am also tying to avoid a single line script like g.V().addV(..).aaddV(..).addE(..), because of the program logic that is behind the script
The following gremlin works in the gremlin console (remote session)
def graph=ConfiguredGraphFactory.open("ga");
def g = graph.traversal();
v1=g.addV('node1');
v2=g.addV('node2');
v1.addE('test').to(v2);
But when I try to do the same over https (issued against a compose-janusgraph server), I get an error. I did add .iterate() to the addV() and the vertexes are getting added if I remove the addE(..) line. But when I try
{"gremlin":"def graph=ConfiguredGraphFactory.open('ga');
def g = graph.traversal();
v1=g.addV('node16').property('name','testn16').iterate();
v2=g.addV('node17').property('name','testn2').iterate();
v1.addE('test18').to(v2);
g.tx().commit()"}
I get the exception
The traversal strategies are complete and the traversal can no longer
be modulated","Exception-Class":"java.lang.IllegalStateException"
Also note that I am joining the whole gremlin into one single line before sending it over curl. I have split them to newlines here for readability. Any help would be great. -- Thank you
iterate() doesn't return a Vertex...it just iterates the traversal to generate side-effects (i.e. the graph gets a vertex added but no result is returned). You probably just need to do:
{"gremlin":"graph=ConfiguredGraphFactory.open('ga');
g = graph.traversal();
g.addV('node16').property('name','testn16').as('v1').
addV('node17').property('name','testn2').as('v2').
addE('test18').from('v1').to('v2').iterate();
g.tx().commit()"}

Printing/Fetching Vertex values from a path

Just getting started with gremlin.
Printing out all the Vertex values worked out fine
gremlin> g.V().values()
==>testing 2
==>Cash Processing
==>Sales
==>Marketing
==>Accounting
I was able to find all the directly connected path between my Vertices.
gremlin> g.V().hasLabel('Process')
.repeat(both().simplePath())
.until(hasLabel('Process'))
.dedup().path()
==>[v[25],v[28]]
==>[v[25],v[26]]
==>[v[26],v[27]]
==>[v[26],v[25]]
Now am trying to print out the values in the path like ['Sales', 'Accounting'] instead of [v[25],v[28]]
Not been able to figure out a way yet
Already tried and failed with
Unfold: Does not get me 1-1 mapping
gremlin> g.V().hasLabel('Process').repeat(both().simplePath()).until(hasLabel('Process')).dedup().path().unfold().values()
==>Cash Processing
==>Accounting
==>Cash Processing
==>Sales
==>Sales
==>Marketing
==>Sales
==>Cash Processing
Path seems to be of a different data-type and does not support .values() function
gremlin> g.V().hasLabel('Process')
.repeat(both().simplePath())
.until(hasLabel('Process'))
.dedup().path().values()
org.apache.tinkerpop.gremlin.process.traversal.step.util.ImmutablePath cannot be cast to org.apache.tinkerpop.gremlin.structure.Element
Tried the following google searches and didnt get the answer
gremlin print a path
gremlin get values in a path
and few more word twisting
Found one at here that was for java but that didnt work for me
l = []; g.V().....path().fill(l)
(but cant create list, Cannot set readonly property: list for class: org.apache.tinkerpop.gremlin.structure.VertexProperty$Cardinality
)
I have running it on Gremlin console (running ./gremlin.sh)
You can use the by step to modulate the elements inside the path. For example by supplying valueMap(true) to by you get the properties of the vertices, together with the vertex labels and their ids:
gremlin> g.V().repeat(both().simplePath()).times(1).dedup().path().by(valueMap(true))
==>[[id:1,name:[marko],label:person,age:[29]],[id:3,name:[lop],lang:[java],label:software]]
==>[[id:1,name:[marko],label:person,age:[29]],[id:2,name:[vadas],label:person,age:[27]]]
==>[[id:1,name:[marko],label:person,age:[29]],[id:4,name:[josh],label:person,age:[32]]]
==>[[id:2,name:[vadas],label:person,age:[27]],[id:1,name:[marko],label:person,age:[29]]]
==>[[id:3,name:[lop],lang:[java],label:software],[id:6,name:[peter],label:person,age:[35]]]
==>[[id:4,name:[josh],label:person,age:[32]],[id:5,name:[ripple],lang:[java],label:software]]
I used the modern graph which is one of TinkerPop's toy graphs that are often used for such examples. Your output will look a bit different and you may want to use something else than valueMap(true) for the by modulator. The TinkerPop documentation of the path step itself contains two more advanced examples for path().by() that you might want to check out.

Issue while ingesting a Titan graph into Faunus

I have installed both Titan and Faunus and each seems to be working properly (titan-0.4.4 & faunus-0.4.4)
However, after ingesting a sizable graph in Titan and trying to import it in Faunus via
FaunusFactory.open( )
I am experiencing issues. To be more precise, I do seem to get a faunus graph from the call FaunusFactory.open( ),
faunusgraph[titanhbaseinputformat->titanhbaseoutputformat]
but then, even asking a simple
g.v(10)
I do get this error:
Task Id : attempt_201407181049_0009_m_000000_0, Status : FAILED
com.thinkaurelius.titan.core.TitanException: Exception in Titan
at com.thinkaurelius.titan.diskstorage.hbase.HBaseStoreManager.getAdminInterface(HBaseStoreManager.java:380)
at com.thinkaurelius.titan.diskstorage.hbase.HBaseStoreManager.ensureColumnFamilyExists(HBaseStoreManager.java:275)
at com.thinkaurelius.titan.diskstorage.hbase.HBaseStoreManager.openDatabase(HBaseStoreManager.java:228)
My property file is taken straight out of the Faunus page with Titan-HBase input, except of course changing the url of the hadoop cluster:
faunus.graph.input.format=com.thinkaurelius.faunus.formats.titan.hbase.TitanHBaseInputFormat
faunus.graph.input.titan.storage.backend=hbase
faunus.graph.input.titan.storage.hostname= my IP
faunus.graph.input.titan.storage.port=2181
faunus.graph.input.titan.storage.tablename=titan
faunus.graph.output.format=com.thinkaurelius.faunus.formats.titan.hbase.TitanHBaseOutputFormat
faunus.graph.output.titan.storage.backend=hbase
faunus.graph.output.titan.storage.hostname= IP of my host
faunus.graph.output.titan.storage.port=2181
faunus.graph.output.titan.storage.tablename=titan
faunus.graph.output.titan.storage.batch-loading=true
faunus.output.location=output1
zookeeper.znode.parent=/hbase-unsecure
titan.graph.output.ids.block-size=100000
Anyone can help?
ADDENDUM:
To address the comment below, here is some context: as I have mentioned, I have a graph in Titan and can perform basic gremlin queries on it.
However, I do need to run a gremlin global query which, due to the size of the graph, needs Faunus and its underlying MR capabilities. Hence the need to import it. The error I get doesn't look to me as if it points to some inconsistency in the graph itself.
I'm not sure that you have your "flow" of Faunus right. If your end result is to do a global query of the graph, then consider this approach:
pull your graph to sequence file
issue your global query over the sequence file
More specifically create hbase-seq.properties:
# input graph parameters
faunus.graph.input.format=com.thinkaurelius.faunus.formats.titan.hbase.TitanHBaseInputFormat
faunus.graph.input.titan.storage.backend=hbase
faunus.graph.input.titan.storage.hostname=localhost
faunus.graph.input.titan.storage.port=2181
faunus.graph.input.titan.storage.tablename=titan
# hbase.mapreduce.scan.cachedrows=1000
# output data (graph or statistic) parameters
faunus.graph.output.format=org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat
faunus.sideeffect.output.format=org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
faunus.output.location=snapshot
faunus.output.location.overwrite=true
In Faunus, copy do:
g = FaunusFactory.open('hbase-seq.properties')
g._()
That will read the graph from hbase and write it to sequence file in HDFS. Next, create: seq-noop.properties with these contents:
# input graph parameters
faunus.graph.input.format=org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat
faunus.input.location=snapshot/job-0
# output data parameters
faunus.graph.output.format=com.thinkaurelius.faunus.formats.noop.NoOpOutputFormat
faunus.sideeffect.output.format=org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
faunus.output.location=analysis
faunus.output.location.overwrite=true
The above configuration will read your sequence file from the previous step and without re-writing the graph (that's what NoOpOutputFormat is for). Now in Faunus do:
g = FaunusFactory.open('seq-noop.properties')
g.V.sideEffect('{it.degree=it.bothE.count()}').degree.groupCount()
This will execute a degree distribution, writing the results in HDFS to the 'analysis' directory. Obviously you can do whatever Faunus-flavored Gremlin you want here - I just wanted to provide an example. I think this is a pretty standard "flow" or pattern for using Faunus from a graph analysis perspective.

Resources