Assign query using 'match()' to subgraph - gremlin

I have a JanusGraph database with a graph structure as follows:
(Paper)<-[AuthorOf]-(Author)
I want to use Gremlin's match() step to query the data and assign the results to a subgraph. This is what I have so far:
g.V().match(
__.as('a').has('Paper','paperTitle', 'The name of my paper'),
__.as('a').inE('AuthorOf').outV().as('b')).
select('b').values()
This query returns what I want: the authors of the paper I'm searching for. However, I want to assign the results to a subgraph so I can export it using:
sg.io(IoCore.graphml()).writeGraph("/home/ubuntu/myresults.graphml")
Previously, I've achieved this with a different query structure like this:
sg = g.V().has('paperTitle', 'The name of my paper').
inE('AuthorOf').subgraph('sg1').
outV().
cap('sg1').
next()
Is there a way to achieve the same results using the 'match()' statement?

After a little trial and error I was able to create a working solution:
sg = g.V().match(
__.as('a').has('Paper','paperTitle', 'ladle pouring guide'),
__.as('a').inE('AuthorOf').subgraph('sg').outV().as('b')).
cap('sg').next()
At first, I was trying to use the 'select' statement to isolate the subgraph. After reviewing the documentation on 'subgraph' and learning more about side-effects in Gremlin, I realized it wasn't necessary.
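For anyone who wants to try the pattern end to end, here is a minimal Gremlin Console sketch. It assumes an in-memory TinkerGraph instead of JanusGraph, a made-up author vertex with a hypothetical 'name' property, and an output path under /tmp; none of these come from my original setup.

```groovy
// Assumption: an in-memory TinkerGraph stands in for the JanusGraph instance.
graph = TinkerGraph.open()
g = graph.traversal()

// Hypothetical sample data: one paper with a single author.
g.addV('Paper').property('paperTitle', 'The name of my paper').as('p').
  addV('Author').property('name', 'A. Author').as('a').
  addE('AuthorOf').from('a').to('p').
  iterate()

// subgraph() is a side-effect step: it copies each AuthorOf edge (and the two
// vertices it connects) into the side-effect named 'sg' while match() keeps going.
sg = g.V().match(
       __.as('a').has('Paper', 'paperTitle', 'The name of my paper'),
       __.as('a').inE('AuthorOf').subgraph('sg').outV().as('b')).
     cap('sg').next()

// Inspect the extracted subgraph before exporting it.
sg.traversal().V().elementMap().toList()
sg.traversal().E().count().next()

// Export with the same call as in the question (the path is an assumption).
sg.io(IoCore.graphml()).writeGraph('/tmp/myresults.graphml')
```

Since cap('sg') reads the side-effect directly, no select() is needed to get the subgraph out of the match() results.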

Related

hasNot() having no effect

I'm using Gremlin traversals via a Jupyter Notebook on Amazon Neptune.
I'm trying to filter edges from a specific vertex by their label, but it doesn't seem to work.
Some sample data:
%%gremlin
g.addV().property(id, 'u0').as('u0').
addV().property(id, 'u1').as('u1').
addV().property(id, 'u2').as('u2').
addV().property(id, 'u3').as('u3').
addE('freind').
from('u0').
to('u1').
addE('buddy').
from('u0').
to('u2').
addE('foe').
from('u0').
to('u3').
iterate()
and my query:
(It is more complex than needed for this example, but my actual query repeats several times, therefore I can't simply use has('friend').has('buddy') because the next step has other labels.)
%%gremlin
g.withSack(1.0f).V('u0')
.repeat(
bothE().hasNot('foe')
.bothV())
.times(1)
.path().by().by(label)
output:
path[v[u0], freind, v[u1]]
path[v[u0], buddy, v[u2]]
path[v[u0], foe, v[u3]]
I have a user I start with (u0) and want all users who are his friends, buddies, and so on, but not his foes.
Unfortunately, it's not filtering as it's supposed to.
Any idea what I'm doing wrong?
The hasNot() step will only filter out elements that have a property with the specified name, in this case a property named foe. Instead, you should look at using not() with hasLabel() to find items that do not have a specific label, as shown here:
g.withSack(1.0f).V('u0')
.repeat(
bothE().not(hasLabel('foe'))
.bothV())
.times(1)
.path().by().by(label)
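To see the difference in isolation, the sketch below rebuilds the sample data from the question and compares the filters on the edges of u0. It assumes a local TinkerGraph in the Gremlin Console rather than Neptune. The hasLabel(neq('foe')) form is an alternative spelling that is not part of the answer above and should behave the same as the not() form.

```groovy
// Assumption: in-memory TinkerGraph instead of Neptune.
graph = TinkerGraph.open()
g = graph.traversal()

// Same sample data as in the question (edge label 'freind' kept as written).
g.addV().property(id, 'u0').as('u0').
  addV().property(id, 'u1').as('u1').
  addV().property(id, 'u2').as('u2').
  addV().property(id, 'u3').as('u3').
  addE('freind').from('u0').to('u1').
  addE('buddy').from('u0').to('u2').
  addE('foe').from('u0').to('u3').
  iterate()

// hasNot('foe') tests for a property KEY named 'foe'. No edge has such a
// property, so all three edges pass, including the one labeled 'foe'.
g.V('u0').bothE().hasNot('foe').label().toList()

// not(hasLabel('foe')) filters on the edge LABEL, so the 'foe' edge is dropped.
g.V('u0').bothE().not(hasLabel('foe')).label().toList()

// Predicate form of the same label filter.
g.V('u0').bothE().hasLabel(neq('foe')).label().toList()
```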

Gremlin: How do you find vertices and edges when some edges do not exist

I am new to Gremlin.
I am having trouble fetching vertices and edges when an edge from a vertex sometimes does not exist.
For example, the query below works fine when all the vertices and edges exist.
But in one use case the edge
.outE("PRODUCES").`as`("produces")
does not exist in the database.
In that case the query below doesn't return any results.
I need your help to resolve this issue.
When the edge does not exist, I still want input_entity and processed_by in the result.
janusGraph.traversal().V()
.has("isActive", "true")
.hasLabel("ENTITY").`as`("input_entity")
.outE("PROCESSED_BY").`as`("processed_by")
.inV().`as`("job")
.outE("PRODUCES").`as`("produces")
.select<String>("job").outE("HAS_STATE")
.`as`("job_state_edge").inV().hasLabel("JOB_STATE").`as`("job_state")
.select<String>("input_entity").outE("HAS_STATE")
.`as`("input_entity_state_edge").inV().hasLabel("ENTITY_STATE").`as`("input_entity_state")
.select<String>("input_entity").outE("CONNECTS_TO").`as`("connects_to")
.inV().hasLabel("ENTITY").has("entityName", TextP.startingWith(rootNamespace))
.`as`("output_entity").outE("HAS_STATE")
.`as`("output_entity_state_edge").inV().hasLabel("ENTITY_STATE").`as`("output_entity_state")
.select<String>("input_entity","output_entity","processed_by","produces","job","job_state","input_entity_state","output_entity_state","input_entity_state_edge","output_entity_state_edge","job_state_edge","connects_to")
.by(elementMap<Element, Any>()).toList()
With optional():
janusGraph.traversal().V()
.has("isActive", "true")
.hasLabel("ENTITY").`as`("input_entity")
.outE("HAS_STATE").`as`("input_entity_state_edge").inV().hasLabel("ENTITY_STATE").`as`("input_entity_state")
.select<String>("input_entity").outE("PROCESSED_BY").`as`("processed_by")
.inV().`as`("job").outE("HAS_STATE").`as`("job_state_edge").inV().hasLabel("JOB_STATE").`as`("job_state")
.select<String>("job")
.optional(
outE("PRODUCES").`as`("produces")
.select<String>("input_entity").outE("CONNECTS_TO").`as`("connects_to")
.inV().hasLabel("ENTITY").has("entityName", TextP.startingWith(rootNamespace))
.`as`("output_entity").outE("HAS_STATE").`as`("output_entity_state_edge").inV().hasLabel("ENTITY_STATE").`as`("output_entity_state"))
.select<String>("input_entity","output_entity","processed_by","produces","job","job_state","input_entity_state","output_entity_state","input_entity_state_edge","output_entity_state_edge","job_state_edge","connects_to")
.by(elementMap<Element, Any>()).toList()
There are two Gremlin steps that can help in cases like this. When you have a part of a query that may or may not exist, but you either want the results up to that point if it does not exist or the results afterwards if it does exist, you can wrap that part of the query in an optional step.
For example:
g.V('3').optional(out())
This returns the adjacent vertices if out() yields any results; otherwise it returns v[3] itself.
In cases where you want to select a value that may not exist, you can do something like this:
coalesce(select('a'),constant('No results'))
EDITED to add:
If you need to return multiple results, rather than just using select(), try a project('a','b','c') type of approach, where each by() modulator for the project() can contain its own coalesce() step.
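A minimal sketch of the three options, run against TinkerPop's bundled "modern" toy graph in the Gremlin Console rather than the JanusGraph schema from the question. The key names 'person' and 'software' and the fallback value 'none' are placeholders chosen for the example.

```groovy
// Assumption: TinkerPop's built-in toy graph, not the asker's data.
g = TinkerFactory.createModern().traversal()

// optional(): people with a 'created' edge are replaced by the software they
// created; people without one (vadas in this graph) pass through unchanged.
g.V().hasLabel('person').optional(out('created')).toList()

// coalesce(): use the first child traversal that yields a result, falling
// back to a constant when the edge is missing.
g.V().hasLabel('person').
  coalesce(out('created').values('name'), constant('none')).
  toList()

// project() with a coalesce() inside by(): every person yields exactly one map,
// and a missing edge shows up as 'none' instead of dropping the whole row.
g.V().hasLabel('person').
  project('person', 'software').
    by('name').
    by(coalesce(out('created').values('name'), constant('none'))).
  toList()
```

Note that by() only takes the first result of its child traversal. To collect every match, by(out('created').values('name').fold()) returns a list, which is simply empty when the edge is missing.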

How can we access key/value pairs after a group-by in Gremlin?

I have a simple Gremlin query. First I aggregate data, and then I want to access the aggregated data and use it in other operations. But after the group-by operation, I can't access the results as key/value pairs.
GraphTraversal t = graph.V().hasLabel("App").as("a")
.inE("RANKS").as("r")
.outV().as("k")
.choose(__.select("k").by("countryCode").is(__.in(...)),
__.math("1.0 / r").by("rank1"),
__.math("1.0 / r").by("rank2"))
.as("score")
.select("a").aggregate("ap")
.select("k").by("countryCode").aggregate("country")
.select("a", "k").by("appId").by("countryCode")
.group().by("grp_res").by(__.select("score").sum().as("sum_score"))
.cap("ap", "country", "grp_res")
.V().hasLabel("App").where(P.within("ap")).as("app")
.select("app", "country", "ak").by("appId").by().by();
After the group-by, the last line, .select("app", "ak").by("appId").by(), can't access the results as key/value pairs. How can I reach them? Do you have any suggestions?
The current output looks like this:
{app=1, country=US, ak={{a=1, k=US}=363.0, {a=2, k=US}=544.5}}
{app=2, country=US, ak={{a=1, k=US}=363.0, {a=2, k=US}=544.5}}
But the expected output is:
{app=1, country=US, ak=363.0}
{app=2, country=US, ak=544.5}
I solved it like this:
group().by(...)
.select(Column.values).as("grp_result")
.select("grp_result").select("score").as("score")
...
I don't see where you define "ak" in your query, but the basic issue is that you have nested maps where the keys are themselves maps. So you will need to select from the map using that exact key, or order the groups differently to have simpler keys. Then you would do something like select('ak').select('key').
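One way to get simpler keys, sketched on TinkerPop's "modern" toy graph rather than the App/RANKS schema above: group by a single property so each key is a plain value, then unfold() the resulting map into entries and read each entry with select(keys) and select(values). The names 'software' and 'total' are placeholders.

```groovy
// Assumption: TinkerPop's built-in toy graph, not the asker's data.
g = TinkerFactory.createModern().traversal()

// group() produces a single Map; keying by one property keeps the keys simple.
// unfold() then turns that Map into one Map.Entry per group, and each entry
// is read back as a flat key/value pair.
g.V().hasLabel('software').
  group().
    by('name').
    by(inE('created').values('weight').sum()).
  unfold().
  project('software', 'total').
    by(select(keys)).
    by(select(values)).
  toList()
```

Each result is then a flat map with one key and one aggregated value, instead of a nested map keyed by another map.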

RNeo4j cypher - retrieving paths

I'm trying to extract a sub-graph from a global network (sub-networks of specific nodes to a specific depth).
The network is composed of nodes labeled as Account with a property of iban and relationships of TRANSFER_TO_AGG.
The Cypher query is as follows:
MATCH (a:Account { iban :'FR7618206004274157697300156' }),(b:Account),
p = allShortestPaths((a)-[:TRANSFER_TO_AGG*..3]-(b))
RETURN p limit 250
This works perfectly in the Neo4j web interface. However, when trying to save the results to an R object using the cypher command, I get the following error:
"Error in as.data.frame.list(value, row.names = rlabs) :
supplied 92 row names for 1 rows"
I believe this is due to the fact that if returning data, you can only query for tabular results. That is, this method has no current functionality for Cypher results containing array properties, collections, nodes, or relationships.
Can anyone offer a solution?
I've recently added functionality for returning pathways as R objects. First, uninstall / reinstall RNeo4j. Then, see:
?getSinglePath
?getPaths
?shortestPath
?allShortestPaths
?nodes
?rels
?startNode
?endNode
For your query specifically, you would use getPaths():
library(RNeo4j)
graph = startGraph("http://localhost:7474/db/data/")
query = "
MATCH (a:Account { iban :'FR7618206004274157697300156' }),(b:Account),
p = allShortestPaths((a)-[:TRANSFER_TO_AGG*..3]-(b))
RETURN p limit 250
"
p = getPaths(graph, query)
p is a list of path objects. See the docs for examples of using the apply family of functions with a list of path objects.

Faunus graph not printing nodes without using side effect from gremlin shell

I'm trying to print a graph in Faunus (v0.4.0) where a node has any edges (incoming or outgoing). From the gremlin shell, I tried:
g = FaunusFactory.open('faunus.properties')
g.V.filter("{it.bothE.hasNext()}").sideEffect("{println it}")
When I do this, I get a printout of all the nodes, as I expected, but without the println I do not.
According to "How do I write a for loop in gremlin?", the Gremlin terminal should print this info out for me, but it does not seem to.
Is there something specific I need to do to enable the printing from the console?
Faunus and Gremlin are close to each other in terms of purpose and functionality, but they are not identical. The filter doesn't produce a side-effect; in Faunus the result of the traversal is written to HDFS rather than printed to the console. If you did:
g.V.filter("{it.bothE.hasNext()}").id
You could then view the list of ids matching that filter with something like:
hdfs.head('output',100)
to see the first 100 lines of the output. If you need more than just the element identifier, you could add a transform to include some of the element properties as well. You might find these hdfs helper tips helpful.

Resources