How do you find all vertices that have no incoming edges? - gremlin

I'm trying to find all vertices that have no incoming edges by applying a filter to the vertices. fullyQualifiedName is a unique index. I noticed some vertices that appeared to have incoming edges, so I added a step below that prints them out if they exist. I expected no output, since I thought I had filtered these vertices out above; however, I'm still seeing incoming edges displayed.
def g = BerkeleyGraphFactory.create()
def vertices = g.V.filter {
    it.inE('depends').count() == 0
}
Set<String> u = []
u.addAll(vertices.collect { v ->
    v.fullyQualifiedName
})
u.each {
    def focusIter = g.V('fullyQualifiedName', it)
    def vertex = focusIter.next()
    // this shouldn't print out anything since these vertices were filtered above
    vertex.inE('depends').each { e ->
        def classRefV = e.outV.next()
        println it + " is used by " + classRefV.name + " " + e.toString()
    }
}

I can't seem to recreate your problem. A rough simplification of your code here seems to show that things work as expected:
gremlin> g = TinkerGraphFactory.createTinkerGraph()
==>tinkergraph[vertices:6 edges:6]
gremlin> ids = g.V.filter{!it.inE('knows').hasNext()}.id.toList()
==>1
==>3
==>5
==>6
gremlin> ids.collect{g.v(it).inE('knows').toList()}
==>[]
==>[]
==>[]
==>[]
Perhaps you can try to convert your code to match the approach I took to see if that helps? I'm not sure what else to say short of you providing some sample data to work with for your specific case where the problem can be recreated.
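For comparison, the same filter can be sketched outside Gremlin in plain Python. This is only an illustration, not a TinkerPop API: the edge list is written out by hand to mirror the toy TinkerGraph sample, and `without_incoming` is a hypothetical helper name.

```python
# Toy graph modeled on the TinkerGraph sample: (source, label, target) triples.
# The edge list is typed in by hand for illustration.
edges = [
    (1, "knows", 2),
    (1, "knows", 4),
    (1, "created", 3),
    (4, "created", 3),
    (4, "created", 5),
    (6, "created", 3),
]
vertices = {1, 2, 3, 4, 5, 6}


def without_incoming(vertices, edges, label):
    """Return the sorted vertices that have no incoming edge with `label`."""
    # Every vertex that is the target of at least one matching edge.
    targets = {dst for _, lbl, dst in edges if lbl == label}
    return sorted(vertices - targets)


roots = without_incoming(vertices, edges, "knows")
print(roots)  # [1, 3, 5, 6]
```

The result matches the ids returned by the filter in the console session above. If your own data gives a different answer for the two checks, comparing against a small hand-built edge list like this may help narrow down where they diverge.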

Related

Creating a subgraph using Cypher projection

I am trying to create a subgraph of my graph using Cypher projection because I want to use the GDS library. First, I am creating a subgraph using Cypher query which works perfectly fine. Here is the query:
// Filter for only recurrent events
WITH [path=(m:IDHcodel)--(n:Tissue)
WHERE (m.node_category = 'molecular' AND n.event_class = 'Recurrence')
AND NOT EXISTS((m)--(:Tissue{event_class:'Primary'})) | m] AS recur_events
// Obtain the sub-network with 2 or more patients in edges
MATCH p=(m1)-[r:hasIDHcodelPatients]->(m2)
WHERE (m1 IN recur_events AND m2 IN recur_events AND r.total_common_patients >= 2)
WITH COLLECT(p) AS all_paths
WITH [p IN all_paths | nodes(p)] AS path_nodes, [p IN all_paths | relationships(p)] AS path_rels
RETURN apoc.coll.toSet(apoc.coll.flatten(path_nodes)) AS subgraph_nodes, apoc.coll.flatten(path_rels) AS subgraph_rels
So far so good. Now all I am trying to do is a Cypher projection by sending the subgraph nodes and subgraph rels as parameters in the GDS create query and this gives me a null pointer exception:
// All the above lines, except using WITH instead of RETURN in the last line, i.e.,
...
WITH apoc.coll.toSet(apoc.coll.flatten(path_nodes)) AS subgraph_nodes, apoc.coll.flatten(path_rels) AS subgraph_rels
// Call gds library to create a graph by sending subgraph_nodes and subgraph_rels as parameters
CALL gds.graph.create.cypher(
'example',
'MATCH (n) where n in $sn RETURN id(n) as id',
'MATCH ()-[r]-() where r in $sr RETURN r.start as source , r.end as target',
{parameters: {sn: subgraph_nodes, sr: subgraph_rels} }
) YIELD graphName AS graph, nodeQuery, nodeCount AS nodes, relationshipQuery, relationshipCount AS rels
RETURN graph
What could be wrong? Thanks.
To access the start and end nodes of a relationship, you need slightly different syntax from what you are using:
WITH apoc.coll.toSet(apoc.coll.flatten(path_nodes)) AS subgraph_nodes, apoc.coll.flatten(path_rels) AS subgraph_rels
// Call gds library to create a graph by sending subgraph_nodes and subgraph_rels as parameters
CALL gds.graph.create.cypher(
'example',
'MATCH (n) where n in $sn RETURN id(n) as id',
'MATCH ()-[r]-() where r in $sr RETURN id(startNode(r)) as source , id(endNode(r)) as target',
{parameters: {sn: subgraph_nodes, sr: subgraph_rels} }
) YIELD graphName AS graph, nodeQuery, nodeCount AS nodes, relationshipQuery, relationshipCount AS rels
RETURN graph
This is what I noticed, hopefully this is the only error.

Python/Neptune/Gremlin: "'list' object has no attribute 'next'"

I'm trying to mirror the following Gremlin code in Python to do pagination.
gremlin> t = g.V().hasLabel('person');[]
gremlin> t.next(2)
==>v[1]
==>v[2]
gremlin> t.next(2)
==>v[4]
==>v[6]
Here is the Python code:
from neptune_python_utils.gremlin_utils import GremlinUtils
from neptune_python_utils.endpoints import Endpoints
GremlinUtils.init_statics(globals())
endpoints = '...'
gremlin_utils = GremlinUtils(endpoints)
conn = gremlin_utils.remote_connection()
g = gremlin_utils.traversal_source(connection=conn)
t = g.V().hasLabel('my-label')
cnt, ipp = True, 100
while cnt:
    r = t.next(ipp)
    if not r:
        cnt = False
But I'm getting an error:
"errorMessage": "'list' object has no attribute 'next'",
"errorType": "AttributeError"
on line ---> r = t.next(ipp)
The trace shows that the first iteration of r = t.next(ipp) actually ran, but it returned a list object, so there is no .next() anymore. How can I keep the traversal across iterations?
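One generic way to pull fixed-size pages from any Python iterator is itertools.islice. The sketch below uses a plain iterator as a stand-in for the traversal; it assumes the traversal object can be consumed like an ordinary Python iterator, and it does not diagnose the AttributeError above. The names `pages` and `fake_traversal` are hypothetical.

```python
from itertools import islice


def pages(iterator, page_size):
    """Yield lists of up to `page_size` items until the iterator is exhausted."""
    while True:
        page = list(islice(iterator, page_size))
        if not page:
            return
        yield page


# Stand-in for a traversal: any Python iterator works here.
fake_traversal = iter(["v1", "v2", "v4", "v6", "v7"])
result = list(pages(fake_traversal, 2))
print(result)  # [['v1', 'v2'], ['v4', 'v6'], ['v7']]
```

Because the helper never rebinds the iterator it is given, the same object is reused for every page, which is the behavior the console example with t.next(2) relies on.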

Best way to count downstream with edge data

I have a NetworkX problem. I create a digraph from a pandas DataFrame, with data set on the edges. I now need to count the number of unique sources among a node's descendants, grouped by an edge attribute.
This is my code and it works for one node but I need to pass a lot of nodes to this and get unique counts.
graph = nx.from_pandas_edgelist(df, source="source", target="target",
                                edge_attr=["domain", "category"], create_using=nx.DiGraph)
downstream_nodes = list(nx.descendants(graph, node))
downstream_nodes.append(node)
subgraph = graph.subgraph(downstream_nodes).copy()
domain_sources = {}
for s, t, v in subgraph.edges(data=True):
    if v["domain"] in domain_sources:
        domain_sources[v["domain"]].append(s)
    else:
        domain_sources[v["domain"]] = [s]
down_count = {}
for k, v in domain_sources.items():
    down_count[k] = len(list(set(v)))
It works, and for one node the run time is not a big deal, but I'm feeding this routine at least 40 to 50 nodes. Is this the best way? Is there something else I can do to group by an edge attribute and count the unique nodes?
Two possible enhancements:
1. Remove the .copy() call from the line that creates the subgraph. You are not modifying the subgraph, so the copy is redundant.
2. Use a defaultdict whose values are sets, so duplicate sources are dropped as they are added.
from collections import defaultdict
import networkx as nx

# missing part of df creation
graph = nx.from_pandas_edgelist(df, source="source", target="target",
                                edge_attr=["domain", "category"], create_using=nx.DiGraph)
downstream_nodes = list(nx.descendants(graph, node))
downstream_nodes.append(node)
subgraph = graph.subgraph(downstream_nodes)  # a view; no copy needed
domain_sources = defaultdict(set)
for s, t, v in subgraph.edges(data=True):
    domain_sources[v["domain"]].add(s)
down_count = {}
for k, v in domain_sources.items():
    down_count[k] = len(v)  # v is already a set of unique sources
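Since the routine has to run for 40 to 50 nodes, it may help to wrap it in a function and call it once per node. The following is a minimal sketch under that assumption; `downstream_counts` is a hypothetical name, and the small graph is built by hand instead of from the DataFrame.

```python
from collections import defaultdict

import networkx as nx


def downstream_counts(graph, node):
    """Count unique edge sources per domain among `node` and its descendants."""
    reachable = nx.descendants(graph, node) | {node}
    domain_sources = defaultdict(set)
    # graph.subgraph returns a view, so nothing is copied here.
    for source, _, data in graph.subgraph(reachable).edges(data=True):
        domain_sources[data["domain"]].add(source)
    return {domain: len(sources) for domain, sources in domain_sources.items()}


# Hand-built example graph (illustrative data only).
graph = nx.DiGraph()
graph.add_edge("a", "b", domain="x", category="c1")
graph.add_edge("a", "c", domain="y", category="c1")
graph.add_edge("b", "d", domain="x", category="c2")
graph.add_edge("e", "a", domain="x", category="c2")

counts = {node: downstream_counts(graph, node) for node in ["a", "b"]}
print(counts)  # {'a': {'x': 2, 'y': 1}, 'b': {'x': 1}}
```

Whether this is fast enough depends on the size of the graph; the descendants search is repeated for each starting node, so overlapping subtrees are traversed more than once.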

Gremlin CSV parse creating extra vertices

My code should read the 4 columns, create vertices from the first 2 columns, and use the last 2 columns as edge properties. The CSV file has 33 unique vertices in 37 lines of data. What I don't understand is why I get 74 vertices and 37 edges instead. Interestingly enough, if I omit the addE statement I just get 37 vertices.
Obviously the property portion hasn't been included as I've been trying to resolve my current issues.
1\t2\tstep\tcmp
2\t3\tconductor\tna
3\t4\tswitch\tZ300
(and so on, where \t stands for a tab character)
My code is:
graph = TinkerGraph.open()
graph.createIndex('myId', Vertex.class)
g = graph.traversal()
getOrCreate = { myid ->
    p = g.V('myId', myid)
    if (!p.hasNext())
        { g.addV('connector').property('myId', myid) }
    else
        { p.next() }
}
new File('Continuity.txt').eachLine {
    if (!it.startsWith("#")) {
        def row = it.split('\t')
        def fromVertex = getOrCreate(row[0])
        def toVertex = getOrCreate(row[1])
        g.addE("connection").from(fromVertex).to(toVertex).iterate()
    }
}
There's at least one problem in the code that I see. In this line:
p = g.V('myId', myid)
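you are telling Gremlin to find vertices whose ids are "myId" and whatever the value of the variable myid is. That lookup presumably never matches, so every call creates a new vertex, which would explain the 74 vertices (two per line for 37 lines). You instead want: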
p = g.V().has('myId', myid)
The syntax you were using is from TinkerPop 2.x. I tested your code this way with some other changes and it seems to work properly now:
gremlin> graph = TinkerGraph.open()
==>tinkergraph[vertices:0 edges:0]
gremlin> graph.createIndex('myId', Vertex.class)
gremlin> g = graph.traversal()
==>graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]
gremlin> getOrCreate = { myid ->
......1> if (!g.V().has('myId', myid).hasNext())
......2> g.addV('connector').property('myId',myid)
......3> else
......4> g.V().has('myId', myid)
......5> }
==>groovysh_evaluate$_run_closure1@29d37757
gremlin> g.addE('connection').from(getOrCreate(1)).to(getOrCreate(2)).iterate()
gremlin> g.addE('connection').from(getOrCreate(1)).to(getOrCreate(2)).iterate()
gremlin> g.V()
==>v[0]
==>v[2]
gremlin> g.E()
==>e[4][2-connection->0]
==>e[5][2-connection->0]

Add edge between existing nodes in Gremlin

I'm new to Gremlin and just trying to build out a basic graph. I've been able to do a basic addEdge on new vertices, i.e.
gremlin> v1 = g.addVertex()
==>v[200004]
gremlin> v2 = g.addVertex()
==>v[200008]
gremlin> e = g.addEdge(v1, v2, 'edge label')
==>e[4c9f-Q1S-2F0LaTPQN8][200004-edge label->200008]
I have also been able to create an edge between vertices looked up by id:
gremlin> v1 = g.v(200004)
==>v[200004]
gremlin> v2 = g.v(200008)
==>v[200008]
gremlin> e = g.addEdge(v1, v2, 'edge label')
==>e[4c9f-Q1S-2F0LaTPQN8][200004-edge label->200008]
However, I now want to look up vertices based on multiple properties, which is where it gets tricky. In order to look up the right vertex, I'm making 2 calls to .has. It appears that the correct vertices are found, but adding the edge fails.
gremlin> v1 = g.V.has("x",5).has("y",7)
==>v[200004]
gremlin> v2 = g.V.has("x",3).has("y",5)
==>v[200008]
gremlin> e = g.addEdge(v1, v2, 'edge label')
No signature of method: groovy.lang.MissingMethodException.addEdge() is applicable for argument types: () values: []
What's the easiest way to add a simple edge between two existing vertices, based on a property value lookup?
The key issue is that .has returns a Pipe: in order to get the specific vertex instance, a simple call to .next() does the trick:
gremlin> v1 = g.V.has("x",5).has("y",7).next()
==>v[200004]
gremlin> v2 = g.V.has("x",3).has("y",5).next()
==>v[200008]
gremlin> e = g.addEdge(v1, v2, 'edge label')
==>e[4c9f-Q1S-2F0LaTPQN8][200004-edge label->200008]
Note that .next() will simply return the next item in the pipe. In this case, any additional vertices matching the property values are ignored.
