networkX: ego_graph without sibling? - graph

I have a directed acyclic networkX graph. I would like to create a subgraph containing only the direct and indirect predecessors of a given node n. For instance, if n has 3 predecessors, a, b and c, I also want the predecessors of each of those 3 nodes.
I am currently using the ego_graph method of networkX. This works, but the output also keeps sibling nodes that have no directed path to my target node.
from pathlib import Path
import networkx as nx
def draw(graph_path: Path, target: str, radius: int):
    graph = nx.read_graphml(graph_path)
    subgraph = nx.ego_graph(graph, target, undirected=True, radius=radius)
    draw_graph(subgraph, table)
undirected is set to True because when I set it to False, only my target node is returned, whatever the radius value is.
The target node is called CORE - SUPPLY CHAIN [DEV].20220128 AS_IS stock_V1norm.DIM calendar. With a radius of 1:
The result is what I am expecting.
Now, same target but with a radius of 2:
The result is not what I was expecting: I am getting siblings, and I only want the predecessor nodes, such as:
graphML sample:
<?xml version='1.0' encoding='utf-8'?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
  <key id="d1" for="node" attr.name="kind" attr.type="string" />
  <key id="d0" for="node" attr.name="zone" attr.type="string" />
  <graph edgedefault="directed">
    <node id="gold_core.customers">
      <data key="d0">gold</data>
      <data key="d1">core</data>
    </node>
    <node id="rt.F0116">
      <data key="d0">silver</data>
    </node>
    <node id="hy.F4211">
      <data key="d0">silver</data>
    </node>
    <edge source="hy.F4211" target="gold_core.customers" />
  </graph>
</graphml>

You can get the predecessors for a node using the DiGraph.predecessors method.
#!/usr/bin/env python
"""
Find predecessors to a given node.
"""
import matplotlib.pyplot as plt
import networkx as nx
from netgraph import Graph # pip install netgraph
# create a test graph
edges = [
('parent a', 'target'),
('parent b', 'target'),
('parent c', 'target'),
('grandparent aa', 'parent a'),
('grandparent bb', 'parent b'),
('grandparent cc', 'parent c'),
('parent a', 'sibling'),
('target', 'child')
]
g = nx.from_edgelist(edges, create_using=nx.DiGraph)
# get predecessors
parents = list(g.predecessors('target'))
grandparents = []
for parent in parents:
    for grandparent in list(g.predecessors(parent)):
        grandparents.append(grandparent)
predecessors = parents + grandparents
# give predecessors a red color
node_color = dict()
for node in g:
    if node in predecessors:
        node_color[node] = 'red'
    else:
        node_color[node] = 'white'
# plot
fig, (ax1, ax2) = plt.subplots(1, 2)
Graph(g,
      node_layout='dot',
      arrows=True,
      node_color=node_color,
      node_labels=True,
      node_label_fontdict=dict(size=10),
      node_label_offset=0.1,
      ax=ax1
)
# plot subgraph
subgraph = g.subgraph(predecessors + ['target'])
Graph(subgraph,
      node_layout='dot',
      arrows=True,
      node_labels=True,
      node_label_fontdict=dict(size=10),
      node_label_offset=0.1,
      ax=ax2,
)
plt.show()
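The loop above stops at grandparents. If every direct and indirect predecessor is wanted, or a radius limit like the one in the question, the following sketch may help. It reuses the test edges from the answer and assumes that nx.ancestors and an ego_graph call on a reversed view of the graph fit the use case; the variable names are illustrative only.

```python
import networkx as nx

# Same test graph as in the answer above.
edges = [
    ('parent a', 'target'),
    ('parent b', 'target'),
    ('parent c', 'target'),
    ('grandparent aa', 'parent a'),
    ('grandparent bb', 'parent b'),
    ('grandparent cc', 'parent c'),
    ('parent a', 'sibling'),
    ('target', 'child'),
]
g = nx.from_edgelist(edges, create_using=nx.DiGraph)

# All direct and indirect predecessors, at any distance.
all_predecessors = nx.ancestors(g, 'target')
full_subgraph = g.subgraph(all_predecessors | {'target'})

# Predecessors within `radius` hops: ego_graph follows outgoing edges,
# so run it on a reversed view, then take the node set back to g
# to keep the original edge directions.
radius = 1
reachable = nx.ego_graph(g.reverse(copy=False), 'target', radius=radius)
limited_subgraph = g.subgraph(reachable.nodes)
```

Because the search runs against the edge direction only, 'sibling' and 'child' are never reached, which is the difference from calling ego_graph with undirected=True.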

Related

Creating a subgraph using Cypher projection

I am trying to create a subgraph of my graph using Cypher projection because I want to use the GDS library. First, I am creating a subgraph using Cypher query which works perfectly fine. Here is the query:
// Filter for only recurrent events
WITH [path=(m:IDHcodel)--(n:Tissue)
WHERE (m.node_category = 'molecular' AND n.event_class = 'Recurrence')
AND NOT EXISTS((m)--(:Tissue{event_class:'Primary'})) | m] AS recur_events
// Obtain the sub-network with 2 or more patients in edges
MATCH p=(m1)-[r:hasIDHcodelPatients]->(m2)
WHERE (m1 IN recur_events AND m2 IN recur_events AND r.total_common_patients >= 2)
WITH COLLECT(p) AS all_paths
WITH [p IN all_paths | nodes(p)] AS path_nodes, [p IN all_paths | relationships(p)] AS path_rels
RETURN apoc.coll.toSet(apoc.coll.flatten(path_nodes)) AS subgraph_nodes, apoc.coll.flatten(path_rels) AS subgraph_rels
So far so good. Now all I am trying to do is a Cypher projection by sending the subgraph nodes and subgraph rels as parameters in the GDS create query and this gives me a null pointer exception:
// All the above lines, except using WITH instead of RETURN in the last line, i.e.,
...
WITH apoc.coll.toSet(apoc.coll.flatten(path_nodes)) AS subgraph_nodes, apoc.coll.flatten(path_rels) AS subgraph_rels
// Call gds library to create a graph by sending subgraph_nodes and subgraph_rels as parameters
CALL gds.graph.create.cypher(
'example',
'MATCH (n) where n in $sn RETURN id(n) as id',
'MATCH ()-[r]-() where r in $sr RETURN r.start as source , r.end as target',
{parameters: {sn: subgraph_nodes, sr: subgraph_rels} }
) YIELD graphName AS graph, nodeQuery, nodeCount AS nodes, relationshipQuery, relationshipCount AS rels
RETURN graph
What could be wrong? Thanks.
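A relationship has no start or end property, so r.start and r.end evaluate to null. To get the start and end node of a relationship, use the startNode() and endNode() functions, and wrap them in id() to return the node ids: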
WITH apoc.coll.toSet(apoc.coll.flatten(path_nodes)) AS subgraph_nodes, apoc.coll.flatten(path_rels) AS subgraph_rels
// Call gds library to create a graph by sending subgraph_nodes and subgraph_rels as parameters
CALL gds.graph.create.cypher(
'example',
'MATCH (n) where n in $sn RETURN id(n) as id',
'MATCH ()-[r]-() where r in $sr RETURN id(startNode(r)) as source , id(endNode(r)) as target',
{parameters: {sn: subgraph_nodes, sr: subgraph_rels} }
) YIELD graphName AS graph, nodeQuery, nodeCount AS nodes, relationshipQuery, relationshipCount AS rels
RETURN graph
This is what I noticed, hopefully this is the only error.

Add a second edge attribute to an existing graph

I'm trying to add a second edge attribute to an existing graph.
I created a graph G and saved it as a pkl file.
edges1 = pd.DataFrame({'source':[0,1,2,3,4],
'target':[10,11,12,13,14],
'weight':[50,50,50,50,50]})
G = nx.from_pandas_edgelist(edges1, 'source', 'target', 'weight')
I loaded G and then tried to add a second edge attribute (cost) and a node attribute.
But it keeps overwriting the first edge attribute (weight).
edges2 = pd.DataFrame({'source':[0,1,2,6,7,8],
'target':[10,11,12,16,17,18],
'cost':[100,100,100,100,100,100]})
nodes = pd.DataFrame({'node':[0,1,2,3,10,18],
'name':['A','B','C','D','E','F']})
nx.from_pandas_edgelist(edges2, 'source', 'target', 'cost')
nx.set_node_attributes(G, nodes.set_index('node')['name'].to_dict(), 'name')
I must load the graph G, so combining edges1 and edges2 DataFrames and creating a graph isn't what I need.
How can I get this?
[(0, 10, {'weight': 50, 'cost': 100}), (1, 11, {'weight': 50, 'cost': 100}) ...]
instead of this
[(0, 10, {'cost': 100}), (1, 11, {'cost': 100}) ...]
I'm not clear if you want to add new edges from edges2 or not. If you are okay with adding new edges, you can use nx.compose:
H = nx.from_pandas_edgelist(edges2, 'source', 'target', 'cost')
G_updated = nx.compose(G, H)
If you don't want to add new edges, then you can check if the edge exists and then set the edge attribute directly:
H = nx.from_pandas_edgelist(edges2, 'source', 'target', 'cost')
for edge in H.edges():
    if edge in G.edges():
        G.edges[edge]['cost'] = H.edges[edge]['cost']
If performance is an issue, you could also consider setting the edge attributes of G directly by using your edges2 data without building a second graph or even a second dataframe.
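As a rough sketch of that last suggestion, the rows of edges2 can be walked directly and the cost written onto edges that already exist in G. This assumes the DataFrames from the question and that edges missing from G should be skipped; it is one way to do it, not the only one.

```python
import networkx as nx
import pandas as pd

# DataFrames from the question.
edges1 = pd.DataFrame({'source': [0, 1, 2, 3, 4],
                       'target': [10, 11, 12, 13, 14],
                       'weight': [50, 50, 50, 50, 50]})
edges2 = pd.DataFrame({'source': [0, 1, 2, 6, 7, 8],
                       'target': [10, 11, 12, 16, 17, 18],
                       'cost': [100, 100, 100, 100, 100, 100]})

G = nx.from_pandas_edgelist(edges1, 'source', 'target', 'weight')

# Walk the rows of edges2 and update only edges already present in G.
rows = zip(edges2['source'].tolist(),
           edges2['target'].tolist(),
           edges2['cost'].tolist())
for u, v, cost in rows:
    if G.has_edge(u, v):
        G.edges[u, v]['cost'] = cost
```

Edges such as (6, 16) are ignored here because G does not contain them, and edges of G that are absent from edges2 keep only their weight.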

Add round feedback arrow to horizontal graph in Graphviz / DiagrammeR

I would like to add a feedback arrow to a Graphviz graph, where the ordinary "flow" remains horizontal, but the feedback should be round, like the manually added blue arrow below.
Here is what I tried so far. I use the DiagrammeR package for the R language, but a suggestion for plain or Python Graphviz would of course also be helpful.
library("DiagrammeR")
grViz("digraph feedback {
graph [rankdir = 'LR']
node [shape = box]
Population
node [shape = circle]
Source Sink
node [shape = none]
Source -> Growth -> Population -> Death -> Sink
Population -> Growth [constraint = false]
Death -> Population [constraint = false]
}")
You can try using the headport and tailport options and indicate "north" for both of these (for Population and Growth).
The headport is the cardinal direction for where the arrowhead meets the node.
The tailport is the cardinal direction for where the tail is emitted from the node.
library("DiagrammeR")
grViz("digraph feedback {
graph [rankdir = 'LR']
node [shape = box]
Population
node [shape = circle]
Source Sink
node [shape = none]
Source -> Growth -> Population -> Death -> Sink
Population -> Growth [tailport = 'n', headport = 'n', constraint = false]
}")

Best way to count downstream with edge data

I have a NetworkX problem. I create a digraph from a pandas DataFrame and set data on the edges. I now need to count the number of unique source nodes among a node's descendants, grouped by an edge attribute.
This is my code. It works for one node, but I need to pass a lot of nodes to it and get unique counts.
graph = nx.from_pandas_edgelist(df, source="source", target="target",
                                edge_attr=["domain", "category"], create_using=nx.DiGraph)
downstream_nodes = list(nx.descendants(graph, node))
downstream_nodes.append(node)
subgraph = graph.subgraph(downstream_nodes).copy()
domain_sources = {}
for s, t, v in subgraph.edges(data=True):
    if v["domain"] in domain_sources:
        domain_sources[v["domain"]].append(s)
    else:
        domain_sources[v["domain"]] = [s]
down_count = {}
for k, v in domain_sources.items():
    down_count[k] = len(list(set(v)))
It works but, again, for one node the time is not a big deal but I'm feeding this routine at least 40 to 50 nodes. Is this the best way? Is there something else I can do that can group by an edge attribute and uniquely count the nodes?
Two possible enhancements:
Remove the copy() call from the line that creates the subgraph. You are not modifying anything, so the copy is redundant.
Use a defaultdict whose values are sets, so duplicate sources are dropped as they are added. Read more here.
from collections import defaultdict
import networkx as nx
# missing part of df creation
graph = nx.from_pandas_edgelist(df, source="source", target="target",
                                edge_attr=["domain", "category"], create_using=nx.DiGraph)
downstream_nodes = list(nx.descendants(graph, node))
downstream_nodes.append(node)
subgraph = graph.subgraph(downstream_nodes)
domain_sources = defaultdict(set)
for s, t, v in subgraph.edges(data=True):
    domain_sources[v["domain"]].add(s)
down_count = {}
for k, v in domain_sources.items():
    down_count[k] = len(v)  # v is already a set
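Since the question is about running this for 40 to 50 nodes, the routine can be wrapped in a function and called once per node. The sketch below is only an illustration: the function name count_sources_by_domain and the small hand-built graph are made up for the example, standing in for the graph built from the DataFrame.

```python
from collections import defaultdict

import networkx as nx


def count_sources_by_domain(graph, node):
    """Count unique edge sources per domain among node and its descendants."""
    downstream = nx.descendants(graph, node) | {node}
    domain_sources = defaultdict(set)
    # data="domain" yields (source, target, domain) triples.
    for s, t, domain in graph.subgraph(downstream).edges(data="domain"):
        domain_sources[domain].add(s)
    return {domain: len(sources) for domain, sources in domain_sources.items()}


# Hypothetical toy graph in place of the one built from the DataFrame.
graph = nx.DiGraph()
graph.add_edge("a", "b", domain="x", category="c1")
graph.add_edge("b", "c", domain="x", category="c1")
graph.add_edge("a", "c", domain="y", category="c2")
graph.add_edge("d", "c", domain="x", category="c1")

counts = {node: count_sources_by_domain(graph, node) for node in ["a", "b", "c"]}
```

Each call only touches the part of the graph below its node, and no subgraph copy is made, so repeating it for a few dozen nodes should stay cheap unless the graph is very large.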

Justify node text in DiagrammeR

Does anybody know if DiagrammeR currently supports left- and right-justification of node labels when using GraphViz?
Here is a quick example, where I would like to left-justify the text within both of the nodes:
library(DiagrammeR)
grViz("
digraph test {
graph [fontsize = 10]
node [shape = box]
A [label = 'Foo\nBar']
B [label = 'Bar\nFoo']
A -> B
}
")
I was able to find one resource here for native GraphViz that uses \l for left-justification, but when I try that within the grViz function I receive an error. For example:
library(DiagrammeR)
grViz("
digraph test {
graph [fontsize = 10]
node [shape = box]
A [label = 'Foo\lBar']
B [label = 'Bar\lFoo']
A -> B
}
")
I appreciate any help in advance!
You need a double backslash so that R passes a literal backslash through to Graphviz. Here are left- and right-justified labels:
grViz("
digraph test {
graph [fontsize = 10]
node [shape = box]
A [label = 'Foo\\lBar\\l']
B [label = 'Bar\\rFoo\\r']
A -> B
}
")
