Add a second edge attribute to an existing graph

I'm trying to add a second edge attribute to an existing graph.
I created a graph G and saved it as a pkl file:
edges1 = pd.DataFrame({'source': [0, 1, 2, 3, 4],
                       'target': [10, 11, 12, 13, 14],
                       'weight': [50, 50, 50, 50, 50]})
G = nx.from_pandas_edgelist(edges1, 'source', 'target', 'weight')
I loaded G and then tried to add the second edge attribute (cost) and a node attribute, but it keeps overwriting the first edge attribute (weight):
edges2 = pd.DataFrame({'source': [0, 1, 2, 6, 7, 8],
                       'target': [10, 11, 12, 16, 17, 18],
                       'cost': [100, 100, 100, 100, 100, 100]})
nodes = pd.DataFrame({'node': [0, 1, 2, 3, 10, 18],
                      'name': ['A', 'B', 'C', 'D', 'E', 'F']})
nx.from_pandas_edgelist(edges2, 'source', 'target', 'cost')
nx.set_node_attributes(G, pd.Series(nodes.name, index=nodes.node).to_dict(), 'name')
I must load the graph G, so combining the edges1 and edges2 DataFrames and creating a new graph isn't what I need.
How can I get this:
[(0, 10, {'weight': 50, 'cost': 100}), (1, 11, {'weight': 50, 'cost': 100}), ...]
instead of this:
[(0, 10, {'cost': 100}), (1, 11, {'cost': 100}), ...]

I'm not clear on whether you want to add the new edges from edges2 or not. If you are okay with adding new edges, you can use nx.compose:
H = nx.from_pandas_edgelist(edges2, 'source', 'target', 'cost')
G_updated = nx.compose(G, H)
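For edges present in both graphs, compose merges the two attribute dicts (H's values take precedence on conflicts), so with the example data above the shared edges should carry both weight and cost. A quick check, assuming the graphs built above:
print(list(G_updated.edges(data=True)))
# expected to include (0, 10, {'weight': 50, 'cost': 100}) for edges in both graphs,
# plus the new edges such as (6, 16, {'cost': 100})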
If you don't want to add new edges, you can check whether each edge exists and then set the edge attribute directly:
H = nx.from_pandas_edgelist(edges2, 'source', 'target', 'cost')
for edge in H.edges():
    if edge in G.edges():
        G.edges[edge]['cost'] = H.edges[edge]['cost']
If performance is an issue, you could also set the edge attributes of G directly from your edges2 data, without building a second graph or even a second DataFrame.
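For example, a rough sketch of that direct approach using the edges2 DataFrame from the question (edges not already present in G are simply skipped here):
for source, target, cost in edges2[['source', 'target', 'cost']].itertuples(index=False):
    if G.has_edge(source, target):
        G.edges[source, target]['cost'] = cost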

How can I animate all the graph points simultaneously inside Manim?

I made this class: 50 points of a spiral change to a circle.
But the animation is sequential and I would like all the points to start moving at the same time.
class SpiralToCircle(Scene):
    def construct(self):
        vertices1 = range(50)
        vertices2 = range(50)
        edges = [(48, 49), (3, 4)]
        g1 = Graph(vertices1, edges, layout="spiral")
        g2 = Graph(vertices2, edges, layout="circular")
        # self.add(graph)
        self.play(Create(g1))
        self.wait(5)
        for i in vertices1:
            self.play(g1[i].animate.move_to(g2[i]))
        self.wait()
I thought about this trick, but it returns an error:
self.play((g1[i].animate.move_to(g2[i])) for i in vertices1)
TypeError: Unexpected argument <generator object GraphCircular.construct.. at 0x00000229667509E0> passed to Scene.play().
This should work: self.play([g1[i].animate.move_to(g2[i]) for i in vertices1]). The play function can take a list of animations.
Try unpacking the animation list and passing the animations as separate arguments to the play method:
self.play(*[g1[i].animate.move_to(g2[i]) for i in vertices1])
So, the code would be:
class SpiralToCircle(Scene):
    def construct(self):
        vertices1 = range(50)
        vertices2 = range(50)
        edges = [(48, 49), (3, 4)]
        g1 = Graph(vertices1, edges, layout="spiral")
        g2 = Graph(vertices2, edges, layout="circular")
        # self.add(graph)
        self.play(Create(g1))
        self.wait(5)
        self.play(*[g1[i].animate.move_to(g2[i]) for i in vertices1])
        self.wait()
Generating this output: [output animation]

Getting the vertex numbers from an edge

I am using the shortest path algorithm from LightGraphs.jl. In the end I want to collect some information about the nodes along the path. In order to do that, I need to be able to extract the vertices from the edges that the function gives back.
using LightGraphs
g = cycle_graph(4)
path = a_star(g, 1, 3)
edge1 = path[1]
Using this I get: Edge 1 => 2
How would I automatically get the vertices 1 and 2 without having to look at the edge manually? I was thinking about something like edge1[1] or edge1.From, but neither works.
Thanks in advance!
The accessors for AbstractEdge types are src and dst, used like this:
using LightGraphs
g = cycle_graph(4)
path = a_star(g, 1, 3)
edge1 = path[1]
s = src(edge1)
d = dst(edge1)
println("source: $s") # prints "source: 1"
println("destination: $d") # prints "destination: 2"

All path *lengths* from source to target in Directed Acyclic Graph

I have a graph with an adjacency matrix of shape adj_mat.shape = (4000, 4000). My current problem involves finding the list of path lengths (the sequence of nodes is not so important) that traverse from the source (row = 0) to the target (col = trans_mat.shape[0] - 1).
I am not interested in finding the path sequences; I am only interested in propagating the path lengths. As a result, this is different from finding all simple paths, which would be too slow (i.e. find all paths from source to target, then score each path). Is there a performant way to do this quickly?
DFS is suggested as one possible strategy (noted here). My current implementation (below) is simply not optimal:
# create graph
G = nx.from_numpy_matrix(adj_mat, create_using=nx.DiGraph())
# initialize nodes
for node in G.nodes:
    G.nodes[node]['cprob'] = []
# set starting node value
G.nodes[0]['cprob'] = [0]

def propagate_prob(G, node):
    # find incoming edges to node
    predecessors = list(G.predecessors(node))
    curr_node_arr = []
    for prev_node in predecessors:
        # get incoming edge weight
        edge_weight = G.get_edge_data(prev_node, node)['weight']
        # get predecessor node value
        if len(G.nodes[prev_node]['cprob']) == 0:
            G.nodes[prev_node]['cprob'] = propagate_prob(G, prev_node)
        prev_node_arr = G.nodes[prev_node]['cprob']
        # add incoming edge weight to prev_node arr
        curr_node_arr = np.concatenate([curr_node_arr, np.array(edge_weight) + np.array(prev_node_arr)])
    # update current node array
    G.nodes[node]['cprob'] = curr_node_arr
    return G.nodes[node]['cprob']

# calculate all path lengths from source to sink
part_func = propagate_prob(G, 4000)
I don't have a large example at hand (e.g. >300 nodes), but I found a non-recursive solution:
import networkx as nx

g = nx.DiGraph()
nx.add_path(g, range(7))
g.add_edge(0, 3)
g.add_edge(0, 5)
g.add_edge(1, 4)
g.add_edge(3, 6)

# first step: retrieve topological sorting
sorted_nodes = nx.algorithms.topological_sort(g)

start = 0
target = 6
path_lengths = {start: [0]}

for node in sorted_nodes:
    if node == target:
        print(path_lengths[node])
        break
    if node not in path_lengths or g.out_degree(node) == 0:
        continue
    new_path_length = path_lengths[node]
    new_path_length = [i + 1 for i in new_path_length]
    for successor in g.successors(node):
        if successor in path_lengths:
            path_lengths[successor].extend(new_path_length)
        else:
            path_lengths[successor] = new_path_length.copy()
    if node != target:
        del path_lengths[node]
Output: [2, 4, 2, 4, 4, 6]
If you are only interested in the number of paths of each length, e.g. {2: 2, 4: 3, 6: 1} for the above example, you could even reduce the lists to dicts.
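A minimal sketch of that reduction, assuming the path_lengths dict from the code above:
from collections import Counter
# count how many paths reach the target with each length
length_counts = dict(Counter(path_lengths[target]))  # e.g. {2: 2, 4: 3, 6: 1}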
Background
Some explanation of what I'm doing (which I hope works for larger examples as well). The first step is to retrieve the topological sorting. Why? Then I know in which "direction" the edges flow and I can simply process the nodes in that order without missing any edge or any backtracking, as a recursive variant would need. Afterwards, I initialise the start node with a list containing the current path length ([0]). This list is copied to all successors, while updating the path length (all elements +1). The goal is that in each iteration the path lengths from the starting node to all processed nodes are calculated and stored in the dict path_lengths. The loop stops after reaching the target node.
With igraph I can calculate up to 300 nodes in ~1 second. I also found that accessing the adjacency matrix directly (rather than calling igraph functions to retrieve edges/vertices) saves time. The two key bottlenecks are 1) appending to a long list efficiently (while also keeping it in memory) and 2) finding a way to parallelize. The runtime grows exponentially past ~300 nodes; I would love to see if someone has a faster solution (that also fits into memory).
import igraph

# create graph from adjacency matrix
G = igraph.Graph.Adjacency((trans_mat_pad > 0).tolist())
# add edge weights
G.es['weight'] = trans_mat_pad[trans_mat_pad.nonzero()]
# initialize nodes
for node in range(trans_mat_pad.shape[0]):
    G.vs[node]['cprob'] = []
# set starting node value
G.vs[0]['cprob'] = [0]

def propagate_prob(G, node, trans_mat_pad):
    # find incoming edges to node
    predecessors = trans_mat_pad[:, node].nonzero()[0]  # G.get_adjlist(mode='IN')[node]
    curr_node_arr = []
    for prev_node in predecessors:
        # get incoming edge weight
        edge_weight = trans_mat_pad[prev_node, node]  # G.es[prev_node]['weight']
        # get predecessor node value
        if len(G.vs[prev_node]['cprob']) == 0:
            curr_node_arr = np.concatenate([curr_node_arr, np.array(edge_weight) + propagate_prob(G, prev_node, trans_mat_pad)])
        else:
            curr_node_arr = np.concatenate([curr_node_arr, np.array(edge_weight) + np.array(G.vs[prev_node]['cprob'])])
        ## NB: if memory is a constraint, uncomment below to set a max size
        # if len(curr_node_arr) > 100:
        #     curr_node_arr = np.sort(curr_node_arr)[:100]
    # update current node array
    G.vs[node]['cprob'] = curr_node_arr
    return G.vs[node]['cprob']

# calculate path lengths
path_len = propagate_prob(G, trans_mat_pad.shape[0] - 1, trans_mat_pad)

Best way to count downstream with edge data

I have a NetworkX problem. I create a digraph from a pandas DataFrame, and there is data that I set along the edges. I now need to count the number of unique sources among a node's descendants and access the edge attribute.
This is my code and it works for one node, but I need to pass a lot of nodes to it and get unique counts.
graph = nx.from_pandas_edgelist(df, source="source", target="target",
                                edge_attr=["domain", "category"], create_using=nx.DiGraph)
downstream_nodes = list(nx.descendants(graph, node))
downstream_nodes.append(node)
subgraph = graph.subgraph(downstream_nodes).copy()
domain_sources = {}
for s, t, v in subgraph.edges(data=True):
    if v["domain"] in domain_sources:
        domain_sources[v["domain"]].append(s)
    else:
        domain_sources[v["domain"]] = [s]
down_count = {}
for k, v in domain_sources.items():
    down_count[k] = len(list(set(v)))
It works, but again, for one node the time is not a big deal; I'm feeding this routine at least 40 to 50 nodes. Is this the best way? Is there something else I can do that can group by an edge attribute and uniquely count the nodes?
Two possible enhancements:
Remove copy from the line creating the subgraph. You are not changing anything, so the copy is redundant.
Create a defaultdict with set values.
from collections import defaultdict
import networkx as nx

# missing part of df creation
graph = nx.from_pandas_edgelist(df, source="source", target="target",
                                edge_attr=["domain", "category"], create_using=nx.DiGraph)
downstream_nodes = list(nx.descendants(graph, node))
downstream_nodes.append(node)
subgraph = graph.subgraph(downstream_nodes)
domain_sources = defaultdict(set)
for s, t, v in subgraph.edges(data=True):
    domain_sources[v["domain"]].add(s)
down_count = {}
for k, v in domain_sources.items():
    down_count[k] = len(set(v))
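Since the question mentions feeding 40 to 50 nodes through this, one option is to wrap the logic in a function and call it per node. A sketch (downstream_domain_counts and nodes_of_interest are illustrative names, not from the original):
def downstream_domain_counts(graph, node):
    downstream_nodes = list(nx.descendants(graph, node))
    downstream_nodes.append(node)
    subgraph = graph.subgraph(downstream_nodes)
    domain_sources = defaultdict(set)
    for s, t, v in subgraph.edges(data=True):
        domain_sources[v["domain"]].add(s)
    # number of unique sources per domain
    return {domain: len(sources) for domain, sources in domain_sources.items()}

results = {n: downstream_domain_counts(graph, n) for n in nodes_of_interest}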

How can I add a second new edge between two nodes without updating the old one?

What I have: a multigraph H in networkX, two nodes 0 and 1, and an existing edge e1 = (0, 1).
What I want: to add a second new edge e2 between nodes 0 and 1.
PROBLEM: When I add the new edge e2 between 0 and 1, e1 is updated with the new value (attributes) of e2, and e2 is not added. There is always a single edge between 0 and 1.
My example code:
H=nx.MultiGraph()
H=nx.read_gml('my_graph.gml')
If I print all the edges of H, I correctly get this:
for i in H.edges(data=True):
    print i
>>>>>(0, 1, {}) #this is ok
Now I add a new edge e2 = (0, 1) using the key attribute:
H.add_edge(0, 1, key=1, value='blue')
But if I print all the edges of H:
for i in H.edges(data=True):
    print i
>>>>>(0, 1, {'key': 1, 'value': 'blue'}) #this is the error: e1 was updated instead of adding e2
As you can see, the second edge has updated the first one, even though e2 was added with a specified key, different from e1's (the default key is 0).
How can I avoid this problem?
I want this result after adding edge e2:
for i in H.edges(data=True):
    print i
>>>>>(0, 1, {}) (0, 1, {'value': 'blue'}) #this is correct: both edges are present (keys 0 and 1)
You don't have a multigraph, so you are replacing edges instead of adding new ones: the assignment H = nx.read_gml('my_graph.gml') overwrites the MultiGraph you created on the previous line with the plain Graph returned by read_gml.
Use
H=nx.MultiGraph(nx.read_gml('my_graph.gml'))
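A quick check of the difference, assuming the GML file already contains the edge (0, 1) as in the question (the printed data dicts are illustrative):
import networkx as nx

H = nx.MultiGraph(nx.read_gml('my_graph.gml'))
H.add_edge(0, 1, key=1, value='blue')
for u, v, k, data in H.edges(keys=True, data=True):
    print(u, v, k, data)
# expected: both (0, 1, 0, {}) and (0, 1, 1, {'value': 'blue'}) are listed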
