Sequencing of river network calculation using sfnetworks and R

I’ve got a rooted tree set up in sfnetworks. It is derived from a line shapefile of a stream network. For each tributary in the stream network, there are start and end nodes, which sfnetworks determines, but there are also intermediate nodes along the line between the start and end nodes. For simplicity's sake, I am going to use the example network developed in this question.
library(sfnetworks)
library(sf)
library(tidygraph)
library(dplyr)
library(ggplot2)
n01 = st_sfc(st_point(c(0, 0)))
n02 = st_sfc(st_point(c(1, 2)))
n03 = st_sfc(st_point(c(1, 3)))
n04 = st_sfc(st_point(c(1, 4)))
n05 = st_sfc(st_point(c(2, 1)))
n06 = st_sfc(st_point(c(2, 3)))
n07 = st_sfc(st_point(c(2, 4)))
n08 = st_sfc(st_point(c(3, 2)))
n09 = st_sfc(st_point(c(3, 3)))
n10 = st_sfc(st_point(c(3, 4)))
n11 = st_sfc(st_point(c(4, 2)))
n12 = st_sfc(st_point(c(4, 4)))
from = c(1, 2, 2, 3, 3, 5, 5, 8, 8, 9, 9)
to = c(5, 3, 6, 4, 7, 2, 8, 9, 11, 10, 12)
nodes = st_as_sf(c(n01, n02, n03, n04, n05, n06, n07, n08, n09, n10, n11, n12))
edges = data.frame(from = from, to = to)
G_1 = sfnetwork(nodes, edges) %>%
  convert(to_spatial_explicit, .clean = TRUE) %>%
  activate(edges) %>%
  mutate(edgeID = c("a","c","d","e","f","b","g","h","i","j","k")) %>%
  mutate(dL = c(1100, 300, 100, 100, 100, 500, 500, 300, 100, 100, 100))
ggplot() +
  geom_sf(data = G_1 %>% activate(edges) %>% as_tibble() %>% st_as_sf(), aes(color = factor(edgeID)), size = 1.5) +
  geom_sf(data = G_1 %>% activate(nodes) %>% as_tibble() %>% st_as_sf()) +
  scale_color_brewer(palette = "Paired") +
  labs(x = "Longitude", y = "Latitude", color = "EdgeID") +
  theme_bw()
I am working on solving an equation as you move upstream along a stream, starting at the river outlet. I'll spare you as many of the details of that equation as I can, but essentially:
I am solving for hx at each node, x, based on node attributes.
hx = sqrt(h0^2 + (N*dX/K)*(2*L - dX))
N, dX, K, and L are all fixed parameters known before we start the calculation.
N, K are fixed for all nodes, they never change.
dX is a node specific attribute.
h0 and L are edge specific attributes (they are the same for all nodes on an edge)
h0 is defined by the user at the start of the calculation, but then changes as you traverse the network upstream, changing at the end node when new edges converge.
h0 is initially set to 10, N = 0.5, K = 1500.
For the sake of simplicity, we will say that each edge is 100 m long.
dX is the distance along each edge, measured from its downstream node. At the downstream node of each edge, dX = 0. At the upstream node of each edge, dX = 100.
L reflects the total distance of channel upstream of an edge, plus the total length of that edge.
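As a minimal sketch of the formula above, here it is written as a Python function. The function name and argument order are illustrative assumptions, not part of the original setup; N and K default to the values stated in the question.

```python
import math

N = 0.5      # fixed for all nodes (value from the question)
K = 1500.0   # fixed for all nodes (value from the question)

def hx(h0, dX, L, N=N, K=K):
    """Evaluate hx = sqrt(h0^2 + (N*dX/K) * (2*L - dX)) at one node."""
    return math.sqrt(h0 ** 2 + (N * dX / K) * (2 * L - dX))

# At the downstream node of an edge dX = 0, so hx reduces to h0.
print(hx(10.0, 0.0, 1100.0))
# At the upstream node of edge a: dX = 100, L = 1100.
print(hx(10.0, 100.0, 1100.0))
```

Because dX = 0 at the downstream node, the value handed over from the edge below is reproduced exactly at the start of the next edge.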
I want to start the calculation at the downstream node of edge a with h0 = 10.
hx is calculated at that starting point and then at every node along the edge (in this example only the starting and ending nodes are shown) until you get to the junction where edge g and edge b merge to form edge a.
At that end point of edge a:
hx = sqrt(10^2 + (0.5*100/1500)*(2*1100-100)).
hx = 13.03
At this junction of edge g and edge b, I want to update h0 to reflect the hx calculated at the end point of edge a, so 13.03. This hx value will be used as h0 for both edge g and edge b.
I now want to make this calculation for each subnetwork. I will start with the subnetwork that starts at edge b. h0 has been changed to 13.03, we calculate hx for the nodes on edge b, then get to the junction of edge c and edge d.
At that end point of edge b:
hx = sqrt(13.03^2 + (0.5*100/1500)*(2*500-100))
hx = 14.13
h0 is updated for both edge c and edge d so that it reflects hx of edge b’s endpoint (14.13)
hx is calculated for all nodes on edge d, and again on edge c.
At the endpoint of edge c, h0 is updated to reflect the hx at that junction of edge e and edge f
hx = sqrt(14.13^2 + (0.5*100/1500)*(2*300 - 100))
hx = 14.71
hx is finally calculated for all the nodes on edge e and edge f, using that hx calculated at the endpoint of edge c for the h0. That completes the traversal of that subnetwork.
We return to edge g to calculate hx for the subnetwork that begins there.
As we know from previous calculations, hx at the endpoint of edge a is 13.03. We now want h0 to again reflect this value for the beginning of the traversal of the subnetwork starting at edge g.
When we get to the node junction at edges h and edge i, again, we update h0 to reflect the hx calculated at the end point of edge g.
hx = sqrt(13.03^2 +(0.5*100/1500)*(2*500-100))
hx = 14.13
This is the value used as h0 for edges h and edge i. We calculate hx on edge i and then on edge h. h0 is updated a final time where edge j and edge k merge to reflect the hx calculated at the endpoint of edge h.
hx = sqrt(14.13^2 + (0.5*100/1500)*(2*300-100))
hx = 14.71
After that final update to h0, hx is calculated for all nodes on edge j and edge k.
In short, h0 needs to be updated as you move upstream to new edges to reflect the maximum hx of the immediate downstream edge. The edges must be solved in a specific order: before solving any edge (other than the first), you need to know the h0 for that edge, which means you have to solve the edge below it first to determine the hx to use.
I have struggled to come up with a loop/iteration/recursion solution to this calculation, which on the surface seems like it should be easy to solve. The solution along a single drainage thread is very easy to do manually, but when it is scaled up to a whole network these sequencing concerns must be addressed. When a network has many thousands of edges, making these calculations manually is not an option.
This type of calculation sequencing is quite common in many basic hydrologic analyses (calculating stream order, stream magnitude, calculating upstream contributing area, etc.).
sfnetworks and tidygraph seem well suited to make this type of calculation but there are few example applications on hydrologic networks (most tools for graph analysis are geared for road networks which, given the difference in terms of number of users, makes complete sense).
I tried to make this as much of a reprex as possible. If more is needed I will happily supply it; this was difficult to do given the more conceptual nature of the question.
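The sequencing described above can be sketched as a breadth-first walk upstream from the outlet edge, shown here in plain Python rather than sfnetworks. The `EDGES` table (each edge mapped to its immediate downstream edge and its L value) is an assumed encoding of the example network, and `solve_upstream` is a hypothetical helper name. Only the upstream end node of each edge (dX = 100) is evaluated, and values are kept unrounded, so they differ slightly from the hand-rounded numbers in the walkthrough.

```python
import math
from collections import deque

N, K = 0.5, 1500.0

# edge id -> (immediate downstream edge, L); None marks the outlet edge.
# Assumed encoding of the example network and its L values.
EDGES = {
    "a": (None, 1100.0),
    "b": ("a", 500.0), "g": ("a", 500.0),
    "c": ("b", 300.0), "d": ("b", 100.0),
    "e": ("c", 100.0), "f": ("c", 100.0),
    "h": ("g", 300.0), "i": ("g", 100.0),
    "j": ("h", 100.0), "k": ("h", 100.0),
}

def hx(h0, dX, L):
    return math.sqrt(h0 ** 2 + (N * dX / K) * (2 * L - dX))

def solve_upstream(edges, h0_outlet, edge_length=100.0):
    """Return {edge: (h0, hx at upstream end)}, solving each edge
    only after the edge immediately downstream of it."""
    upstream_of = {}
    for eid, (down, _) in edges.items():
        upstream_of.setdefault(down, []).append(eid)
    result = {}
    queue = deque((eid, h0_outlet) for eid in upstream_of.get(None, []))
    while queue:
        eid, h0 = queue.popleft()
        h_top = hx(h0, edge_length, edges[eid][1])
        result[eid] = (h0, h_top)
        # Every edge joining at the upstream junction inherits h_top as h0.
        for up in upstream_of.get(eid, []):
            queue.append((up, h_top))
    return result

result = solve_upstream(EDGES, h0_outlet=10.0)
for eid in sorted(result):
    h0, h_top = result[eid]
    print(eid, round(h0, 3), round(h_top, 3))
```

A queue guarantees that an edge is processed only after its downstream edge has produced the h0 it needs, which is the ordering constraint in the question. The same idea should carry over to any traversal that visits parents before children, such as a depth-first walk from the outlet.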

Related

Normalizing graph edit distance to [0,1] (networkx)

I want to have a normalized graph edit distance.
I'm using this function:
https://networkx.github.io/documentation/stable/reference/algorithms/generated/networkx.algorithms.similarity.graph_edit_distance.html#networkx.algorithms.similarity.graph_edit_distance
I'm trying to understand the graph_edit_distance function so that I can normalize it between 0 and 1, but I don't understand it fully.
For example:
def compare_graphs(Ga, Gb):
draw_graph(Ga)
draw_graph(Gb)
graph_edit_distance = nx.graph_edit_distance(Ga, Gb, node_match=lambda x,y : x==y)
return graph_edit_distance
compare_graphs(G1, G3)
Why is the graph_edit_distance = 4?
Graph construction:
e1 = [(1,2), (2,3)]
e3 = [(1,3), (3,1)]
G1 = nx.DiGraph()
G1.add_edges_from(e1)
G3 = nx.DiGraph()
G3.add_edges_from(e3)
The edit distance is measured by:
nx.graph_edit_distance(Ga, Gb, node_match=lambda x,y : x==y)
The difference from graph_edit_distance is that it relates to node indexes.
This is the output of optimize_edit_paths:
list(optimize_edit_paths(G1, G2, node_match, edge_match,
node_subst_cost, node_del_cost, node_ins_cost,
edge_subst_cost, edge_del_cost, edge_ins_cost,
upper_bound, True))
Out[3]:
[([(1, 1), (2, None), (3, 3)],
[((1, 2), None), ((2, 3), None), (None, (1, 3)), (None, (3, 1))],
5.0),
([(1, 1), (2, 3), (3, None)],
[((1, 2), (1, 3)), (None, (3, 1)), ((2, 3), None)],
4.0)]
I know it should be the minimum sequence of node and edge edit operations transforming graph G1 into a graph isomorphic to G2.
When I try to count, I get:
1. Add node 2 to G3,
2. Cancel e1=(1,3) from G3
3. Cancel e2=(3,1) from G3
4. Add e3 = (1,2) to G3
5. Add e4 = (2,3) to G3
graph_edit_distance = 5.
What am I missing?
Or alternatively, what can I do in order to normalize the distance I receive?
I thought about dividing by |V1| + |V2| + |E1| + |E2|, or dividing by max(|V1| + |E1|, |V2| + |E2|)) but I'm not sure.
Thanks in advance.
I know it's an old post, but I am currently reading about GED and wanted to answer it for anyone looking for this in the future.
Graph edit distance is 4
Reason:
In G3, nodes 1 and 3 are connected in both directions, which is effectively an undirected edge, while in G1 nodes 1 and 2 are connected by a single directed edge. The graph edit path is then:
Turn undirected edge to directed edge
Change 3 to 2 (substitution)
Add an edge to 2
Finally add a node to that edge
The Graph Edit Distance is unbounded. The more nodes you add to one graph, that the other graph doesn't have, the more edits you need to make them the same. So, the GED can be arbitrarily large.
I haven't seen this proposed anywhere else, but I have an idea:
Instead of GED(G1,G2), you can compute GED(G1,G2)/[GED(G1,G0) + GED(G2,G0)],
where G0 is the empty graph.
The situation is analogous to the difference between real numbers.
Imagine that I give you |A-B| and |C-D|. They are not on the same footing.
E.g., you could have A=1, B=2 and C=1000, D=1001.
The differences are equal, but the relative differences are very different.
To convey that, you would compute |A-B|/(|A|+|B|) instead of just |A-B|.
This is symmetric to a swapping of A and B, and it's a relative distance.
Since it's relative, it can be compared to the other relative distance: |C-D|/(|C|+|D|). These relative distances are comparable, they're expressing a notion that is universal and applies to all pairs of numbers.
In summary, compute the relative GED, using G0, the null graph, like you would use 0 if you were measuring the relative distance between real numbers.
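A minimal sketch of this relative GED, under the assumption of unit costs for every node and edge insertion or deletion (the networkx defaults). Under that assumption the distance from a graph to the empty graph is simply its node count plus its edge count, so the function below takes those counts directly. The name `relative_ged` is illustrative.

```python
def relative_ged(ged, n_nodes_1, n_edges_1, n_nodes_2, n_edges_2):
    """GED(G1,G2) / (GED(G1,G0) + GED(G2,G0)), assuming unit costs,
    so that GED(G,G0) = number of nodes + number of edges of G."""
    to_empty_1 = n_nodes_1 + n_edges_1
    to_empty_2 = n_nodes_2 + n_edges_2
    denominator = to_empty_1 + to_empty_2
    if denominator == 0:
        return 0.0  # both graphs are empty, so they are identical
    return ged / denominator

# G1 has 3 nodes and 2 edges, G3 has 2 nodes and 2 edges,
# and the reported edit distance between them is 4.
print(relative_ged(4, 3, 2, 2, 2))
```

With unit costs this ratio stays within [0, 1], because deleting all of G1 and then inserting all of G2 is always a valid edit path, so the GED can never exceed the denominator. It matches the first normalization proposed in the question, |V1| + |V2| + |E1| + |E2|.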

How can I reduce the nodes in a ggraph arc graph?

I'm trying to create an arc graph showing relationships between nonprofits focusing on a subgraph centered on one of the nonprofits. There are so many nonprofits in this subgraph, I need to reduce the number of nodes in the arc graph to only focus on the strongest connections.
I've successfully filtered out edges below a weight of 50. But when I create the graph, the nodes are still remaining even though the edges have disappeared. How do I filter the unwanted nodes from the arc graph?
Here's my code, starting from the creation of the igraph object.
# Create an igraph object
NGO_igraph <- graph_from_data_frame(d = edges, vertices = nodes, directed = TRUE)
# Create a subgraph centered on a node
# Start by entering the node ID
nodes_of_interest <- c(48)
# Build the graph
selegoV <- ego(NGO_igraph, order=1, nodes = nodes_of_interest, mode = "all", mindist = 0)
selegoG <- induced_subgraph(NGO_igraph,unlist(selegoV))
# Reducing the graph based on edge weight
smaller <- delete.edges(selegoG, which(E(selegoG)$weight < 50))
# Plotting an arc graph
ggraph(smaller, layout = "linear") +
geom_edge_arc(aes(width = weight), alpha = 0.8) +
scale_edge_width(range = c(0.2, 2)) +
geom_node_text(aes(label = label)) +
labs(edge_width = "Interactions") +
theme_graph()
And here's the result I'm getting:
If you are only interested in omitting zero degree vertices or isolates (meaning vertices which have no incoming or outgoing edge) you could simply use the following line:
g <- induced.subgraph(g, degree(g) > 0)
However, this will delete all isolates. So if for some reason you want to delete specifically those vertices connected by edges with a weight below 50 (and exempt other 'special' isolates), then you will need to identify exactly which those are:
special_vertex <- 1
v <- ends(g, which(E(g)$weight < 50))
g <- delete.vertices(g, v[v != special_vertex])
You could also skip the delete.edges part by considering the strength of a vertex:
g <- induced.subgraph(g, strength(g) > 50)
Without any sample data I created this basic sample:
#define graph
g <- make_ring(10) %>%
set_vertex_attr("name", value = LETTERS[1:10])
g
V(g)
#delete edges going to and from vertex C
g<-delete.edges(g, E(g)[2:3])
#find the head and tails of each edge in graph
heads<-head_of(g, E(g))
tails<-tail_of(g, E(g))
#list of all used vertices
combine<-unique(c(heads, tails))
#collect all vertices
v<-V(g)
#find vertices not in found set
toremove<-setdiff(v, combine)
#remove unwanted vertices
delete_vertices(g, toremove)
The basic process is to identify the start and end vertices of all of the edges of interest, then compare this unique list with all of the vertices in the graph and remove the ones not in the unique list.
From your code above the graph "smaller" would be used to find the vertices.
Hope this helps.
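The same filter-then-prune logic, sketched language-neutrally in Python on a made-up edge list (the weights and node IDs other than 48 are hypothetical). The `keep` argument is an assumed convenience for retaining the ego node even if all of its edges fall below the threshold.

```python
def filter_edges_and_drop_isolates(edges, min_weight, keep=()):
    """edges: list of (source, target, weight) tuples.
    Returns the edges at or above min_weight and the set of nodes
    that still touch at least one kept edge (plus any in `keep`)."""
    kept_edges = [(u, v, w) for u, v, w in edges if w >= min_weight]
    kept_nodes = {n for u, v, _ in kept_edges for n in (u, v)} | set(keep)
    return kept_edges, kept_nodes

# Hypothetical data: node 48 is the nonprofit of interest.
edges = [(48, 1, 120), (48, 2, 10), (1, 3, 75), (2, 3, 5)]
kept_edges, kept_nodes = filter_edges_and_drop_isolates(edges, 50, keep=[48])
print(kept_edges)
print(sorted(kept_nodes))
```

Node 2 disappears here because both of its edges fall below the threshold, which is the behavior the arc graph needs.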

How can I reconstruct a network so that it maintains the same attributes and same shape but the nodes are in different locations?

I currently have a network that is similar in node count and edge count to the following:
set.seed(12)
net <- sample_gnp(20, 1/4)
V(net)$a <- sample(c(0, 1), vcount(net), replace = TRUE, prob = c(0.3, 0.7))
V(net)$b <- sample(c(0, 1), vcount(net), replace = TRUE, prob = c(0.5, 0.5))
V(net)$color <- V(net)$a + 4
plot(net)
This creates a distinct network with a unique shape. Is there a way that I can only move the 20 nodes in this network randomly and maintain the shape? I want the network to look the same visually but with different nodes (node A at coordinate (a, b) is replaced by node G and they have the same number of edges). So ideally I want the graph to make a change like the following: (Same proportion of yellow to blue nodes but you can tell that they've moved around while simultaneously maintaining the shape)
The idea is to add the layout as additional node attributes, which are kept constant.
l <- layout_nicely(net)
V(net)$x <- l[,1]
V(net)$y <- l[,2]
The other attributes are reshuffled following the same pattern in order to keep the attribute bundle intact.
pattern <- sample(1:vcount(net))
net2 <- net
V(net2)$a <- V(net2)$a[pattern]
V(net2)$b <- V(net2)$b[pattern]
V(net2)$color <- V(net2)$color[pattern]
Now if you plot the result, you will hopefully end up with the desired output.
par(mfrow = c(1,2))
plot(net)
plot(net2)
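The reshuffling idea above, sketched in Python with plain dictionaries standing in for vertices. The sample coordinates and attribute values are made up for illustration, and `shuffle_bundles` is a hypothetical name. Coordinates stay with their slot while each slot receives another node's complete attribute bundle.

```python
import random

def shuffle_bundles(nodes, seed=None):
    """nodes: list of dicts holding 'x', 'y' and other attributes.
    Returns a new list with the same coordinates in the same order,
    but with whole attribute bundles permuted between positions."""
    rng = random.Random(seed)
    pattern = list(range(len(nodes)))
    rng.shuffle(pattern)
    shuffled = []
    for i, node in enumerate(nodes):
        donor = nodes[pattern[i]]
        # Copy every attribute of the donor except its position.
        moved = {k: v for k, v in donor.items() if k not in ("x", "y")}
        moved["x"], moved["y"] = node["x"], node["y"]
        shuffled.append(moved)
    return shuffled

# Made-up example data.
nodes = [
    {"x": 0.0, "y": 0.0, "a": 1, "b": 0, "color": 5},
    {"x": 1.0, "y": 0.5, "a": 0, "b": 1, "color": 4},
    {"x": 0.5, "y": 1.0, "a": 1, "b": 1, "color": 5},
]
shuffled = shuffle_bundles(nodes, seed=12)
print(shuffled)
```

Using one permutation for every attribute is what keeps each bundle intact, so the proportion of each color is unchanged.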
I am not sure what you are looking for, but here are a couple of other options.
Rotate the graph
LO = layout_nicely(net)
LO2 = LO
alpha=pi/4
LO2[,1] = cos(alpha)*LO[,1] + sin(alpha)*LO[,2]
LO2[,2] = -sin(alpha)*LO[,1] + cos(alpha)*LO[,2]
par(mfrow=c(1,2))
plot(net, layout=LO)
plot(net, layout=LO2)
Flip side to side
LO3 = LO
alpha=pi/4
LO3[,1] = -LO3[,1]
plot(net, layout=LO)
plot(net, layout=LO3)

Checking validity of topological sort

Given the following directed graph:
I determined the topological sort to be 0, 1, 2, 3, 7, 6, 5, 4 with the values for each node being:
d[0] = 1
f[0] = 16
d[1] = 2
f[1] = 15
d[2] = 3
f[2] = 14
d[3] = 4
f[3] = 13
d[4] = 7
f[4] = 8
d[5] = 6
f[5] = 9
d[6] = 5
f[6] = 10
d[7] = 11
f[7] = 12
Where d is discovery-time and f is finishing-time.
How can I check whether the topological sort is valid or not?
With python and networkx, you can check it as follows:
import networkx as nx
G = nx.DiGraph()
G.add_edges_from([(0, 2), (1, 2), (2, 3)])
all_topological_sorts = list(nx.algorithms.dag.all_topological_sorts(G))
print([0, 1, 2, 3] in all_topological_sorts) # True
print([2, 3, 1, 0] in all_topological_sorts) # False
However, note that in order to have a topological ordering, the graph must be a Directed Acyclic Graph (DAG). If G is not directed, NetworkXNotImplemented will be raised. If G is not acyclic (as in your case) NetworkXUnfeasible will be raised.
See documentation here.
If you want a less coding approach to this question (since it looks like your original topological ordering was generated without code), you can go back to the definition of a topological sort. Paraphrased from Emory University:
Topological ordering of nodes = an ordering (label) of the nodes/vertices such that for every edge (u,v) in G, u appears earlier than v in the ordering.
There are two ways you could approach this question: from an edge perspective or a vertex perspective. I describe a naive implementation of both below (naive meaning that with some additional space and cleverness, you could improve on them).
Edge approach
Iterate through the edges in G. For each edge, retrieve the index of each of its vertices in the ordering. Compare the indices. If the origin vertex isn't earlier than the destination vertex, return false. If you iterate through all of the edges without returning false, return true.
Complexity: O(E*V)
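A minimal Python sketch of the edge approach. It departs from the naive version by building a position lookup first, which is one way to spend extra space for speed: each index lookup becomes O(1), so the whole check runs in O(V + E). The function name is illustrative, and the sample edges are the ones from the networkx snippet above.

```python
def is_topological_order(order, edges):
    """Return True if every edge (u, v) has u before v in `order`."""
    position = {vertex: i for i, vertex in enumerate(order)}
    if len(position) != len(order):
        return False  # a vertex is listed more than once
    for u, v in edges:
        if u not in position or v not in position:
            return False  # the ordering is missing a vertex
        if position[u] >= position[v]:
            return False  # origin does not precede destination
    return True

edges = [(0, 2), (1, 2), (2, 3)]
print(is_topological_order([0, 1, 2, 3], edges))
print(is_topological_order([2, 3, 1, 0], edges))
```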
Vertex approach
Iterate through the vertices in your ordering. For each vertex, retrieve its list of outgoing edges. If any of those edges end in a vertex that precedes the current vertex in the ordering, return false. If you iterate through all the vertices without returning false, return true.
Complexity: O(V^2*E)
First, do a graph traversal to get the incoming degree of each vertex. Then walk through your list, starting from the first vertex. Each time you look at a vertex, check that its incoming degree is 0. If it is, decrement the incoming degree of each of its neighbors, as if cutting all of its outgoing edges, and move on to the next vertex. If you reach a vertex whose incoming degree is not 0, you know that this is not a valid topological order. Otherwise, it is. This takes O(V + E) time.
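A sketch of this in-degree check in Python, relying only on the in-degree test: a vertex may be emitted once all of its incoming edges have been cut. The function name is an assumption, and the sample graph is the small one from the networkx example rather than the graph in the question.

```python
from collections import defaultdict

def is_valid_by_indegree(order, edges):
    """Replay `order`, removing each vertex's outgoing edges as it is
    visited; the order is valid only if every vertex has in-degree 0
    at the moment it is reached."""
    indegree = defaultdict(int)
    successors = defaultdict(list)
    vertices = set()
    for u, v in edges:
        indegree[v] += 1
        successors[u].append(v)
        vertices.update((u, v))
    # The ordering must list each vertex once and cover the whole graph.
    if len(set(order)) != len(order) or not vertices <= set(order):
        return False
    for vertex in order:
        if indegree[vertex] != 0:
            return False  # an incoming edge has not been cut yet
        for nxt in successors[vertex]:
            indegree[nxt] -= 1
    return True

edges = [(0, 2), (1, 2), (2, 3)]
print(is_valid_by_indegree([1, 0, 2, 3], edges))
print(is_valid_by_indegree([0, 2, 1, 3], edges))
```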

How to calculate path/diameter of edge attributes in igraph?

Here's an example of a data frame you can convert to an edgelist and then into a graph. Notice that I have added 'km' as an attribute to the edgelist.
I'm not sure how to add 'km' as an edge attribute (so the distance between two nodes), but pretend that it's been done.
inst2 = c(2, 3, 4, 5, 6)
motherinst2 = c(7, 8, 9, 10, 11)
km = c(20, 30, 40, 25, 60)
df2 = data.frame(inst2, motherinst2)
edgelist = cbind(df2, km)
g = graph_from_data_frame(edgelist)
Now, how can I calculate the path lengths based on those km distances? I'm not interested in the number of edges or vertices in the path, just the sum of those km from a root to a leaf.
The km edge attribute already exists. When using graph_from_data_frame(), any information in the third column and beyond is stored as edge attributes. You can pull information from an edge with the igraph::E() function.
E(g) #identifies all of the edges
E(g)$km #identifies all of the `km` attributes for each edge
E(g)$km[1] #identifies the `km` attribute for the first edge (connecting 2 -> 7)
For completeness, let's say you have a path that spans more than one edge.
#let's add two more edges to the network
#and create a new and longer path between vertex named '2', and vertex named '7'
g <- g +
edge('2', '6', km = 10) +
edge('6', '7', km = 120)
#find all paths between 2 and 7
#don't forget that the names of vertices are strings, not numbers
paths <- igraph::all_simple_paths(g, '2', '7')
paths
#find the edge id for each of the connecting edges
#the function below accepts a flat vector of vertex pairs
#the ids returned are those of the edge between each pair of vertices
connecting_267 <- igraph::get.edge.ids(g, c('2','6' , '6','7'))
connecting_267
#get the km attribute for each of the edges
connecting_kms <- igraph::E(g)[connecting_267]$km
connecting_kms
sum(connecting_kms)
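The same summation can be sketched in plain Python, with a dictionary keyed by (from, to) standing in for the `km` edge attribute. The dictionary holds the five original edges plus the two added above; `path_length` is an illustrative helper name.

```python
def path_length(path, km):
    """Sum the km attribute over consecutive vertex pairs of a path."""
    return sum(km[(u, v)] for u, v in zip(path, path[1:]))

# Edge list from the example, including the two added edges.
km = {
    ("2", "7"): 20, ("3", "8"): 30, ("4", "9"): 40,
    ("5", "10"): 25, ("6", "11"): 60,
    ("2", "6"): 10, ("6", "7"): 120,
}

print(path_length(["2", "6", "7"], km))  # 10 + 120 = 130
print(path_length(["2", "7"], km))       # direct edge: 20
```

Applying this to each path returned by a path search and taking the maximum would give the longest root-to-leaf distance in km.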
igraph is pretty powerful. It is definitely worth spending time and exploring its documentation. Also, Katherine Ognyanova created an AWESOME tutorial that is definitely worth everyone's time.
