How to calculate path/diameter of edge attributes in igraph? - r

Here's an example of a data frame you can convert to an edgelist and then into a graph. Notice that I have added 'km' as an attribute to the edgelist.
I'm not sure how to add 'km' as an edge attribute (so the distance between two nodes), but pretend that it's been done.
library(igraph)
inst2 = c(2, 3, 4, 5, 6)
motherinst2 = c(7, 8, 9, 10, 11)
km = c(20, 30, 40, 25, 60)
df2 = data.frame(inst2, motherinst2)
edgelist = cbind(df2, km)
g = graph_from_data_frame(edgelist)
Now, how can I calculate the path lengths based on those km distances? I'm not interested in the number of edges or vertices in the path, just the sum of those km from a root to a leaf.

The km edge attribute already exists. When you use graph_from_data_frame(), every column from the third onward is stored as an edge attribute. You can pull information from an edge with the igraph::E() function.
E(g) #identifies all of the edges
E(g)$km #identifies all of the `km` attributes for each edge
E(g)$km[1] #identifies the `km` attribute for the first edge (connecting 2 -> 7)
For completeness, let's say you have a path that spans more than one edge.
#let's add two more edges to the network
#to create a new, longer path between the vertex named '2' and the vertex named '7'
g <- g +
edge('2', '6', km = 10) +
edge('6', '7', km = 120)
#find all paths between 2 and 7
#don't forget that the names of vertices are strings, not numbers
paths <- igraph::all_simple_paths(g, '2', '7')
paths
#find the edge id of each connecting edge
#the function below accepts a flat vector of vertex pairs
#and returns the id of the edge between each pair of vertices
connecting_267 <- igraph::get.edge.ids(g, c('2','6' , '6','7'))
connecting_267
#get the km attribute for each of the edges
connecting_kms <- igraph::E(g)[connecting_267]$km
connecting_kms
sum(connecting_kms)
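The same totals can also be computed without looking up edge ids by hand. The following is a minimal sketch, not part of the original answer: it rebuilds the example graph (the column names `from` and `to` are my own choice), uses distances() with the km values as weights to get the shortest weighted distance, and sums km along every simple path with E(g, path = ...).

```r
library(igraph)

# Rebuild the example graph; the third column becomes the edge attribute km
edgelist <- data.frame(
  from = c(2, 3, 4, 5, 6),
  to   = c(7, 8, 9, 10, 11),
  km   = c(20, 30, 40, 25, 60)
)
g <- graph_from_data_frame(edgelist)
g <- g + edge('2', '6', km = 10) + edge('6', '7', km = 120)

# Shortest distance from '2' to '7' measured in km rather than in edge count
shortest_km <- distances(g, v = '2', to = '7', mode = 'out', weights = E(g)$km)

# Total km along every simple path from '2' to '7'
paths <- all_simple_paths(g, '2', '7')
path_km <- sapply(paths, function(p) sum(E(g, path = as_ids(p))$km))

shortest_km
path_km
```

Here distances() returns only the cheapest route, while path_km holds one total per path, which is closer to what you want if every root-to-leaf path matters.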
igraph is pretty powerful. It is definitely worth spending time and exploring its documentation. Also, Katherine Ognyanova created an AWESOME tutorial that is definitely worth everyone's time.

Related

Sequencing of river network calculation using sfnetworks and r

I’ve got a rooted tree set up in sfnetworks. It is derived from a line shapefile of a stream network. For each tributary in the stream network, there are start and end nodes that sfnetworks determines, but there are also intermediate nodes between the start and end nodes on the line. I am going to use the example network developed in this question for simplicity's sake.
library(sfnetworks)
library(sf)
library(tidygraph)
library(dplyr)
library(tidyverse)
rm(list = ls())
n01 = st_sfc(st_point(c(0, 0)))
n02 = st_sfc(st_point(c(1, 2)))
n03 = st_sfc(st_point(c(1, 3)))
n04 = st_sfc(st_point(c(1, 4)))
n05 = st_sfc(st_point(c(2, 1)))
n06 = st_sfc(st_point(c(2, 3)))
n07 = st_sfc(st_point(c(2, 4)))
n08 = st_sfc(st_point(c(3, 2)))
n09 = st_sfc(st_point(c(3, 3)))
n10 = st_sfc(st_point(c(3, 4)))
n11 = st_sfc(st_point(c(4, 2)))
n12 = st_sfc(st_point(c(4, 4)))
from = c(1, 2, 2, 3, 3, 5, 5, 8, 8, 9, 9)
to = c(5, 3, 6, 4, 7, 2, 8, 9, 11, 10, 12)
nodes = st_as_sf(c(n01, n02, n03, n04, n05, n06, n07, n08, n09, n10, n11, n12))
edges = data.frame(from = from, to = to)
G_1 = sfnetwork(nodes, edges)%>%
convert(to_spatial_explicit, .clean = TRUE) %>%
activate(edges) %>%
mutate(edgeID = c("a","c","d","e","f","b","g","h","i","j","k")) %>%
mutate(dL = c(1100, 300, 100, 100, 100, 500, 500, 300, 100, 100, 100))
ggplot()+
geom_sf(data = G_1 %>% activate(edges) %>% as_tibble() %>% st_as_sf(), aes(color = factor(edgeID)), size = 1.5)+
geom_sf(data = G_1 %>% activate(nodes) %>% as_tibble() %>% st_as_sf())+
scale_color_brewer(palette = "Paired")+
labs(x = "Longitude", y = "Latitude", color = "EdgeID")+
theme_bw()
I am working on solving an equation as you move upstream along a stream, starting at the river outlet. I'll spare you as many of the details of that equation as I can, but essentially:
I am solving for hx at each node, x, based on node attributes.
hx = sqrt(h0^2 + (N*dX/K)*(2*L - dX))
N, dX, K, and L are all fixed parameters known before we start the calculation.
N, K are fixed for all nodes, they never change.
dX is a node specific attribute.
h0 and L are edge specific attributes (they are the same for all nodes on an edge)
h0 is defined by the user at the start of the calculation, but then changes as you traverse the network upstream, changing at the end node when new edges converge.
h0 is initially set to 10, N = 0.5, K = 1500.
For simplicity, we will say that each edge is 100 m long.
dX is the distance along each edge: dX = 0 at the downstream node of an edge and dX = 100 at the upstream node.
L reflects the total distance of channel upstream of an edge, plus the total length of that edge.
I want to start the calculation at the downstream node of edge a with h0 = 10.
hx is calculated at that starting point and then at every node along the edge (in this example only the starting and ending node is shown) until you get to the junction where edge g and edge b merge to form edge a.
At that end point of edge a:
hx = sqrt(10^2 + (0.5*100/1500)*(2*1100-100)).
hx = 13.03
At this junction of edge g and edge b, I want to update h0 to reflect the hx calculated at the end point of edge a, so 13.03. This hx value will be used as h0 for both edge g and edge b.
I now want to make this calculation for each subnetwork. I will start with the subnetwork that starts at edge b. h0 has been changed to 13.03, we calculate hx for the nodes on edge b, then get to the junction of edge c and edge d.
At that end point of edge b:
hx = sqrt(13.03^2 + (0.5*100/1500)*(2*500-100))
hx = 14.13
h0 is updated for both edge c and edge d so that it reflects hx of edge b’s endpoint (14.13)
hx is calculated for all nodes on edge d, and again on edge c.
At the endpoint of edge c, h0 is updated to reflect the hx at that junction of edge e and edge f
hx = sqrt(14.13^2 + (0.5*100/1500)*(2*300-100))
hx = 14.71
hx is finally calculated for all the nodes on edge e and edge f, using that hx calculated at the endpoint of edge c for the h0. That completes the traversal of that subnetwork.
We return to edge g to calculate hx for the subnetwork that begins there.
As we know from previous calculations, hx at the endpoint of edge a is 13.03. We now want h0 to again reflect this value for the beginning of the traversal of the subnetwork starting at edge g.
When we get to the node junction at edges h and edge i, again, we update h0 to reflect the hx calculated at the end point of edge g.
hx = sqrt(13.03^2 +(0.5*100/1500)*(2*500-100))
hx = 14.13
This is the value used as h0 for edges h and edge i. We calculate hx on edge i and then on edge h. h0 is updated a final time where edge j and edge k merge to reflect the hx calculated at the endpoint of edge h.
hx = sqrt(14.13^2 + (0.5*100/1500)*(2*300-100))
hx = 14.71
After that final update to h0, hx is calculated for all nodes on edge j and edge k.
In short, h0 needs to be updated as you move upstream to new edges to reflect the maximum hx of the immediately downstream edge. The solution has to follow a certain order: before solving any edge (other than the first), you need to know the h0 for that segment, which means you have to solve the edge below it first to determine the hx to use.
I have struggled to come up with a loop/iteration/recursion solution to this calculation, which on the surface seems like it should be quite easy to solve. The solution along a single drainage thread is very easy to do manually, but when it is scaled up to a whole network these sequencing concerns must be addressed. When you have many thousands of edges in a network, making these calculations manually is not an option.
This type of calculation sequencing is quite common in many basic hydrologic analyses (calculating stream order, stream magnitude, calculating upstream contributing area, etc.).
sfnetworks and tidygraph seem well suited to make this type of calculation but there are few example applications on hydrologic networks (most tools for graph analysis are geared for road networks which, given the difference in terms of number of users, makes complete sense).
I tried to make this as much of a reprex as possible. If more is needed I will happily supply it; this was difficult to do given the conceptual nature of the question.
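One way to get the required ordering is to process edges by the depth of their downstream node, so that every edge is solved only after the edge below it. The following is a minimal sketch using plain igraph rather than sfnetworks, under several assumptions of mine: edges point from the downstream node to the upstream node (as in the from/to vectors above), the dL values play the role of L, every edge is 100 m long, and hx is evaluated only at the upstream end node of each edge (dX = 100), not at intermediate nodes. The names `depth`, `h` and `len` are illustrative.

```r
library(igraph)

# Edge table from the example: from = downstream node, to = upstream node
edges <- data.frame(
  from   = c(1, 2, 2, 3, 3, 5, 5, 8, 8, 9, 9),
  to     = c(5, 3, 6, 4, 7, 2, 8, 9, 11, 10, 12),
  edgeID = c("a", "c", "d", "e", "f", "b", "g", "h", "i", "j", "k"),
  dL     = c(1100, 300, 100, 100, 100, 500, 500, 300, 100, 100, 100)
)
g <- graph_from_data_frame(edges, directed = TRUE)

N <- 0.5     # fixed parameter from the question
K <- 1500    # fixed parameter from the question
len <- 100   # assumed length of every edge (dX at the upstream node)

# Number of edges between the outlet (node "1") and every node
depth <- distances(g, v = "1", mode = "out")[1, ]

# Solve edges in order of the depth of their downstream node
edges <- edges[order(depth[as.character(edges$from)]), ]

h <- c("1" = 10)  # h0 at the outlet
edges$h0 <- NA_real_
edges$hx <- NA_real_
for (i in seq_len(nrow(edges))) {
  h0 <- h[[as.character(edges$from[i])]]  # hx already solved at the downstream node
  hx <- sqrt(h0^2 + (N * len / K) * (2 * edges$dL[i] - len))
  h[[as.character(edges$to[i])]] <- hx    # becomes h0 for the edges upstream
  edges$h0[i] <- h0
  edges$hx[i] <- hx
}

edges[, c("edgeID", "h0", "hx")]
```

Values are carried at full precision here, so they can differ in the second decimal from figures that were rounded by hand at each junction. Because the ordering comes from a single distances() call and the loop touches each edge once, the same pattern should scale to networks with many thousands of edges.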

How to calculate minimum spanning tree in R

Given a graph of N vertices, with the distances from each vertex to every other vertex stored in tuples T1 = (d11, d12, …, d1n) to Tn = (dn1, dn2, …, dnn), find a minimum spanning tree of this graph starting from the vertex V1. Also, print the total distance needed to travel this generated tree.
Example:
For N =5
T1 = (0, 4, 5, 7, 5)
T2 = (4, 0, 6, 2, 5)
T3 = (5, 6, 0, 2, 1)
T4 = (7, 2, 2, 0, 5)
T5 = (5, 5, 1, 5, 0)
Selection of edges according to minimum distance are:
V1 -> V2 = 4
V2 -> V4 = 2
V4 -> V3 = 2
V3 -> V5 = 1
Thus, MST is V1 -> V2 -> V4 -> V3 -> V5 and the distance travelled is 9 (4+2+2+1)
I literally have no idea how to create a graph of n vertices in R.
I searched on Google, but I didn't understand how to approach the problem above.
Please help me.
Your question doesn't seem to match the title - you're after the graph creation, not the MST? Once you've got a graph, as @user20650 says, the MST itself is easy.
It is easy to create a graph of size n, but there is a whole lot of complexity about which nodes are connected and their weights (distances) that you don't tell us about, so this is a really basic illustration.
If we assume that all nodes are connected to all other nodes (full graph), we can use make_full_graph. If that isn't the case, you either need data to say which nodes are connected or use a random graph.
library(igraph)
# create graph
n <- 5
g <- make_full_graph(n)
The next issue is the distances. You haven't given us any information on how those distances are distributed, but we can demonstrate assigning them to the graph. Here, I'll just use random uniform [0-1] numbers:
# number of edges in an (undirected) full graph is (n^2 - n) / 2 but
# it is easier to just ask the graph how many edges it has - this
# is more portable if you change from make_full_graph
n_edge <- gsize(g)
g <- set_edge_attr(g, 'weight', value=runif(n_edge))
plot(g)
The next bit is just the MST itself, using minimum.spanning.tree:
mst <- minimum.spanning.tree(g)
The output mst looks like this:
IGRAPH dc21276 U-W- 5 4 -- Full graph
+ attr: name (g/c), loops (g/l), weight (e/n)
+ edges from dc21276:
[1] 1--4 1--5 2--3 2--5
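For the distance tuples given in the question, the graph can be built directly from the distance matrix instead of using random weights. This is a minimal sketch under that reading of the question: each tuple is treated as one row of a symmetric matrix, the vertex names V1 to V5 are added for readability, and mst() picks up the weight attribute created by graph_from_adjacency_matrix().

```r
library(igraph)

# Rows T1..T5 from the question as a symmetric distance matrix
d <- matrix(
  c(0, 4, 5, 7, 5,
    4, 0, 6, 2, 5,
    5, 6, 0, 2, 1,
    7, 2, 2, 0, 5,
    5, 5, 1, 5, 0),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("V", 1:5), paste0("V", 1:5))
)

# Non-zero entries become edges; their values are stored as the weight attribute
g <- graph_from_adjacency_matrix(d, mode = "undirected", weighted = TRUE, diag = FALSE)

tree <- mst(g)                  # uses E(g)$weight when weights are not given
total <- sum(E(tree)$weight)    # total distance of the spanning tree

igraph::as_data_frame(tree, what = "edges")
total
```

The edge list printed at the end shows which pairs were kept, and total gives the summed distance, which should match the 9 worked out by hand in the question.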

How can I reduce the nodes in a ggraph arc graph?

I'm trying to create an arc graph showing relationships between nonprofits focusing on a subgraph centered on one of the nonprofits. There are so many nonprofits in this subgraph, I need to reduce the number of nodes in the arc graph to only focus on the strongest connections.
I've successfully filtered out edges below a weight of 50. But when I create the graph, the nodes are still remaining even though the edges have disappeared. How do I filter the unwanted nodes from the arc graph?
Here's my code, starting from the creation of the igraph object.
# Create an igraph object
NGO_igraph <- graph_from_data_frame(d = edges, vertices = nodes, directed = TRUE)
# Create a subgraph centered on a node
# Start by entering the node ID
nodes_of_interest <- c(48)
# Build the graph
selegoV <- ego(NGO_igraph, order=1, nodes = nodes_of_interest, mode = "all", mindist = 0)
selegoG <- induced_subgraph(NGO_igraph,unlist(selegoV))
# Reducing the graph based on edge weight
smaller <- delete.edges(selegoG, which(E(selegoG)$weight < 50))
# Plotting an arc graph
ggraph(smaller, layout = "linear") +
geom_edge_arc(aes(width = weight), alpha = 0.8) +
scale_edge_width(range = c(0.2, 2)) +
geom_node_text(aes(label = label)) +
labs(edge_width = "Interactions") +
theme_graph()
And here's the result I'm getting:
If you are only interested in omitting zero degree vertices or isolates (meaning vertices which have no incoming or outgoing edge) you could simply use the following line:
g <- induced.subgraph(g, degree(g) > 0)
However, this will delete all isolates. So if you are for some reason set on specifically deleting the vertices connected by edges with a weight below 50 (and exempting other 'special' isolates), then you will need to identify clearly which those are:
special_vertex <- 1
# vertex ids at both ends of every edge whose weight is below 50
v <- ends(g, which(E(g)$weight < 50), names = FALSE)
g <- delete.vertices(g, unique(v[v != special_vertex]))
You could also skip the delete.edges part by considering the strength of a vertex:
g <- induced.subgraph(g, strength(g) > 50)
Without any sample data I created this basic sample:
#define graph
g <- make_ring(10) %>%
set_vertex_attr("name", value = LETTERS[1:10])
g
V(g)
#delete the edges going to and from vertex C
g<-delete.edges(g, E(g)[2:3])
#find the heads and tails of each edge in the graph
heads<-head_of(g, E(g))
tails<-tail_of(g, E(g))
#list of all used vertices
combine<-unique(c(heads, tails))
#collect all vertices
v<-V(g)
#find vertices not in the used set
toremove<-setdiff(v, combine)
#remove unwanted vertices (assign the result, the graph is not modified in place)
g<-delete_vertices(g, toremove)
The basic process is to identify the start and end of all of the edges of interest, then compare this unique list with all of the vertices and remove the ones not in the unique list.
From your code above the graph "smaller" would be used to find the vertices.
Hope this helps.
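Applied to the workflow in the question, the two steps (drop weak edges, then drop the vertices left without any edge) can be sketched as follows. The toy edge list below is made up for illustration, since no sample data was posted; only the threshold of 50 and the name `smaller` come from the question.

```r
library(igraph)

# Hypothetical weighted edge list standing in for the nonprofit data
edges <- data.frame(
  from   = c("A", "A", "B", "C"),
  to     = c("B", "C", "C", "D"),
  weight = c(80, 10, 60, 5)
)
g <- graph_from_data_frame(edges, directed = TRUE)

# Step 1: remove edges with a weight below 50
smaller <- delete_edges(g, which(E(g)$weight < 50))

# Step 2: remove vertices that no longer have any incoming or outgoing edge
smaller <- delete_vertices(smaller, which(degree(smaller) == 0))

V(smaller)$name
```

The resulting `smaller` graph can then be passed to ggraph() as before, and only the vertices that still have a strong connection will be drawn.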

How to calculate the number of vertices contracted into one graph?

I have a few large igraph objects that represent social networks. All nodes have various attributes, among them sector, which is a factor variable. I have contracted this large network into a smaller one where vertices represent groups and each edge carries the sum of the individual edges in the original network. The label attribute in the second network represents the sector attribute in the first.
groupnet <- contract(g, as.integer(as.factor(V(g)$sector)), "ignore")
E(groupnet)$weight <- 1
groupnet <- simplify(groupnet, edge.attr.comb = list(weight = "sum"))
V(groupnet)$label <- levels(as.factor(V(g)$sector))
I would like to add another attribute to the second object V(groupnet)$groupsize that represents the number of original vertices that were contracted into groupnet. I have tried it with the following code but it did not work:
V(groupnet)$groupsize <- length(V(g)$sector[V(g)$sector == V(groupnet)$label])
How can I do this properly?
table() could be helpful here. Try out:
set.seed(1234)
library(igraph)
g <- make_ring(1000)
V(g)$sector <- factor(sample(LETTERS, vcount(g), replace = TRUE))
V(g)$sector
## contracted network
groupnet <- contract(g, as.integer(as.factor(V(g)$sector)), "ignore")
E(groupnet)$weight <- 1
V(groupnet)$label <- levels(as.factor(V(g)$sector))
## number of original vertices that were contracted into groupnet
# the tip is to see that table(V(g)$sector) provides the number of vertices per sector and
# its output is also arranged like V(groupnet)
table(V(g)$sector)
V(groupnet)
# solution
V(groupnet)$groupsize <- as.numeric(table(V(g)$sector))
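An alternative that does not depend on table() and the vertex order lining up is to let contract() do the counting. This is a sketch of that idea, not the approach from the answer above: every original vertex gets a helper attribute groupsize = 1, and vertex.attr.comb is told to sum that attribute and ignore all others. The variable name `sector_f` is my own.

```r
library(igraph)
set.seed(1234)

g <- make_ring(1000)
V(g)$sector <- sample(LETTERS, vcount(g), replace = TRUE)
sector_f <- factor(V(g)$sector)

# Each original vertex counts as one member of its group
V(g)$groupsize <- 1

# Sum groupsize while contracting; drop every other vertex attribute
groupnet <- contract(
  g,
  as.integer(sector_f),
  vertex.attr.comb = list(groupsize = "sum", "ignore")
)
V(groupnet)$label <- levels(sector_f)

data.frame(label = V(groupnet)$label, groupsize = V(groupnet)$groupsize)
```

Both routes should give the same counts; summing during the contraction simply keeps the bookkeeping inside igraph.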

Add missing mutual edges, while not changing attributes of existing mutual edges in R (igraph)

I have a directed graph, G. Some of the edges in G are reciprocal, and some are not. For reciprocal edges, edge attributes may be different. That is E(v1,v2)$att may not equal E(v2,v1)$att.
I need to fill in all missing reciprocal edges (that is, if E(v2,v1) does not exist while E(v1,v2) does, I want to create E(v2, v1) and copy all attribute information from E(v1, v2)).
If the reciprocal edge does exist, I need to keep the unique edge attribute information.
There are a lot of edges and a lot of attributes, so I am trying to avoid a loop here. Currently, where g1 is the directed but incomplete graph, I:
#make undirected with loops for reciprocal friendships
g2 <- as.undirected(g1, mode = c("each"))
#force everything to be directed, create new edges
g <- as.directed(g2, mode = c("mutual"))
#get rid of the double loops.
gnew <- simplify(g, remove.multiple = TRUE,
edge.attr.comb = "random")
The only problem with this is edge.attr.comb = "random". That is, I override the pre-existing mutual edge attribute information. I am thinking that I can flag missing mutual edges from g1 and add the necessary edges (and copy their attribute information) using which_mutual, but I am having a difficult time with the indexing of edges. I must be missing an easy solution. An example:
g <- graph_from_literal(A+-+B, A-+C)
E(g)$att1 <- c(1,2,3)
#I want (note the default order of vertices for igraph)
g2 <- graph_from_literal(A+-+B, A+-+C)
E(g2)$att1 <- c(1, 2, 3, 2)
Figured it out. Perhaps not the most elegant solution, but it works.
As an example,
g <- graph_from_literal(10-+40, 10-+30, 20+-+40)
E(g)$att1 <- c(1,2,3, 4)
E(g)$att2 <- c(10, 11, 12, 13)
######################################################################
test <- which((which_mutual(g) == FALSE))
head <- head_of(g,test)
tail <- tail_of(g,test)
combine <- matrix(c(head,tail), ncol = length(test), byrow = TRUE)
combineV <- as.vector(combine)
attributes <- list(att1 = E(g)$att1[test],att2 = E(g)$att2[test])
gnew <- add_edges(g,combineV, attr = attributes)
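The same idea can be written so that it copies every edge attribute without listing them by name. This is a sketch along the lines of the solution above, with a toy graph of my own (letters instead of numbers as vertex names): ends() returns the endpoints of the non-mutual edges, the columns are swapped to reverse them, and edge_attr() supplies all attributes at once.

```r
library(igraph)

# Toy directed graph: A->D and A->C are one-way, B<->D is mutual
g <- graph_from_literal(A-+D, A-+C, B+-+D)
E(g)$att1 <- c(1, 2, 3, 4)
E(g)$att2 <- c(10, 11, 12, 13)

# Edges that have no reciprocal partner
missing <- which(!which_mutual(g))

# Endpoints as vertex ids; swap the columns to reverse each edge,
# then flatten to the (from1, to1, from2, to2, ...) layout add_edges() expects
el <- ends(g, missing, names = FALSE)
reversed <- as.vector(t(el[, 2:1, drop = FALSE]))

# Copy every edge attribute of the one-way edges onto their new partners
attrs <- lapply(edge_attr(g), function(a) a[missing])
gnew <- add_edges(g, reversed, attr = attrs)

igraph::as_data_frame(gnew, what = "edges")
```

Existing mutual edges are never touched, so their distinct attribute values are preserved; only the new reverse edges receive copies.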
