IGraph: network distance until stop node/vertex - r

I have an igraph network that contains two types of nodes: one set that describes my points/nodes of interest (NOI) and another set that acts as barriers (B) in my network. Now I'd like to measure the total length of all edges that are connected starting from a specific NOI until a barrier is reached.
Here is a short example using a ring shape in igraph:
library(igraph)
set.seed(123)
g <- make_ring(10) %>%
  set_edge_attr("weight", value = rnorm(10, 100, 20)) %>%
  set_vertex_attr("barrier", value = c(0,0,1,0,0,1,0,0,1,0)) %>%
  set_vertex_attr("color", value = c("green","green","red",
                                     "green","green","red",
                                     "green","green","red","green"))
For example, when starting from node 1 (NOI, green), all edges up to nodes 9 and 3 are reachable (nodes 9 and 3 are barriers and block further travel). Thus the total connected length of edges for NOI 1 is the sum of the lengths/weights of edges 1--2, 2--3, 1--10 and 10--9. The same value holds for node 10 as the starting node. In the end I am interested in a list/data frame of all NOI and their total length of reachable network. How best to proceed in R using igraph? Is there a built-in function in igraph?

Here's one possible strategy. First, I set a name for each node so it will be preserved during graph transformations.
V(g)$name = seq.int(vcount(g))
Now I drop all the barriers and split the graph up into separate connected nodes of interest that will all share the same length.
gd <- g %>% induced_subgraph(V(g)[V(g)$barrier==0]) %>% decompose()
Then we can write a helper function that takes a subgraph, finds all the incident edges for the subgraph's nodes in the original graph, extracts the weights, and sums them up.
get_connected_length <- function(x) {
incident_edges(g, V(g)$name %in% V(x)$name) %>% do.call("c", .) %>% unique() %>% .$weight %>% sum()
}
Now we apply the function to each of the subgraphs and extract the node names
n <- gd %>% Map(function(x) V(x)$name, .)
w <- gd %>% Map(get_connected_length, .)
And we can combine that data all together in a matrix
do.call("rbind", Map(cbind, n, w))
# [,1] [,2]
# [1,] 1 361.5366
# [2,] 2 361.5366
# [3,] 10 361.5366
# [4,] 4 335.1701
# [5,] 5 335.1701
# [6,] 7 318.2184
# [7,] 8 318.2184
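
As a cross-check on the strategy above, here is a minimal alternative sketch that avoids decompose() and instead labels every edge with the barrier-free component it touches. The helper names (noi, comp_of, edge_comp, result) are illustrative and not part of the original answer, and the approach assumes an undirected graph in which every non-barrier vertex is a node of interest.

```r
library(igraph)

# Same ring as in the question, built without the pipe
set.seed(123)
g <- make_ring(10)
E(g)$weight  <- rnorm(10, 100, 20)
V(g)$barrier <- c(0, 0, 1, 0, 0, 1, 0, 0, 1, 0)

# Nodes of interest and their component once the barriers are removed
noi  <- which(V(g)$barrier == 0)
memb <- components(delete_vertices(g, which(V(g)$barrier == 1)))$membership

# Component id per original vertex (NA for barriers)
comp_of <- rep(NA_integer_, vcount(g))
comp_of[noi] <- memb

# An edge belongs to the component of whichever endpoint is a NOI;
# edges joining two barriers stay NA and are dropped by tapply()
el <- ends(g, E(g), names = FALSE)
edge_comp <- ifelse(is.na(comp_of[el[, 1]]), comp_of[el[, 2]], comp_of[el[, 1]])
len <- tapply(E(g)$weight, edge_comp, sum)

result <- data.frame(node = noi, length = as.numeric(len[as.character(memb)]))
result
```

The result is a data frame with one row per NOI, which may be more convenient than the matrix if further joins are planned.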

Related

How to number nodes when creating decision trees in R?

In R I am creating a data frame of the structure of decision trees. The issue I'm facing is that I have to number the nodes of the trees in a certain way that will allow me to plot them later. However, I'm struggling to find a good way to number the nodes. Hopefully my example below will explain the issue.
For example, if I have a column in my data frame that describes the path or direction of the nodes, like so:
df <- data.frame(
var = c("P", "L", "R", "RL", "RR",
"P", "L", "R" , "RL", "RR", "LL", "LR", "RRL", "RRR")
)
Here, P means the parent node, L means left node, R means right node, RL means the left node from the previous right node etc... The diagram below shows what the decision trees made from df$var would look like:
So, as we can see, every time we reach a P in df$var, we start a new decision tree, as it is the parent.
Now, I want to try and number the nodes, so I can plot them. I initially tried numbering the nodes sequentially, like so:
df <- df %>%
group_by(newVal = cumsum(var == "P")) %>%
mutate(node = 1:length(var)) %>%
ungroup() %>%
select(-newVal)
df
var node
P 1
L 2
R 3
RL 4
RR 5
P 1
L 2
R 3
RL 4
RR 5
LL 6
LR 7
RRL 8
RRR 9
For clarity, that would look like this:
But as you can see (mainly in the 2nd tree), the original ordering of df$var results in a non-intuitive numbering of the nodes. This presents a problem when I try to plot the tree.
The issue is that, when I'm plotting the trees, I have to create data frames (one for each tree) with 2 columns, from and to, where we go from node x to node y. Using the 2nd image as an example, my data frames for plotting would look like this:
tree.1.edges <- data.frame(
from = c(1,1,3,3),
to = c(2,3,4,5)
)
tree.2.edges <- data.frame(
from = c(1,1,2,2,3,3,5,5),
to = c(2,3,6,7,4,5,8,9)
)
I'm finding it difficult to come up with a way to automate the process of creating the tree edges data frames using my method of sequentially numbering the nodes. Does anyone have any suggestions as to a better way I could do this?
Prefix
This is my solution. It returns a list of the edges with correctly numbered nodes.
The Nodes are numbered like this:
Parent node number < Child node number
Left node number < Right node number
Code
library(tidyverse)
df <- data.frame(
var = c("P", "L", "R", "RL", "RR",
"P", "L", "R" , "RL", "RR", "LL", "LR", "RRL", "RRR"),
stringsAsFactors = FALSE # important for character operations
)
#enumerate tree ids
# a new tree is initialized when a parent node "P" is initialized
df$tree <-cumsum(df$var=="P") # Cumsum increments for every TRUE by one
#sort nodes so that Left nodes are in front of Right nodes
# and every deeper level of the tree is numbered after
# the preceeding level
df <- df %>%
  mutate(level = nchar(var)) %>%
  group_by(tree) %>%
  # arrange by level first, then put the parent "P" ahead of "L" and "R",
  # then sort alphabetically: as L comes in front of R in the alphabet,
  # longer strings are correctly sorted
  arrange(level, var != "P", var, .by_group = TRUE)
# define the nodes as row numbers resetting at every tree
df <- df %>% group_by(tree) %>% mutate(node = row_number())
## At this point the nodes are numbered according to your specifications
# Find out parent node by deleting the last character from every node name (var)
df <- df %>% group_by(tree) %>% mutate(parent_node_name = substr(var, 0, nchar(var) - 1))
# define parent node of P as NA
df$parent_node_name[df$var == "P"] <- NA
# define parent nodes vars with still empty parent node name as "P"
df$parent_node_name[df$parent_node_name == ""] <- "P"
# Match parent node names to node numbers
df <- df %>%
  group_by(tree) %>%
  mutate(parent_node_num = match(parent_node_name, var))
# split the dataframe into a list of dfs, one for each tree
list_edges <- split(df, df$tree)
# for every dataframe in the list, replace by a result dataframe (res)
list_edges <- lapply(list_edges, function(df_tree){
  res <- data.frame(
    from = df_tree$parent_node_num,
    to = df_tree$node
  )
  # delete NAs from result
  res <- res[!is.na(res$from),]
  return(res)
})
# Show result
list_edges
# $`1`
#   from to
# 2    1  2
# 3    1  3
# 4    3  4
# 5    3  5
#
# $`2`
#   from to
# 2    1  2
# 3    1  3
# 4    2  4
# 5    2  5
# 6    3  6
# 7    3  7
# 8    7  8
# 9    7  9
The code is quite convoluted, but you can insert df at any point to look at the intermediate results. Or simply post a comment.
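
For comparison, the same numbering idea (parents before children, left before right) can be sketched in a few lines of base R. This is an illustrative sketch, not the answer's code: tree_edges is a hypothetical helper, and it assumes every tree starts with "P" and that each node's parent path is present in the same tree.

```r
# Build a from/to edge list for one tree given its path strings
tree_edges <- function(paths) {
  # Root gets depth 0; other nodes are as deep as their path is long
  depth <- ifelse(paths == "P", 0L, nchar(paths))
  # Order by depth, then alphabetically, so "L..." precedes "R..."
  ord <- paths[order(depth, paths)]
  children <- ord[ord != "P"]
  # The parent path is the child's path minus its last letter
  parent <- substr(children, 1, nchar(children) - 1)
  parent[parent == ""] <- "P"
  data.frame(from = match(parent, ord), to = match(children, ord))
}

var <- c("P", "L", "R", "RL", "RR",
         "P", "L", "R", "RL", "RR", "LL", "LR", "RRL", "RRR")

# A new tree starts at every "P"
edges <- lapply(split(var, cumsum(var == "P")), tree_edges)
edges
```

Each list element can be passed to a plotting function that expects from and to columns.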

Contract vertices by attribute with igraph

I am working on a graph where each node has an attribute "group" with one of the following values: "Baby Product", "Book", "CE", "DVD", "Music", "Software", "Toy", "Video", "Video Games".
I would like to know how to plot a graph representing those communities: there should be 9 vertices, one for each group, and a link (possibly weighted) each time two nodes of two categories are connected.
I have tried using the igraph contract function, but this is the result:
> contract(fullnet, mapping=as.factor(products$group), vertex.attr.comb = products$group)
Error in FUN(X[[i]], ...) :
Unknown/unambigous attribute combination specification
In addition: Warning message:
In igraph.i.attribute.combination(vertex.attr.comb) :
Some attributes are duplicated
I guess I have misunderstood what this function is used for.
Now I am thinking about creating a new edgelist, made like the one before but instead of the Id of each vertex the name of the group. Sadly, I do not know how to do this in a fast way on an edgelist of over 1200000 elements.
Thank you very much in advance.
I think using contract() should be correct. In the example code below, I pass an anonymous function to vertex.attr.comb to combine the vertices by group. Then simplify() removes loop edges and calculates the sum of the edge weights.
# Create example graph
set.seed(1)
g <- random.graph.game(10, 0.2)
V(g)$group <- rep(letters[1:3], times = c(3, 3, 4))
E(g)$weight <- 1:length(E(g))
E(g)
# + 9/9 edges from 7017c6a:
# [1] 2-- 3 3-- 4 4-- 7 5-- 7 5-- 8 7-- 8 3-- 9 2--10 9--10
E(g)$weight
# [1] 1 2 3 4 5 6 7 8 9
# Contract graph by `group` attribute of vertices
g1 <- contract(g, factor(V(g)$group),
vertex.attr.comb = function(x) levels(factor(x)))
# Remove loop edges and compute the sum of edge weight by group
g1 <- simplify(g1, edge.attr.comb = "sum")
E(g1)
# + 3/3 edges from a852397:
# [1] 1--2 1--3 2--3
E(g1)$weight
# [1] 2 15 12
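
The alternative idea from the question, rewriting the edge list with group names and counting links between groups, can also be sketched directly. This is a minimal sketch on a deterministic toy graph (a 10-vertex ring, an assumption made for illustration); the names el, pairs, counts and gg are hypothetical. Because it works on whole vectors, it should scale to large edge lists better than a per-edge loop.

```r
library(igraph)

# Toy graph: a 10-vertex ring with three groups (illustrative data)
g <- make_ring(10)
V(g)$group <- rep(letters[1:3], times = c(3, 3, 4))

# Replace each endpoint id by its group name
el <- ends(g, E(g), names = FALSE)
a <- V(g)$group[el[, 1]]
b <- V(g)$group[el[, 2]]

# Order the two group names so that a--b and b--a are counted together
pairs <- data.frame(from = ifelse(a <= b, a, b),
                    to   = ifelse(a <= b, b, a),
                    n    = 1)

# Drop within-group edges and count links per pair of groups
pairs  <- pairs[pairs$from != pairs$to, ]
counts <- aggregate(n ~ from + to, data = pairs, FUN = sum)

# One vertex per group; the edge attribute n holds the link count
gg <- graph_from_data_frame(counts, directed = FALSE)
gg
```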

Define and categorise separate networks in R

I have an issue that I've been unable to optimise, and I'm sure that either igraph or tidygraph must already have a function for this, or there must be a better way to do it. I am using R and igraph, but possibly tidygraph would also do the job.
Problem: How to split a list of over two million edges (node 1 - linked to - node 2) into separate networks, and then label each network with its highest-weighted node category.
Data:
Edges:
from  to
1     2
3     4
5     6
7     6
8     6
This creates 3 networks. N.B.: in the real example we have loops and multiple edges to and from nodes (this is why I've used igraph, as it deals with these easily).
Data: Node categories:
id  cat               weight
1   traffic accident  10
2   abuse             50
3   abuse             50
4   speeding          5
5   murder            100
6   abuse             50
7   speeding          5
8   abuse             50
Final table:
The final table categorises each node and labels each network with the max category of the nodes
id  idcat             networkid  networkcat
1   traffic accident  1          50
2   abuse             1          50
3   abuse             2          50
4   speeding          2          50
5   murder            3          100
6   abuse             3          100
7   speeding          3          100
8   abuse             3          100
Current iterative solution and code:
If there is no better solution to this then maybe we can speed this iteration up?
library(tidyverse)
library(igraph)
library(purrr) # might be an answer
library(tidygraph) # might be an answer
from <- c(1,3,5,7,8)
to <- c(2,4,6,6,6)
edges <- data.frame(from,to)
id <- c(1,2,3,4,5,6,7,8)
cat <- c("traffic accident","abuse","abuse","speeding","murder","abuse","speeding","abuse")
weight <- c(10,50,50,5,100,50,5,50)
details <- data.frame(id,cat,weight)
# we could add the vertex details here as well with
# g <- graph_from_data_frame(edges, vertices = details), but we join these in later
g <- graph_from_data_frame(edges)
plot(g)
dg <- decompose(g) # decomposing the network defines the separate networks
networks <- data.frame(id = as.integer(),
                       network_id = as.integer())
for (i in 1:length(dg)) { # this is likely too many to do at once. As the networks are already defined we can split this into chunks. There is a case here for parallelisation
  n <- dg[[i]][1] %>% # using the decomposed list of lists from igraph. There is an issue here as the list comes back with the node as an index. I can't find an easier way to get this out
    as.data.frame() %>% # I can't work out a way to bring out the data without changing to df and then using row names
    row.names() %>% # and this returns a vector
    as.data.frame() %>%
    rename(id = 1) %>%
    mutate(network_id = i,
           id = as.integer(id))
  networks <- bind_rows(n, networks)
}
networks <- networks %>%
  inner_join(details) # one way to bring in details
n_weight <- networks %>%
  group_by(network_id) %>%
  summarise(network_weight = max(weight))
networks <- networks %>%
  inner_join(n_weight)
networks # final answer
filtered_n_id <- networks %>%
  filter(network_weight == 100) %>%
  select(network_id) %>%
  distinct() # this brings out just the network IDs of whatever we happen to want
filtered_n <- networks %>%
  filter(network_id %in% filtered_n_id$network_id)
edges %>%
  filter(from %in% filtered_n$id | to %in% filtered_n$id) %>%
  graph_from_data_frame() %>%
  plot() # returns only the network/s that we want to view
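Here is a solution using just igraph and base R. It assumes the graph is built with the vertex details, so that the vertices are in the same order as id.
g <- graph_from_data_frame(edges, vertices = details)
networkid <- components(g)$membership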
Table <- aggregate(id, list(networkid), function(x) { max(weight[x]) })
networkcat <- Table$x[networkid]
Final <- data.frame(id, idcat=cat, networkid, networkcat)
Final
id idcat networkid networkcat
1 1 traffic accident 1 50
2 2 abuse 1 50
3 3 abuse 2 50
4 4 speeding 2 50
5 5 murder 3 100
6 6 abuse 3 100
7 7 speeding 3 100
8 8 abuse 3 100
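
A self-contained variant of the same idea is sketched below. It uses ave() to broadcast the per-component maximum back to every node instead of aggregate() plus indexing. The graph is built with vertices = details so that the membership vector lines up with the rows of details; that alignment is the key assumption here.

```r
library(igraph)

edges <- data.frame(from = c(1, 3, 5, 7, 8), to = c(2, 4, 6, 6, 6))
details <- data.frame(
  id = 1:8,
  cat = c("traffic accident", "abuse", "abuse", "speeding",
          "murder", "abuse", "speeding", "abuse"),
  weight = c(10, 50, 50, 5, 100, 50, 5, 50)
)

# Passing the details keeps the vertices in the same order as details$id
g <- graph_from_data_frame(edges, vertices = details)

# One component id per vertex, in vertex (= row) order
details$networkid <- as.integer(components(g)$membership)

# Maximum node weight within each component, repeated for every member
details$networkcat <- ave(details$weight, details$networkid, FUN = max)
details
```

Since components() runs once over the whole graph, no loop over the decomposed subgraphs is needed.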

Extract the hierarchical structure of the nodes in a dendrogram or cluster

I would like to extract the hierarchical structure of the nodes of a dendrogram or cluster.
For example:
library(dendextend)
dend15 <- c(1:5) %>% dist %>% hclust(method = "average") %>% as.dendrogram
dend15 %>% plot
The nodes are classified according to their position in the dendrogram (see figure below)
(Figure extracted from the dendextend package's tutorial)
I would like to get all the nodes for each final leaf, as in the following output:
(the labels are ordered from left to right and from bottom to top)
hierarchical structure
leaf_1: 3-2-1
leaf_2: 4-2-1
leaf_3: 6-5-1
leaf_4: 8-7-5-1
leaf_5: 9-7-5-1
Thanks in advance,
First I find all subtrees (i.e. the structure) that use a node. In your example, there are 9 nodes.
subtrees <- partition_leaves(dend15)
leaves <- subtrees[[1]] # assume top node is used by all subtrees
I make a helper function to find the route for each leaf, and apply it to all leaves.
pathRoutes <- function(leaf) {
which(sapply(subtrees, function(x) leaf %in% x))
}
paths <- lapply(leaves, pathRoutes)
The raw output in list form, where each list element is the structure for an end node / leaf
> paths
[[1]]
[1] 1 2 3
[[2]]
[1] 1 2 4
[[3]]
[1] 1 5 6
[[4]]
[1] 1 5 7 8
[[5]]
[1] 1 5 7 9
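
The same root-to-leaf paths can also be collected with a plain recursive walk over the dendrogram, using only the stats package. This is a sketch under the assumption that nodes are numbered in pre-order (root first, then each branch from left to right), which is the numbering used in the output above; leaf_paths is a hypothetical helper name.

```r
# Collect, for every leaf, the pre-order numbers of the nodes on its path
leaf_paths <- function(dend) {
  counter <- 0L
  paths <- list()
  walk <- function(node, path) {
    counter <<- counter + 1L          # number nodes in visiting order
    path <- c(path, counter)
    if (is.leaf(node)) {
      # store the finished path under the leaf's label
      paths[[as.character(attr(node, "label"))]] <<- path
    } else {
      for (i in seq_along(node)) walk(node[[i]], path)
    }
    invisible(NULL)
  }
  walk(dend, integer(0))
  paths
}

dend <- as.dendrogram(hclust(dist(1:5), method = "average"))
paths <- leaf_paths(dend)

# Print each path from leaf to root, as in the requested format
sapply(paths, function(p) paste(rev(p), collapse = "-"))
```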

R reading an adjacency list from file

I have an output.csv file with the adjacency list of a graph. It is in the following format.
Every line starts with the source node (which is an integer) followed by the nodes it is connected to. The nodes are separated from each other and from the source node by a space (' ') separator.
A snapshot looks as follows:
0 2 5 8
1 2 7 4 6
2 0 1
3 4 7 8
4 1 3
I want to read this into an adjacency list format and use it to plot in igraph. What is the simplest way to do this? Thanks.
Your data is not a proper adjacency list, because it is missing the lists for 5-8. So I just removed these vertices from your list.
igraph has a function to create a graph from an adjacency list, so you just need to read in the data and create the graph with graph.adjlist. Here is one way to do it, not necessarily the simplest:
## magrittr for the %>% pipes
library(magrittr)
library(igraph)
## sample data
text <- "0 2\n1 2 4\n2 0 1\n3 4\n4 1 3"
## read in as lines, replace textConnection(text) with your file name
lines <- readLines(textConnection(text))
g <- lines %>%
strsplit(split = " ") %>% # 1
lapply(as.numeric) %>% # 2
lapply(extract, -1) %>% # 3
lapply(add, 1) %>% # 4
graph.adjlist(mode = "all") # 5
g
#> IGRAPH U--- 5 4 --
#> + edges:
#> [1] 1--3 2--3 2--5 4--5
Some explanation of the steps in the pipe:
1. Split the lines at single spaces.
2. Convert them to numeric.
3. Drop the first number from each line; it is not needed for graph.adjlist.
4. Add one to all numbers, since igraph vertex ids start at one and yours seem to start at zero.
5. Call graph.adjlist to create an undirected graph.
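
If the vertices without their own line (5 to 8 in the question's snapshot) should be kept rather than removed, one option is to turn each line into source-target pairs and build the graph from an edge list instead. This is a sketch, assuming every line has at least one neighbour after the source node and that the graph is undirected; simplify() merges pairs that are listed from both ends.

```r
library(igraph)

## the snapshot from the question, unchanged
text <- "0 2 5 8\n1 2 7 4 6\n2 0 1\n3 4 7 8\n4 1 3"
lines <- readLines(textConnection(text))

## one integer vector per line: source first, then its neighbours
parts <- lapply(strsplit(lines, " ", fixed = TRUE), as.integer)

## pair the source with each neighbour and stack all pairs into a matrix
el <- do.call(rbind, lapply(parts, function(p) cbind(p[1], p[-1])))

## shift ids by one because igraph vertex ids start at one,
## then drop the duplicated edges
g <- simplify(graph_from_edgelist(el + 1, directed = FALSE))
g
```

To read from a file, replace textConnection(text) with the file name, as in the code above.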
