Finding the size of biconnected components in R - r

I am analyzing an undirected graph in R. I'm trying to (eventually) write a function to get the ratio of the size (number of vertices) of the largest connected component to the size of the largest biconnected component - of any random graph. I was able to extract the size of the largest connected component, but am having trouble with the size of the largest biconnected component. I started off using the igraph function biconnected_components on graph g:
bicomponent_list <- biconnected_components(g)
bicomponent_list$components # lists all of the components, including size and vertex names
length(bicomponent_list$components[[1]]) # returns number of vertices of first bicomponent
Then my half-baked idea was to somehow order this list in decreasing number of vertices, so that I can always call length(bicomponent_list$components[[1]]) and it will be the largest biconnected component. But I don't know how to sort this correctly. Perhaps I have to convert it to a vector? But I also don't know how to specify that I want the number of vertices in the vector. Does anyone know, or have a better way to do it? Thanks so much!
library(igraph)
# generating sample graph
g1 <- barabasi.game(100, 1, 5)
V(g1)$name <- as.character(1:100)
g2 <- erdos.renyi.game(50, graph.density(g1), directed = TRUE)
V(g2)$name <- as.character(101:200)
g3 <- graph.union(g1, g2, byname = TRUE)
# analyzing the bicomponents
bicomponent_list <- biconnected_components(g3)
bi_list <- as.list(bicomponent_list$components)
bi_list <- lapply(bi_list, length) # lists the sizes of all of the components that I want to reorder
My desired outcome would be ordering bi_list such that length(bicomponent_list$components[[1]]) returns the bicomponent with the most vertices.

The components property is a list containing vertex lists. You can iterate over them and find the length of them like so
sapply(bicomponent_list$components, length)
and if you just wanted the largest, wrap that in a max()
max(sapply(bicomponent_list$components, length))

Related

Creating weighted igraph network using two-column edge list

I'm in the process of creating a weighted igraph network object from a edge list containing two columns from and to. It has proven to be somewhat challenging for me, because when doing a workaround, I notice changes in the network metrics and I believe I'm doing something wrong.
library(igraph)
links <- read.csv2("edgelist.csv")
vertices <- read.csv2("vertices.csv")
network <- graph_from_data_frame(d=links,vertices = vertices,directed = TRUE)
##the following step is included to remove self-loops that I have used to include all isolate nodes to the network##
network <- simplify(network,remove.multiple = FALSE, remove.loops = TRUE)
In this situation I have successfully created a network object. However, it is not weighted. Therefore I create a second network object by taking the adjacency matrix from the objected created earlier and creating the new igraph object from it like this:
gettheweights <- get.adjacency(network)
network2 <- graph_from_adjacency_matrix(gettheweights,mode = "directed",weighted = TRUE)
However, after this when I call both of the objects, I notice a difference in the number of edges, why is this?
network2
IGRAPH ef31b3a DNW- 200 1092 --
network
IGRAPH 934d444 DN-- 200 3626 --
Additionally, I believe I've done something wrong because if they indeed would be the same network, shouldn't their densities be the same? Now it is not the case:
graph.density(network2)
[1] 0.02743719
graph.density(network)
[1] 0.09110553
I browsed and tried several different answers found from here but many were not 1:1 identical and I failed to find a solution.
All seems to be in order. When you re-project a network with edge-duplicates to be represented as a weight by the number of edges between given vertices, the density of your graph should change.
When you you test graph.density(network2) and graph.density(network), they should be different if indeed edge-duplicates were reduced to single-edges with weight as an edge attribute, as your output from network2 and network suggest.
This (over-) commented code goes through the process.
library(igraph)
# Data that should resemble yours
edges <- data.frame(from=c("A","B","C","D","E","A","A","A","B","C"),
to =c("A","C","D","A","B","B","B","C","B","D"))
vertices <- unique(unlist(edges))
# Building graphh in the same way as you do
g0 <- graph_from_data_frame(d=edges, vertices=vertices, directed = TRUE)
# Note that the graph is "DN--": directed, named, but NOT Weighted, since
# Instead of weighted edges, we have a whole lot of dubble edges
(g0)
plot(g0)
# We can se the dubble edges in the adjacency matrix as >1
get.adjacency(g0)
# Use simplify to remove LOOPS ONLY as we can see in the adjacency metrix test
g1 <- simplify(g0, remove.multiple = FALSE, remove.loops = TRUE)
get.adjacency(g1) == get.adjacency(g0)
# Turn the multiple edges into edge-weights by jumping through an adjacency matrix
g2 <- graph_from_adjacency_matrix(get.adjacency(g1), mode = "directed", weighted = TRUE)
# Instead of multiple edges (like many links between "A" and "B"), there are now
# just single edges with weights (hence the density of the network's changed).
graph.density(g1) == graph.density(g2)
# The former doubble edges are now here:
E(g2)$weight
# And we can see that the g2 is now "Named-Directed-Weighted" where g1 was only
# "Named-Directed" and no weights.
(g1);(g2)
# Let's plot the weights
E(g2)$width = E(g2)$weight*5
plot(g2)
A shortcoming of this/your method, however, is that the adjacency matrix is able to carry only the edge-count between any given vertices. If your edge-list contains more variables than i and j, the use of graph_from_data_frame() would normally embed edge-attributes of those variables for you straight from your csv-import (which is nice).
When you convert the edges into weights, however, you would loose that information. And, come to think of it, that information would have to be "converted" too. What would we do with two edges between the same vertices that have different edge-attributes?
At this point, the answer goes slightly beyond your question, but still stays in the realm of explaining the relation between graphs of multiple edges between the same vertices and their representation as weighted graphs with only one structural edge per verticy.
To convert edge-attributes along this transformation into a weighted graph, I suggest you'd use dplyr to "rebuild" any edge-attributes manually in order to keep control of how they are supposed to be merged down when recasting into a weighted one.
This picks up where the code above left off:
# Let's imagine that our original network had these two edge-attributes
E(g0)$coolness <- c(1,2,1,2,3,2,3,3,2,2)
E(g0)$hotness <- c(9,8,2,3,4,5,6,7,8,9)
# Plot the hotness
E(g0)$color <- colorRampPalette(c("green", "red"))(10)[E(g0)$hotness]
plot(g0)
# Note that the hotness between C and D are very different
# When we make your transformations for a weighted netowk, we loose the coolness
# and hotness information
g2 <- g0 %>% simplify(remove.multiple = FALSE, remove.loops = TRUE) %>%
get.adjacency() %>%
graph_from_adjacency_matrix(mode = "directed", weighted = TRUE)
g2$hotness # Naturally, the edge-attributes were lost!
# We can use dplyr to take controll over how we'd like the edge-attributes transfered
# when multiple edges in g0 with different edge attributes are supposed to merge into
# one single edge
library(dplyr)
recalculated_edge_attributes <-
data.frame(name = ends(g0, E(g0)) %>% as.data.frame() %>% unite("name", V1:V2, sep="->"),
hotness = E(g0)$hotness) %>%
group_by(name) %>%
summarise(mean_hotness = mean(hotness))
# We used a string-version of the names of connected verticies (like "A->B") to refere
# to the attributes of each edge. This can now be used to merge back the re-calculated
# edge-attributes onto the weighted graph in g2
g2_attributes <- data.frame(name = ends(g2, E(g2)) %>% as.data.frame() %>% unite("name", V1:V2, sep="->")) %>%
left_join(recalculated_edge_attributes, by="name")
# And manually re-attatch our mean-attributes onto the g2 network
E(g2)$mean_hotness <- g2_attributes$mean_hotness
E(g2)$color <- colorRampPalette(c("green", "red"))(max(E(g2)$mean_hotness))[E(g2)$mean_hotness]
# Note how the link between A and B has turned into the brown mean of the two previous
# green and red hotness-edges
plot(g2)
Sometimes, your analyses may benefit from either structure (weighted no duplicates or unweighted with duplicates). Algorithms for, for example, shortest paths are able to incorporate edge-weight as described in this answer, but other analyses might not allow for or be intuitive when using the weighted version of your network data.
Let purpose guide your structure.

R: Convert correlation matrix to edge list

I want to create a network graph of my data, where the weight of the edges is defined by the correlation coefficient in a correlation matrix. The connection is defined by being statistically significant or not.
Since I want to play around with some parameters I need to have this information in an edge list rather than in matrix form, but I'm struggling as to how to convert this. I have tried to used igraph as shown below, but I cannot figure out how to get the information on which correlations are significant and which are not into the edge list. I guess weight could be set to zero to code that info, but how do I combine a correlation matrix and a p-value matrix?
library(igraph)
g <- graph.adjacency(a,weighted=TRUE)
df <- get.data.frame(g)
df
It'd be great if you could provide a minimal reproducable example, but I think I understand what you're asking for. You'll need to make a graph from a matrix using graph_from_adjacency_matrix, but make sure to input something in the weighted parameter, because otherwise the elements in the matrix represent number of edges (less than 1 means no edges). Then you can create an edge list from the graph using as_data_frame. Then perform whatever calculation you want, or join any external data you have, then you can convert it back to a graph by using graph_from_data_frame
cor_mat <- cor(mtcars)
cor_g <- graph_from_adjacency_matrix(cor_mat, mode='undirected', weighted = 'correlation')
cor_edge_list <- as_data_frame(cor_g, 'edges')
only_sig <- cor_edge_list[abs(cor_edge_list$correlation) > .75, ]
new_g <- graph_from_data_frame(only_sig, F)
For the ones who still need this, here is the answer
library(igraph)
g <- graph.adjacency(a, mode="upper", weighted=TRUE, diag=FALSE)
e <- get.edgelist(g)
df <- as.data.frame(cbind(e,E(g)$weight))

different vertex shapes for each vertex of decomposed graph

I have a very large bipartite network model that I created from 5 million lines of a dataset. I decompose my network model because I can not draw a graph of this size. Now all I need is to plot the decompose graphics one by one. There is no problem with that. But I want to draw the graph with a shape according to the attributes of each node. For example, I want a square for the "A" attributes on my graph G, and a triangle for the "B" attributes. In addition to this I want to add vertex labels by attributes. Here is my codes to plot first component of graph after creating bipartite G and its work:
components <- decompose(G)
plot(components[[1]])
I tried something like this to adding labels and changing vertex shapes according to graph attributes but it didn't work:
plot(components[[1]], vertex.label= V(G)$attributes,
vertex.shape=c("square", "triangle"))
Does anyone can help me, I'm stuck. Thank you so much!
the components function returns a list of vertices which make up a component. So you need to traverse the list, create a subgraph and plot. As for plotting attributes you need to provide a reproducible example for us to help.
library(igraph)
set.seed(8675309)
g <- sample_gnp(200, p = 0.01)
V(g)$name <- paste0("Node", 1:vcount(g))
V(g)$shape <- sample(c("circle","square"), vcount(g), replace = T)
clu <- components(g)
grps <- groups(clu)
lapply(grps, function(x) plot(induced_subgraph(g, x)))

Why igraph::cluster_walktrap gives a different result for non directed isomorphic graphs?

I'm trying to use igraph::cluster_walktrap in R to look for communities inside of a graph, however I noticed a weird behaviour (or at least, a behaviour I am not able to explain).
Suppose you are given an undirected graph by defining a list of its edges. Say
a,b
c,d
e,f
...
Then, if I define another graph by swapping randomly selected vertices in the edge list definition:
a,b
d,c
e,f
...
I expect the two graphs to be isomorphic and the difference between the two graph to be empty. This is exactly what happens in R in my toy example. Following this line of reasoning, calling cluster_walktrap on the two graphs (using set.seed appropriately) should yield the same result since the two graphs are the same. This is not happening and the only explanation I can give is that the starting point of each random walk is not the same for the two graphs. Why is this?
You can follow my reasoning in the toy example below. I don't understand why the last two objects are not identical.
require(igraph)
# Number of vertices
verteces <- 50
# Swap randomly some elements in the edges definition
set.seed(20)
row_swapped <- sample(1:verteces,25,replace=F)
m_values <- sample(letters, verteces*2, replace=T) #1:100
# Build edge lists
m1 <- matrix(m_values, verteces, 2)
m1
a <- m1
colS <- seq(round(ncol(m1)*0.3))
m1[row_swapped, 2:1] <- m1[row_swapped, 1:2]
m1
b <- m1
# Define the two graphs
ag <- igraph::graph_from_edgelist(a, directed = F)
bg <- igraph::graph_from_edgelist(b, directed = F)
# Another way of building an isomorphic graph for testing
#bg <- permute(ag, sample(vcount(ag)))
# Should be empty: ok
difference(ag, bg)
# Should be TRUE: ok
isomorphic(ag,bg)
# I expect it to be TRUE but it isn't...
identical(ag, bg)
# Vertices
V(ag)
ag
V(bg)
bg
# Calculate community
set.seed(100)
ac1 <- cluster_walktrap(ag)
set.seed(100)
bc1 <- cluster_walktrap(bg)
# I expect all to be TRUE, however
# merges is different
# membership is different
# names are different
identical(ac1$merges, bc1$merges)
identical(ac1$modularity, bc1$modularity)
identical(ac1$membership, bc1$membership)
identical(ac1$names, bc1$names)
identical(ac1$vcount, bc1$vcount)
identical(ac1$algorithm, bc1$algorithm)
The results are not different. You have two things going on which is making your graphs not identical but isoporphic. I emphasize identical because it has a very strict definition.
1) identical(ag, bg) is not identical because the vertices and edges are not in the same order between the two graphs. Exactly, the same nodes and edges exist but they are not in the exact same place or orientation. For, example if I shuffle the rows of a and make a new graph...
a1 <- a[sample(1:nrow(a)), ]
a1g <- igraph::graph_from_edgelist(a1, directed = F)
identical(ag, a1g)
#[1] FALSE
2) This goes for edges as well. An edge is stored as node1, node2 and a flag if the edge is directed or not. so when you swap rows the representation at the "byte level" (I use this term loosely) is different even though the relationship is the same. Edge 44 represents the same relationship but is stored based on how it was constructed.
E(ag)[44]
# + 1/50 edge from 6318240 (vertex names):
# [1] q--d
E(bg)[44]
# + 1/50 edge from 38042e0 (vertex names):
# [1] d--q
So onto your cluster_walktrap, first, the function returns the index of the vertices, not the name which can be misleading. Which means the reason the objects aren't identical is because ag and bg have different ordering of nodes in the object.
If I reorder the membership by node name the two become identical.
identical(membership(bc1)[order(names(membership(bc1)))], membership(ac1)[order(names(membership(ac1)))])
#[1] TRUE

Solving Chinese Postman algorithm with eulerization

I'm would like to solve Chinese Postman problem in a graph where an eulerian cycle does not exist. So basically I'm looking for a path in a graph which visits every edge exactly once, and starts and ends at the same node. A graph will have an euler cycle if and only if every node has same number of edges entering into and going out of it. Obviously my graph doesn't .
I found out that Eulerization (making a graph Eulerian) could solve my question LINK. Can anyone suggest a script to add duplicate edges to a graph so that the resulting graph has no vertices of odd degree (and thus does have an Euler Circuit)?
Here is my example:
require(igraph)
require(graph)
require(eulerian)
require(GA)
g1 <- graph(c(1,2, 1,3, 2,4, 2,5, 1,5, 3,5, 4,7, 5,7, 5,8, 3,6, 6,8, 6,9, 9,11, 8,11, 8,10, 8,12, 7,10, 10,12, 11,12), directed = FALSE)
mat <- get.adjacency(g1)
mat <- as.matrix(mat)
rownames(mat) <- LETTERS[1:12]
colnames(mat) <- LETTERS[1:12]
g2 <- as(graphAM(adjMat=mat), "graphNEL")
hasEulerianCycle(g2)
Fun problem.
The graph you sugest in the code above, can be made to have duplicates that enable a eulerian cycle to be created. The function I provide below tries to add the minimum amount of duplicate edges, but also readily breaks the graph structure by adding new links if it has to.
You can run:
eulerian.g1 <- make.eulerian(g1)$graph
Check what the function did to your graph with:
make.eulerian(g1)$info
Bare in mind that:
This is not the only graph structure where duplicates added to the original g1 graph can form an eulerian cycle. Imagine for example my function looping the vertices of the graph backwards instead.
Your graph already has an uneven number of vertices with uneven degree, and all of the vertices that are, have neighbours with uneven degrees to pair them with. This function therefore works well four your particular example data.
The function could fail to produce a graph using only duplicates even in graphs where eulerian cycles are possible with correctly added duplicates. This is since it always goes for connecting a node with the first of its neighbours with uneven degree. If this is something that you'd absolutely like to get around, an MCMC-approach would be the way to go.
See also this excellent answer on probability calculation:
Here's my function in a full script that you can source out-of-the-box:
library(igraph)
# You asked about this graph
g1 <- graph(c(1,2, 1,3, 2,4, 2,5, 1,5, 3,5, 4,7, 5,7, 5,8, 3,6, 6,8, 6,9, 9,11, 8,11, 8,10, 8,12, 7,10, 10,12, 11,12), directed = FALSE)
# Make a CONNECTED random graph with at least n nodes
connected.erdos.renyi.game <- function(n,m){
graph <- erdos.renyi.game(n,m,"gnm",directed=FALSE)
graph <- delete_vertices(graph, (degree(graph) == 0))
}
# This is a random graph
g2 <- connected.erdos.renyi.game(n=12, m=16)
make.eulerian <- function(graph){
# Carl Hierholzer (1873) had explained how eulirian cycles exist for graphs that are
# 1) connected, and 2) contain only vertecies with even degrees. Based on this proof
# the posibility of an eulerian cycle existing in a graph can be tested by testing
# on these two conditions.
#
# This function assumes a connected graph.
# It adds edges to a graph to ensure that all nodes eventuall has an even numbered. It
# tries to maintain the structure of the graph by primarily adding duplicates of already
# existing edges, but can also add "structurally new" edges if the structure of the
# graph does not allow.
# save output
info <- c("broken" = FALSE, "Added" = 0, "Successfull" = TRUE)
# Is a number even
is.even <- function(x){ x %% 2 == 0 }
# Graphs with an even number of verticies with uneven degree will more easily converge
# as eulerian.
# Should we even out the number of unevenly degreed verticies?
search.for.even.neighbor <- !is.even(sum(!is.even(degree(graph))))
# Loop to add edges but never to change nodes that have been set to have even degree
for(i in V(graph)){
set.j <- NULL
#neighbors of i with uneven number of edges are good candidates for new edges
uneven.neighbors <- !is.even(degree(graph, neighbors(graph,i)))
if(!is.even(degree(graph,i))){
# This node needs a new connection. That edge e(i,j) needs an appropriate j:
if(sum(uneven.neighbors) == 0){
# There is no neighbor of i that has uneven degree. We will
# have to break the graph structure and connect nodes that
# were not connected before:
if(sum(!is.even(degree(graph))) > 0){
# Only break the structure if it's absolutely nessecary
# to force the graph into a structure where an euclidian
# cycle exists:
info["Broken"] <- TRUE
# Find candidates for j amongst any unevenly degreed nodes
uneven.candidates <- !is.even(degree(graph, V(graph)))
# Sugest a new edge between i and any node with uneven degree
if(sum(uneven.candidates) != 0){
set.j <- V(graph)[uneven.candidates][[1]]
}else{
# No candidate with uneven degree exists!
# If all edges except the last have even degrees, thith
# function will fail to make the graph eulerian:
info["Successfull"] <- FALSE
}
}
}else{
# A "structurally duplicated" edge may be formed between i one of
# the nodes of uneven degree that is already connected to it.
# Sugest a new edge between i and its first neighbor with uneven degree
set.j <- neighbors(graph, i)[uneven.neighbors][[1]]
}
}else if(search.for.even.neighbor == TRUE & is.null(set.j)){
# This only happens once (probably) in the beginning of the loop of
# treating graphs that have an uneven number of verticies with uneven
# degree. It creates a duplicate between a node and one of its evenly
# degreed neighbors (if possible)
info["Added"] <- info["Added"] + 1
set.j <- neighbors(graph, i)[ !uneven.neighbors ][[1]]
# Never do this again if a j is correctly set
if(!is.null(set.j)){search.for.even.neighbor <- FALSE}
}
# Add that a new edge to alter degrees in the desired direction
# OBS: as.numeric() since set.j might be NULL
if(!is.null(set.j)){
# i may not link to j
if(i != set.j){
graph <- add_edges(graph, edges=c(i, set.j))
info["Added"] <- info["Added"] + 1
}
}
}
# return the graph
(list("graph" = graph, "info" = info))
}
# Look at what we did
eulerian <- make.eulerian(g1)
eulerian$info
g <- eulerian$graph
par(mfrow=c(1,2))
plot(g1)
plot(g)

Resources