Creating weighted igraph network using two-column edge list - r

I'm in the process of creating a weighted igraph network object from a edge list containing two columns from and to. It has proven to be somewhat challenging for me, because when doing a workaround, I notice changes in the network metrics and I believe I'm doing something wrong.
library(igraph)
links <- read.csv2("edgelist.csv")
vertices <- read.csv2("vertices.csv")
network <- graph_from_data_frame(d=links,vertices = vertices,directed = TRUE)
##the following step is included to remove self-loops that I have used to include all isolate nodes to the network##
network <- simplify(network,remove.multiple = FALSE, remove.loops = TRUE)
In this situation I have successfully created a network object. However, it is not weighted. Therefore I create a second network object by taking the adjacency matrix from the objected created earlier and creating the new igraph object from it like this:
gettheweights <- get.adjacency(network)
network2 <- graph_from_adjacency_matrix(gettheweights,mode = "directed",weighted = TRUE)
However, after this when I call both of the objects, I notice a difference in the number of edges, why is this?
network2
IGRAPH ef31b3a DNW- 200 1092 --
network
IGRAPH 934d444 DN-- 200 3626 --
Additionally, I believe I've done something wrong because if they indeed would be the same network, shouldn't their densities be the same? Now it is not the case:
graph.density(network2)
[1] 0.02743719
graph.density(network)
[1] 0.09110553
I browsed and tried several different answers found from here but many were not 1:1 identical and I failed to find a solution.

All seems to be in order. When you re-project a network with edge-duplicates to be represented as a weight by the number of edges between given vertices, the density of your graph should change.
When you you test graph.density(network2) and graph.density(network), they should be different if indeed edge-duplicates were reduced to single-edges with weight as an edge attribute, as your output from network2 and network suggest.
This (over-) commented code goes through the process.
library(igraph)
# Data that should resemble yours
edges <- data.frame(from=c("A","B","C","D","E","A","A","A","B","C"),
to =c("A","C","D","A","B","B","B","C","B","D"))
vertices <- unique(unlist(edges))
# Building graphh in the same way as you do
g0 <- graph_from_data_frame(d=edges, vertices=vertices, directed = TRUE)
# Note that the graph is "DN--": directed, named, but NOT Weighted, since
# Instead of weighted edges, we have a whole lot of dubble edges
(g0)
plot(g0)
# We can se the dubble edges in the adjacency matrix as >1
get.adjacency(g0)
# Use simplify to remove LOOPS ONLY as we can see in the adjacency metrix test
g1 <- simplify(g0, remove.multiple = FALSE, remove.loops = TRUE)
get.adjacency(g1) == get.adjacency(g0)
# Turn the multiple edges into edge-weights by jumping through an adjacency matrix
g2 <- graph_from_adjacency_matrix(get.adjacency(g1), mode = "directed", weighted = TRUE)
# Instead of multiple edges (like many links between "A" and "B"), there are now
# just single edges with weights (hence the density of the network's changed).
graph.density(g1) == graph.density(g2)
# The former doubble edges are now here:
E(g2)$weight
# And we can see that the g2 is now "Named-Directed-Weighted" where g1 was only
# "Named-Directed" and no weights.
(g1);(g2)
# Let's plot the weights
E(g2)$width = E(g2)$weight*5
plot(g2)
A shortcoming of this/your method, however, is that the adjacency matrix is able to carry only the edge-count between any given vertices. If your edge-list contains more variables than i and j, the use of graph_from_data_frame() would normally embed edge-attributes of those variables for you straight from your csv-import (which is nice).
When you convert the edges into weights, however, you would loose that information. And, come to think of it, that information would have to be "converted" too. What would we do with two edges between the same vertices that have different edge-attributes?
At this point, the answer goes slightly beyond your question, but still stays in the realm of explaining the relation between graphs of multiple edges between the same vertices and their representation as weighted graphs with only one structural edge per verticy.
To convert edge-attributes along this transformation into a weighted graph, I suggest you'd use dplyr to "rebuild" any edge-attributes manually in order to keep control of how they are supposed to be merged down when recasting into a weighted one.
This picks up where the code above left off:
# Let's imagine that our original network had these two edge-attributes
E(g0)$coolness <- c(1,2,1,2,3,2,3,3,2,2)
E(g0)$hotness <- c(9,8,2,3,4,5,6,7,8,9)
# Plot the hotness
E(g0)$color <- colorRampPalette(c("green", "red"))(10)[E(g0)$hotness]
plot(g0)
# Note that the hotness between C and D are very different
# When we make your transformations for a weighted netowk, we loose the coolness
# and hotness information
g2 <- g0 %>% simplify(remove.multiple = FALSE, remove.loops = TRUE) %>%
get.adjacency() %>%
graph_from_adjacency_matrix(mode = "directed", weighted = TRUE)
g2$hotness # Naturally, the edge-attributes were lost!
# We can use dplyr to take controll over how we'd like the edge-attributes transfered
# when multiple edges in g0 with different edge attributes are supposed to merge into
# one single edge
library(dplyr)
recalculated_edge_attributes <-
data.frame(name = ends(g0, E(g0)) %>% as.data.frame() %>% unite("name", V1:V2, sep="->"),
hotness = E(g0)$hotness) %>%
group_by(name) %>%
summarise(mean_hotness = mean(hotness))
# We used a string-version of the names of connected verticies (like "A->B") to refere
# to the attributes of each edge. This can now be used to merge back the re-calculated
# edge-attributes onto the weighted graph in g2
g2_attributes <- data.frame(name = ends(g2, E(g2)) %>% as.data.frame() %>% unite("name", V1:V2, sep="->")) %>%
left_join(recalculated_edge_attributes, by="name")
# And manually re-attatch our mean-attributes onto the g2 network
E(g2)$mean_hotness <- g2_attributes$mean_hotness
E(g2)$color <- colorRampPalette(c("green", "red"))(max(E(g2)$mean_hotness))[E(g2)$mean_hotness]
# Note how the link between A and B has turned into the brown mean of the two previous
# green and red hotness-edges
plot(g2)
Sometimes, your analyses may benefit from either structure (weighted no duplicates or unweighted with duplicates). Algorithms for, for example, shortest paths are able to incorporate edge-weight as described in this answer, but other analyses might not allow for or be intuitive when using the weighted version of your network data.
Let purpose guide your structure.

Related

How to calculate the number of vertices contracted into one graph?

I have a few large igraph objects that represent social networks. All nodes have various attributes, among them sector which is a factor variable. I have contracted this large network into a small where vertices represent groups and edges have the sum of individual edges in the original network. The label attribute in the second network represents the sector attribute in the first.
groupnet <- contract(g, as.integer(as.factor(V(g)$sector)), "ignore")
E(groupnet)$weight <- 1
groupnet <- simplify(groupnet, edge.attr.comb = list(weight = "sum"))
V(groupnet)$label <- levels(as.factor(V(g)$sector))
I would like to add another attribute to the second object V(groupnet)$groupsize that represents the number of original vertices that were contracted into groupnet. I have tried it with the following code but it did not work:
V(groupnet)$groupsize <- length(V(g)$sector[V(g)$sector == V(groupnet)$label])
How can I do this properly?
table() could be helpful here. Try out:
set.seed(1234)
library(igraph)
g <- make_ring(1000)
V(g)$sector <- factor(sample(LETTERS, 100, replace = T))
V(g)$sector
## contracted network
groupnet <- contract(g, as.integer(as.factor(V(g)$sector)), "ignore")
E(groupnet)$weight <- 1
V(groupnet)$label <- levels(as.factor(V(g)$sector))
## number of original vertices that were contracted into groupnet
# the tip is to see that table(V(g)$sector) provides the number of vertices per sector and
# its output is also arranged like V(groupnet)
table(V(g)$sector)
V(groupnet)
# solution
V(groupnet)$groupsize <- as.numeric(table(V(g)$sector))

compare communities from graphs with different number of vertices

I am calculating louvain communities on graphs of communications data, where vertices represent performers on a big project. The graphs represent different communication methods (e.g., email, phone).
We want to try to identify teams of performers from their communication data. Since performers have preferences for different communication methods, the graphs are of different sizes and may have some unique vertices which may not be present in both. When I try to compare the community objects from the respective graphs, igraph::compare() throws an exception. See toy reprex below.
I considered a dplyr::full_join() or inner_join() of the vertex lists before constructing the graph & community objects to make them the same size, but worry about the impact of doing so on the resulting cluster_louvain() solutions.
Any ideas on how I can compare the community objects to one another from these different communication methods? Thanks in advance!
library(tidyverse, warn.conflicts = FALSE)
library(igraph, warn.conflicts = FALSE)
nodes <- as_tibble(list(id = c("sample1", "sample2", "sample3")))
edge <- as_tibble(list(from = "sample1",
to = "sample2"))
net <- graph_from_data_frame(d = edge, vertices = nodes, directed = FALSE)
com <- cluster_louvain(net)
nodes2 <- as_tibble(list(id = c("sample1","sample21", "sample22","sample23"
)))
edge2 <- as_tibble(list(from = c("sample1", "sample21"),
to = c("sample21", "sample22")))
net2 <- graph_from_data_frame(d = edge2, vertices = nodes2, directed = FALSE)
com2 <- cluster_louvain(net2)
# # uncomment to see graph plots
# plot.igraph(net, mark.groups = com)
# plot.igraph(net2, mark.groups = com2)
compare(com, com2)
#> Error in i_compare(comm1, comm2, method): At community.c:3106 : community membership vectors have different lengths, Invalid value
Created on 2019-02-22 by the reprex package (v0.2.1)
You will not (I don't believe) be able to compare clustering algorithms from two different graphs that contain two different sets of nodes. Practically you can't do it in igraph and conceptually its hard because the way clustering algorithms are compared is by considering all pairs of nodes in a graph and checking whether they are placed in the same cluster or a different cluster in each of the two clustering approaches. If both clustering approaches typically put the same nodes together and the same nodes apart then they are considered more similar.1
I suppose another valid way to approach the problem would be to evaluate how similar the clustering schemes are for purely the set of nodes that are the intersection of the two graphs. You'll have to decide what makes more sense in your setting. I'll show how to do it using the union of nodes rather than the intersection.
So you need all the same nodes in both graphs in order to make the comparison. In fact, I think the easier way to do it is to put all the same nodes in one graph and have different edge types. Then you can compute your clusters for each edge type separately and then make the comparison. The reprex below is hopefully clear:
# repeat your set-up
library(tidyverse, warn.conflicts = FALSE)
library(igraph, warn.conflicts = FALSE)
nodes <- as_tibble(list(id = c("sample1", "sample2", "sample3")))
edge <- as_tibble(list(from = "sample1",
to = "sample2"))
nodes2 <- as_tibble(list(id = c("sample1","sample21", "sample22","sample23")))
edge2 <- as_tibble(list(from = c("sample1", "sample21"),
to = c("sample21", "sample22")))
# approach from a single graph
# concatenate edges
edges <- rbind(edge, edge2)
# create an edge attribute indicating network type
edges$type <- c("phone", "email", "email")
# the set of nodes (across both graphs)
nodes <- unique(rbind(nodes, nodes2))
g <- graph_from_data_frame(d = edges, vertices = nodes, directed = F)
# We cluster over the graph without the email edges
com_phone <- cluster_louvain(g %>% delete_edges(E(g)[E(g)$type=="email"]))
plot(g, mark.groups = com_phone)
# Now we can cluster over the graph without the phone edges
com_email <- cluster_louvain(g %>% delete_edges(E(g)[E(g)$type=="phone"]))
plot(g, mark.groups = com_email)
# Now we can compare
compare(com_phone, com_email)
#> [1] 0.7803552
As you can see from the plots we pick out the same initial clustering structure you found in the separate graphs with the additions of the extra isolated nodes.
1: Obviously this is a pretty vague explanation. The default algorithm used in compare is from this paper, which has a nice discussion.

R: Convert correlation matrix to edge list

I want to create a network graph of my data, where the weight of the edges is defined by the correlation coefficient in a correlation matrix. The connection is defined by being statistically significant or not.
Since I want to play around with some parameters I need to have this information in an edge list rather than in matrix form, but I'm struggling as to how to convert this. I have tried to used igraph as shown below, but I cannot figure out how to get the information on which correlations are significant and which are not into the edge list. I guess weight could be set to zero to code that info, but how do I combine a correlation matrix and a p-value matrix?
library(igraph)
g <- graph.adjacency(a,weighted=TRUE)
df <- get.data.frame(g)
df
It'd be great if you could provide a minimal reproducable example, but I think I understand what you're asking for. You'll need to make a graph from a matrix using graph_from_adjacency_matrix, but make sure to input something in the weighted parameter, because otherwise the elements in the matrix represent number of edges (less than 1 means no edges). Then you can create an edge list from the graph using as_data_frame. Then perform whatever calculation you want, or join any external data you have, then you can convert it back to a graph by using graph_from_data_frame
cor_mat <- cor(mtcars)
cor_g <- graph_from_adjacency_matrix(cor_mat, mode='undirected', weighted = 'correlation')
cor_edge_list <- as_data_frame(cor_g, 'edges')
only_sig <- cor_edge_list[abs(cor_edge_list$correlation) > .75, ]
new_g <- graph_from_data_frame(only_sig, F)
For the ones who still need this, here is the answer
library(igraph)
g <- graph.adjacency(a, mode="upper", weighted=TRUE, diag=FALSE)
e <- get.edgelist(g)
df <- as.data.frame(cbind(e,E(g)$weight))

Why igraph::cluster_walktrap gives a different result for non directed isomorphic graphs?

I'm trying to use igraph::cluster_walktrap in R to look for communities inside of a graph, however I noticed a weird behaviour (or at least, a behaviour I am not able to explain).
Suppose you are given an undirected graph by defining a list of its edges. Say
a,b
c,d
e,f
...
Then, if I define another graph by swapping randomly selected vertices in the edge list definition:
a,b
d,c
e,f
...
I expect the two graphs to be isomorphic and the difference between the two graph to be empty. This is exactly what happens in R in my toy example. Following this line of reasoning, calling cluster_walktrap on the two graphs (using set.seed appropriately) should yield the same result since the two graphs are the same. This is not happening and the only explanation I can give is that the starting point of each random walk is not the same for the two graphs. Why is this?
You can follow my reasoning in the toy example below. I don't understand why the last two objects are not identical.
require(igraph)
# Number of vertices
verteces <- 50
# Swap randomly some elements in the edges definition
set.seed(20)
row_swapped <- sample(1:verteces,25,replace=F)
m_values <- sample(letters, verteces*2, replace=T) #1:100
# Build edge lists
m1 <- matrix(m_values, verteces, 2)
m1
a <- m1
colS <- seq(round(ncol(m1)*0.3))
m1[row_swapped, 2:1] <- m1[row_swapped, 1:2]
m1
b <- m1
# Define the two graphs
ag <- igraph::graph_from_edgelist(a, directed = F)
bg <- igraph::graph_from_edgelist(b, directed = F)
# Another way of building an isomorphic graph for testing
#bg <- permute(ag, sample(vcount(ag)))
# Should be empty: ok
difference(ag, bg)
# Should be TRUE: ok
isomorphic(ag,bg)
# I expect it to be TRUE but it isn't...
identical(ag, bg)
# Vertices
V(ag)
ag
V(bg)
bg
# Calculate community
set.seed(100)
ac1 <- cluster_walktrap(ag)
set.seed(100)
bc1 <- cluster_walktrap(bg)
# I expect all to be TRUE, however
# merges is different
# membership is different
# names are different
identical(ac1$merges, bc1$merges)
identical(ac1$modularity, bc1$modularity)
identical(ac1$membership, bc1$membership)
identical(ac1$names, bc1$names)
identical(ac1$vcount, bc1$vcount)
identical(ac1$algorithm, bc1$algorithm)
The results are not different. You have two things going on which is making your graphs not identical but isoporphic. I emphasize identical because it has a very strict definition.
1) identical(ag, bg) is not identical because the vertices and edges are not in the same order between the two graphs. Exactly, the same nodes and edges exist but they are not in the exact same place or orientation. For, example if I shuffle the rows of a and make a new graph...
a1 <- a[sample(1:nrow(a)), ]
a1g <- igraph::graph_from_edgelist(a1, directed = F)
identical(ag, a1g)
#[1] FALSE
2) This goes for edges as well. An edge is stored as node1, node2 and a flag if the edge is directed or not. so when you swap rows the representation at the "byte level" (I use this term loosely) is different even though the relationship is the same. Edge 44 represents the same relationship but is stored based on how it was constructed.
E(ag)[44]
# + 1/50 edge from 6318240 (vertex names):
# [1] q--d
E(bg)[44]
# + 1/50 edge from 38042e0 (vertex names):
# [1] d--q
So onto your cluster_walktrap, first, the function returns the index of the vertices, not the name which can be misleading. Which means the reason the objects aren't identical is because ag and bg have different ordering of nodes in the object.
If I reorder the membership by node name the two become identical.
identical(membership(bc1)[order(names(membership(bc1)))], membership(ac1)[order(names(membership(ac1)))])
#[1] TRUE

revealing clusters of interaction in igraph

I have an interaction network and I used the following code to make an adjacency matrix and subsequently calculate the dissimilarity between the nodes of the network and then cluster them to form modules:
ADJ1=abs(adjacent-mat)^6
dissADJ1<-1-ADJ1
hierADJ<-hclust(as.dist(dissADJ1), method = "average")
Now I would like those modules to appear when I plot the igraph.
g<-simplify(graph_from_adjacency_matrix(adjacent-mat, weighted=T))
plot.igraph(g)
However the only thing that I have found thus far to translate hclust output to graph is as per the following tutorial: http://gastonsanchez.com/resources/2014/07/05/Pretty-tree-graph/
phylo_tree = as.phylo(hierADJ)
graph_edges = phylo_tree$edge
graph_net = graph.edgelist(graph_edges)
plot(graph_net)
which is useful for hierarchical lineage but rather I just want the nodes that closely interact to cluster as follows:
Can anyone recommend how to use a command such as components from igraph to get these clusters to show?
igraph provides a bunch of different layout algorithms which are used to place nodes in the plot.
A good one to start with for a weighted network like this is the force-directed layout (implemented by layout.fruchterman.reingold in igraph).
Below is a example of using the force-directed layout using some simple simulated data.
First, we create some mock data and clusters, along with some "noise" to make it more realistic:
library('dplyr')
library('igraph')
library('RColorBrewer')
set.seed(1)
# generate a couple clusters
nodes_per_cluster <- 30
n <- 10
nvals <- nodes_per_cluster * n
# cluster 1 (increasing)
cluster1 <- matrix(rep((1:n)/4, nodes_per_cluster) +
rnorm(nvals, sd=1),
nrow=nodes_per_cluster, byrow=TRUE)
# cluster 2 (decreasing)
cluster2 <- matrix(rep((n:1)/4, nodes_per_cluster) +
rnorm(nvals, sd=1),
nrow=nodes_per_cluster, byrow=TRUE)
# noise cluster
noise <- matrix(sample(1:2, nvals, replace=TRUE) +
rnorm(nvals, sd=1.5),
nrow=nodes_per_cluster, byrow=TRUE)
dat <- rbind(cluster1, cluster2, noise)
colnames(dat) <- paste0('n', 1:n)
rownames(dat) <- c(paste0('cluster1_', 1:nodes_per_cluster),
paste0('cluster2_', 1:nodes_per_cluster),
paste0('noise_', 1:nodes_per_cluster))
Next, we can use Pearson correlation to construct our adjacency matrix:
# create correlation matrix
cor_mat <- cor(t(dat))
# shift to [0,1] to separate positive and negative correlations
adj_mat <- (cor_mat + 1) / 2
# get rid of low correlations and self-loops
adj_mat <- adj_mat^3
adj_mat[adj_mat < 0.5] <- 0
diag(adj_mat) <- 0
Cluster the data using hclust and cutree:
# convert to dissimilarity matrix and cluster using hclust
dissim_mat <- 1 - adj_mat
dend <- dissim_mat %>%
as.dist %>%
hclust
clusters = cutree(dend, h=0.65)
# color the nodes
pal = colorRampPalette(brewer.pal(11,"Spectral"))(length(unique(clusters)))
node_colors <- pal[clusters]
Finally, create an igraph graph from the adjacency matrix and plot it using the fruchterman.reingold layout:
# create graph
g <- graph.adjacency(adj_mat, mode='undirected', weighted=TRUE)
# set node color and plot using a force-directed layout (fruchterman-reingold)
V(g)$color <- node_colors
coords_fr = layout.fruchterman.reingold(g, weights=E(g)$weight)
# igraph plot options
igraph.options(vertex.size=8, edge.width=0.75)
# plot network
plot(g, layout=coords_fr, vertex.color=V(g)$color)
In the above code, I generated two "clusters" of correlated rows, and a third group of "noise".
Hierarchical clustering (hclust + cuttree) is used to assign the data points to clusters, and they are colored based on cluster membership.
The result looks like this:
For some more examples of clustering and plotting graphs with igraph, checkout: http://michael.hahsler.net/SMU/LearnROnYourOwn/code/igraph.html
You haven't shared some toy data for us to play with and suggest improvements to code, but your question states that you are only interested in plotting your clusters distinctly - that is, graphical presentation.
Although igraph comes with some nice force directed layout algorithms, such as layout.fruchterman.reingold, layout_with_kk, etc., they can, in presence of a large number of nodes, quickly become difficult to interpret and make sense of at all.
Like this:
With these traditional methods of visualising networks,
the layout algorithms, rather than the data, determine the visualisation
similar networks may end up being visualised very differently
large number of nodes will make the visualisation difficult to interpret
Instead, I find Hive Plots to be better at displaying important network properties, which, in your instance, are the cluster and the edges.
In your case, you can:
Plot each cluster on a different straight line
order the placement of nodes intelligently, so that nodes with certain properties are placed at the very end or start of each straight line
Colour the edges to identify direction of edge
To achieve this you will need to:
use the ggnetwork package to turn your igraph object into a dataframe
map your clusters to the nodes present in this dataframe
generate coordinates for the straight lines and map these to each cluster
use ggplot to visualise
There is also a hiveR package in R, should you wish to use a packaged solution. You might also find another visualisation technique for graphs very useful: BioFabric

Resources