I am trying to use igraph/ggraph to plot a network, in which some of the edges are directed and others are not.
A small example from my edgelist. Here the protein-site edges are what I want to represent as undirected and the phosphorylation edge as directed.
df <- data.frame(
stringsAsFactors = FALSE,
from = c("RPS6KA3", "RPS6KA3", "RPS6KA3", "RPS6KA3", "RPS6KA3"),
to = c("RPS6KA3_Y529-p",
"RPS6KA3_S227-p","RPS6KA3_S369-p","RPS6KA3_T577-p","ATF4"),
action = c("protein-site","protein-site",
"protein-site","protein-site","phosphorylation")
)
I have tried subsetting the undirected edges and specifying them as such, but it didn't work:
library(igraph)
nw <- graph_from_data_frame(df)
E(nw)[E(nw)$action == "protein-site"] <- as.undirected(subgraph.edges(nw, E(nw)[E(nw)$action == "protein-site"] ))
Does anyone have any other suggestions?
As I say, I am really only wanting to plot this (using ggraph).
Thanks!
If you would be willing to plot with igraph you can import the list of edges as directed and then use edge.arrow.mode.
nw <- graph_from_data_frame(df, directed = TRUE)
plot(nw)
plot(nw,
edge.arrow.mode = ifelse(E(nw)$action=="protein-site", "-", ">"))
I am not sure ggraph supports anything similar. I thought it might be possible to change the size of the arrows and set it to 0 for undirected edges. This doesn't work, because arrows inherit the style of the rest of the edge.
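For the ggraph side, one possible approach is to draw the two edge types as separate layers and attach an arrow only to the directed layer. The sketch below assumes ggraph's `filter` edge aesthetic and rebuilds the `df` from the question; the layout, cap size and arrow length are arbitrary choices, not something from the original post.

```r
library(igraph)
library(ggraph)  # attaches ggplot2, which provides aes()

# Edge list from the question
df <- data.frame(
  from = rep("RPS6KA3", 5),
  to = c("RPS6KA3_Y529-p", "RPS6KA3_S227-p", "RPS6KA3_S369-p",
         "RPS6KA3_T577-p", "ATF4"),
  action = c(rep("protein-site", 4), "phosphorylation"),
  stringsAsFactors = FALSE
)
nw <- graph_from_data_frame(df, directed = TRUE)

p <- ggraph(nw, layout = "fr") +
  # Layer 1: protein-site edges, drawn without arrows
  geom_edge_link(aes(filter = action == "protein-site")) +
  # Layer 2: phosphorylation edges, drawn with an arrow head
  geom_edge_link(aes(filter = action == "phosphorylation"),
                 arrow = grid::arrow(length = grid::unit(3, "mm")),
                 end_cap = circle(3, "mm")) +
  geom_node_point() +
  geom_node_text(aes(label = name), vjust = -1)
# print(p) draws the plot
```

The graph itself stays directed; only the drawing differs per edge type, which matches the goal of "really only wanting to plot this".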
I'm in the process of creating a weighted igraph network object from an edge list containing two columns, from and to. It has proven to be somewhat challenging for me, because when I use a workaround I notice changes in the network metrics, and I believe I'm doing something wrong.
library(igraph)
links <- read.csv2("edgelist.csv")
vertices <- read.csv2("vertices.csv")
network <- graph_from_data_frame(d=links,vertices = vertices,directed = TRUE)
##the following step is included to remove self-loops that I have used to include all isolate nodes to the network##
network <- simplify(network,remove.multiple = FALSE, remove.loops = TRUE)
In this situation I have successfully created a network object. However, it is not weighted. Therefore I create a second network object by taking the adjacency matrix from the object created earlier and creating the new igraph object from it like this:
gettheweights <- get.adjacency(network)
network2 <- graph_from_adjacency_matrix(gettheweights,mode = "directed",weighted = TRUE)
However, when I then print both objects, I notice a difference in the number of edges. Why is this?
network2
IGRAPH ef31b3a DNW- 200 1092 --
network
IGRAPH 934d444 DN-- 200 3626 --
Additionally, I believe I've done something wrong because if they indeed would be the same network, shouldn't their densities be the same? Now it is not the case:
graph.density(network2)
[1] 0.02743719
graph.density(network)
[1] 0.09110553
I browsed and tried several different answers found from here but many were not 1:1 identical and I failed to find a solution.
All seems to be in order. When you re-project a network so that duplicate edges are represented as a weight (the number of edges between a given pair of vertices), the density of your graph should change.
When you test graph.density(network2) and graph.density(network), they should be different if the duplicate edges were indeed reduced to single edges with weight as an edge attribute, as your output from network2 and network suggests.
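The two densities in the question can be reproduced by hand. This is a small check, assuming the usual definition of density for a directed graph without self-loops, edges / (n * (n - 1)), and using the vertex and edge counts printed in the question.

```r
n <- 200                     # vertices in both graphs
possible <- n * (n - 1)      # possible directed edges, self-loops excluded

d_multi    <- 3626 / possible  # network: every duplicate edge is counted
d_weighted <- 1092 / possible  # network2: one edge per connected ordered pair

d_multi      # approximately 0.0911
d_weighted   # approximately 0.0274
```

Both values agree with the graph.density() output above, so the drop in density is exactly the drop in edge count from 3626 to 1092.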
This (over-) commented code goes through the process.
library(igraph)
# Data that should resemble yours
edges <- data.frame(from=c("A","B","C","D","E","A","A","A","B","C"),
to =c("A","C","D","A","B","B","B","C","B","D"))
vertices <- unique(unlist(edges))
# Building the graph in the same way as you do
g0 <- graph_from_data_frame(d=edges, vertices=vertices, directed = TRUE)
# Note that the graph is "DN--": directed, named, but NOT weighted, since
# instead of weighted edges, we have a whole lot of double edges
(g0)
plot(g0)
# We can see the double edges in the adjacency matrix as >1
get.adjacency(g0)
# Use simplify to remove LOOPS ONLY, as we can see in the adjacency matrix test
g1 <- simplify(g0, remove.multiple = FALSE, remove.loops = TRUE)
get.adjacency(g1) == get.adjacency(g0)
# Turn the multiple edges into edge-weights by jumping through an adjacency matrix
g2 <- graph_from_adjacency_matrix(get.adjacency(g1), mode = "directed", weighted = TRUE)
# Instead of multiple edges (like many links between "A" and "B"), there are now
# just single edges with weights (hence the density of the network's changed).
graph.density(g1) == graph.density(g2)
# The former double edges are now here:
E(g2)$weight
# And we can see that the g2 is now "Named-Directed-Weighted" where g1 was only
# "Named-Directed" and no weights.
(g1);(g2)
# Let's plot the weights
E(g2)$width = E(g2)$weight*5
plot(g2)
A shortcoming of this/your method, however, is that the adjacency matrix is able to carry only the edge-count between any given vertices. If your edge-list contains more variables than i and j, the use of graph_from_data_frame() would normally embed edge-attributes of those variables for you straight from your csv-import (which is nice).
When you convert the edges into weights, however, you would lose that information. And, come to think of it, that information would have to be "converted" too. What would we do with two edges between the same vertices that have different edge-attributes?
At this point, the answer goes slightly beyond your question, but still stays in the realm of explaining the relation between graphs with multiple edges between the same vertices and their representation as weighted graphs with only one structural edge per pair of vertices.
To carry edge-attributes through this transformation into a weighted graph, I suggest you use dplyr to "rebuild" any edge-attributes manually, in order to keep control of how they are merged when recasting into a weighted graph.
This picks up where the code above left off:
# Let's imagine that our original network had these two edge-attributes
E(g0)$coolness <- c(1,2,1,2,3,2,3,3,2,2)
E(g0)$hotness <- c(9,8,2,3,4,5,6,7,8,9)
# Plot the hotness
E(g0)$color <- colorRampPalette(c("green", "red"))(10)[E(g0)$hotness]
plot(g0)
# Note that the hotness values between C and D are very different
# When we make your transformations for a weighted network, we lose the coolness
# and hotness information
g2 <- g0 %>% simplify(remove.multiple = FALSE, remove.loops = TRUE) %>%
get.adjacency() %>%
graph_from_adjacency_matrix(mode = "directed", weighted = TRUE)
E(g2)$hotness # NULL: naturally, the edge-attributes were lost!
# We can use dplyr to take control over how we'd like the edge-attributes transferred
# when multiple edges in g0 with different edge attributes are supposed to merge into
# one single edge
library(dplyr)
library(tidyr) # provides unite()
recalculated_edge_attributes <-
data.frame(name = ends(g0, E(g0)) %>% as.data.frame() %>% unite("name", V1:V2, sep="->"),
hotness = E(g0)$hotness) %>%
group_by(name) %>%
summarise(mean_hotness = mean(hotness))
# We used a string-version of the names of connected vertices (like "A->B") to refer
# to the attributes of each edge. This can now be used to merge back the re-calculated
# edge-attributes onto the weighted graph in g2
g2_attributes <- data.frame(name = ends(g2, E(g2)) %>% as.data.frame() %>% unite("name", V1:V2, sep="->")) %>%
left_join(recalculated_edge_attributes, by="name")
# And manually re-attach our mean-attributes onto the g2 network
E(g2)$mean_hotness <- g2_attributes$mean_hotness
E(g2)$color <- colorRampPalette(c("green", "red"))(max(E(g2)$mean_hotness))[E(g2)$mean_hotness]
# Note how the link between A and B has turned into the brown mean of the two previous
# green and red hotness-edges
plot(g2)
Sometimes, your analyses may benefit from either structure (weighted no duplicates or unweighted with duplicates). Algorithms for, for example, shortest paths are able to incorporate edge-weight as described in this answer, but other analyses might not allow for or be intuitive when using the weighted version of your network data.
Let purpose guide your structure.
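As a possible shortcut for the attribute merging described above, igraph's simplify() takes an edge.attr.comb argument that says how each edge attribute should be combined when multiple edges collapse into one. The following is a minimal sketch on made-up data (the edge list and the hotness values are illustrative, not the data from the question):

```r
library(igraph)

# Toy multigraph: A->B and C->D each appear twice
edges <- data.frame(from = c("A", "A", "B", "C", "C"),
                    to   = c("B", "B", "C", "D", "D"))
g <- graph_from_data_frame(edges, directed = TRUE)

E(g)$weight  <- 1                  # each original edge counts once
E(g)$hotness <- c(5, 6, 8, 2, 9)   # hypothetical edge attribute

# Collapse duplicates: sum the counts, average hotness, drop anything else
gw <- simplify(g, remove.multiple = TRUE, remove.loops = TRUE,
               edge.attr.comb = list(weight = "sum",
                                     hotness = "mean",
                                     "ignore"))

as_data_frame(gw, what = "edges")
```

This keeps the merge rule next to the conversion itself, at the cost of being limited to one combining rule per attribute; the dplyr route remains more flexible when several summaries of the same attribute are needed.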
I'm trying to visualize the connections between the institutions in a medical faculty and just can't get the edges to be weighted and displayed thicker or thinner depending on the number of connections.
I've tried to combine the answers I found here playing around with edge.width = E(g)$weight and trying graph.strength(g). But honestly I have no idea what I'm doing. This is the first time I have to use R and I have no experience in programming whatsoever.
library(igraph)
D3 <- read.csv(file.choose(),header=TRUE,row.names = 1)
g <- graph.data.frame(D3, directed=FALSE)
plot(g,
vertex.size=20,
vertex.label.dist=1,
vertex.label.degree=-pi/2,
layout=layout_with_kk)
igraph plots a network where every single connection is shown. Some institutions have multiple connections between each other, which makes the graph quite unattractive to look at. Only a part of the table was used for this picture.
My data looks like this and has about 1500 rows:
"1","NEUROLOGIE","MEDINF"
Any help is much appreciated!
Using edge.width = E(g)$weight is the right idea, but you need to get the right weight. graph.strength(g) is a property of the vertices, but you need a weight for the edges. I don't know of a function that directly calculates how many edges there are between two vertices, but it is not hard to write one.
First, get a version of the graph with just one edge between each pair of connected vertices.
g2 = simplify(g)
Now we need to get the right weight for the edges of g2. If an edge connects two vertices, all shortest paths connecting those two vertices will be single edges, so for each edge of the simplified g2, we need to find the number of shortest paths (edges) between those vertices in the original g. Then we can plot.
E(g2)$weight = sapply(E(g2), function(e) {
length(all_shortest_paths(g, from=ends(g2, e)[1], to=ends(g2, e)[2])$res) } )
plot(g2,
vertex.size=15,
vertex.label.dist=0.5,
vertex.label.cex=0.8,
vertex.label.degree=-pi/2,
edge.width=E(g2)$weight,
layout=layout_with_kk,
margin=-0.2)
(I have slightly modified your plot statement to improve readability.)
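An alternative way to get the same edge counts, sketched here under the assumption that the graph has no self-loops, is igraph's count_multiple(), which reports for every edge how many parallel copies it has. Keeping only the first copy of each edge (via which_multiple()) then leaves one weighted edge per pair. The three-institution data below is made up for illustration; only NEUROLOGIE and MEDINF appear in the question.

```r
library(igraph)

# Made-up stand-in for the question's table ("KARDIO" is a hypothetical name)
D3 <- data.frame(
  from = c("NEUROLOGIE", "NEUROLOGIE", "MEDINF", "NEUROLOGIE", "KARDIO"),
  to   = c("MEDINF", "MEDINF", "NEUROLOGIE", "KARDIO", "MEDINF")
)
g <- graph_from_data_frame(D3, directed = FALSE)

# Multiplicity of every edge, stored as its weight
E(g)$weight <- count_multiple(g)

# Drop the second and later copies; the first copy keeps its weight
g2 <- delete_edges(g, which(which_multiple(g)))

plot(g2, edge.width = E(g2)$weight)
```

In this toy data the NEUROLOGIE-MEDINF pair ends up with weight 3 and the other two pairs with weight 1.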
Thank you so much for your help!! I was nowhere close to that. To make it more readable, I reduced the thickness of the edges and replaced the names with numbers. This is the code:
library(igraph)
D3 <- read.csv(file.choose(),header=TRUE,row.names = 1)
g <- graph.data.frame(D3, directed=FALSE)
g2 = simplify(g)
E(g2)$weight = sapply(E(g2), function(e) {
length(all_shortest_paths(g, from=ends(g2, e)[1], to=ends(g2, e)[2])$res) } )
tkplot(g2,
vertex.color= "gold",
vertex.label.color="red",
vertex.size=10,
vertex.label.cex=1,
edge.width=E(g2)$weight*0.15,
edge.color="grey",
layout=layout.reingold.tilford,
asp = .5,
margin=-0.95)
Creating:
[Image: graph drawn with the Reingold-Tilford layout]
I find this visualization quite fine because the graph is interactive. Are there other ways to make it even more readable?
Thanks again for the help!
All the best,
Jay
I am calculating louvain communities on graphs of communications data, where vertices represent performers on a big project. The graphs represent different communication methods (e.g., email, phone).
We want to try to identify teams of performers from their communication data. Since performers have preferences for different communication methods, the graphs are of different sizes and may have some unique vertices which may not be present in both. When I try to compare the community objects from the respective graphs, igraph::compare() throws an exception. See toy reprex below.
I considered a dplyr::full_join() or inner_join() of the vertex lists before constructing the graph & community objects to make them the same size, but worry about the impact of doing so on the resulting cluster_louvain() solutions.
Any ideas on how I can compare the community objects to one another from these different communication methods? Thanks in advance!
library(tidyverse, warn.conflicts = FALSE)
library(igraph, warn.conflicts = FALSE)
nodes <- as_tibble(list(id = c("sample1", "sample2", "sample3")))
edge <- as_tibble(list(from = "sample1",
to = "sample2"))
net <- graph_from_data_frame(d = edge, vertices = nodes, directed = FALSE)
com <- cluster_louvain(net)
nodes2 <- as_tibble(list(id = c("sample1","sample21", "sample22","sample23"
)))
edge2 <- as_tibble(list(from = c("sample1", "sample21"),
to = c("sample21", "sample22")))
net2 <- graph_from_data_frame(d = edge2, vertices = nodes2, directed = FALSE)
com2 <- cluster_louvain(net2)
# # uncomment to see graph plots
# plot.igraph(net, mark.groups = com)
# plot.igraph(net2, mark.groups = com2)
compare(com, com2)
#> Error in i_compare(comm1, comm2, method): At community.c:3106 : community membership vectors have different lengths, Invalid value
Created on 2019-02-22 by the reprex package (v0.2.1)
You will not (I don't believe) be able to compare clusterings from two different graphs that contain two different sets of nodes. Practically, you can't do it in igraph, and conceptually it's hard, because the way clusterings are compared is by considering all pairs of nodes in a graph and checking whether they are placed in the same cluster or in different clusters in each of the two clusterings. If both clusterings typically put the same nodes together and the same nodes apart, then they are considered more similar. [1]
I suppose another valid way to approach the problem would be to evaluate how similar the clustering schemes are for purely the set of nodes that are the intersection of the two graphs. You'll have to decide what makes more sense in your setting. I'll show how to do it using the union of nodes rather than the intersection.
So you need all the same nodes in both graphs in order to make the comparison. In fact, I think the easier way to do it is to put all the same nodes in one graph and have different edge types. Then you can compute your clusters for each edge type separately and then make the comparison. The reprex below is hopefully clear:
# repeat your set-up
library(tidyverse, warn.conflicts = FALSE)
library(igraph, warn.conflicts = FALSE)
nodes <- as_tibble(list(id = c("sample1", "sample2", "sample3")))
edge <- as_tibble(list(from = "sample1",
to = "sample2"))
nodes2 <- as_tibble(list(id = c("sample1","sample21", "sample22","sample23")))
edge2 <- as_tibble(list(from = c("sample1", "sample21"),
to = c("sample21", "sample22")))
# approach from a single graph
# concatenate edges
edges <- rbind(edge, edge2)
# create an edge attribute indicating network type
edges$type <- c("phone", "email", "email")
# the set of nodes (across both graphs)
nodes <- unique(rbind(nodes, nodes2))
g <- graph_from_data_frame(d = edges, vertices = nodes, directed = FALSE)
# We cluster over the graph without the email edges
com_phone <- cluster_louvain(g %>% delete_edges(E(g)[E(g)$type=="email"]))
plot(g, mark.groups = com_phone)
# Now we can cluster over the graph without the phone edges
com_email <- cluster_louvain(g %>% delete_edges(E(g)[E(g)$type=="phone"]))
plot(g, mark.groups = com_email)
# Now we can compare
compare(com_phone, com_email)
#> [1] 0.7803552
As you can see from the plots we pick out the same initial clustering structure you found in the separate graphs with the additions of the extra isolated nodes.
1: Obviously this is a pretty vague explanation. The default algorithm used in compare is from this paper, which has a nice discussion.
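For the intersection-based alternative mentioned in the answer, a minimal sketch is to restrict both membership vectors to the vertices the two graphs share and pass those vectors to compare(), which also accepts plain membership vectors. The two small graphs and their vertex names below are made up for illustration.

```r
library(igraph)

# Two hypothetical communication graphs with partly different vertices
net_phone <- graph_from_data_frame(
  data.frame(from = c("a", "b", "d"), to = c("b", "c", "e")),
  directed = FALSE)
net_email <- graph_from_data_frame(
  data.frame(from = c("a", "c", "d"), to = c("b", "d", "f")),
  directed = FALSE)

com_phone <- cluster_louvain(net_phone)
com_email <- cluster_louvain(net_email)

# Vertices present in both graphs
shared <- intersect(V(net_phone)$name, V(net_email)$name)

# Membership of the shared vertices, in the same order for both graphs
m_phone <- as.integer(membership(com_phone))[match(shared, V(net_phone)$name)]
m_email <- as.integer(membership(com_email))[match(shared, V(net_email)$name)]

# Variation of information on the shared vertices only
vi <- compare(m_phone, m_email, method = "vi")
vi
```

Note that the clusters are still computed on each full graph; only the comparison is limited to the shared vertices, so vertices unique to one graph influence the result only indirectly.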
I've been following documentation tutorials and even lecture tutorials step by step. But for some reason the output of my plot is like this:
The output doesn't make any sense to me. There clearly is no structure, or communities in this current plot, as you can see that the bigger circles are all overlapping. Shouldn't this, in this case, return only a single community? Additionally the modularity of my network is ~0.02 which would again, suggest there is no community structure. But why does it return 3 communities?
This is my code (exactly the same as in the documentation, with a different dataset):
m <- data.matrix(df)
g <- graph_from_adjacency_matrix(m, mode = "undirected")
#el <- get.edgelist(g)
wc <- cluster_walktrap(g)
modularity(wc)
membership(wc)
plot(wc,g)
My data set is a 500x500 adjacency matrix in the form of a csv, with column and index names 1-500, each corresponding to a person.
I tried understanding the community class and using different types of variables for the plot, e.g. membership(wc)[2] etc. My thought is that the coloring is simply wrong, but nothing I've tried so far seems to fix the issue.
You can have inter-community connections. You're working with a graph of 500 nodes and they can have multiple connections. There will be a large number of connections between nodes of different communities, but if you conduct a random walk you're most likely to traverse connections between nodes of the same community.
If you separate the communities in the plot (using @G5W's code from "(igraph) Grouped layout based on attribute") you can see the different groups.
set.seed(4321)
g <- sample_gnp(500, .25)
plot(g, vertex.label = '', vertex.size = 5)
wc <- cluster_walktrap(g)
V(g)$community <- membership(wc)
E(g)$weight = 1
g_grouped = g
for(i in unique(V(g)$community)){
groupV = which(V(g)$community == i)
g_grouped = add_edges(g_grouped, combn(groupV, 2), attr=list(weight = 2))
}
l <- layout_nicely(g_grouped)
plot( wc,g, layout = l, vertex.label = '', vertex.size = 5, edge.width = .1)
Red edges are inter-community connections and black edges are intra-community connections.
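To put a number on how many edges run between communities, igraph's crossing() can be applied to the same random graph. This is only a sketch of that check; the graph parameters are the ones used in the answer above.

```r
library(igraph)

set.seed(4321)
g  <- sample_gnp(500, 0.25)
wc <- cluster_walktrap(g)

# TRUE for every edge whose endpoints lie in different communities
is_crossing <- igraph::crossing(wc, g)

table(is_crossing)   # counts of intra- vs. inter-community edges
mean(is_crossing)    # share of inter-community edges
modularity(wc)       # low modularity goes together with a high share
```

A large share of crossing edges together with a modularity close to 0 is consistent with the reading that the algorithm always returns some partition, even when the graph has little real community structure.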
I have a very large bipartite network model that I created from 5 million lines of a dataset. I decompose my network model because I cannot draw a graph of this size. Now all I need is to plot the decomposed graphs one by one. There is no problem with that. But I want to draw the graph with a shape according to the attributes of each node. For example, I want a square for the "A" attributes on my graph G, and a triangle for the "B" attributes. In addition to this, I want to add vertex labels by attributes. Here is my code to plot the first component of the graph after creating the bipartite G, and it works:
components <- decompose(G)
plot(components[[1]])
I tried something like this to add labels and change vertex shapes according to graph attributes, but it didn't work:
plot(components[[1]], vertex.label= V(G)$attributes,
vertex.shape=c("square", "triangle"))
Can anyone help me? I'm stuck. Thank you so much!
The components() function returns the component membership of each vertex, and groups() turns that into a list of the vertices that make up each component. So you need to traverse that list, create a subgraph for each entry, and plot it. As for plotting attributes, you need to provide a reproducible example for us to help.
library(igraph)
set.seed(8675309)
g <- sample_gnp(200, p = 0.01)
V(g)$name <- paste0("Node", 1:vcount(g))
V(g)$shape <- sample(c("circle","square"), vcount(g), replace = TRUE)
clu <- components(g)
grps <- groups(clu)
lapply(grps, function(x) plot(induced_subgraph(g, x)))
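For the attribute-based shapes and labels, a minimal sketch follows, assuming a vertex attribute called `kind` with values "A" and "B" (a hypothetical name, since the question does not show the data). The important detail is that the attribute is read from each component, not from the full graph G, so its length matches the vertices being plotted. As far as I know igraph has no built-in "triangle" shape (custom shapes can be registered with add_shape()), so "circle" stands in for it here.

```r
library(igraph)

# Small made-up bipartite-style graph with a vertex attribute `kind`
edges <- data.frame(from = c("u1", "u1", "u2", "u3"),
                    to   = c("i1", "i2", "i2", "i3"))
verts <- data.frame(name = c("u1", "u2", "u3", "i1", "i2", "i3"),
                    kind = c("A", "A", "A", "B", "B", "B"))
G <- graph_from_data_frame(edges, directed = FALSE, vertices = verts)

comps <- decompose(G)   # list of graphs, one per connected component

# Lookup from attribute value to a built-in igraph shape
shape_for <- c(A = "square", B = "circle")

for (comp in comps) {
  plot(comp,
       vertex.shape = unname(shape_for[V(comp)$kind]),  # shape per vertex
       vertex.label = V(comp)$kind)                     # label per vertex
}
```

Vertex attributes are carried over into each graph returned by decompose(), which is why V(comp)$kind is available inside the loop.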