Thanks for your help in advance!
My question is: given a list of sets, how can I visualize the overlap between any two of the sets using a network plot like the one shown below?
Please feel free to generate any sets for demonstration. Or you can use the following simple sets.
set.seed(123456)
A <- sample(1:100, 60)
B <- sample(1:100, 50)
C <- sample(1:100, 75)
In ggraph, use scale_size() for nodes and scale_edge_width() for edges to harmonize proportions. Point sizes in ggplot2 are already scaled by their radius (see "Does size for ggplot2::geom_point() refer to radius, diameter, area, or something else?"), so no transformation is necessary unless you want the point size to be proportional to the edge width by area.
Build a tbl_graph with your samples:
library(tidygraph)
library(ggraph)
# edges are weighted by the size of the pairwise intersection
edges <- data.frame(from = c('A','B','C'), to = c('B','C','A'),
                    weight = c(length(intersect(A,B)), length(intersect(B,C)), length(intersect(C,A))))
# nodes are weighted by the size of each sample
nodes <- data.frame(name = c('A','B','C'), size = c(length(A), length(B), length(C)))
tbl_graph <- tbl_graph(nodes=nodes, edges=edges)
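The edge table above is typed out by hand for three sets. For a longer list of sets, the same two tables could be generated from all unordered pairs. This is only a sketch in base R; overlap_tables is a hypothetical helper name, and it assumes the list is named.

```r
# Hypothetical helper: node and edge tables for any named list of sets.
overlap_tables <- function(sets) {
  # one row per unordered pair of set names
  pairs <- t(combn(names(sets), 2))
  edges <- data.frame(
    from = pairs[, 1],
    to = pairs[, 2],
    # edge weight = number of elements the two sets share
    weight = apply(pairs, 1, function(p) {
      length(intersect(sets[[p[1]]], sets[[p[2]]]))
    })
  )
  # node size = number of elements in each set
  nodes <- data.frame(name = names(sets), size = unname(lengths(sets)))
  list(nodes = nodes, edges = edges)
}

set.seed(123456)
sets <- list(A = sample(1:100, 60), B = sample(1:100, 50), C = sample(1:100, 75))
tabs <- overlap_tables(sets)
```

tabs$nodes and tabs$edges can then be passed to tbl_graph() in place of the hand-written nodes and edges.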
If you build the network directly with these sizes, the distances between nodes are decided automatically. Most ggraph layouts place nodes at distances between 0 and 1, which results in a crowded graph with oversized edges and nodes. If the distance between nodes is not important, we can simply use a scaling factor to shrink the node sizes and edge widths to fit the graph.
To harmonize widths and sizes, we set the range of the edge-width scale to the min and max of the scaled edge weights, and the range of the node-size scale to the min and max of the scaled node sizes multiplied by 2, as the nodes are scaled by diameter. This way, node sizes and edge widths reflect their actual values rather than being decided by the layout. I also include some annotations that show the sizes of the nodes and edges; shape=21 in geom_node_point() draws an empty circle. Good luck!
scale_factor <- 0.1
ggraph(tbl_graph) +
  geom_edge_link(aes(width = weight * scale_factor, label = weight),
                 label_dodge = unit(-4, 'mm'), angle_calc = 'along') +
  scale_edge_width(range = c(min(edges$weight), max(edges$weight)) * scale_factor) +
  geom_node_point(aes(size = size * scale_factor), shape = 21) +
  scale_size(range = c(min(nodes$size), max(nodes$size)) * scale_factor * 2) +
  theme_linedraw() +
  geom_node_text(aes(label = paste(name, ':', size)), nudge_x = -0.1)
resulting ggraph
I have the following dataframe:
df<-data.frame(consumed= c("level1_plt1", "level1_plt2", "level1_plt3", "level1_plt3","level1_plt2","level1_plt4","level1_plt5","level1_plt5","level1_plt6","level1_plt7","level1_plt8","level1_plt9","level1_plt10","level1_plt10","level1_plt1","level1_plt1","level1_plt6","level1_plt6","level1_plt9","level1_plt9","level1_plt11","level1_plt11","level1_plt11","level2_lep1","level2_lep4","level2_lep3"),consumer=c("level2_lep1","level2_lep2","level2_lep3","level2_lep2","level2_lep4", "level2_lep4","level2_lep5","level2_lep5","level2_lep6","level2_lep7","level2_lep8","level2_lep9","level2_lep10","level2_lep10","level2_lep8","level2_lep8","level2_lep1","level2_lep1","level2_lep3","level2_lep11","level2_lep12","level2_lep13","level2_lep13", "level3_pst1","level3_pst3","level3_pst4"))
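And have performed the following steps to get an igraph tripartite output:
library(dplyr)
library(igraph)
# count how often each consumed/consumer pair occurs
links <- df %>%
  group_by(consumed, consumer) %>%
  summarize(freq = n(), .groups = "drop")
g <- graph_from_data_frame(d = links, directed = FALSE)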
layer <- rep(2, length(V(g)$name))
layer[grepl("level1_",V(g)$name)]=1
layer[grepl("level3_",V(g)$name)]=3
names<- V(g)$name
names<-sub("level2_","", names)
names<-sub("level3_","", names)
names<-sub("level1_","", names)
V(g)$name = names
layout = layout_with_sugiyama(g, layers=layer)
E(g)$width <- E(g)$freq
V(g)$vertex_degree <- degree(g)*7
plot(g,
     layout=cbind(layer,layout$layout[,1]), edge.curved=0,
     vertex.shape=c("square","circle","square")[layer],
     vertex.frame.color=c("darkolivegreen","darkgoldenrod","orange3")[layer],
     vertex.color=c("olivedrab","goldenrod1","orange1")[layer],
     vertex.label.color="white",
     vertex.label.font=2,
     vertex.size=V(g)$vertex_degree,
     vertex.label.dist=c(0,0,0)[layer],
     vertex.label.degree=0, vertex.label.cex=0.5)
And I would like to do three things to adjust the picture, if possible:
Order the vertices within each layer from the largest shape (highest degree) to the smallest shape (lowest degree). For example, in the green layer the order could be as follows: plt9, plt3, plt2, plt11, plt6, plt1, plt7, plt5, plt4, plt10, plt8.
Create space between the shapes so that there is no overlap (e.g. lep3 and lep4). I like the current sizes/proportions, so I would rather not make the shapes smaller to create space between them.
Flip the graph and the vertex font 90 degrees counter-clockwise so that from bottom to top it would be in the order green layer --> yellow layer --> orange layer. (I guess it is always an option to rotate the vertex text and then rotate the image in Word or PowerPoint.)
I know this question is old, but I hope that the answer will help someone.
Rather than using layout_with_sugiyama, it may be easiest to do this with a custom layout. It is not very hard to do so. You already constructed the horizontal position with your layer variable. To get the vertical positions, we need to order the vertices by size (vertex_degree) and then leave space proportional to each vertex's size, so we set the height using cumsum on the sorted vertex_degree values within each layer. After I make the layout, the complex call to plot is the same as yours, except that I swap my custom layout for your call to sugiyama.
MyLO = matrix(0, nrow=vcount(g), ncol=2)
## Horizontal position is determined by layer
MyLO[,1] = layer
## Vertical position is determined by sum of sorted vertex_degree
for(i in 1:3) {
L = which(layer ==i)
OL = order(V(g)$vertex_degree[L], decreasing=TRUE)
MyLO[L[OL],2] = cumsum(V(g)$vertex_degree[L][OL])
}
plot(g,
layout=MyLO, edge.curved=0,
vertex.shape=c("square","circle","square")[layer],
vertex.frame.color = c("darkolivegreen","darkgoldenrod","orange3")[layer],
vertex.color=c("olivedrab","goldenrod1","orange1")[layer],
vertex.label.color="white",
vertex.label.font=2,
vertex.size=V(g)$vertex_degree,
vertex.label.dist=0,
vertex.label.degree=0, vertex.label.cex=0.5)
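The same layout logic can be wrapped in a small function that also covers the other two requests in a rough way: extra padding between shapes, and swapping the axes so the layers stack from bottom to top. This is a sketch only; stacked_layout, gap and flip are hypothetical names, and because plot() rescales layouts by default, the effect of gap is relative rather than absolute. It does not rotate the label text.

```r
# Hypothetical helper: layer on one axis, cumulative size on the other.
stacked_layout <- function(layer, size, gap = 0, flip = FALSE) {
  lo <- matrix(0, nrow = length(layer), ncol = 2)
  lo[, 1] <- layer
  for (l in unique(layer)) {
    idx <- which(layer == l)
    # vertices of this layer, largest first
    ord <- idx[order(size[idx], decreasing = TRUE)]
    # each vertex is pushed along by its own size plus optional padding
    lo[ord, 2] <- cumsum(size[ord] + gap)
  }
  # flip = TRUE swaps the axes, so layer 1 ends up at the bottom
  if (flip) lo <- lo[, 2:1, drop = FALSE]
  lo
}

# toy input: two vertices in layer 1, one in layer 2
demo <- stacked_layout(layer = c(1, 1, 2), size = c(7, 14, 21))
```

With the graph from the question, something like layout = stacked_layout(layer, V(g)$vertex_degree, gap = 5, flip = TRUE) could be passed to plot() in place of MyLO.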
I created an igraph with a community membership identified:
fc <- fastgreedy.community(graph)
colors <- rainbow(max(membership(fc)))
This provided me the clusters that each of the nodes belong to.
Now when I plot this:
plot(graph,vertex.color=colors[membership(fc)],
layout=layout.kamada.kawai)
it doesn't produce a layout that separates the groups of nodes by membership. All this does is take the Kamada-Kawai layout and color the nodes by membership, rather than restructuring the layout so that it is organized by membership. Does anyone know a different layout that can provide this?
Hope this question makes sense. Thanks!
You have to calculate the Kamada-Kawai layout with an artificial weight vector that distinguishes edges within clusters from edges that cross cluster boundaries. Note that layout_with_kk() treats weights as edge lengths (larger values give longer edges, the opposite of layout_with_fr()), so edges that cross cluster boundaries get the high weight and edges within clusters get the low one:
graph <- grg.game(100, 0.2) # example graph
cl <- fastgreedy.community(graph)
weights <- ifelse(crossing(cl, graph), 100, 1)
layout <- layout_with_kk(graph, weights=weights)
plot(graph, layout=layout)
The trick here is the ifelse(crossing(cl, graph), 100, 1) part: crossing(cl, graph) takes a clustering and the graph that the clustering belongs to, and returns a Boolean vector that defines for each edge whether the edge is crossing cluster boundaries or not. The ifelse() call then simply replaces TRUE (i.e. edge crossing boundaries) in this vector with 100 and FALSE (i.e. edge stays within the cluster) with 1.
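To see what crossing() returns, the same Boolean vector can be computed by hand from an edge list and a membership vector. This is just an illustrative base-R sketch; crossing_edges, el and membership are made-up names and not part of igraph.

```r
# el: two-column matrix of vertex indices, one row per edge
# membership: community id of each vertex
crossing_edges <- function(el, membership) {
  # an edge crosses a boundary when its endpoints are in different communities
  membership[el[, 1]] != membership[el[, 2]]
}

el <- rbind(c(1, 2), c(2, 3), c(3, 4))
membership <- c(1, 1, 2, 2)
is_cross <- crossing_edges(el, membership)
# is_cross is FALSE, TRUE, FALSE: only edge 2-3 joins two communities
```

The resulting vector has one entry per edge, in edge order, which is why it can be fed straight into ifelse() to build a weight vector.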
Using the R Kohonen package, I have obtained a "codes" plot which shows the codebook vectors.
I would like to ask: shouldn't the codebook vectors of neighbouring nodes be similar? Why are the top 2 nodes on the left so different?
Is there a way to arrange it in a meaningful way, as in the image below (source here), where the countries with high poverty are clustered at the bottom?
library("kohonen")
data("wines")
wines.sc <- scale(wines)
set.seed(7)
wine.som <- som(data = wines.sc, grid = somgrid(5, 4, "hexagonal"))
# types of plots
plot(wine.som, type="codes", main = "Wine data")
Map 1 is the average vector result for each node. The top 2 nodes that you highlighted are very similar.
Map 2 is a kind of similarity index between the nodes.
If you want to obtain that kind of map from the map 1 result, you may have to write your own plotting function along these lines:
Pick the most relevant nodes or the most different ones (manually or automatically). Then assign a color to each of these nodes.
Give a color to the neighbouring nodes using the average distance between the center of each node and the selected nodes. Shorter distance = close color, larger distance = fading color.
To sum up, that's a lot of work for nearly nothing. Map 1 is better and contains a lot of information. Map 2 is nice looking...
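For what it's worth, the coloring step described above could look roughly like this for a single selected node. It is only a sketch under assumptions: codes is taken to be a plain numeric matrix with one codebook vector per row, "distance" is taken to mean Euclidean distance between codebook vectors, and fade_colors is a hypothetical name, not a kohonen function.

```r
# Hypothetical helper: shade each node by its distance to a reference node.
fade_colors <- function(codes, ref) {
  # Euclidean distance from every codebook vector to the reference one
  d <- sqrt(rowSums(sweep(codes, 2, codes[ref, ])^2))
  if (max(d) == 0) return(rep(gray(0), nrow(codes)))
  # reference node is black, the farthest node fades to white
  gray(d / max(d))
}

# toy codebook with three nodes and two variables
codes <- rbind(c(0, 0), c(3, 4), c(6, 8))
cols <- fade_colors(codes, ref = 1)
```

A vector like cols could then be supplied as per-node background colors when drawing the map, with one call per selected node if several reference nodes are used.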
I would like to reproduce the kind of "community summary" graph like on page 6 of this paper:
http://arxiv.org/pdf/0803.0476v2.pdf
First a community algorithm is employed on the graph, e.g.:
wc <- walktrap.community(subgraph)
mc <- multilevel.community(subgraph)
Then the vertices are grouped according to community. The size of the community node is a function of the membership size and the edge width is a function of the total edges going from any member of community A to community B.
Please note I don't just want to encode community as color or convex hulls like this:
V(inSubGraph)$color <- commObj$membership+1
plot.igraph( inSubGraph, vertex.color = V(inSubGraph)$color)
or with the convex hulls:
plot(commObj, inSubGraph)
Use the contract.vertices function with the membership vector that the community detection method provides, followed by simplify. In particular:
Assign a numeric vertex attribute with a value of 1 to each vertex as follows: V(g)$size = 1
Assign a numeric edge attribute with a value of 1 to each edge as follows: E(g)$count = 1
Contract the communities into vertices as follows: comm.graph <- contract.vertices(g, wc$membership, vertex.attr.comb=list(size="sum", "ignore")); basically this specifies that the size attribute of the vertices being contracted should be summed and every other vertex attribute should be ignored. (See ?attribute.combination in R for more details). This call contracts the vertices but leaves the original edges so you now have as many edges between the vertices as there were in the original graph between the communities.
Collapse the multiple edges as follows: comm.graph <- simplify(comm.graph, remove.loops=FALSE, edge.attr.comb=list(count="sum", "ignore")).
You now have a graph named comm.graph where the vertices represent the communities of the original graph, the size vertex attribute corresponds to the number of vertices in each community in the original graph, and the count edge attribute corresponds to the number of edges between communities in the original graph.
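As a cross-check on what the size and count attributes should contain, the same numbers can be tallied directly from an edge list and a membership vector. This is a base-R sketch rather than igraph code; community_summary, el and membership are hypothetical names used for illustration.

```r
# el: two-column matrix of vertex indices, one row per edge
# membership: community id of each vertex
community_summary <- function(el, membership) {
  # number of vertices per community (the summed size attribute)
  sizes <- table(membership)
  a <- membership[el[, 1]]
  b <- membership[el[, 2]]
  # label each edge by its unordered pair of communities
  pair <- paste(pmin(a, b), pmax(a, b), sep = "-")
  # number of edges per community pair (the summed count attribute);
  # pairs like "1-1" are within-community edges, i.e. the kept loops
  counts <- table(pair)
  list(sizes = sizes, counts = counts)
}

el <- rbind(c(1, 2), c(2, 3), c(3, 4), c(4, 5), c(1, 3), c(3, 5))
membership <- c(1, 1, 1, 2, 2)
res <- community_summary(el, membership)
```

Here res$sizes gives 3 and 2 vertices for the two communities, and res$counts gives 3 edges inside community 1, 1 inside community 2, and 2 between them.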