R Indexing a matrix to use in plot coordinates - r

I'm trying to plot a temporal social network in R. My approach is to create a master graph and layout for all nodes. Then, I will subset the graph based on a series of vertex IDs. However, when I do this and lay out the graph, I get completely different node locations. I think I'm subsetting the layout matrix incorrectly. I can't locate where my issue is because I've done some smaller matrix subsets and everything seems to work fine.
I have some example code and an image of the issue in the network plots.
library(igraph)
# make graph
g <- barabasi.game(25)
# compute layout and set some aesthetics
set.seed(123)
l <- layout_nicely(g)
V(g)$size <- rescale(degree(g), c(5, 20))
V(g)$shape <- 'none'
V(g)$label.cex <- .75
V(g)$label.color <- 'black'
E(g)$arrow.size = .1
# plot graph
dev.off()
par(mfrow = c(1,2),
mar = c(1,1,5,1))
plot(g, layout = l,
main = 'Entire\ngraph')
# use index & induced subgraph
v_ids <- sample(1:25, 15, F)
sub_l <- l[v_ids, c(1,2)]
sub_g <- induced_subgraph(g, v_ids)
# plot second graph
plot(sub_g, layout = sub_l,
main = 'Sub\ngraph')
The vertices in the second plot should match layout of those in the first.

Unfortunately, you set the random seed after you generated the graph,
so we cannot exactly reproduce your result. I will use the same code but
with set.seed before the graph generation. This makes the result look
different than yours, but will be reproducible.
When I run your code, I do not see exactly the same problem as you are
showing.
Your code (with set.seed moved and scales added)
library(igraph)
library(scales) # for rescale function
# make graph
set.seed(123)
g <- barabasi.game(25)
# compute layout and set some aesthetics
l <- layout_nicely(g)
V(g)$size <- rescale(degree(g), c(5, 20))
V(g)$shape <- 'none'
V(g)$label.cex <- .75
V(g)$label.color <- 'black'
E(g)$arrow.size = .1
## V(g)$names = 1:25
# plot graph
if (dev.cur() > 1) dev.off() # close an open device only if there is one
par(mfrow = c(1,2),
mar = c(1,1,5,1))
plot(g, layout = l,
main = 'Entire\ngraph')
# use index & induced subgraph
v_ids <- sort(sample(1:25, 15, F))
sub_l <- l[v_ids, c(1,2)]
sub_g <- induced_subgraph(g, v_ids)
# plot second graph
plot(sub_g, layout = sub_l,
main = 'Sub\ngraph', vertex.label=V(sub_g)$names)
When I run your code, both graphs have nodes in the same
positions. That is not what I see in the graph in your question.
I suggest that you run just this code and see if you don't get
the same result (nodes in the same positions in both graphs).
The only difference between the two graphs in my version is the
node labels. When you take the subgraph, it renumbers the nodes
from 1 to 15 so the labels on the nodes disagree. You can fix
this by storing the node labels in the graph before taking the
subgraph. Specifically, add V(g)$names = 1:25 immediately after
your statement E(g)$arrow.size = .1. Then run the whole thing
again, starting at set.seed(123). This will preserve the
original numbering as the node labels.
The graph looks slightly different because the new, sub-graph
does not take up all of the space and so is stretched to use
up the empty space.
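One detail worth checking in the code above is row order: l[v_ids, ] returns rows in the order of v_ids, while the subgraph has its own vertex order. Here is a minimal sketch, assuming the original ids are stored as vertex names before subsetting (the small ring graph and the variable keep are illustrative, not from the question). It reads the order back from the subgraph instead of assuming it, and turns off rescaling so the sub-graph is not stretched:

```r
library(igraph)

g <- make_ring(10)
V(g)$name <- as.character(1:10)      # remember the original ids
l <- layout_in_circle(g)             # coordinates already lie in [-1, 1]

v_ids <- c(7, 2, 9, 4)               # deliberately unsorted
sub_g <- induced_subgraph(g, v_ids)

# Take layout rows in the vertex order of sub_g, read from the stored names
keep <- as.integer(V(sub_g)$name)
sub_l <- l[keep, , drop = FALSE]

par(mfrow = c(1, 2))
plot(g, layout = l, rescale = FALSE, xlim = c(-1, 1), ylim = c(-1, 1))
plot(sub_g, layout = sub_l, rescale = FALSE, xlim = c(-1, 1), ylim = c(-1, 1))
```

With rescale = FALSE and the same xlim and ylim in both calls, each remaining vertex should be drawn at the coordinates it had in the full graph.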

A possible quick workaround: draw the same graph, but color the nodes and edges that you don't need in your background color. Depending on your purpose, that may be enough.

Related

Hide vertices from plot.igraph conditional on vertex attribute without deleting them

I have an igraph plot that is geographically laid out based on its latitude and longitude coordinates. I now want to hide certain points from one time period, while preserving the layout of the graph. I would therefore not like to delete the vertices from the network, but merely make them invisible in this particular plot rendering, conditional on a vertex attribute. Furthermore, the color attribute is already set to capture another variable, so I cannot use that to hide the points.
My plot is generated according to the following code:
lo <- layout.norm(as.matrix(g[, c("longitude","latitude")]))
plot.igraph(g, layout=lo, vertex.label=NA,rescale=T, vertex.size = 4)
The time attribute is a numerical variable stored in V(g)$period
Is there code I can put within the plot.igraph function to hide vertices for which V(g)$period == 1?
Update.
Building upon Szabolcs's answer.
library(igraph)
## reproducible example
g <- make_graph("Zachary")
V(g)$name <- V(g)
set.seed(10)
lyt <- layout_with_drl(g)
V(g)$x <- lyt[,1]
V(g)$y <- lyt[,2]
plot(g)
del_vs <- c(4, 8, 9, 19, 24, 33)
dev.new(); plot(g - del_vs, main = paste("Zachary minus", toString(del_vs)))
Try invisible ink, i.e. draw the hidden objects in the background color.
Or try this.
library(igraph)
## reproducible example.
g <- make_graph("Zachary")
V(g)$name <- V(g)
set.seed(10)
lyt <- layout_with_drl(g)
plot(g, layout=lyt)
## delete vertices and preserve layout.
del_vs <- c(9, 19, 24, 33)
g2 <- g - del_vs
g2$main <- paste("Zachary minus", toString(del_vs))
g2$layout <- matrix(lyt[-del_vs,], ncol=2)
dev.new(); plot(g2)
See also: Looking to save coordinates/layout to make temporal networks in Igraph with DRL
You can store the coordinates in the x and y vertex attributes. Then they will be used by plot automatically, and they will be preserved when you delete vertices.
For example:
g<-make_ring(4)
V(g)$x <- c(0,0,1,1)
V(g)$y <- c(0,1,0,1)
plot(g)
plot(delete_vertices(g,1))
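Applied to the question, the same idea could look like the sketch below. The coordinates and the period values are made up for illustration; only the attribute name period comes from the question. The vertices are dropped from a copy, so the original graph keeps all of its vertices:

```r
library(igraph)

g <- make_ring(6)
# Fixed coordinates stored on the vertices (made-up values)
V(g)$x <- c(0, 1, 2, 2, 1, 0)
V(g)$y <- c(0, 0, 0, 1, 1, 1)
V(g)$period <- c(1, 2, 1, 2, 2, 1)   # hypothetical time attribute

# Remove period-1 vertices from a copy; g itself is left untouched
g_visible <- delete_vertices(g, which(V(g)$period == 1))

par(mfrow = c(1, 2))
plot(g, main = "All periods",
     rescale = FALSE, xlim = c(0, 2), ylim = c(0, 1))
plot(g_visible, main = "Without period 1",
     rescale = FALSE, xlim = c(0, 2), ylim = c(0, 1))
```

Passing rescale = FALSE with fixed axis limits keeps the remaining vertices where they were; with the default rescaling the reduced graph would be stretched to fill the plot.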

How to rescale the plot to push the clusters (nodes) a bit further apart and name the clusters in igraph?

I have node and edge information and am trying to make a network plot from it. The node table has 1552 rows.
The edge table has four columns and 1203576 entries.
Using the node and edge data, I used the code below to make a network plot.
library(igraph)
net <- graph_from_data_frame(d=edges, vertices=nodes, directed=F)
plot(net, edge.arrow.size=.4,vertex.label=NA,
vertex.color=as.numeric(factor(nodes$type)))
Grouped.net = net
E(Grouped.net)$weight = 1
colnames(nodes)[4] <- "Clusters"
## Add edges with high weight between all nodes in the same group
for(Clus in unique(nodes$Clusters)) {
GroupV = which(nodes$Clusters == Clus)
Grouped.net = add_edges(Grouped.net, combn(GroupV, 2), attr=list(weight=500))
}
## Now create a layout based on G_Grouped
set.seed(567)
LO = layout_with_fr(Grouped.net)
# Generate colors based on media type:
colrs <- c("gray50", "yellow", "tomato")
V(net)$color <- colrs[V(net)$type_num]
plot(net, layout=LO, edge.arrow.size=0,vertex.label=NA, asp=0, vertex.size=4)
legend(x=-1.5, y=-1.1, c("typeA","typeB", "typeC"), pch=21,
col="#777777", pt.bg=colrs, pt.cex=2, cex=.8, bty="n", ncol=1)
The plot I got looks like the one below:
In the above figure there are 5 clusters.
How do I increase the space between the clusters, so that they are farther apart? And how do I adjust the edges? They look weird.
How to name the clusters in the Figure?
How to bring the nodes typeC to the top? There are very few of them, and because the typeA nodes are so numerous, the typeC nodes end up drawn underneath.
You have several questions. I will try to answer them all, but in a different order.
Setup
library(igraph)
edges = read.csv("temp/edges_info_5Clusters.csv", stringsAsFactors=T)
nodes = read.csv("temp/nodes_info_5Clusters.csv", stringsAsFactors=T)
Question 3. How to bring the nodes typeC to the top?
The nodes are plotted in order of node number. In order to get the
infrequent types to be shown, we need those nodes to get the highest
node numbers. So just sort on the types to force the nodes to be in
the order TypeA, TypeB, TypeC.
nodes = nodes[order(nodes$type),]
net <- graph_from_data_frame(d=edges, vertices=nodes, directed=F)
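To see what the sort does, here is a tiny sketch on a made-up node table (nodes_demo and its values are hypothetical; only the column name type follows the answer). Rows are reordered so the rarest type comes last, which under the plotting order described above means it is drawn last:

```r
# Hypothetical node table with an explicit factor level order
nodes_demo <- data.frame(
  id   = c("n1", "n2", "n3", "n4", "n5"),
  type = factor(c("typeC", "typeA", "typeB", "typeA", "typeC"),
                levels = c("typeA", "typeB", "typeC"))
)

# order() on a factor sorts by level, so typeA rows come first, typeC last
nodes_sorted <- nodes_demo[order(nodes_demo$type), ]
print(nodes_sorted)
```

If the type column is a plain character vector rather than a factor, order() sorts alphabetically, which happens to give the same result for these three labels.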
I will just go directly to the grouped plotting that you had in
your code to show the result.
Grouped.net = net
E(Grouped.net)$weight = 1
colnames(nodes)[4] <- "Clusters"
## Add edges with high weight between all nodes in the same group
for(Clus in unique(nodes$Clusters)) {
GroupV = which(nodes$Clusters == Clus)
Grouped.net = add_edges(Grouped.net, combn(GroupV, 2), attr=list(weight=500))
}
## Now create a layout based on G_Grouped
set.seed(567)
LO = layout_with_fr(Grouped.net)
colrs <- c("gray50", "yellow", "tomato")
V(net)$color <- colrs[V(net)$type_num]
plot(net, layout=LO, edge.arrow.size=0,vertex.label=NA, vertex.size=4,
edge.color="lightgray")
legend(x=-1.5, y=-1.1, c("typeA","typeB", "typeC"), pch=21,
col="#777777", pt.bg=colrs, pt.cex=2, cex=.8, bty="n", ncol=1)
OK, now the TypeC and TypeB are much more visible, but the five clusters are laid out poorly. To get something more like your second (example) graph, we need to construct the layout hierarchically: lay out the clusters first and separately lay out the points within the clusters. The layout for the five clusters is simple.
F5 = make_full_graph(5)
Stretch = 6
LO_F5 = Stretch*layout.circle(F5)
plot(F5, layout=LO_F5)
Now we need to lay out the points in each cluster, and space them out
using the cluster layout just created. But there is a tradeoff here.
If you make the clusters far apart, all of the nodes will be small
and hard to see. If you want the nodes bigger, you need to make the
clusters closer together (so that they all fit on the plot). You have
so many links that no matter what you do, the links will all blur together
as just a gray background. I picked a middle ground that appealed to me,
but I invite you to explore different values of the factor Stretch.
Bigger values of Stretch will make the clusters farther apart with
smaller nodes. Smaller values will make the clusters closer together
with larger nodes. Pick something that works for you.
set.seed(1234)
HierLO = matrix(0, ncol=2, nrow=vcount(net))
for(i in 1:length(levels(nodes$Clusters))) {
CLUST = which(nodes$Clusters == levels(nodes$Clusters)[i])
SubNet = induced_subgraph(net, V(net)[CLUST])
LO_SN = scale(layout_nicely(SubNet))
HierLO[CLUST, ] = LO_SN +
matrix(LO_F5[i,], nrow=vcount(SubNet), ncol=2,byrow=TRUE)
}
plot(net, layout=HierLO, edge.arrow.size=0,vertex.label=NA, vertex.size=4,
edge.color="lightgray")
You can now see all of the TypeC nodes and most of the TypeB (except in cluster 1 where there are a lot of TypeB).
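Since the CSV files are not available here, the same hierarchical idea can be tried on a small synthetic graph. This is only a sketch under assumptions: the random graph, the three equally sized clusters, and the names cluster_of, centers, and stretch are all made up for illustration.

```r
library(igraph)

set.seed(1)
cluster_of <- rep(1:3, each = 5)          # hypothetical cluster assignment
g <- sample_gnp(15, 0.3)                  # stand-in for the real network

# Step 1: place the cluster centers on a circle, pushed apart by `stretch`
stretch <- 4
centers <- stretch * layout_in_circle(make_full_graph(3))

# Step 2: lay out each cluster on its own, standardise it, shift it to its center
lay <- matrix(0, nrow = vcount(g), ncol = 2)
for (k in 1:3) {
  idx   <- which(cluster_of == k)         # ascending ids, matching subgraph order
  sub   <- induced_subgraph(g, idx)
  local <- scale(layout_in_circle(sub))   # zero mean, unit sd per column
  lay[idx, ] <- sweep(local, 2, centers[k, ], "+")
}

plot(g, layout = lay, vertex.label = NA, vertex.size = 6,
     vertex.color = cluster_of, edge.color = "lightgray")
```

Because scale() centers each cluster at zero before the shift, every cluster's centroid ends up at its row of centers; changing stretch trades cluster separation against node size, as described above.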
Finally, let's add cluster labels. These just need to be placed relative to the cluster centers. Those centers are sort of given by the layout LO_F5, but igraph plotting rescales the layout so that the plot actually has the range (-1,1).
We can rescale LO_F5 ourselves and then stretch the positions a little so that the labels will be just outside the circle.
LO_Text = LO_F5
LO_Text[,1] = 2*(LO_F5[,1] - min(LO_F5[,1]))/(max(LO_F5[,1]) - min(LO_F5[,1])) -1
LO_Text[,2] = 2*(LO_F5[,2] - min(LO_F5[,2]))/(max(LO_F5[,2]) - min(LO_F5[,2])) -1
text(1.2*LO_Text, labels=levels(nodes$Clusters))
legend(x=-1.5, y=-1.1, c("typeA","typeB", "typeC"), pch=21,
col="#777777", pt.bg=colrs, pt.cex=2, cex=.8, bty="n", ncol=1)
The links are still a problem, but I think this addresses your other questions.
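The rescaling used for the label positions is a plain linear map of each coordinate column onto [-1, 1]. A small base R sketch, with made-up center coordinates standing in for LO_F5 and a hypothetical helper name to_unit_range, shows the arithmetic:

```r
# Linear map of a numeric vector onto [-1, 1]
to_unit_range <- function(v) 2 * (v - min(v)) / (max(v) - min(v)) - 1

# Made-up cluster centers (x in column 1, y in column 2)
centers <- cbind(c(6, 2, -5, -5, 2), c(0, 6, 3, -3, -6))

centers_unit <- apply(centers, 2, to_unit_range)  # each column now spans [-1, 1]
label_pos    <- 1.2 * centers_unit                # push labels 20% further out
print(round(label_pos, 2))
```

The factor 1.2 is the same judgment call as in the answer: large enough to clear the clusters, small enough to stay inside the plot margins.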

Phylogenetic tree ape too small

I am building a phylogenetic tree using data from NCBI taxonomy. The tree is quite simple; its aim is to show the relationships among a few Arthropods.
The problem is the tree looks very small and I can't seem to make its branches longer. I would also like to color some nodes (Ex: Pancrustaceans) but I don't know how to do this using ape.
Thanks for any help!
library(treeio)
library(ape)
treeText <- readLines('phyliptree.phy')
treeText <- paste0(treeText, collapse="")
tree <- read.tree(text = treeText) ## load tree
distMat <- cophenetic(tree) ## generate dist matrix
plot(tree, use.edge.length = TRUE,show.node.label = T, edge.width = 2, label.offset = 0.75, type = "cladogram", cex = 1, lwd=2)
Here are some pointers using the ape package. I am using a random tree as we don't have access to yours, but these examples should be easily adaptable to your problem. If you provide a reproducible example of a specific question, I could take another look.
First we make a random tree, add some species names, and plot it to show the numbers of nodes (both terminal and internal)
library(ape)
set.seed(123)
Tree <- rtree(10)
Tree$tip.label <- paste("Species", 1:10, sep="_")
plot.phylo(Tree)
nodelabels() # blue
tiplabels() # yellow
edgelabels() # green
Then, to color any node or edge of the tree, we can create a vector of colors and provide it to the appropriate *labels() function.
# using numbers for colors
node_colors <- rep(5:6, each=5)[1:9] # 9 internal nodes
edge_colors <- rep(3:4, each=9) # 18 branches
tip_colors <- rep(c(11,12,13), 4)
# plot:
plot.phylo(Tree, edge.color = edge_colors, tip.color = tip_colors)
nodelabels(pch = 21, bg = node_colors, cex=2)
To label just one node and the clade descending from it, we could do:
Nnode(Tree)
node_colors <- rep(NA, 9)
node_colors[7] <- "green"
node_shape <- ifelse(is.na(node_colors), NA, 21)
edge_colors <- rep("black", 18)
edge_colors[13:18] <- "green"
plot(Tree, edge.color = edge_colors, edge.width = 2, label.offset = .1)
nodelabels(pch=node_shape, bg=node_colors, cex=2)
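Hard-coding edge numbers such as 13:18 only works for this particular random tree. As a hedged alternative, ape's which.edge() can look up the edges that connect a set of tips; the choice of three tips below is arbitrary and only for illustration:

```r
library(ape)

set.seed(123)
tree <- rtree(10)
tree$tip.label <- paste("Species", 1:10, sep = "_")

# Hypothetical group of tips to highlight
group <- c("Species_1", "Species_2", "Species_3")

# Indices of the edges linking the tips in `group`
in_group <- which.edge(tree, group)

edge_colors <- rep("black", Nedge(tree))
edge_colors[in_group] <- "green"

plot(tree, edge.color = edge_colors, edge.width = 2, label.offset = 0.1)
```

If the chosen tips do not form a clade, the highlighted edges will also include the path joining them, so it is worth checking the plot against the intended group.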
Without your tree, it is harder to tell how to adjust the branches. One way is to reduce the size of the tip labels, so they take up less space. Another way might be to play around when saving the png or pdf.
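As a sketch of the second suggestion, the tree can be written to a file with an explicitly large page, which often makes a cramped tree readable. The page size, the cex value, and the output path are arbitrary choices for illustration:

```r
library(ape)

set.seed(123)
tree <- rtree(10)

# Write to a temporary PDF with a generous page size (in inches)
out_file <- file.path(tempdir(), "tree.pdf")
pdf(out_file, width = 10, height = 8)
plot(tree, cex = 0.8, no.margin = TRUE)   # smaller labels, no outer margins
dev.off()
```

For a tree with many tips, increasing height roughly in proportion to the number of tips is a reasonable starting point.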
There are other ways of doing these embellishments of trees, including the ggtree package.

How to plot overlap clustering with a list of vertices of each group and the edge list by R?

I have a CSV file containing the edge list of a graph. After running CONGA
(Cluster-Overlap Newman Girvan Algorithm), the result is a list of the vertices in each group.
I don't know how to plot it in R so that each group has a different color in the graph.
I can plot the graph from the edge list in R, but I don't know how to mark the vertices in each group.
Input: edge list file and list of vertices in each group.
Output: graph with different color for each group.
output nearly like this
My English isn't good. Thanks for your support.
Vertex colors are set through the color attribute of the vertices. Try assigning a color like V(g)$color <- 'green'.
It would be better if you gave us some code.
You say you get a list of your group members. Convert the list to a membership vector, and assign a color to each unique group value. I wrote this example code; I think it shows what you're after.
library(igraph)
get_a_random_network <- function() {
# EN: Function to get some random data to use as an example
g <- erdos.renyi.game(100, 60, type="gnm", directed=F, loops=FALSE)
g <- g %>% delete_vertices( V(g)[degree(g)==0] )
(g)
}
# Get sample data
g <- get_a_random_network()
# Use a clustering algorithm to determine groups. You said you had a list. I use this to generate example data.
groups <- cluster_fast_greedy(g)
# Look at the vertices
(V(g))
# Look at what groups they belong to:
(groups$membership)
# Here you write that you have "list of vertices of each group". You don't
# give us code, but I assume that you have data that looks like this:
CONGA_list <- lapply(1:max(groups$membership),function(x) V(g)[groups$membership ==x])
(CONGA_list)
# This is where you should really have provided a code example.
# You could convert a list like CONGA_list to a vector like this:
membership_groups <- rep(0, length(V(g)))
for(x in 1:length(CONGA_list)){
membership_groups[as.vector(CONGA_list[[x]])] <- x
}
(membership_groups == groups$membership)
# You give color to your network by first telling each vertex which group it belongs to
V(g)$membership <- groups$membership
# Then we assign a color. I use a vector of R colors which I get like this...
colors = grDevices::colors()[grep('gr(a|e)y', grDevices::colors(), invert = T)]
# ... and then I sample from them to give each vertex a color.
colors <- sample(colors, max(V(g)$membership))
V(g)$color <- colors[V(g)$membership]
(V(g)$color)
# The plot will work with the colors in V(g)$color
plot(g, vertex.size=7, vertex.label=NA)
Good luck
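A membership vector can hold only one group per vertex, so it cannot show overlap. As a hedged alternative, plot.igraph accepts a list of vertex id vectors through its mark.groups argument and shades each group, including shared vertices. The ring graph, the three groups, and the colors below are made up for illustration:

```r
library(igraph)

g <- make_ring(8)

# Hypothetical overlapping groups, e.g. as returned by an overlap algorithm
groups <- list(c(1, 2, 3, 4), c(4, 5, 6), c(6, 7, 8, 1))
group_cols <- c("tomato", "skyblue", "gold")

# For every vertex, find the groups that contain it
membership_list <- lapply(seq_len(vcount(g)), function(v) {
  which(vapply(groups, function(m) v %in% m, logical(1)))
})
n_memberships <- lengths(membership_list)
first_group   <- vapply(membership_list, function(x) x[1], integer(1))

# Vertices in several groups are drawn white, the rest in their group color
V(g)$color <- ifelse(n_memberships > 1, "white", group_cols[first_group])

plot(g, mark.groups = groups,
     mark.col = adjustcolor(group_cols, alpha.f = 0.3),
     mark.border = group_cols)
```

Drawing shared vertices in white is just one convention; the shaded hulls already show which groups each vertex belongs to.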

How to cut a dendrogram in r

Okay so I'm sure this has been asked before but I can't find a nice answer anywhere after many hours of searching.
I have some data, I run a classification then I make a dendrogram.
The problem has to do with aesthetics, specifically: (1) how to cut according to the number of groups (in this example I want 3), (2) make the group labels aligned with the branches of the trees, (3) re-scale so that there aren't any huge gaps between the groups
More on (3). I have a dataset which is very species-rich and there would be ~1000 groups without cutting. If I cut at, say, 3, the tree has some branches on the right and one 'miles' off to the right, which I would want to re-scale so that it's closer. All of this is possible via external programs but I want to do it all in R!
Bonus points if you can put an average silhouette width plot nested into the top right of this plot
Here is an example using the iris data
library(ggplot2)
library(vegan) # provides vegdist()
data(iris)
df = data.frame(iris)
df$Species = NULL
ED10 = vegdist(df,method="euclidean")
EucWard_10 = hclust(ED10,method="ward.D2")
hcd_ward10 = as.dendrogram(EucWard_10)
plot(hcd_ward10)
plot(cut(hcd_ward10, h = 10)$upper, main = "Upper tree of cut at h=10")
I suspect what you would want to look at is the dendextend R package (it also has a paper in Bioinformatics).
I am not fully sure about your question on (3), since I am not sure I understand what rescaling means. What I can tell you is that you can do quite a lot with dendextend. Here is a quick example for coloring the branches and labels for 3 groups.
library(ggplot2)
library(vegan)
data(iris)
df = data.frame(iris)
df$Species = NULL
ED10 = vegdist(df,method="euclidean")
EucWard_10 = hclust(ED10,method="ward.D2")
hcd_ward10 = as.dendrogram(EucWard_10)
plot(hcd_ward10)
# install.packages("dendextend") # run once if the package is not installed
library(dendextend)
dend <- hcd_ward10
dend <- color_branches(dend, k = 3)
dend <- color_labels(dend, k = 3)
plot(dend)
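For point (1), cutting by number of groups, base R alone may be enough. The sketch below uses stats::dist instead of vegdist (an assumption made to keep it dependency-free; both compute Euclidean distances here), then cutree() for the group labels and rect.hclust() to outline the groups on the plot:

```r
# Hierarchical clustering of the four numeric iris columns
hc <- hclust(dist(iris[, 1:4], method = "euclidean"), method = "ward.D2")

# Cut into exactly 3 groups; the result is one group label per observation
groups <- cutree(hc, k = 3)
print(table(groups))

# Draw the tree and outline the 3 groups
plot(hc, labels = FALSE, hang = -1, main = "Ward.D2, cut at k = 3")
rect.hclust(hc, k = 3, border = 2:4)
```

The vector groups can then be used for any further summaries, independently of how the dendrogram is drawn.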
You can also get an interactive dendrogram by using plotly (ggplot method is available through dendextend):
library(plotly)
library(ggplot2)
p <- ggplot(dend)
ggplotly(p)
