Plotting pedigree / family tree using igraph in R - r

I have some very simple pedigree data that I would like to visualise graphically. Example data here
I have tried with kinship2, but had no success - see here for previous issues with kinship2
I have also been trying with igraph but have not been able to get the graph quite right. With the code below I have managed to get a good representation of the female lineages.
library(igraph)
library(dplyr)
GGM_igraph <- read.csv("example_data.csv")
mothers=GGM_igraph[,c('Ring','Mother','famid')]
fathers=GGM_igraph[,c('Ring','Father','famid')]
links<-left_join(mothers, fathers)
g=graph.data.frame(links)
This script is originally from this question:
G_Grouped = g
E(G_Grouped)$weight = 1
## Add edges with high weight between all nodes in the same group
for(i in unique(V(g)$famid)) {
GroupV = which(V(g)$famid == i)
G_Grouped = add_edges(G_Grouped, combn(GroupV, 2), attr=list(weight=10))
}
## Now create a layout based on G_Grouped
set.seed(567)
LO = layout_with_fr(G_Grouped)
## Use the layout to plot the original graph
par(mar=c(0,0,0,0))
# then plot the graph
plot(g, vertex.color=links$famid, layout=LO,
vertex.size = 8,
vertex.label.cex=.7,
vertex.label.color = "black",
edge.arrow.size = 0.25,
edge.arrow.mode = 1)
What I would like to do is:
1) include the males in the same graph as a number of them had offspring with more than one female over their life time
2) manually assign colours to each family. At the moment the colouring is automatic, so the mothers (most of whom don't have a family, as they are founders) get a random colour, and some colours are reused because there are not enough in the default palette.
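A minimal sketch of one way to approach both points, assuming the columns are named Ring, Mother, Father and famid as in the code above. The toy data frame, the colour choices and the helper names (ped, fam_cols, fam_of) are hypothetical and only stand in for the real example_data.csv.

```r
library(igraph)

# Hypothetical toy pedigree; column names follow the question
ped <- data.frame(
  Ring   = c("C1", "C2", "C3", "C4"),
  Mother = c("F1", "F1", "F2", "F2"),
  Father = c("M1", "M1", "M1", "M2"),
  famid  = c(1, 1, 2, 2),
  stringsAsFactors = FALSE
)

# One edge per parent-offspring pair, mothers and fathers stacked together
edges <- rbind(
  data.frame(from = ped$Mother, to = ped$Ring),
  data.frame(from = ped$Father, to = ped$Ring)
)
edges <- edges[!is.na(edges$from), ]   # drop unknown parents

g <- graph_from_data_frame(edges, directed = TRUE)

# Manual palette keyed by family id; individuals without a family
# (founders here) fall back to a neutral colour
fam_cols <- c("1" = "tomato", "2" = "steelblue")
fam_of   <- setNames(as.character(ped$famid), ped$Ring)
vcol     <- fam_cols[fam_of[V(g)$name]]
vcol[is.na(vcol)] <- "grey80"
V(g)$color <- unname(vcol)

plot(g, vertex.size = 8, vertex.label.cex = 0.7, edge.arrow.size = 0.25)
```

Because the colour is stored as a vertex attribute, it stays aligned with the vertices regardless of the row order of the input data. A male with offspring by several females simply appears as one vertex with edges into more than one family.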

Related

How to rescale the plot to push the clusters (nodes) a bit further apart and name the clusters in igraph?

I have nodes and edges information, and trying to make a network plot with that. The nodes information has 1552 rows with information:
The edges information has four columns and 1203576 entries.
Using the nodes and edges data, I used the code below to make a network plot.
library(igraph)
net <- graph_from_data_frame(d=edges, vertices=nodes, directed=F)
plot(net, edge.arrow.size=.4,vertex.label=NA,
vertex.color=as.numeric(factor(nodes$type)))
Grouped.net = net
E(Grouped.net)$weight = 1
colnames(nodes)[4] <- "Clusters"
## Add edges with high weight between all nodes in the same group
for(Clus in unique(nodes$Clusters)) {
GroupV = which(nodes$Clusters == Clus)
Grouped.net = add_edges(Grouped.net, combn(GroupV, 2), attr=list(weight=500))
}
## Now create a layout based on Grouped.net
set.seed(567)
LO = layout_with_fr(Grouped.net)
# Generate colors based on media type:
colrs <- c("gray50", "yellow", "tomato")
V(net)$color <- colrs[V(net)$type_num]
plot(net, layout=LO, edge.arrow.size=0,vertex.label=NA, asp=0, vertex.size=4)
legend(x=-1.5, y=-1.1, c("typeA","typeB", "typeC"), pch=21,
col="#777777", pt.bg=colrs, pt.cex=2, cex=.8, bty="n", ncol=1)
The plot I got looks like below:
In the above figure there are 5 clusters.
How do I increase the space between the clusters and move them further apart? And how can I adjust the edges? They look weird.
How can I name the clusters in the figure?
How can I bring the typeC nodes to the top? There are very few of them, and because typeA nodes are so numerous, the typeC nodes are hidden underneath.
You have several questions. I will try to answer them all, but in a different order.
Setup
library(igraph)
edges = read.csv("temp/edges_info_5Clusters.csv", stringsAsFactors=T)
nodes = read.csv("temp/nodes_info_5Clusters.csv", stringsAsFactors=T)
Question 3. How to bring the nodes typeC to the top?
The nodes are plotted in order of node number. In order to get the
infrequent types to be shown, we need those nodes to get the highest
node numbers. So just sort on the types to force the nodes to be in
the order TypeA, TypeB, TypeC.
nodes = nodes[order(nodes$type),]
net <- graph_from_data_frame(d=edges, vertices=nodes, directed=F)
I will just go directly to the grouped plotting that you had in
your code to show the result.
Grouped.net = net
E(Grouped.net)$weight = 1
colnames(nodes)[4] <- "Clusters"
## Add edges with high weight between all nodes in the same group
for(Clus in unique(nodes$Clusters)) {
GroupV = which(nodes$Clusters == Clus)
Grouped.net = add_edges(Grouped.net, combn(GroupV, 2), attr=list(weight=500))
}
## Now create a layout based on Grouped.net
set.seed(567)
LO = layout_with_fr(Grouped.net)
colrs <- c("gray50", "yellow", "tomato")
V(net)$color <- colrs[V(net)$type_num]
plot(net, layout=LO, edge.arrow.size=0,vertex.label=NA, vertex.size=4,
edge.color="lightgray")
legend(x=-1.5, y=-1.1, c("typeA","typeB", "typeC"), pch=21,
col="#777777", pt.bg=colrs, pt.cex=2, cex=.8, bty="n", ncol=1)
OK, now the TypeC and TypeB are much more visible, but the five clusters are laid out poorly. To get something more like your second (example) graph, we need to construct the layout hierarchically: lay out the clusters first and separately lay out the points within the clusters. The layout for the five clusters is simple.
F5 = make_full_graph(5)
Stretch = 6
LO_F5 = Stretch*layout_in_circle(F5)
plot(F5, layout=LO_F5)
Now we need to lay out the points in each cluster, and space them out
using the cluster layout just created. But there is a tradeoff here.
If you make the clusters far apart, all of the nodes will be small
and hard to see. If you want the nodes bigger, you need to make the
clusters closer together (so that they all fit on the plot). You have
so many links that no matter what you do, the links will all blur together
as just a gray background. I picked a middle ground that appealed to me,
but I invite you to explore different values of the factor Stretch.
Bigger values of Stretch will make the clusters farther apart with
smaller nodes. Smaller values will make the clusters closer together
with larger nodes. Pick something that works for you.
set.seed(1234)
HierLO = matrix(0, ncol=2, nrow=vcount(net))
for(i in 1:length(levels(nodes$Clusters))) {
CLUST = which(nodes$Clusters == levels(nodes$Clusters)[i])
SubNet = induced_subgraph(net, V(net)[CLUST])
LO_SN = scale(layout_nicely(SubNet))
HierLO[CLUST, ] = LO_SN +
matrix(LO_F5[i,], nrow=vcount(SubNet), ncol=2,byrow=TRUE)
}
plot(net, layout=HierLO, edge.arrow.size=0,vertex.label=NA, vertex.size=4,
edge.color="lightgray")
You can now see all of the TypeC nodes and most of the TypeB (except in cluster 1 where there are a lot of TypeB).
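Since the CSV files are not available here, the following is a self-contained toy version of the same hierarchical idea. The random graph, the three-cluster assignment and the variable names (clusters, centres, lo) are assumptions made for illustration, not the data from the question.

```r
library(igraph)
set.seed(1)

# Hypothetical toy data: 3 clusters of 10 nodes each on a random graph
n_clust  <- 3
clusters <- factor(rep(paste0("Cluster", 1:n_clust), each = 10))
g <- sample_gnp(length(clusters), p = 0.2)

# Step 1: place the cluster centres on a stretched circle
stretch <- 6
centres <- stretch * layout_in_circle(make_full_graph(n_clust))

# Step 2: lay out each cluster on its own, standardise it,
# then shift it to its cluster centre
lo <- matrix(0, nrow = vcount(g), ncol = 2)
for (i in seq_along(levels(clusters))) {
  idx <- which(clusters == levels(clusters)[i])
  sub <- induced_subgraph(g, idx)
  lo[idx, ] <- sweep(scale(layout_nicely(sub)), 2, centres[i, ], "+")
}

plot(g, layout = lo, vertex.label = NA, vertex.size = 4,
     edge.color = "lightgray")
```

Because scale() centres each sub-layout at zero, every cluster ends up centred exactly on its circle position, and changing stretch only changes the distance between clusters, not their internal shape.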
Finally, let's add cluster labels. These just need to be placed relative to the cluster centers. Those centers are sort of given by the layout LO_F5, but igraph plotting rescales the layout so that the plot actually has the range (-1,1).
We can rescale LO_F5 ourselves and then stretch the positions a little so that the labels will be just outside the circle.
LO_Text = LO_F5
LO_Text[,1] = 2*(LO_F5[,1] - min(LO_F5[,1]))/(max(LO_F5[,1]) - min(LO_F5[,1])) -1
LO_Text[,2] = 2*(LO_F5[,2] - min(LO_F5[,2]))/(max(LO_F5[,2]) - min(LO_F5[,2])) -1
text(1.2*LO_Text, labels=levels(nodes$Clusters))
legend(x=-1.5, y=-1.1, c("typeA","typeB", "typeC"), pch=21,
col="#777777", pt.bg=colrs, pt.cex=2, cex=.8, bty="n", ncol=1)
The links are still a problem, but I think this addresses your other questions.
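The min-max rescaling used for the label positions can also be written as a small helper. The sketch below compares it with igraph's norm_coords(), which should give the same result for a two-column layout; the helper name rescale_layout is hypothetical.

```r
library(igraph)

# Min-max rescale each column of a layout matrix to [-1, 1]
rescale_layout <- function(lo) {
  apply(lo, 2, function(x) 2 * (x - min(x)) / (max(x) - min(x)) - 1)
}

centres <- 6 * layout_in_circle(make_full_graph(5))
manual  <- rescale_layout(centres)
builtin <- norm_coords(centres, xmin = -1, xmax = 1, ymin = -1, ymax = 1)

# Compare the two results (attributes ignored)
all.equal(manual, builtin, check.attributes = FALSE)
```

Multiplying the rescaled coordinates by a factor slightly above 1 (1.2 in the answer) then pushes the labels just outside the clusters.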

Phylogenetic tree ape too small

I am building a phylogenetic tree using data from NCBI taxonomy. The tree is quite simple; its aim is to show the relationship among a few Arthropods.
The problem is that the tree looks very small and I can't seem to make its branches longer. I would also like to color some nodes (e.g. Pancrustaceans) but I don't know how to do this using ape.
Thanks for any help!
library(treeio)
library(ape)
treeText <- readLines('phyliptree.phy')
treeText <- paste0(treeText, collapse="")
tree <- read.tree(text = treeText) ## load tree
distMat <- cophenetic(tree) ## generate dist matrix
plot(tree, use.edge.length = TRUE,show.node.label = T, edge.width = 2, label.offset = 0.75, type = "cladogram", cex = 1, lwd=2)
Here are some pointers using the ape package. I am using a random tree as we don't have access to yours, but these examples should be easily adaptable to your problem. If you provide a reproducible example of a specific question, I could take another look.
First we make a random tree, add some species names, and plot it to show the numbers of nodes (both terminal and internal)
library(ape)
set.seed(123)
Tree <- rtree(10)
Tree$tip.label <- paste("Species", 1:10, sep="_")
plot.phylo(Tree)
nodelabels() # blue
tiplabels() # yellow
edgelabels() # green
Then, to color any node or edge of the tree, we can create a vector of colors and provide it to the appropriate *labels() function.
# using numbers for colors
node_colors <- rep(5:6, each=5)[1:9] # 9 internal nodes
edge_colors <- rep(3:4, each=9) # 18 branches
tip_colors <- rep(c(11,12,13), 4)
# plot:
plot.phylo(Tree, edge.color = edge_colors, tip.color = tip_colors)
nodelabels(pch = 21, bg = node_colors, cex=2)
To label just one node and the clade descending from it, we could do:
Nnode(Tree)
node_colors <- rep(NA, 9)
node_colors[7] <- "green"
node_shape <- ifelse(is.na(node_colors), NA, 21)
edge_colors <- rep("black", 18)
edge_colors[13:18] <- "green"
plot(Tree, edge.color = edge_colors, edge.width = 2, label.offset = .1)
nodelabels(pch=node_shape, bg=node_colors, cex=2)
Without your tree, it is harder to tell how to adjust the branches. One way is to reduce the size of the tip labels, so they take up less space. Another way might be to play around when saving the png or pdf.
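As a rough sketch of both suggestions, again on the random tree rather than the real one: the first plot shrinks the tip labels, and the second writes to a taller PDF so that each tip gets more vertical space. The output path and the device dimensions are arbitrary assumptions.

```r
library(ape)
set.seed(123)
Tree <- rtree(10)
Tree$tip.label <- paste("Species", 1:10, sep = "_")

# Smaller tip labels leave more room for the branches
plot(Tree, cex = 0.6, no.margin = TRUE)

# A taller output device gives each tip more vertical space
out_file <- file.path(tempdir(), "tree.pdf")   # hypothetical output path
pdf(out_file, width = 6, height = 10)
plot(Tree, cex = 0.8, edge.width = 2, label.offset = 0.05)
dev.off()
```

The right width and height depend on the number of tips and the label lengths, so these values are only a starting point.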
There are other ways of doing these embellishments of trees, including the ggtree package.

R Indexing a matrix to use in plot coordinates

I'm trying to plot a temporal social network in R. My approach is to create a master graph and layout for all nodes. Then, I will subset the graph based on a series of vertex ids. However, when I do this and lay out the graph, I get completely different node locations. I think I'm subsetting the layout matrix incorrectly, but I can't locate the issue because I've done some smaller matrix subsets and everything seems to work fine.
I have some example code and an image of the issue in the network plots.
library(igraph)
# make graph
g <- barabasi.game(25)
# make layout and set some aesthetics
set.seed(123)
l <- layout_nicely(g)
V(g)$size <- rescale(degree(g), c(5, 20))
V(g)$shape <- 'none'
V(g)$label.cex <- .75
V(g)$label.color <- 'black'
E(g)$arrow.size = .1
# plot graph
dev.off()
par(mfrow = c(1,2),
mar = c(1,1,5,1))
plot(g, layout = l,
main = 'Entire\ngraph')
# use index & induced subgraph
v_ids <- sample(1:25, 15, F)
sub_l <- l[v_ids, c(1,2)]
sub_g <- induced_subgraph(g, v_ids)
# plot second graph
plot(sub_g, layout = sub_l,
main = 'Sub\ngraph')
The vertices in the second plot should match layout of those in the first.
Unfortunately, you set the random seed after you generated the graph,
so we cannot exactly reproduce your result. I will use the same code but
with set.seed before the graph generation. This makes the result look
different than yours, but will be reproducible.
When I run your code, I do not see exactly the same problem as you are
showing.
Your code (with set.seed moved and scales added)
library(igraph)
library(scales) # for rescale function
# make graph
set.seed(123)
g <- barabasi.game(25)
# make layout and set some aesthetics
l <- layout_nicely(g)
V(g)$size <- rescale(degree(g), c(5, 20))
V(g)$shape <- 'none'
V(g)$label.cex <- .75
V(g)$label.color <- 'black'
E(g)$arrow.size = .1
## V(g)$names = 1:25
# plot graph
dev.off()
par(mfrow = c(1,2),
mar = c(1,1,5,1))
plot(g, layout = l,
main = 'Entire\ngraph')
# use index & induced subgraph
v_ids <- sort(sample(1:25, 15, F))
sub_l <- l[v_ids, c(1,2)]
sub_g <- induced_subgraph(g, v_ids)
# plot second graph
plot(sub_g, layout = sub_l,
main = 'Sub\ngraph', vertex.label=V(sub_g)$names)
When I run your code, both graphs have nodes in the same
positions. That is not what I see in the graph in your question.
I suggest that you run just this code and see if you don't get
the same result (nodes in the same positions in both graphs).
The only difference between the two graphs in my version is the
node labels. When you take the subgraph, it renumbers the nodes
from 1 to 15 so the labels on the nodes disagree. You can fix
this by storing the node labels in the graph before taking the
subgraph. Specifically, add V(g)$names = 1:25 immediately after
your statement E(g)$arrow.size = .1. Then run the whole thing
again, starting at set.seed(123). This will preserve the
original numbering as the node labels.
The graph looks slightly different because the new sub-graph
does not take up all of the space and so is stretched to use
up the empty space.
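A compact sketch of both points, under the assumption that the labels are stored in the label vertex attribute (which igraph uses for plotting by default) and that the stretching is avoided by switching off igraph's automatic rescaling. sample_pa() is the current name for barabasi.game().

```r
library(igraph)
set.seed(123)
g <- sample_pa(25)
l <- layout_nicely(g)
V(g)$label <- 1:25          # stored on the graph, so it survives subsetting

v_ids <- sort(sample(1:25, 15))
sub_g <- induced_subgraph(g, v_ids)
sub_l <- l[v_ids, ]

# Use the full layout's coordinate range in both panels
# instead of letting each plot rescale its own layout
xl <- range(l[, 1])
yl <- range(l[, 2])
par(mfrow = c(1, 2), mar = c(1, 1, 5, 1))
plot(g,     layout = l,     rescale = FALSE, xlim = xl, ylim = yl,
     main = "Entire\ngraph")
plot(sub_g, layout = sub_l, rescale = FALSE, xlim = xl, ylim = yl,
     main = "Sub\ngraph")
```

With rescale = FALSE the vertex sizes are interpreted relative to the layout's own coordinates, so vertex.size may need adjusting depending on the range of the layout.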
A possible quick workaround: draw the same full graph, but give the nodes and edges that you don't need the same color as your background. Depending on your purposes, this may be enough.
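The background-colour workaround could look roughly like this, assuming a white background and an undirected toy graph. The variable names (keep, vcol, ecol) are made up for the example.

```r
library(igraph)
set.seed(123)
g <- sample_pa(25, directed = FALSE)
l <- layout_nicely(g)
keep <- sort(sample(1:25, 15))

# Vertices outside `keep` are drawn in the background colour, without labels
in_keep <- seq_len(vcount(g)) %in% keep
vcol <- ifelse(in_keep, "orange", "white")
vlab <- ifelse(in_keep, seq_len(vcount(g)), NA)

# An edge stays visible only if both of its endpoints are kept
el   <- ends(g, E(g), names = FALSE)
ecol <- ifelse(el[, 1] %in% keep & el[, 2] %in% keep, "grey40", "white")

plot(g, layout = l, vertex.color = vcol, vertex.frame.color = vcol,
     vertex.label = vlab, edge.color = ecol)
```

Since the full graph and the full layout are used, the visible nodes sit in exactly the same positions as in the complete plot. Hidden elements are still drawn, so they can occasionally paint over a visible edge.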

How to combine state distribution plot and separate legend in traminer?

Plotting several clusters using seqdplot in TraMineR can make the legend messy, especially in combination with numerous states. This calls for additional options for modifying the legend, which are available with the function seqlegend. However, I have a hard time combining a state distribution plot (seqdplot) with a separate modified legend (seqlegend). Ideally one wants to plot the clusters (e.g. 9) without a legend and then add the separate legend in the available bottom right row, but instead the separate legend is generating a new plot window. Can anyone help?
Here's an example using the biofam data. With the data I use in my own research the legend becomes much more messy since I have 11 states.
#Data
library(TraMineR)
library(WeightedCluster)
data(biofam)
biofam.seq <- seqdef(biofam[501:600, 10:25])
#OM distances
biofam.om <- seqdist(biofam.seq, method = "OM", indel = 3, sm = "TRATE")
#9 clusters
wardCluster <- hclust(as.dist(biofam.om), method = "ward.D2")
cluster9 <- cutree(wardCluster, k = 9)
#State distribution plot
seqdplot(biofam.seq, group = cluster9, with.legend = F)
#Separate legend
seqlegend(biofam.seq, title = "States", ncol = 2)
#Combine state distribution plot and separate legend
#??
Thank you.
The seqplot function does not let you control the number of columns of the legend, nor does it let you add a legend title. So you have to compose the plot yourself by generating a separate plot for each group with the legend disabled and adding the legend afterwards. Here is how you can do that:
cluster9 <- factor(cluster9)
levc <- levels(cluster9)
lev <- length(levc)
par(mfrow=c(5,2))
for (i in 1:lev)
seqdplot(biofam.seq[cluster9 == levc[i],], border=NA, main=levc[i], with.legend=FALSE)
seqlegend(biofam.seq, ncol=4, cex = 1.2, title='States')
Update, Oct 1, 2018
Since TraMineR V 2.0-9, the seqplot family of functions now support (when applicable) the argument ncol to control the number of columns in the legend. To add a title to the legend, you still have to proceed as shown above.
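Going by the update note, a grouped plot with a multi-column legend might then reduce to a single call. This is a sketch that assumes TraMineR 2.0-9 or later and that ncol is passed straight to seqdplot as described; it has not been checked against other versions.

```r
library(TraMineR)
data(biofam)
biofam.seq <- seqdef(biofam[501:600, 10:25])
biofam.om  <- seqdist(biofam.seq, method = "OM", indel = 3, sm = "TRATE")
cluster9   <- cutree(hclust(as.dist(biofam.om), method = "ward.D2"), k = 9)

# ncol is forwarded to the built-in legend (per the update note)
seqdplot(biofam.seq, group = cluster9, border = NA, ncol = 4)
```

A legend title still requires the manual composition with seqlegend shown earlier.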
AFAIK seqlegend() doesn't work when the other plots you are drawing use the group argument. In your case the only thing seqlegend() adds is the title "States". If you want to customize what appears in the legend, you can do so by providing the corresponding alphabet and states that are used in your analysis.
The package's website has several walkthroughs and guides covering the various options: link to their website
#Data
library(TraMineR)
library(WeightedCluster)
data(biofam)
## Generate alphabet and states
alphabet <- 0:7
states <- letters[seq_along(alphabet)]
biofam.seq <- seqdef(biofam[501:600, 10:25], states = states, alphabet = alphabet)
#OM distances
biofam.om <- seqdist(biofam.seq, method = "OM", indel = 3, sm = "TRATE")
#9 clusters
wardCluster <- hclust(as.dist(biofam.om), method = "ward.D2")
cluster9 <- cutree(wardCluster, k = 9)
#State distribution plot
seqdplot(biofam.seq, group = cluster9, with.legend = TRUE)

Phylogenetic tree tip color

I have a phylogenetic tree that I drew in R. I want to color my tip edges based on the order of my species. How can I choose the color of each tip label individually?
I tried first:
EdgeCols <- rep("black", Nedge(tree))
EdgeCols[which.edge(tree, tree$edge[1]) ] <- "red"
plot( tree, space = 30, assoc = AMat,
show.tip.label = T, gap = 1, length.line = 0, edge.color =EdgeCols1)
But I would not get any change in the color of this edge.
Can anyone tell me where the problem is?
I am not exactly sure what you are trying to do, but here is how to color specific edges of a phylogeny with the ape package. Here is code for coloring all edges:
library(ape)
# Simulate tree
ntax <- 20
tree <- rcoal(ntax)
# Color branches
colors <- rainbow(Nedge(tree))
plot(tree, edge.color=colors)
And for coloring all terminal branches:
# Color terminal branches
colors2 <- rep("black", Nedge(tree))
colors2[which(tree$edge[,2] %in% 1:20)] <- rainbow(ntax)
plot(tree, edge.color=colors2)
I also would point out that there are obvious issues in your code:
You have tree$edge[1], but tree$edge is a two-column matrix, so indexing it with a single value returns just one node number, not an edge.
The which.edge function requires a vector of tips and returns the index of all the edges within the monophyletic clade defined by those tips. It seems like you are trying to give it a single value, which doesn't make any sense.
You define EdgeCols, but then in your plot function you have EdgeCols1.
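Putting these points together, here is a small sketch that colors both the tip labels and their terminal branches by group. The grouping vector tip_group and the palette group_cols are hypothetical stand-ins for the species orders in the question.

```r
library(ape)
set.seed(42)
tree <- rcoal(8)

# Hypothetical grouping of tips (e.g. taxonomic order), one entry per tip
tip_group  <- rep(c("A", "B"), length.out = Ntip(tree))
group_cols <- c(A = "red", B = "blue")

# Tip labels: one colour per tip, in the order of tree$tip.label
tip_cols <- unname(group_cols[tip_group])

# Terminal edges: rows of tree$edge whose child node is a tip (1..Ntip)
edge_cols   <- rep("black", Nedge(tree))
is_tip_edge <- tree$edge[, 2] <= Ntip(tree)
edge_cols[is_tip_edge] <- tip_cols[tree$edge[is_tip_edge, 2]]

plot(tree, tip.color = tip_cols, edge.color = edge_cols, edge.width = 2)
```

The key detail is that edge colors follow the row order of tree$edge, while tip colors follow the order of tree$tip.label, so the second column of tree$edge is used to map one onto the other.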
