I am building a phylogenetic tree using data from NCBI taxonomy. The tree is quite simple; its aim is to show the relationships among a few arthropods.
The problem is that the tree looks very small and I can't seem to make its branches longer. I would also like to color some nodes (e.g. Pancrustaceans), but I don't know how to do this using ape.
Thanks for any help!
library(treeio)
library(ape)
treeText <- readLines('phyliptree.phy')
treeText <- paste0(treeText, collapse="")
tree <- read.tree(text = treeText) ## load tree
distMat <- cophenetic(tree) ## generate dist matrix
plot(tree, use.edge.length = TRUE,show.node.label = T, edge.width = 2, label.offset = 0.75, type = "cladogram", cex = 1, lwd=2)
Here are some pointers using the ape package. I am using a random tree since we don't have access to yours, but these examples should be easily adaptable to your problem. If you provide a reproducible example of a specific question, I can take another look.
First, we make a random tree, add some species names, and plot it to show the numbers of the nodes (both terminal and internal) and of the edges:
library(ape)
set.seed(123)
Tree <- rtree(10)
Tree$tip.label <- paste("Species", 1:10, sep="_")
plot.phylo(Tree)
nodelabels() # blue
tiplabels() # yellow
edgelabels() # green
Then, to color any node or edge of the tree, we can create a vector of colors and provide it to the appropriate *labels() function.
# using numbers for colors
node_colors <- rep(5:6, each=5)[1:9] # 9 internal nodes
edge_colors <- rep(3:4, each=9) # 18 branches
tip_colors <- rep(c(11,12,13), 4)
# plot:
plot.phylo(Tree, edge.color = edge_colors, tip.color = tip_colors)
nodelabels(pch = 21, bg = node_colors, cex=2)
To label just one node and the clade descending from it, we could do:
Nnode(Tree)
node_colors <- rep(NA, 9)
node_colors[7] <- "green"
node_shape <- ifelse(is.na(node_colors), NA, 21)
edge_colors <- rep("black", 18)
edge_colors[13:18] <- "green"
plot(Tree, edge.color = edge_colors, edge.width = 2, label.offset = .1)
nodelabels(pch=node_shape, bg=node_colors, cex=2)
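Hard-coding edge_colors[13:18] only works for this particular random tree. A small base-R helper (hypothetical, not part of ape) can instead collect every edge below a given node by walking the edge matrix, where column 1 is the parent and column 2 the child, as in `Tree$edge`. The toy matrix below describes a 4-tip tree whose root is node 5.

```r
# Return the indices of all edges below `node` in an ape-style edge matrix
# (column 1 = parent node, column 2 = child node). Base R only.
clade_edges <- function(edge, node) {
  out <- integer(0)
  frontier <- node
  while (length(frontier) > 0) {
    hits <- which(edge[, 1] %in% frontier)  # edges leaving the current nodes
    out <- c(out, hits)
    frontier <- edge[hits, 2]               # move down to their children
  }
  sort(out)
}

# Toy tree: root 5 -> internal nodes 6 and 7 -> tips 1..4
edge <- rbind(c(5, 6), c(6, 1), c(6, 2), c(5, 7), c(7, 3), c(7, 4))
clade_edges(edge, 7)
```

For the tree above, the 7th internal node is node number `Ntip(Tree) + 7`, so `edge_colors[clade_edges(Tree$edge, Ntip(Tree) + 7)] <- "green"` should color its clade without counting edges by hand. The edge leading into the node itself is not included.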
Without your tree, it is harder to tell how to adjust the branches. One way is to reduce the size of the tip labels (cex) so they take up less space. Another is to adjust the width and height of the device when saving the png or pdf.
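One way to act on the second suggestion is to size the output device by the number of tips. The helper below is a hypothetical sketch (its name and the 0.25 inches per tip are arbitrary choices); a base-R scatterplot stands in for the tree so the block runs without ape.

```r
# Hypothetical helper: open a PDF whose height grows with the number of tips,
# run a plotting function, then close the device.
save_tree_pdf <- function(file, n_tips, plot_fun,
                          width = 7, inches_per_tip = 0.25, min_height = 4) {
  height <- max(min_height, n_tips * inches_per_tip)
  pdf(file, width = width, height = height)
  on.exit(dev.off())  # close the PDF even if plotting fails
  plot_fun()
  invisible(height)
}

# A base-R plot stands in for plot(tree, ...)
out <- file.path(tempdir(), "tree.pdf")
h <- save_tree_pdf(out, n_tips = 40, plot_fun = function() plot(1:40))
```

With ape loaded, something like `save_tree_pdf("tree.pdf", Ntip(tree), function() plot(tree, cex = 0.7, label.offset = 0.5))` gives each tip label more vertical room; a larger `width` gives the branches more horizontal room.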
There are other ways of doing these embellishments of trees, including the ggtree package.
Similar questions have been asked here and here, however, none of the other answers solve my problem.
I'm trying to join together two (or more) separate heat maps and turn them into a circle. I'm trying to achieve something like the image below (which I made by following the circlize package tutorial found here).
In my data, I have multiple matrices, where each matrix represents a different year. I want to try and create a circular heat map (like the one in the image) where each section of the circular heatmap is a single year.
In my example below, I am just using 2 years (so 2 heat maps), but I can't seem to get it to work:
library(circlize)
# create matrix
mat1 <- matrix(runif(80), 10, 8)
mat2 <- matrix(runif(80), 10, 8)
rownames(mat1) <- rownames(mat2) <- paste0('a', 1:10)
colnames(mat1) <- colnames(mat2) <- paste0('b', 1:8)
# join together
matX <- cbind(mat1, mat2)
# set splits
split <- c(rep('a', 8), rep('b', 8))
split = factor(split, levels = unique(split))
# create circular heatmap
col_fun1 = colorRamp2(c(0, 0.5, 1), c("blue", "white", "red"))
circos.heatmap(matX, split = split, col = col_fun1, rownames.side = "inside")
circos.clear()
The above code makes:
I'm not sure where I am going wrong, because when I use the ComplexHeatmap package, the matrices are split correctly, as shown below:
# using ComplexHeatmap package
library(ComplexHeatmap)
Heatmap(matX, column_split = split, show_row_dend = F, show_column_dend = F)
Any suggestions as to how I could achieve this?
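A base-R check of the shapes involved: matX is 10 x 16, while split has 16 entries, one per column. As far as I can tell, circos.heatmap() places the rows of its matrix around the circle and applies `split` to the rows, unlike Heatmap(..., column_split = ...). A minimal sketch under that assumption stacks the transposed matrices so that each year becomes a block of rows; the year labels are made up.

```r
# Same toy data as in the question (seed added for reproducibility)
set.seed(1)
mat1 <- matrix(runif(80), 10, 8)
mat2 <- matrix(runif(80), 10, 8)
rownames(mat1) <- rownames(mat2) <- paste0("a", 1:10)
colnames(mat1) <- colnames(mat2) <- paste0("b", 1:8)

matX <- cbind(mat1, mat2)
dim(matX)  # 10 rows, 16 columns; the question's split has 16 entries

# Assumption: circos.heatmap() splits ROWS, so make each year a block of rows
matY <- rbind(t(mat1), t(mat2))                      # 16 x 10
split <- factor(rep(c("year1", "year2"), each = 8),  # one label per row of matY
                levels = c("year1", "year2"))
```

The plotting call would then be `circos.heatmap(matY, split = split, col = col_fun1, rownames.side = "inside")`, which should give one sector per year.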
I simulated some graph network data (~10,000 observations) in R and tried to visualize it using the visNetwork library. However, the plot is very cluttered and very difficult to analyze visually (I understand that in real life, network data is meant to be analyzed using a graph query language).
For the time being, is there anything I can do to improve the visualization of the graph network I created (so I can explore some of the linkages and nodes that are all piled on top of each other)?
Can libraries such as 'networkD3' and 'DiagrammeR' be used to better visualize this network?
I have attached my reproducible code below:
library(igraph)
library(dplyr)
library(visNetwork)
#create file from which to sample from
x5 <- sample(1:10000, 10000, replace=T)
#convert to data frame
x5 = as.data.frame(x5)
#create first file (take a random sample from the created file)
a = sample_n(x5, 9000)
#create second file (take a random sample from the created file)
b = sample_n(x5, 9000)
#combine
c = cbind(a,b)
#create dataframe
c = data.frame(c)
#rename column names
colnames(c) <- c("a","b")
graph <- graph.data.frame(c, directed=F)
graph <- simplify(graph)
graph
plot(graph)
library(visNetwork)
nodes <- data.frame(id = V(graph)$name, title = V(graph)$name)
nodes <- nodes[order(nodes$id, decreasing = F),]
edges <- get.data.frame(graph, what="edges")[1:2]
visNetwork(nodes, edges) %>% visIgraphLayout(layout = "layout_with_fr") %>%
visOptions(highlightNearest = TRUE, nodesIdSelection = TRUE) %>%
visInteraction(navigationButtons = TRUE)
Thanks
At the request of the OP, I am applying the method used in a previous answer, "Visualizing the result of dividing the network into communities", to this problem.
The network in the question was not created with a specified random seed.
Here, I specify the seed for reproducibility.
## reproducible version of OP's network
library(igraph)
library(dplyr)
set.seed(1234)
#create file from which to sample from
x5 <- sample(1:10000, 10000, replace=T)
#convert to data frame
x5 = as.data.frame(x5)
#create first file (take a random sample from the created file)
a = sample_n(x5, 9000)
#create second file (take a random sample from the created file)
b = sample_n(x5, 9000)
#combine
c = cbind(a,b)
#create dataframe
c = data.frame(c)
#rename column names
colnames(c) <- c("a","b")
graph <- graph.data.frame(c, directed=F)
graph <- simplify(graph)
As noted by the OP, a simple plot is a mess. The referenced previous answer
broke this into two parts:
Plot all of the small components
Plot the giant component
1. Small components
Different components get different colors to help separate them.
## Visualize the small components separately
comps = components(graph)
Giant = which.max(comps$csize)   # id of the largest (giant) component
SmallV = which(comps$membership != Giant)
SmallComp = induced_subgraph(graph, SmallV)
LO_SC = layout_components(SmallComp, layout=layout_with_graphopt)
plot(SmallComp, layout=LO_SC, vertex.size=9, vertex.label.cex=0.8,
vertex.color=rainbow(comps$no, alpha=0.6)[comps$membership[SmallV]])
More could be done with this, but that is fairly easy and not the substance of the question, so I will leave this as the representation of the small components.
2. Giant component
Simply plotting the giant component is still hard to read. Here are two
approaches to improving the display. Both rely on grouping the vertices.
For this answer, I will use cluster_louvain to group the nodes, but you
could try other community detection methods. cluster_louvain produces 47
communities.
## Now try for the giant component
GiantV = which(components(graph)$membership == which.max(components(graph)$csize))
GiantComp = induced_subgraph(graph, GiantV)
GC_CL = cluster_louvain(GiantComp)
max(GC_CL$membership)
[1] 47
Giant method 1 - grouped vertices
Create a layout that emphasizes the communities
GC_Grouped = GiantComp
E(GC_Grouped)$weight = 1
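for(i in unique(membership(GC_CL))) {
GroupV = which(membership(GC_CL) == i)
if (length(GroupV) < 2) next   # a one-vertex community has no pairs (combn(n, 2) would treat n as 1:n)
GC_Grouped = add_edges(GC_Grouped, combn(GroupV, 2), attr=list(weight=6))
}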
set.seed(1234)
LO = layout_with_fr(GC_Grouped)
colors <- rainbow(max(membership(GC_CL)))
par(mar=c(0,0,0,0))
plot(GC_CL, GiantComp, layout=LO,
vertex.size = 5,
vertex.color=colors[membership(GC_CL)],
vertex.label = NA, edge.width = 1)
This provides some insight, but the many edges make it a bit hard to read.
Giant method 2 - contracted communities
Plot each community as a single vertex. The size of the vertex
reflects the number of nodes in that community. The color represents
the degree of the community node.
## Contract the communities in the giant component
CL.Comm = simplify(contract(GiantComp, membership(GC_CL)))
D = unname(degree(CL.Comm))
set.seed(1234)
par(mar=c(0,0,0,0))
plot(CL.Comm, vertex.size=sqrt(sizes(GC_CL)),
vertex.label=1:max(membership(GC_CL)), vertex.label.cex = 0.8,
vertex.color=cut(D, breaks=8, labels=FALSE))   # degree binned into 8 color classes
This is much cleaner, but loses any internal structure of the communities.
A tip for real-life data: the best ways to deal with large graphs are to either 1) filter the edges you are using by some measure, or 2) use some related variable as an edge weight.
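A minimal sketch of option 1, assuming the edge list is a data frame with endpoint columns `a` and `b`, as in the OP's `c`. The helper name and the degree threshold are illustrative choices, and degree is only one possible filtering measure.

```r
# Hypothetical helper: keep only edges whose two endpoints both have degree >= k
filter_by_degree <- function(edges, k) {
  counts <- table(c(edges$a, edges$b))          # degree of every node
  deg <- setNames(as.integer(counts), names(counts))
  keep <- deg[as.character(edges$a)] >= k & deg[as.character(edges$b)] >= k
  edges[keep, , drop = FALSE]
}

# Toy edge list: nodes 4, 5 and 6 have degree 1
edges <- data.frame(a = c(1, 1, 1, 2, 5), b = c(2, 3, 4, 3, 6))
kept_edges <- filter_by_degree(edges, 2)
```

For the OP's data this would be something like `graph_from_data_frame(filter_by_degree(c, 3), directed = FALSE)`. A single pass can leave nodes whose degree drops below k after filtering; igraph's `coreness()` gives the repeated (k-core) version.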
I have some very simple pedigree data that I would like to visualise graphically. Example data here.
I have tried with kinship2, but had no success - see here for previous issues with kinship2
I have also been trying with igraph but have not been able to get the graph quite right. With the code below, I have managed to get a good representation of female lineages.
library(igraph)
library(dplyr)
GGM_igraph <- read.csv("example_data.csv")
mothers=GGM_igraph[,c('Ring','Mother','famid')]
fathers=GGM_igraph[,c('Ring','Father','famid')]
links<-left_join(mothers, fathers)
g=graph.data.frame(links)
This script is originally from this question.
G_Grouped = g
E(G_Grouped)$weight = 1
## Add edges with high weight between all nodes in the same group
for(i in unique(V(g)$famid)) {
GroupV = which(V(g)$famid == i)
G_Grouped = add_edges(G_Grouped, combn(GroupV, 5), attr=list(weight=10))
}
## Now create a layout based on G_Grouped
set.seed(567)
LO = layout_with_fr(G_Grouped)
## Use the layout to plot the original graph
par(mar=c(0,0,0,0))
# then plot the graph
plot(g, vertex.color=links$famid, layout=LO,
vertex.size = 8,
vertex.label.cex=.7,
vertex.label.color = "black",
edge.arrow.size = 0.25,
edge.arrow.mode = 1)
What I would like to do is:
1) include the males in the same graph, as a number of them had offspring with more than one female over their lifetime
2) manually assign colours to each family. At the moment it is automatic, which assigns the mothers (most of whom don't have a family, as they are founders) a random colour, and it also reuses some colours because there are not enough in the default palette.
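A minimal sketch of the two requested changes, done in base R on a made-up toy pedigree that only borrows the column names of the question's example_data.csv (Ring, Mother, Father, famid); the individuals and the colours are hypothetical. Mother-to-offspring and father-to-offspring links are stacked into one edge list, and each individual gets a colour from a manually chosen, named palette.

```r
# Toy pedigree with the question's column names; values are invented
ped <- data.frame(
  Ring   = c("M1", "F1", "F2", "C1", "C2", "C3"),
  Mother = c(NA,   NA,   NA,   "F1", "F1", "F2"),
  Father = c(NA,   NA,   NA,   "M1", "M1", "M1"),
  famid  = c(NA,   NA,   NA,   1,    1,    2)
)

# One edge list with both parents, so males appear in the same graph
parent_edges <- rbind(
  data.frame(from = ped$Mother, to = ped$Ring, role = "mother"),
  data.frame(from = ped$Father, to = ped$Ring, role = "father")
)
parent_edges <- parent_edges[!is.na(parent_edges$from), ]

# Per-individual family label; founders without a famid share one label
vertices <- data.frame(
  name  = ped$Ring,
  famid = ifelse(is.na(ped$famid), "founder", as.character(ped$famid))
)

# Manually chosen colour per family (arbitrary example choices)
fam_cols <- c(founder = "grey80", "1" = "tomato", "2" = "steelblue")
vertices$color <- unname(fam_cols[vertices$famid])
```

The two data frames can then go to igraph with `g <- graph_from_data_frame(parent_edges, vertices = vertices)`; because `vertices` has a `color` column, `plot(g)` should use those colours without a `vertex.color` argument. With the real data, every parent named in `Mother` or `Father` also needs a row in `vertices`, otherwise graph_from_data_frame() stops with an error.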
I'm trying to plot a temporal social network in R. My approach is to create a master graph and layout for all nodes. Then, I subset the graph based on a series of vertex ids. However, when I do this and lay out the graph, I get completely different node locations. I think I'm subsetting the layout matrix incorrectly, but I can't locate the issue, because I've done some smaller matrix subsets and everything seems to work fine.
I have some example code and an image of the issue in the network plots.
library(igraph)
# make graph
g <- barabasi.game(25)
# make graph and set some aesthetics
set.seed(123)
l <- layout_nicely(g)
V(g)$size <- rescale(degree(g), c(5, 20))
V(g)$shape <- 'none'
V(g)$label.cex <- .75
V(g)$label.color <- 'black'
E(g)$arrow.size = .1
# plot graph
dev.off()
par(mfrow = c(1,2),
mar = c(1,1,5,1))
plot(g, layout = l,
main = 'Entire\ngraph')
# use index & induced subgraph
v_ids <- sample(1:25, 15, F)
sub_l <- l[v_ids, c(1,2)]
sub_g <- induced_subgraph(g, v_ids)
# plot second graph
plot(sub_g, layout = sub_l,
main = 'Sub\ngraph')
The vertices in the second plot should match layout of those in the first.
Unfortunately, you set the random seed after you generated the graph,
so we cannot exactly reproduce your result. I will use the same code but
with set.seed before the graph generation. This makes the result look
different than yours, but will be reproducible.
When I run your code, I do not see exactly the same problem as you are
showing.
Your code (with set.seed moved and scales added)
library(igraph)
library(scales) # for rescale function
# make graph
set.seed(123)
g <- barabasi.game(25)
# make graph and set some aesthetics
l <- layout_nicely(g)
V(g)$size <- rescale(degree(g), c(5, 20))
V(g)$shape <- 'none'
V(g)$label.cex <- .75
V(g)$label.color <- 'black'
E(g)$arrow.size = .1
## V(g)$names = 1:25
# plot graph
dev.off()
par(mfrow = c(1,2),
mar = c(1,1,5,1))
plot(g, layout = l,
main = 'Entire\ngraph')
# use index & induced subgraph
v_ids <- sort(sample(1:25, 15, F))
sub_l <- l[v_ids, c(1,2)]
sub_g <- induced_subgraph(g, v_ids)
# plot second graph
plot(sub_g, layout = sub_l,
main = 'Sub\ngraph', vertex.label=V(sub_g)$names)
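When I run your code, both graphs have nodes in the same
positions. That is not what I see in the graph in your question.
Note that I also wrapped sample() in sort(): induced_subgraph()
keeps the selected vertices in their original order, so the rows
of sub_l only line up with the vertices of sub_g if v_ids is sorted.
I suggest that you run just this code and see if you don't get
the same result (nodes in the same positions in both graphs).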
The only difference between the two graphs in my version is the
node labels. When you take the subgraph, it renumbers the nodes
from 1 to 15 so the labels on the nodes disagree. You can fix
this by storing the node labels in the graph before taking the
subgraph. Specifically, add V(g)$names = 1:25 immediately after
your statement E(g)$arrow.size = .1. Then run the whole thing
again, starting at set.seed(123). This will preserve the
original numbering as the node labels.
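A base-R sketch of why the order of v_ids matters, assuming that induced_subgraph() keeps the selected vertices in their original, increasing order. The layout matrix is random stand-in data and the ids are made up.

```r
set.seed(1)
l <- matrix(runif(50), ncol = 2)  # stand-in layout: row i = (x, y) of vertex i
v_ids <- c(7, 3, 19, 12)          # unsorted, as sample() returns them
kept <- sort(v_ids)               # subgraph vertex i is original vertex kept[i]
sub_l_wrong <- l[v_ids, ]         # row 1 holds vertex 7, but subgraph vertex 1 is vertex 3
sub_l_right <- l[kept, ]          # row i holds subgraph vertex i
```

A version that does not depend on the order at all is to store the original ids as a vertex attribute before subsetting, e.g. `V(g)$orig_id <- seq_len(vcount(g))` (the attribute name is arbitrary), and then take `sub_l <- l[V(sub_g)$orig_id, ]`.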
The graph looks slightly different because the new, sub-graph
does not take up all of the space and so is stretched to use
up the empty space.
A possible quick workaround: draw the same graph, but color the nodes and edges that you don't need in your background color. Depending on your purposes, this may be enough.
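A minimal base-R sketch of that workaround, assuming a plain two-column edge matrix (from, to) like the one `as_edgelist(g, names = FALSE)` returns, and a white background; the toy edges and ids are made up.

```r
# Toy graph: 5 vertices, edges as a (from, to) matrix, and the ids to show
edge <- rbind(c(1, 2), c(2, 3), c(3, 4), c(4, 5), c(1, 5))
n <- 5
v_ids <- c(1, 2, 3)
bg <- "white"  # background colour of the plot

# Visible colour for selected vertices, background colour for the rest
vcol <- ifelse(seq_len(n) %in% v_ids, "black", bg)
# An edge stays visible only if both of its endpoints are selected
ecol <- ifelse(edge[, 1] %in% v_ids & edge[, 2] %in% v_ids, "grey40", bg)
```

The vectors then go to `plot(g, layout = l, vertex.label.color = vcol, edge.color = ecol)`; because the question hides vertex shapes (`shape = 'none'`), the label colour is what has to match the background. Since the full layout is reused, the positions cannot shift.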
I have a phylogenetic tree that I drew in R. I want to color my tip edges based on the order of my species. How can I choose the color of every tip label individually?
I tried first:
EdgeCols <- rep("black", Nedge(tree))
EdgeCols[which.edge(tree, tree$edge[1]) ] <- "red"
plot( tree, space = 30, assoc = AMat,
show.tip.label = T, gap = 1, length.line = 0, edge.color =EdgeCols1)
But the color of this edge did not change.
Can anyone tell me where the problem is?
I am not exactly sure what you are trying to do, but here is how to color specific edges of a phylogeny with the ape package. Here is code for coloring all edges:
library(ape)
# Simulate tree
ntax <- 20
tree <- rcoal(ntax)
# Color branches
colors <- rainbow(Nedge(tree))
plot(tree, edge.color=colors)
And for coloring all terminal branches:
# Color terminal branches
colors2 <- rep("black", Nedge(tree))
colors2[which(tree$edge[,2] %in% 1:ntax)] <- rainbow(ntax)
plot(tree, edge.color=colors2)
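To give each tip its own colour (for instance one colour per taxonomic order), the edge ending at each tip can be looked up with match(). The toy edge matrix and colours below are hypothetical; with a real tree they would come from `tree$edge` and your own tip-to-order table.

```r
# Toy ape-style edge matrix: tips 1-4, internal nodes 5-7; column 2 = child
edge <- rbind(c(5, 6), c(6, 1), c(6, 2), c(5, 7), c(7, 3), c(7, 4))
ntax <- 4
tip_cols <- c("red", "red", "blue", "darkgreen")  # one colour per tip, in tip order

edge_cols <- rep("black", nrow(edge))
term <- match(seq_len(ntax), edge[, 2])  # index of the terminal edge of tip i
edge_cols[term] <- tip_cols
```

Then `plot(tree, edge.color = edge_cols, tip.color = tip_cols)` colors both the terminal branches and the tip labels, since `tip.color` is also given in tip order.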
I also would point out that there are obvious issues in your code:
You have tree$edge[1], but tree$edge is a two-column matrix, so tree$edge[1] simply returns its first element (the parent node of the first edge, usually the root), not a tip.
The which.edge function requires a vector of tips and returns the index of all the edges within the monophyletic clade defined by those tips. It seems like you are trying to give it a single value, which doesn't make any sense.
You define EdgeCols, but then in your plot function you have EdgeCols1.