I want to make a simple phylogenetic tree for a marine biology course as an educative example. I have a list of species with taxonomic rank:
Group <- c("Benthos","Benthos","Benthos","Benthos","Benthos","Benthos","Zooplankton","Zooplankton","Zooplankton","Zooplankton",
"Zooplankton","Zooplankton","Fish","Fish","Fish","Fish","Fish","Fish","Phytoplankton","Phytoplankton","Phytoplankton","Phytoplankton")
Domain <- rep("Eukaryota", length(Group))
Kingdom <- c(rep("Animalia", 18), rep("Chromalveolata", 4))
Phylum <- c("Annelida","Annelida","Arthropoda","Arthropoda","Porifera","Sipunculida","Arthropoda","Arthropoda","Arthropoda",
"Arthropoda","Echinoidermata","Chorfata","Chordata","Chordata","Chordata","Chordata","Chordata","Chordata","Heterokontophyta",
"Heterokontophyta","Heterokontophyta","Dinoflagellata")
Class <- c("Polychaeta","Polychaeta","Malacostraca","Malacostraca","Demospongiae","NA","Malacostraca","Malacostraca",
"Malacostraca","Maxillopoda","Ophiuroidea","Actinopterygii","Chondrichthyes","Chondrichthyes","Chondrichthyes","Actinopterygii",
"Actinopterygii","Actinopterygii","Bacillariophyceae","Bacillariophyceae","Prymnesiophyceae","NA")
Order <- c("NA","NA","Amphipoda","Cumacea","NA","NA","Amphipoda","Decapoda","Euphausiacea","Calanioda","NA","Gadiformes",
"NA","NA","NA","NA","Gadiformes","Gadiformes","NA","NA","NA","NA")
Species <- c("Nephtys sp.","Nereis sp.","Gammarus sp.","Diastylis sp.","Axinella sp.","Ph. Sipunculida","Themisto abyssorum","Decapod larvae (Zoea)",
"Thysanoessa sp.","Centropages typicus","Ophiuroidea larvae","Gadus morhua eggs / larvae","Etmopterus spinax","Amblyraja radiata",
"Chimaera monstrosa","Clupea harengus","Melanogrammus aeglefinus","Gadus morhua","Thalassiosira sp.","Cylindrotheca closterium",
"Phaeocystis pouchetii","Ph. Dinoflagellata")
dat <- data.frame(Group, Domain, Kingdom, Phylum, Class, Order, Species)
dat
I would like to get a dendrogram (cluster analysis) and use Domain as the first cutting point, Kindom as the second, Phylum as the third, etc. Missing values should be ignored (no cutting point, a straight line instead). Group should be used as a coloring category for the labels.
I am a bit uncertain how to make a distance matrix from this data frame. There are a lot of phylogenetic tree packages for R, they seem to want newick data / DNA / other advanced information. Thus help with this would be appreciated.
It's probably a bit lame to answer my own question, but I found an easier solution. Maybe it helps someone one day.
library(ape)
taxa <- as.phylo(~Kingdom/Phylum/Class/Order/Species, data = dat)
col.grp <- merge(data.frame(Species = taxa$tip.label), dat[c("Species", "Group")], by = "Species", sort = F)
cols <- ifelse(col.grp$Group == "Benthos", "burlywood4", ifelse(col.grp$Group == "Zooplankton", "blueviolet", ifelse(col.grp$Group == "Fish", "dodgerblue", ifelse(col.grp$Group == "Phytoplankton", "darkolivegreen2", ""))))
plot(taxa, type = "cladogram", tip.col = cols)
Note that all columns have to be factors. This demonstrates the work flow with R. It takes a week to find out something, although the code itself is just a couple of rows =)
If you wanted to draw the tree by hand
(this is probably not the best way to do it),
you could start as follows
(it is not a complete answer:
the colours are missing,
and the edges are too long).
This assumes that the data has already been sorted.
# Data: remove Group
dat <- data.frame(Domain, Kingdom, Phylum, Class, Order, Species)
# Start a new plot
par(mar=c(0,0,0,0))
plot(NA, xlim=c(0,ncol(dat)+1), ylim=c(0,nrow(dat)+1),
type="n", axes=FALSE, xlab="", ylab="", main="")
# Compute the position of each node and find all the edges to draw
positions <- NULL
links <- NULL
for(k in 1:ncol(dat)) {
y <- tapply(1:nrow(dat), dat[,k], mean)
y <- y[ names(y) != "NA" ]
positions <- rbind( positions, data.frame(
name = names(y),
x = k,
y = y
))
}
links <- apply( dat, 1, function(u) {
u <- u[ !is.na(u) & u != "NA" ]
cbind(u[-length(u)],u[-1])
} )
links <- do.call(rbind, links)
rownames(links) <- NULL
links <- unique(links[ order(links[,1], links[,2]), ])
# Draw the edges
for(i in 1:nrow(links)) {
from <- positions[links[i,1],]
to <- positions[links[i,2],]
lines( c(from$x, from$x, to$x), c(from$y, to$y, to$y) )
}
# Add the text
text(positions$x, positions$y, label=positions$name)
Related
(Fig. 3a, b, Extended Data Fig. 3a, b and Supplementary Table 1).
After 48 h, more than one-third of the transcriptome was
differentially expressed (>5,000 genes; 405 genes encoding for
proteins in the extracellular region, Gene Ontology (GO) accession
0005576), significantly overlapping with the gene expression changes
of A375 tumours in vivo after 5 days of vemurafenib treatment (Fig.
3a, b and Extended Data Fig. 3c). Similar extensive gene expression
changes were observed in Colo800 and UACC62 melanoma cells treated
with vemurafenib and H3122 lung adenocarcinoma cells treated with
crizotinib (Extended Data Fig. 3d). Despite different cell lineages,
different oncogenic drivers, and different targeted therapies we
observed a significant overlap between the secretome of melanoma and
lung adenocarcinoma cells (P < 9.11 × 10−5)
The original paper
I would like to see similar to the figure f where it shows the intersection and significance overlap. To achieve that i got this code working till the intersection part but I dont know how to run the significance part.
library(reshape2)
library(venneuler)
RNA_seq_cds <- read.csv("~/Downloads/RNA_seq_gene_set.txt", header=TRUE, sep="\t")
head(RNA_seq_cds)
ATAC_seq <- read.csv("~/Downloads/ATAC_seq_gene_set.txt", header=TRUE, sep="\t")
head(ATAC_seq)
RNA_seq <- RNA_seq_cds
ATAC_seq <- ATAC_seq
#https://stackoverflow.com/questions/6988184/combining-two-data-frames-of-different-lengths
cbindPad <- function(...) {
args <- list(...)
n <- sapply(args, nrow)
mx <- max(n)
pad <- function(x, mx) {
if (nrow(x) < mx) {
nms <- colnames(x)
padTemp <- matrix(NA, mx - nrow(x), ncol(x))
colnames(padTemp) <- nms
if (ncol(x) == 0) {
return(padTemp)
} else {
return(rbind(x, padTemp))
}
} else {
return(x)
}
}
rs <- lapply(args, pad, mx)
return(do.call(cbind, rs))
}
dat <- cbindPad(ATAC_seq, RNA_seq)
vennfun <- function(x) {
x$id <- seq(1, nrow(x)) #add a column of numbers (required for melt)
xm <- melt(x, id.vars="id", na.rm=TRUE) #melt table into two columns (value & variable)
xc <- dcast(xm, value~variable, fun.aggregate=length) #remove NA's, list presence/absence of each value for each variable (1 or 0)
rownames(xc) <- xc$value #value column=rownames (required for Venneuler)
xc$value <- NULL #remove redundent value column
xc #output the new dataframe
}
#https://stackoverflow.com/questions/9121956/legend-venn-diagram-in-venneuler
VennDat <- vennfun(dat)
genes.venn <- venneuler(VennDat)
genes.venn$labels <- c("RNA", "\nATAC" )
# plot(genes.venn, cex =15, )
#https://stackoverflow.com/questions/30225151/how-to-create-venn-diagram-in-r-studio-from-group-of-three-frequency-column
#https://rstudio-pubs-static.s3.amazonaws.com/13301_6641d73cfac741a59c0a851feb99e98b.html
vd <- venneuler(VennDat)
vd$labels <- paste(genes.venn$labels, colSums(VennDat))
plot(vd, cex=10)
text(.3, .45,
bquote(bold("Common ="~.(as.character(sum(rowSums(VennDat) == 2))))),
col="red", cex=1)
LABS <- vd$labels
The above code gives me the intersection plot
Now the significance part how do i do that between two gene sets and show it as shown in the original figure.
My data which i have used to generate the above plot
Any suggestion or help would be really appreciated.
If you talk about how to place any text under your figure, just use 'text' as you did before. It's just some guessing on which x= and y= coordinates. Thexpd=TRUE allows you to plot over the margin.
VennDat <- vennfun(dat)
vd <- venneuler(VennDat)
vd$labels <- paste(c("RNA", "ATAC"), colSums(VennDat))
plot(vd, cex=10, border=c(NA, 'red'), col=c('#6b65af', '#ad7261'))
text(x=.5, y=.5, sum(rowSums(VennDat) == 2), xpd=TRUE)
text(.5, .15, 'overlap\n', xpd=TRUE)
text(.5, .13, bquote(italic(p)*'< 9.11E-55'), xpd=TRUE)
I also adjusted some parameters of plot. You may inspect the code of the plotting method using:
venneuler:::plot.VennDiagram
If you want to know how significance is calculated, you should post your question at Cross Validated.
I'm having multiple data frames where the first column (in the end filled with NA's) is the wavenumber and the other columns are my variables of the specific wavenumber for multiple observations.
Is there a possibility to plot the columns in a way that my first column holds the variables for the x-axis and the other are plotted into one big plot with their respective y-values?
I already tried "matplot" (resulting in "numbers" instead of points),
matplot(df[,1],df[,3:5],xlab = "Wavelength [nm]", ylab = "Absorbance")
different sets of "xyplot" (no possibility to give more than one y-value), but none seem to work (on my level of knowledge on R).
The final result should look like this:
Thanks for any help!
You could always make your own function to do this ;I make such functions on a regular basis when nothing really fits my needs.
I put this together rather quickly but you can adapt it to your needs.
# generate data
set.seed(6)
n <- 50
dat <- data.frame(x1=seq(1,100, length.out = n),
x2=seq(1,20, length.out = n)+rnorm(n),
x3=seq(1,20, length.out = n)+rnorm(n, mean = 3),
x4=seq(1,20, length.out = n)+rnorm(n, mean = 5))
# make some NAs at the end
dat[45:n,2] <- NA
dat[30:n,3] <- NA
plot_multi <- function(df, x=1, y=2, cols=y,
xlim=range(df[,x], na.rm = T),
ylim=range(df[,y], na.rm = T),
main="", xlab="", ylab="", ...){
# setup plot frame
plot(NULL,
xlim=xlim,
ylim=ylim,
main=main, xlab=xlab, ylab=ylab)
# plot all your y's against your x
pb <- sapply(seq_along(y), function(i){
points(df[,c(x, y[i])], col=cols[i], ...)
})
}
plot_multi(dat, y=2:4, type='l', lwd=3, main = ":)",
xlab = "Wavelength", ylab = "Absorbance")
Results in :
EDIT
I actually found your dataset online by chance, so I'll include how to plot it as well using my code above.
file <- 'http://openmv.net/file/tablet-spectra.csv'
spectra <- read.csv(file, header = FALSE)
# remove box label
spectra <- spectra[,-1]
# add the 'wavelength' and rotate the df
# (i didn't find the actual wavelength values, but hey).
spectra <- cbind(1:ncol(spectra), t(spectra))
plot_multi(spectra, y=2:ncol(spectra), cols = rainbow(ncol(spectra)),
type='l', main=":))", ylab="Absorbance", xlab = "'Wavelength'")
You could use the pavo R package, which is made to deal with spectral data (full disclosure, I'm one of the maintainers):
library(pavo)
df <- t(read.csv("http://openmv.net/file/tablet-spectra.csv", header = FALSE))
df <- df[-1, ]
df <- apply(df, 2, as.numeric)
df <- cbind(wl = seq_len(nrow(df)),
df)
df <- as.rspec(df)
#> wavelengths found in column 1
plot(df, ylab = "Absorbance", col = rainbow(3))
Created on 2019-07-26 by the reprex package (v0.3.0)
I have this bit of code
library(igraph)
library(igraphdata)
data("karate")
g <- karate
# for reproducibility
set.seed(23548723)
network_layout <- layout_with_fr(g)
trimmed_network <- delete.edges(g, which(E(g)$weight < 4))
communities <- cluster_louvain(trimmed_network)
plot(communities, trimmed_network, layout=network_layout)
and it generates
I want to disable the coloring (color="white" and mark.groups=NULL) of vertices in single vertice communities (length 1) and I know that you can manipulate the color of "normal" graphs by using $color but I did not find any hint in the igraph documentation how to handle it per community.
There is also the option not to use the community plotting with
plot(trimmed_network, ...)
thus using the color of the graph, but then I would loose the group markings.
How can I change color and group marks per community based on length(communities[1]) == 1
Identify the vertices in each group > 1 and pass those as a list to mark.groups. It is a bit fiddly, but it works.
r <- rle(sort(communities$membership))
x <- r$values[which(r$lengths>1)]
y <- r$values[which(r$lengths==1)]
cols <- communities$membership
cols[which(cols %in% y)] <- "white"
grps <- lapply(x, function(x) communities[[x]])
grps <- lapply(1:length(grps), function(x) which(V(g)$name %in% grps[[x]]))
plot(communities, trimmed_network, layout=network_layout,
col = cols, mark.groups = grps)
We need to find the numeric identifier of communities with only one member and set the color of the members of those singleton communities to "white".
# Get community membership
memb = membership(communities)
# Find number of members in each community
tab = table(memb)
# Set colors for each member. (Adjust these as desired)
col = colors()[2:(length(memb)+1)]
# But for members of communities of one, set the color to white
singles = which(memb %in% as.numeric(names(tab)[tab==1]))
col[singles] = "white"
plot(communities, trimmed_network, layout=network_layout, col=col, mark.groups=NULL)
I have built a phylogenetic tree for a protein family that can be split into different groups, classifying each one by its type of receptor or type of response. The nodes in the tree are labeled as the type of receptor.
In the phylogenetic tree I can see that proteins that belong to the same groups or type of receptor have clustered together in the same branches. So I would like to collapse these branches that have labels in common, grouping them by a given list of keywords.
The command would be something like this:
./collapse_tree_by_label -f phylogenetic_tree.newick -l list_of_labels_to_collapse.txt -o collapsed_tree.eps(or pdf)
My list_of_labels_to_collapse.txt would be like this:
A
B
C
D
My newick tree would be like this:
(A_1:0.05,A_2:0.03,A_3:0.2,A_4:0.1):0.9,(((B_1:0.05,B_2:0.02,B_3:0.04):0.6,(C_1:0.6,C_2:0.08):0.7):0.5,(D_1:0.3,D_2:0.4,D_3:0.5,D_4:0.7,D_5:0.4):1.2)
The output image without collapsing is like this:
http://i.stack.imgur.com/pHkoQ.png
The output image collapsing should be like this (collapsed_tree.eps):
http://i.stack.imgur.com/TLXd0.png
The width of the triangles should represent the branch length, and the high of the triangles must represent the number of nodes in the branch.
I have been playing with the "ape" package in R. I was able to plot a phylogenetic tree, but I still can't figure out how to collapse the branches by keywords in the labels:
require("ape")
This will load the tree:
cat("((A_1:0.05,A_2:0.03,A_3:0.2,A_4:0.1):0.9,(((B_1:0.05,B_2:0.02,B_3:0.04):0.6,(C_1:0.6,C_2:0.08):0.7):0.5,(D_1:0.3,D_2:0.4,D_3:0.5,D_4:0.7,D_5:0.4):1.2):0.5);", file = "ex.tre", sep = "\n")
tree.test <- read.tree("ex.tre")
Here should be the code to collapse
This will plot the tree:
plot(tree.test)
Your tree as it is stored in R already has the tips stored as polytomies. It's just a matter of plotting the tree with triangles representing the polytomies.
There is no function in ape to do this, that I am aware of, but if you mess with the plotting function a little bit you can pull it off
# Step 1: make edges for descendent nodes invisible in plot:
groups <- c("A", "B", "C", "D")
group_edges <- numeric(0)
for(group in groups){
group_edges <- c(group_edges,getMRCA(tree.test,tree.test$tip.label[grepl(group, tree.test$tip.label)]))
}
edge.width <- rep(1, nrow(tree.test$edge))
edge.width[tree.test$edge[,1] %in% group_edges ] <- 0
# Step 2: plot the tree with the hidden edges
plot(tree.test, show.tip.label = F, edge.width = edge.width)
# Step 3: add triangles
add_polytomy_triangle <- function(phy, group){
root <- length(phy$tip.label)+1
group_node_labels <- phy$tip.label[grepl(group, phy$tip.label)]
group_nodes <- which(phy$tip.label %in% group_node_labels)
group_mrca <- getMRCA(phy,group_nodes)
tip_coord1 <- c(dist.nodes(phy)[root, group_nodes[1]], group_nodes[1])
tip_coord2 <- c(dist.nodes(phy)[root, group_nodes[1]], group_nodes[length(group_nodes)])
node_coord <- c(dist.nodes(phy)[root, group_mrca], mean(c(tip_coord1[2], tip_coord2[2])))
xcoords <- c(tip_coord1[1], tip_coord2[1], node_coord[1])
ycoords <- c(tip_coord1[2], tip_coord2[2], node_coord[2])
polygon(xcoords, ycoords)
}
Then you just have to loop through the groups to add the triangles
for(group in groups){
add_polytomy_triangle(tree.test, group)
}
I've also been searching for this kind of tool for ages, not so much for collapsing categorical groups, but for collapsing internal nodes based on a numerical support value.
The di2multi function in the ape package can collapse nodes to polytomies, but it currently can only does this by branch length threshold.
Here is a rough adaptation that allows collapsing by a node support value threshold instead (default threshold = 0.5).
Use at your own risk, but it works for me on my rooted Bayesian tree.
di2multi4node <- function (phy, tol = 0.5)
# Adapted di2multi function from the ape package to plot polytomies
# based on numeric node support values
# (di2multi does this based on edge lengths)
# Needs adjustment for unrooted trees as currently skips the first edge
{
if (is.null(phy$edge.length))
stop("the tree has no branch length")
if (is.na(as.numeric(phy$node.label[2])))
stop("node labels can't be converted to numeric values")
if (is.null(phy$node.label))
stop("the tree has no node labels")
ind <- which(phy$edge[, 2] > length(phy$tip.label))[as.numeric(phy$node.label[2:length(phy$node.label)]) < tol]
n <- length(ind)
if (!n)
return(phy)
foo <- function(ancestor, des2del) {
wh <- which(phy$edge[, 1] == des2del)
for (k in wh) {
if (phy$edge[k, 2] %in% node2del)
foo(ancestor, phy$edge[k, 2])
else phy$edge[k, 1] <<- ancestor
}
}
node2del <- phy$edge[ind, 2]
anc <- phy$edge[ind, 1]
for (i in 1:n) {
if (anc[i] %in% node2del)
next
foo(anc[i], node2del[i])
}
phy$edge <- phy$edge[-ind, ]
phy$edge.length <- phy$edge.length[-ind]
phy$Nnode <- phy$Nnode - n
sel <- phy$edge > min(node2del)
for (i in which(sel)) phy$edge[i] <- phy$edge[i] - sum(node2del <
phy$edge[i])
if (!is.null(phy$node.label))
phy$node.label <- phy$node.label[-(node2del - length(phy$tip.label))]
phy
}
This is my answer based on phytools::phylo.toBackbone function,
see http://blog.phytools.org/2013/09/even-more-on-plotting-subtrees-as.html, and http://blog.phytools.org/2013/10/finding-edge-lengths-of-all-terminal.html. First, load the function at the end of code.
library(ape)
library(phytools) #phylo.toBackbone
library(phangorn)
cat("((A_1:0.05,E_2:0.03,A_3:0.2,A_4:0.1,A_5:0.1,A_6:0.1,A_7:0.35,A_8:0.4,A_9:01,A_10:0.2):0.9,((((B_1:0.05,B_2:0.05):0.5,B_3:0.02,B_4:0.04):0.6,(C_1:0.6,C_2:0.08):0.7):0.5,(D_1:0.3,D_2:0.4,D_3:0.5,D_4:0.7,D_5:0.4):1.2):0.5);"
, file = "ex.tre", sep = "\n")
phy <- read.tree("ex.tre")
groups <- c("A", "B|C", "D")
backboneoftree<-makebackbone(groups,phy)
# tip.label clade.label N depth
# 1 A_1 A 10 0.2481818
# 2 B_1 B|C 6 0.9400000
# 3 D_1 D 5 0.4600000
{
tryCatch(dev.off(),error=function(e){""})
par(fig=c(0,0.5,0,1), mar = c(0, 0, 2, 0))
plot(phy, main="Original" )
par(fig=c(0.5,1,0,1), oma = c(0, 0, 1.2, 0), xpd=NA, new=T)
plot(backboneoftree)
title(main="Clades")
}
makebackbone <- function(groupings,phy){
listofspecies <- phy$tip.label
listtopreserve <- character()
newedgelengths <- meandistnode<- lengthofclades<- numeric()
for (i in 1:length(groupings)){
bestmrca<-getMRCA(phy,grep(groupings[i], phy$tip.label) )
mrcatips<-phy$tip.label[unlist(phangorn::Descendants(phy,bestmrca, type="tips") )]
listtopreserve[i] <- mrcatips[1]
meandistnode[i] <- mean(dist.nodes(phy)[unlist(lapply(mrcatips,
function(x) grep(x, phy$tip.label) ) ),bestmrca] )
lengthofclades[i] <- length(mrcatips)
provtree <- drop.tip(phy,mrcatips, trim.internal=F, subtree = T)
n3 <- length(provtree$tip.label)
newedgelengths[i] <- setNames(provtree$edge.length[sapply(1:n3,function(x,y)
which(y==x),
y=provtree$edge[,2])],
provtree$tip.label)[provtree$tip.label[grep("tips",provtree$tip.label)] ]
}
newtree <- drop.tip(phy,setdiff(listofspecies,listtopreserve),
trim.internal = T)
n <- length(newtree$tip.label)
newtree$edge.length[sapply(1:n,function(x,y)
which(y==x),
y=newtree$edge[,2])] <- newedgelengths + meandistnode
trans <- data.frame(tip.label=newtree$tip.label,clade.label=groupings,
N=lengthofclades, depth=meandistnode )
rownames(trans) <- NULL
print(trans)
backboneoftree <- phytools::phylo.toBackbone(newtree,trans)
return(backboneoftree)
}
EDIT: I haven't tried this, but it might be another answer: "Script and function to transform the tip branches of a tree , i.e the thickness or to triangles, with the width of both correlating with certain parameters (e.g., species number of the clade) (tip.branches.R)"
https://www.en.sysbot.bio.lmu.de/people/employees/cusimano/use_r/index.html
I think the script is finally doing what I wanted.
From the answer that #CactusWoman provided, I changed the code a little bit so the script will try to find the MRCA that represents the largest branch that matches to my search pattern. This solved the problem of not merging non-polytomic branches, or collapsing the whole tree because one matching node was mistakenly outside the correct branch.
In addition, I included a parameter that represents the limit for the pattern abundance ratio in a given branch, so we can select and collapse/group branches that have at least 90% of its tips matching to the search pattern, for example.
library(geiger)
library(phylobase)
library(ape)
#functions
find_best_mrca <- function(phy, group, threshold){
group_matches <- phy$tip.label[grepl(group, phy$tip.label, ignore.case=TRUE)]
group_mrca <- getMRCA(phy,phy$tip.label[grepl(group, phy$tip.label, ignore.case=TRUE)])
group_leaves <- tips(phy, group_mrca)
match_ratio <- length(group_matches)/length(group_leaves)
if( match_ratio < threshold){
#start searching for children nodes that have more than 95% of descendants matching to the search pattern
mrca_children <- descendants(as(phy,"phylo4"), group_mrca, type="all")
i <- 1
new_ratios <- NULL
nleaves <- NULL
names(mrca_children) <- NULL
for(new_mrca in mrca_children){
child_leaves <- tips(tree.test, new_mrca)
child_matches <- grep(group, child_leaves, ignore.case=TRUE)
new_ratios[i] <- length(child_matches)/length(child_leaves)
nleaves[i] <- length(tips(phy, new_mrca))
i <- i+1
}
match_result <- data.frame(mrca_children, new_ratios, nleaves)
match_result_sorted <- match_result[order(-match_result$nleaves,match_result$new_ratios),]
found <- numeric(0);
print(match_result_sorted)
for(line in 1:nrow(match_result_sorted)){
if(match_result_sorted$ new_ratios[line]>=threshold){
return(match_result_sorted$mrca_children[line])
found <- 1
}
}
if(found==0){return(found)}
}else{return(group_mrca)}
}
add_triangle <- function(phy, group,phylo_plot){
group_node_labels <- phy$tip.label[grepl(group, phy$tip.label)]
group_mrca <- getMRCA(phy,group_node_labels)
group_nodes <- descendants(as(tree.test,"phylo4"), group_mrca, type="tips")
names(group_nodes) <- NULL
x<-phylo_plot$xx
y<-phylo_plot$yy
x1 <- max(x[group_nodes])
x2 <-max(x[group_nodes])
x3 <- x[group_mrca]
y1 <- min(y[group_nodes])
y2 <- max(y[group_nodes])
y3 <- y[group_mrca]
xcoords <- c(x1,x2,x3)
ycoords <- c(y1,y2,y3)
polygon(xcoords, ycoords)
return(c(x2,y3))
}
#main
cat("((A_1:0.05,E_2:0.03,A_3:0.2,A_4:0.1,A_5:0.1,A_6:0.1,A_7:0.35,A_8:0.4,A_9:01,A_10:0.2):0.9,((((B_1:0.05,B_2:0.05):0.5,B_3:0.02,B_4:0.04):0.6,(C_1:0.6,C_2:0.08):0.7):0.5,(D_1:0.3,D_2:0.4,D_3:0.5,D_4:0.7,D_5:0.4):1.2):0.5);", file = "ex.tre", sep = "\n")
tree.test <- read.tree("ex.tre")
# Step 1: Find the best MRCA that matches to the keywords or search patten
groups <- c("A", "B|C", "D")
group_labels <- groups
group_edges <- numeric(0)
edge.width <- rep(1, nrow(tree.test$edge))
count <- 1
for(group in groups){
best_mrca <- find_best_mrca(tree.test, group, 0.90)
group_leaves <- tips(tree.test, best_mrca)
groups[count] <- paste(group_leaves, collapse="|")
group_edges <- c(group_edges,best_mrca)
#Step2: Remove the edges of the branches that will be collapsed, so they become invisible
edge.width[tree.test$edge[,1] %in% c(group_edges[count],descendants(as(tree.test,"phylo4"), group_edges[count], type="all")) ] <- 0
count = count +1
}
#Step 3: plot the tree hiding the branches that will be collapsed/grouped
last_plot.phylo <- plot(tree.test, show.tip.label = F, edge.width = edge.width)
#And save a copy of the plot so we can extract the xy coordinates of the nodes
#To get the x & y coordinates of a plotted tree created using plot.phylo
#or plotTree, we can steal from inside tiplabels:
last_phylo_plot<-get("last_plot.phylo",envir=.PlotPhyloEnv)
#Step 4: Add triangles and labels to the collapsed nodes
for(i in 1:length(groups)){
text_coords <- add_triangle(tree.test, groups[i],last_phylo_plot)
text(text_coords[1],text_coords[2],labels=group_labels[i], pos=4)
}
This doesn't address depicting the clades as triangles, but it does help with collapsing low-support nodes. The library ggtree has a function as.polytomy which can be used to collapse nodes based on support values.
For example, to collapse bootstraps less than 50%, you'd use:
polytree = as.polytomy(raxtree, feature='node.label', fun=function(x) as.numeric(x) < 50)
I am using the example here for discussion:
ggplot map with l
library(rgdal)
library(ggplot2)
library(maptools)
# Data from http://thematicmapping.org/downloads/world_borders.php.
# Direct link: http://thematicmapping.org/downloads/TM_WORLD_BORDERS_SIMPL-0.3.zip
# Unpack and put the files in a dir 'data'
gpclibPermit()
world.map <- readOGR(dsn="data", layer="TM_WORLD_BORDERS_SIMPL-0.3")
world.ggmap <- fortify(world.map, region = "NAME")
n <- length(unique(world.ggmap$id))
df <- data.frame(id = unique(world.ggmap$id),
growth = 4*runif(n),
category = factor(sample(1:5, n, replace=T)))
## noise
df[c(sample(1:100,40)),c("growth", "category")] <- NA
ggplot(df, aes(map_id = id)) +
geom_map(aes(fill = growth, color = category), map =world.ggmap) +
expand_limits(x = world.ggmap$long, y = world.ggmap$lat) +
scale_fill_gradient(low = "red", high = "blue", guide = "colorbar")
Gives the following results:
I would like to map one variable to the left "half" of a country and a different variable to the right "half" of the country. I put "half" in quotes because it's not clearly defined (or at least I'm not clearly defining it). The answer by Ian Fellows might help (which gives an easy way to get the centroid). I'm hoping for something so that I can do aes(left_half_color = growth, right_half_color = category) in the example. I'm also interested in top half and bottom half if that is different.
If possible, I would also like to map the individual centroids of the halves to something.
This is a solution without ggplot that relies on the plot function instead. It also requires the rgeos package in addition to the code in the OP:
EDIT Now with 10% less visual pain
EDIT 2 Now with centroids for east and west halves
library(rgeos)
library(RColorBrewer)
# Get centroids of countries
theCents <- coordinates(world.map)
# extract the polygons objects
pl <- slot(world.map, "polygons")
# Create square polygons that cover the east (left) half of each country's bbox
lpolys <- lapply(seq_along(pl), function(x) {
lbox <- bbox(pl[[x]])
lbox[1, 2] <- theCents[x, 1]
Polygon(expand.grid(lbox[1,], lbox[2,])[c(1,3,4,2,1),])
})
# Slightly different data handling
wmRN <- row.names(world.map)
n <- nrow(world.map#data)
world.map#data[, c("growth", "category")] <- list(growth = 4*runif(n),
category = factor(sample(1:5, n, replace=TRUE)))
# Determine the intersection of each country with the respective "left polygon"
lPolys <- lapply(seq_along(lpolys), function(x) {
curLPol <- SpatialPolygons(list(Polygons(lpolys[x], wmRN[x])),
proj4string=CRS(proj4string(world.map)))
curPl <- SpatialPolygons(pl[x], proj4string=CRS(proj4string(world.map)))
theInt <- gIntersection(curLPol, curPl, id = wmRN[x])
theInt
})
# Create a SpatialPolygonDataFrame of the intersections
lSPDF <- SpatialPolygonsDataFrame(SpatialPolygons(unlist(lapply(lPolys,
slot, "polygons")), proj4string = CRS(proj4string(world.map))),
world.map#data)
##########
## EDIT ##
##########
# Create a slightly less harsh color set
s_growth <- scale(world.map#data$growth,
center = min(world.map#data$growth), scale = max(world.map#data$growth))
growthRGB <- colorRamp(c("red", "blue"))(s_growth)
growthCols <- apply(growthRGB, 1, function(x) rgb(x[1], x[2], x[3],
maxColorValue = 255))
catCols <- brewer.pal(nlevels(lSPDF#data$category), "Pastel2")
# and plot
plot(world.map, col = growthCols, bg = "grey90")
plot(lSPDF, col = catCols[lSPDF#data$category], add = TRUE)
Perhaps someone can come up with a good solution using ggplot2. However, based on this answer to a question about multiple fill scales for a single graph ("You can't"), a ggplot2 solution seems unlikely without faceting (which might be a good approach, as suggested in the comments above).
EDIT re: mapping centroids of the halves to something: The centroids for the east ("left") halves can be obtained by
coordinates(lSPDF)
Those for the west ("right") halves can be obtained by creating an rSPDF object in a similar way:
# Create square polygons that cover west (right) half of each country's bbox
rpolys <- lapply(seq_along(pl), function(x) {
rbox <- bbox(pl[[x]])
rbox[1, 1] <- theCents[x, 1]
Polygon(expand.grid(rbox[1,], rbox[2,])[c(1,3,4,2,1),])
})
# Determine the intersection of each country with the respective "right polygon"
rPolys <- lapply(seq_along(rpolys), function(x) {
curRPol <- SpatialPolygons(list(Polygons(rpolys[x], wmRN[x])),
proj4string=CRS(proj4string(world.map)))
curPl <- SpatialPolygons(pl[x], proj4string=CRS(proj4string(world.map)))
theInt <- gIntersection(curRPol, curPl, id = wmRN[x])
theInt
})
# Create a SpatialPolygonDataFrame of the western (right) intersections
rSPDF <- SpatialPolygonsDataFrame(SpatialPolygons(unlist(lapply(rPolys,
slot, "polygons")), proj4string = CRS(proj4string(world.map))),
world.map#data)
Then information could be plotted on the map according to the centroids of lSPDF or rSPDF:
points(coordinates(rSPDF), col = factor(rSPDF#data$REGION))
# or
text(coordinates(lSPDF), labels = lSPDF#data$FIPS, cex = .7)