In R, how can I make the branches of a classification tree not overlap in a plot?

I have a tree with a lot of branches. Here is my code to plot the tree. The problem is that the labels overlap each other, especially towards the bottom of the tree. Is there any way to plot the tree so that the labels don't overlap?
library(tree)
par(mfrow = c(1, 1))
plot(prunedTree, type = "uniform")
text(prunedTree)
Note: I used type = "uniform" because it improved the readability of the lower branches. Also, prunedTree is of class "tree" from the tree package.
Here's a sample of what is being produced currently.
EDIT: Code to fully reproduce the issue.
library(tree)
# load the Samsung activity data
load(url("https://spark-public.s3.amazonaws.com/dataanalysis/samsungData.rda"))
samsungData$subject <- factor(samsungData$subject)
samsungData$activity <- factor(samsungData$activity)
# drop columns with duplicated names, then strip the dots from the remaining names
samsungData <- samsungData[, !duplicated(names(samsungData))]
names(samsungData) <- gsub("[.]", "", names(samsungData))
samsungData <- data.frame(samsungData)
# train on subjects 1, 3, 5 and 6 only
trainDF <- samsungData[samsungData$subject %in% c(1, 3, 5, 6), ]
tree1 <- tree(activity ~ ., data = trainDF)
plot(tree1)
text(tree1)

You have several general options:
Use a wider graphics device (e.g. png(...,width = 1200,height = ...)).
Shrink the text using cex = 0.5 (or smaller) in the call to text().
Use more concise column (i.e. variable) names
Some combination of the previous three.
I thought I could get text.tree to use fewer significant digits in labeling the splits, but I can't seem to do that. rpart appears to use only 4 digits by default, so that would save you some space as well.
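A minimal sketch combining the first three options. It assumes the rpart package (shipped with R as a recommended package) and the built-in iris data in place of the question's tree object; with the tree package the same plot()/text() pattern applies, since text.tree() also passes cex on to text(). The file name and device size are arbitrary choices.

```r
library(rpart)  # recommended package, installed with R

# Shorter variable names: abbreviate() keeps the abbreviations unique
dat <- iris
names(dat)[1:4] <- abbreviate(names(dat)[1:4], minlength = 4)

fit <- rpart(Species ~ ., data = dat)

# A wider device: pdf() sizes are in inches (png() sizes are in pixels)
out <- file.path(tempdir(), "wide_tree.pdf")
pdf(out, width = 16, height = 8)
plot(fit, uniform = TRUE, margin = 0.05)  # uniform spacing, a little room for labels
# Smaller label text
text(fit, cex = 0.6, use.n = TRUE)
invisible(dev.off())
```

Open the saved file in a viewer rather than the RStudio plot pane, which rescales the plot to its own size.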

In addition to joran's suggestions above, you can play with these parameters:
srt to rotate your text
col to give the text different colors
For example:
plot(tree1)
text(tree1, col = rainbow(5), srt = 85, cex = 0.8)  # col is recycled over the labels

Related

iGraph - Spacing between vertices

I have a dataset called data. The data is not that important, but every interaction has a name. I want to create a graph in iGraph with the following code:
library(dplyr)   # assumed source of count()
library(igraph)
tab <- count(data, B, S, K)
factors <- table(interaction(tab$B, tab$K), interaction(tab$S, tab$K))
graph1 <- graph_from_incidence_matrix(factors)
plot(graph1, vertex.size = 40, layout = layout.bipartite)
However, I get the following:
All the interaction names are completely mixed together. I can make it a little more readable by lowering vertex.size, but I would like a proper solution to the problem.
I want to create more space between the vertices, but I cannot seem to find the right way.
I have tried arranging the graph with tkplot, but it is annoying to have to sort the vertices by hand each time.
Best regards

Controlling margins in a genoPlotR plot_gene_map

I'm producing a plot_gene_map figure with the genoPlotR R package, which draws a horizontal phylogenetic tree with a genomic segment aligned to each leaf.
Here's a simple example that illustrates my usage and problem:
The plot_gene_map function requires a phylog object from the ade4 package, which represents the phylogenetic tree:
tree <- ade4::newick2phylog("(((A:0.08,B:0.075):0.028,(C:0.06,D:0.06):0.05):0.0055,E:0.1);")
A list of genoPlotR's dna_seg objects (which are essentially data.frames with specific columns), where the names of the list elements have to match the names of the leaves of tree:
dna.segs.list <- list(A=genoPlotR::as.dna_seg(data.frame(name=paste0("VERY.LONG.NAME.A.",1:10),start=seq(1,91,10),end=seq(5,95,10),strand=1,col="black",ly=1,lwd=1,pch=1,cex=1,gene_type="blocks",fill="red")),
B=genoPlotR::as.dna_seg(data.frame(name=paste0("VERY.LONG.NAME.B.",1:10),start=seq(1,91,10),end=seq(5,95,10),strand=1,col="black",ly=1,lwd=1,pch=1,cex=1,gene_type="blocks",fill="blue")),
C=genoPlotR::as.dna_seg(data.frame(name=paste0("VERY.LONG.NAME.C.",1:10),start=seq(1,91,10),end=seq(5,95,10),strand=1,col="black",ly=1,lwd=1,pch=1,cex=1,gene_type="blocks",fill="green")),
D=genoPlotR::as.dna_seg(data.frame(name=paste0("VERY.LONG.NAME.D.",1:10),start=seq(1,91,10),end=seq(5,95,10),strand=1,col="black",ly=1,lwd=1,pch=1,cex=1,gene_type="blocks",fill="yellow")),
E=genoPlotR::as.dna_seg(data.frame(name=paste0("VERY.LONG.NAME.E.",1:10),start=seq(1,91,10),end=seq(5,95,10),strand=1,col="black",ly=1,lwd=1,pch=1,cex=1,gene_type="blocks",fill="orange")))
And a list of genoPlotR's annotation objects, which give coordinate information, also named according to the tree leaves:
annotation.list <- lapply(1:5,function(s){
mids <- genoPlotR::middle(dna.segs.list[[s]])
return(genoPlotR::annotation(x1=mids,x2=NA,text=dna.segs.list[[s]]$name,rot=30,col="black"))
})
names(annotation.list) <- names(dna.segs.list)
And the call to the function is:
genoPlotR::plot_gene_map(dna_segs=dna.segs.list,tree=tree,tree_width=2,annotations=annotation.list,annotation_height=1.3,annotation_cex=0.9,scale=FALSE,dna_seg_scale=FALSE)
Which gives:
As you can see the top and right box (gene) names get cut off.
I tried adjusting pdf()'s width and height when saving the figure to a file, as well as the margins via par's mar, but neither had any effect.
Any idea how to display this plot without getting the names cut off?
Currently genoPlotR's plot_gene_map does not implement a legend option. Any idea how I can add a legend, say one that shows these colors as squares next to these labels:
data.frame(label = c("A","B","C","D","E"), color = c("red","blue","green","yellow","orange"))
Glad that you like genoPlotR.
There is no really elegant solution to your problem, but here are a few things you can try:
- increase annotation_height and reduce annotation_cex
- increase rotation (“rot”) in the annotation function
- use xlims to artificially increase the length of the dna_seg (but that’s a bad hack)
For the rest (including the legend), you’ll have to use grid and its viewports.
A blend of the first 3 solutions:
annotation.list <- lapply(1:5,function(s){
mids <- genoPlotR::middle(dna.segs.list[[s]])
return(genoPlotR::annotation(x1=mids, x2=NA, text=dna.segs.list[[s]]$name,rot=75,col="black"))
})
names(annotation.list) <- names(dna.segs.list)
genoPlotR::plot_gene_map(dna_segs=dna.segs.list,tree=tree,tree_width=2,annotations=annotation.list,annotation_height=5,annotation_cex=0.4,scale=FALSE,dna_seg_scale=FALSE, xlims=rep(list(c(0,110)),5))
For the better solution with grid (note the plot_new=FALSE in the call to plot_gene_map):
library(grid)  # viewport(), pushViewport(), upViewport(), grid.layout(), unit()
# changing rot to 30
annotation.list <- lapply(1:5,function(s){
mids <- genoPlotR::middle(dna.segs.list[[s]])
return(genoPlotR::annotation(x1=mids,x2=NA,text=dna.segs.list[[s]]$name,rot=30,col="black"))
})
names(annotation.list) <- names(dna.segs.list)
# start a fresh page before pushing viewports
grid.newpage()
# main viewport: two columns, relative widths 1 and 0.3
pushViewport(viewport(layout=grid.layout(1,2, widths=unit(c(1, 0.3), rep("null", 2))), name="overall_vp"))
# viewport with gene_map
pushViewport(viewport(layout.pos.col=1, name="geneMap"))
genoPlotR::plot_gene_map(dna_segs=dna.segs.list,tree=tree,tree_width=2,annotations=annotation.list,annotation_height=3,annotation_cex=0.5,scale=FALSE,dna_seg_scale=FALSE, plot_new=FALSE)
upViewport()
# another viewport for the margin/legend
pushViewport(viewport(layout.pos.col=2, name="legend"))
plotLegend(…)
upViewport()
Hope that helps!
Lionel
Which function or package could I use to add the legend? The base R functions did not seem to work for me. The following message is displayed:
Error in strheight(legend, units = "user", cex = cex) :
plot.new has not been called yet
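The error most likely comes from mixing graphics systems: base legend() (which calls strheight()) needs a base-graphics plot started with plot.new(), while plot_gene_map draws with grid, as the answer's use of viewports suggests. Below is a minimal sketch of a grid-only legend, meant to be drawn inside the right-hand "legend" viewport from the answer's layout. draw_grid_legend is a hypothetical helper name, not a genoPlotR or grid function, and the row spacing and box size are arbitrary. grid's own grid.legend() may also be worth trying.

```r
library(grid)

# Hypothetical helper (not part of genoPlotR or grid): draws one filled square
# plus one text label per entry, stacked downwards from the top of the
# current viewport. Everything is drawn with grid, so no plot.new() is needed.
draw_grid_legend <- function(labels, colors, cex = 0.8) {
  labels <- as.character(labels)
  colors <- as.character(colors)
  stopifnot(length(labels) == length(colors))
  n <- length(labels)
  # one row every 1.5 text lines, measured down from the top edge
  y <- unit(1, "npc") - unit(1.5 * seq_len(n), "lines")
  grid.rect(x = unit(1, "lines"), y = y,
            width = unit(0.8, "lines"), height = unit(0.8, "lines"),
            just = c("left", "centre"),
            gp = gpar(fill = colors, col = "black"))
  grid.text(labels, x = unit(2.5, "lines"), y = y,
            just = c("left", "centre"), gp = gpar(cex = cex))
  invisible(n)
}

# Usage, mirroring the answer's two-column layout; a dashed rectangle
# stands in for plot_gene_map(..., plot_new = FALSE) in the left column.
legend_df <- data.frame(label = c("A", "B", "C", "D", "E"),
                        color = c("red", "blue", "green", "yellow", "orange"))
out <- file.path(tempdir(), "legend_sketch.pdf")
pdf(out, width = 8, height = 5)
grid.newpage()
pushViewport(viewport(layout = grid.layout(1, 2, widths = unit(c(1, 0.3), "null")),
                      name = "overall_vp"))
pushViewport(viewport(layout.pos.col = 1, name = "geneMap"))
grid.rect(gp = gpar(lty = "dashed"))  # placeholder for the gene map
upViewport()
pushViewport(viewport(layout.pos.col = 2, name = "legend"))
n_drawn <- draw_grid_legend(legend_df$label, legend_df$color)
upViewport()
invisible(dev.off())
```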

Selecting clusters below a certain height in a dendrogram R but only if the cluster is bigger than one

I'm looking to write some simple code that will select for certain clusters below a threshold height and highlight them (either with a box or by colour).
So far I have used cutree, which selects the clusters I am after, but it also selects all the clusters of size 1.
I've managed to use which to select the clusters I actually want, but since this is only a very small section of my data, I don't want to have to go through and choose them manually. Is there a way to cut the tree but only select clusters with more than one member?
This is the code I'm using at the moment:
plot(hClust,hang = -1,cex=0.5)
abline(h= 0.0018,col = 'blue')
ct <- cutree(hClust, h = 0.0018)
clust <- rect.hclust(hClust, h=0.0018, which = c(1,2,4,8,23))
You do not provide your data, so I will illustrate with the built-in mtcars data. Of course, the heights are different from yours. Same set-up as your problem:
hClust <- hclust(dist(mtcars))
plot(hClust,hang = -1, cex=0.8)
abline(h= 28,col = 'blue')
Now we can call rect.hclust with border=0 (so the boxes are drawn in the background colour and stay invisible) to get the clusters numbered the way rect.hclust sees them. Then we can select the clusters with more than one point and put boxes around those.
clust <- rect.hclust(hClust, h=28, border=0)
NumMemb <- sapply(clust, length)
clust <- rect.hclust(hClust, h=28, which=which(NumMemb>1))
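The first rect.hclust call is needed because its which argument counts clusters from left to right in the dendrogram (per ?rect.hclust), whereas cutree() numbers them in order of the observations. A minimal sketch that computes the same selection up front, with the answer's mtcars set-up and h = 28; reordering cutree's output by hClust$order is my own step, not something from the answer.

```r
hClust <- hclust(dist(mtcars))
h_cut <- 28

ct <- cutree(hClust, h = h_cut)           # cluster id per observation (data order)
dendro_order <- unique(ct[hClust$order])  # cluster ids in left-to-right dendrogram order
sizes <- as.vector(table(ct)[as.character(dendro_order)])  # sizes in that same order
keep <- which(sizes > 1)                  # positions as rect.hclust's `which` counts them

out <- file.path(tempdir(), "hclust_boxes.pdf")
pdf(out, width = 10, height = 6)
plot(hClust, hang = -1, cex = 0.8)
abline(h = h_cut, col = "blue")
# returns (invisibly) the members of each boxed cluster
boxed <- rect.hclust(hClust, h = h_cut, which = keep, border = "red")
invisible(dev.off())
```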

Node labels on circular phylogenetic tree

I am trying to create a circular phylogenetic tree. I have this part of the code:
library(ape)  # as.phylo() and the plot method for phylo objects
fit <- hclust(dist(Data[,-4]), method = "complete", members = NULL)
nclus= 3
color=c('red','blue','green')
color_list=rep(color,nclus/length(color))
clus=cutree(fit,nclus)
plot(as.phylo(fit),type='fan',tip.color=color_list[clus],label.offset=0.2,no.margin=TRUE, cex=0.70, show.node.label = TRUE)
And this is the result:
I am also trying to show a label for each node and to color the branches. Any suggestions on how to do that?
Thanks!
When you say "color branches" I assume you mean color the edges. This seems to work, but I have to think there's a better way.
Using the built-in mtcars dataset here, since you did not provide your data.
library(ape)  # as.phylo()
plot.fan <- function(hc, nclus=3) {
palette <- c('red','blue','green','orange','black')[1:nclus]
clus <-cutree(hc,nclus)
X <- as.phylo(hc)
edge.clus <- sapply(1:nclus,function(i)max(which(X$edge[,2] %in% which(clus==i))))
order <- order(edge.clus)
edge.clus <- c(min(edge.clus),diff(sort(edge.clus)))
edge.clus <- rep(order,edge.clus)
plot(X,type='fan',
tip.color=palette[clus],edge.color=palette[edge.clus],
label.offset=0.2,no.margin=TRUE, cex=0.70)
}
fit <- hclust(dist(mtcars[,c("mpg","hp","wt","disp")]))
plot.fan(fit,3); plot.fan(fit,5)
Regarding "label the nodes", if you mean label the tips, it looks like you've already done that. If you want different labels, unfortunately, unlike plot.hclust(...) the labels=... argument is rejected. You could experiment with the tiplabels(....) function, but it does not seem to work very well with type="fan". The labels come from the row names of Data, so your best bet IMO is to change the row names prior to clustering.
If you actually mean label the nodes (the connection points between the edges), have a look at nodelabels(...). I don't provide a working example because I can't imagine what labels you would put there.
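A minimal sketch of the row-name route suggested above, using the same mtcars columns as the answer. The abbreviate() relabelling is a hypothetical choice of new labels, and the fan plot is only drawn if ape is installed.

```r
# Hypothetical relabelling: shorten the car names before clustering
df <- mtcars[, c("mpg", "hp", "wt", "disp")]
rownames(df) <- unname(abbreviate(rownames(df), minlength = 6))  # stays unique

fit <- hclust(dist(df))  # dist() carries the row names into fit$labels

# as.phylo() turns fit$labels into tip labels; draw only if ape is available
if (requireNamespace("ape", quietly = TRUE)) {
  out <- file.path(tempdir(), "fan_tree.pdf")
  pdf(out)
  plot(ape::as.phylo(fit), type = "fan", cex = 0.7, no.margin = TRUE)
  invisible(dev.off())
}
```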

How to plot a large ctree() to avoid overlapping nodes

When I plotted the decision tree result from ctree() from the party package, the font was too big and the boxes were also too big. They overlap other nodes.
Is there a way to customize the output from plot() so that the boxes and the font are smaller?
The short answer seems to be no, you cannot change the font size, but there are some other good options.
I know of three possible solutions. First, you can change other parameters in the plot to make it more compact. Second, you can write it to a graphic file and view that file. Third, you can use an alternative implementation of ctree() in the partykit package, which is a newer package by some of the same authors.
Default Plot Example
library(party)
airq <- subset(airquality, !is.na(Ozone))
airct <- ctree(Ozone ~ ., data = airq,
controls = ctree_control(maxsurrogate = 3))
plot(airct) # default plot, some crowding with N hidden on leaves
Simplified plot
# simpler version of plot
plot(airct, type="simple", # no terminal plots
inner_panel=node_inner(airct,
abbreviate = TRUE, # short variable names
pval = FALSE, # no p-values
id = FALSE), # no id of node
terminal_panel=node_terminal(airct,
abbreviate = TRUE,
digits = 1, # few digits on numbers
fill = c("white"), # make box white not grey
id = FALSE)
)
This is somewhat better, and one might be able to improve it further. To figure out these details, I originally ran class(airct), which returned "BinaryTree". Armed with this info, I started reading ?plot.BinaryTree.
Write to a file
A second simple solution is to write the plot to a file and then view the file. You may need to play with the settings to find the best fit.
png("airct.png", res=80, height=800, width=1600)
plot(airct)
dev.off()
Plot with partykit package instead
Finally, you can use a newer and not-yet-finished re-implementation of the party package by some of the same authors. At this point (Dec 2012), the only function they have re-done is ctree(). This version allows you to change font size.
library(partykit)
airct <- ctree(Ozone ~ ., data = airq)
class(airct) # different class from before
# "constparty" "party"
plot(airct, gp = gpar(fontsize = 6), # font size changed to 6
inner_panel=node_inner,
ip_args=list(
abbreviate = TRUE,
id = FALSE)
)
Here I have left the leaves in their default setting because, frankly, I have never figured out how to get them to work the way I want. I suspect this has to do with the package being incomplete (as of Dec 2012). You can read about the plot method starting with ?plot.party.
Another option (which doesn't change what you asked about, but can potentially solve the underlying problem) is to change the size of the figure itself, as I learned in a class assignment. In an R Markdown document, replace the chunk header
{r}
with:
{r, fig.width=X, fig.height=Y}
where X and Y are numbers (in inches) that you choose, depending on what size works best.
This website talks about doing this in more detail, including how to set it globally for the whole document.
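To set the size once for every chunk instead of in each chunk header, a common approach is a setup chunk that calls knitr's opts_chunk$set(). A minimal sketch, with 12 by 8 inches as arbitrary placeholder values:

```r
# Typically placed in the first chunk of the .Rmd file (e.g. one labelled setup).
# Later chunks inherit these defaults unless their own header overrides them.
knitr::opts_chunk$set(fig.width = 12, fig.height = 8)
```

A chunk that still needs a different size can override the default locally with {r, fig.width=X, fig.height=Y}.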

Resources