How to plot dendrograms with large datasets? - r

I am using the ape (Analyses of Phylogenetics and Evolution) package in R, which has dendrogram drawing functionality. I use the following commands to read the data in Newick format and draw a dendrogram using the plot function:
library("ape")
gcPhylo <- read.tree(file = "gc.tree")
plot(gcPhylo, show.node.label = TRUE)
As the data set is quite large, it is impossible to see any details in the lower levels of the tree. I can see only a few levels from the top; below that there are just black areas with no detail.
I was wondering if the plot function has any zoom capability. I tried to limit the area using xLim and yLim; however, they just limit the area and do not zoom to make the details visible. Either zooming, or making the details visible without zooming, would solve my problem.
I would also appreciate hearing about any other package, function, or tool that would help me overcome the problem.
Thanks.

It is possible to cut a dendrogram at a specified height and plot the elements:
First create a clustering using the built-in dataset USArrests. Then convert to a dendrogram:
hc <- hclust(dist(USArrests))
hcd <- as.dendrogram(hc)
Next, use cut.dendrogram to cut at a specified height, in this case h=75. This returns a list with two components: $upper, a dendrogram for the part of the tree above the cut, and $lower, a list of dendrograms, one for each branch below the cut:
par(mfrow=c(3,1))
plot(hcd, main="Main")
plot(cut(hcd, h=75)$upper,
main="Upper tree of cut at h=75")
plot(cut(hcd, h=75)$lower[[2]],
main="Second branch of lower tree with cut at h=75")
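A minimal sketch of the same idea that calls cut() once and then plots every lower branch on its own page, so each subtree gets the full plotting area. The output file name and the page size are assumptions for illustration only.

```r
# Build the example dendrogram from the built-in USArrests data
hc  <- hclust(dist(USArrests))
hcd <- as.dendrogram(hc)

# Cut once and reuse the result:
# $upper is a dendrogram, $lower is a list of dendrograms
parts <- cut(hcd, h = 75)

# Hypothetical output file; every plot() call starts a new PDF page
out_file <- file.path(tempdir(), "branches.pdf")
pdf(out_file, width = 10, height = 6)
plot(parts$upper, main = "Upper tree of cut at h=75")
for (i in seq_along(parts$lower)) {
  plot(parts$lower[[i]], main = paste("Branch", i, "below h=75"))
}
dev.off()
```

Paging through the resulting file shows one branch at a time, which is often easier to read than three panels squeezed onto one device.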

The cut function described in the other answer is a very good solution; if you would like to maintain the whole tree on one page for some interactive investigation you could also plot to a large page on a PDF.
The resulting PDF is vectorized so you can zoom in closely with your favourite PDF viewer without loss of resolution.
Here's an example of how to direct plot output to PDF:
# Open a PDF for plotting; units are inches by default
pdf("/path/to/a/pdf/file.pdf", width=40, height=15)
# Do some plotting
plot(gcPhylo)
# Close the PDF file's associated graphics device (necessary to finalize the output)
dev.off()
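As a rough sketch of how to pick the page size, the height can be scaled with the number of leaves so that each label gets a fixed amount of room. The example uses a base R dendrogram drawn horizontally as a stand-in for the tree; the 0.15 inch per leaf, the minimum height, and the file name are assumptions to tune, not values from the answer above.

```r
# Stand-in tree: a dendrogram built from the built-in USArrests data
hcd <- as.dendrogram(hclust(dist(USArrests)))

n_leaves      <- attr(hcd, "members")   # number of leaves in the tree
inch_per_leaf <- 0.15                   # assumed vertical room per label
page_height   <- max(7, n_leaves * inch_per_leaf)

# Hypothetical output file in the session's temporary directory
out_file <- file.path(tempdir(), "tall_tree.pdf")
pdf(out_file, width = 10, height = page_height)
par(mar = c(3, 1, 2, 8))                # wide right margin for the leaf labels
plot(hcd, horiz = TRUE, main = "Page height scaled to the number of leaves")
dev.off()
```

The same arithmetic should carry over to a phylo object by replacing the leaf count with the number of tips.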

Related

R igraph output: vertices are not shown

I am using the R igraph package to display gene networks. The plot in RStudio looks like this (I can't post an image because I am a new user and don't have enough reputation, sorry about that):
R igraph on preview
Now I want to draw this to a file to see the changes clearly, but there is always a problem with vertices near the margins, like this:
part of output pdf file
My code is as follows:
pdf("graph.pdf",width = 20, height = 10)
par(mar = c(9,9,9,9))
plot(finalnet, edge.arrow.size=0.1, edge.curved=FALSE,vertex.size= 3, margin = -0.5)
dev.off()
Update: I have tried square layout and the problem persists, here is my plotting object and square plot.
square plot
rda file for my igraph object
Can anyone give me a suggestion on how to solve this issue? The whole net has about 170 vertices, but I don't know why it cannot be displayed well in the output file. I have tried different plot options in mai and mar, but nothing seems to work.
You are getting this behavior because you are specifying margin in your plot call. margin=-0.5 shrinks the plotted coordinate range by 0.5 units on each side, which pushes the outer vertices past the edge of the plotting region, where they are clipped. Below are three examples:
Your original plotting call; notice the clipping:
pdf("withMargin.pdf")
par(mar=c(9,9,9,9))
plot(g, margin=-0.5)
dev.off()
Without the call to par, the problem still persists, but now you use the entire area of the graphics device:
png("withoutPar_Margin.png")
#par(mar=c(9,9,9,9))
plot(g, margin=-0.5)
dev.off()
Lastly, removing the margin argument from the plot call:
png("withoutplotMargin.png")
par(mar=c(9,9,9,9))
plot(g)
dev.off()
You're specifying a rectangular size for what looks like a square object. Try a square size, as in
pdf("graph.pdf")
This will use the defaults, which are square.
But, it's hard to know for sure since you haven't given us the object to troubleshoot for you.

Link tip labels to phylogenetic tree using dots and fix overcrowded tip labels

I'm attempting to produce an ancestral reconstruction using the ape and phytools packages in RStudio. My problem is that the tip labels (species names) in my phylogenetic tree are overcrowded and illegible. Currently, my tree has a dataset of 262 species.
An example nexus file of the data can be found here.
The Ancestral reconstruction tree I have produced so far is here: http://i.imgur.com/WFoEu7S.png.
Each species has a character state of 0 or 1, and the tree has node and tip labels showing each state. Eventually I'd like to color the branches by their respective character state (which I have as either red or black).
Ideally, I wish to produce a non-ultrametric tree similar to the one in a previous question on Stack Overflow, in this link here.
I've tried implementing the R code from this link for my own tree with little success.
Below is my code in R. I am still learning R and am unfamiliar with certain plotting methods and suspect that may be the issue here:
tree = read.nexus("test_nexus")
dichot_tree = multi2di(tree)
dichot_tree$edge.length<-runif(n=nrow(dichot_tree$edge),min=0,max=1)
dichot_tree$edge.length[dichot_tree$edge.length<1]<-1
domest = read.nexus.data("test_nexus")
aceDISCRETE<-ace(as.numeric(domest), dichot_tree, type="discrete")
plot(dichot_tree, cex=0.5, label.offset=1, no.margin=TRUE)
tiplabels(pch=22, bg=as.numeric(domest),cex=1, adj=1)
nodelabels(pie=aceDISCRETE$lik.anc, piecol=c("black", "red"), cex=0.25)
There are a couple of possible ways to make the tip labels more readable.
First, you could decrease the font size (that would be the cex parameter of the plot function).
Second, you are using RStudio, and it looks like you currently have your plot area as a square.
You can adjust the different panels to make the plot area a very tall rectangle, which would extend your tree when you plot it.
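Alternatively, you could create an external plot window for which you can specify height and width. (I use windows(), which is available only on Windows; dev.new() with width and height arguments is a more portable option.)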
Alternatively, when saving a plot in RStudio, you should be able to change the output height/width/aspect ratio. You should be able to make it much taller here as well.
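A small sketch of the "make it much taller" suggestion, writing straight to a PDF whose height grows with the number of tips. A random tree from ape's rtree() stands in for the question's data; the 0.12 inch per tip and the file name are assumptions. The align.tip.label argument is assumed to be supported by the installed version of ape; where it is, it lines the labels up and joins each one to its tip with a dotted line, and it can simply be dropped otherwise.

```r
library(ape)

set.seed(42)
tree <- rtree(262)   # random stand-in tree with as many tips as in the question

# Hypothetical output file; the page is tall so that labels do not overlap
out_file <- file.path(tempdir(), "tall_phylo.pdf")
pdf(out_file, width = 8, height = 0.12 * Ntip(tree))
plot(tree,
     cex = 0.5,               # smaller font for the tip labels
     no.margin = TRUE,        # use the whole page
     align.tip.label = TRUE)  # assumed available: aligned labels with dotted lines
dev.off()
```

Calls such as tiplabels() and nodelabels() would go between plot() and dev.off(), as in the question's code.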

plot igraph in a big area

Just wondering if it is possible to increase the size of the plot so that the nodes and edges can be more scattered over the plot.
Original plot:
What is expected:
I tried many parameters of the layout function, such as area, niter, and so on, but none of them work. By the way, I am using the 'igraph' package in R.
If you are referring to the actual size of the produced output (pdf, png, etc.), you can configure it with the width and height parameters. Check this link for png, bmp, etc., and this link for the PDF format.
An MWE is something like this:
png("mygraph.png", height=400, width=600)
#functions to plot your graph
dev.off()
If you are referring to the size of the graphic produced by the layout function, as @MrFlick mentioned, you should check the parameters of the particular layout you are using.
Hope it helps you.
In your second graph, the graph can obviously be divided into several clusters (or sections). If I understood you correctly, you want a layout that separates your clusters more visibly.
Then you can draw this by calculating a two-level layout:
First, calculate the layout of the graph in order to find a place for each cluster.
Second, calculate the layout in each cluster according to first step and plot nodes in the corresponding place.
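The two steps above can be sketched as follows. This is only one possible reading of the idea, not the answerer's code: the random graph from sample_islands(), the choice of Louvain clustering, the Fruchterman-Reingold layout at both levels, and the spread factor of 5 are all assumptions.

```r
library(igraph)
set.seed(1)

# Stand-in graph: 4 dense groups of 10 vertices with a few links between groups
g <- sample_islands(4, 10, 0.6, 2)

# Any community detection method could be used here; Louvain is an assumption
memb <- as.integer(membership(cluster_louvain(g)))
n_clusters <- max(memb)

# Step 1: collapse each cluster into one vertex and lay out the clusters
cluster_graph <- simplify(contract(g, memb))
centers <- layout_with_fr(cluster_graph)
centers <- norm_coords(centers, -1, 1, -1, 1) * 5   # 5 = assumed spread

# Step 2: lay out each cluster on its own, then shift it to its cluster's place
coords <- matrix(0, nrow = vcount(g), ncol = 2)
for (k in seq_len(n_clusters)) {
  idx   <- which(memb == k)
  local <- layout_with_fr(induced_subgraph(g, idx))
  local <- norm_coords(local, -1, 1, -1, 1)
  coords[idx, ] <- sweep(local, 2, centers[k, ], "+")
}

# Hypothetical output file
out_file <- file.path(tempdir(), "two_level_layout.pdf")
pdf(out_file, width = 10, height = 10)
plot(g, layout = coords, vertex.size = 5, vertex.label = NA, vertex.color = memb)
dev.off()
```

Increasing the spread factor relative to the size of each local layout moves the clusters further apart.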

Change plot size of pairs plot in R

I have this pairs plot
I want to make this plot bigger, but I don't know how.
I've tried
window.options(width = 800, height = 800)
But nothing changes.
Why?
That thing's huge. I would send it to a pdf.
> pdf(file = "yourPlots.pdf")
> plot(...) # your plot
> dev.off() # important!
Also, there is an answer to the window sizing issue in this post.
If your goal is to explore the pairwise relationships between your variables, you could consider using the shiny interface from the pairsD3 R package, which provides a way to interact with (potentially large) scatter plot matrices by selecting a few variables at a time.
An example with the iris data set:
install.packages("pairsD3")
require("pairsD3")
shinypairs(iris)
More reference here
I had the same problem with the pairs() function. Unfortunately, I couldn't find a direct answer to your question.
However, something that could help is to plot only a selected number of variables. For this, you can subset the default plot; refer to this answer I received on a different question.
Alternatively, you can use the pairs2 function which I came across through this post.
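A minimal sketch of the subsetting idea with the built-in iris data: only a few columns are passed to pairs(), and the result is written to a large PDF page. The chosen columns, page size, and file name are assumptions for illustration.

```r
# Assumed selection of variables to look at together
vars <- c("Sepal.Length", "Sepal.Width", "Petal.Length")

# Hypothetical output file; pdf() sizes are given in inches
out_file <- file.path(tempdir(), "pairs_subset.pdf")
pdf(out_file, width = 12, height = 12)
pairs(iris[, vars], cex = 0.5)   # smaller points keep dense panels readable
dev.off()
```

Repeating this for a few groups of variables gives several readable matrices instead of one crowded one.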
To make the plot bigger, write it to a file. I found that a PDF file works well for this. If you look at ?pdf, you will see that it comes with height and width options. Note that these are measured in inches, not pixels, so for something this big I suggest a large page such as 60 inches for both the height and width. For example:
pdf("pairs.pdf", height=60, width=60)
pairs(my_data, cex=0.05)
dev.off()
The "cex=0.05" is to handle a second issue here: The points in the array of scatter plots are way too big. This will make them small enough to show the arrangements in the embedded scatter plots.
The labels not fitting into the diagonal boxes is resolved by the increased plot size. It could also be handled by changing the font size.

Formatting Issue with barchart() of Cluster Analysis

I've created a segment profile plot of my cluster analysis but I'm having an issue with the formatting of a barchart() command. Here is the chart I created. The obvious issue is that my lines are too close together to read.
Here you can see the code I used to create this chart. Can someone tell me what to add in order to make this chart readable? Below is an example of my code used.
R code for reproducing the clustering and PCA we used:
## if not installed, install: install.packages("flexclust")
library("flexclust")
load("vacpref.RData")
cl6 <- kcca(vacpref, k=vacpref6, control=list(iter=0),
simple=FALSE, save.data=TRUE)
summary(cl6)
## hierarchical clustering of the variables
varhier <- hclust(dist(t(vacpref)), "ward")
par(mar=c(0,0,0,15))
plot(as.dendrogram(varhier), xlab="", horiz=TRUE,yaxt="n")
## principal component projection
vacpca <- prcomp(vacpref)
R code for generating the Segment Separation Plot
pairs(cl6, project=vacpca, which=1:3, asp=TRUE,points=FALSE,
hull.args=list(density=10))
R code for generating the Segment Positioning Plot:
col <- flxColors(1:6)
col[c(1,3)] <- flxColors(1:4, "light")[c(1,3)]
par(mar=rep(0,4))
plot(cl6, project=vacpca, which=2:3,
col=col,asp=TRUE,points=F,hull.args=list(density=10),axes=FALSE)
projAxes(vacpca, minradius=.5, which=2:3, lwd=2, col="darkblue")
R code for generating the Segment Profile Plot:
barchart(cl6, shade=TRUE, which=rev(varhier$order),legend=TRUE)
The last command was the one I used to create my segment profile plot but I wasn't sure if the commands before may have affected it in any way. I'm new to R.
One trick I often use is to change the width/height and resolution through exporting the image. Try this:
png("c:\\temp\\myCrazyPlot.png", res=250, height=8, width=12, units="in")
barchart(cl6, shade=TRUE, which=rev(varhier$order),legend=TRUE)
# And whatever other plot commands for the same plot
dev.off()
Then go check your .png file. By tinkering with the height and width, you can adjust the spacing of the labels on the y-axis. You may even make the height greater than the width to let the labels spread out. (I think you currently can't do that on screen because the height is limited by your screen?)
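To illustrate the "taller than wide" variant without the original data, here is a sketch that uses a base R barplot() with made-up values as a stand-in for the flexclust barchart(); the labels, sizes, and file name are all hypothetical.

```r
set.seed(1)
vals <- runif(40)                                   # made-up bar heights
names(vals) <- paste("Variable", seq_along(vals))   # made-up labels

# Hypothetical output file; the page is twice as tall as it is wide
out_file <- file.path(tempdir(), "tall_bars.png")
png(out_file, res = 250, width = 6, height = 12, units = "in")
par(mar = c(4, 8, 1, 1))          # wide left margin for the horizontal labels
barplot(vals, horiz = TRUE, las = 1, cex.names = 0.7)
dev.off()
```

With 40 labels spread over 12 inches, each bar gets roughly a quarter of an inch, which is usually enough for the labels not to collide.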

Resources