Pretty dendrograms in R? - r

My dendrograms are horribly ugly, on the verge of unreadable, and usually look like this:
library(TraMineR)
library(cluster)
data(biofam)
lab <- c("P","L","M","LM","C","LC","LMC","D")
biofam.seq <- seqdef(biofam[1:500,10:25], states=lab)
ccost <- seqsubm(biofam.seq, method = "CONSTANT", cval = 2, with.missing=TRUE)
sequences.OM <- seqdist(biofam.seq, method = "OM", norm= TRUE, sm = ccost,
with.missing=TRUE)
clusterward <- agnes(sequences.OM, diss = TRUE, method = "ward")
plot(clusterward, which.plots = 2)
What I would like to create is something like the following, meaning a round dendrogram, where the size of the labels can be carefully controlled so that they are actually visible:
How can I accomplish this in R?

The following solution may not be optimal but worth a try:
library(ape)
CL1 <- as.hclust(clusterward)
CL2 <- as.phylo(CL1)
plot(CL2, type="fan", cex=0.5)
The main issue obviously being the fact that there is still too many objects, hence too many labels. To turn the labels off, use argument show.tip.label=FALSE. You can also get rid of the margins to occupy the complete device with no.margin=TRUE:
plot(CL2, type="fan", show.tip.label=FALSE, no.margin=TRUE)

Related

Suppress graph output of a function [duplicate]

I am trying to turn off the display of plot in R.
I read Disable GUI, graphics devices in R but the only solution given is to write the plot to a file.
What if I don't want to pollute the workspace and what if I don't have write permission ?
I tried options(device=NULL) but it didn't work.
The context is the package NbClust : I want what NbClust() returns but I do not want to display the plot it does.
Thanks in advance !
edit : Here is a reproducible example using data from the rattle package :)
data(wine, package="rattle")
df <- scale (wine[-1])
library(NbClust)
# This produces a graph output which I don't want
nc <- NbClust(df, min.nc=2, max.nc=15, method="kmeans")
# This is the plot I want ;)
barplot(table(nc$Best.n[1,]),
xlab="Numer of Clusters", ylab="Number of Criteria",
main="Number of Clusters Chosen by 26 Criteria")
You can wrap the call in
pdf(file = NULL)
and
dev.off()
This sends all the output to a null file which effectively hides it.
Luckily it seems that NbClust is one giant messy function with some other functions in it and lots of icky looking code. The plotting is done in one of two places.
Create a copy of NbClust:
> MyNbClust = NbClust
and then edit this function. Change the header to:
MyNbClust <-
function (data, diss = "NULL", distance = "euclidean", min.nc = 2,
max.nc = 15, method = "ward", index = "all", alphaBeale = 0.1, plotetc=FALSE)
{
and then wrap the plotting code in if blocks. Around line 1588:
if(plotetc){
par(mfrow = c(1, 2))
[etc]
cat(paste(...
}
and similarly around line 1610. Save. Now use:
nc = MyNbClust(...etc....)
and you see no plots unless you add plotetc=TRUE.
Then ask the devs to include your patch.

How to combine state distribution plot and separate legend in traminer?

Plotting several clusters using seqdplot in TraMineR can make the legend messy, especially in combination with numerous states. This calls for additional options for modifying the legend which is available with the function seqlegend. However, I have a hard time combining a state distribution plot (seqdplot) with a separate modified legend (seqlegend). Ideally one wants to plot the clusters (e.g. 9) without a legend and then add the separate legend in the available bottom right row, but instead the separate legend is generating a new plot window. Can anyone help?
Here's an example using the biofam data. With the data I use in my own research the legend becomes much more messy since I have 11 states.
#Data
library(TraMineR)
library(WeightedCluster)
data(biofam)
biofam.seq <- seqdef(biofam[501:600, 10:25])
#OM distances
biofam.om <- seqdist(biofam.seq, method = "OM", indel = 3, sm = "TRATE")
#9 clusters
wardCluster <- hclust(as.dist(biofam.om), method = "ward.D2")
cluster9 <- cutree(wardCluster, k = 9)
#State distribution plot
seqdplot(biofam.seq, group = cluster9, with.legend = F)
#Separate legend
seqlegend(biofam.seq, title = "States", ncol = 2)
#Combine state distribution plot and separate legend
#??
Thank you.
The seqplot function does not allow to control the number of columns of the legend, nor does it allow to add a legend title. So you have to compose the plot yourself by generating a separated plot for each group with the legend disabled and adding the legend afterwards. Here is how you can do that:
cluster9 <- factor(cluster9)
levc <- levels(cluster9)
lev <- length(levc)
par(mfrow=c(5,2))
for (i in 1:lev)
seqdplot(biofam.seq[cluster9 == levc[i],], border=NA, main=levc[i], with.legend=FALSE)
seqlegend(biofam.seq, ncol=4, cex = 1.2, title='States')
========================
Update, Oct 1, 2018 =================
Since TraMineR V 2.0-9, the seqplot family of functions now support (when applicable) the argument ncol to control the number of columns in the legend. To add a title to the legend, you still have to proceed as shown above.
AFAIK seqlegend() doesn't work when the other plots you are plotting utilizes the groups arguments. In your case the only thing seqlegend() is adding is a title "States". If you are looking to add a legend so you can customize what is in the legend and so forth, you can accomplish that by providing the corresponding alphabet and states that are used in your analysis.
The package's website has several walkthroughs and guides enumerating the various options and so forth: Link to their webiste
#Data
library(TraMineR)
library(WeightedCluster)
data(biofam)
## Generate alphabet and states
alphabet <- 0:7
states <- letters[seq_along(alphabet)]
biofam.seq <- seqdef(biofam[501:600, 10:25], states = states, alphabet = alphabet)
#OM distances
biofam.om <- seqdist(biofam.seq, method = "OM", indel = 3, sm = "TRATE")
#9 clusters
wardCluster <- hclust(as.dist(biofam.om), method = "ward.D2")
cluster9 <- cutree(wardCluster, k = 9)
#State distribution plot
seqdplot(biofam.seq, group = cluster9, with.legend = TRUE)

Inconsistent clustering with ComplexHeatmap?

So I'm trying to generate a heatmap for my data using Bioconductor's ComplexHeatmap package, but I get slightly different results depending on whether I make the dendrogram myself, or tell Heatmap to make it.
Packages:
require(ComplexHeatmap)
require(dendextend)
Data:
a=rnorm(400,1)
b=as.matrix(a)
dim(b)=c(80,5)
If I make the dendrogram myself:
d=dist(b,method="euclidean")
d=as.dist(d)
h=hclust(d,method="ward.D")
dend=as.dendrogram(h)
Heatmap(b,
cluster_columns=FALSE,
cluster_rows = dend)
Versus having Heatmap do the clustering:
Heatmap(b,
cluster_columns=FALSE,
clustering_distance_rows = "euclidean",
clustering_method_rows = "ward.D")
They tend to look very similar, but they'll be very slightly different.
And this matters a lot for my data. Heatmap's clustering ends up organizing my data way, way better, however, I also want to extract the list of clustered items via like cutree(), but I don't think I can extract it from Heatmap's clustering.
Does anyone know what's going on?
the dendrograms are the same. The only thing that changes is the ordering. You can verify this using:
hmap1 <- Heatmap(b,
cluster_columns=FALSE,
cluster_rows = dend)
hmap2 <- Heatmap(b,
cluster_columns=FALSE,
clustering_distance_rows = "euclidean",
clustering_method_rows = "ward.D")
#Reorder both row dendrograms using the same weights:
rowdend1 <- reorder(row_dend(hmap1)[[1]], 1:80)
rowdend2 <- reorder(row_dend(hmap2)[[1]], 1:80)
#check that they are identical:
identical( rowdend1, rowdend2)
## [1] TRUE
The ComplexHeatmap::Heatmap function has an argument row_dend_reorder with default value TRUE that you should check.

Error plotting Kohonen maps in R?

I was reading through this blog post on R-bloggers and I'm confused by the last section of the code and can't figure it out.
http://www.r-bloggers.com/self-organising-maps-for-customer-segmentation-using-r/
I've attempted to recreate this with my own data. I have 5 variables that follow an exponential distribution with 2755 points.
I am fine with and can plot the map that it generates:
plot(som_model, type="codes")
The section of the code I don't understand is the:
var <- 1
var_unscaled <- aggregate(as.numeric(training[,var]),by=list(som_model$unit.classif),FUN = mean, simplify=TRUE)[,2]
plot(som_model, type = "property", property=var_unscaled, main = names(training)[var], palette.name=coolBlueHotRed)
As I understand it, this section of the code is suppose to be plotting one of the variables over the map to see what it looks like but this is where I run into problems. When I run this section of the code I get the warning:
Warning message:
In bgcolors[!is.na(showcolors)] <- bgcol[showcolors[!is.na(showcolors)]] :
number of items to replace is not a multiple of replacement length
and it produces the plot:
Which just some how doesn't look right...
Now what I think it has come down to is the way the aggregate function has re-ordered the data. The length of var_unscaled is 789 and the length of som_model$data, training[,var] and unit.classif are all of length 2755. I tried plotting the aggregated data, the result was no warning but an unintelligible graph (as expected).
Now I think it has done this because unit.classif has a lot of repeated numbers inside it and that's why it has reduced in size.
The question is, do I worry about the warning? Is it producing an accurate graph? What exactly is the "Property"'s section looking for in the plot command? Is there a different way I could "Aggregate" the data?
I think that you have to create the palette color. If you put the argument
coolBlueHotRed <- function(n, alpha = 1) {rainbow(n, end=4/6, alpha=alpha)[n:1]}
and then try to get a plot, for example
plot(som_model, type = "count", palette.name = coolBlueHotRed)
the end is succesful.
This link can help you: http://rgm3.lab.nig.ac.jp/RGM/R_rdfile?f=kohonen/man/plot.kohonen.Rd&d=R_CC
I think that not all of the cells on your map have points inside.
You have 30 by 30 map and about 2700 points. In average it's about 3 points per cell. With high probability some cells have more than 3 points and some cells are empty.
The code in the post on R-bloggers works well when all of the cells have points inside.
To make it work on your data try change this part:
var <- 1
var_unscaled <- aggregate(as.numeric(training[, var]), by = list(som_model$unit.classif), FUN = mean, simplify = TRUE)[, 2]
plot(som_model, type = "property", property = var_unscaled, main = names(training)[var], palette.name = coolBlueHotRed)
with this one:
var <- 1
var_unscaled <- aggregate(as.numeric(data.temp[, data.classes][, var]),
by = list(som_model$unit.classif),
FUN = mean,
simplify = T)
v_u <- rep(0, max(var_unscaled$Group.1))
v_u[var_unscaled$Group.1] <- var_unscaled$x
plot(som_model,
type = "property",
property = v_u,
main = colnames(data.temp[, data.classes])[var],
palette.name = coolBlueHotRed)
Hope it helps.
Just add these functions to your script:
coolBlueHotRed <- function(n, alpha = 1) {rainbow(n, end=4/6, alpha=alpha)[n:1]}
pretty_palette <- c("#1f77b4","#ff7f0e","#2ca02c", "#d62728","#9467bd","#8c564b","#e377c2")

Different data in upper and lower panel of scatterplot matrix

I want to plot two different data sets in a scatterplot matrix.
I know that I can use upper.panel and lower.panel to differentiate the plot function. However, I don’t succeed in putting my data in a suitable format to harness this.
Assume I have two tissues (“brain” and “heart”) and four conditions (1–4). Now I can use e.g. pairs(data$heart) to get a scatterplot matrix for one of the data sets. Assume I have the following data:
conditions <- 1 : 4
noise <- rnorm(100)
data <- list(brain = sapply(conditions, function (x) noise + 0.1 * rnorm(100)),
heart = sapply(conditions, function (x) noise + 0.3 * rnorm(100)))
How do I get this into a format so that pairs(data, …) plots one data set above and one below the diagonal, as shown here (green = brain, violet = heart):
Just using
pairs(data, upper.panel = something, lower.panel = somethingElse)
Doesn’t work because that will plot all conditions versus all conditions without regard for different tissue – it essentially ignores the list, and the same when reordering the hierarchy (i.e. having data = (A=list(brain=…, heart=…), B=list(brain=…, heart=…), …)).
This is the best I seem to be able to do via passing arguments:
foo.upper <- function(x,y,ind.upper,col.upper,ind.lower,col.lower,...){
points(x[ind.upper],y[ind.upper],col = col.upper,...)
}
foo.lower <- function(x,y,ind.lower,col.lower,ind.upper,col.upper,...){
points(x[ind.lower],y[ind.lower],col = col.lower,...)
}
pairs(dat[,-5],
lower.panel = foo.lower,
upper.panel = foo.upper,
ind.upper = dat$type == 'brain',
ind.lower = dat$type == 'heart',
col.upper = 'blue',
col.lower = 'red')
Note that each panel needs all arguments. ... is a cruel mistress. If you include only the panel specific arguments in each function, it appears to work, but you get lots and lots of warnings from R trying to pass these arguments on to regular plotting functions and obviously they won't exist.
This was my quick first attempt, but it seems ugly:
dat <- as.data.frame(do.call(rbind,data))
dat$type <- rep(c('brain','heart'),each = 100)
foo.upper <- function(x,y,...){
points(x[dat$type == 'brain'],y[dat$type == 'brain'],col = 'red',...)
}
foo.lower <- function(x,y,...){
points(x[dat$type == 'heart'],y[dat$type == 'heart'],col = 'blue',...)
}
pairs(dat[,-5],lower.panel = foo.lower,upper.panel = foo.upper)
I'm abusing R's scoping here in this second version a somewhat ugly way. (Of course, you could probably do this more cleanly in lattice, but you probably knew that.)
The only other option I can think of is to design your own scatter plot matrix using layout, but that's probably quite a bit of work.
Lattice Edit
Here's at least a start on a lattice solution. It should handle varying x,y axis ranges better, but I haven't tested that.
dat <- do.call(rbind,data)
dat <- as.data.frame(dat)
dat$grp <- rep(letters[1:2],each = 100)
plower <- function(x,y,grp,...){
panel.xyplot(x[grp == 'a'],y[grp == 'a'],col = 'red',...)
}
pupper <- function(x,y,grp,...){
panel.xyplot(x[grp == 'b'],y[grp == 'b'],...)
}
splom(~dat[,1:4],
data = dat,
lower.panel = plower,
upper.panel = pupper,
grp = dat$grp)

Resources