best way to convert dendrogram to ggplot? - r

I have built a dendrogram and colored its branches according to their "purity" (whether they only include subjects with a particular value of a factor variable) using set("by_labels_branches_col") from the dendextend package. Now I would like to convert this dendrogram to a ggplot2 object for further customization. I have been able to do that with the as.ggdend function (also from the dendextend package). This is where I run into two issues that I need help with:
1. After using as.ggdend, the resulting object "loses" the vertical axis indicating the height of the dendrogram. How can I do this transformation without losing the axis?
2. I have also tried to enrich my dendrogram by adding a colored bar with the colored_bars function of the dendextend package. However, I do not know how to save the resulting object in order to convert it to a ggplot object.
Here is an example of my code, using the mtcars dataset:
df=mtcars
ds=dist(df, "euclidean")
hc<-hclust(ds,method= "average")
de=as.dendrogram(hc)
library(dendextend)
code=rownames(df[df$cyl==4,])#factor for coloring
de2<-de%>%set("by_labels_branches_col", value = c(code))%>% set("labels", "")%>%as.dendrogram(de)#coloring branches
#to add the colored bar
colores<-c("red","black", "blue") [as.factor(df$cyl)]
plot(de2)
colored_bars(colors=colores,dend=de2, y_shift=-2, rowLabels="" )
#transform to ggplot
de3=as.ggdend(de2)
Thanks in advance for any possible answer

I have finally found a solution to the first of the two questions. It is far from elegant and there are probably better ways to do this, but I am posting it here in case someone finds it useful.
The solution skips the use of as.ggdend and directly uses ggplot+theme to ensure that the dendrogram axis is displayed. Because this automatically thickens all plot lines, line sizes are corrected at steps 2/3.
library(ggplot2)
step1=ggplot(de2)+
theme(axis.line.y = element_line(color="black"),
axis.text.y = element_text(color="black"),
axis.ticks.y = element_line(color="black"))+
scale_y_continuous(expand = expansion(add = c(0,0)))
step2=ggplot_build(step1)
step2$data[[1]]$size=0.3
step3= ggplot_gtable(step2)
step4=ggplotify::as.ggplot(step3)

Related

How to create histogram plot in ggplot2 without data frame?

I am plotting two histograms in R by using the following code.
x1<-rnorm(100)
x2<-rnorm(50)
h1<-hist(x1)
h2<-hist(x2)
plot(h1, col=rgb(0,0,1,.25), xlim=c(-4,4), ylim=c(0,0.6), main="", xlab="Index", ylab="Percent",freq = FALSE)
plot(h2, col=rgb(1,0,0,.25), xlim=c(-4,4), ylim=c(0,0.6), main="", xlab="Index", ylab="Percent",freq = FALSE,add=TRUE)
legend("topright", c("H1", "H2"), fill=c(rgb(0,0,1,.25),rgb(1,0,0,.25)))
The code produces the following output.
I need a visually good looking (or stylistic) version of the above plot. I want to use ggplot2. I am looking for something like this (see Change fill colors section). However, I think, ggplot2 only works with data frames. I do not have data frames in this case. Hence, how can I create good looking histogram plot in ggplot2? Please let me know. Thanks in advance.
You can (and should) put your data into a data.frame if you want to use ggplot. Ideally for ggplot, the data.frame should be in long format. Here's a simple example:
library(ggplot2)
df1 = rbind(data.frame(grp='x1', x=x1), data.frame(grp='x2', x=x2))
ggplot(df1, aes(x, fill=grp)) +
geom_histogram(color='black', alpha=0.5)
There are lots of options to change the appearance to your liking. If you want the histograms stacked or grouped, or shown as percentages instead of counts, or as densities, etc., you will find many resources in previous questions showing how to implement each of those options.
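As one illustration of those options, here is a minimal sketch that overlays the two samples as semi-transparent density histograms, which is close to what the base-graphics code in the question draws. The bin width of 0.5 and the axis labels are arbitrary choices made for this example, not something taken from the original answer.

```r
library(ggplot2)

set.seed(1)  # only so the example is reproducible
x1 <- rnorm(100)
x2 <- rnorm(50)

# Long format: one row per observation, with a grouping column
df1 <- rbind(data.frame(grp = "x1", x = x1),
             data.frame(grp = "x2", x = x2))

# position = "identity" overlays the groups instead of stacking them;
# after_stat(density) puts density rather than count on the y axis
p <- ggplot(df1, aes(x, fill = grp)) +
  geom_histogram(aes(y = after_stat(density)),
                 position = "identity", alpha = 0.5,
                 colour = "black", binwidth = 0.5) +
  labs(x = "Index", y = "Density", fill = NULL)

print(p)
```

Because each group is normalised separately, the two histograms stay comparable even though the samples have different sizes.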

Heatmap table (ggfluctuation function)

When I run this code, I get the error "ggfluctuation is deprecated. (Defunct; last used in version 0.9.1)".
1. How can I fix this issue?
2. In my original data set, I have two string variables with many levels (the first with 65 levels and the second with 8 levels). Can I make a heatmap table for these two variables even though they have different numbers of levels?
3. What is the best way (plot) to show the relationship between these two categorical variables in my data set?
library(Hmisc)
library(ggplot2)
library(reshape)
data(HairEyeColor)
P=t(HairEyeColor[,,2])
Pm=melt(P)
ggfluctuation(Pm,type="heatmap")+geom_text(aes(label=Pm$value),colour="white")+ opts(axis.text.x=theme_text(size = 15),axis.text.y=theme_text(size = 15))
If you want to plot a heatmap, just use geom_tile. Also, opts and theme_text are deprecated and have been replaced by theme and element_text respectively.
So, you could use this:
ggplot(Pm, aes(Eye, Hair, fill=value)) + geom_tile() +
geom_text(aes(label=value),colour="white")+
theme(axis.text.x=element_text(size = 15),axis.text.y=element_text(size = 15))
Which outputs:
Also, to answer the other questions: yes, ggplot can handle two categorical columns with different numbers of levels, and a heatmap is a good way to show the relationship between two categorical variables such as yours.
The GGally package has a ggfluctuation2 function that replaces the deprecated ggfluctuation. But it's still pretty rough (you can't even specify axis labels) and I prefer the original ggplot function. You can also try ggally_ratio.
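For the case of two categorical columns with different numbers of levels, a minimal sketch could look like the following. The data frame d and its columns a and b are hypothetical stand-ins (12 and 3 levels) for the 65-level and 8-level variables in the question; only base R and ggplot2 are assumed.

```r
library(ggplot2)

set.seed(42)  # only so the example is reproducible
# Hypothetical data: factor a has 12 levels, factor b has 3
d <- data.frame(
  a = factor(sample(month.abb, 300, replace = TRUE), levels = month.abb),
  b = factor(sample(c("low", "mid", "high"), 300, replace = TRUE))
)

# Cross-tabulate, then convert to long format (columns a, b, Freq)
counts <- as.data.frame(table(a = d$a, b = d$b))

p <- ggplot(counts, aes(b, a, fill = Freq)) +
  geom_tile() +
  geom_text(aes(label = Freq), colour = "white")

print(p)
```

table() keeps every combination of levels, including empty ones, so the grid stays complete regardless of how many levels each variable has.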

GGally::ggpairs plot with varying size of correlation coefficient for grouped data

Please refer to the following link for the solution to a previous question. After overriding the "ggally_cor" function, it is very handy to be able to plot the correlation coefficient with its size adjusted to the estimated value. However, this does not seem to work if I want to produce plots grouped by a factor variable. How can it be adjusted to account for this?
GGally::ggpairs plot without gridlines when plotting correlation coefficient
Code that I used to plot grouped data:
library("GGally")
data(iris)
ggpairs(iris, columns = c(1,2,3,4), lower=list(continuous="points"),
diag=list(continuous="bar", params=c(position = "dodge")),
upper=list(params=list(corSize=6)), axisLabels='show', colour="Species",legend=T)
I would also like to know how to increase the font size of the axis titles (theme(axis.title=element_text(size=15, face="bold")) doesn't seem to work) and how to display a legend (legend=T does not seem to do the job).
After consulting ?theme and making a couple of experiments I arrived at:
+theme( strip.text = element_text(size = 5))

Scaling heat map colours for multiple heat maps

So I have a bunch of matrices that I am trying to plot as heatmaps. I am using the heatmap.2() function in the ggplot2 package.
I have been trying for quite some time with it, and I am sure there is a very simple fix, but my issue is this:
How do I keep the colours consistent between heatmaps? For example, to make the values that provide the colours absolute as opposed to relative.
I have tried doing something similar to this question:
R/ggplot: Reuse color key for multiple heat maps
But I was unable to figure out the ggplot function; I kept receiving an error message stating that there were "no layers in plot".
After reading the comments on the above question, I tried using scales::rescale() and discrete_scale() but the former does not remove the problem, while the latter did not work.
I am fully aware that I might be doing something very simple wrong, and just being a bit of an idiot, but for the life of me I can't figure out where I am going wrong.
As for the data itself, I am trying to plot 10 matrices/heatmaps, each 10x10 cells (showing change over time) and the values in the cells range from 1.0 to 1.2.
As an example, this is the code I am using (once I have my 10x10 matrix).
Matrix1<-matrix(data=(runif(100,1.0,1.2)),nrow=10,ncol=10)
heatmap.2(Matrix1, Colv=NA, Rowv=NA, dendrogram="none",
trace="none", key=F, cellnote=round(Matrix1,digits=2),
notecex=1.6, notecol="black",
labRow=seq(10,100,10), labCol=seq(10,100,10),
main="Title1", xlab="Xlab1", ylab="Ylab1"
)
So any help with either figuring out how to create the scaled values for the heatmap.2() function, or how I can use the ggplot() function would be greatly appreciated!
It's important to note that heatmap.2 is not a ggplot2 function. The ggplot2 package is not necessarily compatible with all plotting types. If you look at the ?heatmap.2 help page, the upper left corner shows where the function comes from: heatmap.2 {gplots} means that the function comes from the gplots package. These are different packages, so they follow different rules.
To get the same colors across different heatmaps, you want to explicitly set the breaks= parameter. By default, heatmap.2 splits the observed range of the data into equal chunks. But since each data set may have a different min and max, these chunks may have different start and end points. By specifying breaks, you can make them all consistent. Since your data ranges from 1 to 1.2, you can set
mybreaks <- seq(1.0, 1.2, length.out=7)
and then in your call add
heatmap.2(Matrix1, Colv=NA, Rowv=NA, dendrogram="none",
...
breaks=mybreaks,
...
)
That should make them all match up.
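To see why fixed breaks help, here is a small base R sketch that bins two matrices either with the shared breaks or with breaks derived from each matrix's own range. It only illustrates the binning idea with cut(); it is not a reproduction of what heatmap.2 does internally. The matrices m1 and m2 and the helpers bin_shared and bin_own are made up for this example.

```r
mybreaks <- seq(1.0, 1.2, length.out = 7)

set.seed(1)  # only so the example is reproducible
m1 <- matrix(runif(100, 1.0, 1.1), nrow = 10)  # values in the lower half
m2 <- matrix(runif(100, 1.1, 1.2), nrow = 10)  # values in the upper half

# Shared breaks: a given value always lands in the same bin
bin_shared <- function(m) {
  cut(m, breaks = mybreaks, include.lowest = TRUE, labels = FALSE)
}

# Per-matrix breaks: every matrix uses all six bins, so the same
# bin (colour) means different values in different plots
bin_own <- function(m) {
  cut(m, breaks = seq(min(m), max(m), length.out = 7),
      include.lowest = TRUE, labels = FALSE)
}

print(range(bin_shared(m1)))
print(range(bin_shared(m2)))
print(range(bin_own(m1)))
print(range(bin_own(m2)))
```

With shared breaks the two matrices occupy different parts of the bin range, which is what makes colours comparable across plots.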
Maybe this will help you. With the following code multiple heatmaps are stored in a list and displayed in a grid later on. This will allow you to control the colours of each heatmap since each heatmap is created separately. So in this case I chose to use green and red for the number range in each chart.
data(mtcars)
require(ggplot2)
require(gridExtra)
myplotslist2 <- list()
var = c("mpg", "wt", "drat")
new = cbind(mtcars, "variable")
new = cbind(car = rownames(mtcars), new)
for (i in 1:length(var)){
t= paste("new[[\"variable\"]] = \"", var[[i]],"\"; a = ggplot(new, aes(variable, car)) + geom_tile(aes(fill = ", var[[i]], "),colour = \"white\") + scale_fill_gradient(low = \"red\", high = \"green\") + theme(axis.title.y=element_blank(), axis.text.y=element_blank(),legend.position=\"none\"); myplotslist2[[i]] = a")
eval(parse(text=t))
}
grid.arrange(grobs=myplotslist2, ncol=length(var))
The result looks like this:
I hope this helps.
I explain more in my blogpost. https://dwh-businessintelligence.blogspot.nl/2016/05/pca-3d-and-k-means.html
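Since the question asks for colours that mean the same thing in every heatmap, a possible ggplot2 variant is to give every plot the same limits on the fill scale. The sketch below builds the plots with lapply rather than eval(parse()); the matrices in mats and the helper to_long are hypothetical, and the limits c(1.0, 1.2) come from the value range stated in the question.

```r
library(ggplot2)

set.seed(1)  # only so the example is reproducible
# Hypothetical stand-ins for the 10x10 matrices in the question
mats <- list(
  m1 = matrix(runif(100, 1.0, 1.1), nrow = 10),
  m2 = matrix(runif(100, 1.1, 1.2), nrow = 10)
)

# Convert a matrix to long format: one row per cell
to_long <- function(m) {
  d <- as.data.frame(as.table(m))
  names(d) <- c("row", "col", "value")
  d
}

# One plot per matrix, all sharing the same fill limits
plots <- lapply(names(mats), function(nm) {
  ggplot(to_long(mats[[nm]]), aes(col, row, fill = value)) +
    geom_tile(colour = "white") +
    scale_fill_gradient(low = "red", high = "green",
                        limits = c(1.0, 1.2)) +
    ggtitle(nm)
})

print(plots[[1]])
```

The resulting list can be handed to grid.arrange(grobs = plots) in the same way as myplotslist2 above.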

Mosaic plot with labels in each box showing a name and percentage of all observations

I would like to create a mosaic plot (R package vcd, see e.g. http://cran.r-project.org/web/packages/vcd/vignettes/residual-shadings.pdf ) with labels inside the plot. The labels should show either a combination of the various factors or some custom label and the percentage of total observations in this combination of categories (see e.g. http://i.usatoday.net/communitymanager/_photos/technology-live/2011/07/28/nielsen0728x-large.jpg , despite this not quite being a mosaic plot).
I suspect something like the labeling_values function might play a role here, but I cannot quite get it to work.
library(vcd)
library(MASS)
data("Titanic")
mosaic(Titanic, labeling = labeling_values)
Alternative ways to represent two variables with categorical data in a friendly way for non-statisticians are also welcome and are acceptable solutions.
Here is an example of adding proportions as labels. As usual, the degree of customization of a plot is a matter of taste, but this shows at least the principles. See ?labeling_cells for further possibilities.
labs <- round(prop.table(Titanic), 2)
mosaic(Titanic, pop = FALSE)
labeling_cells(text = labs, margin = 0)(Titanic)
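If the labels should read as percentages of all observations, one possible variation is to format the proportions as strings before passing them to labeling_cells. This is a sketch that assumes the vcd package is installed; the label-building step itself uses only base R, and the one-decimal format is an arbitrary choice.

```r
data("Titanic")

# Share of all observations in each cell, in percent
pct <- as.vector(100 * prop.table(Titanic))

# Copy the table to reuse its dim and dimnames, then fill it in place
# with formatted strings (this turns it into a character table)
labs <- prop.table(Titanic)
labs[] <- sprintf("%.1f%%", pct)

# Assumption: vcd is available; skip the plot otherwise
if (requireNamespace("vcd", quietly = TRUE)) {
  library(vcd)
  mosaic(Titanic, pop = FALSE)
  labeling_cells(text = labs, margin = 0)(Titanic)
}
```

As in the answer above, pop = FALSE is used so that the cell labels can be added after the mosaic has been drawn.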
