Dendrogram with Corrplot (R) - r

Does anyone have a method to adorn an R corrplot correlation plot with a dendrogram?

heatmaply actually has this functionality baked in since about December 2017! See the example below taken from the upcoming v1.0 vignette:
library("heatmaply")
r <- cor(mtcars)
## We use rcorr to calculate a matrix of p-values from correlation tests
library("Hmisc")
mtcars.rcorr <- rcorr(as.matrix(mtcars))
p <- mtcars.rcorr$P
heatmaply_cor(
r,
node_type = "scatter",
point_size_mat = -log10(p),
point_size_name = "-log10(p-value)",
label_names = c("x", "y", "Correlation")
)

The closest solution I know of is to draw a heatmap of the correlation matrix; for example, you could use gplots::heatmap.2.
Here is how to do it using the heatmaply R package, which also offers an interactive interface where you can zoom in and get a tooltip when hovering over the cells:
# for the first time:
# install.packages("heatmaply")
library(heatmaply)
my_cor <- cor(mtcars)
heatmaply_cor(my_cor)
You can learn more about heatmaply in this vignette.
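If a static plot is enough, a similar picture can be sketched with base R's heatmap(), which also draws dendrograms along the margins. This is only an illustration, not something from the answers above: it assumes that 1 - correlation is an acceptable dissimilarity and that average linkage is a reasonable clustering method, and the colour ramp is an arbitrary choice.

```r
r <- cor(mtcars)

# Assumption: treat 1 - correlation as a dissimilarity, so that strongly
# correlated variables end up close together in the tree
hc <- hclust(as.dist(1 - r), method = "average")
dend <- as.dendrogram(hc)

# symm = TRUE reuses the row dendrogram for the columns;
# scale = "none" keeps the raw correlation values
hm <- heatmap(r, Rowv = dend, symm = TRUE, scale = "none",
              col = colorRampPalette(c("blue", "white", "red"))(50),
              margins = c(6, 6))
```

heatmap() invisibly returns the row and column ordering, so hm$rowInd can be used to reorder the matrix for other plots.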

Related

How to rotate the plot in r base package graphics?

I know this is a little bit too much, but I am plotting a dendrogram in R, and here is my code:
library(dendextend)  # color_branches(), labels_colors(), sort_levels_values()
library(colorspace)  # rainbow_hcl()
dd <- dist(scale(full[, c(1, 2, 3, 4)]), method = "euclidean")
hc <- hclust(dd, method = "ward.D2")
dend <- color_branches(as.dendrogram(hc), 6)
labels_colors(dend) <-
rainbow_hcl(6)[sort_levels_values(
as.numeric(classified[, 9])[order.dendrogram(dend)]
)]
plot(dend, horiz = TRUE)
and I got this plot:
Is there any way to mirror it so that it looks like this? (Please ignore the difference in colour.)
plot_horiz.dendrogram(dend, side = TRUE)
should do the trick. See https://rdrr.io/cran/dendextend/f/vignettes/FAQ.Rmd

How to cut a dendrogram in r

Okay so I'm sure this has been asked before but I can't find a nice answer anywhere after many hours of searching.
I have some data, I run a classification then I make a dendrogram.
The problem has to do with aesthetics, specifically: (1) how to cut according to the number of groups (in this example I want 3), (2) how to make the group labels align with the branches of the tree, and (3) how to re-scale so that there aren't any huge gaps between the groups.
More on (3): I have a dataset which is very species rich, and there would be ~1000 groups without cutting. If I cut at, say, 3, the tree has some branches on the right and one 'miles' off to the right, which I would want to re-scale so that it's closer. All of this is possible via external programs, but I want to do it all in R!
Bonus points if you can put an average silhouette width plot nested into the top right of this plot.
Here is an example using the iris data:
library(ggplot2)
library(vegan)  # vegdist()
data(iris)
df = data.frame(iris)
df$Species = NULL
ED10 = vegdist(df,method="euclidean")
EucWard_10 = hclust(ED10,method="ward.D2")
hcd_ward10 = as.dendrogram(EucWard_10)
plot(hcd_ward10)
plot(cut(hcd_ward10, h = 10)$upper, main = "Upper tree of cut at h=10")
I suspect what you want to look at is the dendextend R package (it also has a paper in Bioinformatics).
I am not fully sure about your question on (3), since I am not sure I understand what rescaling means. What I can tell you is that you can do quite a lot with dendextend. Here is a quick example for coloring the branches and labels for 3 groups.
library(ggplot2)
library(vegan)
data(iris)
df = data.frame(iris)
df$Species = NULL
ED10 = vegdist(df,method="euclidean")
EucWard_10 = hclust(ED10,method="ward.D2")
hcd_ward10 = as.dendrogram(EucWard_10)
plot(hcd_ward10)
# for the first time:
# install.packages("dendextend")
library(dendextend)
dend <- hcd_ward10
dend <- color_branches(dend, k = 3)
dend <- color_labels(dend, k = 3)
plot(dend)
You can also get an interactive dendrogram by using plotly (ggplot method is available through dendextend):
library(plotly)
library(ggplot2)
p <- ggplot(dend)
ggplotly(p)
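For point (1), the group memberships themselves can be obtained with base R's cutree(), and rect.hclust() marks the groups on the plot. The following is only a sketch under a few assumptions: it swaps vegdist() for the base dist() function (both compute Euclidean distances here), and it uses the cluster package, which is not mentioned in the thread, to compute the average silhouette width asked about as a bonus.

```r
library(cluster)  # silhouette(); assumed to be installed (it ships with standard R builds)

df <- iris[, 1:4]
d <- dist(df, method = "euclidean")
hc <- hclust(d, method = "ward.D2")

# Cut the tree into exactly 3 groups and count the members of each
groups <- cutree(hc, k = 3)
print(table(groups))

# Draw the tree without leaf labels and box the 3 groups
plot(hc, labels = FALSE, hang = -1, main = "Ward clustering, cut into 3 groups")
rect.hclust(hc, k = 3, border = 2:4)

# Average silhouette width for the 3-group solution
sil <- silhouette(groups, d)
avg_width <- mean(sil[, "sil_width"])
print(avg_width)
```

Repeating the last three lines for several values of k gives the numbers needed for an average silhouette width plot.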

R superimposing bivariate normal density (ellipses) on scatter plot

There are similar questions on the website, but I could not find an answer to this seemingly very simple problem. I fit a mixture of two gaussians on the Old Faithful Dataset:
if(!require("mixtools")) { install.packages("mixtools"); require("mixtools") }
data_f <- faithful
plot(data_f$waiting, data_f$eruptions)
data_f.k2 = mvnormalmixEM(as.matrix(data_f), k=2, maxit=100, epsilon=0.01)
data_f.k2$mu # estimated mean coordinates for the 2 multivariate Gaussians
data_f.k2$sigma # estimated covariance matrix
I simply want to super-impose two ellipses for the two Gaussian components of the model described by the mean vectors data_f.k2$mu and the covariance matrices data_f.k2$sigma. To get something like:
For those interested, here is the MatLab solution that created the plot above.
If you are interested in the colors as well, you can use the posterior to get the appropriate groups. I did it with ggplot2, but first I show the colored solution using @Julian's code.
# group data for coloring
data_f$group <- factor(apply(data_f.k2$posterior, 1, which.max))
# plotting
plot(data_f$eruptions, data_f$waiting, col = data_f$group)
for (i in 1: length(data_f.k2$mu)) ellipse(data_f.k2$mu[[i]],data_f.k2$sigma[[i]], col=i)
And for my version using ggplot2.
# needs ggplot2 package
require("ggplot2")
# ellipse data (draw=FALSE computes the points without drawing on a base plot)
ell <- cbind(data.frame(group=factor(rep(1:length(data_f.k2$mu), each=250))),
do.call(rbind, mapply(ellipse, data_f.k2$mu, data_f.k2$sigma,
npoints=250, draw=FALSE, SIMPLIFY=FALSE)))
# plotting command
p <- ggplot(data_f, aes(color=group)) +
geom_point(aes(waiting, eruptions)) +
geom_path(data=ell, aes(x=`2`, y=`1`)) +
theme_bw(base_size=16)
print(p)
You can use the ellipse function from the mixtools package. The initial problem was that this function swaps x and y relative to your plot. I'll try to figure this out and update the answer. (I'll leave the colors to somebody else...)
plot( data_f$eruptions,data_f$waiting)
for (i in 1: length(data_f.k2$mu)) ellipse(data_f.k2$mu[[i]],data_f.k2$sigma[[i]])
Using mixtools internal plotting function:
plot.mixEM(data_f.k2, whichplots=2)
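To see what such an ellipse is, here is a minimal base R sketch that computes the contour points directly from a mean vector and a covariance matrix. The helper name ellipse_points is hypothetical, and as a stand-in for the fitted components it uses the overall sample mean and covariance of faithful; with the mixture fit you would pass data_f.k2$mu[[i]] and data_f.k2$sigma[[i]] instead.

```r
# Hypothetical helper: points on the contour that encloses `level` of the
# probability mass of a bivariate normal with mean `mu` and covariance `sigma`
ellipse_points <- function(mu, sigma, level = 0.95, n = 100) {
  theta <- seq(0, 2 * pi, length.out = n)
  circle <- cbind(cos(theta), sin(theta))
  radius <- sqrt(qchisq(level, df = 2))
  # chol() returns upper-triangular R with t(R) %*% R == sigma, so mapping the
  # unit circle through R gives points at a constant Mahalanobis distance
  sweep(radius * circle %*% chol(sigma), 2, mu, "+")
}

# Stand-in parameters (assumption: whole-sample estimates, not the mixture fit)
mu <- colMeans(faithful)
sigma <- cov(faithful)
pts <- ellipse_points(mu, sigma)

plot(faithful$eruptions, faithful$waiting)
lines(pts[, 1], pts[, 2], col = 2)
```

The columns of pts follow the order of mu (eruptions, then waiting), which is why the x and y arguments have to match the order used in plot().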

How can I plot a biplot for LDA in r?

I did a linear discriminant analysis using the function lda() from the package MASS. Now I would like to plot a biplot like the one in the ade4 package (for LDA). Do you know how I can do this?
If I try to use the biplot() function it doesn't work. For example, if I use the Iris data and make LDA:
dis2 <- lda(as.matrix(iris[, 1:4]), iris$Species)
then I can plot it using the function plot(), but if I use the function biplot() it doesn't work:
biplot(dis2)
Error in nrow(y) : argument "y" is missing, with no default
How can I plot the arrows of variables?
I wrote the following function to do this:
lda.arrows <- function(x, myscale = 1, tex = 0.75, choices = c(1,2), ...){
## adds `biplot` arrows to an lda using the discriminant function values
heads <- coef(x)
arrows(x0 = 0, y0 = 0,
x1 = myscale * heads[,choices[1]],
y1 = myscale * heads[,choices[2]], ...)
text(myscale * heads[,choices], labels = row.names(heads),
cex = tex)
}
For your example:
dis2 <- lda(as.matrix(iris[, 1:4]), iris$Species)
plot(dis2, asp = 1)
lda.arrows(dis2, col = 2, myscale = 2)
The length of the arrows is arbitrary relative to the lda plot (but not to each other, of course!). If you want longer or shorter arrows, change the value of myscale accordingly. By default, this plots arrows for the first and second axes. If you want to plot other axes, change choices to reflect this.
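Since the arrow length is arbitrary, one option is to derive the scale from the data rather than picking myscale by hand. The sketch below is only an illustration: the rule "longest arrow coordinate reaches 80% of the largest score" and the name auto_scale are assumptions of mine, and the scores are plotted directly from predict() instead of using plot() on the lda object.

```r
library(MASS)  # lda()

fit <- lda(Species ~ ., data = iris)
scores <- predict(fit)$x[, 1:2]  # observations on LD1 and LD2
heads <- coef(fit)[, 1:2]        # discriminant coefficients, one row per variable

# Assumed rule: stretch the arrows so the largest coordinate reaches
# 80% of the largest absolute score
auto_scale <- 0.8 * max(abs(scores)) / max(abs(heads))

plot(scores, col = iris$Species, asp = 1)
arrows(x0 = 0, y0 = 0,
       x1 = auto_scale * heads[, 1],
       y1 = auto_scale * heads[, 2],
       col = 2, length = 0.1)
text(auto_scale * heads, labels = rownames(heads), cex = 0.75, pos = 3)
```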
My understanding is that biplots of linear discriminant analyses can be done. This is implemented in the R package ggbiplot (see https://github.com/vqv/ggbiplot/tree/experimental) and in the package ggord (see https://github.com/fawda123/ggord). For your example:
install.packages("devtools")
library(devtools)
install_github("fawda123/ggord")
library(ggord)
ord <- lda(Species ~ ., iris, prior = rep(1, 3)/3)
ggord(ord, iris$Species)
Also the book "Biplots in practice" by M. Greenacre has one chapter (chapter 11) on it and in Figure 11.5 it shows a biplot of a linear discriminant analysis of the iris dataset:
You can achieve this using the ggord package from GitHub. The dataset used is the iris dataset.
library(MASS)  # lda()
IR <- iris  # assumption: IR is the iris data, as stated above
# --- data partition --- #
set.seed(555)
IRSam <- sample.int(n = nrow(IR), size = floor(.60*nrow(IR)), replace = FALSE, prob = NULL)
IRTrain <- IR[IRSam,]
IRTest <- IR[-IRSam,]
# --- model fit (not shown in the original; assumed to be an LDA on the training set) --- #
IR.lda <- lda(Species ~ ., data = IRTrain)
# --- Prediction --- #
p <- predict(IR.lda, IRTrain)
# --- plotting a biplot --- #
library(devtools)
# install_github('fawda123/ggord')  # installs ggord from GitHub; requires devtools
library(ggord)
ggord(IR.lda, IRTrain$Species, ylim=c(-5,5), xlim=c(-10,10))

Multidimensional Scaling

I have a 5x14 data matrix. I'm using MDS to get a perceptual map. I can run the MDS properly and get the result.
But my problem is that in MDS we can map either the row or the column variables. Is it possible to map both row and column variables using MDS?
The code I used is the following:
perp<-read.csv("E:\\Projects\\Combined_3.csv")
ads.dis<-dist(perp)
perp_mds <- cmdscale(ads.dis, k = 2,eig=TRUE)
x <- perp_mds$points[,1]
y <- perp_mds$points[,2]
plot(x,y, xlab = "Coordinate 1", ylab = "Coordinate 2", type = "n")
text(x,y, labels = rownames(perp))
I'll be grateful if somebody can help me with the coding.
Regards,
Ari
In general, the answer is no, not with cmdscale(). All that cmdscale() has knowledge of is the dissimilarity between objects. In the vegan package, there is the function capscale(), which is a constrained version of principal coordinates analysis (PCoA, aka MDS) but can also be used for ordinary PCoA. It can place both the objects and the variables in a biplot-like figure:
require(vegan)
data(varespec)
mod <- capscale(varespec ~ 1)
plot(mod)
But do note that PCoA with the euclidean distance is the same as PCA, which also could be used and will naturally plot both the objects and the variables:
plot(rda(varespec))
or using base R functions
mod2 <- prcomp(varespec)
biplot(mod2)
Or did you mean the non-metric version of MDS?
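The equivalence between PCoA on Euclidean distances and PCA can be checked with base R alone. This is a small sketch that uses the built-in USArrests data as a stand-in for the 5x14 matrix in the question (an assumption, since that file is not available); the two sets of coordinates should agree up to the sign of each axis.

```r
X <- as.matrix(USArrests)  # stand-in data, not the matrix from the question

# Classical MDS (PCoA) on Euclidean distances between the rows
mds <- cmdscale(dist(X), k = 2)

# PCA on the same data (centred, not scaled): scores on the first two components
pca <- prcomp(X)$x[, 1:2]

# Axes may be flipped, so compare absolute values
same <- isTRUE(all.equal(abs(unname(mds)), abs(unname(pca))))
print(same)

# The PCA biplot shows rows (objects) and columns (variables) together
biplot(prcomp(X))
```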
