Drawing rectangles around specified labels in a dendrogram with 'dendextend'


I'm currently constructing a dendrogram and I'm using 'dendextend' to tweak the look of it.
I've been able to do everything I want to (labelling leaves and highlighting branches of my chosen clusters), except drawing rectangles around pre-defined clusters.
My data (available in the file Barra_IBS_example.matrix) was clustered with 'pvclust', so 'pvrect' draws the rectangles in the correct positions, but it cuts off the labels (see image below). I would therefore like to reproduce the rectangles with 'rect.dendrogram', but I can't figure out how to tell that function to use the clustering from 'pvclust'.
This is the code I'm using:
idnames <- dimnames(ibs_mat)[[1]]
ibs.pv <- pvclust(ibs_mat, nboot=1000)
ibs.clust <- pvpick(ibs.pv, alpha=0.95)
names(ibs.clust$clusters) <- paste0("Cluster", 1:length(ibs.clust$clusters))
# Choose a colour palette
pal <- brewer.pal(length(ibs.clust$clusters), "Paired")
# Transform the list to a dataframe
ibs_meta <- bind_rows(lapply(names(ibs.clust$clusters),
function(l) data.frame(Cluster=l, Sample = ibs.clust$clusters[[l]])))
# Add the rest of the non-clustered samples (and assign them as Cluster0), add colour to each cluster
ibs_table <- ibs_meta %>%
rbind(., data.frame(Cluster = "Cluster0",
Sample = idnames[!idnames %in% .$Sample])) %>%
mutate(Cluster_int=as.numeric(sub("Cluster", "", Cluster))) %>%
mutate(Cluster_col=ifelse(Cluster_int==0, "#000000",
pal[Cluster_int])) %>%
.[match(ibs.pv$hclust$labels[ibs.pv$hclust$order], .$Sample),]
hcd <- as.dendrogram(ibs.pv) %>%
#pvclust_show_signif(ibs.pv, show_type = "lwd", signif_value = c(2, 1),alpha=0.25) %>%
set("leaves_pch", ifelse(ibs_table$Cluster_int>0,19,18)) %>% # node point type
set("leaves_cex", 1) %>% # node point size
set("leaves_col", ibs_table$Cluster_col) %>% #node point color
branches_attr_by_labels(ibs_meta$Sample, TF_values = c(2, Inf), attr = c("lwd")) %>% # change branch width
# rect.dendrogram(k=12, cluster = ibs_table$Cluster_int, border = 8, lty = 5, lwd = 1.5,
# lower_rect = 0) %>% # add rectangles around clusters
plot(main="Barramundi samples IBS based clustering")
pvrect(ibs.pv, alpha=0.95, lwd=1.5)
Many thanks, Ido

OK, this took more work than I had hoped, but I have a solution for you.
I created a new function called pvrect2 and just pushed it to the latest version of dendextend on GitHub. Here is a self-contained example demonstrating the solution:
devtools::install_github('talgalili/dendextend')
library(pvclust)
library(dendextend)
data(lung) # 916 genes for 73 subjects
set.seed(13134)
result <- pvclust(lung[, 1:20], method.dist="cor", method.hclust="average", nboot=10)
par(mar = c(9,2.5,2,0))
dend <- as.dendrogram(result)
dend %>%
pvclust_show_signif(result, signif_value = c(3,.5)) %>%
pvclust_show_signif(result, signif_value = c("black", "grey"), show_type = "col") %>%
plot(main = "Cluster dendrogram with AU/BP values (%)")
# pvrect(result, alpha=0.95)
pvrect2(result, alpha=0.95)
text(result, alpha=0.95)
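If the goal is also to reuse the significant clusters elsewhere (for example to colour leaves, as the question does), a minimal base R sketch of turning a list of clusters into a per-label membership vector might look like the following. The `clusters` list is a hypothetical stand-in for `pvpick(...)$clusters`, and the label names are invented for illustration.

```r
# Hypothetical stand-in for pvpick(result, alpha = 0.95)$clusters:
# a list whose elements are character vectors of leaf labels.
clusters <- list(c("s1", "s2"), c("s4", "s5", "s6"))
all_labels <- paste0("s", 1:7)  # assumed full set of leaf labels

# Named integer vector: cluster index per label, 0 for unclustered labels
membership <- setNames(integer(length(all_labels)), all_labels)
for (i in seq_along(clusters)) {
  membership[clusters[[i]]] <- i
}

membership
```

Indexing `membership` by the dendrogram's label order (for example `membership[labels(dend)]`) should then give the cluster of each leaf in plotting order.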

Related

Visualizing the CLARA cluster center/medoid

I visualized my CLARA results using fviz_cluster (ggplot2) and I would like to make the medoid of each cluster more prominent than the other data points (by changing its shape, color, etc.). The issue is that I have more than 800,000 data points, so the medoids are impossible to see with "show.clust.cent" alone.
How can I color the medoids differently and make them much bigger than the other data points, or hide all data points except the medoids? I also tried star.plot, but somehow it didn't work.
I know the row numbers of the medoids and thought of adding them manually, but I don't know how to integrate that into fviz_cluster.
Can anyone help me with this? Thank you!
fviz_cluster(clara.res,
palette = c("#004c6d",
"#00ffff",
"#00a1c1",
"#6efa75",
"#78ab63",
"#cc0089",
"#ffc334",
"#ff9509",
"#ffb6de",
"#00cfe3"
), # color palette
ellipse.type = "t", geom = "point", show.clust.cent = TRUE, repel = TRUE, pointsize = 0.5,
ggtheme = theme_classic()
)

Will this be ok for you?
library(tidyverse)
fpoint = function(n) tibble(
Dlm1 = rnorm(n, sample(-20:20,1), sample(1:5,1)),
Dlm2 = rnorm(n, sample(-20:20,1), sample(1:5,1))
)
df = tibble(cluster = paste(1:10)) %>%
mutate(data = map(cluster, ~fpoint(1000))) %>%
unnest(data)
df %>% ggplot(aes(Dlm1, Dlm2, color=cluster))+
geom_point(alpha = 0.2, pch=21)+
stat_ellipse(size=0.7)
Write data to tibble and use standard ggplot.
Update 1
library(factoextra)
library(cluster)
df = USArrests %>% na.omit() %>% scale()
kmed = pam(df, k = 4)
fviz_cluster(kmed, data = df, alpha=0.3, geom = "point", show.clust.cent = TRUE,repel = TRUE, pointsize = 2)
Is that the point?
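Another option, sketched here under the assumption that the data are two-dimensional and that plain ggplot2 is acceptable instead of fviz_cluster, is to draw the points faintly and overlay the medoids as a separate, larger layer. The toy data and the object names (`xy`, `res`, `meds`) are made up for illustration; `clara()` stores the medoid coordinates in `$medoids`, one row per cluster.

```r
library(cluster)
library(ggplot2)

set.seed(1)
# Toy 2-D data standing in for the real data set
xy <- data.frame(x = c(rnorm(200, 0), rnorm(200, 5)),
                 y = c(rnorm(200, 0), rnorm(200, 5)))
res <- clara(xy, k = 2)

# All points with their cluster assignment
pts <- cbind(xy, cluster = factor(res$clustering))
# One row per medoid; row i of res$medoids belongs to cluster i
meds <- data.frame(res$medoids,
                   cluster = factor(seq_len(nrow(res$medoids))))

p <- ggplot(pts, aes(x, y, colour = cluster)) +
  geom_point(alpha = 0.1, size = 0.5) +            # faint background points
  geom_point(data = meds, aes(fill = cluster),     # prominent medoids
             shape = 23, size = 5, colour = "black")
p
```

Dropping the first `geom_point()` layer would show the medoids only.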

How to remove specific group from a plot but plot stays the same in R?

data('iris')
pca.irix <- PCA(iris[ ,1:4])
gg <- factoextra::fviz_pca_biplot(X = pca.irix,
# samples
fill.ind = iris$Species, col.ind = 'black',
pointshape = 21, pointsize = 1.5,
geom.ind = 'point', repel = T,
geom.var = FALSE )
I would like to obtain a plot that is exactly like the plot above but without the species setosa.
I started with the following, but do not know how to continue:
setosa_wo <- iris %>%
filter(Species != 'setosa')
gg + scale_x_continuous(limits = c((-2), 2)) + scale_y_continuous(limits = c((-2), 2))
How can I remove a colored group from the plot while keeping the rest of the plot the same?

One approach to remove one or more groups from the plot is to filter the data used by the layers. A look at gg$layers shows that your PCA plot is composed of six layers, but only the first two use the groups as fill color. I therefore simply filtered the data for these two layers, which gives a plot with setosa removed.
EDIT: Following the suggestion by @DaveArmstrong, I added his code to fix the axis ranges at their original values and additionally restored the original colors.
library(FactoMineR)
library(ggplot2)
pca.irix <- PCA(iris[ ,1:4])
gg <- factoextra::fviz_pca_biplot(X = pca.irix,
# samples
fill.ind = iris$Species, col.ind = 'black',
pointshape = 21, pointsize = 1.5,
geom.ind = 'point', repel = T,
geom.var = FALSE )
# First: Get the ranges
yrg <- ggplot2::layer_scales(gg)$y$range$range
xrg <- ggplot2::layer_scales(gg)$x$range$range
# Filter the data
gg$layers[[1]]$data <- dplyr::filter(gg$layers[[1]]$data, Fill. != "setosa")
gg$layers[[2]]$data <- dplyr::filter(gg$layers[[2]]$data, Fill. != "setosa")
gg +
# Set the limits to the original ones
ggplot2::coord_cartesian(xlim=xrg, ylim=yrg, expand=FALSE) +
# Add original colors
ggplot2::scale_fill_manual(values = scales::hue_pal()(3)[2:3])
Created on 2020-10-16 by the reprex package (v0.3.0)
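The same idea (record the axis ranges from the full data, then plot only the filtered rows) can be sketched on a plain ggplot2 scatter plot. This is a simplified illustration on the raw iris columns, not on the PCA coordinates that fviz_pca_biplot computes; the names `xrg`, `yrg` and `kept` are chosen here for the example.

```r
library(ggplot2)

# Ranges taken from the full data, before any group is removed
xrg <- range(iris$Sepal.Length)
yrg <- range(iris$Sepal.Width)

# Drop the unwanted group
kept <- subset(iris, Species != "setosa")

# Plot the remaining groups inside the original axis limits
p <- ggplot(kept, aes(Sepal.Length, Sepal.Width, fill = Species)) +
  geom_point(shape = 21, colour = "black") +
  coord_cartesian(xlim = xrg, ylim = yrg)
p
```

Using `coord_cartesian()` rather than scale limits zooms the view without discarding any data from the layers.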

Dot plot of multiple X and Y variables?

I am using a gene expression dataset from ~100 cells.
I want to generate a dot plot indicating which cells are expressing which genes, like below, excluding the color delineations.
I have tried ggplot solutions, but (from what I can tell) ggplot2 cannot plot many variables on each axis. I've looked into more complex packages like Seurat and cRegulome (the image above is from cRegulome), but these put more information into the graphical output than I want.
Below is an example of the type of data frame I am working with.
Cell_A<-c(0,0,1,0,1,0,1,0)
Cell_B<-c(1,1,1,0,0,0,1,0)
Cell_C<-c(1,0,1,0,0,1,0,1)
Cell_D<-c(0,0,0,1,1,1,1,0)
Cell_E<-c(1,1,1,1,1,0,1,1)
Cell_F<-c(0,0,0,0,0,1,1,0)
Cell_G<-c(1,1,1,1,1,1,1,1)
Cell_H<-c(1,1,1,1,1,1,1,1)
Genes <- c("Gene1","Gene2","Gene3","Gene4","Gene5","Gene6","Gene7","Gene8")
fake_data <- data.frame(Cell_A, Cell_B, Cell_C, Cell_D, Cell_E,
Cell_F, Cell_G,Cell_H, row.names = Genes)
How can I manipulate this dataset to get the graphical output I want?

You can do this by reshaping the data and using geom_point. Map the size aesthetic to your count variable and it will work well. The legend is currently a bit nonsensical but can be manually tweaked if you do not have any other sizes than 0 and 1.
library(tidyverse)
Cell_A<-c(0,0,1,0,1,0,1,0)
Cell_B<-c(1,1,1,0,0,0,1,0)
Cell_C<-c(1,0,1,0,0,1,0,1)
Cell_D<-c(0,0,0,1,1,1,1,0)
Cell_E<-c(1,1,1,1,1,0,1,1)
Cell_F<-c(0,0,0,0,0,1,1,0)
Cell_G<-c(1,1,1,1,1,1,1,1)
Cell_H<-c(1,1,1,1,1,1,1,1)
Genes <- c("Gene1","Gene2","Gene3","Gene4","Gene5","Gene6","Gene7","Gene8")
fake_data <- data.frame(Cell_A, Cell_B, Cell_C, Cell_D, Cell_E,
Cell_F, Cell_G,Cell_H, row.names = Genes)
fake_data %>%
rownames_to_column(var = "gene") %>%
gather(cell, count, -gene) %>%
ggplot() +
geom_point(aes(x = gene, y = cell, size = count))
Created on 2019-08-02 by the reprex package (v0.3.0)
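If the zero-count dots and the size legend are unwanted, one possible variation is to reshape to long form and keep only the expressed gene/cell pairs before plotting. The sketch below uses base R for the reshaping and a smaller made-up data frame (`expr`); the column names `gene`, `cell` and `count` are chosen for the example.

```r
library(ggplot2)

# Small made-up example in the same shape as fake_data
Genes <- paste0("Gene", 1:3)
expr <- data.frame(Cell_A = c(0, 1, 1), Cell_B = c(1, 0, 1),
                   row.names = Genes)

# Wide matrix -> long data frame with one row per gene/cell pair
long <- as.data.frame(as.table(as.matrix(expr)), responseName = "count")
names(long)[1:2] <- c("gene", "cell")

# Keep only the pairs where the gene is expressed
expressed <- long[long$count > 0, ]

p <- ggplot(expressed, aes(x = gene, y = cell)) +
  geom_point(size = 3)
p
```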

This solution is a base R solution that relies on matplot().
fake_data2 <- sweep(fake_data, 2, seq_len(length(fake_data)), FUN = '*')
fake_data2[fake_data2 == 0] <- NA_integer_
matplot(x = seq_along(Genes), y = as.matrix(fake_data2)
, cex = colSums(fake_data) / 3, pch = 16, col = 1
, yaxt='n', xaxt='n', ann=FALSE)
axis(1, at = seq_along(Genes), Genes)
axis(2, at = seq_len(length(fake_data)), names(fake_data), las = 1)
You didn't provide enough detail on what size you wanted. The size here is based on the number of 1 values in each column.

How can I create subplots in plotly using R where each subplot is two traces

Here is a toy example I have got stuck on
library(plotly)
library(dplyr)
# construct data.frame
df <- tibble(x=c(3,2,3,5,5,5,2),y=c("a","a","a","b","b","b","b"))
# construct data.frame of last y values
latest <- df %>%
group_by(y) %>%
slice(n())
# plot for one value of y (NB not sure why value for 3 appears?)
p <- plot_ly() %>%
add_histogram(data=subset(df,y=="b"),x= ~x) %>%
add_histogram(data=subset(latest,y=="b"),x= ~x,marker=list(color="red")) %>%
layout(barmode="overlay",showlegend=FALSE,title= ~y)
p
How can I set these up as subplots, one for each unique value of y? In the real-world example I would have 20 different y's, so I would ideally loop over or apply the code. In addition, it would be good to set a standard x scale of, say, c(1:10) and lay the plots out in, for example, 2 rows.
TIA

build a list containing each of the plots
set the bin sizes manually for the histograms, otherwise the automatic selection will choose different bins for each of the traces within a plot (making it look strange, as in your example where the bars of each trace have different widths)
use subplot to put it all together
add titles to individual subplots using a list of annotations, as explained here
Like this:
N = nlevels(factor(df$y))
plot_list = vector("list", N)
lab_list = vector("list", N)
for (i in 1:N) {
this_y = levels(factor(df$y))[i]
p <- plot_ly() %>%
add_trace(type="histogram", data=subset(df,y==this_y), x = ~x, marker=list(color="blue"),
autobinx=FALSE, xbins=list(start=0.5, end=6.5, size=1)) %>%
add_trace(type="histogram", data=subset(latest,y==this_y), x = ~x, marker=list(color="red"),
autobinx=FALSE, xbins=list(start=0.5, end=6.5, size=1)) %>%
layout(barmode="overlay", showlegend=FALSE)
plot_list[[i]] = p
titlex = 0.5
titley = c(1.05, 0.45)[i]
lab_list[[i]] = list(x=titlex, y=titley, text=this_y,
showarrow=F, xref='paper', yref='paper', font=list(size=18))
}
subplot(plot_list, nrows = 2) %>%
layout(annotations = lab_list)
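The hard-coded `c(1.05, 0.45)[i]` only covers two plots. For more plots (such as the 20 mentioned in the question), the title positions could be computed instead. The helper below is hypothetical (it is not part of plotly) and rests on two assumptions: that subplot fills the grid row by row, and that margins between panels can be ignored, so the positions are approximate and may need a small manual offset.

```r
# Hypothetical helper (not a plotly function): approximate paper
# coordinates for a title centred above each panel of a grid,
# assuming panels are filled row by row and margins are ignored.
title_positions <- function(n_plots, nrows) {
  ncols <- ceiling(n_plots / nrows)
  i <- seq_len(n_plots)
  row <- ceiling(i / ncols)          # 1 = top row
  col <- (i - 1) %% ncols + 1        # 1 = leftmost column
  data.frame(x = (col - 0.5) / ncols,
             y = 1 - (row - 1) / nrows)
}

pos <- title_positions(20, nrows = 2)

# Build the annotation list in the same form as lab_list above;
# the text values are placeholders
lab_list <- lapply(seq_len(nrow(pos)), function(i) {
  list(x = pos$x[i], y = pos$y[i], text = paste("plot", i),
       showarrow = FALSE, xref = "paper", yref = "paper",
       yanchor = "bottom")
})
```

The resulting `lab_list` can be passed to `layout(annotations = lab_list)` in the same way as in the loop version.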

How to adjust lines length in dendrogram?

Using this code in R,
library("dendextend")
library("dendextendRcpp")
dist2 <- read.csv("distanceMatrix.csv",sep=";",header=TRUE)
mat <- as.matrix(dist2)
# using piping to get the dend
dend <- dist2 %>% dist %>% hclust %>% as.dendrogram %>% set("labels", colnames(mat))
foo <- function(k){
svg(filename = "dendrogram_newest.svg",width = 25,height = 14)
# plot + color the dend's branches before, based on k clusters:
dend %>% color_branches(k) %>% plot()
# add horiz line:
abline(h = heights_per_k.dendrogram(dend)[k], lwd = 2, lty = 2, col = "purple")
dev.off()}
foo(6)
I get this:
So, how can I shorten these lines? As it stands, the plot is almost unreadable.
And yes, my labels are ordered just as in the first row of my distanceMatrix.csv. This order has nothing to do with the relations inside distanceMatrix. I mean, the dendrogram is OK but the label values aren't the right ones.
Thanks
