How to reorder cluster leaves (columns) when plotting pheatmap in R?

I am plotting a set of 15 samples clustered in three groups A, B, C, and the heatmap orders them as C, A, B. (I have read that this is because it plots the cluster with the strongest similarity on the right.) I would like to order the clusters so that the leaves are shown as A, B, C, which means reorganising the order of the cluster branches. Is there a function that can help me do this?
The code I have used:
library(pheatmap)
pheatmap(mat, annotation_col = anno,
color = colorRampPalette(c("blue", "white", "red"))(50), show_rownames = F)
(cluster_cols=FALSE would not cluster the samples at all, but that is not what I want)
I have also found on another forum this, but I am unsure how to change the function code and if it would work for me:
clustering_callback callback function to modify the clustering. Is
called with two parameters: original hclust object and the matrix used
for clustering. Must return a hclust object.

Hi, I am not sure if this is of any help to you, but if you check ?pheatmap and scroll down to the examples, the last snippet of code gives exactly that example.
# Modify ordering of the clusters using clustering callback option
callback = function(hc, mat){
sv = svd(t(mat))$v[,1]
dend = reorder(as.dendrogram(hc), wts = sv)
as.hclust(dend)
}
pheatmap(test, clustering_callback = callback)
I tried it on my heatmap and the previously defined function actually sorted the clusters exactly the way I needed them. Although I have to admit (as I am new to R) I don't fully understand what the defined callback function does.
Maybe you can also write a function with the dendsort package, as I know you can reorder the branches of a dendrogram with it.
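For readers who, like the answerer, are unsure what the callback does: it converts the hclust tree to a dendrogram, rotates the branches according to one weight per leaf, and converts back. The tree is unchanged; only the left-to-right order of the branches moves. The sketch below swaps the SVD weights of the ?pheatmap example for weights taken from the desired group order. The toy data, the `grp` and `wts` vectors, and the guard for the row tree are assumptions made for illustration, and it only yields A, B, C if the tree allows that rotation.

```r
set.seed(1)

# Toy data (assumption): 20 features x 15 samples in three well-separated groups
grp <- rep(c("A", "B", "C"), each = 5)
shift <- c(A = 0, B = 5, C = 10)[grp]
mat <- matrix(rnorm(20 * 15), nrow = 20) + rep(shift, each = 20)
colnames(mat) <- paste0(grp, 1:5)

# Desired left-to-right order, expressed as one weight per sample
wts <- c(A = 1, B = 2, C = 3)[grp]
names(wts) <- colnames(mat)

callback <- function(hc, mat) {
  # pheatmap calls the callback for the row tree as well;
  # trees whose leaves have no weights are returned unchanged
  if (is.null(hc$labels) || !all(hc$labels %in% names(wts))) return(hc)
  # Rotate branches so that subtrees with a smaller mean weight come first
  dend <- reorder(as.dendrogram(hc), wts = wts[hc$labels], agglo.FUN = mean)
  as.hclust(dend)
}

# Check the callback on the column tree without plotting
hc_cols <- hclust(dist(t(mat)))
sorted <- callback(hc_cols, t(mat))
leaf_groups <- grp[sorted$order]

# Draw the heatmap only if pheatmap is installed
if (requireNamespace("pheatmap", quietly = TRUE)) {
  pheatmap::pheatmap(mat, clustering_callback = callback)
}
```

`agglo.FUN = mean` is used instead of the default `sum` so that a large subtree does not outweigh a small one simply because it has more leaves.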

In this case, luckily, the clustering of the columns coincides with the sample number order (which is similar to the dendrogram), so I added cluster_cols = FALSE. That solved the issue of re-clustering the columns and avoided writing the callback function.
pheatmap(mat,
annotation_col = anno,
fontsize_row = 2,
show_rownames = T,
cutree_rows = 3,
cluster_cols = FALSE)

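# install.packages("dendsort")
library(dendsort)
library(pheatmap)
# Sort the branches of a tree with dendsort and return an hclust object
sort_hclust <- function(...) as.hclust(dendsort(as.dendrogram(...)))
# dist() works on rows, so cluster the transposed matrix to get a column tree
col_clust <- sort_hclust(hclust(dist(t(mat))))
pheatmap(mat, cluster_cols = col_clust)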

Related

List as an argument in R

This might be a fairly banal question. I am an R user who knows little about coding.
I am using the package 'EGAnet' and create a plot using the function EGA. This function has a 'plot.args' argument. From the ?EGA page
plot.args
List. A list of additional arguments for the network plot.
For plot.type = "GGally" (see ggnet2 for full list of arguments):
vsize Size of the nodes. Defaults to 6.
label.size Size of the labels. Defaults to 5.
alpha The level of transparency of the nodes, which might be a single value or a vector of values. Defaults to 0.7.
edge.alpha The level of transparency of the edges, which might be a single value or a vector of values. Defaults to 0.4.
legend.names A vector with names for each dimension
color.palette The color palette for the nodes. For custom colors, enter HEX codes for each dimension in a vector. See color_palette_EGA for more details and examples
I can't figure out how to modify the parameters of this list to add them to the function EGA. For instance, I'd like to change the color.palette (color.palette = "grayscale") and vsize (vsize = 8) arguments.
How do I create this list to add to the function?
The final function looks something like
ega.sds <- EGA(data = ex_graph, model = "glasso", plot.EGA = T, plot.args = ???)
What do I add in place of ???
Thank you a million.
I believe you just need to write:
plot.args = list(
color.palette = "grayscale",
vsize = 8
)
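To show why a plain list() is all that is needed, here is a minimal sketch of how a function can consume such an argument. `draw_network` is a hypothetical stand-in, not EGAnet's actual implementation; its defaults are the ones quoted from ?EGA above.

```r
# Hypothetical stand-in for a function that takes a `plot.args` list
draw_network <- function(plot.args = list()) {
  defaults <- list(vsize = 6, label.size = 5, alpha = 0.7, edge.alpha = 0.4)
  # Entries supplied by the caller override the defaults; the rest are kept
  modifyList(defaults, plot.args)
}

# Build the list once, then pass it by name
my_args <- list(color.palette = "grayscale", vsize = 8)
settings <- draw_network(plot.args = my_args)

# With EGAnet loaded, the call from the question would then read:
# ega.sds <- EGA(data = ex_graph, model = "glasso", plot.EGA = TRUE,
#                plot.args = my_args)
```

Only the entries you name are changed; anything you leave out keeps its default.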

Suppress graph output of a function [duplicate]

I am trying to turn off the display of plot in R.
I read Disable GUI, graphics devices in R but the only solution given is to write the plot to a file.
What if I don't want to pollute the workspace, and what if I don't have write permission?
I tried options(device=NULL) but it didn't work.
The context is the package NbClust: I want what NbClust() returns but I do not want to display the plot it produces.
Thanks in advance!
Edit: Here is a reproducible example using data from the rattle package :)
data(wine, package="rattle")
df <- scale (wine[-1])
library(NbClust)
# This produces a graph output which I don't want
nc <- NbClust(df, min.nc=2, max.nc=15, method="kmeans")
# This is the plot I want ;)
barplot(table(nc$Best.n[1,]),
xlab="Number of Clusters", ylab="Number of Criteria",
main="Number of Clusters Chosen by 26 Criteria")
You can wrap the call in
pdf(file = NULL)
and
dev.off()
This sends all the output to a null file which effectively hides it.
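The wrapping can be packaged in a small helper so the device is always closed, even if the call fails. This is a minimal sketch: `without_plots` and `noisy_mean` are hypothetical names, and `noisy_mean` merely stands in for a function such as NbClust() that plots as a side effect.

```r
# Evaluate an expression while any plots go to a null PDF device
without_plots <- function(expr) {
  grDevices::pdf(file = NULL)       # null device: nothing is written to disk
  dev_id <- grDevices::dev.cur()
  # Close exactly this device on exit, even if `expr` throws an error
  on.exit(grDevices::dev.off(which = dev_id), add = TRUE)
  force(expr)                       # the expression is evaluated only here
}

# Stand-in for a function that plots as a side effect and returns a value
noisy_mean <- function(x) {
  plot(x)
  mean(x)
}

res <- without_plots(noisy_mean(c(1, 2, 3)))
```

Because R evaluates arguments lazily, the call passed as `expr` does not run until `force(expr)`, by which time the null device is already open.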
Luckily it seems that NbClust is one giant messy function with some other functions in it and lots of icky looking code. The plotting is done in one of two places.
Create a copy of NbClust:
> MyNbClust = NbClust
and then edit this function. Change the header to:
MyNbClust <-
function (data, diss = "NULL", distance = "euclidean", min.nc = 2,
max.nc = 15, method = "ward", index = "all", alphaBeale = 0.1, plotetc=FALSE)
{
and then wrap the plotting code in if blocks. Around line 1588:
if(plotetc){
par(mfrow = c(1, 2))
[etc]
cat(paste(...
}
and similarly around line 1610. Save. Now use:
nc = MyNbClust(...etc....)
and you see no plots unless you add plotetc=TRUE.
Then ask the devs to include your patch.

Visualizing PCA with large number of variables in R using ggbiplot

I am trying to visualize a PCA that includes 87 variables.
prc <-prcomp(df[,1:87], center = TRUE, scale. = TRUE)
ggbiplot(prc, labels = rownames(df[,1:87]), var.axes = TRUE)
When I create the biplot, many of the vectors overlap with each other, making it impossible to read the labels. I was wondering if there is any way to only show some of the labels at a time. For example, I think it'd be useful if I could create a few separate biplots with each one showing only a subset of the labels on the vectors.
This question seems closely related, but I don't know if it translates to the latest version of ggbiplot. I'm also not sure how to modify the original functions.
A potential solution is to use the factoextra package to visualize your PCA results. The fviz_pca_biplot() function includes a repel argument; when repel = TRUE, the plot labels are spread out to minimize overlap. The documentation also describes select.var options, such as select.var = list(contrib = 5) to display only the 5 variables with the highest contribution. There is also a select.var = list(name = ...) option, which seems to let you specify the exact subset of variables you want shown.
# read data
df <- mtcars[, c(1:7,10:11)]
# perform PCA
library("FactoMineR")
res.pca <- PCA(df, graph = FALSE)
# visualize
library(factoextra)
fviz_pca_biplot(res.pca, repel = TRUE, select.var = list(contrib = 5))
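The question also asks for several biplots, each showing a subset of the variable vectors. A minimal sketch of that idea follows, assuming select.var = list(name = ...) behaves as described above. The chunk size of 3 and the use of prcomp() on the same mtcars columns are illustrative choices; with 87 variables a larger chunk size would be more practical.

```r
# Same columns as in the example above, fitted with base R's prcomp()
pca <- prcomp(mtcars[, c(1:7, 10:11)], center = TRUE, scale. = TRUE)

# Split the variable names into consecutive chunks of (at most) 3
vars <- rownames(pca$rotation)
chunk_size <- 3
chunks <- split(vars, ceiling(seq_along(vars) / chunk_size))

# One biplot per chunk, built only if factoextra is installed
if (requireNamespace("factoextra", quietly = TRUE)) {
  plots <- lapply(chunks, function(v) {
    factoextra::fviz_pca_biplot(pca, repel = TRUE,
                                select.var = list(name = v))
  })
}
```

Each element of `plots` is a separate plot object that can be printed on its own.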

Inconsistent clustering with ComplexHeatmap?

So I'm trying to generate a heatmap for my data using Bioconductor's ComplexHeatmap package, but I get slightly different results depending on whether I make the dendrogram myself, or tell Heatmap to make it.
Packages:
require(ComplexHeatmap)
require(dendextend)
Data:
a=rnorm(400,1)
b=as.matrix(a)
dim(b)=c(80,5)
If I make the dendrogram myself:
d=dist(b,method="euclidean")
d=as.dist(d)
h=hclust(d,method="ward.D")
dend=as.dendrogram(h)
Heatmap(b,
cluster_columns=FALSE,
cluster_rows = dend)
Versus having Heatmap do the clustering:
Heatmap(b,
cluster_columns=FALSE,
clustering_distance_rows = "euclidean",
clustering_method_rows = "ward.D")
They tend to look very similar, but they'll be very slightly different.
And this matters a lot for my data. Heatmap's clustering ends up organizing my data way, way better, however, I also want to extract the list of clustered items via like cutree(), but I don't think I can extract it from Heatmap's clustering.
Does anyone know what's going on?
The dendrograms are the same. The only thing that changes is the ordering. You can verify this using:
hmap1 <- Heatmap(b,
cluster_columns=FALSE,
cluster_rows = dend)
hmap2 <- Heatmap(b,
cluster_columns=FALSE,
clustering_distance_rows = "euclidean",
clustering_method_rows = "ward.D")
#Reorder both row dendrograms using the same weights:
rowdend1 <- reorder(row_dend(hmap1)[[1]], 1:80)
rowdend2 <- reorder(row_dend(hmap2)[[1]], 1:80)
#check that they are identical:
identical( rowdend1, rowdend2)
## [1] TRUE
The ComplexHeatmap::Heatmap function has an argument row_dend_reorder with default value TRUE that you should check.
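The same point can be checked in base R without ComplexHeatmap: reordering rotates branches but leaves the merges, and therefore the cutree() groups, unchanged. The sketch below assumes row means as reordering weights and k = 4 clusters, both arbitrary choices for illustration.

```r
set.seed(1)
b <- matrix(rnorm(400, 1), nrow = 80, ncol = 5)

h <- hclust(dist(b, method = "euclidean"), method = "ward.D")
dend <- as.dendrogram(h)

# Rotate the branches using arbitrary weights; the tree itself is untouched
dend_rot <- reorder(dend, wts = rowMeans(b), agglo.FUN = mean)

# Same leaves and same merge heights before and after the rotation
same_leaves <- setequal(order.dendrogram(dend), order.dendrogram(dend_rot))
same_heights <- isTRUE(all.equal(sort(h$height),
                                 sort(as.hclust(dend_rot)$height)))

# Cluster membership read from the tree you built yourself
groups <- cutree(h, k = 4)
groups_rot <- cutree(as.hclust(dend_rot), k = 4)

# Every item falls in the same group either way
same_partition <- length(unique(paste(groups, groups_rot))) == 4
```

So the cluster list can be taken from the hclust object you built yourself, whichever leaf ordering the heatmap ends up displaying.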

Error plotting Kohonen maps in R?

I was reading through this blog post on R-bloggers and I'm confused by the last section of the code and can't figure it out.
http://www.r-bloggers.com/self-organising-maps-for-customer-segmentation-using-r/
I've attempted to recreate this with my own data. I have 5 variables that follow an exponential distribution with 2755 points.
I am fine with and can plot the map that it generates:
plot(som_model, type="codes")
The section of the code I don't understand is the:
var <- 1
var_unscaled <- aggregate(as.numeric(training[,var]),by=list(som_model$unit.classif),FUN = mean, simplify=TRUE)[,2]
plot(som_model, type = "property", property=var_unscaled, main = names(training)[var], palette.name=coolBlueHotRed)
As I understand it, this section of the code is supposed to plot one of the variables over the map to see what it looks like, but this is where I run into problems. When I run this section of the code I get the warning:
Warning message:
In bgcolors[!is.na(showcolors)] <- bgcol[showcolors[!is.na(showcolors)]] :
number of items to replace is not a multiple of replacement length
and it produces a plot that somehow just doesn't look right.
Now what I think it has come down to is the way the aggregate function has re-ordered the data. The length of var_unscaled is 789 and the length of som_model$data, training[,var] and unit.classif are all of length 2755. I tried plotting the aggregated data, the result was no warning but an unintelligible graph (as expected).
Now I think it has done this because unit.classif has a lot of repeated numbers inside it and that's why it has reduced in size.
The question is, do I worry about the warning? Is it producing an accurate graph? What exactly is the "Property"'s section looking for in the plot command? Is there a different way I could "Aggregate" the data?
I think that you have to define the colour palette function yourself. If you add the definition
coolBlueHotRed <- function(n, alpha = 1) {rainbow(n, end=4/6, alpha=alpha)[n:1]}
and then try to get a plot, for example
plot(som_model, type = "count", palette.name = coolBlueHotRed)
it succeeds.
This link can help you: http://rgm3.lab.nig.ac.jp/RGM/R_rdfile?f=kohonen/man/plot.kohonen.Rd&d=R_CC
I think that not all of the cells on your map have points inside.
You have a 30 by 30 map and about 2700 points. On average that's about 3 points per cell. With high probability, some cells have more than 3 points and some cells are empty.
The code in the post on R-bloggers works well when all of the cells have points inside.
To make it work on your data, try changing this part:
var <- 1
var_unscaled <- aggregate(as.numeric(training[, var]), by = list(som_model$unit.classif), FUN = mean, simplify = TRUE)[, 2]
plot(som_model, type = "property", property = var_unscaled, main = names(training)[var], palette.name = coolBlueHotRed)
with this one:
var <- 1
var_unscaled <- aggregate(as.numeric(data.temp[, data.classes][, var]),
by = list(som_model$unit.classif),
FUN = mean,
simplify = T)
v_u <- rep(0, max(var_unscaled$Group.1))
v_u[var_unscaled$Group.1] <- var_unscaled$x
plot(som_model,
type = "property",
property = v_u,
main = colnames(data.temp[, data.classes])[var],
palette.name = coolBlueHotRed)
Hope it helps.
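The idea behind the fix can be seen without a fitted model: aggregating by unit only returns the units that received points, so the result has to be expanded back to one value per map unit. This is a minimal sketch with simulated stand-ins for som_model$unit.classif and the training column; the 900 units match the 30 by 30 map assumed above.

```r
set.seed(42)

n_units <- 900                                         # assumed 30 x 30 grid
unit_classif <- sample(n_units, 2755, replace = TRUE)  # stand-in for som_model$unit.classif
values <- rexp(2755)                                   # stand-in for training[, var]

# Mean per occupied unit; empty units are simply missing from the result
unit_means <- tapply(values, unit_classif, mean)

# Expand to one entry per unit, indexed by unit number
property <- rep(NA_real_, n_units)
property[as.integer(names(unit_means))] <- unit_means
```

NA is used here rather than the 0 of the answer so that empty units are not mistaken for a mean of zero; swap in 0 if your plotting function does not accept NA. Sizing the vector by the number of units, rather than by the largest occupied unit number, also covers the case where the last units are empty.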
Just add these functions to your script:
coolBlueHotRed <- function(n, alpha = 1) {rainbow(n, end=4/6, alpha=alpha)[n:1]}
pretty_palette <- c("#1f77b4","#ff7f0e","#2ca02c", "#d62728","#9467bd","#8c564b","#e377c2")
