Pheatmap: Re-order leaves in dendrogram - r

I have created a heatmap with a corresponding dendrogram based on hierarchical clustering, using the pheatmap package. Now I want to change the order of the leaves in the dendrogram, preferably using optimal leaf ordering. I have searched around but have not found any solution on how to achieve this.
I would appreciate suggestions on how to change the order of the leaves using optimal leaf ordering.
Here's my example code with random data:
library(pheatmap)
mat <- matrix(rgamma(1000, shape = 1) * 5, ncol = 50)
p <- pheatmap(mat,
clustering_distance_cols = "manhattan",
cluster_cols=TRUE,
cluster_rows=FALSE
)

For "optimal leaf ordering" you can use the reorder() method from the seriation package. pheatmap accepts a clustering_callback argument. According to the docs:
clustering_callback: callback function to modify the clustering. Is called with two parameters: original hclust object and the matrix used for clustering. Must return a hclust object.
So you need to write a callback function that accepts the hclust object and the matrix used for clustering, and returns the reordered hclust object.
Here is the code:
library(pheatmap)
library(seriation)
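cl_cb <- function(hcl, mat){
  # Recalculate Manhattan distances between the clustered items (the rows of the
  # matrix pheatmap passes in; for column clustering these are the heatmap columns)
  dists <- dist(mat, method = "manhattan")
  # Reorder the leaves with optimal leaf ordering (OLO)
  hclust_olo <- reorder(hcl, dists, method = "OLO")
  return(hclust_olo)
}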
mat <- matrix(rgamma(1000, shape = 1) * 5, ncol = 50)
p <- pheatmap(mat,
clustering_distance_cols = "manhattan",
cluster_cols=TRUE,
cluster_rows=FALSE,
clustering_callback = cl_cb
)
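To check what the reordering buys you, it can help to compute the quantity optimal leaf ordering targets: the sum of distances between neighbouring leaves in the left-to-right leaf order, minimised over all orders that keep the tree intact. The helper below is a base-R sketch; adjacent_leaf_distance is a hypothetical name, not part of pheatmap or seriation, and it only reads the $order component of an hclust object.

```r
# Hypothetical helper: total distance between neighbouring leaves in the
# dendrogram's left-to-right order (the criterion OLO minimises over all
# leaf orders that are consistent with the tree).
adjacent_leaf_distance <- function(hc, d) {
  dm <- as.matrix(d)   # full distance matrix, indexed by original item number
  ord <- hc$order      # leaf order as drawn, left to right
  # Look up the distance of each consecutive pair of leaves and add them up
  sum(dm[cbind(ord[-length(ord)], ord[-1])])
}
```

For example, with dists <- dist(t(mat), method = "manhattan") and hc <- hclust(dists), the value of adjacent_leaf_distance(reorder(hc, dists), dists) after library(seriation) should be no larger than adjacent_leaf_distance(hc, dists), because the original order is itself one of the tree-consistent orders OLO searches over.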


Change title: mcmc_trace function with ggplot

I used the mcmc_trace function from the bayesplot package to draw trace plots from an mcmc.list. It returns a ggplot object, so it can be further edited with ggplot2 functions.
In the plot produced by the function, I need to change the panel titles theta[1] ... theta[20] to subject 1 ... subject 20. Is there any way to achieve this with ggplot functions?
Here is a simple reproducible model:
library(R2jags)
library(bayesplot)
library(ggplot2)
# data
dlist <- list(
NSubjects = 20,
k = rep (5,20),
n = rep (10,20)
)
# monitor
parameter <- 'theta'
# model
minimodel <- function(){
for (i in 1:NSubjects){
theta [i] ~ dbeta (1,1)
k[i] ~ dbin(theta[i],n[i])
}
}
samples <- jags(dlist, inits=NULL, parameter,
model.file = minimodel,
n.chains=1, n.iter=10, n.burnin=1, n.thin=1, DIC=TRUE)
# mcmc list
codaSamples = as.mcmc.list(samples$BUGSoutput)
# select subjects
colstheta <- sprintf("theta[%d]",1:20)
# plot (here is where I need to change the titles, in this example theta[1] ... theta[20] to subject 1 ... subject 20)
mcmc_trace(codaSamples[,colstheta]) +
labs (x='Iteration',y='theta value',
title='Traceplot - theta')
Use colnames<- to modify the column names. Since the object is a 1-element list containing a matrix-like object, you need to use [[1]]; if you have multiple chains you'll need to lapply() (or use a for loop) to apply the solution to every chain (i.e., every element in the list).
cc <- codaSamples[,colstheta]
colnames(cc[[1]]) <- gsub("theta\\[([0-9]+)\\]","subject \\1",colnames(cc[[1]]))
mcmc_trace(cc, ...)
The code above finds the numerical element in each name and inserts it into the new name; since you happen to know in this case that these are elements 1:20, you could simplify considerably, e.g.
colnames(cc[[1]]) <- paste("subject",seq(ncol(cc[[1]])))
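With several chains the renaming has to be applied to every element of the mcmc.list, and lapply() returns a plain list without the mcmc.list class. A minimal sketch, assuming each chain is a matrix-like object whose columns are named theta[i]; rename_subjects is a hypothetical helper that copies the input list's attributes (such as its class) back onto the result.

```r
# Hypothetical helper: rename "theta[i]" columns to "subject i" in every chain
rename_subjects <- function(chains) {
  renamed <- lapply(chains, function(chain) {
    colnames(chain) <- gsub("theta\\[([0-9]+)\\]", "subject \\1", colnames(chain))
    chain
  })
  # lapply() drops the container class (e.g. "mcmc.list"); restore it
  attributes(renamed) <- attributes(chains)
  renamed
}
```

Usage would then be something like mcmc_trace(rename_subjects(codaSamples[, colstheta])).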

How to reorder cluster leaves (columns) when plotting pheatmap in R?

I am plotting a set of 15 samples clustered in three groups, A, B and C, and the heatmap orders them as C, A, B. (I have read that this is because it plots the cluster with the strongest similarity on the right.) I would like to order the clusters so that the leaves appear as A, B, C, i.e. reorganising the order of the cluster branches. Is there a function that can help me do this?
The code I have used:
library(pheatmap)
pheatmap(mat, annotation_col = anno,
color = colorRampPalette(c("blue", "white", "red"))(50), show_rownames = FALSE)
(Setting cluster_cols = FALSE would not cluster the samples at all, which is not what I want.)
I have also found this on another forum, but I am unsure how to adapt the function code and whether it would work for me:
clustering_callback: callback function to modify the clustering. Is called with two parameters: original hclust object and the matrix used for clustering. Must return a hclust object.
I am not sure if this helps you, but if you check ?pheatmap and scroll down to the examples, the last snippet of code gives exactly that example:
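library(pheatmap)
# Random test matrix; the ?pheatmap examples start from the same call
test <- matrix(rnorm(200), 20, 10)
# Modify ordering of the clusters using clustering callback option
callback = function(hc, mat){
sv = svd(t(mat))$v[,1]
dend = reorder(as.dendrogram(hc), wts = sv)
as.hclust(dend)
}
pheatmap(test, clustering_callback = callback)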
I tried it on my heatmap and the callback sorted the clusters exactly the way I needed them, although I have to admit (as I am new to R) that I don't fully understand what the callback function does.
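What the ?pheatmap callback does: svd(t(mat))$v[,1] gives one number per clustered item (its entry in the first right singular vector), and reorder() on the dendrogram then sorts the two branches at every node by their aggregated weight, smaller first. The tree itself is unchanged; only the leaf order changes. The same mechanism can put known groups in a chosen order by using each group's position as the weight. A sketch, assuming groups holds one label per heatmap column in the original column order; make_group_callback is a hypothetical helper. Because only branch swaps are possible, A, B, C is reachable only if the tree does not join A and C before B.

```r
# Hypothetical helper: returns a clustering_callback for pheatmap that orders
# dendrogram branches by the position of each item's group in `group_order`.
make_group_callback <- function(groups, group_order) {
  # One weight per clustered item, in the original (unclustered) item order
  wts <- match(groups, group_order)
  function(hc, mat) {
    # pheatmap also uses the callback for the row tree when cluster_rows = TRUE;
    # leave any tree whose size does not match the group vector untouched
    if (nrow(mat) != length(wts)) return(hc)
    # At every node, children are sorted by increasing mean weight
    # (mean rather than the default sum, so cluster size does not dominate)
    dend <- reorder(as.dendrogram(hc), wts = wts, agglo.FUN = mean)
    as.hclust(dend)
  }
}
```

Usage might look like pheatmap(mat, annotation_col = anno, clustering_callback = make_group_callback(anno$group, c("A", "B", "C"))), where anno$group is a hypothetical column and anno's rows are assumed to follow the column order of mat.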
Maybe you can also write a function with the dendsort package, as I know you can reorder the branches of a dendrogram with it.
In this case, luckily, the clustering of the columns coincides with the sample number order (which is similar to the dendrogram), so I added cluster_cols = FALSE, which solved the issue of re-clustering the columns and avoided writing the callback function.
pheatmap(mat,
annotation_col = anno,
fontsize_row = 2,
show_rownames = TRUE,
cutree_rows = 3,
cluster_cols = FALSE)
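# install.packages("dendsort")
library(dendsort)
sort_hclust <- function(...) as.hclust(dendsort(as.dendrogram(...)))
# dist(t(mat)) measures distances between columns (samples), not rows;
# pheatmap accepts an hclust object for cluster_cols
pheatmap(mat, cluster_cols = sort_hclust(hclust(dist(t(mat)))))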

Random Graph Function in R

I have an assignment in which I have to write my own random graph function in R, with an igraph output. I've figured out that the easiest way to do this is to generate a square matrix and then build a function that creates edges between the nodes in the matrix. However, I'd like to add a twist: the edge probabilities should make Sybil-like networks more likely to form.
My matrix is generated and visualised quite simply like this:
library(ggraph)
library(igraph)
NCols <- 20
NRows <- 20
myMat <-matrix(runif(NCols*NRows), ncol = NCols)
myMat
randomgraph <- graph_from_adjacency_matrix(myMat, mode = "undirected", weighted = NULL, diag = TRUE, add.colnames = NULL, add.rownames = NA)
randomgraph %>%
ggraph() +
geom_node_point(colour = "firebrick4", size = 0.5, show.legend = FALSE)
I know there are generators for Erdős–Rényi random graphs (a true random graph), Barabási–Albert scale-free graphs and Watts–Strogatz small-world graphs. I'm trying to write my own with a unique twist.
Any advice or code snippets on how to write my own preferential attachment function for the random matrix would be greatly appreciated! Thank you!
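As a starting point, here is a minimal base-R sketch of linear preferential attachment (the Barabási–Albert idea, with one new edge per node) that returns a 0/1 adjacency matrix. It does not implement any Sybil-specific rule; pref_attach_matrix is a hypothetical name, and the +1 added to each degree is a simple smoothing choice so that low-degree nodes keep a baseline chance.

```r
# Hypothetical helper: grow an undirected graph by preferential attachment.
# Node i links to one existing node j with probability proportional to
# degree(j) + 1. Returns a symmetric 0/1 integer adjacency matrix.
pref_attach_matrix <- function(n) {
  stopifnot(n >= 2)
  adj <- matrix(0L, n, n)
  adj[1, 2] <- adj[2, 1] <- 1L            # seed graph: nodes 1 and 2 connected
  if (n > 2) {
    for (i in 3:n) {
      existing <- seq_len(i - 1)
      deg <- colSums(adj[existing, existing, drop = FALSE])
      # sample.int avoids sample()'s surprise when given a length-1 vector
      j <- existing[sample.int(length(existing), 1, prob = deg + 1)]
      adj[i, j] <- adj[j, i] <- 1L        # undirected edge, no self-loops
    }
  }
  adj
}
```

The result can be passed to igraph's graph_from_adjacency_matrix(adj, mode = "undirected"); a Sybil-specific twist would go into how the prob weights are computed.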

FactoMineR/factoextra visualize all the clusters in the dendrogram

I performed a hierarchical clustering on a data frame using the HCPC function of the FactoMineR package. The problem is that I cannot visualize the number of clusters I asked for when I draw the dendrogram using factoextra.
Here is a reproducible example of my problem:
library(FactoMineR)
library(factoextra)
model <- HCPC(iris[,1:4], nb.clust = 5)
There are indeed 5 clusters in the HCPC output above.
fviz_dend(model, k = 5,
cex = 0.7,
palette = "default",
rect = TRUE, rect_fill = TRUE
)
But only 3 are shown in the dendrogram.
I bumped into the same problem: the fviz_dend function would always return what it considers to be the optimal amount of clusters, even when I tried to override this – either in the HCPC or in the fviz_dend functions.
One way to fix this while sticking to FactoMineR and factoextra would be to change the default amount of clusters calculated by the HCPC function:
model$call$t$nb.clust = 5
And then run the fviz_dend function.
This should return the result that you were expecting.
You can just use the dendextend R package with the color_branches function:
library(dendextend)
dend <- USArrests %>% dist %>% hclust(method = "ave") %>% as.dendrogram
dd <- color_branches(dend,5)
plot(dd)
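If fviz_dend keeps ignoring k, a fallback that sidesteps factoextra is base R's cutree() and rect.hclust(), which always cut a tree into exactly the requested number of groups. The sketch below clusters the standardised iris measurements directly with Ward's method; that is an illustrative assumption, and the tree may differ from the one HCPC builds on its principal-component coordinates.

```r
# Base-R fallback: Ward clustering on standardised iris measurements,
# cut into exactly k = 5 groups and outlined on the dendrogram.
hc <- hclust(dist(scale(iris[, 1:4])), method = "ward.D2")
grp5 <- cutree(hc, k = 5)
table(grp5)                               # number of flowers per cluster
plot(hc, labels = FALSE, hang = -1)
rect.hclust(hc, k = 5, border = 2:6)      # one coloured box per cluster
```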

How to plot an NMDS with coloured/symbol points based on SIMPROF

I am trying to plot an NMDS of assemblage data stored as a Bray-Curtis dissimilarity matrix in R. I have been able to apply ordiellipse() and ordihull(), and even to change the colours based on group factors created by cutree() of an hclust().
For example, using the dune data from the vegan package:
library(vegan)
data(dune)
Dune.dis <- vegdist(dune, method = "bray")
Dune.mds <- metaMDS(dune, distance = "bray", k=2)
#hierarchical cluster
clua <- hclust(Dune.dis, "average")
plot(clua, hang = -1)
# set groupings
rect.hclust(clua, 4)
grp <- cutree(clua, 4)
#plot mds
plot(Dune.mds, display = "sites", type = "text", cex = 1.5)
#show groupings
ordiellipse(Dune.mds, groups = grp, border = 1, col = "red", lwd = 3)
Or colour the points just by the cutree() groups:
colvec <- c("red2", "cyan", "deeppink3", "green3")
colvec[grp]
plot(Dune.mds, display = "sites", type = "text", cex = 1.5) #or use type = "points"
points(Dune.mds, col = colvec[grp], bg = colvec[grp], pch = 21)
However, what I really want is to use the SIMPROF function from the clustsig package and then colour the points based on the significant groupings. This is more of a coding question: I could build a vector of factors by hand, but I am sure there is a more efficient way to do it.
Here is my code so far:
simp <- simprof(Dune.dis, num.expected = 1000, num.simulated = 999, method.cluster = "average", method.distance = "braycurtis", alpha = 0.05, sample.orientation = "row")
#plot dendrogram
simprof.plot(simp, plot = TRUE)
Now I am not sure how to do the next step: plotting the NMDS using the groupings defined by SIMPROF. How do I turn the SIMPROF results into a factor without typing it in myself?
Thanks in advance.
You wrote that you know how to get colours from an hclust object with cutree. Now read the documentation of clustsig::simprof: simprof returns an hclust object within its result object. It also returns numgroups, which is the suggested number of clusters. That gives you all the information you need to use the cutree of hclust you already know. If your simprof result is called simp, use cutree(simp$hclust, simp$numgroups) to extract the integer vector corresponding to the clustsig::simprof result, and use it to pick the colours.
I have never used simprof or clustsig, but I gathered all this information from its documentation.
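Because SIMPROF tests each node separately and stops descending wherever a split is not significant, its groups can sit at different heights of the tree, so a single cutree() at numgroups may not reproduce them exactly. In that case the groups can be rebuilt from the result's list of significant clusters. A sketch, assuming simp$significantclusters is a list whose elements are vectors of sample names matching rownames(dune) (check str(simp) to confirm); clusters_to_groups is a hypothetical helper.

```r
# Hypothetical helper: one group label per sample from a list of clusters,
# where each list element holds the names of the samples in that cluster.
clusters_to_groups <- function(clusters, sample_names) {
  grp <- rep(NA_integer_, length(sample_names))
  for (k in seq_along(clusters)) {
    grp[sample_names %in% clusters[[k]]] <- k
  }
  # Samples not found in any cluster stay NA, which flags a name mismatch
  factor(grp)
}
```

Then grp <- clusters_to_groups(simp$significantclusters, rownames(dune)) can replace the cutree() groups in the points() call above, as long as colvec has at least as many colours as there are groups.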
