Highlight Subset Cells From Heatmap By Row/Col Index - r

I'm trying to visually inspect, and extract, subsets of large heatmaps. For example, I'd like to roughly extract the row/col indices for clusters like the one I circled below:
Following the advice from here, I hope to achieve this by drawing rectangles around subsets of cells by index, repeating until the highlighted area is close enough to what I want.
Using some simpler data, I tried this:
library(gplots)
set.seed(100)
# Input data 4x5 matrix
nx <- 5
ny <- 4
dat <- matrix(runif(20, 1, 10), nrow=ny, ncol=nx)
# Get hierarchically clustered heatmap matrix
hm <- heatmap.2(dat, main="Test HM", key=T, trace="none")
hmat <- dat[rev(hm$rowInd), hm$colInd]
# Logical matrix with the same dimensions as our data
# indicating which cells I want to subset
selection <- matrix(rep(F,20), nrow=4)
# For example: the third row
selection[3,] <- T
#selection <- dat>7 # Typical subsets like this don't work either
# Function for making selection rectangles around selection cells
makeRects <- function(cells){
coords = expand.grid(1:nx,1:ny)[cells,]
xl=coords[,1]-0.49
yb=coords[,2]-0.49
xr=coords[,1]+0.49
yt=coords[,2]+0.49
rect(xl,yb,xr,yt,border="black",lwd=3)
}
# Re-make heatmap with rectangles based on the selection
# Use the already computed heatmap matrix and don't recluster
heatmap.2(hmat, main="Heatmap - Select 3rd Row", key=T, trace="none",
dendrogram="none", Rowv=F, Colv=F,
add.expr={makeRects(selection)})
This does not work. Here is the result. Instead of the third row being highlighted, we see a strange pattern:
It must have to do with this line:
coords = expand.grid(1:nx,1:ny)[cells,]
# with parameters filled...
coords = expand.grid(1:5,1:4)[selection,]
Can anyone explain what's going on here? I'm not sure why my subset isn't working even though it is similar to the one in the other question.

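Very close. The problem is in how makeRects() builds its coordinates, not in your selection. A logical matrix is read column by column (the row index varies fastest), whereas expand.grid(1:nx, 1:ny) varies the x (column) index fastest, so the two orderings do not line up. In addition, with Rowv=F heatmap.2 draws the first matrix row at the top, so the y coordinate has to run from ny down to 1. In my hands, it works with these changes: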
# Function for making selection rectangles around selection cells
makeRects <- function(cells){
coords = expand.grid(ny:1, 1:nx)[cells,]
xl=coords[,2]-0.49
yb=coords[,1]-0.49
xr=coords[,2]+0.49
yt=coords[,1]+0.49
rect(xl,yb,xr,yt,border="black",lwd=3)
}
# Re-make heatmap with rectangles based on the selection
# Use the already computed heatmap matrix and don't recluster
heatmap.2(hmat, main="Heatmap - Select 3rd Row", key=T, trace="none",
dendrogram="none", Rowv=F, Colv=F,
add.expr={makeRects(selection)})
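To see why the argument order of expand.grid() matters, here is a minimal check that needs no plotting. It reuses nx, ny and selection from the question; the names idx, wrong and right, and the column names x and y, are only introduced for this illustration.

```r
nx <- 5
ny <- 4
selection <- matrix(FALSE, nrow = ny, ncol = nx)
selection[3, ] <- TRUE

# A logical matrix used as an index is flattened column by column
idx <- as.vector(selection)

# Original grid: x varies fastest, so it does not line up with idx
wrong <- expand.grid(x = 1:nx, y = 1:ny)[idx, ]
# Corrected grid: the (reversed) row index varies fastest, like the matrix itself
right <- expand.grid(y = ny:1, x = 1:nx)[idx, ]

print(wrong)  # cells scattered over several rows
print(right)  # a single y value, with x running from 1 to nx
```

In the corrected grid, every selected cell shares one y coordinate (row 3 counted from the top is y = 2 counted from the bottom), which is what the rectangles need.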

Related

How to plot overlap clustering with a list of vertices of each group and the edge list by R?

I have a CSV file containing the edge list of a graph. After running CONGA
(an overlapping-cluster variant of the Girvan-Newman algorithm), the result is a list of vertices for each group.
I don't know how to plot it in R so that each group has a different color in the graph.
I can plot the graph from the edge list in R, but I don't know how to mark the vertices in each group.
Input: edge list file and list of vertices in each group.
Output: graph with different color for each group.
output nearly like this
My English isn't good. Thanks for your support.
Vertex colors are controlled by the vertices' $color attribute. Try assigning a color like V(g)$color <- 'green'.
It would be better if you gave us some code.
You say you get a list of your group members. Convert the list to a membership vector, and assign a color to each unique group value. I wrote this example code. I think it shows what you're after.
library(igraph)
get_a_random_network <- function() {
# Function to get some random data to use as an example
g <- erdos.renyi.game(100, 60, type="gnm", directed=F, loops=FALSE)
g <- g %>% delete_vertices( V(g)[degree(g)==0] )
(g)
}
# Get sample data
g <- get_a_random_network()
# Use a clustering algorithm to determine groups. You said you had a list. I use this to generate example data.
groups <- cluster_fast_greedy(g)
# Look at the vertices
(V(g))
# Look at what groups they belong to:
(groups$membership)
# Here you write that you have "list of vertices of each group". You don't
# give us code, but I assume that you have data that looks like this:
CONGA_list <- lapply(1:max(groups$membership),function(x) V(g)[groups$membership ==x])
(CONGA_list)
# This is where you should really have provided a code example.
# You could convert a list like CONGA_list to a vector like this:
membership_groups <- rep(0, length(V(g)))
for(x in 1:length(CONGA_list)){
membership_groups[as.vector(CONGA_list[[x]])] <- x
}
(membership_groups == groups$membership)
# You color the network by first telling each vertex which group it belongs to
V(g)$membership <- groups$membership
# Then we assign a color. I use a vector of R colors which I get like this...
colors = grDevices::colors()[grep('gr(a|e)y', grDevices::colors(), invert = T)]
# ... and then I sample from them to give each group a color.
colors <- sample(colors, max(V(g)$membership))
V(g)$color <- colors[V(g)$membership]
(V(g)$color)
# The plot will work with the colors in V(g)$color
plot(g, vertex.size=7, vertex.label=NA)
Good luck
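Because CONGA produces overlapping groups, a vertex can belong to more than one group, and a single color per vertex cannot show that. One possible way around this, sketched here on made-up data, is igraph's mark.groups plotting argument, which shades a region around each group. The ring graph and the list conga_groups are hypothetical stand-ins for your edge list and your CONGA output.

```r
library(igraph)

# Hypothetical example data: a 10-vertex ring and three overlapping groups
g <- make_ring(10)
conga_groups <- list(1:4, 4:7, c(7:10, 1))

# Count how many groups each vertex belongs to
n_groups <- tabulate(unlist(conga_groups), nbins = vcount(g))

# Give vertices that sit in more than one group their own color
V(g)$color <- ifelse(n_groups > 1, "orange", "lightblue")

# mark.groups takes a list of vertex id vectors and shades each group
plot(g, mark.groups = conga_groups, vertex.size = 12)
```

Here vertices 1, 4 and 7 each belong to two groups, so they are drawn in orange and fall inside two shaded regions.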

How to add labels to original data given clustering result using hclust

Say I have some unlabeled data which I know should be clustered into six categories, for example this dataset:
library(tidyverse)
ts <- read_table(url("http://kdd.ics.uci.edu/databases/synthetic_control/synthetic_control.data"), col_names = FALSE)
If I create an hclust object with a sample of 60 from the original dataset like so:
n <- 10
s <- sample(1:100, n)
idx <- c(s, 100+s, 200+s, 300+s, 400+s, 500+s)
ts.samp <- ts[idx,]
observedLabels <- c(rep(1,n), rep(2,n), rep(3,n), rep(4,n), rep(5,n), rep(6,n))
# compute DTW distances
library(dtw)#Dynamic Time Warping (DTW)
distMatrix <- dist(ts.samp, method= 'DTW')
# hierarchical clustering
hc <- hclust(distMatrix, method='average')
I know that I can then add the labels to the dendrogram for viewing like this:
observedLabels <- c(rep(1,n), rep(2,n), rep(3,n), rep(4,n), rep(5,n), rep(6,n))
plot(hc, labels=observedLabels, main="")
However, I would like to add the correct labels to the initial data frame that was clustered. So for ts.samp I would like to add an extra column with the label of the cluster that each observation has been assigned to.
It would seem that ts.samp$cluster <- hc$label should add the cluster to the data frame; however, hc$label returns NULL.
Can anyone help with extracting this information?
You need to choose a level at which to cut the dendrogram; the cut defines the groups.
Use:
labels <- cutree(hc, k = 3) # set k to whatever number of groups suits your dendrogram
ts.samp$grouping <- labels
Let's look at the dendrogram in order to find the best number for k:
plot(hc, main="")
abline(h=500, col = "red") # cut at height 500 forms 2 groups
abline(h=300, col = "blue") # cut at height 300 forms 3/4 groups
It looks like either 2 or 3 might be good. Look for the largest jump in height between successive merges (the longest vertical lines).
Draw a horizontal line at that height and count the branches it crosses: that is the number of clusters formed.
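As a small self-contained sketch of the same idea, the following uses the built-in USArrests data as a hypothetical stand-in for ts.samp (with plain Euclidean distance instead of DTW). It shows that cutree() accepts either a number of groups k or a cut height h, and that its result can be attached directly as a column.

```r
# Hypothetical stand-in data: built-in USArrests, Euclidean distance
hc <- hclust(dist(scale(USArrests)), method = "average")

# Cut by number of clusters ...
by_k <- cutree(hc, k = 3)
# ... or by a height read off the dendrogram
by_h <- cutree(hc, h = 2)

# cutree() returns one label per observation, in the original row order,
# so it can be added to the data frame as a new column
labelled <- USArrests
labelled$cluster <- by_k
print(table(labelled$cluster))
```

Because the labels come back in the order of the original rows, no reordering by hc$order is needed before attaching them.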

How can I extract the matrix derived from a heatmap created with gplots after hierarchical clustering?

I am making a heatmap, but I can't assign the result to a variable to check it before plotting; RStudio plots it automatically. I would like to get the list of row names in the order of the heatmap. I'm not sure if this is possible. I'm using this code:
hm <- heatmap.2( assay(vsd)[ topVarGenes, ], scale="row",
trace="none", dendrogram="both",
col = colorRampPalette( rev(brewer.pal(9, "RdBu")) )(255),
ColSideColors = c(Controle="gray", Col1.7G2="darkgreen", JG="blue", Mix="orange")[
colData(vsd)$condition ] )
You can assign the plot to an object. The plot will still be drawn in the plot window, however, you'll also get a list with all the data for each plot element. Then you just need to extract the desired plot elements from the list. For example:
library(gplots)
p = heatmap.2(as.matrix(mtcars), dendrogram="both", scale="row")
p is a list with all the elements of the plot.
p # Outputs all the data in the list; lots of output to the console
str(p) # Structure of p; also lots of output to the console
names(p) # Names of all the list elements
p$rowInd # Ordering of the data rows
p$carpet # The heatmap values
You'll see all the other values associated with the dendrogram and the heatmap if you explore the list elements.
For others out there, a more complete way to capture a matrix representation of the heatmap created by gplots (p$carpet is stored transposed, hence the t()):
matrix_map <- p$carpet
matrix_map <- t(matrix_map)
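To get the row names in the order they appear on the heatmap, one option is to index the original row names with p$rowInd. The sketch below assumes that rowInd lists rows from the bottom of the plot upwards (which is consistent with the rev(hm$rowInd) used in the first question on this page); the names m, rows_top_to_bottom, cols_left_to_right and displayed are introduced only for this example.

```r
library(gplots)

m <- as.matrix(mtcars)
pdf(NULL)  # draw to a null device; drop this line to see the plot
p <- heatmap.2(m, dendrogram = "both", scale = "row", trace = "none")
invisible(dev.off())

# rowInd runs from the bottom of the heatmap to the top, so reverse it
rows_top_to_bottom <- rownames(m)[rev(p$rowInd)]
cols_left_to_right <- colnames(m)[p$colInd]

# The scaled values laid out as displayed, with the top row first
displayed <- t(p$carpet)[nrow(m):1, ]
print(head(rows_top_to_bottom))
```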

Trying to determine why my heatmap made using heatmap.2 and using breaks in R is not symmetrical

I am trying to cluster a protein dna interaction dataset, and draw a heatmap using heatmap.2 from the R package gplots. My matrix is symmetrical.
Here is a copy of the data set I am using after it is run through Pearson correlation: DataSet
Here is the complete process that I follow to generate these graphs: I generate a distance matrix using some correlation (Pearson in my case), then pass that matrix to R and run the following code on it:
library(RColorBrewer);
library(gplots);
library(MASS);
args <- commandArgs(TRUE);
matrix_a <- read.table(args[1], sep='\t', header=T, row.names=1);
mtscaled <- as.matrix(scale(matrix_a))
# location <- args[2];
# setwd(args[2]);
pdf("result.pdf", pointsize = 15, width = 18, height = 18)
mycol <- c("blue","white","red")
my.breaks <- c(seq(-5, -.6, length.out=6),seq(-.5999999, .1, length.out=4),seq(.100009,5, length.out=7))
#colors <- colorpanel(75,"midnightblue","mediumseagreen","yellow")
result <- heatmap.2(mtscaled, Rowv=T, scale='none', dendrogram="row", symm = T, col=bluered(16), breaks=my.breaks)
dev.off()
The issue I am having is once I use breaks to help me control the color separation the heatmap no longer looks symmetrical.
Here is the heatmap before I use breaks, as you can see the heatmap looks symmetrical:
Here is the heatmap when breaks are used:
I have played with the cutoffs for the sequences, for instance to make sure one sequence does not end exactly where the next begins, but I am not able to solve this problem. I would like to use the breaks to help bring out the clusters more.
Here is an example of what it should look like, this image was made using cluster maker:
I don't expect it to look identical to that, but I would like it if my heatmap is more symmetrical and I had better definition in terms of the clusters. The image was created using the same data.
After some investigating, I noticed that the values were changing after running my matrix through heatmap or heatmap.2. For example, the interaction taken from the provided data set of
Pacdh-2
and
pegg-2
gave a value of 0.0250313 before the matrix was sent to heatmap.
After that I looked at the matrix values using result$carpet and the values were then
-0.224333135
-1.09805379
for the two interactions
So then I decided to reorder the original matrix based on the dendrogram from the clustered matrix so that I was sure that the values would be the same. I used the following stack overflow question for help:
Order of rows in heatmap?
Here is the code used for that:
rowInd <- rev(order.dendrogram(result$rowDendrogram))
colInd <- rowInd
data_ordered <- matrix_a[rowInd, colInd]
I then used another program "matrix2png" to draw the heatmap:
I still have to play around with the colors but at least now the heatmap is symmetrical and clustered.
Looking into it further, the issue seems to be the call to scale(matrix_a). When I changed my code to just mtscaled <- as.matrix(matrix_a), the result looked symmetrical.
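The effect of scale() can be checked on a small example. scale() centers and rescales each column separately, so a symmetric matrix generally stops being symmetric afterwards. The 3x3 matrix below is made up for illustration and is not taken from the data set above.

```r
# Made-up symmetric matrix standing in for a correlation matrix
m <- matrix(c(1.0, 0.8, 0.1,
              0.8, 1.0, 0.3,
              0.1, 0.3, 1.0), nrow = 3, byrow = TRUE)

print(isSymmetric(m))  # the input is symmetric

# scale() standardizes column by column
s <- scale(m)

# Largest difference between s and its transpose; nonzero means not symmetric
asym <- max(abs(s - t(s)))
print(asym)
```

This also explains why the values in result$carpet differed from the input: they had been standardized per column before plotting.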
I'm certainly not the person to attempt reproducing and testing this from that strange data object without code that would read it properly, but here's an idea:
..., col=bluered(20)[4:20], ...
Here's another thought, which should return the full range of red, which the above strategy would not:
shift.BR<- colorRamp(c("blue","white", "red"), bias=0.5 )((1:16)/16)
heatmap.2( ...., col=rgb(shift.BR, maxColorValue=255), .... )
Or you can use this vector:
> rgb(shift.BR, maxColorValue=255)
[1] "#1616FF" "#2D2DFF" "#4343FF" "#5A5AFF" "#7070FF" "#8787FF" "#9D9DFF" "#B4B4FF" "#CACAFF" "#E1E1FF" "#F7F7FF"
[12] "#FFD9D9" "#FFA3A3" "#FF6C6C" "#FF3636" "#FF0000"
There was a somewhat similar question (also today) that asked for a blue-to-red solution for a set of values from -1 to 3 with white at the center. This is the code and output for that question:
test <- seq(-1,3, len=20)
shift.BR <- colorRamp(c("blue","white", "red"), bias=2)((1:20)/20)
tpal <- rgb(shift.BR, maxColorValue=255)
barplot(test,col = tpal)
(But that would seem to be the wrong direction for the bias in your situation.)

Displaying TraMineR (R) dendrograms in text/table format

I use the following R code to generate a dendrogram (see attached picture) with labels based on TraMineR sequences:
library(TraMineR)
library(cluster)
clusterward <- agnes(twitter.om, diss = TRUE, method = "ward")
plot(clusterward, which.plots = 2, labels=colnames(twitter_sequences))
The full code (including dataset) can be found here.
As informative as the dendrogram is graphically, it would be handy to get the same information in text and/or table format. If I call any of the aspects of the object clusterward (created by agnes), such as "order" or "merge" I get everything labeled using numbers rather than the names I get from colnames(twitter_sequences). Also, I don't see how I can output the groupings represented graphically in the dendrogram.
To summarize: How can I get the cluster output in text/table format with the labels properly displayed using R and ideally the traminer/cluster libraries?
The question concerns the cluster package. The help page for the agnes.object returned by agnes
(See http://stat.ethz.ch/R-manual/R-devel/library/cluster/html/agnes.object.html ) states that this object contains an order.lab component "similar to order, but containing observation labels instead of observation numbers. This component is only available if the original observations were labelled."
The dissimilarity matrix (twitter.om in your case) produced by TraMineR does currently not retain the sequence labels as row and column names. To get the order.lab component you have to manually assign sequence labels as both the rownames and colnames of your twitter.om matrix. I illustrate here with the mvad data provided by the TraMineR package.
library(TraMineR)
data(mvad)
## attaching row labels
rownames(mvad) <- paste("seq",rownames(mvad),sep="")
mvad.seq <- seqdef(mvad[17:86])
## computing the dissimilarity matrix
dist.om <- seqdist(mvad.seq, method = "OM", indel = 1, sm = "TRATE")
## assigning row and column labels
rownames(dist.om) <- rownames(mvad)
colnames(dist.om) <- rownames(mvad)
dist.om[1:6,1:6]
## Hierarchical cluster with agnes
library(cluster)
cward <- agnes(dist.om, diss = TRUE, method = "ward")
## here we can see that cward has an order.lab component
attributes(cward)
That gives you the order with sequence labels rather than numbers. It is not clear to me, though, which cluster outcome you want in text/table form. From the dendrogram you decide where you want to cut it, i.e., the number of groups you want, and then cut it with cutree, e.g. cl.4 <- cutree(cward, k = 4). The result cl.4 is a vector with the cluster membership of each sequence, and you get the list of the members of group 1, for example, with rownames(mvad.seq)[cl.4==1].
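As a rough sketch of turning the cut into text and table output, the following uses the built-in USArrests data as a hypothetical stand-in for the sequence dissimilarities, so that it runs without TraMineR. The names ag, cl, membership and groups are introduced only for this example, and the agnes tree is converted with as.hclust() before cutting.

```r
library(cluster)

# Hypothetical stand-in data: distances between the rows of USArrests
d <- dist(scale(USArrests))
ag <- agnes(d, diss = TRUE, method = "ward")

# Cut the tree into 4 groups; result is in the original observation order
cl <- cutree(as.hclust(ag), k = 4)

# Table format: one row per observation with its cluster
membership <- data.frame(label = rownames(USArrests), cluster = cl)

# Text format: the members of each group
groups <- split(membership$label, membership$cluster)
print(groups)
```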
Alternatively, you can use the identify method (see ?identify.hclust) to select the groups interactively from the plot, but you need to pass the tree as as.hclust(cward). Here is the code for the example:
## plot the dendrogram
plot(cward, which.plots = 2, labels=FALSE)
## and select the groups manually from the plot
x <- identify(as.hclust(cward)) ## Terminate with second mouse button
## number of groups selected
length(x)
## list of members of the first group
x[[1]]
Hope this helps.
