combine dendrogram plots and heatmap plots in R [duplicate] - r

I have a dataset of protein-DNA interactions. I want to cluster it and draw a heatmap in which the clusters are visible and line up along the diagonal. I can cluster the data and generate a dendrogram, but when I draw the heatmap with the heatmap function in R, the clusters are not visible. Of the three images, the first is the dendrogram I am able to generate, the second is the heatmap I am able to generate, and the third is an example of a clustered heatmap that shows roughly what I expect the result to look like. Comparing the second and third images, there are clearly clusters in the third but not in the second.
Here is a link to my dataset:
http://pastebin.com/wQ9tYmjy
I am able to cluster the data and generate a dendrogram just fine in R:
args <- commandArgs(TRUE);
matrix_a <- read.table(args[1], sep='\t', header=T, row.names=1);
location <- args[2];
matrix_d <- dist(matrix_a);
hc <- hclust(matrix_d,"average");
mypng <- function(filename = "mydefault.png") {
png(filename)
}
options(device = "mypng")
plot(hc);
I am also able to generate a heatmap okay as well:
matrix_a <- read.table("Arda_list.txt.binary.matrix.txt", sep='\t', header=T, row.names=1);
mtscaled <- as.matrix(scale(matrix_a))
heatmap(mtscaled, Colv=F, scale='none')
I tried to follow the post:
http://digitheadslabnotebook.blogspot.com/2011/06/drawing-heatmaps-in-r.html
by Christopher Bare, but I am missing something. Any ideas would be appreciated. I have attached an image of the heatmap that I am getting, as well as the dendrogram. Image 3 was taken from Christopher Bare's post. Thanks

It turns out I should have generated a distance matrix using some kind of correlation on my data first. I calculated similarity values on the matrix using Pearson correlation, then called the heatmap function, which made it easier to cluster the data. Once I was able to generate clusters, I made them line up on the diagonal. Above is what the result looks like now. I had to alter how I called heatmap on my data set so that the clusters line up on the diagonal:
heatmap(mtscaled, Colv=T,Rowv=T, scale='none',symm = T)
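A minimal sketch of the idea described above, using synthetic data in place of the linked dataset (the matrix `x`, its three built-in groups, and all variable names here are assumptions for illustration). It computes a Pearson correlation matrix between rows and passes it to heatmap with symm = TRUE, so that one row ordering is reused for the columns and any blocks of similar rows should fall along the diagonal:

```r
# Synthetic stand-in for the interaction matrix (hypothetical data):
# 12 rows in three groups of 4, each group sharing a profile plus noise.
set.seed(42)
profiles <- matrix(rnorm(3 * 8), nrow = 3)
x <- profiles[rep(1:3, each = 4), ] + matrix(rnorm(12 * 8, sd = 0.3), nrow = 12)
rownames(x) <- paste0("item", seq_len(nrow(x)))

# Pearson correlation between rows gives a symmetric similarity matrix.
sim <- cor(t(x), method = "pearson")

# With symm = TRUE and Colv left at its default, heatmap() reuses the
# row dendrogram for the columns, so both axes share one ordering.
res <- heatmap(sim, symm = TRUE, scale = "none")
```

The returned list holds the row and column permutations (`res$rowInd`, `res$colInd`), which can be compared to confirm that both axes were ordered the same way.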

Related

Complexheatmap zoom annotation : Cluster wise boxplot

I want to show the cluster-wise boxplot distribution from ComplexHeatmap. I was able to produce the row-wise distribution, but how do I implement the cluster-wise distribution shown in the attached example?
The dummy example creates a subgroup, which it shows in the distribution. In the same way, I have already assigned clusters in my data file; they are given in the first column.
How do I implement this for my data frame using this example code?
I'm not sure how to define the subgroups in the case of my data frame.
Any suggestion or help would be really appreciated.
This is the output I would like to see:
This is the output I have:
The dataset is this one: small_data
And my code:
library(ComplexHeatmap)
df <- read.csv("small_data.txt", header = TRUE)
# row-wise z-scores of the data columns (column 3 onward)
heat <- t(scale(t(df[,3:ncol(df)])))
myBreaks <- seq(-1.5, 1.5, length.out=100)
hmap <- Heatmap(heat)
hmap
How do I implement the cluster-specific distribution, as shown in the first picture? The second figure is what I'm getting now.

Stacked single row heatmaps in R

I have 10 matrices, each of which represents the eigenvalues of correlation matrices of wavelet coefficients of a number of time series.
I would like to generate a heatmap of my data to see if there are any overlapping significant events across the different scales.
So far I have hacked together the following image using the graphics package as described here by @josliber.
I am still getting to grips with R so excuse the nasty code but it works for me for now. I have yet to mess around with the labels and formatting but it's a quick and dirty representation.
#plotting the eigenvalues for each of the wavelet coefficient scales
w1mat <- matrix(w1eigen[1,])
w2mat <- matrix(w2eigen[1,])
w3mat <- matrix(w3eigen[1,])
w4mat <- matrix(w4eigen[1,])
w5mat <- matrix(w5eigen[1,])
w6mat <- matrix(w6eigen[1,])
w7mat <- matrix(w7eigen[1,])
w8mat <- matrix(w8eigen[1,])
w9mat <- matrix(w9eigen[1,])
w10mat <- matrix(w10eigen[1,])
#plots the eigenvalues for each of the scales
par(mfrow=c(10,1))
imageW1 <- image(w1mat)
imageW2 <- image(w2mat)
imageW3 <- image(w3mat)
imageW4 <- image(w4mat)
imageW5 <- image(w5mat)
imageW6 <- image(w6mat)
imageW7 <- image(w7mat)
imageW8 <- image(w8mat)
imageW9 <- image(w9mat)
imageW10 <- image(w10mat)
As you can see I used the image function in the graphics package to create this and I am sure I can achieve the same in ggplot with greater control.
Ultimately I wish to create a plot similar to the one below where there is a common Y axis and they're sitting right on top of one another with no white space.
What I would like to know is whether using the stacked image function in graphics is the best approach or is there a better way to visualise the data in an alternative package?
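One possible alternative, sketched here under assumptions rather than offered as the definitive approach: bind the ten rows into a single matrix and call image once, so the strips share both axes and sit directly on top of one another with no white space. The random vectors below are hypothetical stand-ins for `w1eigen[1,]` through `w10eigen[1,]`, and the axis labels are made up for illustration:

```r
set.seed(1)
n_time <- 50     # assumed number of time points per scale
n_scales <- 10

# Hypothetical stand-ins for w1eigen[1, ], ..., w10eigen[1, ]
eig_rows <- lapply(seq_len(n_scales), function(i) runif(n_time))

# One column per scale: an n_time x n_scales matrix
stacked <- do.call(cbind, eig_rows)

# A single image() call draws all scales as adjacent horizontal strips
image(x = seq_len(n_time), y = seq_len(n_scales), z = stacked,
      xlab = "Time index", ylab = "Wavelet scale", axes = FALSE)
axis(1)
axis(2, at = seq_len(n_scales), labels = paste0("w", seq_len(n_scales)), las = 1)
box()
```

Note that a single call puts all scales on one colour scale. If each scale should keep its own range, as the ten separate image calls do, each column could be rescaled before binding.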

Different visualization for hierarchical clustering of dendrogram

I would like to have visualization of hierarchical clustering with shapes one inside the other. Brightness level represents level of hierarchy.
Let me show you my idea with an example:
# Clustering small proportion of iris data
clusters <- hclust(dist(iris[20:28, 3:4]), method = 'average')
# Visualizing the result as a dendrogram
plot(clusters)
Now we can convert the dendrogram as below.
Is there any R package that can produce something similar?
This is only a partial answer. You can use clusplot from the cluster package to get some way in that direction. You could probably improve on this by changing the source of clusplot (type getAnywhere(clusplot.default) to get the source). But it is probably some work to get your bubbles to not overlap. Anyway, here's the plot you get from clusplot. It may also be of interest to look at the individual plots one at a time instead of showing them all together.
# use sample data
df <- iris[20:28, 3:4]
# calculate hierarchical clustering
hfit <- hclust(dist(df), method = 'average')
# plot dendrogram
plot(hfit)
# use clusplot at all possible cutoffs and show on top of each other.
library(cluster)
clusplot(df, cutree(hfit, 1), lines = 0)
for (i in 2:nrow(df)){
clusplot(df, cutree(hfit, i), lines = 0, add = TRUE)
}

PCA Biplot : A way to hide vectors to see all data points clearly

I am trying to do PCA with R.
My Data has 10,000 columns and 90 rows
I used the prcomp function to do PCA.
Trying to prepare a biplot with the prcomp results, I ran into the problem that the 10,000 plotted vectors cover my datapoints. Is there any option for the biplot to hide the vectors' representation?
OR
I can use plot to get the PCA results. But I am not sure how to label these points according to my datapoints, which are numbered 1 to 90.
Sample<-read.table(file.choose(),header=F,sep="\t")
# scale every column
Sample.scaled<-data.frame(apply(Sample,2,scale))
# drop columns that contain NA after scaling
Sample.scaled.2<-data.frame(t(na.omit(t(Sample.scaled))))
pca.Sample<-prcomp(Sample.scaled.2,retx=TRUE)
pdf("Sample_plot.pdf")
plot(pca.Sample$x)
dev.off()
If you do a help(prcomp) or ?prcomp, the help file tells us all the things contained in the prcomp() object returned by the function. We just need to pick which things we want to plot and do it with some function that gives us more control than biplot().
A more general trick for cases when the help file doesn't clarify things is to do a str() on the prcomp object (in your case pca.Sample) to see all its parts and find what we want ( str() compactly displays the internal structure of an R object. )
Here is an example with some of R's sample data:
# do a pca of arrests in different states
p<-prcomp(USArrests, scale = TRUE)
str(p) gives me something ugly and too long to include, but I can see that p$x has the states as rownames and their locations on the principal components as columns. Armed with this, we can plot it any way we want, such as with plot() and text() (for labels):
# plot and add labels
plot(p$x[,1],p$x[,2])
text(p$x[,1],p$x[,2],labels=rownames(p$x))
If we are making a scatterplot with many observations, the labels may not be readable. We therefore might want to only label more extreme values, which we can identify with quantile():
#make a new dataframe with the info from p we want to plot
df <- data.frame(PC1=p$x[,1],PC2=p$x[,2],labels=rownames(p$x))
#make sure labels are not factors, so we can easily reassign them
df$labels <- as.character(df$labels)
# use quantile() to identify which ones are within 25-75 percentile on both
# PC and blank their labels out
df[ df$PC1 > quantile(df$PC1)["25%"] &
df$PC1 < quantile(df$PC1)["75%"] &
df$PC2 > quantile(df$PC2)["25%"] &
df$PC2 < quantile(df$PC2)["75%"],]$labels <- ""
# plot
plot(df$PC1,df$PC2)
text(df$PC1,df$PC2,labels=df$labels)

How to get clusters to line up on the diagonal using heatmap.2 in r?

I am trying to cluster a protein-DNA interaction dataset and draw a heatmap using heatmap.2 from the R package gplots. Here is the complete process that I am following to generate these graphs:
Generate a distance matrix using some correlation, in my case Pearson.
library(RColorBrewer);
library(gplots);
args <- commandArgs(TRUE);
matrix_a <- read.table(args[1], sep='\t', header=T, row.names=1);
mtscaled <- as.matrix(scale(matrix_a))
pdf("result.pdf", pointsize = 15, width = 18, height = 18)
result <- heatmap.2(mtscaled, Colv=T,Rowv=T, scale='none',symm = T, col = brewer.pal(9,"Reds"))
dev.off()
I am able to accomplish this with the normal heatmap function by doing the following:
result <- heatmap(mtscaled, Colv=T,Rowv=T, scale='none',symm = T)
However, when I use the same settings for heatmap.2, the clusters don't line up as well on the diagonal. I have attached 2 images: the first uses heatmap and the second uses heatmap.2. I have used the Reds palette from the package RColorBrewer to better show what I am talking about. I would normally just use the default heatmap function, but I need the color variation that heatmap.2 provides.
Here is a link to the dataset used to generate the heatmaps, after it has been turned into a distance matrix:
DataSet
It's as if two of the arguments are conflicting. Colv=T says to order the columns by cluster, and symm=T says to order the columns the same as the rows. Of course, both constraints could be satisfied since the data is symmetrical, but instead Colv=T wins and you get two independent cluster orderings that happen to be different.
If you give up on having a redundant copy of the dendrogram, the following gives the heatmap you want, at least:
result <- heatmap.2(mtscaled, Rowv=T, scale='none', dendrogram="row", symm = T, col = brewer.pal(9,"Reds"))
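A different technique from the one above, sketched as an assumption rather than as part of the original answer: cluster once and pass the same dendrogram object to both Rowv and Colv, which heatmap.2 accepts, so both axes share one ordering while both dendrograms are still drawn. The synthetic matrix, the 1 - correlation distance, and the variable names are hypothetical, and the heatmap.2 call is guarded so the snippet still runs where gplots is not installed:

```r
set.seed(7)
# Hypothetical symmetric matrix standing in for the real dataset
x <- matrix(rnorm(12 * 8), nrow = 12,
            dimnames = list(paste0("item", 1:12), NULL))
sim <- cor(t(x))

# Cluster once; 1 - correlation is used as the distance (an assumption)
hc <- hclust(as.dist(1 - sim), method = "average")
dend <- as.dendrogram(hc)

# Reuse the same dendrogram for rows and columns
if (requireNamespace("gplots", quietly = TRUE)) {
  gplots::heatmap.2(sim, Rowv = dend, Colv = dend, symm = TRUE,
                    scale = "none", trace = "none",
                    col = heat.colors(9))
}
```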
