How can I extract the matrix derived from a heatmap created with gplots after hierarchical clustering? - r

I am making a heatmap, but I can't assign the result in a variable to check the result before plotting. Rstudio plot it automatically. I would like to get the list of rownames in the order of the heatmap. I'am not sure if this is possible. I'am using this code:
hm <- heatmap.2( assay(vsd)[ topVarGenes, ], scale="row",
trace="none", dendrogram="both",
col = colorRampPalette( rev(brewer.pal(9, "RdBu")) )(255),
ColSideColors = c(Controle="gray", Col1.7G2="darkgreen", JG="blue", Mix="orange")[
colData(vsd)$condition ] )

You can assign the plot to an object. The plot will still be drawn in the plot window, however, you'll also get a list with all the data for each plot element. Then you just need to extract the desired plot elements from the list. For example:
library(gplots)
p = heatmap.2(as.matrix(mtcars), dendrogram="both", scale="row")
p is a list with all the elements of the plot.
p # Outputs all the data in the list; lots of output to the console
str(p) # Struture of p; also lots of output to the console
names(p) # Names of all the list elements
p$rowInd # Ordering of the data rows
p$carpet # The heatmap values
You'll see all the other values associated with the dendrogram and the heatmap if you explore the list elements.

To others out there, a more complete description way to capture a matrix representation of the heatmap created by gplots:
matrix_map <- p$carpet
matrix_map <- t(matrix_map)

Related

Venn Diagram in R to show character labels

I'm very new to R and trying to create a 4-dimensional Venn diagram with a data which contains all categorical variables.
For instance, if i have the data below and I want to create a Venn Diagram in R to show the word "hello" at the intersection of A & B instead of the counts and percentages, how do i do that? I used the code ggVennDigram(x) after creating the list below and it gave me the counts instead of the actual data labels.
x=list()
x$A=as.character(c("hello"))
x$B=as.character(c("hello", "how", "are"))
x$C=as.character(c("how", "You"))
x$D=as.character(c("me", "her", "they"))
The graph cannot be label by default. This is solution with some hack.
(modified from Venn Diagram with Item labels)
library(VennDiagram)
library(stringr)
library(purrr)
# Generate plot
v <- venn.diagram(x,
fill = c("orange", "blue", "red", "green"),
filename=NULL)
# Calculate overlap site
overlaps <- calculate.overlap(x)
overlaps <- overlaps[str_sort(names(overlaps), numeric = TRUE)] # sort base on numeric value
# Apply name to global variable
# Index of venn diagram start at calculate.overlaps + 8. You have to find the index value by yourself for (3,5,6,7,.. venn)
walk2(seq(overlaps) +8, seq(overlaps),
function(x,y) {v[[x]]$label <<- paste0(overlaps[[y]], collapse = "\n")})
# Draw plot
grid.draw(v)

Set common y axis limits from a list of ggplots

I am running a function that returns a custom ggplot from an input data (it is in fact a plot with several layers on it). I run the function over several different input data and obtain a list of ggplots.
I want to create a grid with these plots to compare them but they all have different y axes.
I guess what I have to do is extract the maximum and minimum y axes limits from the ggplot list and apply those to each plot in the list.
How can I do that? I guess its through the use of ggbuild. Something like this:
test = ggplot_build(plot_list[[1]])
> test$layout$panel_scales_x
[[1]]
<ScaleContinuousPosition>
Range:
Limits: 0 -- 1
I am not familiar with the structure of a ggplot_build and maybe this one in particular is not a standard one as it comes from a "custom" ggplot.
For reference, these plots are created whit the gseaplot2 function from the enrichplot package.
I dont know how to "upload" an R object but if that would help, let me know how to do it.
Thanks!
edit after comments (thanks for your suggestions!)
Here is an example of the a gseaplot2 plot. GSEA stands for Gene Set Enrichment Analysis, it is a technique used in genomic studies. The gseaplot2 function calculates a running average and then plots it and another bar plot on the bottom.
and here is the grid I create to compare the plots generated from different data:
I would like to have a common scale for the "Running Enrichment Score" part.
I guess I could try to recreate the gseaplot2 function and input all of the datasets and then create the grid by facet_wrap, but I was wondering if there was an easy way of extracting parameters from a plot list.
As a reproducible example (from the enrichplot package):
library(clusterProfiler)
data(geneList, package="DOSE")
gene <- names(geneList)[abs(geneList) > 2]
wpgmtfile <- system.file("extdata/wikipathways-20180810-gmt-Homo_sapiens.gmt", package="clusterProfiler")
wp2gene <- read.gmt(wpgmtfile)
wp2gene <- wp2gene %>% tidyr::separate(term, c("name","version","wpid","org"), "%")
wpid2gene <- wp2gene %>% dplyr::select(wpid, gene) #TERM2GENE
wpid2name <- wp2gene %>% dplyr::select(wpid, name) #TERM2NAME
ewp2 <- GSEA(geneList, TERM2GENE = wpid2gene, TERM2NAME = wpid2name, verbose=FALSE)
gseaplot2(ewp2, geneSetID=1, subplots=1:2)
And this is how I generate the plot list (probably there is a much more elegant way):
plot_list = list()
for(i in 1:3) {
fig_i = gseaplot2(ewp2,
geneSetID=i,
subplots=1:2)
plot_list[[i]] = fig_i
}
ggarrange(plotlist=plot_list)

Make multiple histograms at once and save them

I would like to make histogram of columns 5-34 of my data set and save them for reference. This is what I have and the error 'x must be numeric' is what keeps coming up. All of these columns have numeric data.
[data screenshot][1]
dput(longbroca)
histograms = c()
GBhistograms = c()
RThistograms = c()
for (i in 5:34){
hist(longbroca)
hist(GBlongbroca)
hist(RTlongbroca)
histograms = c(histograms, hist(longbroca[,5:34]))
GBhistograms = c(GBhistograms, hist(GBlongbroca[,5:34]))
RThistograms = c(RThistograms, hist(RTlongbroca[,5:34]))
}
#reproducible
fakerow1 <- c(100,80,60,40,20)
fakerow2 <- c(100,80,60,40,20)
fakedata = rbind(fakerow1,fakerow2)
colnames(fakedata) = c('ant1','ant2','ant3','ant4','ant5')
You cannot plot all of the columns with a single hist() function. That is why you are getting the error message. You are plotting histograms and saving the output list from the histogram. Your code does not save any histograms, only the data for producing them. If you actually want to save the plotted histograms, you need to plot them to a device (e.g. pdf).
We can use the iris dataset which comes with R (data(iris)) as some example data. The first 4 columns are numeric. If you just want the data for the histograms from the iris data set (columns 1 through 4):
# R will plot all four but you will only see the last one.
histograms <- lapply(iris[, 1:4], hist)
The variable histograms is a list that contains 6 elements. These are documented on the manual page for the function (?hist).
# To plot one of the histograms with a title and x-axis label:
lbl <- names(histograms)
plot(histograms[[1]], main=lbl[1], xlab=lbl[1])
# To plot them all
pdf("histograms.pdf")
lapply(1:4, function(x) plot(histograms[[x]], main=lbl[x], xlab=lbl[x]))
dev.off()
The file "histograms.pdf" will have all four histograms, one per page.

Highlight Subset Cells From Heatmap By Row/Col Index

I'm trying to visually inspect, and extract, subsets of large heatmaps. For example, I'd like to roughly extract the row/col indices for clusters like the one I circled below:
Following the advice from here, I hope to achieve this by creating rectangles around subsets of cells by index and repeat until I've highlighted areas close enough to what I want.
Using some simpler data, I tried this:
library(gplots)
set.seed(100)
# Input data 4x5 matrix
nx <- 5
ny <- 4
dat <- matrix(runif(20, 1, 10), nrow=ny, ncol=nx)
# Get hierarchically clustered heatmap matrix
hm <- heatmap.2(dat, main="Test HM", key=T, trace="none")
hmat <- dat[rev(hm$rowInd), hm$colInd]
# Logical matrix with the same dimensions as our data
# indicating which cells I want to subset
selection <- matrix(rep(F,20), nrow=4)
# For example: the third row
selection[3,] <- T
#selection <- dat>7 # Typical subsets like this don't work either
# Function for making selection rectangles around selection cells
makeRects <- function(cells){
coords = expand.grid(1:nx,1:ny)[cells,]
xl=coords[,1]-0.49
yb=coords[,2]-0.49
xr=coords[,1]+0.49
yt=coords[,2]+0.49
rect(xl,yb,xr,yt,border="black",lwd=3)
}
# Re-make heatmap with rectangles based on the selection
# Use the already computed heatmap matrix and don't recluster
heatmap.2(hmat, main="Heatmap - Select 3rd Row", key=T, trace="none",
dendrogram="none", Rowv=F, Colv=F,
add.expr={makeRects(selection)})
This does not work. Here is the result. Instead of the third row being highlighted, we see a strange pattern:
It must have to do with this line:
coords = expand.grid(1:nx,1:ny)[cells,]
# with parameters filled...
coords = expand.grid(1:5,1:4)[selection,]
Can anyone explain what's going on here? I'm not sure why my subset isn't working even though it is similar to the one in the other question.
Very close. I think you made a typo in the makeRects() function. In my hands, it works with a few changes.
# Function for making selection rectangles around selection cells
makeRects <- function(cells){
coords = expand.grid(ny:1, 1:nx)[cells,]
xl=coords[,2]-0.49
yb=coords[,1]-0.49
xr=coords[,2]+0.49
yt=coords[,1]+0.49
rect(xl,yb,xr,yt,border="black",lwd=3)
}
# Re-make heatmap with rectangles based on the selection
# Use the already computed heatmap matrix and don't recluster
heatmap.2(hmat, main="Heatmap - Select 3rd Row", key=T, trace="none",
dendrogram="none", Rowv=F, Colv=F,
add.expr={makeRects(selection)})

Trying to determine why my heatmap made using heatmap.2 and using breaks in R is not symmetrical

I am trying to cluster a protein dna interaction dataset, and draw a heatmap using heatmap.2 from the R package gplots. My matrix is symmetrical.
Here is a copy of the data-set I am using after it is run through pearson:DataSet
Here is the complete process that I am following to generate these graphs: Generate a distance matrix using some correlation in my case pearson, then take that matrix and pass it to R and run the following code on it:
library(RColorBrewer);
library(gplots);
library(MASS);
args <- commandArgs(TRUE);
matrix_a <- read.table(args[1], sep='\t', header=T, row.names=1);
mtscaled <- as.matrix(scale(matrix_a))
# location <- args[2];
# setwd(args[2]);
pdf("result.pdf", pointsize = 15, width = 18, height = 18)
mycol <- c("blue","white","red")
my.breaks <- c(seq(-5, -.6, length.out=6),seq(-.5999999, .1, length.out=4),seq(.100009,5, length.out=7))
#colors <- colorpanel(75,"midnightblue","mediumseagreen","yellow")
result <- heatmap.2(mtscaled, Rowv=T, scale='none', dendrogram="row", symm = T, col=bluered(16), breaks=my.breaks)
dev.off()
The issue I am having is once I use breaks to help me control the color separation the heatmap no longer looks symmetrical.
Here is the heatmap before I use breaks, as you can see the heatmap looks symmetrical:
Here is the heatmap when breaks are used:
I have played with the cutoff's for the sequences to make sure for instance one sequence does not end exactly where the other begins, but I am not able to solve this problem. I would like to use the breaks to help bring out the clusters more.
Here is an example of what it should look like, this image was made using cluster maker:
I don't expect it to look identical to that, but I would like it if my heatmap is more symmetrical and I had better definition in terms of the clusters. The image was created using the same data.
After some investigating I noticed was that after running my matrix through heatmap, or heatmap.2 the values were changing, for example the interaction taken from the provided data set of
Pacdh-2
and
pegg-2
gave a value of 0.0250313 before the matrix was sent to heatmap.
After that I looked at the matrix values using result$carpet and the values were then
-0.224333135
-1.09805379
for the two interactions
So then I decided to reorder the original matrix based on the dendrogram from the clustered matrix so that I was sure that the values would be the same. I used the following stack overflow question for help:
Order of rows in heatmap?
Here is the code used for that:
rowInd <- rev(order.dendrogram(result$rowDendrogram))
colInd <- rowInd
data_ordered <- matrix_a[rowInd, colInd]
I then used another program "matrix2png" to draw the heatmap:
I still have to play around with the colors but at least now the heatmap is symmetrical and clustered.
Looking into it even more the issue seems to be that I was running scale(matrix_a) when I change my code to just be mtscaled <- as.matrix(matrix_a) the result now looks symmetrical.
I'm certainly not the person to attempt reproducing and testing this from that strange data object without code that would read it properly, but here's an idea:
..., col=bluered(20)[4:20], ...
Here's another though which should return the full rand of red which tha above strategy would not:
shift.BR<- colorRamp(c("blue","white", "red"), bias=0.5 )((1:16)/16)
heatmap.2( ...., col=rgb(shift.BR, maxColorValue=255), .... )
Or you can use this vector:
> rgb(shift.BR, maxColorValue=255)
[1] "#1616FF" "#2D2DFF" "#4343FF" "#5A5AFF" "#7070FF" "#8787FF" "#9D9DFF" "#B4B4FF" "#CACAFF" "#E1E1FF" "#F7F7FF"
[12] "#FFD9D9" "#FFA3A3" "#FF6C6C" "#FF3636" "#FF0000"
There was a somewhat similar question (also today) that was asking for a blue to red solution for a set of values from -1 to 3 with white at the center. This it the code and output for that question:
test <- seq(-1,3, len=20)
shift.BR <- colorRamp(c("blue","white", "red"), bias=2)((1:20)/20)
tpal <- rgb(shift.BR, maxColorValue=255)
barplot(test,col = tpal)
(But that would seem to be the wrong direction for the bias in your situation.)

Resources