I am a little confused about row scaling in pheatmap. This is my data frame:
gene s1 s2 s3
1 -3.83 -8.17 -8.59
2 0.33 -4.51 -7.27
3 0.15 -5.26 -6.2
4 -0.08 -6.13 -5.95
5 -1.15 -4.82 -5.75
6 -0.99 -4.11 -4.85
7 0.42 -4.18 -4.54
8 -0.32 -3.43 -4.4
9 -0.72 -3.37 -4.39
I need to extract the scaled values of the data frame after pheatmap generates the graph with row z-scores:
library(pheatmap)
my_colors <- c(min(d),seq(-4,4,by=0.01),max(d))
my_palette <- c("green",colorRampPalette(colors = c("green", "red"))
(n = length(my_colors)-2), "red")
pheatmap(as.matrix(d),
scale = "row",
cluster_cols=FALSE,
cluster_rows = FALSE,
treeheight_row=0,
show_rownames=FALSE,
main = "test.txt",
color = my_palette,
breaks = my_colors)
How can I get the new matrix that pheatmap uses to make the heatmap?
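One way to recover the scaled values, as a minimal sketch: to my knowledge, `scale = "row"` centres each row on its mean and divides by the row's standard deviation, so the same matrix can be computed by hand in base R. Here `d` is rebuilt from the table above with `gene` as row names, which is an assumption about how the data was read in. Since clustering is switched off, the row order is unchanged.

```r
# Data from the question, with the gene ids used as row names (assumption)
d <- read.table(text = "gene s1 s2 s3
1 -3.83 -8.17 -8.59
2 0.33 -4.51 -7.27
3 0.15 -5.26 -6.2
4 -0.08 -6.13 -5.95
5 -1.15 -4.82 -5.75
6 -0.99 -4.11 -4.85
7 0.42 -4.18 -4.54
8 -0.32 -3.43 -4.4
9 -0.72 -3.37 -4.39", header = TRUE, row.names = 1)

m <- as.matrix(d)

# Row z-score: subtract each row's mean, then divide by each row's sd.
# The length-9 vectors are recycled down the columns, so row i is always
# paired with its own mean and sd.
z <- (m - rowMeans(m)) / apply(m, 1, sd)

# Equivalent formulation: scale() works on columns, so transpose twice
z2 <- t(scale(t(m)))

print(round(z, 2))
```

Note that the colours drawn in the heatmap are additionally binned by `breaks`, so `z` holds the scaled values themselves rather than the colour bins.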
I want to achieve the same end goal as this question: Create a single heatmap based on two symmetric matrices in R but to take it further than the answer currently provided.
The answer given does not explain how to use different colours and different scales for the upper and lower sections of the matrix.
Here is the example dataset:
library(Matrix)
set.seed(123)
s1<-forceSymmetric(matrix(round(rnorm(25),2),5))
colnames(s1)<-LETTERS[1:5]
rownames(s1)<-LETTERS[6:10]
diag(s1)<-1
s2<-forceSymmetric(matrix(round(rbinom(25,25,0.3),2),5))
colnames(s2)<-LETTERS[1:5]
rownames(s2)<-LETTERS[6:10]
diag(s2)<-1
s1
# 5 x 5 Matrix of class "dsyMatrix"
# A B C D E
# F 1.00 1.72 1.22 1.79 -1.07
# G 1.72 1.00 0.36 0.50 -0.22
# H 1.22 0.36 1.00 -1.97 -1.03
# I 1.79 0.50 -1.97 1.00 -0.73
# J -1.07 -0.22 -1.03 -0.73 1.00
s2
# 5 x 5 Matrix of class "dsyMatrix"
# A B C D E
# F 1 6 8 7 9
# G 6 1 5 9 8
# H 8 5 1 10 9
# I 7 9 10 1 1
# J 9 8 9 1 1
The existing answer suggests adding the two matrices together like so:
#Get upper diagonal
reverse = s1[,ncol(s1):1]
diag(reverse) = 0
reverse[lower.tri(reverse, diag = FALSE)] <- 0
upper = reverse[,ncol(reverse):1]
# Get lower diagonal
reverse1 = s2[,ncol(s2):1]
diag(reverse1) = 0
reverse1[upper.tri(reverse1, diag = FALSE)] <- 0
upper1 = reverse1[,ncol(reverse1):1]
# Add them together
merged = as.matrix(upper+upper1)
merged
A B C D E
F 1.00 1.72 1.22 1.79 0
G 1.72 1.00 0.36 0.00 8
H 1.22 0.36 0.00 10.00 9
I 1.79 0.00 10.00 1.00 1
J 0.00 8.00 9.00 1.00 1
It then suggests using heatmap(merged). However, how would you go about having different colours and scales for the upper and lower parts of the matrix?
I am happy with using any package including ggplot2 to make this work.
Thanks in advance!
You can subset the relevant parts of the matrix in the data argument of a layer, and use {ggnewscale} to assign different fill scales to different layers. The trick is to declare a fill scale before adding new_scale_fill(), otherwise the order of operations goes wrong (which usually doesn't matter much, but here it does).
You can then tweak every individual scale. In the example below I just tweaked the palettes, but you can also adjust limits, breaks, labels etc.
# Assuming code from question has been executed and we have a 'merged' in memory
library(ggplot2)
library(ggnewscale)
# Wide matrix to long dataframe
# Later, we'll be relying on the notion that the dimnames have been
# converted to factor variables to separate out the upper from the lower
# matrix.
df <- reshape2::melt(merged)
ggplot(df, aes(Var1, Var2)) +
# The first layer, with its own fill scale
geom_raster(
data = ~ subset(.x, as.numeric(Var1) > as.numeric(Var2)),
aes(fill = value)
) +
scale_fill_distiller(palette = "Blues") +
# Declare new fill scale for the second layer
new_scale_fill() +
geom_raster(
data = ~ subset(.x, as.numeric(Var1) < as.numeric(Var2)),
aes(fill = value)
) +
scale_fill_distiller(palette = "Reds") +
# I'm not sure what to do with the diagonal. Make it grey?
new_scale_fill() +
geom_raster(
data = ~ subset(.x, as.numeric(Var1) == as.numeric(Var2)),
aes(fill = value)
) +
scale_fill_distiller(palette = "Greys", guide = "none")
In my opinion @teunbrand's answer is what you're looking for, but another potential option is to use the ComplexHeatmap package, e.g. based on one of the examples in the docs:
library(Matrix)
set.seed(123)
s1<-forceSymmetric(matrix(round(rnorm(25),2),5))
colnames(s1)<-LETTERS[1:5]
rownames(s1)<-LETTERS[6:10]
diag(s1)<-1
s2<-forceSymmetric(matrix(round(rbinom(25,25,0.3),2),5))
colnames(s2)<-LETTERS[1:5]
rownames(s2)<-LETTERS[6:10]
diag(s2)<-1
#Get upper diagonal
reverse = s1[,ncol(s1):1]
diag(reverse) = 0
reverse[lower.tri(reverse, diag = FALSE)] <- 0
upper = reverse[,ncol(reverse):1]
# Get lower diagonal
reverse1 = s2[,ncol(s2):1]
diag(reverse1) = 0
reverse1[upper.tri(reverse1, diag = FALSE)] <- 0
upper1 = reverse1[,ncol(reverse1):1]
# Add them together
m = as.matrix(upper+upper1)
m
#> A B C D E
#> F 1.00 1.72 1.22 1.79 0
#> G 1.72 1.00 0.36 0.00 8
#> H 1.22 0.36 0.00 10.00 9
#> I 1.79 0.00 10.00 1.00 1
#> J 0.00 8.00 9.00 1.00 1
library(ComplexHeatmap)
#> Loading required package: grid
#> ========================================
#> ComplexHeatmap version 2.8.0
#> Bioconductor page: http://bioconductor.org/packages/ComplexHeatmap/
#> Github page: https://github.com/jokergoo/ComplexHeatmap
#> Documentation: http://jokergoo.github.io/ComplexHeatmap-reference
#>
#> If you use it in published research, please cite:
#> Gu, Z. Complex heatmaps reveal patterns and correlations in multidimensional
#> genomic data. Bioinformatics 2016.
#>
#> The new InteractiveComplexHeatmap package can directly export static
#> complex heatmaps into an interactive Shiny app with zero effort. Have a try!
#>
#> This message can be suppressed by:
#> suppressPackageStartupMessages(library(ComplexHeatmap))
#> ========================================
library(circlize)
#> ========================================
#> circlize version 0.4.13
#> CRAN page: https://cran.r-project.org/package=circlize
#> Github page: https://github.com/jokergoo/circlize
#> Documentation: https://jokergoo.github.io/circlize_book/book/
#>
#> If you use it in published research, please cite:
#> Gu, Z. circlize implements and enhances circular visualization
#> in R. Bioinformatics 2014.
#>
#> This message can be suppressed by:
#> suppressPackageStartupMessages(library(circlize))
#> ========================================
col1 = colorRamp2(c(-1, 10), c("white", "red"))
col2 = colorRamp2(c(-1, 10), c("white", "blue3"))
# here reordering the symmetric matrix is necessary
od = hclust(dist(m))$order
m = m[od, od]
ht = Heatmap(m, rect_gp = gpar(type = "none"), show_heatmap_legend = FALSE,
cluster_rows = FALSE, cluster_columns = FALSE,
layer_fun = function(j, i, x, y, w, h, fill) {
l = i > j
grid.rect(x[l], y[l], w[l], h[l],
gp = gpar(fill = col1(pindex(m, i[l], j[l])), col = NA))
l = i < j
grid.rect(x[l], y[l], w[l], h[l],
gp = gpar(fill = col2(pindex(m, i[l], j[l])), col = NA))
})
draw(ht, heatmap_legend_list = list(
Legend(title = "Group_A", col_fun = col1),
Legend(title = "Group_B", col_fun = col2)
))
Created on 2022-03-07 by the reprex package (v2.0.1)
I loaded a dataset into R whose first rows look like this:
date a b c
1 2017-01-01 -0.98 -1.35 -2.81
2 2017-02-01 -1.63 -2.18 -1.79
3 2017-03-01 -0.92 0.80 -3.33
4 2017-04-01 0.44 0.48 -2.11
5 2017-05-01 1.46 -3.11 -3.67
6 2017-06-01 -0.32 2.46 1.45
The full dataset includes 4 years of data with a total of 48 obs (from Jan-2017 to Dec-2020).
After loading in the dataset I change the format of the date variable to YYYYMM by using the code:
df$date <- format(as.Date(df$date), "%Y%m")
This results in the dates looking like this:
date a b c
1 201701 -0.98 -1.35 -2.81
2 201702 -1.63 -2.18 -1.79
3 201703 -0.92 0.80 -3.33
4 201704 0.44 0.48 -2.11
5 201705 1.46 -3.11 -3.67
6 201706 -0.32 2.46 1.45
After doing this I plot the data with this code:
plot(df$a, type="l", col="darkgreen", lwd=1, xlab="date", ylab="$", xaxs="i")
lines(df$b, col="red", lwd=1, xaxs="i")
lines(df$c, col="blue", lwd=1, xaxs="i")
legend("bottomleft", inset= 0.04, legend=c("a", "b", "c"),
col=c("darkgreen", "red", "blue"), lwd=3, cex=0.8)
Which results in the plot below:
However, the x-axis does not show the years, so I cannot follow the performance of a, b and c over time. How do I replace the values on the x-axis with the years in my dataset? And how do I make sure that only the years appear on the x-axis, and not the months as well?
The answers to this question that I've seen so far have been to format the date etc. This is done and seems to work fine. Can anyone please tell me what to do about this issue?
Here are two solutions, one in base R and one in ggplot2.
1. Base R
To plot multiple lines use either matplot or matlines.
colrs <- c("darkgreen", "red", "blue")
matplot(df[[1]], df[-1],
type = "l", lty = "solid", lwd = 1,
col = colrs,
xlab = "date", ylab = "$", xaxs = "i")
legend("bottomleft", inset = 0.04, legend = c("a", "b", "c"),
col = colrs, lwd = 3, cex = 0.8)
2. ggplot2
The data is in wide format, and one line per column vector is to be plotted against an x-axis vector, in this case the date vector. This sort of problem is usually a data reformatting problem. See reshaping data.frame from wide to long format.
library(ggplot2)
library(dplyr)
library(tidyr)
df %>%
pivot_longer(-date) %>%
ggplot(aes(date, value, colour = name)) +
geom_line() +
scale_colour_manual(breaks = c("a", "b", "c"), values = c("darkgreen", "red", "blue")) +
scale_x_date(date_labels = "%Y-%m") +
theme_classic() +
theme(axis.text.x = element_text(angle = 60, vjust = 0.5, hjust = 0.5))
Data
df <- read.table(text = "
date a b c
1 2017-01-01 -0.98 -1.35 -2.81
2 2017-02-01 -1.63 -2.18 -1.79
3 2017-03-01 -0.92 0.80 -3.33
4 2017-04-01 0.44 0.48 -2.11
5 2017-05-01 1.46 -3.11 -3.67
6 2017-06-01 -0.32 2.46 1.45
", header = TRUE)
df$date <- as.Date(df$date)
You replace the values of the x axis with the years in your dataset by adding xaxt="n" to your plot() command (which removes the current x axis) and calling
axis(1,at=1:nrow(df),labels=format(as.Date(df$date),"%Y"))
afterwards (which creates the desired x axis).
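If the date column has already been converted to "YYYYMM" strings as in the question, `as.Date()` can no longer parse it, and labelling every row would print the year 48 times. A minimal sketch of one way around both, assuming 48 monthly observations from Jan-2017 and using placeholder values for the series `a` (not the real data): take the year from the first four characters and put one tick at the first observation of each year.

```r
# Monthly dates in the question's YYYYMM format (Jan-2017 to Dec-2020)
dates <- format(seq(as.Date("2017-01-01"), by = "month", length.out = 48), "%Y%m")

# Placeholder series standing in for df$a; NOT the real data
a <- sin(seq_along(dates) / 4)

# Year of each observation, and the row index where each year starts
yrs <- substr(dates, 1, 4)
at  <- which(!duplicated(yrs))

# Suppress the default x axis, then draw one labelled tick per year
plot(a, type = "l", col = "darkgreen", xaxt = "n",
     xlab = "date", ylab = "$", xaxs = "i")
axis(1, at = at, labels = yrs[at])
```

Further series can still be added with lines() afterwards, since the x positions remain the row indices.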
I have a data frame like this:
gene s1 s2 s3
1 -3.83 -8.17 -8.59
2 0.33 -4.51 -7.27
3 0.15 -5.26 -6.2
4 -0.08 -6.13 -5.95
5 -1.15 -4.82 -5.75
6 -0.99 -4.11 -4.85
7 0.42 -4.18 -4.54
8 -0.32 -3.43 -4.4
9 -0.72 -3.37 -4.39
I want to make a heatmap using pheatmap where anything below -4 should be green, anything over +4 should be red, and everything in between should be red/green shades. I also don't want any scaling or clustering of my data. I have this code so far in R:
d <- read.table("test.txt", header = TRUE, sep = "\t", row.names = 1, quote = "")
pheatmap(as.matrix(d), # matrix
scale = "none", # no scaling applied
cluster_cols=FALSE, # do not cluster columns
cluster_rows = FALSE,
treeheight_row=0, # do not show row dendrogram
show_rownames=FALSE, # do not show row names i.e gene names
main = "test.txt",
color = colorRampPalette(c("#0016DB","#FFFFFF","#FFFF00"))(50)
)
How can I plot this with the color scheme I mentioned above?
Thanks
d <-read.table(text="gene s1 s2 s3
1 -3.83 -8.17 -8.59
2 0.33 -4.51 -7.27
3 0.15 -5.26 -6.20
4 -0.08 -6.13 -5.95
5 -1.15 -4.82 -5.75
6 -0.99 -4.11 -4.85
7 0.42 -4.18 -4.54
8 -0.32 -3.43 -4.40
9 -0.72 -3.37 -4.39", header=T)
library(pheatmap)
my_colors <- c(min(d),seq(-4,4,by=0.01),max(d))
my_palette <- c("green",colorRampPalette(colors = c("green", "red"))
(n = length(my_colors)-2), "red")
pheatmap(as.matrix(d),
scale = "none",
cluster_cols=FALSE,
cluster_rows = FALSE,
treeheight_row=0,
show_rownames=FALSE,
main = "test.txt",
color = my_palette,
breaks = my_colors)
Created on 2019-05-29 by the reprex package (v0.3.0)
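Two details of the break vector above are worth checking, sketched here in base R without pheatmap. As far as I recall, the pheatmap documentation asks for `breaks` to be one element longer than `color`, and the breaks should be increasing, which `c(min(d), seq(-4, 4, by = 0.01), max(d))` is not when the largest value lies below 4. The names `vals`, `brks`, `pal` and `idx` are hypothetical, and 4.5 is an invented value added only to exercise the upper bin.

```r
# A few values from the table, plus 4.5 as a hypothetical value above +4
vals <- c(-8.59, -3.83, 0.42, 4.5)

# Inner breaks between -4 and +4, padded by one outer bin on each side.
# min(vals, -4) and max(vals, 4) keep the vector increasing even when
# the data never leaves the [-4, 4] range.
inner <- seq(-4, 4, by = 0.01)
brks  <- c(min(vals, -4) - 1, inner, max(vals, 4) + 1)

# One colour per interval: solid green below -4, a green-to-red ramp
# across the inner intervals, solid red above +4
pal <- c("green",
         colorRampPalette(c("green", "red"))(length(inner) - 1),
         "red")

# Map each value to its interval, then to the colour of that interval
idx <- cut(vals, breaks = brks, labels = FALSE, include.lowest = TRUE)
print(data.frame(value = vals, colour = pal[idx]))
```

The same `brks` and `pal` could then be passed as `breaks` and `color` to pheatmap, under the assumption that it bins values in the same way.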
I have the following R code which visualizes a multiline graph where each line corresponds to a category of data. In the code the categories are given by the variable nk:
My dataset looks like this :
k precision recall
0.25 0.02 1.011
0.25 0.04 1.011
0.5 0.15 0.941
0.5 0.17 0.931
0.5 0.18 0.921
0.5 0.19 0.911
1.0 0.36 0.831
1.0 0.39 0.811
1.0 0.41 0.801
The problem is that it only visualizes the lines for k = 1.0 and not the lines for k = 0.5 and 0.25.
My question is: how can I use an nk variable which is not
an integer in order to visualize lines for k = 0.5 or 0.25?
dtf$k <- as.numeric(dtf$k)
nk <- max(dtf$k)
xrange <- range(dtf$precision)
yrange <- range(dtf$recall)
plot(xrange, yrange,
type="n",
xlab="Precision",
ylab="Recall"
)
colors <- rainbow(nk)
linetype <- c(1:nk)
plotchar <- seq(18, 18+nk, 1)
for (i in 1:nk) {
Ki <- subset(dtf, k==i)
lines(Ki$precision, Ki$recall,
type="b",
lwd=2,
lty=linetype[i],
col=colors[i],
pch=plotchar[i]
)
}
title("Methods varying K", "Precision Recall")
legend(xrange[1], yrange[2],
1:nk,
cex=1.0,
col=colors,
inset=c(-0.2,0),
pch=plotchar,
lty=linetype,
title="k"
)
Data
dtf <- read.table(header = TRUE, text = 'k precision recall
0.25 0.02 1.011
0.25 0.04 1.011
0.5 0.15 0.941
0.5 0.17 0.931
0.5 0.18 0.921
0.5 0.19 0.911
1.0 0.36 0.831
1.0 0.39 0.811
1.0 0.41 0.801')
dtf$k <- factor(dtf$k)
ggplot2 solution
library(ggplot2)
ggplot(dtf, aes(x = precision, y = recall, col = k)) +
geom_line()
base solution
plot(recall ~ precision, data = dtf, type = 'n')
cols = c('red', 'blue', 'green')
levs <- levels(dtf$k)
for(i in seq_along(levs)){
take <- dtf[dtf$k == levs[i], ]
lines(take$precision, take$recall, col = cols[i])
}
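To see why the original loop draws only one line, and as a sketch of a base R variant that keeps the question's point markers and legend: `1:nk` with `nk = max(k) = 1` yields the single index 1, so only `k == 1` is ever subset. Looping over the distinct values of `k` avoids that. The names `ks` and `groups` below are hypothetical helpers, and `k` is kept numeric here rather than converted to a factor.

```r
dtf <- read.table(header = TRUE, text = "k precision recall
0.25 0.02 1.011
0.25 0.04 1.011
0.5 0.15 0.941
0.5 0.17 0.931
0.5 0.18 0.921
0.5 0.19 0.911
1.0 0.36 0.831
1.0 0.39 0.811
1.0 0.41 0.801")

nk <- max(dtf$k)
print(1:nk)                     # a single index, so one line only

ks     <- sort(unique(dtf$k))   # the distinct k values, ascending
groups <- split(dtf, dtf$k)     # one data frame per k, in the same order

plot(recall ~ precision, data = dtf, type = "n",
     xlab = "Precision", ylab = "Recall")
cols <- rainbow(length(ks))
for (i in seq_along(ks)) {
  g <- groups[[i]]
  lines(g$precision, g$recall, type = "b", lwd = 2,
        lty = i, col = cols[i], pch = 17 + i)
}
legend("topright", legend = ks, col = cols,
       lty = seq_along(ks), pch = 17 + seq_along(ks), title = "k")
```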
By copying the example below I could create a map that shows the density of points, but I would like to see the density distribution of the quantitative variable "dist" in table W. What should I do to get that?
It should work as in this example:
Density2d Plot using another variable for the fill (similar to geom_tile)?
but with stat_density2d instead of stat_summary2d.
W
lat lon dist
1 -3.844117 -32.44028 0.23
2 -3.841167 -32.39318 0.86
3 -3.808283 -32.38135 0.13
4 -3.815583 -32.39295 0.15
5 -3.844267 -32.44015 0.16
6 -3.845600 -32.44220 0.20
7 -3.866700 -32.45778 0.67
8 -3.833467 -32.39752 0.22
9 -3.871400 -32.46202 0.18
10 -3.833467 -32.39752 0.22
11 -3.833467 -32.39752 0.60
12 -3.833467 -32.39752 0.14
13 -3.833467 -32.39752 0.22
14 -3.833467 -32.39752 0.14
15 -3.833467 -32.39752 0.16
16 -3.872283 -32.42713 0.06
17 -3.849217 -32.39095 0.10
18 -3.833467 -32.39752 0.57
library(ggmap)
center <- c(-3.858331, -32.423985)
fernando.map <- get_map(location = c(center[2], center[1]), zoom = 13, color = "bw")
ggmap(fernando.map, extent = "normal", maprange=FALSE) %+% W + aes(x = lon, y = lat) +
#geom_density2d() +
stat_density2d(aes(fill = ..level.., alpha = ..level.., colour=dist),
size = 0.01, bins = 16, geom = 'polygon')