Inner labelling for a heatmap in R ggplot

I am trying to add a number label to each cell of a heatmap. Because the plot also needs marginal bar charts, I have tried two packages: iheatmapr and ComplexHeatmap.
(1st try) iheatmapr makes it easy to add bars, as below, but I couldn't see how to add labels inside individual cells of the heatmap.
library(tidyverse)
library(iheatmapr)
library(RColorBrewer)
in_out <- data.frame(
'Economic' = c(2,1,1,3,4),
'Education' = c(0,3,0,1,1),
'Health' = c(1,0,1,2,0),
'Social' = c(2,5,0,3,1) )
rownames(in_out) <- c('Habitat', 'Resource', 'Combined', 'Protected', 'Livelihood')
GreenLong <- colorRampPalette(brewer.pal(9, 'Greens'))(12)
lowGreens <- GreenLong[1:5]  # the five lightest greens
in_out_matrix <- as.matrix(in_out)
total <- sum(in_out_matrix)  # grand total, used to turn sums into shares
main_heatmap(in_out_matrix, colors = lowGreens)
in_out_plot <- iheatmap(in_out_matrix, colors = lowGreens) %>%
  add_col_labels() %>%
  add_row_labels() %>%
  add_col_barplot(y = colSums(in_out_matrix) / total) %>%
  add_row_barplot(x = rowSums(in_out_matrix) / total)
in_out_plot
I then used save_iheatmap(in_out_plot, "iheatmapr_test.png"), because I couldn't use ggsave(device = ragg::agg_png, ...) with the iheatmapr object.
The iheatmapr object's apparent incompatibility with ggsave() (maybe I am wrong) is a problem for me, because I normally export images with the ragg package's AGG device to preserve font sizes. I suspect that some other heatmap packages also create custom objects that may be incompatible with patchwork and ggsave. For reference, this is the ggsave() call I would normally use:
ggsave("png/iheatmapr_test.png", plot = in_out_plot,
       device = ragg::agg_png, dpi = 72,
       units = "in", width = 3.453, height = 2.5,
       scaling = 0.45)
(2nd try) ComplexHeatmap makes it easy to put number labels in individual cells of a heatmap, and it also offers marginal bars among its "annotations". I tried it, but its colour palette system (which uses integers to refer to a set of colours) doesn't suit my RGB vector colour gradient, and overall it is a sophisticated package clearly designed for graphics more advanced than what I am making.
I am aiming for the style shown in the screenshot example below, which was made in Excel.
Can anyone suggest a more suitable R package for a simple heatmap like this, with marginal bars and number labels inside the cells?

Instead of relying on packages that offer out-of-the-box solutions, one option is to build the plot from scratch with ggplot2 and patchwork. This gives you much more control over styling, adding labels and so on.
Note: the issue with iheatmapr is that it returns its own plotly-based object, not a ggplot. That's why you can't use ggsave().
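A ggplot2/patchwork composition, by contrast, can be passed straight to ggsave(), including with the ragg device and the settings from the question; ggsave() forwards extra arguments such as scaling to the device function. A minimal sketch, assuming ggplot2, patchwork and ragg are installed. The two mtcars panels are only stand-ins for the heatmap and its marginal bars, and the output path is a placeholder in a temporary directory:

```r
library(ggplot2)
library(patchwork)

# Stand-in panels; in practice use the combined heatmap + marginal bars
p1 <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
p2 <- ggplot(mtcars, aes(factor(cyl))) + geom_bar()
combined <- p1 + p2 + plot_layout(ncol = 2, widths = c(2, 1))

out_file <- file.path(tempdir(), "heatmap_test.png")  # placeholder path

# Same device settings as in the question; `scaling` is passed on to agg_png()
ggsave(out_file, plot = combined,
       device = ragg::agg_png, dpi = 72,
       units = "in", width = 3.453, height = 2.5,
       scaling = 0.45)
```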
library(tidyverse)
library(patchwork)
in_out <- data.frame(
'Economic' = c(1,1,1,5,4),
'Education' = c(0,0,0,1,1),
'Health' = c(1,0,1,0,0),
'Social' = c(1,1,0,3,1) )
rownames(in_out) <- c('Habitat', 'Resource', 'Combined', 'Protected', 'Livelihood')
in_out_long <- in_out %>%
  mutate(y = rownames(.)) %>%
  pivot_longer(-y, names_to = "x")
# Summarise data for marginal plots
yin <- in_out_long %>%
  group_by(y) %>%
  summarise(value = sum(value)) %>%
  mutate(value = value / sum(value))
xin <- in_out_long %>%
  group_by(x) %>%
  summarise(value = sum(value)) %>%
  mutate(value = value / sum(value))
# Heatmap
ph <- ggplot(in_out_long, aes(x, y, fill = value)) +
  geom_tile() +
  geom_text(aes(label = value), size = 8 / .pt) +
  scale_fill_gradient(low = "#F7FCF5", high = "#00441B") +
  theme(legend.position = "bottom") +
  labs(x = NULL, y = NULL, fill = NULL)
# Marginal plots
py <- ggplot(yin, aes(value, y)) +
  geom_col(width = .75) +
  geom_text(aes(label = scales::percent(value)), hjust = -.1, size = 8 / .pt) +
  scale_x_continuous(expand = expansion(mult = c(.0, .25))) +
  theme_void()
px <- ggplot(xin, aes(x, value)) +
  geom_col(width = .75) +
  geom_text(aes(label = scales::percent(value)), vjust = -.5, size = 8 / .pt) +
  scale_y_continuous(expand = expansion(mult = c(.0, .25))) +
  theme_void()
# Glue plots together
px + plot_spacer() + ph + py +
  plot_layout(ncol = 2, widths = c(2, 1), heights = c(1, 2))
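The yin and xin summaries turn the row and column totals into shares of the grand total, which is what the marginal bars display. As a cross-check, here is a minimal base-R sketch (it re-creates the answer's in_out data so it runs on its own) that computes the same proportions directly from the matrix with rowSums() and colSums(); the values should match yin$value and xin$value when matched by name:

```r
# Same data as in the answer
in_out <- data.frame(
  Economic  = c(1, 1, 1, 5, 4),
  Education = c(0, 0, 0, 1, 1),
  Health    = c(1, 0, 1, 0, 0),
  Social    = c(1, 1, 0, 3, 1)
)
rownames(in_out) <- c("Habitat", "Resource", "Combined", "Protected", "Livelihood")

m <- as.matrix(in_out)
total <- sum(m)                  # grand total of all cells

row_share <- rowSums(m) / total  # one share per row (right-hand bars)
col_share <- colSums(m) / total  # one share per column (top bars)

round(row_share, 3)
round(col_share, 3)
```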

Related

What is a good output format to combine multiple similar plots in R?

Let's say I have 100 variable columns and 1 label column. The label is categorical, for example 1, 2 and 3. For each variable I would like to generate a plot for each category (e.g. a boxplot). Is there a good format to show all the plots? With facet_grid, it seems we can only put about 16 plots together; otherwise the plots become too small.
Example code:
label = sample.int(3, 50, replace = TRUE)
var = as.matrix(matrix(rnorm(5000),50,100))
data = as.data.frame(cbind(var,label))
Ultimately, if you want a box for each of 3 groups for each column of your data, then you would need 300 boxes in total. This seems like a bad idea from a data visualisation perspective. A plot should allow your data to tell a story, but the only story a plot like that could show is "I can make a very crowded plot". In terms of getting it to look nice, you would need a lot of room to plot this, so if it were on a large poster it might work.
To fit it all into a single page with minimal room taken up by axis annotations, you could do something like:
library(tidyverse)
pivot_longer(data, -label) %>%
  mutate(name = as.numeric(sub('V', '', name))) %>%
  mutate(row = (name - 1) %/% 20,
         label = factor(label)) %>%
  ggplot(aes(factor(name), value, fill = label)) +
  geom_boxplot() +
  facet_wrap(row~., nrow = 5, scales = 'free_x') +
  labs(x = "data frame column") +
  theme(strip.background = element_blank(),
        strip.text = element_blank())
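The row helper (name - 1) %/% 20 is what splits the 100 columns into five facet rows of 20 boxes each: integer division maps columns 1 to 20 to 0, columns 21 to 40 to 1, and so on. A small base-R sketch of that mapping (20 columns per facet is just the answer's choice):

```r
# Column numbers, as produced by sub('V', '', name) in the answer
col_num <- 1:100

# Integer division assigns 20 consecutive columns to each facet row
facet_row <- (col_num - 1) %/% 20

table(facet_row)                 # five groups (0 to 4), 20 columns each
range(col_num[facet_row == 1])   # columns 21 to 40 share facet row 1
```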
But this is still far from ideal.
An alternative, depending on the nature of your data columns, would be to plot the column number as a continuous variable. That way, you can represent the distribution in each column via its density, allowing for a heatmap-type plot which might actually convey your data's story better:
pivot_longer(data, -label) %>%
  mutate(x = as.numeric(sub('V', '', name))) %>%
  mutate(label = factor(label)) %>%
  group_by(x, label) %>%
  summarize(y = density(value, from = -6, to = 6)$x,
            z = density(value, from = -6, to = 6)$y) %>%
  ggplot(aes(x, y, fill = label, alpha = z)) +
  geom_raster() +
  coord_cartesian(expand = FALSE) +
  labs(x = 'data frame column', y = 'value', alpha = 'density') +
  facet_grid(label~.) +
  guides(fill = 'none') +
  theme_bw()
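In the summarize() step, density() is called twice per group, once for the grid ($x) and once for the estimates ($y); each call returns 512 points by default, so every column/label group expands to 512 rows. A minimal base-R sketch of what happens for a single group, calling density() only once (the simulated values are a stand-in for one group's data). Note that dplyr 1.1.0 and later warns when summarise() returns more than one row per group and suggests reframe() instead.

```r
set.seed(1)
values <- rnorm(17)  # stand-in for one column/label group

# One call gives both the evaluation grid ($x) and the density estimates ($y)
d <- density(values, from = -6, to = 6)

# Same column names as in the answer: y = position on the value axis, z = density
grid <- data.frame(y = d$x, z = d$y)
nrow(grid)       # density()'s default number of points
range(grid$y)    # fixed by from/to, so every group shares the same grid
```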

How do I create a non-stacked barplot with data labels using ggplot2 in R?

I am processing this dataset (bottom of the page) in R for a project.
First I load in the data:
count_data <- read.table(file = "../data/GSE156388_read_counts.tsv", header = T, sep = "",
row.names = 1)
I then melt the data using reshape2:
library(reshape2)
melted_count_data <- melt(count_data)
Then I create a factor for colouring graphs by group:
color_groups <- factor(melted_count_data$variable, labels = rep(c("siTFIP11", "siGl3"), each = 3))
Now we get to the barplot I'm trying to make:
ggplot(melted_count_data, aes(x = variable, y = value / 1e6, fill = color_groups)) +
geom_bar(stat = "identity") + labs(title = "Read counts", y = "Sequencing depth (millions of reads)")
The problem is that this creates a barplot with a bunch of stripes, leading me to believe it is trying to stack a ton of bars on top of each other instead of just creating one solid block.
I also wanted to add data labels to the plot:
+ geom_text(label = value / 1e6)
but this seemed to just put a bunch of values on top of each other.
For the stacked bars problem I tried to use y = sum(values) but this just made all the bars the same height. I also tried using y = colSums(values) but this obviously didn't work because it needs "an array of at least two dimensions".
I tried figuring it out using the unmelted data but to no avail.
I just kind of gave up on the labels since I wasn't even able to fix the bars problem.
EDIT:
I found a thread suggesting this:
ggplot(melted_count_data, aes(x = variable, y = value / 1e6, color = color_groups)) +
geom_bar(stat = "identity") + labs(title = "Read counts", y = "Sequencing depth (millions of reads)")
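(i.e. changing fill to color). This fixes the white lines but produces some (fewer) black lines. Looking at this new chart leads me to believe it might actually be pasting a bunch of charts on top of each other?
You could sum the counts per sample before plotting, so that each bar is a single value rather than a stack of one segment per row: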
library(tidyverse)
url <- paste0("https://www.ncbi.nlm.nih.gov/geo/download/",
              "?acc=GSE156388&format=file&file=GSE156388%5",
              "Fread%5Fcounts%2Etsv%2Egz")
tmpfile <- tempfile()
download.file(url, tmpfile)
count_data <- readr::read_tsv(gzfile(tmpfile),
                              show_col_types = FALSE)
count_data %>%
  pivot_longer(-1) %>%  # the first column holds the row IDs (row.names = 1 in the question)
  # keep the samples in column order so the group labels line up with the columns
  mutate(color_groups = factor(name, levels = unique(name),
                               labels = rep(c("siTFIP11", "siGl3"), each = 3))) %>%
  group_by(name) %>%
  # one total per sample, so each bar is a single block instead of a stack
  summarise(value = sum(value) / 1e6, color_groups = first(color_groups)) %>%
  ggplot(aes(name, value, fill = color_groups)) +
  geom_col() +
  geom_text(aes(label = round(value, 2)), nudge_y = 0.5) +
  labs(title = "Read counts", x = "", fill = "Type",
       y = "Sequencing depth (millions of reads)") +
  scale_fill_manual(values = c("gold", "deepskyblue3")) +
  theme_minimal()
Created on 2022-03-21 by the reprex package (v2.0.1)
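The stripes in the original plot come from geom_bar(stat = "identity") stacking one segment per gene for every sample; summing per sample first leaves one value per bar. A minimal base-R sketch on a made-up count table (the gene and sample names are hypothetical) showing that summing the long data per sample gives the same totals as colSums() on the wide table:

```r
# Hypothetical wide count table: genes in rows, samples in columns
counts <- data.frame(
  gene = c("g1", "g2", "g3"),
  s1   = c(10, 20, 30),
  s2   = c(5, 0, 15)
)

# Long format: one row per gene x sample, as after melt()/pivot_longer()
long <- data.frame(
  gene   = rep(counts$gene, times = 2),
  sample = rep(c("s1", "s2"), each = nrow(counts)),
  value  = c(counts$s1, counts$s2)
)

# Plotting `long` directly would stack 3 segments in each sample's bar;
# aggregating first gives one total per sample
totals <- aggregate(value ~ sample, data = long, FUN = sum)
totals

colSums(counts[, c("s1", "s2")])  # the same totals from the wide table
```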

Normal curves on multiple histograms on the same plot

My example dataframe:
sample1 <- seq(100,157, length.out = 50)
sample2 <- seq(113, 167, length.out = 50)
sample3 <- seq(95,160, length.out = 50)
sample4 <-seq(88, 110, length.out = 50)
df <- as.data.frame(cbind(sample1, sample2, sample3, sample4))
I have managed to create histograms for these four variables, which share the same y-axis. Now I need to overlay a normal curve. Based on previous posts, I've managed to add a density curve, but this is not what I want. It comes close, but I'd like a smooth line...
This is my current code for plotting:
library(data.table)
library(plyr)
library(ggplot2)
df <- as.data.table(df)
# all four columns are measurements; the example data has no "sample" id column
new.df <- melt(df, measure.vars = names(df))
names(new.df) <- c("type", "value")
cdat <- ddply(new.df, "type", summarise, value.mean = mean(value))
ggplot(data = new.df, aes(x = value)) +
  geom_histogram(aes(x = value), bins = 15, colour = "black", fill = "gray") +
  facet_wrap(~ type) +
  geom_density(aes(x = value), alpha = .2, fill = "#FF6666") +
  geom_vline(data = cdat, aes(xintercept = value.mean),
             linetype = "dashed", size = 1, colour = "black") +
  theme_classic() +
  theme(text = element_text(size = 15), line = element_line(size = 0.5),
        aspect.ratio = 0.75)
And I found the following code, which I hoped would do the trick, but this gives me nothing:
stat_function(fun = dnorm, args = list(mean = mean(df$value), sd = sd(df$value)))
Unfortunately, stat_function doesn't play nicely with facets: it overlays the same function on each facet without taking account of the faceting variable.
One of the most common reasons I see for people posting ggplot questions on Stack Overflow is that they get lost while trying to coerce ggplot to do too much of their data manipulation. Functions like geom_smooth and geom_function are useful helpers for common tasks, but if you want to do something that is complex or uncommon, it is best to produce the data you want to plot, then plot it.
In fact, the main author of ggplot2 recommends this approach for a very similar problem to yours in this thread, saying:
I think you are better off generating the data outside of ggplot2 and then plotting it. See https://speakerdeck.com/jennybc/row-oriented-workflows-in-r-with-the-tidyverse to get started.
Hadley Wickham, 26 April 2018
So here's one way of doing that using the tidyverse. You create a data frame of the dnorm values for each sample and plot these using plain old geom_line.
Note that your histograms are counts, so you either need to change them to density, or multiply the dnorm output by the number of observations * the binwidth, otherwise you will just get an apparently "flat" line on the x axis, since the dnorm values will all be so small in relation to the counts:
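The scaling factor follows from how a count histogram relates to a density. For n observations and bins of width h, the expected count in the bin starting at x is approximately:

```latex
\mathbb{E}\bigl[\text{count in } [x, x + h)\bigr] = n \int_{x}^{x+h} f(t)\,dt \approx n\,h\,f(x)
```

so a normal density has to be multiplied by n times h to sit on the count scale. In the code below, h is the binwidth of 5 and n is nrow(x) (50 observations per sample), which gives 5 * nrow(x) * dnorm(...).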
library(plyr)
library(dplyr)
library(tidyr)
library(ggplot2)
# Normal curve for each sample, scaled from density to counts
# (binwidth 5 * number of observations)
dfn <- df %>%
  pivot_longer(everything()) %>%
  ddply("name", function(x) {
    xvar <- seq(min(x$value), max(x$value), length.out = 100)
    data.frame(value = xvar,
               y = 5 * nrow(x) * dnorm(xvar, mean(x$value), sd(x$value)))
  })
df %>%
  pivot_longer(everything()) %>%
  group_by(name) %>%
  mutate(mean = mean(value), sd = sd(value)) %>%
  ggplot(aes(value)) +
  geom_histogram(aes(x = value), binwidth = 5,
                 colour = "black", fill = "gray") +
  facet_wrap(~ name) +
  geom_vline(aes(xintercept = mean),
             linetype = "dashed", size = 1, colour = "black") +
  geom_line(data = dfn, aes(y = y)) +
  theme_classic() +
  theme(text = element_text(size = 15), line = element_line(size = 0.5),
        aspect.ratio = 0.75)
Created on 2020-12-07 by the reprex package (v0.3.0)
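Loading plyr and dplyr together can lead to masking conflicts (both packages export summarise(), mutate() and others), so it may be worth generating the curve data without plyr. A minimal base-R sketch that swaps ddply() for split() plus lapply(), using the question's four samples and the same binwidth of 5; the result has the same name, value and y columns, so it can be passed to geom_line(data = dfn, aes(y = y)) as above:

```r
# Question's example data
df <- data.frame(
  sample1 = seq(100, 157, length.out = 50),
  sample2 = seq(113, 167, length.out = 50),
  sample3 = seq(95, 160, length.out = 50),
  sample4 = seq(88, 110, length.out = 50)
)
binwidth <- 5  # must match geom_histogram(binwidth = 5)

# Long format: one row per observation, tagged with its sample name
long <- data.frame(name  = rep(names(df), each = nrow(df)),
                   value = unlist(df, use.names = FALSE))

# For each sample: a grid over its range and a normal curve scaled to counts
dfn <- do.call(rbind, lapply(split(long, long$name), function(x) {
  xvar <- seq(min(x$value), max(x$value), length.out = 100)
  data.frame(name  = x$name[1],
             value = xvar,
             y     = binwidth * nrow(x) * dnorm(xvar, mean(x$value), sd(x$value)))
}))

head(dfn)
```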

Adding a single label per group in ggplot with stat_summary and text geoms

I would like to add counts to a ggplot that uses stat_summary().
I am having an issue with the requirement that the text vector be the same length as the data.
With the examples below, you can see that what is being plotted is the same label multiple times.
The workaround of setting the location on the y axis has the effect that multiple labels are stacked on top of each other. The visual effect is a bit strange (particularly when you have thousands of observations) and not professional enough for my purposes. You will have to trust me on this one: the attached picture doesn't fully convey the weirdness of it.
I was wondering if someone else has worked out another way. It is for a plot in shiny that has dynamic input, so text cannot be overlaid in a hardcoded fashion.
I'm pretty sure ggplot wasn't designed for the kind of behaviour with stat_summary that I am looking for, and I may have to abandon stat_summary and create a new summary dataframe, but thought I would first check if someone else has some wizardry to offer up.
This is the plot without setting the y location:
library(dplyr)
library(ggplot2)
df_x <- data.frame("Group" = c(rep("A",1000), rep("B",2) ),
"Value" = rnorm(1002))
df_x <- df_x %>%
group_by(Group) %>%
mutate(w_count = n())
ggplot(df_x, aes(x = Group, y = Value)) +
  stat_summary(fun.data = "mean_cl_boot", size = 1.2) +
  geom_text(aes(label = w_count)) +
  coord_flip() +
  theme_classic()
And this is with my hack:
ggplot(df_x, aes(x = Group, y = Value)) +
  stat_summary(fun.data = "mean_cl_boot", size = 1.2) +
  geom_text(aes(y = 1, label = w_count)) +
  coord_flip() +
  theme_classic()
Create a df_text that has the grouped info for your labels. Then use annotate:
library(dplyr)
library(ggplot2)
set.seed(123)
df_x <- data.frame("Group" = c(rep("A",1000), rep("B",2) ),
"Value" = rnorm(1002))
df_text <- df_x %>%
group_by(Group) %>%
summarise(avg = mean(Value),
n = n()) %>%
ungroup()
yoff <- 0.0
xoff <- -0.1
ggplot(df_x, aes(x = Group, y = Value)) +
  stat_summary(fun.data = "mean_cl_boot", size = 1.2) +
  annotate("text",
           x = 1:2 + xoff,
           y = df_text$avg + yoff,
           label = df_text$n) +
  coord_flip() +
  theme_classic()
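annotate() needs the positions spelled out (x = 1:2), which assumes the groups appear in a fixed order. An alternative sketch is to give df_text to geom_text() as its own data and map Group to x, so each label follows its group even if the levels are reordered or filtered. Here mean_se (built into ggplot2) stands in for mean_cl_boot so the example does not need Hmisc, and nudge_x = -0.1 plays the role of xoff:

```r
library(dplyr)
library(ggplot2)

set.seed(123)
df_x <- data.frame(Group = c(rep("A", 1000), rep("B", 2)),
                   Value = rnorm(1002))

# One row per group: position (mean) and label (count)
df_text <- df_x %>%
  group_by(Group) %>%
  summarise(avg = mean(Value), n = n())

p <- ggplot(df_x, aes(x = Group, y = Value)) +
  stat_summary(fun.data = mean_se, size = 1.2) +
  # labels come from df_text and are matched to groups through the Group column
  geom_text(data = df_text, aes(x = Group, y = avg, label = n),
            nudge_x = -0.1) +
  coord_flip() +
  theme_classic()
p

# The text layer holds exactly one label per group
layer_data(p, 2)[, c("x", "y", "label")]
```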
I found another way that is a little more robust when the plot is dynamic in its ordering and filtering, and that also works well with faceting. It is more robust because it uses stat_summary() for the text, so the labels are computed per group by ggplot itself.
library(dplyr)
library(ggplot2)
df_x <- data.frame("Group" = c(rep("A",1000), rep("B",2) ),
"Value" = rnorm(1002))
counts_df <- function(y) {
  # one row per group: fixed y position and a label with the group size
  return(data.frame(y = 1, label = paste0('n=', length(y))))
}
p <- ggplot(df_x, aes(x = Group, y = Value)) +
  stat_summary(fun.data = "mean_cl_boot", size = 1.2) +
  coord_flip() +
  theme_classic()
p + stat_summary(geom = "text", fun.data = counts_df)
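stat_summary() calls the fun.data function once per group with that group's y values and expects a data frame back; its columns (here y and label) become the aesthetics of the chosen geom. Because counts_df() is an ordinary function, it can be checked on its own, and layer_data() shows that the text layer ends up with one row per group. A minimal sketch; the text layer is used on its own here so the check does not need Hmisc for mean_cl_boot:

```r
library(ggplot2)

df_x <- data.frame(Group = c(rep("A", 1000), rep("B", 2)),
                   Value = rnorm(1002))

# Same helper as above: fixed y position, label with the group size
counts_df <- function(y) {
  data.frame(y = 1, label = paste0("n=", length(y)))
}

counts_df(df_x$Value[df_x$Group == "B"])  # one row with label "n=2"

p <- ggplot(df_x, aes(x = Group, y = Value)) +
  stat_summary(geom = "text", fun.data = counts_df) +
  coord_flip()

layer_data(p, 1)[, c("x", "y", "label")]
```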

Plot ECDF and density in the same plot and zoom in on a specific part

I want to plot the density and the ECDF in the same plot using ggplot2. Here is my code:
library(ggplot2)
library(reshape)
set.seed(101)
var1 = rnorm(1000, 0.5)
var2 = rnorm(100000,0.5)
combine = melt(data.frame("var1" = var1,"var2"= var2))
ggplot(data = combine) +
  geom_density(aes(x = value, color = variable), alpha = 0.2) +
  scale_y_continuous(name = "Density",
                     sec.axis = sec_axis(~.*(1*(max(density(var1)$y,density(var2)$y))), name = "Ecdf")) +
  ggtitle("Density and Ecdf plot ") +
  theme_bw() +
  theme(plot.title = element_text(size = 14, family = "Tahoma", face = "bold"),
        text = element_text(size = 12, family = "Tahoma")) +
  scale_fill_brewer(palette = "Accent") +
  stat_ecdf(aes(x = value, color = variable))
This produces the plot below (apart from the black rectangle, which I added to mark the area I want to zoom into).
However, the axes are not correct: the left y axis should show the density range (0, 0.4) and the right y axis the ECDF range (0, 1). I also want both curves scaled so that the maximum of the density (i.e. 0.4) corresponds to the maximum of the ECDF (1).
After that I want to zoom in on the figure, especially the upper right part (the black rectangle, i.e. the upper 25%), as the whole plot is not needed. I need two plots: one with the full extent and one zoomed in.
How can this be done with ggplot2?
You can calculate the density and the empirical cumulative distribution before plotting. Here I'm using the tidyverse; the purrr::map functions are especially helpful.
library(tidyverse)
# uses `combine` from the question (columns: variable, value)
# density
dens <- combine %>%
  as_tibble() %>%
  split(.$variable) %>%
  map(~ density(.x$value) %>%
        with(., tibble(x = x, y = y))) %>%
  bind_rows(.id = "variable")
# ecdf, evaluated on the same x grid as the density
df <- combine %>%
  as_tibble() %>%
  split(.$variable) %>%
  map2(split(dens, dens$variable),
       ~ tibble(x = .y$x, Ecdf = ecdf(.x$value)(.y$x))) %>%
  bind_rows(.id = "variable") %>%
  pull(Ecdf) %>%
  mutate(dens, Ecdf = .)
# scaling factor
SCALE <- max(df$y)
# the plot
p <- ggplot(df, aes(x, color = variable)) +
  geom_line(aes(y = y)) +
  geom_line(aes(y = Ecdf * SCALE)) +
  scale_y_continuous(name = "Density",
                     sec.axis = sec_axis(~ . / SCALE, name = "Ecdf"))
p
# zooming
p + coord_cartesian(xlim = c(1.5, 5))
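The secondary axis works because the ECDF is drawn as Ecdf * SCALE and sec_axis() applies the inverse, . / SCALE, so the right-hand axis reads 0 to 1 while the left one stays in density units. A minimal base-R sketch of that round trip for var1 alone (the answer takes SCALE as the maximum over both variables; here only var1 is used):

```r
set.seed(101)
var1 <- rnorm(1000, 0.5)

d <- density(var1)   # density estimate on density()'s default grid
F_hat <- ecdf(var1)  # empirical CDF, returned as a function

SCALE <- max(d$y)                   # top of the density curve
ecdf_plotted <- F_hat(d$x) * SCALE  # what geom_line(aes(y = Ecdf * SCALE)) draws
ecdf_axis <- ecdf_plotted / SCALE   # what sec_axis(~ . / SCALE) shows

range(ecdf_axis)   # the secondary axis spans 0 to 1
max(ecdf_plotted)  # equals SCALE: the ECDF maximum lines up with the density maximum
```

The density grid extends a few bandwidths beyond the data on both sides, which is why the ECDF reaches exactly 0 and 1 at its ends.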
