Excluding cells from transparency in heatmap with ggplot - r

I am trying to generate a heatmap where I can show more than one level of information on each cell. For each cell I would like to show a different color depending on its value in one variable and then overlay this with a transparency (alpha) that shades the cell according to its value for another variable.
Similar questions have been addressed here (Place 1 heatmap on another with transparency in R) and here (Making a heatmap in R varying both color and transparency). In both cases the suggestion is to use ggplot and overlay two geom_tiles, one with the colors and one with the transparency.
I have managed to overlay two geom_tiles (see code below). However, in my case, the problem is that the shading defined by the transparency (or "alpha") geom_tile also shades some cells that should remain as white or blank according to the colors (or "fill") geom_tile. I would like these cells to remain white even after overlaying the transparency.
#Create sample dataframe
df <- data.frame("x_pos" = c("A","A","A","B","B","B","C","C","C"),
"y_pos" = c("X","Y","Z","X","Y","Z","X","Y","Z"),
"col_var"= c(1,2,NA,4,5,6,NA,8,9),
"alpha_var" = c(7,12,0,3,2,15,0,6,15))
#Convert factor columns to numeric
df$col_var<- as.numeric(df$col_var)
df$alpha_var<- as.numeric(df$alpha_var)
#Cut display variable into breaks
df$col_var_cut <- cut(df$col_var,
breaks = c(0,3,6,10),
labels = c("cat1","cat2", "cat3"))
#Plot
library(ggplot2)
ggplot(df, aes (x = x_pos, y = y_pos, fill = col_var_cut, label = col_var)) +
geom_tile () +
geom_text() +
scale_fill_manual(values=(brewer.pal(3, "RdYlBu")),na.value="white") +
geom_tile(aes(alpha = alpha_var), fill ="gray29")+
scale_alpha_continuous("alpha_var", range=c(0,0.7), trans = 'reverse')+
theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
I would like cells "AZ" and "CX" in the heatmap resulting from the code above to be colored white instead of grey such that the alpha transparency doesn't apply to them. In my data, these cells have NA in the color variable (col_var) and can have a value of NA or 0 (as in the example code) in the transparency/alpha variable (alpha_var).
If this is not possible, then I would like to know whether there are other options to display both variables in a heatmap and keep the NA cells in the col_var white? I am happy to use other packages or alternative heatmap layouts such as those where the size of each cell or the thickness of its border vary according to the values the alpha_var. However, I am not sure how I could achieve this either.
Thanks in advance and my apologies for the cumbersome bits in the example code (I am still learning R and this is my first time asking questions here).

You were not far off. See below for a possible solution. The first plot shows an implementation that adds the transparency within the geom_tile call itself. Note that I removed the trans = 'reverse' specification from your plot.
Plot 2 just adds back the white tiles on top of the other plot - simple hack which you will often find necessary when wanting to plot certain data points differently.
Note I have added a few minor comments to your code below.
# creating your data frame with better name - df is a base R function and not recommended as example name.
# Also note that I removed the quotation marks in the data frame call - they were not necessary. I also called as.numeric directly.
mydf <- data.frame(x_pos = c("A","A","A","B","B","B","C","C","C"), y_pos = c("X","Y","Z","X","Y","Z","X","Y","Z"), col_var= as.numeric(c(1,2,NA,4,5,6,NA,8,9)), alpha_var = as.numeric(c(7,12,0,3,2,15,0,6,15)))
mydf$col_var_cut <- cut(mydf$col_var, breaks = c(0,3,6,10), labels = c("cat1","cat2", "cat3"))
#Plot
library(tidyverse)
library(RColorBrewer) # you forgot to add this to your reprex
ggplot(mydf, aes (x = x_pos, y = y_pos, fill = col_var_cut, label = col_var)) +
geom_tile(aes(alpha = alpha_var)) +
geom_text() +
scale_fill_manual(values=(brewer.pal(3, "RdYlBu")), na.value="white")
#> Warning: Removed 2 rows containing missing values (geom_text).
# a bit hacky for quick and dirty solution. Note I am using dplyr::filter from the tidyverse
ggplot(mapping = aes(x = x_pos, y = y_pos, fill = col_var_cut, label = col_var)) +
geom_tile(data = filter(mydf, !is.na(col_var))) +
geom_tile(data = filter(mydf, !is.na(col_var)), aes(alpha = alpha_var), fill ="gray29")+
geom_tile(data = filter(mydf, is.na(col_var)), fill = 'white') +
geom_text(data = mydf) +
scale_fill_manual(values = (brewer.pal(3, "RdYlBu"))) +
scale_alpha_continuous("alpha_var", range=c(0,0.7), trans = 'reverse')
#> Warning: Removed 2 rows containing missing values (geom_text).
Created on 2019-07-04 by the reprex package (v0.2.1)
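The question also asked whether cell size could carry the second variable instead of transparency. Here is a minimal sketch of that idea, assuming geom_tile's width and height aesthetics are mapped to a rescaled copy of alpha_var. The column name tile_size, the object plot_cells and the 0.3 to 1 size range are arbitrary choices for illustration and not part of the answer above.

```r
library(ggplot2)
library(RColorBrewer)

# Same sample data as in the question
mydf <- data.frame(
  x_pos = rep(c("A", "B", "C"), each = 3),
  y_pos = rep(c("X", "Y", "Z"), times = 3),
  col_var = c(1, 2, NA, 4, 5, 6, NA, 8, 9),
  alpha_var = c(7, 12, 0, 3, 2, 15, 0, 6, 15)
)
mydf$col_var_cut <- cut(mydf$col_var, breaks = c(0, 3, 6, 10),
                        labels = c("cat1", "cat2", "cat3"))

# tile_size is a hypothetical helper column: alpha_var rescaled to 0.3-1,
# so the smallest value still gives a visible tile
mydf$tile_size <- scales::rescale(mydf$alpha_var, to = c(0.3, 1))

# Drop the cells without a colour value so they stay blank
plot_cells <- mydf[!is.na(mydf$col_var), ]

p_size <- ggplot(plot_cells, aes(x = x_pos, y = y_pos, fill = col_var_cut)) +
  geom_tile(aes(width = tile_size, height = tile_size)) +
  geom_text(aes(label = col_var)) +
  scale_fill_manual(values = brewer.pal(3, "RdYlBu")) +
  theme_bw()
p_size
```

Because the NA cells are removed before plotting, no white tiles need to be painted on top. The trade-off is that tile size has no legend of its own, so the size scale would need to be explained in a caption.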

Related

making a WCS Munsell color chart in R, problems with order in scale_fill_manual, ggplot2

I want to make a Munsell color chart for the chips used by the World Color Survey. It should look like this:
The information needed can be found on the WCS page, here. I take the following steps:
library(munsell) # https://cran.r-project.org/web/packages/munsell/munsell.pdf
library(ggplot2)
# take the "cnum-vhcm-lab-new.txt" file from: https://www1.icsi.berkeley.edu/wcs/data.html#wmt
# change by replacing .50 with .5 removing .00 after hue values
WCS <- read.csv("cnum-vhcm-lab-new.txt", sep = "\t", header = T)
WCS$hex <- mnsl2hex(hvc2mnsl(hue = WCS$MunH, value = ceiling(WCS$MunV), chroma = WCS$C), fix = T)
# this works, but the order of tiles is messed up
ggplot(aes(x=H, y=V, fill=hex), data = WCS) +
geom_tile(aes(x=H, y=V), show.legend = F) +
scale_fill_manual(values = WCS$hex) +
scale_x_continuous(breaks = scales::pretty_breaks(n = 40))
The result:
Clearly, the chips are not ordered along hue and value but with reference to some other dimension, perhaps even order in the original data frame. I also have to revert the order on the y-axis. I guess the solution will have to do with factor() and reorder(), but how to do it?
OP. TL;DR - you should be using scale_fill_identity() rather than scale_fill_manual().
Now for the long description: At its core, ggplot2 functions on mapping the columns of your data to specific features on the plot, which ggplot2 refers to as "aesthetics" using the aes() function. Positioning is defined by mapping certain columns of your data to x and y aesthetics, and the different colors in your tiles are mapped to fill using aes() as well.
The mapping for fill does not specify color, but only specifies which things should be different colors. When mapped this way, it means that rows in your data (observations) that have the same value in column mapped to the fill aesthetic will be the same color, and observations that have different values in the column mapped to the fill aesthetic will be different colors. Importantly, this does not specify the color, but only specifies if colors should be different!
The default behavior is that ggplot2 will determine the colors to use by applying a default scale. For continuous (numeric) values, a continuous scale is applied, and for discrete values (like a vector of characters), a discrete scale is applied.
To see the default behavior, just remove scale_fill_manual(...) from your plot code. I've recopied your code below and added the needed revisions to programmatically remove and adjust the ".50" and ".00" changes to WCS$MunH. The code below should work entirely if you have downloaded the original .txt file from the link you provided.
library(munsell)
library(ggplot2)
WCS <- read.csv("cnum-vhcm-lab-new.txt", sep = "\t", header = T)
WCS$MunH <- gsub('.50', '.5', WCS$MunH, fixed = TRUE) # remove trailing "0" after ".50"; fixed = TRUE matches the "." literally
WCS$MunH <- gsub('.00', '', WCS$MunH, fixed = TRUE) # remove ".00" altogether
WCS$V <- factor(WCS$V) # needed to flip the axis
WCS$hex <- mnsl2hex(hvc2mnsl(hue = WCS$MunH, value = ceiling(WCS$MunV), chroma = WCS$C), fix = T)
ggplot(aes(x=H, y=V, fill=hex), data = WCS) +
geom_tile(aes(x=H, y=V), show.legend = F, width=0.8, height=0.8) +
scale_y_discrete(limits = rev(levels(WCS$V))) + # flipping the axis
scale_x_continuous(breaks = scales::pretty_breaks(n = 40)) +
coord_fixed() + # force all tiles to be "square"
theme(
panel.grid = element_blank()
)
You have show.legend = F in there, but there should be 324 different values mapped to the WCS$hex column (i.e. length(unique(WCS$hex))).
When using scale_fill_manual(values=...), you are supplying the names of the colors to be used, but they are not mapped to the same positions in your column WCS$hex. They are applied according to the way in which ggplot2 decides to organize the levels of WCS$hex as if it were a factor.
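The mismatch can be reproduced without any plot. The sketch below uses three made-up hex codes and assumes, as described above, that the discrete scale arranges character values in sorted order and that unnamed values are paired with those levels by position; the names hex, lvls and mismatch are only for illustration.

```r
# Made-up colours, in the order they appear in the data frame
hex <- c("#FF0000", "#0000FF", "#00FF00")

# Order in which a discrete scale arranges these character values
lvls <- sort(unique(hex))

# scale_fill_manual(values = hex) pairs the i-th value with the i-th level,
# so each name (the level) receives the colour stored as its value
mismatch <- setNames(hex, lvls)
mismatch
```

Here the tiles whose data value is "#0000FF" would be drawn in "#FF0000", which is the kind of shuffling seen in the chart.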
In order to tell ggplot2 to basically ignore the mapping and just color according to the actual color name you see in the column mapped to fill, you use scale_fill_identity(). This will necessarily remove the ability to show any legend, since it kind of removes the mapping and recoloring that is the default behavior of aes(fill=...). Regardless, this should solve your issue:
ggplot(aes(x=H, y=V, fill=hex), data = WCS) +
geom_tile(aes(x=H, y=V), width=0.8, height=0.8) +
scale_fill_identity() + # assign color based on text
scale_y_discrete(limits = rev(levels(WCS$V))) + # flipping the axis
scale_x_continuous(breaks = scales::pretty_breaks(n = 40)) +
coord_fixed() + # force all tiles to be "square"
theme(
panel.grid = element_blank()
)
The main thing is to use the right color scale (scale_fill_identity). This ensures the hex values are used as the color for the tiles.
library(munsell) # https://cran.r-project.org/web/packages/munsell/munsell.pdf
library(ggplot2)
WCS <- read.csv(url('https://www1.icsi.berkeley.edu/wcs/data/cnum-maps/cnum-vhcm-lab-new.txt'), sep = "\t", header = T)
WCS$hex <- mnsl2hex(hvc2mnsl(hue = gsub('.00', '', gsub('.50', '.5', WCS$MunH, fixed = TRUE), fixed = TRUE), value = ceiling(WCS$MunV), chroma = WCS$C), fix = T)
# with scale_fill_identity(), each tile is filled with its own hex colour
ggplot(aes(x=H, y=V, fill=hex), data = WCS) +
geom_tile(aes(x=H, y=V), show.legend = F) +
scale_fill_identity() +
scale_x_continuous(breaks = scales::pretty_breaks(n = 40))
Created on 2021-10-05 by the reprex package (v2.0.1)

Multiple Splines using ggplot2 + Different colours + Line width + Custom X-axis markings

I have two small sets of points, viz. (1,a1),...,(9,a9) and (1,b1),...,(9,b9). I'm trying to interpolate these two sets of points separately by using splines with the help of ggplot2. So, what I want is 2 different spline curves interpolating the two sets of points on the same plot (refer to the end of this post).
Since I have very little plotting experience with ggplot2, I copied a code snippet from this answer by Richard Telford. First, I stored the Y-values for the two sets of points in two numeric variables A and B, and wrote the following code:
library(ggplot2)
library(plyr)
A <- c(a1,...,a9)
B <- c(b1,...,b9)
d <- data.frame(x=1:9,y=A)
d2 <- data.frame(x=1:9,y=B)
dd <- rbind(cbind(d, case = "d"), cbind(d2, case = "d2"))
ddsmooth <- plyr::ddply(dd, .(case), function(k) as.data.frame(spline(k)))
ggplot(dd,aes(x, y, group = case)) + geom_point() + geom_line(aes(x, y, group = case), data = ddsmooth)
This produces the following output :
Now, I'm looking for an almost identical plot with the following customizations:
The two spline curves should have different colours
The line width should be user's choice (Like we do in plot function)
A legend (Specifying the colour and the corresponding attribute)
Markings on the X-axis should be 1,2,3,...,9
Hoping for a detailed solution to my problem, though any kind of help is appreciated. Thanks in advance for your time and help.
You have already shaped your data correctly for the plot. It's just a case of associating the case variable with colour and size scales.
Note the following:
I have inferred the values of A and B from your plot
Since the lines are opaque, we plot them first so that the points are still visible
I have included size and colour parameters to the aes call in geom_line
I have selected the colours by passing them as a character vector to scale_colour_manual
I have also selected the sizes of the lines by calling scale_size_manual
I have set the x axis breaks by adding a call to scale_x_continuous
The legend has been added automatically according to the scales used.
ggplot(dd, aes(x, y)) +
geom_line(aes(colour = case, size = case, linetype = case), data = ddsmooth) +
geom_point(colour = "black") +
scale_colour_manual(values = c("red4", "forestgreen"), name = "Legend") +
scale_size_manual(values = c(0.8, 1.5), name = "Legend") +
scale_linetype_manual(values = 1:2, name = "Legend") +
scale_x_continuous(breaks = 1:9)
Created on 2020-07-15 by the reprex package (v0.3.0)
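For readers without plyr installed, the smoothing step can also be sketched in base R. The values of A and B below are made up, since the question elides them, and n = 100 interpolation points per curve is an arbitrary choice.

```r
library(ggplot2)

# Made-up y-values standing in for the elided a1..a9 and b1..b9
A <- c(2, 4, 3, 6, 5, 8, 7, 9, 8)
B <- c(1, 2, 4, 3, 5, 4, 6, 5, 7)
dd <- rbind(data.frame(x = 1:9, y = A, case = "d"),
            data.frame(x = 1:9, y = B, case = "d2"))

# Interpolate each case separately at 100 points, then stack the results
ddsmooth <- do.call(rbind, lapply(split(dd, dd$case), function(k) {
  cbind(as.data.frame(spline(k$x, k$y, n = 100)), case = k$case[1])
}))

ggplot(dd, aes(x, y)) +
  geom_line(aes(colour = case), data = ddsmooth) +
  geom_point() +
  scale_x_continuous(breaks = 1:9)
```

The resulting ddsmooth has the same x, y and case columns as the plyr version, so it can be dropped into the plotting code above unchanged.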

Stacked barplot with colour gradients for each bar

I want to color a stacked barplot so that each bar has its own parent colour, with colours within each bar to be a gradient of this parent colour.
Example:
Here is a minimal example. I would like each bar to have its own colour according to color, with a gradient within each bar set by clarity.
library(ggplot2)
ggplot(diamonds, aes(color)) +
geom_bar(aes(fill = clarity), colour = "grey")
In my real problem, I have many more groups of each: requiring 18 different bars with 39 different gradient colours.
I have made a function ColourPalleteMulti, which lets you create a multi-colour palette based on subgroups within your data:
ColourPalleteMulti <- function(df, group, subgroup){
# Find how many colour categories to create and the number of colours in each
categories <- aggregate(as.formula(paste(subgroup, group, sep="~" )), df, function(x) length(unique(x)))
category.start <- (scales::hue_pal(l = 100)(nrow(categories))) # Set the top of the colour palette
category.end <- (scales::hue_pal(l = 40)(nrow(categories))) # Set the bottom
# Build the colour palette
colours <- unlist(lapply(1:nrow(categories),
function(i){
colorRampPalette(colors = c(category.start[i], category.end[i]))(categories[i,2])}))
return(colours)
}
Essentially, the function identifies how many different groups you have, then counts the number of colours within each of these groups. It then joins together all the different colour palettes.
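The core of that idea can be tried on its own. This is a small sketch with a hypothetical n_shades vector (two groups with 3 and 2 subgroups) in place of the aggregate() call; top, bottom and shades are illustrative names.

```r
# Hypothetical subgroup counts: group 1 has 3 subgroups, group 2 has 2
n_shades <- c(3, 2)

# One light and one dark version of each group's hue
top <- scales::hue_pal(l = 100)(length(n_shades))
bottom <- scales::hue_pal(l = 40)(length(n_shades))

# Ramp from light to dark within each group, then join the ramps
shades <- unlist(lapply(seq_along(n_shades), function(i) {
  colorRampPalette(c(top[i], bottom[i]))(n_shades[i])
}))
shades
```

The result is one flat vector with sum(n_shades) colours, ordered group by group, which is the shape scale_fill_manual expects when the fill variable is the pasted group-subgroup label.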
To use the palette, it is easiest to add a new column group, which pastes together the two values used to make the colour palette:
library(ggplot2)
# Create data
df <- diamonds
df$group <- paste0(df$color, "-", df$clarity)
# Build the colour pallete
colours <-ColourPalleteMulti(df, "color", "clarity")
# Plot results
ggplot(df, aes(color)) +
geom_bar(aes(fill = group), colour = "grey") +
scale_fill_manual("Subject", values=colours, guide = "none")
Edit:
If you want each bar to contain several different colours, you can just change the variable mapped to the x-axis of the barplot:
# Plot results
ggplot(df, aes(cut)) +
geom_bar(aes(fill = group), colour = "grey") +
scale_fill_manual("Subject", values=colours, guide = "none")
A Note of Caution: In all honesty, the dataset you want to plot probably has too many sub-categories within it for this to work.
Also, although this is visually very pleasing, I would suggest avoiding the use of a colour scale like this. It is more about making the plot look pretty, and the different colours are redundant as we already know which group the data is in from the X-axis.
An easier approach to achieve a colour gradient is to use alpha to change the transparency of the colour. However, this can have unintended consequences as transparency means you can see the guidelines through the plot.
library(ggplot2)
ggplot(diamonds, aes(color, alpha = clarity)) +
geom_bar(aes(fill = color), colour = "grey") +
scale_alpha_discrete(range = c(0,1))
I have recently created the package ggnested which creates such plots. It is essentially a wrapper around ggplot2 that takes main_group and sub_group in the aesthetic mapping, where colours are generated for the main_group, and a gradient is generated for the levels of sub_group that are nested within each level of the main_group.
devtools::install_github("gmteunisse/ggnested")
require(ggnested)
data(diamonds)
ggnested(diamonds, aes(main_group = color, sub_group = clarity)) +
geom_bar(aes(x = color))
Another option is to use any custom color palette and simply darken/lighten those depending on the fill category. It can be slightly tricky to get a smooth gradient in each bar, but if you keep the natural order of the data (either appearance in data frame or the factor levels) this is not a big problem.
I am using the colorspace package for this task. The shades package also has the option to darken/lighten colors, but the syntax is slightly longer. It is more suitable for modification of entire palettes without specifying specific colors.
library(tidyverse)
library(colorspace)
## get some random colors, here n colors based on the Dark2 palette using the colorspace package.
## But ANY palette is possible
my_cols <- qualitative_hcl(length(unique(diamonds$color)), "Dark2")
## for easier assignment, name the colors
names(my_cols) <- unique(diamonds$color)
## assign the color to the category, by group
df_grad <-
diamonds %>%
group_by(color) %>%
## to keep the order of your stack and a natural gradient
## use order by occurrence in data frame or by factor
## clarity is an ordered factor, so I'm using a dense rank
mutate(
clarity_rank = dense_rank(as.integer(clarity)),
new_cols = my_cols[color],
## now darken or lighten according to the rank
clarity_dark = darken(new_cols, amount = clarity_rank / 10),
clarity_light = lighten(new_cols, amount = clarity_rank / 10)
)
## use this new color for your fill with scale_identity
## you additionally need to keep your ordering variable as group, in this case
## an interaction between color and your new rank
ggplot(df_grad, aes(color, group = interaction(color, clarity_rank))) +
geom_bar(aes(fill = clarity_dark)) +
scale_fill_identity()
ggplot(df_grad, aes(color, group = interaction(color, clarity_rank))) +
geom_bar(aes(fill = clarity_light)) +
scale_fill_identity()
Created on 2022-07-03 by the reprex package (v2.0.1)

Dark to light colours based on value ggplot2

I am trying to customize the colours using ggplot2. The function I wrote is as follows:
library(tidyverse)
spaghetti_plot_multiple <- function(input, MV, item_level){
MV <- enquo(MV)
titles <- enquo(item_level)
input %>%
filter(!!(MV) == item_level) %>%
mutate(first_answer = first_answer) %>%
ggplot(.,aes( x = time, y = jitter(Answer), group = ID)) +
geom_line(aes(colour = first_answer)) +
labs(title = titles ,x = 'Time', y = 'Answer', colour = 'Answer given at time 0') +
facet_wrap(~ ID, scales = "free_x")+
theme(strip.text = element_text(size = 8)) +
scale_color_manual(values = c('red', 'blue', 'brown', 'purple', 'black'))
}
This, however, doesn't work, and I can't figure out why the values passed to scale_color_manual(..) have no effect. The plot I currently get is:
This is somewhat in line with what I am trying to achieve: a dark color for values 1-3 (i.e. based on first_answer which ranges from 1 to 5) and lighter ones for 4 and 5. The reason is simply because there are many more lines with a value of 4 or 5 and I want to be able to see the direction of lines across time.
EDIT The image is the plot I currently have. Although it somewhat resembles what I'd like to get, I'd much rather set the colors myself or use some function that chooses colors to enhance the plotting visibility (the lines in the plot) automatically.
You can specify color gradients with scale_x_gradient, scale_x_gradient2 or scale_x_gradientn (where x can be fill or color).
A caveat when passing values = c(...) to scale_x_gradientn: the values argument places each colour at a position within the range c(0,1). You therefore need to rescale the data values that you want to use as breaks to the range c(0,1).
Regarding your question about which palette is best for 5 distinct lines: I think it is best to specify the colours manually, as you have done. I often use hex codes instead, which I look up at html color codes.
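A short sketch of the rescaling step, using made-up data with the same column names as the question. The break values 1, 3, 4, 5 and the four colours are arbitrary picks for illustration (dark for low answers, light for high ones).

```r
library(ggplot2)

# Made-up data: five respondents, three time points each
toy <- data.frame(
  ID = rep(1:5, each = 3),
  time = rep(1:3, times = 5),
  Answer = c(1, 2, 1, 2, 3, 3, 3, 3, 4, 4, 5, 4, 5, 4, 5),
  first_answer = rep(1:5, each = 3)
)

# Data values at which each colour should sit, rescaled to 0-1
answer_breaks <- c(1, 3, 4, 5)
positions <- scales::rescale(answer_breaks)

ggplot(toy, aes(time, Answer, group = ID, colour = first_answer)) +
  geom_line() +
  scale_colour_gradientn(
    colours = c("black", "navy", "skyblue", "lightyellow"),
    values = positions
  )
```

This treats first_answer as a continuous variable. If it is stored as a factor, a manual scale with one colour per level is the matching choice instead.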

ggrepel: Repelling text in only one direction, and returning values of repelled text

I have a dataset, where each data point has an x-value that is constrained (represents an actual instance of a quantitative variable), y-value that is arbitrary (exists simply to provide a dimension to spread out text), and a label. My datasets can be very large, and there is often text overlap, even when I try to spread the data across the y-axis as much as possible.
Hence, I am trying to use the new ggrepel. However, I am trying to keep the text labels constrained at their x-value position, while only allowing them to repel from each other in the y-direction.
As an example, the code below produces a plot for 32 data points, where the x-values show the number of cylinders in a car, and the y-values are determined randomly (have no meaning but to provide a second dimension for text plotting purposes). Without using ggrepel, there is significant overlap in the text:
library(ggrepel)
library(ggplot2)
set.seed(1)
data = data.frame(x=runif(100, 1, 10),y=runif(100, 1, 10),label=paste0("label",seq(1:100)))
origPlot <- ggplot(data) +
geom_point(aes(x, y), color = 'red') +
geom_text(aes(x, y, label = label)) +
theme_classic(base_size = 16)
I can remedy the text overlap using ggrepel, as shown below. However, this changes not only the y-values, but also the x-values. I am trying to avoid changing the x-values, as they represent an actual physical meaning (the number of cylinders):
repelPlot <- ggplot(data) +
geom_point(aes(x, y), color = 'red') +
geom_text_repel(aes(x, y, label = label)) +
theme_classic(base_size = 16)
As a note, the reason I cannot allow the x-value of the text to change is that I am only plotting the text (not the points). In contrast, most examples in ggrepel seem to keep the position of the points (so that their values remain true) and only repel the x and y values of the labels. The points are then connected to the labels with segments (you can see that in my second plot example).
I kept the points in the two examples above for demonstration purposes. However, I am only retaining the text (and hence will be removing the points and the segments), leaving me with something like this:
repelPlot2 <- ggplot(data) + geom_text_repel(aes(x, y, label = label), segment.size = 0) + theme_classic(base_size = 16)
My question is twofold:
1) Is it possible for me to repel the text labels only in the y-direction?
2) Is it possible for me to obtain a structure containing the new (repelled) y-values of the text?
Thank you for any advice!
ggrepel version 0.6.8 (install from GitHub using devtools::install_github) now supports a "direction" argument, which enables repelling of labels only in the "x" or "y" direction.
repelPlot2 <- ggplot(data) + geom_text_repel(aes(x, y, label = label), segment.size = 0, direction = "y") + theme_classic(base_size = 16)
Getting the y values is harder -- one approach can be to use the "repel_boxes" function from ggrepel first to get repelled values and then input those into ggplot with geom_text. For discussion and sample code of that approach, see https://github.com/slowkow/ggrepel/issues/24. Note that if using the latest version, the repel_boxes function now also has a "direction" argument, which takes in "both","x", or "y".
I don't think it is possible to repel text labels only in one direction with ggrepel.
I would approach this problem differently, by instead generating the arbitrary y-axis positions manually. For example, for the data set in your example, you could do this using the code below.
I have used the dplyr package to group the data set by the values of x, and then created a new column of data y containing the row numbers within each group. The row numbers are then used as the values for the y-axis.
library(ggplot2)
library(dplyr)
data <- data.frame(x = mtcars$cyl, label = paste0("label", seq(1:32)))
data <- data %>%
group_by(x) %>%
mutate(y = row_number())
ggplot(data, aes(x = x, y = y, label = label)) +
geom_text(size = 2) +
xlim(3.5, 8.5) +
theme_classic(base_size = 8)
ggsave("filename.png", width = 4, height = 2)
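The same within-group row numbers can be computed without dplyr. This sketch uses base R's ave(); label_data is an illustrative name for the same data frame.

```r
label_data <- data.frame(x = mtcars$cyl, label = paste0("label", 1:32))

# ave() applies seq_along() within each x group, giving 1, 2, 3, ... per group
label_data$y <- ave(seq_along(label_data$x), label_data$x, FUN = seq_along)

head(label_data)
```

Since y is an ordinary column here, the final label positions are available directly, which also covers the second part of the question for this manual-placement approach.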

Resources