I was just wondering if anybody had any experience with coloring something like a UMAP made in ggplot based on the expression of multiple genes at the same time. What I want to do is something like the blend option in Seurat's FeaturePlot(), but with 3 genes/colours instead of 2.
I'm looking to make a plot where the colours for the genes combine wherever their expression overlaps.
What I've gotten to so far is
library(ggplot2)
library(ggnewscale)  # provides new_scale_color()
ggplot(FD, aes(x = UMAP_1, y = UMAP_2, colour = FOSL2)) +
geom_point(size=0.3, alpha=1) +
scale_colour_gradientn(colours = c("lightgrey", colour1), limits = c(0, 0.3), oob = scales::squish) +
new_scale_color() +
geom_point(aes(colour = JUNB), size=0.3, alpha=0.7) +
scale_colour_gradientn(colours = c("lightgrey", colour2), limits = c(0.1, 0.2), oob = scales::squish) +
new_scale_color() +
geom_point(aes(colour = HES1), size=0.3, alpha=0.1) +
scale_colour_gradientn(colours = c("lightgrey", colour3), limits = c(0, 0.3), oob = scales::squish)
Here FD is a data frame holding the UMAP coordinates and the expression levels of the three genes of interest, taken from the Seurat object. All I can get is a plot where the points from one layer obscure those below it. I've tried adjusting the colours, gradients, alpha and scales, but I'm guessing I'm going about it the wrong way.
If anyone knows of a way to make this work or has any suggestions on something else to try that would be very much appreciated.
There is no 'vanilla' way of doing this in ggplot2. One can precalculate the blended colours, plot them with an identity scale, and then append invisible layers with their own scales (via the ggnewscale package) so that each variable still gets a legend.
Let's pretend, for reproducibility purposes, that we want to make a UMAP of the iris dataset and use the sepal and petal measurements as 'genes'.
library(ggplot2)
library(scales)
library(ggnewscale)
# Calculate a UMAP
umap <- uwot::umap(iris[, 1:4])
# Combine with original data and blended colours
df <- cbind.data.frame(
setNames(as.data.frame(umap), c("x", "y")),
iris,
colour = rgb(
rescale(iris$Sepal.Length),
rescale(iris$Sepal.Width),
rescale(iris$Petal.Length)
)
)
ggplot(df, aes(x, y, colour = colour)) +
geom_point() +
scale_colour_identity() +
new_scale_colour() +
# shape = NA --> invisible layers
geom_point(aes(colour = Sepal.Length), shape = NA) +
scale_colour_gradient(low = "black", high = "red") +
new_scale_colour() +
geom_point(aes(colour = Sepal.Width), shape = NA) +
scale_colour_gradient(low = "black", high = "green") +
new_scale_colour() +
geom_point(aes(colour = Petal.Length), shape = NA) +
scale_colour_gradient(low = "black", high = "blue")
#> Warning: Removed 150 rows containing missing values (geom_point).
#> Warning: Removed 150 rows containing missing values (geom_point).
#> Warning: Removed 150 rows containing missing values (geom_point).
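The rgb() call above hard-wires pure red, green and blue to the three variables. If you would rather use three colours of your own (like colour1 to colour3 in the question), one option is to generalise it into an additive mix: each gene contributes its colour in proportion to its rescaled expression, and each channel is capped at 1 where colours overlap. This is only a minimal base-R sketch; blend_three() and rescale01() are hypothetical helpers (not part of ggplot2, Seurat or ggnewscale), and it is not Seurat's blend algorithm. Note that zero expression comes out black rather than light grey.

```r
# Hypothetical helper: additively mix three colours, one per gene.
# Each gene's expression is rescaled to [0, 1] and used as the weight
# for that gene's colour; overlapping colours add up, capped at 1.
blend_three <- function(e1, e2, e3, cols = c("red", "green", "blue")) {
  rescale01 <- function(v) {
    rng <- range(v)
    if (diff(rng) == 0) return(rep(0, length(v)))  # constant gene: no contribution
    (v - rng[1]) / diff(rng)
  }
  w <- cbind(rescale01(e1), rescale01(e2), rescale01(e3))  # n x 3 weights
  base <- t(grDevices::col2rgb(cols)) / 255                 # one row per gene colour; columns R, G, B
  mixed <- pmin(w %*% base, 1)                              # additive mix per point, clipped to 1
  grDevices::rgb(mixed[, 1], mixed[, 2], mixed[, 3])
}

# With the default pure red/green/blue this reproduces the rgb() blend used above
head(blend_three(iris$Sepal.Length, iris$Sepal.Width, iris$Petal.Length))
```

The result can be stored as a column (for example FD$colour <- blend_three(FD$FOSL2, FD$JUNB, FD$HES1, cols = c(colour1, colour2, colour3))) and plotted with scale_colour_identity(), just like the colour column in the iris example.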
On the more experimental side of things, I have a package on github that has related functionality.
library(ggchromatic) # devtools::install_github("teunbrand/ggchromatic")
ggplot(df, aes(x, y, colour = rgb_spec(Sepal.Length, Sepal.Width, Petal.Length))) +
geom_point()
Created on 2021-10-18 by the reprex package (v2.0.1)
A small sidenote: a plot becomes very hard to interpret when some attributes of the data are mapped to different colour channels.
I intended to colour the lines pink and the points yellow. I don't want to use the colour argument in the respective geom(); I want to use a scale to change the colour.
p3 <- ggplot(dfcc,aes(x = yr, y = mean)) +
geom_line(aes(color = '')) +
geom_point(aes(color = ''))
p3 + scale_colour_manual(values =c('pink', 'yellow'))
This gives a plot in which neither the lines nor the points are in the right colours.
Hence, I have two questions:
Can I use scale_colour_manual to change the line and point colours in one go?
If I have multiple geoms and multiple scales, how does the system know which scale applies to which geom?
Any help and explanation would be much appreciated!
Use package ggnewscale.
set.seed(2022)
df1 <- data.frame(x = 1:20, y = cumsum(rnorm(20, 2)))
library(ggplot2)
ggplot(df1, aes(x, y)) +
geom_line(color = "pink", linewidth = 2) +
ggnewscale::new_scale_color() +
geom_point(color = "yellow", size = 3) +
theme_classic()
Created on 2022-12-25 with reprex v2.0.2
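On the second question: scales belong to aesthetics, not to geoms. Every layer that maps something to colour shares the one colour scale, which is why a single scale_colour_manual() call has to serve both the line and the points. That also answers the first question: yes, provided each geom maps colour to a different constant label (in the question both map to the same empty string, which gives one level and therefore one colour) and the manual scale translates those labels into colours. A minimal sketch reusing df1 from above; the labels "line" and "point" are arbitrary names chosen here.

```r
library(ggplot2)

set.seed(2022)
df1 <- data.frame(x = 1:20, y = cumsum(rnorm(20, 2)))

# Each geom maps colour to its own constant *label*; the one colour scale
# shared by all layers then translates each label into an actual colour.
p <- ggplot(df1, aes(x, y)) +
  geom_line(aes(colour = "line"), linewidth = 2) +
  geom_point(aes(colour = "point"), size = 3) +
  scale_colour_manual(values = c(line = "pink", point = "yellow")) +
  theme_classic()
```

Printing p draws the line in pink and the points in yellow, with a two-key legend; adding guide = "none" to scale_colour_manual() hides that legend.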
I've made a histogram graph that shows the distribution of lidar returns per elevation for three lidar scans I have done.
I've converted my data to long format, with:
one column called 'value', describing the z position of each point
one column called 'variable', containing the name of each scan group
In the attached image you can see the histograms of my three scan groups. I am currently using viridis to color the histogram by scan group (i.e. the name of the scan in the variable column). However, I want to match the colours in the graph with colours I already have.
How might I do this?
The hex colours I'd like to colour each of my three histograms with are:
lightgreen = "#62FE96"
lightred = "#FE206B"
darkpurple = "#62278E"
A link to my data - 'density2'
My current code:
library(tidyverse)
library(viridisLite)
library(viridis)
# histogram
p <- density2 %>%
ggplot( aes(x=value,color = variable, show.legend = FALSE)) +
geom_histogram(binwidth = 1, alpha = 0.5, position="identity") +
scale_color_viridis(discrete =TRUE) +
scale_fill_viridis(discrete=TRUE) +
theme_bw() +
labs(fill="") +
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())
p + scale_y_sqrt() + theme(legend.position="none") + labs(y = "data pts", x = "elevation (m)")
Any help would be most appreciated!
Delete the scale_color_viridis and scale_fill_viridis lines - these are applying the Viridis color scale. Replace with scale_fill_manual(values = c(lightgreen, lightred, darkpurple)). And in your aesthetic mapping replace color = variable with fill = variable. For a histogram, color refers to the color of the lines outlining each bar, and fill refers to the color each bar is filled in.
This should leave you with:
p <- density2 %>%
ggplot(aes(x = value, fill = variable)) +
geom_histogram(binwidth = 1, alpha = 0.5, position = "identity") +
scale_fill_manual(values = c(lightgreen, lightred, darkpurple)) +
theme_bw() +
labs(fill = "") +
theme(panel.grid = element_blank())
p + scale_y_sqrt() +
theme(legend.position = "none") +
labs(y = "data pts", x = "elevation (m)")
I've also done some other clean-up. show.legend = FALSE does not belong inside aes() - and your theme(legend.position = "none") should take care of it.
I did not download your data, save it in my working directory, import it into R, and test this code on it. If you need more help, please post a small subset of your data in a copy/pasteable format (e.g., dput(density2[1:20, ]) for the first 20 rows---choose a suitable subset) and I'll be happy to test and adjust.
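One more thing worth knowing: an unnamed values vector is assigned to the groups in the order of the factor levels of variable (alphabetical for a character column), which is not necessarily the order you had in mind. Naming the vector makes the pairing explicit. A sketch with a simulated stand-in for density2; the scan names scan_a, scan_b and scan_c and the elevations are made up, since the real data isn't shown here.

```r
library(ggplot2)

lightgreen <- "#62FE96"
lightred   <- "#FE206B"
darkpurple <- "#62278E"

# Hypothetical stand-in for density2: long format, one elevation per row
set.seed(1)
density2 <- data.frame(
  value    = c(rnorm(200, 10, 2), rnorm(200, 14, 2), rnorm(200, 18, 2)),
  variable = rep(c("scan_a", "scan_b", "scan_c"), each = 200)
)

# Named values tie each colour to a group by name,
# independent of the order of the factor levels
scan_cols <- c(scan_a = lightgreen, scan_b = lightred, scan_c = darkpurple)

p <- ggplot(density2, aes(x = value, fill = variable)) +
  geom_histogram(binwidth = 1, alpha = 0.5, position = "identity") +
  scale_fill_manual(values = scan_cols) +
  scale_y_sqrt() +
  theme_bw() +
  theme(panel.grid = element_blank(), legend.position = "none") +
  labs(y = "data pts", x = "elevation (m)")
```

With the real data, replace the names in scan_cols with the actual values of the variable column.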
I would like to plot densities of two variables ("red_variable", "green_variable") from two independent dataframes on one density plot, using red and green color for the two variables.
This is my attempt at coding:
library(ggplot2)
### Create dataframes
red_dataframe <- data.frame(red_variable = c(10,11,12,13,14))
green_dataframe <- data.frame(green_variable = c(6,7,8,9,10))
mean(red_dataframe$red_variable) # mean is 12
mean(green_dataframe$green_variable) # mean is 8
### Set colors
red_color= "#FF0000"
green_color= "#008000"
### Trying to plot densities with correct colors and correct legend entries
ggplot() +
geom_density(aes(x=red_variable, fill = red_color, alpha=0.5), data=red_dataframe) +
geom_density(aes(x=green_variable, fill = green_color, alpha=0.5), data=green_dataframe) +
scale_fill_manual(labels = c("Density of red_variable", "Density of green_variable"), values = c(red_color, green_color)) +
xlab("X value") +
ylab("Density") +
labs(fill = "Legend") +
guides(alpha=FALSE)
Result: The legend shows correct colors, but the colors on the plot are wrong: The "red" variable is plotted with green color, the "green" variable with red color. The "green" density (mean=8) should appear left and the "red" density (mean=12) on the right on the x-axis. This behavior of the plot doesn't make any sense to me.
I can in fact get the desired result by switching red and green in the code:
### load ggplot2
library(ggplot2)
### Create dataframes
red_dataframe <- data.frame(red_variable = c(10,11,12,13,14))
green_dataframe <- data.frame(green_variable = c(6,7,8,9,10))
mean(red_dataframe$red_variable) # mean is 12
mean(green_dataframe$green_variable) # mean is 8
### Set colors
red_color= "#FF0000"
green_color= "#008000"
### Trying to plot densities with correct colors and correct legend entries
ggplot() +
geom_density(aes(x=red_variable, fill = green_color, alpha=0.5), data=red_dataframe) +
geom_density(aes(x=green_variable, fill = red_color, alpha=0.5), data=green_dataframe) +
scale_fill_manual(labels = c("Density of red_variable", "Density of green_variable"), values = c(red_color, green_color)) +
xlab("X value") +
ylab("Density") +
labs(fill = "Legend") +
guides(alpha=FALSE)
... While the plot makes sense now, the code doesn't. I cannot really trust code doing the opposite of what I would expect it to do. What's the problem here? Am I color blind?
In your code, to get the colours in the right place, you need to specify fill = red_color or fill = green_color (as well as alpha, since it is a constant, as pointed out by @Gregor) outside of aes(), like this:
ggplot() +
geom_density(aes(x = red_variable), alpha = 0.5, fill = red_color, data = red_dataframe) +
geom_density(aes(x = green_variable), alpha = 0.5, fill = green_color, data = green_dataframe) +
xlab("X value") +
ylab("Density")
Alternatively, you can bind your data frames together, reshape them into a longer format (much more appropriate for ggplot), and then add a colour column that you can use with the scale_fill_identity function (https://ggplot2.tidyverse.org/reference/scale_identity.html):
df <- cbind(red_dataframe,green_dataframe)
library(tidyr)
library(ggplot2)
library(dplyr)
df <- df %>% pivot_longer(.,cols = c(red_variable,green_variable), names_to = "var",values_to = "val") %>%
mutate(Color = ifelse(grepl("red",var),red_color,green_color))
ggplot(df, aes(val, fill = Color))+
geom_density(alpha = 0.5)+
scale_fill_identity(guide = "legend", name = "Legend", labels = levels(as.factor(df$var)))+
xlab("X value") +
ylab("Density")
Does this answer your question?
You're trying to use ggplot as if it's base graphics... the mindset shift can take a little while to get used to. dc37's answer shows how you should do it. I'll try to explain what goes wrong in your attempt:
When you put fill = green_color inside aes(), because it's inside aes() ggplot essentially creates a new column of data filled with the green_color values in your green_dataframe, i.e., "#008000", "#008000", "#008000", .... Ditto for the red color values in the red data frame. We can see this if we modify your plot by simply deleting your scale:
ggplot() +
geom_density(aes(x = red_variable, fill = green_color, alpha = 0.5), data = red_dataframe) +
geom_density(aes(x = green_variable, fill = red_color, alpha = 0.5), data = green_dataframe) +
xlab("X value") +
ylab("Density") +
labs(fill = "Legend") +
guides(alpha = "none")
We can actually get what you want by adding the identity scale, which is designed for the case (common in base graphics, rare in ggplot2) where you actually put colour values in the data:
ggplot() +
geom_density(aes(x = red_variable, fill = red_color, alpha = 0.5), data = red_dataframe) +
geom_density(aes(x = green_variable, fill = green_color, alpha = 0.5), data = green_dataframe) +
scale_fill_identity() +
xlab("X value") +
ylab("Density") +
labs(fill = "Legend") +
guides(alpha = "none")
When you added your scale_fill_manual, ggplot was like "okay, cool, you want to specify colors and labels". But you were thinking in the order that you added the layers to the plot (much like base graphics), whereas ggplot was thinking of these newly created variables "#FF0000" and "#008000", which it ordered alphabetically by default (just as if they were factor or character columns in a data frame). And since you happened to add the layers in reverse alphabetical order, it was switched.
dc37's answer shows a couple of better methods. With ggplot you should (a) work with a single, long-format data frame whenever possible, (b) not put constants inside aes() (constant colour, constant alpha, etc.), and (c) set colours in a scale_fill_* or scale_color_* function when they're not constant.
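Putting (a), (b) and (c) together for the example in the question, a minimal sketch could look like this. The long data frame and its var/val column names are choices made here for illustration. Naming the values and labels in scale_fill_manual() ties each colour to its variable explicitly, so the alphabetical ordering described above no longer matters.

```r
library(ggplot2)

red_dataframe   <- data.frame(red_variable = c(10, 11, 12, 13, 14))
green_dataframe <- data.frame(green_variable = c(6, 7, 8, 9, 10))

# (a) one long data frame with an explicit grouping column
long <- rbind(
  data.frame(var = "red_variable",   val = red_dataframe$red_variable),
  data.frame(var = "green_variable", val = green_dataframe$green_variable)
)

p <- ggplot(long, aes(val, fill = var)) +
  geom_density(alpha = 0.5) +          # (b) constant alpha set outside aes()
  scale_fill_manual(                   # (c) colours set in the scale, matched by name
    values = c(red_variable = "#FF0000", green_variable = "#008000"),
    labels = c(red_variable = "Density of red_variable",
               green_variable = "Density of green_variable"),
    name = "Legend"
  ) +
  labs(x = "X value", y = "Density")
```

The green density (mean 8) then sits on the left in green and the red density (mean 12) on the right in red, without any swapping of colours in the code.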
I have data where each point lies on a spectrum between two centroids. I generated a color for each point by specifying a color for each centroid, then setting the color of each point as a function of its position between its two centroids. I used these colors to manually specify the color of each point and plotted the data in the following way:
lb.plot.dat <- data.frame('UMAP1' = lb.umap$layout[,1], 'UMAP2' = lb.umap$layout[,2],
'sample' = as.factor(substr(colnames(lb.vip), 1, 5)),
'fuzzy.class' = color.vect)
p3 <- ggplot(lb.plot.dat, aes(x = UMAP1, y = UMAP2)) + geom_point(aes(color = fuzzy.class)) +
ggtitle('Fuzzy Classification') + scale_color_identity()
p3 <- p3 + facet_grid(cols = vars(sample))
ggsave(filename = 'ref-samps_bcell-vip-model_fuzzy-class.png', plot = p3, height = 8, width = 16)
(color.vect is the aforementioned vector of colors for each point in the plot)
I would like to generate a legend of this plot that gives the color used for each centroid. I have a named vector class.cols that contains the colors used for each centroid and is named according to the corresponding class.
Is there a way to transform this vector into a legend for the plot even though it is not explicitly used in the plotting call?
You can turn on legend drawing in scale_color_identity() by setting guide = "legend". You'll have to specify the breaks and labels in the scale function so that the legend correctly states what each color represents, and not just the name of the color.
library(ggplot2)
df <- data.frame(x = 1:3, y = 1:3, color = c("red", "green", "blue"))
# no legend by default
ggplot(df, aes(x, y, color = color)) +
geom_point() +
scale_color_identity()
# legend turned on
ggplot(df, aes(x, y, color = color)) +
geom_point() +
scale_color_identity(guide = "legend")
Created on 2019-12-15 by the reprex package (v0.3.0)
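For the question's case the breaks and labels can come straight from class.cols: the colours (unnamed) go to breaks and the class names go to labels. In this sketch class.cols is a made-up stand-in for the real vector. One caveat, as far as I understand how discrete scales build their legends: breaks that do not occur among the plotted colour values are dropped, so with blended point colours this assumes the pure centroid colours also appear in the data.

```r
library(ggplot2)

df <- data.frame(x = 1:3, y = 1:3, color = c("red", "green", "blue"),
                 stringsAsFactors = FALSE)

# Hypothetical stand-in for class.cols: class name -> centroid colour
class.cols <- c(classA = "red", classB = "green", classC = "blue")

p <- ggplot(df, aes(x, y, color = color)) +
  geom_point() +
  scale_color_identity(
    guide  = "legend",
    name   = "Centroid",
    breaks = unname(class.cols),  # which colours get a legend key
    labels = names(class.cols)    # what each key is called
  )
```

The points keep exactly the colours stored in the data; only the legend keys are renamed.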
I am using this code to produce a plot:
ggplot(df1, aes(Percent, CFM)) +
geom_point(color = df1$lab, size =6, alpha = 0.7) +
scale_color_brewer(type = 'seq', palette = 2)
Here df1 is a three-column data frame, and df1$lab is an integer column with 4 distinct values representing different rooms. I am trying to colour-code the points by these 4 rooms, but no matter which scale_xxx_brewer I use, it does nothing. Do I have issues with my aes()?
ggplot(df1, aes(Percent, CFM)) +
geom_point(aes(color = factor(lab)), size =6, alpha = 0.7) +
scale_color_brewer(palette = 'Dark2')
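I needed to put color inside aes() in geom_point(), and wrap lab in factor(). Setting color = df1$lab outside aes() fixes the colours directly, so no colour scale is built and scale_color_brewer() has nothing to act on. factor() is needed because the brewer scales are discrete, while an integer column is treated as continuous.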
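A self-contained version of the fix, with made-up data standing in for df1 (the column names match the question; the values are random):

```r
library(ggplot2)

# Hypothetical stand-in for df1: 'lab' is an integer room code (1-4)
set.seed(42)
df1 <- data.frame(Percent = runif(20, 0, 100),
                  CFM     = runif(20, 50, 200),
                  lab     = rep(1:4, each = 5))

# factor(lab) turns the four integer codes into four discrete levels,
# which the discrete brewer scale can then assign one colour each
p <- ggplot(df1, aes(Percent, CFM)) +
  geom_point(aes(color = factor(lab)), size = 6, alpha = 0.7) +
  scale_color_brewer(palette = "Dark2", name = "Room")
```

If you instead wanted a colour gradient over a genuinely continuous variable, scale_color_distiller() is the continuous counterpart of scale_color_brewer().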