With ggplot2 I want to plot two vectors (vec1_num, vec2_num) in two dimensions and colour the points by a group variable (vec3_char). Some data points are overlapping.
library(ggplot2)
vec1_num = c(1,2,3,4,1,3,4,5,5,5)
vec2_num = c(1,2,3,4,1,3,4,5,5,5)
vec3_char = c("A", "B", "C", "A", "B", "C", "C", "A", "B", "C")
# plot 1
ggplot(data = NULL) +
geom_point(aes(x=vec1_num, y=vec2_num, colour=vec3_char), alpha=0.4, size=4) +
scale_colour_manual(values=c("A"="darkblue", "B"="darkred", "C"="orange")) +
theme(panel.grid = element_blank())
I know I can mitigate the overlap by reducing alpha or by using geom_jitter to add a bit of noise, like this:
# plot 2
ggplot(data = NULL) +
geom_jitter(aes(x=vec1_num, y=vec2_num, colour=vec3_char), alpha=0.4, size=4, width = 0.1) +
scale_colour_manual(values=c("A"="darkblue", "B"="darkred", "C"="orange")) +
theme(panel.grid = element_blank())
However, is it possible to keep plot 1 but colour the overlapping points differently, so that, for example, "A" = "darkblue", "AB" = "black", "ABC" = "grey", "B" = "darkred", "BC" = "pink", "C" = "orange"? And can I additionally add a small Venn diagram (as a legend) that visualises the colour choice for the point overlap?
Thanks!
My way of doing this would be to convert the letters into numbers, sum them and convert back into letters.
NB The one complication is that the letters need to be A, B, D, H, ... (positions 1, 2, 4, 8, ... in the alphabet, i.e. powers of two) so that there is only one way of making each sum. There is probably also a way to start with A, B, C, ... and encode them to unique values.
library(tidyverse)
vec1_num = c(1,2,3,4,1,3,4,5,5,5)
vec2_num = c(1,2,3,4,1,3,4,5,5,5)
vec3_char = c("A", "B", "D", "A", "B", "D", "D", "A", "B", "D")
# Collapse runs of consecutive repeated characters in a string, e.g. "44" -> "4"
removeDup <- function(str) paste(rle(strsplit(str, "")[[1]])$values, collapse="")
data <- data.frame(x = vec1_num, y = vec2_num, col = match(vec3_char, LETTERS))
data <- data %>%
group_by(x) %>%
mutate(colour = glue::glue_collapse(col, sep = "")) %>%
select(-col) %>%
distinct(x, y, .keep_all = TRUE) %>%
mutate(colour = removeDup(colour)) %>%
mutate(colour = sapply(str_extract_all(colour, '\\d'), function(x) sum(as.integer(x)))) %>%
mutate(colour = case_when(
colour == 1 ~ "A",
colour == 2 ~ "B",
colour == 3 ~ "AB",
colour == 4 ~ "D",
colour == 5 ~ "AD",
colour == 6 ~ "BD",
colour == 7 ~ "ABD"
))
# plot 1
ggplot(data) +
geom_point(aes(x=x, y=y, colour = as_factor(colour)), alpha=0.4, size=4) +
geom_text(aes(x = x, y = y, label = colour), vjust = 2) +
scale_colour_manual(values=c("A"="darkblue", "B"="darkred", "AB"="orange", "D" = "green", "AD" = "black", "BD" = "orange", "ABD" = "purple"), name = "Colour") +
theme(panel.grid = element_blank())
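One possible way to lift the A, B, D, H restriction mentioned in the NB is to give every letter its own power of two, whatever its position in the alphabet, so that plain A, B, C, ... also produce unique sums. The sketch below is only an illustration: encode_groups and decode_groups are hypothetical helper names, not part of the answer above.

```r
# Hypothetical helpers (not part of the answer above): each group letter gets
# its own power of two, so every combination of letters has a unique sum.
encode_groups <- function(groups, alphabet = LETTERS) {
  idx <- match(unique(groups), alphabet)  # position of each distinct letter
  sum(2^(idx - 1))                        # A -> 1, B -> 2, C -> 4, ...
}

decode_groups <- function(code, alphabet = LETTERS) {
  weights <- 2^(seq_along(alphabet) - 1)
  present <- (code %/% weights) %% 2 == 1  # is this letter's bit set?
  paste(alphabet[present], collapse = "")
}

code <- encode_groups(c("A", "C", "C"))  # duplicates are ignored
decode_groups(code)
```

With this encoding the case_when() lookup table would no longer be needed, since the combination label can be decoded directly from the sum.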
I would first create a data frame. Then, for every x-y combination (list(df$vec1_num, df$vec2_num)), I would extract which characters are present (...unique(xy_i$vec3_char)...), like this:
df <- data.frame(vec1_num, vec2_num, vec3_char)
df_new <- do.call("rbind.data.frame", by(df, list(df$vec1_num, df$vec2_num), function(xy_i){
chars_i <- paste0(sort(unique(xy_i$vec3_char)),collapse= "")
xy_i$chars_comb <- factor(chars_i, levels= c("A", "AB", "AC", "ABC", "B", "BC", "C"))
xy_i
}))
If you now make the plot, it shows which characters overlap at each point.
ggplot(data = df_new) +
geom_point(aes(x=vec1_num, y=vec2_num, colour=chars_comb), alpha=0.4, size=4) +
scale_colour_manual(values=c("AB" = "black", "ABC" = "grey", "B" = "darkred", "C"="orange", "AC"= "red")) +
theme(panel.grid = element_blank())
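The same per-point combination can probably also be computed with dplyr instead of by() and do.call(). The following is a minimal sketch under that assumption, using the vectors from the question; df_comb is a name introduced here for illustration, and the result has one row per x-y position rather than one row per original point.

```r
library(dplyr)

vec1_num  <- c(1, 2, 3, 4, 1, 3, 4, 5, 5, 5)
vec2_num  <- c(1, 2, 3, 4, 1, 3, 4, 5, 5, 5)
vec3_char <- c("A", "B", "C", "A", "B", "C", "C", "A", "B", "C")
df <- data.frame(vec1_num, vec2_num, vec3_char)

# One row per x-y position, labelled with the sorted, de-duplicated groups
df_comb <- df %>%
  group_by(vec1_num, vec2_num) %>%
  summarise(
    chars_comb = paste(sort(unique(vec3_char)), collapse = ""),
    .groups = "drop"
  )

df_comb
```

The chars_comb column can then be mapped to colour and passed to scale_colour_manual() in the same way as in the plot above.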
Related
I'm working on a larger project for which I am creating several plots in ggplot2. The plots are concerned with plotting several different outcomes across several different discrete categories (think: countries, species, types). I would like to completely fix the mapping of discrete types to colors such that Type=A is always displayed in red, Type=B is always displayed in blue, and so on across all plots irrespective of what other factors are present. I know about scale_fill_manual() where I can provide color values manually and then work with drop = FALSE which helps in dealing with unused factor levels. However, I find this extremely cumbersome since every plot will need some manual work to deal with sorting the factors in the right way, sorting color values to match factor sorting, dropping unused levels, etc.
What I am looking for is a way to map factor levels to specific colors once and globally (A=green, B=blue, C=red, ...) and then just go about plotting whatever I please, with ggplot picking the right colors.
Here is some code to illustrate the point.
# Full set with 4 categories
df1 <- data.frame(Value = c(40, 20, 10, 60),
Type = c("A", "B", "C", "D"))
ggplot(df1, aes(x = Type, y = Value, fill = Type)) + geom_bar(stat = "identity")
# Colors change completely because only 3 factor levels are present
df2 <- data.frame(Value = c(40, 20, 60),
Type = c("A", "B", "D"))
ggplot(df2, aes(x = Type, y = Value, fill = Type)) + geom_bar(stat = "identity")
# Colors change because factor is sorted differently
df3 <- data.frame(Value = c(40, 20, 10, 60),
Type = c("A", "B", "C", "D"))
df3$Type <- factor(df3$Type, levels = c("D", "C", "B", "A"), ordered = TRUE)
ggplot(df3, aes(x = Type, y = Value, fill = Type)) + geom_bar(stat = "identity")
You could define your own custom scale, if you like. If you look at the source for scale_fill_manual,
scale_fill_manual
#> function (..., values)
#> {
#> manual_scale("fill", values, ...)
#> }
#> <environment: namespace:ggplot2>
it's actually quite simple:
library(ggplot2)
scale_fill_chris <- function(...){
ggplot2:::manual_scale(
'fill',
values = setNames(c('green', 'blue', 'red', 'orange'), LETTERS[1:4]),
...
)
}
df1 <- data.frame(Value = c(40, 20, 10, 60),
Type = c("A", "B", "C", "D"))
ggplot(df1, aes(x = Type, y = Value, fill = Type)) +
geom_col() +
scale_fill_chris()
df2 <- data.frame(Value = c(40, 20, 60),
Type = c("A", "B", "D"))
ggplot(df2, aes(x = Type, y = Value, fill = Type)) +
geom_col() +
scale_fill_chris()
df3 <- data.frame(Value = c(40, 20, 10, 60),
Type = c("A", "B", "C", "D"))
df3$Type <- factor(df3$Type, levels = c("D", "C", "B", "A"), ordered = TRUE)
ggplot(df3, aes(x = Type, y = Value, fill = Type)) +
geom_col() +
scale_fill_chris()
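Because ggplot2:::manual_scale is an internal function reached through :::, it may change between ggplot2 versions. A sketch of the same idea that relies only on the exported scale_fill_manual() could look like this; type_colours and scale_fill_type are hypothetical names chosen for illustration.

```r
library(ggplot2)

# Assumed global mapping of factor levels to colours, defined once
type_colours <- c(A = "green", B = "blue", C = "red", D = "orange")

# Thin wrapper around the exported scale; a named values vector is matched
# to the data by name, so level order and missing levels do not matter
scale_fill_type <- function(...) {
  scale_fill_manual(values = type_colours, ...)
}

df2 <- data.frame(Value = c(40, 20, 60),
                  Type = c("A", "B", "D"))
p <- ggplot(df2, aes(x = Type, y = Value, fill = Type)) +
  geom_col() +
  scale_fill_type()
p
```

Since the colours are looked up by name, D stays orange here even though C is absent from the data.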
You could make a custom plot function (including scale_fill_manual and reasonable default colours) in order to avoid repeating code:
library(ggplot2)
custom_plot <- function(.data,
colours = c("A" = "green", "B" = "blue", "C" = "red", "D" = "grey")) {
ggplot(.data, aes(x=Type, y=Value, fill= Type)) + geom_bar(stat="identity") +
scale_fill_manual(values = colours)
}
df1 <- data.frame(Value=c(40, 20, 10, 60), Type=c("A", "B", "C", "D"))
df2 <- data.frame(Value=c(40, 20, 60), Type=c("A", "B", "D"))
df3 <- data.frame(Value=c(40, 20, 10, 60), Type=c("A", "B", "C", "D"))
df3$Type <- factor(df3$Type, levels=c("D", "C", "B", "A"), ordered=TRUE)
custom_plot(df1)
custom_plot(df2)
custom_plot(df3)
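Another option is to make drop = FALSE the default by redefining the default discrete colour scales as follows (note that scale_colour_manual() and scale_fill_manual() still need a values vector, passed through ...):
scale_colour_discrete <- function(...)
scale_colour_manual(..., drop = FALSE)
scale_fill_discrete <- function(...)
scale_fill_manual(..., drop = FALSE)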
That way colours are always consistent for different factors.
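To illustrate why drop = FALSE keeps colours consistent, here is a small sketch. It assumes that Type is stored as a factor with the full set of levels in every data frame (all_types is a name introduced for this example) and it uses the default discrete palette rather than manual values.

```r
library(ggplot2)

# Assumed full set of levels, shared by all data frames
all_types <- c("A", "B", "C", "D")

df1 <- data.frame(Value = c(40, 20, 10, 60),
                  Type = factor(c("A", "B", "C", "D"), levels = all_types))
df2 <- data.frame(Value = c(40, 20, 60),
                  Type = factor(c("A", "B", "D"), levels = all_types))

# With drop = FALSE the scale keeps all four levels as its limits, so the
# palette is always split into four colours, even when C has no rows
p_full <- ggplot(df1, aes(x = Type, y = Value, fill = Type)) +
  geom_col() +
  scale_fill_discrete(drop = FALSE)
p_sub <- ggplot(df2, aes(x = Type, y = Value, fill = Type)) +
  geom_col() +
  scale_fill_discrete(drop = FALSE)
```

In this sketch A, B and D should receive the same fill colours in p_sub as in p_full; without the shared factor levels, drop = FALSE has nothing to preserve.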
Make sure you convert that column into a factor first, and then create a named vector that stores the colour for each factor level:
df$color <- factor(df$color, levels = c(1, 0)) # factor(), unlike as.factor(), accepts levels
cbPallete <- c("1" = "green", "0" = "red")
# Aesthetics are mapped inside aes(); geom_col() draws bars from the y values
ggplot(data = df, aes(x = x, y = y, fill = color)) +
geom_col() +
scale_fill_manual(values = cbPallete)
In the following example (using the iris dataset), I create a factor variable for which one of the species has no values of level C. When I make the plot, I cannot find a way to stop ggplot from dropping the empty level (virginica-C). A previous post (from 10 years ago) suggests using the argument drop = FALSE, but it is not working for me. Any suggestions?
require(dplyr)
require(ggplot2)
iris %>%
mutate(fct_x = factor(x = sample(x = c("A", "B", "C"), size = nrow(.), replace = TRUE),
levels = c("A", "B", "C"))) %>%
filter(!(Species == "virginica" & fct_x == "C")) %>%
ggplot(aes(x = Species, y = Sepal.Length, fill = fct_x)) +
geom_boxplot() +
scale_fill_discrete(drop = FALSE)
In other words, the code shown above generates the following graphic. As you can see, the virginica group does NOT show an empty space for group C (because there are no elements of type virginica-C) and that is exactly what I want to achieve: to show that empty space in the figure.
PS: There is also another similar post (from 6 years ago) in which they suggest placing values outside the limits. It is not a bad idea when you have to make a point plot, but in my case I am making a script that generates automatic plots from incoming information and, therefore, I cannot limit the y-axis since the script itself defines the ylim according to the values that appear.
You can specify the position function in the geom_boxplot call. In dodge2 (the default position parameter) you can set preserve="single" so the width of all the single columns is the same.
iris %>%
mutate(fct_x = factor(x = sample(x = c("A", "B", "C"), size = nrow(.), replace = TRUE),
levels = c("A", "B", "C"))) %>%
filter(!(Species == "virginica" & fct_x == "C")) %>%
mutate(fct_x = factor(fct_x, levels = c("A", "B", "C"))) %>%
ggplot(aes(x = Species, y = Sepal.Length, fill = fct_x)) +
geom_boxplot(position=position_dodge2(preserve="single"))
See the definition of position_dodge2(): https://ggplot2.tidyverse.org/reference/position_dodge.html
You could get the empty slot by faceting with scales = "free_x" and using scale_x_discrete(drop = FALSE):
(The strip labels could be moved to the bottom, and the fct_x labels & gaps between facets removed, if preferred per the second example.)
require(dplyr)
require(ggplot2)
iris %>%
mutate(fct_x = factor(
x = sample(x = c("A", "B", "C"), size = nrow(.), replace = TRUE),
levels = c("A", "B", "C")
)) %>%
filter(!(Species == "virginica" & fct_x == "C")) %>%
ggplot(aes(x = fct_x, y = Sepal.Length, fill = fct_x)) +
geom_boxplot() +
facet_wrap(~ Species, scales = "free_x") +
scale_x_discrete(drop = FALSE)
Created on 2022-06-16 by the reprex package (v2.0.1)
# Mimicking the original plot
require(dplyr)
require(ggplot2)
iris %>%
mutate(fct_x = factor(
x = sample(x = c("A", "B", "C"), size = nrow(.), replace = TRUE),
levels = c("A", "B", "C")
)) %>%
filter(!(Species == "virginica" & fct_x == "C")) %>%
ggplot(aes(x = fct_x, y = Sepal.Length, fill = fct_x)) +
geom_boxplot() +
facet_wrap(~ Species, scales = "free_x", strip.position = "bottom") +
scale_x_discrete(drop = FALSE) +
theme(axis.text.x = element_blank(),
axis.ticks.x = element_blank(),
strip.background = element_blank(),
panel.spacing = unit(0, "lines")) +
labs(x = "Species")
How do I keep all the legend entries in a plot?
Below is an example of a scatter plot. It plots only the points with x > 0.5, but I want the legend to show all the classes.
library(tidyverse)
for (iter in 1:5)
{
# generate 5 random points
tbl <- tibble(x = runif(5),
y = runif(5),
class = c("A", "B", "C", "D", "E"))
# print if x > 0.5
p <- ggplot(data = tbl %>% filter(x > 0.5),
aes(x = x,
y = y,
color = class)) +
geom_point(size = 5) +
scale_fill_manual(labels = c("A", "B", "C", "D", "E"),
values = c("Grey", "Red", "Green", "Blue", "Yellow"),
drop = FALSE) +
theme_bw() +
theme(aspect.ratio = 1) +
xlim(0, 1) +
ylim(0, 1)
ggsave(p,
filename = paste0(iter, ".png"))
}
You can do it if you:
set the class variable to a factor
use scale_colour_manual instead of scale_fill_manual. If you want to use the default colours from the ggplot palette, you can use scale_colour_discrete, as in my code.
library(tidyverse)
set.seed(1) # for reproducibility
plots <- lapply(1:5, function(iter){
# generate 5 random points
tbl <- tibble(x = runif(5),
y = runif(5),
class = factor(c("A", "B", "C", "D", "E")))
# print if x > 0.5
p <- ggplot(data = tbl %>% filter(x > 0.5),
aes(x = x,
y = y,
color = class)) +
geom_point(size = 5) +
scale_colour_discrete(drop = FALSE) +
theme_bw() +
theme(aspect.ratio = 1) +
xlim(0, 1) +
ylim(0, 1)
ggsave(p, filename = paste0(iter, ".png"))
p
})
# visualize them all together
cowplot::plot_grid(plotlist = plots)
PS: I've used lapply instead of a for loop; R users usually prefer it where possible.
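If the specific colours from the question should be kept instead of the default palette, a sketch along the following lines may work. The colour assignment is taken from the question's values, while class_colours and the fixed example coordinates are assumptions made here for reproducibility. As far as I know, recent ggplot2 versions (3.5.0 and later) draw legend keys for levels absent from the data only when show.legend = TRUE is set on the layer, so it is included.

```r
library(ggplot2)

# Assumed colour assignment, taken from the values in the question
class_colours <- c(A = "grey", B = "red", C = "green", D = "blue", E = "yellow")

# Fixed example data (instead of runif) so the result is reproducible
tbl <- data.frame(
  x = c(0.1, 0.6, 0.9, 0.3, 0.7),
  y = c(0.2, 0.4, 0.8, 0.5, 0.1),
  class = factor(names(class_colours), levels = names(class_colours))
)

# Only points with x > 0.5 are drawn; drop = FALSE keeps all factor levels
# in the colour scale, and the named vector fixes the colour of each class
p <- ggplot(subset(tbl, x > 0.5), aes(x = x, y = y, colour = class)) +
  geom_point(size = 5, show.legend = TRUE) +
  scale_colour_manual(values = class_colours, drop = FALSE) +
  theme_bw() +
  xlim(0, 1) +
  ylim(0, 1)
p
```

Here classes B, C and E are plotted, while the scale still knows about A and D through the factor levels.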
Instead of filtering out the data you can change the values to NA:
library(tidyverse)
for (iter in 1:5)
{
# generate 5 random points
tbl <- tibble(x = runif(5),
y = runif(5),
class = c("A", "B", "C", "D", "E"))
# print if x > 0.5
p <- ggplot(data = tbl %>% mutate(y = if_else(x > 0.5, y, NA_real_)),
aes(x = x,
y = y,
color = class)) +
geom_point(size = 5) +
scale_fill_manual(labels = c("A", "B", "C", "D", "E"),
values = c("Grey", "Red", "Green", "Blue", "Yellow"),
drop = FALSE) +
theme_bw() +
theme(aspect.ratio = 1) +
xlim(0, 1) +
ylim(0, 1)
ggsave(p,
filename = paste0(iter, ".png"))
}
In the chart below, I'm looking to have each triangle in the label for subgroup A match the color of the subgroup they are referring to (green, blue, purple). Is this possible at all?
library(tibble)
library(dplyr)
library(ggplot2)
library(scales)
example_tibble <- tibble(Subgroup = c("A", "B", "C", "D"),
Result = c(0.288, 0.204, 0.206, 0.182),
A_vs_B = rep(1, 4),
A_vs_C = rep(1, 4),
A_vs_D = rep(1, 4))
ggplot(example_tibble, aes(x = Subgroup, y = Result, fill = Subgroup)) +
geom_bar(stat = "identity") + geom_text(aes(label =
paste0(percent(Result),
if_else(A_vs_B == 1 & Subgroup == "A", sprintf("\u25b2"), ""),
if_else(A_vs_C == 1 & Subgroup == "A", sprintf("\u25b2"), ""),
if_else(A_vs_D == 1 & Subgroup == "A", sprintf("\u25b2"), "")),
colour = Subgroup), hjust = -.25) +
coord_flip() + scale_y_continuous(limits = c(0,.5), labels = percent)