Adding summary statistics labels to box plot using ggplot in R

I am trying to add labels that sit above box plots. In the example below, instead of NA, I want the label above A to say "total number of var3 = 11" and the label above B to say "total number of var3 = 34". In my real data, numbers are produced, but they bear no relation to the original data set (I cannot work out how they could possibly be calculated from the original data, so I must be doing something wrong!).
var1<- c("A", "B", "A", "B", "B", "B", "A", "B", "B")
var2<- as.numeric(c(4:12))
var3<- as.numeric(c(1:9))
df<- data.frame(var1, var2, var3)
stat_box_data <- function(y, upper_limit = max(df$var2) * 1.15 ) {
return(
data.frame(
y = 0.95* upper_limit,
label = paste('number of var1 =', length(y), '\n',
'total number of var3 =', sum(df$var3[y])
)
)
)
}
ggplot(df, aes(var1, var2)) +
geom_boxplot() +
stat_summary( fun.data = stat_box_data,
geom = "text",
hjust = 0.5,
vjust = 0.9)
df%>% group_by (var1) %>% summarise (sum = sum(var3))
Thanks to original post for code here https://gscheithauer.medium.com/how-to-add-number-of-observations-to-a-ggplot2-boxplot-b22710f7ef80

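You could get the result you want using this rather convoluted method. stat_summary() passes only the y values of each group (here var2) to the function, so df$var3[y] in the question indexes var3 by the var2 values rather than by row. Using match(y, df$var2) recovers the row positions first; note that this relies on every var2 value being unique.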
library(dplyr)
library(ggplot2)
var1<- c("A", "B", "A", "B", "B", "B", "A", "B", "B")
var2<- as.numeric(c(4:12))
var3<- as.numeric(c(1:9))
df<- data.frame(var1, var2, var3)
stat_box_data <- function(y, upper_limit = max(df$var2) * 1.15) {
return(
data.frame(
y = 0.95* upper_limit,
label = paste('count =', length(y), '\n',
'total number of var3 =', sum(df$var3[match(y, df$var2)]), '\n'
)
)
)
}
# Expected totals per group, for checking the labels: 11 for A and 34 for B
d<-df%>% group_by (var1) %>% summarise (sum = sum(var3)) %>% pull(sum)
ggplot(df, aes(var1, var2)) +
geom_boxplot() +
stat_summary(fun.data = stat_box_data,
geom = "text",
hjust = 0.5,
vjust = 0.9)
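A less indirect route, sketched here as an alternative rather than as the method used above, is to compute the labels per group beforehand and draw them with geom_text(). The name label_data and its columns are assumptions made for this illustration; the data and the label position (0.95 * 1.15 * max(var2)) are taken from the example above.

```r
library(dplyr)
library(ggplot2)

df <- data.frame(
  var1 = c("A", "B", "A", "B", "B", "B", "A", "B", "B"),
  var2 = as.numeric(4:12),
  var3 = as.numeric(1:9)
)

# One row per group: number of observations, total of var3, label text and position.
# label_data is a hypothetical helper table, not part of the original answer.
label_data <- df %>%
  group_by(var1) %>%
  summarise(n = n(), total_var3 = sum(var3), .groups = "drop") %>%
  mutate(
    y = 0.95 * 1.15 * max(df$var2),
    label = paste0("count = ", n, "\ntotal number of var3 = ", total_var3)
  )

# The labels come from label_data, so they do not depend on var2 being unique.
p <- ggplot(df, aes(var1, var2)) +
  geom_boxplot() +
  geom_text(data = label_data, aes(x = var1, y = y, label = label),
            hjust = 0.5, vjust = 0.9)
```

Because the summary is computed on the data frame itself, any column can be used in the label, and the table can be inspected before plotting.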

Related

R: How do I assign colors on a color palette to specific values? (ggplot) [duplicate]

I'm working on a larger project for which I am creating several plots in ggplot2. The plots are concerned with plotting several different outcomes across several different discrete categories (think: countries, species, types). I would like to completely fix the mapping of discrete types to colors such that Type=A is always displayed in red, Type=B is always displayed in blue, and so on across all plots irrespective of what other factors are present. I know about scale_fill_manual() where I can provide color values manually and then work with drop = FALSE which helps in dealing with unused factor levels. However, I find this extremely cumbersome since every plot will need some manual work to deal with sorting the factors in the right way, sorting color values to match factor sorting, dropping unused levels, etc.
What I am looking for is a way to map factor levels to specific colors once and globally (A=green, B=blue, C=red, ...) and then just go about plotting whatever I please, with ggplot picking the right colors.
Here is some code to illustrate the point.
# Full set with 4 categories
df1 <- data.frame(Value = c(40, 20, 10, 60),
Type = c("A", "B", "C", "D"))
ggplot(df1, aes(x = Type, y = Value, fill = Type)) + geom_bar(stat = "identity")
# Colors change completely because only 3 factor levels are present
df2 <- data.frame(Value = c(40, 20, 60),
Type = c("A", "B", "D"))
ggplot(df2, aes(x = Type, y = Value, fill = Type)) + geom_bar(stat = "identity")
# Colors change because factor is sorted differently
df3 <- data.frame(Value = c(40, 20, 10, 60),
Type = c("A", "B", "C", "D"))
df3$Type <- factor(df3$Type, levels = c("D", "C", "B", "A"), ordered = TRUE)
ggplot(df3, aes(x = Type, y = Value, fill = Type)) + geom_bar(stat = "identity")
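You could define your own custom scale, if you like. If you look at the source for scale_fill_manual (the output below is from the ggplot2 version in use when this answer was written; newer releases may print additional arguments),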
scale_fill_manual
#> function (..., values)
#> {
#> manual_scale("fill", values, ...)
#> }
#> <environment: namespace:ggplot2>
it's actually quite simple. Note that manual_scale is an internal (unexported) ggplot2 function, hence the ::: below:
library(ggplot2)
scale_fill_chris <- function(...){
ggplot2:::manual_scale(
'fill',
values = setNames(c('green', 'blue', 'red', 'orange'), LETTERS[1:4]),
...
)
}
df1 <- data.frame(Value = c(40, 20, 10, 60),
Type = c("A", "B", "C", "D"))
ggplot(df1, aes(x = Type, y = Value, fill = Type)) +
geom_col() +
scale_fill_chris()
df2 <- data.frame(Value = c(40, 20, 60),
Type = c("A", "B", "D"))
ggplot(df2, aes(x = Type, y = Value, fill = Type)) +
geom_col() +
scale_fill_chris()
df3 <- data.frame(Value = c(40, 20, 10, 60),
Type = c("A", "B", "C", "D"))
df3$Type <- factor(df3$Type, levels = c("D", "C", "B", "A"), ordered = TRUE)
ggplot(df3, aes(x = Type, y = Value, fill = Type)) +
geom_col() +
scale_fill_chris()
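The same idea also works without reaching into ggplot2's internals. A minimal sketch, assuming the hypothetical names type_colours and scale_fill_type: because the values vector is named, scale_fill_manual() matches colours to levels by name, so neither a missing level nor a different level order should change the mapping.

```r
library(ggplot2)

# Hypothetical project-wide mapping from level to colour.
type_colours <- c(A = "green", B = "blue", C = "red", D = "orange")

# Thin wrapper around the exported scale_fill_manual().
scale_fill_type <- function(...) {
  scale_fill_manual(values = type_colours, ...)
}

# Only three of the four levels are present.
df2 <- data.frame(Value = c(40, 20, 60), Type = c("A", "B", "D"))
p2 <- ggplot(df2, aes(x = Type, y = Value, fill = Type)) +
  geom_col() +
  scale_fill_type()

# All levels are present, but in reversed order.
df3 <- data.frame(Value = c(40, 20, 10, 60), Type = c("A", "B", "C", "D"))
df3$Type <- factor(df3$Type, levels = c("D", "C", "B", "A"))
p3 <- ggplot(df3, aes(x = Type, y = Value, fill = Type)) +
  geom_col() +
  scale_fill_type()
```

Defining the wrapper once (for example in a script that every plotting script sources) keeps the mapping in a single place.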
You could make a custom plot function (including scale_fill_manual and reasonable default colours) in order to avoid repeating code:
library(ggplot2)
custom_plot <- function(.data,
colours = c("A" = "green", "B" = "blue", "C" = "red", "D" = "grey")) {
ggplot(.data, aes(x=Type, y=Value, fill= Type)) + geom_bar(stat="identity") +
scale_fill_manual(values = colours)
}
df1 <- data.frame(Value=c(40, 20, 10, 60), Type=c("A", "B", "C", "D"))
df2 <- data.frame(Value=c(40, 20, 60), Type=c("A", "B", "D"))
df3 <- data.frame(Value=c(40, 20, 10, 60), Type=c("A", "B", "C", "D"))
df3$Type <- factor(df3$Type, levels=c("D", "C", "B", "A"), ordered=TRUE)
custom_plot(df1)
custom_plot(df2)
custom_plot(df3)
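Another option is to make drop = FALSE the default by redefining the default discrete colour scales. Note that scale_colour_manual() and scale_fill_manual() also need a values argument, so the wrappers below use the default hue scales instead:
scale_colour_discrete <- function(...)
scale_colour_hue(..., drop = FALSE)
scale_fill_discrete <- function(...)
scale_fill_hue(..., drop = FALSE)
That way colours stay consistent across plots, provided the variable is a factor that carries the full set of levels in every data set.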
Make sure you convert that column into a factor first, and then create a named vector that stores the color for each factor level:
df$color <- factor(df$color, levels = c(1, 0))
cbPalette <- c("1" = "green", "0" = "red")
ggplot(data = df, aes(x = x, y = y, fill = color)) +
geom_bar(stat = "identity") +
scale_fill_manual(values = cbPalette)

ggplot: define color for point overlaps

With ggplot2 I want to plot two vectors (vec1_num, vec2_num) in two dimensions and colour the points by a group variable (vec3_char). Some data points are overlapping.
library(ggplot2)
vec1_num = c(1,2,3,4,1,3,4,5,5,5)
vec2_num = c(1,2,3,4,1,3,4,5,5,5)
vec3_char = c("A", "B", "C", "A", "B", "C", "C", "A", "B", "C")
# plot 1
ggplot(data = NULL) +
geom_point(aes(x=vec1_num, y=vec2_num, colour=vec3_char), alpha=0.4, size=4) +
scale_colour_manual(values=c("A"="darkblue", "B"="darkred", "C"="orange")) +
theme(panel.grid = element_blank())
I know I can attenuate the overlap by reducing alpha or working with geom_jitter adding a bit of noise. Like this:
# plot 2
ggplot(data = NULL) +
geom_jitter(aes(x=vec1_num, y=vec2_num, colour=vec3_char), alpha=0.4, size=4, width = 0.1) +
scale_colour_manual(values=c("A"="darkblue", "B"="darkred", "C"="orange")) +
theme(panel.grid = element_blank())
However, is it possible to make use of plot 1 but colour the overlapping points differently? So that, for example, "A" = "darkblue", "AB" = "black", "ABC" = "grey", "B" = "darkred", "BC" = "pink", "C" = "orange"? And can I additionally add a small Venn diagram (legend) that visualises the colour choice for the point overlap?
Thanks!
My way of doing this would be to convert the letters into numbers, sum them, and convert the sum back into letters.
NB: The one complication is that the letters need to be A, B, D, H, ... (positions 1, 2, 4, 8, ... in the alphabet), so that each sum can be produced by only one combination of letters. There is probably also a way to start with A, B, C, ... and encode them to unique values.
library(tidyverse)
vec1_num = c(1,2,3,4,1,3,4,5,5,5)
vec2_num = c(1,2,3,4,1,3,4,5,5,5)
vec3_char = c("A", "B", "D", "A", "B", "D", "D", "A", "B", "D")
removeDup <- function(str) paste(rle(strsplit(str, "")[[1]])$values, collapse="") # Function to remove duplicated values in a string
data <- data.frame(x = vec1_num, y = vec2_num, col = match(vec3_char, LETTERS))
data <- data %>%
group_by(x) %>%
mutate(colour = glue::glue_collapse(col, sep = "")) %>%
select(-col) %>%
distinct(x, y, .keep_all = TRUE) %>%
mutate(colour = removeDup(colour)) %>%
mutate(colour = sapply(str_extract_all(colour, '\\d'), function(x) sum(as.integer(x)))) %>%
mutate(colour = case_when(
colour == 1 ~ "A",
colour == 2 ~ "B",
colour == 3 ~ "AB",
colour == 4 ~ "D",
colour == 5 ~ "AD",
colour == 6 ~ "BD",
colour == 7 ~ "ABD"
))
# plot 1
ggplot(data) +
geom_point(aes(x=x, y=y, colour = as_factor(colour)), alpha=0.4, size=4) +
geom_text(aes(x = x, y = y, label = colour), vjust = 2) +
scale_colour_manual(values=c("A"="darkblue", "B"="darkred", "AB"="orange", "D" = "green", "AD" = "black", "BD" = "orange", "ABD" = "purple"), name = "Colour") +
theme(panel.grid = element_blank())
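One way to keep the original letters A, B, C, ... is to map each letter to a power of two before summing, so that every combination still has a unique sum. This is only a sketch in base R; encode_groups and decode_groups are hypothetical helper names, and the data are the vectors from the question.

```r
vec1_num  <- c(1, 2, 3, 4, 1, 3, 4, 5, 5, 5)
vec2_num  <- c(1, 2, 3, 4, 1, 3, 4, 5, 5, 5)
vec3_char <- c("A", "B", "C", "A", "B", "C", "C", "A", "B", "C")

# Encode a set of letters as a sum of powers of two: A -> 1, B -> 2, C -> 4, ...
encode_groups <- function(chars) {
  idx <- match(unique(chars), LETTERS)
  sum(2^(idx - 1))
}

# Decode the sum back into letters by testing each binary digit.
decode_groups <- function(code) {
  bits <- (code %/% 2^(0:25)) %% 2 == 1
  paste(LETTERS[bits], collapse = "")
}

df <- data.frame(x = vec1_num, y = vec2_num, grp = vec3_char)

# One row per (x, y) position with the encoded combination of groups.
codes <- aggregate(grp ~ x + y, data = df, FUN = encode_groups)
codes <- codes[order(codes$x, codes$y), ]
codes$overlap <- vapply(codes$grp, decode_groups, character(1), USE.NAMES = FALSE)
```

The overlap column can then be mapped to colour in the same way as the colour column in the answer above.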
I would first create a data frame. Then, for every x-y combination (list(df$vec1_num, df$vec2_num)), I would extract which characters are present (...unique(xy_i$vec3_char)...), like this:
df <- data.frame(vec1_num, vec2_num, vec3_char)
df_new <- do.call("rbind.data.frame", by(df, list(df$vec1_num, df$vec2_num), function(xy_i){
chars_i <- paste0(sort(unique(xy_i$vec3_char)),collapse= "")
xy_i$chars_comb <- factor(chars_i, levels= c("A", "AB", "AC", "ABC", "B", "BC", "C"))
xy_i
}))
If you now make the plot it shows you what characters overlap at which point.
ggplot(data = df_new) +
geom_point(aes(x=vec1_num, y=vec2_num, colour=chars_comb), alpha=0.4, size=4) +
scale_colour_manual(values=c("AB" = "black", "ABC" = "grey", "B" = "darkred", "C"="orange", "AC"= "red")) +
theme(panel.grid = element_blank())

How do I keep all the legends in ggplot

How do I keep all the legend entries in a plot?
Below is an example of a scatter plot. It plots a point only if x > 0.5, but I want the legend to show all classes.
library(tidyverse)
for (iter in 1:5)
{
# generate 5 random points
tbl <- tibble(x = runif(5),
y = runif(5),
class = c("A", "B", "C", "D", "E"))
# print if x > 0.5
p <- ggplot(data = tbl %>% filter(x > 0.5),
aes(x = x,
y = y,
color = class)) +
geom_point(size = 5) +
scale_fill_manual(labels = c("A", "B", "C", "D", "E"),
values = c("Grey", "Red", "Green", "Blue", "Yellow"),
drop = FALSE) +
theme_bw() +
theme(aspect.ratio = 1) +
xlim(0, 1) +
ylim(0, 1)
ggsave(p,
filename = paste0(iter, ".png"))
}
You can do it if you:
- set the class variable to a factor, and
- use scale_colour_manual instead of scale_fill_manual. If you want to use the default colours from the ggplot palette, you can use scale_colour_discrete, as in my code.
library(tidyverse)
set.seed(1) # for reproducibility
plots <- lapply(1:5, function(iter){
# generate 5 random points
tbl <- tibble(x = runif(5),
y = runif(5),
class = factor(c("A", "B", "C", "D", "E")))
# print if x > 0.5
p <- ggplot(data = tbl %>% filter(x > 0.5),
aes(x = x,
y = y,
color = class)) +
geom_point(size = 5) +
scale_colour_discrete(drop = FALSE) +
theme_bw() +
theme(aspect.ratio = 1) +
xlim(0, 1) +
ylim(0, 1)
ggsave(p, filename = paste0(iter, ".png"))
p
})
# visualize them all together
cowplot::plot_grid(plotlist = plots)
PS: I've used lapply instead of a for loop; R users usually prefer it where possible.
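If the fixed colours from the question are wanted rather than the default palette, the same two points apply to scale_colour_manual(). A minimal sketch with hard-coded coordinates instead of runif() so that the result is reproducible; class_levels and class_colours are hypothetical names. In recent ggplot2 releases the keys of unused levels may be drawn empty unless show.legend = TRUE is set on the layer, so it is set here as a precaution.

```r
library(ggplot2)

class_levels  <- c("A", "B", "C", "D", "E")
class_colours <- c(A = "grey", B = "red", C = "green", D = "blue", E = "yellow")

tbl <- data.frame(
  x = c(0.1, 0.9, 0.3, 0.7, 0.2),
  y = c(0.5, 0.4, 0.6, 0.8, 0.1),
  class = factor(class_levels, levels = class_levels)
)

# Filtering rows keeps all five factor levels; drop = FALSE keeps them in the legend.
p <- ggplot(tbl[tbl$x > 0.5, ], aes(x = x, y = y, colour = class)) +
  geom_point(size = 5, show.legend = TRUE) +
  scale_colour_manual(values = class_colours, drop = FALSE) +
  xlim(0, 1) +
  ylim(0, 1)
```

The named values vector ties each colour to its class, so the points that remain after filtering keep the colour they would have in the full data.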
Instead of filtering out the rows, you can set their y values to NA, so that every class remains in the data and therefore in the legend:
library(tidyverse)
for (iter in 1:5)
{
# generate 5 random points
tbl <- tibble(x = runif(5),
y = runif(5),
class = c("A", "B", "C", "D", "E"))
# keep y only if x > 0.5, otherwise set it to NA
p <- ggplot(data = tbl %>% mutate(y = if_else(x > 0.5, y, NA_real_)),
aes(x = x,
y = y,
color = class)) +
geom_point(size = 5) +
scale_colour_manual(labels = c("A", "B", "C", "D", "E"),
values = c("Grey", "Red", "Green", "Blue", "Yellow"),
drop = FALSE) +
theme_bw() +
theme(aspect.ratio = 1) +
xlim(0, 1) +
ylim(0, 1)
ggsave(p,
filename = paste0(iter, ".png"))
}

Using multiple colors within a label

In the chart below, I'm looking to have each triangle in the label for subgroup A match the color of the subgroup they are referring to (green, blue, purple). Is this possible at all?
library(tibble)
library(dplyr)
library(ggplot2)
library(scales)
example_tibble <- tibble(Subgroup = c("A", "B", "C", "D"),
Result = c(0.288, 0.204, 0.206, 0.182),
A_vs_B = rep(1, 4),
A_vs_C = rep(1, 4),
A_vs_D = rep(1, 4))
ggplot(example_tibble, aes(x = Subgroup, y = Result, fill = Subgroup)) +
geom_bar(stat = "identity") + geom_text(aes(label =
paste0(percent(Result),
if_else(A_vs_B == 1 & Subgroup == "A", sprintf("\u25b2"), ""),
if_else(A_vs_C == 1 & Subgroup == "A", sprintf("\u25b2"), ""),
if_else(A_vs_D == 1 & Subgroup == "A", sprintf("\u25b2"), "")),
colour = Subgroup), hjust = -.25) +
coord_flip() + scale_y_continuous(limits = c(0,.5), labels = percent)
