How to display p-values above boxplots on exponential (log10) y-axis? - r

I have a data frame with three groups (group1, group2, group3). I would like to show the p-values of their mean comparisons in ggplot2, which I can do; however, the values are stacked on top of one another, making it difficult to see what is being compared. When I try to adjust where the p-values are placed using the y_position argument, the boxplots collapse (I think because the y-axis is log10), although the p-values are no longer stacked on top of one another. How can I keep the boxplots from collapsing and still display the p-values so that you can see what is being compared?
Example data
library(ggplot2)
library(dplyr)
library(ggsignif)
df <- data.frame(matrix(ncol = 2, nrow = 30))
colnames(df)[1:2] <- c("group", "value")
df$group <- rep(c("group1","group2","group3"), each = 10)
df[1:10,2] <- rexp(10, 1/10)
df[11:20,2] <- rexp(10, 1/100)
df[21:30,2] <- rexp(10, 1/900)
# Need to say what should be compared for p-value determination
my_comparisons <- list(c("group1", "group2"),
                       c("group1", "group3"),
                       c("group2", "group3"))
Boxplots showing the distribution of value for each group; however, the p-values are on top of one another, so you cannot tell which groups are being compared.
df %>%
  mutate(group = factor(group, levels = c("group3","group2","group1"))) %>%
  ggplot(aes(x = group, y = value)) +
  geom_signif(comparisons = my_comparisons,
              map_signif_level = function(x) paste("p =", scales::pvalue(x))) +
  scale_y_log10() +
  geom_boxplot(outlier.colour="white", outlier.fill = "white", outlier.shape = 1, outlier.size = 0) +
  geom_jitter(shape=1, position=position_jitter(0.2), color = "black", fill = "white", size = 2) +
  labs(x = "",
       y = "value") +
  theme_bw() +
  theme(axis.text.x = element_text(size = 16, color = "black"),
        axis.text.y = element_text(size = 16, color = "black"),
        axis.title = element_text(size = 16, color = "black"),
        axis.title.x = element_text(vjust = -0.5),
        panel.grid = element_blank(),
        panel.background = element_blank())
Here I adjust where the p-values are displayed with y_position, but this collapses the y-axis. I have tried several values for y_position.
df %>%
  mutate(group = factor(group, levels = c("group3","group2","group1"))) %>%
  ggplot(aes(x = group, y = value)) +
  geom_signif(y_position = c(2000,1800,1600),
              comparisons = my_comparisons,
              map_signif_level = function(x) paste("p =", scales::pvalue(x))) +
  scale_y_log10() +
  geom_boxplot(outlier.colour="white", outlier.fill = "white", outlier.shape = 1, outlier.size = 0) +
  geom_jitter(shape=1, position=position_jitter(0.2), color = "black", fill = "white", size = 2) +
  labs(x = "",
       y = "value") +
  theme_bw() +
  theme(axis.text.x = element_text(size = 16, color = "black"),
        axis.text.y = element_text(size = 16, color = "black"),
        axis.title = element_text(size = 16, color = "black"),
        axis.title.x = element_text(vjust = -0.5),
        panel.grid = element_blank(),
        panel.background = element_blank())

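This parameter is not back-transformed: scale_y_log10() log-transforms the data before geom_signif() computes its brackets, so y_position is read on the log10 scale and y_position = 2000 effectively means 10^2000, which squashes the boxplots. You therefore need to use the log10 values of the desired positions: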
df %>%
  mutate(group = factor(group, levels = c("group3","group2","group1"))) %>%
  ggplot(aes(x = group, y = value)) +
  geom_signif(comparisons = my_comparisons,
              y_position = log10(c(5000, 10000, 25000)),
              map_signif_level = function(x) paste("p =", scales::pvalue(x))) +
  scale_y_log10() +
  geom_boxplot(outlier.colour="white", outlier.fill = "white",
               outlier.shape = 1, outlier.size = 0) +
  geom_jitter(shape=1, position=position_jitter(0.2), color = "black",
              fill = "white", size = 2) +
  labs(x = "",
       y = "value") +
  theme_bw() +
  theme(axis.text.x = element_text(size = 16, color = "black"),
        axis.text.y = element_text(size = 16, color = "black"),
        axis.title = element_text(size = 16, color = "black"),
        axis.title.x = element_text(vjust = -0.5),
        panel.grid = element_blank(),
        panel.background = element_blank())
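If you'd rather not hard-code the positions, one option is to derive them from the data in log10 units. This is a sketch only: the 0.25-decade spacing between brackets is an arbitrary choice you may need to tune, the data are regenerated with set.seed() so the values will differ from yours, and the theme is trimmed down for brevity.

```r
library(ggplot2)
library(ggsignif)

# Example data shaped like the question's (random draws, so values differ)
set.seed(1)
df <- data.frame(
  group = rep(c("group1", "group2", "group3"), each = 10),
  value = c(rexp(10, 1/10), rexp(10, 1/100), rexp(10, 1/900))
)
my_comparisons <- list(c("group1", "group2"),
                       c("group1", "group3"),
                       c("group2", "group3"))

# y_position is read on the log10 scale, so work in log10 units:
# first bracket 0.25 decades above the largest value,
# then 0.25 decades higher for each further comparison
top <- log10(max(df$value))
bracket_y <- top + 0.25 * seq_along(my_comparisons)

p <- ggplot(df, aes(x = group, y = value)) +
  geom_boxplot(outlier.shape = NA) +  # hide outliers; jitter shows all points
  geom_jitter(shape = 1, position = position_jitter(0.2)) +
  geom_signif(comparisons = my_comparisons,
              y_position = bracket_y,
              map_signif_level = function(x) paste("p =", scales::pvalue(x))) +
  scale_y_log10()
```

Printing p should show three separated brackets above the boxes; because the offsets are in decades, the spacing looks even on the log axis regardless of the data's magnitude.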

Related

Add "p = " in front of geom_boxplot p-value in ggplot2

P-values can be added to ggplot2 figures using the function ggpubr::stat_compare_means(). However, I cannot get the text "p = " to show up in front of the p-values. The help page for the function has examples of how to add "p = " in front of p-values, but they do not seem to work.
Example
library(ggplot2)
library(ggpubr)
library(dplyr)
# Cars93 ships with the MASS package
data("Cars93", package = "MASS")
# List of the comparisons I would like to make for which p-values will be derived
my_comparisons <- list(c("Front", "Rear"),
                       c("Front", "4WD"),
                       c("Rear", "4WD"))
# creates the figure with p-value but no label indicating the values are p-values
Cars93 %>%
  mutate(DriveTrain = factor(DriveTrain, levels = c("Front","Rear","4WD"))) %>%
  ggplot(aes(x = DriveTrain, y = Price)) +
  stat_compare_means(paired = FALSE,
                     comparisons = my_comparisons) +
  geom_boxplot(outlier.colour="white", outlier.fill = "white", outlier.shape = 1, outlier.size = 0) +
  geom_jitter(shape=1, position=position_jitter(0.2), color = "black", fill = "white", size = 2) +
  theme_bw() +
  theme(axis.text.x = element_text(size = 16, color = "black"),
        axis.text.y = element_text(size = 16, color = "black"),
        axis.title = element_text(size = 16, color = "black"),
        axis.title.x = element_text(vjust = -0.5),
        panel.grid = element_blank(),
        panel.background = element_blank())
Following the example at the bottom of the ?stat_compare_means help page, I tried aes(label = paste0("p = ", ..p.format..)), which does not work.
?stat_compare_means
Cars93 %>%
  mutate(DriveTrain = factor(DriveTrain, levels = c("Front","Rear","4WD"))) %>%
  ggplot(aes(x = DriveTrain, y = Price)) +
  stat_compare_means(paired = FALSE,
                     comparisons = my_comparisons,
                     aes(label = paste0("p = ", ..p.format..))) +
  geom_boxplot(outlier.colour="white", outlier.fill = "white", outlier.shape = 1, outlier.size = 0) +
  geom_jitter(shape=1, position=position_jitter(0.2), color = "black", fill = "white", size = 2) +
  theme_bw() +
  theme(axis.text.x = element_text(size = 16, color = "black"),
        axis.text.y = element_text(size = 16, color = "black"),
        axis.title = element_text(size = 16, color = "black"),
        axis.title.x = element_text(vjust = -0.5),
        panel.grid = element_blank(),
        panel.background = element_blank())
The label argument on the ?stat_compare_means help page lists "p.signif" and "p.format" as allowed values, which made me think ..p.format.. was deprecated, so I tried "p.format" instead; that did not work either.
Cars93 %>%
  mutate(DriveTrain = factor(DriveTrain, levels = c("Front","Rear","4WD"))) %>%
  ggplot(aes(x = DriveTrain, y = Price)) +
  stat_compare_means(paired = FALSE,
                     comparisons = my_comparisons,
                     aes(label = paste0("p = ", "p.format"))) +
  geom_boxplot(outlier.colour="white", outlier.fill = "white", outlier.shape = 1, outlier.size = 0) +
  geom_jitter(shape=1, position=position_jitter(0.2), color = "black", fill = "white", size = 2) +
  theme_bw() +
  theme(axis.text.x = element_text(size = 16, color = "black"),
        axis.text.y = element_text(size = 16, color = "black"),
        axis.title = element_text(size = 16, color = "black"),
        axis.title.x = element_text(vjust = -0.5),
        panel.grid = element_blank(),
        panel.background = element_blank())
In the end I would like the p-values to be preceded by p = such that the labels would say p = 0.00031, p = 0.059, and p = 0.027.
When you use a list of comparisons, stat_compare_means defaults to using geom_signif from the ggsignif package, essentially acting as a glorified wrapper function. In so doing, you lose some of the formatting flexibility. Better in this case to use geom_signif directly:
library(ggsignif)
Cars93 %>%
  mutate(DriveTrain = factor(DriveTrain, levels = c("Front","Rear","4WD"))) %>%
  ggplot(aes(x = DriveTrain, y = Price)) +
  geom_signif(y_position = c(55, 60, 65),
              comparisons = my_comparisons,
              map_signif_level = function(x) paste("p =", scales::pvalue(x))) +
  geom_boxplot(outlier.colour="white", outlier.fill = "white",
               outlier.shape = 1, outlier.size = 0) +
  geom_jitter(shape=1, position=position_jitter(0.2),
              color = "black", fill = "white", size = 2) +
  theme_bw() +
  theme(axis.text.x = element_text(size = 16, color = "black"),
        axis.text.y = element_text(size = 16, color = "black"),
        axis.title = element_text(size = 16, color = "black"),
        axis.title.x = element_text(vjust = -0.5),
        panel.grid = element_blank(),
        panel.background = element_blank())
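One caveat worth checking on your own output: scales::pvalue() rounds to accuracy = 0.001 by default, so a value like 0.00031 comes out as "p = <0.001" rather than the "p = 0.00031" asked for above. A minimal sketch of an alternative formatter, assuming two significant digits are what you want, uses base R's sprintf() with "%.2g" (fmt_p is a hypothetical name):

```r
# Hypothetical helper: format p-values with two significant digits
fmt_p <- function(x) sprintf("p = %.2g", x)

fmt_p(c(0.00031, 0.059, 0.027))
# returns c("p = 0.00031", "p = 0.059", "p = 0.027")

# It could then replace the anonymous function in the answer:
# geom_signif(..., map_signif_level = fmt_p)
```

Note that "%.2g" switches to scientific notation for very small values (for example 1e-06 becomes "p = 1e-06"), which may or may not suit your figure.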

Dynamic midpoint in ggplot2's scale_fill_gradient2

I'm making a heatmap in R using ggplot2 and I want to dynamically change the value of midpoint for scale_fill_gradient2. I want the midpoint for every row to be the maximum of v1 and v2.
Here's the original plot and data:
library(ggplot2)
set.seed(1L)
s = sprintf("d%s", 1:9)
vars = sprintf("v%s", 1:6)
data = data.frame(s = rep(s, 6), stringsAsFactors = FALSE)
data$variable = rep(vars, rep.int(9, 6))
data$variable = as.factor(data$variable)
data$value = round(runif(54, min=-100, max=100), 1)
pdf(save)
heatmap = ggplot(data = data, aes(x = variable, y = s, fill = value)) +
  geom_tile(color = "black", aes(width = 1)) +
  scale_fill_gradient2(low = cbbPalette$pink, high = cbbPalette$green, mid = cbbPalette$grey,
                       midpoint = 0, space = "Lab",
                       name = title) +
  scale_color_discrete("exps", data$variable) +
  theme_minimal() +
  theme(axis.text.x = element_text(vjust = 1,
                                   size = title.size), legend.title = element_blank(),
        axis.text.y = element_text(size = title.size),
        strip.text.x = element_text(size = title.size)) +
  coord_fixed()
#add numbers to cells
heatmap = heatmap + geom_text(aes(x = variable, y = s, label = value), color = cbbPalette$black, size = 3) +
  theme(
    axis.title.x = element_blank(),
    axis.title.y = element_blank(),
    panel.grid.major = element_blank(),
    panel.border = element_blank(),
    panel.background = element_blank(),
    axis.ticks = element_blank(),
    legend.justification = c(0.5, 0),
    legend.direction = "horizontal",
    legend.position = "top") +
  guides(fill = guide_colorbar(barwidth = 7, barheight = 1,
                               title.position = "top", title.hjust = 0.5))
# Print the heatmap
print(heatmap)
dev.off()
I tried to change midpoint to the maximum of v1 and v2, but that applies one value to all rows instead of to each row separately.
scale_fill_gradient2(low = cbbPalette$pink, high = cbbPalette$green, mid = cbbPalette$grey,
                     midpoint = data[data$variable == "v1", "value"], space = "Lab",
                     name = title)
Scales don't really work that way: a scale maps a range of values to a set of colours, so a particular colour means a particular value across the whole plot. My best advice would be to pre-normalise the data by subtracting, within each row, the maximum of v1 and v2. See the example in the code below (a few variables used in your example were not defined in the shared code, so I've substituted them).
library(ggplot2)
library(tidyverse)
set.seed(1L)
s = sprintf("d%s", 1:9)
vars = sprintf("v%s", 1:6)
data = data.frame(s = rep(s, 6), stringsAsFactors = FALSE)
data$variable = rep(vars, rep.int(9, 6))
data$variable = as.factor(data$variable)
data$value = round(runif(54, min=-100, max=100), 1)
new_data <- data %>% group_by(s) %>%
  mutate(value = value - max(value[variable %in% c("v1", "v2")]))
ggplot(data = new_data, aes(x = variable, y = s, fill = value)) +
  geom_tile(color = "black", aes(width = 1)) +
  scale_fill_gradient2(low = "pink", high = "green", mid = "grey",
                       midpoint = 0, space = "Lab",
                       name = "title") +
  scale_color_discrete("exps", data$variable) +
  theme_minimal() +
  coord_fixed()
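The same pre-normalisation can be sketched in base R, in case you'd rather not load the tidyverse. This assumes the toy data above; ave() computes the per-row maximum of v1 and v2, and the new column name centred is just an illustrative choice.

```r
# Same toy data as above
set.seed(1L)
s <- sprintf("d%s", 1:9)
vars <- sprintf("v%s", 1:6)
data <- data.frame(s = rep(s, 6), stringsAsFactors = FALSE)
data$variable <- factor(rep(vars, rep.int(9, 6)))
data$value <- round(runif(54, min = -100, max = 100), 1)

# Per-row reference value: max of v1 and v2 within each s.
# Other columns are replaced by -Inf so they never win the max.
in_ref <- data$variable %in% c("v1", "v2")
ref <- ave(ifelse(in_ref, data$value, -Inf), data$s, FUN = max)

# Centre each row on its own reference so a shared midpoint = 0 works
data$centred <- data$value - ref
```

Mapping fill = centred with midpoint = 0 then colours each row relative to its own reference, while geom_text() can still use label = value to print the original numbers.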

Center geom_text on ggplot while controlling bar width

I am trying to make a horizontal bar chart in ggplot2 where the bars are of equal width and with text labels centered on the bars. There are two groups on the y axis -- one with 2 bars, and one with three.
There are a lot of similar questions on SO that address both of these issues, but I haven't been able to fix one without breaking the other. Here's my data:
## data
library(ggplot2)
library(dplyr)
# the level order must be defined before it is used in mutate()
var1_order <- c("a", "c", "b")
df <- tibble(var1 = c("a", "b", "b", "c", "c"),
             var2 = c("x", "y", "x", "y", "x"),
             proportion = c(100, 33.3, 66.7, 66.7, 33.3)) %>%
  mutate(var1 = factor(var1, levels = var1_order))
Here's an example where the widths are good, but the labels of the y group are off:
## labels bad
df %>%
  ggplot(aes(x = proportion, y = var2, fill = var1,
             label = paste0(round(proportion, 1), "%"))) +
  geom_col(position = position_dodge2(preserve = "single", padding = 0), width = .9) +
  geom_text(size = 3, position = position_dodge2(width = 0.9), hjust = -.5,
            color = "black", aes(group = var1)) +
  scale_fill_manual(name = "", values = c("#093D6E","#5D8AA8", "#00918B",
                                          "#F8AF54", "#CD9575")) +
  labs(x = NULL) +
  theme(axis.ticks = element_blank(),
        axis.title.y = element_blank(),
        axis.line=element_blank(),
        axis.text.x = element_blank(),
        panel.background = element_blank(),
        strip.text = element_text(size = 7, face = "bold")) +
  scale_x_continuous(expand = c(.2, .2)) +
  guides(fill = guide_legend(reverse = TRUE))
And here's an example where the labels are good but the widths are now off:
## col widths bad
df %>%
  ggplot(aes(x = proportion, y = var2, fill = var1,
             label = paste0(round(proportion, 1), "%"))) +
  geom_col(position = position_dodge(width = 0.9)) +
  geom_text(size = 3, position = position_dodge(width = 0.9), hjust = -.5,
            color = "black", aes(group = var1)) +
  scale_fill_manual(name = "", values = c("#093D6E","#5D8AA8", "#00918B",
                                          "#F8AF54", "#CD9575")) +
  labs(x = NULL) +
  theme(axis.ticks = element_blank(),
        axis.title.y = element_blank(),
        axis.line=element_blank(),
        axis.text.x = element_blank(),
        panel.background = element_blank(),
        strip.text = element_text(size = 7, face = "bold")) +
  scale_x_continuous(expand = c(.2, .2)) +
  guides(fill = guide_legend(reverse = TRUE))
Note that this will be part of a parameterized report, so it needs to be capable of dealing with different numbers of var1 and var2 groups. Thanks!
Try this approach. You can use position_dodge2(preserve = 'single') on both the bars and the text to keep the bars uniform and the labels aligned with them. Here is the code:
library(ggplot2)
#Code
df %>%
  ggplot(aes(x = proportion, y = var2, fill = var1,
             label = paste0(round(proportion, 1), "%"))) +
  geom_col(position = position_dodge2(preserve = 'single', width = 0.9)) +
  geom_text(size = 3, position = position_dodge2(preserve = 'single', width = 0.9), hjust = -.5,
            color = "black", aes(group = var1)) +
  scale_fill_manual(name = "", values = c("#093D6E","#5D8AA8", "#00918B",
                                          "#F8AF54", "#CD9575")) +
  labs(x = NULL) +
  theme(axis.ticks = element_blank(),
        axis.title.y = element_blank(),
        axis.line=element_blank(),
        axis.text.x = element_blank(),
        panel.background = element_blank(),
        strip.text = element_text(size = 7, face = "bold")) +
  scale_x_continuous(expand = c(.2, .2)) +
  guides(fill = guide_legend(reverse = TRUE))
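Since this is going into a parameterized report, it may help to check the bar thickness programmatically rather than by eye. A sketch, assuming the df from the question: ggplot2's layer_data() returns the computed data for one layer, and for horizontal bars the thickness is ymax - ymin, which should be identical for every bar when preserve = "single" is used.

```r
library(ggplot2)
library(dplyr)

var1_order <- c("a", "c", "b")
df <- tibble(var1 = c("a", "b", "b", "c", "c"),
             var2 = c("x", "y", "x", "y", "x"),
             proportion = c(100, 33.3, 66.7, 66.7, 33.3)) %>%
  mutate(var1 = factor(var1, levels = var1_order))

p <- ggplot(df, aes(x = proportion, y = var2, fill = var1,
                    label = paste0(round(proportion, 1), "%"))) +
  geom_col(position = position_dodge2(preserve = "single", width = 0.9)) +
  geom_text(size = 3, hjust = -0.5, aes(group = var1),
            position = position_dodge2(preserve = "single", width = 0.9))

# Layer 1 is geom_col; for horizontal bars thickness is ymax - ymin
bars <- layer_data(p, 1)
thickness <- bars$ymax - bars$ymin
```

If a future data set produced unequal thicknesses, range(thickness) would show it immediately, which is easier to assert in a report pipeline than inspecting the rendered figure.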

Partial italics in facet headings of ggplot

I am wondering if there is any way to rename facet titles so that they contain partial italics and partial non-italics.
Here is some toy data
library(ggplot2)
library(Hmisc)
library(dplyr)
# Plot power vs. n for various odds ratios
n <- seq(10, 1000, by=10) # candidate sample sizes
OR <- as.numeric(sort(c(seq(1/0.90,1/0.13,length.out = 9),2.9))) # candidate ORs
alpha <- c(.001, .01, .05) # alpha significance levels
# put all of these into a dataset and calculate power
powerDF <- data.frame(expand.grid(OR, n, alpha)) %>%
  rename(OR = Var1, num = Var2, alph = Var3) %>%
  arrange(OR) %>%
  mutate(power = as.numeric(bpower(p1=.29, odds.ratio=OR, n=num, alpha = alph))) %>%
  transform(OR = factor(format(round(OR,2),nsmall=2)),
            alph = factor(ifelse(alph == 0.001, "p=0.001",
                                 ifelse(alph == 0.01, "p=0.01", "p=0.05"))))
pPower <- ggplot(powerDF, aes(x = num, y = power, colour = factor(OR))) +
  geom_line() +
  facet_grid(factor(alph)~.) +
  labs(x = "sample size") +
  scale_colour_discrete(name = "Odds Ratio") +
  scale_x_continuous(breaks = seq(0,1000,100)) +
  scale_y_continuous(breaks = seq(0,1,.1), sec.axis = sec_axis(trans=I, breaks=NULL, name="Significance Level")) + # this is the second axis label
  theme_light() +
  theme(axis.title.x = element_text(size = 12, face = "bold"),
        axis.title.y = element_text(size = 12, face = "bold"),
        axis.text = element_text(size = 11),
        panel.grid.minor = element_blank(),
        panel.grid.major.y = element_line(colour = "gray95"),
        panel.grid.major.x = element_line(colour = "gray95"),
        strip.text = element_text(colour = 'black', face = 'bold', size = 12),
        legend.text = element_text(size = 12),
        legend.title = element_text(size = 12, face = "bold"))
pPower
Is there any way to get the facet headings to read "p=0.001", "p=0.01", etc. with only the "p" in italics and the rest in upright text, instead of the whole heading in a single font face, i.e. to get partial italics and partial non-italics?

ggplot facet_wrap as_labeller does not display the new sequence

I created a vector of ordered names and tried to replace each panel title with the ordered one (e.g., Jessie with 1.Jessie, Marion with 2.Marion, etc.), but I am getting NA for each panel title instead. Any hints on what is going wrong?
[Screenshots omitted: the plot with NA panel titles, and the same plot with the labeller commented out.]
list.top.35.names.ordered <- data.frame(cbind(order = c(1:35), list.top.35.names)) %>%
  unite(name.new, c("order", "list.top.35.names"), sep = ".")
list.top.35.names.ordered <- list.top.35.names.ordered$name.new[1:35]
str(list.top.35.names.ordered)
#>  chr [1:35] "1.Jessie" "2.Marion" "3.Jackie" "4.Alva" "5.Trinidad" "6.Ollie" ...
data.babyname.all %>%
  ggplot(mapping = aes(x = year, y = perc, fill = sex)) +
  geom_density(stat = "identity", position = "stack", show.legend = F) +
  facet_wrap(~name, ncol = 7, nrow = 5, labeller = as_labeller(list.top.35.names.ordered)) +
  scale_fill_manual(values = c('#E1AEA1','#9ABACF')) +
  geom_point(data = most.unisex.year.and.value, mapping = aes(x = year, y = perc),
             size = 3,
             fill = "white",
             color = "black",
             shape = 21) +
  scale_y_continuous(breaks = c(0,.50,1), labels = c("0%", "50%", "100%")) +
  scale_x_continuous(breaks = c(1940, 1960, 1980, 2000), labels = c('1940', "'60", "'80", '2000')) +
  geom_text(mapping = aes(x = x, y = y, label = label), check_overlap = F, na.rm = T, position = position_dodge(width = .9), size = 3) +
  theme_minimal() + #set theme
  theme(
    text = element_text(size = 10),
    axis.title.x = element_blank(),
    axis.title.y = element_blank(),
    panel.grid = element_blank(),
    panel.border = element_blank(),
    plot.background = element_blank(),
    axis.ticks.x = element_line(color = "black"),
    axis.ticks.length = unit(.2, 'cm'),
    strip.text = element_text(size = 10, margin = margin(l = 10, b = .1)))
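The NA titles are consistent with how ggplot2's as_labeller() treats a character vector: it is used as a lookup table indexed by the facet values (here the original names), so an unnamed vector finds nothing and returns NA for every panel. A minimal sketch, using three hypothetical names in place of the real 35, of naming the vector by the original values:

```r
library(ggplot2)

# Hypothetical stand-ins for list.top.35.names and its numbered version
original_names <- c("Jessie", "Marion", "Jackie")
numbered_names <- paste0(seq_along(original_names), ".", original_names)

# Unnamed vector: the lookup x["Jessie"] finds nothing, giving NA titles
bad_labeller <- as_labeller(numbered_names)

# Named vector: names are the facet values, elements are the new titles
good_labeller <- as_labeller(setNames(numbered_names, original_names))

# A labeller receives a data frame with one column per facet variable
panels <- data.frame(name = c("Marion", "Jessie"))
bad_labeller(panels)[[1]]
good_labeller(panels)[[1]]

# In the plot: facet_wrap(~name, labeller = good_labeller)
```

The names must match the values of the facet variable exactly, so if the original names are stored as a factor, converting them with as.character() before setNames() is the safer choice.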
