P values missing in faceted ggviolin plot (R, ggplot2)

I have this code:
library(ggplot2)
library(ggpubr)
df <- ToothGrowth
df$dose <- as.factor(df$dose)
df$group <- c(rep(c("grp1", "grp2"), 5), rep(c("grp1", "grp2", "grp3"), 6),
rep(c("grp1", "grp2"), 6), rep(c("grp1", "grp2", "grp3"), 6), "grp2", "grp3")
plot <- ggviolin(df, x = "group", y = "len", fill = "group",
width = 0.8, alpha = 0.5, draw_quantiles = c(0.25, 0.5, 0.75), facet.by = 'dose') +
scale_fill_manual(values = c("#00AFBB", "#E7B800", "#FC4E07")) +
stat_compare_means(comparisons = list(c("grp1","grp2"),c("grp2","grp3")),
label = "p.format")
In the first facet, the p value comparing grp1 and grp2 is missing, although it can be calculated. I think this is because grp3 has no data, but how can I get it to show up? Importantly, my real data have many more facets, so I would like a solution that works across facets rather than making adjustments to specific facets, which I have found solutions for. Thank you.
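The suspicion about grp3 can be checked without plotting. The sketch below uses only base R: it tabulates the group counts per dose and then lists, for each facet, which of the requested comparisons have data on both sides. The names `counts`, `comparisons` and `feasible` are introduced here for illustration and are not part of ggpubr.

```r
# Rebuild the example data from the question
df <- ToothGrowth
df$dose <- as.factor(df$dose)
df$group <- c(rep(c("grp1", "grp2"), 5), rep(c("grp1", "grp2", "grp3"), 6),
              rep(c("grp1", "grp2"), 6), rep(c("grp1", "grp2", "grp3"), 6), "grp2", "grp3")

# Observations per dose (rows) and group (columns)
counts <- table(df$dose, df$group)
print(counts)

# Comparisons requested in stat_compare_means()
comparisons <- list(c("grp1", "grp2"), c("grp2", "grp3"))

# For each facet, keep only the comparisons whose two groups are both present
feasible <- lapply(split(df, df$dose), function(d) {
  Filter(function(cmp) all(cmp %in% d$group), comparisons)
})
print(lengths(feasible))
```

For dose 0.5 the grp3 count is zero, so only grp1 vs grp2 is feasible in that facet, which is exactly the comparison that goes missing.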

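I'm not sure this fully answers your question, but adding a second stat_compare_means() layer that contains only the grp1 vs grp2 comparison makes the missing p value show up: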
plot <- ggviolin(df,
x = "group", y = "len", fill = "group",
width = 0.8, alpha = 0.5, draw_quantiles = c(0.25, 0.5, 0.75), facet.by = "dose"
) +
scale_fill_manual(values = c("#00AFBB", "#E7B800", "#FC4E07")) +
stat_compare_means(
comparisons = list(c("grp1", "grp2"), c("grp2", "grp3")),
label = "p.format"
) +
stat_compare_means(
comparisons = list(c("grp1", "grp2")),
label = "p.format"
)

Related

add pvalue bars to facet plot with "fill" sub-group

I have been looking for a solution for a long time without finding one, so it's time to ask for some help...
I would like to add p-values to boxplots arranged with facet_wrap (ggplot2), similar to what the script in this post produces. The first part of the script is an example of what I want and works well for a single plot; the second part uses facets and does not work.
I would like to add p-values between all "dose" values within "OJ", the same within "VC", and also between, for example, "dose" = 1 of OJ and VC (as in the plot). This works well for a single plot, but not with facet_wrap. The error message is:
Error: Assigned data value must be compatible with existing data.
x Existing data has 6 rows.
x Assigned data has 60 rows.
ℹ Only vectors of size 1 are recycled.
Thanks for your help (if only...)
The script:
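################# DATAFRAME
library(ggplot2)
library(ggpubr) # stat_pvalue_manual()
library(dplyr) # %>%
data("ToothGrowth")
df <- ToothGrowth
vec <- c("A","B")
df$dose <- as.character(df$dose)
# alternate the two facet labels (in random order) over all rows
df$facet <- rep(sample(vec, 2), nrow(df)/2)
View(df)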
################### STAT
df_pval <- df %>%
rstatix::group_by(dose) %>%
rstatix::wilcox_test(len ~ supp) %>%
rstatix::add_xy_position()
df_pval2 <- df %>%
rstatix::group_by(supp) %>%
rstatix::wilcox_test(len ~ dose) %>%
rstatix::add_xy_position(x = "supp", dodge = 0.8)
################### PLOT
plotx <- ggplot(df, aes(x = supp, y = len)) +
geom_boxplot(aes(fill = dose)) +
stat_pvalue_manual(df_pval,
label = "{p}",
color = "dose",
fontface = "bold",
step.group.by = "dose",
step.increase = 0.1,
tip.length = 0,
bracket.colour = "black",
show.legend = FALSE) +
stat_pvalue_manual(df_pval2,
label = "{p}",
color = "black",
fontface = "bold",
step.group.by = "supp",
step.increase = 0.1,
tip.length = 0,
bracket.colour = "black",
show.legend = FALSE)
plot(plotx)
################### STAT FACET
df_pval3 <- df %>%
rstatix::group_by(dose, facet) %>%
rstatix::wilcox_test(len ~ supp) %>%
rstatix::add_xy_position()
df_pval4 <- df %>%
rstatix::group_by(supp, facet) %>%
rstatix::wilcox_test(len ~ dose) %>%
rstatix::add_xy_position(x = "supp", dodge = 0.8)
print(df_pval)
print(df_pval2)
###################### PLOT FACET
ploty <- ggplot(df, aes(x = supp, y = len)) +
geom_boxplot(aes(fill = dose)) +
facet_wrap(~df[,4]) + stat_pvalue_manual(df_pval3,
label = "{p}",
color = "dose",
fontface = "bold",
step.group.by = "dose",
step.increase = 0.1,
tip.length = 0,
bracket.colour = "black",
show.legend = FALSE) +
stat_pvalue_manual(df_pval4,
label = "{p}",
color = "black",
fontface = "bold",
step.group.by = "supp",
step.increase = 0.1,
tip.length = 0,
bracket.colour = "black",
show.legend = FALSE)
plot(ploty)
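A likely cause of this error is `facet_wrap(~df[,4])`: the expression `df[,4]` always returns all 60 values from the global `df`, while the first p-value table passed to stat_pvalue_manual() has only 6 rows, which matches the "6 rows" versus "60 rows" in the message. Faceting by a column name that exists in every layer's data avoids that mismatch. The sketch below illustrates the principle with ggplot2 only; the `labels` table and its contents are hypothetical stand-ins for a p-value table, and geom_text() stands in for stat_pvalue_manual().

```r
library(ggplot2)

df <- ToothGrowth
df$dose <- as.character(df$dose)
df$facet <- rep(c("A", "B"), length.out = nrow(df))

# Hypothetical annotation table: one row per facet, with its own `facet` column
labels <- data.frame(
  facet = c("A", "B"),
  supp  = "OJ",
  len   = 38,
  label = c("panel A", "panel B")
)

p <- ggplot(df, aes(x = supp, y = len)) +
  geom_boxplot(aes(fill = dose)) +
  facet_wrap(~facet) +                        # facet by column name, not df[, 4]
  geom_text(data = labels, aes(label = label))

# Each annotation row is assigned to its own panel
ld <- layer_data(p, 2)
print(ld[, c("PANEL", "label")])
```

In the script above, that would mean `facet_wrap(~facet)`, since `df_pval3` and `df_pval4` carry a `facet` column from `group_by()`. This suggestion has not been run against the original data here.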

got an error after trying to add pvalues manually to a boxplot in ggplot R

I am trying to add p-values, calculated elsewhere, to my boxplots. The boxplot works fine until I try to add the p-values with the function stat_pvalue_manual. My code is as follows:
df <- data.frame(
method = c(rep('JXD', 100), rep('ILL', 100),rep('NP', 100) ),
value = c((runif(100, min=400, max=800)), runif(100, min=500, max=850), runif(100, min=900, max=1500))
)
ggplot(df, aes(method, value, fill = method)) + # This is the plot function
geom_violin() +
geom_boxplot(width=0.2, fill="white", alpha = 0.1) +
labs(x = "Method", fill="Method")
After this, I try to add p-values from another program:
stat.test <- tibble::tribble(
~group1, ~group2, ~p.adj, ~p.signif,
"ILL", "JXD", 6.466374e-01, 'n.s',
"ILL", "NP", 5.301167e-50, '****'
)
ggplot(df, aes(method, value, fill = method)) + # This is the plot function
geom_violin() +
geom_boxplot(width=0.2, fill="white", alpha = 0.1) +
labs(x = "Method", fill="Method") +
stat_pvalue_manual(
stat.test,
y.position = 900, step.increase = 1,
label = "p.adj"
)
But got the following error:
Error in FUN(X[[i]], ...) : object 'method' not found
I tried the function ggboxplot instead, and it worked when I passed 'method' in quotation marks, which does not work with ggplot. However, with ggboxplot I cannot get the figure that I want.
g <- ggboxplot(df, x = "method", y = "value", width = 0.8)
g + stat_pvalue_manual(
stat.test,
y.position = 900, step.increase = 0.7,
label = "p.signif"
)
I do not understand what is wrong.
Thanks a lot!
The issue is that you specified fill = method as a global aesthetic. Every layer inherits global aesthetics, so stat_pvalue_manual also looks for a column named method in your stat.test data frame, where it does not exist.
To solve this issue, make fill = method a local aesthetic of geom_violin:
library(ggplot2)
library(ggpubr)
df <- data.frame(
method = c(rep("JXD", 100), rep("ILL", 100), rep("NP", 100)),
value = c(runif(100, min = 400, max = 800), runif(100, min = 500, max = 850), runif(100, min = 900, max = 1500))
)
stat.test <- tibble::tribble(
~group1, ~group2, ~p.adj, ~p.signif,
"ILL", "JXD", 6.466374e-01, "n.s",
"ILL", "NP", 5.301167e-50, "****"
)
ggplot(df, aes(method, value)) + # This is the plot function
geom_violin(aes(fill = method)) +
geom_boxplot(width = 0.2, fill = "white", alpha = 0.1) +
labs(x = "Method", fill = "Method") +
stat_pvalue_manual(
stat.test,
y.position = 900, step.increase = 1,
label = "p.adj"
)
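The inheritance behaviour described in this answer can be reproduced with ggplot2 alone. In the sketch below, geom_text() with its own data stands in for stat_pvalue_manual(); the `ann` table and the helper `fails()` are hypothetical and only serve to show that a global `fill = method` breaks a layer whose data has no `method` column, while a local one does not.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(
  method = rep(c("JXD", "ILL"), each = 20),
  value  = c(runif(20, 400, 800), runif(20, 500, 850))
)

# Hypothetical annotation table: it has no `method` column
ann <- data.frame(x = 1.5, y = 900, label = "p = 0.65")

# fill = method is global, so the text layer inherits it and cannot evaluate it
p_global <- ggplot(df, aes(method, value, fill = method)) +
  geom_violin() +
  geom_text(data = ann, aes(x = x, y = y, label = label))

# fill = method is local to geom_violin(), so the text layer never sees it
p_local <- ggplot(df, aes(method, value)) +
  geom_violin(aes(fill = method)) +
  geom_text(data = ann, aes(x = x, y = y, label = label))

# TRUE if building the plot raises an error
fails <- function(p) inherits(try(ggplot_build(p), silent = TRUE), "try-error")

print(c(global = fails(p_global), local = fails(p_local)))
```

Keeping only the position aesthetics global and mapping everything else inside the layer that needs it is a common way to avoid this class of error when layers use different data frames.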

Geom_bar_pattern not treating x-axis categories as different

Take the following data:
df <- data.frame(replicate(2,sample(0:1,30,rep=TRUE)))
df <- reshape(data=df, varying=list(1:2),
direction="long",
times = names(df),
timevar="Type",
v.names="Score")
plotted like this:
plot <- ggbarplot(df, x = "Type", y = "Score",
color = "black", fill = "Type", add = "mean_ci")
I want to add stripes only to X1:
plot +
geom_bar_pattern(stat = "summary", fun = "mean", position="dodge", color="black", width=1,pattern_angle = 45, pattern_density = 0.4,pattern_spacing = 0.025, pattern_key_scale_factor = 0.6) +
scale_pattern_manual(values = c(X1 = "stripe", X2 = "none"))
However, stripes are added to both x-axis categories (does scale_pattern_manual not work?).
Any help is much appreciated.
You could build your error bars with stat_summary instead of using ggpubr::ggbarplot, then you would get this:
library(ggplot2)
library(ggpattern)
df <- data.frame(replicate(2,sample(0:1,30,rep=TRUE)))
df <- reshape(data=df, varying=list(1:2),
direction="long",
times = names(df),
timevar="Type",
v.names="Score")
ggplot(df, aes(x = Type, y = Score, pattern = Type, fill = Type)) +
geom_bar_pattern(stat = "summary",
fun = "mean",
position="dodge",
color="black",
width=1, pattern_angle = 45,
pattern_density = 0.4, pattern_spacing = 0.025,
pattern_key_scale_factor = 0.6) +
scale_pattern_manual(values = c("stripe", "none")) +
# mean_cl_normal() requires the Hmisc package to be installed
stat_summary(fun.data = mean_cl_normal, geom = "errorbar", col = "black", width = .1)
Created on 2021-05-19 by the reprex package (v2.0.0)
As far as I know, scale_pattern_manual will not work in this setting.
To avoid stripes being added to both columns, add aes(pattern = Type) to geom_bar_pattern.
See the gallery of the ggpattern package.
plot <- ggbarplot(df, x = "Type", y = "Score",
color = "black", fill = "Type", add = "mean_ci")
plot +
geom_bar_pattern(
stat = "summary",
fun = "mean",
position="dodge",
color="white",
width=0.7,
pattern_angle = 45,
pattern_density = 1,
pattern_spacing = 0.025,
pattern_key_scale_factor = 0.8,
aes(pattern = Type))

Adding hatches or patterns to ggplot bars [duplicate]

This question already has an answer here: How can I add hatches, stripes or another pattern or texture to a barplot in ggplot?
Suppose I want to show gene expression results (logFC) from RNA-seq and qPCR analyses in a bar plot. My dataset looks like this:
set.seed(42)
f1 <- expand.grid(
comp = LETTERS[1:3],
exp = c("qPCR", "RNA-seq"),
geneID = paste("Gene", 1:4)
)
f1$logfc <- rnorm(nrow(f1))
f1$SE <- runif(nrow(f1), min=0, max=1.5)
My R code:
p <- ggplot(f1, aes(x = geneID, y = logfc, fill = comp, color = exp)) +
geom_bar(stat = "identity", position = position_dodge2(preserve = "single")) +
theme(axis.text.x = element_text(angle = 45, vjust = 0.5, hjust = 1))
I have this output:
I want to add hatches or patterns to the bars corresponding to one of the variables (exp or comp), and add upper error bars, as shown in the plot below:
Any help, please?
Following the linked answer, it seems quite natural how to extend it to your case. In the example below, I'm using some dummy data structured like the head() data you gave, since the csv link gave me a 404.
library(ggplot2)
library(ggpattern)
#>
#> Attaching package: 'ggpattern'
#> The following objects are masked from 'package:ggplot2':
#>
#> flip_data, flipped_names, gg_dep, has_flipped_aes, remove_missing,
#> should_stop, waiver
# Setting up some dummy data
set.seed(42)
f1 <- expand.grid(
comp = LETTERS[1:3],
exp = c("qPCR", "RNA-seq"),
geneID = paste("Gene", 1:4)
)
f1$logfc <- rnorm(nrow(f1))
ggplot(f1, aes(x = geneID, y = logfc, fill = comp)) +
geom_col_pattern(
aes(pattern = exp),
colour = "black",
pattern_fill = "black",
pattern_angle = 45,
pattern_density = 0.1,
pattern_spacing = 0.01,
position = position_dodge2(preserve = 'single')
) +
scale_pattern_manual(
values = c("none", "stripe"),
guide = guide_legend(override.aes = list(fill = "grey70")) # <- make lighter
) +
scale_fill_discrete(
guide = guide_legend(override.aes = list(pattern = "none")) # <- hide pattern
)
Created on 2021-04-19 by the reprex package (v1.0.0)
EDIT: if you want to repeat the hatching in the fill legend, you can make an interaction() and then customise a manual fill scale.
ggplot(f1, aes(x = geneID, y = logfc)) +
geom_col_pattern(
aes(pattern = exp,
fill = interaction(exp, comp)), # <- make this an interaction
colour = "black",
pattern_fill = "black",
pattern_angle = 45,
pattern_density = 0.1,
pattern_spacing = 0.01,
position = position_dodge2(preserve = 'single')
) +
scale_pattern_manual(
values = c("none", "stripe"),
guide = guide_legend(override.aes = list(fill = "grey70")) # <- make lighter
) +
scale_fill_manual(
# Have 3 colours and repeat each twice
values = rep(scales::hue_pal()(3), each = 2),
# Extract the second name after the '.' from the `interaction()` call
labels = function(x) {
vapply(strsplit(x, "\\."), `[`, character(1), 2)
},
# Repeat the pattern over the guide
guide = guide_legend(
override.aes = list(pattern = rep(c("none", "stripe"), 3))
)
)
Created on 2021-04-19 by the reprex package (v1.0.0)
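The `labels` function in the fill scale relies on how `interaction()` names its levels: the two factor levels are joined with a dot, and the first factor varies fastest. A small base R sketch of that step, using the same factor layout as `f1` (the names `lv` and `short` are introduced here for illustration):

```r
f1 <- expand.grid(
  comp = LETTERS[1:3],
  exp  = c("qPCR", "RNA-seq")
)

# Levels of the interaction: exp varies fastest within each comp
lv <- levels(interaction(f1$exp, f1$comp))
print(lv)

# Split each level at the dot and keep the second part (the comp value)
short <- vapply(strsplit(lv, "\\."), `[`, character(1), 2)
print(short)
```

Because each `comp` value appears twice in a row, repeating each of the three colours twice and alternating the two patterns lines up with the legend entries.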
EDIT2: Now with errorbars:
library(ggplot2)
library(ggpattern)
set.seed(42)
f1 <- expand.grid(
comp = LETTERS[1:3],
exp = c("qPCR", "RNA-seq"),
geneID = paste("Gene", 1:4)
)
f1$logfc <- rnorm(nrow(f1))
f1$SE <- runif(nrow(f1), min=0, max=1.5)
ggplot(f1, aes(x = geneID, y = logfc)) +
geom_col_pattern(
aes(pattern = exp,
fill = interaction(exp, comp)), # <- make this an interaction
colour = "black",
pattern_fill = "black",
pattern_angle = 45,
pattern_density = 0.1,
pattern_spacing = 0.01,
position = position_dodge2(preserve = 'single')
) +
geom_errorbar(
aes(
ymin = logfc,
ymax = logfc + sign(logfc) * SE,
group = interaction(geneID, comp, exp)
),
position = "dodge"
) +
scale_pattern_manual(
values = c("none", "stripe"),
guide = guide_legend(override.aes = list(fill = "grey70")) # <- make lighter
) +
scale_fill_manual(
# Have 3 colours and repeat each twice
values = rep(scales::hue_pal()(3), each = 2),
# Extract the second name after the '.' from the `interaction()` call
labels = function(x) {
vapply(strsplit(x, "\\."), `[`, character(1), 2)
},
# Repeat the pattern over the guide
guide = guide_legend(
override.aes = list(pattern = rep(c("none", "stripe"), 3))
)
)
Created on 2021-04-22 by the reprex package (v1.0.0)

How to plot a density function of two variables?

I have two variables of the same length: v1 = actual alpha and v2 = simulated alpha.
v1 <- c(0.1, 0.6, 0.8, 0.11)
v2 <- c(0.3, 0.1, 0.5, 0.7)
I want to show a density function comparing the two, roughly replicating this picture:
To make the plotting easier, I would create a data frame like this:
v1 <- c(0.1, 0.6, 0.8, 0.11)
v2 <- c(0.3, 0.1, 0.5, 0.7)
df <- data.frame(x = c(v1, v2), group = rep(c("Actual", "Simulated"), each = 4))
Now you can plot the densities easily using ggplot:
library(ggplot2)
ggplot(df) +
stat_density(aes(x, linetype = group), geom = "line", position = "identity") +
scale_linetype_manual(values = c(1, 2)) +
theme_bw() +
theme(legend.position = c(0.9, 0.85))
Of course, this doesn't look much like the density plot you provided. That's just because v1 and v2 contain too few values to show a central tendency. Here's exactly the same plot with some toy data that better matches the data used in your plot:
set.seed(69)
v1 <- rnorm(100, -0.1, 0.12)
v2 <- rnorm(100, 0, 0.06)
df <- data.frame(x = c(v1, v2), group = rep(c("Actual", "Simulated"), each = 100))
ggplot(df) +
stat_density(aes(x, linetype = group), geom = "line", position = "identity") +
scale_linetype_manual(values = c(1, 2)) +
theme_bw() +
theme(legend.position = c(0.9, 0.85)) +
scale_x_continuous(limits = c(-.6, .4))
Created on 2020-05-21 by the reprex package (v0.3.0)
Here's a base R solution (based on @Allan's second dataframe):
hist(df$x[df$group=="Simulated"],
freq = F,
xlab = "Alpha in %",
border = "white",
main = "Density function for Actual and Simulated data", cex.main = 0.9,
xlim = range(df$x[df$group=="Actual"]))
lines(density(df$x[df$group=="Simulated"]), lty = 2)
lines(density(df$x[df$group=="Actual"]), lty = 1)
legend("topleft", legend = c("Actual", "Simulated"), bty = "n", lty = c(1,2))
grid()
Alternatively, with a bit more color:
hist(df$x[df$group=="Simulated"],
freq = F,
xlab = "Alpha in %",
border = "white",
main = "Density function for Actual and Simulated Alpha", cex.main = 0.9,
xlim = range(df$x[df$group=="Actual"]))
bg <- par("usr")
rect(bg[1], bg[3], bg[2], bg[4], col="grey50", border = NA, density = 70)
grid()
lines(density(df$x[df$group=="Simulated"]), lty = 2, col = "blue")
lines(density(df$x[df$group=="Actual"]), lty = 1, col = "blue")
legend("topleft", legend = c("Actual", "Simulated"), bty = "n", lty = c(1,2), col = "blue")
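In the base R versions above, the x-axis range is taken from the Actual values only and the y-axis range from the histogram of the Simulated values, so with other data one of the curves may be clipped. A minimal sketch that derives both ranges from the two density estimates instead, without the histogram (the names `d1`, `d2`, `xlim` and `ylim` are chosen here for illustration):

```r
set.seed(69)
v1 <- rnorm(100, -0.1, 0.12)  # "Actual" (toy data, as in the answer above)
v2 <- rnorm(100, 0, 0.06)     # "Simulated"

d1 <- density(v1)
d2 <- density(v2)

# Axis limits that cover both curves
xlim <- range(d1$x, d2$x)
ylim <- c(0, max(d1$y, d2$y))

plot(d1, xlim = xlim, ylim = ylim, lty = 1,
     main = "Density function for Actual and Simulated Alpha", xlab = "Alpha")
lines(d2, lty = 2)
legend("topleft", legend = c("Actual", "Simulated"), bty = "n", lty = c(1, 2))
```

Computing the limits from `density()` output rather than from the raw values also accounts for the tails that the kernel estimate adds beyond the observed range.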
