How to pass external data to stat_compare_means in ggplot2 - r

I created a bar plot with ggplot2 from summarized data: the original long-format data were reduced to a mean and standard error per group, so the bars and error bars are drawn from those summary values rather than from the raw data. I would now like to add a statistical comparison to this plot. The comparison has to use the original data, because a test cannot be run on the means and standard errors alone.
This is my code. df is the original long-format data; grouping it and computing the mean and standard error gives summaryFGA.
The bar plot with error bars was generated from summaryFGA using the following code:
[Image: error bars without statistical comparison]
summaryFGA <- df %>% group_by(DMP,tumor) %>% dplyr::summarize(mean = mean(FGA, na.rm = TRUE), se = std(FGA, na.rm = TRUE)) %>% as.data.frame
p1 <- ggplot(summaryFGA, aes(x = DMP, y = mean,fill = DMP)) +
geom_bar(stat = 'identity') +
geom_errorbar(aes(ymax = mean+se, ymin = mean-se),position = position_dodge(0.9), width = 0.15) +
scale_fill_manual(values = jco[1:2]) + scale_color_manual(values = jco[1:2]) +
ylab("Fraction of Genome Altered") + xlab("") +
facet_wrap(.~tumor, nrow = 1,scales = "fixed") +
# stat_compare_means(data = df,
# aes(x = DMP, y = FGA, fill=DMP),
# comparisons = my_comparisons,
# method = "wilcox.test", inherit.aes = F) +
theme(axis.text.x = element_text(hjust = 1, vjust = 0.5,size = 10, angle = 90,color = "black"),
axis.text.y = element_text(size = 10,color = "black"),
axis.ticks = element_line(size=0.2, color="black"),
axis.ticks.length = unit(0.2, "cm"),
panel.background = element_blank(),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
legend.position = "bottom",
strip.text = element_text(colour = 'white', size = 10),
axis.line = element_line(colour = "black"))
p1
g <- ggplot_gtable(ggplot_build(p1))
stripr <- which(grepl('strip-t', g$layout$name))
k <- 1
for (i in stripr) {
j <- which(grepl('rect', g$grobs[[i]]$grobs[[1]]$childrenOrder))
g$grobs[[i]]$grobs[[1]]$children[[j]]$gp$fill <- tumorColor[k]
k <- k+1
}
What I want is the following:
[Image: error bars with statistical comparison]
I tried to add the statistical comparison with the code that is commented out above, but it doesn't work. Obviously, the comparison should be based on the original df data instead of summaryFGA.
If anyone wants an example, please refer to this thread and add a statistical comparison using "original data" in the example.
Many thanks in advance.
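One possible way around this, offered as a sketch rather than as an answer from the thread: build the whole plot from the raw data and let stat_summary compute the bars and error bars, so that stat_compare_means sees the same raw values. The data frame raw, its group labels, and the object name p_raw below are made up for illustration. mean_se is assumed to match what std() computes in the question (std is not a base R function). fill is mapped only in the bar layer, so the comparison layer groups by x alone.

```r
library(ggplot2)
library(ggpubr)  # provides stat_compare_means()

set.seed(42)
# Hypothetical stand-in for the raw long-format data `df` (one row per sample)
raw <- data.frame(
  DMP   = rep(c("DMP-high", "DMP-low"), each = 20, times = 2),
  tumor = rep(c("TumorA", "TumorB"), each = 40),
  FGA   = runif(80)
)
my_comparisons <- list(c("DMP-high", "DMP-low"))

p_raw <- ggplot(raw, aes(x = DMP, y = FGA)) +
  # Bars show the group mean, computed from the raw values
  stat_summary(aes(fill = DMP), fun = mean, geom = "bar") +
  # Error bars show mean +/- standard error
  stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.15) +
  # The test runs on the same raw values, separately in each facet
  stat_compare_means(comparisons = my_comparisons, method = "wilcox.test") +
  facet_wrap(. ~ tumor, nrow = 1, scales = "fixed") +
  ylab("Fraction of Genome Altered") + xlab("")
```

Printing p_raw draws the plot. The theme and strip-colouring code from the question could be added on top unchanged, since it does not depend on which data frame the layers use.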

Related

geom points are not placed on the boxplot?

I don't know how to align the dots so that each one sits on the box plot it belongs to.
Why are they appearing like that?
I found this post, but it addresses dodging, which is not part of my code.
Here is my code:
library(phyloseq)
library(ggplot2)
plot_richness(ps.prev.intesParts.f, x = "part", measures = "Shannon",
color = "Samples") +
geom_boxplot() +
theme_classic() +
theme(text = element_text(size = 20),
strip.background = element_blank(),
axis.text.x.bottom = element_text(angle = 90),
legend.title = element_blank()) +
labs(x = "Intestinal Parts", y = "Shannon Index")
Could you please advise?
You are correct that you need to specify dodging. However, dodging needs to be set in the underlying geom_point(...) of the plot_richness function itself. Unfortunately, phyloseq offers no such option. This means you'll need to calculate the alpha diversity measures yourself and generate your own plot. Luckily this only requires a few extra lines of code. Here's an example using phyloseq's GlobalPatterns.
require("phyloseq")
require("dplyr")
require("ggplot2")
# Load data
data("GlobalPatterns")
# Calculate alpha indices
a_div <- estimate_richness(GlobalPatterns, measures = "Shannon")
a_div$SampleID <- row.names(a_div)
# Add sample_data from physeq object
a_div <- left_join(a_div,
sample_data(GlobalPatterns),
by = c("SampleID" = "X.SampleID"))
# GlobalPatterns only has grouping by SampleType.
# Generate an extra group by duplicating all rows
a_div <- rbind(a_div, a_div)
a_div$Samples <- rep(x = c("MMV", "VMV"),
each = nrow(a_div)/2)
# Plot
ggplot(a_div,
aes(x = SampleType,
y = Shannon,
colour = Samples)) +
geom_boxplot(position = position_dodge(width = 0.9)) +
geom_point(position = position_dodge(width = 0.9)) +
theme_classic() +
theme(text = element_text(size = 20),
strip.background = element_blank(),
axis.text.x.bottom = element_text(angle = 90,
hjust = 1,
vjust = 0.5),
legend.title = element_blank())
Created on 2022-09-02 by the reprex package (v2.0.1)

Adding a manual right-hand-side y-axis in ggplot2

I am wondering if there is any way to get a manual right-side y-axis label when there is no scale, only facet headings.
Here's an example
library(dplyr)
library(ggplot2)
library(Hmisc)
# Plot power vs. n for various odds ratios (base prob.=.1)
(n <- seq(10, 1000, by=10)) # candidate sample sizes
(OR <- as.numeric(sort(c(seq(1/0.90,1/0.13,length.out = 9),2.9)))) # candidate odds ratios, spanning the 95% CI centered around an odds ratio of 2.9
alpha <- c(.001, .01, .05)
# put all of these into a dataset and calculate power
powerDF <- data.frame(expand.grid(OR, n, alpha)) %>%
rename(OR = Var1, num = Var2, alph = Var3) %>%
arrange(OR) %>%
mutate(power = as.numeric(bpower(p1=.29, odds.ratio=OR, n=num, alpha = alph))) %>%
transform(OR = factor(format(round(OR,2),nsmall=2)))
# now plot
pPower <- ggplot(powerDF, aes(x = num, y = power, colour = factor(OR))) +
geom_line() +
facet_grid(factor(alph)~.) +
labs(x = "sample size") +
scale_colour_discrete(name = "Odds Ratio") +
scale_x_continuous(breaks = seq(0,1000,100)) +
scale_y_continuous(breaks = seq(0,1,.1)) +
theme_light() +
theme(axis.title.x = element_text(size = 12, face = "bold"),
axis.title.y = element_text(size = 12, face = "bold"),
axis.text = element_text(size = 11),
panel.grid.minor = element_blank(),
panel.grid.major.y = element_line(colour = "gray95"),
panel.grid.major.x = element_line(colour = "gray95"),
strip.text = element_text(colour = 'black', face = 'bold', size = 12),
legend.text = element_text(size = 12),
legend.title = element_text(size = 12, face = "bold"))
(Please forgive the cluttered axes labels, I had to reduce the size of the image to allow it to be uploaded).
I was wondering if there was any way to have an axis label saying 'significance level' down the right hand side of the graph?
Adding the following to scale_y_continuous seems to be one way to go (although it produces a bunch of warnings):
sec.axis = sec_axis(trans=I, breaks=NULL, name="Significance")
Alternatively, you can add an additional strip that spans all the panels:
library(grid)
library(gtable)
g <- ggplotGrob(pPower)
rect <- grobTree(rectGrob(gp = gpar(fill = "grey70", col="grey70")),
textGrob("Significance", rot=-90, gp = gpar(col="black")))
g <- gtable_add_cols(g, g$widths[6], 6)
g <- gtable_add_grob(g, rect, l=7, t=7, b=11)
grid.newpage() ; grid.draw(g)

order y-axis of geom_tile plot by variable

I am using geom_tile to visualize random draws
Generate data:
library(tidyr)   # crossing()
library(dplyr)
library(ggplot2)
set.seed(1)
df= crossing(sim=1:10,part= 1:10)
df$result = sample(c(1,0),size = nrow(df), replace=T)
df = df %>%
group_by(sim)%>%
# find out how many successful (1) pilots there were in the first 4 participants
summarize(good_pilots = sum(result[1:4])) %>%
arrange(good_pilots) %>%
ungroup() %>%
# add this back into full dataframe
full_join(df)
# plot data
plot = ggplot(df, aes( y=factor(sim), x=part)) +
geom_tile(aes(fill = factor(result)), colour = "black",
show.legend = T)+
scale_fill_manual(values=c("lightgrey", "darkblue"))+# c(0,1)
theme(panel.border = element_rect(size = 2),
plot.title = element_text(size = rel(1.2)),
axis.text = element_blank(),
axis.title = element_blank(),
axis.ticks = element_blank(),
legend.title = element_blank(),
legend.position = "right")+ theme_classic()+ coord_fixed(ratio=1)
This results in:
What I actually want is for the y axis to be ordered by the number of blue tiles (i.e. 1s) in the first four columns of the block (which is calculated in good_pilots).
I tried scale_y_discrete, but the result cannot be what is intended:
plot + scale_y_discrete(limits=df$sim[order(df$good_pilots)])
resulting in:
From what I can tell it seems like the ordering worked correctly, but using scale_y_discrete caused the plot to be messed up.
You can use reorder here
ggplot(df, aes(y = reorder(sim, good_pilots), x = part)) +
...
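The reorder call above is abbreviated, so here is a self-contained sketch of the same idea. It rebuilds toy data with expand.grid and tapply instead of the question's tidyr/dplyr pipeline (a substitution made only to keep the example short). The names first4, good, sim_ordered and p_tile are illustrative.

```r
library(ggplot2)

set.seed(1)
# Toy data with the same columns as in the question
df <- expand.grid(part = 1:10, sim = 1:10)
df$result <- sample(c(1, 0), size = nrow(df), replace = TRUE)

# Number of successes among the first four participants of each simulation
first4 <- df[df$part <= 4, ]
good <- tapply(first4$result, first4$sim, sum)
df$good_pilots <- as.vector(good[as.character(df$sim)])

# reorder() returns a factor whose levels are sorted by good_pilots
df$sim_ordered <- reorder(factor(df$sim), df$good_pilots)

p_tile <- ggplot(df, aes(x = part, y = sim_ordered)) +
  geom_tile(aes(fill = factor(result)), colour = "black") +
  scale_fill_manual(values = c("lightgrey", "darkblue")) +
  coord_fixed(ratio = 1)
```

Because the ordering lives in the factor levels, no scale_y_discrete(limits = ...) call is needed, and each simulation still appears exactly once on the y axis.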

How to successfully implement 'heatmap plot and save' loop in R with ggplot2

I'm working with survey data. There are two groups of survey items, and each group has three items. There are two respondents in my survey sample.
I am attempting to generate heat maps by survey item groups, where:
the respondents are on the 'y' axis
survey items they responded to are on the 'x' axis.
Here is a fully reproducible example:
wd <- "D:/Desktop/"
setwd(wd)
#--create dataframe
respondent = c("Respondent_1", "Respondent_1", "Respondent_1","Respondent_1", "Respondent_1", "Respondent_1",
"Respondent_2", "Respondent_2", "Respondent_2","Respondent_2", "Respondent_2", "Respondent_2")
item = c("Item_1", "Item_2", "Item_3","Item_1", "Item_2", "Item_3",
"Item_1", "Item_2", "Item_3","Item_1", "Item_2", "Item_3")
item_group = c("Group_1","Group_1","Group_1","Group_2","Group_2","Group_2",
"Group_1","Group_1","Group_1","Group_2","Group_2","Group_2")
score = c(1, 40, 100, 100, 30, 12,
2, 15, 80, 77, 44, 10)
high_value_color = c("darkred", "darkred", "darkred",
"brown3", "brown3", "brown3")
plot_df = data.frame(respondent, item, item_group, score, high_value_color)
#--write function
#--inspired from this: http://www.reed.edu/data-at-reed/resources/R/loops_with_ggplot2.html
plot_list <- unique(plot_df$item_group)
survey_items.graph <- function(df, na.rm = TRUE, ...) {
#--loop to generate heatmaps for each group
for (i in seq_along(plot_list)) {
plot <- ggplot(aes(x = df$item[df$item_group == plot_list[i]],
y = df$respondent[df$item_group==plot_list[i]]),
data = subset(df, df$item_group == plot_list[i])) +
geom_tile(aes(fill = df$score[df$item_group == plot_list[i]]), colour = "black") +
scale_fill_gradient2(low = "azure1",
high = df$high_value_color[df$item_group == plot_list[i]],
guide = "colorbar") +
geom_text(aes(label = df$score[df$item_group==plot_list[i]],
hjust = 0.5,
angle = 90),
size = 4) +
ggtitle(df$item_group[df$item_group==plot_list[i]]) +
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_blank(),
plot.title = element_text(size = 7, face="bold"),
axis.text.y = element_text(size = 7, face ="bold"),
axis.text.x = element_text(angle=90, hjust=1),
axis.title = element_blank(),
legend.position = "none")
# save plots as .png
ggsave(plot, file=paste(wd,"plots/heatmap for ", plot_list[i], ".png", sep=""), scale=2)
print(plot)
}
}
#--load ggplot2
library(ggplot2)
#--execute function on plot dataframe
survey_items.graph(plot_df)
When I execute my code, I get the following two plots:
My intuition tells me that I'm not doing something right with the 'high' argument of the 'scale_fill_gradient2' portion of my code.
As a test, when I've replaced the value for the 'high' argument with just an acceptable color string value (e.g. 'brown3', other colors can be found here), I get the plots to behave as I want them to.
What I want is for the 'high' argument of 'scale_fill_gradient2' to accept the corresponding items' value found in the 'high_value_color' variable of the data.
Ok. The main problem was the passing of colors to scale_fill_gradient2. However, there is a lot more in your code that can be improved. Specifically, you want to only pass bare variable names to aes. I also don't see why you are constantly repeating your subsetting everywhere. You make it very likely for yourself to run into trouble.
Here is how I would probably tackle a problem like this:
First of all, we make a function that is a lot simpler: it only takes directly a data argument and simply makes the required plot with that data (no loop).
survey_items.graph <- function(dat) {
ggplot(aes(x = item, y = respondent), data = dat) +
geom_tile(aes(fill = score), colour = "black") +
scale_fill_gradient2(low = "azure1",
high = dat$high_value_color[1],
guide = "colorbar") +
geom_text(aes(label = score), hjust = 0.5, angle = 90, size = 4) +
ggtitle(dat$item_group[1]) +
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_blank(),
plot.title = element_text(size = 7, face="bold"),
axis.text.y = element_text(size = 7, face ="bold"),
axis.text.x = element_text(angle=90, hjust=1),
axis.title = element_blank(),
legend.position = "none")
}
We then split up your data in a list of data.frames, one per item_group:
split_data <- split(plot_df, plot_df$item_group)
Then we apply our function to each entry in the list, creating a list of plots:
plot_list <- lapply(split_data, survey_items.graph)
For convenience here, I use grid.arrange to quickly stitch both plots together:
library(gridExtra)
do.call(grid.arrange, plot_list)
If you want to save them, you can use something like:
Map(function(x, i, ...) ggsave(paste0('plot', i, '.png'), x, ...),
plot_list, seq_along(plot_list), scale = 2)
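If the file names should carry the group name, as in the question's "heatmap for Group_1.png", the names that split() attaches to the list can be reused. This is a small sketch with a reduced stand-in for plot_df. The "plots" folder comes from the question's ggsave call and is assumed to exist, and out_files is an illustrative name.

```r
# Same columns as plot_df in the question, reduced to what matters here
plot_df <- data.frame(
  item_group = rep(c("Group_1", "Group_2"), each = 3),
  score = c(1, 40, 100, 100, 30, 12)
)
split_data <- split(plot_df, plot_df$item_group)

# split() names each list element after its group, so names() gives the labels
out_files <- file.path("plots", paste0("heatmap for ", names(split_data), ".png"))
print(out_files)

# With a list of plots in the same order, saving could then look like:
# Map(function(p, f) ggplot2::ggsave(f, p, scale = 2), plot_list, out_files)
```

lapply keeps the names of its input, so a plot_list built from split_data lines up with out_files element by element.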

How to add multiple level of x-axis in ggplot [duplicate]

variable <- c("PM10","SO2","NO","NO2","PM10","SO2","NO","NO2","PM10","SO2","NO","NO2","PM10","SO2","NO","NO2","PM10","SO2","NO","NO2","PM10","SO2","NO","NO2")
sex <- c("male","male","male","male","female","female","female","female",
"male","male","male","male","female","female","female","female",
"male","male","male","male","female","female","female","female")
exposureperiod <- c("P1","P1","P1","P1","P1","P1","P1","P1",
"P2","P2","P2","P2","P2","P2","P2","P2",
"P3","P3","P3","P3","P3","P3","P3","P3")
set.seed(100)
coef <- runif(24, -2, 2)
coef_lb <- coef - 0.3
coef_ub <- coef + 0.3
df <- data.frame(variable,sex,exposureperiod,coef,coef_lb,coef_ub)
df$variable <- factor(df$variable,levels=c("PM10","SO2","NO","NO2"))
levels(df$variable) <- c("PM[10]","SO[2]", "NO", "NO[2]")
df$exposureperiod <- factor(df$exposureperiod,levels=c("P1","P2","P3"))
df$sex <- factor(df$sex,levels=c("male","female"))
df <- df[order(df$variable,df$sex),]
df$aux <- c(1,2,3,
5,6,7,
11,12,13,
15,16,17,
21,22,23,
25,26,27,
31,32,33,
35,36,37)
library(ggplot2)
plot <- ggplot(data = df, aes(x = aux, y = coef)) +
geom_pointrange(aes(ymin=coef_lb,ymax=coef_ub),shape="none") +
geom_point(aes(shape = exposureperiod)) +
scale_shape_discrete(name ="Exposure period",
breaks=c("P1", "P2","P3"),
labels=c("P1","P2","P3")) +
scale_x_continuous("Sex and Pollutant",breaks=c(2,6,12,16,22,26,32,36),
labels=c("Boys","Girls","Boys","Girls","Boys","Girls","Boys","Girls")) +
scale_y_continuous("Mean Difference in Tanner Stage",
limits=c(-3, 3),
breaks=round(seq(-3, 3, by = 0.5),1)) +
geom_hline(yintercept=0,alpha=1,linetype="dashed") +
theme(axis.text.x = element_text()) +
theme_bw(base_size = 16,base_family="Arial") +
theme(legend.text.align = 0,
legend.title = element_text(face="plain"),
legend.key = element_blank(),
legend.position = "bottom") +
guides(shape= guide_legend(nrow = 3,byrow = TRUE)) +
theme(text = element_text(colour = "black",face="plain"),
axis.title.y = element_text(face="plain"),
axis.title.x = element_text(face="plain"),
axis.text.x = element_text(face="plain",hjust = 0),
axis.text.y = element_text(face="plain")) +
theme(panel.background = element_blank(),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_blank(),
axis.line = element_line(colour = "black"))+
theme(axis.ticks = element_line(size = 1))
plot
With the above script, I got the graph below.
But I want to add another level to the x-axis that indicates PM10, SO2, NO, and NO2, as in the graph below. (To illustrate, I added those pollutants manually.) And of course, the x-axis title and legend should move down accordingly.
I used faceting before, but I want to avoid the gaps between pollutants that faceting generates.
Thank you.
You could try faceting the plot
plot <- plot + facet_wrap(~variable)
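Since the question asks to avoid the gaps that faceting introduces, a possible refinement is to set the panel spacing to zero and move the strips below the axis, so that they read as a second row of x-axis labels. The sketch below uses made-up toy data, with sex on a discrete x axis instead of the question's aux column. The names toy and p_nested are illustrative.

```r
library(ggplot2)

set.seed(100)
# Toy data shaped like the question's df (values are made up)
toy <- expand.grid(
  variable = c("PM[10]", "SO[2]", "NO", "NO[2]"),
  sex = c("Boys", "Girls"),
  exposureperiod = c("P1", "P2", "P3"),
  stringsAsFactors = TRUE
)
toy$coef <- runif(nrow(toy), -2, 2)

p_nested <- ggplot(toy, aes(x = sex, y = coef, shape = exposureperiod)) +
  geom_pointrange(aes(ymin = coef - 0.3, ymax = coef + 0.3),
                  position = position_dodge(width = 0.6)) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  # switch = "x" puts the strips at the bottom; label_parsed renders PM[10] etc.
  facet_grid(. ~ variable, switch = "x", labeller = label_parsed) +
  labs(x = "Sex and Pollutant", y = "Mean Difference in Tanner Stage",
       shape = "Exposure period") +
  theme_bw() +
  theme(panel.spacing = unit(0, "lines"),   # no gap between pollutants
        panel.border = element_blank(),
        strip.placement = "outside",        # strips sit below the axis labels
        strip.background = element_blank(),
        axis.line = element_line(colour = "black"),
        legend.position = "bottom")
```

With the strips placed outside the axis, the x-axis title and the legend end up below the pollutant labels, which is the layout described in the question.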
