I don't know how to align the dots so that each one sits over its own box plot.
Why are they appearing like that?
I found this post, but it answers the dodging part, which is not part of my code.
Here is my code:
library(phyloseq)
library(ggplot2)
plot_richness(ps.prev.intesParts.f, x = "part", measures = "Shannon",
color = "Samples") +
geom_boxplot() +
theme_classic() +
theme(text = element_text(size = 20),
strip.background = element_blank(),
axis.text.x.bottom = element_text(angle = 90),
legend.title = element_blank()) +
labs(x = "Intestinal Parts", y = "Shannon Index")
Could you please advise?
You are correct that you need to specify dodging. However, dodging needs to be set in the underlying geom_point(...) of the plot_richness function itself. Unfortunately, phyloseq offers no such option. This means you'll need to calculate the alpha diversity measures yourself and generate your own plot. Luckily this only requires a few extra lines of code. Here's an example using phyloseq's GlobalPatterns.
require("phyloseq")
require("dplyr")
require("ggplot2")
# Load data
data("GlobalPatterns")
# Calculate alpha indices
a_div <- estimate_richness(GlobalPatterns, measures = "Shannon")
a_div$SampleID <- row.names(a_div)
# Add sample_data from physeq object
a_div <- left_join(a_div,
sample_data(GlobalPatterns),
by = c("SampleID" = "X.SampleID"))
# GlobalPatterns only has grouping by SampleType.
# Generate an extra group by duplicating all rows
a_div <- rbind(a_div, a_div)
a_div$Samples <- rep(x = c("MMV", "VMV"),
each = nrow(a_div)/2)
# Plot
ggplot(a_div,
aes(x = SampleType,
y = Shannon,
colour = Samples)) +
geom_boxplot(position = position_dodge(width = 0.9)) +
geom_point(position = position_dodge(width = 0.9)) +
theme_classic() +
theme(text = element_text(size = 20),
strip.background = element_blank(),
axis.text.x.bottom = element_text(angle = 90,
hjust = 1,
vjust = 0.5),
legend.title = element_blank())
Created on 2022-09-02 by the reprex package (v2.0.1)
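The same behaviour can be reproduced without phyloseq. The following is a minimal sketch, assuming the built-in ToothGrowth data as a stand-in for the alpha diversity table (dose plays the role of the x grouping and supp the role of Samples); it is not the data from the question. Box plots are dodged by default when a colour grouping is present, whereas points are not, so both layers are given the same dodge width.

```r
library(ggplot2)

# Stand-in data (assumption): ToothGrowth ships with R
tg <- ToothGrowth
tg$dose <- factor(tg$dose)

p <- ggplot(tg, aes(x = dose, y = len, colour = supp)) +
  # The same dodge width in both layers keeps each point over its own box
  geom_boxplot(position = position_dodge(width = 0.9)) +
  geom_point(position = position_dodge(width = 0.9)) +
  theme_classic()

print(p)

# x positions of the point layer after dodging
point_x <- as.numeric(layer_data(p, 2)$x)
```

Inspecting point_x should show that the points no longer sit on the integer positions of the three dose levels but are shifted left or right with their box.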
Related
I would like to plot the congruence effects (incongruent minus congruent) as a violin plot per combination of stimulus age and response type. This is what my code looks like so far. I am not yet satisfied with the representation. How can I change it so that for each of the four conditions (adult frown, adult smile, child frown, child smile) I get the corresponding violin plot horizontally next to each other? Thanks in advance for the help. Attached is the code and an excerpt from the data frame.
violin plot
dataset$congruency_effect <- ifelse(dataset$congruency == "congruent", dataset$avgAmplitude, -dataset$avgAmplitude)
p <- ggplot(dataset, aes(x = stimulusResponse, y = congruency_effect, fill = congruency_effect, group = stimulusAge)) +
geom_violin() +
geom_point(position = position_dodge(width = 0.75), size = 3, stat = "summary", fun = "mean") +
scale_fill_manual(values = c("#F8766D", "#00BFC4")) +
ggtitle("Conventional EEG 350-450 ms") +
scale_y_continuous(limits = c(-5, 5)) +
facet_wrap(~stimulusAge, scales = "free_x")
EEG_Conventional450_age_response <- p + theme(
# Set the plot title and axis labels to APA style
plot.title = element_text(face = "bold", size = 16),
axis.title = element_text(face = "bold", size = 14),
# Set the axis tick labels to APA style
axis.text = element_text(size = 12),
# Set the legend title and labels to APA style
legend.title = element_text(face = "bold", size = 14),
legend.text = element_text(size = 12),
# Set the plot and panel backgrounds to white
panel.background = element_rect(fill = "white"),
plot.background = element_rect(fill = "white")
)
EEG_Conventional450_age_response
excerpt data frame
several permutations of arguments in ggplot
This has to do with the grouping aesthetic. Remove it, and your plot works.
library(ggplot2)
set.seed(42)
dataset <- data.frame(stimulusResponse = rep(c("frown", "smile"), each = 20),
congruency_effect = rnorm(40),
stimulusAge = rep(c("baby", "adult"), 20))
## removed group = stimulusAge
ggplot(dataset, aes(x = stimulusResponse, y = congruency_effect)) +
geom_violin() +
geom_point(position = position_dodge(width = 0.75), size = 3, stat = "summary") +
facet_wrap(~stimulusAge, scales = "free_x")
I created a bar plot with ggplot2 from data that were originally in long format but have been summarised to a mean and standard error per group. In other words, the bar plot with error bars is drawn from the processed mean and error values, not from the raw data. I therefore wonder how to add a statistical comparison in this situation using the original data, because it is difficult to do statistics with only mean and error values.
This is my code. Here df is the original long-format data; it was grouped to calculate the mean and standard error, giving the data frame summaryFGA.
The bar plot with error bars was generated from summaryFGA using the following code:
errorbar without statistical comparison
summaryFGA <- df %>% group_by(DMP,tumor) %>% dplyr::summarize(mean = mean(FGA, na.rm = TRUE), se = std(FGA, na.rm = TRUE)) %>% as.data.frame
p1 <- ggplot(summaryFGA, aes(x = DMP, y = mean,fill = DMP)) +
geom_bar(stat = 'identity') +
geom_errorbar(aes(ymax = mean+se, ymin = mean-se),position = position_dodge(0.9), width = 0.15) +
scale_fill_manual(values = jco[1:2]) + scale_color_manual(values = jco[1:2]) +
ylab("Fraction of Genome Altered") + xlab("") +
facet_wrap(.~tumor, nrow = 1,scales = "fixed") +
# stat_compare_means(data = df,
# aes(x = DMP, y = FGA, fill=DMP),
# comparisons = my_comparisons,
# method = "wilcox.test", inherit.aes = F) +
theme(axis.text.x = element_text(hjust = 1, vjust = 0.5,size = 10, angle = 90,color = "black"),
axis.text.y = element_text(size = 10,color = "black"),
axis.ticks = element_line(size=0.2, color="black"),
axis.ticks.length = unit(0.2, "cm"),
panel.background = element_blank(),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
legend.position = "bottom",
strip.text = element_text(colour = 'white', size = 10),
axis.line = element_line(colour = "black"))
p1
g <- ggplot_gtable(ggplot_build(p1))
stripr <- which(grepl('strip-t', g$layout$name))
k <- 1
for (i in stripr) {
j <- which(grepl('rect', g$grobs[[i]]$grobs[[1]]$childrenOrder))
g$grobs[[i]]$grobs[[1]]$children[[j]]$gp$fill <- tumorColor[k]
k <- k+1
}
What I want is the following (image: errorbar with statistical comparison): to add a statistical comparison using the code I commented out, but it doesn't work. Obviously, it should be based on the original df data instead of summaryFGA.
If anyone wants an example, please refer to this thread and add a statistical comparison using "original data" in the example.
Many thanks in advance.
I am wondering why the text is trending higher in the plots... it won't stay put with the facet_wrap or facet_grid. In a more complex dataset plot, the text is illegible because of the overlap.
Below are data and code to reproduce the plot and the issue. Adding geom="text" to stat_fit_glance results in "Error: Discrete value supplied to continuous scale".
library(ggpmisc)
library(ggplot2)
DF <- data.frame(Site = rep(LETTERS[20:24], each = 4),
Region = rep(LETTERS[14:18], each = 4),
time = rep(LETTERS[1:10], each = 10),
group = rep(LETTERS[1:4], each = 10),
value1 = runif(n = 1000, min = 10, max = 15),
value2 = runif(n = 1000, min = 100, max = 150))
# Letters cannot be converted to numbers directly; go through a factor to get 1..10
DF$time <- as.numeric(factor(DF$time))
formula1 <- y~x
plot1 <- ggplot(data=DF,
aes(x=time, y= value2,group=Site)) +
geom_point(col="gray", alpha=0.5) +
geom_line(aes(group=Site),col="gray", alpha=0.5) +
geom_smooth(se=F, col="darkorange", alpha=0.8, fill="orange",
method="lm",formula=formula1) +
theme_bw() +
theme(strip.text.x = element_text(size=10),
strip.text.y = element_text(size=10, face="bold", angle=0),
strip.background = element_rect(colour="black", fill="gray90"),
axis.text.x = element_text(size=10), # x-axis text size
axis.text.y = element_text(size=10), # y-axis text size
axis.ticks = element_blank(), # remove axis ticks
axis.title.x = element_text(size=18), # x-axis title size
axis.title.y = element_text(size=25), # y-axis title size
panel.background = element_blank(),
panel.grid.major = element_blank(), # remove major grid lines
panel.grid.minor = element_blank(), # remove minor grid lines
plot.background = element_blank()) +
labs(y="", x="Year", title = "")+ facet_wrap(~group)
plot1 + stat_fit_glance(method = "lm", label.x="right", label.y="bottom",
method.args = list(formula = formula1),
aes(label = sprintf('R^2~"="~%.3f~~italic(p)~"="~%.2f',
stat(..r.squared..),stat(..p.value..))),
parse = TRUE)
When the position of the labels is set automatically, the npcy position is increased for each level of the grouping variable. You map Site to the group aesthetic, and because Site has five levels that appear unevenly across the facets, the rather crude algorithm in 'ggpmisc' positions the labels unevenly: the five rows correspond to the five Sites, one each. I have changed the mapping to use colour so that this becomes more obvious. I have also deleted all code that is irrelevant to this question.
plot1 <- ggplot(data=DF,
aes(x=time, y= value2, color=Site)) +
geom_smooth(se=F, alpha=0.8,
method="lm",formula=formula1) +
facet_wrap(~group)
plot1 +
stat_fit_glance(method = "lm", label.x="right", label.y="bottom",
method.args = list(formula = formula1),
aes(label = sprintf('R^2~"="~%.3f~~italic(p)~"="~%.2f',
stat(..r.squared..),stat(..p.value..))),
parse = TRUE) +
expand_limits(y = 110)
To use fixed positions, one can pass npcy coordinates when using the default "geom_text_npc()", or pass data coordinates and use "geom_text()". One position corresponds to each level of the grouping factor Site. If the vector is shorter, it is recycled. Of course, to fit more labels you can reduce the size of the text and add space by expanding the plotting area. In any case, in practice, you will need to indicate in one way or another which estimates correspond to which line.
plot1 +
stat_fit_glance(method = "lm", label.x="right", label.y= c(0.01, 0.06, 0.11, 0.01, 0.06),
method.args = list(formula = formula1),
aes(label = sprintf('R^2~"="~%.3f~~italic(p)~"="~%.2f',
stat(..r.squared..),stat(..p.value..))),
parse = TRUE, size = 2.5) +
expand_limits(y = 110)
Note: the error "Discrete value supplied to continuous scale" when attempting to use geom_text() is a bug in 'ggpmisc' that I fixed some days ago, but the fix has not yet made it to CRAN (future version 0.3.3).
I'd like to add the count values calculated in the geom_histogram function on ggplot2. I've put the ggplot2 into a loop so I can produce multiple plots, in my case 30 but for ease, here is a dummy set for only four plots. Facet wrap didn't work as the geom density was pooling the data across all factors before calculating proportions, rather than within a factor/variable. To produce this plot, I've essentially mixed a whole bunch of code from various sources so credit to them.
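library(plyr) # ddply() and numcolwise(); loaded before dplyr to limit masking
library(dplyr)
library(ggplot2)
library(ggridges)
library(reshape2)
library(gridExtra)
library(scales) # percent_format()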
#Make the data#
df.fact <- data.frame("A"=rnorm(400, mean = 350, sd=160),"B"=rnorm(400, mean = 300, sd=100), "C"=rnorm(400, mean = 200, sd=80), names=rep(factor(LETTERS[23:26]), 100))
df.test<-melt(df.fact, id.vars = "names", value.name = "Length2")
names(df.test)[names(df.test) =="variable"] <- "TSM.FACT"
#Create the plotlist##
myplots <- list()
#Loop for plots##
for(i in 1:(length(unique(df.test$names)))){
p1 <- eval(substitute(
ggplot(data=df.test[df.test$names == levels(df.test$names)[i],], aes(x=Length2, group=TSM.FACT, colour = TSM.FACT, fill=TSM.FACT)) +
geom_histogram(aes( y = stat(width*density)), position = "dodge", binwidth = 50, alpha =0.4, show.legend=T)+
ggtitle(paste0(levels(df.test$names)[i]))+
geom_density_line(stat="density", aes(y=(..count..)/sum(..count..)*50), alpha=0.3, size=0.5, show.legend=F) +
geom_vline(data=ddply(df.test[df.test$names == levels(df.test$names)[i],], ~ TSM.FACT, numcolwise(mean)), mapping=aes(xintercept = Length2, group=TSM.FACT, colour=TSM.FACT), linetype=2, size=1, show.legend=F) +
scale_y_continuous(labels = percent_format()) +
ylab("relative frequency") +
scale_color_manual(values= c("#00B2EE", "#1E90FF", "#104E8B")) +
scale_fill_manual(values= c("#00B2EE", "#1E90FF", "#104E8B")) +
theme_bw() + theme(
plot.title = element_text(lineheight=0.5, hjust= 0.5, size=10),
strip.text.y = element_text(hjust = 1, angle = 0),
strip.text.x = element_text(size=10, vjust = 0.9),
strip.text=element_text(margin = margin(t=0.3,r=1,b=0.3,l=1), size=8, debug = F, vjust=0.2),
strip.background = element_blank(),
axis.text.x = element_text(size=8, angle=0, vjust=0.2, margin = margin(t=0.3,r=0.1,b=0.3,l=0.1)),
axis.title.x=element_blank(),
axis.title.y=element_blank(),
axis.line.x=element_line(colour="black"),
axis.line.y=element_line(colour="black"),
panel.grid.minor = element_blank(),
panel.border=element_blank(),
panel.background=element_blank(),
legend.position=(c(0.9,0.9)),
legend.title = element_blank(),
legend.key = element_blank()),
list(i = i)))
print(i)
print(p1)
myplots[[i]] <- p1
plot(p1)
}
#Join the plots
panelplot=grid.arrange(grobs = myplots)
Unfortunately, I am unable to reproduce your example. I can recommend adding a column that holds the sum of each bar (let's name it "Bar").
The required addition to the ggplot code is then:
geom_text(aes(label = Bar), position = position_stack(vjust = 1)) +
The height of the text relative to the bar can be adjusted with vjust.
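Since the original example could not be reproduced, here is a minimal self-contained sketch of the suggestion. The data frame, the 50-unit bins computed by hand, and the use of geom_col() in place of geom_histogram() are all assumptions made for illustration; only the "Bar" column and the geom_text() call come from the answer above.

```r
library(ggplot2)

set.seed(1)
# Hypothetical data standing in for one panel of the question
df <- data.frame(
  Length2  = c(rnorm(100, 350, 160), rnorm(100, 300, 100)),
  TSM.FACT = rep(c("A", "B"), each = 100)
)

# Bin by hand (50-unit bins, stored as bin midpoints) and count rows per
# bin and group; "Bar" holds the count that each bar segment will display
df$bin <- floor(df$Length2 / 50) * 50 + 25
bars <- aggregate(Length2 ~ bin + TSM.FACT, data = df, FUN = length)
names(bars)[names(bars) == "Length2"] <- "Bar"

p <- ggplot(bars, aes(x = bin, y = Bar, fill = TSM.FACT)) +
  geom_col(position = "stack", alpha = 0.4) +
  # Place each label at the top of its stacked segment
  geom_text(aes(label = Bar), position = position_stack(vjust = 1), size = 3)

print(p)
```

Lowering vjust inside position_stack() (for example to 0.5) moves each label towards the middle of its segment.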
I am using the ggpubr package to do boxplots with ggboxplot. Any suggestions on how to increase the distance between the adjacent boxplots?
I have been using R for a couple of weeks and I am aware that my script might be written better.
My code:
flowdata <- read.csv("flowdata.csv", header = TRUE, sep = ";")
flowdata$Haplotype = factor(flowdata$Haplotype, levels(flowdata$Haplotype)[c(5,1,2,3,4,6)])
library(ggpubr)
p<-ggboxplot(flowdata, x="TP", y="Treg", add = "jitter", width = 0.5, shape = "Treatment", fill = "Haplotype", palette = c("#0092d1","#62b232","#b23a32","#b232a3","#99cccc","#132a64"))+scale_shape_manual(values = c(21,23))
p1<-p+theme(legend.title = element_blank(), legend.text = element_text(size=8), text = element_text(family = "Calibri"), axis.text.x = element_text(angle = 45, hjust = 1))+ labs(x = expression(paste("")),
y = expression(paste(CD4^+{}, CD25^+{}, "cells/µL")))
p1
The parameter 'width' specifies the width of the boxes, so a simple solution would be to reduce that value (from 0.5). This does not move the boxes further apart; it makes each box narrower, which widens the gap between adjacent boxes.
However, it seems to me that your boxplots are well spaced, but your points (jitter) are overlapping, making the graph look messy. A simpler solution would be to remove them, or to plot them as plain points instead of jitter. Alternatively, you could use a violin plot.
For finer control, 'standard' ggplot2 can be used, perhaps with cowplot for formatting:
p <- ggplot(data = flowdata, mapping = aes(x = TP, y = Treg, fill = Haplotype)) +
geom_boxplot(position = position_dodge(0.5)) +
geom_jitter(aes(shape = Treatment)) +
scale_shape_manual(values = c(21, 23)) +
scale_fill_manual(values = c("#0092d1","#62b232","#b23a32","#b232a3","#99cccc","#132a64")) +
theme(legend.title = element_blank(), legend.text = element_text(size=8), text = element_text(family = "Calibri"), axis.text.x = element_text(angle = 45, hjust = 1)) +
labs(x = expression(paste("")), y = expression(paste(CD4^+{}, CD25^+{}, "cells/µL")))
Building on Adam's answer, I would furthermore suggest splitting the boxplots into several facets, so that the plot is actually readable. I suppose the haplotype might be the interesting facet. You could also reduce the size of the jitter points or add some transparency (alpha) so that they are less prominent, but with the facets the boxplots are already larger, so this might solve the readability problem by itself.
p <- ggplot(flowdata, aes(x = TP, y = Treg)) +
geom_boxplot(position = position_dodge(0.5)) +
geom_jitter(aes(shape = Treatment), size = 0.5, alpha = 0.8) +
facet_wrap(~Haplotype, ncol = 3) +
scale_shape_manual(values = c(21, 23)) +
theme(legend.title = element_blank(),
legend.text = element_text(size=8),
text = element_text(family = "Calibri"),
axis.text.x = element_text(angle = 45, hjust = 1)) +
labs(x = "",
y = expression(paste(CD4^+{}, CD25^+{}, "cells/µL")))
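A further variation, not taken from either answer above: when the boxes are dodged by a fill variable, position_jitterdodge() keeps the jittered points over their own box, and a box width smaller than the dodge width should leave a visible gap between neighbouring boxes. This is a minimal sketch that assumes the built-in ToothGrowth data as a stand-in for flowdata (dose for TP, supp for Haplotype); the widths are illustrative values only.

```r
library(ggplot2)

# Stand-in data (assumption): ToothGrowth ships with R
tg <- ToothGrowth
tg$dose <- factor(tg$dose)

p <- ggplot(tg, aes(x = dose, y = len, fill = supp)) +
  # Box width below the dodge width leaves space between adjacent boxes;
  # outliers are hidden because every observation is drawn as a point below
  geom_boxplot(width = 0.5, position = position_dodge(width = 0.75),
               outlier.shape = NA) +
  # Jitter within each dodged group; the seed makes the jitter reproducible
  geom_point(shape = 21, size = 1.5, alpha = 0.8,
             position = position_jitterdodge(jitter.width = 0.1,
                                             dodge.width = 0.75,
                                             seed = 1))

print(p)
```

The dodge width is set to the same value in both layers so that the points and the boxes are shifted by the same amount.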