Show statistically significant difference in a graph - r

I have carried out an experiment with six treatments, and each treatment was performed in the light and in darkness. I have used ggplot2 to make a bar plot. I would like to add significance letters (e.g. from an LSD test) to the graph to show the difference between light and darkness for each treatment, but it gives me an error.
Any suggestions?
data <- read.table(header = TRUE, text =
'T0 T1 T2 T3 T4 T5 LVD
40 62 50 45 45 58 Light
30 60 44 40 30 58 Light
30 68 42 35 32 59 Light
47 75 58 55 50 70 Dark
45 75 52 54 42 78 Dark
50 75 68 48 56 75 Dark
')
library(reshape2) # melt() is assumed to come from reshape2
library(ggplot2)
gla <- melt(data, id = "LVD")
ggplot(gla, aes(x=variable, y=value, fill=as.factor(LVD))) +
stat_summary(fun.y=mean,
geom="bar",position=position_dodge(),colour="black",width=.7,size=.7) +
stat_summary(fun.ymin=min,fun.ymax=max,geom="errorbar",
color="black",position=position_dodge(.7), width=.2) +
scale_fill_manual("Legend", values = c("Light" = "white", "Dark" ="gray46")) +
xlab("Treatments")+
ylab("Germination % ") +
theme(panel.background = element_rect(fill = 'white', colour = 'black'))
Up to here it works perfectly, but when I add geom_text it gives an error:
+ geom_text(aes(label=c("a","b","a","a","a","a, a","b","a","b","a","b")))
The error is:
Error: Aesthetics must be either length 1 or the same as the data (36): label, x, y, fill

The problem is that you have 36 data points, which you summarize to 12. ggplot will only allow mapping to 36 data points in geom_text (which the error tells you). In order to use the summarized 12 points, you do need to use stat_summary once again.
The basic rule is that statistical transformations (like summaries) do *not* transfer between layers (i.e. geoms and stats). So geom_text has no idea what the y values computed by the original stat_summary actually are.
You also need to fix the typo in your letters: "a, a" is a single string, so the vector holds only 11 labels instead of 12.
We end up with:
ggplot(gla, aes(x=variable, y=value, fill=as.factor(LVD))) +
stat_summary(fun.y=mean,
geom="bar",position=position_dodge(),colour="black",width=.7,size=.7) +
stat_summary(fun.ymin=min,fun.ymax=max,geom="errorbar",
color="black",position=position_dodge(.7), width=.2) +
stat_summary(geom = 'text', fun.y = max, position = position_dodge(.7),
label = c("a","b","a","a","a","a", "a","b","a","b","a","b"), vjust = -0.5) +
scale_fill_manual("Legend", values = c("Light" = "white", "Dark" ="gray46")) +
xlab("Treatments") +
ylab("Germination % ") +
scale_y_continuous(expand = c(0, 0), limits = c(0, 85)) +
theme_bw()
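An alternative that makes the reduction from 36 rows to 12 explicit is to compute the label positions yourself and give geom_text its own 12-row data frame. This is only a minimal sketch, assuming ggplot2 3.3.0 or later (for the fun argument). The name lab_df and the pairing of letters with bars are illustrative and not taken from an actual LSD test:

```r
library(ggplot2)

data <- read.table(header = TRUE, text = '
T0 T1 T2 T3 T4 T5 LVD
40 62 50 45 45 58 Light
30 60 44 40 30 58 Light
30 68 42 35 32 59 Light
47 75 58 55 50 70 Dark
45 75 52 54 42 78 Dark
50 75 68 48 56 75 Dark
')

# Long format with base R: one row per measurement (36 rows)
s <- stack(data[, paste0("T", 0:5)])
gla <- data.frame(variable = s$ind, value = s$values,
                  LVD = rep(data$LVD, times = 6))

# One row per bar (12 rows); the maximum is where each error bar would end
lab_df <- aggregate(value ~ LVD + variable, data = gla, FUN = max)

# Hypothetical letters, one per row of lab_df (replace with your own test result)
lab_df$label <- c("a","b","a","a","a","a","a","b","a","b","a","b")

p <- ggplot(gla, aes(x = variable, y = value, fill = LVD)) +
  stat_summary(fun = mean, geom = "bar", position = position_dodge(0.7),
               colour = "black", width = 0.7) +
  geom_text(data = lab_df, aes(label = label, group = LVD),
            position = position_dodge(0.7), vjust = -0.5)
p
```

Because each letter sits in the same row as its treatment and light condition, you can check the pairing by printing lab_df before plotting.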
I don't like dynamite plots, so here's my version:
let <- c("a","b","a","a","a","a", "a","b","a","b","a","b")
stars <- ifelse(let[c(TRUE, FALSE)] == let[c(FALSE, TRUE)], '', '*')
ggplot(gla, aes(x = variable, y = value)) +
stat_summary(aes(col = as.factor(LVD)),
fun.y=mean, fun.ymin = min, fun.ymax = max,
position = position_dodge(.3), size = .7) +
stat_summary(geom = 'text', fun.y = max, position = position_dodge(.3),
label = stars, vjust = 0, size = 6) +
scale_color_manual("Legend", values = c("Light" = "black", "Dark" ="gray46")) +
xlab("Treatments") +
ylab("Germination % ") +
scale_y_continuous(expand = c(0.1, 0)) +
theme_bw()
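The stars line relies on logical index recycling to compare neighbouring letters. The short base R sketch below isolates that step, assuming (as the code above does) that the letters alternate between the two conditions within each treatment; first and second are names introduced here for illustration:

```r
let <- c("a","b","a","a","a","a", "a","b","a","b","a","b")

# c(TRUE, FALSE) is recycled along the vector, so it keeps positions 1, 3, 5, ...
first  <- let[c(TRUE, FALSE)]
# c(FALSE, TRUE) keeps positions 2, 4, 6, ...
second <- let[c(FALSE, TRUE)]

# One entry per treatment: a star only where the two letters differ
stars <- ifelse(first == second, "", "*")
print(stars)
# [1] "*" ""  ""  "*" "*" "*"
```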

I found this to be the simplest way to show statistical significance with asterisks and lines:
fig2 + geom_text(x=1.5,y=89, label = "***") + annotate("segment", x=c(1,1,2), xend=c(1,2,2), y=c(84,86,86), yend=c(86,86,84), size=1)
This adds a 'geom_text' layer for the asterisks and an 'annotate' layer for the bracket.
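The bracket coordinates passed to annotate follow a simple pattern (left tick, top line, right tick), so they can be generated by a small function. This is a sketch only: bracket() and the two-bar data frame are made up for illustration and are not part of ggplot2 or of the original figure:

```r
library(ggplot2)

# Hypothetical helper: three segments forming a bracket from x1 to x2 at height y
bracket <- function(x1, x2, y, tip = 2) {
  data.frame(x    = c(x1, x1, x2),
             xend = c(x1, x2, x2),
             y    = c(y - tip, y, y),
             yend = c(y, y, y - tip))
}

b <- bracket(1, 2, 86)  # same coordinates as in the call above

# Made-up values, only there to have two bars to compare
toy <- data.frame(group = c("A", "B"), value = c(60, 80))

p <- ggplot(toy, aes(x = group, y = value)) +
  geom_col() +
  annotate("segment", x = b$x, xend = b$xend, y = b$y, yend = b$yend) +
  annotate("text", x = 1.5, y = 89, label = "***")
p
```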

Related

Is it possible to add a few more details, like rich factor, to the bar graph along with the p-value?

Pathway                   #Proteins  Pvalue    Richfactor
Peptide chain elongation  90         1.11E-16  0.5
Translation elongation    79         1.11E-16  0.7
P53 pathway               50         1.11E-16  0.2
cGAS sting pathway        20         1.11E-16  0.4
The data are given above. Using this data I tried to generate a bar graph with p-value and proteins, but I want to add additional details to the graph, like the Rich factor given in the data above.
library(ggplot2)
library(viridis)
top_fun <- read.delim(file="Pathways.txt",header = TRUE)
topfun <- as.data.frame(top_fun)
# Turn the 'Pathway' column into a character vector
topfun$Pathway <- as.character(topfun$Pathway)
# Then turn it back into a factor with the levels in the correct order
topfun$Pathway <- factor(topfun$Pathway, levels = unique(topfun$Pathway))
# Refer to columns by bare name inside aes(), not with topfun$
ggplot(topfun, aes(x = Group, y = Proteins, fill = Pvalue)) +
geom_col(position="dodge",width=0.4) +
coord_flip() + scale_fill_viridis(option="mako")+
facet_grid(Pathway~.)+
theme(strip.text.y = element_text(angle = 0))
Using the above code I generated this graph.
I want to add additional details like rich factor to the graph. Thanks for the help!
The obvious thing to do is to map Richfactor to the fill variable. You can add the p-values directly as text, since they are not very informative when mapped to the fill scale, at least in this example, where all four are identical:
ggplot(topfun,aes(x = 'WT', y = Proteins, fill = Richfactor)) +
geom_col(position = "dodge", width = 0.4, color = 'gray50') +
geom_text(aes(y = 1, label = paste('p =', Pvalue), color = Pathway),
hjust = 0) +
coord_flip() +
scale_fill_viridis_c(option = "mako") +
facet_grid(Pathway ~ .) +
theme(strip.text.y = element_text(angle = 0)) +
scale_color_manual(values = c('black', 'black', 'white', 'black'),
guide = 'none')
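To try this without the Pathways.txt file, the table from the question can be typed in directly. A small sketch follows; the column name Proteins (standing in for "#Proteins") and the extra label column are assumptions made for illustration:

```r
# Data copied from the table in the question
topfun <- data.frame(
  Pathway    = c("Peptide chain elongation", "Translation elongation",
                 "P53 pathway", "cGAS sting pathway"),
  Proteins   = c(90, 79, 50, 20),
  Pvalue     = c(1.11e-16, 1.11e-16, 1.11e-16, 1.11e-16),
  Richfactor = c(0.5, 0.7, 0.2, 0.4)
)

# Keep the pathways in table order rather than alphabetical order
topfun$Pathway <- factor(topfun$Pathway, levels = unique(topfun$Pathway))

# Hypothetical combined label carrying both the p-value and the rich factor
topfun$label <- paste0("p = ", topfun$Pvalue,
                       ", rich factor = ", topfun$Richfactor)
print(topfun$label)
```

Using label = label inside geom_text(aes(...)) would then print both numbers on each bar, if you prefer text over a fill legend for the rich factor.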

How to Add Extra Labels on y-axis without Data in ggplot2

I am making a plot showing two sets of regression coefficients and standard errors, and the graph is as follows:
What I want to do further is to add extra variables without any data on the y-axis. For instance, put a label FeatGender on top of the label FeatGenderMale, or for another example, put a label FeatEU in between the label of FeatPartyIDLiberal Democrats and the label of FeatEUIntegrationSupportEUIntegration. Below is the reduced version of data:
coef se low high sex
1 -0.038848364 0.02104994 -0.080106243 0.002409514 Female
2 0.095831201 0.02793333 0.041081877 0.150580526 Female
3 0.050972670 0.02828353 -0.004463052 0.106408391 Female
4 -0.183558492 0.02454943 -0.231675377 -0.135441606 Female
5 0.044879447 0.02712518 -0.008285914 0.098044808 Female
6 -0.003858672 0.03005477 -0.062766024 0.055048681 Male
7 0.003048763 0.04687573 -0.088827676 0.094925203 Male
8 0.015343897 0.03948959 -0.062055700 0.092743494 Male
9 -0.132600259 0.04146323 -0.213868197 -0.051332322 Male
10 -0.029764559 0.04600719 -0.119938650 0.060409533 Male
Here is my code:
v_name <- c("FeatGenderMale", "FeatPartyIDLabourParty", "FeatPartyIDLiberalDemocrats",
"FeatEUIntegrationOpposeEUIntegration", "FeatEUIntegrationSupportEUIntegration")
t <- ggplot(temp, aes(x=c(v_name,v_name), y=coef, group=sex, colour=sex))
t +
geom_point(position = position_dodge(width = 0.3)) +
geom_errorbar(aes(ymin = low, ymax = high, width = 0), position = position_dodge(0.3)) +
coord_flip() +
scale_x_discrete(limits = rev(v_name)) +
geom_hline(yintercept = 0.0, linetype = "dotted") +
theme(legend.position = "bottom")
Thanks for the help!
Here's an approach that first applies the v_name into the source data frame, but then uses a longer appended version of the v_name vector for the axis.
library(ggplot2); library(dplyr)
# Add the v_name into the table
temp2 <- temp %>% group_by(sex) %>% mutate(v_name = v_name) %>% ungroup()
# Make the dummy label for axis with add'l entries
v_name2 <- append(v_name, "FeatGender", after = 0)
v_name2 <- append(v_name2, "FeatEU", after = 4)
# Plot using the new table
t <- ggplot(temp2, aes(x=v_name, y=coef, group=sex, colour=sex))
t +
geom_point(position = position_dodge(width = 0.3)) +
geom_errorbar(aes(ymin = low, ymax = high, width = 0), position = position_dodge(0.3)) +
coord_flip() +
# ... but use the larger list of axis names
scale_x_discrete(limits = rev(v_name2)) +
geom_hline(yintercept = 0.0, linetype = "dotted") +
theme(legend.position = "bottom")
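The positions passed to append() are easy to get wrong by one, so it can help to inspect the resulting vector before plotting. A quick base R check, using only the names from the question:

```r
v_name <- c("FeatGenderMale", "FeatPartyIDLabourParty", "FeatPartyIDLiberalDemocrats",
            "FeatEUIntegrationOpposeEUIntegration", "FeatEUIntegrationSupportEUIntegration")

# after = 0 puts the new entry in front of everything
v_name2 <- append(v_name, "FeatGender", after = 0)
# after = 4 counts positions in the already extended vector
v_name2 <- append(v_name2, "FeatEU", after = 4)

print(v_name2)
# "FeatGender" is now first and "FeatEU" is fifth, directly after
# "FeatPartyIDLiberalDemocrats"
```

Entries in the limits that have no matching data simply get an empty position on the axis, which is what produces the extra labels.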

ggplot2 multiple stat_smooth: change color & linetype

I am not able to change the colors and linetypes of my current plot with multiple smoothers (stat_smooth()).
Here is an overview of the data structure:
serviceInstanceName timestamp value
1 DE1Service-utilityPredicted 2014-02-22 10.000000
2 SE1Service-utilityPredicted 2014-02-22 4.385694
3 DE2Service-utilityPredicted 2014-02-22 0.000000
4 US1Service-utilityPredicted 2014-02-22 2.230000
5 DE1Service-utilityActual 2014-02-22 10.000000
6 SE1Service-utilityActual 2014-02-22 8.011919
7 DE2Service-utilityActual 2014-02-22 3.000000
8 US1Service-utilityActual 2014-02-22 1.325191
...
There are eight unique service instances, each with a timestamp (x axis) and a value (y axis).
Here is the code:
ggplot(rmm, aes(x=timestamp, y=value, color=serviceInstanceName, group=serviceInstanceName)) +
stat_smooth(size=1.5, method = "loess", level = 0.95, fullrange = TRUE, se = FALSE) +
scale_x_datetime(breaks = date_breaks("1 day"), labels = date_format("%a/%m")) +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) + xlab("Day") +
ylab("Utility") + ggtitle("Utility Trend")
Here the plot:
What I want:
=> To manually change the linetype and color for each unique value of the 'serviceInstanceName' attribute. I tried many things, from scale_color_manual() to extracting the values of the smoother, but really could not solve this.
Any help is appreciated. Thanks!
Well, your data wasn't that helpful for recreating the plot, so I created a different sample data set:
rmm<-data.frame(
timestamp = as.POSIXct(rep(seq(as.Date("2014-01-01"),
as.Date("2014-01-10"), by="1 day"),5)),
serviceInstanceName = rep(letters[1:5], each=10),
value = cumsum(rnorm(50))
)
And I'm not sure exactly what you tried, but scale_color_manual should have worked. And if you want to change the line type, you need to map it in the aes():
library(ggplot2)
library(scales)
ggplot(rmm, aes(x=timestamp, y=value,
color=serviceInstanceName, linetype=serviceInstanceName)) +
stat_smooth(size=1.5, method = "loess", level = 0.95,
fullrange = TRUE, se = FALSE) +
scale_x_datetime(breaks = date_breaks("1 day"),
labels = date_format("%a/%m")) +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) + xlab("Day") +
ylab("Utility") + ggtitle("Utility Trend") +
scale_color_manual(values=c(a="orange",b="yellow",
c="red", d="sienna",e="cornsilk"))
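Once linetype is mapped in aes(), the line types themselves can be chosen by hand in the same way as the colours, with scale_linetype_manual(). The sketch below reuses the shape of the sample data above; the particular colours and line types are arbitrary choices, and set.seed() is added only to make the random data repeatable:

```r
library(ggplot2)

set.seed(1)  # arbitrary seed so the random walk is repeatable
rmm <- data.frame(
  timestamp = as.POSIXct(rep(seq(as.Date("2014-01-01"),
                                 as.Date("2014-01-10"), by = "1 day"), 5)),
  serviceInstanceName = rep(letters[1:5], each = 10),
  value = cumsum(rnorm(50))
)

# Named vectors tie each value to one service instance explicitly
line_cols  <- c(a = "orange", b = "yellow", c = "red", d = "sienna", e = "cornsilk")
line_types <- c(a = "solid", b = "dashed", c = "dotted", d = "dotdash", e = "longdash")

p <- ggplot(rmm, aes(x = timestamp, y = value,
                     color = serviceInstanceName,
                     linetype = serviceInstanceName)) +
  stat_smooth(method = "loess", se = FALSE) +
  scale_color_manual(values = line_cols) +
  scale_linetype_manual(values = line_types)
p
```

Since both scales are keyed on the same variable, ggplot2 should merge them into a single legend.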

geom_lines not linking what they should with error bars plot in ggplot

I have the following dataset, ready to plot as a graph with error bars and lines:
> growth
treatment class variable N value sd se ci
1 elevated Dominant RBAI2012 18 0.014127713 0.009739951 0.002295728 0.004843564
2 elevated Dominant RBAI2013 18 0.021869978 0.013578741 0.003200540 0.006752549
3 elevated Codominant RBAI2012 40 0.011564725 0.013718591 0.002169100 0.004387418
4 elevated Codominant RBAI2013 41 0.011471512 0.011091167 0.001732149 0.003500804
5 elevated Subordinate RBAI2012 24 0.004419784 0.009286883 0.001895677 0.003921507
6 elevated Subordinate RBAI2013 24 0.004397105 0.008704831 0.001776866 0.003675728
7 ambient Dominant RBAI2012 13 0.025836265 0.011880315 0.003295007 0.007179203
8 ambient Dominant RBAI2013 13 0.025992636 0.015162901 0.004205432 0.009162850
9 ambient Codominant RBAI2012 26 0.018067329 0.011830940 0.002320238 0.004778620
10 ambient Codominant RBAI2013 26 0.015595275 0.012467140 0.002445007 0.005035587
11 ambient Subordinate RBAI2012 33 0.006073904 0.008287442 0.001442658 0.002938599
12 ambient Subordinate RBAI2013 35 0.003239033 0.006846507 0.001157271 0.002351857
I've tried the following code, which results in this plot:
p <- ggplot(growth,aes(class,value,colour=treatment,group=variable))
pd<-position_dodge(.9)
# se= standard error; ci=confidence interval
p + geom_errorbar(aes(ymin=value-se,ymax=value+se),width=.1,position=pd,colour="black") + geom_point(position=pd,size=4) + geom_line(position=pd) +
theme_bw() + theme(legend.position=c(1,1),legend.justification=c(1,1))
The lines should link the points of the same color within each x-axis category, but clearly they don't. Could you please help me draw the lines properly (e.g. blue with blue and red with red within the "Dominant" class, and separate lines for the "Codominant" class)?
Also, do you know how to include in the x-labels the variables I am grouping by (i.e. "RBAI2012", "RBAI2013")?
Many thanks
To also distinguish between the different levels of 'variable', you may introduce a fourth aesthetic: shape. First define a new grouping variable, a combination of 'treatment' and 'variable', which has four levels. Map group, colour and shape to this variable. Then use scale_colour_manual and scale_shape_manual to set two colours, which correspond to the two levels of 'treatment'. Similarly, define two shapes for the two levels of 'variable'.
growth$grp <- paste0(growth$treatment, growth$variable)
ggplot(data = growth, aes(x = class, y = value, group = grp,
colour = grp, shape = grp)) +
geom_point(size = 4, position = pd) +
geom_line(position = pd) +
geom_errorbar(aes(ymin = value - se, ymax = value + se), colour = "black",
position = pd, width = 0.1) +
scale_colour_manual(name = "Treatment:Variable",
values = c("red", "red","blue", "blue")) +
scale_shape_manual(name = "Treatment:Variable",
values = c(19, 17, 19, 17)) +
theme_bw() +
theme(legend.position = c(1,1), legend.justification = c(1,1))
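Unnamed values in the manual scales are matched to the levels of grp in sorted order, so it is worth checking that order first. A base R sketch, where the treatment and variable columns are rebuilt from the table in the question and grp_colours and grp_shapes are names introduced here for illustration:

```r
# Same row layout as the 'growth' table: 6 elevated rows, then 6 ambient rows
treatment <- rep(c("elevated", "ambient"), each = 6)
variable  <- rep(c("RBAI2012", "RBAI2013"), times = 6)

grp <- paste0(treatment, variable)
grp_levels <- sort(unique(grp))
print(grp_levels)

# Naming the values removes any dependence on that order
grp_colours <- c(ambientRBAI2012 = "red",   ambientRBAI2013 = "red",
                 elevatedRBAI2012 = "blue", elevatedRBAI2013 = "blue")
grp_shapes  <- c(ambientRBAI2012 = 19,  ambientRBAI2013 = 17,
                 elevatedRBAI2012 = 19, elevatedRBAI2013 = 17)
```

Passing values = grp_colours and values = grp_shapes to the two manual scales should give the same picture as the unnamed vectors, while making explicit which treatment gets which colour.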
One option is using a facet plot like so:
p <- ggplot(growth, aes(x = class, y = value, group = treatment, color = treatment))
p + geom_point(size = 4) + facet_grid(. ~ variable) + geom_errorbar(aes(ymin=value-se,ymax=value+se),width=.1,colour="black") + geom_line()
If you want it on one graph, another option is defining a new variable that combines treatment and variable:
growth$treatment_variable <- paste(growth$treatment, growth$variable)
p <- ggplot(growth, aes(x = class, y = value, group = treatment_variable, colour = treatment_variable))
pd<-position_dodge(.2)
p + geom_point(size = 4, position=pd) + geom_errorbar(aes(ymin=value-se, ymax=value+se), width=.1, position=pd, colour="black") + geom_line(position=pd)
You have too many grouping variables (variable and treatment) and including them in a single plot may be a bit confusing. You might want to use faceting, like this:
p <- ggplot(growth,aes(class,value,colour=treatment,group=treatment))
pd<-position_dodge(.9)
p +
geom_errorbar(aes(ymin=value-se,ymax=value+se),width=.1,position=pd,colour="black") +
geom_point(position=pd,size=4) + geom_line(position=pd) +
theme_bw() + theme(legend.position=c(1,1),legend.justification=c(1,1)) +
facet_grid(variable~treatment)
It is possible to do this, but you need to hack it since you're essentially plotting a geom_line() on different groupings (variable + treatment) than with the geom_point() and geom_errorbar() calls.
You need to use ggplot_build() to get back the rendered data and draw a geom_line(), based on the existing points data, grouped by colour:
p <- ggplot(growth) # move the aes() into the individual charts
pd<-position_dodge(.9) # leave dodge as is
se<-0.01 # faked this
p <- p +
geom_point(aes(x=factor(class),y=value,colour=treatment,group=variable),position=pd,size=4) +
theme_bw() + theme(legend.position=c(1,1),legend.justification=c(1,1)) +
geom_errorbar(aes(x=factor(class),ymin=value-se,ymax=value+se,colour=treatment,group=variable),position=pd,width=.1,colour="black")
b<-ggplot_build(p)$data[[1]] # get the ggplot rendered data for this panel
p + geom_line(data=b,aes(x,y,group=colour), color=b$colour) # plot the lines

Errorbars look like pointrange (ggplot2)

I have the following data frame:
> df <- read.table("throughputOverallSummary.txt", header = TRUE)
> df
ExperimentID clients connections msgSize Mean Deviation Error
1 77 100 50 1999 142.56427 8.368127 0.4710121
2 78 200 50 1999 284.22705 13.575943 0.3832827
3 79 400 50 1999 477.48997 44.820831 0.7538666
4 80 600 50 1999 486.87102 49.916391 0.8240869
5 81 800 50 1999 488.84899 51.422070 0.8462216
6 82 10 50 1999 15.23667 1.995150 1.0498722
7 83 50 50 1999 71.94000 5.197893 0.5793057
and some code that processes the dataframe df above:
msg_1999 = subset(df, df$msgSize == 1999)
if (nrow(msg_1999) > 0) {
limits = aes(ymax = msg_1999$Mean + msg_1999$Deviation, ymin = msg_1999$Mean -
msg_1999$Deviation)
ggplot(data = msg_1999, aes(clients, Mean, color = as.factor(connections), group =
as.factor(connections))) +
geom_point() + geom_line() +
geom_errorbar(limits, width = 0.25) +
xlab("Number of Clients") +
ylab("Throughput (in messages/second)") +
labs(title = "Message size 1999 bytes", color = "Connections")
ggsave(file = "throughputMessageSize1999.png")
}
My problem is that the error bars in the plot look like pointrange. The horizontal bars at the upper and lower end of the error bars are missing.
Ideally, the error bars should have looked something like this:
Why do errorbars from my code look different?
The width parameter has the same scale as x. You have given width = 0.25, whereas the range of the x axis is 0-800. A bar with width 0.25 is not going to be visible on this graph. If you don't set the width value, then something reasonably sensible is guessed.
ggplot(data = df, aes(clients, Mean, color = as.factor(connections), group =
as.factor(connections))) +
geom_point() + geom_line() +
geom_errorbar(aes(ymax = Mean + Deviation, ymin=Mean-Deviation)) +
xlab("Number of Clients") +
ylab("Throughput (in messages/second)") +
labs(title = "Message size 1999 bytes", color = "Connections")
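If you would rather control the cap width than rely on the default, one option is to derive it from the range of the x variable. The sketch below rebuilds the relevant columns from the table in the question; the 2% factor and the name bar_width are arbitrary choices for illustration:

```r
library(ggplot2)

# Columns copied from the data frame printed in the question
df <- data.frame(
  clients     = c(100, 200, 400, 600, 800, 10, 50),
  connections = 50,
  Mean        = c(142.56427, 284.22705, 477.48997, 486.87102,
                  488.84899, 15.23667, 71.94000),
  Deviation   = c(8.368127, 13.575943, 44.820831, 49.916391,
                  51.422070, 1.995150, 5.197893)
)

# Cap width in x-axis units: 2% of the span of 'clients' (10 to 800)
bar_width <- 0.02 * diff(range(df$clients))
print(bar_width)
# [1] 15.8

p <- ggplot(df, aes(clients, Mean, color = as.factor(connections),
                    group = as.factor(connections))) +
  geom_point() + geom_line() +
  geom_errorbar(aes(ymax = Mean + Deviation, ymin = Mean - Deviation),
                width = bar_width)
p
```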
Note that if you want to predefine your mapping argument, you should still specify the variables as you would within a call to geom_xxxx. aes (and ggplot) does some fancy footwork to ensure that this will be evaluated within the correct environment at the time of plotting.
Thus the following will work
limits <- aes(ymax = Mean + Deviation, ymin=Mean-Deviation)
ggplot(data = df, aes(clients, Mean, color = as.factor(connections), group =
as.factor(connections))) +
geom_point() + geom_line() +
geom_errorbar(limits) +
xlab("Number of Clients") +
ylab("Throughput (in messages/second)") +
labs(title = "Message size 1999 bytes", color = "Connections")
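One consequence of writing the mapping with bare column names is that the same limits object can be reused with different data frames, because the expressions are only evaluated against whatever data the layer receives. A small sketch, where small and large are made-up subsets of the rows shown in the question:

```r
library(ggplot2)

# No data frame is named here; Mean and Deviation are looked up at plot time
limits <- aes(ymax = Mean + Deviation, ymin = Mean - Deviation)

small <- data.frame(clients = c(10, 50),
                    Mean = c(15.23667, 71.94000),
                    Deviation = c(1.995150, 5.197893))
large <- data.frame(clients = c(600, 800),
                    Mean = c(486.87102, 488.84899),
                    Deviation = c(49.916391, 51.422070))

p_small <- ggplot(small, aes(clients, Mean)) + geom_point() + geom_errorbar(limits)
p_large <- ggplot(large, aes(clients, Mean)) + geom_point() + geom_errorbar(limits)

# layer_data() returns the values computed for a layer (here layer 2, the error bars)
print(layer_data(p_small, 2)$ymax)
print(layer_data(p_large, 2)$ymax)
```

With msg_1999$Mean written into aes(), the mapping would stay tied to that one data frame regardless of the data argument.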
