ANOVA significance visualisation of replicate experiment data (ggplot) - r

I am struggling to get significance values for my experiment replicate data. The experiment was done in duplicate for each species, and I want to compare how significant the differences are at each time point between species. I am trying to do a two-way ANOVA...
library(ggplot2)
library(reshape)
library(dplyr)
library(tidyr)  # provides separate()

abs2.melt <- melt(abs2,
                  id.vars = 'Time',
                  measure.vars = c('WT', 'WT.1', 'DsigB', 'DsigB.1', 'DrsbR', 'DrsbR.1'))
print(abs2.melt)

abs2.melt.mod <- abs2.melt %>%
  separate(col = variable, into = c('Species'), sep = '\\.')
print(abs2.melt.mod)
ggplot(abs2.melt.mod, aes(x = Time, y = value, group = Species)) +
  stat_summary(fun = mean, geom = "line", aes(color = Species)) +
  stat_summary(fun = mean, geom = "point") +
  stat_summary(fun.data = mean_cl_boot, geom = 'errorbar', width = 2) +
  theme_bw() +
  xlab("Time") +
  ylab("OD600") +
  labs(title = "Growth Curve of Mutant Strains")
summary(abs2.melt.mod)
print(abs2.melt.mod)

### SD and mean values
species.summary <- abs2.melt.mod %>%
  group_by(Species, Time) %>%
  summarize(mean.val = mean(value), sd.val = sd(value))
anova1 <- aov(value ~ Species, data = abs2.melt.mod)

## statistical significance?
print(species.summary)
anova1 <- aov(Time ~ Species + value, data = abs2.melt.mod)
summary(anova1)

First, simulate something that looks like your data:
set.seed(111)
df <- expand.grid(rep = 1:3, Time = 1:5, Species = letters[1:3])
df$value <- 0.5 * df$Time + rnorm(nrow(df))
df$Time <- factor(df$Time)
Then we plot, allowing comparison for each time point:
library(ggplot2)
ggplot(df, aes(x = Time, y = value, col = Species)) +
  stat_summary(fun.data = "mean_sdl", position = position_dodge(width = 0.5))
Or with error bars, which I think look bad:
ggplot(df, aes(x = Time, y = value, col = Species)) +
  stat_summary(fun.data = "mean_sdl", position = position_dodge(width = 0.5),
               geom = "errorbar", width = 0.4)
Since you only have a few data points per group, there is no point in doing a boxplot, so you can try something like the above.
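For the significance question itself, here is a minimal sketch of a two-way ANOVA on the simulated df (my assumption: value is the response and Time is treated as a factor); with your own data you would use abs2.melt.mod and value ~ Species * Time instead:
# Two-way ANOVA with interaction: does the Species effect differ across time points?
anova2 <- aov(value ~ Species * Time, data = df)
summary(anova2)

# Pairwise comparisons, overall and within each time point
TukeyHSD(anova2, which = "Species")
TukeyHSD(anova2, which = "Species:Time")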

Related

I have 7 different data points for virus concentration collected at 3 different time points. How do I graph this with error bars in R?

I collected seven different samples containing varying concentrations of the Kunjin Virus.
3 samples are from the 24 hour time point: 667, 1330, 1670
2 samples are from the 48 hour time point: 323000, 590000
2 samples are from the 72 hour time point: 3430000, 4670000
How do I create a dotplot reflecting this data including error bars in R? I'm using ggplot2.
My code so far is:
data1 <- data.frame(hours, titer)
ggplot(data1, aes(x = hours, y = titer, colour = hours)) + geom_point()
I would suggest the next approach. If you want error bars, you can compute them from the mean and standard deviation. The following code sketches how to do that. I have used one standard deviation, but you can set any other value. Also, as you want to see the different samples, I have used facet_wrap(). Here is the code:
library(ggplot2)
library(dplyr)

# Data
df <- data.frame(sample = c(rep('24 hour', 3), rep('48 hour', 2), rep('72 hour', 2)),
                 titer = c(667, 1330, 1670, 323000, 590000, 3430000, 4670000),
                 stringsAsFactors = FALSE)

# Compute error bars
df <- df %>% group_by(sample) %>% mutate(Mean = mean(titer), SD = sd(titer))

# Plot
ggplot(df, aes(x = sample, y = titer, color = sample, group = sample)) +
  geom_errorbar(aes(ymin = Mean - SD, ymax = Mean + SD), color = 'black') +
  geom_point() +
  scale_y_continuous(labels = scales::comma) +
  facet_wrap(. ~ sample, scales = 'free')
Output:
If you have a common y-axis scale, you can try this:
# Plot 2
ggplot(df, aes(x = sample, y = titer, color = sample, group = sample)) +
  geom_errorbar(aes(ymin = Mean - SD, ymax = Mean + SD), color = 'black') +
  geom_point() +
  scale_y_continuous(labels = scales::comma) +
  facet_wrap(. ~ sample, scales = 'free_x')
Output:
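As an alternative sketch using the same df (no pre-computed Mean/SD columns needed), stat_summary() can compute mean ± 1 SD on the fly; note that ggplot2's mean_sdl() wraps Hmisc::smean.sdl(), so Hmisc must be installed:
ggplot(df, aes(x = sample, y = titer, color = sample)) +
  stat_summary(fun.data = mean_sdl, fun.args = list(mult = 1),
               geom = "errorbar", color = 'black', width = 0.2) +
  geom_point() +
  scale_y_continuous(labels = scales::comma) +
  facet_wrap(. ~ sample, scales = 'free')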

Best Way to Plot three vectors in R?

I have a vector of length 10k for each of the variables x and z. For each of the 10k, I have also estimated propensity scores using logit and other methods. So I have another vector that contains the predicted propensity scores.
I want to plot predicted propensity vector as the height of the 3d graph and as a function of the x and z vectors (I want something like a surface). What is the best way to go about doing this? I tried using scatter3d() from the plot3d library and it looks very bad.
Sample data: https://www.dropbox.com/s/1lf36dpxvebd7kw/mydata2.csv?dl=0
Updated Answer
Using the data you provided, we can bin the data, get the average propensity score by bin and plot using geom_tile. I provide code for that below. A better option would be to fit the propensity score model using the x and z vectors (and the binary treatment variable that you're predicting). Then, create a new data frame of predicted pz_p values on a complete grid of x and z values and plot that. I don't have your binary treatment variable with which to fit the model, so I haven't produced an actual plot, but the code would look something like this:
# Propensity score model
m1 <- glm(treat ~ x + z, data = dat, family = binomial)

# Get propensity scores on a full grid of x and z values
n <- 100  # Number of grid points. Adjust as needed.
pred.dat <- expand.grid(x = seq(min(dat$x), max(dat$x), length = n),
                        z = seq(min(dat$z), max(dat$z), length = n))
pred.dat$pz_p <- predict(m1, newdata = pred.dat, type = "response")

ggplot(pred.dat, aes(x, z, fill = pz_p)) +
  geom_tile() +
  scale_fill_gradient2(low = "red", mid = "white", high = "blue", midpoint = 0.5, limits = c(0, 1))
Code for tile plot with binned data:
library(tidyverse)
theme_set(theme_classic())

dat <- read_csv("mydata2.csv")

# Bin by x and z
dat <- dat %>%
  mutate(xbin = cut(x, breaks = seq(round(min(x), 1) - 0.05, round(max(x), 1) + 0.05, 0.1),
                    labels = seq(round(min(x), 1), round(max(x), 1), 0.1)),
         xbin = as.numeric(as.character(xbin)),
         zbin = cut(z, breaks = seq(round(min(z), 1) - 0.1, round(max(z), 1) + 0.1, 0.2),
                    labels = seq(round(min(z), 1), round(max(z), 1), 0.2)),
         zbin = as.numeric(as.character(zbin)))

# Calculate average pz_p by bin and then plot
ggplot(dat %>% group_by(xbin, zbin) %>% summarise(pz_p = mean(pz_p)),
       aes(xbin, zbin, fill = pz_p)) +
  geom_tile() +
  scale_fill_gradient2(low = "red", mid = "white", high = "blue", midpoint = 0.5, limits = c(0, 1))
Original Answer
A heat map might work well here. For example:
library(ggplot2)

# Fake data
set.seed(2)
dat <- expand.grid(x = seq(0, 10, length = 100),
                   z = seq(0, 10, length = 100))
dat$ps <- 1 / (1 + exp(0.3 + 0.2 * dat$x - 0.5 * dat$z))

ggplot(dat, aes(x, z, fill = ps)) +
  geom_tile() +
  scale_fill_gradient2(low = "red", mid = "white", high = "blue", midpoint = 0.5, limits = c(0, 1)) +
  coord_equal()
Or in 3D with rgl::persp3d:
library(rgl)
library(tidyverse)

x <- unique(sort(dat$x))
z <- unique(sort(dat$z))
ps <- dat %>% spread(z, ps) %>% select(-1) %>% as.matrix
persp3d(x, z, ps, col = "lightblue")
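A hedged sketch, assuming you also have the model-based pred.dat grid (and grid size n) from the updated answer above: the predictions can be reshaped into an n-by-n matrix and passed to persp3d in the same way.
# expand.grid() varies x fastest, so filling the matrix column-wise puts x on rows and z on columns
ps_pred <- matrix(pred.dat$pz_p, nrow = n, ncol = n)
persp3d(unique(pred.dat$x), unique(pred.dat$z), ps_pred, col = "lightblue")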

bacterial growth curve (logistic/sigmoid) with multiple explanatory variables in R

Goal: I want to obtain regression (ggplot curves and model parameters) for growth curves with multiple treatments.
I have data for bacterial cultures C={a,b,c,d} growing on nutrient sources N={x,y}.
Their idealized growth curves (measuring turbidity of cell culture every hour) look something like this:
There are 8 different curves for which I need to obtain coefficients and fitted curves. How can I do it in one go for my data frame, feeding the different treatments as different groups into the nonlinear regression?
Thanks!!!
This question is similar to an unanswered question posted here.
(Source code for the idealized data; sorry it's not elegant, as I'm not a computer scientist.)
# Logistic-map series for each culture, with growth rates 1.3-1.6
logistic_series <- function(r, n = 20, x0 = 0.01) {
  x <- numeric(n)
  x[1] <- x0
  for (i in 1:(n - 1)) {
    x[i + 1] <- r * x[i] * (1 - x[i])
  }
  x
}
sub.data <- sapply(c(a = 1.3, b = 1.4, c = 1.5, d = 1.6), logistic_series)
require(reshape2)
data <- melt(sub.data, value.name = "OD600")
data$nutrition <- rep(c("x", "y"), each = 5, times = 4)
colnames(data)[1:2] <- c("Time", "Culture")

ggplot(data, aes(x = Time, y = OD600, color = Culture, group = nutrition)) +
  theme_bw() + xlab("Time/hr") + ylab("OD600") +
  geom_point() + facet_wrap(~nutrition, scales = "free")
If you are familiar with the group_by() function from dplyr (included in the tidyverse), then you can group your data by Culture and nutrition and create models for each group using broom. I think this vignette gets at exactly what you are trying to accomplish. Here is the code all in one go:
library(tidyverse)
library(broom)
library(mgcv)  # For the gam model

data %>%
  group_by(Culture, nutrition) %>%
  do(fit = gam(OD600 ~ s(Time), data = ., family = gaussian())) %>%  # Change this to whatever model you want (e.g., nonlinear regression, sigmoid)
  # do(fit = lm(OD600 ~ Time, data = .)) %>%  # Example using linear regression
  augment(fit) %>%
  ggplot(aes(x = Time, y = OD600, color = Culture)) +  # No need to group by nutrition because that is broken out by facet_wrap
  theme_bw() + xlab("Time/hr") + ylab("OD600") +
  geom_point() + facet_wrap(~nutrition, scales = "free") +
  geom_line(aes(y = .fitted, group = Culture))
If you are OK without doing it in one go, break the %>% pipeline apart for better understanding. I used a GAM, which overfits here, but you could replace it with whatever model you want, including a sigmoid.
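If you want actual sigmoid parameters rather than a GAM, here is a minimal sketch using the self-starting logistic model SSlogis() from the stats package, fitted per group with nls() (an assumption on my part: convergence depends on your real growth data, and the idealized logistic-map data above may not converge):
library(dplyr)
library(broom)

# One 3-parameter logistic fit (Asym, xmid, scal) per Culture/nutrition combination
logistic_fits <- data %>%
  group_by(Culture, nutrition) %>%
  do(tidy(nls(OD600 ~ SSlogis(Time, Asym, xmid, scal), data = .)))
logistic_fits  # estimated coefficients for each of the 8 curves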

Proper display of confidence interval in R using ggplot

I'm trying to make a plot that will represent 2 measurements (prr and ebgm) for different adverse reactions of different drugs, grouped by age category, like so:
library(ggplot2)
strata <- factor(c("Neonates", "Infants", "Children", "Adolescents", "Pediatrics"),
                 levels = c("Neonates", "Infants", "Children", "Adolescents", "Pediatrics"),
                 ordered = TRUE)
Data <- data.frame(
  strata = sample(strata, 200, replace = TRUE),
  drug = sample(c("ibuprofen", "clarithromycin", "fluticasone"), 200, replace = TRUE),  # 20 drugs
  reaction = sample(c("Liver Injury", "Sepsis", "Acute renal failure", "Anaphylaxis"), 200, replace = TRUE),
  measurement = sample(c("prr", "EBGM"), 200, replace = TRUE),
  value_measurement = sample(runif(16), 200, replace = TRUE),
  lower_CI = sample(runif(6), 200, replace = TRUE),
  upper_CI = sample(runif(5), 200, replace = TRUE)
)
g <- ggplot(Data, aes(x = strata, y = value_measurement, fill = measurement, group = measurement)) +
  geom_histogram(stat = "identity", position = "dodge") +
  facet_wrap(~reaction) +
  geom_errorbar(aes(x = strata, ymax = upper_CI, ymin = lower_CI), position = "dodge", stat = "identity")
ggsave(file = "meh.png", plot = g)
The upper and lower CI are the confidence interval limits of the measurement. Given that I have a confidence interval for each measurement, I want each bar to carry its corresponding confidence interval, but what I get is as follows.
Graph:
Any ideas how to place those nasty conf intervals properly? Thank you!
Later edit: in the original data, for a given drug I have many rows, each containing an adverse reaction and the age category, and each of these combinations has 2 measurements (prr or EBGM) with the corresponding confidence interval. This is not reflected in the data simulation.
The problem is that each of your bars is really multiple bars plotted over each other, because you have more than one row of data for each combination of reaction, strata, and measurement. (You're getting multiple error bars for the same reason.)
You can see this in the code below, where I've changed geom_histogram to geom_bar and added alpha=0.3 and colour="grey40" to show the multiple overlapping bars. I've also commented out the error bars.
ggplot(Data, aes(x = strata, y = value_measurement, fill = measurement, group = measurement)) +
  geom_bar(stat = "identity", position = "dodge", alpha = 0.3, colour = "grey40") +
  facet_wrap(~reaction) #+
  # geom_errorbar(aes(x = strata, ymax = upper_CI, ymin = lower_CI),
  #               position = "dodge", stat = "identity")
You can fix this by adding another column to your data that adds a grouping category by which you can separate these bars. For example, in the code below we add a new column called count that just assigns numbers 1 through n for each row of data within each combination of reaction and strata. We sort by measurement so that each measurement type will be kept together in the count sequence.
library(dplyr)
Data <- Data %>%
  group_by(reaction, strata) %>%
  arrange(measurement) %>%
  mutate(count = 1:n())
Now plot the data:
ggplot(Data, aes(x = strata, y = value_measurement,
                 fill = measurement, group = count)) +
  geom_bar(stat = "identity", position = position_dodge(0.7), width = 0.6) +
  facet_wrap(~reaction, ncol = 1) +
  geom_errorbar(aes(x = strata, ymax = upper_CI, ymin = lower_CI, group = count),
                position = position_dodge(0.7), stat = "identity", width = 0.3)
Now you can see the separate bars, along with their error bars (which are weird, but only because they're fake data).
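If, as your edit suggests, the real data have exactly one row per strata/reaction/measurement combination, another option is to aggregate the simulated duplicates before plotting (a sketch, assuming simple averaging of the duplicated rows is acceptable here):
library(dplyr)
Data_agg <- Data %>%
  group_by(reaction, strata, measurement) %>%
  summarise(value_measurement = mean(value_measurement),
            lower_CI = mean(lower_CI),
            upper_CI = mean(upper_CI))

ggplot(Data_agg, aes(x = strata, y = value_measurement, fill = measurement)) +
  geom_col(position = position_dodge(0.7), width = 0.6) +
  facet_wrap(~reaction, ncol = 1) +
  geom_errorbar(aes(ymin = lower_CI, ymax = upper_CI),
                position = position_dodge(0.7), width = 0.3)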

Three-way graph (variable, mean, sd) with ggplot2

I think I have an error in my logic while reproducing a graph I found in this pdf here.
It should be fairly easy to do, but I am having trouble plotting a variable together with its mean and standard deviation, each in their own panel, as can be seen in the example graph below. Did they do it with facet_grid() or facet_wrap()?
How can I plot an arbitrary variable in that way? In particular, I would not know how to plot the mean and sd over distance (or time).
Example graph:
Here's my approach to the solution outlined by @DavidArenburg (though I simplified the data a little, using simple cumulative statistics and a plain index):
library(tidyr)
library(dplyr)
library(TTR)
library(ggplot2)

v <- rnorm(1000)
df <- data.frame(index = 1:1000,
                 variable = v,
                 mean = runMean(v, n = 1, cumulative = TRUE),
                 sd = runSD(v, n = 1, cumulative = TRUE))

dd <- gather(df, facet, value, -index)

ggplot(dd, aes(x = index, y = value)) +
  geom_path() +
  facet_grid(facet ~ .)
Bonus: this also illustrates that the cumulative sample mean and SD converge to the population values (0 and 1, respectively).
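If you prefer not to depend on TTR, here is a base-R sketch of the same cumulative statistics (the first SD value is NaN, since a single observation has no sample SD):
n <- seq_along(v)
cum_mean <- cumsum(v) / n                                   # running mean
cum_sd <- sqrt((cumsum(v^2) - n * cum_mean^2) / (n - 1))    # running sample SD
df <- data.frame(index = n, variable = v, mean = cum_mean, sd = cum_sd)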
