Goal: I want to obtain regression (ggplot curves and model parameters) for growth curves with multiple treatments.
I have data for bacterial cultures C={a,b,c,d} growing on nutrient sources N={x,y}.
Their idealized growth curves (measuring turbidity of cell culture every hour) look something like this:
There are 8 different curves to obtain coefficients and curves for. How can I do it in one go for my data frame, feeding the different treatments as different groups for the nonlinear regression?
Thanks!!!
This question is similar to an unanswered question posted here.
(sourcecode for idealized data, sorry it's not elegant as I'm not a computer scientist):
a<-1:20
a[1]<-0.01
for(i in c(1:19)){
a[i+1]<-1.3*a[i]*(1-a[i])
}
b<-1:20
b[1]<-0.01
for(i in c(1:19)){
b[i+1]<-1.4*b[i]*(1-b[i])
}
c<-1:20
c[1]<-0.01
for(i in c(1:19)){
c[i+1]<-1.5*c[i]*(1-c[i])
}
d<-1:20
d[1]<-0.01
for(i in c(1:19)){
d[i+1]<-1.6*d[i]*(1-d[i])
}
sub.data<-cbind(a,b,c,d)
library(reshape2) # for melt()
library(ggplot2)  # for the plot below
data<-melt(sub.data, value.name = "OD600")
data$nutrition<-rep(c("x", "y"), each=5, times=4)
colnames(data)[1:2]<-c("Time", "Culture")
ggplot(data, aes(x = Time, y = OD600, color = Culture, group=nutrition)) +
theme_bw() + xlab("Time/hr") + ylab("OD600") +
geom_point() + facet_wrap(~nutrition, scales = "free")
If you are familiar with the group_by function from dplyr (included in tidyverse), you can group your data by Culture and nutrition and fit a model for each group using broom. I think this vignette is getting at exactly what you are trying to accomplish. Here is the code all in one go:
library(tidyverse)
library(broom)
library(mgcv) #For the gam model
data %>%
group_by(Culture, nutrition) %>%
do(fit = gam(OD600 ~ s(Time), data = ., family=gaussian())) %>% # Change this to whatever model you want (e.g., non-linear regression, sigmoid)
#do(fit = lm(OD600 ~ Time, data = .)) %>% # Example using linear regression
augment(fit) %>%
ggplot(aes(x = Time, y = OD600, color = Culture)) + # No need to group by nutrition because that is broken out in the facet_wrap
theme_bw() + xlab("Time/hr") + ylab("OD600") +
geom_point() + facet_wrap(~nutrition, scales = "free") +
geom_line(aes(y = .fitted, group = Culture))
If you don't need it all in one go, break the pipeline apart at each %>% to see what every step does. I used a GAM, which overfits here, but you can replace it with whatever model you want, including a sigmoid.
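Since the question asks for model parameters from a nonlinear fit, here is a minimal base-R sketch of the same per-group idea using nls with the self-starting logistic model SSlogis. It runs on simulated noisy logistic data, not on the data above (nls is known to struggle with noise-free artificial data), and the names grid, fit_one and params are made up for illustration. It also avoids the do() + augment() pattern, which as far as I know behaves differently in later broom releases.

```r
# Simulated logistic growth data (hypothetical values, not the asker's data)
set.seed(1)
grid <- expand.grid(Time = 1:20,
                    Culture = c("a", "b"),
                    nutrition = c("x", "y"),
                    stringsAsFactors = FALSE)
mid <- ifelse(grid$Culture == "a", 8, 12) # assumed midpoint per culture
grid$OD600 <- 1 / (1 + exp((mid - grid$Time) / 2)) +
  rnorm(nrow(grid), sd = 0.02)

# Fit one self-starting logistic model and return its coefficients;
# return NAs instead of stopping if a fit does not converge
fit_one <- function(d) {
  tryCatch(
    coef(nls(OD600 ~ SSlogis(Time, Asym, xmid, scal), data = d)),
    error = function(e) c(Asym = NA_real_, xmid = NA_real_, scal = NA_real_)
  )
}

# One data frame per Culture x nutrition combination, then one fit each
groups <- split(grid, list(grid$Culture, grid$nutrition), drop = TRUE)
params <- as.data.frame(do.call(rbind, lapply(groups, fit_one)))
params$group <- rownames(params)
print(params)
```

Each row of params holds the asymptote (Asym), midpoint (xmid) and scale (scal) for one treatment combination. To draw the curves, predict from each fit over a grid of Time values and pass the result to geom_line.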
Related
I have data saved in multiple datasets, each consisting of four variables. Imagine something like a data.table dt consisting of the variables Country, Male/Female, Birthyear, Weighted Average Income. I would like to create a graph where you see only one country's weighted average income by birthyear and split by male/female. I've used the facet_grid() function to get a grid of graphs for all countries as below.
ggplot() +
geom_line(data = dt,
aes(x = Birthyear,
y = Weighted Average Income,
colour = 'Weighted Average Income'))+
facet_grid(Country ~ Male/Female)
However, I've tried isolating the graphs for just one country, but the below code doesn't seem to work. How can I subset the data correctly?
ggplot() +
geom_line(data = dt[Country == 'Germany'],
aes(x = Birthyear,
y = Weighted Average Income,
colour = 'Weighted Average Income'))+
facet_grid(Country ~ Male/Female)
For your specific case the problem is that you are not quoting Male/Female and Weighted Average Income. Column names that contain spaces or slashes are not valid R names, so they must be wrapped in backticks. Also, your data and basic aesthetics should likely be part of ggplot and not geom_line. Putting them in geom_line isolates them to that single layer, and you would have to repeat them in every layer you add to the plot, for example geom_smooth.
So to fix your problem you could do:
library(tidyverse)
# Backticks quote the non-syntactic column names
# (inside aes() you could use !!sym("Weighted Average Income") instead)
plot <- ggplot(data = dt[Country == 'Germany'],
               aes(x = Birthyear,
                   y = `Weighted Average Income`,
                   col = `Weighted Average Income`)) +
  geom_line() +
  facet_grid(Country ~ `Male/Female`)
plot
Now ggplot2 actually has a (lesser known) builtin functionality for changing your data, so if you wanted to compare this to the plot with all of your countries included you could do:
plot %+% dt # `%+%` is used to change the data used by one or more layers. See help("+.gg")
I am struggling to get significance values for my replicate experiment data. The experiment was done in duplicate for each species, and I want to test how significantly the values differ between species at each time point. I am trying to do a two-way ANOVA...
library(ggplot2)
library(reshape)
library(dplyr)
library(tidyr) # for separate()
abs2.melt<-melt(abs2,
id.vars='Time',
measure.vars=c('WT','WT.1','DsigB','DsigB.1','DrsbR','DrsbR.1'))
print(abs2.melt)
abs2.melt.mod<-abs2.melt %>%
separate(col=variable,into=c('Species'),sep='\\.')
print(abs2.melt.mod)
ggplot(abs2.melt.mod,aes(x=Time,y=value,group=Species))+
stat_summary(
fun =mean,
geom="line",
aes(color=Species))+
stat_summary(
fun=mean,
geom="point")+
stat_summary(
fun.data=mean_cl_boot,
geom='errorbar',
width=2)+
theme_bw()+
xlab("Time")+
ylab("OD600")+
labs(title="Growth Curve of Mutant Strains")
summary(abs2.melt.mod)
print(abs2.melt.mod)
###SD and mean values
summary.df<-abs2.melt.mod %>% group_by(Species,Time) %>% # renamed so it does not mask base::as.data.frame
summarize(mean.val=mean(value), sd.val=sd(value))
anova1<-aov(value~Species,data=abs2.melt.mod)
##statistical significance?
print(summary.df)
anova1<-aov(Time~Species+value,data=abs2.melt.mod)
summary(anova1)
First, simulate something that looks like your data:
set.seed(111)
df = expand.grid(rep=1:3,Time=1:5,Species=letters[1:3])
df$value = 0.5*df$Time + rnorm(nrow(df))
df$Time = factor(df$Time)
Then we plot, allowing comparison for each time point:
library(ggplot2)
ggplot(df,aes(x=Time,y=value,col=Species)) +
stat_summary(fun.data="mean_sdl",position=position_dodge(width=0.5))
Or with plain error bars, which I think look worse:
ggplot(df,aes(x=Time,y=value,col=Species))+
stat_summary(fun.data="mean_sdl",position=position_dodge(width=0.5),
geom="errorbar",width=0.4)
Since you have only a few data points per group, there is no point in drawing a boxplot, so you can try something like the above. Note that mean_sdl needs the Hmisc package to be installed.
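The plots above do not give significance values by themselves. As a rough sketch of the two-way ANOVA the question mentions, assuming the simulated df from above (re-created here so the block runs on its own) and treating Time as a factor, you could model value by Species, Time and their interaction. Whether a plain aov is appropriate for repeated growth measurements is a separate question, so treat this only as a starting point.

```r
# Same simulated data as above
set.seed(111)
df <- expand.grid(rep = 1:3, Time = 1:5, Species = letters[1:3])
df$value <- 0.5 * df$Time + rnorm(nrow(df))
df$Time <- factor(df$Time)

# Two-way ANOVA: main effects of Species and Time plus their interaction
fit <- aov(value ~ Species * Time, data = df)
print(summary(fit))

# Pairwise comparisons between species (Tukey's honest significant differences)
print(TukeyHSD(fit, which = "Species"))
```

The Species:Time row of the summary indicates whether the species differ in how they change over time. With your own data, replace df with abs2.melt.mod and convert Time to a factor first.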
From a bootstrapping model I have 1000 sets of coefficients for this regression model:
y = b0 + b1x + b2(x^2)
What is the function call to plot a quadratic line if I already have the coefficients? That is, I do not want to "fit" a linear model to my data.
I tried adding lines via a for loop to my ggplot object:
for (i in 1:1000) {
reg_line <- stat_function(fun=function(x) quad$coefficients[1] +
quad$coefficients[i,2]*x + quad$coefficients[i,3]*(x**2))
reg_lines <- reg_lines + reg_line}
That didn't work - it seems to only add the last line in the loop.
The reason I want to add 1000 regression lines to my plot is because it is for a homework problem - I am well aware this is not a common use case.
There may be other ways to do this, but hopefully this can give you some ideas. I used the mtcars dataset and generated some bootstrap samples for modelling. You can skip this step.
library(ggplot2)
library(tidyr)
library(dplyr)
data(mtcars)
drat=seq(min(mtcars$drat), max(mtcars$drat), length.out=100)
# Bootstrap function
bs <- function() {
df = mtcars[sample(1:nrow(mtcars), replace=TRUE),]
lm_fit <- lm(mpg ~ drat+I(drat^2), data=df)
data.frame(Model=predict(lm_fit, newdata=data.frame(drat))) # Replace with your own
}
foo <- replicate(10, bs()) # Simulate
You would start from here since you should already have a data frame or list of predicted values from your 1,000 bootstrap models. Reshape it into a very long form to create a grouping column for the geom_line function.
foo_long <- data.frame(foo, drat) %>%
pivot_longer(cols=-drat, names_to="Model", values_to="mpg")
ggplot(data = mtcars, aes(x = drat, y = mpg)) +
geom_point(color='blue') +
geom_line(data = foo_long, aes(x=drat, y=mpg, group=Model, color=Model)) +
guides(color = "none") # color = FALSE is deprecated in recent ggplot2 versions
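If you already have the coefficients and do not want to refit anything, another option is to compute every curve directly from them. The sketch below assumes a coefficient matrix with one row per bootstrap replicate and columns b0, b1, b2. The matrix coefs is made up here as a stand-in for quad$coefficients. Multiplying a design matrix of 1, x and x^2 by the transposed coefficients gives all predicted curves at once. The loop in the question most likely fails because each function only looks up i when the plot is drawn, by which time i is already 1000.

```r
set.seed(42)
# Hypothetical coefficients: 1000 replicates of (b0, b1, b2)
coefs <- cbind(b0 = rnorm(1000, 1, 0.1),
               b1 = rnorm(1000, 0.5, 0.05),
               b2 = rnorm(1000, -0.02, 0.005))

x <- seq(0, 10, length.out = 50)
X <- cbind(1, x, x^2)  # design matrix: 50 rows, 3 columns
pred <- X %*% t(coefs) # 50 x 1000: one column per replicate

# Long format: as.vector() stacks the columns, so each model's 50 values
# stay together and x repeats once per model
curves <- data.frame(x = rep(x, times = ncol(pred)),
                     y = as.vector(pred),
                     model = rep(seq_len(ncol(pred)), each = length(x)))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(curves, aes(x = x, y = y, group = model)) +
    geom_line(alpha = 0.05) # transparency keeps 1000 lines readable
  print(p)
}
```

The group aesthetic is what makes geom_line draw one line per replicate instead of joining all the points into a single path.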
I calculated a linear mixed model using the nlme package. I was evaluating a psychological treatment and used treatment condition and measurement point as predictors. I did post-hoc comparisons using the emmeans package. So far so good, everything worked out well and I am looking forward to finishing my thesis. There is only one problem left: I am really bad at plotting. I want to plot the estimated marginal means at the four measurement points for each group. The emmip function in emmeans does this, but I am not that happy with the result. I used the following code to generate it:
emmip(HLM_IPANAT_pos, Gruppe~TP, CIs=TRUE) + theme_bw() + labs(x = "Zeit", y = "IPANAT-PA")
I don't like the way the confidence intervals are presented. I would prefer a line plot with "normal" error bars for the confidence intervals, like the one below, which is taken from Ireland et al. (2017). I tried to do it in Excel, but could not find out how to integrate separate confidence intervals for each line. So I was wondering whether it is possible to do this with ggplot2. However, I do not know how to get the values I obtained from emmeans into ggplot. As I said, I really have no idea about plotting. Does someone know how to do it?
I think it is possible. Rather than using emmip to create the plot, you could use emmeans to get the values for ggplot2. With ggplot2 and the data, you might be able to better control the format of the plot. Since I do not have your data, I can only suggest a few steps.
First, after fitting the model HLM_IPANAT_pos, get values using emmeans. Second, broom::tidy this object. Third, ggplot the above broom::tidy object.
Using mtcars data as an example:
library(emmeans)
# mtcars data
mtcars$cyl = as.factor(mtcars$cyl)
# Model
mymodel <- lm(mpg ~ cyl * am, data = mtcars)
# using ggplot2
library(tidyverse)
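broom::tidy(emmeans(mymodel, ~ am | cyl), conf.int = TRUE) %>% # conf.int = TRUE so that conf.low/conf.high are returned in recent broom versions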
mutate(cyl_x = as.numeric(as.character(cyl)) + 0.1*am) %>%
ggplot(aes(x = cyl_x, y = estimate, color = as.factor(am))) +
geom_point() +
geom_line() +
geom_errorbar(aes(ymin = conf.low, ymax = conf.high), width = 0.1)
Created on 2019-12-29 by the reprex package (v0.3.0)
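If you want the look of the reference figure (one line per group, with error bars dodged slightly so they don't overlap), here is a rough sketch of the same idea. It swaps emmeans for base R's predict(), which gives cell means and 95% confidence limits for this simple lm. For your nlme model you would keep using emmeans and only borrow the plotting part. The columns fit, lwr and upr come from predict(), and the names d, m, grid and ci are just for illustration.

```r
d <- mtcars
d$cyl <- factor(d$cyl)
d$am <- factor(d$am)
m <- lm(mpg ~ cyl * am, data = d)

# Every cyl x am combination with its predicted mean and 95% confidence limits
grid <- expand.grid(cyl = levels(d$cyl), am = levels(d$am))
ci <- cbind(grid, predict(m, newdata = grid, interval = "confidence"))
print(ci)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  pd <- position_dodge(width = 0.2) # shift the groups apart a little
  p <- ggplot(ci, aes(x = cyl, y = fit, color = am, group = am)) +
    geom_line(position = pd) +
    geom_point(position = pd) +
    geom_errorbar(aes(ymin = lwr, ymax = upr), width = 0.15, position = pd) +
    theme_bw()
  print(p)
}
```

Using the same position_dodge object in all three layers keeps the points, lines and error bars aligned.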
I have a vector of length 10k for each of the variables x and z. For each of the 10k, I have also estimated propensity scores using logit and other methods. So I have another vector that contains the predicted propensity scores.
I want to plot predicted propensity vector as the height of the 3d graph and as a function of the x and z vectors (I want something like a surface). What is the best way to go about doing this? I tried using scatter3d() from the plot3d library and it looks very bad.
Sample data: https://www.dropbox.com/s/1lf36dpxvebd7kw/mydata2.csv?dl=0
Updated Answer
Using the data you provided, we can bin the data, get the average propensity score by bin and plot using geom_tile. I provide code for that below. A better option would be to fit the propensity score model using the x and z vectors (and the binary treatment variable that you're predicting). Then, create a new data frame of predicted pz_p values on a complete grid of x and z values and plot that. I don't have your binary treatment variable with which to fit the model, so I haven't produced an actual plot, but the code would look something like this:
# Propensity score model
m1 = glm(treat ~ x + z, data=dat, family=binomial)
# Get propensity scores on full grid of x and z values
n = 100 # Number of grid points. Adjust as needed.
pred.dat = expand.grid(x=seq(min(dat$x),max(dat$x),length=n),
z=seq(min(dat$z),max(dat$z),length=n))
pred.dat$pz_p = predict(m1, newdata=pred.dat, type="response")
ggplot(pred.dat, aes(x, z, fill=pz_p)) +
geom_tile() +
scale_fill_gradient2(low="red", mid="white", high="blue", midpoint=0.5, limits=c(0,1))
Code for tile plot with binned data:
library(tidyverse)
theme_set(theme_classic())
dat = read_csv("mydata2.csv")
# Bin by x and z
dat = dat %>%
mutate(xbin = cut(x,breaks=seq(round(min(x),1)-0.05,round(max(x),1)+0.05,0.1),
labels=seq(round(min(x),1), round(max(x),1),0.1)),
xbin=as.numeric(as.character(xbin)),
zbin = cut(z,breaks=seq(round(min(z),1)-0.1,round(max(z),1)+0.1,0.2),
labels=seq(round(min(z),1), round(max(z),1),0.2)),
zbin=as.numeric(as.character(zbin)))
# Calculate average pz_p by bin and then plot
ggplot(dat %>% group_by(xbin, zbin) %>%
summarise(pz_p=mean(pz_p)),
aes(xbin, zbin, fill=pz_p)) +
geom_tile() +
scale_fill_gradient2(low="red", mid="white", high="blue", midpoint=0.5, limits=c(0,1))
Original Answer
A heat map might work well here. For example:
library(ggplot2)
# Fake data
set.seed(2)
dat = expand.grid(x=seq(0,10,length=100),
z=seq(0,10,length=100))
dat$ps = 1/(1 + exp(0.3 + 0.2*dat$x - 0.5*dat$z))
ggplot(dat, aes(x, z, fill=ps)) +
geom_tile() +
scale_fill_gradient2(low="red", mid="white", high="blue", midpoint=0.5, limits=c(0,1)) +
coord_equal()
Or in 3D with rgl::persp3d:
library(rgl)
library(tidyverse)
x=unique(sort(dat$x))
z=unique(sort(dat$z))
ps=dat %>% spread(z, ps) %>% select(-1) %>% as.matrix
persp3d(x, z, ps, col="lightblue")