Visualizing multiple curves in ggplot from bootstrapping and curve fitting

I have time series data that is well modeled using a sinusoidal curve. I'd like to visualize the uncertainty in the fitted model using bootstrapping.
I adapted the approach from here. I am also interested in this approach, which uses nlsBoot. I can get the first approach to run, but the resulting plot contains jagged rather than smooth curves.
library(dplyr)
library(broom)
library(ggplot2)
xdata <- c(-35.98, -34.74, -33.46, -32.04, -30.86, -29.64, -28.50, -27.29, -26.00,
-24.77, -23.57, -22.21, -21.19, -20.16, -18.77, -17.57, -16.47, -15.35,
-14.40, -13.09, -11.90, -10.47, -9.95,-8.90,-7.77,-6.80, -5.99,
-5.17, -4.21, -3.06, -2.29, -1.04)
ydata <- c(-4.425, -4.134, -5.145, -5.411, -6.711, -7.725, -8.087, -9.059, -10.657,
-11.734, NA, -12.803, -12.906, -12.460, -12.128, -11.667, -10.947, -10.294,
-9.185, -8.620, -8.025, -7.493, -6.713, -6.503, -6.316, -5.662, -5.734, -4.984,
-4.723, -4.753, -4.503, -4.200)
data <- data.frame(xdata,ydata)
bootnls_aug <- data %>% bootstrap(100) %>%
do(augment(nls(ydata ~ A*cos(2*pi*((xdata-x_0)/z))+M, ., start=list(A=4,M=-7,x_0=-10,z=30))))
ggplot(bootnls_aug, aes(xdata, ydata)) +
geom_line(aes(y=.fitted, group=replicate), alpha=.1, color="blue") +
geom_point(size=3) +
theme_bw()
(Image: ggplot output showing the jagged bootstrap curves)
Can anyone help? Why are the displayed curves not smooth? Is there a better way to implement this?

broom::augment only returns fitted values at the observed data points, so the x resolution of each curve is limited to the resolution of the data. You can instead predict from each model on a much finer grid of x values:
x_range <- seq(min(xdata), max(xdata), length.out = 1000)
fitted_boot <- data %>%
bootstrap(100) %>%
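do({
# Refit the sinusoid to this bootstrap resample
m <- nls(ydata ~ A*cos(2*pi*((xdata-x_0)/z))+M, ., start=list(A=4,M=-7,x_0=-10,z=30))
# Predict on the fine grid rather than at the observed x values
f <- predict(m, newdata = list(xdata = x_range))
data.frame(xdata = x_range, .fitted = f)
})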
ggplot(data, aes(xdata, ydata)) +
geom_line(aes(y=.fitted, group=replicate), fitted_boot, alpha=.1, color="blue") +
geom_point(size=3) +
theme_bw()
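Because broom's bootstrap() helper has been deprecated in newer broom versions (the rsample package is the suggested replacement), here is a minimal base-R sketch of the same idea; it is an alternative, not the answer's code. It resamples rows, refits the nls model, and predicts each refit on a 200-point grid. Resamples on which nls() fails to converge are skipped with tryCatch(); the seed, grid size, and the names fit_one and boot_curves are illustrative choices.

```r
# Data from the question; na.omit() drops the single missing y value
xdata <- c(-35.98, -34.74, -33.46, -32.04, -30.86, -29.64, -28.50, -27.29, -26.00,
           -24.77, -23.57, -22.21, -21.19, -20.16, -18.77, -17.57, -16.47, -15.35,
           -14.40, -13.09, -11.90, -10.47, -9.95, -8.90, -7.77, -6.80, -5.99,
           -5.17, -4.21, -3.06, -2.29, -1.04)
ydata <- c(-4.425, -4.134, -5.145, -5.411, -6.711, -7.725, -8.087, -9.059, -10.657,
           -11.734, NA, -12.803, -12.906, -12.460, -12.128, -11.667, -10.947, -10.294,
           -9.185, -8.620, -8.025, -7.493, -6.713, -6.503, -6.316, -5.662, -5.734, -4.984,
           -4.723, -4.753, -4.503, -4.200)
dat <- na.omit(data.frame(xdata, ydata))

set.seed(42)  # reproducible resamples
x_grid <- seq(min(dat$xdata), max(dat$xdata), length.out = 200)
n_boot <- 100

# Fit the sinusoid to one resample and predict on the fine grid;
# return NULL if nls() fails to converge on that resample
fit_one <- function(d) {
  fit <- tryCatch(
    nls(ydata ~ A * cos(2 * pi * ((xdata - x_0) / z)) + M, data = d,
        start = list(A = 4, M = -7, x_0 = -10, z = 30)),
    error = function(e) NULL)
  if (is.null(fit)) return(NULL)
  predict(fit, newdata = data.frame(xdata = x_grid))
}

curves <- lapply(seq_len(n_boot), function(b) {
  fit_one(dat[sample(nrow(dat), replace = TRUE), ])
})
curves <- Filter(Negate(is.null), curves)

# One column per successful replicate, one row per grid point
boot_curves <- do.call(cbind, curves)
```

For ggplot, reshape to long form, e.g. data.frame(xdata = rep(x_grid, ncol(boot_curves)), .fitted = as.vector(boot_curves), replicate = rep(seq_len(ncol(boot_curves)), each = length(x_grid))), and use it with geom_line(aes(group = replicate)) as above.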
Adding the mean and a pointwise 95% bootstrap interval takes a bit more work:
quants <- fitted_boot %>%
group_by(xdata) %>%
summarise(mean = mean(.fitted),
lower = quantile(.fitted, 0.025),
upper = quantile(.fitted, 0.975)) %>%
tidyr::gather(stat, value, -xdata)
ggplot(mapping = aes(xdata)) +
geom_line(aes(y = .fitted, group = replicate), fitted_boot, alpha=.05) +
geom_line(aes(y = value, lty = stat), col = 'red', quants, size = 1) +
geom_point(aes(y = ydata), data, size=3) +
scale_linetype_manual(values = c(lower = 2, mean = 1, upper = 2)) +
theme_bw()
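The same pointwise summary can also be written without dplyr and tidyr, as a small helper that works on a matrix of bootstrap curves (rows = grid points, columns = replicates). percentile_band() is a hypothetical name, and the toy matrix only illustrates the arithmetic; it is not output from the models above.

```r
# Summarise a matrix of bootstrap curves (rows = x grid points,
# columns = replicates) into a pointwise mean and percentile interval
percentile_band <- function(curves, x, level = 0.95) {
  alpha <- (1 - level) / 2
  data.frame(
    xdata = x,
    mean  = rowMeans(curves),
    lower = apply(curves, 1, quantile, probs = alpha, names = FALSE),
    upper = apply(curves, 1, quantile, probs = 1 - alpha, names = FALSE)
  )
}

# Toy input: 3 grid points, 5 "replicates" per point
toy <- matrix(c(1, 2, 3, 4, 5,
                2, 4, 6, 8, 10,
                0, 0, 0, 0, 0), nrow = 3, byrow = TRUE)
band <- percentile_band(toy, x = c(-30, -20, -10))
```

Note that the band is pointwise: each x value gets its own 2.5% and 97.5% percentiles, so it does not guarantee that 95% of whole curves stay inside it.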

Related

Plotting different models for different x value ranges in ggplot()

I am attempting to display a linear model for low x values and a non-linear model for higher x values. To illustrate, I will use the DNase dataset:
library(ggplot2)
# Assigning DNase to a new data frame:
data_1 <- DNase
#Creating a column that can distinguish low and high range values:
data_1$range <- ifelse(data_1$conc <5, "low", "high")
#Attempting to plot separate lines for low and high range values, and also facet_wrap by run:
ggplot(data_1, aes(x = conc, y = density, colour = range)) +
geom_point(size = 0.5) + stat_smooth(method = "nls",
method.args = list(formula = y ~ a*exp(b*x),
start = list(a = 0.8, b = 0.1)),
data = data_1,
se = FALSE) +
stat_smooth(method = 'lm', formula = 'y~0+x') +
facet_wrap(~Run)
However, it plots both the linear and the non-linear model for each range, and I can't figure out where to specify that each range should get only one model. Also, if possible, can I extend these models out to the full range of x values?
You can provide separate data to each layer. Here, subset data_1 by range so that each stat_smooth() call only sees the relevant rows, and pass the whole data frame to geom_point():
ggplot(NULL, aes(x = conc, y = density, colour = range)) +
geom_point(data = data_1, size = 0.5) +
stat_smooth(data = subset(data_1, range == "high"),
method = "nls",
formula = y ~ a*exp(b*x),
method.args = list(start = list(a = 0.8, b = 0.1)),
se = FALSE) +
stat_smooth(data = subset(data_1, range == "low"), method = 'lm', formula = y ~ 0 + x) +
facet_wrap(~Run)
To extend each fitted line across the whole x-axis, add fullrange = TRUE to the stat_smooth() calls. If you want to fit both models on all the data, calculate the fits manually in data_1 and plot them yourself.
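If you prefer to compute the fits yourself, one way (a sketch, not the answer's code) is to fit each model per Run on its own range and predict both over a common concentration grid spanning all the data. fit_run(), conc_grid and pred_all are illustrative names, and the 100-point grid is arbitrary.

```r
# Built-in DNase data; as.data.frame() drops the groupedData classes
data_1 <- as.data.frame(DNase)
data_1$range <- ifelse(data_1$conc < 5, "low", "high")

conc_grid <- seq(min(data_1$conc), max(data_1$conc), length.out = 100)

# For one Run: fit the exponential model on the "high" rows and the
# no-intercept line on the "low" rows, then predict both over the full grid
fit_run <- function(d) {
  hi <- d[d$range == "high", ]
  lo <- d[d$range == "low", ]
  m_nls <- nls(density ~ a * exp(b * conc), data = hi,
               start = list(a = 0.8, b = 0.1))
  m_lm  <- lm(density ~ 0 + conc, data = lo)
  newd  <- data.frame(conc = conc_grid)
  rbind(
    data.frame(Run = d$Run[1], conc = conc_grid, model = "nls (high)",
               density = predict(m_nls, newdata = newd)),
    data.frame(Run = d$Run[1], conc = conc_grid, model = "lm (low)",
               density = predict(m_lm, newdata = newd))
  )
}

pred_all <- do.call(rbind, lapply(split(data_1, data_1$Run), fit_run))
```

pred_all can then be drawn on top of the points with something like geom_line(data = pred_all, aes(conc, density, colour = model)) plus facet_wrap(~Run).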

How to remove variables in plot_summs?

I use plot_summs to plot my regression coefficients; below is a reproducible example. I want to do two things. First, I want to remove Frost and Murder from the graph. Second, I want to change the color of Illiteracy and Population to green, and of Life Exp and HS Grad to red. I'd appreciate any help.
library(jtools)
states <- as.data.frame(state.x77)
fit1 <- lm(Income ~ Frost + Illiteracy + Murder +
Population + Area + `Life Exp` + `HS Grad`,
data = states, weights = runif(50, 0.1, 3))
plot_summs(fit1, scale = TRUE)
Without using the jtools package:
If I'm not mistaken, plot_summs with scale = TRUE scales the independent variables and plots the coefficient estimates as points, with segments spanning plus or minus 2 SE. The same plot can be built by hand:
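states2 <- states
# Scale every predictor (but not the response, Income) to mean 0 and SD 1
predictors <- setdiff(names(states2), "Income")
states2[predictors] <- lapply(states2[predictors], function(v) as.numeric(scale(v)))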
fit2 <- lm(Income ~ ., data = states2, weights = runif(50, 0.1, 3))
df <- as.data.frame(summary(fit2)[["coefficients"]][-1,1:2])
df$variable <- rownames(df)
df <- df[!df$variable %in% c("Frost", "Murder"), ]
library(ggplot2)
ggplot(df) +
geom_point(aes(x = variable, y = Estimate,
color = variable), size = 6) +
geom_segment(aes(x = variable, xend = variable,
y = Estimate - (2 * `Std. Error`),
yend = Estimate + (2 * `Std. Error`),
color = variable), lwd = 2) +
scale_color_manual(values = c("Illiteracy" = "green","Population" = "green",
"Area" = "blue",
"`Life Exp`" = "red", "`HS Grad`" = "red")) +
coord_flip() +
theme_classic()
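To see what scaling the predictors does to the estimates, here is a small base-R check (an illustration, not part of jtools): refitting with standardized predictors and the same weights multiplies each slope by that predictor's standard deviation, while the response stays on its original scale. The seed and weights are illustrative.

```r
# Built-in state data; column names include spaces ("Life Exp", "HS Grad")
states <- as.data.frame(state.x77)
set.seed(1)
w <- runif(50, 0.1, 3)  # the same random weights for both fits

predictors <- setdiff(names(states), "Income")
scaled <- states
scaled[predictors] <- lapply(scaled[predictors], function(v) as.numeric(scale(v)))

fit_raw    <- lm(Income ~ ., data = states, weights = w)
fit_scaled <- lm(Income ~ ., data = scaled, weights = w)

# A standardized slope equals the raw slope times that predictor's SD
sds <- vapply(states[predictors], sd, numeric(1))
raw_slopes    <- coef(fit_raw)[-1]
scaled_slopes <- coef(fit_scaled)[-1]
```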

How can I add confidence intervals to a scatterplot for a regression on two variables?

I need to create an informative graphic with a regression line, the data points, and confidence intervals. I am not looking for smoothed lines. I have tried several approaches, but I can't get it right.
I am looking for something like this:
Some code I have tried:
p <- scatterplot(df.regsoft$w ~ df.regsoft$b,
data = df.regsoft,
boxplots = FALSE,
regLine = list(method=lm, col="red"),
pch = 16,
cex = 0.7,
xlab = "Fitted Values",
ylab = "Residuals",
legend = TRUE,
smooth = FALSE)
abline(coef = confint.lm(result.rs))
This doesn't produce what I want, but it is the closest I have come. Note that I set smooth = FALSE, since a smoothed line is not what I am looking for.
How can I make this plot interactive?
If you don't mind switching to ggplot and the tidyverse, this is simply geom_smooth(method = "lm"):
library(tidyverse)
d <- tibble( #random stuff
x = rnorm(100, 0, 1),
y = 0.25 * x + rnorm(100, 0, 0.25)
)
m <- lm(y ~ x, data = d) #linear model
d %>%
ggplot() +
aes(x, y) + #what to plot
geom_point() +
geom_smooth(method = "lm") +
theme_bw()
Without method = "lm", geom_smooth() falls back to a smoother (loess for fewer than 1,000 observations, otherwise a GAM) and draws a curved line.
As for the "Conf. interval (Obs 95%)" lines, they look to me like a quantile regression, which you can fit with the quantreg package.
If you want to make it interactive, you can use the plotly package:
library(plotly)
p <- d %>%
ggplot() +
aes(x, y) +
geom_point() +
geom_smooth(method = "lm") +
theme_bw()
ggplotly(p)
================================================
P.S.
I am not completely sure this is what the figure you posted shows (I think so), but to add the quantile lines I would run two quantile regressions (lower and upper) and then compute the quantile lines for your data:
library(tidyverse)
library(quantreg)
d <- tibble( #random stuff
x = rnorm(100, 0, 1),
y = 0.25 * x + rnorm(100, 0, 0.25)
)
m <- lm(y ~ x, data = d) #linear model
# 95% quantile, two tailed
rq_low <- rq(y ~ x, data = d, tau = 0.025) #lower quantile
rq_high <- rq(y ~ x, data = d, tau = 0.975) #upper quantile
d %>%
mutate(low = rq_low$coefficients[1] + x * rq_low$coefficients[2],
high = rq_high$coefficients[1] + x * rq_high$coefficients[2]) %>%
ggplot() +
geom_point(aes(x, y)) +
geom_smooth(aes(x, y), method = "lm") +
geom_line(aes(x, low), linetype = "dashed") +
geom_line(aes(x, high), linetype = "dashed") +
theme_bw()
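If "Obs 95%" in the target figure instead means a 95% prediction interval for individual observations (an assumption, since the figure isn't available here), base R's predict() for lm objects computes it directly, alongside the confidence band that geom_smooth(method = "lm") shades. A minimal sketch on the same kind of simulated data; conf_band and pred_band are illustrative names.

```r
set.seed(123)
d <- data.frame(x = rnorm(100, 0, 1))
d$y <- 0.25 * d$x + rnorm(100, 0, 0.25)

m <- lm(y ~ x, data = d)
grid <- data.frame(x = seq(min(d$x), max(d$x), length.out = 50))

# Each result is a matrix with columns fit, lwr, upr (one row per grid point)
conf_band <- predict(m, newdata = grid, interval = "confidence", level = 0.95)
pred_band <- predict(m, newdata = grid, interval = "prediction", level = 0.95)
```

The confidence band covers the fitted mean line, while the prediction band also includes the residual scatter, which is why it is always wider. To draw both, bind grid, conf_band and pred_band into one data frame and add geom_ribbon() or dashed geom_line() layers.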

Fit and plot a Weibull model to a survival data

I want to achieve the exact same thing asked in this question:
How to plot the survival curve generated by survreg (package survival of R)?
The only difference is that I don't want the data to be stratified by a variable (in the question above it was stratified by sex).
I just want the progression-free survival curve for the whole group of treated patients.
So when I copy the code from the other question, here is where I get stuck:
library(survival)
library(survminer)
library(tidyr)
s <- with(lung,Surv(time,status))
fKM <- survfit(s ~ sex,data=lung)
sWei <- survreg(s ~ as.factor(sex),dist='weibull',data=lung) # in my case here I would replace as.factor(sex) by 1
pred.sex1 = predict(sWei, newdata=list(sex=1),type="quantile",p=seq(.01,.99,by=.01)) #Since I don't want to stratify, what do I do with these 2 lines of code?
pred.sex2 = predict(sWei, newdata=list(sex=2),type="quantile",p=seq(.01,.99,by=.01))
df = data.frame(y=seq(.99,.01,by=-.01), sex1=pred.sex1, sex2=pred.sex2)
df_long = gather(df, key= "sex", value="time", -y)
p = ggsurvplot(fKM, data = lung, risk.table = T)
p$plot = p$plot + geom_line(data=df_long, aes(x=time, y=y, group=sex))
I tried replacing as.factor(sex) with 1, but then the rest of the code no longer makes sense. Can someone help me with this?
Many thanks in advance!
If you just want to plot the overall empirical survival curve, you might do something like this:
library(survival)
library(survminer)
library(tidyr)
s <- with(lung, Surv(time, status))
fKM <- survfit(s ~ 1, data = survival::lung)
ggsurvplot(fKM, ggtheme = theme_bw())
However, if you want to fit a Weibull model with no predictors, then your formula is fine.
sWei <- survreg(s ~ 1, dist = 'weibull', data = lung)
probs <- seq(0.01, 0.99, by = 0.01)
time <- predict(sWei, type = "quantile", se.fit = TRUE, p = probs)
The only catch is that time is now a named list of two matrices, fit and se.fit. Both have one row per row of lung, but all rows are identical (the model has no covariates), so we take the first row of each and compute a normal-approximation 95% interval in a data frame, which we can then use to create a ggplot:
ggplot(data = data.frame(p = 1 - probs,
time = time$fit[1,],
upper = time$fit[1,] + 1.96 * time$se.fit[1,],
lower = time$fit[1,] - 1.96 * time$se.fit[1,])) +
geom_step(aes(p, time, colour = "All"), size = 1) +
geom_ribbon(aes(p, ymin = lower, ymax = upper, fill = "All"), alpha = 0.2) +
coord_flip(ylim = c(0, 1000)) +
scale_fill_discrete(name = "Strata") +
scale_color_discrete(name = "Strata") +
theme_bw() +
theme(legend.position = "top")
This looks like a pretty good fit.
If you want both in the same plot, you can do something like:
df <- data.frame(p = 1 - probs,
time = time$fit[1,],
upper = time$fit[1,] + 1.96 * time$se.fit[1,],
lower = time$fit[1,] - 1.96 * time$se.fit[1,])
ggsurvplot(fKM, ggtheme = theme_bw())$plot +
geom_line(data = df, aes(time, p), linetype = 2, size = 1) +
geom_line(data = df, aes(upper, p), linetype = 2, size = 1) +
geom_line(data = df, aes(lower, p), linetype = 2, size = 1)
Created on 2020-08-18 by the reprex package (v0.3.0)
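The y = 1 - p axis used in both plots works because predict(type = "quantile") returns quantiles of the failure-time distribution, F(t) = p, so the survival probability at that time is 1 - p. As a sketch, the quantiles can be reproduced by hand from survreg's Weibull parameterization, log T = mu + sigma * W with W extreme-value distributed; mu, sigma, pred_q and manual_q are illustrative names.

```r
library(survival)

lung <- survival::lung
sWei <- survreg(Surv(time, status) ~ 1, dist = "weibull", data = lung)

probs <- seq(0.01, 0.99, by = 0.01)
# Intercept-only model: every row of the prediction matrix is identical
pred_q <- predict(sWei, type = "quantile", p = probs)[1, ]

# The p-th quantile of this Weibull is exp(mu) * (-log(1 - p))^sigma,
# so S(t_p) = exp(-(t_p / exp(mu))^(1 / sigma)) = 1 - p
mu    <- unname(coef(sWei)[1])
sigma <- sWei$scale
manual_q <- exp(mu) * (-log(1 - probs))^sigma
```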

ggplot2 geom_ribbon from mgcv::gamm

I'm trying to add a ribbon based on predictions from a gamm model. This is a little harder than I expected, because a gamm fit is structured differently from a gam fit.
I first tried stat_smooth() directly, but that does not work (and it would not use my full model, which also includes several other covariates):
library(tidyverse); library(mgcv)
dt = cbind(V1=scale(sample(1000)),
Age=rnorm(n = 1000, mean = 40, sd = 10),
ID=rep(seq(1:500),each=2)) %>% as.data.frame()
# Works fine ----
dt %>% ggplot(aes(x=Age, y=V1)) +
stat_smooth(method="gam", formula= y~s(x,bs="cr"))
# Fails horribly :P
dt %>% ggplot(aes(x=Age, y=V1)) +
stat_smooth(method="gamm", formula= y~s(x,bs="cr"))
Maximum number of PQL iterations: 20
iteration 1
Warning message:
Computation failed in `stat_smooth()`:
no applicable method for 'predict' applied to an object of class "c('gamm', 'list')"
I've tried using predict() on model$gam, but I'm not sure how to use it, or how to build the CI ribbon from it:
dt.model = gamm(V1 ~ s(Age, bs="cr") + s(ID, bs = 're'), data=dt, family="gaussian", discrete=T)
dt$pred = predict(dt.model$gam)
dt %>% ggplot(aes(x = Age, y = V1)) +
geom_line(aes(group=ID), alpha=.3) +
geom_point(alpha=.2) +
geom_smooth(aes(y=pred))
I recognise this is poor example data, since the shape is arbitrary.
But I'd like to add a ribbon with the CI along the line predicted by the model fit, and I'd prefer to do this in ggplot, particularly as I want a spaghetti plot in the background.
Use se.fit = TRUE in predict() to get standard errors, then draw the ribbon as fit plus or minus 1.96 * se.fit:
library(tidyverse)
library(mgcv)
dt <- cbind(V1=scale(sample(1000)),
Age=rnorm(n = 1000, mean = 40, sd = 10),
ID=rep(seq(1:500),each=2)) %>% as.data.frame()
dt.model <- gamm(V1 ~ s(Age, bs="cr") + s(ID, bs = "re"),
data=dt, family="gaussian", discrete=T)
pred <- predict(dt.model$gam, se.fit=T)
dt %>% ggplot(aes(x = Age, y = V1)) +
geom_line(aes(group=ID), alpha=.3) +
geom_point(alpha=.2) +
geom_ribbon(aes(ymin=pred$fit-1.96*pred$se.fit,
ymax=pred$fit+1.96*pred$se.fit), alpha=0.2, fill="red")+
geom_line(aes(y=pred$fit), col="blue", lwd=1)
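The ribbon above follows the observed Age values and includes each ID's random effect, so it can look ragged. A smoother alternative (a sketch under assumptions, not the answer's code) predicts on a regular Age grid and drops the random-effect term with the exclude argument of predict.gam(). To keep it quick to run, it fits the same formula with gam() and simulated data, with ID as a factor (with a numeric ID, s(ID, bs = "re") acts as a random slope rather than random intercepts); the same predict() call can be tried on dt.model$gam from gamm(). The simulated data, seed, and grid size are illustrative.

```r
library(mgcv)

set.seed(1)
n_id <- 200
dt <- data.frame(
  ID  = factor(rep(seq_len(n_id), each = 2)),  # factor, so bs = "re" gives random intercepts
  Age = rnorm(2 * n_id, mean = 40, sd = 10)
)
# Smooth age effect + per-ID offset + noise
dt$V1 <- sin(dt$Age / 8) + rnorm(n_id, sd = 0.5)[as.integer(dt$ID)] +
  rnorm(2 * n_id, sd = 0.3)

fit <- gam(V1 ~ s(Age, bs = "cr") + s(ID, bs = "re"), data = dt, method = "REML")

# Population-level curve on a regular grid: ID must be present in newdata,
# but exclude = "s(ID)" sets the random-effect term to zero
grid <- data.frame(Age = seq(min(dt$Age), max(dt$Age), length.out = 100),
                   ID = dt$ID[1])
pred <- predict(fit, newdata = grid, se.fit = TRUE, exclude = "s(ID)")
grid$fit   <- as.numeric(pred$fit)
grid$lower <- grid$fit - 1.96 * as.numeric(pred$se.fit)
grid$upper <- grid$fit + 1.96 * as.numeric(pred$se.fit)
```

In ggplot, draw the spaghetti lines from dt and add geom_ribbon(data = grid, aes(x = Age, ymin = lower, ymax = upper), inherit.aes = FALSE, alpha = 0.2); inherit.aes = FALSE matters because grid has no V1 column.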
