I have longitudinal repeated measures on individuals over 4 timepoints. Following a mixed models analysis with time as fixed effect and random slopes I have used lsmeans to estimate the mean values at each time point as well as 95% confidence intervals. I would now like to plot a line graph with time points (x) and mean values of my outcome variable (y) with the CIs. Can I use e.g. ggplot to plot the results that I got from lsmeans? Or is there another smart way to plot this?
The results that I get from lsmeans, and that I would like to plot (lsmean, lower.CL, upper.CL over time), are:
$lsmeans
time lsmean SE df lower.CL upper.CL
0 21.967213 0.5374422 60 20.892169 23.04226
1 16.069586 0.8392904 60 14.390755 17.74842
2 13.486802 0.8335159 60 11.819522 15.15408
3 9.495137 0.9854642 60 7.523915 11.46636
Confidence level used: 0.95
Is this what you meant?
# To convert from lsmeans output (d <- lsmeans(parameters))
# keep the time column too: the plot maps it to the x axis
d <- summary(d)$lsmeans[c("time", "lsmean", "lower.CL", "upper.CL")]
library(ggplot2)
ggplot(d, aes(time)) +
geom_line(aes(y = lsmean, group = 1)) +
geom_errorbar(aes(ymin = lower.CL, ymax = upper.CL),
width = 0.2) +
geom_point(aes(y = lsmean), size = 3,
shape = 21, fill = "white") +
labs(x = "Time", y = "ls mean",
title = "ls mean result over time") +
theme_bw()
To summarize, the whole code that will give you the estimates and plot of the mixed model is:
## random slope model (lme() comes from the nlme package)
library(nlme)
summary(model <- lme(outcome ~ time, random = ~1+time|ID, data = data,
na.action = na.exclude, method = "ML"))
## pairwise comparisons of timepoints
install.packages("lsmeans")
library(lsmeans)
lsmeans(model, pairwise~time, adjust="tukey")
### Draw the picture
d <- summary(lsmeans(model, ~time))
library(ggplot2)
ggplot(d, aes(time)) +
geom_line(aes(y = lsmean, group = 1)) +
geom_errorbar(aes(ymin = lower.CL, ymax = upper.CL), width = 0.2) +
geom_point(aes(y = lsmean), size = 3, shape = 21, fill = "white") +
labs(x = "Time", y = "ls mean", title = "ls mean result over time") +
theme_bw()
I use plot_summs to plot my regression coefficients. Below is a reproducible sample. I want to do two things. First, I want to remove Frost and Murder from the graph. Second, I want to change the color of Illiteracy and Population to green and of Life Exp and HS Grad to red. I'd appreciate any help I can get here.
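library(jtools)
states <- as.data.frame(state.x77)
# non-syntactic column names need backticks in a formula
fit1 <- lm(Income ~ Frost + Illiteracy + Murder +
Population + Area + `Life Exp` + `HS Grad`,
data = states, weights = runif(50, 0.1, 3))
plot_summs(fit1, scale = TRUE)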
Without using the jtools package:
If I'm not wrong, plot_summs with scale = TRUE scales the independent variables and plots a summary of the coefficient estimates, with the estimate as a point and 2*SE as segments.
states2 <- states
predictors <- setdiff(names(states2), "Income") # scale every column except the outcome
states2[predictors] <- scale(states2[predictors])
fit2 <- lm(Income ~ ., data = states2, weights = runif(50, 0.1, 3))
df <- as.data.frame(summary(fit2)[["coefficients"]][-1,1:2])
df$variable <- rownames(df)
df <- df[!df$variable %in% c("Frost", "Murder"), ]
library(ggplot2)
ggplot(df) +
geom_point(aes(x = variable, y = Estimate,
color = variable), size = 6) +
geom_segment(aes(x = variable, xend = variable,
y = Estimate - (2 * `Std. Error`),
yend = Estimate + (2 * `Std. Error`),
color = variable), lwd = 2) +
scale_color_manual(values = c("Illiteracy" = "green","Population" = "green",
"Area" = "blue",
"`Life Exp`" = "red", "`HS Grad`" = "red")) +
coord_flip() +
theme_classic()
I want to achieve the exact same thing asked in this question:
How to plot the survival curve generated by survreg (package survival of R)?
Except for the fact that I don't want the data to be stratified by a variable (in the question above it was stratified by sex).
I just want the progression free survival for the whole group of treated patients.
So when I copy the code from the other question, here is where I get stuck:
library(survival) # provides Surv(), survfit(), survreg() and the lung data
library(survminer)
library(tidyr)
s <- with(lung,Surv(time,status))
fKM <- survfit(s ~ sex,data=lung)
sWei <- survreg(s ~ as.factor(sex),dist='weibull',data=lung) # in my case here I would replace as.factor(sex) by 1
pred.sex1 = predict(sWei, newdata=list(sex=1),type="quantile",p=seq(.01,.99,by=.01)) #Since I don't want to stratify, what do I do with these 2 lines of code?
pred.sex2 = predict(sWei, newdata=list(sex=2),type="quantile",p=seq(.01,.99,by=.01))
df = data.frame(y=seq(.99,.01,by=-.01), sex1=pred.sex1, sex2=pred.sex2)
df_long = gather(df, key= "sex", value="time", -y)
p = ggsurvplot(fKM, data = lung, risk.table = T)
p$plot = p$plot + geom_line(data=df_long, aes(x=time, y=y, group=sex))
I tried replacing as.factor(sex) by 1, but then the rest of the code no longer makes sense. Can someone help me with this?
Many thanks in advance!
If you just want to plot the overall empirical survival curve, you might do something like this:
library(survival)
library(survminer)
library(tidyr)
s <- with(lung, Surv(time, status))
fKM <- survfit(s ~ 1, data = survival::lung)
ggsurvplot(fKM, ggtheme = theme_bw())
However, if you want to fit a Weibull model with no predictors, then your formula is fine.
sWei <- survreg(s ~ 1, dist = 'weibull', data = lung)
probs <- seq(0.01, 0.99, by = 0.01) # stop below 1: the Weibull quantile at p = 1 is infinite
time <- predict(sWei, type = "quantile", se = TRUE, p = probs)
The only problem is that time is now a named list of two matrices: fit and se.fit. Both have the same number of rows as lung, but all rows are identical, so we just take one from each and calculate the confidence interval in a data frame which we can then use to create a ggplot:
ggplot(data = data.frame(p = 1 - probs,
time = time$fit[1,],
upper = time$fit[1,] + 1.96 * time$se.fit[1,],
lower = time$fit[1,] - 1.96 * time$se.fit[1,])) +
geom_step(aes(p, time, colour = "All"), size = 1) +
geom_ribbon(aes(p, ymin = lower, ymax = upper, fill = "All"), alpha = 0.2) +
coord_flip(ylim = c(0, 1000)) +
scale_fill_discrete(name = "Strata") +
scale_color_discrete(name = "Strata") +
theme_bw() +
theme(legend.position = "top")
Which we can see looks like a pretty good fit.
If you want both in the same plot you can do something like:
df <- data.frame(p = 1 - probs,
time = time$fit[1,],
upper = time$fit[1,] + 1.96 * time$se.fit[1,],
lower = time$fit[1,] - 1.96 * time$se.fit[1,])
ggsurvplot(fKM, ggtheme = theme_bw())$plot +
geom_line(data = df, aes(time, p), linetype = 2, size = 1) +
geom_line(data = df, aes(upper, p), linetype = 2, size = 1) +
geom_line(data = df, aes(lower, p), linetype = 2, size = 1)
Created on 2020-08-18 by the reprex package (v0.3.0)
I have samples from a mixture of two distributions (the two components overlap), and I know which component each sample comes from.
I want to plot a histogram of the samples as a density, together with the density of the mixture distribution.
Let's head to the code (seg 1):
library(mixtools)
# two components
set.seed(1) # for reproducible example
b1 <- rnorm(900000, mean=8, sd=2) # samples
b2 <- rnorm(100000, mean=17, sd=2)
# densities corresponding to samples
d = dnorm(c(b1, b2), mean = 8, sd = 2)*.9 + dnorm(c(b1, b2), mean = 17, sd = 2)*.1
# ground truth
b <- data.frame(ss=c(b1,b2), dd=d, gg=factor(c(rep(1, length(b1)), rep(2, length(b2)))))
# sample from mixed distribution
c <- b[sample(nrow(b), 500000),]
library(ggplot2)
ggplot(data = c, aes(x = ss)) +
geom_histogram(aes(y = stat(density)), binwidth = .5, alpha = .3, position="identity") +
geom_line(data = c, aes(x = ss, y = dd), color = "red", inherit.aes = FALSE)
This result is fine.
But I want to fill the color according to the samples' group. So I change the code (seg 2):
ggplot(data=c, aes(x=ss)) +
geom_histogram(aes(y=stat(density), fill=gg, color=gg),
binwidth=.5, alpha=.3, position="identity") +
geom_line(data=c, aes(x=ss, y=dd), color="red", inherit.aes=FALSE)
The result is wrong. R calculates the density of the two groups separately, so the two parts end up at about the same height.
Then I found some methods like this (seg 3):
breaks = seq(min(c$ss), max(c$ss), .5) # form cut points
bins1 = cut(with(c, ss[gg==1]), breaks) # form intervals by cutting
bins2 = cut(with(c, ss[gg==2]), breaks)
cnt1 = sapply(split(with(c, ss[gg==1]), bins1), length) # assign points to its interval
cnt2 = sapply(split(with(c, ss[gg==2]), bins2), length)
h = data.frame(
x = head(breaks, -1)+.25,
dens1 = cnt1/sum(cnt1,cnt2), # height of density bar
dens2 = cnt2/sum(cnt1,cnt2)
# weight = sapply(split(samples.mixgamma$samples, bins), sum)
)
ggplot(h) +
geom_bar(aes(x, dens1), fill="red", alpha = .3, stat="identity") +
geom_bar(aes(x, dens2), fill="blue", alpha = .3, stat="identity") +
geom_line(data=c, aes(x=ss, y=dd), color="red", inherit.aes=FALSE)
or set y=stat(count)/sum(stat(count)) like this (seg 4):
ggplot(data=c, aes(x=ss)) +
geom_histogram(aes(y=stat(count)/sum(stat(count)), fill=gg, color=gg),
binwidth=.5, alpha=.3, position="identity") +
geom_line(data=c, aes(x=ss, y=dd), color="red", inherit.aes=FALSE)
The results of seg 3 and seg 4 are the same, and both are wrong: all the bars are about half as tall as in seg 1.
So if I want to fill the two groups with different colors as in seg 2, keep the correct proportions as in seg 1, and avoid the mistake in seg 3 and seg 4, what can I do?
Many thanks!
The solution: the probability density should be calculated as y=stat(count)/.5/sum(stat(count)), where .5 is the bin width. I had only normalized the counts, without dividing the probability mass in each bin by the bin's width. So seg 3 and seg 4 (and other answers that use the same approach) need to be modified in the same way.
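To make the fix concrete, here is a minimal sketch of seg 4 with the bin-width correction applied. It assumes ggplot2 3.3.0 or later, where after_stat() is the newer spelling of stat(); the smaller sample size and the names samples and bw are mine, chosen only to keep the example quick to run.

```r
library(ggplot2)

set.seed(1)
b1 <- rnorm(9000, mean = 8, sd = 2)   # component 1 (90% of the mixture)
b2 <- rnorm(1000, mean = 17, sd = 2)  # component 2 (10% of the mixture)

samples <- data.frame(
  ss = c(b1, b2),
  gg = factor(rep(1:2, times = c(length(b1), length(b2))))
)
# true mixture density evaluated at each sample
samples$dd <- 0.9 * dnorm(samples$ss, mean = 8, sd = 2) +
  0.1 * dnorm(samples$ss, mean = 17, sd = 2)

bw <- 0.5  # bin width (assumed name)

p <- ggplot(samples, aes(x = ss)) +
  # share of all samples in each bar, divided by the bin width,
  # so the bars of both groups together integrate to 1
  geom_histogram(aes(y = after_stat(count / sum(count) / bw),
                     fill = gg, color = gg),
                 binwidth = bw, alpha = 0.3, position = "identity") +
  geom_line(aes(y = dd), color = "red")
p
```

Because sum(count) here runs over both groups, each bar shows the group's share of the whole sample per unit of x, which is what the red mixture curve describes.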
I'm having trouble emulating how stat_smooth calculates its confidence interval.
Let's generate some data and a simple model:
library(tidyverse)
# sample data
df = tibble(
x = runif(10),
y = x + rnorm(10)*0.2
)
# simple linear model
model = lm(y ~ x, df)
Now use predict() to generate values and confidence intervals
# predict
df$predicted = predict(
object = model,
newdata = df
)
# predict 95% confidence interval
df$CI = predict(
object = model,
newdata = df,
se.fit = TRUE
)$se.fit * qnorm(1 - (1-0.95)/2)
Note that qnorm is used to scale the standard error up to a 95% CI.
Plot the data (black dots), geom_smooth (black line + gray ribbon), and the predicted ribbon (red and blue lines).
ggplot(df) +
aes(x = x, y = y) +
geom_point(size = 2) +
geom_smooth(method = "lm", level = 0.95, fullrange = TRUE, color = "black") +
geom_line(aes(y = predicted + CI), color = "blue") + # upper
geom_line(aes(y = predicted - CI), color = "red") + # lower
theme_classic()
The red and blue lines should be the same as the ribbon's edges. What am I doing wrong?
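As posted in a comment by @Dason, the answer is that geom_smooth uses a t-distribution, not a normal distribution.
In my original question, replace qnorm(1 - (1-0.95)/2) with qt(1 - (1-0.95)/2, df.residual(model)) for the lines to match up. The degrees of freedom are the residual degrees of freedom of the model (here nrow(df) - 2), not the number of rows.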
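A quick way to check the t-based multiplier without drawing anything is to compare it against the interval that predict() itself reports for an lm fit. This is a sketch in base R only; the seed and the names t_mult, manual and builtin are mine.

```r
set.seed(42)
df <- data.frame(x = runif(10))
df$y <- df$x + rnorm(10) * 0.2

model <- lm(y ~ x, data = df)

# fitted values, standard errors and residual degrees of freedom
pred <- predict(model, newdata = df, se.fit = TRUE)

# t quantile for a two-sided 95% interval, using the residual df
t_mult <- qt(1 - (1 - 0.95) / 2, df = pred$df)

manual <- cbind(fit = pred$fit,
                lwr = pred$fit - t_mult * pred$se.fit,
                upr = pred$fit + t_mult * pred$se.fit)

# predict() can also compute the confidence interval directly
builtin <- predict(model, newdata = df, interval = "confidence", level = 0.95)

all.equal(unname(manual), unname(builtin))
```

With only 10 observations the t quantile is noticeably larger than qnorm(0.975), which is why the normal-based lines sat inside the ribbon.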
When fitting a cox model that includes spline terms for a continuous covariate, I would like to be able to produce a plot of the hazard ratio across range of that covariate (relative to a fixed reference value) using ggplot2.
I have adapted an example from Terry Therneau's splines vignette here (see page 3). The only issue with this approach is the lack of a "waist" in the confidence interval at the reference value, as in this plot:
The example below produces the following plot, without the narrowing of the CI at the reference value.
library(survival)
library(splines)
library(ggplot2)
# colon cancer death dataset
ccd <- na.omit(subset(colon, etype == 2))
# fit model with ns() term for age
cox <- coxph(Surv(time, status) ~ rx + sex + ns(age, knots = c(20, 50, 70)), data = ccd)
# get data for plot
tp <- termplot(cox, se = TRUE, plot = FALSE)
# hazard ratio plot for natural spline of age, with reference at 50 yrs
ref <- tp$age$y[tp$age$x == 50]
ggplot() +
geom_line(data = tp$age, aes(x = x, y = exp(y - ref))) +
geom_line(data = tp$age, aes(x = x, y = exp(y - 1.96 * se - ref)), linetype = 2) +
geom_line(data = tp$age, aes(x = x, y = exp(y + 1.96 * se - ref)), linetype = 2) +
geom_hline(aes(yintercept = 1), linetype = 3) +
geom_rug(data = ccd, aes(x = age), sides = "b") +
labs(x = "Age at baseline, years",
y = "Hazard Ratio (95% CI) vs. 50 years",
title = "Mortality hazard ratio as a function of age",
subtitle = "Natural spline: knots at 20, 50, and 70 years")
I am aware that there are features in the rms package and the smoothHR package that produce these types of plots, but I am looking for a solution that is amenable to ggplot2 graphics and the coxph() function in the survival package. My question therefore boils down to:
Is there a way to adapt the output of termplot() to produce a plot with a "waist" at the reference value?
If termplot() cannot be used, how can I obtain the relevant plotting data by other means?
Edit 1: As the first comment suggested, this can be accomplished using rms and ggplot2 together. For example:
library(rms)
dd <- datadist(ccd)
dd$limits$age[2] <- 50
options(datadist = "dd")
cph <- cph(Surv(time, status) ~ rx + sex + rcs(age, c(20, 50, 70)), data = ccd, x = TRUE, y = TRUE)
pdata <- Predict(cph, age, ref.zero = TRUE, fun = exp)
ggplot(data = pdata) +
geom_hline(aes(yintercept = 1), linetype = 3) +
labs(x = "Age at baseline, years",
y = "Hazard Ratio (95% CI) vs. 50 years",
title = "Mortality hazard ratio as a function of age",
subtitle = "Natural spline: knots at 20, 50, and 70 years")
Which produces a plot very close to what I am after:
However, I would still like to know if there is a way to do this using coxph() and ns(). Not that I have anything against the rms package; I just have a bunch of old code based on survival functionality.
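One possible way to get the "waist" with coxph() and ns() is to compute the standard error of the contrast between each age and the reference age, rather than the standard error of the term itself. The sketch below is not from the original thread. It assumes Wald intervals on the log hazard scale, and it rebuilds the spline basis outside the model with the same knots (and the same default boundary knots, the range of age). The names age_knots, basis, D and hr are mine.

```r
library(survival)
library(splines)
library(ggplot2)

ccd <- na.omit(subset(colon, etype == 2))
age_knots <- c(20, 50, 70)
cox <- coxph(Surv(time, status) ~ rx + sex + ns(age, knots = age_knots),
             data = ccd)

# same basis as inside the model: same data, knots and boundary knots
basis <- ns(ccd$age, knots = age_knots)
ages <- seq(min(ccd$age), max(ccd$age), by = 1)
ref_age <- 50

# basis rows for each age, minus the basis row for the reference age
B <- matrix(predict(basis, ages), nrow = length(ages))
b_ref <- as.vector(predict(basis, ref_age))
D <- B - matrix(b_ref, nrow = nrow(B), ncol = ncol(B), byrow = TRUE)

# coefficients and covariance block belonging to the spline term
idx <- grep("^ns\\(age", names(coef(cox)))
beta <- coef(cox)[idx]
V <- vcov(cox)[idx, idx]

log_hr <- drop(D %*% beta)
se <- sqrt(pmax(rowSums((D %*% V) * D), 0))  # sqrt of diag(D V D')

hr <- data.frame(age = ages,
                 hr = exp(log_hr),
                 lower = exp(log_hr - 1.96 * se),
                 upper = exp(log_hr + 1.96 * se))

ggplot(hr, aes(x = age)) +
  geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.2) +
  geom_line(aes(y = hr)) +
  geom_hline(yintercept = 1, linetype = 3) +
  labs(x = "Age at baseline, years",
       y = "Hazard Ratio (95% CI) vs. 50 years")
```

At the reference age the contrast row of D is all zeros, so both the log hazard ratio and its standard error are exactly zero there, and the interval collapses to a point.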