Graphing model results of longitudinal data in R - r

I am looking to create a graph of longitudinal data by age and sex, similar to the figure in this paper: https://www.thelancet.com/journals/lanpub/article/PIIS2468-2667(20)30258-9/fulltext.
To graph model results in the past, I have used both ggplot2 and ggpredict. I prefer ggpredict because it graphs the results accounting for covariates, but I am OK with graphing in ggplot2 if it can't be done in ggpredict.
I am providing a minimal reproducible example below, with id, wave (2 waves, separated by 6 years), age, sex, tst (total sleep time), and bmi for a covariate.
library(lme4)  # provides lmer()
library(ggplot2)
set.seed(1)  # make the simulated data reproducible
id<-rep(1:50, 2)
wave<-c(rep(1, 50),rep(2, 50))
tst<-c(sample(7:9,50, replace = TRUE),sample(4:7,50, replace = TRUE))
mydf<-data.frame(id,wave,tst)
mydf$age[mydf$wave==1]<-sample(40:90,50, replace = TRUE)
mydf$age[mydf$wave==2]<-mydf$age[mydf$wave==1]+6
# bmi, sex and age.cat are drawn once per person (50 values) and recycled across both waves
mydf$bmi<-sample(20:30,50, replace = TRUE)
mydf$sex<-sample(1:2,50, replace = TRUE)
mydf$age.cat<-cut(mydf$age[mydf$wave==1], breaks = 3,labels = c(1,2,3))
##Overall model##
(model <- lmer( tst ~ wave + age + sex + bmi +(1|id), data = mydf))
I tried to graph it with ggplot2 using the following syntax, however I'm not sure that the graph is exactly what I'm looking for. I would like to graph change in tst between waves 1 and 2, by age group and sex. TST would be on the y axis, age would be on the x axis, with separate lines for age group and sex, with standard errors. The lines will correspond to within-person change in TST between waves 1 and 2.
I think that the graph right now is showing the between subjects effects of age on tst, and not taking into account the fact that the data is nested within-person. Any help would be greatly appreciated.
ggplot(mydf,aes(x=age, y=tst, color=as.factor(sex), group=as.factor(age.cat), linetype=as.factor(age.cat)))+
geom_smooth(data=mydf[mydf$sex==1,], method = lm, formula = y~x)+
geom_smooth(data=mydf[mydf$sex==2,], method = lm, formula = y~x)+
geom_point() +
theme_bw()
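A minimal sketch of one way to draw the change between waves by baseline age group and sex, assuming simulated data with the same columns as the example above. It plots raw cell means with standard errors, so it is not adjusted for covariates; the names w1, w2, long, summ and se are made up for this illustration. Each line joins the wave 1 and wave 2 means of one sex by age-group cell, so its slope reflects the average within-group change rather than the between-person age trend.

```r
library(ggplot2)

set.seed(1)
# Simulated data with the same columns as the example above (assumption)
n <- 50
w1 <- data.frame(id = 1:n, wave = 1,
                 age = sample(40:90, n, replace = TRUE),
                 sex = sample(1:2, n, replace = TRUE),
                 tst = sample(7:9, n, replace = TRUE))
w2 <- transform(w1, wave = 2, age = age + 6,
                tst = sample(4:7, n, replace = TRUE))
# Age group is defined once, from baseline (wave 1) age
w1$age.cat <- w2$age.cat <- cut(w1$age, breaks = 3, labels = 1:3)
long <- rbind(w1, w2)

# Mean age, mean TST and their standard errors per wave x age group x sex
se <- function(x) sd(x) / sqrt(length(x))
summ <- do.call(data.frame,
                aggregate(cbind(age, tst) ~ wave + age.cat + sex, data = long,
                          FUN = function(x) c(mean = mean(x), se = se(x))))

# One line per sex x age group, joining the wave 1 and wave 2 means
p <- ggplot(summ, aes(x = age.mean, y = tst.mean,
                      colour = factor(sex), linetype = age.cat,
                      group = interaction(sex, age.cat))) +
  geom_line() +
  geom_pointrange(aes(ymin = tst.mean - tst.se, ymax = tst.mean + tst.se)) +
  labs(x = "Age", y = "Total sleep time", colour = "Sex",
       linetype = "Baseline age group") +
  theme_bw()
print(p)
```

For covariate-adjusted lines, the same plotting code could be fed predictions from the fitted mixed model instead of the raw means.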

Related

(R) Adding Confidence Intervals To Plots

I am using R. I am following this tutorial over here (https://rviews.rstudio.com/2017/09/25/survival-analysis-with-r/ ) and I am trying to adapt the code for a similar problem.
In this tutorial, a statistical model is fitted to a dataset and then used to predict 3 new observations. We then plot the results for these 3 observations:
#load libraries
library(survival)
library(dplyr)
library(ranger)
library(data.table)
library(ggplot2)
#use the built in "lung" data set
#remove missing values (dataset is called "a")
a = na.omit(lung)
#create id variable
a$ID <- seq_along(a[,1])
#create test set with only the first 3 rows
new = a[1:3,]
#create a training set by removing first three rows
a = a[-c(1:3),]
#fit survival model (random survival forest)
r_fit <- ranger(Surv(time,status) ~ age + sex + ph.ecog + ph.karno + pat.karno + meal.cal + wt.loss, data = a, mtry = 4, importance = "permutation", splitrule = "extratrees", verbose = TRUE)
#create new intermediate variables required for the survival curves
death_times <- r_fit$unique.death.times
surv_prob <-data.frame(r_fit$survival)
avg_prob <- sapply(surv_prob, mean)
#use survival model to produce estimated survival curves for the first three observations
pred <- predict(r_fit, new, type = 'response')$survival
pred <- data.table(pred)
colnames(pred) <- as.character(r_fit$unique.death.times)
#plot the results for these 3 patients
plot(r_fit$unique.death.times, unlist(pred[1,]), type = "l", col = "red")
lines(r_fit$unique.death.times, unlist(pred[2,]), type = "l", col = "green")
lines(r_fit$unique.death.times, unlist(pred[3,]), type = "l", col = "blue")
From here, I would like to add confidence intervals (shaded confidence regions) to each of these 3 curves.
I found a previous stackoverflow post (survfit() Shade 95% confidence interval survival plot ) that shows how to do something similar, but I am not sure how to extend the results from this post to each individual observation.
Does anyone know if there is a direct way to add these confidence intervals?
Thanks
If you create your plot using ggplot, you can use the geom_ribbon function to draw confidence intervals as follows:
ggplot(data=...)+
geom_line(aes(x=..., y=...),color=...)+
geom_ribbon(aes(x=.. ,ymin =.., ymax =..), fill=.. , alpha =.. )+
geom_line(aes(x=..., y=...),color=...)+
geom_ribbon(aes(x=.. ,ymin =.., ymax =..), fill=.. , alpha =.. )
You can put + after geom_line and repeat the same steps for each observation.
You can also check:
Having trouble plotting multiple data sets and their confidence intervals on the same GGplot. Data Frame included and
https://bookdown.org/ripberjt/labbook/appendix-guide-to-data-visualization.html
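As a concrete sketch of the geom_ribbon approach, assuming the curves are stored in long format with one row per patient and time point. The data frame curves and its columns are hypothetical, and the band of plus or minus 0.1 is a placeholder, not a real confidence interval. With long data, a single geom_ribbon and a single geom_line cover all patients, so the layers do not need to be repeated per observation.

```r
library(ggplot2)

# Hypothetical survival curves for three patients; the +/- 0.1 band is a
# placeholder and not a real confidence interval
times <- seq(0, 1000, by = 50)
curves <- do.call(rbind, lapply(1:3, function(i) {
  est <- exp(-times / (300 * i))
  data.frame(patient = factor(i), time = times, estimate = est,
             lower = pmax(est - 0.1, 0), upper = pmin(est + 0.1, 1))
}))

# Ribbon first so the lines are drawn on top of the shaded bands
p <- ggplot(curves, aes(x = time, y = estimate,
                        colour = patient, fill = patient)) +
  geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.2, colour = NA) +
  geom_line()
print(p)
```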

How to plot survival relative to general population with age on the X-axis (left-truncated data)?

I am trying to compare the survival in my study cohort with the survival in the Dutch general population (matched for age and sex). I created a rate table of the Dutch population.
library(relsurv)
setwd("")
nldpop <- transrate.hmd("mltper_1x1.txt","fltper_1x1.txt")
Then, I wanted to create a plot of the survival of my cohort (observed) and the survival of the population (expected) with age on the X-axis. However, the 'survexp' function does not seem to support a (start,stop,event)-format. Only with the normal (futime, event)-format it works, see below, but then I have follow-up time on the X-axis. Does anyone know how to get the age on the X-axis instead of follow-up time?
# Observed and expected survival with time on X-axis
fit <- survfit(Surv(futime, event)~1)
efit <- survexp(futime ~ 1, rmap = list(year=(date_entry), age=(age_entry), sex=(sex)),
ratetable=nldpop)
plot(fit)
lines(efit)
You didn't provide your example data, so I used the survival::mgus data for this. Your problem may be due to incorrectly specifying variable names in the rmap option.
library(relsurv)
library(survival)  # mgus data, survfit(), survexp()
library(dplyr)     # %>% and mutate()
nldpop <- transrate.hmd("mltper_1x1.txt", "fltper_1x1.txt")
mgus2 <- mgus %>% mutate(date_year = dxyr + 1900)
fit <- survfit(Surv(futime, death) ~ 1, data = mgus2)
efit <- survexp(Surv(futime, death) ~ 1, data = mgus2,
ratetable = nldpop, rmap = list(age = age*365.25, year = date_year, sex = sex))
plot(fit)
lines(efit)

How do I interpret my categorical coefficient in my mixed-effects linear model in R?

I would like to know how to interpret my coefficient 'Diet' in this multi-level model. The 'Diet' category is 1-4, and refers to what diet the chick is on. Time is in days, and weight is in grams. So the chicks all increase in weight over time, but at different rates due to different diets. 'Chick' is a chick's unique ID.
Using the code below you should get the MLE estimates/coefficients as intercept :23.018, Time : 8.443 and Diet: 2.979
I can see that as Time increases by 1 unit, weight increases by 8.443. However, how can this be true for Diet, being a categorical variable, when '3' leads to more weight increase than '4'? (I know this from plotting the data, see code below).
Perhaps it is a modelling problem and I'm doing something wrong. Does the diet variable need to be text in nature, so R dummy codes it?
Info about the data is here if you need it: http://vincentarelbundock.github.io/Rdatasets/doc/datasets/ChickWeight.html
Thanks.
library(tidyverse)
library(lme4)
library(lmerTest)
chickdiet <- read_csv('http://vincentarelbundock.github.io/Rdatasets/csv/datasets/ChickWeight.csv')
chickm3 <- lmer(weight ~ Time + Diet + (Time | Chick), data = chickdiet)
summary(chickm3)
#from plotting the data with the code below I can see that the diet that increases the chicks' weight the most, in ascending order are 1, 2, 4, 3
ggplot(chickdiet, aes(x = Time, y = weight, colour = as.factor(Diet))) + geom_point() +
stat_smooth(method = lm, se = F) + theme_minimal()
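On the dummy-coding question, here is a small sketch of the difference between a numeric and a factor Diet. It uses the built-in ChickWeight data and plain lm() rather than lmer(), so the random effects are left out on purpose to keep the example short; the names cw, Diet_num, Diet_fac, fit_num and fit_fac are made up for this illustration. read_csv reads Diet as a number, which is why the model above reports a single Diet slope.

```r
# Built-in copy of the same data; here Diet is already a factor
data(ChickWeight)
cw <- as.data.frame(ChickWeight)

# Numeric coding (what read_csv produces) versus factor coding
cw$Diet_num <- as.numeric(as.character(cw$Diet))
cw$Diet_fac <- factor(cw$Diet, ordered = FALSE)

# Numeric Diet: one slope, which assumes equal steps from diet 1 to 2 to 3 to 4
fit_num <- lm(weight ~ Time + Diet_num, data = cw)

# Factor Diet: one coefficient per non-reference diet (dummy coding)
fit_fac <- lm(weight ~ Time + Diet_fac, data = cw)

names(coef(fit_num))
names(coef(fit_fac))
```

In the mixed model above, the equivalent change would be to convert Diet with factor() before calling lmer(), so that each diet gets its own contrast against diet 1.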

Plotting predicted survival curves for continuous covariates in ggplot

How can I plot survival curves for representative values of a continuous covariate in a cox proportional hazards model? Specifically, I would like to do this in ggplot using a "survfit.cox" "survfit" object.
This may seem like a question that has already been answered, but I have searched through everything in SO with the terms 'survfit' and 'newdata' (plus many other search terms). This is the thread that comes closest to answering my question so far: Plot Kaplan-Meier for Cox regression
In keeping with the reproducible example offered in one of the answers to that post:
url <- "http://socserv.mcmaster.ca/jfox/Books/Companion/data/Rossi.txt"
df <- read.table(url, header = TRUE, stringsAsFactors = TRUE)  # factors are needed for levels(fin) and levels(race) below
library(dplyr)
library(ggplot2)
library(survival)
library(magrittr)
library(broom)
# Identifying the 25th and 75th percentiles for prio (continuous covariate)
summary(df$prio)
# Cox proportional hazards model with other covariates
# 'prio' is our explanatory variable of interest
m1 <- coxph(Surv(week, arrest) ~
fin + age + race + prio,
data = df)
# Creating new df to get survival predictions
# Want separate curves for the different 'fin' and 'race'
# groups as well as the 25th and 75th percentile of prio
newdf <- df %$%
expand.grid(fin = levels(fin),
age = 30,
race = levels(race),
prio = c(1,4))
# Obtain the fitted survival curve, then tidy
# into a dataframe that can be used in ggplot
survcurv <- survfit(m1, newdata = newdf) %>%
tidy()
The problem is, that once I have this dataframe called survcurv, I cannot tell which of the 'estimate' variables belongs to which pattern because none of the original variables are retained. For example, which of the 'estimate' variables represents the fitted curve for 30 year old, race = 'other', prio = '4', fin = 'no'?
In all other examples I've seen, one usually passes the survfit object to the generic plot() function and does not add a legend. I want to use ggplot and add a legend for each of the predicted curves.
In my own dataset, the model is a lot more complex and there are a lot more curves than I show here, so as you can imagine seeing 40 different 'estimate.1'..'estimate.40' variables makes it hard to understand what is what.
Thanks for providing a well-phrased question and a good example. I'm a little surprised that tidy does a relatively poor job here of creating sensible output. Please see below for my attempt at creating some plottable data:
library(tidyr)
newdf$group <- as.character(1:nrow(newdf))
survcurv <- survfit(m1, newdata = newdf) %>%
tidy() %>%
gather('key', 'value', -time, -n.risk, -n.event, -n.censor) %>%
mutate(group = substr(key, nchar(key), nchar(key)),
key = substr(key, 1, nchar(key) - 2)) %>%
left_join(newdf, 'group') %>%
spread(key, value)
And then create a plot (perhaps you'd like to use geom_step instead, but there is no step-shaped ribbon, unfortunately):
ggplot(survcurv, aes(x = time, y = estimate, ymin = conf.low, ymax = conf.high,
col = race, fill = race)) +
geom_line(size = 1) +
geom_ribbon(alpha = 0.2, col = NA) +
facet_grid(prio ~ fin)
Try defining your survcurv like this:
survcurv <-
lapply(1:nrow(newdf),
function(x, m1, newdata){
cbind(newdata[x, ], survfit(m1, newdata[x, ]) %>% tidy)
},
m1,
newdf) %>%
bind_rows()
This will include all of the predictor values as columns with the predicted estimates.
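The same labelling idea can also be sketched without tidy(), assuming only the survival package. This example uses the built-in lung data and a deliberately simple model rather than the Rossi data above; the names fit, nd, sf and long are made up for this illustration. It relies on survfit() returning one column of estimates per row of newdata.

```r
library(survival)

# Illustrative model on the built-in lung data (not the Rossi data above)
fit <- coxph(Surv(time, status) ~ age + sex, data = lung)
nd <- expand.grid(age = c(50, 70), sex = c(1, 2))
sf <- survfit(fit, newdata = nd)

# sf$surv has one column per row of nd; stack the columns and repeat the
# covariate rows so every estimate is labelled with its predictor values
n_t <- length(sf$time)
long <- data.frame(
  nd[rep(seq_len(nrow(nd)), each = n_t), ],
  time = rep(sf$time, times = nrow(nd)),
  estimate = as.vector(sf$surv),
  conf.low = as.vector(sf$lower),
  conf.high = as.vector(sf$upper),
  row.names = NULL
)
head(long)
```

The resulting data frame can be passed to ggplot in the same way as survcurv, with the covariate columns available for colour, fill, or facets.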

Sorting the x axes in R

I built a logistic regression model (called 'mylogit') using the glm function in R as follows:
mylogit <- glm(answer ~ as.factor(gender) + age, data = mydata, family = "binomial")
where age is numeric and gender is categorical (male and female).
I then proceeded to make predictions with the model built.
pred <- predict(mylogit, type = "response")
I can easily make a time series plot of the predictions by doing:
plot.ts(ts(pred))
This gives a plot of the predictions against their index (figure: Plot of Time against Predictions).
My question is this:
Is it possible to put the x axis in segments according to gender (male or female) which was specified in the glm? In other words, can I have predictions on the y axis and have gender (divided into male and female) on the x axis?
To build the data I want to plot, I did:
bind = cbind(mydata, pred)
'bind' looks like this:
pred age gender
0.9461198 32 male
0.9463577 45 female
0.9461198 45 female
0.9461198 37 female
0.9477645 40 male
0.8304513 32 female
Check out #4 on this blog post, "4. How To Create Two Different X- or Y-axes".
My suggestion to you is that you look at some of the dedicated R plotting tools, like ggplot2.
I don't think you need to use ts and plot.ts because the data you have is not a time series, right? Just sort pred before plotting.
# Get data
str <- "pred,age,gender
0.9461198,32,male
0.9463577,45,female
0.9461198,45,female
0.9461198,37,female
0.9477645,40,male
0.8304513,32,female"
bind <- read.csv(text = str, stringsAsFactors = TRUE)  # gender must be a factor for col = bind$gender below
# Plot
bind <- bind[order(bind$gender),]
plot(bind$pred, col = bind$gender)
library(ggplot2)
ggplot(bind, aes(x = gender, y = pred)) +
geom_point(position = position_jitter(width = .3))
Or without creating bind you could do plot(pred[order(mydata$gender)]).
