Plotting estimated HR from coxph object with time-dependent coefficient and splines

I want to plot the estimated hazard ratio as a function of time in the case of a coxph model with a time-dependent coefficient that is based on a spline term. I created the time-dependent coefficient using function tt, analogous to this example that comes straight from ?coxph:
# Fit a time transform model using current age
cox = coxph(Surv(time, status) ~ ph.ecog + tt(age), data = lung,
            tt = function(x, t, ...) pspline(x + t/365.25))
Calling survfit(cox) results in an error that survfit does not understand models with a tt term (as described in 2011 by Terry Therneau).
You can extract the linear predictor using cox$linear.predictors, but I would need to somehow extract ages and less trivially, times to go with each. Because tt splits the dataset on event times, I can't just match up the columns of the input dataframe with the coxph output. Additionally, I really would like to plot the estimated function itself, not just the predictions for the observed data points.
There is a related question involving splines here, but it does not involve tt.
Edit (7/7)
I'm still stuck on this. I've been looking in depth at this object:
spline.obj = pspline(lung$age)
str(spline.obj)
# something that looks very useful, but I am not sure what it is
# cbase appears to be the cardinal knots
attr(spline.obj, "printfun")
function (coef, var, var2, df, history, cbase = c(43.3, 47.6,
    51.9, 56.2, 60.5, 64.8, 69.1, 73.4, 77.7, 82, 86.3, 90.6))
{
    test1 <- coxph.wtest(var, coef)$test
    xmat <- cbind(1, cbase)
    xsig <- coxph.wtest(var, xmat)$solve
    cmat <- coxph.wtest(t(xmat) %*% xsig, t(xsig))$solve[2, ]
    linear <- sum(cmat * coef)
    lvar1 <- c(cmat %*% var %*% cmat)
    lvar2 <- c(cmat %*% var2 %*% cmat)
    test2 <- linear^2 / lvar1
    cmat <- rbind(c(linear, sqrt(lvar1), sqrt(lvar2), test2, 1,
                    1 - pchisq(test2, 1)),
                  c(NA, NA, NA, test1 - test2, df - 1,
                    1 - pchisq(test1 - test2, max(0.5, df - 1))))
    dimnames(cmat) <- list(c("linear", "nonlin"), NULL)
    nn <- nrow(history$thetas)
    if (length(nn))
        theta <- history$thetas[nn, 1]
    else theta <- history$theta
    list(coef = cmat, history = paste("Theta=", format(theta)))
}
So, I have the knots, but I am still not sure how to combine the coxph coefficients with the knots in order to actually plot the function. Any leads much appreciated.

I think what you need can be obtained by generating an input matrix with pspline and matrix-multiplying it by the relevant coefficients from the coxph output. To get the HR, you then take the exponential, i.e.
output <- data.frame(Age = seq(min(lung$age) + min(lung$time) / 365.25,
                               max(lung$age + lung$time / 365.25),
                               0.01))
output$HR <- exp(pspline(output$Age) %*% cox$coefficients[-1] -
                 sum(cox$means[-1] * cox$coefficients[-1]))
library("ggplot2")
ggplot(output, aes(x = Age, y = HR)) + geom_line()
Note the age here is the age at the time of interest (i.e. the sum of the baseline age and the elapsed time since study entry). The range has to be specified as above so that the spline basis matches the parameters fitted in the original model. The HR could also be calculated from the x output obtained by fitting with x = TRUE, as shown below:
cox <- coxph(Surv(time, status) ~ ph.ecog + tt(age), data = lung,
             tt = function(x, t, ...) pspline(x + t/365.25), x = TRUE)
index <- as.numeric(unlist(lapply(strsplit(rownames(cox$x), "\\."), "[", 1)))
ages <- lung$age[index]
output2 <- data.frame(Age = ages + cox$y[, 1] / 365.25,
                      HR = exp(cox$x[, -1] %*% cox$coefficients[-1] -
                               sum(cox$means[-1] * cox$coefficients[-1])))
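For a quick check of the fitted curve against the observed data points, the per-observation estimates in output2 can be plotted the same way (a minimal sketch, assuming output2 as built above and ggplot2 already loaded):
ggplot(output2, aes(x = Age, y = HR)) + geom_point()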

Can I do Fine-Gray regression on a split survival dataset?

This is my first question here, so if I need to share more information please let me know.
I have done a Cox regression analysis in R in which I am interested in the effect of implant surface on reoperation over 36 months. Here's a reproducible example:
library(survival)
n <- 100
df <- data.frame(id = 1:n,
                 time = sample(1:36, n, replace = TRUE),
                 event = sample(0:2, n, replace = TRUE),
                 implantsurface = sample(0:1, n, replace = TRUE),
                 covariate1 = sample(0:1, n, replace = TRUE),
                 covariate2 = sample(0:1, n, replace = TRUE))
df$time <- as.numeric(df$time)
I adjusted for a number of covariates, which showed that the proportional hazards assumption was violated for covariate1. I split my dataset into 0-4 mo and 4-36 mo as follows (simplified code), so that the PH assumption was no longer violated:
fit1 <- survSplit(Surv(time, event == 1) ~
                    implantsurface + covariate1 + covariate2,
                  data = df, cut = c(4),
                  episode = "tgroup")
fit2 <- coxph(Surv(tstart, time, event) ~
                implantsurface + strata(tgroup):covariate1 + covariate2,
              data = fit1)
Now I would also like to adjust for competing risks with Fine-Gray regression, but I am unable to do this for the split dataset. I have tried the following:
FG <- finegray(Surv(time = time, event = event.competing, type = "mstate") ~
                 implantsurface + strata(tgroup):covariate1 + covariate2,
               data = fit1, etype = "event_of_interest")
FGfit <- coxph(Surv(fgstart, fgstop, fgstatus) ~
                 implantsurface + strata(tgroup):covariate1 + covariate2,
               weights = fgwt, data = FG)
Error in strata(tgroup) : object 'tgroup' not found
Does anyone know how/if Fine-Gray can be applied to a split survival dataset? Many thanks in advance for thinking along!

emtrends and piecewise regression

I want to obtain four slopes for piecewise regression: two slopes for each release type, one before 365 days and one after 365 days. I also know I should use the emmeans package.
Here is a dummy dataset.
df <- data.frame(tsr = c(0,0,9,10,19,20,20,21,30,30,100,101,200,205,350,360,400,401,500,501,600,605,700,710,800,801,900,902,1000,1001,1100,1105,2000,2250,2500,2501),
                 release_type = c('S','H','S','H','S','S','H','S','H','S','S','H','S','H','S','S','H','S','H','S','S','H','S','H','S','S','H','S','H','S','S','H','S','H','S','H'),
                 cond = c(250,251,250,251,300,301,351,375,250,249,216,257,264,216,250,251,250,251,300,301,351,375,250,249,216,257,264,216,250,251,250,251,300,301,351,375),
                 notch = c('A','B','C','D','A','B','C','D','A','B','C','D','E','G','E','G','A','H','J','K','L','Q','W','E','R','Y','U','I','O','P','Y','U','I','O','P','Z'))
#Load libraries
library(emmeans)
library(lme4)
#Set up break point manually
bp = 365
b1 <- function(x, bp) ifelse(x < bp, bp - x, 0)
b2 <- function(x, bp) ifelse(x < bp, 0, x - bp)
#Fit linear mixed effect model using piecewise regression
m1 <- lmer(cond ~ b1(tsr, bp) + b2(tsr, bp) + b1(tsr, bp):release_type +
             b2(tsr, bp):release_type + release_type + (1 | notch), data = df)
#Obtain slopes
emtrends(m1, params = "bp", var = "tsr", pairwise ~ release_type)
I am only getting estimates for one slope of each release type. What am I doing wrong?
Note: I cannot simply take the slopes from summary(), because the model is parameterized through the basis functions above, so its coefficients are not the raw slopes.
You have to add at = list(tsr = c(10, 400)) to the emtrends() call to specify representative times before and after the breakpoint. Otherwise, it just uses the average value of tsr since it is a quantitative predictor.
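A sketch of what the adjusted call might look like (the values 10 and 400 are just illustrative times on either side of the 365-day breakpoint, and the | tsr conditioning is added here so the slopes are reported separately at each value):
# slopes of cond vs. tsr for each release_type, before and after the breakpoint
emtrends(m1, pairwise ~ release_type | tsr, var = "tsr",
         at = list(tsr = c(10, 400)), params = "bp")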

Producing effects plots within the GLMMadaptive package

I am getting an inscrutable error message while trying to run effects plots on objects created using the GLMMadaptive::mixed_model() and effects::predictorEffect() functions.
Here is an example problem using toy binary longitudinal data, simulated with code supplied in one of the vignettes included with the GLMMadaptive package.
# Load relevant packages
library(GLMMadaptive)
library(effects)
# Now we construct a data frame with the design:
# everyone has a baseline measurement, and then measurements at random follow-up times
n <- 100     # number of subjects (value assumed; not given in the question)
K <- 8       # number of measurements per subject (assumed)
t_max <- 15  # maximum follow-up time (assumed)
DF <- data.frame(id = rep(seq_len(n), each = K),
                 time = c(replicate(n, c(0, sort(runif(K - 1, 0, t_max))))),
                 sex = rep(gl(2, n/2, labels = c("male", "female")), each = K))
# design matrices for the fixed and random effects
X <- model.matrix(~ sex * time, data = DF)
Z <- model.matrix(~ time, data = DF)
betas <- c(-2.13, -0.25, 0.24, -0.05) # fixed effects coefficients
D11 <- 0.48 # variance of random intercepts
D22 <- 0.1 # variance of random slopes
# we simulate random effects
b <- cbind(rnorm(n, sd = sqrt(D11)), rnorm(n, sd = sqrt(D22)))
# linear predictor
eta_y <- as.vector(X %*% betas + rowSums(Z * b[DF$id, ]))
# we simulate binary longitudinal data
DF$y <- rbinom(n * K, 1, plogis(eta_y))
Now when we fit a longitudinal logistic regression using the mixed_model function...
# fit the mixed effects logistic regression for y assuming random intercepts
# and random slopes for the random-effects part
fm <- mixed_model(fixed = y ~ sex * time,
                  random = ~ time | id,
                  data = DF,
                  family = binomial())
...and try to create an effects plot using the effects::predictorEffect() function...
plot(predictorEffect("time", fm), type = "link")
...we get the following error
Error in mod.matrix %*% scoef : non-conformable arguments
Has anyone encountered this problem before and if so, found a way to solve it?

Individual terms in prediction of linear regression

I performed a regression analysis in R on a dataset and am trying to predict the contribution of each individual independent variable to the dependent variable for each row in the dataset.
So something like this:
set.seed(123)
y <- rnorm(10)
m <- data.frame(v1=rnorm(10), v2=rnorm(10), v3=rnorm(10))
regr <- lm(formula=y~v1+v2+v3, data=m)
summary(regr)
terms <- predict.lm(regr,m, type="terms")
In short: run a regression and use the predict function to calculate the terms of v1, v2 and v3 in dataset m. But I am having a hard time understanding what the predict function is calculating. I would expect it to multiply the regression coefficient by the variable data, so something like this for v1:
coefficients(regr)[2]*m$v1
But that gives different results compared to the predict function.
Own calculation:
0.55293884 0.16253411 0.18103537 0.04999729 -0.25108302 0.80717945 0.22488764 -0.88835486 0.31681455 -0.21356803
And predict function calculation:
0.45870070 0.06829597 0.08679724 -0.04424084 -0.34532115 0.71294132 0.13064950 -0.98259299 0.22257641 -0.30780616
The predict function is off by 0.1 or so. Also, if you add all terms from the prediction function together with the constant, it doesn't add up to the total prediction (using type="response"). What does the predict function calculate here, and how can I tell it to calculate what I did with coefficients(regr)[2]*m$v1?
All the following lines result in the same predictions:
# our computed predictions
coefficients(regr)[1] + coefficients(regr)[2]*m$v1 +
coefficients(regr)[3]*m$v2 + coefficients(regr)[4]*m$v3
# prediction using predict function
predict.lm(regr,m)
# prediction using terms matrix, note that we have to add the constant.
terms_predict = predict.lm(regr,m, type="terms")
terms_predict[,1]+terms_predict[,2]+terms_predict[,3]+attr(terms_predict,'constant')
You can read more about using type="terms" in the documentation for predict.lm.
The reason that your own calculation (coefficients(regr)[2]*m$v1) and the predict function calculation (terms_predict[,1]) are different is because the columns in the terms matrix are centered around the mean, so their mean becomes zero:
# this is equal to terms_predict[,1]
coefficients(regr)[2]*m$v1-mean(coefficients(regr)[2]*m$v1)
# indeed, all columns are centered; i.e. have a mean of 0.
round(sapply(as.data.frame(terms_predict),mean),10)
Hope this helps.
The function predict(..., type="terms") centers each variable by its mean. As a result, the output is a little difficult to interpret. Here's an alternative where each variable (constant, x1, and x2) is multiplied by its coefficient.
TLDR: pred_terms <- model.matrix(formula(mod$terms), testData) %*% diag(coef(mod))
library(tidyverse)
### simulate data
set.seed(123)
nobs <- 50
x1 <- cumsum(rnorm(nobs) + 3)
x2 <- cumsum(rnorm(nobs) * 3)
y <- 2 + 2*x1 -0.5*x2 + rnorm(nobs,0,50)
df <- data.frame(t=1:nobs, y=y, x1=x1, x2=x2)
train <- 1:round(0.7*nobs,0)
rm(x1, x2, y)
trainData <- df[train,]
testData <- df[-train,]
### linear model
mod <- lm(y ~ x1 + x2 , data=trainData)
summary(mod)
### predict test set
test_preds <- predict(mod, newdata=testData)
head(test_preds)
### contribution by predictor
test_contribution <- model.matrix(formula(mod$terms), testData) %*% diag(coef(mod))
colnames(test_contribution) <- names(coef(mod))
head(test_contribution)
all(round(apply(test_contribution, 1, sum),5) == round(test_preds,5)) ## should be true
### Visualize each contribution
test_contribution_df <- as.data.frame(test_contribution)
test_contribution_df$pred <- test_preds
test_contribution_df$t <- row.names(test_contribution_df)
test_contribution_df$actual <- df[-train,"y"]
test_contribution_df_long <- pivot_longer(test_contribution_df, -t, names_to="variable")
names(test_contribution_df_long)
ggplot(test_contribution_df_long, aes(x=t, y=value, group=variable, color=variable)) +
geom_line() +
theme_bw()

Plotting interaction effects in Bayesian models (using rstanarm)

I'm trying to show how the effect of one variable changes with the values of another variable in a Bayesian linear model in rstanarm. I am able to fit the model and take draws from the posterior to look at the estimates for each parameter, but it's not clear how to plot the effect of one variable in the interaction as the other changes, along with the associated uncertainty (i.e. a marginal effects plot). Below is my attempt:
library(rstanarm)
# Set Seed
set.seed(1)
# Generate fake data
w1 <- rbeta(n = 50, shape1 = 2, shape2 = 1.5)
w2 <- rbeta(n = 50, shape1 = 3, shape2 = 2.5)
dat <- data.frame(y = log(w1 / (1 - w1)),
                  x = log(w2 / (1 - w2)),
                  z = seq(1:50))
# Fit linear regression without an intercept:
m1 <- rstanarm::stan_glm(y ~ 0 + x*z,
                         data = dat,
                         family = gaussian(),
                         algorithm = "sampling",
                         chains = 4,
                         seed = 123)
# Create data sets with low values and high values of one of the predictors
dat_lowx <- dat
dat_lowx$x <- 0
dat_highx <- dat
dat_highx$x <- 5
out_low <- rstanarm::posterior_predict(object = m1,
                                       newdata = dat_lowx)
out_high <- rstanarm::posterior_predict(object = m1,
                                        newdata = dat_highx)
# Calculate differences in posterior predictions
mfx <- out_high - out_low
# Somehow get the coefficients for the other predictor?
In this (linear, Gaussian, identity link, no intercept) case,
mu = beta_x * x + beta_z * z + beta_xz * x * z
and the marginal effects are
dmu/dx = beta_x + beta_xz * z
dmu/dz = beta_z + beta_xz * x
So, to plot the marginal effect of x or z, you just need an appropriate range of each and the posterior distribution of the coefficients, which you can obtain via
post <- as.data.frame(m1)
Then
dmu_dx <- post[ , 1] + post[ , 3] %*% t(sort(dat$z))
dmu_dz <- post[ , 2] + post[ , 3] %*% t(sort(dat$x))
You can then estimate the marginal effect for each observation in your data using something like the following, which gives the effect of x on mu at each observed value of z, and the effect of z on mu at each observed value of x.
colnames(dmu_dx) <- round(sort(dat$z), digits = 1)
colnames(dmu_dz) <- round(sort(dat$x), digits = 1)
bayesplot::mcmc_intervals(dmu_dz)
bayesplot::mcmc_intervals(dmu_dx)
Note that the column names are just the sorted observed values of the moderating variable in each case.
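If you prefer one point estimate and interval per observation instead of the mcmc_intervals() display, here is a minimal sketch (the name mfx_x is just illustrative; it assumes dmu_dx as constructed above, with one row per posterior draw and one column per sorted value of z):
library(ggplot2)
# posterior mean and 95% interval of dmu/dx at each observed value of z
mfx_x <- data.frame(z   = sort(dat$z),
                    est = colMeans(dmu_dx),
                    lwr = apply(dmu_dx, 2, quantile, probs = 0.025),
                    upr = apply(dmu_dx, 2, quantile, probs = 0.975))
ggplot(mfx_x, aes(x = z, y = est, ymin = lwr, ymax = upr)) +
  geom_ribbon(alpha = 0.3) +
  geom_line() +
  labs(y = "Marginal effect of x on mu")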
You could also use either the ggeffects-package, especially for marginal effects; or the sjPlot-package for marginal effects and other plot types (for marginal effects, sjPlot simply wraps the functions from ggeffects).
To plot marginal effects of interactions, use sjPlot::plot_model() with type = "int". Use mdrt.values to define which values to plot for continuous moderator variables, and use ppd to base predictions either on the posterior distribution of the linear predictor or on draws from the posterior predictive distribution.
library(sjPlot)
plot_model(m1, type = "int", terms = c("x", "z"), mdrt.values = "meansd")
plot_model(m1, type = "int", terms = c("x", "z"), mdrt.values = "meansd", ppd = TRUE)
or to plot marginal effects at other specific values, use type = "pred" and specify the values in the terms-argument:
plot_model(m1, type = "pred", terms = c("x", "z [10, 20, 30, 40]"))
# same as:
library(ggeffects)
dat <- ggpredict(m1, terms = c("x", "z [10, 20, 30, 40]"))
plot(dat)
There are more options, and also different ways of customizing the plot appearance. See related help files and package vignettes.
