Piecewise time-dependent coefficients for Cox regression in R

In a Cox regression framework, I'd like to implement piecewise continuous time-dependent coefficients. This is for variables that do not satisfy the proportional hazards (PH) assumption.
In this vignette https://cran.r-project.org/web/packages/survival/vignettes/timedep.pdf, there are examples for step functions (p. 17) and for coefficients with a functional form (p. 20).
What I'd like is to somehow have a piecewise relationship. Using the example provided in the vignette:
library(survival)
vfit <- coxph(Surv(time, status) ~ trt + prior + karno, veteran)
zp <- cox.zph(vfit, transform = function(time) log(time + 20))

## Step functions
vet2 <- survSplit(Surv(time, status) ~ ., data = veteran, cut = c(90, 180),
                  episode = "tgroup", id = "id")
vfit2 <- coxph(Surv(tstart, time, status) ~ trt + prior +
                 karno:strata(tgroup), data = vet2)

## Functional form
vfit3 <- coxph(Surv(time, status) ~ trt + prior + karno + tt(karno),
               data = veteran,
               tt = function(x, t, ...) x * log(t + 20))
plot(zp[3])
abline(coef(vfit3)[3:4], col = 2)
From the plot (also on p. 21 of the vignette), one might argue that there is a similar but inverted trend from approximately time = 200 onward. I've tried to implement this, but without success.
I first tried a piecewise function directly via the tt argument, but it gives only one coefficient for karno and one for tt(karno), not two sets of coefficients. What I want is something like a*x + b for t < 200 and c*x + d for t >= 200:
vfit3 <- coxph(Surv(time, status) ~ trt + prior + karno + tt(karno),
               data = veteran,
               tt = function(x, t, t1, t2, ...) x * log(t1 + 20) * (t < 200) +
                 x * t2 * (t >= 200))
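(For reference, the vignette's spline example suggests that a tt() function may return a matrix, with one coefficient fitted per column. If that reading is right, something along these lines might yield the two intercept/slope pairs directly; karno is dropped from the main formula because the pre/post columns sum to it. An untested sketch:)

## Untested sketch: tt() returning a matrix, one coefficient per column,
## giving separate intercept and slope for each side of t = 200
vfit4 <- coxph(Surv(time, status) ~ trt + prior + tt(karno),
               data = veteran,
               tt = function(x, t, ...)
                 cbind(pre        = x * (t < 200),
                       pre.slope  = x * log(t + 20) * (t < 200),
                       post       = x * (t >= 200),
                       post.slope = x * log(t + 20) * (t >= 200)))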
So, as a second step, I tried to mix the step-function approach with a functional form within each period: split the data into two time periods as for step functions, and then fit a function of time in each. But this gives an error.
vfit3 <- coxph(Surv(tstart, time, status) ~ trt + prior +
                 (karno + tt(karno)):strata(tgroup),
               data = vet2,
               tt = function(x, t, ...) x * log(t + 20) * (t < 200) -
                 x * t * (t >= 200))
Does someone know how to implement this?
EDIT:
This is what I've come up with:
library(survival)

## Original model
m1 <- coxph(Surv(time, status) ~ trt + prior + karno, data = veteran)

## Transform to long (counting-process) format as in the link,
## splitting at every observed event time
vet1 <- survSplit(Surv(time, status) ~ ., data = veteran, id = "id",
                  cut = unique(veteran$time))

## Add a grouping variable (strata) for time before 200 days and after
vet1$tgroup <- ifelse(vet1$time < 200, 1, 2)

## Add a time-transform variable.
## Here it is the same function for both strata, but they could be different,
## e.g. ifelse(vet1$time < 200, f1(time), f2(time)).
## Actually not sure, as we need to be careful with the time scale... Anyway
vet1$time1 <- log(vet1$time + 20)

## Same model as in the link, but with an interaction with the strata
m2 <- coxph(Surv(tstart, time, status) ~
              trt + prior + (karno + karno:time1):strata(tgroup), data = vet1)

## Some plots as in the vignette
zp <- cox.zph(m1, transform = function(time) log(time + 20))
plot(zp[3])
abline(coef(m2)[c(3, 5)], col = "tomato")  # intercept & slope, t < 200
abline(coef(m2)[c(4, 6)], col = "tomato")  # intercept & slope, t >= 200
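A small cosmetic refinement (a sketch; it assumes the plot's x-axis is on the transformed log(time + 20) scale, which is also what makes the abline() calls above meaningful): draw each fitted line only over its own stratum's time range instead of across the whole plot.

## Sketch: clip each fitted line to its own stratum's time range
## (x values on the log(time + 20) scale used by cox.zph above)
x1 <- log(c(min(veteran$time), 200) + 20)
x2 <- log(c(200, max(veteran$time)) + 20)
lines(x1, coef(m2)[3] + coef(m2)[5] * x1, col = "tomato", lwd = 2)
lines(x2, coef(m2)[4] + coef(m2)[6] * x2, col = "tomato", lwd = 2)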

Related

R: Predicting with lmer, y ~ . formula error

Predicting values in new data from an lmer model throws an error when a period is used to represent predictors. Is there any way around this?
The answer to this similar question offers a way to automatically write out the full formula instead of using the period, but I'm curious if there's a way to get predictions from new data just using the period.
Here's a reproducible example:
library(lme4)

mydata <- data.frame(
  groups = rep(1:3, each = 100),
  x  = rnorm(300),
  dv = rnorm(300)
)
train_subset <- sample(1:300, 300 * 0.8)
train <- mydata[train_subset, ]
test  <- mydata[-train_subset, ]

# Returns an error
mod <- lmer(dv ~ . - groups + (1 | groups), data = train)
predict(mod, newdata = test)
predict(mod)  # getting predictions for the original data works

# Writing out the full formula without the period does not return an error,
# even though it is the exact same model
mod <- lmer(dv ~ x + (1 | groups), data = train)
predict(mod, newdata = test)
This should be fixed in the development version of lme4 now. You can install it from GitHub (first line below) or wait a few weeks (early April-ish) for a new version to hit CRAN.
remotes::install_github("lme4/lme4")  ## you will need compilers etc.
library(lme4)

mydata <- data.frame(
  groups = rep(1:3, each = 100),
  x  = rnorm(300),
  dv = rnorm(300)
)
train_subset <- sample(1:300, 300 * 0.8)
train <- mydata[train_subset, ]
test  <- mydata[-train_subset, ]

# No longer returns an error
mod <- lmer(dv ~ . - groups + (1 | groups), data = train)
p1  <- predict(mod, newdata = test)
mod2 <- lmer(dv ~ x + (1 | groups), data = train)
p2  <- predict(mod2, newdata = test)
identical(p1, p2)  ## TRUE
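Until the CRAN release lands, one workaround in the spirit of the linked answer is to expand the formula yourself, e.g. with base R's reformulate() (a sketch; mod3 and p3 are just illustrative names):

## Sketch: build the full formula from the column names instead of "."
fixed <- setdiff(names(train), c("dv", "groups"))
form  <- reformulate(c(fixed, "(1 | groups)"), response = "dv")
mod3  <- lmer(form, data = train)
p3    <- predict(mod3, newdata = test)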

Implementing multinomial-Poisson transformation with multilevel models

I know variations of this question have been asked before, but I haven't yet seen an answer on how to implement the multinomial-Poisson transformation with multilevel models.
I decided to make a fake dataset and follow the method outlined here, also consulting the notes the poster mentions as well as the Baker paper on the MP transformation.
To check that my coding is correct, I decided to start with a binary outcome variable; because glmer can handle binary response variables directly, this lets me verify that I'm correctly recasting the logit regression as multiple Poisson regressions.
The context of this problem is multilevel regressions with survey data, where the outcome variable is the response to a question and the possible predictors are demographic variables. As mentioned above, I wanted to confirm I can properly code the binary outcome as a Poisson regression before moving on to multi-category outcome variables.
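For reference, the identity I'm relying on (as I read Baker's paper) is that the multinomial likelihood is recovered by profiling a free per-observation intercept $\phi_i$ out of independent Poissons:

$$
y_{ic} \sim \text{Poisson}\!\left(e^{\phi_i + x_i^{\top}\beta_c}\right),
\qquad
p_{ic} = \frac{e^{x_i^{\top}\beta_c}}{\sum_{k} e^{x_i^{\top}\beta_k}},
$$

so differences of category-specific Poisson coefficients should recover the logit coefficients, which is what the last two lines of the code below check.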
library(dplyr)
library(lme4)

key <- expand.grid(sex = c('Male', 'Female'),
                   age = c('18-34', '35-64', '45-64'))
set.seed(256)
probs <- runif(nrow(key))  # (overwritten below; kept for the RNG stream)

# Make a fake dataset with 1000 responses
n <- 1000
df <- data.frame(sex = sample(c('Male', 'Female'), n, replace = TRUE),
                 age = sample(c('18-34', '35-64', '45-64'), n, replace = TRUE),
                 obs = seq_len(n), stringsAsFactors = FALSE)
age <- model.matrix(~ age, data = df)[, -1]
sex <- model.matrix(~ sex, data = df)[, -1]
beta_age <- matrix(c(0, 1), nrow = 2, ncol = 1)
beta_sex <- matrix(1, nrow = 1, ncol = 1)

# Create class probabilities as a function of age and sex
probs <- plogis(
  -0.5 +
    age %*% beta_age +
    sex %*% beta_sex +
    rnorm(n)
)
id <- ifelse(probs > 0.5, 1, 0)
df$y1 <- id
df$y2 <- 1 - df$y1

# First run the regular hierarchical logit, just with a varying intercept for age
glm_out <- glmer(y1 ~ (1 | age), family = 'binomial', data = df)
summary(glm_out)

# Next, two Poisson regressions with per-observation and per-age intercepts
glm_1 <- glmer(y1 ~ (1 | obs) + (1 | age), data = df, family = 'poisson')
glm_2 <- glmer(y2 ~ (1 | obs) + (1 | age), data = df, family = 'poisson')
coef(glm_1)$age - coef(glm_2)$age
coef(glm_out)$age
The outputs of the last two lines are:
> coef(glm_1)$age - coef(glm_2)$age
      (Intercept)
18-34  0.14718933
35-64  0.03718271
45-64  1.67755129
> coef(glm_out)$age
      (Intercept)
18-34  0.13517758
35-64  0.02190587
45-64  1.70852847
These estimates seem close, but they are not exactly the same. I'm thinking I've specified the intercept structure incorrectly somewhere.

Adjusting ODE model output using a Rogan-Gladen estimator in R

I have made an ODE model in R using the package deSolve. Currently the output of the model gives me the "observed" prevalence of a disease (i.e. the prevalence not accounting for diagnostic imperfection).
However, I want to adjust the model to output the "true" prevalence, using a simple adjustment formula called the Rogan-Gladen estimator (http://influentialpoints.com/Training/estimating_true_prevalence.htm):
True prevalence = (Apparent prevalence + Specificity - 1) / (Sensitivity + Specificity - 1)
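Written out as a small R function, the formula above is just (argument names are mine, for illustration):

# Rogan-Gladen adjustment of an apparent prevalence `ap`,
# given test sensitivity `se` and specificity `sp`
rogan_gladen <- function(ap, se, sp) (ap + sp - 1) / (se + sp - 1)
rogan_gladen(ap = 0.20, se = 0.95, sp = 0.92)  # 0.12 / 0.87 = 0.1379...

Note that the result is negative whenever the apparent prevalence is below 1 - specificity (0.08 with the parameter values used below).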
As you will see in the code below, I have attempted to adjust only one of the differential equations (diggP).
Running the model without the adjustment gives the expected output (a proportion between 0 and 1). However, attempting to adjust the model using the RG estimator gives a spurious output (a proportion below 0).
Any advice on what might be going wrong here would be very much appreciated.
# Load required packages
library(tidyverse)
library(broom)
library(deSolve)

# Set time (age) for the function
time <- 1:80

# Defining exponential decay of lambda over age
y1 <- 0.003 + (0.15 - 0.003) * exp(-0.05 * time) %>% jitter(10)
df  <- data.frame(t = time, y = y1)
fit <- nls(y ~ SSasymp(time, yf, y0, log_alpha), data = df)
fit

# Values of lambda over ages 1-80 years
data   <- as.matrix(0.003 + (0.15 - 0.003) * exp(-0.05 * time))
lambda <- as.vector(data[, 1])
t      <- as.vector(seq(1, 80, by = 1))
foi    <- cbind(t, lambda)
foi[, 1]

# Making the time-varying lambda usable in the ODE model
input <- approxfun(x = foi[, 1], y = foi[, 2], method = "constant", rule = 2)

# Model
ab <- function(time, state, parms) {
  with(as.list(c(state, parms)), {
    # lambda, changing by time
    import <- input(time)
    # Derivatives
    # RG estimator:
    # true prevalence = (apparent prev + (sp - 1)) / (sp + (se - 1))
    diggP <- (((import * iggN) - iggR * iggP) + (sp_igg - 1)) /
      (sp_igg + (se_igg - 1))
    diggN <- (-import * iggN) + iggR * iggP
    dtgerpP <- (0.5 * import) * tgerpN - tgerpR * tgerpP
    dtgerpN <- (0.5 * -import) * tgerpN + tgerpR * tgerpP
    # Return results
    return(list(c(diggP, diggN, dtgerpP, dtgerpN)))
  })
}

# Initial values
yini <- c(iggP = 0, iggN = 1,
          tgerpP = 0, tgerpN = 1)

# Parameters
pars <- c(iggR = 0, tgerpR = (1 / 8) / 12,
          se_igg = 0.95, sp_igg = 0.92)

# Solve model
results <- ode(y = yini, times = time, func = ab, parms = pars)

# Plot results
plot(results, xlab = "Time (years)", ylab = "Proportion")
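One observation that may matter here: by the formula above, the adjusted value is negative whenever the apparent prevalence is below 1 - sp_igg = 0.08, and iggP starts at 0. An alternative I have not verified would be to keep the state equations on the apparent scale and apply the correction to the solved trajectory rather than inside the derivative (a sketch; it assumes diggP is reverted to its unadjusted form (import * iggN) - iggR * iggP):

# Untested sketch: adjust the solved iggP trajectory, not the derivative
res <- as.data.frame(ode(y = yini, times = time, func = ab, parms = pars))
res$iggP_true <- (res$iggP + pars[["sp_igg"]] - 1) /
  (pars[["se_igg"]] + pars[["sp_igg"]] - 1)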

PLS regression in R: Testing alternative model specifications

In R, I would like to test the specification of a partial least squares (PLS) model m1 against a non-nested alternative m2, applying the Davidson-MacKinnon J test. For a simple linear outcome Y this works quite well using the plsr() estimator followed by the jtest() command:
# Libraries and data
library(pls)      # provides plsr()
library(plsRglm)
library(lmtest)
Z <- Cornell      # illustration dataset shipped with the plsRglm package

# Simple linear model
m1 <- plsr(Z$Y ~ Z$X1 + Z$X2 + Z$X3 + Z$X4 + Z$X5, 2)  # including X1
m2 <- plsr(Z$Y ~ Z$X6 + Z$X2 + Z$X3 + Z$X4 + Z$X5, 2)  # including X6 as alternative
jtest(m1, m2)
However, if I use the generalized linear model estimator plsRglm() to account for a possibly non-normal distribution of the outcome, e.g.:
# Generalized model
m1 <- plsRglm(Z$Y ~ Z$X1 + Z$X2 + Z$X3 + Z$X4 + Z$X5, 2,
              modele = "pls-glm-family", family = Gamma(link = "log"),
              pvals.expli = TRUE)
m2 <- plsRglm(Z$Y ~ Z$X6 + Z$X2 + Z$X3 + Z$X4 + Z$X5, 2,
              modele = "pls-glm-family", family = Gamma(link = "log"),
              pvals.expli = TRUE)
then jtest fails with an error:
> jtest(m1, m2)
Error in terms.default(formula1) : no terms component nor attribute
It seems that plsRglm does not store a "formula" object that jtest can handle. Does anybody have a suggestion for how to edit my code to get this to work?
Thanks!

Use all variables in a model with {plm} in R

Using different sources, I wrote a little function that, after a linear regression, produces a table of standard errors, t statistics, and p-values that are clustered according to a group variable "cluster". The code is as follows:
cl1 <- function(modl, clust) {
  # modl  is the fitted regression model
  # clust is the cluster variable
  # id becomes a unique identifier within each cluster
  library(plm)
  library(lmtest)
  # Get formula
  form <- formula(modl$call)
  # Get data frame
  dat <- eval(modl$call$data)
  dat$row <- rownames(dat)
  dat$id <- ave(dat$row, dat[[deparse(substitute(clust))]], FUN = seq_along)
  pdat <- pdata.frame(dat,
                      index = c("id", deparse(substitute(clust))),
                      drop.index = FALSE, row.names = TRUE)
  # Regression
  reg <- plm(form, data = pdat, model = "pooling")
  # Residual degrees of freedom, adjusted
  G <- length(unique(dat[, deparse(substitute(clust))]))
  N <- length(dat[, deparse(substitute(clust))])
  dfa <- (G / (G - 1)) * (N - 1) / reg$df.residual
  # Cluster VCE
  d.vcov <- dfa * vcovHC(reg, type = "HC0", cluster = "group", adjust = TRUE)
  table <- coeftest(reg, vcov = d.vcov)
  # Output: se, t-stat and p-value
  cl1out <- data.frame(table[, 2:4])
  names(cl1out) <- c("se", "tstat", "pval")
  return(cl1out)
}
For a regression like reg1 <- lm(y ~ x1 + x2, data = df), calling cl1(reg1, cluster) works just fine.
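For example (made-up data, purely to illustrate the call):

# Hypothetical minimal example (variable names are illustrative)
set.seed(42)
df <- data.frame(y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100),
                 cluster = rep(1:10, each = 10))
reg1 <- lm(y ~ x1 + x2, data = df)
cl1(reg1, cluster)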
However, if I use a model like reg2 <- lm(y ~ ., data = df), I get the error message:
Error in terms.formula(object) : '.' in formula and no 'data' argument
After some testing, I am guessing that I can't use "." to mean "use all variables in the data frame" with {plm}. Is there a way to do this with {plm}? Otherwise, any ideas on how I could rewrite my function so that it avoids {plm} yet accepts all possible specifications of a linear model?
Indeed, you can't use the . notation in formulas with the plm package.
data("Produc", package = "plm")
plm(gsp ~ .,data=Produc)
Error in terms.formula(object) : '.' in formula and no 'data' argument
One idea is to expand the formula yourself when it contains a dot. Here is a custom function that does the job (something similar surely exists in other packages):
expand_formula <- function(form = "A ~ .", varNames = c("A", "B", "C")) {
  has_dot <- any(grepl('.', form, fixed = TRUE))
  if (has_dot) {
    ii <- intersect(as.character(as.formula(form)), varNames)
    varNames <- varNames[!grepl(paste0(ii, collapse = '|'), varNames)]
    exp <- paste0(varNames, collapse = '+')
    as.formula(gsub('.', exp, form, fixed = TRUE))
  } else {
    as.formula(form)
  }
}
Now test it:
(eform <- expand_formula("gsp ~ .", names(Produc)))
# gsp ~ state + year + pcap + hwy + water + util + pc + emp + unemp
plm(eform, data = Produc)
# Model Formula: gsp ~ state + year + pcap + hwy + water + util + pc + emp + unemp
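A base-R alternative to the custom helper is reformulate(), which assembles the same kind of expanded formula from a character vector of column names:

# Sketch: expand "gsp ~ ." via base R's reformulate()
# (this uses every remaining column; the exact list depends on your plm version)
eform2 <- reformulate(setdiff(names(Produc), "gsp"), response = "gsp")
plm(eform2, data = Produc)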
