AIC() for mgcv model - r

I am creating a series of logistic regression models using mgcv and comparing them with AIC(). I have only four variables (socioeconomic class (Socio), sex, year of death (YOD), and age) for each individual and I am curious how these variables explain the likelihood of someone being buried with burial goods (N=c.12,000).
For one model, I ran the following:
model5 <- mgcv::gam(Commemorated ~ s(Age, k=5) + s(YOD, k=5) + Socio + Sex +
ti(Age,YOD, k=5) + s(Age, by=Socio, k=5) + s(YOD, by=Socio, k=5),
family=binomial(link='logit'), data=mydata, method='ML')
AIC(model5) was -1333.434. This was drastically different than I expected given models I had run previously. As a test, I ran the following:
model6 <- mgcv::gam(Commemorated ~ s(Age, k=6) + s(YOD, k=5) + Socio + Sex +
ti(Age,YOD, k=5) + s(Age, by=Socio, k=5) + s(YOD, by=Socio, k=5),
family=binomial(link='logit'), data=mydata, method='ML')
gam.check() looked fine for both models. In the second model I only raised the k value of the first term by 1, which in my understanding should not have altered the AIC drastically. Yet AIC(model6) was 6048.187, which is what I expected given the previous models I have run.
Other things I have looked at:
model5$aic: 6047.284
model6$aic: 6047.245
logLik.gam(model5): -3005.652 (df=-3673.87)
logLik.gam(model6): -3005.629 (df=18.46467)
So it would appear that the degrees of freedom for model5 are drastically different from those for model6, for a reason I cannot explain. If anyone has any ideas on how to troubleshoot this problem further, it would be greatly appreciated!
Edit: As commented below, I have also altered model5 from s(YOD, k=5) to ti(YOD,k=5) and this 'fixes' the AIC() result. Running model5 without the term, Sex (which previous models have shown has little effect but I have left in because it is meaningful from a theoretical standpoint), also 'fixes' the AIC() result. So this problem is not specific to the term s(Age), the number of knots, or the smoothed terms in general.
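A quick arithmetic check on the numbers above may help narrow this down. AIC() is -2*logLik + 2*df, using the df attribute that logLik() reports. The two log-likelihoods are nearly identical, so essentially the whole discrepancy comes from the df value. The sketch below uses only the figures quoted in the question; the helper name aic_from is made up for illustration, and the closing comment about edf2 is an assumption about mgcv internals that should be checked against the installed version.

```r
# Hypothetical helper: AIC from a log-likelihood and its df attribute.
aic_from <- function(loglik, df) -2 * loglik + 2 * df

# Values copied from the logLik.gam() output quoted in the question.
aic6 <- aic_from(-3005.629, 18.46467)   # model6
aic5 <- aic_from(-3005.652, -3673.87)   # model5

round(aic6, 3)   # 6048.187, matching AIC(model6)
aic5 < 0         # TRUE: the negative df drags the AIC far below zero

# The -2*logLik part is almost the same for both fits.
dev_part_diff <- abs(-2 * -3005.652 - (-2 * -3005.629))
dev_part_diff

# Assumption (not confirmed by the question): logLik.gam() takes its df from
# the smoothing-parameter-corrected edf (model$edf2) when that is available,
# whereas model$aic is based on sum(model$edf). Comparing sum(model5$edf)
# with sum(model5$edf2) on the fitted objects would show whether edf2 is
# where the negative value comes from.
```

If sum(model5$edf) turns out to be close to 18 while sum(model5$edf2) is hugely negative, the fit itself is probably fine and only the corrected degrees of freedom are off.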

Related

Probing interactions in nlme using the "interactions" package in R

I am running a linear mixed-effects model using the "nlme" package, looking at stress and lifestyle as predictors of change in cognition over 4 years in a longitudinal dataset. All variables in the model are continuous variables.
I am able to create the model and get the summary statistics using this code:
mod1 <- lme(MS ~ age + sex + edu + GDST1*Time + HLI*Time + GDST1*HLI*Time, random= ~ 1|ID, data=NuAge_long, na.action=na.omit)
summary(mod1)
I am trying to use the "interactions" package to probe the 3-way interaction:
sim_slopes(model = mod1, pred = Time, modx = GDST1, mod2 = HLI, data = NuAge_long)
but am receiving this error:
Error in if (tcol == "df") tcol <- "t val." : argument is of length zero
I am also trying to plot the interaction using the same "interactions" package:
interact_plot(model = mod1, pred = Time, modx = GDST1, mod2 = HLI, data = NuAge_long)
and am receiving this error:
Error in UseMethod("family") : no applicable method for 'family' applied to an object of class "lme"
I can't seem to find what these errors mean and why I'm getting them. Any help would be appreciated!
From ?interactions::sim_slopes:
The function is tested with ‘lm’, ‘glm’,
‘svyglm’, ‘merMod’, ‘rq’, ‘brmsfit’, ‘stanreg’ models. Models
from other classes may work as well but are not officially
supported. The model should include the interaction of
interest.
Note this does not include lme models. On the other hand, merMod models are those generated by lme4::[g]lmer(), and as far as I can tell you should be able to fit this model equally well with lmer():
library(lme4)
mod1 <- lmer(MS ~ age + sex + edu + GDST1*Time + HLI*Time + GDST1*HLI*Time
+ (1|ID), data=NuAge_long)
(things will get harder if you want to specify correlation structures, e.g. correlation = corAR1(), which works for lme() but not lmer() ...)

Different estimates between bam and gam model (mgcv) and interaction term estimates edf of 0

I am new to fitting gamm models and ran into two problems with my analysis.
I ran the same model using the gam and the bam function of the package mgcv. The models give me different estimates, and I don't understand why and how to choose which function to use. Can anyone explain to me why these functions give different estimates?
I am estimating a model that includes an interaction between age and condition (a binary factor with 2 conditions). For some reason one of the interaction terms (age:conditioncomputer or age:conditioncozmo) looks weird. It always gives an EDF and chi-square of 0 and a p-value of 0.5, as if it were fixed to that. I tried using sum-to-zero and dummy contrasts, but that did not change the output. What is weird to me is that there is a significant age effect, but this effect is not significant in either condition. So I have the strong feeling that something is going wrong here.
Did anyone ever run into this before and can help me figure out if this is even a problem or normal, and how to solve it if it is a problem?
My model syntax is the following:
`bam(reciprocity ~ s(age,k=8) + condition + s(age, by = condition, k=8) + s(ID, bs="re") + s(class, bs="re") + s(school, bs="re"), data=df, family=binomial(link="logit"))`
This is the model output:
My df looks somewhat like this:
In short, I used the code below:
library(tidyverse)
library(psych)
library(mgcv)
library(ggplot2)
library(cowplot)
library(patchwork)
library(rstatix)
library(car)
library(yarrr)
library(itsadug)
df <- read.csv("/Users/lucaleisten/Desktop/Private/Master/Major_project/Data/test/test.csv", sep=",")
df$ID <- as.factor(as.character(df$ID))
df$condition <- as.factor(df$condition)
df$school <- as.factor(df$school)
df$class <- as.factor(df$class)
df$reciprocity <- as.factor(as.character(df$reciprocity))
summary(df)
model_reciprocity <- bam(reciprocity ~ s(age,k=7) +condition + s(age, by = condition, k=7) + s(ID, bs="re") + s(class, bs="re") + s(school, bs="re"), data=df, family=binomial(link="logit"))
summary(model_reciprocity)

pglm fixed effect Poisson model with offset

I would like to run a fixed effect Poisson model with panel data in R, with a count variable as the outcome, and the log of the population as an offset variable (i.e. modeling a rate). However, using the example dataset below, I get the same results when I run the two models m1 and m2. I'd be grateful if anyone could point out what I'm doing wrong in terms of specifying m1, or offer a solution using a different package? Many thanks
library(AER)
data(Fatalities)
library(pglm)
m1 <- pglm(fatal ~ beertax + as.factor(year) + offset(log(pop)), index = c("state"), model = "within", effect="individual", data = Fatalities, family = poisson)
summary(m1)
m2 <- pglm(fatal ~ beertax + as.factor(year), index = c("state"), model = "within", effect="individual", data = Fatalities, family = poisson)
summary(m2)
One direct solution is to use glm() instead, with dummy variables for year and state:
fit_model <- glm(fatal ~ beertax + as.factor(year) + as.factor(state) + offset(log(pop)) , data = Fatalities, family = poisson)
which gives the same result as Stata (at least using this command: xtpoisson fatal beertax year1-year7, fe offset(log_pop)).
This approach is not feasible when the number of states is large. On CRAN, there is the newer fixest package (https://cran.r-project.org/web/packages/fixest/index.html) that provides a fast solution with robust standard errors.
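To see what the offset does in the glm() approach, here is a minimal sketch on simulated data. The data, the seed and all variable names are invented for illustration and have nothing to do with Fatalities. With offset(log(pop)) the model describes the rate count/pop, so the true coefficient on x (0.5 in this simulation) should be recovered even though population varies a lot. Dropping the offset generally changes the estimates, unlike m1 and m2 in the question.

```r
set.seed(1)
n_unit <- 20
n_year <- 5

# Simulated panel: every unit observed in every year.
d <- expand.grid(unit = factor(seq_len(n_unit)), year = factor(seq_len(n_year)))
d$pop <- round(exp(rnorm(nrow(d), mean = 10, sd = 1)))  # population varies by row
d$x <- rnorm(nrow(d))
unit_eff <- rnorm(n_unit, sd = 0.3)                     # unit-level fixed effects

# True per-person rate: exp(-6 + unit effect + 0.5 * x); counts scale with pop.
d$count <- rpois(nrow(d),
                 d$pop * exp(-6 + unit_eff[as.integer(d$unit)] + 0.5 * d$x))

# Rate model (with offset) versus plain count model (without offset).
fit_rate  <- glm(count ~ x + unit + year + offset(log(pop)),
                 family = poisson, data = d)
fit_count <- glm(count ~ x + unit + year, family = poisson, data = d)

coef(fit_rate)["x"]    # should be close to the simulated value 0.5
coef(fit_count)["x"]   # generally differs, since pop is ignored
```

If a package returns identical coefficients with and without the offset on data like this, that suggests the offset term is being dropped silently.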

How to fit frailty survival models in R

Because this is such a long question I've broken it down into 2 parts; the first being just the basic question and the second providing details of what I've attempted so far.
Question - Short
How do you fit an individual frailty survival model in R? In particular, I am trying to re-create the coefficient estimates and SEs in the table below, which were found by fitting a semi-parametric frailty model to this dataset (link). The model takes the form:
h_i(t) = z_i h_0(t) exp(\beta'X_i)
where z_i is the unknown frailty parameter for each patient, X_i is a vector of explanatory variables, \beta is the corresponding vector of coefficients and h_0(t) is the baseline hazard function. The explanatory variables are disease, gender, bmi and age (I have included code below to clean up the factor reference levels).
Question - Long
I am attempting to follow and re-create the Modelling Survival Data in Medical Research textbook example for fitting frailty models. In particular, I am focusing on the semi-parametric model, for which the textbook provides parameter and variance estimates for the normal Cox model, the lognormal frailty model and the gamma frailty model, which are shown in the above table.
I am able to recreate the no frailty model estimates using
library(dplyr)
library(survival)
dat <- read.table(
"./Survival of patients registered for a lung transplant.dat",
header = TRUE
) %>%
as_tibble() %>%
# set the reference levels of the factors
mutate( disease = factor(disease, levels = c(3,1,2,4))) %>%
mutate( gender = factor(gender, levels = c(2,1)))
mod_cox <- coxph( Surv(time, status) ~ age + gender + bmi + disease ,data = dat)
mod_cox
however I am really struggling to find a package that can reliably re-create the results of the second 2 columns. Searching online I found this table which attempts to summarise the available packages:
source
Below I have posted my current findings as well as the code I've used, in case it helps someone identify whether I have simply specified the functions incorrectly:
frailtyEM - Seems to work the best for gamma; however, it doesn't offer log-normal models
frailtyEM::emfrail(
Surv(time, status) ~ age + gender + bmi + disease + cluster(patient),
data = dat ,
distribution = frailtyEM::emfrail_dist(dist = "gamma")
)
survival - Gives warnings for the gamma model, and from everything I've read its frailty functionality is classed as deprecated, with the recommendation to use coxme instead.
coxph(
Surv(time, status) ~ age + gender + bmi + disease + frailty.gamma(patient),
data = dat
)
coxph(
Surv(time, status) ~ age + gender + bmi + disease + frailty.gaussian(patient),
data = dat
)
coxme - Seems to work but provides different estimates from those in the table and doesn't support the gamma distribution
coxme::coxme(
Surv(time, status) ~ age + gender + bmi + disease + (1|patient),
data = dat
)
frailtySurv - I couldn't get it to work properly; it seemed to always fit the variance parameter with a flat value of 1 and provide coefficient estimates as if a no-frailty model had been fitted. Additionally, the documentation doesn't state what strings are supported for the frailty argument, so I couldn't work out how to get it to fit a log-normal model
frailtySurv::fitfrail(
Surv(time, status) ~ age + gender + bmi + disease + cluster(patient),
dat = dat,
frailty = "gamma"
)
frailtyHL - Produces warning messages saying "did not converge"; it still produced coefficient estimates, but they were different from those in the textbook
mod_n <- frailtyHL::frailtyHL(
Surv(time, status) ~ age + gender + bmi + disease + (1|patient),
data = dat,
RandDist = "Normal"
)
mod_g <- frailtyHL::frailtyHL(
Surv(time, status) ~ age + gender + bmi + disease + (1|patient),
data = dat,
RandDist = "Gamma"
)
frailtypack - I simply don't understand the implementation (or at least it's very different from what is taught in the textbook). The function requires the specification of knots and a smoother, which seem to greatly impact the resulting estimates.
parfm - Only fits parametric models; having said that, every time I tried to use it to fit a Weibull proportional hazards model it just errored.
phmm - Have not yet tried
I fully appreciate that, given the large number of packages I've gotten through unsuccessfully, it is highly likely that the problem is me not properly understanding the implementation and misusing the packages. Any help or examples on how to successfully re-create the above estimates would be greatly appreciated.
Regarding
I am really struggling to find a package that can reliably re-create the results of the second 2 columns.
See the Survival Analysis CRAN task view under Random Effect Models or do a search on R Site Search on e.g., "survival frailty".

how to do predictions from cox survival model with time varying coefficients

I have built a Cox survival model, which includes a covariate * time interaction (non-proportionality detected).
I am now wondering how I could most easily get survival predictions from my model.
My model was specified:
coxph(formula = Surv(event_time_mod, event_indicator_mod) ~ Sex +
ageC + HHcat_alt + Main_Branch + Acute_seizure + TreatmentType_binary +
ICH + IVH_dummy + IVH_dummy:log(event_time_mod)
I was hoping to get predictions using survfit, passing newdata for the combination of variables I want predictions for:
survfit(cox, newdata=new)
Since I have event_time_mod on the right-hand side of my model, I need to specify it in the new data frame passed to survfit. This event_time would need to be set to the individual times of the predictions. Is there an easy way to set event_time_mod to the correct time for survfit?
Or are there any other options for achieving predictions from my model?
Of course I could create as many rows in the new data frame as there are distinct times in the predictions and set event_time_mod to the correct values, but that feels really cumbersome and I thought there must be a better way.
You have done what is referred to as
An obvious but incorrect approach ...
as stated in Using Time Dependent Covariates and Time Dependent Coefficients in the Cox Model vignette in version 2.41-3 of the R survival package. Instead, you should use the time-transform functionality, i.e., the tt function as stated in the same vignette. The code would be something similar to the example in the vignette
> library(survival)
> vfit3 <- coxph(Surv(time, status) ~ trt + prior + karno + tt(karno),
+ data=veteran,
+ tt = function(x, t, ...) x * log(t+20))
>
> vfit3
Call:
coxph(formula = Surv(time, status) ~ trt + prior + karno + tt(karno),
data = veteran, tt = function(x, t, ...) x * log(t + 20))
              coef exp(coef) se(coef)     z       p
trt        0.01648   1.01661  0.19071  0.09  0.9311
prior     -0.00932   0.99073  0.02030 -0.46  0.6462
karno     -0.12466   0.88279  0.02879 -4.33 1.5e-05
tt(karno)  0.02131   1.02154  0.00661  3.23  0.0013
Likelihood ratio test=53.8 on 4 df, p=5.7e-11
n= 137, number of events= 128
However, survfit does not work when you have a tt term
> survfit(vfit3, veteran[1, ])
Error in survfit.coxph(vfit3, veteran[1, ]) :
The survfit function can not yet process coxph models with a tt term
However, you can easily get out the terms, linear predictor or mean response with predict. Further, you can create the term over time for the tt term using the answer here.
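If survival curves are needed, one alternative that survfit() does accept is a step-function version of the time-varying coefficient, which the same vignette also describes. Note that this is a different technique from tt(): the effect of karno is allowed to change only at chosen cut points rather than continuously in log(t). A rough sketch on the veteran data follows; the cut points (90 and 180 days), the covariate values in cdata and the object names are arbitrary choices for illustration.

```r
library(survival)

# Split each subject's follow-up at 90 and 180 days;
# tgroup records which interval a row belongs to.
vet2 <- survSplit(Surv(time, status) ~ ., data = veteran,
                  cut = c(90, 180), episode = "tgroup", id = "id")

# One karno coefficient per time interval (a step function of time).
vfit2 <- coxph(Surv(tstart, time, status) ~ trt + prior + karno:strata(tgroup),
               data = vet2)

# New data: one row per interval for each hypothetical subject,
# with rows tied together by the 'curve' identifier.
cdata <- data.frame(tstart = rep(c(0, 90, 180), 2),
                    time   = rep(c(90, 180, 365), 2),
                    status = rep(0, 6),   # needed by the formula, otherwise ignored
                    tgroup = rep(1:3, 2),
                    trt    = rep(1, 6),
                    prior  = rep(0, 6),
                    karno  = rep(c(40, 75), each = 3),
                    curve  = rep(1:2, each = 3))

# Predicted survival curves, one per value of 'curve'.
sfit <- survfit(vfit2, newdata = cdata, id = curve)
```

plot(sfit) would then draw one predicted curve per hypothetical subject. Finer cut points approximate a smooth time-varying effect more closely, at the cost of a larger split data set.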
