Cox PH model, how to control for multiple events - r

I'm working on a time-to-event model in R using the coxph function in the survival package. I'm analysing animal movements until they pass through the study area, which consists of 2 zones. Each individual can have multiple events before the final passing event as it moves between zones; each zone is analysed separately. I assumed that either stratifying the model on the number of zone exposures or adding frailty(individual id) as a random effect would control for this and give broadly similar results.
However, the results differ widely in terms of model fit (AIC). Covariates with a large effect in the stratified model have very little impact in the frailty model. The coefficients and significance levels are more similar, though.
Is something wrong, or should a large difference be expected between the two approaches?
Is either approach preferred when controlling for multiple events? Or is there a better way than these two?
(Number of events per individual: range 1-11, mean 2.24)
Example code:
frailty_model <- coxph(formula=Surv(T1,T2,event==2) ~ day_night + discharge + frailty(id), data = cox_passage, na.action = na.fail)
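# n_exposures_within_zone is a placeholder for the column holding the number of exposures within the zone
stratified_model <- coxph(formula=Surv(T1,T2,event==2) ~ day_night + discharge + strata(n_exposures_within_zone), data = cox_passage, na.action = na.fail)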
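To make the comparison concrete, here is a minimal sketch on simulated data (the data frame sim, its column exposure and all simulated values are assumptions, not the poster's data). It fits the two models from the question plus a third common option, a marginal model with a cluster(id) term that keeps the ordinary coefficients but reports robust standard errors. Note that a stratified model uses a different partial likelihood (risk sets are formed within strata) and a frailty model uses a penalized one, so large AIC differences between them are not necessarily a sign of a mistake.

```r
library(survival)

set.seed(1)
n_id <- 60

# Simulated counting-process data: one row per zone exposure,
# event == 2 marks the final passage (as in the question).
sim <- do.call(rbind, lapply(seq_len(n_id), function(i) {
  k   <- sample(1:4, 1)                 # number of exposures for this animal
  T2  <- cumsum(rexp(k, rate = 0.2) + 0.1)
  data.frame(
    id        = i,
    exposure  = seq_len(k),             # hypothetical exposure counter
    T1        = c(0, head(T2, -1)),
    T2        = T2,
    event     = c(rep(1, k - 1), 2),
    day_night = factor(sample(c("day", "night"), k, replace = TRUE)),
    discharge = rnorm(k)
  )
}))

# Random effect per individual (penalized partial likelihood)
fit_frailty <- coxph(Surv(T1, T2, event == 2) ~ day_night + discharge + frailty(id),
                     data = sim)

# Separate baseline hazard per exposure number
fit_strata <- coxph(Surv(T1, T2, event == 2) ~ day_night + discharge + strata(exposure),
                    data = sim)

# Marginal model: same point estimates as an unadjusted fit, robust SEs by individual
fit_cluster <- coxph(Surv(T1, T2, event == 2) ~ day_night + discharge + cluster(id),
                     data = sim)

# Compare coefficients rather than AIC across the three specifications
round(cbind(strata = coef(fit_strata), cluster = coef(fit_cluster)), 3)
```

Comparing coefficients and standard errors across the specifications is usually more informative than comparing their AIC values, since the three likelihoods are not on a common scale.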

Related

GLS / GLM nested design with autocorrelation over time

Still fairly new to GLM and a bit confused about how to establish my model.
About my project:
I sampled the microbiome (and measured a diversity index value = Shannon) from the root system of a sample of 9 trees (=tree1_cat).
In each tree I sampled fine and thick roots (=rootpart), and each tree was sampled four times (=days) over the course of one season. Thus I have a nested design but have to keep time in mind for autocorrelation. Also, not all values are present, so I have a few missing values. So far I have tried and tested the following:
Model <- gls(Shannon ~ tree1_cat/rootpart + tree1_cat + days,
na.action = na.omit, data = psL.meta,
correlation = corAR1(form =~ 1|days),
weights = varIdent(form= ~ 1|days))
Furthermore, I've tried to get more insight and used anova(Model) to get the p-values of those factors. Am I allowed to use those p-values? I've also used emmeans(Model, specs = pairwise ~ rootpart) for pairwise comparisons, but since rootpart was entered as a nested factor it only gives me the paired interactions.
It all works, but I am not sure, whether this is the right model! Any help would be highly appreciated!
It would be helpful to know your scientific question, but let's suppose you're interested in differences in Shannon diversity between fine and thick roots and in time trends. A model you could use would be:
library(lmerTest)
lmer(Shannon ~ rootpart*days + (rootpart*days|tree1_cat), data = ...)
The fixed-effect component rootpart*days can be expanded into 1 + rootpart + days + rootpart:days (where 1 signifies the intercept)
intercept: Shannon diversity (SD) in fine roots on day 0 (hopefully that's the beginning of the season)
rootpart: difference in SD between thick and fine roots on day 0
days: change per day in SD in fine roots (slope)
rootpart:days: difference in slope between thick roots and fine roots
The random-effect component (rootpart*days|tree1_cat) measures how all four of these effects vary across trees, and their correlations (e.g. do trees with a larger-than-average difference between fine and thick roots on day 0 also have a larger-than-average change over time in fine root SD?)
This 'maximal' random-effects model is almost certainly too complex for your data. A rough rule of thumb says you should have 10-20 data points per estimated parameter, and the fixed-effect model alone takes 4 parameters. A full model with 4 random effects requires estimating a 4×4 covariance matrix, which has (4*5)/2 = 10 parameters all by itself. I might just try (1+days|tree1_cat) (random slopes) or (rootpart|tree1_cat) (among-tree variation in the fine vs. thick difference), with a bias towards allowing for variation in the effect that is your primary interest (e.g. if your primary question is about fine vs. thick, then go with (rootpart|tree1_cat)).
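The parameter count above can be checked with a few lines of arithmetic. This is only a sketch: it assumes a complete design of 9 trees × 2 root parts × 4 sampling days (72 observations, fewer with missing values) and counts one residual variance in addition to the fixed effects and the random-effect covariance matrix.

```r
# Number of free parameters in a q x q covariance matrix
n_cov_params <- function(q) q * (q + 1) / 2

n_obs   <- 9 * 2 * 4   # assumed complete design: trees x root parts x days
n_fixed <- 4           # intercept, rootpart, days, rootpart:days

# Maximal model: (rootpart*days | tree1_cat) -> 4 random effects
total_full   <- n_fixed + n_cov_params(4) + 1   # + 1 residual variance
# Simpler model: (1 + days | tree1_cat) -> 2 random effects
total_slopes <- n_fixed + n_cov_params(2) + 1

n_obs / total_full     # 4.8 observations per parameter
n_obs / total_slopes   # 9 observations per parameter
```

Even the simpler random-slopes model sits just below the 10-20 observations per parameter guideline, which is why further simplification may be needed.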
I probably wouldn't worry about autocorrelation at all, nor heteroscedasticity by day (your varIdent(~1|days) term) unless those patterns are very strongly evident in the data.
If you want to allow for autocorrelation you'll need to fit the model with nlme::lme or glmmTMB (lmer still doesn't have machinery for autocorrelation models); something like
library(nlme)
lme(Shannon ~ rootpart*days,
random = ~days|tree1_cat,
data = ...,
correlation = corCAR1(form = ~days|tree1_cat)
)
You need to use corCAR1 (continuous-time autoregressive order-1) rather than the more common corAR1 for unevenly sampled data. Be aware that lme is more finicky/worse at dealing with singular models, so you may discover you need to simplify your model before you can actually get this model to run.
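One practical point worth checking before fitting: corCAR1 requires the time covariate to be unique within each correlation group. If fine and thick roots were sampled on the same day in the same tree, days repeats within tree1_cat. The sketch below uses a made-up design grid (the day values are assumptions) to show the check, and builds a correlation structure grouped by root part within tree as one option to try; whether that grouping suits the real data is a judgment call.

```r
library(nlme)

# Hypothetical complete design; the sampling days are invented for illustration
d <- expand.grid(
  tree1_cat = factor(1:9),
  rootpart  = c("fine", "thick"),
  days      = c(0, 30, 75, 120)
)

# Are sampling days repeated within a tree? (would break ~ days | tree1_cat)
dup_within_tree <- any(duplicated(d[c("tree1_cat", "days")]))                 # TRUE

# Are they unique within each tree-by-rootpart series?
dup_within_series <- any(duplicated(d[c("tree1_cat", "rootpart", "days")]))   # FALSE

# Correlation structure with one time series per root part within tree;
# this object would be passed as the `correlation` argument of lme()
cs <- corCAR1(form = ~ days | tree1_cat / rootpart)
formula(cs)
```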

How to estimate a regression with both variables i and t simultaneously

I want to estimate a regression for a variable, LWAGE (log wage), against EXP (years of work experience). The data that I have has participants tracked across 7 years, so each year their number of years of work experience increases by 1.
When I do the regression for
πΏπ‘Šπ΄πΊπΈπ‘– = 𝛽0 + 𝛽1𝐸𝐷𝑖 + 𝑒𝑖
I used
reg1 <- lm(LWAGE~EXP, data=df)
Now I'm trying to do the following regression:
πΏπ‘Šπ΄πΊπΈπ‘–π‘‘ = 𝛽0 + 𝛽1𝐸𝑋𝑃𝑖𝑑 + 𝑒i.
But I'm not sure how to include my the time based component into my regression. I searched around but couldn't find anything relevant.
Are you attempting to include time-fixed effects in your model or an interaction between your variable EXP and time (calling this TIME for this demonstration)?
For time fixed effects using lm() you can just include time as a variable in your model. Time should be a factor.
reg2 <- lm(LWAGE~EXP + TIME, data = df)
As an interaction between EXP and TIME it would be
reg3 <- lm(LWAGE~EXP*TIME, data = df)
Based on your description, it sounds like you might be looking for the interaction, i.e. how does the effect of experience on log wages vary over time?
You can also take a look at the plm package for working with panel data.
https://cran.r-project.org/web/packages/plm/vignettes/plmPackage.html
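A minimal sketch of the two lm() specifications on simulated panel data follows. The data frame, the ID column and all numeric values are invented for illustration; only the variable names LWAGE, EXP and TIME come from the question.

```r
set.seed(10)
n_id   <- 50   # hypothetical number of participants
n_year <- 7    # seven survey years, as in the question

# One row per participant-year
df <- expand.grid(TIME = 1:n_year, ID = 1:n_id)

# Experience starts at a random level and increases by 1 each year
exp0     <- sample(1:20, n_id, replace = TRUE)
df$EXP   <- exp0[df$ID] + df$TIME - 1
df$LWAGE <- 5 + 0.03 * df$EXP + 0.02 * df$TIME + rnorm(nrow(df), sd = 0.2)

# Treat time as a factor so each year gets its own effect
df$TIME <- factor(df$TIME)

# Time fixed effects: one common EXP slope, year-specific intercept shifts
reg2 <- lm(LWAGE ~ EXP + TIME, data = df)

# Interaction: the EXP slope is allowed to differ by year
reg3 <- lm(LWAGE ~ EXP * TIME, data = df)

length(coef(reg2))   # intercept + EXP + 6 year dummies
length(coef(reg3))   # the above + 6 EXP-by-year slopes
```

Neither model accounts for repeated observations on the same participant; that is where a panel-data package such as plm comes in.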

How can I plot my lmer() mixed model growth curves in r?

I have constructed a mixed effect model using lmer() with the aim of comparing the growth in reading scores for four different groups of children as they age.
I would like to plot a graph of the 4 different slopes with confidence intervals in R in order to visualize this relationship but I keep getting stuck.
I have tried the plot function and several ggplot variants, as I have done for previous lm() models, but nothing has worked so far. Here is my attempted model, which I hope captures how the change in reading scores over time (age) interacts with a child's SESDLD grouping (this indicates whether a child has a language problem and whether they are high or low income).
AgeSES.model <- lmer(ReadingMeasure ~ Age.c*SESDLD1 + (1|childid), data = reshapedomit, REML = FALSE)
ReadingMeasure is a continuous score, and Age.c is centered age measured in months. SESDLD1 is a categorical measure with 4 levels. I would expect four positive slopes of ReadingMeasure growth with different intercepts and probably differing slopes.
I would really appreciate any pointers on how to do this!
Thank you so much!!
The type of plot I would like to achieve - this was done in Stata

Age as a time scale in survival analysis

In survival analysis, you measure the time from some startpoint and right censor your data. The code thus usually looks like this:
fit <- coxph(Surv(time, status) ~ outcome + x1 + x2, data=db)
However, in epidemiological analysis, the startpoint is not informative (i.e. it just indicates one completed a form, not that one is exposed to something), so the data is said to be left truncated and the measured time has to be adjusted.
I've been told to overcome this limitation by using age as the time scale, which makes perfect sense as my outcome can be related to age.
This seems easily doable with SAS (end of page 5). How can I achieve this with R ?
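In the survival package, left truncation is usually expressed with the three-argument (start, stop, event) form of Surv(). A minimal sketch with age as the time scale, assuming hypothetical columns age_entry (age when the form was completed) and age_exit (age at event or censoring) and simulated data:

```r
library(survival)

set.seed(7)
n <- 200

# Simulated cohort; every column here is invented for illustration
db <- data.frame(
  age_entry = runif(n, 40, 70),   # age at enrolment (left-truncation time)
  x1        = rnorm(n)
)
follow      <- rexp(n, rate = 0.1 * exp(0.5 * db$x1))   # time to event after entry
cens        <- runif(n, 1, 10)                          # administrative censoring
db$age_exit <- db$age_entry + pmin(follow, cens)        # age at event or censoring
db$status   <- as.integer(follow <= cens)               # 1 = event observed

# Age as the time scale: subjects enter the risk set at age_entry
fit_age <- coxph(Surv(age_entry, age_exit, status) ~ x1, data = db)
summary(fit_age)
```

With this form each subject contributes to the risk set only between age_entry and age_exit, so age is handled by the baseline hazard rather than as a covariate.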

How to check and control for autocorrelation in a mixed effect model of longitudinal data?

I have behavioral data for many groups of birds over 10 days of observation. I want to investigate whether there is a temporal pattern in some behaviors (e.g. does mate competition increase over time?), and I was told that I have to account for autocorrelation in the data, since behavior on one day is unlikely to be independent of behavior on the previous days.
However I was wondering about two things:
Since I'm not interested in the differences in y among days but the trend of y over days, do I still need to correct for autocorrelation?
If yes, how do I control for the autocorrelation so that I'm left only with the signal (and noise, of course)?
For the second question, keep in mind I will be analyzing the effect of time on behavior using mixed models in R (since there are random effects such as pseudo-replication), but I have not found any straightforward method of correcting for autocorrelation in the data when modeling the responses.
(1) Yes, you should check for/account for autocorrelation.
The first example here shows an example of estimating trends in a mixed model while accounting for autocorrelation.
You can fit these models with lme from the nlme package. Here's a mixed model without autocorrelation included:
cmod_lme <- lme(GS.NEE ~ cYear,
data=mc2, method="REML",
random = ~ 1 + cYear | Site)
and you can explore the autocorrelation by using plot(ACF(cmod_lme)).
(2) Add correlation to the model, something like this:
cmod_lme_acor <- update(cmod_lme,
correlation=corAR1(form=~cYear|Site))
@JeffreyGirard notes that to check the ACF after updating the model to include the correlation argument, you will need to use plot(ACF(cmod_lme_acor, resType = "normalized")).
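The same workflow can be sketched end to end on simulated data shaped like the bird example: 30 groups observed on 10 consecutive days, with AR(1) errors within each group. The data frame birds, its columns and all simulated values are assumptions made for illustration.

```r
library(nlme)

set.seed(123)
n_group <- 30
n_day   <- 10

# day varies fastest, so rows are ordered by group and then by day
birds <- expand.grid(day = 1:n_day, group = factor(1:n_group))

# Random group intercepts plus AR(1) errors within each group
group_eff <- rnorm(n_group, sd = 1)
ar_err <- unlist(lapply(seq_len(n_group), function(g) {
  as.numeric(arima.sim(list(ar = 0.6), n = n_day))
}))
birds$y <- 2 + 0.3 * birds$day + group_eff[birds$group] + ar_err

# Step 1: mixed model for the trend, ignoring autocorrelation
m0   <- lme(y ~ day, random = ~ 1 | group, data = birds, method = "REML")
acf0 <- ACF(m0)   # residual autocorrelation by lag; inspect with plot(acf0)

# Step 2: add an AR(1) structure over days within each group
m1 <- update(m0, correlation = corAR1(form = ~ day | group))

# Normalized residuals should show little remaining autocorrelation
acf1 <- ACF(m1, resType = "normalized")

# Estimated AR(1) parameter on its natural scale
phi_hat <- coef(m1$modelStruct$corStruct, unconstrained = FALSE)
phi_hat
```

The fixed-effect estimate for day is the trend of interest; accounting for autocorrelation mainly changes its standard error rather than the estimate itself.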

Resources