Age as a time scale in survival analysis - r

In survival analysis, you measure the time from some start point and right-censor your data. The code therefore usually looks like this:
fit <- coxph(Surv(time, status) ~ outcome + x1 + x2, data=db)
However, in epidemiological analyses the start point is often not informative (i.e. it just indicates that someone completed a form, not that they were exposed to something), so the data are said to be left truncated and the measured time has to be adjusted.
I've been told to overcome this limitation by using age as the time scale, which makes perfect sense since my outcome is related to age.
This seems easily doable in SAS (end of page 5). How can I achieve this in R?
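A minimal sketch of how this is usually done with the survival package, assuming hypothetical columns age_entry (age at study entry) and age_exit (age at event or censoring) in db: the counting-process form Surv(start, stop, event) puts the model on the age time scale and handles the left truncation (delayed entry) automatically.

library(survival)

# Age as the time scale: each subject enters the risk set at the age at which
# they joined the study (left truncation) and leaves at the age of event or
# censoring. The column names age_entry and age_exit are assumptions for illustration.
fit <- coxph(Surv(age_entry, age_exit, status) ~ outcome + x1 + x2, data = db)
summary(fit)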

Related

backwards selection of glm does not change the complete model

I am very new to working with GLMs. I have a dataset with categorical (factor) and numerical predictor variables, and the response variable is count data with a Poisson distribution. I put these into a GLM:
glm2 <- glm(formula = count ~ Salinity + Period + Intensity + Depth + Temp + Treatment, data = dfglm, family = "poisson")
Treatment (1.1-3.6) and Period (morning/midday) are factors.
The summary output (not reproduced here) already shows several surprising things: a very big difference between the null deviance and the residual deviance, treatment 1.1 not showing, Period morning and midday not shown as separate levels, and very high standard errors. But I will continue for now.
For the backward selection I used this code:
backward<-step(glm2,direction="backward",trace=0)
summary(backward)
I got exactly the same output as given above. Also when checking backward$coefficients, all coefficients remained.
Lastly I tried this:
If anyone could give me advice on interpreting this output and on how to get a better model with a working backward selection, it would be greatly appreciated!
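Not an answer to every point above, but one way to see why step() returns the full model is to inspect the single-term deletions it evaluates; a minimal sketch, reusing the glm2 object fitted above:

# AIC (and a likelihood-ratio test) for dropping each term from the full model:
# if no deletion lowers the AIC, backward selection keeps everything.
drop1(glm2, test = "Chisq")

# Re-run the selection with trace = 1 so that each step considered is printed.
backward <- step(glm2, direction = "backward", trace = 1)
summary(backward)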

Cox PH model, how to control for multiple events

I'm working on a time-to-event model in R using the coxph function in the survival package. I'm analysing animal movements until they pass the study area, which consists of 2 zones. Each individual can have multiple events before the final passing event as they move between zones; each zone is analysed separately. I was thinking that stratifying the model on the number of zone exposures, or adding frailty(individual id) as a random effect, would control for this and give somewhat similar results.
However, the results are wildly different when looking at model fit (AIC). Covariates with a large effect in the stratified model have barely any impact in the frailty model. The coefficients and the significance levels are more similar, though.
It feels like something is wrong; or should such a large difference be expected between the two approaches?
Is either of the approaches preferred when controlling for multiple events? Or is there a better way than these two approaches?
(number of events ranges from 1 to 11, mean number of events 2.24)
Example code:
frailty_model <- coxph(formula=Surv(T1,T2,event==2) ~ day_night + discharge + frailty(id), data = cox_passage, na.action = na.fail)
stratified_model <- coxph(formula = Surv(T1, T2, event == 2) ~ day_night + discharge + strata(n_exposures_within_zone), data = cox_passage, na.action = na.fail)
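For comparison only, a third approach that is often used for repeated events is a marginal (Andersen-Gill style) model with robust standard errors via cluster(id); this is a sketch using the same variables as above, not a claim that it is the right choice for these data:

# One shared baseline hazard; the correlation between an individual's repeated
# events is absorbed into robust (sandwich) standard errors.
marginal_model <- coxph(Surv(T1, T2, event == 2) ~ day_night + discharge + cluster(id),
                        data = cox_passage, na.action = na.fail)
summary(marginal_model)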

Gamma distribution in a GLMM

I am trying to create a GLMM in R. I want to find out how the emergence time of bats depends on different factors. Here I take the time difference between the departure of the respective bat and the sunset of the day as dependent variable (metric). As fixed factors I would like to include different weather data (metric) as well as the reproductive state (categorical) of the bats. Additionally, there is the transponder number (individual identification code) as a random factor to exclude inter-individual differences between the bats.
I first worked in R with a linear mixed model (package lme4), but the QQ plot of the residuals deviates very strongly from the normal distribution. Also, a histogram of the data suggests a gamma distribution rather than a normal one. As a result, I implemented a GLMM with a gamma distribution. Here is an example with one weather parameter:
model <- glmer(formula = difference_in_min ~ repro + precipitation + (1 + repro | transponder_number), data = trip, control = ctrl, family = Gamma(link = "log"))
However, since there was no change in the QQ plot this way, I looked at the residual diagnostics from the DHARMa package. But the distributional assumption still doesn't seem to be correct, because the data in the QQ plot deviate strongly here, too.
Residual diagnostics from DHARMa
But if the data also do not correspond to a gamma distribution, what alternative is there? Or maybe the problem lies somewhere else entirely.
Does anyone have an idea where the error might lie?
But if the data also do not correspond to a gamma distribution, what alternative is there?
One alternative is the lognormal distribution (https://en.wikipedia.org/wiki/Log-normal_distribution).
Gaussian (or normal) models assume residuals that are normally distributed around zero, which it sounds like you do not have. The lognormal distribution does not carry the same requirement. Following your previous code, you would fit it like this:
model <- glmer(formula = log(difference_in_min) ~ repro + precipitation + (1 + repro | transponder_number), data = trip, control = ctrl, family = gaussian(link = "identity"))
Or, instead of glmer, you can call lmer directly, where you don't need to specify the distribution (glmer may tell you to do this in a warning message anyway):
model <- lmer(formula = log(difference_in_min) ~ repro + precipitation + (1 + repro | transponder_number), data = trip, control = ctrl)
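To check whether the log-scale model actually fixes the residual problems, you can rerun the DHARMa diagnostics mentioned in the question on the new fit; a minimal sketch (model is the lmer fit from above):

library(DHARMa)

# Simulate scaled residuals from the fitted model and plot the standard
# DHARMa diagnostics (QQ plot plus residuals vs. fitted values).
sim_res <- simulateResiduals(fittedModel = model)
plot(sim_res)

Keep in mind that predictions from this model are on the log scale, so back-transform them with exp() when reporting emergence times in minutes.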

How to estimate a regression with both variables i and t simultaneously

I want to estimate a regression for a variable, LWAGE (log wage), against EXP (years of work experience). The data I have track participants across 7 years, so each year their number of years of work experience increases by 1.
When I do the regression for
πΏπ‘Šπ΄πΊπΈπ‘– = 𝛽0 + 𝛽1𝐸𝐷𝑖 + 𝑒𝑖
I used
reg1 <- lm(LWAGE~EXP, data=df)
Now I'm trying to do the following regression:
πΏπ‘Šπ΄πΊπΈπ‘–π‘‘ = 𝛽0 + 𝛽1𝐸𝑋𝑃𝑖𝑑 + 𝑒i.
But I'm not sure how to include the time-based component in my regression. I searched around but couldn't find anything relevant.
Are you attempting to include time-fixed effects in your model or an interaction between your variable EXP and time (calling this TIME for this demonstration)?
For time fixed effects using lm() you can just include time as a variable in your model. Time should be a factor.
reg2 <- lm(LWAGE~EXP + TIME, data = df)
As an interaction between EXP and TIME it would be
reg3 <- lm(LWAGE~EXP*TIME, data = df)
Based on your description, it sounds like you might be looking for the interaction, i.e. how does the effect of experience on log wages vary by time?
You can also take a look at the plm package for working with panel data.
https://cran.r-project.org/web/packages/plm/vignettes/plmPackage.html
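A minimal sketch of the plm route, assuming the data frame contains an individual identifier and a year variable (hypothetically named id and year here); model = "within" gives the fixed-effects (within) estimator:

library(plm)

# Declare the panel structure: which column identifies individuals and which
# identifies time. The column names id and year are assumptions for illustration.
pdat <- pdata.frame(df, index = c("id", "year"))

# Fixed-effects (within) regression of log wage on experience.
fe <- plm(LWAGE ~ EXP, data = pdat, model = "within")
summary(fe)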

Testing a General Linear Hypothesis in R

I'm working my way through a linear regression textbook and am trying to replicate the results from a section on the test of the general linear hypothesis, but I need a little bit of help on how to do so in R.
I've already taken a look at a number of other posts, but am hoping someone can give me some example code. I have data on twenty-six subjects which has the following form:
Group, Weight (lb), HDL Cholesterol (mg/dL)
1,163.5,75
1,180,72.5
1,178.5,62
2,106,57.5
2,134,49
2,216.5,74
3,163.5,76
3,154,55.5
3,139,68
Given this data I am trying to test to see if the regression lines fit to the three groups of subjects have a common slope. The models postulated are:
y = β0 + β1·x + ϵ
y = γ0 + γ1·x + ϵ
y = δ0 + δ1·x + ϵ
So the hypothesis of interest is H0: β1 = γ1 = δ1
I have been trying to do this using the linearHypothesis function in the car library, but have been having trouble knowing what the model object should be, and am not confident that this is the correct approach (or package) to be using.
Any help would be much appreciated. Thanks!
Tim, your question doesn't seem to be so much about R code. Instead, it appears that you have questions about how to test the interaction of your Group and Weight (lb) variables on the outcome HDL cholesterol (mg/dL). You don't state this specifically, but I'm guessing that these are your predictors and outcome, respectively.
So essentially, you're trying to see whether the predictor Weight (lb) has differential effects depending on the level of the variable Group. This can be done in a number of ways using the linear model. A simple regression approach would be lm(hdl ~ group + weight + group:weight), or equivalently lm(hdl ~ group*weight). The coefficients for the interaction term group:weight then tell you whether or not there is a significant interaction (i.e. moderation) effect.
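A sketch of how the test could look in R, using only the nine rows listed above for illustration (in practice you would use all twenty-six subjects): the common-slope and separate-slopes models are compared with anova(), and the same hypothesis is tested with car::linearHypothesis(), which the question mentions. The interaction coefficient names (group2:weight, group3:weight) follow from R's default treatment contrasts and will differ if your factor coding differs.

library(car)

# Toy data frame with the rows shown above; replace with the full data set.
dat <- data.frame(
  group  = factor(c(1, 1, 1, 2, 2, 2, 3, 3, 3)),
  weight = c(163.5, 180, 178.5, 106, 134, 216.5, 163.5, 154, 139),
  hdl    = c(75, 72.5, 62, 57.5, 49, 74, 76, 55.5, 68)
)

# Common slope (parallel lines) vs. a separate slope for each group.
common_slope   <- lm(hdl ~ group + weight, data = dat)
separate_slope <- lm(hdl ~ group * weight, data = dat)

# F-test of H0: beta1 = gamma1 = delta1, i.e. no group-by-weight interaction.
anova(common_slope, separate_slope)

# The same hypothesis via linearHypothesis() on the interaction coefficients.
linearHypothesis(separate_slope, c("group2:weight = 0", "group3:weight = 0"))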
However, I think there is a major concern. In particular, the hypothesized effect is essentially that the group and weight variables do not interact in predicting hdl. That is, you're essentially predicting the null. Furthermore, you're predicting the null despite having a small sample size, so it is rather unlikely that you have sufficient statistical power to detect an effect, even if there were one to be observed.
