Time-dependent Cox analysis in R

I built a table for a time-dependent Cox survival analysis, with a tStart and tStop for each ID.
My dependent variable is death and my independent variable is a score.
I followed the R documentation for this type of analysis, but when I run the code to get a hazard ratio and confidence interval, the reported sample size is twice my number of subjects (as if every row were counted as a separate subject).
Is the following code right for a time-dependent survival analysis?
coxph(Surv(tStart, tStop, DEATH) ~ score,
      data = df_timedep, cluster = ID)
Is this code OK?
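A minimal sketch of what the doubled sample size reflects, assuming toy data (the values below are invented for illustration and are not the asker's data). In (tStart, tStop] format, coxph counts rows, so n is the number of intervals rather than the number of subjects. The cluster argument leaves the coefficient unchanged and only switches the standard error to a robust one.

```r
library(survival)

# Hypothetical toy data in counting-process form: one row per interval,
# several rows per subject when the score changes over time.
df_timedep <- data.frame(
  ID     = c(1, 1, 2, 3, 3, 4, 5, 5, 6),
  tStart = c(0, 5, 0, 0, 4, 0, 0, 3, 0),
  tStop  = c(5, 9, 7, 4, 10, 6, 3, 8, 12),
  DEATH  = c(0, 1, 1, 0, 0, 1, 0, 1, 0),
  score  = c(1, 3, 2, 1, 2, 4, 2, 5, 1)
)

# Same model with and without the cluster argument
fit_plain   <- coxph(Surv(tStart, tStop, DEATH) ~ score, data = df_timedep)
fit_cluster <- coxph(Surv(tStart, tStop, DEATH) ~ score,
                     data = df_timedep, cluster = ID)

n_rows     <- fit_cluster$n                  # intervals (rows) used in the fit
n_events   <- fit_cluster$nevent             # deaths
n_subjects <- length(unique(df_timedep$ID))  # actual number of subjects

# Hazard ratio with confidence interval
summary(fit_cluster)$conf.int
```

Comparing n_rows with n_subjects shows where the inflated count comes from; the number of events is the figure to check against the number of deaths in the cohort.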

Related

Cox PH model, how to control for multiple events

I'm working on a time-to-event model in R using the coxph function in the survival package. I'm analysing animal movements until the animals pass through the study area, which consists of 2 zones. Each individual can have multiple events before the final passing event as it moves between zones; each zone is analysed separately. I was thinking that either stratifying the model on the number of zone exposures or adding frailty(individual id) as a random effect would control for this and give somewhat similar results.
However, the results differ widely when looking at model fit (AIC). Covariates with a large effect in the stratified model have very little impact in the frailty model. The coefficients and significance levels are more similar, though.
It feels like something is wrong, or should this large difference be expected from the two approaches?
Is either approach preferred when controlling for multiple events? Or is there a better way than these two?
(Number of events: range 1-11, mean 2.24)
Example code:
frailty_model <- coxph(formula=Surv(T1,T2,event==2) ~ day_night + discharge + frailty(id), data = cox_passage, na.action = na.fail)
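# n_exposures_within_zone is a placeholder for the column holding the number of exposures within the zone
stratified_model <- coxph(formula=Surv(T1,T2,event==2) ~ day_night + discharge + strata(n_exposures_within_zone), data = cox_passage, na.action = na.fail)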

How to estimate a regression with both variables i and t simultaneously

I want to estimate a regression for a variable, LWAGE (log wage), against EXP (years of work experience). The data that I have has participants tracked across 7 years, so each year their number of years of work experience increases by 1.
When I do the regression for
$LWAGE_i = \beta_0 + \beta_1 ED_i + e_i$
I used
reg1 <- lm(LWAGE~EXP, data=df)
Now I'm trying to do the following regression:
$LWAGE_{it} = \beta_0 + \beta_1 EXP_{it} + e_i$.
But I'm not sure how to include the time-based component in my regression. I searched around but couldn't find anything relevant.
Are you attempting to include time-fixed effects in your model or an interaction between your variable EXP and time (calling this TIME for this demonstration)?
For time fixed effects using lm() you can just include time as a variable in your model. Time should be a factor.
reg2 <- lm(LWAGE~EXP + TIME, data = df)
As an interaction between EXP and TIME it would be
reg3 <- lm(LWAGE~EXP*TIME, data = df)
Based on your description, it sounds like you might be looking for the interaction, that is: how does the effect of experience on log wages vary over time?
You can also take a look at the plm package for working with panel data.
https://cran.r-project.org/web/packages/plm/vignettes/plmPackage.html
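A minimal sketch of the two lm() specifications above, assuming a simulated panel (the data, the ID column, and the coefficient values used to generate LWAGE are invented for illustration). It shows TIME converted to a factor before fitting, so that each year gets its own dummy.

```r
set.seed(1)

# Hypothetical panel: 50 participants tracked over 7 years
n_id   <- 50
n_year <- 7
df <- expand.grid(ID = seq_len(n_id), TIME = seq_len(n_year))

# Experience grows by one year per period from a person-specific start
start_exp <- sample(0:20, n_id, replace = TRUE)
df$EXP    <- start_exp[df$ID] + df$TIME - 1
df$LWAGE  <- 1.5 + 0.03 * df$EXP + 0.02 * df$TIME +
  rnorm(nrow(df), sd = 0.2)

# TIME must be a factor to act as a set of year dummies
df$TIME <- factor(df$TIME)

reg2 <- lm(LWAGE ~ EXP + TIME, data = df)  # time fixed effects
reg3 <- lm(LWAGE ~ EXP * TIME, data = df)  # EXP slope allowed to differ by year

length(coef(reg2))  # intercept + EXP + 6 year dummies
length(coef(reg3))  # the above + 6 EXP:TIME interaction terms
```

If TIME were left numeric, reg2 would estimate a single linear trend instead of one effect per year, which is a different model.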

Running diagnostics on a multivariate multiple regression in r

I have a data set that gives the rates of incidence of some phenomena in all the zip codes of a state, and some demographic data. The rates are given for each year in the data set (year 1 - year 6). A snippet of the data is available here.
I've run a multivariate linear regression to examine the impact of the demographic variables on the rates, per Fox & Weisberg (2011), weighted by the average zip code population across all years (var = POPmean):
Y <- cbind(data$rateY1, data$rateY2, data$rateY3, data$rateY4, data$rateY5, data$rateY6)
model <- lm(Y ~ someVAR1+someVAR2+someVAR3+someVAR4+someVAR5, data=data, weights= POPmean)
summary(model)
coef(model)
summary(manova(model))
I'd like to plot the regression diagnostics for this model for each year, but have no idea how to do so. I'd like to use influencePlot() from the car package, but when I try to do so:
influencePlot(model, id.method="noteworthy", main="Robustness Check")
I receive an error stating that the lengths of x,y differ (which, of course, they do). Can anyone help figure out how to plot the regression diagnostics for the model given above? Or suggest an alternative method?

specifying multiple separate random effects in nlme

I am analysing some whale tourism data and am trying to construct linear mixed effect models in the nlme package to see if any of my explanatory variables affect encounter time between whales and tourists. (I am also open to running this model in lme4.)
My variables are:
mins: encounter time (response variable)
Id: individual whale ID (random effect)
Vessel: vessel Id (random effect)
Sex: sex of the animal
Length: length of the animal
Year
Month (nested within Year).
So my random variables are Id and Vessel and I also have Year and Month as nested random effects.
I have come up with the following:
library(nlme)
form1 <- formula(Min ~ length + Sex + Encounter)
model1 <- lme(form1,
              random = list(Id = ~1,
                            Vessel = ~1,
                            Year = ~1,
                            Month = ~1),
              data = wsdata, method = "ML")
But all my random effects become nested within Id.
Is there any way I can define Id and Vessel as separate random effects and Year and Month as nested random effects?
In general it's much easier to specify crossed (what you mean by "separate", I think) random effects in lme4, so unless you need models for temporal or spatial autocorrelation or heteroscedasticity (which are still easier to achieve with nlme), I would go ahead with
library(lme4)
fit <- lmer(mins ~ Length + Sex + (1|Id) + (1|Vessel) +
            (1|Year/Month), data = wsdata, REML = FALSE)
A few other comments:
What is Encounter? It was in your formula but not in your description of the data set.
It seems quite likely that encounter times (durations of encounters?) would be skewed, in which case you might want to log-transform them.

life expectancy survival package R

I would like to calculate the life-years lost due to a disease while correcting for other variables in the model (corrected group prognosis method). My dataset is a cohort of individuals for whom I have follow-up time until death or censoring and a variable indicating whether they died, together with covariates such as age, sex and prevalence of disease. I searched the web and got the impression that this should be possible with the survival package in R.
I used the following code which returns probabilities:
fit1 <- coxph(Surv(fup_death, death) ~ age + sex + prev_disease, data)
direct <- survexp( ~prev_disease, data=data, ratetable=fit1)
I also tried the survfit function, but then my computer crashes:
t<-survfit(fit1, newdata = data)
How can I derive the life expectancy for those with the disease and those without it? Or should I do it differently?
Thank you in advance!
Best,
Symen
The calculation for years of life lost is the difference in mean survival. You can get survfit objects for two separate but comparable conditions like this:
fit1 <- coxph(Surv(fup_death, death) ~ age + sex + prev_disease, data = data)
survfit_WithDisease <- survfit(fit1,
                               newdata = data.frame(age = 50,
                                                    sex = 'm',
                                                    prev_disease = TRUE))
survfit_NoDisease <- survfit(fit1,
                             newdata = data.frame(age = 50,
                                                  sex = 'm',
                                                  prev_disease = FALSE))
and by setting print.rmean=TRUE you can get estimates of mean survival for each condition.
print(survfit_WithDisease, print.rmean = TRUE)
print(survfit_NoDisease, print.rmean = TRUE)
Note that mean isn't defined for every survival curve. There are several options for calculating mean survival when the survival curve does not go all the way to zero, which you should read about in ?print.survfit.
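A minimal sketch of the difference in mean survival described above, under stated assumptions: the lung data set shipped with the survival package stands in for the cohort, sex stands in for prev_disease, and rmst() is a hypothetical helper written here, not a function of the package. It restricts the mean to a horizon tau (one of the ways to handle curves that do not reach zero) by taking the area under the step function up to tau.

```r
library(survival)

# Assumption: lung stands in for the cohort, sex for the disease indicator
fit <- coxph(Surv(time, status) ~ age + sex, data = lung)

# Predicted curves for two otherwise identical profiles
sf_group1 <- survfit(fit, newdata = data.frame(age = 60, sex = 1))
sf_group2 <- survfit(fit, newdata = data.frame(age = 60, sex = 2))

# Hypothetical helper: restricted mean survival time up to tau,
# computed as the area under the survival step function on [0, tau]
rmst <- function(sf, tau) {
  keep  <- sf$time <= tau
  times <- c(0, sf$time[keep], tau)
  surv  <- c(1, as.vector(sf$surv)[keep])
  sum(diff(times) * surv)
}

tau <- 730  # horizon in days, chosen arbitrarily for illustration
rmst_group1 <- rmst(sf_group1, tau)
rmst_group2 <- rmst(sf_group2, tau)

# Difference in restricted mean survival between the two profiles
rmst_group2 - rmst_group1
```

The result depends on the chosen tau and on the covariate profile passed to newdata, so both should be reported alongside the estimate.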
