How to get 95% CI from R's coxph?

I have used the following call to R's coxph() to fit a Cox proportional hazards model. I want to report the proper statistics; however, there is no 95% CI in the output.
coxph(Surv(days, censor) ~ gender + age + treatment, data_1)
I only get the following columns.
coef exp(coef) se(coef) z p

A simple way to get confidence intervals for the hazard ratios associated with your predictor variables is to call summary() on your model fit. If you want confidence intervals for the coefficient estimates themselves, use confint(); exponentiating the results from confint() also gives the hazard-ratio confidence intervals.
library(survival)
fit <- coxph(Surv(t, y) ~ x)
summary(fit)       # output includes HR confidence intervals
confint(fit)       # CIs for the coefficients
exp(confint(fit))  # also HR CIs

If you need this for further processing, consider a tidy solution with broom:
library(broom)
library(dplyr)
library(survival)
mod <- coxph(Surv(time, status) ~ sex + age, data = lung)
mod |>
  tidy(conf.int = TRUE, exponentiate = TRUE) |>
  select(term, estimate, starts_with("conf"))
#> # A tibble: 2 x 4
#>   term  estimate conf.low conf.high
#>   <chr>    <dbl>    <dbl>     <dbl>
#> 1 sex      0.599    0.431     0.831
#> 2 age      1.02     0.999     1.04
More details at this reference.

You can simply find the CI in the coxph model summary (when using the survival package): look at the conf.int component, under the lower .95 and upper .95 columns.
Update 2019 (but I assume the suggestion did not work previously):
test1 <- list(time   = c(4, 3, 1, 1, 2, 2, 3),
              status = c(1, 1, 1, 0, 1, 1, 0),
              x      = c(0, 2, 1, 1, 1, 0, 0),
              sex    = c(0, 0, 0, 0, 1, 1, 1))
# Fit a stratified model
coxobj <- coxph(Surv(time, status) ~ x + strata(sex), test1)
coxobj_summary <- summary(coxobj)
coxobj_summary$conf.int
Output:
  exp(coef) exp(-coef) lower .95 upper .95
x  2.230706  0.4482887 0.4450758  11.18022
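If you only need the interval bounds, you can pull them straight from that matrix (a small sketch, using the column names survival prints above):
coxobj_summary$conf.int[, c("lower .95", "upper .95")]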


R (or Stata): How to test a regression model where the null hypothesis is not that the coefficient is zero [duplicate]

The standard summary(lm(Height ~ Weight)) will output results for the hypothesis test H0: beta1 = 0, but I am interested in testing the hypothesis H0: beta1 = 1. Is there a package that will produce that p-value? I know I can calculate it by hand, and I know I can "flip the confidence interval" for a two-tailed test (test at the 5% level by checking whether the 95% confint contains the point of interest), but I am looking for an easy way to generate the p-values for a simulation study.
You can use linearHypothesis from the package car, for example:
library(car)
fit = lm(Petal.Width ~ Petal.Length,data=iris)
fit
Call:
lm(formula = Petal.Width ~ Petal.Length, data = iris)
Coefficients:
 (Intercept)  Petal.Length
     -0.3631        0.4158
linearHypothesis(fit,"Petal.Length=0.4")
Linear hypothesis test
Hypothesis:
Petal.Length = 0.4
Model 1: restricted model
Model 2: Petal.Width ~ Petal.Length
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1    149 6.4254
2    148 6.3101  1   0.11526 2.7034 0.1023
There's also an article about the specifics of this package.
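For a single coefficient you can also do this test by hand from the summary table; a minimal sketch (for a one-degree-of-freedom hypothesis, the F statistic above is just this t statistic squared):
b <- coef(summary(fit))["Petal.Length", ]
t_stat <- (b["Estimate"] - 0.4) / b["Std. Error"]  # t-test of H0: slope = 0.4
2 * pt(-abs(t_stat), df = df.residual(fit))        # two-sided p, matches Pr(>F) = 0.1023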
I don't know of a package that will do this, but you can test this hypothesis by using an offset:
## specify null-model parameters
null_int <- 0; null_slope <- 1
## base model
m0 <- lm(mpg ~ disp, data=mtcars)
## include offset as null model
m1 <- update(m0, . ~ . + offset(null_int + null_slope*disp))
Comparing results:
cbind(base = coef(m0), offset = coef(m1))
                   base     offset
(Intercept) 29.59985476  29.599855
disp        -0.04121512  -1.041215
You can see that the estimated slope is now 1 unit lower (since it is now measured relative to a null model with slope = 1). The summary values, standard errors, p-values, etc. will all be adjusted appropriately.
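The disp row of summary(m1) then reports a test of whether the deviation from the null slope is zero, i.e. a test of H0: slope = 1 in the original parameterization; a quick check:
summary(m1)$coefficients["disp", ]  # estimate is -1.041215; its p-value tests H0: original slope = 1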

Forecast dependent values with mvrnorm and include temporal autocorrelation

I have matrices with values (weight, maturity, etc.) by time step and age class, and I would like to forecast future values stochastically. Age classes are not independent, so I have been using mvrnorm to deal with that. How do I also get (lag-1) temporal autocorrelation into my predictions?
Here is what I would like to do in R:
library(MASS)
# dummy matrix: rows are time steps columns are dependent classes (ages)
x <- matrix(rnorm(20), 4, 5,
            dimnames = list(years = c('year1', 'year2', 'year3', 'year4'),
                            ages  = c('age1', 'age2', 'age3', 'age4', 'age5')))
# what I have so far to get next year's values (the goal would be to predict several years)
sigma <- cov(x) #covariance matrix
delta <- mvrnorm(1,rep(0,ncol(x)),cov(x)) # deviations
xl <- tail(x,1) #last year values
xp <- xl+delta #new values
# There is no temporal autocorrelation in here of course
xnew <- rbind(x,xp)
matplot(xnew,type='l')
# So I would need new values based on something like this:
rho <- apply(x, 2, function(v) acf(v, plot = FALSE)$acf[2])  # lag-1 autocorrelation per age class
delta <- mvrnorm(1,xl,cov(x))
xp <- rho*xl+(1-rho)*delta
The last part doesn't feel right though.
The first part of this answer shows how to account for temporal autocorrelation per the original question. The second part addresses the multivariate case per the revised question.
Part 1:
library(MASS)
# dummy matrix: rows are time steps columns are dependent classes (ages)
x <- matrix(rnorm(20),4,5)
# what I have so far to get next year's values (the goal would be to predict several years)
sigma <- cov(x)
delta <- mvrnorm(1,rep(0,ncol(x)),cov(x))
xl <- tail(x,1)
xp <- xl+delta #new values
# There is no temporal autocorrelation in here of course
xnew <- rbind(x,xp)
matplot(xnew,type='l')
# Clean up / construct your data set
dat <- as.data.frame(x)
dat$year <- c(2014, 2015, 2016, 2017)
dat <- rbind(dat, c(xp, 2018))
# six names for the six columns (the original post listed only five);
# 'weight' is a placeholder name taken from the question
colnames(dat) <- c("maturity", "age", "height", "sales", "weight", "year")
# Account for Temporal Autocorrelation
library(nlme)
mdl.ac <- gls(sales ~ year, data = dat,
              correlation = corAR1(form = ~ year),
              na.action = na.omit)
summary(mdl.ac)
Generalized least squares fit by REML
  Model: sales ~ year
  Data: dat
       AIC    BIC    logLik
  14.01155 10.406 -3.005773

Correlation Structure: ARMA(1,0)
 Formula: ~year
 Parameter estimate(s):
     Phi1
0.1186508

Coefficients:
               Value Std.Error   t-value p-value
(Intercept) 1.178018 0.5130766 2.2959883  0.1054
year        0.012666 0.3537748 0.0358023  0.9737

 Correlation:
     (Intr)
year 0.646

Standardized residuals:
         1          2          3          4          5
 0.3932124 -0.4053291 -1.8081473  0.0699103  0.8821300
attr(,"std")
[1] 0.5251018 0.5251018 0.5251018 0.5251018 0.5251018
attr(,"label")
[1] "Standardized residuals"

Residual standard error: 0.5251018
Degrees of freedom: 5 total; 3 residual
Part 2:
# Account for Temporal Autocorrelation
library(nlme)
mdl.ac <- gls(year ~ height + sales + I(maturity*age), data = dat,
              correlation = corAR1(form = ~ year),
              na.action = na.omit)
summary(mdl.ac)
Generalized least squares fit by REML
  Model: year ~ height + sales + I(maturity * age)
  Data: dat
       AIC      BIC    logLik
  15.42011 3.420114 -1.710057

Correlation Structure: ARMA(1,0)
 Formula: ~year
 Parameter estimate(s):
Phi1
   0

Coefficients:
                       Value Std.Error    t-value p-value
(Intercept)        0.2100381 0.4532345  0.4634203  0.7237
height            -0.7602539 0.7758925 -0.9798444  0.5065
sales             -0.1840694 0.8327382 -0.2210411  0.8615
I(maturity * age)  0.0449278 0.1839260  0.2442712  0.8475

 Correlation:
                  (Intr) height sales
height            -0.423
sales              0.214 -0.825
I(maturity * age)  0.349 -0.941  0.889

Standardized residuals:
            1             2             3             4             5
-7.004956e-17 -4.985525e-01 -1.319137e+00 -1.568271e+00 -1.441708e+00
attr(,"std")
[1] 0.3962277 0.3962277 0.3962277 0.3962277 0.3962277
attr(,"label")
[1] "Standardized residuals"

Residual standard error: 0.3962277
Degrees of freedom: 5 total; 1 residual
Please also see CARBayesST and its vignette for an alternative approach:
https://cran.r-project.org/web/packages/CARBayesST/vignettes/CARBayesST.pdf
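To come back to the simulation step in the question (this is not part of the gls answer above, just a sketch of one standard approach): for a stationary AR(1) per age class, the innovation should be weighted by sqrt(1 - rho^2) rather than (1 - rho), which keeps the marginal variance constant over time while mvrnorm preserves the cross-age covariance:
# stationary AR(1): x_t = mu + rho * (x_{t-1} - mu) + sqrt(1 - rho^2) * eps_t
mu  <- colMeans(x)                                           # long-run mean per age class
rho <- apply(x, 2, function(v) acf(v, plot = FALSE)$acf[2])  # lag-1 autocorrelation
eps <- mvrnorm(1, rep(0, ncol(x)), cov(x))                   # cross-age correlated innovation
xp  <- mu + rho * (drop(tail(x, 1)) - mu) + sqrt(1 - rho^2) * eps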

Generalized least squares results interpretation

I checked my linear regression model (WMAN = species, WDNE = sea surface temperature) and found autocorrelation, so instead I am trying generalized least squares with the following script:
library(nlme)
modelwa <- gls(WMAN ~ WDNE, data = dat,
               correlation = corAR1(form = ~ MONTH),
               na.action = na.omit)
summary(modelwa)
I compared both models:
> library(MuMIn)
> model.sel(modelw,modelwa)
Model selection table
        (Intrc)   WDNE class na.action  correlation df   logLik   AICc delta weight
modelwa   31.50 0.1874   gls   na.omit crAR1(MONTH)  4 -610.461 1229.2  0.00      1
modelw    11.31 0.7974    lm   na.excl               3 -658.741 1323.7 94.44      0
Abbreviations:
 na.action: na.excl = 'na.exclude'
 correlation: crAR1(MONTH) = 'corAR1(~MONTH)'
Models ranked by AICc(x)
I believe the results suggest I should use gls, as the AICc is lower.
My problem is that I have been reporting the F-value/R²/p-value, but the gls output does not include these.
I would be very grateful if someone could assist me in interpreting these results.
> summary(modelwa)
Generalized least squares fit by REML
  Model: WMAN ~ WDNE
  Data: mp2017.dat
       AIC      BIC    logLik
  1228.923 1240.661 -610.4614

Correlation Structure: ARMA(1,0)
 Formula: ~MONTH
 Parameter estimate(s):
     Phi1
0.4809973

Coefficients:
                Value Std.Error  t-value p-value
(Intercept) 31.496911  8.052339 3.911524  0.0001
WDNE         0.187419  0.091495 2.048401  0.0424

 Correlation:
     (Intr)
WDNE -0.339

Standardized residuals:
      Min        Q1       Med        Q3       Max
-2.023362 -1.606329 -1.210127  1.427247  3.567186

Residual standard error: 18.85341
Degrees of freedom: 141 total; 139 residual
I have now overcome the problem of autocorrelation, so I can use lm().
Add the lag-1 residual as an X variable to the original model. This can be done with the slide() function from the DataCombine package.
library(DataCombine)
library(ggplot2)  # for the 'economics' dataset
lmMod <- lm(pce ~ pop, data = economics)  # base model (inferred from lmMod2 below)
econ_data <- data.frame(economics, resid_mod1 = lmMod$residuals)
econ_data_1 <- slide(econ_data, Var = "resid_mod1",
                     NewVar = "lag1", slideBy = -1)
econ_data_2 <- na.omit(econ_data_1)
lmMod2 <- lm(pce ~ pop + lag1, data = econ_data_2)
This script can be found here
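To verify the fix, check that the lag-1 autocorrelation of the new model's residuals is near zero (a quick sketch):
acf(lmMod2$residuals, plot = FALSE)$acf[2]  # should be close to 0 once the lagged residual is included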

lmer linear contrasts: Kenward-Roger or Satterthwaite DF and SE

In R, I am searching for a way to estimate confidence intervals for linear contrasts in lmer models that use either Kenward-Roger or Satterthwaite degrees of freedom and standard errors.
For example, I can compute a CI for a fixed-effect parameter in a mixed model the way SAS does, using the t-value (with df from KR) and the SE.
mod <- lmerTest::lmer(y ~ time1 + treatment + time1:treatment + (1 | PersonID), data = data)
summary(mod, ddf = "Kenward-Roger")
This output:
Fixed effects:
                Estimate Std. Error      df t value Pr(>|t|)
(Intercept)      49.0768     1.0435 56.4700  47.029  < 2e-16 ***
time1             5.8224     0.5963 48.0000   9.764 5.51e-13 ***
treatment         1.6819     1.4758 56.4700   1.140   0.2592
time1:treatment   2.0425     0.8433 48.0000   2.422   0.0193 *
This allows a CI for time1 like:
5.8224 + abs(qt(0.05/2, 48)) * 0.5963  # 7.021342
5.8224 - abs(qt(0.05/2, 48)) * 0.5963  # 4.623458
I would like to do the same thing for a linear contrast of the fixed coefficients. The following gives the p-value, but there is no SE in the output:
pbkrtest::KRmodcomp(mod,matrix(c(0,0,1,0),nrow = 1))
         stat    ndf     ddf F.scaling p.value
Ftest  1.2989 1.0000 56.4670         1  0.2592
Is there any way to get an SE or a CI for lmer linear contrasts that uses this type of df?
For this, you have at least two options: using the lsmeans package, or doing it manually (using the functions vcovAdj.lmerMod and pbkrtest::get_Lb_ddf). Personally, I go with the latter if the contrast to be tested is not very "simple", because I find the syntax in lsmeans a bit complicated.
To exemplify, take the following model:
library(pbkrtest)
library(lme4)
library(nlme) # for the 'Orthodont' data
# 'age' is a numeric variable, while 'Sex' and 'Subject' are factors
model <- lmer(distance ~ age : Sex + (1 | Subject), data = Orthodont)
Linear mixed model fit by REML ['lmerMod']
Formula: distance ~ age:Sex + (1 | Subject)
…
Fixed Effects:
  (Intercept)    age:SexMale  age:SexFemale
      16.7611         0.7555         0.5215
from which we would like to obtain stats on the difference between the coefficients for age in males and females (i.e., age:SexMale - age:SexFemale).
Using lsmeans:
library(lsmeans)
# Evaluate the contrast at a value of 'age' set to 1,
# so that the resulting value is equal to the regression coefficient
lsm = lsmeans(model, pairwise ~ age : Sex, at = list(age = 1))$contrasts
produces:
 contrast          estimate         SE    df t.ratio p.value
 1,Male - 1,Female 0.2340135 0.06113276 42.64   3.828  0.0004
Alternatively, doing the calculation manually:
# Specify the contrasts: age:SexMale - age:SexFemale
# Must have the same order as the fixed effects in the model
K = c("(Intercept)" = 0, "age:SexMale" = 1, "age:SexFemale" = -1)
# Retrieve the adjusted variance-covariance matrix, to calculate the SE
V = pbkrtest::vcovAdj.lmerMod(model, 0)
# Point estimate, SE and df
point_est = sum(K * fixef(model))
SE = sqrt(sum(K * (V %*% K)))
df = pbkrtest::get_Lb_ddf(model, K)
alpha = 0.05 # significance level
# Calculate confidence interval for the difference between the 'age' coefficients for males and females
Delta_age_CI = point_est + SE * qt(c(0.5 * alpha, 1 - 0.5 * alpha), df)
will result in a point estimate of 0.2340135, SE 0.06113276, df 42.63844, and a confidence interval of [0.1106973, 0.3573297].
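As a cross-check, the same interval can be pulled from the lsmeans contrast object directly; if I remember correctly, lsmeans defaults to Kenward-Roger df for lmer models (via pbkrtest), so this should match the manual calculation:
confint(lsm)  # Wald t interval built from the same KR-adjusted SE and df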
