Opposite directions of exponential hazard model coefficients (with survreg and glm poisson) - R

I want to estimate an exponential hazards model with one predictor in R. For some reason, I am getting coefficients with opposite signs when I estimate it using a Poisson GLM with offset log(t) and when I just use the survreg function from the survival package. I am sure the explanation is perfectly obvious, but I cannot figure it out.
Example
t <- c(89,74,23,74,53,3,177,44,28,43,25,24,31,111,57,20,19,137,45,48,9,17,4,59,7,26,180,56,36,51,6,71,23,6,13,28,16,180,16,25,6,25,4,5,32,94,106,1,69,63,31)
d <- c(0,1,1,0,1,1,0,1,1,0,1,1,1,1,0,0,1,0,1,1,1,0,1,0,1,1,0,0,1,1,1,1,1,1,1,1,1,0,1,1,1,1,1,1,1,0,1,1,1,1,1)
p <- c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,1,1,1)
df <- data.frame(d,t,p)
# exponential hazards model using poisson with offset log(t)
summary(glm(d ~ offset(log(t)) + p, data = df, family = "poisson"))
Produces:
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -5.3868 0.7070 -7.619 2.56e-14 ***
p 1.3932 0.7264 1.918 0.0551 .
Compared to
# exponential hazards model using survreg exponential
require(survival)
summary(survreg(Surv(t,d) ~ p, data = df, dist = "exponential"))
Produces:
Value Std. Error z p
(Intercept) 5.39 0.707 7.62 2.58e-14
p -1.39 0.726 -1.92 5.51e-02
Why are the coefficients in opposite directions and how would I interpret the results as they stand?
Thanks!

The two functions parameterize the same model on different scales. The survreg fit is an accelerated failure time model whose coefficients are on the scale of log expected survival time, so the negative coefficient for p means that an increased value of p is associated with decreased expected survival. The Poisson model with offset log(t) estimates the log hazard (event rate), so its positive coefficient for p means an increased risk, which is the same statement the other way round: variations in risk and in mean survival time of necessity go in opposite directions. The fact that the absolute values are identical comes from the mathematical identity log(1/x) = -log(x); in exponential models the risk is (exactly) inversely proportional to the mean lifetime.
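To see the relationship numerically, here is a minimal sketch (reusing the df, t, d and p objects defined above): the survreg coefficients, negated, should reproduce the Poisson log-hazard coefficients.
library(survival)
# Exponential hazard: log(rate) = -log(mean survival time), so the AFT
# coefficients from survreg are the negatives of the Poisson coefficients.
fit_pois <- glm(d ~ offset(log(t)) + p, data = df, family = "poisson")
fit_exp <- survreg(Surv(t, d) ~ p, data = df, dist = "exponential")
cbind(log_hazard = coef(fit_pois), minus_log_mean = -coef(fit_exp))
# The two columns should agree up to numerical precision.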

Related

How to make polynomial predictions? [duplicate]

I have
library(ISLR)
attach(Wage)
# Polynomial Regression and Step Functions
fit=lm(wage~poly(age,4),data=Wage)
coef(summary(fit))
fit2=lm(wage~poly(age,4,raw=T),data=Wage)
coef(summary(fit2))
plot(age, wage)
lines(20:350, predict(fit, newdata = data.frame(age=20:350)), lwd=3, col="darkred")
lines(20:350, predict(fit2, newdata = data.frame(age=20:350)), lwd=3, col="darkred")
The prediction lines seem to be the same; however, why are the coefficients so different? How do you interpret them with raw=T and raw=F?
I see that the coefficients produced with poly(...,raw=T) match the ones with ~age+I(age^2)+I(age^3)+I(age^4).
If I want to use the coefficients to get the prediction "manually" (without using the predict() function), is there something I should pay attention to? How should I interpret the coefficients of the orthogonal polynomials in poly()?
By default, with raw = FALSE, poly() computes an orthogonal polynomial. It internally sets up the model matrix with the raw coding x, x^2, x^3, ... first and then scales the columns so that each column is orthogonal to the previous ones. This does not change the fitted values but has the advantage that you can see whether a certain order in the polynomial significantly improves the regression over the lower orders.
Consider the simple cars data with response stopping distance and driving speed. Physically, this should have a quadratic relationship but in this (old) dataset the quadratic term is not significant:
m1 <- lm(dist ~ poly(speed, 2), data = cars)
m2 <- lm(dist ~ poly(speed, 2, raw = TRUE), data = cars)
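As a quick sanity check of the claim that the coding does not change the fitted values, you can compare the two fits directly:
all.equal(fitted(m1), fitted(m2))
## should be TRUE: only the parameterization differs, not the fitted model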
In the orthogonal coding you get the following coefficients in summary(m1):
Estimate Std. Error t value Pr(>|t|)
(Intercept) 42.980 2.146 20.026 < 2e-16 ***
poly(speed, 2)1 145.552 15.176 9.591 1.21e-12 ***
poly(speed, 2)2 22.996 15.176 1.515 0.136
This shows that there is a highly significant linear effect while the second order is not significant. The latter p-value (i.e., the one of the highest order in the polynomial) is the same as in the raw coding:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.47014 14.81716 0.167 0.868
poly(speed, 2, raw = TRUE)1 0.91329 2.03422 0.449 0.656
poly(speed, 2, raw = TRUE)2 0.09996 0.06597 1.515 0.136
but the lower order p-values change dramatically. The reason is that in model m1 the regressors are orthogonal while they are highly correlated in m2:
cor(model.matrix(m1)[, 2], model.matrix(m1)[, 3])
## [1] 4.686464e-17
cor(model.matrix(m2)[, 2], model.matrix(m2)[, 3])
## [1] 0.9794765
Thus, in the raw coding you can only interpret the p-value of speed if speed^2 remains in the model. And as both regressors are highly correlated one of them can be dropped. However, in the orthogonal coding speed^2 only captures the quadratic part that has not been captured by the linear term. And then it becomes clear that the linear part is significant while the quadratic part has no additional significance.
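Regarding the question about getting predictions "manually": with raw = TRUE the coefficients multiply x, x^2, ... directly, while with the default orthogonal coding you first have to evaluate the same orthogonal basis at the new values, e.g. via predict() on the poly object. A small sketch with the cars models above:
pb <- poly(cars$speed, 2)                  # the orthogonal basis used in m1
new_speed <- c(10, 15, 20)
X_orth <- cbind(1, predict(pb, new_speed)) # evaluate that basis at new values
X_raw <- cbind(1, new_speed, new_speed^2)  # raw polynomial design matrix
drop(X_orth %*% coef(m1))                  # manual predictions from m1
drop(X_raw %*% coef(m2))                   # manual predictions from m2
# Both should agree with predict(m1, newdata = data.frame(speed = new_speed)).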
I believe the way polynomial regression would be run with raw=T is to look at the highest-power term and assess its significance from the p-value of that coefficient.
If it is not significant (large p-value), the regression is re-fit without that power, i.e., at the next lower degree, and this is repeated one step at a time as long as the highest remaining term is not significant.
As soon as the highest remaining degree is significant, the process stops and that degree is taken as the appropriate one.

How to get an estimate and confidence interval for a contrast in R with offset

I've got a Poisson GLM model fitted in R, looking something like this:
Fit <- glm(Outcome ~ Exposure + Var1 + offset(log(persontime)), family = poisson, data = G)
Where Outcome will end up being a rate, the Exposure is a continuous variable, and Var1 is a factor with three levels.
It's easy enough from the output of that:
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -5.6998 0.1963 -29.029 < 2e-16
Exposure 4.7482 1.0793 4.399 1.09e-05
Var1Thing1 -0.2930 0.2008 -1.459 0.144524
Var1Thing2 1.0395 0.2037 5.103 3.34e-07
Var1Thing3 0.7722 0.2201 3.508 0.000451
To get the estimate of a one-unit increase in Exposure. But a one-unit increase isn't actually particularly meaningful. An increase of 0.025 is actually far more likely. Getting an estimate for that isn't particularly difficult either, but I'd like a confidence interval along with the estimate. My intuition is that I need to use the contrast package, but the following generated an error:
diff <- contrast(Fit,list(Exposure=0.030,Var1="Thing1"),list(Exposure=0.005,Type="Thing1"))
"Error in offset(log(persontime)) : object 'persontime' not found"
Any idea what I'm doing wrong?
you want to use the confint function (which in this case will call the MASS:::confint.glm method), as in:
confint(Fit)
Since the standard errors in the model scale linearly with linear changes in the scale of the variable 'Exposure', you can simply multiply the confidence interval by the difference in scale to get the confidence interval for a smaller 'unit' change.
Dumb example:
Lets say you want to test the hypothesis that people fall down more often when they've had more alcohol. You test this by randomly serving individuals varying amounts of alcohol (which you measure in ml) and counting the number of times each person falls down. Your model is:
Fit <- glm(falls ~ alcohol_ml,data=myData, family=poisson)
and the coef table is
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -5.6998 0.1963 -29.029 < 2e-16
Alcohol_ml 4.7482 1.0793 4.399 1.09e-05
and the confidence interval for alcohol is 4-6 (just to keep it simple). Now a colleague asks you to give the confidence interval in ounces. All you have to do is scale the confidence interval by the conversion factor (29.5735 ml per ounce), as in:
c(4,6) * 29.5735 # effect per ounce alcohol [notice multiplication is used to rescale here]
alternatively you could re-scale your data and re-fit the model:
myData$alcohol_oz <- myData$alcohol_ml / 29.5735 #[notice division is used to rescale here]
Fit <- glm(falls ~ alcohol_oz,data=myData, family=poisson)
or you could re-scale your data right in the model:
#[again notice that division is used here]
Fit <- glm(falls ~ I(alcohol_ml/29.5735),data=myData, family=poisson)
Either way, you will get the same confidence intervals on the new scale.
Back to your example: if your units of Exposure are so large that you are unlikely to observe such a change within an individual and a smaller change is more easily interpreted, just re-scale your variable 'Exposure' (as in myData$Exposure_newScale = myData$Exposure / 0.030, so that Exposure_newScale is in multiples of 0.030) or rescale the confidence intervals using either of these methods.
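Applied to the original question, a sketch (assuming the model has been stored as Fit, as in the contrast call above): take the profile confidence interval for a one-unit change in Exposure and rescale it to the 0.025-unit change of interest.
ci_1unit <- confint(Fit)["Exposure", ] # profile CI for a 1-unit increase
ci_1unit * 0.025                       # CI for a 0.025-unit increase (log-rate scale)
exp(ci_1unit * 0.025)                  # the same interval expressed as a rate ratio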

Random slope for time in subject not working in lme4

I cannot insert a random slope in this model with lme4 (1.1-7):
> difJS<-lmer(JS~Tempo+(Tempo|id),dat,na.action=na.omit)
Error: number of observations (=274) <= number of random effects (=278) for term
(Tempo | id); the random-effects parameters and the residual variance (or scale
parameter) are probably unidentifiable
With nlme it is working:
> JSprova<-lme(JS~Tempo,random=~1+Tempo|id,data=dat,na.action=na.omit)
> summary(JSprova)
Linear mixed-effects model fit by REML Data: dat
AIC BIC logLik
769.6847 791.3196 -378.8424
Random effects:
Formula: ~1 + Tempo | id
Structure: General positive-definite, Log-Cholesky parametrization
StdDev Corr
(Intercept) 1.1981593 (Intr)
Tempo 0.5409468 -0.692
Residual 0.5597984
Fixed effects: JS ~ Tempo
Value Std.Error DF t-value p-value
(Intercept) 4.116867 0.14789184 138 27.837013 0.0000
Tempo -0.207240 0.08227474 134 -2.518874 0.0129
Correlation:
(Intr)
Tempo -0.837
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-2.79269550 -0.39879115 0.09688881 0.41525770 2.32111142
Number of Observations: 274
Number of Groups: 139
I think it is a problem of missing data, as I have a few cases with missing data at time two of the DV, but with na.action=na.omit shouldn't the two packages behave in the same way?
It is "working" with lme, but I'm 99% sure that your random slopes are indeed confounded with the residual variation. The problem is that you only have two measurements per subject (or only one measurement per subject in 4 cases -- but that's not important here), so that a random slope plus a random intercept for every individual gives one random effect for every observation.
If you try intervals() on your lme fit, it will give you an error saying that the variance-covariance matrix is unidentifiable.
You can force lmer to do it by disabling some of the identifiability checks (see below).
library("lme4")
library("nlme")
library("plyr")
Restrict the data to only two points per individual:
sleepstudy0 <- ddply(sleepstudy,"Subject",
function(x) x[1:2,])
m1 <- lme(Reaction~Days,random=~Days|Subject,data=sleepstudy0)
intervals(m1)
## Error ... cannot get confidence intervals on var-cov components
lmer(Reaction~Days+(Days|Subject),data=sleepstudy0)
## error
If you want you can force lmer to fit this model:
m2B <- lmer(Reaction~Days+(Days|Subject),data=sleepstudy0,
control=lmerControl(check.nobs.vs.nRE="ignore"))
## warning messages
The estimated variances are different from those estimated by lme, but that's not surprising since some of the parameters are jointly unidentifiable.
If you're only interested in inference on the fixed effects, it might be OK to ignore these problems, but I wouldn't recommend it.
The sensible thing to do is to recognize that the variation among slopes is unidentifiable; there may be among-individual variation among slopes, but you just can't estimate it with this model. Don't try; fit a random-intercept model and let the implicit/default random error term take care of the variation among slopes.
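For completeness, a minimal sketch of that fallback with the two-observations-per-subject data constructed above:
# Random intercept only; the residual term absorbs what would otherwise
# be modelled as between-subject variation in slopes.
m3 <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy0)
summary(m3)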
There's a recent related question on CrossValidated; there I also refer to another example.

Residual variance extracted from glm and lmer in R

I am trying to take what I have read about multilevel modelling and merge it with what I know about glm in R. I am now using the height growth data from here.
I have done some coding shown below:
library(lme4)
library(ggplot2)
setwd("~/Documents/r_code/multilevel_modelling/")
rm(list=ls())
oxford.df <- read.fwf("oxboys/OXBOYS.DAT",widths=c(2,7,6,1))
names(oxford.df) <- c("stu_code","age_central","height","occasion_id")
oxford.df <- oxford.df[!is.na(oxford.df[,"age_central"]),]
oxford.df[,"stu_code"] <- factor(as.character(oxford.df[,"stu_code"]))
oxford.df[,"dummy"] <- 1
chart <- ggplot(data=oxford.df,aes(x=occasion_id,y=height))
chart <- chart + geom_point(aes(colour=stu_code))
# see if lm and glm give the same estimate
glm.01 <- lm(height~age_central+occasion_id,data=oxford.df)
glm.02 <- glm(height~age_central+occasion_id,data=oxford.df,family="gaussian")
summary(glm.02)
vcov(glm.02)
var(glm.02$residual)
(logLik(glm.01)*-2)-(logLik(glm.02)*-2)
1-pchisq(-2.273737e-13,1)
# lm and glm give the same estimation
# so glm.02 will be used from now on
# see if lmer without level2 variable give same result as glm.02
mlm.03 <- lmer(height~age_central+occasion_id+(1|dummy),data=oxford.df,REML=FALSE)
(logLik(glm.02)*-2)-(logLik(mlm.03)*-2)
# 1-pchisq(-3.408097e-07,1)
# glm.02 and mlm.03 give the same estimation, only if REML=FALSE
mlm.03 gives me the following output:
> mlm.03
Linear mixed model fit by maximum likelihood
Formula: height ~ age_central + occasion_id + (1 | dummy)
Data: oxford.df
AIC BIC logLik deviance REMLdev
1650 1667 -819.9 1640 1633
Random effects:
Groups Name Variance Std.Dev.
dummy (Intercept) 0.000 0.0000
Residual 64.712 8.0444
Number of obs: 234, groups: dummy, 1
Fixed effects:
Estimate Std. Error t value
(Intercept) 142.994 21.132 6.767
age_central 1.340 17.183 0.078
occasion_id 1.299 4.303 0.302
Correlation of Fixed Effects:
(Intr) ag_cnt
age_central 0.999
occasion_id -1.000 -0.999
You can see that there is a variance for the residual in the random effects section. I have read in Applied Multilevel Analysis - A Practical Guide by Jos W. R. Twisk that this represents the amount of "unexplained variance" in the model.
I wondered if I could arrive at the same residual variance from glm.02, so I tried the following:
> var(resid(glm.01))
[1] 64.98952
> sd(resid(glm.01))
[1] 8.061608
The results are slightly different from the mlm.03 output. Does this refer to the same "residual variance" stated in mlm.03?
Your glm.02 and glm.01 estimate a simple linear regression model using least squares. On the other hand, mlm.03 is a linear mixed model estimated through maximum likelihood.
I don't know your dataset, but it looks like you use the dummy variable to create a cluster structure at level-2 with zero variance.
So your question has basically two answers, but only the second answer is important in your case. The models glm.02 and mlm.03 do not contain the same residual variance estimate, because...
1. The models are usually different (mixed effects vs. classical regression). In your case, however, the dummy variable seems to suppress the additional variance component in the mixed model, so the two models are effectively equal.
2. The method used to estimate the residual variance is different: glm uses least squares, while lmer uses ML in your code. ML estimates of the residual variance are slightly biased downward (resulting in smaller variance estimates); see the small check below. This can be solved by using REML instead of ML to estimate the variance components.
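A small check of point 2, reusing glm.01 and oxford.df from the code above: the ML estimate divides the residual sum of squares by n, while the usual unbiased least-squares estimate divides by n minus the number of fixed-effect coefficients.
n <- nrow(oxford.df)
n_coef <- length(coef(glm.01))
sum(resid(glm.01)^2) / n            # ML-style estimate; should match the 64.712 from mlm.03
sum(resid(glm.01)^2) / (n - n_coef) # unbiased estimate, equal to sigma(glm.01)^2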
Using classical ML (instead of REML), however, is still necessary and correct for the likelihood-ratio test; with REML the comparison of the two likelihoods would not be correct.
Cheers!
