I am looking to obtain parameter estimates for one predictor while constraining another predictor to specific values in a negative binomial GLM, in order to better explain an interaction effect.
My model is something like this:
model <- glm.nb(outcome ~ IV * moderator + covariate1 + covariate2)
Because the IV:moderator term is significant, I would like to obtain parameter estimates for IV at specific values of moderator (e.g., at +1 and -1 SD). I can obtain slope estimates for IV at various levels of moderator using the visreg package, but I don't know how to estimate SEs and test statistics. moderator is a continuous variable, so I can't use the multcomp package, and other packages designed for finding simple slopes (e.g., pequod and QuantPsyc) are incompatible with negative binomial regression. Thanks!
If you want to constrain one of the coefficients in your regression, consider taking that variable out of the model and adding it back in as an offset. For example, with this sample data:
dd<-data.frame(
x1=runif(50),
x2=runif(50)
)
dd<-transform(dd,
y=5*x1-2*x2+3+rnorm(50)
)
We can fit a model with both x1 and x2 as predictors:
lm(y ~ x1 + x2,dd)
# Call:
# lm(formula = y ~ x1 + x2, data = dd)
#
# Coefficients:
# (Intercept) x1 x2
# 3.438438 4.135162 -2.154770
Or say we know that the coefficient of x2 is -2. Then, rather than estimating x2, we can put that term in as an offset:
lm(y ~ x1 + offset(-2*x2), dd)
# Call:
# lm(formula = y ~ x1 + offset(-2 * x2), data = dd)
#
# Coefficients:
# (Intercept) x1
# 3.347531 4.153594
The offset() function basically just creates a covariate whose coefficient is fixed at 1. Even though I've demonstrated this with lm, the same method should work for glm.nb and many other regression models.
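Since the question involves glm.nb, here is a minimal sketch of the same offset trick for a negative binomial model (the simulated data, the name dd2, and the fixed coefficient of -2 are illustrative assumptions, not from the question):
library(MASS)   # for glm.nb
set.seed(1)
dd2 <- data.frame(x1 = runif(200), x2 = runif(200))
## simulate a negative binomial outcome with known coefficients
dd2$y <- rnbinom(200, mu = exp(1 + 0.5 * dd2$x1 - 2 * dd2$x2), size = 2)
## estimate both coefficients freely
glm.nb(y ~ x1 + x2, data = dd2)
## fix the coefficient of x2 at -2 (an assumed value) by moving it into an offset
glm.nb(y ~ x1 + offset(-2 * x2), data = dd2)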
I am attempting to fit an ARIMAX model to daily consumption data in R. When I perform an OLS regression with lm() I am able to include a dummy variable for each unit and remove the constant term (intercept) to avoid a less-than-full-rank design matrix.
lm1 <- lm(y ~ -1 + x1 + x2 + x3, data = dat)
I have not found a way to do this with arima() which forces me to use the constant term and exclude one of the dummy variables.
with(dat, arima(y, xreg = cbind(x1, x2)))
Is there a specific reason why arima() doesn't allow this, and is there a way to bypass it?
See the documentation for the argument include.mean in ?arima, it seems you want the following: arima(y, xreg = cbind(x1, x2), include.mean=FALSE).
Also be aware of the definition of the model fitted by arima(), as pointed out by @RichardHardy.
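For example, a minimal sketch using the names from the question (dat, y, and the dummy columns x1, x2, x3 are assumed to exist as above; the AR order here is just illustrative):
fit <- with(dat, arima(y, order = c(1, 0, 0),
                       xreg = cbind(x1, x2, x3),
                       include.mean = FALSE))
fit   # all three dummies included, no constant term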
Using an ordered factor as a predictor in a regression by default produces a linear (.L) and quadratic (.Q) polynomial contrast. Is there a way to omit the quadratic contrast? Here's some clumsy example code I rigged up:
xvar <- rnorm(100)
yvar <- xvar + rnorm(100)
xfac <- as.factor(rep(1:3, length.out = 100))
dat <- data.frame(xvar, yvar, xfac)
dat$xfac <- ordered(dat$xfac)
summary(lm(yvar~xvar+xfac,data=dat))
Am I correct in assuming that the quadratic contrast being included as a predictor might result in some multicollinearity issues? I looked around but couldn't find any other posts about only including the linear component. Thank you!
No, you are not correct. You would be correct if you had done this:
lm( yvar ~ xvar + as.numeric(xfac) +I(as.numeric(xfac)^2), data=dat)
But that's not the same as what R does when it encounters such a situation. Whether or not the quadratic term will "weaken" the linear estimate really depends on the data situation. If a quadratic fit reduces the deviations of fit from data, then the linear estimate might get "weakened", but not necessarily.
If you do want only the linear contrasts, you could do this (which is often called a "test of trend" for xfac):
lm( yvar ~ xvar + as.numeric(xfac), data=dat)
If you have an ordered factor with several levels and you only wanted the linear and quadratic contrasts then you can do this:
> fac <- factor(c("E","VG","G","F","P"),
levels=c("E","VG","G","F","P"), ordered=TRUE)
> sfac <- sample(fac, 30, rep=TRUE)
> outcome <- 5*as.numeric(sfac) +rnorm(30) # linear outcome effect
> lm(outcome ~ sfac)
#-----------
Call:
lm(formula = outcome ~ sfac)
Coefficients:
(Intercept) sfac.L sfac.Q sfac.C sfac^4
14.97297 15.49134 0.10634 -0.03287 0.40144
#---------
> contrasts(sfac, 2) <- contr.poly(5)[, 1:2]
> lm(outcome ~ sfac)
Call:
lm(formula = outcome ~ sfac)
Coefficients:
(Intercept) sfac.L sfac.Q
14.97078 15.50680 0.07977
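The same restriction can be applied to the 3-level factor from the question (a sketch, assuming the dat object built above):
contrasts(dat$xfac, 1) <- contr.poly(3)[, 1, drop = FALSE]  # keep only the linear column
summary(lm(yvar ~ xvar + xfac, data = dat))                 # now only xfac.L is estimated
This keeps xfac as a factor but fits only the linear trend, much like the as.numeric() version above.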
I am trying to specify both a random intercept and random slope term in a GAMM model with one fixed effect.
I have successfully fitted a model with a random intercept using the code below from the mgcv library, but now cannot determine the syntax for a random slope within the gamm() function:
M1 = gamm(dur ~ s(dep, bs="ts", k = 4), random= list(fInd = ~1), data= df)
If I was using both a random intercept and slope within a linear mixed-effects model I would write it in the following way:
M2 = lme(dur ~ dep, random=~1 + dep|fInd, data=df)
The gamm() supporting documentation states that the random terms need to be given in the list form as in lme() but I cannot find any interpretable examples that include both slope and intercept terms. Any advice / solutions would be much appreciated.
The gamm4() function in the gamm4 package provides a way to do this. You specify the random intercept and slope in the same way you would in lmer. In your case:
M1 = gamm4(dur~s(dep,bs="ts",k=4), random = ~(1+dep|fInd), data=df)
Here is the gamm4 documentation:
https://cran.r-project.org/web/packages/gamm4/gamm4.pdf
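For a runnable illustration, here is the same kind of call on the sleepstudy data (a sketch; Reaction/Days/Subject stand in for your dur/dep/fInd, and the smooth specification simply mirrors yours):
library(gamm4)
data(sleepstudy, package = "lme4")
m <- gamm4(Reaction ~ s(Days, bs = "ts", k = 4),
           random = ~ (1 + Days | Subject), data = sleepstudy)
summary(m$gam)   # smooth terms
summary(m$mer)   # lmer-style mixed-model fit, including the random slope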
Here is the gamm() syntax to enter correlated random intercept and slope effects, using the sleepstudy dataset.
library(nlme)
library(mgcv)
data(sleepstudy,package='lme4')
# Model via lme()
fm1 <- lme(Reaction ~ Days, random= ~1+Days|Subject, data=sleepstudy, method='REML')
# Model via gamm()
fm1.gamm <- gamm(Reaction ~ Days, random= list(Subject=~1+Days), data=sleepstudy, method='REML')
VarCorr(fm1)
VarCorr(fm1.gamm$lme)
# Both are identical
# Subject = pdLogChol(1 + Days)
# Variance StdDev Corr
# (Intercept) 612.0795 24.740241 (Intr)
# Days 35.0713 5.922103 0.066
# Residual 654.9424 25.591843
The syntax to enter uncorrelated random intercept and slope effects is the same for lme() and gamm().
# Model via lme()
fm2 <- lme(Reaction ~ Days, random= list(Subject=~1, Subject=~0+Days), data=sleepstudy, method='REML')
# Model via gamm()
fm2.gamm <- gamm(Reaction ~ Days, random= list(Subject=~1, Subject=~0+Days), data=sleepstudy, method='REML')
VarCorr(fm2)
VarCorr(fm2.gamm$lme)
# Both are identical
# Variance StdDev
# Subject = pdLogChol(1)
# (Intercept) 627.5690 25.051328
# Subject = pdLogChol(0 + Days)
# Days 35.8582 5.988172
# Residual 653.5838 25.565285
This answer also shows how to enter multiple random effects into lme().
I fitted a model in R with the lmer()-function from the lme4 package. I scaled the dependent variable:
mod <- lmer(scale(Y)
~ X
+ (X | Z),
data = df,
REML = FALSE)
I look at the fixed-effect coefficients with fixef(mod):
> fixef(mod)
(Intercept) X1 X2 X3 X4
0.08577525 -0.16450047 -0.15040043 -0.25380073 0.02350007
It is quite easy to calculate the means by hand from the fixed-effects coefficients. However, I want them to be unscaled and I am unsure how to do this exactly. I am aware that scaling means subtracting the mean from every Y and dividing by the standard deviation. But both the mean and the standard deviation were calculated from the original data. Can I simply reverse this process after fitting an lmer() model by using the mean and standard deviation of the original data?
Thanks for any help!
Update: The way I presented the model above seems to imply that the dependent variable is scaled by taking the mean over all responses and dividing by the standard deviation of all responses. Usually it is done differently: rather than using the overall mean and standard deviation, the responses are standardized per subject, using the mean and standard deviation of that subject's responses. (This is odd in an lmer() model, I think, as the random intercept should take care of that... not to mention the fact that we are talking about calculating means on an ordinal scale...) The problem, however, stays the same: once I have fitted such a model, is there a clean way to rescale the coefficients of the fitted model?
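For concreteness, the per-subject standardization described above might look like this (a sketch; Z is taken as the grouping factor from the model, and Y_scaled is a made-up name):
df$Y_scaled <- ave(df$Y, df$Z, FUN = function(y) (y - mean(y)) / sd(y))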
Updated: generalized to allow for scaling of the response as well as the predictors.
Here's a fairly crude implementation.
If our original (unscaled) regression is
Y = b0 + b1*x1 + b2*x2 ...
Then our scaled regression is
(Y - mu0)/s0 = b0' + b1'*(x1 - mu1)/s1 + b2'*(x2 - mu2)/s2 + ...
This is equivalent to
Y = mu0 + s0*((b0' - b1'*mu1/s1 - b2'*mu2/s2 + ...) + b1'/s1*x1 + b2'/s2*x2 + ...)
So bi = s0*bi'/si for i>0 and
b0 = s0*b0'+mu0-sum(bi*mui)
Implement this:
rescale.coefs <- function(beta, mu, sigma) {
    beta2 <- beta                ## inherit names etc.
    beta2[-1] <- sigma[1] * beta[-1] / sigma[-1]
    beta2[1]  <- sigma[1] * beta[1] + mu[1] - sum(beta2[-1] * mu[-1])
    beta2
}
Try it out for a linear model:
m1 <- lm(Illiteracy~.,as.data.frame(state.x77))
b1 <- coef(m1)
Make a scaled version of the data:
ss <- scale(state.x77)
Scaled coefficients:
m1S <- update(m1,data=as.data.frame(ss))
b1S <- coef(m1S)
Now try out rescaling:
icol <- which(colnames(state.x77)=="Illiteracy")
p.order <- c(icol,(1:ncol(state.x77))[-icol])
m <- colMeans(state.x77)[p.order]
s <- apply(state.x77,2,sd)[p.order]
all.equal(b1,rescale.coefs(b1S,m,s)) ## TRUE
This assumes that both the response and the predictors are scaled.
If you scale only the response and not the predictors, then you should submit c(mean(response), rep(0, ...)) for m and c(sd(response), rep(1, ...)) for s (i.e., m and s are the values by which the variables were shifted and scaled).
If you scale only the predictors and not the response, then submit c(0,mean(predictors)) for m and c(1,sd(predictors)) for s.
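Applied to the lmer() fit in the question, where only the response was scaled, that would look roughly like this (mod, df, and Y are the names from the question; the predictor count is taken from the fitted coefficients):
b <- fixef(mod)
m <- c(mean(df$Y), rep(0, length(b) - 1))   # shifts: response only
s <- c(sd(df$Y),   rep(1, length(b) - 1))   # scalings: response only
rescale.coefs(b, m, s)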
I'm using the ordinal package in R to run ordinal logistic regression on a dependent variable based on a 1-5 Likert scale, and I'm trying to figure out how to test the proportional odds assumption.
My current model is y ~ x1 + x2 + x3 + x4 + x2*x3 + (1|ID) + (1|form) where x1 and x2 are dichotomous and x3 and x4 are continuous variables. (92 subjects, 4 forms).
As far as I know,
-"nominal" is not implemented in the more recent version of clmm.
-clmm2 (the older version) does not accept more than one random variable
-nominal_test() only appears to work for clm2 (without random effects at all)
For a different dv (that only has one random term and no interaction), I had used:
m1 <- clmm2(y ~ x1 + x2 + x3, random = ID, Hess = TRUE, data = d)
m1.nom <- clmm2(y ~ x1 + x2, random = ID, Hess = TRUE, nominal = ~ x3, data = d)
m2.nom <- clmm2(y ~ x2 + x3, random = ID, Hess = TRUE, nominal = ~ x1, data = d)
m3.nom <- clmm2(y ~ x1 + x3, random = ID, Hess = TRUE, nominal = ~ x2, data = d)
anova(m1.nom, m1)
anova(m2.nom, m1)
anova(m3.nom, m1)  # (as well as considering the output of summary() for each m*.nom model)
But I'm not sure how to modify this approach to handle the current model (two random terms and an interaction of the fixed effects), nor am I sure that this is actually a correct way to test the proportional odds assumption in the first place. (The example in the package tutorial only has two fixed effects.)
I'm open to other approaches (be they other packages, software, or graphical approaches) that would let me test this. Any suggestions?
Even for the most basic ordinal logistic regression models, the diagnostic tests of the proportional odds assumption are known to frequently reject the null hypothesis that the coefficients are the same across levels of the ordered factor. The statistician Frank Harrell suggests here a general graphical method for examining the proportional odds assumption, which is probably your best bet. In this approach you would graph the linear predictions from a logit model (with random effects) for each level of the outcome, one predictor variable at a time.
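One rough way to implement that check (a sketch, not Harrell's exact procedure; d, y, x1-x4, ID, and form are the names from your model): dichotomize the outcome at each cut point, fit the same mixed logit to each split with lme4::glmer, and compare the coefficients across cut points; roughly constant coefficients are consistent with proportional odds.
library(lme4)
cuts <- 2:5   # cut points of the 1-5 Likert outcome
coefs <- sapply(cuts, function(k) {
    ## as.numeric() covers y stored either as numbers or as an ordered 1-5 factor
    fit <- glmer(I(as.numeric(y) >= k) ~ x1 + x2 * x3 + x4 + (1 | ID) + (1 | form),
                 family = binomial, data = d)
    fixef(fit)
})
colnames(coefs) <- paste0("y>=", cuts)
round(coefs, 2)   # compare each row (coefficient) across the columns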