Hausman type test in R

I have been using the "plm" package in R to analyze panel data. One of the important tests in this package for choosing between a "fixed effects" and a "random effects" model is the Hausman-type test. A similar test is also available in Stata. The point here is that Stata requires the fixed effects model to be estimated first, followed by the random effects model. However, I didn't see any such restriction in the "plm" package. So I was wondering whether "plm" by default treats the first model as "fixed effects" and the second as "random effects". For reference, the steps I followed in Stata and R are listed below.
* Stata steps (data = mydata, y = dependent variable, X1:X4 = explanatory variables)
*step 1 : Estimate the FE model
xtreg y X1 X2 X3 X4 ,fe
*step 2: store the estimator
est store fixed
*step 3 : Estimate the RE model
xtreg y X1 X2 X3 X4,re
* step 4: store the estimator
est store random
*step 5: run Hausman test
hausman fixed random
#R steps (data=mydata, y=dependent variable,X1:X4: explanatory variables)
#step 1 : Estimate the FE model
fe <- plm(y~X1+X2+X3+X4,data=mydata,model="within")
summary(fe)
#step 2 : Estimate the RE model
re <- pggls(y~X1+X2+X3+X4,data=mydata,model="random")
summary(re)
#step 3 : Run Hausman test
phtest(fe, re)

Update: Be sure to read the comments. Original answer below.
Trial-and-error way of finding this out:
> library(plm)
> data("Gasoline", package = "plm")
> form <- lgaspcar ~ lincomep + lrpmg + lcarpcap
> wi <- plm(form, data = Gasoline, model = "within")
> re <- plm(form, data = Gasoline, model = "random")
> phtest(wi, re)
Hausman Test
data: form
chisq = 302.8037, df = 3, p-value < 2.2e-16
alternative hypothesis: one model is inconsistent
> phtest(re, wi)
Hausman Test
data: form
chisq = 302.8037, df = 3, p-value < 2.2e-16
alternative hypothesis: one model is inconsistent
As you can see, the test yields the same result no matter which of the models you feed it as the first and which as the second argument.
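Why the order is irrelevant can be seen by computing the statistic by hand. The sketch below is my own reconstruction, not code copied from plm: it assumes that phtest compares the slope coefficients the two models share and takes the absolute value of the quadratic form, so swapping the arguments flips the sign of both the coefficient difference and the variance difference without changing the result. hausman_stat is a hypothetical helper name.

```r
# Hypothetical helper: Hausman-type statistic from two named coefficient
# vectors and their covariance matrices (with matching dimnames).
hausman_stat <- function(b1, V1, b2, V2) {
  # keep only coefficients present in both models, excluding the intercept
  common <- setdiff(intersect(names(b1), names(b2)), "(Intercept)")
  d  <- b1[common] - b2[common]
  dV <- V1[common, common, drop = FALSE] - V2[common, common, drop = FALSE]
  # abs() makes the result independent of the argument order
  # (assumed to mirror what phtest does)
  as.numeric(abs(t(d) %*% solve(dV) %*% d))
}

# Only runs when plm is installed; same models as in the answer above
if (requireNamespace("plm", quietly = TRUE)) {
  library(plm)
  data("Gasoline", package = "plm")
  form <- lgaspcar ~ lincomep + lrpmg + lcarpcap
  wi <- plm(form, data = Gasoline, model = "within")
  re <- plm(form, data = Gasoline, model = "random")
  print(hausman_stat(coef(wi), vcov(wi), coef(re), vcov(re)))
  print(hausman_stat(coef(re), vcov(re), coef(wi), vcov(wi)))
}
```

If the assumption holds, both calls print the same number, and it should agree with the chisq value reported by phtest. As far as I understand, Stata's hausman does not take the absolute value, which would explain why it cares about which estimator comes first.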

Related

How to do negative binomial regression with the rms package in R?

How can I use the rms package in R to execute a negative binomial regression? (I originally posted this question on Statistics SE, but it was closed apparently because it is a better fit here.)
With the MASS package, I use the glm.nb function, but I am trying to switch to the rms package because I sometimes get weird errors when bootstrapping with glm.nb and some other functions. But I cannot figure out how to do a negative binomial regression with the rms package.
Here is sample code of what I would like to do (copied from the rms::Glm function documentation):
library(rms)
## Dobson (1990) Page 93: Randomized Controlled Trial :
counts <- c(18,17,15,20,10,20,25,13,12)
outcome <- gl(3,1,9)
treatment <- gl(3,3)
f <- Glm(counts ~ outcome + treatment, family=poisson())
f
anova(f)
summary(f, outcome=c('1','2','3'), treatment=c('1','2','3'))
So, instead of using family=poisson(), I would like to use something like family=negative.binomial(), but I cannot figure out how to do this.
In the documentation for family {stats}, I found this note in the "See also" section:
For binomial coefficients, choose; the binomial and negative binomial distributions, Binomial, and NegBinomial.
But even after clicking the link for ?NegBinomial, I cannot make any sense of this.
I would appreciate any help on how to use the rms package in R to execute a negative binomial regression.
Opinion up front: you might be better off posting (as a separate question) a reproducible example of the "weird errors" from your bootstrap attempts and seeing whether people have ideas for resolving them. It's fairly common for NB fitting procedures to throw warnings or errors when data are equi- or underdispersed, as the estimates of the dispersion parameter become infinite in this case ...
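A quick way to see whether a data set (or a bootstrap resample) is likely to cause this kind of trouble is to fit the Poisson model first and look at the Pearson dispersion estimate. This is only a rough diagnostic sketch, using the example data from the question; the variable names pois_fit and dispersion are mine.

```r
## Dobson (1990) Page 93: Randomized Controlled Trial
counts <- c(18,17,15,20,10,20,25,13,12)
outcome <- gl(3,1,9)
treatment <- gl(3,3)

# Poisson fit as a baseline
pois_fit <- glm(counts ~ outcome + treatment, family = poisson())

# Pearson chi-square divided by the residual degrees of freedom:
# values well above 1 point to overdispersion; values near or below 1
# suggest that an NB fit may struggle to estimate theta
dispersion <- sum(residuals(pois_fit, type = "pearson")^2) / df.residual(pois_fit)
dispersion
```

Running the same check inside a bootstrap loop may help identify which resamples trigger the errors.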
@coffeinjunky is correct that using family = negative.binomial(theta=VALUE) will work (where VALUE is a numeric constant, e.g. theta=1 for the geometric distribution [a special case of the NB]). However, you won't be able (without significantly more work) to fit the general NB model, i.e. the model where the dispersion parameter (theta) is estimated as part of the fitting procedure. That's what MASS::glm.nb does, and AFAICS there is no analogue in the rms package.
There are a few other packages/functions in addition to MASS::glm.nb that fit the negative binomial model, including (at least) bbmle and glmmTMB — there may be others such as gamlss.
## Dobson (1990) Page 93: Randomized Controlled Trial :
dd <- data.frame(
counts = c(18,17,15,20,10,20,25,13,12),
outcome = gl(3,1,9),
treatment = gl(3,3))
MASS::glm.nb
library(MASS)
m1 <- glm.nb(counts ~ outcome + treatment, data = dd)
## "iteration limit reached" warning
glmmTMB
library(glmmTMB)
m2 <- glmmTMB(counts ~ outcome + treatment, family = nbinom2, data = dd)
## "false convergence" warning
bbmle
library(bbmle)
m3 <- mle2(counts ~ dnbinom(mu = exp(logmu), size = exp(logtheta)),
parameters = list(logmu ~outcome + treatment),
data = dd,
start = list(logmu = 0, logtheta = 0)
)
signif(cbind(MASS=coef(m1), glmmTMB=fixef(m2)$cond, bbmle=coef(m3)[1:5]), 5)
                   MASS     glmmTMB       bbmle
(Intercept)  3.0445e+00  3.04540000  3.0445e+00
outcome2    -4.5426e-01 -0.45397000 -4.5417e-01
outcome3    -2.9299e-01 -0.29253000 -2.9293e-01
treatment2  -1.1114e-06  0.00032174  8.1631e-06
treatment3  -1.9209e-06  0.00032823  6.5817e-06
These all agree fairly well (at least for the intercept/outcome parameters). This example is fairly difficult for a NB model (5 parameters + dispersion for 9 observations, data are Poisson rather than NB).
Based on this, the following seems to work:
library(rms)
library(MASS)
counts <- c(18,17,15,20,10,20,25,13,12)
outcome <- gl(3,1,9)
treatment <- gl(3,3)
Glm(counts ~ outcome + treatment, family = negative.binomial(theta = 1))
General Linear Model
rms::Glm(formula = counts ~ outcome + treatment, family = negative.binomial(theta = 1))
                       Model Likelihood
                          Ratio Test
Obs                  9    LR chi2      0.31
Residual d.f.        4    d.f.            4
g            0.2383063    Pr(> chi2) 0.9892

            Coef    S.E.   Wald Z Pr(>|Z|)
Intercept    3.0756 0.2121 14.50  <0.0001
outcome=2   -0.4598 0.2333 -1.97   0.0487
outcome=3   -0.2962 0.2327 -1.27   0.2030
treatment=2 -0.0347 0.2333 -0.15   0.8819
treatment=3 -0.0503 0.2333 -0.22   0.8293

How can I get the p-value for whether my binomial regression is significantly different from a null model in R?

I have a dataset demos_mn of demographics and an outcome variable. There are 5 variables of interest, so my glm and null models look like this:
# binomial model
res.binom <- glm(var.bool ~ var1 + var2*var3 + var4 + var5,
data = demos_mn, family = "binomial")
# null model
res.null <- glm(var.bool ~ 1,
data = demos_mn, family = "binomial")
# calculate marginal R2 (r.squaredGLMM() comes from the MuMIn package)
library(MuMIn)
print(r.squaredGLMM(res.binom))
# show p value
print(anova(res.null, res.binom))
That is my workflow for glm mixed models, but for my binomial model I do not get a p-value for the overall model, only for the predictors. I'm hoping someone can enlighten me.
I did have some success using glmer for a repeated measures version of the model, however that unfortunately means I had to get rid of some key variables that were not measured repeatedly.
Perhaps you forgot test="Chisq" ? From ?anova.glm:
test: a character string, (partially) matching one of ‘"Chisq"’,
‘"LRT"’, ‘"Rao"’, ‘"F"’ or ‘"Cp"’. See ‘stat.anova’.
example("glm") ## to set up / fit the glm.D93 model
null <- update(glm.D93, . ~ 1)
anova(glm.D93, null, test="Chisq")
Analysis of Deviance Table
Model 1: counts ~ outcome + treatment
Model 2: counts ~ 1
  Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1         4     5.1291
2         8    10.5814 -4  -5.4523    0.244
test="Chisq" is poorly named: it performs a likelihood ratio test. Note that this is an asymptotic test, i.e. it relies on a large sample size. For GLMs with an adjustable scale parameter (Gaussian, Gamma, quasi-likelihood) you would use test="F" instead.
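To make explicit what the test computes, here is a minimal sketch of the same likelihood ratio test done by hand on the glm.D93 data: the drop in deviance between the null and the full model is compared with a chi-square distribution whose degrees of freedom equal the number of extra parameters. The names full, lr_stat, lr_df and p_value are mine.

```r
## same data as example("glm")
counts <- c(18,17,15,20,10,20,25,13,12)
outcome <- gl(3,1,9)
treatment <- gl(3,3)

full <- glm(counts ~ outcome + treatment, family = poisson())
null <- update(full, . ~ 1)

# LR statistic = reduction in deviance; df = number of added parameters
lr_stat <- deviance(null) - deviance(full)
lr_df   <- df.residual(null) - df.residual(full)
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

c(statistic = lr_stat, df = lr_df, p = p_value)
```

This should match the Deviance (apart from its sign, which only reflects the order of the models in the anova call) and the Pr(>Chi) entries in the table above.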

multinomial logistic regression in R: multinom in nnet package result different from mlogit in mlogit package?

Both R functions, multinom (package nnet) and mlogit (package mlogit), can be used for multinomial logistic regression. But why does this example return different p-values for the coefficients?
#prepare data
mydata <- read.csv("http://www.ats.ucla.edu/stat/data/binary.csv")
mydata$rank <- factor(mydata$rank)
mydata$gre[1:10] = rnorm(10,mean=80000)
#multinom:
library(nnet)
test = multinom(admit ~ gre + gpa + rank, data = mydata)
z <- summary(test)$coefficients/summary(test)$standard.errors
# For simplicity, use z-test to approximate t test.
pv <- (1 - pnorm(abs(z)))*2
pv
# (Intercept) gre gpa rank2 rank3 rank4
# 0.00000000 0.04640089 0.00000000 0.00000000 0.00000000 0.00000000
#mlogit:
library(mlogit)
mldata = mlogit.data(mydata,choice = 'admit', shape = "wide")
mlogit.model1 <- mlogit(admit ~ 1 | gre + gpa + rank, data = mldata)
summary(mlogit.model1)
# Coefficients :
# Estimate Std. Error t-value Pr(>|t|)
# 1:(intercept) -3.5826e+00 1.1135e+00 -3.2175 0.0012930 **
# 1:gre 1.7353e-05 8.7528e-06 1.9825 0.0474225 *
# 1:gpa 1.0727e+00 3.1371e-01 3.4195 0.0006274 ***
# 1:rank2 -6.7122e-01 3.1574e-01 -2.1258 0.0335180 *
# 1:rank3 -1.4014e+00 3.4435e-01 -4.0697 4.707e-05 ***
# 1:rank4 -1.6066e+00 4.1749e-01 -3.8482 0.0001190 ***
Why are the p-values from multinom and mlogit so different? I guess it is because of the outliers I added using mydata$gre[1:10] = rnorm(10,mean=80000). If outliers are an inevitable issue (for example in genomics, metabolomics, etc.), which R function should I use?
As an alternative, you can use broom, which outputs a tidy format for multinom class models.
library(broom)
tidy(test)
It'll return a data.frame with z-statistics and p-values.
Take a look at tidy documentation for further information.
P.S.: as I can't get the data from the link you posted, I can't replicate the results
The difference here is the difference between the Wald z test (what you calculated in pv) and the likelihood ratio test (what is returned by summary(mlogit.model1)). The Wald test is computationally simpler, but in general has less desirable properties (e.g., its CIs are not scale-invariant). You can read more about the two procedures here.
To perform LR tests on your nnet model coefficients, you can load the car and lmtest packages and call Anova(test) (though you'll have to do a little more work for the single df tests).
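To illustrate how Wald and likelihood ratio p-values can differ for the same coefficients, here is a small sketch on a plain binary logistic regression. It uses the built-in infert data rather than the data from the question (which I could not download), so the numbers say nothing about the multinom/mlogit comparison itself; it only shows the two procedures side by side.

```r
# binary logit on a built-in data set
fit <- glm(case ~ spontaneous + induced, data = infert, family = binomial())

# Wald test: estimate / standard error, as in the pv calculation above
wald_p <- coef(summary(fit))[, "Pr(>|z|)"]

# LR test: refit without each term and compare deviances
lr_tab <- drop1(fit, test = "Chisq")
lr_p <- setNames(lr_tab[["Pr(>Chi)"]][-1], rownames(lr_tab)[-1])

# side-by-side comparison for the two slope coefficients
p_compare <- cbind(Wald = wald_p[names(lr_p)], LR = lr_p)
p_compare
```

The two columns are usually close when the likelihood is well approximated by a quadratic around the estimate and can drift apart otherwise, for example with extreme covariate values.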

How do you obtain the zero inflation parameter (pz) of a zero-inflated NB model using glmmADMB?

I have run a zero-inflated negative binomial model using the glmmADMB package in R. From what I understand, the pz parameter is the zero-inflation parameter and it is fitted by the package to the model that you run: the package searches for the pz value that best fits your data, starting from pz=0.2. This is the default and can be changed.
After you run the model, does anyone know how to find what pz value is chosen for the data?
The zero-inflation estimate can be obtained (along with its standard deviation) from the model object. See below using built-in data from the glmmADMB package:
library(glmmADMB)
# munge data
Owls = transform(Owls, Nest = reorder(Nest, NegPerChick),
logBroodSize = log(BroodSize), NCalls = SiblingNegotiation)
# fit model
fit_zinb = glmmadmb(NCalls ~ (FoodTreatment + ArrivalTime) * SexParent +
offset(logBroodSize) + (1 | Nest),
data = Owls, zeroInflation = TRUE,
family = "nbinom")
# overall summary, check for match
summary(fit_zinb)
# zero-inflation estimate
fit_zinb$pz
# zero-inflation standard deviation
fit_zinb$sd_pz
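For interpreting the estimate, it may help to see how pz enters the distribution. The sketch below assumes the usual zero-inflated NB mixture with a single constant pz: with probability pz the response is a structural zero, otherwise it is drawn from the negative binomial. dzinb is a hypothetical helper, and the parameter values are made up for illustration (pz = 0.2 is just the starting value mentioned in the question).

```r
# Zero-inflated negative binomial probability mass function
# (hypothetical helper; mu = NB mean, size = NB dispersion, pz = zero-inflation)
dzinb <- function(y, mu, size, pz) {
  pz * (y == 0) + (1 - pz) * dnbinom(y, mu = mu, size = size)
}

# illustrative values only
mu <- 3; size <- 1.5; pz <- 0.2

# probability of a zero with and without zero inflation
p0_zinb <- dzinb(0, mu, size, pz)
p0_nb   <- dnbinom(0, mu = mu, size = size)
c(zinb = p0_zinb, nb = p0_nb)

# sanity check: the probabilities sum to (numerically) one
total <- sum(dzinb(0:1000, mu, size, pz))
total
```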

Is there a way of getting "marginal effects" from a `glmer` object

I am estimating a random effects logit model using glmer and I would like to report marginal effects for the independent variables. For glm models, the mfx package helps compute marginal effects. Is there any package or function for glmer objects?
Thanks for your help.
A reproducible example is given below
## mydata <- read.csv("http://www.ats.ucla.edu/stat/data/binary.csv")
## as of 2020-08-24:
mydata <- read.csv("https://stats.idre.ucla.edu/stat/data/binary.csv")
mydata$rank <- factor(mydata$rank) #creating ranks
id <- rep(1:ceiling(nrow(mydata)/2), times=c(2)) #creating ID variable
mydata <- cbind(mydata,data.frame(id,stringsAsFactors=FALSE))
set.seed(12345)
mydata$ran <- runif(nrow(mydata),0,1) #creating a random variable
library(lme4)
cfelr <- glmer(admit ~ (1 | id) + rank + gpa + ran + gre, data=mydata ,family = binomial)
summary(cfelr)
Here's an approach using the margins package:
library(margins)
library(lme4)
gm1 <- glmer(cbind(incidence, size - incidence) ~ period +
(1 | herd),
data = cbpp,
family = binomial)
m <- margins(gm1, data = cbpp)
m
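For intuition about what an average marginal effect is, here is a hand-rolled sketch for a plain logit glm, not for glmer, and not a description of how margins works internally. For a continuous predictor the marginal effect on the probability scale is the coefficient times the logistic density at the linear predictor, averaged over the observations. The built-in infert data is used purely for illustration; with a mixed model you would additionally have to decide how to treat the random effects when forming predictions.

```r
# plain logit on a built-in data set (illustration only)
fit <- glm(case ~ spontaneous + induced, data = infert, family = binomial())

# linear predictor for every observation
eta <- predict(fit, type = "link")

# average marginal effect: mean logistic density times each slope
ame <- mean(dlogis(eta)) * coef(fit)[-1]
ame
```

Since the logistic density never exceeds 0.25, each average marginal effect is at most a quarter of the corresponding coefficient in absolute value.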
You could use the ggeffects-package (examples in the package-vignettes). So, for your code this might look like this:
library(ggeffects)
# dat is a data frame with marginal effects
dat <- ggpredict(cfelr, terms = "rank")
plot(dat)
Or, as Benjamin described, you could use the sjPlot package, using the plot_model() function with plot type "pred" (this simply wraps the ggeffects package for marginal effect plots):
library(sjPlot)
plot_model(cfelr, type = "pred", term = "rank")
This is a much less technical answer, but perhaps provides a useful resource. I am a fan of the sjPlot package which provides plots of marginal effects of glmer objects, like so:
library(sjPlot)
sjp.glmer(cfelr, type = "eff")
The package provides a lot of options for exploring a glmer model's fixed and random effects as well. https://github.com/strengejacke/sjPlot
My solution does not answer the question,
"Is there a way of getting “marginal effects” from a glmer object",
but rather,
"Is there a way of getting marginal logistic regression coefficients from a conditional logistic regression with one random intercept?"
I am only offering this write-up because the reproducible example provided was a conditional logistic regression with one random intercept and I'm intending to be helpful. Please do not downvote; I will take down if this answer is deemed too off topic.
The R-code is based on the work of Patrick Heagerty (click "View Raw" to see pdf), and I include a reproducible example below from my github version of his lnMLE package (excuse the warnings at installation -- I'm shoehorning Patrick's non-CRAN package). I'm omitting the output for all except the last line, compare, which shows the fixed effect coefficients side-by-side.
library(devtools)
install_github("swihart/lnMLE_1.0-2")
library(lnMLE)
## run the example from the logit.normal.mle help page
## see also the accompanying document (click 'View Raw' on page below:)
## https://github.com/swihart/lnMLE_1.0-2/blob/master/inst/doc/lnMLEhelp.pdf
data(eye_race)
attach(eye_race)
marg_model <- logit.normal.mle(meanmodel = value ~ black,
logSigma= ~1,
id=eye_race$id,
model="marginal",
data=eye_race,
tol=1e-5,
maxits=100,
r=50)
marg_model
cond_model <- logit.normal.mle(meanmodel = value ~ black,
logSigma= ~1,
id=eye_race$id,
model="conditional",
data=eye_race,
tol=1e-5,
maxits=100,
r=50)
cond_model
compare<-round(cbind(marg_model$beta, cond_model$beta),2)
colnames(compare)<-c("Marginal", "Conditional")
compare
The output of the last line:
compare
            Marginal Conditional
(Intercept)    -2.43       -4.94
black           0.08        0.15
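The marginal coefficients are closer to zero than the conditional ones because averaging over the random intercept flattens the logistic curve. The sketch below is not part of lnMLE; it computes the marginal logit by numerical integration over a normal random intercept and compares it with the well-known approximation that divides the conditional linear predictor by sqrt(1 + c^2 sigma^2), where c = 16 sqrt(3) / (15 pi). The function names and the values of eta and sigma are made up for illustration.

```r
# exact marginal logit: integrate the conditional probability over
# a N(0, sigma^2) random intercept (hypothetical helper)
marginal_logit <- function(eta, sigma) {
  p <- integrate(function(z) plogis(eta + sigma * z) * dnorm(z),
                 lower = -Inf, upper = Inf)$value
  qlogis(p)
}

# approximate attenuation of the conditional linear predictor
c2 <- (16 * sqrt(3) / (15 * pi))^2
attenuated_logit <- function(eta, sigma) eta / sqrt(1 + c2 * sigma^2)

# illustrative values: conditional linear predictor and random-intercept SD
eta <- 1
sigma <- 2

exact_val  <- marginal_logit(eta, sigma)
approx_val <- attenuated_logit(eta, sigma)
c(conditional = eta, marginal_exact = exact_val, marginal_approx = approx_val)
```

The larger the random-intercept standard deviation, the stronger the attenuation, which is consistent with the roughly halved coefficients in the comparison above.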
I attempted the reproducible example given, but had problems with both the glmer and lnMLE implementations; again I only include output pertaining to the comparison results and the warnings from the glmer() call:
##original question / answer... glmer() function gave a warning and the lnMLE did not fit well...
mydata <- read.csv("http://www.ats.ucla.edu/stat/data/binary.csv")
mydata$rank <- factor(mydata$rank) #creating ranks
id <- rep(1:ceiling(nrow(mydata)/2), times=c(2)) #creating ID variable
mydata <- cbind(mydata,data.frame(id,stringsAsFactors=FALSE))
set.seed(12345)
mydata$ran <- runif(nrow(mydata),0,1) #creating a random variable
library(lme4)
cfelr <- glmer(admit ~ (1 | id) + rank + gpa + ran + gre,
data=mydata,
family = binomial)
Which gave:
Warning messages:
1: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge with max|grad| = 0.00161047 (tol = 0.001, component 2)
2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model is nearly unidentifiable: very large eigenvalue
- Rescale variables?;Model is nearly unidentifiable: large eigenvalue ratio
- Rescale variables?
but I foolishly went on without rescaling, trying to apply logit.normal.mle to the example given. However, the conditional model doesn't converge or produce standard error estimates:
summary(cfelr)
library(devtools)
install_github("swihart/lnMLE_1.0-2")
library(lnMLE)
mydata$rank2 = mydata$rank==2
mydata$rank3 = mydata$rank==3
mydata$rank4 = mydata$rank==4
cfelr_cond = logit.normal.mle(meanmodel = admit ~ rank2+rank3+rank4+gpa+ran+gre,
logSigma = ~1 ,
id=id,
model="conditional",
data=mydata,
r=50,
tol=1e-6,
maxits=500)
cfelr_cond
cfelr_marg = logit.normal.mle(meanmodel = admit ~ rank2+rank3+rank4+gpa+ran+gre,
logSigma = ~1 ,
id=id,
model="marginal",
data=mydata,
r=50,
tol=1e-6,
maxits=500)
cfelr_marg
compare_glmer<-round(cbind(cfelr_marg$beta, cfelr_cond$beta,summary(cfelr)$coeff[,"Estimate"]),3)
colnames(compare_glmer)<-c("Marginal", "Conditional","glmer() Conditional")
compare_glmer
The last line of which reveals that the conditional model from cfelr_cond did not evaluate a conditional model but just returned the marginal coefficients and no standard errors.
> compare_glmer
            Marginal Conditional glmer() Conditional
(Intercept)   -4.407      -4.407              -4.425
rank2         -0.667      -0.667              -0.680
rank3         -1.832      -1.833              -1.418
rank4         -1.930      -1.930              -1.585
gpa            0.547       0.548               0.869
ran            0.860       0.860               0.413
gre            0.004       0.004               0.002
I hope to iron out these issues. Any help/comments appreciated. I'll give status updates when I can.
