I have six fixed factors, A, B, C, D, E and F, and one random factor R. I want to test the linear terms, the pure quadratic terms, and the two-way interactions in R. So I constructed the full linear mixed model and tried to test its terms with drop1:
full.model <- lmer(Z ~ A + B + C + D + E + F
+ I(A^2) + I(B^2) + I(C^2) + I(D^2) + I(E^2) + I(F^2)
+ A:B + A:C + A:D + A:E + A:F
+ B:C + B:D + B:E + B:F
+ C:D + C:E + C:F
+ D:E + D:F
+ E:F
+ (1 | R), data=mydata, REML=FALSE)
drop1(full.model, test="Chisq")
It seems that drop1 is completely ignoring linear terms:
Single term deletions
Model:
Z ~ A + B + C + D + E + F + I(A^2) + I(B^2) + I(C^2) + I(D^2) +
I(E^2) + I(F^2) + A:B + A:C + A:D + A:E + A:F + B:C + B:D +
B:E + B:F + C:D + C:E + C:F + D:E + D:F + E:F + (1 | R)
Df AIC LRT Pr(Chi)
<none> 127177
I(A^2) 1 127610 434.81 < 2.2e-16 ***
I(B^2) 1 127378 203.36 < 2.2e-16 ***
I(C^2) 1 129208 2032.42 < 2.2e-16 ***
I(D^2) 1 127294 119.09 < 2.2e-16 ***
I(E^2) 1 127724 548.84 < 2.2e-16 ***
I(F^2) 1 127197 21.99 2.747e-06 ***
A:B 1 127295 120.24 < 2.2e-16 ***
A:C 1 127177 1.75 0.185467
A:D 1 127240 64.99 7.542e-16 ***
A:E 1 127223 48.30 3.655e-12 ***
A:F 1 127242 66.69 3.171e-16 ***
B:C 1 127180 5.36 0.020621 *
B:D 1 127202 27.12 1.909e-07 ***
B:E 1 127300 125.28 < 2.2e-16 ***
B:F 1 127192 16.60 4.625e-05 ***
C:D 1 127181 5.96 0.014638 *
C:E 1 127298 122.89 < 2.2e-16 ***
C:F 1 127176 0.77 0.380564
D:E 1 127223 47.76 4.813e-12 ***
D:F 1 127182 6.99 0.008191 **
E:F 1 127376 201.26 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
If I exclude interactions from the model:
full.model <- lmer(Z ~ A + B + C + D + E + F
+ I(A^2) + I(B^2) + I(C^2) + I(D^2) + I(E^2) + I(F^2)
+ (1 | R), data=mydata, REML=FALSE)
drop1(full.model, test="Chisq")
then the linear terms get tested:
Single term deletions
Model:
Z ~ A + B + C + D + E + F + I(A^2) + I(B^2) + I(C^2) + I(D^2) +
I(E^2) + I(F^2) + (1 | R)
Df AIC LRT Pr(Chi)
<none> 127998
A 1 130130 2133.9 < 2.2e-16 ***
B 1 130177 2181.0 < 2.2e-16 ***
C 1 133464 5467.6 < 2.2e-16 ***
D 1 129484 1487.9 < 2.2e-16 ***
E 1 130571 2575.0 < 2.2e-16 ***
F 1 128009 12.7 0.0003731 ***
I(A^2) 1 128418 422.2 < 2.2e-16 ***
I(B^2) 1 128193 197.4 < 2.2e-16 ***
I(C^2) 1 129971 1975.1 < 2.2e-16 ***
I(D^2) 1 128112 115.6 < 2.2e-16 ***
I(E^2) 1 128529 533.0 < 2.2e-16 ***
I(F^2) 1 128017 21.3 3.838e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
This is simply the way drop1 works; it's not specific to mixed models - you would see the same behaviour for a regular linear model fitted with lm as well. From ?drop1:
The hierarchy is respected when considering terms to be added or dropped: all main effects contained in a second-order interaction must remain, and so on.
I discuss this at some length in this CrossValidated post.
The statistically tricky part is that testing lower-level interactions in a model that also contains higher-level interactions is (depending on who you talk to) either (i) hard to do correctly or (ii) just plain silly (for the latter position, see part 5 of Bill Venables's "exegeses on linear models"). The rubric for this is the principle of marginality. At the very least, the meaning of the lower-order terms depends sensitively on how contrasts in the model are coded (e.g. treatment vs. midpoint/sum-to-zero). My default rule is that if you're not sure you understand exactly why this might be a problem, you shouldn't violate the principle of marginality.
However, as Venables actually describes in the linked article, you can get R to violate marginality if you want (p. 15):
To my delight I see that marginality constraints between factor terms are by default honoured and students are not led down the logically slippery ‘Type III sums of squares’ path. We discuss why it is that no main effects are shown, and it makes a useful tutorial point.
The irony is, of course, that Type III sums of squares were available all along if only people understood what they really were and how to get them. If the call to drop1 contains any formula as the second argument, the sections of the model matrix corresponding to all non-intercept terms are omitted seriatim from the model, giving some sort of test for a main effect ...
Provided you have used a contrast matrix with zero-sum columns they will be unique, and they are none other than the notorious ‘Type III sums of squares’. If you use, say, contr.treatment contrasts, though, so that the columns do not have sum zero, you get nonsense. This sensitivity to something that should in this context be arbitrary ought to be enough to alert anyone to the fact that something silly is being done.
In other words, using scope = . ~ . will force drop1 to ignore marginality. You do this at your own risk - you should definitely be able to explain to yourself what you're actually testing when you follow this procedure ...
For example:
set.seed(101)
dd <- expand.grid(A=1:10,B=1:10,g=factor(1:10))
dd$y <- rnorm(1000)
library(lme4)
m1 <- lmer(y~A*B+(1|g),data=dd)
drop1(m1,scope=.~.)
## Single term deletions
##
## Model:
## y ~ A * B + (1 | g)
## Df AIC
## <none> 2761.9
## A 1 2761.7
## B 1 2762.4
## A:B 1 2763.1
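To see the contrast sensitivity Venables warns about, compare the two codings on a model with factor predictors (a hypothetical sketch; the data here are made up, and only the coding of the factors differs between the two fits):

```r
## With factors, the "main effect" tests produced by scope = . ~ .
## change with the contrast coding.
set.seed(101)
d2 <- expand.grid(f1 = factor(1:3), f2 = factor(1:3), rep = 1:20)
d2$y <- rnorm(nrow(d2))
m_trt <- lm(y ~ f1 * f2, data = d2)  # default contr.treatment: "nonsense" tests
m_sum <- lm(y ~ f1 * f2, data = d2,
            contrasts = list(f1 = contr.sum, f2 = contr.sum))
drop1(m_trt, scope = . ~ ., test = "F")
drop1(m_sum, scope = . ~ ., test = "F")
## The f1 and f2 rows differ between the two outputs; only the sum-to-zero
## version corresponds to the usual Type III sums of squares.
```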
I'd like to estimate the effect of a treatment on two separate groups, so something of the form

Y = beta_0 + beta_1*M + beta_2*T + beta_3*(T*M) + e

with T being the treatment and M the dummy separating the two groups.
The problem is that the treatment is correlated with other variables that affect Y. Luckily, there exists a variable Z that serves as an instrument for T. What I've been able to implement in R was to "manually" run 2SLS, following the stages

T = alpha_0 + alpha_1*Z + v

and

Y = beta_0 + beta_1*M + beta_2*T_hat + beta_3*(T_hat*M) + u
To provide a reproducible example, first a simulation
n <- 100
set.seed(271)
Z <- runif(n)
e <- rnorm(n, sd = 0.5)
M <- as.integer(runif(n)) # dummy
u <- rnorm(n)
# Treat = 1 + 2*Z + e
alpha_0 <- 1
alpha_1 <- 2
Treat <- alpha_0 + alpha_1*Z + e
# Y = 3 + M + 2*Treat + 3*Treat*M + e + u (omitted vars that determine Treat affect Y)
beta_0 <- 3
beta_1 <- 1
beta_2 <- 2
beta_3 <- 3
Y <- beta_0 + beta_1*M + beta_2*Treat + beta_3 * M*Treat + e + u
The first stage regression
fs <- lm(Treat ~ Z)
stargazer::stargazer(fs, type = "text")
===============================================
Dependent variable:
---------------------------
Treat
-----------------------------------------------
Z 2.383***
(0.168)
Constant 0.835***
(0.096)
-----------------------------------------------
Observations 100
R2 0.671
Adjusted R2 0.668
Residual Std. Error 0.445 (df = 98)
F Statistic 200.053*** (df = 1; 98)
===============================================
And second stage
Treat_hat <- fitted(fs)
ss <- lm(Y ~ M + Treat_hat + M:Treat_hat)
stargazer::stargazer(ss, type = "text")
===============================================
Dependent variable:
---------------------------
Y
-----------------------------------------------
M 1.230
(1.717)
Treat_hat 2.243***
(0.570)
M:Treat_hat 2.636***
(0.808)
Constant 2.711**
(1.213)
-----------------------------------------------
Observations 100
R2 0.727
Adjusted R2 0.718
Residual Std. Error 2.539 (df = 96)
F Statistic 85.112*** (df = 3; 96)
===============================================
The problem now is that those standard errors aren't adjusted for the first stage, which looks like quite a bit of work to do manually. As I'd do for any other IV regression, I'd prefer to just use AER::ivreg.
But I can't seem to get the same regression going there. Here are several attempts, none of which quite does the same thing:
AER::ivreg(Y ~ M + Treat + M:Treat | Z)
AER::ivreg(Y ~ M + Treat + M:Treat | M + Z)
Warning message:
In ivreg.fit(X, Y, Z, weights, offset, ...) :
more regressors than instruments
These make sense, I guess
AER::ivreg(Y ~ M + Treat + M:Treat | M + Z + M:Z)
Call:
AER::ivreg(formula = Y ~ M + Treat + M:Treat | M + Z + M:Z)
Coefficients:
(Intercept) M Treat M:Treat
2.641 1.450 2.229 2.687
Surprisingly close, but not quite.
I couldn't find a way to tell ivreg that Treat and M:Treat aren't really two separate endogenous variables, but really just the same endogenous variable moved around and interacted with an exogenous one.
In conclusion,
i) Is there some way to mess with ivreg and make this work?
ii) Is there some other function for 2SLS that can just manually accept 1st and 2nd stage formulas without this sort of restriction, and that adjusts standard errors?
iii) What's the simplest way to get the correct SEs if there are no other alternatives? I didn't come across any direct R code, just a bunch of matrix multiplication formulas (although I didn't dig too deep for this one).
Thank you
Essentially, if Z is a valid instrument for Treat, then M:Z should be a valid instrument for M:Treat, so this makes sense to me:
AER::ivreg(Y ~ M + Treat + M:Treat | M + Z + M:Z)
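One way to see why the manual version was only "surprisingly close, but not quite": textbook 2SLS projects each endogenous regressor (both Treat and M:Treat) on the full instrument set {1, M, Z, M:Z}, whereas the manual version regressed Treat on Z alone. A sketch reusing the question's simulated variables (the names Treat_hat2 and MT_hat are mine):

```r
## First stage on the full instrument set, one regression per endogenous term
Treat_hat2 <- fitted(lm(Treat ~ M + Z + M:Z))
MT_hat     <- fitted(lm(I(M * Treat) ~ M + Z + M:Z))
man2 <- lm(Y ~ M + Treat_hat2 + MT_hat)
iv   <- AER::ivreg(Y ~ M + Treat + M:Treat | M + Z + M:Z)
all.equal(unname(coef(man2)), unname(coef(iv)))  # should be TRUE
## The point estimates coincide; the ivreg standard errors are still the
## ones to report, since the two-step lm() errors ignore the first stage.
```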
I actually managed to back out the correct param values for a modified simulation:
n <- 100
set.seed(271)
Z <- runif(n)
e <- rnorm(n, sd = 0.5)
M <- round(runif(n)) # note: I changed from as.integer() to round() in order to get some 1's in the regression
u <- rnorm(n)
# Treat = 1 + 2*Z + e
alpha_0 <- 1
alpha_1 <- 2
Treat <- alpha_0 + alpha_1*Z + e
beta_0 <- 3
beta_1 <- 1
beta_2 <- 2
beta_3 <- 3
Y <- beta_0 + beta_1*M + beta_2*Treat + beta_3 * M*Treat
Now:
my_ivreg <- AER::ivreg(Y ~ M + Treat + M:Treat | M + Z + M:Z)
summary(my_ivreg)
Call:
AER::ivreg(formula = Y ~ M + Treat + M:Treat | M + Z + M:Z)
Residuals:
Min 1Q Median 3Q Max
-1.332e-14 -7.105e-15 -3.553e-15 -8.882e-16 3.553e-15
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.000e+00 2.728e-15 1.100e+15 <2e-16 ***
M 1.000e+00 3.810e-15 2.625e+14 <2e-16 ***
Treat 2.000e+00 1.255e-15 1.593e+15 <2e-16 ***
M:Treat 3.000e+00 1.792e-15 1.674e+15 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 5.633e-15 on 96 degrees of freedom
Multiple R-Squared: 1, Adjusted R-squared: 1
Wald test: 1.794e+31 on 3 and 96 DF, p-value: < 2.2e-16
Which is what we were looking for...
My regression is as follows:
model <- lm(y ~ a:b + a + b + c)
And I want to test whether the coefficients of my interaction "a:b" and of my variable "a" are both equal to 0, or whether at least one is different from 0.
I know that I need to use linearHypothesis.
But I only managed to test if at least one of the coefficients of my interaction is different from 0.
linearHypothesis(model,matchCoefs(model,":"))
Do you know how to enter into the linearHypothesis my variable "a" ?
Thanks for your help.
You can pass the names of the coefficients you want to test; the null hypothesis is that both are equal to 0.
Dummy data:
a = rnorm(100)
b = rnorm(100)
c = rnorm(100)
y = 4 + 1*a + 3*b + 0.5*c + 2*a*b + rnorm(100)
mod = lm(y ~ a:b + a + b + c)
car::linearHypothesis(mod, c("a","a:b"))
Linear hypothesis test
Hypothesis:
a = 0
a:b = 0
Model 1: restricted model
Model 2: y ~ a:b + a + b + c
Res.Df RSS Df Sum of Sq F Pr(>F)
1 97 657.20
2 95 116.31 2 540.88 220.89 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
If only one of a or a:b were in the true model, the null would still be rejected; it would fail to be rejected only if neither were present. To see this, try setting y = 4 + 3*b + 0.5*c + rnorm(100) and y = 4 + 3*b + 0.5*c + a + rnorm(100). The test isn't perfect, though: if there is too much noise in y, we would fail to reject the null even when a and a:b are in the model (try y = 4 + 1*a + 3*b + 0.5*c + 2*a*b + rnorm(100, sd=1000)).
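The same F test can also be obtained by fitting the restricted model explicitly and comparing the two nested fits, which makes clear what linearHypothesis is doing under the hood (this reuses the dummy data and mod from above):

```r
## Equivalent nested-model comparison: drop both a and a:b, then compare
mod_restricted <- lm(y ~ b + c)
anova(mod_restricted, mod)  # same F statistic and p-value as linearHypothesis
```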
I am doing some count data analysis. The data is in this link:
https://www.dropbox.com/s/q7fwqicw3ebvwlg/stackquestion.csv?dl=0
Column A is the count data, and other columns are the independent variables. At first I used Poisson regression to analyze it:
m0<-glm(A~.,data=d,family="poisson")
summary(m0)
#We see that the residual deviance is greater than the degrees of freedom so that we have over-dispersion.
Call:
glm(formula = A ~ ., family = "poisson", data = d)
Deviance Residuals:
Min 1Q Median 3Q Max
-28.8979 -4.5110 0.0384 5.4327 20.3809
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 8.7054842 0.9100882 9.566 < 2e-16 ***
B -0.1173783 0.0172330 -6.811 9.68e-12 ***
C 0.0864118 0.0182549 4.734 2.21e-06 ***
D 0.1169891 0.0301960 3.874 0.000107 ***
E 0.0738377 0.0098131 7.524 5.30e-14 ***
F 0.3814588 0.0093793 40.670 < 2e-16 ***
G -0.3712263 0.0274347 -13.531 < 2e-16 ***
H -0.0694672 0.0022137 -31.380 < 2e-16 ***
I -0.0634488 0.0034316 -18.490 < 2e-16 ***
J -0.0098852 0.0064538 -1.532 0.125602
K -0.1105270 0.0128016 -8.634 < 2e-16 ***
L -0.3304606 0.0155454 -21.258 < 2e-16 ***
M 0.2274175 0.0259872 8.751 < 2e-16 ***
N 0.2922063 0.0174406 16.754 < 2e-16 ***
O 0.1179708 0.0119332 9.886 < 2e-16 ***
P 0.0618776 0.0260646 2.374 0.017596 *
Q -0.0303909 0.0060060 -5.060 4.19e-07 ***
R -0.0018939 0.0037642 -0.503 0.614864
S 0.0383040 0.0065841 5.818 5.97e-09 ***
T 0.0318111 0.0116611 2.728 0.006373 **
U 0.2421129 0.0145502 16.640 < 2e-16 ***
V 0.1782144 0.0090858 19.615 < 2e-16 ***
W -0.5105135 0.0258136 -19.777 < 2e-16 ***
X -0.0583590 0.0043641 -13.373 < 2e-16 ***
Y -0.1554609 0.0042604 -36.489 < 2e-16 ***
Z 0.0064478 0.0001184 54.459 < 2e-16 ***
AA 0.3880479 0.0164929 23.528 < 2e-16 ***
AB 0.1511362 0.0050471 29.945 < 2e-16 ***
AC 0.0557880 0.0181129 3.080 0.002070 **
AD -0.6569099 0.0368771 -17.813 < 2e-16 ***
AE -0.0040679 0.0003960 -10.273 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 97109.0 on 56 degrees of freedom
Residual deviance: 5649.7 on 26 degrees of freedom
AIC: 6117.1
Number of Fisher Scoring iterations: 6
Then I figured I should use negative binomial regression to handle the over-dispersion. Since I have many independent variables, I wanted to select the important ones, so I decided to use stepwise regression. First, I create a full model:
full.model <- glm.nb(A~., data=d,maxit=1000)
# when not indicating maxit, or maxit=100, it shows Warning messages: 1: glm.fit: algorithm did not converge; 2: In glm.nb(A ~ ., data = d, maxit = 100) : alternation limit reached
# When indicating maxit=1000, the warning message disappear.
summary(full.model)
Call:
glm.nb(formula = A ~ ., data = d, maxit = 1000, init.theta = 2.730327193,
link = log)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.5816 -0.8893 -0.3177 0.4882 1.9073
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 11.8228596 8.3004322 1.424 0.15434
B -0.2592324 0.1732782 -1.496 0.13464
C 0.2890696 0.1928685 1.499 0.13393
D 0.3136262 0.3331182 0.941 0.34646
E 0.3764257 0.1313142 2.867 0.00415 **
F 0.3257785 0.1448082 2.250 0.02447 *
G -0.7585881 0.2343529 -3.237 0.00121 **
H -0.0714660 0.0343683 -2.079 0.03758 *
I -0.1050681 0.0357237 -2.941 0.00327 **
J 0.0810292 0.0566905 1.429 0.15291
K 0.2582978 0.1574582 1.640 0.10092
L -0.2009784 0.1543773 -1.302 0.19296
M -0.2359658 0.3216941 -0.734 0.46325
N -0.0689036 0.1910518 -0.361 0.71836
O 0.0514983 0.1383610 0.372 0.70974
P 0.1843138 0.3253483 0.567 0.57105
Q 0.0198326 0.0509651 0.389 0.69717
R 0.0892239 0.0459729 1.941 0.05228 .
S -0.0430981 0.0856391 -0.503 0.61479
T 0.2205653 0.1408009 1.567 0.11723
U 0.2450243 0.1838056 1.333 0.18251
V 0.1253683 0.0888411 1.411 0.15820
W -0.4636739 0.2348172 -1.975 0.04831 *
X -0.0623290 0.0508299 -1.226 0.22011
Y -0.0939878 0.0606831 -1.549 0.12142
Z 0.0019530 0.0015143 1.290 0.19716
AA -0.2888123 0.2449085 -1.179 0.23829
AB 0.1185890 0.0696343 1.703 0.08856 .
AC -0.3401963 0.2047698 -1.661 0.09664 .
AD -1.3409002 0.4858741 -2.760 0.00578 **
AE -0.0006299 0.0051338 -0.123 0.90234
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for Negative Binomial(2.7303) family taken to be 1)
Null deviance: 516.494 on 56 degrees of freedom
Residual deviance: 61.426 on 26 degrees of freedom
AIC: 790.8
Number of Fisher Scoring iterations: 1
Theta: 2.730
Std. Err.: 0.537
2 x log-likelihood: -726.803
Then I create a first model:
first.model <- glm.nb(A ~ 1, data = d)
Then I tried the forward stepwise regression:
step.model <- step(first.model, direction="forward", scope=formula(full.model))
It gives me this error message: Error in glm.fit(X, y, wt, offset = offset, family = object$family, control = object$control) :
NA/NaN/Inf in 'x'
In addition: Warning message:
step size truncated due to divergence
I also tried the backward regression:
step.model2 <- step(full.model,direction="backward")
#the final step
Step: AIC=770.45
A ~ B + C + E + F + G + H + I + K + L + R + T + V + W + Y + AA +
AB + AD
Df Deviance AIC
<none> 62.375 770.45
- AB 1 64.859 770.93
- H 1 65.227 771.30
- V 1 65.240 771.31
- L 1 65.291 771.36
- Y 1 65.831 771.90
- B 1 66.051 772.12
- C 1 67.941 774.01
- AA 1 69.877 775.95
- K 1 70.411 776.48
- W 1 71.526 777.60
- I 1 71.863 777.94
- E 1 72.338 778.41
- G 1 73.344 779.42
- F 1 73.510 779.58
- AD 1 79.620 785.69
- R 1 80.358 786.43
- T 1 95.725 801.80
Warning messages:
1: glm.fit: algorithm did not converge
2: glm.fit: algorithm did not converge
3: glm.fit: algorithm did not converge
4: glm.fit: algorithm did not converge
My questions are: Why are the results different between forward and backward stepwise regression? Why do I get the error message when performing forward selection? And what exactly do these warning messages mean, and how should I deal with them?
I am not a stats person but need to conduct statistical analysis for my research data, so I am struggling to learn how to do different regression analyses using real data. I searched online for similar questions but still couldn't understand the answers. Please also let me know if I did anything wrong in my regression analysis. I would really appreciate it if you could help me with these questions!
I am using the 'bife' package to run a fixed-effects logit model in R. However, I cannot compute any goodness-of-fit measure for the model's overall fit given the output below. I would appreciate knowing how to measure goodness-of-fit from this limited information. I would prefer a chi-square test, but I cannot find a way to implement that either.
---------------------------------------------------------------
Fixed effects logit model
with analytical bias-correction
Estimated model:
Y ~ X1 +X2 + X3 + X4 + X5 | Z
Log-Likelihood= -9153.165
n= 20383, number of events= 5104
Demeaning converged after 6 iteration(s)
Offset converged after 3 iteration(s)
Corrected structural parameter(s):
Estimate Std. error t-value Pr(> t)
X1 -8.67E-02 2.80E-03 -31.001 < 2e-16 ***
X2 1.79E+00 8.49E-02 21.084 < 2e-16 ***
X3 -1.14E-01 1.91E-02 -5.982 2.24E-09 ***
X4 -2.41E-04 2.37E-05 -10.171 < 2e-16 ***
X5 1.24E-01 3.33E-03 37.37 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
AIC= 18730.33 , BIC= 20409.89
Average individual fixed effects= 1.6716
---------------------------------------------------------------
Let the DGP be
n <- 1000
x <- rnorm(n)
id <- rep(1:2, each = n / 2)
y <- 1 * (rnorm(n) > 0)
so that we will be under the null hypothesis. As it says in ?bife, when there is no bias-correction, everything is the same as with glm, except for the speed. So let's start with glm.
modGLM <- glm(y ~ 1 + x + factor(id), family = binomial())
modGLM0 <- glm(y ~ 1, family = binomial())
One way to perform the LR test is with
library(lmtest)
lrtest(modGLM0, modGLM)
# Likelihood ratio test
#
# Model 1: y ~ 1
# Model 2: y ~ 1 + x + factor(id)
# #Df LogLik Df Chisq Pr(>Chisq)
# 1 1 -692.70
# 2 3 -692.29 2 0.8063 0.6682
But we may also do it manually,
1 - pchisq(c((-2 * logLik(modGLM0)) - (-2 * logLik(modGLM))),
modGLM0$df.residual - modGLM$df.residual)
# [1] 0.6682207
Now let's proceed with bife.
library(bife)
modBife <- bife(y ~ x | id)
modBife0 <- bife(y ~ 1 | id)
Here modBife is the full specification and modBife0 is only with fixed effects. For convenience, let
logLik.bife <- function(object, ...) object$logl_info$loglik
for loglikelihood extraction. Then we may compare modBife0 with modBife as in
1 - pchisq((-2 * logLik(modBife0)) - (-2 * logLik(modBife)), length(modBife$par$beta))
# [1] 1
while modGLM0 and modBife can be compared by running
1 - pchisq(c((-2 * logLik(modGLM0)) - (-2 * logLik(modBife))),
length(modBife$par$beta) + length(unique(id)) - 1)
# [1] 0.6682207
which gives the same result as before, even though with bife we, by default, have bias correction.
Lastly, as a bonus, we may simulate data and see if the test works as it's supposed to. 1000 iterations below show that both tests (since the two tests are the same) indeed reject as often as they are supposed to under the null.
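Since the iteration code itself isn't shown above, here is a sketch of what such a size check could look like, using the glm version of the test (the seed and exact setup are mine):

```r
## Under the null, the LR test's rejection rate should be close to nominal 5%
set.seed(42)
pvals <- replicate(1000, {
  n <- 1000
  x <- rnorm(n)
  id <- rep(1:2, each = n / 2)
  y <- 1 * (rnorm(n) > 0)  # outcome unrelated to x and id: the null holds
  m1 <- glm(y ~ 1 + x + factor(id), family = binomial())
  m0 <- glm(y ~ 1, family = binomial())
  1 - pchisq(-2 * (as.numeric(logLik(m0)) - as.numeric(logLik(m1))), df = 2)
})
mean(pvals < 0.05)  # should be roughly 0.05
```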
I estimated a cross-classified logistic mixed model in R, with two types of level-2 subjects: 34 election dates and 408 municipalities. The level 1 observations are aggregated proportions of the voter turnout (dependent) and 2 independent variables. Then there are some characteristics on the election dates and some controls. I estimated cross-level interactions between the main independent variables and the types of elections (macro-level characteristic of election date). Therefore, the effects were estimated as random over election dates. Now I would like to plot the random effects: 34 regression lines in a plot, being of 4 different colors (according to the macro-level: the different types of elections).
Does anyone know how to plot such a thing in R?
UPDATE:
Thanks for the hints! I'm quite new to R. I know how to estimate my models, but apart from that I still have a lot to learn.
This is my final model.
"gemnr" is the municipality level.
"Date" is the election date I was talking about.
"Windchill" and "Rain" have random slopes over "Date", and are interacted with three types of elections: "Provincie", "Gemeente" and "Europa". The rest of the variables are controls.
The dependent variable is the actual proportion of voters, cbind(opkomst, nnietgestemd), where opkomst is the number of voters and nnietgestemd the number of non-voters.
# Model with controls and interactions.
model3b <- glmer(cbind(opkomst, nnietgestemd) ~
    (1 | gemnr) + (1 + Windchill + Rain | Date)
    + Windchill + Windspeed + Rain + SP + lag_popkomst
    + Provincie + Gemeente + Europa + NB + OL + loginw
    + Provincie:Windchill + Gemeente:Windchill + Europa:Windchill
    + Provincie:Rain + Gemeente:Rain + Europa:Rain,
    family = binomial(link = logit))
And this is the result:
Generalized linear mixed model fit by maximum likelihood ['glmerMod']
Family: binomial ( logit )
Formula: cbind(opkomst, nnietgestemd) ~ (1 | gemnr) + (1 + Windchill + Rain | Date) + Windchill + Windspeed + Rain + SP + lag_popkomst + Provincie + Gemeente + Europa + NB + OL + loginw + Provincie:Windchill + Gemeente:Windchill + Europa:Windchill + Provincie:Rain + Gemeente:Rain + Europa:Rain
AIC BIC logLik deviance
1452503.3 1452691.4 -726226.6 1452453.3
Random effects:
Groups Name Variance Std.Dev. Corr
gemnr (Intercept) 0.0146186 0.12091
Date (Intercept) 0.1902650 0.43619
Windchill 0.0009727 0.03119 -0.07
Rain 0.0103655 0.10181 0.59 -0.10
Number of obs: 13735, groups: gemnr, 408; Date, 34
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.23067667 0.12124970 -1.9 0.057107 .
Windchill -0.00768949 0.00834125 -0.9 0.356600
Windspeed 0.01040831 0.00017884 58.2 < 0.0000000000000002 ***
Rain -0.00157012 0.02908864 -0.1 0.956953
SP 0.00045626 0.00001432 31.9 < 0.0000000000000002 ***
lag_popkomst 2.10911785 0.00386440 545.8 < 0.0000000000000002 ***
Provincie -1.09414033 0.25162607 -4.3 0.00001372100383 ***
Gemeente -0.60849053 0.18353633 -3.3 0.000915 ***
Europa -1.21169484 0.21694178 -5.6 0.00000002332356 ***
NB 0.07397575 0.01053297 7.0 0.00000000000217 ***
OL 0.00288172 0.00821660 0.4 0.725799
loginw -0.10297623 0.00721768 -14.3 < 0.0000000000000002 ***
Windchill:Provincie 0.01743852 0.01769197 1.0 0.324293
Windchill:Gemeente 0.01010439 0.01292002 0.8 0.434172
Windchill:Europa 0.01664707 0.01522839 1.1 0.274323
Rain:Provincie -0.13692131 0.05956872 -2.3 0.021531 *
Rain:Gemeente -0.03330741 0.04340056 -0.8 0.442819
Rain:Europa -0.04864840 0.05142619 -0.9 0.344156
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
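One possible approach (a sketch, not tested on these data): extract the per-Date intercepts and Windchill slopes with coef(), attach each Date's election type, and draw the 34 lines with ggplot2, coloured by type. The data frame date_info with Date and type columns is an assumption about how the election types are stored, and this simple version ignores the cross-level interaction terms (you could add the relevant Windchill:type coefficient to each slope for the full picture):

```r
library(ggplot2)

cc <- coef(model3b)$Date  # per-Date coefficients (fixed + random parts)
lines_df <- data.frame(Date      = rownames(cc),
                       intercept = cc[["(Intercept)"]],
                       slope     = cc[["Windchill"]])
## date_info: hypothetical lookup table, one row per Date with its type
lines_df$type <- date_info$type[match(lines_df$Date, date_info$Date)]

ggplot(lines_df) +
  geom_abline(aes(intercept = intercept, slope = slope, colour = type)) +
  xlim(-15, 15) + ylim(-3, 3) +  # pick limits covering the Windchill range
  labs(x = "Windchill", y = "Predicted log-odds of turnout")
```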