Extracting p-values for fixed effects from nlme/lme4 output - r

I am trying to extract individual elements (p-values specifically) from the fixed effects table contained within the object created by the summary call of a mixed-effects model.
Toy data:
set.seed(1234)
score <- c(rnorm(8, 20, 3), rnorm(8, 35, 5))
rep <- rep(c(0,1,2,3), each = 8)
group <- rep(0:1, times = 16)
id <- factor(rep(1:8, times = 4))
df <- data.frame(id, group, rep, score)
Now create a model
require(nlme)
modelLME <- summary(lme(score ~ group*rep, data = df, random = ~ rep|id))
modelLME
When we call it we get the output
Linear mixed-effects model fit by REML
Data: df
AIC BIC logLik
219.6569 230.3146 -101.8285
Random effects:
Formula: ~rep | id
Structure: General positive-definite, Log-Cholesky parametrization
StdDev Corr
(Intercept) 2.664083e-04 (Intr)
rep 2.484345e-05 0
Residual 7.476621e+00
Fixed effects: score ~ group * rep
Value Std.Error DF t-value p-value
(Intercept) 22.624455 3.127695 22 7.233587 0.0000
group -1.373324 4.423229 6 -0.310480 0.7667
rep 2.825635 1.671823 22 1.690152 0.1051
group:rep 0.007129 2.364315 22 0.003015 0.9976
Correlation:
(Intr) group rep
group -0.707
rep -0.802 0.567
group:rep 0.567 -0.802 -0.707
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-1.86631781 -0.74498367 0.03515508 0.76672652 1.91896578
Number of Observations: 32
Number of Groups: 8
Now I can extract the parameter estimates for the fixed effects via
fixef(modelLME)
but how do I extract the p-values?
To extract the entire random effects table we would call
VarCorr(modelLME)
and then extract individual elements within that table via the subsetting function [,]. But I don't know what the equivalent function to VarCorr() is for the fixed effects.

You can extract the p-values with:
modelLME$tTable[,5]
(Intercept) group rep group:rep
0.0000003012047 0.7666983225269 0.1051210824864 0.9976213300628
Generally, looking at str(modelLME) helps to find the different components.
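Selecting the column by name rather than by position is a little more robust if the table layout ever changes. The sketch below assumes the Orthodont data that ships with nlme as a stand-in for the toy data above; the object names fit, t_table and p_values are illustrative only.

```r
library(nlme)

# Stand-in model on the Orthodont data bundled with nlme (an assumption;
# any lme fit is handled the same way)
fit <- lme(distance ~ age * Sex, data = Orthodont, random = ~ 1 | Subject)

# The fixed-effects table is stored in the tTable component of the summary
t_table <- summary(fit)$tTable

# Select the p-value column by name instead of by position
p_values <- t_table[, "p-value"]
p_values

# p-value of a single coefficient
p_values["age"]
```

The same pattern works for the other columns ("Value", "Std.Error", "DF", "t-value").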

Related

How to get within R squared from plm FE regression?

I regress monthly stocks returns on a set of firm characteristics using the plm package.
library(plm)
set.seed(1)
id <- rep(1:10, each = 10)
t <- rep(1:10, 10)
industry <- rep(1:2, each = 50)
return <- rnorm(100)
x <- rnorm(100)
data=data.frame(id,t,industry,return,x)
In a first step, I want to include time fixed effects. The following two formulas give the same coefficients for x but different R-squares. The first model estimates the overall R-squared, while the second model gives the within R-squared.
reg1=plm(return~x+factor(t),model="pooling",index=c("id","t"),data=data)
summary(reg1)$r.squared
reg2=plm(return~x,model="within",index=c("id","t"),data=data,effect="time")
summary(reg2)$r.squared
In a second step, I now want to include both time and industry fixed effects. I obtain coefficients by this formula:
reg3=plm(return~x+factor(t)+factor(industry),model="pooling",index=c("id","t"), data=data)
Unfortunately, I cannot use the "within" model as in reg2 because industry is not one of my index variables. Is there another way to calculate the within R-squared for reg3?
This is not a direct answer to your question, because I am not sure plm can do this. (It might, but I can’t figure it out.)
However, if you are mainly estimating fixed effects models, then I can warmly recommend the fixest package, which is super fast and offers a convenient formula syntax to specify fixed effects and interactions. Here’s a simple example:
library(fixest)
library(modelsummary)
dat = read.csv("https://vincentarelbundock.github.io/Rdatasets/csv/plm/EmplUK.csv")
models = list(
feols(wage ~ emp | year, data = dat),
feols(wage ~ emp | firm, data = dat),
feols(wage ~ emp | firm + year, data = dat)
)
modelsummary(models)
              Model 1     Model 2     Model 3
emp           -0.039      -0.120      -0.047
              (0.003)     (0.064)     (0.042)
Num.Obs.      1031        1031        1031
R2            0.039       0.868       0.896
R2 Adj.       0.030       0.847       0.879
R2 Within     0.012       0.016       0.003
R2 Pseudo
AIC           6474.0      4687.7      4455.6
BIC           6523.4      5384.0      5191.4
Log.Lik.      -3226.988   -2202.833   -2078.818
Std.Errors    by: year    by: firm    by: firm
FE: year      X                       X
FE: firm                  X           X
To include time and industry effects (next to individual effects), just use a two-ways within model and include any further fixed effects by + factor(eff) in the formula.
For your example, this would be:
reg3 <- plm(return ~ x + factor(industry), model="within", effect = "twoways", index=c("id","t"), data = data)
summary(reg3)
# Twoways effects Within Model
#
# Call:
# plm(formula = return ~ x + factor(industry), data = data, effect = "twoways",
# model = "within", index = c("id", "t"))
#
# Balanced Panel: n = 10, T = 10, N = 100
#
# Residuals:
# Min. 1st Qu. Median 3rd Qu. Max.
# -1.84660 -0.61135 0.06318 0.57474 2.06264
#
# Coefficients:
# Estimate Std. Error t-value Pr(>|t|)
# x 0.050906 0.112408 0.4529 0.6519
#
# Total Sum of Squares: 68.526
# Residual Sum of Squares: 68.35
# R-Squared: 0.0025571
# Adj. R-Squared: -0.23434
# F-statistic: 0.20509 on 1 and 80 DF, p-value: 0.65187
summary(reg3)$r.squared
# rsq adjrsq
# 0.002557064 -0.234335633
However, note that for your toy example data, the variable industry is collinear after the fixed effects transformation and, thus, drops out of the estimation (see ?detect.lindep for an explanation and another example). Check via, e.g.:
detect.lindep(reg3)
# [1] "Suspicious column number(s): 2"
# [1] "Suspicious column name(s): factor(industry)2"
Or via:
alias(reg3)
# Model :
# [1] "return ~ x + factor(industry)"
#
# Complete :
# [,1]
# factor(industry)2 0
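For the time-plus-industry specification in the question (reg3), a within R-squared can also be computed by hand in base R. This is a minimal sketch, not a plm feature: it compares the residual sum of squares of the full model with that of a model containing only the fixed effects, and cross-checks the result by partialling the fixed effects out of both variables first (the Frisch-Waugh-Lovell result). The column name ret (used instead of return) and the object names are assumptions made for the illustration.

```r
set.seed(1)
dat <- data.frame(
  id       = rep(1:10, each = 10),
  t        = rep(1:10, times = 10),
  industry = rep(1:2, each = 50),
  ret      = rnorm(100),
  x        = rnorm(100)
)

# Model with only the fixed effects, and the full model that adds x
fe_only <- lm(ret ~ factor(t) + factor(industry), data = dat)
full    <- lm(ret ~ x + factor(t) + factor(industry), data = dat)

# Within R-squared: share of the variation left after removing the
# fixed effects that is explained by x
within_r2 <- 1 - sum(resid(full)^2) / sum(resid(fe_only)^2)

# Cross-check: residualise ret and x on the fixed effects, then regress
y_tilde <- resid(fe_only)
x_tilde <- resid(lm(x ~ factor(t) + factor(industry), data = dat))
fwl_r2  <- summary(lm(y_tilde ~ x_tilde))$r.squared

c(within_r2 = within_r2, fwl_r2 = fwl_r2)
```

Both numbers should agree up to numerical precision; the overall R-squared reported by summary(full) is larger because it also credits the dummies.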

Forecast dependent values with mvrnorm and include temporal autocorrelation

I have matrices with values (weight, maturity, etc.) by time step and age class and I would like to forecast future values stochastically. Age classes are not independent, so I've been using mvrnorm to deal with that. How do I also get (lag 1) temporal autocorrelation in my predictions?
Here is what I would like to do in R:
library(MASS)
# dummy matrix: rows are time steps columns are dependent classes (ages)
x <- matrix(rnorm(20),4,5,dimnames = list(years=c('year1','year2','year3','year4'),ages=c('age1','age2','age3','age4','age5')))
# what I have so far to get next year's values (the goal would be to predict several years)
sigma <- cov(x) #covariance matrix
delta <- mvrnorm(1,rep(0,ncol(x)),cov(x)) # deviations
xl <- tail(x,1) #last year values
xp <- xl+delta #new values
# There is no temporal autocorrelation in here of course
xnew <- rbind(x,xp)
matplot(xnew,type='l')
# So I would need new values based on something like this:
rho <- apply(x,2,function(x) acf(x)$acf[2,1,1])
delta <- mvrnorm(1,xl,cov(x))
xp <- rho*xl+(1-rho)*delta
The last part doesn't feel right though.
The first part of this answer shows how to account for temporal autocorrelation in the original question. The second part addresses the multivariate case from the revised question.
Part 1:
library(MASS)
# dummy matrix: rows are time steps columns are dependent classes (ages)
x <- matrix(rnorm(20),4,5)
# what I have so far to get next year's values (the goal would be to predict several years)
sigma <- cov(x)
delta <- mvrnorm(1,rep(0,ncol(x)),cov(x))
xl <- tail(x,1)
xp <- xl+delta #new values
# There is no temporal autocorrelation in here of course
xnew <- rbind(x,xp)
matplot(xnew,type='l')
# Clean up / construct your data set
dat <- as.data.frame(x)
dat$year <- c(2014,2015,2016,2017)
dat <- rbind(dat, c(xp, 2018))
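# Note: dat has six columns at this point (five from x plus year) but only
# five names are supplied below; check names(dat) before modelling
colnames(dat) <- c("maturity", "age", "height", "sales", "year")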
# Account for Temporal Autocorrelation
library(nlme)
mdl.ac <- gls(sales ~ year, data=dat,
correlation = corAR1(form=~year),
na.action=na.omit)
summary(mdl.ac)
Generalized least squares fit by REML
Model: sales ~ year
Data: dat
AIC BIC logLik
14.01155 10.406 -3.005773
Correlation Structure: ARMA(1,0)
Formula: ~year
Parameter estimate(s):
Phi1
0.1186508
Coefficients:
Value Std.Error t-value p-value
(Intercept) 1.178018 0.5130766 2.2959883 0.1054
year 0.012666 0.3537748 0.0358023 0.9737
Correlation:
(Intr)
year 0.646
Standardized residuals:
1 2 3 4 5
0.3932124 -0.4053291 -1.8081473 0.0699103 0.8821300
attr(,"std")
[1] 0.5251018 0.5251018 0.5251018 0.5251018 0.5251018
attr(,"label")
[1] "Standardized residuals"
Residual standard error: 0.5251018
Degrees of freedom: 5 total; 3 residual
Part 2:
# Account for Temporal Autocorrelation
library(nlme)
mdl.ac <- gls(year ~ height + sales + I(maturity*age), data=dat,
correlation = corAR1(form=~year),
na.action=na.omit)
summary(mdl.ac)
Generalized least squares fit by REML
Model: year ~ height + sales + I(maturity * age)
Data: dat
AIC BIC logLik
15.42011 3.420114 -1.710057
Correlation Structure: ARMA(1,0)
Formula: ~year
Parameter estimate(s):
Phi1
0
Coefficients:
Value Std.Error t-value p-value
(Intercept) 0.2100381 0.4532345 0.4634203 0.7237
height -0.7602539 0.7758925 -0.9798444 0.5065
sales -0.1840694 0.8327382 -0.2210411 0.8615
I(maturity * age) 0.0449278 0.1839260 0.2442712 0.8475
Correlation:
(Intr) height sales
height -0.423
sales 0.214 -0.825
I(maturity * age) 0.349 -0.941 0.889
Standardized residuals:
1 2 3 4 5
-7.004956e-17 -4.985525e-01 -1.319137e+00 -1.568271e+00 -1.441708e+00
attr(,"std")
[1] 0.3962277 0.3962277 0.3962277 0.3962277 0.3962277
attr(,"label")
[1] "Standardized residuals"
Residual standard error: 0.3962277
Degrees of freedom: 5 total; 1 residual
Please also see CARBayesST and its vignette for an alternate approach:
https://cran.r-project.org/web/packages/CARBayesST/vignettes/CARBayesST.pdf
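For the forecasting step asked about in the question, one common way to combine cross-class dependence with lag-1 temporal autocorrelation is a first-order autoregressive update with multivariate normal innovations. The following is a rough sketch under stated assumptions, not a fitted model: a single autocorrelation rho shared by all age classes (0.5 is an arbitrary placeholder), the column means as the long-run level, and cov(x) as the target covariance. Scaling the innovations by sqrt(1 - rho^2) keeps the stationary covariance equal to sigma.

```r
library(MASS)
set.seed(42)

# dummy matrix: rows are time steps, columns are dependent classes (ages)
x <- matrix(rnorm(20), 4, 5)

mu      <- colMeans(x)  # assumed long-run level per age class
sigma   <- cov(x)       # covariance between age classes
rho     <- 0.5          # assumed lag-1 autocorrelation (placeholder value)
n_ahead <- 10           # number of years to simulate

forecast <- matrix(NA_real_, n_ahead, ncol(x))
prev <- x[nrow(x), ]    # start from the last observed year

for (i in seq_len(n_ahead)) {
  # innovations that are correlated across age classes
  eps <- mvrnorm(1, mu = rep(0, ncol(x)), Sigma = sigma)
  # AR(1) update around the mean
  prev <- mu + rho * (prev - mu) + sqrt(1 - rho^2) * eps
  forecast[i, ] <- prev
}

xnew <- rbind(x, forecast)
matplot(xnew, type = "l")
```

With only four observed years, neither rho nor sigma can be estimated reliably, so in practice both would need to come from longer series or outside knowledge.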

Generalized least squares results interpretation

I checked my linear regression model (WMAN = species, WDNE = sea surface temperature) and found autocorrelation, so I am trying generalized least squares instead, with the following script:
library(nlme)
modelwa <- gls(WMAN ~WDNE, data=dat,
correlation = corAR1(form=~MONTH),
na.action=na.omit)
summary(modelwa)
I compared both models:
> library(MuMIn)
> model.sel(modelw,modelwa)
Model selection table
(Intrc) WDNE class na.action correlation df logLik AICc delta
modelwa 31.50 0.1874 gls na.omit crAR1(MONTH) 4 -610.461 1229.2 0.00
modelw 11.31 0.7974 lm na.excl 3 -658.741 1323.7 94.44
weight
modelwa 1
modelw 0
Abbreviations:
na.action: na.excl = ‘na.exclude’
correlation: crAR1(MONTH) = ‘corAR1(~MONTH)’
Models ranked by AICc(x)
I believe the results suggest I should use gls, as its AICc is lower.
My problem is that I have been reporting the F-value, R², and p-value, but the gls output does not show these.
I would be very grateful if someone could help me interpret these results.
> summary(modelwa)
Generalized least squares fit by REML
Model: WMAN ~ WDNE
Data: mp2017.dat
AIC BIC logLik
1228.923 1240.661 -610.4614
Correlation Structure: ARMA(1,0)
Formula: ~MONTH
Parameter estimate(s):
Phi1
0.4809973
Coefficients:
Value Std.Error t-value p-value
(Intercept) 31.496911 8.052339 3.911524 0.0001
WDNE 0.187419 0.091495 2.048401 0.0424
Correlation:
(Intr)
WDNE -0.339
Standardized residuals:
Min Q1 Med Q3 Max
-2.023362 -1.606329 -1.210127 1.427247 3.567186
Residual standard error: 18.85341
Degrees of freedom: 141 total; 139 residual
>
I have now overcome the problem of auto-correlation so I can use lm()
Add the lag-1 residual as an X variable to the original model. This can be done using the slide function in the DataCombine package.
library(DataCombine)
econ_data <- data.frame(economics, resid_mod1=lmMod$residuals)
econ_data_1 <- slide(econ_data, Var="resid_mod1",
NewVar = "lag1", slideBy = -1)
econ_data_2 <- na.omit(econ_data_1)
lmMod2 <- lm(pce ~ pop + lag1, data=econ_data_2)
This script can be found here
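On the reporting question: a gls fit keeps a coefficient table with t-values and p-values, and anova() on the fitted object gives F-tests for the model terms. The sketch below uses the Ovary data bundled with nlme (the example from the gls help page) as a stand-in, since the original data are not available; the object names are illustrative.

```r
library(nlme)

# Stand-in model with AR(1) errors within each mare
fit_gls <- gls(follicles ~ sin(2 * pi * Time) + cos(2 * pi * Time),
               data = Ovary,
               correlation = corAR1(form = ~ 1 | Mare))

# Coefficient table: Value, Std.Error, t-value, p-value
coef_table <- summary(fit_gls)$tTable
coef_table[, "p-value"]

# F-tests for each term in the model
f_tests <- anova(fit_gls)
f_tests

# Estimated AR(1) parameter (Phi) on its natural scale
phi <- coef(fit_gls$modelStruct$corStruct, unconstrained = FALSE)
phi
```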

Extracting elements from output of mixed-effects model using nlme [duplicate]

I am trying to extract individual elements from the random effects table contained within the object created by the summary call of a mixed-effects model. Specifically I want to extract each of the level-2 random effects.
Toy data:
set.seed(1234)
score <- c(rnorm(8, 20, 3), rnorm(8, 35, 5))
rep <- rep(c(0,1,2,3), each = 8)
group <- rep(0:1, times = 16)
id <- factor(rep(1:8, times = 4))
df <- data.frame(id, group, rep, score)
Now create a model
require(nlme)
modelLME <- summary(lme(score ~ group*rep, data = df, random = ~ rep|id))
modelLME
When we call it we get the output
Linear mixed-effects model fit by REML
Data: df
AIC BIC logLik
219.6569 230.3146 -101.8285
Random effects:
Formula: ~rep | id
Structure: General positive-definite, Log-Cholesky parametrization
StdDev Corr
(Intercept) 2.664083e-04 (Intr)
rep 2.484345e-05 0
Residual 7.476621e+00
Fixed effects: score ~ group * rep
Value Std.Error DF t-value p-value
(Intercept) 22.624455 3.127695 22 7.233587 0.0000
group -1.373324 4.423229 6 -0.310480 0.7667
rep 2.825635 1.671823 22 1.690152 0.1051
group:rep 0.007129 2.364315 22 0.003015 0.9976
Correlation:
(Intr) group rep
group -0.707
rep -0.802 0.567
group:rep 0.567 -0.802 -0.707
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-1.86631781 -0.74498367 0.03515508 0.76672652 1.91896578
Number of Observations: 32
Number of Groups: 8
Now I can extract the residual standard deviation shown in the random effects table above via
modelLME$sigma
But I can't find the values in the (Intercept) and rep rows of the StdDev column of the random effects table in this output (2.664083e-04 and 2.484345e-05 respectively). They must be there somewhere, but I searched str(modelLME) and can't find them.
Do you want something like this?
library(nlme)
library(broom.mixed)  # tidy() methods for lme objects are provided by broom.mixed in current releases
modelLME = lme(score ~ group*rep, data = df, random = ~ rep|id)
tidy(modelLME)
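Without extra packages, the standard deviations can also be read from VarCorr(), which returns a character matrix that can be indexed by row and column name. A minimal sketch, assuming the Orthodont data from nlme as a stand-in for the toy data; the object names are illustrative.

```r
library(nlme)

# Stand-in model with a random intercept and slope per subject
fit <- lme(distance ~ age, data = Orthodont, random = ~ age | Subject)

# VarCorr() returns the random effects table as a character matrix
vc <- VarCorr(fit)
vc

# Standard deviations of the random intercept and slope, as numbers
re_sd <- as.numeric(vc[c("(Intercept)", "age"), "StdDev"])
re_sd

# The residual row matches the sigma component of the fit
resid_sd <- as.numeric(vc["Residual", "StdDev"])
resid_sd
```

Because the table is stored as formatted text, the converted numbers carry only the printed number of digits.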

Interpreting the output of summary(glmer(...)) in R

I'm an R noob, I hope you can help me:
I'm trying to analyse a dataset in R, but I'm not sure how to interpret the output of summary(glmer(...)) and the documentation isn't a big help:
> data_chosen_stim<-glmer(open_chosen_stim~closed_chosen_stim+day+(1|ID),family=binomial,data=chosenMovement)
> summary(data_chosen_stim)
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
Family: binomial ( logit )
Formula: open_chosen_stim ~ closed_chosen_stim + day + (1 | ID)
Data: chosenMovement
AIC BIC logLik deviance df.resid
96.7 105.5 -44.4 88.7 62
Scaled residuals:
Min 1Q Median 3Q Max
-1.4062 -1.0749 0.7111 0.8787 1.0223
Random effects:
Groups Name Variance Std.Dev.
ID (Intercept) 0 0
Number of obs: 66, groups: ID, 35
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.4511 0.8715 0.518 0.605
closed_chosen_stim2 0.4783 0.5047 0.948 0.343
day -0.2476 0.5060 -0.489 0.625
Correlation of Fixed Effects:
(Intr) cls__2
clsd_chsn_2 -0.347
day -0.916 0.077
I understand the GLM behind it, but I can't see the weights of the independent variables and their error bounds.
update: weights.merMod already has a type argument ...
I think what you're looking for is weights(object,type="working").
I believe these are the diagonal elements of W in your notation?
Here's a trivial example that matches up the results of glm and glmer (since the random effect is bogus and gets an estimated variance of zero, the fixed effects, weights, etc. converge to the same values).
Note that the weights() accessor returns the prior weights by default (these are all equal to 1 for the example below).
Example (from ?glm):
d.AD <- data.frame(treatment=gl(3,3),
outcome=gl(3,1,9),
counts=c(18,17,15,20,10,20,25,13,12))
glm.D93 <- glm(counts ~ outcome + treatment, family = poisson(),
data=d.AD)
library(lme4)
d.AD$f <- 1 ## dummy grouping variable
glmer.D93 <- glmer(counts ~ outcome + treatment + (1|f),
family = poisson(),
data=d.AD,
control=glmerControl(check.nlev.gtr.1="ignore"))
Fixed effects and weights are the same:
all.equal(fixef(glmer.D93),coef(glm.D93)) ## TRUE
all.equal(unname(weights(glm.D93,type="working")),
weights(glmer.D93,type="working"),
tol=1e-7) ## TRUE
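If "weights of the independent variables and their error bounds" in the question means the coefficient estimates and their confidence limits, those can be built from the coefficient table. The sketch below uses the plain glm fit from the example above so that it runs in base R; it assumes Wald (normal-approximation) intervals, and relies on the point made above that glm and glmer agree here because the random-effect variance is zero. The object names tab and wald are illustrative.

```r
d.AD <- data.frame(treatment = gl(3, 3),
                   outcome = gl(3, 1, 9),
                   counts = c(18, 17, 15, 20, 10, 20, 25, 13, 12))
glm.D93 <- glm(counts ~ outcome + treatment, family = poisson(),
               data = d.AD)

# Coefficient table: Estimate, Std. Error, z value, Pr(>|z|)
tab <- coef(summary(glm.D93))

# 95% Wald bounds: estimate +/- z * standard error
z <- qnorm(0.975)
wald <- cbind(estimate = tab[, "Estimate"],
              lower    = tab[, "Estimate"] - z * tab[, "Std. Error"],
              upper    = tab[, "Estimate"] + z * tab[, "Std. Error"])
wald
```

Wald bounds are a quick approximation; profile-likelihood intervals are usually preferred when the sample is small.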

Resources