How to get the within R-squared from a plm FE regression?

I regress monthly stock returns on a set of firm characteristics using the plm package.
library(plm)
set.seed(1)
id <- rep(1:10, each = 10); t <- rep(1:10, 10); industry <- rep(1:2, each = 50)
return <- rnorm(100); x <- rnorm(100)
data <- data.frame(id, t, industry, return, x)
In a first step, I want to include time fixed effects. The following two specifications give the same coefficient for x but different R-squared values: the first model reports the overall R-squared, while the second reports the within R-squared.
reg1 <- plm(return ~ x + factor(t), model = "pooling", index = c("id", "t"), data = data)
summary(reg1)$r.squared
reg2 <- plm(return ~ x, model = "within", index = c("id", "t"), data = data, effect = "time")
summary(reg2)$r.squared
In a second step, I now want to include both time and industry fixed effects. I obtain coefficients by this formula:
reg3 <- plm(return ~ x + factor(t) + factor(industry), model = "pooling", index = c("id", "t"), data = data)
Unfortunately, I cannot use the "within" model as in reg2 because industry is not one of my index variables. Is there another way to calculate the within R-squared for reg3?

This is not a direct answer to your question, because I am not sure plm can do this. (It might, but I can't figure it out.) However, if you are mainly estimating fixed effects models, then I can warmly recommend the fixest package, which is super fast and offers a convenient formula syntax to specify fixed effects and interactions. Here's a simple example:
library(fixest)
library(modelsummary)
dat = read.csv("https://vincentarelbundock.github.io/Rdatasets/csv/plm/EmplUK.csv")
models = list(
feols(wage ~ emp | year, data = dat),
feols(wage ~ emp | firm, data = dat),
feols(wage ~ emp | firm + year, data = dat)
)
modelsummary(models)
            Model 1     Model 2     Model 3
emp         -0.039      -0.120      -0.047
            (0.003)     (0.064)     (0.042)
Num.Obs.    1031        1031        1031
R2          0.039       0.868       0.896
R2 Adj.     0.030       0.847       0.879
R2 Within   0.012       0.016       0.003
R2 Pseudo
AIC         6474.0      4687.7      4455.6
BIC         6523.4      5384.0      5191.4
Log.Lik.    -3226.988   -2202.833   -2078.818
Std.Errors  by: year    by: firm    by: firm
FE: year    X                       X
FE: firm                X           X
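If the within R-squared is the main quantity of interest, fixest can also report it on its own. A small sketch using the models list from above, assuming fixest's r2() accessor (type "wr2" selects the within R-squared):
r2(models[[3]], type = "wr2")     # within R2 of the two-way model
sapply(models, r2, type = "wr2")  # all three models at once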

To include time and industry effects (in addition to individual effects), just use a two-way within model and include any further fixed effects by + factor(eff) in the formula.
For your example, this would be:
reg3 <- plm(return ~ x + factor(industry), model="within", effect = "twoways", index=c("id","t"), data = data)
summary(reg3)
# Twoways effects Within Model
#
# Call:
# plm(formula = return ~ x + factor(industry), data = data, effect = "twoways",
# model = "within", index = c("id", "t"))
#
# Balanced Panel: n = 10, T = 10, N = 100
#
# Residuals:
# Min. 1st Qu. Median 3rd Qu. Max.
# -1.84660 -0.61135 0.06318 0.57474 2.06264
#
# Coefficients:
# Estimate Std. Error t-value Pr(>|t|)
# x 0.050906 0.112408 0.4529 0.6519
#
# Total Sum of Squares: 68.526
# Residual Sum of Squares: 68.35
# R-Squared: 0.0025571
# Adj. R-Squared: -0.23434
# F-statistic: 0.20509 on 1 and 80 DF, p-value: 0.65187
summary(reg3)$r.squared
# rsq adjrsq
# 0.002557064 -0.234335633
However, note that for your toy example data, the variable industry is collinear after the fixed effects transformation and, thus, drops out of the estimation (see ?detect.lindep for an explanation and another example). Check via, e.g.:
detect.lindep(reg3)
# [1] "Suspicious column number(s): 2"
# [1] "Suspicious column name(s): factor(industry)2"
Or via:
alias(reg3)
# Model :
# [1] "return ~ x + factor(industry)"
#
# Complete :
# [,1]
# factor(industry)2 0
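As a cross-check, the within R-squared reported by summary() is just 1 - RSS/TSS computed on the demeaned data, so it can be recomputed by hand. This sketch assumes plm's exported tss() helper, which returns the total sum of squares of the transformed model:
1 - sum(residuals(reg3)^2) / tss(reg3)
# [1] 0.002557064  (matches summary(reg3)$r.squared)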


Including non-linearity in a fixed effects model in plm

I am trying to build a fixed effects regression with the plm package in R. I am using country-level panel data with year and country fixed effects.
My problem concerns two explanatory variables. One is an interaction term of two variables and one is a squared term of one of the variables.
The model is basically:
y = x1 + x1^2 + x2 + x1*x2 + ... + xn, with all variables being in log form
It is central to the model to include the squared term, but when I run the regression it always gets excluded because of "singularities", as x1 and x1^2 are obviously correlated.
That is, the regression works and I get estimates for my variables, just not for x1^2 and x1*x2.
How do I circumvent this?
library(plm)
fe_reg<- plm(log(y) ~ log(x1)+log(x2)+log(x2^2)+log(x1*x2)+dummy,
data = df,
index = c("country", "year"),
model = "within",
effect = "twoways")
summary(fe_reg)
I have tried defining the interaction and squared terms as vectors, which helped with the interaction term but not the squared term:
df1.pd <- df1 %>% mutate_at(c('x1'), ~(scale(.) %>% as.vector))
df1.pd <- df1 %>% mutate_at(c('x2'), ~(scale(.) %>% as.vector))
I am pretty new to R, so apologies if this is not a very well structured question.
You just found two properties of the logarithm function:
log(x^2) = 2 * log(x)
log(x*y) = log(x) + log(y)
Then, obviously, log(x) is collinear with 2*log(x) and one of the two collinear variables is dropped from the estimation. Same for log(x*y) and log(x) + log(y).
So the model you want to estimate is not estimable by linear regression methods as written. You might want to consider data transformations other than the log, use the original variables, or protect the arithmetic with I() as sketched below.
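For instance, if the quadratic term is meant to be the square of the log (rather than the log of the square), I() protects the arithmetic inside the formula and the collinearity disappears. A minimal sketch on the Grunfeld data also used below:
library(plm)
data("Grunfeld")
# (log(value))^2 is not collinear with log(value), unlike log(value^2)
mod_sq <- plm(inv ~ log(value) + I(log(value)^2) + capital,
              data = Grunfeld, model = "within")
coef(mod_sq)  # all three coefficients are estimated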
See also the reproducible example below, where I just used log(x^2) = 2*log(x). Linear dependence can be detected, e.g., via the function detect.lindep from package plm (see also below). Coefficients being dropped from the estimation also hint at collinear columns in the model matrix. At times, linear dependence appears only after the data transformations involved in the estimation functions; see the Examples section of the help page ?detect.lindep for an example involving the within transformation.
library(plm)
data("Grunfeld")
pGrun <- pdata.frame(Grunfeld)
pGrun$lvalue <- log(pGrun$value) # log(x)
pGrun$lvalue2 <- log(pGrun$value^2) # log(x^2) == 2 * log(x)
mod <- plm(inv ~ lvalue + lvalue2 + capital, data = pGrun, model = "within")
summary(mod)
#> Oneway (individual) effect Within Model
#>
#> Call:
#> plm(formula = inv ~ lvalue + lvalue2 + capital, data = pGrun,
#> model = "within")
#>
#> Balanced Panel: n = 10, T = 20, N = 200
#>
#> Residuals:
#> Min. 1st Qu. Median 3rd Qu. Max.
#> -186.62916 -20.56311 -0.17669 20.66673 300.87714
#>
#> Coefficients: (1 dropped because of singularities)
#> Estimate Std. Error t-value Pr(>|t|)
#> lvalue 30.979345 17.592730 1.7609 0.07988 .
#> capital 0.360764 0.020078 17.9678 < 2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Total Sum of Squares: 2244400
#> Residual Sum of Squares: 751290
#> R-Squared: 0.66525
#> Adj. R-Squared: 0.64567
#> F-statistic: 186.81 on 2 and 188 DF, p-value: < 2.22e-16
detect.lindep(mod) # run on the model
#> [1] "Suspicious column number(s): 1, 2"
#> [1] "Suspicious column name(s): lvalue, lvalue2"
detect.lindep(pGrun) # run on the data
#> [1] "Suspicious column number(s): 6, 7"
#> [1] "Suspicious column name(s): lvalue, lvalue2"

R - plm - error with within and random effects models (pooling, between & first differences work)

I have a problem with the within and random effects methods (they don't work), while the pooling, between, and first differences estimators all work fine.
I have the same problem as in R - Error in class(x) - plm - only within and random effects models.
Here is the link to my data: https://www.dropbox.com/s/8tgeyhxeb0wrdri/my_data.xlsx?raw=1 (there are some financial measures and GDP growth for some countries)
My code:
library(readxl)  # for read_excel
library(plm)
proba <- read_excel("my_data.xlsx")
attach(proba)
Y<-cbind(GDP_growth)
X<-cbind(gfdddi01, gfdddi02, gfdddi04, gfdddi05)
pdata<-pdata.frame(proba,index=c("id","year"))
##POOLED OLS estimator
pooling<-plm(Y~X,data=pdata,model="pooling")
summary(pooling)
##BETWEEN ESTIMATOR
between<-plm(Y~X,data=pdata,model="between")
summary(between)
#FIRST DIFFERENCES ESTIMATOR
firstdiff<-plm(Y~X,data=pdata,model="fd")
summary(firstdiff)
#FIXED EFFECT OR WITHIN ESTIMATOR
fixed <-plm(Y~X,data=pdata,model="within")
summary(fixed)
#RANDOM EFFECTS ESTIMATOR
random<- plm(Y~X,data=pdata,model="random")
summary(random)
The error message I get:
Error in class(x) <- setdiff(class(x), "pseries") : invalid to set the class to matrix unless the dimension attribute is of length 2 (was 0)
What can be wrong?
Do not use variables from the environment (like you have done with Y and X - there is no need to create those). Rather, in the formula argument of plm, use the variable names as they occur in your data pdata:
#FIXED EFFECT OR WITHIN ESTIMATOR
fixed <-plm(GDP_growth ~ gfdddi01 + gfdddi02 + gfdddi04 + gfdddi05, data = pdata, model ="within")
summary(fixed)
## Oneway (individual) effect Within Model
##
## Call:
## plm(formula = GDP_growth ~ gfdddi01 + gfdddi02 + gfdddi04 + gfdddi05,
## data = pdata, model = "within")
##
## Balanced Panel: n = 17, T = 41, N = 697
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -18.89148 -1.17470 0.12701 1.48874 20.70109
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## gfdddi01 -0.0066663 0.0153800 -0.4334 0.6648
## gfdddi02 0.0051626 0.0153343 0.3367 0.7365
## gfdddi04 -0.0245573 0.0150069 -1.6364 0.1022
## gfdddi05 -0.0049627 0.0073786 -0.6726 0.5014
##
## Total Sum of Squares: 5421.5
## Residual Sum of Squares: 5366.8
## R-Squared: 0.010095
## Adj. R-Squared: -0.019192
## F-statistic: 1.72352 on 4 and 676 DF, p-value: 0.14296
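The same change should fix the random effects estimator as well, since the error came from passing environment matrices rather than from the estimator itself. An untested sketch on the same data:
#RANDOM EFFECTS ESTIMATOR
random <- plm(GDP_growth ~ gfdddi01 + gfdddi02 + gfdddi04 + gfdddi05, data = pdata, model = "random")
summary(random)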

Forecast dependent values with mvrnorm and include temporal autocorrelation

I have matrices with values (weight, maturity, etc.) by time step and age class, and I would like to forecast future values non-deterministically. Age classes are not independent, so I've been using mvrnorm to deal with that. How do I also get (lag-1) temporal autocorrelation into my predictions?
Here is what I would like to do in R:
library(MASS)
# dummy matrix: rows are time steps columns are dependent classes (ages)
x <- matrix(rnorm(20),4,5,dimnames = list(years=c('year1','year2','year3','year4'),ages=c('age1','age2','age3','age4','age5')))
# what I have so far to get next year's values (the goal would be to predict several years)
sigma <- cov(x) #covariance matrix
delta <- mvrnorm(1,rep(0,ncol(x)),cov(x)) # deviations
xl <- tail(x,1) #last year values
xp <- xl+delta #new values
# There is no temporal autocorrelation in here of course
xnew <- rbind(x,xp)
matplot(xnew,type='l')
# So I would need new values based on something like this:
rho <- apply(x,2,function(x) acf(x)$acf[2,1,1])
delta <- mvrnorm(1,xl,cov(x))
xp <- rho*xl+(1-rho)*delta
The last part doesn't feel right though.
The first part of this answer shows how to account for temporal autocorrelation per the original question. The second part addresses the multivariate case per the revised question.
Part 1:
library(MASS)
# dummy matrix: rows are time steps columns are dependent classes (ages)
x <- matrix(rnorm(20),4,5)
# what I have so far to get next year's values (the goal would be to predict several years)
sigma <- cov(x)
delta <- mvrnorm(1,rep(0,ncol(x)),cov(x))
xl <- tail(x,1)
xp <- xl+delta #new values
# There is no temporal autocorrelation in here of course
xnew <- rbind(x,xp)
matplot(xnew,type='l')
# Clean up / construct your data set
dat <- as.data.frame(x)
dat$year <- c(2014,2015,2016,2017)
dat <- rbind(dat, c(xp, 2018))
# x has five columns plus year, so six names are needed in total; the
# fifth variable name ("weight") is an arbitrary placeholder
colnames(dat) <- c("maturity", "age", "height", "sales", "weight", "year")
# Account for Temporal Autocorrelation
library(nlme)
mdl.ac <- gls(sales ~ year, data=dat,
correlation = corAR1(form=~year),
na.action=na.omit)
summary(mdl.ac)
Generalized least squares fit by REML
Model: sales ~ year
Data: dat
AIC BIC logLik
14.01155 10.406 -3.005773
Correlation Structure: ARMA(1,0)
Formula: ~year
Parameter estimate(s):
Phi1
0.1186508
Coefficients:
Value Std.Error t-value p-value
(Intercept) 1.178018 0.5130766 2.2959883 0.1054
year 0.012666 0.3537748 0.0358023 0.9737
Correlation:
(Intr)
year 0.646
Standardized residuals:
1 2 3 4 5
0.3932124 -0.4053291 -1.8081473 0.0699103 0.8821300
attr(,"std")
[1] 0.5251018 0.5251018 0.5251018 0.5251018 0.5251018
attr(,"label")
[1] "Standardized residuals"
Residual standard error: 0.5251018
Degrees of freedom: 5 total; 3 residual
Part 2:
# Account for Temporal Autocorrelation
library(nlme)
mdl.ac <- gls(year ~ height + sales + I(maturity*age), data=dat,
correlation = corAR1(form=~year),
na.action=na.omit)
summary(mdl.ac)
Generalized least squares fit by REML
Model: year ~ height + sales + I(maturity * age)
Data: dat
AIC BIC logLik
15.42011 3.420114 -1.710057
Correlation Structure: ARMA(1,0)
Formula: ~year
Parameter estimate(s):
Phi1
0
Coefficients:
Value Std.Error t-value p-value
(Intercept) 0.2100381 0.4532345 0.4634203 0.7237
height -0.7602539 0.7758925 -0.9798444 0.5065
sales -0.1840694 0.8327382 -0.2210411 0.8615
I(maturity * age) 0.0449278 0.1839260 0.2442712 0.8475
Correlation:
(Intr) height sales
height -0.423
sales 0.214 -0.825
I(maturity * age) 0.349 -0.941 0.889
Standardized residuals:
1 2 3 4 5
-7.004956e-17 -4.985525e-01 -1.319137e+00 -1.568271e+00 -1.441708e+00
attr(,"std")
[1] 0.3962277 0.3962277 0.3962277 0.3962277 0.3962277
attr(,"label")
[1] "Standardized residuals"
Residual standard error: 0.3962277
Degrees of freedom: 5 total; 1 residual
Please also see CARBayesST and its vignette for an alternate approach:
https://cran.r-project.org/web/packages/CARBayesST/vignettes/CARBayesST.pdf
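If the goal is to simulate forward with lag-1 autocorrelation (rather than only estimate it in a fitted model), one option is to filter the mvrnorm deviations through an AR(1) recursion. A minimal sketch, assuming a single common autocorrelation coefficient rho chosen by hand rather than estimated:
library(MASS)
set.seed(42)
x <- matrix(rnorm(20), 4, 5)
sigma <- cov(x)                 # cross-age covariance to preserve
rho <- 0.5                      # assumed common lag-1 autocorrelation
n_ahead <- 10
xl <- as.numeric(tail(x, 1))    # last observed year
dev <- rep(0, ncol(x))          # deviation carried over from the previous step
xp <- matrix(NA, n_ahead, ncol(x))
for (i in seq_len(n_ahead)) {
  innov <- mvrnorm(1, rep(0, ncol(x)), sigma)
  # AR(1): mix last step's deviation with fresh cross-correlated noise;
  # the sqrt(1 - rho^2) factor keeps the stationary covariance at sigma
  dev <- rho * dev + sqrt(1 - rho^2) * innov
  xp[i, ] <- xl + dev
}
matplot(rbind(x, xp), type = "l")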

Extracting p-values for fixed effects from nlme/lme4 output

I am trying to extract individual elements (p-values specifically) from the fixed effects table contained within the object created by the summary call of a mixed-effects model.
Toy data:
set.seed(1234)
score <- c(rnorm(8, 20, 3), rnorm(8, 35, 5))
rep <- rep(c(0,1,2,3), each = 8)
group <- rep(0:1, times = 16)
id <- factor(rep(1:8, times = 4))
df <- data.frame(id, group, rep, score)
Now create a model
require(nlme)
modelLME <- summary(lme(score ~ group*rep, data = df, random = ~ rep|id))
modelLME
When we call it we get the output
Linear mixed-effects model fit by REML
Data: df
AIC BIC logLik
219.6569 230.3146 -101.8285
Random effects:
Formula: ~rep | id
Structure: General positive-definite, Log-Cholesky parametrization
StdDev Corr
(Intercept) 2.664083e-04 (Intr)
rep 2.484345e-05 0
Residual 7.476621e+00
Fixed effects: score ~ group * rep
Value Std.Error DF t-value p-value
(Intercept) 22.624455 3.127695 22 7.233587 0.0000
group -1.373324 4.423229 6 -0.310480 0.7667
rep 2.825635 1.671823 22 1.690152 0.1051
group:rep 0.007129 2.364315 22 0.003015 0.9976
Correlation:
(Intr) group rep
group -0.707
rep -0.802 0.567
group:rep 0.567 -0.802 -0.707
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-1.86631781 -0.74498367 0.03515508 0.76672652 1.91896578
Number of Observations: 32
Number of Groups: 8
Now I can extract the parameter estimates for the fixed effects via
fixef(modelLME)
but how do I extract the p-values?
To extract the entire random effects table we would call
VarCorr(modelLME)
and then extract individual elements within that table via the subsetting function [,]. But I don't know what the equivalent function to VarCorr() is for the fixed effects.
You can extract the p-values with:
modelLME$tTable[,5]
(Intercept) group rep group:rep
0.0000003012047 0.7666983225269 0.1051210824864 0.9976213300628
Generally, looking at str(modelLME) helps to find the different components.
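As a side note, subsetting the table by name is a bit more robust than by column position:
modelLME$tTable[, "p-value"]             # all p-values
modelLME$tTable["group:rep", "p-value"]  # a single term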

Extracting elements from output in of mixed-effects model using nlme [duplicate]

This question already has an answer here: Extracting Random effects from nlme summary (closed as a duplicate).
I am trying to extract individual elements from the random effects table contained within the object created by the summary call of a mixed-effects model. Specifically I want to extract each of the level-2 random effects.
Toy data:
set.seed(1234)
score <- c(rnorm(8, 20, 3), rnorm(8, 35, 5))
rep <- rep(c(0,1,2,3), each = 8)
group <- rep(0:1, times = 16)
id <- factor(rep(1:8, times = 4))
df <- data.frame(id, group, rep, score)
Now create a model
require(nlme)
modelLME <- summary(lme(score ~ group*rep, data = df, random = ~ rep|id))
modelLME
When we call it we get the output
Linear mixed-effects model fit by REML
Data: df
AIC BIC logLik
219.6569 230.3146 -101.8285
Random effects:
Formula: ~rep | id
Structure: General positive-definite, Log-Cholesky parametrization
StdDev Corr
(Intercept) 2.664083e-04 (Intr)
rep 2.484345e-05 0
Residual 7.476621e+00
Fixed effects: score ~ group * rep
Value Std.Error DF t-value p-value
(Intercept) 22.624455 3.127695 22 7.233587 0.0000
group -1.373324 4.423229 6 -0.310480 0.7667
rep 2.825635 1.671823 22 1.690152 0.1051
group:rep 0.007129 2.364315 22 0.003015 0.9976
Correlation:
(Intr) group rep
group -0.707
rep -0.802 0.567
group:rep 0.567 -0.802 -0.707
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-1.86631781 -0.74498367 0.03515508 0.76672652 1.91896578
Number of Observations: 32
Number of Groups: 8
Now I can extract the residuals from the random effects table above via
modelLME$sigma
But I can't find the values in the (Intercept) and rep rows of the StdDev column of the random effects table in this output (2.664083e-04 and 2.484345e-05 respectively). They must be there somewhere; I searched through str(modelLME) but can't find them.
Do you want something like this?
library(nlme)
library(broom)
modelLME = lme(score ~ group*rep, data = df, random = ~ rep|id)
tidy(modelLME)
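Alternatively, the specific StdDev values asked about can be pulled from VarCorr(), which for lme objects returns a character matrix that can be subset by row and column name (a small sketch; as.numeric() converts the stored strings back to numbers):
vc <- VarCorr(modelLME)
as.numeric(vc["(Intercept)", "StdDev"])  # intercept SD from the output above
as.numeric(vc["rep", "StdDev"])          # rep SD
as.numeric(vc["Residual", "StdDev"])     # same as modelLME$sigma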
