Adding robust standard errors to original panel model in R

I use a within plm model:
model <- plm(Y ~ x1 + x2 + x3, data=dataset, model="within", effect="twoways")
I detected heteroskedasticity and calculated robust standard errors with the vcovHC function from the plm package:
coeftest(model, vcov = vcovHC(model, method = "arellano"))
But unfortunately, I don't know how to "add" these robust standard errors to my original model. I do get the results with the vcovHC function:
t test of coefficients:

      Estimate   Std. Error t value Pr(>|t|)
x1  0.04589038  0.02465875  1.8610  0.06317 .
x2 -0.00065238  0.00027054 -2.4114  0.01615 *
x3 -0.00087420  0.00043580 -2.0060  0.04525 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
But it does not print the usual regression statistics that I get when I run summary(model):
Total Sum of Squares:
Residual Sum of Squares:
R-Squared:
Adj. R-Squared:
F-statistic: , p-value:
So, I would like to find a way to merge the robust standard errors of the vcovHC function with my plm model.

May I suggest having a look at the documentation: ?summary.plm.
You will find an explanation as well as an example that is easy to adapt to your case:
summary(model, vcov = function(x) vcovHC(x, method = "arellano"))
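This prints the full summary (Total/Residual Sum of Squares, R-squared, F-statistic) with the robust standard errors in the coefficient table. A minimal sketch, assuming the fitted model above; per ?summary.plm, the vcov argument also accepts a pre-computed matrix:
library(plm)
library(lmtest)  # only needed for the coeftest() approach above

# Passing a function lets summary() compute the robust vcov from the model:
summary(model, vcov = function(x) vcovHC(x, method = "arellano"))

# A pre-computed matrix works as well for a one-off:
summary(model, vcov = vcovHC(model, method = "arellano"))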

Related

How to center the response and predictor variables (center at the mean) using the summary table?

We don't have the data; only the summary table is given.
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -36.8522    12.6560  -2.912  0.005573 **
X1           -0.7120     1.4540  -0.490  0.626747
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 12.27 on 45 degrees of freedom
Multiple R-squared: 0.7377, Adjusted R-squared: 0.7144
F-statistic: 31.63 on 4 and 45 DF, p-value: 1.478e-12
I know that a predictor centered at the mean has new values: the entire scale shifts so that the mean now has a value of 0. And I know the intercept will change, but the regression coefficient for that variable will not.
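A quick illustration of that claim, as a sketch using the built-in mtcars data (not the asker's data):
x <- mtcars$wt
y <- mtcars$mpg
coef(lm(y ~ x))               # intercept = fitted value at x = 0
coef(lm(y ~ I(x - mean(x))))  # intercept changes (fitted value at mean(x)); slope is unchanged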
It looks like you have access to the model in R (since you mention you can get the variance/covariance matrix of the betas). If that's the case, then you can convert the slopes using the model.
The relationship between standardized slopes and unstandardized slopes is:
beta = b*(sx/sy)
Embedded within every lm object is the data, which can be accessed using the following R code:
model$model
(Assuming the fitted object is called "model")
To get the standardized slopes, all you'd have to do is something like this:
get_betas <- function(object) {
  b  <- summary(object)$coef[, 1]           # unstandardized coefficients
  sx <- apply(model.matrix(object), 2, sd)  # sd of each predictor column
  sy <- apply(object$model[1], 2, sd)       # sd of the response
  b * sx / sy                               # beta = b * (sx / sy)
}
Then you would use that function to extract the betas:
get_betas(model)
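A quick usage check, as a sketch with the built-in mtcars data; note the intercept's "beta" comes out as 0 because the intercept column of the model matrix has standard deviation 0:
fit <- lm(mpg ~ wt + hp, data = mtcars)
get_betas(fit)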

plm vs lm - different results?

I tried several times to use lm and plm to run a regression, and I get different results.
First, I used lm as follows:
fixed.Region1 <- lm(CapNormChange ~ Policychanges + factor(Region),
data=Panel)
Further I used plm in the following way:
fixed.Region2 <- plm(CapNormChange ~ Policychanges+ factor(Region),
data=Panel, index=c("Region", "Year"), model="within", effect="individual")
I think there is something wrong with plm because I don't see an intercept in the results (see below).
Furthermore, I am not entirely sure if + factor(Region) is necessary; however, if it is not there, I don't see the coefficients (and significance) for the dummies.
So, my questions are:
Am I using the plm function wrong (or what is wrong about it)?
If not, how can it be that the results are different?
If somebody could give me a hint, I would really appreciate it.
Results from LM:
Call:
lm(formula = CapNormChange ~ Policychanges + factor(Region),
data = Panel)
Residuals:
Min 1Q Median 3Q Max
-31.141 -4.856 -0.642 1.262 192.803
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 17.3488 4.9134 3.531 0.000558 ***
Policychanges 0.6412 0.1215 5.277 4.77e-07 ***
factor(Region)Asia -19.3377 6.7804 -2.852 0.004989 **
factor(Region)C America + Carib 0.1147 6.8049 0.017 0.986578
factor(Region)Eurasia -17.6476 6.8294 -2.584 0.010767 *
factor(Region)Europe -20.7759 8.8993 -2.335 0.020959 *
factor(Region)Middle East -17.3348 6.8285 -2.539 0.012200 *
factor(Region)N America -17.5932 6.8064 -2.585 0.010745 *
factor(Region)Oceania -14.0440 6.8417 -2.053 0.041925 *
factor(Region)S America -14.3580 6.7781 -2.118 0.035878 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 19.72 on 143 degrees of freedom
Multiple R-squared: 0.3455, Adjusted R-squared: 0.3043
F-statistic: 8.386 on 9 and 143 DF, p-value: 5.444e-10
Results from PLM:
Call:
plm(formula = CapNormChange ~ Policychanges, data = Panel, effect = "individual",
model = "within", index = c("Region", "Year"))
Balanced Panel: n = 9, T = 17, N = 153
Residuals:
Min. 1st Qu. Median 3rd Qu. Max.
-31.14147 -4.85551 -0.64177 1.26236 192.80277
Coefficients:
Estimate Std. Error t-value Pr(>|t|)
Policychanges 0.64118 0.12150 5.277 4.769e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Total Sum of Squares: 66459
Residual Sum of Squares: 55627
R-Squared: 0.16299
Adj. R-Squared: 0.11031
F-statistic: 27.8465 on 1 and 143 DF, p-value: 4.7687e-07
You would need to leave out + factor(Region) in your formula for the within model with plm to get what you want.
Within models do not have an intercept, but some software packages (esp. Stata and Gretl) report one. You can estimate it with plm by running within_intercept on your estimated model. The help page has the details about this somewhat artificial intercept.
If you want the individual effects and their significance, use summary(fixef(<your_plm_model>)). Use pFtest to check if the within specification seems worthwhile.
The R squareds diverge between the lm model and the plm model. This is because the lm model (used like this with the dummies, it is usually called the LSDV model, for least squares dummy variables) gives what is sometimes called the overall R squared, while plm gives you the R squared of the demeaned regression, sometimes called the within R squared. Stata's documentation has some details about this: https://www.stata.com/manuals/xtxtreg.pdf
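A minimal sketch of those suggestions, assuming the Panel data from the question:
library(plm)

fe <- plm(CapNormChange ~ Policychanges, data = Panel,
          index = c("Region", "Year"), model = "within", effect = "individual")

within_intercept(fe)   # the (somewhat artificial) overall intercept
summary(fixef(fe))     # individual effects with standard errors and significance

pool <- plm(CapNormChange ~ Policychanges, data = Panel,
            index = c("Region", "Year"), model = "pooling")
pFtest(fe, pool)       # F test: is the within (fixed effects) specification worthwhile?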

R: Translate the results from lm() to an equation

I'm using R and I want to translate the results from lm() to an equation.
My model is:
Residuals:
Min 1Q Median 3Q Max
-0.048110 -0.023948 -0.000376 0.024511 0.044190
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.17691 0.00909 349.50 < 2e-16 ***
poly(QPB2_REF1, 2)1 0.64947 0.03015 21.54 2.66e-14 ***
poly(QPB2_REF1, 2)2 0.10824 0.03015 3.59 0.00209 **
B2DBSA_REF1DONSON -0.20959 0.01286 -16.30 3.17e-12 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.03015 on 18 degrees of freedom
Multiple R-squared: 0.9763, Adjusted R-squared: 0.9724
F-statistic: 247.6 on 3 and 18 DF, p-value: 8.098e-15
Do you have any idea?
I tried to have something like
f <- function(x) {3.17691 + 0.64947*x +0.10824*x^2 -0.20959*1 + 0.03015^2}
but when I tried to set an x, the f(x) value was incorrect.
Your output indicates that the model uses the poly function, which by default orthogonalizes the polynomials (this includes centering the x's, among other things). In your formula no orthogonalization is done, and that is the likely difference. You can refit the model using raw=TRUE in the call to poly to get the raw coefficients that can be multiplied by x and x^2.
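A self-contained sketch of the difference, using made-up data rather than the asker's:
set.seed(1)
x <- runif(30)
y <- 1 + 2*x + 3*x^2 + rnorm(30, sd = 0.1)

coef(lm(y ~ poly(x, 2)))              # orthogonal polynomials: coefficients are not 1, 2, 3
coef(lm(y ~ poly(x, 2, raw = TRUE)))  # raw polynomials: close to 1, 2, 3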
You may also be interested in the Function function in the rms package which automates creating functions from fitted models.
Edit
Here is an example:
library(rms)

xx <- 1:25
yy <- 5 - 1.5*xx + 0.1*xx^2 + rnorm(25)
plot(xx, yy)

fit <- ols(yy ~ pol(xx, 2))  # rms counterparts of lm() and poly()
mypred <- Function(fit)      # turn the fitted model into an R function
curve(mypred, add = TRUE)
mypred(c(1, 25, 3, 3.5))
You need to use the rms functions for fitting (ols and pol for this example instead of lm and poly).
If you want to calculate y-hat based on the model, you can just use predict!
Example:
set.seed(123)
my_dat <- data.frame(x=1:10, e=rnorm(10))
my_dat$y <- with(my_dat, x*2 + e)
my_lm <- lm(y~x, data=my_dat)
summary(my_lm)
Result:
Call:
lm(formula = y ~ x, data = my_dat)
Residuals:
Min 1Q Median 3Q Max
-1.1348 -0.5624 -0.1393 0.3854 1.6814
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.5255 0.6673 0.787 0.454
x 1.9180 0.1075 17.835 1e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.9768 on 8 degrees of freedom
Multiple R-squared: 0.9755, Adjusted R-squared: 0.9724
F-statistic: 318.1 on 1 and 8 DF, p-value: 1e-07
Now, instead of making a function like 0.5255 + x * 1.9180 manually, I just call predict for my_lm:
predict(my_lm, data.frame(x=11:20))
Same result as this (not counting minor errors from rounding the slope/intercept estimates):
0.5255 + (11:20) * 1.9180
If you are looking to actually visualize or write out a complex equation (e.g. something that has restricted cubic spline transformations), I recommend using the rms package: fit your model and use the latex function to see it in LaTeX:
my_lm <- ols(y~x, data=my_dat)
latex(my_lm)
Note you will need to render the LaTeX code to see your equation. There are websites and, if you are using a Mac, MacTeX software, that will render it for you.

why step() returns weird results from backward elimination for full model using lmerTest

I am confused about why the results from running step(model) in lmerTest look abnormal.
library(lmerTest)
m0 <- lmer(seed ~ connection*age + (1|unit), data = test)
step(m0)
Note: both "connection" and "age" have been converted with as.factor().
Random effects:
Chi.sq Chi.DF elim.num p.value
unit 0.25 1 1 0.6194
Fixed effects:
Analysis of Variance Table
Response: y
Df Sum Sq Mean Sq F value Pr(>F)
connection 1 0.01746 0.017457 1.5214 0.22142
age 1 0.07664 0.076643 6.6794 0.01178 *
connection:age 1 0.04397 0.043967 3.8317 0.05417 .
Residuals 72 0.82617 0.011475
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Least squares means:
Estimate Standard Error DF t-value Lower CI Upper CI p-value
Final model:
Call:
lm(formula = fo, data = mm, contrasts = l.lmerTest.private.contrast)
Coefficients:
(Intercept) connectionD ageB connectionD:ageB
-0.84868 -0.07852 0.01281 0.09634
Why does it not show me the final model?
The thing here is that the random effect was eliminated as non-significant (NS) according to the likelihood-ratio test. Then the anova method for the fixed-effects model, the "lm" object, was applied, and no elimination of NS fixed effects was done. You are right that the output is different from that for "lmer" objects, and there are no (differences of) least squares means. If you want the latter, you may try the lsmeans package. For backward elimination of NS effects from the final model, you may use the stats::step function.
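A sketch of those two suggestions, assuming the test data from the question (lsmeans has since been superseded by the emmeans package, so treat this as illustrative):
# The random effect was eliminated, so refit the fixed-effects part with lm():
final_lm <- lm(seed ~ connection * age, data = test)

stats::step(final_lm, direction = "backward")   # backward elimination of NS fixed effects

library(lsmeans)
lsmeans(final_lm, pairwise ~ connection * age)  # least squares means and their contrasts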

How to do ANOVA for regression models in R?

When I run ANOVA for the regression models which I develop, this error appears:
Error in anova.nls(model3) : anova is only defined for sequences of "nls" objects
What is the meaning of this error?
It should be mentioned that when I run summary on the model, I see just the parameters estimated for the model and no other statistics. Does that mean that the model still needs modification and is not the final model? Please look at the call for my model, and the summary and ANOVA:
model3 = nls(Height ~ 1.30 + a*(I(1- exp(-b*Diameter))^c), data = dat1, start = list(a=48,b=0.012,c=0.491), algorithm="port")
summary(model3)
anova(model3)
Here are the results:
model3 = nls(Height ~ 1.30 + a*(I(1- exp(-b*Diameter))^c), data = dat1, start = list(a=48,b=0.012,c=0.491), algorithm="port")
summary(model3)
Formula: Height ~ 1.3 + a * (I(1 - exp(-b * Diameter))^c)
Parameters:
Estimate Std. Error t value Pr(>|t|)
a 43.121923 1.653027 26.087 < 2e-16 ***
b 0.022037 0.003811 5.783 1.38e-08 ***
c 0.914263 0.116202 7.868 2.75e-14 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 5.369 on 446 degrees of freedom
Algorithm "port", convergence message: relative convergence (4)
anova(model3)
Error in anova.nls(model3) :
anova is only defined for sequences of "nls" objects
I am a beginner in R. Is there somebody who can help me?
Thank you
The error message means you need to specify a sub-model: for non-linear regression there's no obvious choice, so you'll need to do anova(model3, model0), where model0 corresponds to a fit of another model, probably one where one or more of your parameters are held constant.
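A sketch of such a comparison, assuming dat1 and model3 from the question; holding c fixed at 1 is a hypothetical restriction chosen purely for illustration:
# Reduced model: the exponent c is held constant at 1
model0 <- nls(Height ~ 1.30 + a * (1 - exp(-b * Diameter)),
              data = dat1, start = list(a = 48, b = 0.012),
              algorithm = "port")

anova(model3, model0)   # F test between the nested nls fits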
