plm vs lm - different results? - r

I tried several times to use lm and plm to do a regression, and I get different results.
First, I used lm as follows:
fixed.Region1 <- lm(CapNormChange ~ Policychanges + factor(Region),
data=Panel)
Further I used plm in the following way:
fixed.Region2 <- plm(CapNormChange ~ Policychanges + factor(Region),
data=Panel, index=c("Region", "Year"), model="within", effect="individual")
I think there is something wrong with plm because I don't see an intercept in the results (see below).
Furthermore, I am not entirely sure whether + factor(Region) is necessary; however, if it is left out, I don't see the coefficients (and significance) for the dummies.
So, my question is:
Am I using the plm function wrong? (Or what is wrong about it?)
If not, how can it be that the results are different?
If somebody could give me a hint, I would really appreciate it.
Results from LM:
Call:
lm(formula = CapNormChange ~ Policychanges + factor(Region),
data = Panel)
Residuals:
Min 1Q Median 3Q Max
-31.141 -4.856 -0.642 1.262 192.803
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 17.3488 4.9134 3.531 0.000558 ***
Policychanges 0.6412 0.1215 5.277 4.77e-07 ***
factor(Region)Asia -19.3377 6.7804 -2.852 0.004989 **
factor(Region)C America + Carib 0.1147 6.8049 0.017 0.986578
factor(Region)Eurasia -17.6476 6.8294 -2.584 0.010767 *
factor(Region)Europe -20.7759 8.8993 -2.335 0.020959 *
factor(Region)Middle East -17.3348 6.8285 -2.539 0.012200 *
factor(Region)N America -17.5932 6.8064 -2.585 0.010745 *
factor(Region)Oceania -14.0440 6.8417 -2.053 0.041925 *
factor(Region)S America -14.3580 6.7781 -2.118 0.035878 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 19.72 on 143 degrees of freedom
Multiple R-squared: 0.3455, Adjusted R-squared: 0.3043
F-statistic: 8.386 on 9 and 143 DF, p-value: 5.444e-10
Results from PLM:
Call:
plm(formula = CapNormChange ~ Policychanges, data = Panel, effect = "individual",
model = "within", index = c("Region", "Year"))
Balanced Panel: n = 9, T = 17, N = 153
Residuals:
Min. 1st Qu. Median 3rd Qu. Max.
-31.14147 -4.85551 -0.64177 1.26236 192.80277
Coefficients:
Estimate Std. Error t-value Pr(>|t|)
Policychanges 0.64118 0.12150 5.277 4.769e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Total Sum of Squares: 66459
Residual Sum of Squares: 55627
R-Squared: 0.16299
Adj. R-Squared: 0.11031
F-statistic: 27.8465 on 1 and 143 DF, p-value: 4.7687e-07

You would need to leave out + factor(Region) in your formula for the within model with plm to get what you want.
Within models do not have an intercept, but some software packages (esp. Stata and Gretl) report one. You can estimate it with plm by running within_intercept on your estimated model. The help page has the details about this somewhat artificial intercept.
If you want the individual effects and their significance, use summary(fixef(<your_plm_model>)). Use pFtest to check if the within specification seems worthwhile.
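Putting that together, a minimal sketch (assuming your Panel data; pooled is just a helper name for the comparison model):
library(plm)
# Within (fixed effects) model: leave factor(Region) out of the formula
fixed.Region2 <- plm(CapNormChange ~ Policychanges,
                     data = Panel, index = c("Region", "Year"),
                     model = "within", effect = "individual")
within_intercept(fixed.Region2)   # the somewhat artificial overall intercept
summary(fixef(fixed.Region2))     # individual (Region) effects with standard errors
# Compare against pooled OLS to check whether the within specification is worthwhile
pooled <- plm(CapNormChange ~ Policychanges,
              data = Panel, index = c("Region", "Year"), model = "pooling")
pFtest(fixed.Region2, pooled)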
The R squareds diverge between the lm model and the plm model. This is because the lm model (used like this with the dummies, it is usually called the LSDV model, for least squares dummy variables) gives what is sometimes called the overall R squared, while plm gives you the R squared of the demeaned regression, sometimes called the within R squared. Stata's documentation has some details about this: https://www.stata.com/manuals/xtxtreg.pdf
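If you want to see where plm's number comes from, here is a quick sketch (again assuming your Panel data; the _dm columns are helper variables added only for this illustration): an lm() on the Region-demeaned variables reproduces the within R squared, while the LSDV lm() above gives the overall one.
# Demean both variables by Region, then refit with plain lm()
Panel$cap_dm <- with(Panel, ave(CapNormChange, Region, FUN = function(v) v - mean(v)))
Panel$pol_dm <- with(Panel, ave(Policychanges, Region, FUN = function(v) v - mean(v)))
summary(lm(cap_dm ~ pol_dm, data = Panel))$r.squared   # matches plm's within R squared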

Related

How can I calculate the variance of a coefficient in a linear model using R? [duplicate]

Simple question really! I am running lots of linear regressions of y~x and want to obtain the variance for each regression without computing it by hand from the Standard Error output given by the summary.lm command. Just to save a bit of time :-). Any ideas of the command to do this? Or will I have to write a function to do it myself?
m<-lm(Alopecurus.geniculatus~Year)
> summary(m)
Call:
lm(formula = Alopecurus.geniculatus ~ Year)
Residuals:
Min 1Q Median 3Q Max
-19.374 -8.667 -2.094 9.601 21.832
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 700.3921 302.2936 2.317 0.0275 *
Year -0.2757 0.1530 -1.802 0.0817 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 11.45 on 30 degrees of freedom
(15 observations deleted due to missingness)
Multiple R-squared: 0.09762, Adjusted R-squared: 0.06754
F-statistic: 3.246 on 1 and 30 DF, p-value: 0.08168
So I get a Standard Error output and I was hoping to get a Variance output without calculating it by hand...
I'm not sure what you want the variance of.
If you want the residual variance, it's: (summary(m)$sigma)**2.
If you want the variance of your slope, it's: (summary(m)$coefficients[2,2])**2, or vcov(m)[2,2].
vcov(m)
gives the covariance matrix of the coefficients – variances on the diagonal.
If you're referring to the standard errors for the coefficient estimates, the answer is
summary(m)$coef[,2]
and if you're referring to the estimated residual standard error, it's
summary(m)$sigma
Type names( summary(m) ) and names(m) for other information you can access.
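Putting those pieces together for the model in the question (a small sketch using only the accessors mentioned above):
m <- lm(Alopecurus.geniculatus ~ Year)
summary(m)$sigma^2    # residual variance (residual standard error squared)
diag(vcov(m))         # variances of the coefficient estimates (intercept and Year)
summary(m)$coef[, 2]  # standard errors of the coefficient estimates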

Binary Logistic Regression output

I'm an undergrad student and am currently struggling with R. I've been trying to teach myself for weeks, but I'm not a natural, so I thought I'd seek some support.
I'm currently trying to analyse the interaction of my variables on recall of a target using logistic regression, as specified by my tutor. I have a 2 (isolate vs. control condition) by 2 (similarity vs. difference list type) study, and my dependent variable is binary recall (yes or no). I've tried to clean my data and run the code:
Call:
glm(formula = Target ~ Condition * List, family = "binomial",
data = pro)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.8297 -0.3288 0.6444 0.6876 2.4267
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.4663 0.6405 2.289 0.022061 *
Conditionisolate -1.1097 0.8082 -1.373 0.169727
Listsim -4.3567 1.2107 -3.599 0.000320 ***
Conditionisolate:Listsim 5.3218 1.4231 3.740 0.000184 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 97.736 on 70 degrees of freedom
Residual deviance: 65.869 on 67 degrees of freedom
AIC: 73.869
That's my output above. It completely ignores the difference and control conditions. I know I'm doing something wrong and I'm feeling quite exasperated by it. Can anyone help me?
In the model output, R is treating control and difference as the baseline levels of your two variables. The outcome associated with them is wrapped up in the intercept. For other combinations of variable levels, the coefficients show how those differ from that baseline.
Control/Difference: just use the intercept
Control/Similarity: intercept + Listsim
Isolate/Difference: intercept + Conditionisolate
Isolate/Similarity: intercept + Listsim + Conditionisolate + Conditionisolate:Listsim
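If it helps to see all four cells at once, here is a short sketch (fit is just a placeholder name, and the baseline level names are guesses from your output; check them with levels(pro$Condition) and levels(pro$List)):
fit <- glm(Target ~ Condition * List, family = "binomial", data = pro)
# One row per Condition x List combination, with the predicted probability of recall
newdat <- expand.grid(Condition = c("control", "isolate"),
                      List = c("difference", "sim"))
newdat$p_recall <- predict(fit, newdata = newdat, type = "response")
newdat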

R: test quadratic regression with interaction

I have data from an experiment with two conditions (dichotomous IV: 'condition'). I also want to make use of another IV which is metric ('hh'). My DV is also metric ('attention.hh'). I've already run a multiple regression model with an interaction of my IVs. Therefore, I centered the metric IV by doing this:
hh.cen <- as.numeric(scale(data$hh, scale = FALSE))
with these variables I ran the following analysis:
model.hh <- lm(attention.hh ~ hh.cen * condition, data = data)
summary(model.hh)
The results are as follows:
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.04309 3.83335 0.011 0.991
hh.cen 4.97842 7.80610 0.638 0.525
condition 4.70662 5.63801 0.835 0.406
hh.cen:condition -13.83022 11.06636 -1.250 0.215
However, the theory behind my analysis tells me, that I should expect a quadratic relation of my metric IV (hh) and the DV (but only in one condition).
Looking at the plot, one could at least make out such a relation.
Of course I want to test this statistically. However, I'm struggling now with how to set up the linear regression model.
I have two solutions that I think should be fine, but they lead to different outcomes, and unfortunately I don't know which one is right. I know that by including interactions (and three-way interactions) in the model, I also have to include all simple/main effects as well.
Solution 1: Including all terms on their own
Therefore, I first center the DV and compute the squared IV:
attention.hh.cen <- scale(data$attention.hh, scale = FALSE)
hh.sqr <- hh.cen^2
Now I can fit the linear model:
sqr.model.1 <- lm(attention.hh.cen ~ condition + hh.cen + hh.sqr + (condition : hh.cen) + (condition : hh.sqr) , data = data)
summary(sqr.model.1)
This leads to the following outcome:
Call:
lm(formula = attention.hh.cen ~ condition + hh.cen + hh.sqr +
(condition:hh.cen) + (condition:hh.sqr), data = data)
Residuals:
Min 1Q Median 3Q Max
-53.798 -14.527 2.912 13.111 49.119
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.3475 3.5312 -0.382 0.7037
condition -9.2184 5.6590 -1.629 0.1069
hh.cen 4.0816 6.0200 0.678 0.4996
hh.sqr 5.0555 8.1614 0.619 0.5372
condition:hh.cen -0.3563 8.6864 -0.041 0.9674
condition:hh.sqr 33.5489 13.6448 2.459 0.0159 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 20.77 on 87 degrees of freedom
Multiple R-squared: 0.1335, Adjusted R-squared: 0.08365
F-statistic: 2.68 on 5 and 87 DF, p-value: 0.02664
Solution 2: R includes all main effects of an interaction when using *
sqr.model.2 <- lm(attention.hh.cen ~ condition * I(hh.cen^2), data = data)
summary(sqr.model.2)
In my opinion, this should also be fine; however, the output is not the same as the one produced by the code above:
Call:
lm(formula = attention.hh.cen ~ condition * I(hh.cen^2), data = data)
Residuals:
Min 1Q Median 3Q Max
-52.297 -13.353 2.508 12.504 49.740
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.300 3.507 -0.371 0.7117
condition -8.672 5.532 -1.567 0.1206
I(hh.cen^2) 4.490 8.064 0.557 0.5791
condition:I(hh.cen^2) 32.315 13.190 2.450 0.0162 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 20.64 on 89 degrees of freedom
Multiple R-squared: 0.1254, Adjusted R-squared: 0.09587
F-statistic: 4.252 on 3 and 89 DF, p-value: 0.007431
I'd rather go with solution number 1 but I'm not sure about that.
Maybe someone has a better solution or can help me out?

R: Translate the results from lm() to an equation

I'm using R and I want to translate the results from lm() to an equation.
My model is:
Residuals:
Min 1Q Median 3Q Max
-0.048110 -0.023948 -0.000376 0.024511 0.044190
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.17691 0.00909 349.50 < 2e-16 ***
poly(QPB2_REF1, 2)1 0.64947 0.03015 21.54 2.66e-14 ***
poly(QPB2_REF1, 2)2 0.10824 0.03015 3.59 0.00209 **
B2DBSA_REF1DONSON -0.20959 0.01286 -16.30 3.17e-12 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.03015 on 18 degrees of freedom
Multiple R-squared: 0.9763, Adjusted R-squared: 0.9724
F-statistic: 247.6 on 3 and 18 DF, p-value: 8.098e-15
Do you have any idea?
I tried to have something like
f <- function(x) {3.17691 + 0.64947*x +0.10824*x^2 -0.20959*1 + 0.03015^2}
but when I plug in a value for x, the f(x) value is incorrect.
Your output indicates that the model uses the poly function, which by default orthogonalizes the polynomials (this includes centering the x's, among other things). In your formula no orthogonalization is done, and that is the likely source of the difference. You can refit the model using raw=TRUE in the call to poly to get raw coefficients that can be multiplied by x and x^2.
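For example (a sketch only; the response variable and data frame names are placeholders, since they are not shown in your output):
# Raw (non-orthogonal) polynomial: the coefficients now correspond to x and x^2 directly
mod <- lm(response ~ poly(QPB2_REF1, 2, raw = TRUE) + B2DBSA_REF1, data = dat)
coef(mod)
# The fitted equation is then roughly:
#   yhat = b0 + b1*QPB2_REF1 + b2*QPB2_REF1^2 + b3*(B2DBSA_REF1 == "DONSON")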
You may also be interested in the Function function in the rms package which automates creating functions from fitted models.
Edit
Here is an example:
library(rms)
xx <- 1:25
yy <- 5 - 1.5*xx + 0.1*xx^2 + rnorm(25)
plot(xx,yy)
fit <- ols(yy ~ pol(xx,2))
mypred <- Function(fit)
curve(mypred, add=TRUE)
mypred( c(1,25, 3, 3.5))
You need to use the rms functions for fitting (ols and pol for this example instead of lm and poly).
If you want to calculate y-hat based on the model, you can just use predict!
Example:
set.seed(123)
my_dat <- data.frame(x=1:10, e=rnorm(10))
my_dat$y <- with(my_dat, x*2 + e)
my_lm <- lm(y~x, data=my_dat)
summary(my_lm)
Result:
Call:
lm(formula = y ~ x, data = my_dat)
Residuals:
Min 1Q Median 3Q Max
-1.1348 -0.5624 -0.1393 0.3854 1.6814
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.5255 0.6673 0.787 0.454
x 1.9180 0.1075 17.835 1e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.9768 on 8 degrees of freedom
Multiple R-squared: 0.9755, Adjusted R-squared: 0.9724
F-statistic: 318.1 on 1 and 8 DF, p-value: 1e-07
Now, instead of making a function like 0.5255 + x * 1.9180 manually, I just call predict for my_lm:
predict(my_lm, data.frame(x=11:20))
Same result as this (not counting minor errors from rounding the slope/intercept estimates):
0.5255 + (11:20) * 1.9180
If you are looking to actually visualize or write out a complex equation (e.g. something that has restricted cubic spline transformations), I recommend using the rms package, fitting your model, and using the latex function to see it in LaTeX:
my_lm <- ols(y~x, data=my_dat)
latex(my_lm)
Note that you will need to render the LaTeX code to see your equation. There are websites, and, if you are using a Mac, the MacTeX software, that will render it for you.

