I would like to extract the variance-covariance matrix from a simple plm fixed effects model. For example:
library(plm)
data("Grunfeld")
M1 <- plm(inv ~ lag(inv) + value + capital, index = 'firm',
data = Grunfeld)
The usual vcov function gives me:
vcov(M1)
lag(inv) value capital
lag(inv) 3.561238e-03 -7.461897e-05 -1.064497e-03
value -7.461897e-05 9.005814e-05 -1.806683e-05
capital -1.064497e-03 -1.806683e-05 4.957097e-04
plm's fixef function only gives:
fixef(M1)
1 2 3 4 5 6 7
-286.876375 -97.190009 -209.999074 -53.808241 -59.348086 -34.136422 -34.397967
8 9 10
-65.116699 -54.384488 -6.836448
Any help extracting the variance-covariance matrix that includes the fixed effects would be much appreciated.
Using names is sometimes very useful:
names(M1)
[1] "coefficients" "vcov" "residuals" "df.residual"
[5] "formula" "model" "args" "call"
M1$vcov
lag(inv) value capital
lag(inv) 1.265321e-03 3.484274e-05 -3.395901e-04
value 3.484274e-05 1.336768e-04 -7.463365e-05
capital -3.395901e-04 -7.463365e-05 3.662395e-04
Picking up your example, do the following to get the standard errors (if that is what you are interested in; it is not the whole variance-covariance matrix):
library(plm)
data("Grunfeld")
M1 <- plm(inv ~ lag(inv) + value + capital, index = 'firm',
data = Grunfeld)
fix <- fixef(M1)
fix_se <- attr(fix, "se")
fix_se
1 2 3 4 5 6 7 8 9 10
43.453642 25.948160 20.294977 11.245009 12.472005 9.934159 10.554240 11.083221 10.642589 9.164694
You can also use the summary function for more info:
summary(fix)
Estimate Std. Error t-value Pr(>|t|)
1 -286.8764 43.4536 -6.6019 4.059e-11 ***
2 -97.1900 25.9482 -3.7455 0.0001800 ***
3 -209.9991 20.2950 -10.3473 < 2.2e-16 ***
4 -53.8082 11.2450 -4.7851 1.709e-06 ***
5 -59.3481 12.4720 -4.7585 1.950e-06 ***
6 -34.1364 9.9342 -3.4363 0.0005898 ***
7 -34.3980 10.5542 -3.2592 0.0011174 **
8 -65.1167 11.0832 -5.8753 4.222e-09 ***
9 -54.3845 10.6426 -5.1101 3.220e-07 ***
10 -6.8364 9.1647 -0.7460 0.4556947
Btw, the documentation explains the "se" attribute:
"Value: An object of class "fixef". It is a numeric vector containing the fixed effects with attribute se which contains the standard errors. [...]"
Note: You might need the latest development version for this, because fixef has been improved quite a bit there: https://r-forge.r-project.org/R/?group_id=406
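If what you are after is a covariance matrix that also covers the firm effects, one alternative (not shown above) is the least-squares dummy-variable (LSDV) form of the same model: an lm() fit with explicit firm dummies, whose vcov() then has rows and columns for every dummy. A minimal sketch; the lag_inv column is a helper built by hand here and assumes the data are ordered by firm and year (which the shipped Grunfeld data are):
library(plm)
data("Grunfeld")
# panel lag of inv within each firm (NA for the first year of each firm)
Grunfeld$lag_inv <- ave(Grunfeld$inv, Grunfeld$firm,
                        FUN = function(x) c(NA, head(x, -1)))
# LSDV: no overall intercept, one dummy per firm; the dummies play the role of the fixed effects
M2 <- lm(inv ~ 0 + lag_inv + value + capital + factor(firm), data = Grunfeld)
V <- vcov(M2)   # covariance matrix including the firm-dummy (fixed-effect) rows
dim(V)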
How do I remove the intercept from the prediction when using predict.glm? I'm not talking about the model itself, just in the prediction.
For example, I want to get the difference and standard error between x=1 and x=3
I tried putting newdata=list(x=2), intercept = NULL when using predict.glm and it doesn't work
So for example:
m <- glm(speed ~ dist, data=cars, family=gaussian(link="identity"))
prediction <- predict.glm(m, newdata=list(dist=c(2)), type="response", se.fit=T, intercept=NULL)
I'm not sure whether this is implemented in predict somehow, but you could use the following trick.
Add a manual intercept column (i.e. a vector of 1s) to the data and use it in the model, adding 0 to the RHS of the formula (to remove the "automatic" intercept).
cars$intercept <- 1L
m <- glm(speed ~ 0 + intercept + dist, family=gaussian, data=cars)
This gives us an intercept column in the model.frame, internally used by predict,
model.frame(m)
# speed intercept dist
# 1 4 1 2
# 2 4 1 10
# 3 7 1 4
# 4 7 1 22
# ...
which allows us to set it to an arbitrary value such as zero.
predict.glm(m, newdata=list(dist=2, intercept=0), type="response", se.fit=TRUE)
# $fit
# 1
# 0.3311351
#
# $se.fit
# [1] 0.03498896
#
# $residual.scale
# [1] 3.155753
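Since the question also asks for the difference between two x values and its standard error, here is a hedged sketch using the same m: with the manual intercept zeroed out, that difference is a linear contrast of the coefficients, so its standard error comes straight from the coefficient covariance matrix (for the identity link, the response scale equals the linear predictor).
# contrast for dist = 3 minus dist = 1, with the intercept column zeroed out
cv <- c(intercept = 0, dist = 3 - 1)
diff_est <- sum(cv * coef(m))
diff_se  <- sqrt(drop(t(cv) %*% vcov(m) %*% cv))
c(difference = diff_est, se = diff_se)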
The function poly() in R is used to produce orthogonal vectors and can be helpful for interpreting coefficient significance. However, I don't see the point of using it for prediction. In my view, the following two models (model_1 and model_2) should produce the same predictions.
q=1:11
v=c(3,5,7,9.2,14,20,26,34,50,59,80)
model_1=lm(v~poly(q,2))
model_2=lm(v~1+q+q^2)
predict(model_1)
predict(model_2)
But it doesn't. Why?
Because they are not the same model. Your second one has a single covariate, while the first has two.
> model_2
Call:
lm(formula = v ~ 1 + q + q^2)
Coefficients:
(Intercept) q
-15.251 7.196
You should use the I() function to wrap the squared term inside your formula so that the regression treats it as a covariate:
model_2=lm(v~1+q+I(q^2))
> model_2
Call:
lm(formula = v ~ 1 + q + I(q^2))
Coefficients:
(Intercept) q I(q^2)
7.5612 -3.3323 0.8774
This will give the same predictions:
> predict(model_1)
1 2 3 4 5 6 7 8 9 10 11
5.106294 4.406154 5.460793 8.270210 12.834406 19.153380 27.227133 37.055664 48.638974 61.977063 77.069930
> predict(model_2)
1 2 3 4 5 6 7 8 9 10 11
5.106294 4.406154 5.460793 8.270210 12.834406 19.153380 27.227133 37.055664 48.638974 61.977063 77.069930
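A quick hedged check that the two parameterisations really are the same fit, only with different coefficients:
all.equal(predict(model_1), predict(model_2))   # TRUE: identical fitted values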
I have some x and y values which can be fitted nicely with a polynomial
> mysubx
[1] 0.05 0.10 0.20 0.50 1.00 2.00 5.00
[8] 9.00 12.30 18.30
> mysuby
[1] 1.008 1.019 1.039 1.091 1.165 1.258 1.402
[8] 1.447 1.421 1.278
> mymodel <- lm(mysuby ~ poly(mysubx,5))
The fit can be confirmed graphically.
> plot(mysubx, mysuby)
> lines(mysubx, mymodel$fitted.values, col = "red")
My problem happens when I try to use the coefficients returned by lm to determine a y value from a given x. So for example, if I use the first value in mysubx, this should give mymodel$fitted.values[1]. From the graph I should expect a number around 1.01.
> ansx = 0
> for(i in seq_along(mymodel$coefficients)){
+ ansx = ansx + mysubx[1]^(i-1)*mymodel$coefficients[[i]]
+ }
> ansx
[1] 1.229575
>
Where
> mysubx[1]
[1] 0.05
> mymodel$coefficients
(Intercept) poly(mysubx, 5)1 poly(mysubx, 5)2 poly(mysubx, 5)3
1.21280000 0.35310369 -0.35739878 0.10989141
poly(mysubx, 5)4 poly(mysubx, 5)5
-0.04608682 0.02054430
As can be seen on the graph, an x value of 0.05 does not give 1.229575. Obviously I don't understand what is going on. Can someone explain how I can get the correct y value for any given x value using the output of the lm function?
Thank you.
In fact, what you want is not poly(mysubx, 5) but
poly(mysubx, 5, raw = TRUE)
If you leave raw as FALSE, it does not use x, x^2, x^3, etc. but orthogonal polynomials.
mymodel <- lm(mysuby ~ poly(mysubx, 5, raw = TRUE))
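As a hedged check with this refit model: the raw coefficients are ordinary polynomial coefficients, so the power-sum loop from the question now reproduces the fitted value.
# coef(mymodel) are now the coefficients of 1, x, x^2, ..., x^5
ansx <- sum(coef(mymodel) * mysubx[1]^(0:5))
ansx   # roughly 1.01, matching mymodel$fitted.values[1]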
When you fit a model, R first builds a model matrix from your data and your formula. You can get hold of it using the model.matrix function.
> X <- model.matrix(mysuby ~ poly(mysubx,5))
This matrix has a row per input point (in your case your input is one-dimensional and kept in mysubx, but in general, you will get it from a data frame and it can be multi-dimensional). The formula specifies how the input data should be modified before we fit the model. We can take a closer look at the first row:
> X[1,]
(Intercept) poly(mysubx, 5)1 poly(mysubx, 5)2
1.0000000 -0.2517616 0.2038351
poly(mysubx, 5)3 poly(mysubx, 5)4 poly(mysubx, 5)5
-0.2264003 0.2355258 -0.2245773
As you can see, when you fit a polynomial, you get values for the intercept (always 1, since the intercept is a constant for the model; it doesn't depend on x) and the transformations you do on your input. We call such a row the "features" you use in your model.
In this case, you have a 1->N dimensional mapping from input to features. In general, it will be an M -> N-dimensional mapping. Regardless of how you map input to the model matrix, the model fitting only cares about the model matrix. The model builds a way to map each row in this matrix to a prediction.
For a linear model, the mapping from features to target variable is an inner product. You take the coefficients and compute the inner product with the features. So, for your first data point, you do:
> mymodel$coefficients %*% X[1,]
[,1]
[1,] 1.010704
For the entire data, you simply do this for each row:
> predict(mymodel)
1 2 3 4 5 6 7
1.010704 1.020083 1.038284 1.088659 1.159883 1.263722 1.400163
8 9 10
1.447700 1.420790 1.278011
> apply(X, MARGIN = 1, function(features) mymodel$coefficients %*% features)
1 2 3 4 5 6 7
1.010704 1.020083 1.038284 1.088659 1.159883 1.263722 1.400163
8 9 10
1.447700 1.420790 1.278011
Here, X doesn't have to be the data you trained the model on. You can build it from any other input data using the same formula. I would recommend not using global variables in your formulae, though, as this is likely to cause problems later on.
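One caveat worth adding as a hedged note: terms such as poly() depend on the training data, so for new x values it is safest to let predict() rebuild the model matrix from the stored terms rather than calling model.matrix() on the new data yourself.
# predict() re-evaluates poly() with the basis stored from the training fit
newdat <- data.frame(mysubx = c(0.05, 3, 10))
predict(mymodel, newdata = newdat)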
I use the following sample code, to run AR1 process on data
(just numbers I picked to check the function):
> data
[1] 3 7 4 6 2 8 5 4
> data_ts
Time Series:
Start = 1
End = 8
Frequency = 1
[1] 3 7 4 6 2 8 5 4
> arima(data_ts,order=c(1,0,0))
Call:
arima(x = data_ts, order = c(1, 0, 0))
Coefficients:
ar1 intercept
-0.6965 5.0323
s.e. 0.2334 0.2947
sigma^2 estimated as 1.769: log likelihood = -13.97, aic = 33.93
The residuals are:
> arima(data_ts,order=c(1,0,0))$resid
Time Series:
Start = 1
End = 8
Frequency = 1
[1] -1.4581973 0.5521706 0.3383218 0.2487084 -2.3582160 0.8556328 2.0348596
[8] -1.0547538
Now, the coefficient should be -0.6965 and the intercept 5.0323. I'd like to verify the result, so I'm plugging the parameters into the AR(1) equation, i.e.:
data[8] = intercept + coefficient * data[7] + residual[8]
but it never comes out right. What am I doing wrong? By the way, trying the ar function produces different results:
ar(x = data_ts, aic = FALSE, order.max = 1, method = "ols")
Coefficients:
1
-0.6786
Intercept: 0.3527 (0.4951)
Order selected 1  sigma^2 estimated as 1.709
And still, when I plug the estimated parameters and residuals into the equation, the result isn't correct. Any idea?
OK, found the answer at http://www.stat.pitt.edu/stoffer/tsa2/Rissues.htm:
what arima() reports as the "intercept" is actually the mean of the series, so the actual intercept is intercept * (1 - coefficient).
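A hedged numerical check with the fit above: using the constant mean * (1 - phi), observation 8 is reproduced from observation 7 and the stored residual.
fit <- arima(data_ts, order = c(1, 0, 0))
phi <- fit$coef["ar1"]          # -0.6965
mu  <- fit$coef["intercept"]    #  5.0323 (really the series mean)
cst <- mu * (1 - phi)
cst + phi * data_ts[7] + fit$residuals[8]   # ~ 4, i.e. data[8]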
I currently have the following regression model:
> print(summary(step1))
Call:
lm(formula = model1, data = newdat1)
Residuals:
Min 1Q Median 3Q Max
-2.53654 -0.02423 -0.02423 -0.02423 1.71962
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.3962 0.0532 7.446 2.76e-12 ***
i2 0.6281 0.0339 18.528 < 2e-16 ***
I would like just the following returned as a data frame:
Estimate Std. Error t value Pr(>|t|)
i2 0.6281 0.0339 18.528 < 2e-16
I currently have the following code:
> results1<-as.data.frame(summary(step1)$coefficients[-1,drop=FALSE])
Which yields:
> results1
summary(step1)$coefficients[-1, drop = FALSE]
1 6.280769e-01
2 5.320108e-02
3 3.389873e-02
4 7.446350e+00
5 1.852804e+01
6 2.764836e-12
7 2.339089e-45
This is not what I want; however, it does work when there's more than one predictor.
It would be nice if you gave a reproducible example. I think you're looking for
cc <- coef(summary(step1))[2,,drop=FALSE]
as.data.frame(cc)
Using accessors such as coef(summary(.)) rather than summary(.)$coefficients is both prettier and more robust (there is no guarantee that the internal structure of summary() will stay the same -- although admittedly it's unlikely that this basic a part of R will change any time soon, especially as many users probably have used constructions like $coefficients).
Indexing the row by name, i.e.
coef(summary(step1))["i2",,drop=FALSE]
would probably be even better.
summary(step1)$coefficients is a matrix. When you take out the first element with [-1, drop=FALSE] it is converted to a vector, which is why you get 7 numbers instead of the row you want.
> set.seed(123)
> x <- rnorm(100)
> y <- -1 + 0.2*x + rnorm(100)
> step1 <- lm(y ~ x)
> class(summary(step1)$coefficients)
[1] "matrix"
> class(summary(step1)$coefficients[-1, drop=FALSE])
[1] "numeric"
The solution is to change the subsetting with [ so that you specify that you want to keep all columns (see ?`[`):
> summary(step1)$coefficients[-1, , drop=FALSE]
Estimate Std. Error t value Pr(>|t|)
x 0.1475284 0.1068786 1.380336 0.1706238