I'm investigating the effect of FDI on Freedom and want to apply panel IV regression.
I am using the plm package. When I run the regression, it gives me the following error:
Error in plm.fit(formula, data, model, effect, random.method, random.models, :
insufficient number of instruments
I altered the number of control variables, but it didn't work.
plm(FH ~ Lag_logUN_FDI_Stock_gdp + Lag_GDPpC + Lag_polity2 + Lag_conflict +
      Lag_logtax + Lag_logresources + Lag_logtrade |
      . - Lag_logUN_FDI_Stock_gdp - Lag_logpopulation + Land,
    data = pd_main)
Your formula suggests that you are using one instrument (Land) for two instrumented variables, but you need at least as many instruments as instrumented variables; with two endogenous regressors you must supply at least two instruments. Possibly you just wrote the syntax the wrong way around? According to this book (Chapter 15), you are supposed to write the syntax as:
plm(y ~ x + b | z1 + z2 + b, data = pd_main)
Where x is your endogenous variable, b is an exogenous control (it appears on both sides of the |), and z1 and z2 are your instruments.
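Applied to the question's variables, a minimal sketch (assuming, purely for illustration, that Land and Lag_logpopulation are valid external instruments for the single endogenous regressor) would be:
# sketch only: one endogenous regressor, two hypothetical external instruments;
# every exogenous control is repeated after the "|"
plm(FH ~ Lag_logUN_FDI_Stock_gdp + Lag_GDPpC + Lag_polity2 + Lag_conflict +
      Lag_logtax + Lag_logresources + Lag_logtrade |
      Land + Lag_logpopulation + Lag_GDPpC + Lag_polity2 + Lag_conflict +
      Lag_logtax + Lag_logresources + Lag_logtrade,
    data = pd_main)
This way the instrument set is at least as large as the set of regressors, which is what the "insufficient number of instruments" check requires.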
I am using the phylolm function (in package phylolm) to conduct PGLS phylogenetic analysis and am having some trouble interpreting the model output.
I am running a phylolm model with a continuous (log-transformed) response variable and one predictor variable, a factor with two groups. When I change the reference group (from condition A to B) and rerun the same model, the estimates change accordingly, but the standard errors do not seem to. The standard error for the new reference group remains very high, high enough that I don't see how the difference between groups can be significant (which the p-value indicates it is). I was under the impression that phylolm standard errors can be interpreted in the same way as for ordinary linear regression; am I mistaken?
Since you have a binary variable, changing the reference category should only reverse the sign of the beta estimate, should it not? This is what happens in your models.
It might help to think of the coefficient for the condition as the difference between the group means. The difference between the means will be the same size, but the sign will change depending on whether you are comparing condition A to condition B or condition B to condition A. The simulation below illustrates this:
# simulate some data
set.seed(42)
n = 100
# Create a binary variable
x = sample(c(0, 1), n, replace = TRUE)
# and create the opposite variable (i.e. changing the reference level)
x.rev = +(!x)
# add some error to the model
error = runif(n)
# create the continuous response variable
y = 2 + 2 * x + error
df = data.frame(y, x, x.rev)
# look at the group means under each coding
group_means = tapply(y, x, mean)
group_means[1] - group_means[2]
group_means[2] - group_means[1]
# compare those to the coefficients in the two models
summary(lm(y ~ x, data = df))
summary(lm(y ~ x.rev, data = df))
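In both summaries the slope has the same magnitude, standard error, and p-value; only its sign flips. The intercept, by contrast, becomes the mean of whichever group is the reference, so its standard error can differ between the two parameterisations, which may be what you are seeing.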
My objective is to create marginal effects and a plot similar to what's done in this post under "marginal effects": https://www.drbanderson.com/myresources/interpretinglogisticregressionpartii/
Since I cannot provide the actual model or actual data (data is sensitive), I will provide a generic example.
I have the following model created using the glm function:
model <- glm(y ~ as.factor(x1) + x2 + I(x2^2) + x3 + as.factor(x4):as.factor(x5),
             data = dataFrame, family = "binomial")
x2 is a continuous variable whose marginal effect I want to calculate at the average of the other continuous variable, x3, and at pre-defined values of x1, x4, and x5. For further simplification, assume x1 is categorical with levels morning, afternoon, or night (thus producing two coefficients in the logit model), x4 is categorical with levels left or right, and x5 is categorical with levels up or down (thus x4:x5 produces coefficients for left-and-up, left-and-down, and right-and-up, with right-and-down the excluded interaction).
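For concreteness, a purely synthetic dataFrame matching this description (all names, levels, and distributions made up solely so that the model above and the code below can be run) could be built as:
# illustrative data only, mirroring the hypothetical variables in the question
set.seed(1)
n <- 500
dataFrame <- data.frame(
  y  = rbinom(n, 1, 0.5),                                             # binary response
  x1 = sample(c("morning", "afternoon", "night"), n, replace = TRUE), # three levels
  x2 = rnorm(n),                                                      # continuous
  x3 = rnorm(n),                                                      # continuous
  x4 = sample(c("left", "right"), n, replace = TRUE),
  x5 = sample(c("up", "down"), n, replace = TRUE)
)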
Similar to what is done in the post, I run the following code:
x2.inc <- seq(min(dataFrame$x2), max(dataFrame$x2), by = .1)
to get a sequence of x2 values at which to evaluate the marginal effect. Finally, I attempt to run the margins command:
x2.margins.df <- as.data.frame(summary(margins(model, at = list(x2 = x2.inc, x3 = mean(dataFrame$x3), x1 = 'morning', x4 = 'left', x5 = 'right'))))
However, running this produces the following error:
Error in attributes(.Data) <- c(attributes(.Data), attrib) :
'names' attribute [1] must be the same length as the vector [0]
Any thoughts on how I can successfully run the margins command given a) the quadratic nature of x2 in my model, and b) the interaction of terms in the model?
As a side note: I know I can calculate these things manually if I wanted to. However, for the sake of having less code and ease of reproducibility, I'd like to make this method work. Thank you for the assistance!
The README of margins (https://cran.r-project.org/web/packages/margins/readme/README.html) says that it supports logit models, so why implement something manually?
library("car")
library("plm")
data("LaborSupply", package = "plm")
model <- glm(disab ~ kids*age + kids*I(age^2), data = LaborSupply, family="binomial")
summary(margins(model))
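For the question's setup specifically, the following sketch (reusing the hypothetical dataFrame and model from above; untested against the real, sensitive data) passes only actual factor levels in at. Note that the question describes x5 as up/down, so 'right' in the original call is not one of its levels:
x2.inc <- seq(min(dataFrame$x2), max(dataFrame$x2), by = 0.1)
mfx <- margins(model,
               variables = "x2",      # marginal effect of x2 only
               at = list(x2 = x2.inc,
                         x3 = mean(dataFrame$x3),
                         x1 = "morning",
                         x4 = "left",
                         x5 = "up"))  # "up"/"down", not "right", are x5's levels
x2.margins.df <- as.data.frame(summary(mfx))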
I want to estimate a fixed effects model with panel-corrected standard errors and a Prais-Winsten (AR1) transformation, in order to address panel heteroscedasticity, contemporaneous spatial correlation, and autocorrelation.
I have time-series cross-section data and want to perform regression analysis. I was able to estimate a fixed effects model, panel-corrected standard errors, and Prais-Winsten estimates individually, and I was able to combine panel-corrected standard errors with a fixed effects model. But I want them all at once.
# Basic ols model
ols1 <- lm(y ~ x1 + x2, data = data)
summary(ols1)
# Fixed effects model
library('plm')
plm1 <- plm(y ~ x1 + x2, data = data, model = 'within')
summary(plm1)
# Panel Corrected Standard Errors
library(pcse)
lm.pcse1 <- pcse(ols1, groupN = data$Country, groupT = data$Time)  # pcse takes vectors, not bare names
summary(lm.pcse1)
# Prais-Winsten estimates
library(prais)
prais1 <- prais_winsten(y ~ x1 + x2, data = data)
summary(prais1)
# Combination of Fixed effects and Panel Corrected Standard Errors
ols.fe <- lm(y ~ x1 + x2 + factor(Country) - 1, data = data)
pcse.fe <- pcse(ols.fe, groupN = data$Country, groupT = data$Time)
summary(pcse.fe)
With the Stata command xtpcse it is possible to combine panel-corrected standard errors and Prais-Winsten corrected estimates, with something along the lines of the following code:
xtpcse y x x x i.cc, c(ar1)
I would like to achieve this in R as well.
I am not sure that my answer will completely address your concern; these days I've been trying to deal with the same problem that you mention.
In my case, I ran the prais_winsten function from the package prais, including the fixed effects in the model. Afterwards, I corrected for heteroskedasticity using the function vcovHC.prais, which is analogous to the vcovHC function from the package sandwich.
This basically gives you the White/sandwich heteroskedasticity-consistent covariance matrix which, if you then pass it to the function coeftest from the package lmtest, gives you a table output with the corrected standard errors. Taking your posted example, see below the code that I have used:
# Prais-Winsten estimates with Fixed Effects
library(prais)
prais.fe <- prais_winsten(y ~ x1 + x2 + factor(Country), data = data,
                          index = c("Country", "Time"))  # newer versions of prais also require the panel index
library(lmtest)
prais.fe.w <- coeftest(prais.fe, vcov = vcovHC.prais(prais.fe, "HC1"))
prais.fe.w  # print the object to see the output with the corrected standard errors
Alas, I am aware that the sandwich heteroskedasticity-consistent standard errors are not exactly the same as Beck and Katz's PCSEs, because PCSE deals with panel heteroskedasticity while sandwich SEs address overall heteroskedasticity. I am not totally sure how much these two differ in practice, but something is better than nothing.
I hope my answer was somewhat helpful; this is actually my very first answer :D
I want to run a fixed effects regression in R, for which I define the following formula:
time.aspects <- as.formula(y ~ x1 + x2 + x3 + t)
time.total <- plm(time.aspects, data=all, index=c("i","t"), model = "within")
x1, x2 and x3 are my independent variables. I also want to add a time factor t to account for time fixed effects. In this regard, t stands for the single years 1 to 10 (which are included in my data file).
However, if I want to consider robust standard errors in the following way:
coeftest(time.total, vcov. = vcovSCC(time.total, type = "HC3"))
the following error occurs: Error in 1 - diaghat : non-numeric argument to binary operator
Does anyone know how to avoid this error message?
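A hedged workaround sketch using plm's built-in Grunfeld data: my assumption (not verified here) is that the HC2/HC3/HC4 weighting schemes divide by 1 - diaghat, i.e. they need hat values that are not available for a within model, whereas the default HC0 weights skip that step entirely:
library(plm)
library(lmtest)
data("Grunfeld", package = "plm")
fe <- plm(inv ~ value + capital, data = Grunfeld,
          index = c("firm", "year"), model = "within")
# Driscoll-Kraay (SCC) covariance with HC0 weights needs no hat values
coeftest(fe, vcov. = vcovSCC(fe, type = "HC0"))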
I have a dataframe train (21 predictors, 1 response, 1012 observations), and I suspect that the response is a nonlinear function of the predictors. Thus, I would like to perform a multivariate polynomial regression of the response on all the predictors, and then try to understand which are the most important terms. To avoid the collinearity problems of standard multivariate polynomial regression, I'd like to use multivariate orthogonal polynomials with polym(). However, I have quite a lot of predictors, and their names do not follow a simple rule. For example, in train I have predictors named X2,X3 and X5, but not X1 and X4. The response is X14. Is there a way to write the formula in lm without having to explicitly write the name of all predictors? Writing
OrthoModel=lm(X14~polym(.,2),data=train)
returns the error
Error in polym(., 2) : object '.' not found
EDIT: the model I wanted to fit contains about 3.5 billion terms, so it's useless. It's better to fit a model with only main effects, interactions, and second-degree terms, which gives 231 terms. I wrote the formula for a standard (non-orthogonal) second-degree polynomial:
as.formula(paste("X14 ~ (", paste0(names(Xtrain), collapse = "+"), ")^2", collapse = ""))
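With the example predictor names from the question (X2, X3, X5), this builds X14 ~ (X2+X3+X5)^2, which in R's formula algebra expands to all main effects plus all pairwise interactions; squared terms are not generated by ^2 and must be added explicitly, e.g. as I(X2^2) or via poly().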
where Xtrain is obtained from train by deleting the response column X14. However, when I try to express the polynomial in an orthogonal basis, I get a parse error:
as.formula(
  paste("X14 ~ (", paste0(names(Xtrain), collapse = "+"), ")^2", "+",
        paste("poly(", paste0(names(Xtrain), ", degree=2)", collapse = "+"),
              collapse = ""))
)
There are a couple of problems with that approach, one of which you already see. Even if the dot could be expanded within polym, you would still have faced an error when it came time for the 2 to be evaluated, because degree comes after the "..." in polym's argument list and therefore must be supplied as a named parameter rather than just positionally.
An approach using as.formula succeeds, illustrated here with the Orthodont data frame from package nlme (although using Sex as the dependent variable is statistically nonsense). I took the Subject column out of the data and also dropped Sex from the names passed to paste:
data(Orthodont, package = "nlme")
lm(as.formula(paste("Sex~polym(",
                    paste(names(Orthodont[-(3:4)]), collapse = ","),
                    ",degree=2)")),
   data = Orthodont[-3])
Call:
lm(formula = as.formula(paste("Sex~polym(", paste(names(Orthodont[-(3:4)]),
collapse = ","), ",degree=2)")), data = Orthodont[-3])
Coefficients:
(Intercept) polym(distance, age, degree = 2)1.0
1.4433 -2.5849
polym(distance, age, degree = 2)2.0 polym(distance, age, degree = 2)0.1
0.4651 1.3353
polym(distance, age, degree = 2)1.1 polym(distance, age, degree = 2)0.2
-7.6514
Formula objects can be created from text input with as.formula. This is essentially an application of the last example in ?as.formula.
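Applied to the question's own objects (train with response X14; a sketch only, under the assumption that every remaining column is a numeric predictor), the same idea gives both variants:
# variant 1: full orthogonal polynomial surface, as in the answer above
# (the EDIT notes this explodes combinatorially with 21 predictors)
pred_names <- setdiff(names(train), "X14")
f1 <- as.formula(paste("X14 ~ polym(", paste(pred_names, collapse = ","),
                       ", degree = 2)"))
# variant 2: the 231-term model from the EDIT; wrapping poly() around each
# name separately avoids the unbalanced parentheses behind the parse error
f2 <- as.formula(paste("X14 ~ (", paste(pred_names, collapse = "+"), ")^2 +",
                       paste0("poly(", pred_names, ", degree = 2)",
                              collapse = "+")))
# the linear column of each poly() duplicates the raw main effect, so lm
# reports it as NA (aliased) rather than failing
OrthoModel <- lm(f2, data = train)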