Time fixed effects (error message) - panel-data

I want to run a fixed effects regression in R, for which I define the following formula:
time.aspects <- as.formula(y ~ x1 + x2 + x3 + t)
time.total <- plm(time.aspects, data=all, index=c("i","t"), model = "within")
x1, x2, and x3 are my independent variables. I also want to add a time factor t to account for time fixed effects. Here, t stands for the individual years 1 to 10 (which are included in my data file).
However, if I want to consider robust standard errors in the following way:
coeftest(time.total, vcov. = vcovSCC(time.total, type = "HC3"))
the following error occurs: Error in 1 - diaghat : non-numeric argument to binary operator.
Does anyone know how to avoid this error message?
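For reference, a hedged sketch of a common workaround, assuming the error comes from the HC2-HC4 small-sample corrections: those divide by 1 - diaghat, i.e. the hat values, which plm does not supply for "within" models. Falling back to HC0 avoids that term, and wrapping t in factor() makes the year dummies explicit (names follow the question):
library(plm)
library(lmtest)
# factor(t) turns the year index into explicit year dummies
time.aspects <- as.formula(y ~ x1 + x2 + x3 + factor(t))
time.total <- plm(time.aspects, data = all, index = c("i", "t"), model = "within")
# HC0 does not use the 1 - diaghat weighting that triggers the error
coeftest(time.total, vcov. = vcovSCC(time.total, type = "HC0"))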


Clustered standard errors with imputed and weighted data in R

I am attempting to get clustered SEs (at the school level in my data) with data that is both imputed (MICE) and weighted (CBPS). I have tried a couple different approaches that have thrown different errors.
This is what I have to start, which works fine:
library(tidyverse)
library(mice)
library(MatchThem)
library(CBPS)
tempdata <- mice(d, m = 10, maxit = 50, meth = "pmm", seed = 99)
weighted_data <- weightthem(trtmnt ~ x1 + x2 + x3,
                            data = tempdata,
                            method = "cbps",
                            estimand = "ATT")
Using this post (https://www.r-bloggers.com/2021/05/clustered-standard-errors-with-r/) as a guide, I attempted all three approaches it covers, each of which resulted in a different error message.
My data is on a restricted server, so unfortunately I can't bring it in here to reproduce things exactly, although if it's useful I could attempt to recreate some sample data.
So attempting with estimatr first, I get this error:
m1 <- estimatr::lm_robust(outcome ~ trtmnt + x1 + x2 + x3,
                          clusters = schoolID,
                          data = weighted_data)
Error in eval_tidy(mfargs[[da]], data = data) :
object 'schoolID' not found
I have no clue where the schoolID variable would have dropped out or stopped being recognized. It isn't part of the weighting procedure, but it should still be in the data frame; if I use it as a covariate in a standard model without clustering, it's there.
I also attempted with miceadds and got this error:
m2 <- miceadds::lm.cluster(outcome ~ trtmnt + x1 + x2 + x3,
                           cluster = "schoolID",
                           data = weighted_data)
Error in as.data.frame.default(data) :
cannot coerce class '"wimids"' to a data.frame
And finally, with sandwich and lmtest:
library(sandwich)
library(lmtest)
m3 <- weighted_models <- with(weighted_data,
                              exp = lm(outcome ~ trtmnt + x1 + x2 + x3))
msandwich <- coeftest(m3, vcov = vcovCL, cluster = ~schoolID)
Error in UseMethod("estfun") :
no applicable method for 'estfun' applied to an object of class "c('mimira', 'mira')"
Any ideas on any of the above methods, or where to go next?
You were really close. You need to use with(weighted_data, .) to fit a model in your weighted datasets, and you need to use estimatr::lm_robust() to get the clustered standard errors. So try the following:
weighted_models <- with(weighted_data,
                        estimatr::lm_robust(outcome ~ trtmnt + x1 + x2 + x3,
                                            clusters = schoolID))
Your first and second approaches were incorrect because you supplied weighted_data to a single model as if it were a data frame, but it's not; it's a complicated wimids object. You need to use the with() infrastructure to fit a model to the imputed weighted data.
Your third approach was close, but coeftest() needs to be used on a single model, not a mimira object, which contains all the models fit to the imputed datasets. Although you can use coeftest() inside with() with mira objects, you cannot do so with mimira objects from MatchThem. This is where estimatr::lm_robust() comes in, since it is able to apply the clustering within each imputed dataset.
I also recommend you take a look at this blog post on estimating treatment effects after weighting with multiply imputed data. The only difference between your case and the code presented in the post is that you would change vcov = "HC3" to vcov = ~schoolID in whichever function you use.
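For completeness, a sketch of the pooling step that would follow, assuming MatchThem's pool() method applies to the object returned by with() here:
# Pool the cluster-robust fits across the imputations (Rubin's rules)
pooled <- MatchThem::pool(weighted_models)
summary(pooled)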

Using margins command in R with quadratic term and interacted dummy variables

My objective is to create marginal effects and a plot similar to what's done in this post under "marginal effects": https://www.drbanderson.com/myresources/interpretinglogisticregressionpartii/
Since I cannot provide the actual model or actual data (data is sensitive), I will provide a generic example.
I have the following model created using the glm function:
model <- glm(y ~ as.factor(x1) + x2 + I(x2^2) + x3 + as.factor(x4):as.factor(x5), data = dataFrame, family = "binomial")
x2 is a continuous variable whose marginal effects I want to calculate at the average of the other continuous variable, x3, and at pre-defined values of x1, x4, and x5. For further simplification, assume x1 is categorical taking morning, afternoon, or night (thus producing two coefficients in the logit model), x4 is categorical taking left or right, and x5 is categorical taking up or down (thus x4:x5 produces coefficients for left-up, left-down, and right-up, with right-down as the excluded interaction).
Similar to what is done in the post, I run the following code:
x2.inc <- seq(min(dataFrame$x2), max(dataFrame$x2), by = .1)
to get a sequence of x2 values at which to evaluate the marginal effect. Finally, I attempt to run the margins command:
x2.margins.df <- as.data.frame(summary(margins(model, at = list(x2 = x2.inc, x3 = mean(dataFrame$x3), x1 = 'morning', x4 = 'left', x5 = 'right'))))
However, running this produced the following error:
Error in attributes(.Data) <- c(attributes(.Data), attrib) :
'names' attribute [1] must be the same length as the vector [0]
Any thoughts on how I can successfully run the margins command given a) the quadratic nature of x2 in my model, and b) the interaction of terms in the model?
As a side note: I know I can calculate these things manually if I wanted to. However, for the sake of having less code and ease of reproducibility, I'd like to make this method work. Thank you for the assistance!
The README of margins (https://cran.r-project.org/web/packages/margins/readme/README.html) says that it supports logit models, so why implement something manually?
library("car")
library("plm")
data("LaborSupply", package = "plm")
model <- glm(disab ~ kids*age + kids*I(age^2), data = LaborSupply, family="binomial")
summary(margins(model))
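The at argument then works the same way as in the question; a short sketch with purely illustrative values:
# Average marginal effect of age over a grid of ages, holding kids at 1
age.grid <- seq(min(LaborSupply$age), max(LaborSupply$age), by = 5)
summary(margins(model, at = list(age = age.grid, kids = 1)))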

Error "insufficient number of instruments" when running plm IV regression

I'm investigating the effect of FDI on Freedom and want to apply panel IV regression.
I use the plm package. When I run the regression, it gives me the following error:
Error in plm.fit(formula, data, model, effect, random.method, random.models, :
insufficient number of instruments
I altered the number of control variables, but it didn't work.
plm(FH ~ Lag_logUN_FDI_Stock_gdp + Lag_GDPpC + Lag_polity2 + Lag_conflict +
      Lag_logtax + Lag_logresources + Lag_logtrade |
      . - Lag_logUN_FDI_Stock_gdp - Lag_logpopulation + Land,
    data = pd_main)
Your formula suggests that you are using one instrument (Land) for two instrumented variables, whereas you need at least as many instruments as instrumented variables (for example, two instruments for one instrumented variable). Possibly you just wrote the syntax the wrong way around? According to this book (Chapter 15), you are supposed to write the syntax as:
plm(y ~ x + b | z1 + z2 + b, data = pd_main)
where x is your endogenous variable, b is an exogenous control, and z1 and z2 are your instruments.
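Translated back to the call in the question, a sketch that assumes Lag_logUN_FDI_Stock_gdp is the only endogenous regressor and that Land plus (hypothetically) Lag_logpopulation serve as its external instruments:
plm(FH ~ Lag_logUN_FDI_Stock_gdp + Lag_GDPpC + Lag_polity2 + Lag_conflict +
      Lag_logtax + Lag_logresources + Lag_logtrade |
      . - Lag_logUN_FDI_Stock_gdp + Land + Lag_logpopulation,
    data = pd_main)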

offset[[i]] error when doing binary logistic regression with mgcv GAM

I'm trying to fit a generalized additive model with a binary response, using the code:
library(mgcv)
m = gam(y~s(x1)+s(x2), family=multinom(K=2), data=mydata)
Below is part of my data (total sample size is 443):
mydata[1:3,]
y x1 x2
1 1 12.55127 0.2553079
2 1 12.52029 0.2264185
3 0 12.53868 0.2183521
But I receive this error:
Error in offset[[i]] : attempt to select less than one element
What is wrong with my code?
First of all, for a binary response, why not use family = binomial()?
Secondly, if you want to test multinom, set K = 1, because categories are coded from 0 to K; see ?multinom. However, you need to pass in a list of model formulae for the multinom family: even with K = 1, you need a length-1 list, i.e. list(y ~ s(x1) + s(x2)). Both options are sketched below.
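A sketch of both options, reusing the names from the question:
library(mgcv)
# Option 1: binary response with the binomial family
m1 <- gam(y ~ s(x1) + s(x2), family = binomial(), data = mydata)
# Option 2: multinom with K = 1 (categories 0 and 1) needs a list of formulae
m2 <- gam(list(y ~ s(x1) + s(x2)), family = multinom(K = 1), data = mydata)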

How to force a regression through the origin in R

I am using R to do some multiple regression. I know that if you input, for instance,
reg <- lm(y ~ 0 + x1 + x2, data)
you will force the regression model through the origin.
My problem is that I have a lot of independent variables (around 100) and R does not seem to read all of them if I input it this way:
lm(y ~ 0 + x1 + x2 + ... + x100, data)
The code I use is as follows:
[1] data <- read.csv("Test.csv")
[2] reg <- lm(data)
[3] summary(reg)
What do I need to put in line 2 so that I can force the model through the origin?
reg <- lm(0 + data) does not work.
Put your variables in a data frame and use .:
lm(y ~ 0 + ., data)
See documentation:
There are two special interpretations of . in a formula. The usual one is in the context of a data argument of model fitting functions and means ‘all columns not otherwise in the formula’: see terms.formula. In the context of update.formula, only, it means ‘what was previously in this part of the formula’.
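Applied to the asker's three-line script (assuming the response column in Test.csv is named y):
data <- read.csv("Test.csv")
reg <- lm(y ~ 0 + ., data = data)  # 0 drops the intercept; . uses every other column
summary(reg)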
