Using margins command in R with quadratic term and interacted dummy variables

My objective is to create marginal effects and a plot similar to what's done in this post under "marginal effects": https://www.drbanderson.com/myresources/interpretinglogisticregressionpartii/
Since I cannot provide the actual model or actual data (data is sensitive), I will provide a generic example.
I have the following model created using the glm function:
model = glm(y ~ as.factor(x1) + x2 + I(x2^2) + x3 + as.factor(x4):as.factor(x5), data = dataFrame, family = "binomial")
x2 is a continuous variable for which I want to calculate marginal effects, evaluated at the average of the other continuous variable, x3, and at pre-defined values of x1, x4, and x5. For further simplification, assume x1 is categorical with levels morning, afternoon, or night (thus producing two coefficients in the logit model), x4 is categorical with levels left or right, and x5 is categorical with levels up or down (thus x4:x5 produces coefficients for left and up, left and down, and right and up, with right and down the excluded interaction).
Similar to what is done in the post, I run the following code:
x2.inc <- seq(min(dataFrame$x2), max(dataFrame$x2), by = .1)
to get a sequence of x2 values at which to evaluate the marginal effect. Finally, I attempt to run the margins command:
x2.margins.df <- as.data.frame(summary(margins(model, at = list(x2 = x2.inc, x3 = mean(dataFrame$x3), x1 = 'morning', x4 = 'left', x5 = 'right'))))
However, running this produced the following error:
Error in attributes(.Data) <- c(attributes(.Data), attrib) :
'names' attribute [1] must be the same length as the vector [0]
Any thoughts on how I can successfully run the margins command given a) the quadratic nature of x2 in my model, and b) the interaction of terms in the model?
As a side note: I know I can calculate these things manually if I wanted to. However, for the sake of having less code and ease of reproducibility, I'd like to make this method work. Thank you for the assistance!

The README of margins (https://cran.r-project.org/web/packages/margins/readme/README.html) says that it supports logit models, so why implement something manually?
library("car")
library("plm")
data("LaborSupply", package = "plm")
model <- glm(disab ~ kids*age + kids*I(age^2), data = LaborSupply, family="binomial")
summary(margins(model))
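Applied to the model in the question, the same pattern should carry over. Below is a sketch reusing the question's generic names (model, dataFrame, and x1 through x5 are assumptions taken from the question); it requests only the marginal effect of x2 and evaluates it over the grid of x2 values. Note that, per the question, x5 takes the values 'up'/'down', so the at list should use one of those levels:
library(margins)
# Grid of x2 values at which to evaluate the marginal effect of x2
x2.inc <- seq(min(dataFrame$x2), max(dataFrame$x2), by = 0.1)
mfx <- margins(model,
               variables = "x2",                 # only the effect of x2
               at = list(x2 = x2.inc,
                         x3 = mean(dataFrame$x3),
                         x1 = "morning",
                         x4 = "left",
                         x5 = "up"))             # a valid level of x5
x2.margins.df <- as.data.frame(summary(mfx))
head(x2.margins.df)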

Related

Clustered standard errors with imputed and weighted data in R

I am attempting to get clustered SEs (at the school level in my data) with data that is both imputed (MICE) and weighted (CBPS). I have tried a couple different approaches that have thrown different errors.
This is what I have to start, which works fine:
library(tidyverse)
library(mice)
library(MatchThem)
library(CBPS)
tempdata <- mice(d, m = 10, maxit = 50, meth = "pmm", seed = 99)
weighted_data <- weightthem(trtmnt ~ x1 + x2 + x3,
                            data = tempdata,
                            method = "cbps",
                            estimand = "ATT")
Using this (https://www.r-bloggers.com/2021/05/clustered-standard-errors-with-r/) as a guide, I attempted all three approaches, all of which resulted in various error messages.
My data is in a restricted server so unfortunately I can't bring it into here to reproduce things exactly, although if it's useful I could attempt to recreate some sample data.
So attempting with estimatr first, I get this error:
m1 <- estimatr::lm_robust(outcome ~ trtmnt + x1 + x2 + x3,
                          clusters = schoolID,
                          data = weighted_data)
Error in eval_tidy(mfargs[[da]], data = data) :
object 'schoolID' not found
I have no clue where the schoolID variable would have dropped out/not be recognized. It isn't part of the weighting procedure but it should still be in the data frame...if I use it as a covariate in a standard model without clustering, it's there.
I also attempted with miceadds and got this error:
m2 <- miceadds::lm.cluster(outcome ~ trtmnt + x1 + x2 + x3,
                           cluster = "schoolID",
                           data = weighted_data)
Error in as.data.frame.default(data) :
cannot coerce class `"wimids"` to a data.frame
And finally, with sandwich and lmtest:
library(sandwich)
library(lmtest)
m3 <- weighted_models <- with(weighted_data,
                              exp = lm(outcome ~ trtmnt + x1 + x2 + x3))
msandwich <- coeftest(m3, vcov = vcovCL, cluster = ~schoolID)
Error in UseMethod("estfun") :
no applicable method for `estfun` applied to an object of class "c(`mimira`, `mira`)"
Any ideas on any of the above methods, or where to go next?
You were really close. You need to use with(weighted_data, .) to fit a model in your weighted datasets, and you need to use estimatr::lm_robust() to get the clustered standard errors. So try the following:
weighted_models <- with(weighted_data,
                        estimatr::lm_robust(outcome ~ trtmnt + x1 + x2 + x3,
                                            clusters = schoolID))
Your first and second approaches were incorrect because you supplied weighted_data to a single model as if it were a data frame, but it's not; it's a complicated wimids object. You need to use the with() infrastructure to fit a model to the imputed weighted data.
Your third approach was close, but coeftest() needs to be used on a single model, not a mimira object, which contains all the models fit to the imputed datasets. Although you can use coeftest() inside with() with mira objects, you cannot do so with mimira objects from MatchThem. This is where estimatr::lm_robust() comes in, since it is able to apply the clustering within each imputed dataset.
I also recommend you take a look at this blog post on estimating treatment effects after weighting with multiply imputed data. The only difference between your case and the code presented in the post is that you would change vcov = "HC3" to vcov = ~schoolID in whichever function you use.
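Once the models have been fit in each imputed dataset, the estimates still need to be combined across imputations. A minimal sketch of that step, assuming the with() call above succeeded and that pool() (from mice/MatchThem) can handle lm_robust fits:
# Pool the estimates across the 10 imputed datasets using Rubin's rules
pooled <- pool(weighted_models)
summary(pooled, conf.int = TRUE)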

Is there a way to both include PCSE and Prais-Winsten correction in a fixed effects model in R (similar to the xtpcse function in Stata)?

I want to estimate a fixed effects model using panel-corrected standard errors as well as a Prais-Winsten (AR1) transformation in order to address panel heteroscedasticity, contemporaneous spatial correlation, and autocorrelation.
I have time-series cross-section data and want to perform regression analysis. I was able to estimate a fixed effects model, panel-corrected standard errors, and Prais-Winsten estimates individually, and I was able to include panel-corrected standard errors in a fixed effects model. But I want them all at once.
# Basic ols model
ols1 <- lm(y ~ x1 + x2, data = data)
summary(ols1)
# Fixed effects model
library('plm')
plm1 <- plm(y ~ x1 + x2, data = data, model = 'within')
summary(plm1)
# Panel Corrected Standard Errors
library(pcse)
lm.pcse1 <- pcse(ols1, groupN = Country, groupT = Time)
summary(lm.pcse1)
# Prais-Winsten estimates
library(prais)
prais1 <- prais_winsten(y ~ x1 + x2, data = data)
summary(prais1)
# Combination of Fixed effects and Panel Corrected Standard Errors
ols.fe <- lm(y ~ x1 + x2 + factor(Country) - 1, data = data)
pcse.fe <- pcse(ols.fe, groupN = Country, groupT = Time)
summary(pcse.fe)
With the Stata command xtpcse it is possible to include both panel-corrected standard errors and Prais-Winsten corrected estimates, with something along the lines of the following code:
xtpcse y x x x i.cc, c(ar1)
I would like to achieve this in R as well.
I am not sure that my answer will completely address your concern; these days I've been trying to deal with the same problem that you mention.
In my case, I ran the Prais-Winsten function from the package prais, where I included my model with the fixed effects. Afterwards, I corrected for heteroskedasticity using the function vcovHC.prais, which is analogous to the vcovHC function from the package sandwich.
This will give you White's/sandwich heteroskedasticity-consistent covariance matrix which, if you then pass to the coeftest function from the package lmtest, will give you a table output with the corrected standard errors. Taking your posted example, see below the code that I have used:
# Prais-Winsten estimates with Fixed Effects
library(prais)
prais.fe <- prais_winsten(y ~ x1 + x2 + factor(Country), data = data)
library(lmtest)
prais.fe.w <- coeftest(prais.fe, vcov = vcovHC.prais(prais.fe, "HC1"))
prais.fe.w # print the object to see the output with the corrected standard errors.
Alas, I am aware that the sandwich heteroskedasticity-consistent standard errors are not exactly the same as Beck and Katz's PCSEs, because PCSE deals with panel heteroskedasticity while sandwich SEs address overall heteroskedasticity. I am not totally sure how much these two differ in practice, but something is something.
I hope my answer was somehow helpful; this is actually my very first answer :D
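As a further pointer, the panelAR package is, to my knowledge, the closest R analogue of Stata's xtpcse, combining the Prais-Winsten (AR1) transformation with Beck and Katz panel-corrected standard errors in one call. This is a sketch only: the argument names below are from memory of the package documentation and should be verified, and the package has at times been available only from the CRAN archive:
library(panelAR)
# Prais-Winsten (AR1) transformation with panel-corrected SEs and
# country fixed effects via dummies -- roughly xtpcse y x1 x2 i.cc, c(ar1)
pw.pcse.fe <- panelAR(y ~ x1 + x2 + factor(Country),
                      data = data,
                      panelVar = "Country",
                      timeVar = "Time",
                      autoCorr = "ar1",
                      panelCorrMethod = "pcse")
summary(pw.pcse.fe)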

Time fixed effects (error message)

I want to run a fixed effects regression in R, for which I define the following formula:
time.aspects <- as.formula(y ~ x1 + x2 + x3 + t)
time.total <- plm(time.aspects, data=all, index=c("i","t"), model = "within")
x1, x2 and x3 are my independent variables. I also want to add a time factor t to account for time fixed effects. In this regard, t stands for the single years 1 to 10 (which are included in my data file).
However, if I want to consider robust standard errors in the following way:
coeftest(time.total, vcov. = vcovSCC(time.total, type = "HC3"))
the following error occurs: Error in 1 - diaghat : non-numeric argument to binary operator.
Does anyone know how to avoid this error message?

test proportional odds assumption with 2 random variables R ordinal logistic

I'm using the package ordinal in R to run ordinal logistic regression on a dependent variable that is based on a 1-5 Likert scale, and I am trying to figure out how to test the proportional odds assumption.
My current model is y ~ x1 + x2 + x3 + x4 + x2*x3 + (1|ID) + (1|form) where x1 and x2 are dichotomous and x3 and x4 are continuous variables. (92 subjects, 4 forms).
As far as I know:
- "nominal" is not implemented in the more recent version of clmm.
- clmm2 (the older version) does not accept more than one random variable.
- nominal_test() only appears to work for clm2 (without random effects at all).
For a different dv (that only has one random term and no interaction), I had used:
m1 <- clmm2(y ~ x1 + x2 + x3, random = ID, Hess = TRUE, data = d)
m1.nom <- clmm2(y ~ x1 + x2, random = ID, Hess = TRUE, nominal = ~ x3, data = d)
m2.nom <- clmm2(y ~ x2 + x3, random = ID, Hess = TRUE, nominal = ~ x1, data = d)
m3.nom <- clmm2(y ~ x1 + x3, random = ID, Hess = TRUE, nominal = ~ x2, data = d)
anova(m1.nom, m1)
anova(m2.nom, m1)
anova(m3.nom, m1) # (as well as considering the output in summary(m#.nom))
But I'm not sure how to modify this approach to handle the current model (two random terms and an interaction of the fixed effects), nor am I sure that this is actually a correct way to test the proportional odds assumption in the first place. (The example in the package tutorial only has 2 fixed effects.)
I'm open to other approaches (be they other packages, software, or graphical approaches) that would let me test this. Any suggestions?
Even in the case of the most basic ordinal logistic regression models, the diagnostic tests for the proportional odds assumption are known to frequently reject the null hypothesis that the coefficients are the same across the levels of the ordered factor. The statistician Frank Harrell suggests here a general graphical method for examining the proportional odds assumption, which is probably your best bet. In this approach you'd just graph the linear predictions from a logit model (with random effects) for each level of the outcome and one predictor variable at a time.
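A cruder numeric check in the same spirit (a rough sketch, not Harrell's graphical method itself; it assumes the question's variable names and that y is stored numerically as 1 to 5 in a data frame d) is to dichotomize the outcome at each threshold, refit the same mixed-effects logit with lme4::glmer(), and compare the coefficients across thresholds; under proportional odds they should be roughly stable:
library(lme4)
thresholds <- 2:5
coef_by_cut <- sapply(thresholds, function(k) {
  d$y_bin <- as.integer(d$y >= k)   # dichotomize at threshold k
  fit <- glmer(y_bin ~ x1 + x2 * x3 + x4 + (1 | ID) + (1 | form),
               data = d, family = binomial)
  fixef(fit)                        # fixed-effect estimates for this split
})
colnames(coef_by_cut) <- paste0("y>=", thresholds)
round(coef_by_cut, 2)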

How to add one variable each time into the regression model?

I have a question about how to add one variable at a time to a regression model in order to evaluate the adjusted R-squared.
For example,
lm(y~x1)
next time, I want to do
lm(y~x1+x2)
and then,
lm(y~x1+x2+x3)
I tried paste, but it does not work; for example, lm(y ~ paste("x1", "x2", sep = "+")).
Any idea?
Assuming you fit 3 variables to your linear regression model: x1, x2 and x3
lm.fit1 = lm(y ~ x1 + x2 + x3)
Introducing an additional variable (x4) can be achieved by using the update function:
lm.fit2 = update(lm.fit1, .~. + x4)
You could even introduce an interaction term if required:
lm.fit3 = update(lm.fit2, .~. + x2:x3)
Further details on adding variables to regression models can be obtained here.
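If the goal is to step through a list of predictors and record the adjusted R-squared at each step, a small loop with reformulate() is another option. A sketch, assuming the response y and the predictors live in a data frame dat:
predictors <- c("x1", "x2", "x3")
adj_r2 <- sapply(seq_along(predictors), function(k) {
  f <- reformulate(predictors[1:k], response = "y")   # y ~ x1, then y ~ x1 + x2, ...
  summary(lm(f, data = dat))$adj.r.squared
})
names(adj_r2) <- sapply(seq_along(predictors),
                        function(k) paste(predictors[1:k], collapse = " + "))
adj_r2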
