I have the following regression in R using the 'fixest' package. Yields are a function of N, N^2, P, K and S with producer by year fixed effects.
yield<- feols(yield ~ N + N_square + P + K + S |producer*year, data=data, se="hetero")
I need to use the delta method from the 'car' package to estimate the optimal rate of N and obtain the standard error. In the below example, using the marginal effect of nitrogen from my regression, I am finding the optimal rate of N at an input:output price ratio = 4.
deltaMethod(yield, "(4 - b1)/(2*b2)", parameterNames= paste("b", 0:2, sep=""))
My issue is I am unable to run the deltaMethod with the feols regression. I am given the following error:
Warning: In vcov.fixest(object, complete = FALSE):'complete' is not a valid argument of
function vcov.fixest (fyi, some of its
main arguments are 'vcov' and 'ssc').
Error in eval(g., envir) : object 'b1' not found
The deltaMethod works with lm functions. This is an issue for me, as I cannot run my regression instead as an lm function with the fixed effects as factors. This is because with my chosen data set and fixed effect variables it is extremely slow to run.
Are there any alternatives to the deltaMethod function that work with feols regressions?
@g-grothendieck's answer covers the main issue. (Which is to say that car::deltaMethod does work with fixest objects; you just have to specify the coefficient names in a particular way.) I'd also recommend updating your version of fixest, since you appear to be using an old release.
But for posterity, let me quickly tackle the subsidiary question:
Are there any alternatives to the deltaMethod function that work with feols regressions?
You can use the "hypothesis" functionality of the (excellent) marginaleffects package. Please note that this is a relatively new feature, so you'll need to install the development version of marginaleffects at the time of writing this comment.
Here's an example that replicates Gabor's one above.
library(fixest)
fm <- feols(conc ~ uptake + Treatment | Type, CO2, vcov = "hetero")
# remotes::install_github("vincentarelbundock/marginaleffects") # dev version
library(marginaleffects)
marginaleffects(
fm,
newdata = "mean",
hypothesis = "(4 - uptake)/(2 * Treatment) = 0"
) |>
summary() ## optional
#> Average marginal effects
#> Term Effect Std. Error z value Pr(>|z|) 2.5 % 97.5 %
#> 1 hypothesis -0.06078 0.01845 -3.295 0.00098423 -0.09693 -0.02463
#>
#> Model type: fixest
#> Prediction type: response
PS. For anyone else reading this, marginaleffects is something of a spiritual successor to margins, but is basically superior in every way (speed, model coverage, etc.)
In the absence of a reproducible example we will use the built-in CO2 data frame. (In the future please provide a reproducible example that can be used in answers -- see the info at the top of the r tag page.)
1) The default method of deltaMethod does not support the parameterNames argument so use the original names.
library(fixest)
library(car)
fm <- feols(conc ~ uptake + Treatment | Type, CO2, vcov = "hetero")
deltaMethod(fm, "(4 - uptake)/(2 * Treatmentchilled)")
## Estimate SE 2.5 % 97.5 %
## (4 - uptake)/(2 * Treatmentchilled) -0.060780 0.018446 -0.096934 -0.0246
2) Alternately it can work with just the coefficients and variance matrix so try this:
co <- setNames(coef(fm), c("b1", "b2"))
deltaMethod(co, "(4 - b1)/(2*b2)", vcov(fm))
## Estimate SE 2.5 % 97.5 %
## (4 - b1)/(2 * b2) -0.060780 0.018446 -0.096934 -0.0246
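For readers who want to see what the delta method computes, here is a minimal base-R sketch. It is not the car implementation: `delta_method` is a hypothetical helper, and the coefficients and covariance matrix passed to it are made up for illustration. The standard error is sqrt(g' V g), where g is the gradient of the expression evaluated at the estimates and V is the coefficient covariance matrix.

```r
# Hypothetical helper (not part of car or fixest): first-order delta method.
delta_method <- function(expr_string, est, V) {
  # deriv() builds an expression that returns the value and its gradient
  g <- deriv(parse(text = expr_string), names(est))
  val <- eval(g, as.list(est))
  grad <- attr(val, "gradient")[1, ]
  # Variance of g(b) is approximated by grad' V grad
  se <- sqrt(drop(t(grad) %*% V %*% grad))
  c(estimate = as.numeric(val), se = se)
}

# Made-up coefficients and covariance matrix, for illustration only
est <- c(b1 = 2, b2 = 1)
V <- diag(c(0.04, 0.01))
res <- delta_method("(4 - b1)/(2*b2)", est, V)
res
```

With a fitted model, the same helper could presumably be called as `delta_method("(4 - uptake)/(2 * Treatmentchilled)", coef(fm), vcov(fm))`, provided the coefficient names are valid R symbols and the rows of the covariance matrix are in the same order as the coefficients.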
I'm dealing with problems of three parts that I can solve separately, but now I need to solve them together:
extremely skewed, over-dispersed dependent count variable (the number of incidents while doing something),
necessity to include random effects,
lots of missing values -> multiple imputation -> 10 imputed datasets.
To solve the first two parts, I chose a quasi-Poisson mixed-effect model. Since stats::glm isn't able to include random effects properly (or I haven't figured it out) and lme4::glmer doesn't support the quasi-families, I worked with glmer(family = "poisson") and then adjusted the std. errors, z statistics and p-values as recommended here and discussed here. So I basically turn Poisson mixed-effect regression into quasi-Poisson mixed-effect regression "by hand".
This is all good with one dataset. But I have 10 of them.
I roughly understand the procedure of analyzing multiple imputed datasets – 1. imputation, 2. model fitting, 3. pooling results (I'm using mice library). I can do these steps for a Poisson regression but not for a quasi-Poisson mixed-effect regression. Is it even possible to A) pool across models based on a quasi-distribution, B) get residuals from a pooled object (class "mipo")? I'm not sure. Also I'm not sure how to understand the pooled results for mixed models (I miss random effects in the pooled output; although I've found this page which I'm currently trying to go through).
Can I get some help, please? Any suggestions on how to complete the analysis (addressing all three issues above) would be highly appreciated.
Example of data is here (repre_d_v1 and repre_all_data are stored in there) and below is a crucial part of my code.
library(dplyr); library(tidyr); library(tidyverse); library(lme4); library(broom.mixed); library(mice)
# please download "qP_data.RData" from the last link above and load them
## ===========================================================================================
# quasi-Poisson mixed model from single data set (this is OK)
# first run Poisson regression on df "repre_d_v1", then turn it into quasi-Poisson
modelSingle = glmer(Y ~ Gender + Age + Xi + Age:Xi + (1|Country) + (1|Participant_ID),
family = "poisson",
data = repre_d_v1)
# I know there are some warnings but it's because I share only a modified subset of data with you (:
printCoefmat(coef(summary(modelSingle))) # unadjusted coefficient table
# define quasi-likelihood adjustment function
quasi_table = function(model, ctab = coef(summary(model))) {
phi = sum(residuals(model, type = "pearson")^2) / df.residual(model)
qctab = within(as.data.frame(ctab),
{`Std. Error` = `Std. Error`*sqrt(phi)
`z value` = Estimate/`Std. Error`
`Pr(>|z|)` = 2*pnorm(abs(`z value`), lower.tail = FALSE)
})
return(qctab)
}
printCoefmat(quasi_table(modelSingle)) # done, makes sense
## ===========================================================================================
# now let's work with more than one data set
# object "repre_all_data" of class "mids" contains 10 imputed data sets
# fit model using with() function, then pool()
modelMultiple = with(data = repre_all_data,
expr = glmer(Y ~ Gender + Age + Xi + Age:Xi + (1|Country) + (1|Participant_ID),
family = "poisson"))
summary(pool(modelMultiple)) # class "mipo" ("mipo.summary")
# this has quite similar structure as coef(summary(someGLM))
# but I don't see where are the random effects?
# and more importantly, I wanted a quasi-Poisson model, not just Poisson model...
# ...but here it is not possible to use quasi_table function (defined earlier)...
# ...and that's because I can't compute "phi"
This seems reasonable, with the caveat that I'm only thinking about the computation, not whether this makes statistical sense. What I'm doing here is computing the dispersion for each of the individual fits and then applying it to the summary table, using a variant of the machinery that you posted above.
## compute dispersion values
phivec <- vapply(modelMultiple$analyses,
function(model) sum(residuals(model, type = "pearson")^2) / df.residual(model),
FUN.VALUE = numeric(1))
phi_mean <- mean(phivec)
ss <- summary(pool(modelMultiple)) # class "mipo" ("mipo.summary")
## adjust
qctab <- within(as.data.frame(ss),
{ std.error <- std.error*sqrt(phi_mean)
statistic <- estimate/std.error
p.value <- 2*pnorm(abs(statistic), lower.tail = FALSE)
})
The results look weird (dispersion < 1, all model results identical), but I'm assuming that's because you gave us a weird subset as a reproducible example ...
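As a sanity check on the "scale the standard errors by sqrt(phi)" step, the sketch below uses an ordinary glm (no random effects, no imputation) on simulated data, where a true quasi-Poisson fit is available for comparison. The data and object names are made up for illustration; the dispersion formula is the same Pearson-based one used in quasi_table() above.

```r
set.seed(42)
n <- 500
x <- rnorm(n)
# Overdispersed counts (negative binomial), simulated for illustration only
y <- rnbinom(n, mu = exp(0.5 + 0.3 * x), size = 1.5)

fit_pois  <- glm(y ~ x, family = poisson)
fit_quasi <- glm(y ~ x, family = quasipoisson)

# Pearson-based dispersion estimate
phi <- sum(residuals(fit_pois, type = "pearson")^2) / df.residual(fit_pois)

# Poisson standard errors inflated by sqrt(phi) vs. quasi-Poisson standard errors
se_by_hand <- coef(summary(fit_pois))[, "Std. Error"] * sqrt(phi)
se_quasi   <- coef(summary(fit_quasi))[, "Std. Error"]
cbind(se_by_hand, se_quasi)
```

The two columns should agree up to numerical precision in this simple setting. Whether averaging the dispersion across imputed mixed-model fits is statistically justified is a separate question that this check does not answer.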
I'm running a series of coxph models in R and compiling the output into LaTeX tables using the modelsummary package and command. coxph reports both the standard error and the robust standard error, and the p-value is based on the robust one. Here is a quick example to illustrate the output:
library(survival)
test_data <- list(time=c(4,3,1,1,2,2,3)
, status=c(1,1,1,0,1,1,0)
, x=c(0,2,1,1,1,0,0)
, sex=c(0,0,0,0,1,1,1))
model <- coxph(Surv(time, status) ~ x + cluster(sex), test_data)
Call:
coxph(formula = Surv(time, status) ~ x, data = test1, cluster = sex)
coef exp(coef) se(coef) robust se z p
x 0.460778 1.585306 0.562800 0.001052 437.9 <2e-16
Likelihood ratio test=0.66 on 1 df, p=0.4176
n= 7, number of events= 5
Next, I'm trying to create LaTeX tables from these models, displaying the robust se.
modelsummary(model, output = "markdown", fmt = 3, estimate = "{estimate}{stars}", statistic = "std.error")
| | Model 1 |
|:---------------------|:----------:|
|x | 0.461*** |
| | (0.563) |
|Num.Obs. | 7 |
|R2 | 0.090 |
|AIC | 13.4 |
As we can see, only the non-adjusted se is displayed. I could not find any alternative value for the statistic = "std.error" argument that works, and something like vcov="robust" does not work either.
How can I display any kind of robust standard errors using modelsummary for coxph models?
Thanks for reading and any help is appreciated.
The next version of modelsummary (now available on Github) will produce a more informative error message in those cases:
library(survival)
library(modelsummary)
test_data <- list(time=c(4,3,1,1,2,2,3)
, status=c(1,1,1,0,1,1,0)
, x=c(0,2,1,1,1,0,0)
, sex=c(0,0,0,0,1,1,1))
model <- coxph(Surv(time, status) ~ x + cluster(sex), test_data)
modelsummary(model, vcov = "robust")
# Error: Unable to extract a variance-covariance matrix for model object of class
# `coxph`. Different values of the `vcov` argument trigger calls to the `sandwich`
# or `clubSandwich` packages in order to extract the matrix (see
# `?insight::get_varcov`). Your model or the requested estimation type may not be
# supported by one or both of those packages, or you were missing one or more
# required arguments in `vcov_args` (like `cluster`).
As you can see, the problem is that modelsummary does not compute robust standard errors itself. Instead, it delegates this task to the sandwich or clubSandwich packages. Unfortunately, this coxph model does not appear to be supported by those packages:
sandwich::vcovHC(model)
#> Error in apply(abs(ef) < .Machine$double.eps, 1L, all): dim(X) must have a positive length
sandwich is the main package in the R ecosystem to compute robust standard errors. AFAICT, all the other table-making packages available (e.g., stargazer, texreg) also use sandwich, so you are unlikely to have success by looking at those. If you find another package which can compute robust standard errors for Cox models, please file a report on the modelsummary Github repository. I will investigate to see if it’s possible to add support then.
If the info you want is available in the summary object, you can add this information by following the instructions here:
https://vincentarelbundock.github.io/modelsummary/articles/modelsummary.html#adding-new-information-to-existing-models
tidy_custom.coxph <- function(x, ...) {
s <- summary(x)$coefficients
data.frame(
term = row.names(s),
robust.se = s[, "robust se", drop = FALSE])
}
modelsummary(model, statistic = "robust.se")
| | Model 1 |
|:---------------------|:----------:|
|x | 0.461 |
| | (0.001) |
|Num.Obs. | 7 |
|AIC | 13.4 |
|BIC | 13.4 |
|RMSE | 0.61 |
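To check where the two kinds of standard error live in the fitted object, here is a small sketch using the same toy data. It assumes the survival package's documented behaviour that, when a robust variance is requested (here via cluster()), the `var` component (what vcov() returns) holds the robust estimate and `naive.var` holds the model-based one. The object names `robust_se`, `naive_se` and `summary_se` are only for illustration.

```r
library(survival)

test_data <- list(time   = c(4, 3, 1, 1, 2, 2, 3),
                  status = c(1, 1, 1, 0, 1, 1, 0),
                  x      = c(0, 2, 1, 1, 1, 0, 0),
                  sex    = c(0, 0, 0, 0, 1, 1, 1))
model <- coxph(Surv(time, status) ~ x + cluster(sex), test_data)

# Robust SE from the variance matrix returned by vcov()
robust_se <- sqrt(diag(vcov(model)))
# Model-based ("naive") SE, stored separately when a robust variance is used
naive_se <- sqrt(diag(model$naive.var))

# The same two numbers as printed in the summary table
summary_se <- summary(model)$coefficients[, c("se(coef)", "robust se")]
summary_se
```

If the two pairs agree, the `robust se` column picked up by the tidy_custom.coxph method above is the same quantity as sqrt(diag(vcov(model))).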
How can I use the rms package in R to execute a negative binomial regression? (I originally posted this question on Statistics SE, but it was closed apparently because it is a better fit here.)
With the MASS package, I use the glm.nb function, but I am trying to switch to the rms package because I sometimes get weird errors when bootstrapping with glm.nb and some other functions. But I cannot figure out how to do a negative binomial regression with the rms package.
Here is sample code of what I would like to do (copied from the rms::Glm function documentation):
library(rms)
## Dobson (1990) Page 93: Randomized Controlled Trial :
counts <- c(18,17,15,20,10,20,25,13,12)
outcome <- gl(3,1,9)
treatment <- gl(3,3)
f <- Glm(counts ~ outcome + treatment, family=poisson())
f
anova(f)
summary(f, outcome=c('1','2','3'), treatment=c('1','2','3'))
So, instead of using family=poisson(), I would like to use something like family=negative.binomial(), but I cannot figure out how to do this.
In the documentation for family {stats}, I found this note in the "See also" section:
For binomial coefficients, choose; the binomial and negative binomial distributions, Binomial, and NegBinomial.
But even after clicking the link for ?NegBinomial, I cannot make any sense of this.
I would appreciate any help on how to use the rms package in R to execute a negative binomial regression.
Opinion up front: you might be better off posting (as a separate question) a reproducible example of the "weird errors" from your bootstrap attempts and seeing whether people have ideas for resolving them. It's fairly common for NB fitting procedures to throw warnings or errors when data are equi- or underdispersed, as the estimates of the dispersion parameter become infinite in this case ...
@coffeinjunky is correct that using family = negative.binomial(theta=VALUE) will work (where VALUE is a numeric constant, e.g. theta=1 for the geometric distribution [a special case of the NB]). However: you won't be able (without significantly more work) to fit the general NB model, i.e. the model where the dispersion parameter (theta) is estimated as part of the fitting procedure. That's what MASS::glm.nb does, and AFAICS there is no analogue in the rms package.
There are a few other packages/functions in addition to MASS::glm.nb that fit the negative binomial model, including (at least) bbmle and glmmTMB — there may be others such as gamlss.
## Dobson (1990) Page 93: Randomized Controlled Trial :
dd <- data.frame(
counts = c(18,17,15,20,10,20,25,13,12),
outcome = gl(3,1,9),
treatment = gl(3,3))
MASS::glm.nb
library(MASS)
m1 <- glm.nb(counts ~ outcome + treatment, data = dd)
## "iteration limit reached" warning
glmmTMB
library(glmmTMB)
m2 <- glmmTMB(counts ~ outcome + treatment, family = nbinom2, data = dd)
## "false convergence" warning
bbmle
library(bbmle)
m3 <- mle2(counts ~ dnbinom(mu = exp(logmu), size = exp(logtheta)),
parameters = list(logmu ~outcome + treatment),
data = dd,
start = list(logmu = 0, logtheta = 0)
)
signif(cbind(MASS=coef(m1), glmmTMB=fixef(m2)$cond, bbmle=coef(m3)[1:5]), 5)
MASS glmmTMB bbmle
(Intercept) 3.0445e+00 3.04540000 3.0445e+00
outcome2 -4.5426e-01 -0.45397000 -4.5417e-01
outcome3 -2.9299e-01 -0.29253000 -2.9293e-01
treatment2 -1.1114e-06 0.00032174 8.1631e-06
treatment3 -1.9209e-06 0.00032823 6.5817e-06
These all agree fairly well (at least for the intercept/outcome parameters). This example is fairly difficult for a NB model (5 parameters + dispersion for 9 observations, data are Poisson rather than NB).
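To make "theta is estimated as part of the fitting procedure" concrete, here is a minimal base-R sketch that maximizes the NB2 log-likelihood directly with optim(), in the same spirit as the bbmle call above. The data are simulated (so that they really are overdispersed, unlike the Dobson example), and all names (`nll`, `fit`, `est`) are illustrative rather than taken from any package.

```r
set.seed(1)
n <- 2000
x <- rnorm(n)
# Simulated NB2 counts: log(mu) = 1 + 0.5 * x, theta (size) = 2
y <- rnbinom(n, mu = exp(1 + 0.5 * x), size = 2)

# Negative log-likelihood; theta is kept positive by working on the log scale
nll <- function(par) {
  mu <- exp(par[1] + par[2] * x)
  -sum(dnbinom(y, mu = mu, size = exp(par[3]), log = TRUE))
}

# Start from the log of the sample mean, zero slope, and theta = 1
fit <- optim(c(log(mean(y)), 0, 0), nll, method = "BFGS")
est <- c(intercept = fit$par[1], slope = fit$par[2], theta = exp(fit$par[3]))
est
```

With family = negative.binomial(theta = VALUE), only the first two parameters would be estimated and theta would stay fixed at VALUE; here all three are optimized jointly.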
Based on this, the following seems to work:
library(rms)
library(MASS)
counts <- c(18,17,15,20,10,20,25,13,12)
outcome <- gl(3,1,9)
treatment <- gl(3,3)
Glm(counts ~ outcome + treatment, family = negative.binomial(theta = 1))
General Linear Model
rms::Glm(formula = counts ~ outcome + treatment, family = negative.binomial(theta = 1))
Model Likelihood
Ratio Test
Obs 9 LR chi2 0.31
Residual d.f. 4 d.f. 4
g 0.2383063 Pr(> chi2) 0.9892
Coef S.E. Wald Z Pr(>|Z|)
Intercept 3.0756 0.2121 14.50 <0.0001
outcome=2 -0.4598 0.2333 -1.97 0.0487
outcome=3 -0.2962 0.2327 -1.27 0.2030
treatment=2 -0.0347 0.2333 -0.15 0.8819
treatment=3 -0.0503 0.2333 -0.22 0.8293
I'd really appreciate some assistance with this. I'd like to estimate coefficients and 95% CI for a glm that is applied to a household survey with 2 levels (defined by dd and hh.num1). I've only recently come across the package survey.
I've been following the examples within vignette for 1) setting up a dataset to consider the sampling methods - using svydesign 2) setting up a glm using the command svyglm. For the example datasets:
library(survey)
data(api)
head(apiclus1)
dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1)
logitmodel <- svyglm(I(sch.wide=="Yes")~awards+comp.imp+enroll+target+hsg+pct.resp+mobility+ell+meals, design=dclus1, family=quasibinomial())
summary(logitmodel)
Adding lots of variables seems OK so I'm confident that the package is working with a good dataset.
When I do the same to my dataset, the std errors return with "Inf" if 3 or 4 variables are added in and I can't figure out why. It seems as though it's more common with factors. I'm sorry that I haven't been able to replicate the error with the other examples, but the dataset could be downloaded here.
So using this dataset:
load("balo2_7March17.Rdat")
dclus1 <- svydesign(id=~dd+hh.num1, weights=~chweight, data = balo2)
glm1 <- svyglm(out.penta ~ factor(MN18c) + windex5 + age.y,
design=dclus1, family=quasibinomial())
summary(glm1)
If MN18c is numeric then the standard errors are produced; if it's a factor (and it should be), the standard errors are Inf. Short of knowing what else to do, I'll need to try the analysis in Stata. I saw some commentary that errors may occur if applied to a "bad" dataset, but what comprises "bad"?
The problem is that you have zero residual degrees of freedom in your model. The residual df is the design df (the number of PSUs minus the number of strata) minus the number of predictors, which can easily get negative when you have two large clusters per stratum. This definition of residual df is probably conservative, but it's not a straightforward question.
> degf(dclus1)
[1] 5
> glm1$df.resid
[1] 0
You can extract the standard errors with
> SE(glm1)
(Intercept) factor(MN18c)2 factor(MN18c)3 factor(MN18c)4 windex5
0.5461374 0.4655331 0.2805168 0.3718879 0.1376936
age.y
0.1638210
and if you are willing to use a different residual degrees of freedom, you can specify that to summary and get $p$-values. In particular, if none of your covariates are at the cluster level, there is a reasonable argument that the regression doesn't use up degrees of freedom and so for one parameter at a time you can do
> summary(glm1, df=degf(dclus1))
Call:
svyglm(formula = out.penta ~ factor(MN18c) + windex5 + age.y,
design = dclus1, family = quasibinomial())
Survey design:
svydesign(id = ~dd + hh.num1, weights = ~chweight, data = balo2)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.0848 0.5461 -5.648 0.00241 **
factor(MN18c)2 -0.1183 0.4655 -0.254 0.80957
factor(MN18c)3 -0.4908 0.2805 -1.750 0.14059
factor(MN18c)4 -0.6137 0.3719 -1.650 0.15981
windex5 0.2556 0.1377 1.856 0.12256
age.y 0.9934 0.1638 6.064 0.00176 **
Combining parameters (e.g. to test the three coefficients making up MN18c) is more problematic, and I think you at least need df=degf(dclus1)-3+1.
In the forthcoming version 4.1 the package will report standard errors in this situation (but not $p$-values unless a different df= is specified)
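The degrees-of-freedom arithmetic can be reproduced in base R from the numbers printed above. This is only a sketch: the variable names are illustrative, and the estimate and standard error are the rounded values from the coefficient table, so the results match the printed output only to rounding.

```r
design_df    <- 5   # degf(dclus1), as reported above
n_predictors <- 5   # three MN18c dummies + windex5 + age.y (intercept not counted)

# Residual df as described above: design df minus number of predictors
resid_df <- design_df - n_predictors   # zero, so there is no usable t distribution

# p-value for the intercept when the design df is used instead
est <- -3.0848
se  <- 0.5461
t_value <- est / se
p_value <- 2 * pt(-abs(t_value), df = design_df)

# df suggested above for a joint test of the three MN18c coefficients
joint_df <- design_df - 3 + 1

c(resid_df = resid_df, t_value = t_value, p_value = p_value, joint_df = joint_df)
```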
I am using the lme4 package for linear mixed effect modeling
the mixed-effect model is below:
fm01 <- lmer(sublat ~ goal + (1|userid))
the above command returns an S4 object called fm01
this object includes fixed effects and their OLS standard errors (below)
Fixed effects:
Estimate Std. Error t value
(Intercept) 31.644 3.320 9.530
goaltypeF1 -4.075 3.243 -1.257
goaltypeF2 -9.187 5.609 -1.638
goaltypeF3 -13.935 9.455 -1.474
goaltypeF4 -20.219 8.196 -2.467
goaltypeF5 -12.134 8.797 -1.379
However, I need to provide robust standard errors.
How can I do this with an S4 object such as returned by lme4?
It looks like robust SEs for lmerMod objects are available via the merDeriv and clubSandwich packages:
library(lme4)
library(clubSandwich)
m <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy)
using merDeriv
(From the replication materials of the merDeriv JSS paper, thanks to @AchimZeileis for the tip)
library(merDeriv)
sand <- sandwich(m, bread. = bread(m, full = TRUE),
meat. = meat(m, level = 2))
clubSandwich
(all possible types: I don't know enough to know which is 'best' in any given case)
cstypes <- paste0("CR", c("0", "1", "1p", "1S", "2", "3"))
rob_se_fun <- function(type) sqrt(diag(vcovCR(m, type = type)))
rob_se <- sapply(cstypes, rob_se_fun)
combine results
std_se <- sqrt(diag(vcov(m)))
cbind(std = std_se, rob_se,
merDeriv = sqrt(diag(sand)[1:2]))
std CR0 CR1 CR1p CR1S CR2 CR3
(Intercept) 6.824597 6.632277 6.824557 7.034592 6.843700 6.824557 7.022411
Days 1.545790 1.502237 1.545789 1.593363 1.550125 1.545789 1.590604
merDeriv
(Intercept) 6.632277
Days 1.502237
merDeriv's results match type="CR0" (merDeriv provides robust Wald estimates for all components, including the random effect parameters; it's up to you to decide if Wald estimates for RE parameters are reliable enough)
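For intuition about what a "CR0" cluster-robust sandwich is, here is a hand-rolled base-R sketch for an ordinary lm fit (not a mixed model, and not the merDeriv or clubSandwich code). It uses the built-in ChickWeight data with Chick as the cluster; the estimator is bread %*% meat %*% bread, where the meat sums the score contributions within each cluster before taking cross-products. All object names are illustrative.

```r
fit <- lm(weight ~ Time, data = ChickWeight)
X <- model.matrix(fit)
e <- residuals(fit)
cluster <- ChickWeight$Chick

# Bread: (X'X)^{-1}
bread <- solve(crossprod(X))

# Meat: sum over clusters of (X_g' e_g)(X_g' e_g)'
scores <- rowsum(X * e, group = cluster)   # one row of summed scores per cluster
meat <- crossprod(scores)

vcov_cr0 <- bread %*% meat %*% bread
se_cr0 <- sqrt(diag(vcov_cr0))

# Conventional (non-robust) standard errors for comparison
cbind(std = sqrt(diag(vcov(fit))), CR0 = se_cr0)
```

For mixed models the bread and meat are built from the full likelihood (including the variance components), which is what merDeriv supplies; the other CR types apply small-sample corrections on top of this basic form.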
I think this is what you're looking for: https://cran.r-project.org/web/packages/robustlmm/vignettes/rlmer.pdf
It's the robustlmm package, which has the rlmer function.
"The structure of the objects and the methods are implemented to be as similar as possible to the ones of lme4 with robustness specific extensions where needed."
fm01_rob <- rlmer(sublat ~ goal + (1|userid))