I want to estimate a fixed effect model and use a robust variance-covariance matrix with the HC3 small-sample adjustment.
For the model itself I use the following lines of code:
require(plm)
require(sandwich)
require(lmtest)
require(car)
QSFE <- plm(log(SPREAD)~PERIOD, data = na.omit(QSREG), index = c("STOCKS", "TIME"), model = "within")
This works fine. Now, to calculate the HC3 robust standard errors, I use the function coeftest with vcovHC:
coeftest(x = QSFE, vcov = vcovHC(QSFE, type = "HC3", method = "arellano"))
And this does not work. The returned error is as follows:
Error in 1 - diaghat : non-numeric argument to binary operator
The issue is in vcovHC: when the type is set to "HC3", it uses the function hatvalues() to calculate "diaghat", which does not support plm objects and returns the error:
Error in UseMethod("hatvalues") :
no applicable method for 'hatvalues' applied to an object of class "c('plm', 'panelmodel')"
Does anyone know how to use the HC3 (or HC2) estimator with plm? I think the problem lies in the hatvalues() call inside vcovHC, since HC0/HC1 work fine; they do not need hat values.
plm developer here. While the efficiency issue is computationally interesting, from a statistical viewpoint these small-sample corrections are not needed when you have a 300 x 300 panel. You can happily go with HC0; or, if you definitely want a panel small-sample correction, "sss" (panel degrees of freedom) would be best anyway, and it is computationally much lighter.
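For instance, the panel small-sample correction mentioned above can be requested like this (a sketch using the question's model object QSFE; it assumes plm is loaded so its vcovHC method is dispatched):

```r
library(plm)
library(lmtest)

# "sss" applies a panel (Stata-like) small-sample correction; unlike
# HC2/HC3 it does not require computing hat values, so it is much cheaper
coeftest(QSFE, vcov. = function(x) vcovHC(x, method = "arellano", type = "sss"))
```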
The fact that small-sample corrections become useless as the data size increases is the main reason we did not allocate scarce developer time to making them more efficient.
Also, from a statistical viewpoint please be aware that the properties of "clustering" vcovs like White-Arellano are less than ideal for T ~ N, they are meant for N >> T.
Lastly, one clarification re: your original post: while vcovHC is originally a generic function in the 'sandwich' package, in a panel context the specialized method vcovHC.plm from the 'plm' package is applied.
Better explanation here: https://www.jstatsoft.org/article/view/v082i03
As for the method plm supplies for plm objects: there is no function hatvalues in package plm; the word "hatvalues" does not even appear in plm's source code. Be sure to have package plm loaded when you execute coeftest. Also, be sure to have the latest version of plm installed from CRAN (currently, version 2.2-3).
If you have package plm loaded, the code should work. It does with a toy example on my machine. To be sure, you may want to force the use of vcovHC as supplied by plm:
First, try vcovHC(QSFE, type = "HC3", method = "arellano"). If that gives the same error, try plm::vcovHC(QSFE, type = "HC3", method = "arellano").
Next, please try:
coeftest(QSFE, vcov. = function(x) vcovHC(x, method = "arellano", type = "HC3"))
Edit:
Using the data set supplied, it is clear that dispatch to vcovHC.plm works correctly. Package sandwich is not involved here. The root cause is the memory demand of the function vcovHC.plm with the argument type set to "HC3" (and others). This also explains your comment about the function working for a subset of the data.
Edit2:
Memory demand of vcovHC.plm's small-sample adjustments is significantly lower from plm version 2.4-0 onwards (the internal function dhat was optimized), and the error no longer occurs.
vcovHC(QSFE, type = "HC3", method = "arellano")
Error in 1 - diaghat : non-numeric argument to binary operator
Called from: omega(uhat, diaghat, df, G)
Browse[1]> diaghat
[1] "Error : cannot allocate vector of size 59.7 Gb\n"
attr(,"class")
[1] "try-error"
attr(,"condition")
<simpleError: cannot allocate vector of size 59.7 Gb>
I'm trying to calculate estimated marginal means with the emmeans package for a gamlss object. My gamlss object comes from a zero-inflated beta regression. Let's say that my model is called m1 and one of my variables is internationaltreaty, so I call:
emmeans(m1,"internationaltreaty",type="response")
and I get the following error message:
Error in match.arg(type) :
'arg' should be one of “link”, “response”, “terms”
If I use a different model object (for example, a glm), emmeans works with this code. It seems that emmeans doesn't recognize my type argument. Has anyone experienced something similar?
Thanks.
That error does not come from trying to match your type argument. I can tell because type = "terms" is not a possibility in emmeans. So I wonder if you get the same error even without the type argument.
Support for gamlss objects in emmeans is pretty sketchy, and I think that this is just a model that doesn’t work. You might be able to work around it using the qdrg function.
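The qdrg workaround might look roughly like this (a sketch, not tested with gamlss; it assumes coef() and vcov() return usable values for the mu part of your fit, and mydata stands in for your data frame):

```r
library(emmeans)

# Rebuild a reference grid by hand from the model's pieces.
# NOTE: whether coef()/vcov() extract the mu-model coefficients and
# their covariance from a gamlss fit must be checked first.
rg <- qdrg(~ internationaltreaty, data = mydata,
           coef = coef(m1), vcov = vcov(m1),
           link = "logit")  # assumed link for the mu part of a zero-inflated beta
emmeans(rg, "internationaltreaty", type = "response")
```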
I am trying to apply cross validation to a list of linear models and getting an error.
Here is my code:
library(MASS)
library(boot)
glm.fits = lapply(1:10,function(d) glm(nox~poly(dis,d),data=Boston))
cvs = lapply(1:10,function(i) cv.glm(Boston,glm.fits[[i]],K=10)$delta[1])
I get the error:
Error in poly(dis, d) : object 'd' not found
I then tried the following code:
library(MASS)
library(boot)
cvs=rep(0,10)
for (d in 1:10){
glmfit = glm(nox~poly(dis,d),data=Boston)
cvs[d] = cv.glm(Boston,glmfit,K=10)$delta[1]
}
and this worked.
Can anyone explain why my first attempt did not work, and suggest a fix?
Also, assuming a fix to the first attempt can be obtained, which way of writing code is better practice? (assume that I want a list of the various fits and that I would edit the latter code to preserve them) To me, the first attempt is more elegant.
In order for your first attempt to work, cv.glm (and maybe glm) would have to be written differently to take much more care about where evaluations are taking place.
The function cv.glm basically re-evaluates the model formula a bunch of times. It takes that model formula directly from the fitted glm object. So put yourself in R's shoes (as it were), and consider you're deep in the function cv.glm and you've been instructed to refit this model:
glm(formula = nox ~ poly(dis, d), data = Boston)
The fitted glm object has Boston in it, and glm knows to look first in Boston for variables, so it finds nox and dis easily. But where is d? It's not in Boston. It's not in the fitted glm object (and glm wouldn't know to look there anyway). In fact, it isn't anywhere. That d value existed only in the context of the lapply iterations and then disappeared.
In the second case, since d is currently an active variable in your for loop, after R fails to find d in the data frame Boston, it looks in the parent frame, in this case the global environment and finds your for loop index d and merrily keeps going.
If you need to use glm and cv.glm in this way I would just use the for loop; it might be possible to work around the evaluation issues, but it probably wouldn't be worth the time and hassle.
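That said, if you do want the lapply version, one workaround (a sketch along these lines, untested against your exact session) is to splice the numeric degree into the call itself with bquote(), so the call stored in each fitted object no longer refers to d at all:

```r
library(MASS)  # Boston data set
library(boot)  # cv.glm

# Build each glm call with the degree already substituted in as a literal,
# so the stored call reads glm(nox ~ poly(dis, 3), ...) rather than
# referring to a variable d that no longer exists when cv.glm refits.
glm.fits <- lapply(1:10, function(d)
  eval(bquote(glm(nox ~ poly(dis, .(d)), data = Boston))))

cvs <- sapply(1:10, function(i) cv.glm(Boston, glm.fits[[i]], K = 10)$delta[1])
```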
I have a set of GLMMs fitted with a binary response variable and a set of continuous variables, and I would like to get confidence intervals for each model. I've been using confint() function, at 95% and with the profile method, and it works without any problems if it is applied to a model with no interactions.
However, when I apply confint() to a model with interactions (continuous*continuous), I've been getting this error:
m1CI <- confint(m1, level=0.95, method="profile")
Error in zeta(shiftpar, start = opt[seqpar1][-w]) :
profiling detected new, lower deviance
The model runs without any problem (although I applied an optimizer because some of the models were having problems with convergence), and here is the final form of one of them:
m1 <- glmer(Use~RSr2*W+RSr3*W+RShw*W+RScon*W+
RSmix*W+(1|Pack/Year),
control=glmerControl(optimizer="bobyqa",
optCtrl=list(maxfun=100000)),
data = data0516RS, family=binomial(link="logit"))
Does anyone know why this is happening, and how can I solve it?
I am using R version 3.4.3 and lme4 1.1-17
The problem was solved by following these instructions:
The error message indicates that during profiling, the optimizer found
a fitted value that was significantly better (as characterized by the
'devtol' parameter) than the supposed minimum-deviance solution returned
in the first place. You can boost the 'devtol' parameter (which is
currently set at a conservative 1e-9 ...) if you want to ignore this --
however, the non-monotonic profiles are also warning you that something
may be wonky with the profile.
From https://stat.ethz.ch/pipermail/r-sig-mixed-models/2014q3/022394.html
I used confint.merMod from the lme4 package and boosted the 'devtol' parameter, first to 1e-8, which didn't work for my models, and then to 1e-7. With this value, it worked.
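For reference, the boosted tolerance can be passed straight through confint(), which forwards extra arguments to lme4's profile method (devtol is an argument of profile.merMod; treat the exact value as something to tune per model):

```r
# devtol is forwarded to profile.merMod; raising it from the default 1e-9
# makes the profiler tolerate fits slightly better than the reported optimum
m1CI <- confint(m1, level = 0.95, method = "profile", devtol = 1e-7)
```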
I'm using the glmulti package to do variable selection on the fixed effects of a mixed model in lme4. I had the same problem retrieving coefficients and confidence intervals that was solved by the author of the package in this thread. Namely, using coef or coef.multi gives a check.names error, and the coefficients are listed as NULL when calling the predict method. So I tried the solution listed on the thread linked above, using:
setMethod('getfit', 'merMod', function(object, ...) {
summ=summary(object)$coef
summ1=summ[,1:2]
if (length(dimnames(summ)[[1]])==1) {
summ1=matrix(summ1, nr=1, dimnames=list(c("(Intercept)"),c("Estimate","Std. Error")))
}
cbind(summ1, df=rep(10000,length(fixef(object))))
})
I fixed the missing " in the original post, and the code ran. But now, instead of getting
Error in data.frame(..., check.names = FALSE) :arguments imply
differing number of rows: 1, 0
I get this error for every single model:
Error in calculation of the Satterthwaite's approximation. The output
of lme4 package is returned summary from lme4 is returned some
computational error has occurred in lmerTest
I'm using lmerTest, and it doesn't surprise me that it would fail if glmulti can't pull the correct info from the model. So really, the first two lines of the error are probably what should be focused on.
A description of the original fix is on the developer's website here. Clearly the package hasn't been updated in a while, and yes, I should probably learn a new package... but until then I'm hoping for a fix. I'll contact the developer directly through his website. But in the meantime, has anyone tried this and found a fix?
lme4, glmulti, rJava, and other related packages have all been updated to their latest versions.
Using lmer I get the following warning:
Warning messages:
1: In optwrap(optimizer, devfun, x@theta, lower = x@lower) :
convergence code 3 from bobyqa: bobyqa -- a trust region step failed to reduce q
This warning is generated after using anova(model1, model2). I tried to make this reproducible, but if I dput the data and try again, the warning does not reproduce on the dput data, despite the original and new data frames having the exact same str.
I have tried again in a clean session; the warning reproduces, and again is lost with a dput.
I know I am not giving people much to work with here; like I said, I would love to reproduce the problem. Can anyone shed light on this warning?
(I'm not sure whether this is a comment or an answer, but it's a bit long and might be an answer.)
The proximal cause of your difficulty with reproducing the result is that lme4 uses both environments and reference classes: these are tricky to "serialize", i.e. to translate to a linear stream that can be saved via dput() or save(). (Can you please try save() and see if it works better than dput()?)
In addition, both environments and reference classes use "pass-by-reference" semantics, so operating on the saved model can change it. anova() automatically refits the model, which makes some tiny but non-zero changes in the internal structure of the saved model object (we are still trying to track this down).
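If you want to rule out that automatic refit as the source of the discrepancy, anova.merMod lets you compare the models exactly as fitted (note this skips the usual ML refit that makes REML fits comparable, so treat it as a diagnostic rather than the final comparison):

```r
# Compare the models as fitted, without the automatic ML refit
anova(model1, model2, refit = FALSE)
```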
@alexkeil's comment is wrong: the nonlinear optimizers used within lme4 do not make any calls to the pseudo-random number generator. They are deterministic (but the two points above explain why things might look a bit weird).
To allay your concerns with the fit, I would check the fit by computing the gradient and Hessian at the final fit, e.g.
library(lme4)
library(numDeriv)
fm1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
dd <- update(fm1,devFunOnly=TRUE)
params <- getME(fm1,"theta") ## also need beta for glmer fits
grad(dd,params)
## all values 'small', say < 1e-3
## [1] 0.0002462423 0.0003276917 0.0003415010
eigen(solve(hessian(dd,params)),only.values=TRUE)$values
## all values positive and of similar magnitude
## [1] 0.029051631 0.002757233 0.001182232
We are in the process of implementing similar checks to run automatically within lme4.
That said, I would still love to see your example, if there's a way to reproduce it relatively easily.
PS: in order to be using bobyqa, you must either be using glmer or have used lmerControl to modify the default optimizer choice ... ??